Azure Synapse Analytics
-
Expose File Properties as additional columns to load in SQL DW
It would be helpful if some of the properties of a BLOB file could be exposed in the external table such as: Last Modified, Filename, Size. It should be optional and could be defined in the EXTERNAL FILE FORMAT definition. Each Property should be appended to the end of the result set of an external table in the order given in the definition.
For instance, if we are loading a directory of several files we can read the filename and derive a new column. For example, we might have a directory with 20160504.csv and 20160505.csv and the contents of the…
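A partial workaround exists today in the Synapse serverless SQL pool, which exposes filename() and filepath() functions on OPENROWSET queries (this does not cover external tables in the dedicated pool, which is what the request asks for). A minimal sketch; the storage account and container names are hypothetical:

```sql
-- Sketch: deriving a column from the source file name with the
-- serverless SQL pool's OPENROWSET (account/container names are hypothetical).
SELECT
    r.filename() AS source_file,   -- e.g. 20160504.csv
    r.*
FROM OPENROWSET(
    BULK 'https://myaccount.dfs.core.windows.net/mycontainer/daily/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS r;
```

The derived date column the post describes could then be computed from source_file with string functions in the same SELECT.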
144 votes
Thanks for your suggestion. We are looking into this scenario for a future release. 7633537
-
Provide File Format to read and load JSON and XML data from Storage
Allow reading JSON and XML files as part of an external system, or provide functionality like the Extractor in Data Lake Analytics so we can write our own extractor for the respective data format.
106 votes
-
[In preview] More flexible date formats in Polybase
This feedback item is currently in preview here:
https://docs.microsoft.com/sql/t-sql/statements/copy-into-transact-sql?view=azure-sqldw-latest
Polybase currently has very limited date format support. All of the acceptable date formats have dashes or slashes. For example, I can't define a date format of yyyyMMdd since it doesn't have slashes or dashes. Please allow any date format I can express.
Also, please allow more than one date format per external table. For example, what if I have a date column as yyyyMMdd and then a different column as yyyyMMddHHmm. With only being able to define one date format in the CREATE EXTERNAL FILE FORMAT statement I think I…
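The COPY statement linked above does accept a DATEFORMAT option, though it takes the SQL Server date-order styles (such as 'ymd') rather than an arbitrary pattern like yyyyMMdd. A minimal sketch; the table name and storage URL are hypothetical:

```sql
-- Sketch: COPY with a date format hint (table and storage names are hypothetical).
COPY INTO dbo.FactSales
FROM 'https://myaccount.blob.core.windows.net/data/sales/*.csv'
WITH (
    FILE_TYPE       = 'CSV',
    FIELDTERMINATOR = ',',
    DATEFORMAT      = 'ymd'   -- a date-order style, not a full pattern like yyyyMMdd
);
```

Note the limitation the request raises still applies: DATEFORMAT is a single statement-level setting, not per-column.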
82 votes
-
Polybase to treat \N as NULL (or allow custom null support)
I was trying to load external file into SQL DW table using polybase. This external file has \N as value for null columns. For example :
2,"C1008",1,\N,1,1,"2016-02-09 23:13:02","2014-02-21 20:11:36","2016-02-08 17:48:59","2008-07-16 14:17:40",947
I defined external data format as below:
CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH
( FORMAT_TYPE = DELIMITEDTEXT
, FORMAT_OPTIONS ( FIELD_TERMINATOR = ','
                 , STRING_DELIMITER = ''
                 , USE_TYPE_DEFAULT = FALSE )
);
When I try to load this record into the DW table, I get the error below:
Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected…
47 votes
Thank you for the request. This item is currently on our backlog and under review. We will update the item when the status changes.
-
Support for Large Objects in External Tables
While you can read and write large objects such as varchar(max) to blob storage today, you can't read them via external tables.
E.g. if you define a varchar(max) field in an external table, you can't use it; you need to use BCP or SSIS to get the data into a physical table that is a heap or a supported indexed table.
Msg 102048, Level 16, State 1, Line 56
Columns with large object types are not supported in external tables.
40 votes
Sorry for the confusion: there was a regression during deployment of the Large Object support that caused a rollback of the functionality. We are actively working on a fix.
-
Polybase - Give the ability to specify a file name when outputting external tables and consolidate into a single file
Creating an external table will result in multiple files being generated in blob storage. It would be great if the multiple files could be consolidated into a single file and be able to configure the file name.
There are multiple files that contain 0 bytes. It would be better to have a single file and be able to specify the file name.
This would help with idempotency: the file could simply be deleted if I wanted to re-create the external table.
32 votes
Thank you for your request! This item is on our backlog and under review. We will update the item when the status changes.
-
Support for fixed-width flat files in Polybase
Polybase currently only supports delimited flat files. It should also support fixed-width flat files.
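A common workaround, sketched here with hypothetical table, data source, format, and offset values, is to declare each line as one wide column and slice it with SUBSTRING:

```sql
-- Sketch: treat each fixed-width line as a single column, then slice it.
-- The external data source, file format, and column offsets are hypothetical;
-- the format's field terminator must be a character that never occurs in the data.
CREATE EXTERNAL TABLE dbo.FixedWidthRaw (
    line varchar(200)
)
WITH (
    LOCATION    = '/fixedwidth/',
    DATA_SOURCE = MyAzureStorage,
    FILE_FORMAT = SingleColumnFormat
);

SELECT
    SUBSTRING(line, 1, 10)  AS customer_id,
    SUBSTRING(line, 11, 25) AS customer_name,
    SUBSTRING(line, 36, 8)  AS order_date
FROM dbo.FixedWidthRaw;
```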
29 votes
Thanks for the suggestion. We are looking into this scenario for a future release. 6891268
-
Polybase - fix file format type default for decimals
Currently it is not possible to load a file with an empty decimal value and have it converted to 0 during loading.
This will load fine:
1;2.1;"zzzz"
This will fail:
1;;"zzzz"
According to the documentation (https://msdn.microsoft.com/en-us/library/dn935026.aspx), when USE_TYPE_DEFAULT = TRUE is set in the file format definition, an empty value should be converted to 0. While this works for integers, it doesn't work for decimals. USE_TYPE_DEFAULT = FALSE works fine for decimals (but of course the value is then loaded as NULL).
Code:
CREATE EXTERNAL FILE FORMAT textfileformatraw
WITH
(
FORMAT_TYPE = DELIMITEDTEXT, …
19 votes
Thank you for reporting this issue. We have identified this as a product defect and plan to fix this behavior in an upcoming release. 7297383.
-
"COPY INTO" - Support for Parameterized Source Location
In incremental load scenarios, data files land in a new folder in ADLS each time.
Currently the “COPY INTO” command does not support a parameterized source location; the source folder must be hardcoded. Please add the ability to use a SQL string variable for the source folder location.
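Until the source location can be parameterized directly, one hedged workaround is to build the COPY statement as a string and run it with sp_executesql. A sketch; the table and storage names are hypothetical:

```sql
-- Sketch: parameterizing the COPY source folder via dynamic SQL
-- (table and storage URL are hypothetical).
DECLARE @folder nvarchar(100) = N'2021/05/04/';   -- e.g. derived from the load date
DECLARE @sql nvarchar(max) =
    N'COPY INTO dbo.StageSales
      FROM ''https://myaccount.blob.core.windows.net/landing/' + @folder + N'*.csv''
      WITH (FILE_TYPE = ''CSV'')';
EXEC sp_executesql @sql;
```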
10 votes
-
Ability to set result set caching on or off per user
Add the ability to turn off result set caching per user, so we can turn it off for our ETL user, which moves large amounts of data throughout the day, while leaving it on for the report user account.
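Until a per-user setting exists, result set caching can already be toggled per session in Synapse dedicated SQL pools, so the ETL connection can opt out right after connecting. A sketch:

```sql
-- Sketch: leave result set caching enabled at the database level,
-- but turn it off in the ETL session so large loads don't pollute the cache.
SET RESULT_SET_CACHING OFF;

-- ... ETL statements run here with caching disabled for this session ...
```

The report user's sessions simply never issue the SET, keeping the database-level default in effect for them.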
8 votes
-
Remove the storage requirement of "Allow trusted Microsoft services" when loading with MSI
Using MSI authentication with COPY is required when loading from a storage account that limits access only to a set of virtual network subnets via Private Endpoints or Service Endpoints. The MSI is able to bypass the network configurations when "Allow trusted Microsoft services" is enabled. Details here: https://techcommunity.microsoft.com/t5/azure-synapse-analytics/how-to-set-up-private-link-with-high-throughput-data-ingestion/ba-p/1594440.
Ideally, loading should access the storage account following the storage account's network configuration (i.e. using the private endpoint).
3 votes
-
Azure Synapse - Query on demand/reservation slot.
Add another SKU similar to OnDemand query to reserve query compute units, acts like query on demand but no need to load the file.
Would be beneficial for predictable workloads, similar to BigQuery reserved slots.
3 votes
-
Allow for the option for polybase to truncate columns
Allow for the option for polybase to truncate columns
3 votes
-
COPY INTO doesn't support "safe" implicit conversions from Parquet files
Currently the COPY INTO command requires data types to match exactly.
When a column is implicitly convertible without loss of data, the import will still fail. For example, if I load a date data type from a Parquet file into a datetime or datetime2 column, COPY INTO returns an error.
Similar behavior exists for decimal data types loading into fields with greater scale (going from 2 decimal places to 5, for example) and for loading a non-string field into a varchar. For example, Polybase will allow you to load a date field from Parquet into a varchar field; COPY INTO will…
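Until COPY widens its implicit conversions, a hedged workaround is to land the data in a staging table whose types match the Parquet schema exactly, then cast on insert. A sketch; the table, column, and storage names are hypothetical:

```sql
-- Sketch: stage with exactly matching types, then CAST into the target
-- (all names and types here are hypothetical).
CREATE TABLE dbo.StageEvents (
    event_date date,            -- matches the Parquet type exactly
    amount     decimal(18, 2)
)
WITH (HEAP);

COPY INTO dbo.StageEvents
FROM 'https://myaccount.blob.core.windows.net/data/events/*.parquet'
WITH (FILE_TYPE = 'PARQUET');

-- Widening conversions happen in the INSERT...SELECT, not in COPY.
INSERT INTO dbo.Events (event_time, amount)
SELECT CAST(event_date AS datetime2), CAST(amount AS decimal(18, 5))
FROM dbo.StageEvents;
```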
2 votes
-
Allow COPY to use a Timezone offset for time and datetime columns to adjust ingested data to UTC
Having COPY accept a time zone offset parameter for time and datetime columns would allow adjusting ingested data to UTC without the need to post-process all the datetime and time columns in a separate step.
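The post-processing step the request wants to avoid currently looks something like the fixed-offset sketch below (table and column names are hypothetical, and a plain DATEADD cannot account for daylight saving transitions):

```sql
-- Sketch: shifting an ingested local timestamp (+02:00) back to UTC
-- after loading. A fixed offset like this does not handle DST.
UPDATE dbo.Events
SET event_time = DATEADD(HOUR, -2, event_time);
```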
2 votes
-
Azure Synapse should support auto-partitioning and very large partition counts.
We have a big DW table for patient detail (>10 billion rows). Each day roughly 5% of the rows are updated, always grouped by the columns [memberId]+[batchId]+[year] together. Since the number of combinations can exceed 100K, being able to split the table into such partitions would make bulk copy and partition switch in/out very useful for our case; unfortunately, the current partition limit of fewer than 15,000 is too small to apply here.
Continuing on this ask, Synapse should support partitioning automatically on distinct column values, as it is so annoying to define a 100K partition…
2 votes
-
Support high-concurrency Polybase loading
The current concurrency limit is very low, which prevents customers from loading data in parallel using Polybase.
1 vote
-
Support for CLR Functions within Synapse
We have a concept and/or story where we want to encrypt specific fields using Databricks, and we want to use the same encryption functions within Synapse.
1 vote