Add support for "Delta Lake" file format in Azure Data Lake Store / HDFS
Today we can query data stored in parquet files on ADLS. It would be fantastic to extend this to support the new "Delta Lake" file format recently open-sourced by the Databricks team (see https://delta.io).
This would allow us to take advantage of ACID guarantees that the delta format brings to the data lake.
This is super important to get in place!
Felipe Rosa commented
+1, this would be hugely helpful to us.
Euan Garden commented
(Linux Foundation) Delta Lake has been supported in Spark since the Private Preview release in November 2019. We are working on support in the other compute engines.
Bear in mind we are currently constrained by the feature gaps in the OSS version.
In my view, to guarantee data consistency, time travel, and ACID transactions, you should use the Delta table format.
So it makes sense for Synapse to support reading the Delta table "format" (JSON log files indicate which parquet data files are valid).
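The mechanism described above — JSON log files recording which parquet files are currently valid — can be illustrated with a minimal sketch. This is a hypothetical, simplified reader (not the Synapse or Databricks implementation): it replays `add`/`remove` actions from the `_delta_log/*.json` commit files in order and ignores checkpoints, partition values, and all other action types defined by the real Delta protocol.

```python
import json
from pathlib import Path

def active_parquet_files(table_path):
    """Replay the Delta transaction log to find which parquet files
    make up the current table snapshot.

    Simplified illustration only: real Delta readers must also handle
    parquet checkpoint files, protocol/metadata actions, and more.
    """
    log_dir = Path(table_path) / "_delta_log"
    active = set()
    # Commit files are zero-padded, so lexicographic sort = commit order.
    for commit in sorted(log_dir.glob("*.json")):
        with open(commit) as f:
            for line in f:          # one JSON action per line
                action = json.loads(line)
                if "add" in action:
                    active.add(action["add"]["path"])
                elif "remove" in action:
                    active.discard(action["remove"]["path"])
    return sorted(active)
```

A query engine would then read only the parquet files returned here, which is how Delta delivers a consistent snapshot on top of plain parquet data.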
Delta Lake support in Synapse and Data Factory is a highly anticipated development on our side as well.
The Azure team needs to be swift in their response to this issue. It's very much a good practice in Databricks environments to use the Delta format.
This should be supported by SQL Data Warehouse (now Azure Synapse Analytics) as an external file format.
Thanks, hoping for a swift reply and a solution.
PETRANCURI, DARRYLL commented
This is hugely important, and frankly, with all the work that has been done on integration with Apache Spark, I'm really surprised this isn't on the roadmap at this time. It's not enough to be able to perform direct queries against Parquet. If you consider all the power and capabilities that Delta provides for a simplified data lake and data lake ETL pipeline, it's a must-have.
Along with this I feel it's critical to add a Delta Sink to Azure Data Factory as well.
Saumyakumar Suhagiya commented
Any update on this?