PolyBase: PARTITIONED BY functionality when creating external tables.
I have a large number of files that are read with Hive using a partitioning scheme. The PARTITIONED BY functionality, which is so commonly used in Hive, is missing from PolyBase.
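For context, a rough sketch of the gap (the table, data source, and file format names below are hypothetical, not taken from this thread). In Hive the partition column is declared once in the DDL and the engine prunes partition directories automatically; a PolyBase external table only accepts a single root LOCATION, so the directory layout is opaque to the optimizer:

```sql
-- Hive: partition column declared in the DDL; queries filtering on
-- sale_date only scan the matching date=... directories.
CREATE TABLE sales (sale_id BIGINT, amount DECIMAL(18,2))
PARTITIONED BY (sale_date STRING)
STORED AS PARQUET;

-- PolyBase: no PARTITIONED BY clause, one LOCATION for everything,
-- so every query reads the whole folder tree.
CREATE EXTERNAL TABLE dbo.Sales (sale_id bigint, amount decimal(18,2))
WITH (LOCATION = '/sales/', DATA_SOURCE = AzureStorage, FILE_FORMAT = ParquetFmt);
```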
Thank you for your request. I would like to understand whether this request concerns the SQL DW or SQL Server implementation of PolyBase.
Jason Horner commented
Any update on this? It would be a major enhancement. We are currently forced to work around it with a complex set of dynamic SQL that repoints the data source to the current day's partition. Adding a hint to tell PolyBase the partitioning scheme would be extremely helpful to support predicate pushdown and partition elimination.
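A hedged sketch of the kind of dynamic-SQL workaround described above, assuming a daily `date=yyyy-MM-dd` folder layout; the table, data source, and file format names are hypothetical. Because an external table's LOCATION cannot be changed in place, the usual trick is to drop and recreate the table pointed at the current day's folder:

```sql
-- Repoint the external table at today's partition folder.
-- dbo.SalesExternal, AzureStorage, and ParquetFmt are hypothetical names.
DECLARE @day varchar(10) = CONVERT(varchar(10), GETDATE(), 23); -- yyyy-MM-dd
DECLARE @sql nvarchar(max) = N'
IF OBJECT_ID(''dbo.SalesExternal'') IS NOT NULL
    DROP EXTERNAL TABLE dbo.SalesExternal;
CREATE EXTERNAL TABLE dbo.SalesExternal (
    sale_id bigint,
    amount  decimal(18,2)
)
WITH (
    LOCATION    = ''/sales/date=' + @day + '/'',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = ParquetFmt
);';
EXEC sp_executesql @sql;
```

This only ever exposes one partition at a time, which is why a native hint or PARTITIONED BY clause would be a real improvement.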
Tom Petz commented
We use this capability in ADLA and have a need for it in SDW as well. Without it, scalable loading and unloading of data to and from a data lake is difficult.
While we need both, reading by partition is more important than writing.
Scott Person commented
Why is this still marked as "NEED-FEEDBACK"? It's in the SDW section and was indicated as being SDW in May (see below).
In a recent web talk, Kuber Sharma recommended using PolyBase as a data lake interface. This recommendation should be reconsidered: without partitioning it isn't viable. All data lakes partition.
Scott Person commented
Even though Al was a bit harsh (understandably so if he's blocked), I agree. This is basic functionality for anyone using data stored in Hadoop or anything even Hadoop-like. You all should take a look at what AWS is doing with Glue. It's a shared metastore: you create a table once and you can read it in several different DB engines. A table in Glue can be queried in Redshift (SQL DW), EMR (HDInsight), and Athena (Azure has nothing even close). When implemented well, you wouldn't even need to create the external tables in SQL DW. They'd already be there when you cranked up the cluster.
Hugo PT commented
In our corp this is a must-have feature that is missing from PolyBase. For large result sets we write additional code to generate one table per partition, with a view that unions the results and hardcodes the partition fields. It's extra work, extra risk of failure, etc. For smaller result sets we just use SSIS over Hive, which is less work and safer.
This concerns SQL DW
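A hedged sketch of the table-per-partition workaround described above (all object names are hypothetical): one external table per partition folder, plus a view that unions them and adds the partition value back as a hardcoded literal.

```sql
-- One external table per partition folder; AzureStorage and
-- ParquetFmt are hypothetical data source / file format names.
CREATE EXTERNAL TABLE dbo.Sales_20240101 (sale_id bigint, amount decimal(18,2))
WITH (LOCATION = '/sales/date=2024-01-01/', DATA_SOURCE = AzureStorage, FILE_FORMAT = ParquetFmt);

CREATE EXTERNAL TABLE dbo.Sales_20240102 (sale_id bigint, amount decimal(18,2))
WITH (LOCATION = '/sales/date=2024-01-02/', DATA_SOURCE = AzureStorage, FILE_FORMAT = ParquetFmt);
GO

-- View that unions the partitions and restores the partition column
-- as a hardcoded literal per branch.
CREATE VIEW dbo.Sales AS
SELECT sale_id, amount, CAST('2024-01-01' AS date) AS sale_date FROM dbo.Sales_20240101
UNION ALL
SELECT sale_id, amount, CAST('2024-01-02' AS date) AS sale_date FROM dbo.Sales_20240102;
```

Every new partition means a new table and a view redeployment, which is exactly the extra work and risk of failure the comment describes.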
Rob Poidomani commented
I would vote for SQL DW. Having PolyBase able to eliminate partitions of data without having to deal with multiple external tables, etc. would make life so much easier and avoid performance issues on large, append-only historical datasets. Either that, or make it able to connect to a Hive metastore that can.
al turker commented
I can't believe they created a utility claiming Hadoop support and it doesn't have the partition function.
Congrats, Windows, you did it again: because your utility lacks the most common functionality that any Hadoop utility must have, I have wasted two days, and my project is roadblocked now. Bravo.