How can we improve Microsoft Azure Data Lake?

Dynamic partition creation

We currently have to create vertical partition buckets manually. This is fine when the number of partition buckets is fixed and we know what they are upfront. However, when the data has an unknown number of partition buckets, this is much harder: it requires external automation (e.g. a PowerShell script) to add new partitions before inserting data that doesn’t belong to any existing partition bucket.
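
To make the workaround concrete, the external automation could look roughly like the sketch below. It is written in Python rather than PowerShell purely for illustration and assumes integer partition keys. The helper names `missing_buckets` and `add_partition_statement` and the table name `dbo.Events` are hypothetical, and the generated `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` text reflects our understanding of the U-SQL syntax, so it should be checked against the U-SQL reference before use.

```python
def missing_buckets(existing, incoming):
    """Return partition keys found in the incoming data but not yet created, sorted."""
    return sorted(set(incoming) - set(existing))


def add_partition_statement(table, buckets):
    """Render a single ALTER TABLE statement that adds the given buckets.

    Returns an empty string when there is nothing to add.
    The statement shape is an assumption, not verified U-SQL.
    """
    if not buckets:
        return ""
    labels = ", ".join("PARTITION ({})".format(b) for b in buckets)
    return "ALTER TABLE {} ADD IF NOT EXISTS {};".format(table, labels)


# Hypothetical example: buckets already on the table vs. keys seen in new data.
existing = [201701, 201702]
incoming = [201702, 201703, 201701, 201704, 201703]

to_add = missing_buckets(existing, incoming)
statement = add_partition_statement("dbo.Events", to_add)
print(statement)
```

The script would have to run before every load, which is exactly the extra step a dynamic partitioning option would remove.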

It would be great if ADL/U-SQL had an option to dynamically create/add new partition buckets so that we’re not forced to ignore data or move it to an ‘unknown’ catchall bucket (via the INTEGRITY clause). The expected behaviour would be similar to what happens in Apache Hive after setting the ‘hive.exec.dynamic.partition.mode’ parameter to ‘nonstrict’.

43 votes

Michael Amadi shared this idea

2 comments

  • Daniel Otykier commented

    On a related note, we need the ability to dynamically TRUNCATE partitions, for example by specifying a rowset with a list of partition buckets to be truncated.
