How can we improve Microsoft Azure Data Lake?

Multiple Indexes on ADL-A Table

Many of our datasets are processed three times to support different query patterns (for example, lookup by filename, by file hash, and by hostname). This means the data is stored in triplicate and processed in triplicate.

We sometimes build views to hide the multiple copies, but the caller still has to select the right indexing scheme somehow, so this kind of abstraction is of limited value.
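To make the trade-off concrete, here is a minimal in-memory Python sketch of what built-in secondary indexes would buy. This is not ADL-A or U-SQL code. The names `Row`, `IndexedTable`, and `find` are hypothetical and used only for illustration. The table keeps a single copy of the rows, and each lookup column gets a sorted index of `(key, row_id)` pairs. The extra storage is therefore one key plus one row id per row per index, not a full copy of every row. `find` chooses the index from the column named in the predicate, so the caller never has to know which "index table" to query.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical schema matching the three lookup patterns described above.
    filename: str
    filehash: str
    hostname: str


class IndexedTable:
    """One stored copy of the rows plus a sorted secondary index per key column.

    Each index holds only (key, row_id) pairs, never full rows.
    """

    def __init__(self, rows, key_columns):
        self.rows = list(rows)
        self.indexes = {
            col: sorted((getattr(r, col), i) for i, r in enumerate(self.rows))
            for col in key_columns
        }

    def find(self, **predicate):
        """Return rows matching a single equality predicate, e.g. find(hostname="h")."""
        if len(predicate) != 1:
            raise ValueError("this sketch supports exactly one equality predicate")
        [(column, value)] = predicate.items()
        index = self.indexes.get(column)
        if index is None:
            # No index on this column: fall back to a full scan.
            return [r for r in self.rows if getattr(r, column) == value]
        # (value,) sorts before every (value, row_id), so this finds the first match.
        pos = bisect_left(index, (value,))
        matches = []
        while pos < len(index) and index[pos][0] == value:
            matches.append(self.rows[index[pos][1]])
            pos += 1
        return matches


rows = [
    Row("a.exe", "h1", "host1"),
    Row("b.dll", "h2", "host1"),
    Row("a.exe", "h3", "host2"),
]
table = IndexedTable(rows, key_columns=["filename", "filehash", "hostname"])
print(table.find(hostname="host1"))
print(table.find(filehash="h2"))
```

The caller writes the same kind of query whichever column it filters on, and the table decides whether an index applies. This is the behavior the request asks the query optimizer to provide, in place of the caller picking one of three physically separate copies.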

26 votes

Mike Daly shared this idea

1 comment

  • Mike R commented:

    Hi Mike, thanks for filing your request. This is an important performance item on our backlog. Of course, such secondary indexes will still store parts of the data as additional copies inside the index. Making the index built-in, however, gives the query optimizer more knowledge and allows queries to be written without knowing the name of the "index" table.