How can we improve Microsoft Azure SQL Data Warehouse ?

hash distribute fact table on multiple columns

A large fact table often has multiple join keys to multiple dimension table, so while it can be opitmised for one join it always creates data movement for other joins. The hash distribution should accept multiple columns so all joins are optimised and create no data movement for dimension joins.

64 votes
Sign in
(thinking…)
Sign in with: Microsoft
Signed in as (Sign out)

We’ll send you updates on this idea

Anonymous shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →

3 comments

Sign in
(thinking…)
Sign in with: Microsoft
Signed in as (Sign out)
Submitting...
  • Kin Tang commented  ·   ·  Flag as inappropriate

    MPP Databases don't work like this. Other MPP database such as Netezza will not work if you have multiple columns HASH but only JOIN on one of the column. Do to the fact the data is distributed/stored using the HASH of all of columns for FACT while Dimension is distributed/stored using a different HASH of one column, the data ends up being not co-located.
    In order get the benefit of the distribution, both table in the JOIN need to be co-located thus using the same HASH key for distribution. If Fact and Dimension table are distributed using different HASH columns. It will end up calling DMS every execution because the tables are not co-located.
    Small Dimension table should be replicated to minimize data movement.
    Reference; Designed systems using Netezza, AWS RedShift and Azure SQL Data Warehouse.

  • Fraser commented  ·   ·  Flag as inappropriate

    Seconded, I had assumed this functionality existed, although there is the work around of simply concatenating the feilds

Feedback and Knowledge Base