hash distribute fact table on multiple columns
A large fact table often has multiple join keys to multiple dimension table, so while it can be opitmised for one join it always creates data movement for other joins. The hash distribution should accept multiple columns so all joins are optimised and create no data movement for dimension joins.
Multi-column distribution will soon be supported in Azure Synapse SQL Analytics.
Kin Tang commented
MPP Databases don't work like this. Other MPP database such as Netezza will not work if you have multiple columns HASH but only JOIN on one of the column. Do to the fact the data is distributed/stored using the HASH of all of columns for FACT while Dimension is distributed/stored using a different HASH of one column, the data ends up being not co-located.
In order get the benefit of the distribution, both table in the JOIN need to be co-located thus using the same HASH key for distribution. If Fact and Dimension table are distributed using different HASH columns. It will end up calling DMS every execution because the tables are not co-located.
Small Dimension table should be replicated to minimize data movement.
Reference; Designed systems using Netezza, AWS RedShift and Azure SQL Data Warehouse.
Seconded, I had assumed this functionality existed, although there is the work around of simply concatenating the feilds
Glen Swan commented