Enable distribution elimination in the generated query plan if the query predicate contains the distribution key column as a filter
Say we have the following table:
CREATE TABLE [dbo].[Foo]
[columnWithGuids] [uniqueidentifier] NULL,
[column1] [int] NULL,
[column2] [bigint] NULL,
[ingressDay] [date] NULL
DISTRIBUTION = HASH ( [columnWithGuids] ),
CLUSTERED COLUMNSTORE INDEX,
[ingressDay] RANGE RIGHT FOR VALUES (N'2016-01-01T00:00:00.000', N'2016-01-02T00:00:00.000')
And we have a query:
SUM([colum1]) AS [Total]
WHERE [columnWithGuids] = '40300bab-03aa-8d51-7ecb-904dfaf47ec7'
Currently, Azure SQL DW does not eliminate distributions to interrogate based on a predicate. Which means, that in the above case, even though only 1 distribution will always have the data we're looking for, it will query all of the distributions (60) to get the data.
This would incur unnecessary I/O on the other distribution nodes, and in a high utilization environment, these are operations that can be avoided at query plan generation time to improve performance.
This would be similar to how table partitioning helps in eliminating which partitions are queried for data based on the partition column predicate in the query.
Ralph Kemperdick commented
This seems to be a very valuable optimization and might allow a number of side effects. Wouldn't it potentially allow more concurrent sessions, as fewer conections abe beeing utilized?