How can we improve Microsoft Azure SQL Data Warehouse ?

Enable distribution elimination in the generated query plan if the query predicate contains the distribution key column as a filter

Say we have the following table:

CREATE TABLE [dbo].[Foo]

(

[columnWithGuids] [uniqueidentifier] NULL,

[column1] [int] NULL,

[column2] [bigint] NULL,

[ingressDay] [date] NULL

)

WITH

(

DISTRIBUTION = HASH ( [columnWithGuids] ),

CLUSTERED COLUMNSTORE INDEX,

PARTITION

(

[ingressDay] RANGE RIGHT FOR VALUES (N'2016-01-01T00:00:00.000', N'2016-01-02T00:00:00.000')

)

)

And we have a query:

SELECT

SUM([colum1]) AS [Total]

,[ingressDay]

FROM [dbo].[Foo]

WHERE [columnWithGuids] = '40300bab-03aa-8d51-7ecb-904dfaf47ec7'

GROUP BY

[ingressDay];


Currently, Azure SQL DW does not eliminate distributions to interrogate based on a predicate. Which means, that in the above case, even though only 1 distribution will always have the data we're looking for, it will query all of the distributions (60) to get the data.

This would incur unnecessary I/O on the other distribution nodes, and in a high utilization environment, these are operations that can be avoided at query plan generation time to improve performance.

This would be similar to how table partitioning helps in eliminating which partitions are queried for data based on the partition column predicate in the query.

14 votes
Sign in
(thinking…)
Sign in with: Microsoft
Signed in as (Sign out)

We’ll send you updates on this idea

Rajeev shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →

1 comment

Sign in
(thinking…)
Sign in with: Microsoft
Signed in as (Sign out)
Submitting...
  • Ralph Kemperdick commented  ·   ·  Flag as inappropriate

    This seems to be a very valuable optimization and might allow a number of side effects. Wouldn't it potentially allow more concurrent sessions, as fewer conections abe beeing utilized?

Feedback and Knowledge Base