Azure Synapse Analytics
-
[In Preview] Hash distribute fact table on multiple columns
A large fact table often has multiple join keys to multiple dimension tables, so while it can be optimised for one join, it always creates data movement for the other joins. Hash distribution should accept multiple columns so that all joins are optimised and dimension joins create no data movement.
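As a sketch only, extending the existing single-column HASH option to the requested multi-column form (table and column names are hypothetical, and the preview's exact syntax may differ):
CREATE TABLE [dbo].[FactSales]
(
    [ProductKey]  [int] NOT NULL,
    [CustomerKey] [int] NOT NULL,
    [SalesAmount] [money] NOT NULL
)
WITH
(
    -- hash on both join keys so joins to either dimension avoid data movement
    DISTRIBUTION = HASH ( [ProductKey], [CustomerKey] ),
    CLUSTERED COLUMNSTORE INDEX
);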
118 votes
This feature is currently in preview and will be available to all users in the coming months.
-
Enable dynamic compute - scaling up the DW should not kill any existing query
Scaling the warehouse up or down kills all ongoing sessions.
Adding or removing compute capacity should be truly dynamic. This means that when I add more compute - by scaling up - the DW should be able to absorb that additional compute capacity without killing existing queries.
Maybe that compute can be brought into play for the new queries that get submitted after the compute is added.
Thanks,
Mayank
76 votes -
Support table variables
Prior to data analysis it is often necessary to collect some initial information and store it in table format. Creating and dropping a table or a temp table takes too much time (seconds), since it has to be distributed over multiple nodes. Having table variables would solve this performance issue.
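For reference, a minimal example of the standard T-SQL table-variable syntax being requested (names are illustrative):
-- Declared per session; no distributed DDL round trip
DECLARE @LoadStats TABLE
(
    [TableName]  sysname      NOT NULL,
    [RowsLoaded] bigint       NOT NULL,
    [LoadedAt]   datetime2(0) NOT NULL
);
INSERT INTO @LoadStats VALUES (N'dbo.FactSales', 1048576, SYSUTCDATETIME());
SELECT * FROM @LoadStats;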
57 votes -
Resource class override
Having a fixed resource class assigned to a user is too rigid. Most queries can be executed efficiently with a small resource class, but certain operations (e.g. rebuilding an index) should be run with a larger RC. I need to switch between several user accounts just to have appropriate resources for whatever operation or query I need to run. Users assigned to xlargerc can only execute a single query at a time.
It would be great to be able to set/change the resource class used for executing each statement. Alternatively, being able to execute as a different user could…
39 votes
We announced workload isolation via workload groups for public preview at Ignite in November 2019. Workload groups allow you to create your own custom resource classes (among other things), as sketched below. Check out workload classification, which lets you assign requests by more than just login information, too!
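A rough sketch of that approach, with hypothetical group, classifier, and login names (values must satisfy the workload-group validation rules):
-- A group whose requests get a large, xlargerc-like resource grant
CREATE WORKLOAD GROUP wgIndexMaintenance
WITH
(
    MIN_PERCENTAGE_RESOURCE = 0,
    CAP_PERCENTAGE_RESOURCE = 100,
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 25
);
-- Route the maintenance login's requests into that group
CREATE WORKLOAD CLASSIFIER wcIndexMaintenance
WITH
(
    WORKLOAD_GROUP = 'wgIndexMaintenance',
    MEMBERNAME = 'maintenance_user'
);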
-
Support rebuilding DW indexes with Online = ON so the table is accessible during the rebuild
This works:
ALTER INDEX ALL ON VulnerabilityScope REBUILD
But while the rebuild takes place, the entire table is locked against any query, even selecting one row. Users complain when the table is inaccessible during the 30-minute rebuild.
This currently returns an error:
ALTER INDEX ALL ON VulnerabilityScope REBUILD WITH (ONLINE = ON)
Parse error at line: 1, column: 48: Incorrect syntax near 'ONLINE'.
Please support rebuilding DW indexes with ONLINE = ON.
35 votes
Thank you for the feedback, folks. For now, you can use ALTER INDEX REORGANIZE as the online operation when rebuilding your CCI (example below). We will reach out when this is on our roadmap. Please describe your scenario for this capability and continue to vote. Thanks for your patience.
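For illustration, the reorganize workaround on the table from the example above:
-- Compresses the clustered columnstore index online; the table remains readable
ALTER INDEX ALL ON VulnerabilityScope REORGANIZE;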
-
Support for updating statistics on a single partition
This would improve update-statistics performance for users who load data into a single partition and update statistics afterwards.
Otherwise, with growing data size, updating statistics takes longer after every load.
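For comparison, SQL Server outside of SQL DW already offers this via incremental statistics; a sketch with hypothetical names, not valid in SQL DW today:
-- Create the statistics object as incremental (tracked per partition)
CREATE STATISTICS stat_FactSales_ProductKey
ON dbo.FactSales (ProductKey)
WITH INCREMENTAL = ON;
-- After loading partition 42, refresh only that partition
UPDATE STATISTICS dbo.FactSales (stat_FactSales_ProductKey)
WITH RESAMPLE ON PARTITIONS (42);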
33 votes -
Better Database & Query Monitoring through Azure Portal
Enable a better experience for monitoring and analysing queries and DWH/query performance. Currently the SQL DW blade in the Azure Portal shows only very basic information about queries and their execution, as well as DWUs, with no way to customise the charts or the (query) lists. Only the last 50 queries are shown in this list, there are no overall reports or statistics (e.g. long-running queries), and there is no way to filter the list or show other metrics. Please enable a monitoring experience for query and DB performance similar to what SQL DB offers (Query…
33 votes
We are actively improving our monitoring experience. We have future improvements on our roadmap such as Query Data Store and integration with Azure Monitor (logs and metrics) for near-real-time troubleshooting in the Azure portal. If anyone has any other feedback, please elaborate on your scenario in this thread and have your team continue to vote. Thank you for your continued support!
-
Expose DMVs that display the actual execution plan (sqlplan) of a query on a node after the query executes
Currently, DBCC PDW_SHOWEXECUTIONPLAN only displays the estimated query plan of the step being executed on a compute node while the query is running. There is no way to see the actual execution plan of the query step on the node.
This prevents us from seeing details of the executed query such as estimated vs. actual row counts, rowgroups eliminated by the columnstore index, or gross under/over-estimates in row counts owing to missing or bad statistics.
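For context, today's estimated-plan workflow looks roughly like this (the request ID, step index, distribution ID, and spid values are illustrative):
-- Find the running request and its steps
SELECT request_id, status, command FROM sys.dm_pdw_exec_requests WHERE status = 'Running';
SELECT step_index, operation_type, status FROM sys.dm_pdw_request_steps WHERE request_id = 'QID1234';
-- Map a step to its distribution-level SQL requests
SELECT request_id, step_index, distribution_id, spid
FROM sys.dm_pdw_sql_requests
WHERE request_id = 'QID1234' AND step_index = 2;
-- Show the estimated plan for that distribution/spid while the step runs
DBCC PDW_SHOWEXECUTIONPLAN (1, 78);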
31 votes
Thank you for voting for this feature! We are aware of this scenario and are looking into ways of supporting it and improving our query troubleshooting experience. In the meantime, stay tuned for an update and please continue voting for this feature.
-
Enable distribution elimination in the generated query plan if the query predicate contains the distribution key column as a filter
Say we have the following table:
CREATE TABLE [dbo].[Foo]
(
[columnWithGuids] [uniqueidentifier] NULL,
[column1] [int] NULL,
[column2] [bigint] NULL,
[ingressDay] [date] NULL
)
WITH
(
DISTRIBUTION = HASH ( [columnWithGuids] ),
CLUSTERED COLUMNSTORE INDEX,
PARTITION
(
[ingressDay] RANGE RIGHT FOR VALUES (N'2016-01-01T00:00:00.000', N'2016-01-02T00:00:00.000')
)
)
And we have a query:
SELECT
SUM([column1]) AS [Total]
,[ingressDay]
FROM [dbo].[Foo]
WHERE [columnWithGuids] = '40300bab-03aa-8d51-7ecb-904dfaf47ec7'
GROUP BY
[ingressDay];
Currently, Azure SQL DW does not eliminate distributions to interrogate based on a predicate. This means that in the above case, even though only one distribution can ever contain the data we're looking for, it will…
18 votes -
Functionality to upload data to ADW where upload queries do not count towards the 32-concurrent-query limit
Today we use PDW/APS, and when we upload data to PDW using DWLoader it does not count towards the 32-concurrent-query limit; this ensures users have the full query window to use the data warehouse.
However, in ADW, when we ingest data with Azure Data Factory or manually using PolyBase, it counts towards the 32-concurrent-query limit, since technically it is a query.
It would be great to have a feature similar to PDW.
16 votes
Thank you for the request. This item is on our backlog. We will update the item when the state changes.
-
Performance with respect to dynamic SQL
In our solutions we load data from CSV files into SQL DW internal tables using PolyBase/external tables.
Per roll-out we face the challenge that the schema (columns) in the CSV may differ for the same table. Therefore we implemented logic to look up the column names and data types from a dictionary table, and we create the external and internal table schemas dynamically. As we have roughly 500 tables with up to 20 columns per table, automating this process is the only way to go.
The issue we face is that, compared to an on-premises SQL Server, the dynamic…
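Setting the performance question aside, a minimal sketch of the dynamic-DDL pattern described above, assuming a hypothetical dictionary table dbo.ColumnDictionary (TableName, ColumnName, DataType, Ordinal) and STRING_AGG support:
DECLARE @cols nvarchar(max), @sql nvarchar(max);
-- Assemble the column list for one target table from the dictionary
SELECT @cols = STRING_AGG(QUOTENAME(ColumnName) + N' ' + DataType, N', ')
               WITHIN GROUP (ORDER BY Ordinal)
FROM dbo.ColumnDictionary
WHERE TableName = N'SalesStage';
SET @sql = N'CREATE TABLE dbo.SalesStage (' + @cols + N') '
         + N'WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);';
EXEC sp_executesql @sql;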
16 votes
Is this the looping execution time or the actual time of creating the tables?
-
CREATE SCHEMA causes pervasive blocking
When CREATE SCHEMA is issued, it is blocked by active work and blocks (almost) all new work.
CREATE SCHEMA should be transparent to the operations of a database.
13 votes
Thank you for your request. This item is currently on our backlog. We will update the thread when the status changes.
-
Provide resource usage by query (CPU, Memory, TempDB, I/O)
Provide actual resource usage data by query detailing CPU, Memory (allocated/Used), TempDB usage, and I/O.
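What is available today is per-request timing and the resource class via sys.dm_pdw_exec_requests, but not the full breakdown being requested; for example:
SELECT request_id,
       resource_class,
       total_elapsed_time,
       command
FROM sys.dm_pdw_exec_requests
ORDER BY total_elapsed_time DESC;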
Having the full breakdown would provide metrics to make measurable performance improvements and to make better DWU and Workload Management configuration decisions.
7 votes -
Workflow capability within Azure Data Warehouse to run parallel queries
To maximise parallel execution and the MPP architecture, it would be great to have a workflow capability that allows users to run queries in parallel against a defined resource class.
I appreciate this could be against the SQL engine design, but looking at the architecture we might be able to do something like this:
workflow-Parallel (ResourceClassUser = XlargeRCuser, OptionalAllocMaxMemory = 0.5)
{
    Exec SPTranform1;
    Exec SPTranform2;
    Exec SPTranform3;
    Exec SPTranform4;
}
where the 4 sprocs will run in parallel with the defined RC and a max memory allocation as a percentage/ratio of the total max available for the RC,…
7 votes
Thanks for the suggestion. We announced workload isolation with workload groups for preview at Ignite in November 2019. Workload groups give you a lot more flexibility to define resource usage and thus concurrency. Please have a look at workload groups and share more feedback.
-
7 votes
We announced some of these capabilities for preview at Ignite in November 2019. Workload groups allow for reserving and capping resources. You can also configure a query timeout to cancel a query.
-
Allow IDENTITY columns to be used in Hash Distribution
Since May 2020 the ability to update data in the distribution column has been GA; shouldn't this enable using IDENTITY columns in table distributions?
When using IDENTITY surrogate keys it would be ideal to distribute on that column (i.e. distribute on ProductSkey across all tables).
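Expressed as DDL, this is what the request asks for (not currently allowed, since an IDENTITY column cannot be the distribution column; names are illustrative):
CREATE TABLE [dbo].[DimProduct]
(
    [ProductSkey] bigint IDENTITY(1,1) NOT NULL,
    [ProductName] nvarchar(100) NOT NULL
)
WITH
(
    -- today this combination fails; the request is to allow it
    DISTRIBUTION = HASH ( [ProductSkey] ),
    CLUSTERED COLUMNSTORE INDEX
);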
6 votes -
Gen2 not all it was touted to be
Our organization was assured that Gen2 would do wonders for our big-data performance. It didn't. We were told that Gen2 adds data caching that will make things run faster. It doesn't. To make use of data caching, we added clustered columnstore indexes to our tables that have in excess of 100 million records, and our data analytics now run a few minutes longer, not shorter; our result sets are too large to cache and reuse. Looking at the data warehouse usage stats, compute used doesn't exceed 40 DWUs, but the only way to get faster runtimes is to…
6 votes -
Workload Isolation in Azure Synapse
Workload isolation in Synapse currently reserves resources per request on the basis of REQUEST_MIN_RESOURCE_GRANT_PERCENT and REQUEST_MAX_RESOURCE_GRANT_PERCENT, which does not differentiate between a high-cost query and a low-cost (short) query. We want a mechanism that can allocate resources within a workload group on the basis of query cost; this would help achieve optimal resource utilisation within a workload group.
5 votes -
Ignore update when there are no changes
In DW projects there are usually scenarios where a full data load of a dimension happens. Even if no attributes have changed, the timestamp of the records gets changed, and that causes a cascading impact in the DW system, wherein dependent facts and dimensions are fully processed. This is a very resource-intensive task and causes entire DW facts to be recreated.
To avoid this, it would be good to have T-SQL syntax that checks the required attributes between the incoming record and the destination, and updates if and only if an attribute has changed, e.g.:
UPDATE Emp UPDATE WHEN Name, DOB CHANGED …
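Until such syntax exists, a common workaround is to filter unchanged rows in the UPDATE itself; a sketch with illustrative names (nullable columns would also need ISNULL-style handling, and the non-ANSI join form is used because dedicated SQL pools restrict ANSI joins in UPDATE):
UPDATE tgt
SET    Name = src.Name,
       DOB  = src.DOB
FROM   dbo.DimEmployee AS tgt,
       stg.Employee    AS src
WHERE  tgt.EmpKey = src.EmpKey
  AND (tgt.Name <> src.Name OR tgt.DOB <> src.DOB);  -- touch only rows that actually changed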
4 votes -
Provide insights about resource usage in Synapse Serverless
Currently it is completely opaque to users of Synapse Serverless how many resources are used to execute a query, and when/why requests are queued because current resources are used up and other queries have to finish before the new query can start. This leads to situations where very fast commands don't execute for a long time just because another command is eating up resources. How can we understand why a command is queued?
3 votes