Do you have a comment or suggestion to improve SQL Server? We’d love to hear it!

T-SQL Common Table Expression "Materialize" Option

T-SQL Common Table Expression "Materialize" Option

I have run into a couple performance and results issues using Common Table Expressions (CTEs).

The first case is where I coded a data retrieval as a single, large query with a CTE that is referenced multiple times in the query. The CTE has an expensive query. I was able to determine, via execution plans, etc., that the results of the CTE were being recalculated multiple times, resulting in slow execution. I rewrote the data retrieval as multiple steps where the query of the CTE was first executed into a table variable, and I replaced the CTE references in my query with the table variable. The data retrieval ran much faster.

The second case involves a table that has a varchar column that stores user-typed input. The system knows for each row whether the user-typed input should be interpreted as text, numbers or dates. We wrote a query like this:

SELECT tt.TableID, CONVERT(int, tt.UserInput)
FROM dbo.TheTable tt
INNER JOIN dbo.Interpretation i
WHERE tt.IntprID = i.IntprID
AND i.Type = 'integer';

When we run the query we get the error "Conversion failed when converting the varchar value '4/7/2006 12:05:40 PM' to data type int." This is presumably because the optimizer is choosing to run the CONVERT statement before it has computed the join result.

I envision a T-SQL enhancement that would solve both of these problems. It would be to allow database developers to write a T-SQL common table expression something like this:

WITH MyCTE [(Col, …n)] [WITH {[UNIQUE] CLUSTERED INDEX (Col, …n)} [, INDEX(Col, …n), …n]]AS (
select ExpressionA, ExpressionB, …n
from MyTableSource
)
...

This would force the SQL Server Database Engine to compute and store the results of the CTE query in the equivalent of a table variable (or whatever), and subsequent references to the CTE name will be serviced from the stored results. Like any table or join hint, WITH INDEX should be only used as a last resort when default performance is unacceptable.

For the first case, it would essentially do what I did by explicitly creating and populating a table variable. It would limit the execution of the expensive query to one time.

For the second case, one could rewrite the query like this:

WITH TheCTE (TableID, UserInput) WITH CLUSTERED INDEX (TableID) AS (
SELECT tt.TableID, tt.UserInputCONVERT(int, tt.UserInput)
FROM dbo.TheTable tt
INNER JOIN dbo.Interpretation i
WHERE tt.IntprID = i.IntprID
AND i.Type = 'integer';
)
SELECT TableID, CONVERT(int, UserInput)
FROM TheCTE;

In this example, the rows where the UserInput is something other than integer have been forced to be filtered out before the CONVERT function is executed, and we will not get the "Conversion failed..." error.

Giving this suggestion serious consideration would be greatly appreciated.

93 votes
Sign in
(thinking…)
Sign in with: Microsoft
Signed in as (Sign out)

We’ll send you updates on this idea

JediSQL shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →

5 comments

Sign in
(thinking…)
Sign in with: Microsoft
Signed in as (Sign out)
Submitting...
An error occurred while saving the comment
  • O E commented  ·   ·  Flag as inappropriate

    I've intermittently been forced to materialize a CTE in order to make a stored procedure easier for the SQL engine to parallelize.

  • JediSQL commented  ·   ·  Flag as inappropriate

    I have another use case. We frequently get XML data from Web services, and we use the xml data type functions to parse the XML to tabular format. Joins between @MyXML.value('(Root/Element)[1]','datatype') and table columns can be very slow. Shredding the XML into an indexed table variable or temp table can really improve performance. Materialized CTEs would help with this, too.

  • Mighty-O commented  ·   ·  Flag as inappropriate

    As I'm sure you already tried your second case maybe be mitigated a few ways and I think this may help in better understanding the source of the conversion issue. I suspect is the most common way:
    1. Nesting your results set.

    SELECT v.TableID, UserInput=CONVERT(int, v.UserInput)
    FROM (
    SELECT tt.TableID, tt.UserInput
    FROM dbo.TheTable tt
    INNER JOIN dbo.Interpretation i
    WHERE tt.IntprID = i.IntprID
    AND i.Type = 'integer'
    ) v;

    I believe that would mitigate your conversion error. I didn't try in your example but have used similar many-many times.

    The cause I believe revolves around efficient conversion, the optimizer drills down and applies the conversion as an expression, applying it on the read side of the sequence. By nesting the result I believe we push the expression to the outer side (the results set). I could be wrong and I am sure one of the engine guys and gals can correct me on this.

    To your first example, I'd start by saying everything revolves around trade-offs and an outside determinism, like having to author a single statement query being often the root of the problem not what the optimizer did with the mess it was sent. Mitigation then comes in the form of aggressive cover and filtered indexes or as you noted breaking the query into silos for processing.

    Put it another way, I agree CTE's can be hit and miss particularly as the number of internal temporary objects are employed. Remember a lot of sorting, spooling, yada-yada-yada goes into the plan the more likely a CTE's going to get tripped up. I humbly would note a plan performance dramatically different as a CTE, vs. breaking it into steps, points to an under optimized structure (e.g. cover indexes should equalize out the performance and may result in the CTE being "faster").

    I am finding a reliable test of complicated (aberrant) construction can be improved by constructing given query as CTE and seeing just how messy it becomes. Once you see where the break down occurs you may be able to design filtered / cover indexes to satisfy the request. Then you get to decide on the frequency, e.g. is it worth adding the cover index(s) or simply authoring the query in steps (divide and conquer).

  • Mark Freeman commented  ·   ·  Flag as inappropriate

    This is something I really miss from Oracle. I often have to refactor between using CTE and temp table because "it depends". Materializing CTEs would likely eliminate the need for that. It would be a lot simpler to refactor if all I had to do was add or remove an OPTION.

  • Nobody commented  ·   ·  Flag as inappropriate

    Second usecase has been problematic - and prone to classic 'runs for months then breaks' Errors- until 2012 with the advent of TRY_CONVERT...

Feedback and Knowledge Base