SELECT-powered variable assignment result invalid when selecting from Information_Schema.COLUMNS and other conditions
Product: Microsoft SQL Server
Platform: Darwin 17.5.0 Darwin Kernel Version 17.5.0: Fri Apr 13 19:32:32 PDT 2018; root:xnu-4570.51.2~1/RELEASE_X86_64 x86_64
Virtual Env: Docker 18.03.1-ce, build 9ee9f40 on
Description: Below is the characterization of a behavior defect involving assignment of a local variable from INFORMATION_SCHEMA.COLUMNS and the code necessary for reproduction. A simple/naive workaround is offered.
WHEN a local variable is being assigned to within a SELECT statement
... AND the SELECT statement is against INFORMATION_SCHEMA.COLUMNS
... AND a WHERE equality predicate on TABLE_NAME is present
... AND an ORDER BY clause on ORDINAL_POSITION is present
... AND the variable's assignment value is a COALESCE() function
... AND the COALESCE() function first parameter is the concatentation between
the variable being assigned and a REPLACE() function
... AND the REPLACE() function is making an arbitrary replacement with a replacement parameter of COLUMN_NAME
... AND the COALESCE() function second parameter is COLUMN_NAME
THEN a concatenated list of table COLUMN_NAMES should be assigned to the variable.
The variable is assigned the last column name in the SELECT's result set.
Solomon Rutzky commented
Hi Russell. It seems that "aggregate concatenation" (i.e. SELECT @x = @x + something) is not actually supported by SQL Server. The recommendation is to use one of the following:
1) FOR XML
Please see this answer on DBA.StackExchange for details: https://stackoverflow.com/a/15163136/577765
That being said, I think there are some minor changes that you could make to what you are doing that might "fix" the unexpected optimization issue:
1) In the example you provided, I declared a variable:
DECLARE @Dummy NVARCHAR(1) = N'';
and then concatenated that with the first item in the COALESCE (since that one will always be evaluated):
REPLACE( N'x', N'x', c.COLUMN_NAME + @Dummy)
I attached it to the column name because that is most likely the item that the optimizer is confused about, and the goal here is to force it to evaluate that expression each time.
2) Even better than "fixing" the COALESCE would be removing it entirely. It's not really helping, adds complication to the query, and is an extra IF branch being evaluated per each row. Just do:
SELECT @list = @list + REPLACE( N'x', N'x', c.COLUMN_NAME)
In which case you don't need the "+ @Dummy". All you need is an extra step after that query to remove a delimiter, assuming you were concatenating one into the string. If you add the delimiter to the beginning of the string (counter-intuitive to most folks, but does make this easier), then you can simply use SUBSTRING to grab all but the first character (since the "length" parameter doesn't care if you request more characters than exist):
SET @list = SUBSTRING(@list, 2, 80000);
Here you don't need to calculate the length of the string :-)
For anyone using SQL Server 2017 or newer, you can also use the new TRIM() built-in function to remove the extra delimiter from the beginning or end of the string:
SET @list = TRIM(N',' FROM @list);
Using your demo script, I was able to reproduce the problem on SQL Server 2017 CU14, and either one of those suggestions above "fixed" the problem.
1) I would avoid INFORMATION_SCHEMA entirely. Is best to go directly to the system catalog views: sys.objects, sys.tables, sys.columns, etc.
2) All meta-data should be NVARCHAR, so your variables should also be NVARCHAR and string literals prefixed with an uppercase "N".
Take care, Solomon..