Support standard escape sequences in strings (e.g. unescape "\t" into CHAR(9), "\n" into CHAR(10), etc)
Most languages allow for a limited number of escape sequences in strings, such as: "\n" for newline (0x0A), and "\t" for tab (0x09). Currently we need to concatenate those values in as either CHAR(0x0A) / CHAR(10) or assign that to a CHAR variable and concatenate the variable in. For example:
DECLARE @CRLF CHAR(2) = CHAR(0x0D) + CHAR(0x0A), -- same as CHAR(13) + CHAR(10)
@TAB CHAR(1) = CHAR(0x09); -- same as CHAR(9)
PRINT 'first part' + @TAB + 'middle' + @TAB + 'end.' + @CRLF
+ 'Now on next line.' + @CRLF;
That concatenation is a lot more unwieldy than what most other languages (yes, even SQL in other RDBMSs) support, which would be equivalent to:
PRINT 'first part\tmiddle\tend\r\nNow on next line.\r\n';
But, this isn't merely syntactic sugar (though that is certainly a large part of this). There are other benefits to fewer concatenations:
1) fewer points of failure: code will be more readable, hence more maintainable, and it will be harder to get lost in starting and ending string literals, which also reduces chances of mixing VARCHAR and NVARCHAR literals, so fewer implicit conversions. Oh, and sometimes people get confused about newlines vs CRLF and do CHAR(13) only when they should have done CHAR(10) only (for newline).
2) Reduced development time: how many people remember that a tab is CHAR(9)? Far fewer than those that remember that "\t" becomes a tab. How much time to people spend looking up these codes and/or not wanting to look them up again but yet mis-remembering and wasting more time on using the wrong one?
3) Faster execution: string concatenation is not especially fast, especially if you have a lot of them. It is well known that they do not scale well, and T-SQL does not have an alternate like .NET has StringBuilder. Having these handled as escape sequences fits cleanly into the query parsing step.
Sure, we can all create a UDF that translates these, which solves most of the problems, BUT this is such a common thing that it will be pretty much ALL systems that would benefit. Regular Expressions are commonly requested, yet this is even more common than RegEx. Besides, T-SQL scalar UDFs do not scale well prior to the auto-inlining introduced in SQL Server 2019, yet this is something that can be back-ported to all currently supported versions via Service Pack or CU. Of course, since it is possible (even if unlikely) that current code has such sequences and does not expect any translations, a session-level, or possibly instance-level (if database-scoped level is not available for all currently supported versions) config option: PROCESSSTRINGESCAPES or PROCESSESCAPESEQUENCES or ENABLEESCAPESEQUENCES or maybe just ESCAPESEQUENCES, or maybe use STRINGESCAPES in place of ESCAPE_SEQUENCES in those previous suggestions, something to that effect.
Please handle the following, initially taken from https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/strings/#string-escape-sequences :
' — Single quote — 0x0027
\ — Backslash — 0x005C
\a — Alert — 0x0007
\b — Backspace — 0x0008
\f — Form feed — 0x000C
\n — New line — 0x000A
\r — Carriage return — 0x000D
\t — Horizontal tab — 0x0009
\v — Vertical tab — 0x000B
\u — Unicode escape sequence (UTF-16) — \uHHHH (range: 0000 - FFFF; example: \u00E7 = "ç")
\U — Unicode escape sequence (UTF-32) — \U00HHHHHH (range: 000000 - 10FFFF; example: \U0001F47D = "👽")
\x — 8-bit escape for VARCHAR — \xHH (range: 00 - FF; example: \xE7 = "ç" for code page 1252)
1) Yes, T-SQL already has an escape sequence for a single-quote: two single-quotes. Still, the ' syntax is consistent with most other languages (i.e. well known) and is more readable, hence less error-prone code, especially when working with nested strings.
2) Escape for double-quote (i.e. " ) is not needed
3) Escape for CHAR(0) (i.e. "\0" ) is not really needed, especially when "\x00" works just as well, and "\0" does not provide any convenience of not requiring the user to remember the underlying ASCII value.
4) Not sure how useful "\a", "\b", "\f", and "\v" are, outside of maintaining consistency with other languages (and not being entirely useless like " is, and \0 mostly is).
5) "\u" alleviates the need to concatenate NCHAR(0x17D8) when it could be expressed inline as "\u17D8".
6) "\U" not only alleviates the need to concatenate NCHAR(0x1D11F) when it could be expressed inline as "\U0001D11F", there is even more benefit when not in a database that has a default collation that supports Supplementary Characters, in which case you currently need to concatenate the surrogate pair via "NCHAR(0xD834) + NCHAR(0xDD1F)" in order to get the 𝄟 character. This would allow for easily creating Supplementary Characters in DBs using older collations.
7) "\x" should not follow the C# approach of allowing 1 - 4 hex digits, but instead follow what nearly all other languages do and require always 2 (and only 2) hex digits. This would be for injecting the specified byte into the string based on the string's initial encoding. Meaning, "\x00" - "\x7F" should produce the same character in all collations, but "\xE7" might produce a different character based on the code page specified by the DB's default collation. Using this escape in an NVARCHAR literal would use the ISO-8859-1 character set, similar to using the undocumented
UNCOMPRESS() function ( https://sqlquantumleap.com/2019/02/26/what-does-the-undocumented-uncompress-function-do/ ). Meaning, in a database with a default collation of Hebrew100BIN2, both of the following queries:
SELECT CHAR(0xE7), UNCOMPRESS(0xE7);
SELECT '\xE7', N'\xE7';
ח ... ç
For more background on why I am making this request, please see: Unicode Escape Sequences Across Various Languages and Platforms (including Supplementary Characters) ( https://sqlquantumleap.com/2019/06/26/unicode-escape-sequences-across-various-languages-and-platforms-including-supplementary-characters/ ).