Do you have a comment or suggestion to improve SQL Server? We’d love to hear it!

UTF-8 in SQL 2019: Binary UTF-8 Collations are missing

In SQL Server 2019, CPT 2 there are no binary versions of the new UTF-8 Collations. Why not? There does not seem to be any good reason for this omission.

1) It doesn't matter that the Code Page is the same for all UTF-8 Collations (Code Page = 65001): the same is true for all Unicode-only Collations (Code Page = 0) and there are multiples of those (not sure why though; it would make more sense to have a single "Common" / "Invariant" for "UnicodeOnly_BIN2" and "UTF8_BIN2").

2) It doesn't matter that binary Collations don't truly support Supplementary Characters: they don't for NVARCHAR, not even for Japanese_XJIS_140_BIN2 (remember, version 140 Collations implicitly support Supplementary Characters, except for the binary Collations). Just try this:

SELECT CONVERT(TEXT, N'a' COLLATE Japanese_XJIS_140_BIN2);
-- a

You don't get the error saying that the TEXT datatype can't be used with Collations that support Supplementary Characters (like you do if attempting to use "Japanese_XJIS_140_CI_AS").

3) Not having a binary Collation for the UTF8 Collations is inconsistent with all other Collations (even 2 of the SQL Server Collations have "_BIN2" versions!).

No need to add "_BIN" versions as that is just a waste of time and energy. But the "_BIN2" versions (at least) need to exist.

P.S. Please see "Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?" ( https://sqlquantumleap.com/2018/09/28/native-utf-8-support-in-sql-server-2019-savior-false-prophet-or-both/ ) for a detailed analysis of the new UTF-8 feature.

1 vote
Sign in
Check!
(thinking…)
Reset
or sign in with
  • facebook
  • google
    Password icon
    Signed in as (Sign out)

    We’ll send you updates on this idea

    Solomon Rutzky shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →

    3 comments

    Sign in
    Check!
    (thinking…)
    Reset
    or sign in with
    • facebook
    • google
      Password icon
      Signed in as (Sign out)
      Submitting...
      • Solomon Rutzky commented  ·   ·  Flag as inappropriate

        Hi Pedro. Yes, I will email you. But, to have it stated for other readers, the use cases would be (generally):

        1) whatever the current use cases are for anyone using a binary Collation for anything (getting string functions to work with characters that don't equate to anything, like CHAR(0) or some others, columns of alphanumeric codes that don't need linguistic rules and would benefit from the better performance of a binary Collation, etc)

        2) Now that CTP 2.1 adds the ability to select UTF-8 Collations as the Instance-level Collation, having the ability to continue using a binary Collation while also making use of the new UTF-8 encoding (for whatever questionable benefit it might provide). Point being, folks do use binary Collations at the Instance-level. If any of those folks want to continue doing so but also want to use UTF-8 for VARCHAR data, then they cannot do both at the same time, and for no stated (let alone "good") reason.

      Feedback and Knowledge Base