UTF-8 in SQL 2019: Binary UTF-8 Collations are missing
In SQL Server 2019, CPT 2 there are no binary versions of the new UTF-8 Collations. Why not? There does not seem to be any good reason for this omission.
1) It doesn't matter that the Code Page is the same for all UTF-8 Collations (Code Page = 65001): the same is true for all Unicode-only Collations (Code Page = 0) and there are multiples of those (not sure why though; it would make more sense to have a single "Common" / "Invariant" for "UnicodeOnly_BIN2" and "UTF8_BIN2").
2) It doesn't matter that binary Collations don't truly support Supplementary Characters: they don't for NVARCHAR, not even for Japanese_XJIS_140_BIN2 (remember, version 140 Collations implicitly support Supplementary Characters, except for the binary Collations). Just try this:
SELECT CONVERT(TEXT, N'a' COLLATE Japanese_XJIS_140_BIN2);
You don't get the error saying that the TEXT datatype can't be used with Collations that support Supplementary Characters (like you do if attempting to use "Japanese_XJIS_140_CI_AS").
3) Not having a binary Collation for the UTF8 Collations is inconsistent with all other Collations (even 2 of the SQL Server Collations have "_BIN2" versions!).
No need to add "_BIN" versions as that is just a waste of time and energy. But the "_BIN2" versions (at least) need to exist.
P.S. Please see "Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?" ( https://sqlquantumleap.com/2018/09/28/native-utf-8-support-in-sql-server-2019-savior-false-prophet-or-both/ ) for a detailed analysis of the new UTF-8 feature.
Solomon Rutzky commented
I just checked CTP 2.2 and there are still no BIN2 UTF-8 Collations.
Solomon Rutzky commented
Hi Pedro. Yes, I will email you. But, to have it stated for other readers, the use cases would be (generally):
1) whatever the current use cases are for anyone using a binary Collation for anything (getting string functions to work with characters that don't equate to anything, like CHAR(0) or some others, columns of alphanumeric codes that don't need linguistic rules and would benefit from the better performance of a binary Collation, etc)
2) Now that CTP 2.1 adds the ability to select UTF-8 Collations as the Instance-level Collation, having the ability to continue using a binary Collation while also making use of the new UTF-8 encoding (for whatever questionable benefit it might provide). Point being, folks do use binary Collations at the Instance-level. If any of those folks want to continue doing so but also want to use UTF-8 for VARCHAR data, then they cannot do both at the same time, and for no stated (let alone "good") reason.
Pedro Lopes commented
Hello, can you provide more insight into your use cases? You can please email email@example.com. Reading through the blog provided didn't provide that insight. Thank you!