SSMS Always Encrypted wizard makes incorrect collation change for UTF8 columns
When you run the Always Encrypted wizard in SSMS 18 preview 6, and you select a column that has the collation FinnishSwedish100CSASSCUTF8 and then select a deterministic encryption, there is a warning triangle in the State column. That's normal, but the pop-up text says "The collation will be changed from FinnishSwedish100CSASSCUTF8 to FinnishSwedish100CsAcScUtf8" instead of the expected FinnishSwedish100_BIN2. If you don't observe this and continue, the wizard later fails, exactly with the message the collation is incorrect.

1 comment
-
Solomon Rutzky commented
Hi Erland. I think this bug is misstated. Given that:
a) there are no binary UTF-8 collations ( https://feedback.azure.com/forums/908035-sql-server/suggestions/35583976-utf-8-in-sql-2019-binary-utf-8-collations-are-mis )
and
b) a BIN2 collation is required for AE Deterministic encryption ( https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/always-encrypted-database-engine?view=sql-server-2017#feature-details )
I would say that the bug is that the Always Encrypted wizard allows you to select a VARCHAR column using a UTF8 collation. There is a minor, related documentation bug since this is not explicitly stated as being a known restriction in the SQL Server 2019 Release Notes ( https://docs.microsoft.com/en-us/sql/sql-server/sql-server-ver15-release-notes?view=sql-server-ver15#utf-8-collations ). There are a few things missing from that list, actually.
I have not tested this yet myself, but if you are allowed to do deterministic encryption on an NVARCHAR column using a UTF8 collation because it would pick the BIN2 version of that collation, that makes sense as it is the same change as it would have been if the original collation was Finnish_Swedish_100_CS_AS_SC (going along with your example).
However, in your example, I believe it is incorrect to say that it's expected to go from Finnish_Swedish_100_CS_AS_SC_UTF8 to Finnish_Swedish_100_BIN2 since that would force a code page conversion (from 65001 to 1252) which could easily require change the bytes in those columns (prior to encryption).
The resolution here would be to:
1) update SSMS to disallow VARCHAR columns using a UTF8 collation from applying a deterministic encryption, and
2) if / when UTF-8 BIN2 collations are released, then update SSMS again to once again allow selecting VARCHAR columns using a UTF8 collation for deterministic encryption (assuming, of course, that the UI correctly chooses the UTF8 BIN2 collation--and hopefully there is only 1 of those so hopefully that will be an easy choice ;-)
I have added this issue to the CTP 2.2 Update section of my UTF-8 analysis: https://sqlquantumleap.com/2018/09/28/native-utf-8-support-in-sql-server-2019-savior-false-prophet-or-both/#UpdateCTP22