Do you have a comment or suggestion to improve SQL Server? We’d love to hear it!

UTF-8 in SQL 2019: NVARCHAR to VARCHAR(MAX) can get Msg 8152 "String or binary data would be truncated"

In SQL Server 2019 CTP 2, you can receive a "String or binary data would be truncated" error when converting (implicitly or explicitly) an NVARCHAR value into VARCHAR(MAX) if the Unicode string contains and Code Points that require 3 bytes in UTF-8 AND the size (in bytes) of the NVARCHAR value is not large enough to contain the UTF-8 version of the string once the 2-byte UTF-16 characters become 3-byte UTF-8 characters. Hence, this is only a potential problem when Code Points U+0800 through U+FFFF (decimal 2048 - 65535) are present in the NVARCHAR value.

This does not affect VARCHAR(1 - 4000) for some reason; only VARCHAR(MAX).

For example, NVARCHAR(2) uses 4 bytes. Converting an NVARCHAR(2) into VARCHAR(MAX), either as a column set to a "_UTF8" Collation, or as a variable (assuming that the database's default Collation is a "_UTF8" Collation), will allow for a maximum of 4 bytes in the resulting UTF-8 encoding of those same characters. Any combination of characters that requires more than 4 bytes will get this error (if storing to a column) or silent truncation (if storing to a variable).

Consider the following example:

-------------------------------------------
CREATE DATABASE [UTF8] COLLATE Latin1_General_100_CI_AS_SC_UTF8;

USE [UTF8];

SELECT DATALENGTH(N'ῢ') AS [UTF16bytes], -- 2
DATALENGTH('ῢ') AS [UTF8bytes]; -- 3

CREATE TABLE #Utf8BugTest_1 ([Something] VARCHAR(50)
COLLATE Latin1_General_100_CI_AS_SC_UTF8 NOT NULL);

INSERT INTO #Utf8BugTest_1 ([Something]) VALUES (N'ῢ'); -- works

INSERT INTO #Utf8BugTest_1 ([Something]) VALUES ('ῢ'); -- works

INSERT INTO #Utf8BugTest_1 ([Something])
SELECT N'ῢ'
FROM (VALUES (1)) tmp(d); -- works

INSERT INTO #Utf8BugTest_1 ([Something])
VALUES (CONVERT(NVARCHAR(1), N'ῢ')); -- works

INSERT INTO #Utf8BugTest_1 ([Something])
VALUES (CONVERT(NVARCHAR(MAX), N'ῢ')); -- works

--------------------

CREATE TABLE #Utf8BugTest_1max ([Something] VARCHAR(MAX)
COLLATE Latin1_General_100_CI_AS_SC_UTF8 NOT NULL);

-- SELECT * FROM #Utf8BugTest_1max;

INSERT INTO #Utf8BugTest_1max ([Something]) VALUES (N'ῢ');
/*
Msg 8152, Level 16, State 2, Line XXXXX
String or binary data would be truncated.
*/

INSERT INTO #Utf8BugTest_1max ([Something]) VALUES ('ῢ'); -- works

INSERT INTO #Utf8BugTest_1max ([Something])
SELECT N'ῢ'
FROM (VALUES (1)) tmp(d);
/*
Msg 8152, Level 16, State 2, Line XXXXX
String or binary data would be truncated.
*/

INSERT INTO #Utf8BugTest_1max ([Something]) VALUES (CONVERT(NVARCHAR(1), N'ῢ'));
/*
Msg 8152, Level 16, State 2, Line XXXXX
String or binary data would be truncated.
*/

INSERT INTO #Utf8BugTest_1max ([Something]) VALUES (CONVERT(NVARCHAR(2), N'ῢ'));
-- works!

INSERT INTO #Utf8BugTest_1max ([Something]) VALUES (CONVERT(NVARCHAR(2), N'ῢῢ'));
/*
Msg 8152, Level 16, State 2, Line XXXXX
String or binary data would be truncated.
*/

CREATE TABLE #Utf8BugTest_1src ([Utf16] NVARCHAR(1) NOT NULL);
INSERT INTO #Utf8BugTest_1src ([Utf16]) VALUES (N'ῢ');
SELECT * FROM #Utf8BugTest_1src;

INSERT INTO #Utf8BugTest_1max ([Something])
SELECT [Utf16]
FROM #Utf8BugTest_1src
/*
Msg 8152, Level 16, State 2, Line XXXXX
String or binary data would be truncated.
*/

--------------------

DECLARE @Utf8BugTest VARCHAR(10) = N'ῢ';
SELECT DATALENGTH(@Utf8BugTest), @Utf8BugTest;
-- 3 ῢ

DECLARE @Utf8BugTest_Max VARCHAR(MAX) = N'ῢ';
SELECT DATALENGTH(@Utf8BugTest_Max), @Utf8BugTest_Max;
-- 0

DECLARE @Utf8BugTest_Max2 VARCHAR(MAX) = N'ῢ۞';
SELECT DATALENGTH(@Utf8BugTest_Max2), @Utf8BugTest_Max2;
-- 3 ῢ
-- only takes first character since NVARCHAR string is 4 bytes but UTF-8
-- is 3 + 2, and only 1 byte left for second character which requires 2.

DECLARE @Utf8BugTest_Max3 VARCHAR(MAX) = N'ῢa';
SELECT DATALENGTH(@Utf8BugTest_Max3), @Utf8BugTest_Max3;
-- 4 ῢa
-- NVARCHAR string is 4 bytes and UTF-8 is 3 + 1, so both fit in.

1 vote
Sign in
(thinking…)
Sign in with: Microsoft
Signed in as (Sign out)

We’ll send you updates on this idea

Solomon Rutzky shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →

3 comments

Sign in
(thinking…)
Sign in with: Microsoft
Signed in as (Sign out)
Submitting...

Feedback and Knowledge Base