Do you have a comment or suggestion to improve SQL Server? We’d love to hear it!

nvarchar only allocates 2 bytes per character. len() reports string lengths incorrectly

nvarchar(length) is supposed to store Unicode strings as UTF-8-encoded. UTF-8 can have character lengths of up to 4 bytes. Attempting to store 4-byte characters in an nvarchar field only results in half the expected string length available. Strings can be truncated.

Similarly, the len() function for strings like this returns twice the actual length in characters.

0 votes
Sign in
(thinking…)
Password icon
Signed in as (Sign out)

We’ll send you updates on this idea

AdminMicrosoft SQL Server (Product Manager, Microsoft Azure) shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →

1 comment

Sign in
(thinking…)
Password icon
Signed in as (Sign out)
Submitting...
  • Solomon Rutzky commented  ·   ·  Flag as inappropriate

    This issue is just a misunderstanding of how SQL Server handles Unicode. Until _very_ recently, when SQL Server 2019 (currently only at CTP 2) added support for UTF-8, it only ever used UTF-16 LE, which is the encoding used by NCHAR / NVARCHAR / NTEXT / string data within the XML type. UTF-16 is a fixed, 2-byte per "unit" encoding. Supplementary Characters are comprised of two of these 2-byte units, known as a Surrogate Pair, and as such, require 4 bytes. Still, NVARCHAR(X) will allocate X * 2 bytes. If you need Supplementary Characters, you need to double X to reserve the required amount of bytes. This makes sense enough, though, since again, a Supplementary Character _is_ two surrogate Code Points. LEN will show the actual number of UCS-2 Code Points when not using a Supplementary Character-Aware Collation (meaning, it will only see the two individual surrogate Code Points, and so the Supplementary Character will register having a LEN of 2). When using a Supplementary Character-Aware Collation Collation, LEN will report the number of UTF-16 Code Points (so a Supplementary Character will register having a LEN of 1). DATALENGTH always shows the exact number of bytes being used.

    -------------------------------------

    This item is a non-issue and should be closed.

Feedback and Knowledge Base