Do you have a comment or suggestion to improve SQL Server? We’d love to hear it!

BULK INSERT does not work Utf-8 support enabled for system locale

In Windows 10 (build 1803) you can go the Control Panel and select the Region applet. Here you can go to the Administrative tab and in the second half of that tab, you can change the System Locale. In this dialog, there is a checkbox "Beta: utf8 support". If you check this, reboot Windows, you find that any attempt to use BULK INSERT fails with

Msg 2775, Level 17, State 12, Line 3
The code page 65001 is not supported by the server.

For instance

create table ttt(namn nvarchar(24) NOT NULL, col2 char(3), col3 char(3) NOT NULL)
go
BULK INSERT ttt FROM 'C:\temp\slask.bcp'
go
BULK INSERT ttt FROM 'C:\temp\slask.bcp'
WITH (CODEPAGE = 1252)

(Tested on SQL 2017 RTM)

There is detail which makes this bug more serious that it may seem at first. To wit, the script INSTALL\msdb110_upgrade.sql which is run on any updates, including GDRs, and this script includes a BULK INSERT statement. Which fails, and then causes the upgrade to fail, and since it happens in master, SQL Server will not start. See https://social.msdn.microsoft.com/Forums/sqlserver/en-US/b22b950d-a363-4422-b907-f6f09c97a2ae/sqlserver-2017-developer-edition-engine-service-stopped-working-after-the-update-kb293803?forum=sqldatabaseengine for a case where this happened in the real world.

Thus, this is not somehing that can be "fixed in the next released", but needs to be addressed with regards to servicing in all versions. But that could be a Setup check so that Setup never starts if the system locale has UTF-8 support enabled.

8 votes
Sign in
Check!
(thinking…)
Reset
or sign in with
  • facebook
  • google
    Password icon
    Signed in as (Sign out)

    We’ll send you updates on this idea

    Erland Sommarskog shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →

    1 comment

    Sign in
    Check!
    (thinking…)
    Reset
    or sign in with
    • facebook
    • google
      Password icon
      Signed in as (Sign out)
      Submitting...
      • Solomon Rutzky commented  ·   ·  Flag as inappropriate

        Hello Erland. I have been able to reproduce this, and have some additional details to share that might help the SQL Server team locate and fix the issue. It certainly seems to be a code-path intended to verify some aspect(s) of the environment, based on the fact that the input file isn't even required to exist in order to get this error (though if it does exist, then it will be checked to verify if it has a Byte Order Mark (BOM) in order to determine if the "DataFileType" setting is correct, and if not it will tell you how completely wrong you are and that it is smarter and will do whatever it was going to do anyway). This bug is also present in "OPENROWSET(BULK...)". BCP (which uses ODBC v13) works just fine: "-w" for UTF-16 files, and "-c -C 65001" for UTF-8 files.

        -------------------------------------------------------------------------------
        CREATE TABLE #BulkInsertFail ([Col1] ROWVERSION); -- wouldn't work anyway
        BULK INSERT #BulkInsertFail FROM 'bob';
        /*
        -- I only get this first error on SQL Server 2019 CTP 2.2
        Msg 4860, Level 16, State 1, Line XXXXX
        Cannot bulk load. The file "bob" does not exist or you don't have file access rights.

        -- I get this error on both SQL Server 2017 and 2019
        Msg 2775, Level 17, State 12, Line XXXXX
        The code page 65001 is not supported by the server.
        */

        -- The following file has a BOM indicating UTF-16LE, read correctly by Notepad++
        BULK INSERT #BulkInsertFail FROM 'C:\TEMP\UTF16LE-Import.txt';
        -- The code page 65001 is not supported by the server.

        SELECT * FROM OPENROWSET(BULK 'bob', SINGLE_BLOB ) tab(col);
        -- The code page 65001 is not supported by the server.
        -- Using SINGLE_CLOB or SINGLE_NCLOB did not help

        -- The following file has a UTF-8 BOM, but it doesn't help
        BULK INSERT #BulkInsertFail FROM 'C:\TEMP\UTF8-Import.txt'
        WITH (DATAFILETYPE = 'Char', CODEPAGE = '65001');
        -- The code page 65001 is not supported by the server.
        -------------------------------------------------------------------------------

        And for BULK INSERT, I did try every combination of DATAFILETYPE = 'CHAR' | 'WIDECHAR' | 'NATIVE' | 'WIDENATIVE' and/or CODEPAGE = 'ACP' | 'OEM' | 'RAW' | '1252' | '65001'. Yes, several of those options wouldn't work anyway; I was just trying to see if I could cause a change in behavior. But, the only change was merely that sometimes I would get an informational message along the lines of: "Bulk load: DataFileType was incorrectly specified as char. DataFileType will be assumed to be widechar because the data file has a Unicode signature." All this proves is that the FileType verification happens before the environmental check that blows up.

        I tested on:
        1) Microsoft SQL Server 2019 (CTP2.2) - 15.0.1200.24
        2) Microsoft SQL Server 2017 (RTM-CU12) (KB4464082) - 14.0.3045.24

        And just FYI: SQLCLR file system access does correctly default to UTF-8, can correctly detect the encoding from the BOM (if present), and can use any specified encoding if no BOM is present.

      Feedback and Knowledge Base