Do you have a comment or suggestion to improve SQL Server? We’d love to hear it!

Restoring databases into newer versions of SQL Server will create missing Internal Tables with incorrect collation

When restoring a backed-up database into a version of SQL Server that is newer _and_ the newer version contains internal tables that did not exist in the older version (where the DB being restored came from), SQL Server will create those missing internal tables as part of the database upgrade process. BUT, in creating those tables, the collation used for all but a few tables using "Latin1_General_BIN" is the default collation of the DB being restored (i.e. DATABASE_DEFAULT). However, for any database created in a version of SQL Server that includes those internal tables, those tables all use a collation of "SQL_Latin1_General_CP1_CI_AS". I have checked many different versions of SQL Server and there is no variation on this: there is only one collation used for these internal tables (except for the few columns using "Latin1_General_BIN").

Please change the table creation process to include a "COLLATE SQL_Latin1_General_CP1_CI_AS" clause. (Ideally it would be "COLLATE Latin1_General_100_CI_AS_KS_WS_SC", but I know that won't happen)

Reproducing this situation is fairly easy. Below is what I did (copied from my answer -- linked at the end -- to someone questioning why these internal tables had a different collation than expected):

===================================================
I backed up a DB from SQL Server 2012 LocalDB that had a collation of Latin1_General_100_CS_AS_KS_SC. I then restored that into SQL Server 2019 (the instance-level collation being UTF8_BIN2). Checking the internal tables I saw that the 3 queue_messages_* tables and the filestream_tombstone_2073058421 table (all of which existed in SQL Server 2012) still had a collation of Latin1_General_BIN. BUT, the sqlagent_* tables and plan_persist_* tables (none of which existed in SQL Server 2012), all had a collation of Latin1_General_100_CS_AS_KS_SC (same as the DB that was restored).

I then backed up a DB from SQL Server 2014 LocalDB that had a collation of Latin1_General_100_CS_AS_KS_WS_SC. I then restored that into SQL Server 2019 (the instance-level collation being UTF8_BIN2). Checking the internal tables I saw that the 3 queue_messages_* tables and the filestream_tombstone_2073058421 table (all of which existed in SQL Server 2012) still had a collation of Latin1_General_BIN. BUT, this time the sqlagent_* tables (which did exist in SQL Server 2014), all had a collation of SQL_Latin1_General_CP1_CI_AS. The plan_persist_* tables (only 2 of the 6 existed in SQL Server 2014), all had a collation of Latin1_General_100_CS_AS_KS_WS_SC (again, same as the DB that was restored).

I changed the compatibility mode of the DB that was restored from SQL Server 2012 to be "SQL Server 2019 (150)", but that did not fix the collation issue.
===================================================

You can easily see the issue via the following query:

-------------------------------------------------------------------
SELECT OBJECT_SCHEMA_NAME(col.[object_id]) AS [SchemaName],
tab.[name] AS [TableName],
col.[name] AS [ColumnName],
col.collation_name,
col.[system_type_id],
col.[object_id],
'---' AS [---],
tab.*
FROM sys.columns col
INNER JOIN sys.internal_tables tab
ON col.[object_id] = tab.[object_id]
WHERE col.collation_name IS NOT NULL
ORDER BY tab.[name], col.[name];
-------------------------------------------------------------------

I am not sure if this behavior produces any errors or problems, but it's definitely inconsistent with databases created in the newer version of SQL Server, where the internal tables are a part of that version and always have an unchangeable collation of "SQL_Latin1_General_CP1_CI_AS" (which is sad given how many versions beyond SQL Server 2008 we are now, and tables _starting_ in SQL Server 2014 and newer are still using this pre-2000 collation instead of the correct collation: "Latin1_General_100_CI_AS_KS_WS_SC").

P.S. This bug was discovered in the following Q&A on DBA.StackExchange:

SQL Server 2016 Collation and Internal table Glitch? ( https://dba.stackexchange.com/a/232046/30859 )

P.P.S. This behavior (i.e. creating the columns with the DATABASE_DEFAULT collation, most likely through not specifying a collation, yet always having the pre-existing iternal tables using "SQL_Latin1_General_CP1_CI_AS") indicates that Microsoft developers are using this ancient and deficient "SQL_Latin1_General_CP1_CI_AS" collation. How is this even possible? The SQL Server collations weren't even "current" when SQL Server 2000 was released; they were included for backwards compatibility. Why is Microsoft _not_ using Windows collations? It has been 18 years (the most recent internal tables were developed in 2018) and NINE versions since the Windows collations were includede (2000, 2005, 2008, 2008 R2, 2012, 2014, 2016, 2017, and now 2019). Please stop using SQL Server collations. PLEASE.

P.P.P.S. If there is resistance to moving away from the SQL Server collations that you should not be using anymore, isn't this bug proof enough that there is nothing to fear? ALL databases set to a collation that is _not_ "SQL_Latin1_General_CP1_CI_AS" that are restored into a newer version that includes internal tables that were not available in that earlier version (and hence missing from the DB being restored) are using whatever random collation that the DB was set to, and are _not_ using "SQL_Latin1_General_CP1_CI_AS". So, if this is the first time you are hearing about this behavior, that implies that nobody has reported "strange behavior after restoring into a newer version" that could be traced back to these tables (and a collation mismatch error, for example, would be difficult to not notice). This situation has been around for at least 7 years now (if not longer). Hence, the SQL Server customer base has inadvertently QAed this move away from "SQL_Latin1_General_CP1_CI_AS", and so far so good, right? Of course, it is hard to tell how many customers are a) upgrading DBs that are not using "SQL_Latin1_General_CP1_CI_AS", AND b) are using any of the features that make use of these internal tables. Still, I am assuming that the customer-base is large enough (especially with having the free Express edition) that this issue would have come up by now, even if only posted to general forums (and not reported to MS tech support), but I can't find anything related to this specific issue.

2 votes
Sign in
(thinking…)
Password icon
Signed in as (Sign out)

We’ll send you updates on this idea

Solomon Rutzky shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →

0 comments

Sign in
(thinking…)
Password icon
Signed in as (Sign out)
Submitting...

Feedback and Knowledge Base