Upvotes: 109
<=-=Aug 21 2008 6:49PM=-=>
Hi Erland,
Thanks for your suggestion. We are considering adding support for UTF8 in the next version of SQL Server. It is not clear at this point if it will be a new type or integrate it with existing types. We understand the pain in terms of integrating with UTF8 data and we are looking at ways to effectively resolve it.
—
Umachandar, SQL Programmability Team
<=-=Aug 31 2008 5:48AM=-=>
UTF-8 strings have the same “value space” as UCS-2 strings. I strongly feel that this should be an option on the existing NCHAR/NVARCHAR types. Possible even a database-level default for this option. New data types would complicate interaction with applications when the use case is merely to reduce the storage requirement in half. This is merely a storage option — should be possible to flip this flag without introducing any changes or incompatibilities with regards to existing applications.
Another use case is interpreting binary data as Unicode by way of UTF-8. For example, with the following syntaxes:
CAST)
- or -
UTF8BytesToNVARCHAR(@Binary)
Currently these kinds of things can be done by SQLCLR UDF, but it would be better if these were in T-SQL because they are so important. This means that @Binary should be converted to Unicode using the rules of UTF-8. The result of this expression would actually be a UCS-2 NVARCHAR string, as I am assuming that strings would stay UCS-2 in memory, like they do in .NET. The UTF-8 would just be an on-disk thing.
<=-=Aug 31 2008 5:53AM=-=>
One more thing… when the data type is NVARCHAR, the system could actually look at the string and determine whether UTF-8 or UCS-2 storage is more efficient and then use that as the underlying storage. No need for interaction from the user.
<=-=Jul 30 2009 6:03AM=-=>
In addition to being CAST-able / CONVERT-able, it could be effectively implemented with the existing datastructures by the SET statements within TSQL. For example:
SET TEXT ENCODING {format | @format}
It’s analogous to the existing statement SET DATEFORMAT. As I understand it, it only concerns the output of the text, having little effect on storage format.
<=-=Jul 16 2010 5:55AM=-=>
This is a absolute must have for any RDBMS. Please add this in the coming release of SQL Server.
<=-=Sep 17 2010 5:08AM=-=>
OSQL should also support UTF-8 as encoding for SQL files.
Visual Studio optionally automatically converts SQL files containing UNICODE characters into UTF-8. If it does that, OSQL does not read nor execute those files correctly.
I suggest to use one of the standard MS text file reader API calls to read SQL files in OSQL. These automatically detect which file encoding is used and extract correct characters from the file stream.
Axel Dahmen
www.axeldahmen.de
<=-=Sep 17 2010 5:10AM=-=>
BULK INSERT should support below mentioned automatic file encoding detection as well.
<=-=Mar 21 2011 2:57PM=-=>
To some extent this requestion has been addressed in SQL 2008 R2 with the addition of Unicode compression. That addresses the space issue. Unfortunately, this is only an Enterprise Edition feature.
There is still need to support UTF-8 in tools and BULK INSERT, and maybe also in TDS to save network bandwidth. But I leaning more and more towards that there should be any new data type, collation or whatever to support UTF-8. It is very difficult to think of a solution that would not bring confusion and make SQL Server more difficult to use.
<=-=Jul 4 2011 2:24AM=-=>
I am facing a very big issue related to the fact that SQL Server does not support UTF-8 at collation level like Sybase and Oracle does.
Storing encoded data only based on a specific collation means that SQL server should be explicitly told that we are sending some Unicode data in order to store national character set. This effectively means that we cannot currently tell SQL server to treat all data as Unicode by default which is possible by setting character set to UTF-8, UTF-16, etc in Oracle and Sybase.
The issue is when we try and internationalize some large old applications which uses inline queries, it’s very difficult to port the changed application to SQL Server due to the amount to change involved and just in case you want to avoid the change, you will have to make so many complex string processing logic, that application becomes unusable.
I request Microsoft to provide the feature to set collation setting to formats like UTF-8 so that we solve many such related issues .
Thanks,
Kishor
(kishorjha@msn.com)
<=-=Nov 14 2011 2:37PM=-=>
It is 3 years later since this request was put in. Is there a new Microsoft answer? Anything in the pipeline?
<=-=Jan 1 2012 10:14PM=-=>
Over three years ago and the utf-8 issue appears still not to be addressed. I was hoping to turn the trial into a purchase like many others. But like them, looks like I’m still stuck with the other guy. There is no way I am rewriting 7 years worth of programming to adapt to Microsoft I am only one person that doesn’t matter. ( How many others feel this way.)
<=-=May 14 2012 6:42PM=-=>
We’ve ran into this too. VARCHAR cannot be converted to NVARCHAR to enable Unicode support for the obvious reason. Our usage model says we could convert to VARCHAR UTF8 and expect it to work.
<=-=May 8 2016 3:43PM=-=>
Any chance of an update on this one? This is now almost 8 years old…
<=-=Jul 7 2017 6:30AM=-=>
Seven years since it was introduced, I still can’t store the toilet emoji as text in SQL Server. ??