C# – Why are Chinese characters not displayed correctly in a C# string?


I am storing Chinese and English text in an SQL Server 2005 database and displaying it on a webpage, but the Chinese is not being displayed correctly.
I have been reading about the subject and have done the following:

  • used N before the text in my INSERT statement
  • set the field type to nvarchar
  • set the charset of the page to UTF-8

Chinese characters are displayed correctly when I insert them directly into the page, i.e. when they do not come from the database.

These are the characters that should be displayed: 全澳甲流确诊病例已破100

This is what is displayed when the text is retrieved from the database: 全澳甲æµç¡®è¯Šç—…ä¾‹å·²ç ´1001

This seems to be related to how strings are handled in C#, because the Chinese text is retrieved and displayed correctly in classic ASP.

Is there anything else I need to do to get the data out of the database, into a string and output correctly on an aspx page?

Best Solution

So far, this is what is known:

  1. You are using a direct SQL INSERT script to insert into the database.
  2. The data appears broken in the database.

The problem might lie in one of two places:

  1. In your INSERT statement, did you prefix the insert value with N?

    INSERT INTO #tmp VALUES (N'全澳甲流确诊病例已破100')

  2. If you prefix the value with N, does the String object hold the correct data?

    String sql = "INSERT INTO #tmp VALUES (N'" + value + "')";

Here I assume value is a String object.

Does this String object hold the correct Chinese characters?

Try printing out its value to check.
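Printing to a console or a page can itself be affected by encoding settings, so one option is to print the code points instead. The following is a minimal sketch of that check; `StringInspector` and `DumpCodeUnits` are hypothetical names chosen for illustration, not part of the original code.

```csharp
using System;
using System.Text;

static class StringInspector
{
    // Returns each UTF-16 code unit of the string as U+XXXX, separated by spaces.
    public static string DumpCodeUnits(string value)
    {
        var sb = new StringBuilder();
        foreach (char c in value)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    static void Main()
    {
        // "\u5168\u6FB3" is 全澳, written as escapes so the source file's
        // own encoding cannot affect the test string.
        Console.WriteLine(DumpCodeUnits("\u5168\u6FB3")); // U+5168 U+6FB3
    }
}
```

If the string is intact, the Chinese characters in the sample sentence show up in the U+4E00 to U+9FFF range. If it has already been mis-decoded, values such as U+00E6 (æ) and U+00B5 (µ) appear instead, which matches the broken output shown in the question.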


Let's assume the INSERT query is constructed as follows:

    String sql = "INSERT INTO #tmp VALUES (N'" + value + "')";

I assume value holds the Chinese characters.

Did you assign the Chinese characters to value directly, like this?

String value = "全澳甲流确诊病例已破100";

The above code should work. However, any intermediate processing of the string can cause problems.

I worked on a localized Traditional Chinese (TC) project before; the previous architect had added several encoding conversions that are necessary in ASP but cause problems in .NET:

  String value = "全澳甲流确诊病例已破100";
  Encoding tc = Encoding.GetEncoding("BIG5");
  byte[] bytes = tc.GetBytes(value);
  value = Encoding.Unicode.GetString(bytes);

The above conversions are unnecessary. In .NET, a simple direct assignment works:

  String value = "全澳甲流确诊病例已破100";

That is because String constants and the String object itself are Unicode: a .NET String is stored as UTF-16, so it can hold Chinese characters without any conversion.
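To illustrate why an extra conversion does harm, here is a small sketch that encodes a string with one encoding and decodes the bytes with another. It uses UTF-8 and UTF-16 rather than BIG5 only because those two are always available in .NET; `EncodingDemo` and `Reinterpret` are hypothetical names for illustration.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Encodes with one encoding and decodes the resulting bytes with another,
    // which is the same pattern as the GetBytes/GetString conversion above.
    public static string Reinterpret(string value, Encoding encodeWith, Encoding decodeWith)
    {
        byte[] bytes = encodeWith.GetBytes(value);
        return decodeWith.GetString(bytes);
    }

    static void Main()
    {
        string value = "\u5168\u6FB3"; // 全澳

        // Same encoding both ways: the text survives the round trip.
        Console.WriteLine(Reinterpret(value, Encoding.UTF8, Encoding.UTF8) == value);    // True

        // Mismatched encodings: the bytes are misread and the text is corrupted.
        Console.WriteLine(Reinterpret(value, Encoding.UTF8, Encoding.Unicode) == value); // False
    }
}
```

The bytes only mean something together with the encoding that produced them, so decoding them with a different encoding yields different characters.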

Framework libraries such as file I/O convert a foreign encoding to Unicode when they read a file that is not encoded in Unicode (given the right Encoding); in other words, the framework does this dirty job for you. Most of the time you do not need to perform manual encoding conversion.
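For the INSERT itself, an alternative to concatenating the value into an N'...' literal is to pass it as an NVARCHAR parameter, so the String reaches SQL Server as Unicode without being spliced into the SQL text. The sketch below assumes the System.Data.SqlClient provider and a table with a single NVARCHAR(100) column named data, like the temp table defined later in this answer; `UnicodeInsert`, `BuildInsert`, `Insert` and the connection string are placeholders, not the asker's actual code.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

static class UnicodeInsert
{
    // Builds an INSERT whose value travels as an NVARCHAR parameter,
    // so no N'...' literal or string concatenation is needed.
    public static SqlCommand BuildInsert(string value)
    {
        var cmd = new SqlCommand("INSERT INTO temp (data) VALUES (@data)");
        cmd.Parameters.Add("@data", SqlDbType.NVarChar, 100).Value = value;
        return cmd;
    }

    // connectionString is a placeholder supplied by the caller.
    public static void Insert(string connectionString, string value)
    {
        using (var conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = BuildInsert(value))
        {
            cmd.Connection = conn;
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }

    static void Main(string[] args)
    {
        // Expects a connection string as the only argument; does nothing otherwise.
        if (args.Length == 1)
        {
            Insert(args[0], "\u5168\u6FB3"); // 全澳
        }
    }
}
```

Declaring the parameter as SqlDbType.NVarChar plays the same role as the N prefix on a literal: the value is treated as Unicode rather than being converted through the database's code page.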

Update: I now understand that ASP is used to insert the data into SQL Server.

I have written a small ASP page that inserts some Chinese characters into a SQL Server database, and it works.

I have a database named "trans" and I created a table "temp" in it. The ASP page is encoded in UTF-8.

<head>
<title>Untitled</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<script language="vbscript" runat="server">

If Request.Form("Button1") = "Submit" Then

    SqlQuery = "INSERT INTO trans..temp VALUES (N'" + Request.Form("Text1") + "')"

    Set cn = Server.CreateObject("ADODB.Connection")
    cn.Provider = "sqloledb"
    cn.Properties("Data Source").Value = *********
    cn.Properties("Initial Catalog").Value = "TRANS"
    cn.Properties("User ID").Value = "sa"
    cn.Properties("Password").Value = **********
    cn.Properties("Persist Security Info").Value = False

    ' Open the connection, run the INSERT, then release the connection
    cn.Open
    cn.Execute SqlQuery
    cn.Close
    Set cn = Nothing

    Response.Write SqlQuery
End If

</script>
<body>
<form name="form1" method="post" action="input.asp">
    <input name="Text1" type="text" />
    <input name="Button1" value="Submit" type="submit" />
</form>
</body>

The table is defined as follows in my database:

    create table temp (data NVARCHAR(100))

After submitting the ASP page several times, my table contains proper Chinese data:

    select * from trans..temp


Hope this can help.