Sql – How should I store short text strings into a SQL Server database


varchar(255), varchar(256), nvarchar(255), nvarchar(256), nvarchar(max), etc?

256 seems like a nice, round, space-efficient number. But I've seen 255 used a lot. Why?

What's the difference between varchar and nvarchar?

Best Solution

In MS SQL Server (7.0 and up), varchar data is represented internally with up to three values:

  • The actual string of characters, which will be from 0 to something over 8000 bytes (it’s based on page size, the other columns stored for the row, and a few other factors)
  • Two bytes used to indicate how long the data string is (which produces a value from 0 to 8000+)
  • If the column is nullable, one bit in the row’s null bitmask (so the null status of up to eight nullable columns can be represented in one byte)

The important part is that two-byte data length indicator. If it was one byte, you could only properly record strings of length 0 to 255; with two bytes, you can record strings of length 0 to something over 64000+ (specifically, 2^16 -1). However, the SQL Server page length is 8k, which is where that 8000+ character limit comes from. (There's data overflow stuff in SQL 2005, but if your strings are going to be that long you should just go with varchar(max).)

So, no matter how long you declare your varchar datatype column to be (15, 127, 511), what you will actually be storing for each and every row is:

  • 2 bytes to indicate how long the string is
  • The actual string, i.e. the number of characters in that string

Which gets me to my point: a number of older systems used only 1 byte to store the string length, and that limited you to a maximum length of 255 characters, which isn’t all that long. With 2 bytes, you have no such arbitrary limit... and so I recommend picking a number that makes sense to the (presumed non-technically oriented) user. , I like 50, 100, 250, 500, even 1000. Given that base of 8000+ bytes of storage, 255 or 256 is just as efficient as 200 or 250, and less efficient when it comes time to explain things to the end users.

This applies to single byte data (i.e. ansii, SQL_Latin1*_*General_CP1, et. al.). If you have to store data for multiple code pages or languages using different alphabets, you’ll need to work with the nvarchar data type (which I think works the same, two bytes for number of charactesr, but each actual character of data requires two bytes of storage). If you have strings likely to go over 8000, or over 4000 in nvarchar, you will need to use the [n]varchar(max) datatypes.

And if you want to know why it is so very important to take up space with extra bytes just to track how long the data is, check out http://www.joelonsoftware.com/articles/fog0000000319.html