What is Collation?

Collation is a set of rules dictating how each group of characters within SQL Server is treated. A list of collations and their definitions can be found using the following query.

SELECT name, description
FROM sys.fn_helpcollations()
WHERE COLLATIONPROPERTY(name, 'CodePage') = 0;

There are two types of collations which will appear in the results of the query listed above. These are SQL collations and Windows Collations.
To further break down collations, they are made up of the following options.

  • Prefix: This establishes whether or not it is a SQL Server or Windows Collation.
  • EXAMPLE:
    Omitted (Windows Collation)
    SQL_ (SQL Server Collation)

  • SortRules: This establishes the alphabet and language used.
  • EXAMPLE: ‘Latin1_General'(English)

  • Code Page: Identifies the code page used by the collation. A code page is used to determine the mapping between a character and its byte representation.
  • EXAMPLE: ‘CP1251’ (Code page 1251)

  • Case Sensitivity: This establishes whether it is case sensitive or insensitive.
  • EXAMPLES:
    ‘CI’ (Case Insensitive; A is the same as a; A=a)
    ‘CS’ (Case Sensitive; A is not the same as a; A!=a)

  • Accent Sensitivity: This establishes whether it is accent sensitive or insensitive.
  • EXAMPLES:
    ‘AI’ (Accent Insensitive; a is the same as à; a=à)
    ‘AS’ (Accent Sensitive; A is not the same as à; a!=à )

  • KanatypeSensitve: This establishes whether SQL Server considers the kana characters Hiragana and Katakana to be equal characters or not. By default, SQL Server collations have this set to insensitive.
  • EXAMPLES:
    Omitted (Kanatype Insensitive)
    KS (Kanatype Sensitive)

  • Width Sensitivity: This establishes whether SQL Server considers single bit characters equal to double bit characters. By default, SQL Server collations have this set to insensitive.
  • EXAMPLES:
    Omitted (Width Insensitive;A6 is the same as A6; A6=A6 )
    WS (Width Sensitive; A6 is not the same as A6; A6!=A6 )

  • BIN: This establishes the binary sort order.
  • EXAMPLES:
    ‘BIN’ (binary sort for Unicode Data) or
    ‘BIN2’ (binary code point comparison sort for Unicode Data)

    Example: SQL_Latin1_General_CP1_CI_AS

  • Prefix: SQL_ – SQL Collation
  • Sort Rules: Latin1_General – English
  • Code Page: CP1 – Code Page 1252
  • Case Sensitivity: Insensitive
  • Accent Sensitivity: Sensitive
    Example: SQL_Ukrainian_CP1251_CI_AS

  • Prefix: SQL_ – SQL Collation
  • Sort Rules: Ukrainian
  • Code Page: Code Page 1251
  • Case Sensitivity: Insensitive
  • Accent Sensitivity: Sensitive
    Example: SQL_Latin1_General_CP437_BIN

  • Prefix: SQL_ – SQL Collation
  • Sort Rules: Latin1_General – English
  • Code Page: CP1 – Code Page 437
  • BIN: binary sort for Unicode Data

Why is it important?

The most notable answer to this is language. SQL Server is used by a wide variety of people across the world. It is very important to configure SQL Server to accept commands and characters for the language and alphabet needed.

It is also important to set sensitivity levels at a database level correctly as SQL server will treat certain words as different or the same depending on the setting. If case sensitivity is set, ‘Test’ is not the same word as ‘test’. This will result in them being treated separately. If a table set to be case sensitive is queried for the word ‘Test’, only results with that casing would appear. This would also cause issues in queries if proper formatting isn’t followed. This applies to using accents as well if the query is accent sensitive. The following two queries would not be the same because the table names are capitalized differently.

SELECT * FROM Test
SELECT * FROM test

If the table were called “Test,” The second query would actually fail with the error “Invalid object name ‘test’

SQL Server Collation

If the database is set to case sensitivity is enabled, multiple tables with the same name can also be created without error. Because of this, it’s even more important to following naming schemas.

It is important to test any collation changes in a test environment before doing any of the previously stated steps. Minor misformatting such as one uncapitalized table name could cause processes that already exist to start failing. It could also cause select statements to return the wrong values.

Using Case Insensitivity:

SQL Server Collation
Using Case Sensitivity:
SQL Server Collation

How is it set?

Server Level

This is done during the installation. Consideration should be taken before changing this setting as it is difficult to alter once set. The options chosen here will determine the default database level collation as well.

SQL Server Collation

Database Level

A database can have its own collation separate from the server level. This setting overrides the server level setting. For any columns that currently exist and the collation is not set to “Database Default,“ the change at the database layer does not affect them but any new columns created will be created with the new setting at the database level unless otherwise specified.

The following might need to be dropped and recreated in order to change a database level collation.

  • A computed column
  • An index
  • Distribution statistics, either generated automatically or by the CREATE STATISTICS statement
  • A CHECK constraint
  • A FOREIGN KEY constraint

Create Database with Collation:
T-SQL:

CREATE DATABASE [Test] ON  PRIMARY 
( NAME = N'Test', FILENAME = N'D:\DATA\Test.mdf' )
 LOG ON 
( NAME = N'Test_log', FILENAME = N'L:\Logs\Test_log.ldf') 
COLLATE SQL_Latin1_General_CP1_CS_AS

SSMS:

  1. Right Click Databases and select ‘New Database.’
  2. SQL Server Collation

  3. Fill in settings as needed.
  4. Open ‘Options’ tab and select collation.
  5. SQL Server Collation

Alter Database Level Collation:

T-SQL:

USE [master]
GO
ALTER DATABASE [AdventureWorks2012] COLLATE SQL_Latin1_General_CP1_CS_AS
GO

SSMS:

Open properties by right clicking on properties. Once there, access the ‘Options’ tab.

SQL Server Collation

Column Level

Collations can be altered on a column level as well. This can be done during or after table creation. This setting overrides the server and table level collation.

The following might need to be dropped and recreated in order to change a column-level collation.

  • A computed column
  • An index
  • Distribution statistics, either generated automatically or by the CREATE STATISTICS statement
  • A CHECK constraint
  • A FOREIGN KEY constraint

T-SQL set on creation:

CREATE TABLE dbo.Test  
  (
	TestKey   int PRIMARY KEY,  
  	TestCol    varchar(10) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL  
  );  
GO  

T-SQL to alter after:

ALTER TABLE dbo.Test ALTER COLUMN TestCol  
            varchar(10)COLLATE SQL_Latin1_General_CI_AS NOT NULL;  
GO  


SSMS:

  1. Locate the table you would like to alter and expand the table and columns.
  2. Right-click any column and select ‘Modify’.
  3. SQL Server Collation

  4. Locate Collation under Table Designer and click on the ellipsis in that column.
  5. SQL Server Collation

  6. Select the collation needed.

Temporary Collation T-SQL

Column Level Collation can be temporarily set within a query in the where or order by clause. This is useful when collation specific data is needed. The following example of its use in queries.

--Create Test Table
DECLARE @Test TABLE
(
testChar nvarchar(10)
)

INSERT @Test VALUES ('Test'),('test'),('TEST'),('TeSt')

--Basic Select
SELECT testChar
FROM @Test

--Capital First letter only
SELECT testChar
FROM @Test
where testChar COLLATE SQL_Latin1_General_CP1_CS_AS = 'Test'

--All lower Case
SELECT testChar
FROM @Test
where testChar COLLATE SQL_Latin1_General_CP1_CS_AS = 'test'

--ORDER BY Case
SELECT testChar
FROM @Test
ORDER BY testChar COLLATE SQL_Latin1_General_CP1_CS_AS

Final Thoughts

It is recommended to plan out collation settings ahead to time to prevent the need for changes later on. If changes are needed, this alters how SQL Server reads characters. Because of this, it is important to test these changes before adding them to a production environment. It could result in process failures and incorrect search results.

It is also helpful to plan out processes ahead of time to work on instances that have strict collations set just in case collation changes are needed in the future. This will prevent errors from happening, and it will make the switch go a lot smoother.