Can SQL Server SQL_Latin1_General_Cp1_Ci_As Be Safely Converted to Latin1_General_Ci_As

Can SQL Server SQL_Latin1_General_CP1_CI_AS be safely converted to Latin1_General_CI_AS?

There is more info on this MSDN forum:

http://social.msdn.microsoft.com/Forums/en-US/sqlgetstarted/thread/196b4586-1338-434d-ba8c-49fa3c9bdeeb/

Which states:

You should see little difference if the collation is SQL_Latin1_General_CP1_CI_AS or Latin1_General_CI_AS, but both have instances where they are faster or slower than the other.

Latin1_General_CI_AS :- Latin1-General, case-insensitive, accent-
sensitive, kanatype-insensitive, width-insensitive

SQL_Latin1_General_CP1_CI_AS:- Latin1-General, case-insensitive,
accent-sensitive, kanatype-insensitive, width-insensitive for Unicode
Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data

Therefore in my opinion you shouldn't see a difference, especially if your data is only a-z0-9

Cannot resolve the collation conflict between SQL_Latin1_General_CP1_CI_AS and Latin1_General_CI_AS in the equal to operation

You have a mismatch of two different collations in your table. You can check what collations each column in your table(s) has by using this query:

SELECT
col.name, col.collation_name
FROM
sys.columns col
WHERE
object_id = OBJECT_ID('YourTableName')

Collations are needed and used when ordering and comparing strings. It's generally a good idea to have a single, unique collation used throughout your database - don't use different collations within a single table or database - you're only asking for trouble....

Once you've settled for one single collation, you can change those tables / columns that don't match yet using this command:

ALTER TABLE YourTableName
ALTER COLUMN OffendingColumn
VARCHAR(100) COLLATE Latin1_General_CI_AS NOT NULL

Marc

UPDATE: to find the fulltext indices in your database, use this query here:

SELECT
fti.object_Id,
OBJECT_NAME(fti.object_id) 'Fulltext index',
fti.is_enabled,
i.name 'Index name',
OBJECT_NAME(i.object_id) 'Table name'
FROM
sys.fulltext_indexes fti
INNER JOIN
sys.indexes i ON fti.unique_index_id = i.index_id

You can then drop the fulltext index using:

DROP FULLTEXT INDEX ON (tablename)

Equivalent of COLLATE SQL_Latin1_General_CP1_CI_AS in Teradata

Teradata supports only two basic character set LATIN and UNICODE, in your case company varchar(6) LATIN probably matches.

Sorting rules can be changed on session level (most end users don't know about this and simply use the default assigned to them by the DBA).

Case-insensitiv comparison is the default for Teradata-mode sessions (most systems run in this mode).

SQL_Latin1_General_CP1_CS_AS vs Latin1_General_CS_AS

This question and answers provides details on these collations and how they relate:

The only real difference is the sort order for some characters differs between the two collations. Character representation does not appear to differ.

The only thing I would add is that SQL_Latin1_General_CP1_CS_AS is the default collation for the English (United States) locale, and Latin1_General_CS_AS is the default collation for any other English locale, such as English (United Kingdom).

How can I convert SQL_Latin1_ General Collate Using an SSIS Package?

Your error message looks like your UNION is in a single SQL statement?
e.g SELECT ... UNION ... SELECT

As Shiva described, within your SSIS Data Flow, you can use two (or more) OLE DB Source objects and combine them with a "Union All" transformation. Each OLE DB Source object will have just a single SELECT with no UNION clause. This design also has advantages in design/maintenance (unions on column names, no need to fill every column from every source) and runtime throughput.

I think you also need to use this technique on the OLE DB Source objects.
http://blog.stevienova.com/2009/04/16/ssis-pulling-data-from-a-non-default-collation-db-to-a-default-collation-db/

SQL Server COLLATION ISSUE

Your tempdb and database have different collation settings. This often happens when servers are upgraded from a version using the older collation defaults.

You could create the temp table with the SQL_Latin1_General_CP1_CI_AS collation.

There are few differences between SQL_Latin1_General_CP1_CI_AS and Latin1_General_CI_AS. They both use the same Locale (1033) and the same Code Page (1252).

The SQL_ Collations have been deprecated for some time and should be avoided if possible.

Issues with missing collate in SQL Server

Working across databases is a common source of collation issues. For example, suppose I have two databases, DB1 and DB2. DB1's default collation is Latin1_General_CI_AS, and DB2's is SQL_Latin1_General_CP1_CI_AS.

If I do the following I will get a collation conflict:

use DB1;
go

create or alter function dbo.foo() returns varchar(10) as begin
return 'foo';
end
go

use DB2
go

declare @bar varchar(10) = 'bar' + DB1.dbo.foo();

The error is:

Implicit conversion of varchar value to varchar cannot be performed because the collation of the value is unresolved due to a collation conflict between "Latin1_General_CI_AS" and "SQL_Latin1_General_CP1_CI_AS" in add operator.

To resolve this error, I can force the collation I want it to use for my expression:

set @bar = 'bar' + DB1.dbo.foo() collate SQL_Latin1_General_CP1_CI_AS;

The way explicitly setting collation works in expressions can be unintuitive. For example, this would also have worked:

declare @bar varchar(10) = 'bar' collate sql_latin1_general_cp1_ci_as  + DB1.dbo.foo();

That's weird, I'm in DB2, and DB2 already has a collation of sql_latin1_general_cp1_ci_as, so why would setting the literal 'bar' to that make any difference? Well, because I'm being explicit.

In your code you are doing the same kind of thing as I am in my sample problem. You are declaring the variable @SQL_STR and assigning a value to it. You then try to append the result of ETL1.dbo.BI_GetSybaseIQDateExtent to that variable. That function is in a different database. It is highly likely that the database you are in, and database ETL1, have different collations.

To resolve the problem, set an explicit collation on your string concatenation, instead of inside the strings being concatenated (which is what you are doing now). To make it easier to read you can do it right up front:

SET @SQL_STR = '' collate latin1_general_ci_as + '
SELECT
A_REC,
B_REC,
CALL_SOURCE,
-- ...

Choosing a binary collation that can differentiate between 'ss' and 'ß' for nvarchar column in Sql Server

A few things about Collations:

  1. The SQL_ Collations were deprecated as of SQL Server 2000 (yes, 2000). If you can avoid using them, you should (but that doesn't mean go changing a bunch of things if there is no pressing need to!).

    The issue with the SQL_ Collations is really only related to VARCHAR (i.e. non-Unicode) data as NVARCHAR (i.e. Unicode) data uses the rules from the OS. But the rules for sorting and comparison for VARCHAR data, unfortunately, use a simple mapping and do not include the more complex linguistic rules. This is why ss and ß do not equate when stored as VARCHAR using the same SQL_Latin1_General_CP1_CI_AS Collation. These deprecated Collations also are not able to give a lower weight to dashes when used in the middle of a word. The non-SQL_ Collations (i.e. Windows Collations) use the same rules for both VARCHAR and NVARCHAR so the VARCHAR handling is more robust, more consistent with NVARCHAR.

  2. The _BIN Collations were deprecated as of SQL Server 2005. If you can avoid using them, you should (but that doesn't mean go changing a bunch of things if there is no pressing need to!).

    The issue with the _BIN Collations is rather subtle as it only affects sorting. Comparisons are the same between _BIN and _BIN2 Collations due to them being compared at the byte level (hence no linguistic rules). BUT, due to SQL Server (and Windows / PCs) being Little Endian, entities are stored in reverse byte order. This becomes apparent when dealing with double-byte "characters", which is what NVARCHAR data is: UTF-16 Little Endian. This means that Unicode Code Point U+1216 has a hex/binary representation of 0x1216 on Big Endian systems, but is stored as 0x1612 on Little Endian systems. To come full circle so that the importance of this last point will (hopefully) become obvious: the _BIN Collations will compare byte by byte (after the first character) and hence see U+1216 as being 0x16 and then 0x12, while the _BIN2 Collations will compare code point by code point and hence see U+1216 as being 0x12 and then 0x16.

  3. This particular column is NVARCHAR (a VARCHAR column using SQL_Latin1_General_CP1_CI_AS would not equate ss and ß) and so for just this column alone, there is no difference between SQL_Latin1_General_CP437_BIN2 and SQL_Latin1_General_CP850_BIN2 due to Unicode being a single, all-inclusive character set.

  4. For VARCHAR data, there would be a difference since they are different code pages (437 and 850), and both of those are different than the one that you are using now (CP1 == code page 1252).

  5. While using a binary Collation is often overkill, in this case it might be necessary given that there is only one locale / culture that does not equate ß with ss: Hungarian. Using a Hungarian Collation might have some linguistic rules that you don't want (or at least wouldn't expect), so the binary Collation seems to be the better choice here (just not any of the 4 you are asking about :-). Just keep in mind that by using a binary Collation, not only are you giving up all linguistic rules, but you also lose the ability to equate different versions of the same character, such as A (Latin Capital Letter A U+0041) and (Fullwidth Latin Capital Letter A U+FF21).

    Use the following query to see what Collations are non-binary and do not equate these characters:

    DECLARE @SQL NVARCHAR(MAX) = N'DECLARE @Counter INT = 1;';

    SELECT @SQL += REPLACE(N'
    IF(N''ß'' COLLATE {Name} <> N''ss'' COLLATE {Name})
    BEGIN
    RAISERROR(N''%4d. {Name}'', 10, 1, @Counter) WITH NOWAIT;
    SET @Counter += 1;
    END;
    ', N'{Name}', col.[name]) + NCHAR(13) + NCHAR(10)
    FROM sys.fn_helpcollations() col
    WHERE col.[name] NOT LIKE N'SQL[_]%'
    AND col.[name] NOT LIKE N'%[_]BIN%'
    ORDER BY col.[name]

    --PRINT @SQL;
    EXEC (@SQL);

So:

  • If you are going to use a binary Collation, use something like Latin1_General_100_BIN2.
  • You do not need to change the Collation of the entire DB and all of its tables. That is a lot of work, and the only "built-in" mechanism to do it is undocumented (i.e. unsupported).
  • If you were to change the Database's default Collation, that affects name resolution of Database-scoped items such as tables, columns, indexes, functions, stored procedures, etc. Meaning: you would need to regress 100% of the application that touches the database, as well as all SQL Server Agent jobs, etc. that touch this database.
  • If most / all of the queries that use this column need ß with ss to be seen as different, then go ahead and alter the column to use Latin1_General_100_BIN2. This will likely require dropping the following dependent objects and then recreating after the ALTER TABLE:

    • Indexes
    • Unique Constraints
    • Foreign Key Constraints

    HINT: Be sure to check the current NULL / NOT NULL setting of the column and specify that in the ALTER TABLE ... ALTER COLUMN ... statement so that it does not get changed.

  • If only some queries need this different behavior, then override just those comparison operations with the COLLATE clause, on a per-condition basis (e.g. WHERE tab.[ThisColumn] LIKE N'%ss%' COLLATE Latin1_General_100_BIN2). The COLLATE keyword should only be needed on one side (of the operator) as Collation Precedence will apply it to the other side.

For more info on working with strings and collations, please visit: Collations Info



Related Topics



Leave a reply



Submit