Can SQL Server SQL_Latin1_General_CP1_CI_AS be safely converted to Latin1_General_CI_AS?
There is more info on this MSDN forum:
http://social.msdn.microsoft.com/Forums/en-US/sqlgetstarted/thread/196b4586-1338-434d-ba8c-49fa3c9bdeeb/
Which states:
You should see little difference if the collation is SQL_Latin1_General_CP1_CI_AS or Latin1_General_CI_AS, but both have instances where they are faster or slower than the other.
Latin1_General_CI_AS :- Latin1-General, case-insensitive, accent-
sensitive, kanatype-insensitive, width-insensitiveSQL_Latin1_General_CP1_CI_AS:- Latin1-General, case-insensitive,
accent-sensitive, kanatype-insensitive, width-insensitive for Unicode
Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data
Therefore in my opinion you shouldn't see a difference, especially if your data is only a-z0-9
Cannot resolve the collation conflict between SQL_Latin1_General_CP1_CI_AS and Latin1_General_CI_AS in the equal to operation
You have a mismatch of two different collations in your table. You can check what collations each column in your table(s) has by using this query:
SELECT
col.name, col.collation_name
FROM
sys.columns col
WHERE
object_id = OBJECT_ID('YourTableName')
Collations are needed and used when ordering and comparing strings. It's generally a good idea to have a single, unique collation used throughout your database - don't use different collations within a single table or database - you're only asking for trouble....
Once you've settled for one single collation, you can change those tables / columns that don't match yet using this command:
ALTER TABLE YourTableName
ALTER COLUMN OffendingColumn
VARCHAR(100) COLLATE Latin1_General_CI_AS NOT NULL
Marc
UPDATE: to find the fulltext indices in your database, use this query here:
SELECT
fti.object_Id,
OBJECT_NAME(fti.object_id) 'Fulltext index',
fti.is_enabled,
i.name 'Index name',
OBJECT_NAME(i.object_id) 'Table name'
FROM
sys.fulltext_indexes fti
INNER JOIN
sys.indexes i ON fti.unique_index_id = i.index_id
You can then drop the fulltext index using:
DROP FULLTEXT INDEX ON (tablename)
Equivalent of COLLATE SQL_Latin1_General_CP1_CI_AS in Teradata
Teradata supports only two basic character set LATIN and UNICODE, in your case company varchar(6) LATIN
probably matches.
Sorting rules can be changed on session level (most end users don't know about this and simply use the default assigned to them by the DBA).
Case-insensitiv comparison is the default for Teradata-mode sessions (most systems run in this mode).
SQL_Latin1_General_CP1_CS_AS vs Latin1_General_CS_AS
This question and answers provides details on these collations and how they relate:
The only real difference is the sort order for some characters differs between the two collations. Character representation does not appear to differ.
The only thing I would add is that SQL_Latin1_General_CP1_CS_AS is the default collation for the English (United States) locale, and Latin1_General_CS_AS is the default collation for any other English locale, such as English (United Kingdom).
How can I convert SQL_Latin1_ General Collate Using an SSIS Package?
Your error message looks like your UNION is in a single SQL statement?
e.g SELECT ... UNION ... SELECT
As Shiva described, within your SSIS Data Flow, you can use two (or more) OLE DB Source objects and combine them with a "Union All" transformation. Each OLE DB Source object will have just a single SELECT with no UNION clause. This design also has advantages in design/maintenance (unions on column names, no need to fill every column from every source) and runtime throughput.
I think you also need to use this technique on the OLE DB Source objects.
http://blog.stevienova.com/2009/04/16/ssis-pulling-data-from-a-non-default-collation-db-to-a-default-collation-db/
SQL Server COLLATION ISSUE
Your tempdb
and database have different collation settings. This often happens when servers are upgraded from a version using the older collation defaults.
You could create the temp table with the SQL_Latin1_General_CP1_CI_AS
collation.
There are few differences between SQL_Latin1_General_CP1_CI_AS
and Latin1_General_CI_AS
. They both use the same Locale (1033) and the same Code Page (1252).
The SQL_
Collations have been deprecated for some time and should be avoided if possible.
Issues with missing collate in SQL Server
Working across databases is a common source of collation issues. For example, suppose I have two databases, DB1
and DB2
. DB1
's default collation is Latin1_General_CI_AS
, and DB2
's is SQL_Latin1_General_CP1_CI_AS
.
If I do the following I will get a collation conflict:
use DB1;
go
create or alter function dbo.foo() returns varchar(10) as begin
return 'foo';
end
go
use DB2
go
declare @bar varchar(10) = 'bar' + DB1.dbo.foo();
The error is:
Implicit conversion of varchar value to varchar cannot be performed because the collation of the value is unresolved due to a collation conflict between "Latin1_General_CI_AS" and "SQL_Latin1_General_CP1_CI_AS" in add operator.
To resolve this error, I can force the collation I want it to use for my expression:
set @bar = 'bar' + DB1.dbo.foo() collate SQL_Latin1_General_CP1_CI_AS;
The way explicitly setting collation works in expressions can be unintuitive. For example, this would also have worked:
declare @bar varchar(10) = 'bar' collate sql_latin1_general_cp1_ci_as + DB1.dbo.foo();
That's weird, I'm in DB2
, and DB2
already has a collation of sql_latin1_general_cp1_ci_as
, so why would setting the literal 'bar'
to that make any difference? Well, because I'm being explicit.
In your code you are doing the same kind of thing as I am in my sample problem. You are declaring the variable @SQL_STR
and assigning a value to it. You then try to append the result of ETL1.dbo.BI_GetSybaseIQDateExtent
to that variable. That function is in a different database. It is highly likely that the database you are in, and database ETL1, have different collations.
To resolve the problem, set an explicit collation on your string concatenation, instead of inside the strings being concatenated (which is what you are doing now). To make it easier to read you can do it right up front:
SET @SQL_STR = '' collate latin1_general_ci_as + '
SELECT
A_REC,
B_REC,
CALL_SOURCE,
-- ...
Choosing a binary collation that can differentiate between 'ss' and 'ß' for nvarchar column in Sql Server
A few things about Collations:
The
SQL_
Collations were deprecated as of SQL Server 2000 (yes, 2000). If you can avoid using them, you should (but that doesn't mean go changing a bunch of things if there is no pressing need to!).The issue with the
SQL_
Collations is really only related toVARCHAR
(i.e. non-Unicode) data asNVARCHAR
(i.e. Unicode) data uses the rules from the OS. But the rules for sorting and comparison forVARCHAR
data, unfortunately, use a simple mapping and do not include the more complex linguistic rules. This is whyss
andß
do not equate when stored asVARCHAR
using the sameSQL_Latin1_General_CP1_CI_AS
Collation. These deprecated Collations also are not able to give a lower weight to dashes when used in the middle of a word. The non-SQL_
Collations (i.e. Windows Collations) use the same rules for bothVARCHAR
andNVARCHAR
so theVARCHAR
handling is more robust, more consistent withNVARCHAR
.The
_BIN
Collations were deprecated as of SQL Server 2005. If you can avoid using them, you should (but that doesn't mean go changing a bunch of things if there is no pressing need to!).The issue with the
_BIN
Collations is rather subtle as it only affects sorting. Comparisons are the same between_BIN
and_BIN2
Collations due to them being compared at the byte level (hence no linguistic rules). BUT, due to SQL Server (and Windows / PCs) being Little Endian, entities are stored in reverse byte order. This becomes apparent when dealing with double-byte "characters", which is whatNVARCHAR
data is: UTF-16 Little Endian. This means that Unicode Code Point U+1216 has a hex/binary representation of 0x1216 on Big Endian systems, but is stored as 0x1612 on Little Endian systems. To come full circle so that the importance of this last point will (hopefully) become obvious: the_BIN
Collations will compare byte by byte (after the first character) and hence see U+1216 as being 0x16 and then 0x12, while the_BIN2
Collations will compare code point by code point and hence see U+1216 as being 0x12 and then 0x16.This particular column is
NVARCHAR
(aVARCHAR
column usingSQL_Latin1_General_CP1_CI_AS
would not equatess
andß
) and so for just this column alone, there is no difference betweenSQL_Latin1_General_CP437_BIN2
andSQL_Latin1_General_CP850_BIN2
due to Unicode being a single, all-inclusive character set.For
VARCHAR
data, there would be a difference since they are different code pages (437 and 850), and both of those are different than the one that you are using now (CP1
== code page 1252).While using a binary Collation is often overkill, in this case it might be necessary given that there is only one locale / culture that does not equate
ß
withss
: Hungarian. Using a Hungarian Collation might have some linguistic rules that you don't want (or at least wouldn't expect), so the binary Collation seems to be the better choice here (just not any of the 4 you are asking about :-). Just keep in mind that by using a binary Collation, not only are you giving up all linguistic rules, but you also lose the ability to equate different versions of the same character, such asA
(Latin Capital Letter A U+0041) andA
(Fullwidth Latin Capital Letter A U+FF21).Use the following query to see what Collations are non-binary and do not equate these characters:
DECLARE @SQL NVARCHAR(MAX) = N'DECLARE @Counter INT = 1;';
SELECT @SQL += REPLACE(N'
IF(N''ß'' COLLATE {Name} <> N''ss'' COLLATE {Name})
BEGIN
RAISERROR(N''%4d. {Name}'', 10, 1, @Counter) WITH NOWAIT;
SET @Counter += 1;
END;
', N'{Name}', col.[name]) + NCHAR(13) + NCHAR(10)
FROM sys.fn_helpcollations() col
WHERE col.[name] NOT LIKE N'SQL[_]%'
AND col.[name] NOT LIKE N'%[_]BIN%'
ORDER BY col.[name]
--PRINT @SQL;
EXEC (@SQL);
So:
- If you are going to use a binary Collation, use something like
Latin1_General_100_BIN2
. - You do not need to change the Collation of the entire DB and all of its tables. That is a lot of work, and the only "built-in" mechanism to do it is undocumented (i.e. unsupported).
- If you were to change the Database's default Collation, that affects name resolution of Database-scoped items such as tables, columns, indexes, functions, stored procedures, etc. Meaning: you would need to regress 100% of the application that touches the database, as well as all SQL Server Agent jobs, etc. that touch this database.
If most / all of the queries that use this column need
ß
withss
to be seen as different, then go ahead and alter the column to useLatin1_General_100_BIN2
. This will likely require dropping the following dependent objects and then recreating after theALTER TABLE
:- Indexes
- Unique Constraints
- Foreign Key Constraints
HINT: Be sure to check the current NULL / NOT NULL setting of the column and specify that in the
ALTER TABLE ... ALTER COLUMN ...
statement so that it does not get changed.- If only some queries need this different behavior, then override just those comparison operations with the
COLLATE
clause, on a per-condition basis (e.g.WHERE tab.[ThisColumn] LIKE N'%ss%' COLLATE Latin1_General_100_BIN2
). TheCOLLATE
keyword should only be needed on one side (of the operator) as Collation Precedence will apply it to the other side.
For more info on working with strings and collations, please visit: Collations Info
Related Topics
Recursive Query Used for Transitive Closure
In SQL Is There a Difference Between Count(*) and Count(<Fieldname>)
Pros and Cons of Autoincrement Keys on "Every Table"
Are Determinants and Candidate Keys Same or Different Things
What Column Should the Clustered Index Be Put On
Remove Trailing Empty Space in a Field Content
Checking for Time Range Overlap, the Watchman Problem [Sql]
SQL 2005 Split Comma Separated Column on Delimiter
Execute Query on SQL Server Analysis Services with Ironpython
Can There Be Constraints with the Same Name in a Db
How to Edit Values of an Insert in a Trigger on SQL Server
How to Create an Alias of Database in SQL Server
What Is the Purpose of Putting an 'N' in Front of Function Parameters in Tsql
SQL Server Sub Query with a Comma Separated Resultset
Conditional Where Clause with Case Statement in Oracle
Difference Between N'String' VS U'String' Literals in Oracle
Oracle - Best Select Statement for Getting the Difference in Minutes Between Two Datetime Columns