Generally, Are String (Or Varchar) Fields Used as Join Fields

Generally, are string (or varchar) fields used as join fields?

It's certainly possible to use a varchar as a key field (or simply something to join on). The main problems with it are based on what you normally store in a varchar field; mutable data. Strictly speaking, it's not advisable to have key fields change. A person's name, telephone number, even their SSN can all change. However, the employee with internal ID 3 will always be ID 3, even if there are two John Smiths.

Second, string comparison is dependent on a number of nit-picky details, such as culture, collation, whitespace translation, etc. that can break a join for no immediately-apparent reason. Say you use a tabspace character \t for a certain string you're joining on. Later, you change your software to replace \t with 3 spaces to reduce character escapes in your raw strings. You have now broken any functionality requiring a string with escaped tabs to be matched to an identical-looking, but differently-composed, string.

Lastly, even given two perfectly identical strings, there is a slight performance benefit to comparing two integer numbers than comparing two strings. Integer comparison is effectively constant-time. String comparison is linear at best, based on the length of the string.

SQL Server 2008 R2 Inner Join Fails to match varchar fields because it contains special characters

If you need to JOIN on two fields using LIKE that contain [, you'll need to escape the opening bracket since the bracket is a wildcard character. For any wildcard characters, you can use brackets [] - this will search for any single character within the specified range.

For example, this should work in your scenario using REPLACE:

select * 
from yourtable t inner join
yourtable2 t2 on t.field like replace(t2.field,'[', '[[]')
  • SQL Fiddle Demo

Alternatively, for your sample data, you have no reason to use LIKE as = would work the same.

Join two tables in mssql with column int and varchar having comma seperated values

Here is a general solution which should work on any standard database:

SELECT DISTINCT
t1.ID
FROM tblemployee t1
INNER JOIN tbllead t2
ON ',' + t2.leademployees + ',' LIKE '%,' + CONVERT(varchar(10), t1.ID) + ',%'
ORDER BY
t1.ID;

This query uses a trick I first saw being used by @CL , which can best be explained by showing some data. The first row of tbllead has this data:

1,2,6,7

The comparison for the inner numbers and outer numbers is different, due to the presence/abseence of one of the commas on either side. But if we concantenate commas around this string, we get the following:

,1,2,6,7,

Now, we can just compare every ID from the tblemployee table, surrounded by commas, against the list of employees on every record. That is, compare ,1, ,2, etc.

Here is a demo for SQL Server:

Demo

If you are using MySQL, the above query would change slightly. Also, there would be an even tighter option in MySQL using FIND_IN_SET. But the answer I gave is more useful in my view, because it can easily be applied regardless of the database.

Join based on part of a varchar field matching another field

Joining like that really means that your Transactions table structure is not good (there should be a separate column for a field you could use to join to Transaction_details).

Having said that, you can still achieve what you want. You can join when the description starts with BZ# and then join on what follows, taking the length of id to extract the number. This is better than trying to find a space in the description as there may not be any spaces.

Left join will do the rest and fill the fields from tran_details with nulls if there was no match.

select * 
from
transactions t
left join tran_details d
on t.person_id = d.person_id
and left(t.description, 3) = 'BZ#'
and substring(t.description, 4, len(cast(d.id as nvarchar(50)))) = d.id

I assumed that you want to join if BZ# is followed by the id from the detail table.

SQL Fiddle

SQL performance for string field vs multiple int/varchar fields

The primary question is whether you can run faster than scanning the entire table. The answer is "no" unless a small number of the booleans can be handled separately with Index(es).

Your WHERE bools LIKE '%a%c%d%' is a clever trick for ANDing any number of flags together. However, it will need to look at every row, and LIKE is slightly heavyweight.

INT(1) takes 4 bytes plus overhead. TINYINT is what you are fishing for; it takes 1 byte, plus overhead.

A SET with up to 64 bools is another technique. The coding is a bit clumsy, but it is rather efficient

INT UNSIGNED (for up to 32) or BIGINT UNSIGNED (for up to 64) flags is implemented similarly to SET and also takes up to 8 bytes. But the coding is rather clumsy. Let's number the bits starting with 0 in the least significant bit.

WHERE (bools & ( (1 << 0) | (1 << 2) | (1 << 3) ) ) = 
( (1 << 0) | (1 << 2) | (1 << 3) )

would check that bits 0, 2, and 3 are all set. (This is like your test for a,c,d.) A variety of ANDs and ORs are possible with this approach. (You could pre-compute those bit values-- 13 in this example. Or use a bit literal: 0b1101.)

The benefit of SET or bits in an INT is the 'speed' within each row. Still, all rows must be tested.

So, I recommend triaging your bools, etc, and decide what needs indexing and what can go into this combined column or in a combined JSON column for non-bools.

SQL Server - Better Way To Join Between VARCHAR and INT

I think you are on the right track. You just need the right expression for like:

SELECT POID, SKU,
(SELECT SUM(ShipQuantity) AS ShipQuantity
FROM tblVendorShippingInfo vsi
WHERE substring(vsi.PONum, 2, 6) = Convert(varchar(10), pod.POID) AND
vSKU = pod.SKU
) as QtyCount
FROM tblPODetail pod
WHERE PONum like '[A-Za-z][0-9][0-9][0-9][0-9][0-9][0-9]';

If you want to index this as an integer, then use a computed column:

alter table tblVendorShippingInfo add column POID_num as try_convert(int, substring(PONum, 2, 6);

create index idx_tblVendorShippingInfo_POID on tblVendorShippingInfo(POID_num);

try_convert() assumes SQL Server 2012+. You can do something similar with earlier versions of SQL Server.

Left join on 2 varchar fields not working

Try to execute the following query on both tables. This will tell you if there's any "hidden" difference between the tables (for example, blank characters, line breaks, etc.):

select Country, CAST(Country AS VARBINARY) AS BinaryCountry
from [ACC].[dbo].[Customer]
where Country = 'United Kingdom'

select Country, CAST(Country AS VARBINARY) AS BinaryCountry
from [Task Centre].[dbo].[CountryCodes]
where Country = 'United Kingdom'

The column BinaryCountry should show a different value, if the content of the Country-columns are not exactly the same. If that is the case, consider correcting the error in either table. Once you've made sure that the value is the same in both tables, your join should work just fine.

Edit: The problem turns out to be a non-breaking space character in the Task Centre-table. To workaround this, use the following in your join criteria:

ON A.Country = Replace(B.Country, CHAR(0xA0), ' ')

SQL: Join two tables with different type of columns

I would suggest normalizing the schema as @Sodmond suggested. However, if this is not an option, you could use find_in_set for the join condition - it will implicitly convert the int from table2 to a character:

SELECT t1.OrderID, RetailerName, OrderDate, Product
FROM table1 t1
JOIN table2 t2 ON FIND_IN_SET(t2.sku, t1.skus) > 0


Related Topics



Leave a reply



Submit