Comparing Comma Separated Values from Two Columns of Two Different Tables

Comparing comma separated values from two columns of two different tables

You could get the table(s) into first normal form and then compare the compounds that are stored in each row. A starting point could be:

{1} Tokenize each row, and write the tokens into a new table. Give each token its original ID plus a 3-letter prefix, indicating which table the token came from.
{2} Group the rows of the new ("normalized") table by ID, and perform a LISTAGG(). Perform a self join, and find matching "token groups".

{1} Tokenize, create table as select (CTAS)

create table tokens
as
select
ltrim( -- ltrim() and rtrim() remove leading/trailing spaces (blanks)
rtrim(
substr( N.wrapped
, instr( N.wrapped, ',', 1, T.pos ) + 1
, ( instr( N.wrapped, ',', 1, T.pos + 1 ) - instr( N.wrapped, ',', 1, T.pos ) ) - 1
)
)
) token
, N.id
from (
select ',' || name1 || ',' as wrapped, 'T1_' || to_char( id_t1 ) as id from t1 -- names wrapped in commas, (table)_id
union all
select ',' || name2 || ',' , 'T2_' || to_char( id_t2 ) from t2
) N join (
select level as pos -- (max) possible position of char in an existing token
from dual
connect by level <= (
select greatest( -- find the longest string ie max position (query T1 and T2)
( select max( length( name1 ) ) from t1 )
, ( select max( length( name2 ) ) from t2 )
) as pos
from dual
)
) T
on T.pos <= ( length( N.wrapped ) - length( replace( N.wrapped, ',') ) ) - 1
;

The inspiration to tokenize without using CONNECT BY came from this SO answer.

The contents of the TOKENS table will look something like this:

SQL> select * from tokens ;
TOKEN ID
ASCORBIC ACID T1_1
SODIUM HYDROGEN CARBONATE T1_2
CAFFEINE T1_3
PSEUDOEPHEDRINE HYDROCHLORIDE T1_4
PARACETAMOL T1_100
sodium hydroxide T1_110
POTASSIUM HYDROGEN CARBONATE T2_4
SODIUM HYDROGEN CARBONATE T2_5
PARACETAMOL PH. EUR. T2_6
CODEINE PHOSPHATE T2_7
DEXCHLORPHENIRAMINE MALEATE T2_8
DEXCHLORPHENIRAMINE MALEATE T2_10
PARACETAMOL T2_200
...

{2} GROUP BY, LISTAGG, self join

select
S1.id id1
, S2.id id2
, S1.tokengroup_T1
, S2.tokengroup_T2
from
(
select substr( id, 4, length( id ) - 3 ) id
, listagg( token, ' + ' ) within group ( order by token ) tokengroup_T1
from tokens
group by id
having substr( id, 1, 3 ) = 'T1_'
) S1
join
(
select substr( id, 4, length( id ) - 3 ) id
, listagg( token, ' + ' ) within group ( order by token ) tokengroup_T2
from tokens
group by id
having substr( id, 1, 3 ) = 'T2_'
) S2
on S1.tokengroup_T1 = S2.tokengroup_T2
;

-- result
ID1 ID2 TOKENGROUP_T1 TOKENGROUP_T2
4 10 DEXCHLORPHENIRAMINE MALEATE + PSEUDOEPHEDRINE HYDROCHLORIDE DEXCHLORPHENIRAMINE MALEATE + PSEUDOEPHEDRINE HYDROCHLORIDE
110 210 potassium carbonate + sodium hydroxide potassium carbonate + sodium hydroxide
1 4 ASCORBIC ACID + PARACETAMOL + POTASSIUM HYDROGEN CARBONATE ASCORBIC ACID + PARACETAMOL + POTASSIUM HYDROGEN CARBONATE
3 6 CAFFEINE + PARACETAMOL PH. EUR. CAFFEINE + PARACETAMOL PH. EUR.

When doing things this way, you can get the substances into (alphabetical) order, and you can also pick a "delimiter" that you like (we have used '+') here.

ALTERNATIVE

If all that is no use to you, or you think this is too complicated, then you could try using TRANSLATE(). In this case, I'd recommend stripping all spaces/blanks from your dataset (in a query - not altering the original data!) like so:

Query

select 
id1, id2
, name1, name2
from (
select
id_t1 id1
, id_t2 id2
, T1.name1 name1
, T2.name2 name2
from T1
join T2
on translate( replace( T1.name1, ' ', '' ), replace( T2.name2, ' ', '' ), '!' )
= translate( replace( T2.name2, ' ', '' ), replace( T1.name1, ' ', '' ), '!' )
) ;

Result

  ID1   ID2 NAME1                                                                NAME2                                                        
2 5 SODIUM HYDROGEN CARBONATE, SODIUM CARBONATE ANHYDROUS, CITRIC ACID SODIUM HYDROGEN CARBONATE, SODIUM CARBONATE ANHYDROUS
3 6 CAFFEINE, PARACETAMOL PH. EUR. PARACETAMOL PH. EUR.,CAFFEINE
100 10 PARACETAMOL, DEXTROMETHORPHAN, PSEUDOEPHEDRINE, PYRILAMINE DEXCHLORPHENIRAMINE MALEATE, PSEUDOEPHEDRINE HYDROCHLORIDE
110 210 sodium hydroxide, potassium carbonate sodium hydroxide, potassium carbonate

NOTE:
I've added the following rows to your sample data:

-- T1
110, 'sodium hydroxide, potassium carbonate'

-- T2
210, 'sodium hydroxide, potassium carbonate'
211, 'potassium hydroxide, sodium carbonate'

I found that it is easy to use TRANSLATE() in a way that gives you "false positives" ie the substances with ids 110, 210, and 211 will appear to "match". (In other words: I don't think this is the right tool for this job.)

DBFIDDLE here

(follow the link to see the sample tables and queries).

compare multiple columns of 2 tables which some columns having comma separated values in oracle

Try something like this (may need tweaking, only tested with your data set):

select result_
from (
select
case when
( AB1 = 'true' and AB2 = 'true' and AB3 = 'true' )
or
( BA1 = 'true' and BA2 = 'true' and BA3 = 'true' )
then mfpartno || '=' || cbn
end result_
from (
select
A.mfpartno
, B.cbn
, case when instr( A.cf1, B.cf1 ) > 0 or B.cf1 is null then 'true' else 'false' end AB1
, case when instr( A.cf2, B.cf2 ) > 0 or B.cf2 is null then 'true' else 'false' end AB2
, case when instr( A.cf3, B.cf3 ) > 0 or B.cf3 is null then 'true' else 'false' end AB3
, case when instr( B.cf1, A.cf1 ) > 0 or A.cf1 is null then 'true' else 'false' end BA1
, case when instr( B.cf2, A.cf2 ) > 0 or A.cf2 is null then 'true' else 'false' end BA2
, case when instr( B.cf3, A.cf3 ) > 0 or A.cf3 is null then 'true' else 'false' end BA3
from tablea A cross join tableb B
)
)
where result_ is not null
;

Result

        RESULT_ 
_______________
M1001=cbn100
M1002=cbn103

Explanation

The inner query uses a CROSS JOIN, which allows us to compare all possible combinations of rows. INSTR() is used for detecting whether a string of tableA contains a string from tableB (and vice versa). There is also a condition for treating NULLs as per your requirements. If we get 3 'true' results, then a "match" has been found.
DBfiddle here.

How to compare two comma-separated value in a column, and return only matching values in SQL

You can try the following solution.

It is using XQuery and its FLWOR expression.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, col1 VARCHAR(MAX), col2 VARCHAR(MAX));
INSERT INTO @tbl (col1, col2) VALUES
('John, james, steve', 'john, smith, will, james'),
('Mary, Lisa', 'Mary');
-- DDL and sample data population, end

DECLARE @separator CHAR(2) = ', ';

SELECT *
, REPLACE(TRY_CAST('<root>' +
'<source><r><![CDATA[' + REPLACE(col1, @separator, ']]></r><r><![CDATA[') +
']]></r></source>' +
'<target><r><![CDATA[' + REPLACE(col2, @separator, ']]></r><r><![CDATA[') +
']]></r></target>' +
'</root>' AS XML)
.query('
for $x in /root/source/r/text(),
$y in /root/target/r/text()
where lower-case($y) eq lower-case($x)
return data($x)
').value('.', 'NVARCHAR(MAX)'), SPACE(1), ',') AS result
FROM @tbl;

Output

+----+--------------------+--------------------------+------------+
| ID | col1 | col2 | result |
+----+--------------------+--------------------------+------------+
| 1 | John, james, steve | john, smith, will, james | John,james |
| 2 | Mary, Lisa | Mary | Mary |
+----+--------------------+--------------------------+------------+

Compare 2 columns value in a table and show the matching values in a new column . column values are comma seperated

Ideally, you should store each inidividual piece of information in separate row, so you should have two separate tables roles and userroles which are foreign-keyed on this one.

Be that as it may, this does not need cursors. You can use STRING_SPLIT and STRING_AGG to get the result you want very easily:

SELECT
t4.user_id,
t4.permission_id,
t4.roles,
t4.userroles,
matchingrolesinthisrow = (
SELECT STRING_AGG(r.value, ',')
FROM STRING_SPLIT(t4.roles, ',') r
JOIN STRING_SPLIT(t4.userroles, ',') ur ON ur.value = r.value
)
FROM #temp4 t4;

SQL Fiddle

If you are on an early version of SQL Server, you will have to use custom Table Valued Functions to do this.



Related Topics



Leave a reply



Submit