How to join tables on regex
A very odd way to join. Every match on one side is combined with every row from the other table ...
regexp_matches()
is probably the wrong function for your purpose. You want a simple regular expression match (~
). Actually, the LIKE
operator will be faster:
Presumably fastest with LIKE
SELECT msg.message
, msg.src_addr
, msg.dst_addr
, mnc.name
FROM mnc
JOIN msg ON msg.src_addr LIKE ('%38' || mnc.code || '%')
OR msg.dst_addr LIKE ('%38' || mnc.code || '%')
WHERE length(mnc.code) = 3;
In addition, you only want mnc.code
of exactly 3 characters.
With regexp match
You could write the same with regular expressions but it will most definitely be slower. Here is a working example close to your original:
SELECT msg.message
, msg.src_addr
, msg.dst_addr
, mnc.name
FROM mnc
JOIN msg ON (msg.src_addr || '+' || msg.dst_addr) ~ (38 || mnc.code)
AND length(mnc.code) = 3;
This also requires msg.src_addr
and msg.dst_addr
to be NOT NULL
.
The second query demonstrates how the additional check length(mnc.code) = 3
can go into the JOIN
condition or a WHERE
clause. Same effect here.
With regexp_matches()
You could make this work with regexp_matches()
:
SELECT msg.message
, msg.src_addr
, msg.dst_addr
, mnc.name
FROM mnc
JOIN msg ON EXISTS (
SELECT *
FROM regexp_matches(msg.src_addr ||'+'|| msg.dst_addr, '38(...)', 'g') x(y)
WHERE y[1] = mnc.code
);
But it will be slow in comparison.
Explanation:
Your regexp_matches()
expression just returns an array of all captured substrings of the first match. As you only capture one substring (one pair of brackets in your pattern), you will exclusively get arrays with one element.
You get all matches with the additional "globally" switch 'g'
- but in multiple rows. So you need a sub-select to test them all (or aggregate). Put that in an EXISTS
- semi-join and you arrive at what you wanted.
Maybe you can report back with a performance test of all three?
Use EXPLAIN ANALYZE
for that.
bigquery join tables using regex
is there any more efficient way?
If by "more efficient" you mean less verbose and easier to read / maintain - try below
#standardSQL
create temp function part(host string) as (
regexp_extract(host, r'.*req-([a-z0-9]{12})')
);
select *
from `config.tested` a
join `config.active` b
on part(a.host) = part(b.host)
or its variation
REGEX join in SQL BigQuery
With your sample data I was able to create a StandardSQL query to achieve the desired output. I have uploaded in BigQuery two tables one called chain_name
and the second table store_name
with the 1st and 2nd chunk of data you shared, respectively.
In order to CROSS JOIN the 2 tables analysing if brand_name
is inside store_name
(which may contain the the store's address in addition to the brand_name), I have used the HAVING clause, which filters out the rows that do not return TRUE when they are evaluated against a bool expression. Also, REGEX_CONTAINS() which returns true if the analysed expressions are a partial match.
Below is the query,
SELECT salesperson, chain_id, chain_name, brand_name, store_name FROM `project_id.dataset.store_name`
CROSS JOIN `project_id.dataset.chain_name`
GROUP BY 1,2,3,4,5
HAVING REGEXP_CONTAINS(lower(store_name),lower(brand_name))
ORDER BY chain_id
And the output,
Row salesperson chain_id chain_name brand_name store_name
1 pepito 28 chairs4all sitnice sitnice
2 pepito 28 chairs4all AwwwChairs! AwwwChairs! Seville
3 pepito 28 chairs4all sitnice SitNice nº12
4 jaimito 34 eatdrinkeat Drink & eat Drink & eat North road 23
5 jaimito 567 last One One 34
Notice that LOWER() is used in order to match the names even though there might be some lower/caption letters difference.
Regular expression in SQL join condition
Try like this:
Select * from TableA a
Inner join TableB b
On Right(a.Column1,3)= b.Column1
Also you can use CTE to get length of the right joined table data (say 233) which will be 3 in length and reuse this length in the RIGHT
function parameter where i have used 3
.
EDIT:
Without CTE i think this should work:
Select * from TableA a
Inner join TableB b
On Right(a.Column1,LEN(b.Column1))= b.Column1
SQL(ite) JOIN with regex within JOIN
How about just using the lower()
function?
SELECT *, upper(substr(t1.name, 1, 1) || lower(t1.name, 2) as NameWithCapAtFront
FROM table1 as t1 LEFT JOIN
table2 as t2
ON lower(t1.Name) = lower(t2.Name);
Admittedly, this lower cases the whole name, but that seems reasonable in this case.
Related Topics
Whats The Best Sqlite Data Type for a Long String
Sql Select Rows Containing Part of String
Query to Calculate Average Time Between Successive Events
How to Bulk Update with SQL Server
"Pivoting" a Table in SQL (I.E. Cross Tabulation/Crosstabulation)
In SQL Server, How to Convert Binary Strings to Binary
Merging Two Sqlite Databases Which Both Have Junction Tables
How to Concat Multiple Rows into One Column in SQL Server
How to Search New Line Char in Oracle Table
How to Set a Jdbc Timeout for a Single Query
Error While Uploading a Report
Are Multiple Joins Unavailable When Using Adodb to Query an Excel File in a Vba Procedure
How to Concatenate Multiple Rows' Fields in a Sap Hana Table