What are ways to match street addresses in SQL Server?
In order to do proper street address matching, you need to get your addresses into a standardized form. Have a look at the USPS postal standards here (I'm asssuming you're dealing with US addresses). It is by no means an easy process if you want to be able to deal with ALL types of US mail addresses. There is software available from companies like QAS and Satori Software that you can use to do the standardization for you. You'll need to export your addresses, run them through the software and then load the database with the updated addresses. There are also third party vendors that will perform the address standardization as well. It may be overkill for what you are trying to do but it's the best way to do it. if the addresses in your database are standardized you'll have a better chance of matching them (especially if you can standardize the input as well).
SQL Server comparing part of an address string, 2 columns
You can also try using Pearson's correlation coefficient
Perfect example for MS SQL is here.
Here is my quick assembled example:
--helper function to convert string to ASCII column
CREATE FUNCTION dbo.fn_StringToASCII
(
@text VARCHAR(MAX)
)
RETURNS @Result TABLE
(
POS INT,
NUM INT
)
AS
BEGIN
DECLARE @i INT
SET @i = 1
WHILE @i <= LEN(@text)
BEGIN
INSERT INTO @Result
(
POS,
NUM
)
VALUES
(@i, ASCII(SUBSTRING(@text, @i, 1)))
SET @i = @i + 1
END
RETURN;
END;
-- test example
CREATE TABLE test1(ID INT, ADDR1 VARCHAR(20));
CREATE TABLE test2(ID INT, ADDR2 VARCHAR(20));
INSERT INTO dbo.test1
(
ID,
ADDR1
)
VALUES
(1, '17 Wickham CT'),
(2, '6818 Chester DR'),
(3, '6217 Raymond RD'),
(4, 'TEST');
INSERT INTO dbo.test2
(
ID,
ADDR2
)
VALUES
(1, '17 S WICKHAM CT # 2'),
(2, '6801 CHESTER DR # A'),
(3, 'PO BOX 45581'),
(4, 'TEST');
--query with coeff
SELECT ISNULL(t1.ID, c2.ID) AS ID,
(AVG(c1.NUM * c2.NUM) - (AVG(c1.NUM) * AVG(c2.NUM))) / (STDEVP(c1.NUM) * STDEVP(c2.NUM)) AS Coeff
FROM dbo.test1 t1
CROSS APPLY dbo.fn_StringToASCII(LOWER(t1.ADDR1)) c1
RIGHT JOIN
(
SELECT t2.ID,
c2.*
FROM dbo.test2 t2
CROSS APPLY dbo.fn_StringToASCII(LOWER(t2.ADDR2)) c2
) c2
ON c2.ID = t1.ID
AND c2.POS = c1.POS
WHERE 1 = 1
GROUP BY ISNULL(t1.ID, c2.ID);
DROP TABLE dbo.test1
DROP TABLE dbo.test2
/*
results
ID Coeff
1 0.957280794307261
2 1.58310202187124
3 -0.397204343866094
4 0.987654320987654
*/
SQL Match City Name Inside Full Address?
Not ideal but might satisfy your requirements:
select Cust_id from customers
where address NOT LIKE '% London%';
[Note the added space: it assumes you will always precede the city name with a space. '%London%'
would match words containing London]
(It might be better if you had a normalised address, i.e. broken into street address, town, city, etc.))
Matching mailing addresses against a database
This is a problem that can be solved both by lat-long (use R-trees for quick 2-D closest neighbours! Comes as standard in MongoDB, but certainly availiable i Psql among others as well)
There's also the text matching, described here: SO:
What are ways to match street addresses in SQL Server?
There seems to be third party products availiable as well: SO: I need an address matching algorithm
If you want to combine these two approaches, look for the term "data fusion", which is a quite disparate collection of methods that essentially put higher weight to answers that are more certain, and bases the final answer on the aggregated certainty.
A description of some Harward Design GIS-project research could be of interest as well: http://www.gsd.harvard.edu/gis/manual/geocoding/
There's a list of all the cities in the world with their corresponding coordinates: http://www.maxmind.com/en/worldcities
Related Topics
How to Access Latest Row for Each Individual Identifier
Have Pl/Sql Outputs in Real Time
Update All SQL Null Values in Multiple Columns Using Column Level Where Clause
Extract Email Address from String Using Tsql
Sql 2008 Vs 2012 Error: Incorrect Syntax Near The Keyword 'Compute'
Sql - Select Max() and Accompanying Field
What Is The Purpose (Or Use Case) for an Outer Join in Sql
Order by Column1 If Column1 Is Not Null, Otherwise Order by Column2
Unique Date Range Fields in SQL Server 2008
Update X Set Y = Null Takes a Long Time
Dividing 2 Numbers in SQL Server
How to Delete Duplicate Records in Sql
How to Update SQL Table from Excel Directly
Order by a Field Being Equal to a Specific Value
Export Data Frame to SQL Server Using Rodbc Package