What Are Ways to Match Street Addresses in SQL Server

What are ways to match street addresses in SQL Server?

In order to do proper street address matching, you need to get your addresses into a standardized form. Have a look at the USPS postal standards here (I'm asssuming you're dealing with US addresses). It is by no means an easy process if you want to be able to deal with ALL types of US mail addresses. There is software available from companies like QAS and Satori Software that you can use to do the standardization for you. You'll need to export your addresses, run them through the software and then load the database with the updated addresses. There are also third party vendors that will perform the address standardization as well. It may be overkill for what you are trying to do but it's the best way to do it. if the addresses in your database are standardized you'll have a better chance of matching them (especially if you can standardize the input as well).

SQL Server comparing part of an address string, 2 columns

You can also try using Pearson's correlation coefficient

Perfect example for MS SQL is here.

Here is my quick assembled example:

--helper function to convert string to ASCII column
CREATE FUNCTION dbo.fn_StringToASCII
(
@text VARCHAR(MAX)
)
RETURNS @Result TABLE
(
POS INT,
NUM INT
)
AS
BEGIN
DECLARE @i INT
SET @i = 1
WHILE @i <= LEN(@text)
BEGIN
INSERT INTO @Result
(
POS,
NUM
)
VALUES
(@i, ASCII(SUBSTRING(@text, @i, 1)))
SET @i = @i + 1
END
RETURN;
END;

-- test example
CREATE TABLE test1(ID INT, ADDR1 VARCHAR(20));
CREATE TABLE test2(ID INT, ADDR2 VARCHAR(20));

INSERT INTO dbo.test1
(
ID,
ADDR1
)
VALUES
(1, '17 Wickham CT'),
(2, '6818 Chester DR'),
(3, '6217 Raymond RD'),
(4, 'TEST');

INSERT INTO dbo.test2
(
ID,
ADDR2
)
VALUES
(1, '17 S WICKHAM CT # 2'),
(2, '6801 CHESTER DR # A'),
(3, 'PO BOX 45581'),
(4, 'TEST');

--query with coeff
SELECT ISNULL(t1.ID, c2.ID) AS ID,
(AVG(c1.NUM * c2.NUM) - (AVG(c1.NUM) * AVG(c2.NUM))) / (STDEVP(c1.NUM) * STDEVP(c2.NUM)) AS Coeff
FROM dbo.test1 t1
CROSS APPLY dbo.fn_StringToASCII(LOWER(t1.ADDR1)) c1
RIGHT JOIN
(
SELECT t2.ID,
c2.*
FROM dbo.test2 t2
CROSS APPLY dbo.fn_StringToASCII(LOWER(t2.ADDR2)) c2
) c2
ON c2.ID = t1.ID
AND c2.POS = c1.POS
WHERE 1 = 1
GROUP BY ISNULL(t1.ID, c2.ID);

DROP TABLE dbo.test1
DROP TABLE dbo.test2

/*
results
ID Coeff
1 0.957280794307261
2 1.58310202187124
3 -0.397204343866094
4 0.987654320987654
*/

SQL Match City Name Inside Full Address?

Not ideal but might satisfy your requirements:

 select Cust_id from customers 
where address NOT LIKE '% London%';

[Note the added space: it assumes you will always precede the city name with a space. '%London%' would match words containing London]

(It might be better if you had a normalised address, i.e. broken into street address, town, city, etc.))

Matching mailing addresses against a database

This is a problem that can be solved both by lat-long (use R-trees for quick 2-D closest neighbours! Comes as standard in MongoDB, but certainly availiable i Psql among others as well)

There's also the text matching, described here: SO:
What are ways to match street addresses in SQL Server?

There seems to be third party products availiable as well: SO: I need an address matching algorithm

If you want to combine these two approaches, look for the term "data fusion", which is a quite disparate collection of methods that essentially put higher weight to answers that are more certain, and bases the final answer on the aggregated certainty.

A description of some Harward Design GIS-project research could be of interest as well: http://www.gsd.harvard.edu/gis/manual/geocoding/

There's a list of all the cities in the world with their corresponding coordinates: http://www.maxmind.com/en/worldcities



Related Topics



Leave a reply



Submit