Reverse- Geocoding: How to Determine the City Closest to a (Lat,Lon) with Bigquery SQL

Reverse- geocoding: How to determine the city closest to a (lat,lon) with BigQuery SQL?

This is the best performing query we've worked out so far:

WITH a AS (
# a table with points around the world
SELECT * FROM UNNEST([ST_GEOGPOINT(-70, -33), ST_GEOGPOINT(-122,37), ST_GEOGPOINT(151,-33)]) my_point
), b AS (
# any table with cities world locations
SELECT *, ST_GEOGPOINT(lon,lat) latlon_geo
FROM `fh-bigquery.geocode.201806_geolite2_latlon_redux`
)

SELECT my_point, city_name, subdivision_1_name, country_name, continent_name
FROM (
SELECT loc.*, my_point
FROM (
SELECT ST_ASTEXT(my_point) my_point, ANY_VALUE(my_point) geop
, ARRAY_AGG( # get the closest city
STRUCT(city_name, subdivision_1_name, country_name, continent_name)
ORDER BY ST_DISTANCE(my_point, b.latlon_geo) LIMIT 1
)[SAFE_OFFSET(0)] loc
FROM a, b
WHERE ST_DWITHIN(my_point, b.latlon_geo, 100000) # filter to only close cities
GROUP BY my_point
)
)
GROUP BY 1,2,3,4,5

Sample Image

SQL: Finding the closest Lat/Lon record on Google BigQuery

Below is for BigQuery Standard SQL

#standardSQL
SELECT AS VALUE ARRAY_AGG(STRUCT<id_a INT64, id_b STRING>(a.id, b.id) ORDER BY ST_DISTANCE(a.point, b.point) LIMIT 1)[OFFSET(0)]
FROM (SELECT id, ST_GEOGPOINT(lon, lat) point FROM `project.dataset.table_a`) a
CROSS JOIN (SELECT id, ST_GEOGPOINT(lon, lat) point FROM `project.dataset.table_b`) b
GROUP BY a.id

you can test, play with it using dummy data from your question as

#standardSQL
WITH `project.dataset.table_a` AS (
SELECT 1 id, 32.95 lat, 65.567 lon UNION ALL
SELECT 2, 33.95, 65.566
), `project.dataset.table_b` AS (
SELECT 'a' id, 32.96 lat, 65.566 lon UNION ALL
SELECT 'b', 33.96, 65.566
)
SELECT AS VALUE ARRAY_AGG(STRUCT<id_a INT64, id_b STRING>(a.id, b.id) ORDER BY ST_DISTANCE(a.point, b.point) LIMIT 1)[OFFSET(0)]
FROM (SELECT id, ST_GEOGPOINT(lon, lat) point FROM `project.dataset.table_a`) a
CROSS JOIN (SELECT id, ST_GEOGPOINT(lon, lat) point FROM `project.dataset.table_b`) b
GROUP BY a.id

with result

Row id_a    id_b     
1 1 a
2 2 b

Latitude and Longitude - find city & state using bigquery

Hope below quick example will give you good direction to start with

SELECT 
yourLon, yourLat, ROUND(distance) AS distance, city, state
FROM (
SELECT
yourLon, yourLat, distance, city, state,
ROW_NUMBER() OVER (PARTITION BY yourLon, yourLat ORDER BY distance) AS win
FROM
JS((
// input table
SELECT
points.yourLon AS yourLon,
points.yourLat AS yourLat,
cities.city AS city,
cities.state AS state,
cities.lon AS lon,
cities.lat AS lat
FROM (
SELECT city, state, lon, lat FROM
(SELECT 'acampo' AS city, 'ca' AS state, -121.23200000000001 AS lon, 38.1964 AS lat),
(SELECT 'adelanto ' AS city, 'ca' AS state, -117.48 AS lon, 34.5894 AS lat),
(SELECT 'agoura hills ' AS city, 'ca' AS state, -118.76 AS lon, 34.1445 AS lat),
(SELECT 'alameda ' AS city, 'ca' AS state, -122.26 AS lon, 37.7606 AS lat),
(SELECT 'alamo ' AS city, 'ca' AS state, -122.019 AS lon, 37.8525 AS lat),
(SELECT 'albany ' AS city, 'ca' AS state, -122.294 AS lon, 37.8882 AS lat),
(SELECT 'albion ' AS city, 'ca' AS state, -123.705 AS lon, 39.2141 AS lat),
(SELECT 'alhambra ' AS city, 'ca' AS state, -118.12999999999998 AS lon, 34.0899 AS lat),
(SELECT 'aliso viejo ' AS city, 'ca' AS state, -117.736 AS lon, 33.5761 AS lat),
(SELECT 'altadena ' AS city, 'ca' AS state, -118.14000000000001 AS lon, 34.1884 AS lat),
(SELECT 'anaheim ' AS city, 'ca' AS state, -117.92667647058823 AS lon, 33.83603382352941 AS lat),
(SELECT 'anderson ' AS city, 'ca' AS state, -122.299 AS lon, 40.469 AS lat),
(SELECT 'angels camp ' AS city, 'ca' AS state, -120.625 AS lon, 38.0746 AS lat),
(SELECT 'antelope ' AS city, 'ca' AS state, -121.38099999999999 AS lon, 38.6861 AS lat),
(SELECT 'antioch ' AS city, 'ca' AS state, -121.806 AS lon, 37.99594999999999 AS lat),
(SELECT 'anza ' AS city, 'ca' AS state, -116.743 AS lon, 33.5886 AS lat),
(SELECT 'apple valley ' AS city, 'ca' AS state, -117.19333333333333 AS lon, 34.47484444444444 AS lat),
(SELECT 'aptos ' AS city, 'ca' AS state, -121.877 AS lon, 36.9926 AS lat),
(SELECT 'arbuckle ' AS city, 'ca' AS state, -122.015 AS lon, 39.0326 AS lat),
(SELECT 'arcadia ' AS city, 'ca' AS state, -118.04199999999999 AS lon, 34.1325 AS lat)
) AS cities
CROSS JOIN (
SELECT yourLon, yourLat FROM
(SELECT -122 AS yourLon, 38 AS yourLat),
(SELECT -117 AS yourLon, 34 AS yourLat),
) AS points
) ,
// input columns
yourLon, yourLat, city, state, lon, lat,
// output schema
"[{name: 'yourLon', type: 'float'},
{name: 'yourLat', type: 'float'},
{name: 'city', type: 'string'},
{name: 'state', type: 'string'},
{name: 'distance', type: 'float'}]",
// function
"function(r, emit){
var R = 3959; // Radius of the earth in miles
var dLat = deg2rad(r.lat-r.yourLat);
var dLon = deg2rad(r.lon-r.yourLon);
var a = Math.sin(dLat/2) * Math.sin(dLat/2) +
Math.cos(deg2rad(r.yourLat)) * Math.cos(deg2rad(r.lat)) *
Math.sin(dLon/2) * Math.sin(dLon/2);
var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
emit({yourLon: r.yourLon, yourLat: r.yourLat, city: r.city, state: r.state, distance: R * c});

function deg2rad(deg) {
return deg * (Math.PI/180)
}
}"
)
)
WHERE win = 1

result is as below

yourLon yourLat distance    city    state    
-117.0 34.0 32.0 anza ca
-122.0 38.0 10.0 alamo ca

BigQuery SQL Geo function that return the coordinates of full address

Here is the response from BigQuery.

Thanks for feature request!

Geocoding is currently not supported by BigQuery directly, but
thanks for request, we'll consider it for future planning.

The typical solution is to do it outside of BigQuery,
e.g. in Dataflow job using Google Maps Geocoding API:
https://developers.google.com/maps/documentation/geocoding/start
or similar third-party APIs.

For some addresses the Open Street Map can be used for geocoding,
we host it at bigquery-public-data.geo_openstreetmap.planet_nodes.

How to JOIN in geography columns using ST_CONTAINS in Big query

Below is for BigQuery Standard SQL

If you want to preserve not matching addresses in output - you can use below

#standardSQL
WITH matched_addresses AS (
SELECT
address_id,
SLDLST
FROM `launchpad-239920.address_standardization.temp_delete_geo_match_sample` ssgolden
JOIN `geographies.tl_2018_sldl_X` sldl
ON ST_CONTAINS(sldl.geom, ST_GEOGPOINT(ssgolden.longitude, ssgolden.latitude))
)
SELECT * FROM matched_addresses UNION ALL
SELECT address_id, NULL
FROM `launchpad-239920.address_standardization.temp_delete_geo_match_sample`
WHERE NOT address_id IN (SELECT address_id FROM matched_addresses)

but if you interested in only matched - use below one

#standardSQL
WITH matched_addresses AS (
SELECT
address_id,
SLDLST
FROM `launchpad-239920.address_standardization.temp_delete_geo_match_sample` ssgolden
JOIN `geographies.tl_2018_sldl_X` sldl
ON ST_CONTAINS(sldl.geom, ST_GEOGPOINT(ssgolden.longitude, ssgolden.latitude))
)
SELECT * FROM matched_addresses

How to get the historical weather for any city with BigQuery?

Update 2019: For convenience

SELECT * 
FROM `fh-bigquery.weather_gsod.all`
WHERE name='SAN FRANCISCO INTERNATIONAL A'
ORDER BY date DESC

Updated daily - or report here if it doesn't

For example, to get the hottest days for San Francisco stations since 1980:

SELECT name, state, ARRAY_AGG(STRUCT(date,temp) ORDER BY temp DESC LIMIT 5) top_hot, MAX(date) active_until
FROM `fh-bigquery.weather_gsod.all`
WHERE name LIKE 'SAN FRANC%'
AND date > '1980-01-01'
GROUP BY 1,2
ORDER BY active_until DESC

Sample Image

Note that this query processed only 28MB thanks to a clustered table.

And similar, but instead of using the station name I'll use a location and a table clustered by the location:

WITH city AS (SELECT ST_GEOGPOINT(-122.465, 37.807))

SELECT name, state, ARRAY_AGG(STRUCT(date,temp) ORDER BY temp DESC LIMIT 5) top_hot, MAX(date) station_until
FROM `fh-bigquery.weather_gsod.all_geoclustered`
WHERE EXTRACT(YEAR FROM date) > 1980
AND ST_DISTANCE(point_gis, (SELECT * FROM city)) < 40000
GROUP BY name, state
HAVING EXTRACT(YEAR FROM station_until)>2018
ORDER BY ST_DISTANCE(ANY_VALUE(point_gis), (SELECT * FROM city))
LIMIT 5

Sample Image


Update 2017: Standard SQL and up-to-date tables:

SELECT TIMESTAMP(CONCAT(year,'-',mo,'-',da)) day, AVG(min) min, AVG(max) max, AVG(IF(prcp=99.99,0,prcp)) prcp
FROM `bigquery-public-data.noaa_gsod.gsod2016`
WHERE stn='722540' AND wban='13904'
GROUP BY 1
ORDER BY day

Additional example, to show the coldest days in Chicago in this decade:

#standardSQL
SELECT year, FORMAT('%s%s',mo,da) day ,min
FROM `fh-bigquery.weather_gsod.stations` a
JOIN `bigquery-public-data.noaa_gsod.gsod201*` b
ON a.usaf=b.stn AND a.wban=b.wban
WHERE name='CHICAGO/O HARE ARPT'
AND min!=9999.9
AND mo<'03'
ORDER BY 1,2

To retrieve the historical weather for any city, first we need to find what station reports in that city. The table [fh-bigquery:weather_gsod.stations] contains the name of known stations, their state (if in the US), country, and other details.

So to find all the stations in Austin, TX, we would use a query like this:

SELECT state, name, lat, lon
FROM [fh-bigquery:weather_gsod.stations]
WHERE country='US' AND state='TX' AND name CONTAINS 'AUST'
LIMIT 10

Sample Image

This approach has 2 problems that need to be solved:

  • Not every known station is present in that table - I need to get an updated version of this file. So don't give up if you don't find the station you are looking for here.
  • Not every station found in this file has been operating every year - so we need to find stations that have data during the year we are looking for.

To solve the second problem, we need to join the stations table with the actual data we are looking for. The following query looks for stations around Austin, and the column c looks at how many days during 2015 have actual data:

SELECT state, name, FIRST(a.wban) wban, FIRST(a.stn) stn, COUNT(*) c, INTEGER(SUM(IF(prcp=99.99,0,prcp))) rain, FIRST(lat) lat, FIRST(lon) long
FROM [fh-bigquery:weather_gsod.gsod2015] a
JOIN [fh-bigquery:weather_gsod.stations] b
ON a.wban=b.wban
AND a.stn=b.usaf
WHERE country='US' AND state='TX' AND name CONTAINS 'AUST'
GROUP BY 1,2
LIMIT 10

Sample Image

That's good! We found 4 stations with data for Austin during 2015.

Note that we had to treat "rain" in a special way: When a station doesn't monitor for rain, instead of null, it marks it as 99.99. Our query filters those values out.

Now that we know the stn and wban numbers for these stations, we can pick any of them and visualize the results:

SELECT TIMESTAMP('2015'+mo+da) day, AVG(min) min, AVG(max) max, AVG(IF(prcp=99.99,0,prcp)) prcp
FROM [fh-bigquery:weather_gsod.gsod2015]
WHERE stn='722540' AND wban='13904'
GROUP BY 1
ORDER BY day

Sample Image



Related Topics



Leave a reply



Submit