ElasticSearch map two sql tables with a foreign key
In the SQL world, everything is about keeping relationships between tables in such a way that data is never repeated twice (or as seldom as possible), hence the primary-key/foreign-key approach.
In the NoSQL world in general, and in Elasticsearch in particular, there are no explicit relationships between your indices and types. There are ways to create relationships between documents themselves (e.g. parent-child or nested objects), but there is no concept of JOIN.
Instead, the main idea is to denormalize your data in such a way that your documents will be "usable" to carry out your specific use cases. One should not be afraid of keeping redundant copies of data. More generally, you need to ask yourself the following two questions (among many others):
- what data/fields do you need to display to your users?
- what queries do you need to perform to retrieve the above data?
In the simple case you highlighted, I would definitely go with a document that contains the JOIN of your two tables:
{
"ID": 1,
"Name": "Episode 2",
"TVSeriesID": 4,
"TVSeriesName": "Friends"
}
It is no big deal that the title Friends
will be contained in 236 documents, the idea is that you can retrieve any episode and it contains all the data you need to know about it.
SQL: Count the number of times two tables share a foreign key
This gives you an overview over all chefs and waiters that worked together more than once. See this SQLfiddle for an example.
SELECT
chefs.id as chef,
waiters.id as waiter,
COUNT(chefs.id) AS cowork_count,
GROUP_CONCAT(kitchen.name) AS kitchens
FROM waiters
INNER JOIN chefs ON waiters.kitchen_id = chefs.kitchen_id
INNER JOIN kitchen ON kitchen.id = chefs.kitchen_id
GROUP BY waiters.id, chefs.id
HAVING COUNT(chefs.id) > 1
ORDER BY chefs.id, waiters.id;
(That works for MySQL/MariaDB. I believe GROUP_CONCAT is non-standard but in this case it's also non-essential and can be left out)
How to 'join' two indexes in ElasticSearch
As answered in your other question, nothing prevents you from storing the Customer
name inside each order_item
document at indexing time, while still having a dedicated index orders
also containing the Customer
data. Remember that it's all about cleverly denormalizing your data so that each of your documents be as "self-contained" as you need.
curl -XPUT localhost:9200/order_items/order_item/1 -d '{
"ID": 1,
"Name": "Shoes",
"Price": 9.99,
"OrderID": 82,
"Customer": "John Smith"
}'
curl -XPUT localhost:9200/order_items/order_item/2 -d '{
"ID": 2,
"Name": "Hat",
"Price": 19.99,
"OrderID": 82,
"Customer": "John Smith"
}
The advantages of this solution is that each order item is completely self-contained, and you can group/aggregate them on OrderID
in order to get all items of a given order.
Also, as @JohnAment mentioned in his comment, the order/order_item
use case is also a good candidate for using either
- the parent/child relationship
- or nested objects.
In the first case, you'd have one order
"parent" document...
curl -XPUT localhost:9200/orders/order/82 -d '{
"ID": 82,
"Customer": "John Smith"
}'
And several order_item
"children" documents that you index using their parent ID:
curl -XPUT localhost:9200/order_items/order_item/1?parent=82 -d '{
"ID": 1,
"Name": "Shoes",
"Price": 9.99
}'
curl -XPUT localhost:9200/order_items/order_item/2?parent=82 -d '{
"ID": 2,
"Name": "Hat",
"Price": 19.99
}'
In the second case, your order
document would contain all order items in a nested OrderItems
property and would look like this:
curl -XPUT localhost:9200/orders/order/82 -d '{
"ID": 82,
"Customer": "John Smith",
"OrderItems": [
{
"ID": 1,
"Name": "Shoes",
"Price": 9.99
},{
"ID": 2,
"Name": "Hat",
"Price": 19.99
}
]
}'
Related Topics
How to Limit the Results on a SQL Query
There Are a Method to Paging Using Ansi SQL Only
How to Return a New Identity Column Value from an SQLserver Select Statement
Difference Between Datetime Converts in Msexcel and SQL Server
Search Count of Words Within a String Using SQL
Oracle/Sql: Wm_Concat & Order By
Find the Date/Time a Table's Column Was Created
Why When Matched' Cannot Appear More Than Once in a 'Update' Clause of a Merge Statement
How to Test My Ad-Hoc SQL with Parameters in Postgres Query Window
SQL Not a Single Group Group Function Error
How to Format Datetime as M/D/Yyyy in SQL Server
Postgres 9.4 JSONb Array as Table
Join Versus Exists Performance
How to Use Distinct in Ms Access
Performance of Regexp_Replace VS Translate in Oracle