Elasticsearch Map Two SQL Tables with a Foreign Key

ElasticSearch map two sql tables with a foreign key

In the SQL world, everything is about keeping relationships between tables in such a way that data is never repeated twice (or as seldom as possible), hence the primary-key/foreign-key approach.

In the NoSQL world in general, and in Elasticsearch in particular, there are no explicit relationships between your indices and types. There are ways to create relationships between documents themselves (e.g. parent-child or nested objects), but there is no concept of JOIN.

Instead, the main idea is to denormalize your data in such a way that your documents will be "usable" to carry out your specific use cases. One should not be afraid of keeping redundant copies of data. More generally, you need to ask yourself the following two questions (among many others):

  1. what data/fields do you need to display to your users?
  2. what queries do you need to perform to retrieve the above data?

In the simple case you highlighted, I would definitely go with a document that contains the JOIN of your two tables:

{
"ID": 1,
"Name": "Episode 2",
"TVSeriesID": 4,
"TVSeriesName": "Friends"
}

It is no big deal that the title Friends will be contained in 236 documents, the idea is that you can retrieve any episode and it contains all the data you need to know about it.

SQL: Count the number of times two tables share a foreign key

This gives you an overview over all chefs and waiters that worked together more than once. See this SQLfiddle for an example.

SELECT 
chefs.id as chef,
waiters.id as waiter,
COUNT(chefs.id) AS cowork_count,
GROUP_CONCAT(kitchen.name) AS kitchens
FROM waiters
INNER JOIN chefs ON waiters.kitchen_id = chefs.kitchen_id
INNER JOIN kitchen ON kitchen.id = chefs.kitchen_id
GROUP BY waiters.id, chefs.id
HAVING COUNT(chefs.id) > 1
ORDER BY chefs.id, waiters.id;

(That works for MySQL/MariaDB. I believe GROUP_CONCAT is non-standard but in this case it's also non-essential and can be left out)

How to 'join' two indexes in ElasticSearch

As answered in your other question, nothing prevents you from storing the Customer name inside each order_item document at indexing time, while still having a dedicated index orders also containing the Customer data. Remember that it's all about cleverly denormalizing your data so that each of your documents be as "self-contained" as you need.

curl -XPUT localhost:9200/order_items/order_item/1 -d '{
"ID": 1,
"Name": "Shoes",
"Price": 9.99,
"OrderID": 82,
"Customer": "John Smith"
}'

curl -XPUT localhost:9200/order_items/order_item/2 -d '{
"ID": 2,
"Name": "Hat",
"Price": 19.99,
"OrderID": 82,
"Customer": "John Smith"
}

The advantages of this solution is that each order item is completely self-contained, and you can group/aggregate them on OrderID in order to get all items of a given order.

Also, as @JohnAment mentioned in his comment, the order/order_item use case is also a good candidate for using either

  1. the parent/child relationship
  2. or nested objects.

In the first case, you'd have one order "parent" document...

curl -XPUT localhost:9200/orders/order/82 -d '{
"ID": 82,
"Customer": "John Smith"
}'

And several order_item "children" documents that you index using their parent ID:

curl -XPUT localhost:9200/order_items/order_item/1?parent=82 -d '{
"ID": 1,
"Name": "Shoes",
"Price": 9.99
}'
curl -XPUT localhost:9200/order_items/order_item/2?parent=82 -d '{
"ID": 2,
"Name": "Hat",
"Price": 19.99
}'

In the second case, your order document would contain all order items in a nested OrderItems property and would look like this:

curl -XPUT localhost:9200/orders/order/82 -d '{
"ID": 82,
"Customer": "John Smith",
"OrderItems": [
{
"ID": 1,
"Name": "Shoes",
"Price": 9.99
},{
"ID": 2,
"Name": "Hat",
"Price": 19.99
}
]
}'


Related Topics



Leave a reply



Submit