How to Find "Related Items" in PHP

How to find related items in PHP

There are many ways to calculate similarity of two items, but for a straightforward method, take a look at the Jaccard Coefficient.

http://en.wikipedia.org/wiki/Jaccard_index

Which is: J(a,b) = intersection(a,b)/union(a,b)

So lets say you want to compute the coefficient of two items:

Item A, which has the tags "books, school, pencil, textbook, reading"
Item B, which has the tags "books, reading, autobiography"

intersection(A,B) = books, reading
union(A,B) = books, school, pencil, textbook, reading, autobiography

so J(a,b) = 2/6 = .333

So the most related item to A would be the item which results in the highest Jaccard Coefficient when paired with A.

MySQL / PHP: Find similar / related items by tag / taxonomy

You question about How do I calculate which are the most closely related city? For example. If I were looking at city 1 (Paris), the results should be: London (2), New York (3) and based on your provided data set there is only one thing to relate that is the common tags between the cities so the cities which shares the common tags would be the closest one below is the subquery which finds the cities (other than which is provided to find its closest cities) that shares the common tags

SELECT * FROM `cities`  WHERE id IN (
SELECT city_id FROM `cities_tags` WHERE tag_id IN (
SELECT tag_id FROM `cities_tags` WHERE city_id=1) AND city_id !=1 )

Working

I assume you will input one of the city id or name to find their closest one in my case "Paris" has the id one

 SELECT tag_id FROM `cities_tags` WHERE city_id=1

It will find all the tags id which paris has then

SELECT city_id FROM `cities_tags` WHERE tag_id IN (
SELECT tag_id FROM `cities_tags` WHERE city_id=1) AND city_id !=1 )

It will fetch all the cities except paris that has the some same tags that paris also has

Here is your Fiddle

While reading about the Jaccard similarity/index found some stuff to understand about the what actualy the terms is lets take this example we have two sets A & B

Set A={A, B, C, D, E}

Set B={I, H, G, F, E, D}

Formula to calculate the jaccard similarity is JS=(A intersect B)/(A
union B)

A intersect B = {D,E}= 2

A union B ={A, B, C, D, E,I, H, G, F} =9

JS=2/9 =0.2222222222222222

Now move towards your scenario

Paris has the tag_ids 1,3 so we make the set of this and call our Set
P ={Europe,River}

London has the tag_ids 1,3 so we make the set of this and call our
Set L ={Europe,River}

New York has the tag_ids 2,3 so we make the set of this and call our
Set NW ={North America,River}

Calculting the JS Paris with London JSPL = P intersect L / P union L ,
JSPL = 2/2 = 1

Calculting the JS Paris with New York JSPNW = P intersect NW / P
union NW ,JSPNW = 1/3 = 0.3333333333

Here is the query so far which calcluates the perfect jaccard index you can see the below fiddle example

SELECT a.*, 
( (CASE WHEN a.`intersect` =0 THEN a.`union` ELSE a.`intersect` END ) /a.`union`) AS jaccard_index
FROM (
SELECT q.* ,(q.sets + q.parisset) AS `union` ,
(q.sets - q.parisset) AS `intersect`
FROM (
SELECT cities.`id`, cities.`name` , GROUP_CONCAT(tag_id SEPARATOR ',') sets ,
(SELECT GROUP_CONCAT(tag_id SEPARATOR ',') FROM `cities_tags` WHERE city_id= 1)AS parisset

FROM `cities_tags`
LEFT JOIN `cities` ON (cities_tags.`city_id` = cities.`id`)
GROUP BY city_id ) q
) a ORDER BY jaccard_index DESC

In above query i have the i have derived the result set to two subselects in order get my custom calculated aliases

Sample Image

You can add the filter in above query not to calculate the similarity with itself

SELECT a.*, 
( (CASE WHEN a.`intersect` =0 THEN a.`union` ELSE a.`intersect` END ) /a.`union`) AS jaccard_index
FROM (
SELECT q.* ,(q.sets + q.parisset) AS `union` ,
(q.sets - q.parisset) AS `intersect`
FROM (
SELECT cities.`id`, cities.`name` , GROUP_CONCAT(tag_id SEPARATOR ',') sets ,
(SELECT GROUP_CONCAT(tag_id SEPARATOR ',') FROM `cities_tags` WHERE city_id= 1)AS parisset

FROM `cities_tags`
LEFT JOIN `cities` ON (cities_tags.`city_id` = cities.`id`) WHERE cities.`id` !=1
GROUP BY city_id ) q
) a ORDER BY jaccard_index DESC

So the result shows Paris is closely related to London and then related to New York

Jaccard Similarity Fiddle

Creating a related or similar posts using PHP & MySQL

Using the MySQL Full Text search MATCH (col1,col2,...) AGAINST (expr [search_modifier]) thing.

Let's say your table is articles and you need to find related posts about a title of current post. Do it like this:

SELECT *, MATCH(title, body) AGAINST('$CurrentPostTitle') AS score
FROM articles
WHERE MATCH(title, body) AGAINST('$CurrentPostTitle')
ORDER BY score DESC LIMIT 5

This will give you top 5 related posts.

But first remember to enabled Full Text search for that table's columns, by running this query:

ALTER TABLE articles ADD FULLTEXT (title, body);

[EDIT]: Why to not use LIKE: Clarification to OP:

Because it will not give correct results. Let's say you current title is "Music of 1980" and you want related posts on that. Now, if you use LIKE then only the posts containing EXACTLY the sequence of words "Music of 1980" will appear. However, if you use MATCH ... AGAINST, then posts that contain Music OR 1980 will appear. Also, the posts that contain both Music and 1980 will appear on Top because it gives a SCORE to each results and we are sorting by that score.I hope that's clear.

[EDIT]: 2:

If you have categories, you can add AND Category = '$CurrentCategory' in the SQL query where clause to get more specific results.

[EDIT]: 3: OP can't use Full text:

If you can not use Full Text (for some reason), you can just show 5 random posts from the same category. As they are in same category, they are somehow related at least:

SELECT *
FROM articles
WHERE Category = '$CurrentCategory'
LIMIT 5

Edited Syntax: Changed LIMTI to LIMIT in MySQL Code

Find Related Items Though Tags

Use the mysql find_in_set() function. It works like this.

SELECT * FROM items WHERE FIND_IN_SET(tag1,tag2,tag3) > 0;

Since your incoming input is already command delimited, all you need to do is add quotes.

$tags = $row['tags']; //format: tag1,tag2,tag3
$set = "'".implode("','", $tags)."'";

$sql = "SELECT * FROM items WHERE FIND_IN_SET(" . $set . ") > 0);

Query for related products

It's not a great idea to store a multi-valued attribute in a single field. Ideally, you would have a Products table, a Tags table and a ProductTags Table.

However, you could select the Tags for the Product and use explode() to get an array of Tags. The for each other product, do the same and use array_intersect to get an array of common elements. Then use count() > 1 to determine if it's related

So:

function getRelatedProducts($productName)
{
$productResults = mysql_query("SELECT * FROM products WHERE productName = '$productName' LIMIT 0,1");

$relatedProducts = array();

if(mysql_num_rows($productResults) == 1)
{
$product = mysql_fetch_array($productResults);
$tags = explode(",",$product['tags']);

$otherProducts = mysql_query("SELECT * FROM products WHERE productName != '$productName'");

while($otherProduct = mysql_fetch_array($otherProducts))
{
$otherTags = explode(",",$otherProduct['tags']);
$overlap = array_intersect($tags,$otherTags);
if(count($overlap > 1)) $relatedProducts[] = $otherProduct;
}
}

return $relatedProducts;
}

It's a bit rough and ready but it should work. This code assumes you have columns called productName and tags.

PHP:array_intersect - Manual

If you go ahead with a product_tags table, you can use the following code to find related products:

function getRelatedProducts($productId)
{
$sql = "SELECT p.*,COUNT(*) AS matchedTags FROM products p
INNER JOIN product_tags pt ON pt.product_id = p.id
WHERE pt.tag_id IN (SELECT tag_id FROM product_tags WHERE product_id = $product_id)
GROUP BY p.id
HAVING COUNT(*) > 1";

$results = mysql_query($sql);

$relatedProducts = array();

while($result = mysql_fetch_array($results))
{
$relatedProducts[] = $result;
}

return $relatedProducts;
}

The important part is the SQL at the start of the function. It will give you the related products. Do with them what you will!

select all items and related items in one query

If I would need a fast'n'dirty way, I'd go with building a temporary array of images per product

$groupedProducts = array();
foreach ($products as $product) {
if (!array_key_exists($product['id'], $groupedProducts) {
$groupedProducts[$product['id']] = $product;
$groupedProducts[$product['id']]['images'] = array();
}
$groupedProducts[$product['id']]['images'][] = $product['product_images.url'];
}

Then iterate

foreach ($groupedProducts as $product) {
// output some description, title, price, etc.
foreach ($product['images'] as $imageUrl) {
echo '<img src="'.$imageUrl.'">';
}
}

finding related items using mysql?

I don't know why I can't comment on other people's answers so forgive this "answer"

"i forgot to add that each topic, has a little bio about them, like 200 character about paragraph saying what the topic is about, would that be good way to relate stuff to each other?"

This is a very unreliable way to relate anything. If this is the case, pretty much everything would relate as common words appear all over every topic. Words like "the" or "and" would relate everything in something like a full text search, depending on how you orchestrate your comparison.

In your case, you might want to clarify HOW you plan on comparing. You already mention you are using php and mysql, so we all assume you are writing a query to find the relations.
Why then are you avoiding a relational table? A keyword table? It's basic web development at this point and seems very strange that you'd be interested in a complicated work around for a very simple concept.

blog{id, title body, date}
topic{id, title}
blog_topic{id, blog_id, topic_id}

$sql = "SELECT b.*, t.* FROM `blogs` b LEFT JOIN `blog_topic` r ON r.`blog_id` = b.`id` LEFT JOIN `topic` t ON t.`id` = r.`topic_id` WHERE t.`id` = ".$whatever_topic_id_you_pass;

Seems simple enough to me. Much more reliable than some weird full text search.

How to get a list of similar items

If you have both the item_id and the city_id to pass in:

   SELECT i.name,
GROUP_CONCAT(a.name) attributes,
c.name
FROM items i
JOIN city c
ON c.id = i.city_id
JOIN item_attribute ia
ON ia.item_id = i.id
AND EXISTS (
SELECT 1
FROM item_attribute ia1
JOIN item_attribute ia2
ON ia2.attribute_id = ia1.attribute_id
AND ia2.item_id = ia.item_id
WHERE ia1.item_id = :item_id /* Pass in item id variable */
)
JOIN attributes a
ON a.id = ia.attribute_id
WHERE i.city_id = :city_id /* Pass in city id variable */
GROUP BY i.name, c.name

If you just want to pass the example item id: (A little bit sloppy, but should work)

   SELECT i.name,
GROUP_CONCAT(a.name) attributes,
c.name
FROM items base
JOIN items i
ON i.city_id = base.city_id
JOIN city c
ON c.id = i.city_id
JOIN item_attribute ia
ON ia.item_id = i.id
AND EXISTS (
SELECT 1
FROM item_attribute ia1
JOIN item_attribute ia2
ON ia2.attribute_id = ia1.attribute_id
AND ia2.item_id = ia.item_id
WHERE ia1.item_id = base.id
)
JOIN attributes a
ON a.id = ia.attribute_id
WHERE base.id = :item_id /* Pass in item id variable */
GROUP BY i.name, c.name

** UPDATE **

Ordering:

... 
JOIN (
SELECT ia2.item_id, COUNT(*) count
FROM item_attribute ia1
JOIN item_attribute ia2
ON ia2.attribute_id = ia1.attribute_id
AND ia2.item_id = ia1.item_id
/* AND ia2.id != ia1.id /* If you don't want the original item */
WHERE ia1.item_id = base.id
GROUP BY ia2.item_id
) similar
ON similar.id = ia.item_id
...
ORDER BY similar.count DESC


Related Topics



Leave a reply



Submit