Google-Like Search Engine in PHP/MySQL

Google-like Search Engine in PHP/MySQL

You can also try out Sphinx Search. Craigslist uses Sphinx, and it can connect to both MySQL and PostgreSQL.

How to search database like google in php

I build search engines.

I'm going to give you 6 tips to explore, so that you can continue to learn how to program & proceed if desired.

TIP #1: Focus!

First, define what you're trying to accomplish. Think about what you really want to do before trying to build a search engine from scratch. A from-scratch search engine may not be your actual end goal.

Do you really want to crawl the web, with this idea: "Extract all url from sitemap.xml with PHP CURL"?

Or do you simply want to add a search box to your website, which gets product data from your product database & displays that product data on your website, with this idea: "I want to implement my own search feature into my website"?

It's kind of hard to tell.

If you want to add a product-based "search feature" to your website, then you don't need to extract content from an XML Sitemap. You'd simply retrieve it from a database such as MySQL, PostgreSQL, Oracle, or SQL Server, and display the results on your search results page. That's usually what people are looking to do when they want to add a "search feature" to their website.

TIP #2: For searching, simpler is faster.

This is good to remember, when writing code: Simple Always Wins. It's known as the "S.A.W. Principle".

First, let's look at your SQL. It has 2 select statements, which are joined together with a UNION keyword.

SELECT * FROM search_engine
WHERE soundex(keyword) LIKE soundex('%$q%')
UNION
SELECT * FROM search_engine
WHERE title LIKE '%$q%' OR link LIKE '%$q%'
ORDER BY `clicks` DESC

Since both SELECT statements appear to read from the same database table, you could combine them as follows. The only change is replacing UNION SELECT * FROM search_engine WHERE with OR:

SELECT * FROM search_engine
WHERE soundex(keyword) LIKE soundex('%$q%')
OR title LIKE '%$q%' OR link LIKE '%$q%'
ORDER BY `clicks` DESC

So if you can remove the UNION keyword & combine the 2 select statements into 1 select statement, then the database engine can do less work to fulfill the search query request.

If you're using 2 different tables, then you'll need to do some homework to look up a concept known as an inverted index. The concept is the same: Keep the search as simple as possible... so that the database server does as little work as possible... so that the search experience runs as fast as possible!

Simpler means faster, but it doesn't mean more accurate.

TIP #3: Accuracy makes a search engine more relevant for users. Think of this as accuracy = "powerful".

Let's look at these page titles & how a search query works with them:

  1. Extract all url from sitemap.xml with PHP CURL
  2. How to Extract all url from sitemap.xml with PHP CURL

For a substring search to match, the whole query has to appear inside the stored title. If your database only holds the 1st page's title (without the "How to "), then a query for the 2nd page's title won't find it. That's the problem that you've noticed with your site's search feature.

The reason is that the 1st page's title is an exact substring of the 2nd page's title, so a query for the 1st title matches the 2nd. However, the 2nd page's title is neither an exact match nor a substring of the 1st page's title, so a query for the 2nd title cannot match the 1st.
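
To see that asymmetry in isolation, here is a minimal sketch. It assumes that LIKE '%...%' behaves like a case-insensitive substring test (which depends on your column's collation), and containsPhrase() is a made-up helper name, not something from your code.

```php
<?php
$short = 'Extract all url from sitemap.xml with PHP CURL';
$long  = 'How to Extract all url from sitemap.xml with PHP CURL';

// Hypothetical helper: mimics title LIKE '%query%' with a case-insensitive substring test.
function containsPhrase(string $title, string $query): bool
{
    return stripos($title, $query) !== false;
}

var_dump(containsPhrase($long, $short)); // bool(true): the short title sits inside the long one
var_dump(containsPhrase($short, $long)); // bool(false): "How to " is missing from the short title
```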

To get around that problem, search engines work on the basis of keywords.

TIP #4: Learn about Keywords vs. Stop Words & how to parse them in your search query.

In a search query, there are both relevant key words known as keywords & non-relevant junk words, called stop words. You may want to investigate the concept of what stop words are & how search engines use them or most often, throw them away before the search query is actually performed.

So in your queries, these are your unique & meaningful keywords. They have self-contained concrete meanings, when you think of each word individually.

array('extract', 'url', 'sitemap.xml', 'PHP', 'CURL')

Concrete meanings:

  • Extract = Pull, grasp, grab something out of a group.
  • URL = A hyperlink.
  • sitemap.xml = An XML Sitemap file.
  • PHP = A programming language name.
  • CURL = A tool & library (cURL, short for "Client URL") for transferring data to or from a URL.

These are most likely the stop words, which have either no meaning to them by themselves or a vague meaning.

array('How', 'to', 'all', 'from', 'with');

Vague Meanings:

  • How = A simple lead-off to a question. So what does a search engine do with this? It throws it away.
  • To = A connecting word. It points to a group of something. Maybe useful. Maybe not. Toss it.
  • All = A group of everything. Possibly useful, but it seems vague to a search engine. Toss it.
  • From = Another connecting word. It points to a group of something else. Again vague. Toss it.
  • With = Including. Another connecting word. Also vague. A computer doesn't know to add "PHP" or "PHP CURL" after the with keyword. Bummer! Toss it.

Search engines usually strip stop words & query only the meaningful keywords. A relevance score then measures how well each result matches those keywords.
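
Stripping stop words from a query could look roughly like this. It's only a sketch: the stop-word list is just the five words from the example above (real lists are much longer), and extractKeywords() is a name I made up for illustration.

```php
<?php
// Hypothetical stop-word list, taken from the example above; real lists are much longer.
const STOP_WORDS = ['how', 'to', 'all', 'from', 'with'];

// Returns the unique, lowercased words of a query that are not stop words.
function extractKeywords(string $query, array $stopWords = STOP_WORDS): array
{
    // Lowercase so that "How" and "how" are treated the same, then split on whitespace.
    $words = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);

    // Drop stop words, remove duplicates, and reindex from 0.
    return array_values(array_unique(array_diff($words, $stopWords)));
}

print_r(extractKeywords('How to Extract all url from sitemap.xml with PHP CURL'));
// Keeps: extract, url, sitemap.xml, php, curl
```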

Here is a hypothetical example (which I made up off of the top of my head while writing this): If a query finds a page with 1 of 5 unique keywords, then the relevancy score would be 20%. If it finds a page with 4 of 5 unique keywords, then the relevancy score would be 80%. It's hypothetical, because it's not how any specific search engine currently works. It's just a basic concept to explain a point, using a simple illustration.
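
That made-up percentage can be written down in a few lines. Again, this is only the hypothetical score from the paragraph above (matched unique keywords divided by total unique keywords), not how any real search engine ranks pages, and relevancyScore() is an illustrative name.

```php
<?php
// Hypothetical score: percentage of the query's unique keywords that appear in a page title.
function relevancyScore(array $keywords, string $title): float
{
    $keywords = array_unique(array_map('strtolower', $keywords));
    if (count($keywords) === 0) {
        return 0.0; // nothing to match against
    }

    // Compare whole words, case-insensitively.
    $titleWords = preg_split('/\s+/', strtolower(trim($title)), -1, PREG_SPLIT_NO_EMPTY);
    $matched = count(array_intersect($keywords, $titleWords));

    return 100.0 * $matched / count($keywords);
}

$keywords = ['extract', 'url', 'sitemap.xml', 'php', 'curl'];
echo relevancyScore($keywords, 'PHP tutorial'), "\n";                              // 1 of 5 keywords
echo relevancyScore($keywords, 'Extract all url from sitemap.xml with PHP'), "\n"; // 4 of 5 keywords
```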

The relevancy algorithm & score are really up to the search engine designer/builder to create, and they can be as simple or as complex as the designer wants to make them. Search engine developers can spend a lot of time fine-tuning the relevancy algorithm & score. The results also depend on which search algorithm is used & how well the search bot gathers data for it.

TIP #5: Explore building search bots!

You should look into building search bots, if you really want to accomplish this: "Extract all url from sitemap.xml with PHP CURL".

I've written a search bot too. It has already crawled over 1 million URLs!

PHP cURL isn't what extracts links. It only fetches the content of 1 URL. The search bot has to be written to parse the returned HTML (or XML, for a sitemap), so that it can figure out what to extract from that content.
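
For the sitemap case, the parsing step might be sketched like this. It assumes the SimpleXML extension is available and that the sitemap follows the standard urlset/url/loc layout. The function name and the example.com URLs are made up, and the XML string stands in for whatever curl_exec() would return.

```php
<?php
// Returns the <loc> values of a sitemap document, or an empty array if the XML cannot be parsed.
function extractSitemapUrls(string $xml): array
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $sitemap = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($sitemap === false) {
        return [];
    }

    $urls = [];
    foreach ($sitemap->url as $entry) {
        $loc = trim((string) $entry->loc);
        if ($loc !== '') {
            $urls[] = $loc;
        }
    }
    return $urls;
}

// Example input; in a crawler this string would come from curl_exec().
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products?page=2&amp;sort=name</loc></url>
</urlset>
XML;

print_r(extractSitemapUrls($xml));
```

Note that the XML parser decodes &amp; back to & for you, which a plain string search for <loc> tags would not do.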

Just a warning: People don't write perfect HTML in their pages. So your search bot will require a lot of fine-tuning to cope with sloppy markup, which can otherwise crash your search bot. That is a huge time commitment! Just be ready for spending years on this project or even decades, if you decide to pursue building your own search bot. Building a search engine is a long journey! Your search bot WILL CRASH hundreds to thousands of times, before you can get it to crawl millions of URLs.
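
As a starting point for coping with sloppy markup, here is a rough sketch that leans on DOMDocument's forgiving HTML parser. It assumes the DOM extension is installed. extractLinks() is an illustrative name, and a real bot would still need to resolve relative URLs, respect robots.txt, and handle many more edge cases.

```php
<?php
// Returns the unique href values of all <a> tags, tolerating malformed HTML.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    // Keep parse errors from sloppy markup internal instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

// Unclosed <p> and <a> tags, plus an anchor without an href.
$html = '<div><a href="/one">One</a><p>Unclosed paragraph <a href="/two">Two</div><a>no href</a>';
print_r(extractLinks($html));
```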

So... Do you really want to "Extract all url from sitemap.xml" or do you want to query a list of previously uploaded product data, which resides in your database? That latter database querying idea is A LOT FASTER to build & easier to maintain in the future!

TIP #6: If you don't want to spend a lot of time building a search engine from scratch, plus a search bot from scratch, plus a relevancy score algorithm from scratch, then look at some pre-built search engine solutions. Here are a few popular ones. They can be fun to play around with!

  1. Elasticsearch
  2. Lucene
  3. Solr

Conclusion: Search engines are not easy to build! They can take years to build. Be ready for a significant time commitment (easily months, realistically years, possibly decades), if you really want to accomplish this goal: "I want to show results to user in all possible ways."

Database search like google

So as not to let this go without a working answer:

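<?php
// Assumes $mysqli is an open mysqli connection.
$search = 'this is my search';

// Split on whitespace and drop empty pieces, so extra spaces add no empty terms.
$searchSplit = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);

$searchQueryItems = array();
foreach ($searchSplit as $searchTerm) {
    /*
     * NOTE: Check out the DB connections escaping part
     * below for the one you should use.
     */
    $searchQueryItems[] = "name LIKE '%" . mysqli_real_escape_string($mysqli, $searchTerm) . "%'";
}

// Every term must appear in `name`; with no terms, all rows are returned.
$query = 'SELECT pageurl FROM names' . (!empty($searchQueryItems) ? ' WHERE ' . implode(' AND ', $searchQueryItems) : '');
?>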

DB connections escaping

mysqli_:

Keep using mysqli_real_escape_string($mysqli, $searchTerm), which takes the connection as its first argument, or use $mysqli->real_escape_string($searchTerm).

mysql_:

If you use mysql_, you should use mysql_real_escape_string($searchTerm) (and think about switching, as the extension was deprecated in PHP 5.5 and removed in PHP 7.0).

PDO:

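If you use PDO, you can use trim($pdo->quote($searchTerm), "'") with this string-building approach. Be aware that it breaks on terms that begin or end with a quote; binding the terms as parameters of a prepared statement is the safer option.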
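
For the PDO case, a parameterized version of the same query might look like the sketch below. It builds the SQL and the bound values separately, so no escaping function is needed. buildSearchQuery() is a made-up name, $pdo is assumed to be an open connection, and escaping % and _ with a backslash assumes MySQL's default LIKE escape character.

```php
<?php
// Builds a parameterized query in which every search word must appear somewhere in `name`.
function buildSearchQuery(string $search): array
{
    $terms = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);

    $conditions = [];
    $params = [];
    foreach ($terms as $term) {
        $conditions[] = 'name LIKE ?';
        // Escape LIKE wildcards so that "%" and "_" typed by the user match literally.
        $params[] = '%' . addcslashes($term, '%_\\') . '%';
    }

    $sql = 'SELECT pageurl FROM names';
    if ($conditions) {
        $sql .= ' WHERE ' . implode(' AND ', $conditions);
    }
    return [$sql, $params];
}

[$sql, $params] = buildSearchQuery('this is my search');

// With an open PDO connection in $pdo (assumed here):
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// $rows = $stmt->fetchAll(PDO::FETCH_COLUMN);
```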

How to create a search engine to search through mysql database

Now that's a very complicated topic. So I'm just going to point you to a few starting points. If someone else is able to formulate a good answer to this hugely complicated topic here, I'd be glad to give them an upvote.

  • Doctrine 1 has a decent "search engine". [1] (If you're not already using an ORM, I highly recommend you give it a try.)
  • Lucene[2] is also worth a try.
  • Roll your own search engine. I also used to think it's about some MySQL features or something, but it's really not. It's all about building good indexes and using them well. (Building them yourself, that is. Not just using database indexes.) It's actually a pretty interesting topic to get into, if you have the time.
  • Buy Nine Algorithms That Changed the Future by John MacCormick.[3] It's got an awesome chapter about how search engines work.
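
To give a feel for the "roll your own index" idea, here is a toy inverted index in plain PHP. It's only a sketch: the function names and the two sample documents are made up, everything lives in memory, and a real engine would add stemming, ranking, and persistent storage.

```php
<?php
// Maps each lowercased word to the list of document IDs whose text contains it.
function buildInvertedIndex(array $documents): array
{
    $index = [];
    foreach ($documents as $id => $text) {
        $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_unique($words) as $word) {
            $index[$word][] = $id;
        }
    }
    return $index;
}

// Returns the IDs of the documents that contain every word of the query.
function searchIndex(array $index, string $query): array
{
    $words = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return [];
    }

    $result = null;
    foreach (array_unique($words) as $word) {
        $ids = $index[$word] ?? [];
        // Start with the first word's documents, then intersect with each following word's.
        $result = $result === null ? $ids : array_values(array_intersect($result, $ids));
    }
    return $result;
}

$documents = [
    1 => 'Extract all url from sitemap.xml with PHP CURL',
    2 => 'How to search database like google in PHP',
];
$index = buildInvertedIndex($documents);

print_r(searchIndex($index, 'php curl')); // only document 1 contains both words
```

The point of the structure is that a lookup touches only the entries for the query's words, rather than scanning every row the way LIKE '%...%' does.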

[1] http://docs.doctrine-project.org/projects/doctrine1/en/latest/en/manual/searching.html

[2] http://en.wikipedia.org/wiki/Lucene

[3] http://www.amazon.com/Nine-Algorithms-That-Changed-Future/dp/0691147140

Creating search engine in PHP

That's because '%$term%' makes the query look for rows that contain the entire string in $term as one piece. Try this:

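// Assumes $mysqli is an open mysqli connection (the mysql_* functions were removed in PHP 7.0).
// Split on whitespace and drop empty pieces.
$term = preg_split('/\s+/', trim($term), -1, PREG_SPLIT_NO_EMPTY);
if (count($term) > 0) {
    $Conditions = array();
    foreach ($term as $Item) {
        // Escape each word before placing it inside the LIKE pattern.
        $Conditions[] = "Title LIKE '%" . $mysqli->real_escape_string($Item) . "%'";
    }
    // A row matches if its title contains any one of the words.
    $Where = implode(' OR ', $Conditions);
    $query = $mysqli->query("SELECT * FROM `save_data` WHERE $Where");
}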

This way, the search checks each word separately.


