How Would I Implement a Simple Site Search with PHP and MySQL

How would I implement a simple site search with php and mySQL?

Everyone is suggesting MySQL fulltext search, however you should be aware of a HUGE caveat. The Fulltext search engine is only available for the MyISAM engine (not InnoDB, which is the most commonly used engine due to its referential integrity and ACID compliance).

So you have a few options:

1. The simplest approach is outlined by Particle Tree. You can actaully get ranked searches off of pure SQL (no fulltext, no nothing). The SQL query below will search a table and rank results based off the number of occurrences of a string in the search fields:

SELECT
SUM(((LENGTH(p.body) - LENGTH(REPLACE(p.body, 'term', '')))/4) +
((LENGTH(p.body) - LENGTH(REPLACE(p.body, 'search', '')))/6))
AS Occurrences
FROM
posts AS p
GROUP BY
p.id
ORDER BY
Occurrences DESC

edited their example to provide a bit more clarity

Variations on the above SQL query, adding WHERE statements (WHERE p.body LIKE '%whatever%you%want'), etc. will probably get you exactly what you need.

2. You can alter your database schema to support full text. Often what is done to keep the InnoDB referential integrity, ACID compliance, and speed without having to install plugins like Sphinx Fulltext Search Engine for MySQL is to split the quote data into it's own table. Basically you would have a table Quotes that is an InnoDB table that, rather than having your TEXT field "data" you have a reference "quote_data_id" which points to the ID on a Quote_Data table which is a MyISAM table. You can do your fulltext on the MyISAM table, join the IDs returned with your InnoDB tables and voila you have your results.

3. Install Sphinx. Good luck with this one.

Given what you described, I would HIGHLY recommend you take the 1st approach I presented since you have a simple database driven site. The 1st solution is simple, gets the job done quickly. Lucene will be a bitch to setup especially if you want to integrate it with the database as Lucene is designed mainly to index files not databases. Google custom site search just makes your site lose tons of reputation (makes you look amateurish and hacked), and MySQL fulltext will most likely cause you to alter your database schema.

PHP/MySQL: Simple search engine

Search is obviously a big topic and there are lots of different ways to handle things.

That said, for the most basic possible solution, you should be able to search on the concatenation of those two fields:

SELECT UserID
FROM UserInfoTable
WHERE CONCAT(FirstName, ' ', LastName) LIKE '%Dave Smith%';

The CONCAT() MySQL function takes as many strings as you're concatenating as arguments. In the above case, we're just added a space between FirstName and LastName.

How to create a simple Mysql search engine

The % wildcard should be in the bound variable and not the prepared statement

$query = "SELECT * FROM news WHERE contenuto LIKE :contenuto"; 
$query_params = array(
':contenuto' => $_POST['contenuto'] . '%'
);

Creating search engine in PHP

It is 'cause { %$term% } means the query try search something that contain all of the string in $term. Try this :

$term = explode(" ",$term);
if (count($term) > 0) {
$Where = '';
foreach($term as $Item) {
$Where .= "Title like '%$Item%' OR ";
}
$Where = substr($Where,0,-4);
$query = mysql_query("SELECT * FROM `save_data` WHERE $Where");
}

In this way the search will check out all words.

How to search database like google in php

I build search engines.

I'm going to give you 6 tips to explore, so that you can continue to learn how to program & proceed if desired.

TIP #1: Focus!

First define what you're trying to accomplish. Think about what you really want to do, before trying to build a search engine from scratch. It may not be your end goal AKA what you really want to do.

Do you really want to crawl the web, with this idea: "Extract all url from sitemap.xml with PHP CURL"?

Or do you simply want to add a search box to your website, which gets product data from your product database & displays that product data on your website, with this idea: "I want to implement my own search feature into my website"?

It's kind of hard to tell.

If you want to add a product-based "search feature" to your website, then you don't need to extract content from an XML Sitemap. You'd simply retrieve it from a database like: MySQL, PosgreSQL, Oracle, SQL Server, etc... and display the results onto your search results page. That's usually what people are looking to do, when they want to add a "search feature" into their website.

TIP #2: For searching, simpler is faster.

This is good to remember, when writing code: Simple Always Wins. It's known as the "S.A.W. Principle".

First, let's look at your SQL. It has 2 select statements, which are joined together with a UNION keyword.

SELECT * FROM search_engine
WHERE soundex(keyword) LIKE soundex('%$q%')
UNION
SELECT * FROM search_engine
WHERE title LIKE '%$q%' OR link LIKE '%$q%'
ORDER BY `clicks` DESC

Since this looks like it's coming from the same database table, then you could combine it as follows... where the change is replacing UNION SELECT * FROM search_engine WHERE with OR:

SELECT * FROM search_engine
WHERE soundex(keyword) LIKE soundex('%$q%')
OR title LIKE '%$q%' OR link LIKE '%$q%'
ORDER BY `clicks` DESC

So if you can remove the UNION keyword & combine the 2 select statements into 1 select statement, then the database engine can do less work to fulfill the search query request.

If you're using 2 different tables, then you'll need to do some homework to look up a concept known as an inverted index. The concept is the same: Keep the search as simple as possible... so that the database server does as little work as possible... so that the search experience runs as fast as possible!

Even though simpler means faster, it doesn't mean more accuracy.

TIP #3: Accuracy makes a search engine more relevant for users. Think of this as accuracy = "powerful".

Let's look at these page titles & how a search query works with them:

  1. Extract all url from sitemap.xml with PHP CURL
  2. How to Extract all url from sitemap.xml with PHP CURL

For these results to match with accuracy, then you'd have to have the 2nd page's full title in your database. If you only have the 1st page's title (without the "How to "), then a query for the 2nd page's title won't find a result for the 1st page's title. That's the problem that you've noticed with your site's search feature.

The reason is that a query for the 1st page's title is an exact partial match inside of the 2nd page's title. However a query for the 2nd page's title is not an exact match, nor even a partial match of the 1st page's title.

To get around that problem, search engines work on the basis of keywords.

TIP #4: Learn about Keywords vs. Stop Words & how to parse them in your search query.

In a search query, there are both relevant key words known as keywords & non-relevant junk words, called stop words. You may want to investigate the concept of what stop words are & how search engines use them or most often, throw them away before the search query is actually performed.

So in your queries, these are your unique & meaningful keywords. They have self-contained concrete meanings, when you think of each word individually.

array('extract', 'url', 'sitemap.xml', 'PHP', 'CURL')

Concrete meanings:

  • Extract = Pull, grasp, grab something out of a group.
  • URL = A hyperlink.
  • sitemap.xml = An XML Sitemap file.
  • PHP = A programming language name.
  • CURL = Command-line URL fetch.

These are most likely the stop words, which have either no meaning to them by themselves or a vague meaning.

array('How', 'to', 'all', 'from', 'with');

Vague Meanings:

  • How = A simple lead-off to a question. So what does a search engine do with this? It throws it away.
  • To = A connecting word. It points to a group of something. Maybe useful. Maybe not. Toss it.
  • All = A group of everything. Possibly useful, but it seems vague to a search engine. Toss it.
  • From = Another connecting word. It points to a group of something else. Again vague. Toss it.
  • With = Including. Another connecting word. Also vague. A computer doesn't know to add "PHP" or "PHP CURL" after the with keyword. Bummer! Toss it.

Search engines usually strip stop words & query the meaningful keywords for results. A relevance score is how accurate the search results are.

Here is a hypothetical example (which I made up off of the top of my head while writing this): If a query finds a page with 1 of 5 unique keywords, then the relevancy score would be 20%. If it finds a page with 4 of 5 unique keywords, then the relevancy score would be 80%. It's hypothetical, because it's not how any specific search engine currently works. It's just a basic concept to explain a point, using a simple illustration.

The relevancy algorithm & score is really up to the search engine designer/builder to create. The relevancy algorithm can be as simple or as complex as the search engine designers and/or builders want to make it. Search engine developers can spend a lot of time fine-tuning that relevancy algorithm & score. It also depends on the search algorithm, which is used & how well the search bot finds data for those algorithms.

Tip #5: Explore building search bots!

You should look into building search bots, if you really want to accomplish this: "Extract all url from sitemap.xml with PHP CURL".

I've written a search bot too. It has already crawled over 1 million URLs!

PHP Curl isn't what extracts links. It's what fetches content from 1 URL. The search bot has to be written to parse the returned HTML, so that it can figure out what to extract from those search results.

Just a warning: People don't write perfect HTML syntax in their URLs. So your search bot will require a lot of fine tuning to get it to detect sloppy programming, which will crash your search bot. That is a huge time commitment! Just be ready for spending years on this project or even decades, if you decide to pursue building your own search bot. Building a search engine is a long journey! Your search bot WILL CRASH hundreds to thousands of times, before you can get it to crawl millions of URLs.

So... Do you really want to "Extract all url from sitemap.xml" or do you want to query a list of previously uploaded product data, which resides in your database? That latter database querying idea is A LOT FASTER to build & easier to maintain in the future!

Tip #6: If you don't want to spend a lot of time building a search engine from scratch, plus a search bot from scratch, plus a relevancy score algorithm from scratch, then look at some pre-built search engine solutions. Here are a few popular ones. They can be fun to play around with!

  1. Elastic Search
  2. Lucene
  3. Solr

Conclusion: Search engines are not easy to build! They can take years to build. Be ready for a significant time commitment (easily months, realistically years, possibly decades), if you really want to accomplish this goal: "I want to show results to user in all possible ways."



Related Topics



Leave a reply



Submit