Command-Line Fulltext Indexing

How to find all files containing specific text (string) on Linux?

Do the following:

grep -rnw '/path/to/somewhere/' -e 'pattern'
  • -r or -R is recursive,
  • -n is line number, and
  • -w stands for match the whole word.
  • -l (lower-case L) can be added to just give the file name of matching files.
  • -e is the pattern used during the search

Along with these, --exclude, --include, --exclude-dir flags could be used for efficient searching:

  • This will only search through those files which have .c or .h extensions:

    grep --include=\*.{c,h} -rnw '/path/to/somewhere/' -e "pattern"
  • This will exclude searching all the files ending with .o extension:

    grep --exclude=\*.o -rnw '/path/to/somewhere/' -e "pattern"
  • For directories it's possible to exclude one or more directories using the --exclude-dir parameter. For example, this will exclude the dirs dir1/, dir2/ and all of them matching *.dst/:

    grep --exclude-dir={dir1,dir2,*.dst} -rnw '/path/to/search/' -e "pattern"

This works very well for me, to achieve almost the same purpose like yours.

For more options, see man grep.

Fast search for text in files in a directory in unix?

This might be overkill for your purposes, but Beagle allows you to perform very fast searches of local files. It's usually marketed as a desktop application, but in fact it is just a daemon that can respond to requests from the command-line using beagle-query.

How to create a FULLTEXT index for two tables in MySQL?

No.

But... It would make sense to collect the various columns from the various tables together in a single column and apply a FULLTEXT index to it.

Assuming there are many tags for each item, you could initialize such via:

CREATE search_info ( PRIMARY KEY(name) )
SELECT name,
CONCAT(name, ' ',
( SELECT GROUP_CONCAT(tags) FROM iteminfo
WHERE code = items.code ) ) AS search
FROM items;

Then

ALTER TABLE search_info ADD FULLTEXT(search);

(After that, changes to items or iteminfo would need to also modify search_info.)

How can I select MySQL fulltext index entries for automplete functionality?

No, not that I know of. It would be a great feature though.

I built a search interface with autocomplete on top of MySQL. I run a daily job that scans all columns in all tables that I want to search in, extract words with regular expressions, then store the words in a separate table. I also have a many-to-many table with one column to hold the id of the object, and one column to hold the id of the word so as to record the fact that "word is part of text belonging to object".

The autocomplete works by taking the words typed into the box, and then generating a query that goes like:

SELECT     obj.title
FROM obj_word
INNER JOIN obj
ON obj_word.obj_id = obj.id
INNER JOIN word
ON obj_word.word_id = word.id
WHERE word.word IN ('word1', 'word2', 'word3') -- generated dynamically, word1 etc are typed by the user
GROUP BY obj.id
HAVING COUNT(DISTINCT word.id) = 3 -- the 3 is generated, because user typed 3 words.

This works fairly well for me, but I don't have huge amounts of data to work with.

(the actual implementation is slightly fancier, beause the last word is matched with LIKE to allow partial matches)

EDIT:

I just learned that the myisam_ft_dump utility may be used to extract a list of words from the index file. The command line goes something like this:

myisam_ftdump -d film_text 1  > D:\tmp\out.txt

Here, -d means dump (get a list of all entries), film_text is the name of a MyISAM table with a full text index, 1 is one, and ordinal identifying which index you want to dump.

I must say, the utility works, but I am not surer it is fast enough to use this for pulling a live list for autocompletion. You could of course have a periodical job that runs the command and dumps it to file. Unfortunately this dumps index entries not individual, unique words.

My hunch is you could use this utility as a means to extract the words, but it will need processing to turn it into a proper autocomplete list.

Can MySQL fulltext search return an index(position) instead of a score?

Fulltext searching is a scoring function. its not a search for occurrence function. In other words the highest scoring result may not have a starting position for the match. As it may be a combination of weighted results of different matches within the text. if you include query expansion the search for word/s may not even appear in the result!
http://dev.mysql.com/doc/refman/5.0/en/fulltext-query-expansion.html

I hope that makes some sense.

Anyway your best bet is to take the results and then use some text searching function to find the first occurrence of the first matching word. My guess is that would be best suited to a text processing language like perl or a more general language like php or what ever language you are using to run the query.

DC



Related Topics



Leave a reply



Submit