Good Database and Structure to Store Synonyms

Good database and structure to store synonyms

Keeping it very simple, you could create a table of words and a relationship table for synonyms. You could consider adding a restriction like WordID1 < WordID2 to prevent repeating pairs of words (e.g, (1,2) and (2,1)).

Sample Image

data structure for storing synonyms

I don't know how complex your rules need to be, but if it is as simple as your example, then you could use a simple relational model like so:

Sample Image

Your word list (Term) contains all of the words like "cell", "mobile" etc. Your rule table (Context) lists the domain in which the terms are being used ("calling", "biology", etc.) and the linking table (Synonym) joins two Terms in one Context.

Best way to store and retrieve synonyms in database mysql

Don't use a (one) string to store different entries.

In other words: Build a word table (word_ID,word) and a synonym table (word_ID,synonym_ID) then add the word to the word table and one entry per synonym to the synonyms table.

UPDATE (added 3rd synonym)

Your word table must contain every word (ALL), your synonym table only holds pointers to synonyms (not a single word!) ..

If you had three words: A, B and C, that are synonyms, your DB would be

WORD_TABLE            SYNONYM_TABLE
ID | WORD W_ID | S_ID
---+----- -----+-------
1 | A 1 | 2
2 | B 2 | 1
3 | C 1 | 3
3 | 1
2 | 3
3 | 2

Don't be afraid of the many entries in the SYNONYM_TABLE, they will be managed by the computer and are needed to reflect the existing relations between the words.



2nd approach

You might also be tempted (I don't think you should!) to go with one table that has separate fields for word and a list of synonyms (or IDs) (word_id,word,synonym_list). Beware that that is contrary to the way a relational DB works (one field, one fact).

Database structure for handling synonyms in a search engine

For single-word searches you can maintain a list of synonyms for a given word and then search on those as well. Here's a good way of doing that:Best way to store and retrieve synonyms in database mysql

When you start talking multi-word searches where each word can have it's own synonyms you're looking at a whole different beast because determining the best match is very difficult. A good search algorithm will use a weighting system to determine the best matches.

IE, if it finds a match on the original keyword in the title, it'll return that before it does a synonym.

Data structure for storing keywords and synonyms in C#?

will searching [the list] be too slow?

When you are talking about 10..15 keywords, it is hard to come up with an algorithm inefficient enough to make end-users notice the slowness. There's simply not enough data to slow down a modern CPU.

One approach would be to build a Dictionary<string,string> that maps every synonym to its "canonical" keyword. This would include the canonical version itself:

var keywords = new Dictionary<string,string> {
["keyword1"] = "keyword1"
, ["synonym1"] = "keyword1"
, ["synonym2"] = "keyword1"
, ["keyword2"] = "keyword2"
, ["synonym3"] = "keyword2"
, ["keyword3"] = "keyword3"
};

Note how both keywords and synonyms appear as keys, while only keywords appear as values. This lets you look up a keyword or synonym, and get back a guaranteed keyword.

Best data structure for implementing a dictionary?

Depending on what you want to do, there are many good data structures.

If you just want to store the words and ask "is this word here or not?", a standard hash table with no other fancy machinery is a reasonable approach. If that word is list fixed in advance, consider using a perfect hash table to get excellent performance and space usage.

If you want to be able to check if a given prefix exists while supporting fast lookups, a trie is a good option, though it can be a bit space-inefficient. It also supports fast insertions or deletions. It also allows for iteration in alphabetical order, which hashing doesn't offer. This is essentially the structure you've described in your answer, but depending on the use case other representations of tries might be better.

If in addition to the above, you know for a fact that the word list is fixed, consider using a DAWG (directed acyclic word graph), which is essentially a minimum-state DFA for the language. It's substantially more compact than the trie, but supports many of the same operations.

If you want trie-like behavior but don't want to pay a huge space penalty, the ternary search tree is another viable option, as is the radix tree. These are very different structures, but can be much better than the trie in different circumstances.

If space is a concern but you want a trie, look into the succinct trie representation, which has slower lookups but just about theoretically optimal space usage. The link discusses how it's being used in JavaScript as an easy way to transmit a huge amount of data. An alternative compact representation is the double-array trie, though admittedly I know very little about it.

If you want to use the dictionary for operations like spell-checking where you need to find words similar to other words, the BK-tree is an excellent data structure to consider.

synonym data structure for searching

Access is O(1). But building a data structure looks like generating
duplicate items. A better data structure where only one entry is
stored.

You can use two data structures. One for storing them and one for lookup.
One vector of vectors that contains all the synonyms of a word. And a hashmap that points to the container with all of the synonyms for O(1) lookup.

So you would store your synonyms in a data structure like this (a list of lists of strings):

{{"USA","North America","United States"},{"Tiny","Small"},{"Great","Good"}}

And then you would have a hashmap, so if you search for "USA" you will get the first list. If you search for "Small" you will get the second list.

"USA"->{"USA","North America","United States"}
"Small"->{"Small","Tiny"}

The data in the hashmap is only a reference to the list of synonyms that you saved in the other data structure.



Related Topics



Leave a reply



Submit