Determine Prefix from a Set of (Similar) Strings

Determine prefix from a set of (similar) strings

Never rewrite what is provided to you: os.path.commonprefix does exactly this:

Return the longest path prefix (taken
character-by-character) that is a prefix of all paths in list. If list
is empty, return the empty string (''). Note that this may return
invalid paths because it works a character at a time.

For comparison to the other answers, here's the code:

# Return the longest prefix of all list elements.
def commonprefix(m):
"Given a list of pathnames, returns the longest common leading component"
if not m: return ''
s1 = min(m)
s2 = max(m)
for i, c in enumerate(s1):
if c != s2[i]:
return s1[:i]
return s1

How to get common prefix of strings in a list

Use os.path.commonprefix it will do exactly what you want.

In [1]: list = ['nomad', 'normal', 'nonstop', 'noob']

In [2]: import os.path as p

In [3]: p.commonprefix(list)
Out[3]: 'no'

As an aside, naming a list "list" will make it impossible to access the list class, so I would recommend using a different variable name.

Determine if one string is a prefix of another

Looks like you've got two different problems.

One is to determine if a string is contained as a prefix in another string. For this I would suggest using a function already implemented in the language's string library. In JavaScript you could do this

if (str2.indexOf(str1) === 0) {
// string str1 is a prefix of str2
}

See documentation for String.indexOf here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf

For the other problem, in a bunch of strings, find out which ones have a given string as a prefix, building a data structure like a Trie or the one you mention seems like the way to go, if you want fast look-ups.

String Match with prefixes

You can use a trie.

Here is an implementation:

class Trie(dict):
def add(self, s):
node = self
for ch in s:
if ch not in node:
node[ch] = Trie()
node = node[ch]
node["end"] = True

def findprefix(self, s):
node = self
len = 0
for i, ch in enumerate(s):
if "end" in node:
len = i
if ch not in node:
break
node = node[ch]
return s[:len]

trie = Trie()
for s in ["good", "goo", "go", "goodbyeparty"]:
trie.add(s)
print(trie.findprefix("goodbye")) # "good"```

Improve string - prefix matching performance

If the prefixes are known in advance and can be preprocessed, you might try a trie. Especially if they are going to be as short as 10 characters. That would mean each check is on the order of 10 comparisons. Not sure how much better one could do.

function buildTrie(trie, words){
for (let word of words){
let _trie = trie;

for (let i=0; i<word.length; i++){
const letter = word[i];
_trie[letter] = _trie[letter] || {};

if (i == word.length - 1)
_trie[letter]['leaf'] = true;

_trie = _trie[letter];
}
}

return trie;
}

function find(trie, str, i=0){
const letter = str[i];

if (!trie[letter])
return false;

if (trie[letter]['leaf'])
return true;

return find(trie[letter], str, i + 1);
}

const prefixes = [ "Hey", "Heya", "Hi", "Hola"];
const trie = buildTrie({}, prefixes)

console.log(trie)

console.log(find(trie, "Hey, I'm Michael"));
console.log(find(trie, "Heuy, I'm Michael"));

Longest Common Prefix from list elements in Python

No matter what, you need to look at each character from each string in turn (until you find a set of corresponding characters that doesn't match), so there's no benefit to splitting the list up. Just iterate through and break when the common prefix stops being common:

def common_prefix(strs) -> str:
prefix = ""
for chars in zip(*strs):
if len(set(chars)) > 1:
break
prefix += chars[0]
return prefix

print(common_prefix(["flowers", "flow", "flight"])) # fl

Check if one string is a prefix of another

Use std::mismatch. Pass in the shorter string as the first iterator range and the longer as the second iterator range. The return is a pair of iterators, the first is the iterator in the first range and the second, in the second rage. If the first is end of the first range, then you know the the short string is the prefix of the longer string e.g.

std::string foo("foo");
std::string foobar("foobar");

auto res = std::mismatch(foo.begin(), foo.end(), foobar.begin());

if (res.first == foo.end())
{
// foo is a prefix of foobar.
}

Get the longest common string prefix of all words in a list

>>> import os
>>> os.path.commonprefix(["flexible","flexile","flexion","flexor"])
'flex'


Related Topics



Leave a reply



Submit