Filtering a List of Strings Based on Contents

This kind of filtering can be done in several ways in Python. A list comprehension is usually the most readable:

>>> lst = ['a', 'ab', 'abc', 'bac']
>>> [k for k in lst if 'ab' in k]
['ab', 'abc']

Another way is to use the filter function. In Python 2:

>>> filter(lambda k: 'ab' in k, lst)
['ab', 'abc']

In Python 3, filter returns an iterator instead of a list, but you can pass it to list() to get a list:

>>> list(filter(lambda k: 'ab' in k, lst))
['ab', 'abc']

That said, a list comprehension is generally considered better practice.
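One practical difference in Python 3 is that the object returned by filter can only be consumed once. A minimal sketch, reusing the lst from above (the names matches, first_pass, second_pass and reusable are introduced here for illustration only):

```python
lst = ['a', 'ab', 'abc', 'bac']

# In Python 3, filter() returns a lazy iterator rather than a list.
matches = filter(lambda k: 'ab' in k, lst)

first_pass = list(matches)   # consumes the iterator
second_pass = list(matches)  # the iterator is already exhausted

print(first_pass)   # ['ab', 'abc']
print(second_pass)  # []

# A list comprehension builds a real list that can be reused freely.
reusable = [k for k in lst if 'ab' in k]
print(reusable)     # ['ab', 'abc']
```

If you need the result more than once, either wrap the filter call in list() right away or use the comprehension.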

Filter a list of strings by a char in same position

The easiest, and likely quite efficient, way to do this would be to translate your pattern into a regular expression, if regular expressions are in your "toolbox". (The re module is in the standard library.)

In a regular expression, . matches any single character. So, we replace all _s with .s and add "^" and "$" to anchor the regular expression to the whole string.

import re

def filter_words(words, pattern, wrong_guesses):
    re_pattern = re.compile("^" + re.escape(pattern).replace("_", ".") + "$")

    # get words that
    # (a) are the correct length
    # (b) aren't in the wrong guesses
    # (c) match the pattern
    return [
        word
        for word in words
        if (
            len(word) == len(pattern) and
            word not in wrong_guesses and
            re_pattern.match(word)
        )
    ]

all_words = [
    "cat",
    "dog",
    "mouse",
    "horse",
    "cow",
]

print(filter_words(all_words, "c_t", []))
print(filter_words(all_words, "c__", []))
print(filter_words(all_words, "c__", ["cat"]))

prints out

['cat']
['cat', 'cow']
['cow']

If you don't care for using regexps, you can instead translate the pattern to a dict mapping each defined position to the character that should be found there:

def filter_words_without_regex(words, pattern, wrong_guesses):
    # map each defined position in the pattern to the letter expected there
    letter_map = {i: letter for i, letter in enumerate(pattern) if letter != "_"}
    # get words that
    # (a) are the correct length
    # (b) aren't in the wrong guesses
    # (c) have the correct letters in the correct positions
    return [
        word
        for word in words
        if (
            len(word) == len(pattern) and
            word not in wrong_guesses and
            all(word[i] == ch for i, ch in letter_map.items())
        )
    ]

The result is the same.
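A quick way to convince yourself that the two approaches agree is to run both over the same word list. The sketch below is a simplified stand-in for the functions above, not a copy of them: the helper names matches_regex and matches_positions are made up for this example, the wrong-guesses check is left out, and re.fullmatch is swapped in for the explicit "^" and "$" anchors.

```python
import re

def matches_regex(word, pattern):
    # "_" stands for an unknown letter; re.escape protects any other
    # character that would otherwise be special in a regex.
    regex = re.escape(pattern).replace("_", ".")
    return re.fullmatch(regex, word) is not None

def matches_positions(word, pattern):
    # Same rule without regexes: lengths must agree and every
    # non-underscore position must hold the same letter.
    if len(word) != len(pattern):
        return False
    return all(p == "_" or p == w for p, w in zip(pattern, word))

words = ["cat", "dog", "mouse", "horse", "cow"]
for pattern in ["c_t", "c__", "_o_", "_____"]:
    by_regex = [w for w in words if matches_regex(w, pattern)]
    by_position = [w for w in words if matches_positions(w, pattern)]
    assert by_regex == by_position
    print(pattern, by_regex)
```

Each pattern prints the same list for both helpers; if they ever disagreed, the assert would fail.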

Python - filtering list of strings based on multiple conditions

The condition if 'Street' and 'Bike' in s does not test both words. Python parses it as 'Street' and ('Bike' in s). A non-empty string such as 'Street' is always truthy, so the whole expression reduces to 'Bike' in s, and that is True for every item in your list because they all contain 'Bike'. To test both substrings, you need

if 'Street' in s and 'Bike' in s:
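The difference is easy to see on a small list. The items below are hypothetical, chosen so that only one of them contains both words:

```python
# Hypothetical sample data, not taken from the original question.
items = ["Street Bike", "Mountain Bike", "Street Car"]

# Parsed as: 'Street' and ('Bike' in s), so only 'Bike' is really tested.
wrong = [s for s in items if 'Street' and 'Bike' in s]

# Each substring gets its own membership test.
right = [s for s in items if 'Street' in s and 'Bike' in s]

# Equivalent form that scales to more keywords.
keywords = ('Street', 'Bike')
also_right = [s for s in items if all(k in s for k in keywords)]

print(wrong)       # ['Street Bike', 'Mountain Bike']
print(right)       # ['Street Bike']
print(also_right)  # ['Street Bike']
```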

Substring filter list elements by another list in Python

Use a list comprehension with any:

[i for i in list1 if any(str(j) in i for j in list2)]

any checks whether at least one element of list2, converted to a string, is a substring of the list1 item being iterated over (the in operator calls str.__contains__).

Example:

In [92]: list1 = ['bj-100-cy','bj-101-hd','sh-200-pd','sh-201-hp']
...: list2 = [100, 200]
...:

In [93]: [i for i in list1 if any(str(j) in i for j in list2)]
Out[93]: ['bj-100-cy', 'sh-200-pd']
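If the filter is needed in more than one place, it can be wrapped in a small function. This is only a sketch; the name filter_by_substrings and its parameter names are invented for this example. Converting the needles to strings once up front avoids repeating str() for every item.

```python
def filter_by_substrings(items, needles):
    """Keep the items that contain at least one needle as a substring."""
    # Convert once, so integers such as 100 can be searched for as "100".
    needles = [str(n) for n in needles]
    return [item for item in items if any(n in item for n in needles)]

list1 = ['bj-100-cy', 'bj-101-hd', 'sh-200-pd', 'sh-201-hp']
list2 = [100, 200]

result = filter_by_substrings(list1, list2)
print(result)  # ['bj-100-cy', 'sh-200-pd']
```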

How to filter list of strings based on word length using list comprehensions?

You need to use nested list comprehensions, not a single list comprehension. The outer one is for the sentences, the inner one is for the words.

And you need to join with a space, not an empty string, to put a space between the words.

output = [' '.join([word for word in sentence.split() if len(word) > 2]) for sentence in l]
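As a worked example, here is the same expression applied to two made-up sentences (the contents of l are hypothetical, since the original list is not shown):

```python
# Hypothetical input; the question's actual list is not reproduced here.
l = ["I am a big fan of it", "to be or not to be"]

output = [
    # inner comprehension: keep only words longer than two characters
    ' '.join([word for word in sentence.split() if len(word) > 2])
    # outer comprehension: one result string per sentence
    for sentence in l
]

print(output)  # ['big fan', 'not']
```

A sentence with no word longer than two characters would produce an empty string rather than being dropped from the list.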

Beginner in Python - Filter list of strings based on condition

dna_2 = [i for i in range(len(dna)) if (dna[i].count('A'))== (dna[i].count('T')) and (dna[i].count('C'))== (dna[i].count('G'))]

Here i is the index, not the string. You index with dna[i] when calling .count, but the comprehension then collects i itself, so dna_2 ends up holding indexes rather than sequences.

You can change it to dna_2 = [dna[i] for i ...] or, better yet, iterate directly over the sequence strings instead of going through the indexes:

dna_2 = [sequence for sequence in dna if sequence.count('A') ... ]
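Spelled out in full, the two versions behave as follows. The contents of dna are invented for this sketch, and the helper is_balanced is a hypothetical name used only to keep the condition readable:

```python
# Hypothetical sequences; the original data is not shown in the question.
dna = ["ATCG", "AAGC", "AATT", "CCGA"]

def is_balanced(sequence):
    # True when A and T occur equally often, and C and G occur equally often.
    return (sequence.count('A') == sequence.count('T')
            and sequence.count('C') == sequence.count('G'))

# Collecting i yields positions in the list, not the strings themselves.
indexes = [i for i in range(len(dna)) if is_balanced(dna[i])]

# Iterating over the strings directly yields the sequences.
dna_2 = [sequence for sequence in dna if is_balanced(sequence)]

print(indexes)  # [0, 2]
print(dna_2)    # ['ATCG', 'AATT']
```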

Filtering a list based on string match

One option is to use lapply to filter each data frame in your list, like so:

lapply(result_abd, function(x) x[x$UP_DOWN %in% c("UP", "DOWN"), ])
#> $C1_ref_vs_C2
#> Symbol log2FoldChange UP_DOWN
#> 2 Cdc45 0.30915286 DOWN
#> 3 H19 1.80655193 UP
#> 5 Narf -0.66709244 DOWN
#> 7 Klf6 -0.08168849 DOWN
#> 8 Scmh1 -0.31652589 DOWN
#>
#> $C1_ref_vs_C3
#> Symbol log2FoldChange UP_DOWN
#> 2 Cdc45 0.30915286 DOWN
#> 3 H19 1.80655193 UP
#> 5 Narf -0.66709244 DOWN
#> 7 Klf6 -0.08168849 DOWN
#> 8 Scmh1 -0.31652589 DOWN

DATA


df <- structure(list(Symbol = c(
"Gnai3", "Cdc45", "H19", "Scml2", "Narf",
"Cav2", "Klf6", "Scmh1", "Cox5a", "Wnt9a"
), log2FoldChange = c(
0.07417434,
0.30915286, 1.80655193, -0.99676631, -0.66709244, 0.14435672,
-0.08168849, -0.31652589, 0.26321581, -0.50397731
), UP_DOWN = c(
"NS",
"DOWN", "UP", "NS", "DOWN", "NS", "DOWN", "DOWN", "NS", "NS"
)), class = "data.frame", row.names = c(
"1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"
))

result_abd <- list(C1_ref_vs_C2 = df, C1_ref_vs_C3 = df)

Filter list of strings starting with specific keyword

You can use str.startswith() with a list comprehension to get the words that start with a given string:

>>> PartialWord = "ab"
>>> WordList = ['absail', 'rehab', 'dolphin']

>>> [word for word in WordList if word.startswith(PartialWord)]
['absail']

As per the str.startswith documentation:

str.startswith(prefix[, start[, end]]):

Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.
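The tuple form and the optional start argument mentioned in the documentation can be sketched like this. The variable names prefixes, by_any_prefix and from_offset, and the extra prefix "dol", are assumptions added for the example:

```python
WordList = ['absail', 'rehab', 'dolphin']

# A tuple of prefixes matches words that start with any one of them.
prefixes = ("ab", "dol")
by_any_prefix = [word for word in WordList if word.startswith(prefixes)]
print(by_any_prefix)  # ['absail', 'dolphin']

# With start=2, the test begins at index 2, so 'rehab' is checked as 'hab'.
from_offset = [word for word in WordList if word.startswith("hab", 2)]
print(from_offset)    # ['rehab']
```

Note that the prefixes must be passed as a tuple; a list raises a TypeError.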

How to filter a List of strings and split them and return an Array in Java 8

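String.split() returns an array (String[]), so mapping each line with it gives you a stream of arrays rather than a stream of strings.

You need to apply flatMap() in order to produce a stream of Strings from a stream of String[].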

And in order to collect the stream data into an array, you need to apply toArray(), which expects a function that produces an array of the desired type, as terminal operation:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public static void main(String[] args) {
    // Arrays.asList works on Java 8; List.of requires Java 9 or later
    List<String> myList =
        Arrays.asList("15:09:00 SOME TEXT SOME TEXT 088",
                      "15:09 SOME TEXT 1546 AMOUNT",
                      "15:09:06 SOME TEXT 1546 AMOUNT",
                      "13:03:00 SOME TEXT TEXT TEXT 00");

    String[] parts = getParts(myList, "15:09:06", "\\s+");

    System.out.println(Arrays.toString(parts));
}

public static String[] getParts(List<String> source, String prefix, String delimiter) {
    return source.stream()
        .filter(str -> str.startsWith(prefix)) // Stream<String>
        .map(str -> str.split(delimiter))      // Stream<String[]>
        .flatMap(Stream::of)                   // Stream<String>
        .toArray(String[]::new);
}

Output

[15:09:06, SOME, TEXT, 1546, AMOUNT]

