Filtering a list of strings based on contents
This simple filtering can be achieved in many ways with Python. The best approach is to use "list comprehensions" as follows:
>>> lst = ['a', 'ab', 'abc', 'bac']
>>> [k for k in lst if 'ab' in k]
['ab', 'abc']
Another way is to use the filter function. In Python 2:
>>> filter(lambda k: 'ab' in k, lst)
['ab', 'abc']
In Python 3, filter returns an iterator instead of a list, but you can convert it with list():
>>> list(filter(lambda k: 'ab' in k, lst))
['ab', 'abc']
Though it's better practice to use a comprehension.
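As a minimal sketch, the comprehension above can be wrapped in a small reusable helper. The name filter_containing and the ignore_case flag are made up for illustration and are not part of the original answer.

```python
def filter_containing(strings, needle, ignore_case=False):
    """Return the strings that contain needle, keeping the original order."""
    if ignore_case:
        # Compare lowercased copies so 'AB' also matches 'ab'.
        needle = needle.lower()
        return [s for s in strings if needle in s.lower()]
    return [s for s in strings if needle in s]


lst = ['a', 'ab', 'abc', 'bac']
print(filter_containing(lst, 'ab'))                    # ['ab', 'abc']
print(filter_containing(lst, 'AB', ignore_case=True))  # ['ab', 'abc']
```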
Filter a list of strings by a char in same position
The easiest, and likely quite efficient, way to do this is to translate your pattern into a regular expression, if regular expressions are in your "toolbox". (The re module is in the standard library.)
In a regular expression, . matches any single character. So we replace every _ with . and add "^" and "$" to anchor the regular expression to the whole string.
import re

def filter_words(words, pattern, wrong_guesses):
    re_pattern = re.compile("^" + re.escape(pattern).replace("_", ".") + "$")
    # get words that
    # (a) are the correct length
    # (b) aren't in the wrong guesses
    # (c) match the pattern
    return [
        word
        for word in words
        if (
            len(word) == len(pattern) and
            word not in wrong_guesses and
            re_pattern.match(word)
        )
    ]

all_words = [
    "cat",
    "dog",
    "mouse",
    "horse",
    "cow",
]
print(filter_words(all_words, "c_t", []))
print(filter_words(all_words, "c__", []))
print(filter_words(all_words, "c__", ["cat"]))
prints out
['cat']
['cat', 'cow']
['cow']
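A possible variation, sketched here under the assumption that _ is the only wildcard character: build the expression one character at a time and use re.fullmatch, which only succeeds when the whole word matches, so neither the anchors nor the separate length check are needed. The function name filter_words_fullmatch is hypothetical.

```python
import re


def filter_words_fullmatch(words, pattern, wrong_guesses):
    # "_" becomes "." (exactly one character); every other character is
    # escaped so it is matched literally.
    regex = re.compile(
        "".join("." if ch == "_" else re.escape(ch) for ch in pattern)
    )
    excluded = set(wrong_guesses)  # set membership tests are O(1)
    # fullmatch requires the entire word to match, which also enforces length.
    return [w for w in words if w not in excluded and regex.fullmatch(w)]


all_words = ["cat", "dog", "mouse", "horse", "cow"]
print(filter_words_fullmatch(all_words, "c_t", []))       # ['cat']
print(filter_words_fullmatch(all_words, "c__", []))       # ['cat', 'cow']
print(filter_words_fullmatch(all_words, "c__", ["cat"]))  # ['cow']
```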
If you don't care for using regexps, you can instead translate the pattern to a dict mapping each defined position to the character that should be found there:
def filter_words_without_regex(words, pattern, wrong_guesses):
    # get a map of each defined position in the pattern to its letter
    letter_map = {i: letter for i, letter in enumerate(pattern) if letter != "_"}
    # get words that
    # (a) are the correct length
    # (b) aren't in the wrong guesses
    # (c) have the correct letters in the correct positions
    return [
        word
        for word in words
        if (
            len(word) == len(pattern) and
            word not in wrong_guesses and
            all(word[i] == ch for i, ch in letter_map.items())
        )
    ]
The result is the same.
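If building the dict feels like overkill, the same check can probably be written by walking the pattern and the word in parallel with zip. This is only a sketch of an alternative; the name filter_words_zip is made up for illustration.

```python
def filter_words_zip(words, pattern, wrong_guesses):
    return [
        word
        for word in words
        if len(word) == len(pattern)
        and word not in wrong_guesses
        # each pattern character is either the wildcard or must equal
        # the word's character at the same position
        and all(p == "_" or p == c for p, c in zip(pattern, word))
    ]


all_words = ["cat", "dog", "mouse", "horse", "cow"]
print(filter_words_zip(all_words, "c_t", []))       # ['cat']
print(filter_words_zip(all_words, "c__", []))       # ['cat', 'cow']
print(filter_words_zip(all_words, "c__", ["cat"]))  # ['cow']
```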
Python - filtering list of strings based on multiple conditions
The condition if 'Street' and 'Bike' in s is parsed as if 'Street' and ('Bike' in s). The first operand, 'Street', is a non-empty string and therefore always truthy, so it never filters anything out. The second operand, 'Bike' in s, is also True for every item here because 'Bike' appears in all items in the list. The whole condition is therefore always True. Each substring needs its own membership test:
if 'Street' in s and 'Bike' in s:
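To see the difference in isolation, here is a small sketch. The items list is hypothetical sample data, not the list from the original question, and is chosen so that the two conditions give different results.

```python
# Hypothetical sample data for illustration only.
items = ["Street Bike", "Mountain Bike", "Street Map"]

# Parsed as: 'Street' and ('Bike' in s). The non-empty string 'Street' is
# always truthy, so only the 'Bike' test actually filters anything.
wrong = [s for s in items if 'Street' and 'Bike' in s]

# Each substring gets its own membership test.
right = [s for s in items if 'Street' in s and 'Bike' in s]

# The same idea for any number of required substrings.
required = ('Street', 'Bike')
also_right = [s for s in items if all(word in s for word in required)]

print(wrong)       # ['Street Bike', 'Mountain Bike']
print(right)       # ['Street Bike']
print(also_right)  # ['Street Bike']
```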
Substring filter list elements by another list in Python
Use a list comprehension with any:
[i for i in list1 if any(str(j) in i for j in list2)]
any checks whether any element of list2 is a substring of the list1 item being iterated over (the in operator calls __contains__). The elements of list2 are integers, so each one is converted with str first.
Example:
In [92]: list1 = ['bj-100-cy','bj-101-hd','sh-200-pd','sh-201-hp']
    ...: list2 = [100, 200]
    ...:
In [93]: [i for i in list1 if any(str(j) in i for j in list2)]
Out[93]: ['bj-100-cy', 'sh-200-pd']
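The str conversion matters because the in operator on a string only accepts a string on its left side. A short sketch of what happens without it, plus a variant that converts the numbers once up front (the name needles is introduced here for illustration):

```python
list1 = ['bj-100-cy', 'bj-101-hd', 'sh-200-pd', 'sh-201-hp']
list2 = [100, 200]

# Testing an int against a str raises TypeError rather than returning False.
try:
    _ = 100 in list1[0]
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Convert once, then reuse the strings for every item of list1.
needles = [str(j) for j in list2]
result = [i for i in list1 if any(n in i for n in needles)]
print(result)  # ['bj-100-cy', 'sh-200-pd']
```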
How to filter list of strings based on word length using list comprehensions?
You need to use nested list comprehensions, not a single list comprehension. The outer one is for the sentences, the inner one is for the words.
And you need to join with a space, not an empty string, to put a space between the words.
output = [' '.join([word for word in sentence.split() if len(word) > 2]) for sentence in l]
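A quick run of that line, as a sketch with made-up input (the contents of l below are hypothetical, since the original list is not shown):

```python
# Hypothetical input sentences for illustration.
l = ["I am a big fan of Python", "It is an easy to read language"]

# Outer comprehension: one result per sentence.
# Inner comprehension: keep only the words longer than two characters.
output = [' '.join([word for word in sentence.split() if len(word) > 2]) for sentence in l]
print(output)  # ['big fan Python', 'easy read language']

# Joining with an empty string instead would glue the words together.
glued = [''.join([word for word in sentence.split() if len(word) > 2]) for sentence in l]
print(glued)   # ['bigfanPython', 'easyreadlanguage']
```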
Beginner in Python - Filter list of strings based on condition
dna_2 = [i for i in range(len(dna)) if (dna[i].count('A'))== (dna[i].count('T')) and (dna[i].count('C'))== (dna[i].count('G'))]
Here i is the index, not the sequence (which you probably know, since you use dna[i] when calling .count), so the comprehension collects indexes rather than strings.
You can change it to dna_2 = [dna[i] for i ...] or, better yet, iterate directly over the sequence strings instead of going through the indexes:
dna_2 = [sequence for sequence in dna if sequence.count('A') ... ]
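Written out in full, the direct iteration might look like the sketch below. The dna list is invented sample data and the helper name is_balanced is hypothetical; the condition is the one from the comprehension in the question.

```python
# Invented sample sequences for illustration.
dna = ["ATCG", "AATT", "ACGT", "AAAT", "CCGG", "ATC"]


def is_balanced(sequence):
    """True when A and T counts match and C and G counts match."""
    return (sequence.count('A') == sequence.count('T')
            and sequence.count('C') == sequence.count('G'))


# Index-based version from the question: collects positions, not strings.
indexes = [i for i in range(len(dna)) if is_balanced(dna[i])]

# Iterating over the strings directly collects the sequences themselves.
dna_2 = [sequence for sequence in dna if is_balanced(sequence)]

print(indexes)  # [0, 1, 2, 4]
print(dna_2)    # ['ATCG', 'AATT', 'ACGT', 'CCGG']
```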
Filtering a list based on string match
In R, one option is to use lapply to filter each data frame in your list, like so:
lapply(result_abd, function(x) x[x$UP_DOWN %in% c("UP", "DOWN"), ])
#> $C1_ref_vs_C2
#>   Symbol log2FoldChange UP_DOWN
#> 2  Cdc45     0.30915286    DOWN
#> 3    H19     1.80655193      UP
#> 5   Narf    -0.66709244    DOWN
#> 7   Klf6    -0.08168849    DOWN
#> 8  Scmh1    -0.31652589    DOWN
#>
#> $C1_ref_vs_C3
#>   Symbol log2FoldChange UP_DOWN
#> 2  Cdc45     0.30915286    DOWN
#> 3    H19     1.80655193      UP
#> 5   Narf    -0.66709244    DOWN
#> 7   Klf6    -0.08168849    DOWN
#> 8  Scmh1    -0.31652589    DOWN
DATA
df <- structure(list(Symbol = c(
"Gnai3", "Cdc45", "H19", "Scml2", "Narf",
"Cav2", "Klf6", "Scmh1", "Cox5a", "Wnt9a"
), log2FoldChange = c(
0.07417434,
0.30915286, 1.80655193, -0.99676631, -0.66709244, 0.14435672,
-0.08168849, -0.31652589, 0.26321581, -0.50397731
), UP_DOWN = c(
"NS",
"DOWN", "UP", "NS", "DOWN", "NS", "DOWN", "DOWN", "NS", "NS"
)), class = "data.frame", row.names = c(
"1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"
))
result_abd <- list(C1_ref_vs_C2 = df, C1_ref_vs_C3 = df)
Filter list of strings starting with specific keyword
You can use str.startswith() with a list comprehension to get the words that start with a given string:
>>> PartialWord = "ab"
>>> WordList = ['absail', 'rehab', 'dolphin']
>>> [word for word in WordList if word.startswith(PartialWord)]
['absail']
As per the str.startswith documentation:
str.startswith(prefix[, start[, end]]):
Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.
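The tuple form and the optional start argument mentioned in the documentation can be sketched as follows. The extra words in the list are added here for illustration.

```python
# Sample words; 'about' and 'dollar' are added for illustration.
words = ['absail', 'rehab', 'dolphin', 'about', 'dollar']

# A tuple of prefixes matches words that start with any one of them.
multi = [w for w in words if w.startswith(('ab', 'dol'))]
print(multi)   # ['absail', 'dolphin', 'about', 'dollar']

# With start=2 the comparison begins at index 2 of each word.
offset = [w for w in words if w.startswith('hab', 2)]
print(offset)  # ['rehab']
```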
How to filter a List of strings and split them and return an Array in Java 8
split() returns an array, so mapping each string through it gives a stream of String[]. You need to apply flatMap() in order to produce a stream of Strings from that stream of String[].
To collect the stream data into an array, apply toArray() as the terminal operation; it expects a function that produces an array of the desired type:
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public static void main(String[] args) {
    List<String> myList =
        List.of("15:09:00 SOME TEXT SOME TEXT 088",
                "15:09 SOME TEXT 1546 AMOUNT",
                "15:09:06 SOME TEXT 1546 AMOUNT",
                "13:03:00 SOME TEXT TEXT TEXT 00");
    String[] parts = getParts(myList, "15:09:06", "\\s+");
    System.out.println(Arrays.toString(parts));
}

public static String[] getParts(List<String> source, String prefix, String delimiter) {
    return source.stream()
        .filter(str -> str.startsWith(prefix)) // Stream<String>
        .map(str -> str.split(delimiter))      // Stream<String[]>
        .flatMap(Stream::of)                   // Stream<String>
        .toArray(String[]::new);
}
Output
[15:09:06, SOME, TEXT, 1546, AMOUNT]