pandas select from Dataframe using startswith
You can use the str.startswith
DataFrame method to give more consistent results:
In [11]: s = pd.Series(['a', 'ab', 'c', 11, np.nan])
In [12]: s
Out[12]:
0 a
1 ab
2 c
3 11
4 NaN
dtype: object
In [13]: s.str.startswith('a', na=False)
Out[13]:
0 True
1 True
2 False
3 False
4 False
dtype: bool
and the boolean indexing will work just fine (I prefer to use loc
, but it works just the same without):
In [14]: s.loc[s.str.startswith('a', na=False)]
Out[14]:
0 a
1 ab
dtype: object
.
It looks least one of your elements in the Series/column is a float, which doesn't have a startswith method hence the AttributeError, the list comprehension should raise the same error...
Python 3 Pandas Select Dataframe using Startswith + or
Try this:
df[df['Office'].str.contains("^(?:N|M|V|R)")]
or:
df[df['Office'].str.contains("^[NMVR]+")]
Demo:
In [91]: df
Out[91]:
Office
0 No-No
1 AAAA
2 MicroHard
3 Valley
4 vvvvv
5 zzzzzzzzzz
6 Risk is fun
In [92]: df[df['Office'].str.contains("^(?:N|M|V|R)")]
Out[92]:
Office
0 No-No
2 MicroHard
3 Valley
6 Risk is fun
In [93]: df[df['Office'].str.contains("^[NMVR]+")]
Out[93]:
Office
0 No-No
2 MicroHard
3 Valley
6 Risk is fun
Selecting columns with startswith in pandas
Convert to Series
is not necessary, but if want add to another list of columns convert output to list
:
cols = df.columns[df.columns.str.startswith('t')].tolist()
df = df[['score','obs'] + cols].rename(columns = {'treatment':'treat'})
Another idea is use 2 masks and chain by |
for bitwise OR
:
Notice:
Columns names are filtered from original columns names before rename
in your solution, so is necessary rename later.
m1 = df.columns.str.startswith('t')
m2 = df.columns.isin(['score','obs'])
df = df.loc[:, m1 | m2].rename(columns = {'treatment':'treat'})
print (df)
obs treat score tr tk
0 1 0 strong 1 6
1 2 1 weak 2 7
2 3 0 normal 3 8
3 1 1 weak 4 9
4 2 0 strong 5 10
If need rename
first, is necessary reassign back for filter by renamed columns names:
df = df.rename(columns = {'treatment':'treat'})
df = df.loc[:, df.columns.str.startswith('t') | df.columns.isin(['score','obs'])]
Python.pandas: how to select rows where objects start with letters 'PL'
If you use a string method on the Series that should return you a true/false result. You can then use that as a filter combined with .loc to create your data subset.
new_df = df.loc[df[‘Code’].str.startswith('pl')].copy()
Create a loop startswith specific string pandas
The cool thing about pandas is that you don't have to do these things in a loop.
import pandas as pd
data = [
['Closed', 'j.snow', 'Wrong Data. Contact your admin', 'Specific Group'],
['Closed', 'j.doe', 'General Issue', 'Master Group'],
['Closed', 'j.snow', 'Wrong Data. Contact your admin', 'Specific Group'],
['Closed', 'm.smith', 'Wrong Data. Contact your admin', 'Specific Group'],
['Closed', 'a.richards', 'Wrong Data. Contact your admin', 'Specific Group'],
['Closed', 'a.blecha', 'General Issue', 'Master Group'],
['Closed', 'r.kipling', 'Wrong Data. Contact your admin', 'First Group']
]
df = pd.DataFrame(data, columns=['status', 'created', 'short_desc', 'group'])
print(df)
# Pick only those rows where short_desc starts with "Wrong".
df1 = df[df['short_desc'].str.startswith('Wrong')]
# Pick only those rows where group is "Specific Group".
df1 = df1[df1['group']=='Specific Group']
# Print the "short_desc" column.
print(df1['short_desc'])
Or, in a single line:
df1 = df[
(df['short_desc'].str.startswith('Wrong')) &
(df['group']=='Specific Group')
]
This is pandas' "magic indexing". Those comparison operators return an array of booleans, True where the condition is true. When passing that to df[...]
, that returns only the rows where the array element is True.
DataFrame value startswith
By default a Series
is returned when accessing a specific column and row in a DataFrame
if you want a scalar value then you can access the array element using .values
to return np
array and then indexing into it:
In [101]:
df.loc[df["Event"].str.startswith("Bericht"), "Datum"].values[0]
Out[101]:
'15.05.2017'
For safety you should check whether your selection yields any results prior to indexing into it, otherwise you get a KeyError
:
if len(df.loc[df["Event"].str.startswith("Bericht"), "Datum"]) > 0:
return df.loc[df["Event"].str.startswith("Bericht"), "Datum"].values[0]
want to know how to use startswith in python dataframe by considering two columns
You could use np.select
:
conditions = [(df["A"].str[:4].isin(["INKA", "IDKA"]))|(df["B"].str[:4].isin(["INKA", "IDKA"])),
(df["A"].str[:4].isin(["INAP", "IDAP"]))|(df["B"].str[:4].isin(["INAP", "IDAP"])),
(df["A"].str[:4].isin(["INRJ", "IDRJ"]))|(df["B"].str[:4].isin(["INRJ", "IDRJ"]))]
df["C"] = np.select(conditions, ["KAR", "AP", "RAJ"], None)
Alternatively, you could use map
and combine_first
:
mapper = {"INKA": "KAR", "IDKA": "KAR", "INAP": "AP", "IDAP": "AP", "INRJ": "RAJ", "IDRJ": "RAJ"
df["C"] = df["A"].str[:4].map(mapper).combine_first(df["B"].str[:4].map(mapper))
Related Topics
Relative Imports - Modulenotfounderror: No Module Named X
Ignore Python Multiple Return Value
Replacing Column Values in a Pandas Dataframe
Django Passing Custom Form Parameters to Formset
How to Uninstall Anaconda Completely from MACos
How to Get a Random Number Between a Float Range
Multiprocessing: Sharing a Large Read-Only Object Between Processes
Importing an Ipynb File from Another Ipynb File
Find and Replace String Values in List
How to Verify If One List Is a Subset of Another
Python String Prints as [U'String']
How to Sandbox Python in Pure Python
Find_Element_By_* Commands Are Deprecated in Selenium
Numpy Array Assignment with Copy