Pandas Select from Dataframe Using Startswith

pandas select from Dataframe using startswith

You can use the str.startswith DataFrame method to give more consistent results:

In [11]: s = pd.Series(['a', 'ab', 'c', 11, np.nan])

In [12]: s
Out[12]:
0 a
1 ab
2 c
3 11
4 NaN
dtype: object

In [13]: s.str.startswith('a', na=False)
Out[13]:
0 True
1 True
2 False
3 False
4 False
dtype: bool

and the boolean indexing will work just fine (I prefer to use loc, but it works just the same without):

In [14]: s.loc[s.str.startswith('a', na=False)]
Out[14]:
0 a
1 ab
dtype: object

.

It looks least one of your elements in the Series/column is a float, which doesn't have a startswith method hence the AttributeError, the list comprehension should raise the same error...

Python 3 Pandas Select Dataframe using Startswith + or

Try this:

df[df['Office'].str.contains("^(?:N|M|V|R)")]

or:

df[df['Office'].str.contains("^[NMVR]+")]

Demo:

In [91]: df
Out[91]:
Office
0 No-No
1 AAAA
2 MicroHard
3 Valley
4 vvvvv
5 zzzzzzzzzz
6 Risk is fun

In [92]: df[df['Office'].str.contains("^(?:N|M|V|R)")]
Out[92]:
Office
0 No-No
2 MicroHard
3 Valley
6 Risk is fun

In [93]: df[df['Office'].str.contains("^[NMVR]+")]
Out[93]:
Office
0 No-No
2 MicroHard
3 Valley
6 Risk is fun

Selecting columns with startswith in pandas

Convert to Series is not necessary, but if want add to another list of columns convert output to list:

cols =  df.columns[df.columns.str.startswith('t')].tolist()

df = df[['score','obs'] + cols].rename(columns = {'treatment':'treat'})

Another idea is use 2 masks and chain by | for bitwise OR:

Notice:

Columns names are filtered from original columns names before rename in your solution, so is necessary rename later.

m1 = df.columns.str.startswith('t')
m2 = df.columns.isin(['score','obs'])

df = df.loc[:, m1 | m2].rename(columns = {'treatment':'treat'})
print (df)
obs treat score tr tk
0 1 0 strong 1 6
1 2 1 weak 2 7
2 3 0 normal 3 8
3 1 1 weak 4 9
4 2 0 strong 5 10

If need rename first, is necessary reassign back for filter by renamed columns names:

df = df.rename(columns = {'treatment':'treat'})
df = df.loc[:, df.columns.str.startswith('t') | df.columns.isin(['score','obs'])]

Python.pandas: how to select rows where objects start with letters 'PL'

If you use a string method on the Series that should return you a true/false result. You can then use that as a filter combined with .loc to create your data subset.

new_df = df.loc[df[‘Code’].str.startswith('pl')].copy()

Create a loop startswith specific string pandas

The cool thing about pandas is that you don't have to do these things in a loop.

import pandas as pd
data = [
['Closed', 'j.snow', 'Wrong Data. Contact your admin', 'Specific Group'],
['Closed', 'j.doe', 'General Issue', 'Master Group'],
['Closed', 'j.snow', 'Wrong Data. Contact your admin', 'Specific Group'],
['Closed', 'm.smith', 'Wrong Data. Contact your admin', 'Specific Group'],
['Closed', 'a.richards', 'Wrong Data. Contact your admin', 'Specific Group'],
['Closed', 'a.blecha', 'General Issue', 'Master Group'],
['Closed', 'r.kipling', 'Wrong Data. Contact your admin', 'First Group']
]

df = pd.DataFrame(data, columns=['status', 'created', 'short_desc', 'group'])
print(df)
# Pick only those rows where short_desc starts with "Wrong".
df1 = df[df['short_desc'].str.startswith('Wrong')]
# Pick only those rows where group is "Specific Group".
df1 = df1[df1['group']=='Specific Group']
# Print the "short_desc" column.
print(df1['short_desc'])

Or, in a single line:

df1 = df[
(df['short_desc'].str.startswith('Wrong')) &
(df['group']=='Specific Group')
]

This is pandas' "magic indexing". Those comparison operators return an array of booleans, True where the condition is true. When passing that to df[...], that returns only the rows where the array element is True.

DataFrame value startswith

By default a Series is returned when accessing a specific column and row in a DataFrame if you want a scalar value then you can access the array element using .values to return np array and then indexing into it:

In [101]:
df.loc[df["Event"].str.startswith("Bericht"), "Datum"].values[0]

Out[101]:
'15.05.2017'

For safety you should check whether your selection yields any results prior to indexing into it, otherwise you get a KeyError:

if len(df.loc[df["Event"].str.startswith("Bericht"), "Datum"]) > 0:
return df.loc[df["Event"].str.startswith("Bericht"), "Datum"].values[0]

want to know how to use startswith in python dataframe by considering two columns

You could use np.select:

conditions = [(df["A"].str[:4].isin(["INKA", "IDKA"]))|(df["B"].str[:4].isin(["INKA", "IDKA"])),
(df["A"].str[:4].isin(["INAP", "IDAP"]))|(df["B"].str[:4].isin(["INAP", "IDAP"])),
(df["A"].str[:4].isin(["INRJ", "IDRJ"]))|(df["B"].str[:4].isin(["INRJ", "IDRJ"]))]

df["C"] = np.select(conditions, ["KAR", "AP", "RAJ"], None)

Alternatively, you could use map and combine_first:

mapper = {"INKA": "KAR", "IDKA": "KAR", "INAP": "AP", "IDAP": "AP", "INRJ": "RAJ", "IDRJ": "RAJ"
df["C"] = df["A"].str[:4].map(mapper).combine_first(df["B"].str[:4].map(mapper))


Related Topics



Leave a reply



Submit