Pandas Select from Dataframe Using Startswith

pandas select from Dataframe using startswith

You can use the str.startswith DataFrame method to give more consistent results:

In [11]: s = pd.Series(['a', 'ab', 'c', 11, np.nan])

In [12]: s
Out[12]:
0      a
1     ab
2      c
3     11
4    NaN
dtype: object

In [13]: s.str.startswith('a', na=False)
Out[13]:
0     True
1     True
2    False
3    False
4    False
dtype: bool

and the boolean indexing will work just fine (I prefer to use loc, but it works just the same without):

In [14]: s.loc[s.str.startswith('a', na=False)]
Out[14]:
0     a
1    ab
dtype: object

It looks least one of your elements in the Series/column is a float, which doesn't have a startswith method hence the AttributeError, the list comprehension should raise the same error...

Python 3 Pandas Select Dataframe using Startswith + or

Try this:

df[df['Office'].str.contains("^(?:N|M|V|R)")]

or:

df[df['Office'].str.contains("^[NMVR]+")]

Demo:

In [91]: df
Out[91]:
        Office
0        No-No
1         AAAA
2    MicroHard
3       Valley
4        vvvvv
5   zzzzzzzzzz
6  Risk is fun

In [92]: df[df['Office'].str.contains("^(?:N|M|V|R)")]
Out[92]:
        Office
0        No-No
2    MicroHard
3       Valley
6  Risk is fun

In [93]: df[df['Office'].str.contains("^[NMVR]+")]
Out[93]:
        Office
0        No-No
2    MicroHard
3       Valley
6  Risk is fun

Selecting columns with startswith in pandas

Convert to Series is not necessary, but if want add to another list of columns convert output to list:

cols =  df.columns[df.columns.str.startswith('t')].tolist()

df = df[['score','obs'] + cols].rename(columns = {'treatment':'treat'})

Another idea is use 2 masks and chain by | for bitwise OR:

Notice:

Columns names are filtered from original columns names before rename in your solution, so is necessary rename later.

m1 = df.columns.str.startswith('t')
m2 = df.columns.isin(['score','obs'])

df = df.loc[:, m1 | m2].rename(columns = {'treatment':'treat'})
print (df)
   obs  treat   score  tr  tk
0    1      0  strong   1   6
1    2      1    weak   2   7
2    3      0  normal   3   8
3    1      1    weak   4   9
4    2      0  strong   5  10

If need rename first, is necessary reassign back for filter by renamed columns names:

df = df.rename(columns = {'treatment':'treat'})
df = df.loc[:, df.columns.str.startswith('t') | df.columns.isin(['score','obs'])]

Python.pandas: how to select rows where objects start with letters 'PL'

If you use a string method on the Series that should return you a true/false result. You can then use that as a filter combined with .loc to create your data subset.

new_df = df.loc[df[‘Code’].str.startswith('pl')].copy()

Create a loop startswith specific string pandas

The cool thing about pandas is that you don't have to do these things in a loop.

import pandas as pd
data = [
    ['Closed', 'j.snow', 'Wrong Data.  Contact your admin', 'Specific Group'],
    ['Closed', 'j.doe', 'General Issue', 'Master Group'],
    ['Closed', 'j.snow', 'Wrong Data.  Contact your admin', 'Specific Group'],
    ['Closed', 'm.smith', 'Wrong Data.  Contact your admin', 'Specific Group'],
    ['Closed', 'a.richards', 'Wrong Data.  Contact your admin', 'Specific Group'],
    ['Closed', 'a.blecha', 'General Issue', 'Master Group'],
    ['Closed', 'r.kipling', 'Wrong Data.  Contact your admin', 'First Group']
]

df = pd.DataFrame(data, columns=['status', 'created', 'short_desc', 'group'])
print(df)
# Pick only those rows where short_desc starts with "Wrong".
df1 = df[df['short_desc'].str.startswith('Wrong')]
# Pick only those rows where group is "Specific Group".
df1 = df1[df1['group']=='Specific Group'] 
# Print the "short_desc" column.
print(df1['short_desc'])

Or, in a single line:

df1 = df[
        (df['short_desc'].str.startswith('Wrong')) &
        (df['group']=='Specific Group')
    ]

This is pandas' "magic indexing". Those comparison operators return an array of booleans, True where the condition is true. When passing that to df[...], that returns only the rows where the array element is True.

DataFrame value startswith

By default a Series is returned when accessing a specific column and row in a DataFrame if you want a scalar value then you can access the array element using .values to return np array and then indexing into it:

In [101]:
df.loc[df["Event"].str.startswith("Bericht"), "Datum"].values[0]

Out[101]:
'15.05.2017'

For safety you should check whether your selection yields any results prior to indexing into it, otherwise you get a KeyError:

if len(df.loc[df["Event"].str.startswith("Bericht"), "Datum"]) > 0:
   return df.loc[df["Event"].str.startswith("Bericht"), "Datum"].values[0]

want to know how to use startswith in python dataframe by considering two columns

You could use np.select:

conditions = [(df["A"].str[:4].isin(["INKA", "IDKA"]))|(df["B"].str[:4].isin(["INKA", "IDKA"])),
              (df["A"].str[:4].isin(["INAP", "IDAP"]))|(df["B"].str[:4].isin(["INAP", "IDAP"])),
              (df["A"].str[:4].isin(["INRJ", "IDRJ"]))|(df["B"].str[:4].isin(["INRJ", "IDRJ"]))]

df["C"] = np.select(conditions, ["KAR", "AP", "RAJ"], None)

Alternatively, you could use map and combine_first:

mapper = {"INKA": "KAR", "IDKA": "KAR", "INAP": "AP", "IDAP": "AP", "INRJ": "RAJ", "IDRJ": "RAJ"
df["C"] = df["A"].str[:4].map(mapper).combine_first(df["B"].str[:4].map(mapper))

Pandas Select from Dataframe Using Startswith