Extract Text After "/" in a Data Frame Column

How to extract part of a string in Pandas column and make a new column

Use str.extract with a regex and str.replace to rename values:

dff['Version_short'] = dff['Name'].str.extract(r'_(V\d+)$', expand=False).fillna('')
dff['Version_long'] = dff['Version_short'].str.replace('V', 'Version ', regex=False)

Output:

>>> dff
    col1  col3            Name        Date Version_short Version_long
0      1     1  2a df a1asd_V1  2021-06-13            V1    Version 1
1      2    22    xcd a2asd_V3  2021-06-13            V3    Version 3
2      3    33   23vg aabsd_V1  2021-06-13            V1    Version 1
3      4    44  dfgdf_aabsd_V0  2021-06-14            V0    Version 0
4      5    55      a3as  d_V1  2021-06-15            V1    Version 1
5     60    60       aa bsd_V3  2021-06-15            V3    Version 3
6      0     1         aasd_V4  2021-06-13            V4    Version 4
7      0     5        aabsd_V4  2021-06-16            V4    Version 4
8      6     6   aa_adn sd_V15  2021-06-13           V15   Version 15
9      3     3             NaN  2021-06-13                           
10     2     2        aasd_V12  2021-06-13           V12   Version 12
11     4     4      aasd120Abs  2021-06-16
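
For reference, a minimal self-contained sketch of the same idea. The four sample names are taken from the rows above; the small frame itself is only an illustration, not the original data.

```python
import pandas as pd

# Small illustrative frame; the names mimic rows from the output above.
dff = pd.DataFrame({"Name": ["2a df a1asd_V1", "aa_adn sd_V15", None, "aasd120Abs"]})

# expand=False returns a Series, so the result fits into a single column.
# Rows without a trailing "_V<digits>" give NaN, which fillna turns into "".
dff["Version_short"] = dff["Name"].str.extract(r"_(V\d+)$", expand=False).fillna("")

# regex=False treats "V" as a literal string rather than a pattern.
dff["Version_long"] = dff["Version_short"].str.replace("V", "Version ", regex=False)

print(dff[["Version_short", "Version_long"]])
```

The `$` anchor is what keeps names such as aasd120Abs from matching: the version tag has to sit at the very end of the string.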

How to extract entire part of string after certain character in dataframe column?

Use str.split and take the last piece with .str[-1]. Names that do not contain the separator are returned unchanged, so the no-match case is handled too:

df = pd.DataFrame(columns=[
    'data.answers.1234567890.value.0987654321', 'blahblah.value.12345', 'foo'])

df.columns = df.columns.str.split('value.').str[-1]
df.columns
# Index(['0987654321', '12345', 'foo'], dtype='object')

Another alternative is splitting inside a listcomp:

df.columns = [x.split('value.')[-1] for x in df.columns]
df.columns
# Index(['0987654321', '12345', 'foo'], dtype='object')
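
The same trick works on an ordinary column. The sketch below uses made-up paths (an assumption, not data from the question) to show the difference between keeping everything after the first "/" and keeping only what follows the last one.

```python
import pandas as pd

# Hypothetical values; the last one contains no "/" at all.
s = pd.Series(["root/sub/file.txt", "a/b", "no_delimiter"])

# n=1 splits only once, so everything after the FIRST "/" stays together.
after_first = s.str.split("/", n=1).str[-1]

# rsplit starts from the right, so this keeps only what follows the LAST "/".
after_last = s.str.rsplit("/", n=1).str[-1]

print(after_first.tolist())
print(after_last.tolist())
```

In both cases a value without the delimiter comes back unchanged, because the split yields a one-element list and [-1] picks that element.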

Extract elements from data column (String) before and after character

I am not really sure if this is what you want, but it does the work:

regions = []
for i in df['Region'].str.split('.').str[0]:
    regions.append(''.join([d for d in i if d.isdigit()]))

df['BGC Region'] = df['Strain'].str.split('_').str[2] + '_' + regions + '.region'

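# Zero-pad the region number to three digits ('7' -> '007') and append it.
# Assigning the whole column avoids the chained assignment df[col][i] += ...,
# and numbers of 100 or more are appended too (the loop skipped them).
df['BGC Region'] += df['Region'].str.split('.').str[1].str.zfill(3)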
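
Since the question's data is not shown here, the following is only a sketch under assumed input formats: the 'Strain' and 'Region' values are hypothetical, chosen so that the split positions used above make sense.

```python
import pandas as pd

# Hypothetical input; the real 'Strain' and 'Region' formats are assumptions.
df = pd.DataFrame({
    "Strain": ["GCF_0001_ABC1", "GCF_0002_XYZ9"],
    "Region": ["Region 1.2", "Region 12.34"],
})

parts = df["Region"].str.split(".")
# Keep only the digits of the text before the dot ("Region 1" -> "1").
region_id = parts.str[0].str.replace(r"\D", "", regex=True)
# Zero-pad the number after the dot to three digits ("2" -> "002").
region_no = parts.str[1].str.zfill(3)

df["BGC Region"] = (
    df["Strain"].str.split("_").str[2] + "_" + region_id + ".region" + region_no
)
print(df["BGC Region"].tolist())
```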

Extracting Specific Text From column in dataframe

We can use regex to extract the necessary part of the string.

Here we are checking for at least one character in [A-C] followed by zero or more digits [0-9]:

data['extract'] = data.Description.str.extract(r'([A-C]+[0-9]*)')

or (based on need)

data['extract'] = data.Description.str.extract(r'([A-C]+[0-9]+)')

Output

    Description             extract
0   ABC12345679 132465      ABC12345679
1   Test ABC12346548        ABC12346548
2   Test ABC1231321 4645    ABC1231321
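
To see where the two patterns differ, here is a small sketch. The first two descriptions come from the output above; the third row is a made-up case in which letters without digits appear first.

```python
import pandas as pd

data = pd.DataFrame({"Description": [
    "ABC12345679 132465",
    "Test ABC12346548",
    "CAB code ABC123",  # hypothetical row: bare letters come before the code
]})

# '*' allows zero digits, so the bare letters "CAB" already count as a match.
data["star"] = data["Description"].str.extract(r"([A-C]+[0-9]*)", expand=False)

# '+' requires at least one digit, so "CAB" is skipped in favour of "ABC123".
data["plus"] = data["Description"].str.extract(r"([A-C]+[0-9]+)", expand=False)

print(data)
```

If the code always carries digits, the `+` variant is the safer choice, since it will not stop at an earlier run of capital A, B or C.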

To Extract Substring from Column of DataFrame

Try with str.findall:

>>> df["NE Name"].str.findall(r"/([^/]{4})")
0                      [01HJ]
1    [01HL, 02HL, 03HL, 10HL]
2    [01HL, 02HL, 03HL, 10HL]
3    [01HL, 02HL, 03HL, 10HL]
4    [01HL, 02HL, 03HL, 10HL]
Name: NE Name, dtype: object

Input DataFrame:

>>> df
                                                   NE Name  Subrack ID  pattern
0                                            10100000/01HJ           0     01HJ
1  10100000/01HL&10100000/02HL&10100000/03HL&10100000/10HL           1     01HJ
2  10100000/01HL&10100000/02HL&10100000/03HL&10100000/10HL           0     01HJ
3  10100000/01HL&10100000/02HL&10100000/03HL&10100000/10HL           2     01HJ
4  10100000/01HL&10100000/02HL&10100000/03HL&10100000/10HL           3     01HJ
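
findall returns a list per row, which is often awkward to work with afterwards. A possible follow-up, sketched on two of the rows above, is to either join each list into one string or explode it into one row per code; which of the two fits depends on what the result is used for.

```python
import pandas as pd

df = pd.DataFrame({"NE Name": [
    "10100000/01HJ",
    "10100000/01HL&10100000/02HL&10100000/03HL&10100000/10HL",
]})

# Each match is the four non-"/" characters that follow a "/".
codes = df["NE Name"].str.findall(r"/([^/]{4})")

# Option 1: one comma-separated string per row.
df["codes"] = codes.str.join(",")

# Option 2: one row per code; the original index is repeated.
long_form = codes.explode()

print(df["codes"].tolist())
print(long_form.tolist())
```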

Python pandas: remove everything after a delimiter in a string

You can use pandas.Series.str.split just like you would use split normally. Just split on the string '::', and index the list that's created from the split method:

>>> df = pd.DataFrame({'text': ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]})
>>> df
                 text
0  vendor a::ProductA
1  vendor b::ProductA
2  vendor a::Productb
>>> df['text_new'] = df['text'].str.split('::').str[0]
>>> df
                 text  text_new
0  vendor a::ProductA  vendor a
1  vendor b::ProductA  vendor b
2  vendor a::Productb  vendor a

Here's a non-pandas solution:

>>> df['text_new1'] = [x.split('::')[0] for x in df['text']]
>>> df
                 text  text_new text_new1
0  vendor a::ProductA  vendor a  vendor a
1  vendor b::ProductA  vendor b  vendor b
2  vendor a::Productb  vendor a  vendor a

Edit: Here's the step-by-step explanation of what's happening in pandas above:

# Select the pandas.Series object you want
>>> df['text']
0    vendor a::ProductA
1    vendor b::ProductA
2    vendor a::Productb
Name: text, dtype: object

# using pandas.Series.str allows us to implement "normal" string methods 
# (like split) on a Series
>>> df['text'].str
<pandas.core.strings.StringMethods object at 0x110af4e48>

# Now we can use the split method to split on our '::' string. You'll see that
# a Series of lists is returned (just like what you'd see outside of pandas)
>>> df['text'].str.split('::')
0    [vendor a, ProductA]
1    [vendor b, ProductA]
2    [vendor a, Productb]
Name: text, dtype: object

# using the pandas.Series.str method, again, we will be able to index through
# the lists returned in the previous step
>>> df['text'].str.split('::').str
<pandas.core.strings.StringMethods object at 0x110b254a8>

# now we can grab the first item in each list above for our desired output
>>> df['text'].str.split('::').str[0]
0    vendor a
1    vendor b
2    vendor a
Name: text, dtype: object

I would suggest checking out the pandas.Series.str docs, or, better yet, Working with Text Data in pandas.
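
If both sides of the delimiter are needed, one option is to let split return columns directly. This is a sketch on the same sample frame; the column names vendor and product are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"text": ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]})

# expand=True returns a DataFrame with one column per piece;
# n=1 caps the result at two columns even if "::" occurs again later.
df[["vendor", "product"]] = df["text"].str.split("::", n=1, expand=True)

print(df)
```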

Extracting text after a phrase and in between spaces from Pandas Dataframe

You get the match Jacobs because the pattern \w+(?=\s+FLEX\s) matches one or more word characters and asserts that what is directly to the right is whitespace followed by FLEX. In other words, it matches the word before FLEX, not the words after it.

Instead, you can use a pattern with a capture group to match 2 words after FLEX:

\bFLEX\s+(\w+\s+\w+)

Or a broader match:

\bFLEX\s+(\S+\s+\S+)

\bFLEX A word boundary, match FLEX
\s+ Match 1+ whitespace chars
(\S+\s+\S+) Capture group 1 match 1+ non whitespace chars, 1+ whitespace chars and again 1+ non whitespace chars

import pandas as pd

strings = ['QB Aaron Rodgers RB Josh Jacobs FLEX Davante Adams']
df = pd.DataFrame(strings, columns=["Lineup"])
df['Lineup'] = df["Lineup"].str.extract(r'\bFLEX\s+(\S+\s+\S+)')
print(df)

Output

          Lineup
0  Davante Adams

If you want to match 2 or more words, you could use a repeating non capture group:

\bFLEX\s+(\w+(?:\s+\w+)+)
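
A short sketch comparing the two-word pattern with the repeating one. The first lineup is the one used above; the second, with a three-word name, is hypothetical.

```python
import pandas as pd

lineups = pd.Series([
    "QB Aaron Rodgers RB Josh Jacobs FLEX Davante Adams",
    "QB Tom Brady FLEX Odell Beckham Jr",  # hypothetical three-word name
])

# Exactly two words after FLEX.
two_words = lineups.str.extract(r"\bFLEX\s+(\w+\s+\w+)", expand=False)

# Two or more words: the non-capturing group (?:\s+\w+) may repeat.
two_or_more = lineups.str.extract(r"\bFLEX\s+(\w+(?:\s+\w+)+)", expand=False)

print(two_words.tolist())
print(two_or_more.tolist())
```

Keep in mind that the repeating group is greedy: it keeps consuming whitespace-separated words until it reaches a non-word character or the end of the string, so it only fits when FLEX is the last position in the lineup.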

Extract a certain part of a string after a key phrase using pandas?

You can use the Series str.extract string method:

In [11]: df = pd.DataFrame([["(12:25) (No Huddle Shotgun) P.Manning pass short left to W.Welker pushed ob at DEN 34 for 10 yards (C.Graham)."]])

In [12]: df
Out[12]:
                                                   0
0  (12:25) (No Huddle Shotgun) P.Manning pass sho...

This will "extract" what's in the group (inside the parentheses). Passing expand=False returns a Series rather than a one-column DataFrame:

In [13]: df[0].str.extract(r"for (\d+)", expand=False)
Out[13]:
0    10
Name: 0, dtype: object

In [14]: df[0].str.extract(r"for (\d+) yards", expand=False)
Out[14]:
0    10
Name: 0, dtype: object

You'll need to convert to int, e.g. using astype(int).
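
A minimal sketch of that conversion. The first play is the one from above; the second is a made-up row so that the Series has more than one value.

```python
import pandas as pd

plays = pd.Series([
    "(12:25) (No Huddle Shotgun) P.Manning pass short left to W.Welker "
    "pushed ob at DEN 34 for 10 yards (C.Graham).",
    "(11:40) K.Moreno up the middle for 4 yards.",  # hypothetical second play
])

# The extracted values are strings ("10", "4") ...
yards = plays.str.extract(r"for (\d+) yards", expand=False)

# ... so convert them before doing arithmetic.
yards_int = yards.astype(int)

print(yards_int.tolist())
```

Note that astype(int) raises an error if any row failed to match, because the missing value cannot be converted to an integer.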

Pandas DataFrame - Extract string between two strings and include the first delimiter

You can accomplish this entirely within the regex, without any string slicing.

df['field'] = df.string_value.str.extract(r'(FILE.*(?=\.txt))')

FILE is what we begin the match on
.* grabs any number of characters
(?=\.txt) is a lookahead assertion: it requires .txt to follow, but matches without consuming it. The dot is escaped so that it matches a literal "." rather than any character.
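
A small sketch of the lookahead pattern in action. The three sample strings are invented for illustration; only the column names string_value and field come from the answer.

```python
import pandas as pd

# Hypothetical sample strings.
df = pd.DataFrame({"string_value": [
    "path/FILE_2021_report.txt",
    "FILE.v2.txt.bak",
    "no match here",
]})

# The match starts at FILE and stops right before ".txt";
# the lookahead checks for ".txt" without including it in the result.
df["field"] = df["string_value"].str.extract(r"(FILE.*(?=\.txt))", expand=False)

print(df)
```

Rows that do not contain FILE followed by .txt come back as NaN.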

Handy regex tool https://pythex.org/

Extracting number from string only when string is present in a dataframe

Use Series.str.extract with the regex pattern r'(?:^|\s)(\d+)':

(?:^|\s) matches the beginning of the string ('^') or ('|') any whitespace character ('\s') without capturing it ((?:...))
(\d+) captures one or more digits (greedy)

df['Item Code'] = df['Item Code'].str.extract(r'(?:^|\s)(\d+)', expand=False)

Note that the values of 'Item Code' are still strings after the extraction. If you want to convert them to integers, use Series.astype:

df['Item Code'] = df['Item Code'].str.extract(r'(?:\s|^)(\d+)', expand=False).astype(int)
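
When some rows contain no number at all, the plain astype(int) call fails on the missing values. One possible way around that, sketched here on made-up item codes, is to go through pd.to_numeric and pandas' nullable "Int64" dtype.

```python
import pandas as pd

# Hypothetical item codes; two of them have no standalone number.
df = pd.DataFrame({"Item Code": ["1234 Widget", "Widget 567", "Widget", "A12"]})

codes = df["Item Code"].str.extract(r"(?:^|\s)(\d+)", expand=False)

# "Widget" has no digits, and the digits in "A12" follow a letter rather than
# the start of the string or whitespace, so both rows are missing.
# to_numeric keeps them as missing; "Int64" stores them as <NA>.
df["Item Code"] = pd.to_numeric(codes).astype("Int64")

print(df)
```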