Splitting a Pandas Dataframe Column by Delimiter

Splitting a pandas dataframe column by delimiter

Use vectorised str.split with expand=True:

In [42]:
df[['V','allele']] = df['V'].str.split('-',expand=True)
df

Out[42]:
      ID    Prob      V allele
0   3009  1.0000  IGHV7   B*01
1    129  1.0000  IGHV7   B*01
2    119  0.8000  IGHV6   A*01
3    120  0.8056  IGHV6   A*01
4    121  0.9000  IGHV6   A*01
5    122  0.8050  IGHV6   A*01
6    130  1.0000  IGHV4   L*03
7   3014  1.0000  IGHV4   L*03
8    266  0.9970  IGHV5   A*01
9    849  0.4010  IGHV5   A*04
10   174  1.0000  IGHV6   A*02
11   844  1.0000  IGHV6   A*02
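For reference, here is a self-contained sketch of the same idea; the sample frame below is reconstructed from the first few rows of the output, so treat it as illustrative:

import pandas as pd

df = pd.DataFrame({'ID': [3009, 129, 119],
                   'Prob': [1.0000, 1.0000, 0.8000],
                   'V': ['IGHV7-B*01', 'IGHV7-B*01', 'IGHV6-A*01']})

# expand=True turns the pieces into separate columns, which are then assigned
# back: the prefix overwrites V and the suffix becomes allele
df[['V', 'allele']] = df['V'].str.split('-', expand=True)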

Pandas split column into multiple columns by comma

In case someone else wants to split a single column (delimited by a value) into multiple columns, try this:

series.str.split(',', expand=True)

This answered the question I came here looking for.

Credit to EdChum's code that includes adding the split columns back to the dataframe.

pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)

Note: the first argument, df[[0]], is a DataFrame (hence the double brackets).

The second argument, df[1].str.split(', ', expand=True), is the Series you want to split, expanded into a DataFrame of its pieces.
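As a runnable sketch of the same pattern (the integer column labels 0 and 1 and the sample values are assumptions, chosen to match the snippet above):

import pandas as pd

df = pd.DataFrame({0: ['row1', 'row2'], 1: ['a, b, c', 'x, y, z']})

# column 0 is kept as-is; the pieces of column 1 come back as new columns
out = pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)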

split Documentation

concat Documentation

split pandas column into many using delimiter from right to left

Like in pure Python, use rsplit:

df = pd.DataFrame({'variable': ["hi.this.is.an.example"]})

df['variable'].str.rsplit('.', n=2, expand=True)

Output:

            0   1        2
0  hi.this.is  an  example
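For comparison, this mirrors the built-in str.rsplit on a plain Python string, with a maximum of two splits counted from the right:

'hi.this.is.an.example'.rsplit('.', 2)   # ['hi.this.is', 'an', 'example']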

pandas split column by some specific words and keep this delimiter

Borrowing from Tim's answer, use a lookbehind regex to split on "and" or "or" without consuming the separating string in the split:

import pandas

d = {'a': ["abc123and321abcor213cba", "abc321or123cbaand321cba"]}

df = pandas.DataFrame(data=d)

df[["b", "c", "d"]] = df['a'].str.split(r'(?<=and)|(?<=or)', expand=True)

Output:

                          a          b          c       d
0  abc123and321abcor213cba  abc123and   321abcor  213cba
1  abc321or123cbaand321cba   abc321or  123cbaand  321cba

Pandas split column by combined delimiters

So I managed to find an answer. Basically, to treat the whole string "|" (quote, pipe, quote) as a single delimiter, it has to be escaped like this:

df.str.split(r'"\|"', expand=True)
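A minimal sketch with made-up data (the Series values are assumptions; the point is that the pipe inside the three-character delimiter "|" must be escaped, because a multi-character pattern is treated as a regex):

import pandas as pd

s = pd.Series(['alpha"|"beta"|"gamma'])   # hypothetical data joined by "|"

s.str.split(r'"\|"', expand=True)   # -> one row: 'alpha', 'beta', 'gamma'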

Split dataframe column with second column as delimiter

Try apply.

bigdata[['title', 'location']] = bigdata.apply(
    func=lambda row: row['title_location'].split(row['delimiter']),
    axis=1,
    result_type="expand",
)
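For a quick test, a made-up frame in the same shape (the column names follow the snippet above; the values are hypothetical):

import pandas as pd

bigdata = pd.DataFrame({'title_location': ['Engineer@London', 'Sales-Paris'],
                        'delimiter': ['@', '-']})

Running the apply above then yields title = ['Engineer', 'Sales'] and location = ['London', 'Paris'], because each row is split on its own delimiter.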

Splitting the column values based on a delimiter (Pandas)

Just use the apply function with split:

df['AA_IDs'].apply(lambda x: x.split('-#'))

This should give you a Series with a list for each row, like ['AFB001 9183Daily', '789876A'].

This is typically faster than a regex-based approach, not to mention more readable.
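A self-contained version, with the input string reconstructed from the example list above:

import pandas as pd

df = pd.DataFrame({'AA_IDs': ['AFB001 9183Daily-#789876A']})

# each value becomes a Python list, split on the literal '-#'
df['AA_IDs'].apply(lambda x: x.split('-#'))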

splitting a pandas column by delimiter, two different sizes in the rows

In the case where there are no spaces in FirstName and LastName (otherwise, how would you distinguish them?):

pattern = (r'^(?P<Initials>\w+)\s'
           + r'(?P<FName>\w+)\s'
           + r'(?P<LName>\w+)\s'
           + r'(?P<SignupTime>\d+/\d+/\d+ \d+:\d+ \w+)\s'
           + r'(?P<Waiver>.*)'
           )

df['name'].str.extract(pattern)

Output:

  Initials      FName     LName           SignupTime                Waiver
0       DA  Firstname  Lastname  09/30/2020 07:44 AM  9/23/2020 6:06:38 PM
1       JW  Firstname  Lastname  10/25/2020 11:06 AM                  None

Update: For optional Initials, you can try this pattern:

pattern = (r'^(?P<Initials>\w+\s)?'    # make Initials optional
           + r'(?P<FName>\w+)\s+'
           + r'(?P<LName>\w+)\s+'
           + r'(?P<SignupTime>\d+/\d+/\d+ \d+:\d+ \w+)\s'
           + r'(?P<Waiver>.*)'
           )

Note that if Initials is present, it will now carry a trailing space, which you can easily strip afterwards.
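For example, a one-liner to drop it after the extract (out is just a name for the extracted frame, introduced here for illustration):

out = df['name'].str.extract(pattern)
out['Initials'] = out['Initials'].str.strip()   # remove the trailing space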

Split Dataframe column on delimiter when number of strings to split is not definite

You can convert the strings to lists with str.split() inside the .map() method:

df['B'] = df['B'].map(lambda x: x.split(';'))

And then use .explode():

df.explode('B').reset_index(drop=True)
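Put together, with a made-up frame (the column name B follows the answer; A and the values are hypothetical):

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': ['x;y;z', 'p;q']})

df['B'] = df['B'].map(lambda x: x.split(';'))      # each cell becomes a list
df = df.explode('B').reset_index(drop=True)        # one row per list element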

Split column in several columns by delimiter '\' in pandas

It looks like your file is actually tab-delimited, because of the "\t" characters. This may work:

pd.read_csv('file.txt', sep='\t', skiprows=8)
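If a column really does contain literal backslashes rather than tabs, a split on them would look like this (the values below are made up; a single-character pattern is treated literally, so no regex escaping is needed):

import pandas as pd

s = pd.Series([r'a\b\c'])

s.str.split('\\', expand=True)   # -> columns 'a', 'b', 'c'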

