Splitting a pandas dataframe column by delimiter
Use the vectorized str.split with expand=True:
In [42]:
df[['V','allele']] = df['V'].str.split('-',expand=True)
df
Out[42]:
ID Prob V allele
0 3009 1.0000 IGHV7 B*01
1 129 1.0000 IGHV7 B*01
2 119 0.8000 IGHV6 A*01
3 120 0.8056 GHV6 A*01
4 121 0.9000 IGHV6 A*01
5 122 0.8050 IGHV6 A*01
6 130 1.0000 IGHV4 L*03
7 3014 1.0000 IGHV4 L*03
8 266 0.9970 IGHV5 A*01
9 849 0.4010 IGHV5 A*04
10 174 1.0000 IGHV6 A*02
11 844 1.0000 IGHV6 A*02
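For anyone who wants to run this without the original data, here is a minimal sketch. The two-row frame is made up for illustration, and n=1 is my own addition (not part of the answer above) so that a value containing more than one '-' still yields exactly two columns.

```python
import pandas as pd

# Hypothetical sample shaped like the 'V' column in the question
df = pd.DataFrame({"ID": [3009, 119], "V": ["IGHV7-B*01", "IGHV6-A*01"]})

# Split on the first '-' only; expand=True returns a DataFrame of parts
parts = df["V"].str.split("-", n=1, expand=True)
df["V"] = parts[0]
df["allele"] = parts[1]
print(df)
```

If some rows have no '-' at all, the second column comes back as missing for those rows rather than raising.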
Pandas split column into multiple columns by comma
In case someone else wants to split a single column (delimited by a value) into multiple columns, try this:
series.str.split(',', expand=True)
This answered the question I came here looking for.
Credit to EdChum's code that includes adding the split columns back to the dataframe.
pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)
Note: The first argument, df[[0]], is a DataFrame. The second argument, df[1].str.split, is called on the Series that you want to split.
split Documentation
concat Documentation
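A small runnable sketch of the concat approach, assuming a frame with integer column labels 0 and 1 as in the snippet above. The sample values and the names "first" and "second" are placeholders of mine. Renaming the split columns is worth doing because the expanded frame is also labelled 0, 1, ... and would otherwise duplicate the label 0.

```python
import pandas as pd

# Hypothetical frame: column 0 is kept as-is, column 1 is split
df = pd.DataFrame({0: ["a", "b"], 1: ["x, y", "z, w"]})

split_cols = df[1].str.split(", ", expand=True)
split_cols.columns = ["first", "second"]  # avoid a duplicate column label 0

result = pd.concat([df[[0]], split_cols], axis=1)
print(result)
```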
split pandas column into many using delimiter from right to left
Like in pure Python, use rsplit:
df = pd.DataFrame({'variable': ["hi.this.is.an.example"]})
df['variable'].str.rsplit('.', n=2, expand=True)  # pass n by keyword; pandas 2.x no longer accepts it positionally
output:
0 1 2
0 hi.this.is an example
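To see the difference between splitting from the right and from the left, here is a short sketch on the same string. Both calls cap the number of splits at two; only the direction changes.

```python
import pandas as pd

df = pd.DataFrame({"variable": ["hi.this.is.an.example"]})

# At most two splits, counted from the right
right = df["variable"].str.rsplit(".", n=2, expand=True)
# At most two splits, counted from the left
left = df["variable"].str.split(".", n=2, expand=True)

print(right.iloc[0].tolist())  # ['hi.this.is', 'an', 'example']
print(left.iloc[0].tolist())   # ['hi', 'this', 'is.an.example']
```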
pandas split column by some specific words and keep this delimiter
Borrowing from Tim's answer, use a lookbehind regex to split on "and" or "or" without consuming the separating string in the split:
d = {'a': ["abc123and321abcor213cba", "abc321or123cbaand321cba"]}
df = pandas.DataFrame(data=d)
df[["b", "c", "d"]] = df['a'].str.split(r'(?<=and)|(?<=or)', expand=True)
Output:
a b c d
0 abc123and321abcor213cba abc123and 321abcor 213cba
1 abc321or123cbaand321cba abc321or 123cbaand 321cba
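A stripped-down comparison may make the lookbehind clearer. This is only a sketch on a made-up one-row Series: the lookbehind matches the empty position right after the word, so the word stays attached to the piece on its left, whereas a plain alternation consumes it. It relies on Python 3.7 or later, where splitting on zero-width matches is supported.

```python
import pandas as pd

s = pd.Series(["1and2or3"])  # hypothetical sample value

# Zero-width lookbehind: split after the word, keep the word
kept = s.str.split(r"(?<=and)|(?<=or)", expand=True)
# Plain alternation: the word itself is the delimiter and is dropped
dropped = s.str.split(r"and|or", expand=True)

print(kept.iloc[0].tolist())     # ['1and', '2or', '3']
print(dropped.iloc[0].tolist())  # ['1', '2', '3']
```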
Pandas split column by combined delimiters
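So I managed to find an answer. To treat the whole three-character string "|" (quote, pipe, quote) as a single delimiter, escape the pipe so that it is not read as regex alternation, and call str.split on the column itself (a Series; a DataFrame has no .str accessor):
series.str.split(r'"\|"', expand=True)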
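If you would rather not escape by hand, one option is to let re.escape build the pattern. This is a sketch on invented sample values, not the asker's data.

```python
import re
import pandas as pd

# Hypothetical values where fields are separated by the literal string "|"
s = pd.Series(['a"|"b"|"c', 'd"|"e"|"f'])

delimiter = '"|"'
# re.escape turns the pipe into \| so it is matched literally
out = s.str.split(re.escape(delimiter), expand=True)
print(out)
```

Recent pandas versions also accept regex=False on str.split, which avoids the escaping question entirely.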
Split dataframe column with second column as delimiter
Try apply:
bigdata[['title', 'location']]=bigdata.apply(func=lambda row: row['title_location'].split(row['delimiter']), axis=1, result_type="expand")
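Here is a runnable sketch of the row-wise version. The column names follow the line above, but the two sample rows are invented, and the maxsplit of 1 is my addition so that a delimiter appearing again inside the location does not produce a third piece.

```python
import pandas as pd

# Hypothetical data: each row carries its own delimiter
bigdata = pd.DataFrame({
    "title_location": ["Engineer-Berlin", "Analyst|Paris"],
    "delimiter": ["-", "|"],
})

# Plain str.split treats the delimiter literally, so '|' needs no escaping
bigdata[["title", "location"]] = bigdata.apply(
    lambda row: row["title_location"].split(row["delimiter"], 1),
    axis=1,
    result_type="expand",
)
print(bigdata)
```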
Splitting the column values based on a delimiter (Pandas)
Just use the apply function with split:
df['AA_IDs'].apply(lambda x: x.split('-#'))
This should give you a Series with a list for each row, such as [AFB001 9183Daily, 789876A].
A plain string split is likely to be faster than a regex-based split here, and it is easier to read.
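A quick sketch comparing the apply version with the vectorized accessor. The sample value is my guess at what the input looked like, reconstructed from the list shown above. Both produce the same lists on clean data; the practical difference is that the lambda raises on missing values, while .str.split passes them through.

```python
import pandas as pd

# Assumed input, inferred from the expected output in the answer
df = pd.DataFrame({"AA_IDs": ["AFB001 9183Daily-#789876A"]})

as_lists = df["AA_IDs"].apply(lambda x: x.split("-#"))
vectorized = df["AA_IDs"].str.split("-#")

print(as_lists.iloc[0])  # ['AFB001 9183Daily', '789876A']
```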
splitting a pandas column by delimiter, two different sizes in the rows
In the case where there are no spaces in FirstName and LastName (otherwise, how would you distinguish them?):
# raw strings keep \w, \s and \d from being read as string escapes
pattern = (r'^(?P<Initials>\w+)\s'
+ r'(?P<FName>\w+)\s'
+ r'(?P<LName>\w+)\s'
+ r'(?P<SignupTime>\d+/\d+/\d+ \d+:\d+ \w+)\s'
+ r'(?P<Waiver>.*)'
)
df['name'].str.extract(pattern)
Output:
Initials FName LName SignupTime Waiver
0 DA Firstname Lastname 09/30/2020 07:44 AM 9/23/2020 6:06:38 PM
1 JW Firstname Lastname 10/25/2020 11:06 AM None
Update: For optional Initials, you can try this pattern:
pattern = (r'^(?P<Initials>\w+\s)?' # make initial optional
+ r'(?P<FName>\w+)\s+'
+ r'(?P<LName>\w+)\s+'
+ r'(?P<SignupTime>\d+/\d+/\d+ \d+:\d+ \w+)\s'
+ r'(?P<Waiver>.*)'
)
Note that if Initials exists, it will now carry a trailing space, which you can remove with .str.strip().
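Putting the optional-initials pattern together with the cleanup step, as a sketch. The two sample rows are invented to resemble the format described; the real data may differ, for instance rows with nothing after the signup time would not match this pattern as written.

```python
import pandas as pd

# Hypothetical rows: one with initials, one without
df = pd.DataFrame({"name": [
    "DA John Smith 09/30/2020 07:44 AM signed",
    "Jane Doe 10/25/2020 11:06 AM pending",
]})

pattern = (
    r"^(?P<Initials>\w+\s)?"                       # optional initials
    r"(?P<FName>\w+)\s+"
    r"(?P<LName>\w+)\s+"
    r"(?P<SignupTime>\d+/\d+/\d+ \d+:\d+ \w+)\s"
    r"(?P<Waiver>.*)"
)

out = df["name"].str.extract(pattern)
# Drop the trailing space captured with the initials; missing values stay missing
out["Initials"] = out["Initials"].str.strip()
print(out)
```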
Split Dataframe column on delimiter when number of strings to split is not definite
You can convert each string to a list by calling the string's .split() inside the .map() method:
df['B'] = df['B'].map(lambda x: x.split(';'))
And then use .explode():
df.explode('B').reset_index(drop=True)
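The two steps end to end, as a minimal sketch on a made-up frame. I use .str.split here instead of .map with a lambda; on clean string data it gives the same lists.

```python
import pandas as pd

# Hypothetical data: column B holds a variable number of ';'-separated items
df = pd.DataFrame({"A": [1, 2], "B": ["x;y;z", "w"]})

df["B"] = df["B"].str.split(";")                     # each cell becomes a list
long_df = df.explode("B").reset_index(drop=True)     # one row per list item
print(long_df)
```

Each value in A is repeated once per item, so the first row above becomes three rows.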
Split column in several columns by delimiter '\' in pandas
It looks like your file is tab-delimited, because of the "\t". This may work:
pd.read_csv('file.txt', sep='\t', skiprows=8)
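To try the idea without the file, here is a sketch that reads tab-separated text from memory. The content and the two preamble lines are invented; in the answer above the preamble is eight lines long, hence skiprows=8.

```python
import io
import pandas as pd

# Hypothetical file content: two preamble lines, then tab-separated data
raw = "# preamble line 1\n# preamble line 2\ncol_a\tcol_b\n1\tx\n2\ty\n"

# skiprows drops the preamble so the third line becomes the header
df = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=2)
print(df)
```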