How to Split a Dataframe String Column into Two Columns

How to split a dataframe string column into two columns?

There might be a better way, but here's one approach:

                            row
    0       00000 UNITED STATES
    1             01000 ALABAMA
    2  01001 Autauga County, AL
    3  01003 Baldwin County, AL
    4  01005 Barbour County, AL

# n=1 splits only on the first space; n must be passed by keyword in pandas 2.x
df = pd.DataFrame(df.row.str.split(' ', n=1).tolist(),
                  columns=['fips', 'row'])

   fips                 row
0  00000       UNITED STATES
1  01000             ALABAMA
2  01001  Autauga County, AL
3  01003  Baldwin County, AL
4  01005  Barbour County, AL
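
A variant of the same idea, as a sketch: `Series.str.split` also accepts `expand=True`, which returns the pieces as DataFrame columns directly and avoids the round trip through `.tolist()`. The sample rows are copied from the table above; the column name `name` is an illustrative choice, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"row": ["00000 UNITED STATES",
                           "01000 ALABAMA",
                           "01001 Autauga County, AL"]})

# n=1 limits the split to the first space, so multi-word names stay intact.
parts = df["row"].str.split(" ", n=1, expand=True)
parts.columns = ["fips", "name"]
print(parts)
```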

How to split a string column into two columns with a 'variable' delimiter?

Use Series.str.split with the regex \s+\.+\s+, which splits on one or more whitespace characters, followed by one or more periods, followed by one or more whitespace characters:

df = pd.DataFrame({'A': ['Mayor ............... Paul Jones', 'Senator ................. Billy Twister', 'Congress Rep. .......... Chris Rock', 'Chief of Staff ....... Tony Allen']})

# raw string so the backslashes reach the regex engine unchanged
df[['Title', 'Name']] = df['A'].str.split(r'\s+\.+\s+', expand=True)

#                                          A           Title           Name
# 0         Mayor ............... Paul Jones           Mayor     Paul Jones
# 1  Senator ................. Billy Twister         Senator  Billy Twister
# 2      Congress Rep. .......... Chris Rock   Congress Rep.     Chris Rock
# 3        Chief of Staff ....... Tony Allen  Chief of Staff     Tony Allen
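
To see what the pattern matches, it can help to try it on a single plain string with Python's `re` module first. This is only an illustration of the regex, not a replacement for the pandas call:

```python
import re

pattern = r"\s+\.+\s+"  # 1+ whitespace, 1+ literal periods, 1+ whitespace

# The period in "Rep." is not preceded by whitespace, so it is not a split point.
title, name = re.split(pattern, "Congress Rep. .......... Chris Rock")
print(title)  # Congress Rep.
print(name)   # Chris Rock
```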

How to split a dataframe column into 2 new columns, by slicing all the strings before the last item and the last item

There are certainly a lot of ways of doing this. I would use the str accessor with rpartition. rpartition splits each string into three components: the part before the last occurrence of the separator, the separator itself, and the part after it. Taking the first and last components (columns 0 and 2) gives the two new columns.

df[["beginning", "ending"]] = df.street.str.rpartition(" ")[[0, 2]]
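
The three-part result is easiest to see on a plain Python string, where `str.rpartition` behaves the same way as the pandas method does per row. The address below is a made-up sample, not data from the question:

```python
# str.rpartition splits on the LAST occurrence of the separator
# and always returns a 3-tuple: (before, separator, after).
before, sep, after = "12 Main Street North".rpartition(" ")
print(before)  # 12 Main Street
print(after)   # North

# If the separator is absent, the whole string lands in the last slot.
no_sep = "Broadway".rpartition(" ")
print(no_sep)  # ('', '', 'Broadway')
```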

How to split a dataframe column into two columns and transform values in one expression using Python?

As pointed out in @Ynjxsjmh's comment you can use .assign(), but you'd need a lambda function to give you access to the current state of the dataframe (you need access to both new columns):

df = pd.DataFrame({"Col1": ["123-abc"] * 3 + ["12345-abcde"] * 3})

df[["Col1", "Col2"]] = (
    df["Col1"]
    .str.split("-", expand=True)
    .rename(columns={0: "C1", 1: "C2"})
    .assign(C2=lambda df: df["C2"].where(df["C1"].str.len().ne(5), df["C2"].str[::-1]))
)
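
For comparison, here is the same transformation as a sketch in separate steps, which may be easier to follow. The intermediate names `parts` and `is_five` are illustrative. With the sample data, the second part is reversed only where the first part has exactly five characters:

```python
import pandas as pd

df = pd.DataFrame({"Col1": ["123-abc"] * 3 + ["12345-abcde"] * 3})

parts = df["Col1"].str.split("-", expand=True)   # columns 0 and 1
is_five = parts[0].str.len().eq(5)               # True where the prefix has 5 chars

df["Col1"] = parts[0]
# Series.mask replaces values where the condition is True.
df["Col2"] = parts[1].mask(is_five, parts[1].str[::-1])
print(df)
```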

How to split a dataframe column into two columns

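In R, read.table can do the split: sep = ':' separates the two fields, and dec = '/' makes a value such as 1/0 read as the number 1.0:

read.table(text = df$X1, sep = ':', fill = TRUE, header = FALSE, dec = '/')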
   V1    V2
1  NA      
2 1.0  0.82
3 1.1 1.995
4 0.1 1.146
5  NA      
6 1.1 1.995

If you want the columns converted to their respective data types:

type.convert(read.table(text = df$X1, sep = ':', fill = TRUE, header = FALSE, dec = '/'), as.is = TRUE)
   V1    V2
1  NA    NA
2 1.0 0.820
3 1.1 1.995
4 0.1 1.146
5  NA    NA
6 1.1 1.995

Data:

df <- structure(list(X1 = c(NA, "1/0:0.82", "1/1:1.995", "0/1:1.146", NA,
                 "1/1:1.995")), class = "data.frame", row.names = c(NA, -6L))

How to split a Pandas DataFrame column into multiple columns if the column is a string of varying length?

You can try using str.rsplit:

Splits string around given separator/delimiter, starting from the right.

df['Col_1'].str.rsplit(' ', n=2, expand=True)

Output:

             0  1  2
0        Hello  X  Y
1  Hello world  Q  R
2           Hi  S  T

As a full dataframe:

df['Col_1'].str.rsplit(' ', n=2, expand=True).add_prefix('nCol_').join(df)

Output:

        nCol_0 nCol_1 nCol_2            Col_1 Col_2
0        Hello      X      Y        Hello X Y     A
1  Hello world      Q      R  Hello world Q R     B
2           Hi      S      T           Hi S T     C
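
Why splitting from the right matters here can be shown on one of the sample strings with plain Python, whose `str.rsplit` and `str.split` follow the same left/right rule. This is a small illustration only:

```python
text = "Hello world Q R"

# rsplit counts its splits from the right, so the last two tokens are peeled off.
from_right = text.rsplit(" ", 2)
print(from_right)  # ['Hello world', 'Q', 'R']

# split counts from the left, which would break up the leading phrase instead.
from_left = text.split(" ", 2)
print(from_left)   # ['Hello', 'world', 'Q R']
```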

How to split a dataframe string column into multiple columns?

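One way is to apply a lambda that splits each tag on & to get the key=value pairs, then splits the chosen pair on = to keep the value: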

import pandas as pd

tags = [
    "letter1=A&letter2=B&letter3=C",
    "letter1=D&letter2=E&letter3=F",
    "letter1=G&letter2=H&letter3=I",
    "letter1=J&letter2=K&letter3=L",
    "letter1=M&letter2=N&letter3=O",
    "letter1=P&letter2=R&letter3=S"
]
df = pd.DataFrame({"tags": tags})

df["letter1"] = df["tags"].apply(lambda x: x.split("&")[0].split("=")[-1])
df["letter2"] = df["tags"].apply(lambda x: x.split("&")[1].split("=")[-1])
df["letter3"] = df["tags"].apply(lambda x: x.split("&")[2].split("=")[-1])
df = df[["letter1", "letter2", "letter3"]]
df
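
Because the sample strings look like URL query strings, the standard library's `urllib.parse.parse_qsl` can also do the parsing. This is an alternative sketch that assumes every tag is a well-formed `key=value&...` string; it was not part of the original answer.

```python
from urllib.parse import parse_qsl

tags = [
    "letter1=A&letter2=B&letter3=C",
    "letter1=D&letter2=E&letter3=F",
]

# parse_qsl returns (key, value) pairs; dict() turns each tag into a mapping.
records = [dict(parse_qsl(tag)) for tag in tags]
print(records)
```

A list of dicts like `records` can then be passed to `pd.DataFrame(records)` to get one column per key.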


Split column into multiple columns when a row starts with a string

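Try this. The regex marks each row that starts a new group, cumsum turns those marks into a block number for every row, and concat places the blocks side by side: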

pd.concat([sub.reset_index(drop=True) for _, sub in df.groupby(
    df.Group.str.contains(r'^Group\s+123').cumsum())], axis=1)

Output:

    Group           Group           Group
0   Group 123 nv-1  Group 123 mt-d2 Group 123 id-01
1   a, v            b, v            n,m
2   s,b             NaN             x, y
3   y, i            NaN             z, m
4   NaN             NaN             l,b
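
The grouping key is the part that does the work, so here is a small sketch of it in isolation. The rows are a shortened version of the values visible in the output above, and the names `is_header` and `block_id` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"Group": ["Group 123 nv-1", "a, v", "s,b",
                             "Group 123 mt-d2", "b, v"]})

# True on each header row; the running sum then gives every row a block number.
is_header = df["Group"].str.contains(r"^Group\s+123")
block_id = is_header.cumsum()
print(block_id.tolist())  # [1, 1, 1, 2, 2]
```

Grouping by `block_id` therefore yields one sub-frame per header row, which is what the list comprehension iterates over.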

Python split one column into multiple columns and reattach the split columns into original dataframe

The original data has a unique index, and the code below does not change it in either DataFrame, so you can stack the two pieces with concat and then add them to the original with DataFrame.join or with concat and axis=1:

address = df['Residence'].str.split(';',expand=True)
country = address[0] != 'USA'
USA, nonUSA = address[~country], address[country]
USA.columns = ['Country', 'State', 'County', 'City']

nonUSA = nonUSA.dropna(axis=0, subset=[1])
nonUSA = nonUSA[nonUSA.columns[0:2]]
# order changed: rename the columns after slicing to avoid an error
nonUSA.columns = ['Country', 'State']

df = pd.concat([df, pd.concat([USA, nonUSA])], axis=1)

Or:

df = df.join(pd.concat([USA, nonUSA]))
print (df)
  ID                       Residence    Name Gender Country State  \
0  1  USA;CA;Los Angeles;Los Angeles     Ann      F     USA    CA   
1  2           USA;MA;Suffolk;Boston   Betty      F     USA    MA   
2  3                       Canada;ON    Carl      M  Canada    ON   
3  4                USA;FL;Charlotte   David      M     USA    FL   
4  5                              NA   Emily      F     NaN   NaN   
5  6                       Canada;QC   Frank      M  Canada    QC   
6  7                          USA;AZ  George      M     USA    AZ   

        County         City  
0  Los Angeles  Los Angeles  
1      Suffolk       Boston  
2          NaN          NaN  
3    Charlotte         None  
4          NaN          NaN  
5          NaN          NaN  
6         None         None

But it seems this can be simplified:

c = ['Country', 'State', 'County', 'City']
df[c] = df['Residence'].str.split(';',expand=True)
print (df)
  ID                       Residence    Name Gender Country State  \
0  1  USA;CA;Los Angeles;Los Angeles     Ann      F     USA    CA   
1  2           USA;MA;Suffolk;Boston   Betty      F     USA    MA   
2  3                       Canada;ON    Carl      M  Canada    ON   
3  4                USA;FL;Charlotte   David      M     USA    FL   
4  5                              NA   Emily      F      NA  None   
5  6                       Canada;QC   Frank      M  Canada    QC   
6  7                          USA;AZ  George      M     USA    AZ   

        County         City  
0  Los Angeles  Los Angeles  
1      Suffolk       Boston  
2         None         None  
3    Charlotte         None  
4         None         None  
5         None         None  
6         None         None
