Using Condition to Split Pandas Column of Lists into Multiple Columns.

Using condition to split pandas column of lists into multiple columns.

Use DataFrame.apply with a custom function that splits each list at the threshold (96):

import pandas as pd

def my_split(row):
    # values <= 96 go to the first half, values > 96 to the second half
    return pd.Series({
        '1st_half_T1': [i for i in row.Time1 if i <= 96],
        '2nd_half_T1': [i for i in row.Time1 if i > 96],
        '1st_half_T2': [i for i in row.Time2 if i <= 96],
        '2nd_half_T2': [i for i in row.Time2 if i > 96]
    })

df.apply(my_split, axis=1)

Out[]:
  1st_half_T1   1st_half_T2 2nd_half_T1 2nd_half_T2
0        [93]  [16, 48, 66]  [109, 187]       [128]
1          []            []       [159]  [123, 136]
2    [94, 96]          [40]  [154, 169]  [177, 192]
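
The same split can also be written without a row-wise apply by mapping over each list column. The following is only a minimal sketch: the input frame is a hypothetical reconstruction from the output above (the original input was not shown), and the names THRESHOLD and out are illustrative.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output shown above
df = pd.DataFrame({
    "Time1": [[93, 109, 187], [159], [94, 96, 154, 169]],
    "Time2": [[16, 48, 66, 128], [123, 136], [40, 177, 192]],
})

THRESHOLD = 96  # boundary between the two halves, taken from the answer above

out = pd.DataFrame(index=df.index)
for col, tag in [("Time1", "T1"), ("Time2", "T2")]:
    # values up to and including the threshold form the first half
    out[f"1st_half_{tag}"] = df[col].map(lambda xs: [x for x in xs if x <= THRESHOLD])
    # values above the threshold form the second half
    out[f"2nd_half_{tag}"] = df[col].map(lambda xs: [x for x in xs if x > THRESHOLD])

print(out)
```

Unlike the sorted column order in the output above, the columns here appear in insertion order.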

Split a column into multiple columns with condition

Assuming the "Value" column contains strings, you can use str.split and pivot like so:

value = df["Value"].str.split(",").explode().astype(int).reset_index()
output = value.pivot(index="index", columns="Value", values="Value")
output = output.reindex(range(value["Value"].min(), value["Value"].max()+1), axis=1)

>>> output

Value    1    2    3    4   5    6    7    8    9
index                                            
0      1.0  NaN  NaN  NaN NaN  NaN  NaN  NaN  NaN
1      1.0  NaN  3.0  NaN NaN  NaN  NaN  NaN  NaN
2      NaN  NaN  NaN  4.0 NaN  6.0  NaN  8.0  NaN
3      1.0  NaN  3.0  NaN NaN  NaN  NaN  NaN  NaN
4      NaN  2.0  NaN  NaN NaN  NaN  7.0  NaN  9.0

Input `df`:

df = pd.DataFrame({"Value": ["1", "1,3", "4,6,8", "1,3", "2,7,9"]})
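
If only the presence of each value matters (0/1 flags instead of the value itself and NaN), Series.str.get_dummies may be a shorter route. This is a sketch of an alternative, not the method of the answer above; the name dummies is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Value": ["1", "1,3", "4,6,8", "1,3", "2,7,9"]})

# one 0/1 indicator column per distinct token in the comma-separated strings
dummies = df["Value"].str.get_dummies(sep=",")

# the column labels come back as strings; convert them to integers
dummies.columns = dummies.columns.astype(int)

# add the values that never occur (here 5) as all-zero columns, in numeric order
dummies = dummies.reindex(
    columns=range(dummies.columns.min(), dummies.columns.max() + 1),
    fill_value=0,
)
print(dummies)
```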

pandas - split column with arrays into multiple columns and count values

from collections import Counter

import numpy as np
import pandas as pd

# seed after importing numpy so the sample data is reproducible
np.random.seed(2022)

df = pd.DataFrame(data=[[[np.random.randint(1,7) for _ in range(10)] for _ in range(5)]], 
                  index=["col1"])
df = df.transpose()

You can use Series.explode with SeriesGroupBy.value_counts and reshape by Series.unstack:

df1 = (df['col1'].explode()
                 .groupby(level=0)
                 .value_counts()
                 .unstack(fill_value=0)
                 .add_prefix('col')
                 .rename_axis(None, axis=1))
print (df1)
   col1  col2  col3  col4  col5  col6
0     4     2     1     0     1     2
1     3     2     0     4     0     1
2     3     1     3     2     0     1
3     1     1     3     0     1     4
4     1     1     1     1     3     3

Or use list comprehension with Counter and DataFrame constructor:

df1 = (pd.DataFrame([Counter(x) for x in df['col1']])
         .sort_index(axis=1)
         .fillna(0)
         .astype(int)
         .add_prefix('col'))
print (df1)
   col1  col2  col3  col4  col5  col6
0     4     2     1     0     1     2
1     3     2     0     4     0     1
2     3     1     3     2     0     1
3     1     1     3     0     1     4
4     1     1     1     1     3     3
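
To check that the two approaches agree without relying on random data, here is a small sketch on a fixed, hypothetical input. The names via_explode and via_counter are illustrative.

```python
from collections import Counter

import pandas as pd

# Hypothetical fixed input: each cell holds a list of values between 1 and 6
df = pd.DataFrame({"col1": [[1, 1, 2, 6], [3, 3, 3, 5], [2, 4, 4, 6]]})

# approach 1: explode, count per original row, then pivot the counts to columns
via_explode = (df["col1"].explode()
                 .groupby(level=0)
                 .value_counts()
                 .unstack(fill_value=0)
                 .add_prefix("col")
                 .rename_axis(None, axis=1))

# approach 2: one Counter per row, assembled by the DataFrame constructor
via_counter = (pd.DataFrame([Counter(x) for x in df["col1"]])
                 .sort_index(axis=1)
                 .fillna(0)
                 .astype(int)
                 .add_prefix("col"))

print(via_explode)
print(via_counter)
```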

Split lists into multiple columns in a pandas DataFrame

Instead of appending on every iteration, collect the rows in a list inside the loop and build the DataFrame once afterwards:

d_list = []

for index, row in df.iterrows():
    for value in str(row['hobbies']).split(';'):
        d_list.append({'name': row['name'],
                       'value': value})

# DataFrame.append was removed in pandas 2.0, so build the frame from the list directly
df3 = pd.DataFrame(d_list, columns=['name', 'value'])
df3 = df3.groupby('name')['value'].value_counts()
df3 = df3.unstack(level=-1).fillna(0)
df3

I checked how much time it would take for your example dataframe. With the improvement I suggest, it's ~50 times faster.
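
The Python loop can probably be dropped altogether with str.split and explode. The sketch below assumes a frame with a name column and a semicolon-separated hobbies column, as in the loop above; the sample rows and the names long_df and counts are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with the column names used in the loop above
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann"],
    "hobbies": ["chess;golf", "golf", "chess;tennis"],
})

# split each string into a list, then give every hobby its own row
long_df = (df.assign(value=df["hobbies"].astype(str).str.split(";"))
             .explode("value")
             .reset_index(drop=True))

# count hobbies per name and pivot the counts into columns
counts = long_df.groupby("name")["value"].value_counts().unstack(fill_value=0)
print(counts)
```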

Split a column containing a list into multiple rows in Pandas based on a condition

After using literal_eval to evaluate the strings in column status as Python lists, you can use one of the following approaches.

For pandas version >= 0.25 you can use explode:

# Explode dataframe
df_out = df.explode('status').reset_index(drop=True)

# fill the NaN with empty lists
df_out['status'] = df_out['status'].dropna().reindex(df_out.index, fill_value=[])

For pandas version < 0.25, where explode is not available, you can replicate the explode-like behaviour using index.repeat, then flatten the nested lists using chain:

from itertools import chain

l = df['status'].str.len()
m = l > 0

df_out = df.reindex(df[m].index.repeat(l[m]))
df_out['status'] = list(chain(*df.loc[m, 'status']))
# pd.concat is used here because DataFrame.append was removed in pandas 2.0
df_out = pd.concat([df_out, df[~m]]).sort_index().reset_index(drop=True)

>>> df_out

                        date                       status
0 2021-02-06 08:18:01.212763     [London, New York, BUSY]
1 2021-02-06 08:17:01.018633        [Mumbai, Tokyo, IDLE]
2 2021-02-06 08:16:01.182888   [Amsterdam, Chicago, IDLE]
3 2021-02-06 08:16:01.182888    [Amsterdam, London, IDLE]
4 2021-02-06 08:16:01.182888    [Amsterdam, Berlin, BUSY]
5 2021-02-06 08:15:01.245619        [Tokyo, Moscow, IDLE]
6 2021-02-06 07:18:01.413066  [Mumbai, Los Angeles, IDLE]
7 2021-02-06 07:18:01.413066       [Mumbai, Berlin, IDLE]
8 2021-02-06 07:17:01.154138                           []
9 2021-02-06 07:16:01.253111                           []
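
For reference, a minimal end-to-end sketch of the explode route, including the literal_eval step. The sample rows are hypothetical, and the empty-list fill uses a plain isinstance check rather than the reindex call above.

```python
import ast

import pandas as pd

# Hypothetical sample: status holds string representations of lists of lists
df = pd.DataFrame({
    "date": ["2021-02-06 08:18", "2021-02-06 08:16", "2021-02-06 07:17"],
    "status": [
        "[['London', 'New York', 'BUSY']]",
        "[['Amsterdam', 'Chicago', 'IDLE'], ['Amsterdam', 'London', 'IDLE']]",
        "[]",
    ],
})

# turn the strings into real Python lists
df["status"] = df["status"].apply(ast.literal_eval)

# one row per inner list; rows with an empty outer list become NaN
df_out = df.explode("status").reset_index(drop=True)

# replace the NaN left by empty lists with an empty list again
df_out["status"] = df_out["status"].apply(lambda v: v if isinstance(v, list) else [])
print(df_out)
```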

Pandas: Split and/or update columns, based on inconsistent data?

Use:

# second words of the two-word city names
cities = ['York','Angeles']

# flag the rows whose city name contains a space
m = df['Team'].str.contains('|'.join(cities))

# split every row at the first space into two new columns
df[['City','Franchise']] = df['Team'].str.split(n=1, expand=True)
# for the flagged rows only, split at the first two spaces
s = df.loc[m, 'Team'].str.split(n=2)

# overwrite the flagged rows with the corrected values
df.update(pd.concat([s.str[:2].str.join(' '), s.str[2]], axis=1, ignore_index=True).set_axis(['City','Franchise'], axis=1))
print (df)
                Team      City  Franchise
0    New York Giants  New York     Giants
1     Atlanta Braves   Atlanta     Braves
2       Chicago Cubs   Chicago       Cubs
3  Chicago White Sox   Chicago  White Sox
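
If the multi-word city names are known up front, a single str.extract call might be enough. This is a sketch under that assumption: the list multiword_cities and the sample frame (reconstructed from the output above) are hypothetical.

```python
import re

import pandas as pd

# Hypothetical input, reconstructed from the output shown above
df = pd.DataFrame({"Team": ["New York Giants", "Atlanta Braves",
                            "Chicago Cubs", "Chicago White Sox"]})

# assumed list of city names that contain a space
multiword_cities = ["New York", "Los Angeles"]

# try the known multi-word cities first, then fall back to a single word
city_alternatives = "|".join(re.escape(c) for c in multiword_cities)
pattern = rf"^(?P<City>{city_alternatives}|\S+)\s+(?P<Franchise>.+)$"

df[["City", "Franchise"]] = df["Team"].str.extract(pattern)
print(df)
```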

How to split a list-column in a Pandas Dataframe based on a condition on the elements of the list?

df_out = pd.DataFrame(
    df["combined_list"]
    .apply(lambda x: list(zip(*[s.split("|") for s in x])))
    .tolist(),
    columns=["countries", "abbreviations"],
)
print(df_out)

Prints:

                                       countries     abbreviations
0  (Netherlands, Germany, United_States, Poland)  (NL, DE, US, PL)
1                (Netherlands, Austria, Belgium)      (NL, AU, BE)
2                       (United_States, Germany)          (US, DE)

To have lists in the columns:

df_out = pd.DataFrame(
    df["combined_list"]
    .apply(lambda x: list(map(list, zip(*[s.split("|") for s in x]))))
    .tolist(),
    columns=["countries", "abbreviations"],
)
print(df_out)

Prints:

                                       countries     abbreviations
0  [Netherlands, Germany, United_States, Poland]  [NL, DE, US, PL]
1                [Netherlands, Austria, Belgium]      [NL, AU, BE]
2                       [United_States, Germany]          [US, DE]
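
An alternative sketch that stays within pandas methods: explode the lists, split each item on the pipe, then collect the pieces back per row. The input below is a hypothetical reconstruction from the printed output, and parts is an illustrative name.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output shown above
df = pd.DataFrame({"combined_list": [
    ["Netherlands|NL", "Germany|DE", "United_States|US", "Poland|PL"],
    ["Netherlands|NL", "Austria|AU", "Belgium|BE"],
    ["United_States|US", "Germany|DE"],
]})

# one row per "country|abbreviation" item, split into two columns
parts = df["combined_list"].explode().str.split("|", expand=True)
parts.columns = ["countries", "abbreviations"]

# gather the pieces back into one list per original row
df_out = parts.groupby(level=0).agg(list)
print(df_out)
```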

How to split a multiline text in a dataframe column into multiple columns, using start and end words as a pattern to capture the text in between

Even though it's not entirely clear what you want to achieve, I think you're looking for extract.

Make some test data

import pandas as pd
import re # for the re.DOTALL flag

data = {"index_id": ["ROW1"],
        "raw_text": ["STARTWORD: MULTILINE TEXT TO COL1 ENDWORD MULTILINE\nTEXT TO COL2 ENDWORD2 MULTILINE TEXT TO COL3 ENDWORD3"]}

df = pd.DataFrame(data).set_index("index_id")

df looks like this:

                                                   raw_text
index_id                                                   
ROW1      STARTWORD: MULTILINE TEXT TO COL1 ENDWORD MULT...

Extract the columns

The following matches everything in between the split words as long as the order in the list matches the order of their occurrence in the raw string.
(You need the re.DOTALL flag so the dot . matches newlines, too.)

split_words = ["STARTWORD:", "ENDWORD", "ENDWORD2", "ENDWORD3"]

new_df = df.raw_text.str.extract("(.+)".join(split_words), flags=re.DOTALL)

Result

                                 0                          1                         2
index_id                                                                               
ROW1       MULTILINE TEXT TO COL1    MULTILINE\nTEXT TO COL2    MULTILINE TEXT TO COL3
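
A slightly more defensive variant, offered as a sketch: the split words are escaped with re.escape in case they contain regex metacharacters, the groups are made lazy, and the surrounding whitespace is stripped. The column names col1 to col3 are hypothetical.

```python
import re

import pandas as pd

data = {"index_id": ["ROW1"],
        "raw_text": ["STARTWORD: MULTILINE TEXT TO COL1 ENDWORD MULTILINE\nTEXT TO COL2 ENDWORD2 MULTILINE TEXT TO COL3 ENDWORD3"]}
df = pd.DataFrame(data).set_index("index_id")

split_words = ["STARTWORD:", "ENDWORD", "ENDWORD2", "ENDWORD3"]

# lazy groups stop at the first occurrence of the next split word
pattern = "(.+?)".join(re.escape(word) for word in split_words)

new_df = df["raw_text"].str.extract(pattern, flags=re.DOTALL)

# remove the spaces left over around each captured piece
new_df = new_df.apply(lambda col: col.str.strip())
new_df.columns = ["col1", "col2", "col3"]  # illustrative names
print(new_df)
```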

Python: how to split a column into multiple columns in a dataframe, with dynamic column naming

We can reconstruct your data with tolist and pd.DataFrame. Then concat everything together again:

d = [pd.DataFrame(df[col].tolist()).add_prefix(col) for col in df.columns]
df = pd.concat(d, axis=1)

   id0  id1   id2  value0  value1  value2
0   10   10   NaN   apple  orange    None
1   15   67   NaN  banana  orange    None
2   12   34  45.0   apple  banana  orange
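
One caveat: pd.DataFrame(df[col].tolist()) creates a fresh 0..n-1 index, so a non-default index would be lost. The sketch below passes the original index along. The input is a hypothetical reconstruction from the output above, with an assumed string index to make the point visible.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output above, with a custom index
df = pd.DataFrame(
    {"id": [[10, 10], [15, 67], [12, 34, 45]],
     "value": [["apple", "orange"], ["banana", "orange"],
               ["apple", "banana", "orange"]]},
    index=["a", "b", "c"],
)

# expand each list column and keep the original row labels
parts = [pd.DataFrame(df[col].tolist(), index=df.index).add_prefix(col)
         for col in df.columns]
out = pd.concat(parts, axis=1)
print(out)
```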

How can I split Pandas dataframe column with strings according to multiple conditions

`Series.str.split`

s = df['Col.A'].str.split(r'\s+(?=\b(?:dark|\d+)\b)', n=1, expand=True)
df[['ID']].join(s.set_axis(['Col.A', 'Col.B'], axis=1))

      ID                     Col.A                          Col.B
0  28654                 This is a  dark chocolate which is sweet
1  39876               Sky is blue        1234 Sky is cloudy 3423
2  88776  Stars can be seen in the                       dark sky
3  35491        Schools are closed        4568 but shops are open

Regex details:

\s+ : Matches any whitespace character one or more times
(?=\b(?:dark|\d+)\b) : Positive Lookahead
- \b : Word boundary to prevent partial matches
- (?:dark|\d+): Non capturing group
  - dark : First Alternative matches the characters dark literally
  - \d+ : Second alternative which matches any digit one or more times
- \b : Word boundary to prevent partial matches
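
The behaviour of the pattern can be checked on single strings with the standard re module. This is a small sketch; the third sample sentence is made up to show the effect of the word boundaries.

```python
import re

pattern = re.compile(r"\s+(?=\b(?:dark|\d+)\b)")

# splits once, at the whitespace before the first standalone number
by_number = pattern.split("Sky is blue 1234 Sky is cloudy 3423", maxsplit=1)

# splits once, at the whitespace before the standalone word "dark"
by_word = pattern.split("This is a dark chocolate which is sweet", maxsplit=1)

# "darkness" is not the whole word "dark", so nothing is split
no_split = pattern.split("In darkness we wait", maxsplit=1)

print(by_number)
print(by_word)
print(no_split)
```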
