Replicating Rows in a Pandas Data Frame by a Column Value

Python: How to replicate rows in Dataframe with column value but changing the column value to its range

You can repeat each row by its Table value with Index.repeat, and then do a groupby().cumcount() on the repeated index to renumber the copies:

import pandas as pd
out = df.loc[df.index.repeat(df['Table'])].copy()
out['Table'] = out.groupby(level=0).cumcount() + 1

Output:

   Store  Aisle  Table
0     11     59      1
0     11     59      2
1     11     61      1
1     11     61      2
1     11     61      3
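For comparison, a different technique (not the one above) turns each Table value n into the list 1..n and explodes it into rows. This is a minimal sketch, assuming the Store/Aisle/Table input implied by the output above:

```python
import pandas as pd

# Assumed input, reconstructed from the output above.
df = pd.DataFrame({"Store": [11, 11], "Aisle": [59, 61], "Table": [2, 3]})

# Replace each Table value n with [1, ..., n], then give every list element
# its own row; explode keeps the original index labels.
out = (
    df.assign(Table=df["Table"].map(lambda n: list(range(1, n + 1))))
      .explode("Table")
      .astype({"Table": int})  # explode leaves the column as object dtype
)
print(out)
```

Unlike Index.repeat, explode turns an empty list (Table == 0) into a NaN row instead of dropping it, so the astype call would fail in that case.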

Python - Replicate rows in Pandas Dataframe based on condition

First, create a boolean mask that checks your condition using the isin() method:

import pandas as pd

mask = df[columns].isin(values).any(axis=1)

Then use reindex() with Index.repeat to repeat those rows rep_times times, and pd.concat() to add back the rows that don't satisfy the condition (DataFrame.append() was removed in pandas 2.0):

df = pd.concat([df.reindex(df[mask].index.repeat(rep_times)), df[~mask]])
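To see both steps end to end, here is a small sketch; the column name A, the value list [1] and rep_times = 2 are hypothetical stand-ins for columns, values and rep_times above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1], "B": ["x", "y", "z"]})
columns, values, rep_times = ["A"], [1], 2  # hypothetical placeholders

# True for rows where any of the chosen columns holds one of the values.
mask = df[columns].isin(values).any(axis=1)

# Repeat the matching rows rep_times times, then add the other rows back.
out = pd.concat([df.reindex(df[mask].index.repeat(rep_times)), df[~mask]])
print(out)
```

The non-matching rows end up at the bottom; calling sort_index(kind="mergesort") afterwards restores the original row order.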

How to replicate rows based on value of a column in same pandas dataframe

Try reindex together with Index.repeat:

out = df.reindex(df.index.repeat(df['count']))
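A minimal runnable version, assuming a frame with a count column as in the question (the item column is hypothetical). Note that reindex raises a ValueError when the index already contains duplicate labels; a position-based selection with .iloc avoids that.

```python
import pandas as pd

df = pd.DataFrame({"item": ["a", "b", "c"], "count": [2, 0, 1]})

# Each index label is repeated count times; a count of 0 drops the row.
out = df.reindex(df.index.repeat(df["count"]))
print(out)
```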

Replicating rows in a pandas data frame by a column value

You can use Index.repeat to get repeated index values based on the column, then select from the DataFrame:

df2 = df.loc[df.index.repeat(df.n)]

  id  n   v
0  A  1  10
1  B  2  13
1  B  2  13
2  C  3   8
2  C  3   8
2  C  3   8

Or you could use np.repeat to get the repeated indices and then use that to index into the frame:

import numpy as np
df2 = df.loc[np.repeat(df.index.values, df.n)]

  id  n   v
0  A  1  10
1  B  2  13
1  B  2  13
2  C  3   8
2  C  3   8
2  C  3   8

After that, there's only a bit of cleanup left:

df2 = df2.drop("n", axis=1).reset_index(drop=True)

  id   v
0  A  10
1  B  13
2  B  13
3  C   8
4  C   8
5  C   8

Note that if you might have duplicate index labels to worry about, you can use .iloc instead:

df.iloc[np.repeat(np.arange(len(df)), df["n"])].drop("n", axis=1).reset_index(drop=True)

  id   v
0  A  10
1  B  13
2  B  13
3  C   8
4  C   8
5  C   8

which uses the positions, and not the index labels.
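Putting the pieces together, a self-contained sketch; the sample frame is rebuilt from the printed outputs, so its exact dtypes are an assumption.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the outputs above.
df = pd.DataFrame({"id": ["A", "B", "C"], "n": [1, 2, 3], "v": [10, 13, 8]})

# Label-based: repeat each index label n times, then select.
by_label = df.loc[df.index.repeat(df["n"])]

# Position-based: also works when the index has duplicate labels.
by_position = df.iloc[np.repeat(np.arange(len(df)), df["n"])]

# Drop the helper column and renumber the rows 0..N-1.
df2 = by_position.drop("n", axis=1).reset_index(drop=True)
print(df2)
```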

Pandas - replicate rows with new column value from a list for each replication

Here is a way using the keys parameter of pd.concat():

(pd.concat([df]*len(New_Cost_List),
           keys = New_Cost_List,
           names = ['New_Cost',None])
 .reset_index(level=0))

Output:

   New_Cost State  Cost
0         1     A     2
1         1     B     9
2         1     C     8
3         1     D     4
0         5     A     2
1         5     B     9
2         5     C     8
3         5     D     4
0        10     A     2
1        10     B     9
2        10     C     8
3        10     D     4
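A runnable version of the same idea; the State/Cost frame and New_Cost_List = [1, 5, 10] are reconstructed from the output above and are assumptions about the original input.

```python
import pandas as pd

# Sample data reconstructed from the output above (assumed input).
df = pd.DataFrame({"State": ["A", "B", "C", "D"], "Cost": [2, 9, 8, 4]})
New_Cost_List = [1, 5, 10]

# One copy of df per list entry; keys label each copy in an outer index
# level named New_Cost, which reset_index(level=0) turns into a column.
out = (pd.concat([df] * len(New_Cost_List),
                 keys=New_Cost_List,
                 names=["New_Cost", None])
       .reset_index(level=0))
print(out)
```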

Replicate row in Pandas dataframe based on condition and change values for a specific column

You can use pandas.Index.repeat to repeat each row Duration + 1 times (one row per year from Start_Year through End_Year), and then use pandas.core.groupby.GroupBy.cumcount to add increasing cumulative values to the Start_Year column.

Reading data

import pandas as pd

data = [[1500, 1501, ['A','B'], ['C','D'], 1],
        [1500, 1510, ['P','Q','R'], ['X','Y'], 10],
        [1520, 1520, ['A','X'], ['C'], 0],
        [1809, 1820, ['M'], ['F','H','Z'], 11]]
df = pd.DataFrame(data, columns = ['Start_Year', 'End_Year', 'Opp1', 'Opp2', 'Duration'])

Repeating the values

mask = df['Duration'].gt(0)
df1 = df[mask].copy()
df1 = df1.loc[df1.index.repeat(df1['Duration'] + 1)]

Assigning increasing values to each group

# group on the repeated index label, so each original row is numbered on its own
df1['Start_Year'] += df1.groupby(level=0).cumcount()

Generating output

df1['Duration'] = df1['End_Year'] - df1['Start_Year']
df = pd.concat([df1, df[~mask]]).sort_index(kind = 'mergesort').reset_index(drop=True)

This gives us the expected output:

    Start_Year  End_Year       Opp1       Opp2  Duration
0         1500      1501     [A, B]     [C, D]         1
1         1501      1501     [A, B]     [C, D]         0
2         1500      1510  [P, Q, R]     [X, Y]        10
3         1501      1510  [P, Q, R]     [X, Y]         9
4         1502      1510  [P, Q, R]     [X, Y]         8
5         1503      1510  [P, Q, R]     [X, Y]         7
6         1504      1510  [P, Q, R]     [X, Y]         6
7         1505      1510  [P, Q, R]     [X, Y]         5
8         1506      1510  [P, Q, R]     [X, Y]         4
9         1507      1510  [P, Q, R]     [X, Y]         3
10        1508      1510  [P, Q, R]     [X, Y]         2
11        1509      1510  [P, Q, R]     [X, Y]         1
12        1510      1510  [P, Q, R]     [X, Y]         0
13        1520      1520     [A, X]        [C]         0
14        1809      1820        [M]  [F, H, Z]        11
15        1810      1820        [M]  [F, H, Z]        10
16        1811      1820        [M]  [F, H, Z]         9
17        1812      1820        [M]  [F, H, Z]         8
18        1813      1820        [M]  [F, H, Z]         7
19        1814      1820        [M]  [F, H, Z]         6
20        1815      1820        [M]  [F, H, Z]         5
21        1816      1820        [M]  [F, H, Z]         4
22        1817      1820        [M]  [F, H, Z]         3
23        1818      1820        [M]  [F, H, Z]         2
24        1819      1820        [M]  [F, H, Z]         1
25        1820      1820        [M]  [F, H, Z]         0

Alternatively

You can also work the other way around after the Repeating the values step: first assign Duration as a decreasing cumulative count, then recalculate Start_Year from End_Year.

df1['Duration'] = df1.groupby(level=0).cumcount(ascending=False)
df1['Start_Year'] = df1['End_Year'] - df1['Duration']
df = pd.concat([df1, df[~mask]]).sort_index(kind = 'mergesort').reset_index(drop=True)

This gives the same expected output:

    Start_Year  End_Year       Opp1       Opp2  Duration
0         1500      1501     [A, B]     [C, D]         1
1         1501      1501     [A, B]     [C, D]         0
2         1500      1510  [P, Q, R]     [X, Y]        10
3         1501      1510  [P, Q, R]     [X, Y]         9
4         1502      1510  [P, Q, R]     [X, Y]         8
5         1503      1510  [P, Q, R]     [X, Y]         7
6         1504      1510  [P, Q, R]     [X, Y]         6
7         1505      1510  [P, Q, R]     [X, Y]         5
8         1506      1510  [P, Q, R]     [X, Y]         4
9         1507      1510  [P, Q, R]     [X, Y]         3
10        1508      1510  [P, Q, R]     [X, Y]         2
11        1509      1510  [P, Q, R]     [X, Y]         1
12        1510      1510  [P, Q, R]     [X, Y]         0
13        1520      1520     [A, X]        [C]         0
14        1809      1820        [M]  [F, H, Z]        11
15        1810      1820        [M]  [F, H, Z]        10
16        1811      1820        [M]  [F, H, Z]         9
17        1812      1820        [M]  [F, H, Z]         8
18        1813      1820        [M]  [F, H, Z]         7
19        1814      1820        [M]  [F, H, Z]         6
20        1815      1820        [M]  [F, H, Z]         5
21        1816      1820        [M]  [F, H, Z]         4
22        1817      1820        [M]  [F, H, Z]         3
23        1818      1820        [M]  [F, H, Z]         2
24        1819      1820        [M]  [F, H, Z]         1
25        1820      1820        [M]  [F, H, Z]         0

You can reset the index using pandas.DataFrame.reset_index.

Summary:

In short, we duplicated rows based on the value in the Duration column, subject to a condition.

We first set aside the rows with Duration == 0, which could otherwise vanish when pandas.Index.repeat repeats rows by their Duration value. We then replicated the rows with Duration > 0 and replaced column values with increasing or decreasing cumulative counts from pandas.core.groupby.GroupBy.cumcount. Finally, we concatenated both DataFrames and sorted them on the index using pandas.DataFrame.sort_index. Because pandas.Index.repeat repeats the index labels along with the rows, sorting on the index (with the stable kind='mergesort', so copies of the same row keep their relative order) returns the rows in the same order as in the original DataFrame.
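For reference, the whole first pipeline as one runnable sketch. It numbers the copies with groupby(level=0).cumcount(), which groups on the repeated index label, so two distinct input rows with identical values are never counted as one group.

```python
import pandas as pd

data = [[1500, 1501, ['A', 'B'], ['C', 'D'], 1],
        [1500, 1510, ['P', 'Q', 'R'], ['X', 'Y'], 10],
        [1520, 1520, ['A', 'X'], ['C'], 0],
        [1809, 1820, ['M'], ['F', 'H', 'Z'], 11]]
df = pd.DataFrame(data, columns=['Start_Year', 'End_Year', 'Opp1', 'Opp2', 'Duration'])

# Split off the rows that need expanding.
mask = df['Duration'].gt(0)
df1 = df[mask].copy()

# One row per year from Start_Year to End_Year (Duration + 1 rows).
df1 = df1.loc[df1.index.repeat(df1['Duration'] + 1)]

# Number the copies of each original row 0, 1, 2, ... and shift the years.
df1['Start_Year'] += df1.groupby(level=0).cumcount()
df1['Duration'] = df1['End_Year'] - df1['Start_Year']

# Recombine and restore the original order with a stable sort on the index.
out = (pd.concat([df1, df[~mask]])
       .sort_index(kind='mergesort')
       .reset_index(drop=True))
print(out)
```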

How to replicate pandas DataFrame rows and change periodically one column

Repeat each row with numpy.repeat, then fill column col_d by DataFrame.assign with numpy.tile:

import numpy as np
import pandas as pd
L = ['P','Q','R']
new_df = (pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
           .assign(col_d = np.tile(L, len(df))))

print(new_df)
  col_a col_b col_c col_d
0    A1    B1    C1     P
1    A1    B1    C1     Q
2    A1    B1    C1     R
3    A2    B2    C2     P
4    A2    B2    C2     Q
5    A2    B2    C2     R
6    A3    B3    C3     P
7    A3    B3    C3     Q
8    A3    B3    C3     R

A similar idea is to repeat the index labels and select the duplicated rows with DataFrame.loc:

L = ['P','Q','R']
new_df = (df.loc[df.index.repeat(3)]
            .assign(col_d = np.tile(L, len(df)))
            .reset_index(drop=True))

print(new_df)
  col_a col_b col_c col_d
0    A1    B1    C1     P
1    A1    B1    C1     Q
2    A1    B1    C1     R
3    A2    B2    C2     P
4    A2    B2    C2     Q
5    A2    B2    C2     R
6    A3    B3    C3     P
7    A3    B3    C3     Q
8    A3    B3    C3     R

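EDIT: to also set col_c to 'T' in every row where col_d is 'S', repeat by len(L) and use Series.mask inside the same assign call: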

L = ['P','Q','R','S']
new_df = (pd.DataFrame(np.repeat(df.values, len(L), axis=0), columns=df.columns)
           .assign(col_d = np.tile(L, len(df)),
                   col_c = lambda x: x['col_c'].mask(x['col_d'].eq('S'), 'T')))

print(new_df)
   col_a col_b col_c col_d
0     A1    B1    C1     P
1     A1    B1    C1     Q
2     A1    B1    C1     R
3     A1    B1     T     S
4     A2    B2    C2     P
5     A2    B2    C2     Q
6     A2    B2    C2     R
7     A2    B2     T     S
8     A3    B3    C3     P
9     A3    B3    C3     Q
10    A3    B3    C3     R
11    A3    B3     T     S
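A self-contained sketch of the EDIT variant using the .loc-based repetition instead of np.repeat, which keeps each column's own dtype rather than routing everything through a single NumPy array. The three-row input frame is an assumption reconstructed from the printed output.

```python
import numpy as np
import pandas as pd

# Assumed input: three rows with columns col_a, col_b, col_c.
df = pd.DataFrame({"col_a": ["A1", "A2", "A3"],
                   "col_b": ["B1", "B2", "B3"],
                   "col_c": ["C1", "C2", "C3"]})
L = ['P', 'Q', 'R', 'S']

# Each row appears len(L) times in a row and np.tile cycles L once per
# original row, so the two sequences line up; the lambda sees the new col_d.
new_df = (df.loc[df.index.repeat(len(L))]
            .reset_index(drop=True)
            .assign(col_d=np.tile(L, len(df)),
                    col_c=lambda x: x['col_c'].mask(x['col_d'].eq('S'), 'T')))
print(new_df)
```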

How can I replicate rows in Pandas?

Use `np.repeat`:

Version 1:

import numpy as np
import pandas as pd
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0))
newdf.columns = df.columns
print(newdf)

The above code will output:

  Person   ID ZipCode  Gender
0  12345  882   38182  Female
1  12345  882   38182  Female
2  12345  882   38182  Female
3  32917  271   88172    Male
4  32917  271   88172    Male
5  32917  271   88172    Male
6  18273  552   90291  Female
7  18273  552   90291  Female
8  18273  552   90291  Female

np.repeat repeats each row of df.values 3 times (axis=0 repeats along the rows).

Then we restore the column names by assigning newdf.columns = df.columns.

Version 2:

You could also assign the column names in the first line, like below:

newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(newdf)

The above code will also output:

  Person   ID ZipCode  Gender
0  12345  882   38182  Female
1  12345  882   38182  Female
2  12345  882   38182  Female
3  32917  271   88172    Male
4  32917  271   88172    Male
5  32917  271   88172    Male
6  18273  552   90291  Female
7  18273  552   90291  Female
8  18273  552   90291  Female
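One caveat with np.repeat(df.values, ...): df.values collapses a mixed-type frame into a single object array, so the integer columns come back as object dtype. A sketch of a dtype-preserving alternative that repeats index labels with Index.repeat; the input frame is reconstructed from the output above and its dtypes are an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Person": [12345, 32917, 18273],
                   "ID": [882, 271, 552],
                   "ZipCode": [38182, 88172, 90291],
                   "Gender": ["Female", "Male", "Female"]})

# np.repeat goes through df.values, a single object array for mixed dtypes.
via_numpy = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)

# Index.repeat selects whole rows, so every column keeps its dtype.
via_index = df.loc[df.index.repeat(3)].reset_index(drop=True)

print(via_numpy.dtypes)
print(via_index.dtypes)
```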
