Python: How to replicate rows in Dataframe with column value but changing the column value to its range
Repeat each row with index.repeat, then number the copies with groupby().cumcount():
out = df.loc[df.index.repeat(df['Table'])]
out['Table'] = out.groupby(level=0).cumcount() + 1
Output:
Store Aisle Table
0 11 59 1
0 11 59 2
1 11 61 1
1 11 61 2
1 11 61 3
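For a self-contained run, the sketch below rebuilds a small input frame and applies the same two steps. The two input rows are an assumption inferred from the output shown above, not data from the original question.

```python
import pandas as pd

# Assumed input, inferred from the output shown above.
df = pd.DataFrame({"Store": [11, 11], "Aisle": [59, 61], "Table": [2, 3]})

# Repeat each row as many times as its Table value.
out = df.loc[df.index.repeat(df["Table"])].copy()

# Copies share the original index label, so cumcount numbers them 0, 1, 2, ...
out["Table"] = out.groupby(level=0).cumcount() + 1
print(out)
```

The .copy() is only there to make the later column assignment operate on an independent frame.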
Python - Replicate rows in Pandas Dataframe based on condition
import pandas as pd
First, create a boolean mask for your condition using the isin() method:
mask = df[columns].isin(values).any(axis=1)
Then use reindex() to repeat the matching rows rep_times times, and pd.concat() to add back the rows that don't satisfy the condition (DataFrame.append() was removed in pandas 2.0):
df = pd.concat([df.reindex(df[mask].index.repeat(rep_times)), df[~mask]])
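As a minimal sketch of the mask-and-repeat idea, the example below uses a made-up frame; the names columns, values and rep_times follow the answer, but their contents are hypothetical. It uses pd.concat for the final step because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical data and parameters, chosen only for illustration.
df = pd.DataFrame({"fruit": ["apple", "pear", "plum"], "qty": [1, 2, 3]})
columns = ["fruit"]
values = ["apple", "plum"]
rep_times = 2

# True for rows where any of the selected columns holds one of the values.
mask = df[columns].isin(values).any(axis=1)

# Repeat the matching rows, then add back the non-matching ones.
repeated = df.reindex(df[mask].index.repeat(rep_times))
result = pd.concat([repeated, df[~mask]])
print(result)
```

The non-matching rows end up at the bottom; a stable sort on the index would put them back in their original positions.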
How to replicate rows based on value of a column in same pandas dataframe
Try reindex + repeat:
out = df.reindex(df.index.repeat(df['count']))
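A small runnable sketch of this one-liner, on a hypothetical frame (the item column and its values are made up). Note that a count of 0 drops the row, and that reindex expects the frame's index labels to be unique.

```python
import pandas as pd

# Hypothetical frame; "count" says how many copies of each row to keep.
df = pd.DataFrame({"item": ["a", "b", "c"], "count": [2, 0, 3]})

# Row "b" has count 0, so it does not appear in the result.
out = df.reindex(df.index.repeat(df["count"]))
print(out)
```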
Replicating rows in a pandas data frame by a column value
You can use Index.repeat to get repeated index values based on the column, then select from the DataFrame:
df2 = df.loc[df.index.repeat(df.n)]
id n v
0 A 1 10
1 B 2 13
1 B 2 13
2 C 3 8
2 C 3 8
2 C 3 8
Or you could use np.repeat to get the repeated indices and then use that to index into the frame:
df2 = df.loc[np.repeat(df.index.values, df.n)]
id n v
0 A 1 10
1 B 2 13
1 B 2 13
2 C 3 8
2 C 3 8
2 C 3 8
After which there's only a bit of cleaning up to do:
df2 = df2.drop("n", axis=1).reset_index(drop=True)
id v
0 A 10
1 B 13
2 B 13
3 C 8
4 C 8
5 C 8
Note that if you might have duplicate indices to worry about, you could use .iloc instead:
df.iloc[np.repeat(np.arange(len(df)), df["n"])].drop("n", axis=1).reset_index(drop=True)
id v
0 A 10
1 B 13
2 B 13
3 C 8
4 C 8
5 C 8
which uses the positions, and not the index labels.
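To see why positions matter, here is a sketch on a hypothetical frame whose index label 0 appears twice. With label-based selection, every repeated label matches all rows carrying that label, so the result grows beyond the intended row count; position-based selection does not.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index label 0 appears twice.
df = pd.DataFrame({"id": ["A", "B", "C"], "n": [1, 2, 1]}, index=[0, 0, 1])

# Label-based: every repeated label 0 matches both rows labelled 0.
by_label = df.loc[np.repeat(df.index.values, df["n"])]

# Position-based: each row is repeated exactly n times.
by_position = df.iloc[np.repeat(np.arange(len(df)), df["n"])]

print(len(by_label), len(by_position))
```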
Pandas - replicate rows with new column value from a list for each replication
Here is a way using the keys parameter of pd.concat():
(pd.concat([df]*len(New_Cost_List),
keys = New_Cost_List,
names = ['New_Cost',None])
.reset_index(level=0))
Output:
New_Cost State Cost
0 1 A 2
1 1 B 9
2 1 C 8
3 1 D 4
0 5 A 2
1 5 B 9
2 5 C 8
3 5 D 4
0 10 A 2
1 10 B 9
2 10 C 8
3 10 D 4
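For a runnable version, the sketch below reconstructs df and New_Cost_List from the output above; that reconstruction is an assumption, since the original input is not shown here.

```python
import pandas as pd

# Input reconstructed from the output above (an assumption).
df = pd.DataFrame({"State": list("ABCD"), "Cost": [2, 9, 8, 4]})
New_Cost_List = [1, 5, 10]

# One copy of df per list entry; each key becomes the outer index level,
# which reset_index(level=0) then turns into the New_Cost column.
out = (pd.concat([df] * len(New_Cost_List),
                 keys=New_Cost_List,
                 names=["New_Cost", None])
         .reset_index(level=0))
print(out)
```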
Replicate row in Pandas dataframe based on condition and change values for a specific column
You can use pandas.Index.repeat to repeat each row based on its Duration value (Duration + 1 times in the code below), and then use pandas.core.groupby.GroupBy.cumcount to add increasing cumulative values to the Start_Year column.
Reading data
data = [[1500, 1501, ['A','B'], ['C','D'], 1],
[1500, 1510, ['P','Q','R'], ['X','Y'], 10],
[1520, 1520, ['A','X'], ['C'], 0],
[1809, 1820, ['M'], ['F','H','Z'], 11]]
df = pd.DataFrame(data, columns = ['Start_Year', 'End_Year', 'Opp1', 'Opp2', 'Duration'])
Repeating the values
mask = df['Duration'].gt(0)
df1 = df[mask].copy()
df1 = df1.loc[df1.index.repeat(df1['Duration'] + 1)]
Assigning increasing values to each group
df1['Start_Year'] += df1[['Start_Year', 'End_Year', 'Opp1', 'Opp2']].astype(str).groupby(['Start_Year', 'End_Year', 'Opp1', 'Opp2']).cumcount()
Generating output
df1['Duration'] = df1['End_Year'] - df1['Start_Year']
df = pd.concat([df1, df[~mask]]).sort_index(kind = 'mergesort').reset_index(drop=True)
This gives us the expected output:
Start_Year End_Year Opp1 Opp2 Duration
0 1500 1501 [A, B] [C, D] 1
1 1501 1501 [A, B] [C, D] 0
2 1500 1510 [P, Q, R] [X, Y] 10
3 1501 1510 [P, Q, R] [X, Y] 9
4 1502 1510 [P, Q, R] [X, Y] 8
5 1503 1510 [P, Q, R] [X, Y] 7
6 1504 1510 [P, Q, R] [X, Y] 6
7 1505 1510 [P, Q, R] [X, Y] 5
8 1506 1510 [P, Q, R] [X, Y] 4
9 1507 1510 [P, Q, R] [X, Y] 3
10 1508 1510 [P, Q, R] [X, Y] 2
11 1509 1510 [P, Q, R] [X, Y] 1
12 1510 1510 [P, Q, R] [X, Y] 0
13 1520 1520 [A, X] [C] 0
14 1809 1820 [M] [F, H, Z] 11
15 1810 1820 [M] [F, H, Z] 10
16 1811 1820 [M] [F, H, Z] 9
17 1812 1820 [M] [F, H, Z] 8
18 1813 1820 [M] [F, H, Z] 7
19 1814 1820 [M] [F, H, Z] 6
20 1815 1820 [M] [F, H, Z] 5
21 1816 1820 [M] [F, H, Z] 4
22 1817 1820 [M] [F, H, Z] 3
23 1818 1820 [M] [F, H, Z] 2
24 1819 1820 [M] [F, H, Z] 1
25 1820 1820 [M] [F, H, Z] 0
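A possibly simpler variant, offered as a sketch rather than as the answer's method: because every row is repeated Duration + 1 times, a row with Duration 0 is kept once, so the mask can be skipped. The copies also share their original index label, so groupby(level=0) can number them without grouping on the list-valued columns.

```python
import pandas as pd

data = [[1500, 1501, ['A', 'B'], ['C', 'D'], 1],
        [1500, 1510, ['P', 'Q', 'R'], ['X', 'Y'], 10],
        [1520, 1520, ['A', 'X'], ['C'], 0],
        [1809, 1820, ['M'], ['F', 'H', 'Z'], 11]]
df = pd.DataFrame(data, columns=['Start_Year', 'End_Year', 'Opp1', 'Opp2', 'Duration'])

# Repeat every row Duration + 1 times (rows with Duration 0 are kept once).
out = df.loc[df.index.repeat(df['Duration'] + 1)].copy()

# Number the copies of each original row 0, 1, 2, ... and shift Start_Year.
out['Start_Year'] += out.groupby(level=0).cumcount().to_numpy()
out['Duration'] = out['End_Year'] - out['Start_Year']
out = out.reset_index(drop=True)
print(out)
```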
Alternatively
You can go the other way around after the "Repeating the values" step: first assign Duration as a decreasing cumulative count, then recompute Start_Year from it:
df1['Duration'] = df1[['Start_Year', 'End_Year', 'Opp1', 'Opp2']].astype(str).groupby(['Start_Year', 'End_Year', 'Opp1', 'Opp2']).cumcount(ascending=False)
df1['Start_Year'] = df1['End_Year'] - df1['Duration']
df = pd.concat([df1, df[~mask]]).sort_index(kind = 'mergesort').reset_index(drop=True)
This gives the same expected output:
Start_Year End_Year Opp1 Opp2 Duration
0 1500 1501 [A, B] [C, D] 1
1 1501 1501 [A, B] [C, D] 0
2 1500 1510 [P, Q, R] [X, Y] 10
3 1501 1510 [P, Q, R] [X, Y] 9
4 1502 1510 [P, Q, R] [X, Y] 8
5 1503 1510 [P, Q, R] [X, Y] 7
6 1504 1510 [P, Q, R] [X, Y] 6
7 1505 1510 [P, Q, R] [X, Y] 5
8 1506 1510 [P, Q, R] [X, Y] 4
9 1507 1510 [P, Q, R] [X, Y] 3
10 1508 1510 [P, Q, R] [X, Y] 2
11 1509 1510 [P, Q, R] [X, Y] 1
12 1510 1510 [P, Q, R] [X, Y] 0
13 1520 1520 [A, X] [C] 0
14 1809 1820 [M] [F, H, Z] 11
15 1810 1820 [M] [F, H, Z] 10
16 1811 1820 [M] [F, H, Z] 9
17 1812 1820 [M] [F, H, Z] 8
18 1813 1820 [M] [F, H, Z] 7
19 1814 1820 [M] [F, H, Z] 6
20 1815 1820 [M] [F, H, Z] 5
21 1816 1820 [M] [F, H, Z] 4
22 1817 1820 [M] [F, H, Z] 3
23 1818 1820 [M] [F, H, Z] 2
24 1819 1820 [M] [F, H, Z] 1
25 1820 1820 [M] [F, H, Z] 0
You can reset the index using pandas.DataFrame.reset_index.
Summary:
We duplicated rows based on the value in the Duration column, subject to a condition.
First we set aside the rows that do not meet the condition (Duration > 0), which could otherwise vanish when pandas.Index.repeat repeats a row zero times. We then repeated the remaining rows and replaced the column values with increasing or decreasing cumulative values using pandas.core.groupby.GroupBy.cumcount.
Finally we concatenated both dataframes and sorted them on the index using pandas.DataFrame.sort_index. Because pandas.Index.repeat repeats the index labels along with the rows, sorting on the index gives us the dataframe in the same order as the original.
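The reason for kind='mergesort' can be illustrated with a small sketch on hypothetical frames: mergesort is a stable sort, so rows that share an index label keep their relative order after sorting.

```python
import pandas as pd

# Hypothetical frames: "a" holds repeated rows, "b" a row that was set aside.
a = pd.DataFrame({"v": ["a0", "a1", "a2"]}, index=[0, 0, 2])
b = pd.DataFrame({"v": ["b"]}, index=[1])

# A stable sort keeps a0 before a1 even though both carry label 0.
merged = pd.concat([a, b]).sort_index(kind="mergesort")
print(merged)
```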
How to replicate pandas DataFrame rows and change periodically one column
Add values to column col_d by DataFrame.assign with numpy.tile:
L = ['P','Q','R']
new_df = (pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
.assign(col_d = np.tile(L, len(df))))
print (new_df)
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A2 B2 C2 P
4 A2 B2 C2 Q
5 A2 B2 C2 R
6 A3 B3 C3 P
7 A3 B3 C3 Q
8 A3 B3 C3 R
Another similar idea is to repeat the index and select the duplicated rows with DataFrame.loc:
L = ['P','Q','R']
new_df = (df.loc[df.index.repeat(3)]
.assign(col_d = np.tile(L, len(df)))
.reset_index(drop=True))
print (new_df)
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A2 B2 C2 P
4 A2 B2 C2 Q
5 A2 B2 C2 R
6 A3 B3 C3 P
7 A3 B3 C3 Q
8 A3 B3 C3 R
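Both approaches rely on np.repeat and np.tile lining up. The sketch below shows the difference on plain arrays; the labels row0 and row1 and the row count are hypothetical stand-ins for the frame's rows.

```python
import numpy as np

L = ["P", "Q", "R"]
n_rows = 2  # hypothetical number of rows in the original frame

# np.repeat duplicates each element in place: one block per original row.
rows = np.repeat(["row0", "row1"], len(L))

# np.tile cycles through the whole list: P, Q, R once per original row.
labels = np.tile(L, n_rows)

print(rows)
print(labels)
```

Pairing the two arrays element by element gives every row each of the values in L exactly once.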
EDIT:
L = ['P','Q','R','S']
new_df = (pd.DataFrame(np.repeat(df.values, len(L), axis=0), columns=df.columns)
.assign(col_d = np.tile(L, len(df)),
col_c = lambda x: x['col_c'].mask(x['col_d'].eq('S'), 'T')))
print (new_df)
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A1 B1 T S
4 A2 B2 C2 P
5 A2 B2 C2 Q
6 A2 B2 C2 R
7 A2 B2 T S
8 A3 B3 C3 P
9 A3 B3 C3 Q
10 A3 B3 C3 R
11 A3 B3 T S
How can I replicate rows in Pandas?
Use np.repeat:
Version 1:
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0))
newdf.columns = df.columns
print(newdf)
The above code will output:
Person ID ZipCode Gender
0 12345 882 38182 Female
1 12345 882 38182 Female
2 12345 882 38182 Female
3 32917 271 88172 Male
4 32917 271 88172 Male
5 32917 271 88172 Male
6 18273 552 90291 Female
7 18273 552 90291 Female
8 18273 552 90291 Female
np.repeat repeats the values of df 3 times.
Then we restore the column names by assigning newdf.columns = df.columns.
Version 2:
You could also assign the column names in the first line, like below:
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(newdf)
The above code will also output:
Person ID ZipCode Gender
0 12345 882 38182 Female
1 12345 882 38182 Female
2 12345 882 38182 Female
3 32917 271 88172 Male
4 32917 271 88172 Male
5 32917 271 88172 Male
6 18273 552 90291 Female
7 18273 552 90291 Female
8 18273 552 90291 Female
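One thing worth checking with this approach: when the frame mixes numeric and string columns, df.values is an object array, so the rebuilt frame loses the numeric dtypes. The sketch below compares it with repeating the index instead; the input frame is reconstructed from the output above, which is an assumption.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the output above (an assumption about the input).
df = pd.DataFrame({"Person": [12345, 32917, 18273],
                   "ID": [882, 271, 552],
                   "ZipCode": [38182, 88172, 90291],
                   "Gender": ["Female", "Male", "Female"]})

# Going through df.values: the numeric columns come back as object dtype.
via_values = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)

# Repeating the index instead keeps each column's original dtype.
via_index = df.loc[df.index.repeat(3)].reset_index(drop=True)

print(via_values.dtypes)
print(via_index.dtypes)
```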