Merge Rows With Same Id

Pandas DataFrame: Merge rows with same id

I was looking for a way to do it without the "apply" function, for better runtime by using pandas build-in functions.

Compare runtimes with and without apply function:
dataset:

data_temp1 = {'timestamp':np.concatenate([np.arange(0,30000,1)]*2), 'code':[6,6, 5]*20000, 'code_2':[6,6, 5]*20000, 'q1':[0.134555,0.984554565478545, 54]*20000, 'q2':[9.7079931640624864,None, 43]*20000, 'q3':[10.25475688648455,None, 54]*20000} 
df = pd.DataFrame(data_temp1)

Solution by the use of apply similar to @Andrej Kesely example:

  • 7.21 s ± 8.56 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Solution without apply by my solution:

  • 98.4 ms ± 79.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

My solution:
(Will fill the empty cells only if exist. So, it's right according to both of your cases).

  • Sort the rows by the number of empty cells
  • Fill each row in each group by below row (Its ok because with sort them first)
  • Remove rows with empty cells
columns_to_groupby = ["timestamp", "code"]
# Sort rows of a dataframe in descending order of None counts
df = df.iloc[df.isnull().sum(1).sort_values(ascending=True).index].set_index(columns_to_groupby)
# group by timestamp column, fill the None cells if exists, delete the incomplete rows (from which we filled in the others)
df.groupby(df.index).bfill().dropna()

Examples:

Example 1:

Input:
Sample Image

Result:
Sample Image

Example 2 (with row without empty cell):

Input:
Sample Image

Result:
Sample Image

As you can see, same result for both of them.

Pandas | merge rows with same id

Use

  • DataFrame.groupby - Group DataFrame or Series using a mapper or by a Series of columns.
  • .groupby.GroupBy.last - Compute last of group values.
  • DataFrame.replace - Replace values given in to_replace with value.

Ex.

df = df.replace('',np.nan, regex=True)
df1 = df.groupby('id',as_index=False,sort=False).last()
print(df1)

id firstname lastname email updatedate
0 A1 wendy smith smith@mail.com 2019-02-03
1 A2 harry lynn harylynn@mail.com 2019-03-12
2 A3 tinna dickey tinna@mail.com 2013-06-12
3 A4 Tom Lee Tom@mail.com 2012-06-12
4 A5 Ella NaN Ella@mail.com 2019-07-12
5 A6 Ben Lang Ben@mail.com 2019-03-12

Merge rows with the same ID but with overlapping variables

I'm not sure if this actually is what you want, but to combine rows of a data frame based on multiple conditions you can use the dplyr package and its summarise()function. I generated some data to use in R directly, you would have to modify the code according to your needs.

# generate data
ID<-rep(1:20,2)
visitors<-sample(1:50, 40, replace=TRUE)
impact<-sample(rep(c("a", "b", "c", "d", "e"), 8))
arrival<-sample(rep(8:15, 5))
departure <- sample(rep(16:23, 5))

df<-data.frame(ID, visitors, impact, arrival, departure)
df$impact<-as.character(df$impact)

# summarise rows with identical ID
df_summary <- df %>%
group_by(ID) %>%
summarise(visitors = max(visitors), arrival = min(arrival),
departure = max(departure), impact = paste0(impact, collapse =", "))

Hope this helps!

How to combine rows with the same ID into a list

try the following, it may solve your problem.

Let's say your existing table name is yourTable and the new table to be created is groupedNames. in data view, click on new table and paste the following:

groupedNames = calculatetable
(
addcolumns(
summarize(yourTable ,yourTable[Id ]),
"Names",calculate(CONCATENATEX(yourTable,[ Name ],","))
)
)

Postgres: Merge records with same ID

SELECT id, ARRAY_AGG(type) AS types FROM table GROUP BY id ORDER BY id;


Related Topics



Leave a reply



Submit