Merge Rows With Same Id

Pandas DataFrame: Merge rows with same id

I was looking for a way to do this without the "apply" function, to get a better runtime by using pandas built-in functions.

Runtime comparison with and without the apply function.

Dataset:

import numpy as np
import pandas as pd

data_temp1 = {
    'timestamp': np.concatenate([np.arange(0, 30000, 1)] * 2),
    'code': [6, 6, 5] * 20000,
    'code_2': [6, 6, 5] * 20000,
    'q1': [0.134555, 0.984554565478545, 54] * 20000,
    'q2': [9.7079931640624864, None, 43] * 20000,
    'q3': [10.25475688648455, None, 54] * 20000,
}
df = pd.DataFrame(data_temp1)

Solution using apply, similar to @Andrej Kesely's example:

7.21 s ± 8.56 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

My solution, without apply:

98.4 ms ± 79.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

My solution:
(It fills the empty cells only if there are any, so it is correct for both of your cases.)

Sort the rows by the number of empty cells
Fill each row in each group from the row below it (this is OK because we sort them first)
Remove the rows that still have empty cells

columns_to_groupby = ["timestamp", "code"]
# Sort rows in ascending order of None counts, so the most complete rows come first
# (.loc is used because the sorted index holds row labels, not positions)
df = df.loc[df.isnull().sum(axis=1).sort_values(ascending=True).index].set_index(columns_to_groupby)
# Group by the key columns, fill the None cells where possible, then drop the incomplete rows (the ones we filled the others from)
df.groupby(df.index).bfill().dropna()

Examples:

Example 1:

Input: [image]

Result: [image]

Example 2 (with a row without empty cells):

Input: [image]

Result: [image]

As you can see, the result is the same for both of them.
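The two cases can also be reproduced in code. The following is only a minimal sketch: the helper name merge_rows and the tiny frames case1 and case2 are made up for illustration, and a stable sort is used so that rows with the same number of gaps keep their input order.

```python
import numpy as np
import pandas as pd


def merge_rows(df, keys):
    """Hypothetical helper: merge rows sharing the same keys by back-filling gaps."""
    # Order rows from fewest to most missing cells (mergesort is stable)
    order = df.isnull().sum(axis=1).sort_values(ascending=True, kind="mergesort").index
    indexed = df.loc[order].set_index(keys)
    # Within each key group, fill gaps from the rows below, then drop rows still incomplete
    return indexed.groupby(level=keys).bfill().dropna()


# Case 1: every row of the group has an empty cell
case1 = pd.DataFrame({
    "timestamp": [0, 0],
    "code": [6, 6],
    "q1": [0.5, np.nan],
    "q2": [np.nan, 9.7],
})

# Case 2: one row of the group is already complete
case2 = pd.DataFrame({
    "timestamp": [0, 0],
    "code": [6, 6],
    "q1": [0.5, np.nan],
    "q2": [9.7, 9.7],
})

result1 = merge_rows(case1, ["timestamp", "code"])
result2 = merge_rows(case2, ["timestamp", "code"])
print(result1)
print(result2)
```

In both cases a single row per (timestamp, code) pair should remain, holding q1 = 0.5 and q2 = 9.7.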

Pandas | merge rows with same id

Use

DataFrame.groupby - Group DataFrame or Series using a mapper or by a Series of columns.
GroupBy.last - Compute last of group values (missing values are skipped by default).
DataFrame.replace - Replace values given in to_replace with value.

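Example:

# Treat empty strings as missing values, so that last() can skip them
df = df.replace('', np.nan)
# last() keeps the last non-missing value of each column within each id
df1 = df.groupby('id', as_index=False, sort=False).last()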
print(df1)

   id firstname lastname              email  updatedate
0  A1     wendy    smith     smith@mail.com  2019-02-03
1  A2     harry     lynn  harylynn@mail.com  2019-03-12
2  A3     tinna   dickey     tinna@mail.com  2013-06-12
3  A4       Tom      Lee       Tom@mail.com  2012-06-12
4  A5      Ella      NaN      Ella@mail.com  2019-07-12
5  A6       Ben     Lang       Ben@mail.com  2019-03-12
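Since the input frame is not shown above, here is a small self-contained sketch of the same idea. The rows are hypothetical and only reuse the column names from the output; the point is that last() picks the last non-missing value per column, so partially filled duplicates collapse into one row.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the same id appears twice with blank cells in different columns
df = pd.DataFrame({
    "id": ["A1", "A1", "A2", "A2"],
    "firstname": ["wendy", "", "harry", "harry"],
    "lastname": ["", "smith", "lynn", ""],
})

# Blank strings become NaN so that last() can skip them
df = df.replace("", np.nan)

# One row per id, each column holding its last non-missing value
merged = df.groupby("id", as_index=False, sort=False).last()
print(merged)
```

With sort=False the ids keep their order of first appearance, which matches the output shown above.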

Merge rows with the same ID but with overlapping variables

I'm not sure if this is actually what you want, but to combine rows of a data frame based on multiple conditions you can use the dplyr package and its summarise() function. I generated some data to use in R directly; you will have to modify the code according to your needs.

library(dplyr)

# generate data
ID<-rep(1:20,2)
visitors<-sample(1:50, 40, replace=TRUE)
impact<-sample(rep(c("a", "b", "c", "d", "e"), 8))
arrival<-sample(rep(8:15, 5))
departure <- sample(rep(16:23, 5))

df<-data.frame(ID, visitors, impact, arrival, departure)
df$impact<-as.character(df$impact)

# summarise rows with identical ID
df_summary <- df %>%
  group_by(ID) %>%
  summarise(visitors = max(visitors), arrival = min(arrival), 
            departure = max(departure), impact = paste0(impact, collapse =", "))

Hope this helps!
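For readers working in pandas rather than R, a roughly equivalent sketch uses groupby with named aggregation. This is an analogue, not part of the original answer, and the small frame below is hypothetical data with the same column names.

```python
import pandas as pd

# Hypothetical data with the same columns as the R example
df = pd.DataFrame({
    "ID": [1, 2, 1, 2],
    "visitors": [10, 40, 25, 5],
    "impact": ["a", "b", "c", "d"],
    "arrival": [9, 8, 12, 10],
    "departure": [17, 22, 20, 16],
})

# One aggregation rule per column, mirroring the summarise() call
df_summary = df.groupby("ID", as_index=False).agg(
    visitors=("visitors", "max"),
    arrival=("arrival", "min"),
    departure=("departure", "max"),
    impact=("impact", ", ".join),
)
print(df_summary)
```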

How to combine rows with the same ID into a list

Try the following; it may solve your problem.

Let's say your existing table is named yourTable and the new table to be created is groupedNames. In Data view, click New table and paste the following:

groupedNames = calculatetable
(
    addcolumns(
        summarize(yourTable ,yourTable[Id ]),
        "Names",calculate(CONCATENATEX(yourTable,[ Name ],","))
    )   
)
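If the same table is available in pandas, the grouping can be sketched as follows. This is only an analogue of the DAX expression, with hypothetical sample rows; the names your_table and grouped_names simply mirror the ones used above.

```python
import pandas as pd

# Hypothetical rows standing in for yourTable
your_table = pd.DataFrame({
    "Id": [1, 1, 2],
    "Name": ["Ann", "Bob", "Cy"],
})

# One row per Id, with all names of that Id joined by commas
grouped_names = your_table.groupby("Id", as_index=False).agg(Names=("Name", ",".join))
print(grouped_names)
```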

Postgres: Merge records with same ID

-- my_table is a placeholder; the bare word "table" is a reserved keyword in PostgreSQL
SELECT id, ARRAY_AGG(type) AS types FROM my_table GROUP BY id ORDER BY id;
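For comparison with the pandas answers earlier in this page, the same aggregation can be sketched in pandas by collecting each group into a list. The frame records and its values are hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for the SQL table
records = pd.DataFrame({
    "id": [2, 1, 1],
    "type": ["x", "a", "b"],
})

# Collect the type values of each id into a list, ordered by id
types = records.groupby("id", sort=True)["type"].agg(list).reset_index(name="types")
print(types)
```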