Pandas DataFrame: Merge rows with same id
I was looking for a way to do it without the "apply" function, for better runtime by using pandas build-in functions.
Compare runtimes with and without apply function:
dataset:
data_temp1 = {'timestamp':np.concatenate([np.arange(0,30000,1)]*2), 'code':[6,6, 5]*20000, 'code_2':[6,6, 5]*20000, 'q1':[0.134555,0.984554565478545, 54]*20000, 'q2':[9.7079931640624864,None, 43]*20000, 'q3':[10.25475688648455,None, 54]*20000}
df = pd.DataFrame(data_temp1)
Solution by the use of apply similar to @Andrej Kesely example:
- 7.21 s ± 8.56 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Solution without apply by my solution:
- 98.4 ms ± 79.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
My solution:
(Will fill the empty cells only if exist. So, it's right according to both of your cases).
- Sort the rows by the number of empty cells
- Fill each row in each group by below row (Its ok because with sort them first)
- Remove rows with empty cells
columns_to_groupby = ["timestamp", "code"]
# Sort rows of a dataframe in descending order of None counts
df = df.iloc[df.isnull().sum(1).sort_values(ascending=True).index].set_index(columns_to_groupby)
# group by timestamp column, fill the None cells if exists, delete the incomplete rows (from which we filled in the others)
df.groupby(df.index).bfill().dropna()
Examples:
Example 1:
Input:
Result:
Example 2 (with row without empty cell):
Input:
Result:
As you can see, same result for both of them.
Pandas | merge rows with same id
Use
DataFrame.groupby
- Group DataFrame or Series using a mapper or by a Series of columns..groupby.GroupBy.last
- Compute last of group values.DataFrame.replace
- Replace values given in to_replace with value.
Ex.
df = df.replace('',np.nan, regex=True)
df1 = df.groupby('id',as_index=False,sort=False).last()
print(df1)
id firstname lastname email updatedate
0 A1 wendy smith smith@mail.com 2019-02-03
1 A2 harry lynn harylynn@mail.com 2019-03-12
2 A3 tinna dickey tinna@mail.com 2013-06-12
3 A4 Tom Lee Tom@mail.com 2012-06-12
4 A5 Ella NaN Ella@mail.com 2019-07-12
5 A6 Ben Lang Ben@mail.com 2019-03-12
Merge rows with the same ID but with overlapping variables
I'm not sure if this actually is what you want, but to combine rows of a data frame based on multiple conditions you can use the dplyr
package and its summarise()
function. I generated some data to use in R directly, you would have to modify the code according to your needs.
# generate data
ID<-rep(1:20,2)
visitors<-sample(1:50, 40, replace=TRUE)
impact<-sample(rep(c("a", "b", "c", "d", "e"), 8))
arrival<-sample(rep(8:15, 5))
departure <- sample(rep(16:23, 5))
df<-data.frame(ID, visitors, impact, arrival, departure)
df$impact<-as.character(df$impact)
# summarise rows with identical ID
df_summary <- df %>%
group_by(ID) %>%
summarise(visitors = max(visitors), arrival = min(arrival),
departure = max(departure), impact = paste0(impact, collapse =", "))
Hope this helps!
How to combine rows with the same ID into a list
try the following, it may solve your problem.
Let's say your existing table name is yourTable
and the new table to be created is groupedNames
. in data view, click on new table and paste the following:
groupedNames = calculatetable
(
addcolumns(
summarize(yourTable ,yourTable[Id ]),
"Names",calculate(CONCATENATEX(yourTable,[ Name ],","))
)
)
Postgres: Merge records with same ID
SELECT id, ARRAY_AGG(type) AS types FROM table GROUP BY id ORDER BY id;
Related Topics
Select and Compare Two Datetime Columns from Different Table Without Having Any Relation
Retrieve String Between HTML Tags Using Regex
Mysql - Trigger for Updating Same Table After Insert
Wamp Server Error [Local Server - 2 of 3 Services Running]
Sql: Select All Rows If Parameter Is Null, Else Only Select Matching Rows
Retrieve Varbinary Value as Base64 in Mssql
Disable Secure Priv for Data Loading on MySQL
Er_Access_Denied_Error: Access Denied for User ''@'Localhost' (Using Password: No)
Select Only Rows With Max Date
Inserting Date Value into Date Field Using Laravel
Select to Get Rows Based on Minimum Value of a Column
How to See Query History in SQL Server Management Studio
How to Strip All Non-Alphabetic Characters from String in SQL Server
How to Convert a Timestamp (Date Format) to Bigint in SQL
How to Insert an Image in Sqlite Database(Table)
Return Dummy Value If No Row Present for in Clause
Sql Call Stored Procedure for Each Row Without Using a Cursor
Select Every Employee That Has a Higher Salary Than the Average of His Department