R: Combine rows with same ID
Something like this:
Here we first group by every column except the Var variables, then use summarise(across(...)), as suggested by @Limey in the comments. The key is na.rm = TRUE, which makes sum() ignore missing values:
library(dplyr)
df %>%
group_by(ID, Date, N_Date, type) %>%
summarise(across(starts_with("Var"), ~sum(., na.rm = TRUE)))
ID Date N_Date type Var1 Var2 Var3 Var4
<int> <chr> <int> <chr> <int> <int> <int> <int>
1 1 4.7.22 50000 normal 12 23 5 54
2 2 4.7.22 4000 normal 0 2 0 0
3 3 5.7.22 20000 normal 7 0 0 0
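For comparison, here is a minimal pandas sketch of the same grouping. It assumes a small hypothetical data frame that reuses the column names from the output above; the values are made up for illustration. pandas `GroupBy.sum()` skips NaN by default, which plays the role of na.rm = TRUE, and an all-NaN group sums to 0 because `min_count` defaults to 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two partial rows for ID 1 that should collapse into one.
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Date": ["4.7.22", "4.7.22", "4.7.22"],
    "N_Date": [50000, 50000, 4000],
    "type": ["normal", "normal", "normal"],
    "Var1": [12, np.nan, np.nan],
    "Var2": [np.nan, 23, 2],
})

# Columns whose names start with "Var", mirroring starts_with("Var") in dplyr.
var_cols = [c for c in df.columns if c.startswith("Var")]

# sum() skips NaN (like na.rm = TRUE); an all-NaN group gives 0.
out = df.groupby(["ID", "Date", "N_Date", "type"], as_index=False)[var_cols].sum()
print(out)
```

Note that the Var columns come out as floats, because NaN forces a float dtype in pandas.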
Merge rows with the same ID but with overlapping variables
I'm not sure this is exactly what you want, but to combine rows of a data frame based on multiple conditions you can use the dplyr package and its summarise() function. I generated some sample data directly in R; you will have to adapt the code to your needs.
library(dplyr)
# generate sample data
set.seed(1)  # make the random sample reproducible
ID <- rep(1:20, 2)
visitors <- sample(1:50, 40, replace = TRUE)
impact <- sample(rep(c("a", "b", "c", "d", "e"), 8))
arrival <- sample(rep(8:15, 5))
departure <- sample(rep(16:23, 5))
df <- data.frame(ID, visitors, impact, arrival, departure)
df$impact <- as.character(df$impact)  # avoid factors in older R versions
# summarise rows with identical ID
df_summary <- df %>%
  group_by(ID) %>%
  summarise(visitors = max(visitors),
            arrival = min(arrival),
            departure = max(departure),
            impact = paste0(impact, collapse = ", "))
Hope this helps!
Pandas Merge and Complete rows with same id
If there is only one non-empty value per group, use:
import numpy as np
df = df.replace('', np.nan).groupby('ID', as_index=False).first().fillna('')
If a group can contain multiple values and you need the unique values in their original order, use a lambda function:
print (df)
ID LU MA ME JE VE SA DI
0 201 B C B
1 201 C C C B C
f = lambda x: ','.join(dict.fromkeys(x.dropna()).keys())
df = df.replace('',np.nan).groupby('ID', as_index=False).agg(f)
print (df)
ID LU MA ME JE VE SA DI
0 201 B,C C C B C
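Both variants can be tried side by side in one self-contained script. This sketch assumes a hypothetical two-row frame for ID 201 with only three weekday columns, and stores blanks as empty strings as in the question. `first()` takes the first non-NaN value per column, while the lambda keeps every distinct value in order of appearance.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two rows for ID 201, blanks stored as empty strings.
df = pd.DataFrame({
    "ID": [201, 201],
    "LU": ["B", "C"],
    "MA": ["", "C"],
    "ME": ["C", ""],
})

# Variant 1: first non-missing value per column and group.
first_df = df.replace("", np.nan).groupby("ID", as_index=False).first().fillna("")

# Variant 2: all distinct values, first-seen order, comma-joined.
f = lambda x: ",".join(dict.fromkeys(x.dropna()))
joined = df.replace("", np.nan).groupby("ID", as_index=False).agg(f)

print(first_df)
print(joined)
```

With this data, variant 1 keeps only "B" in LU, while variant 2 keeps "B,C".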
Merging rows in a dataframe R with duplicate id's
You could use summarize_all, grouped by person_id. For each person_id, this keeps the first value in every column that is not NA.
I added a pivot_wider to preserve the different test_dates (as pointed out by @Andrea M).
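library(dplyr)
library(tidyr)  # provides pivot_wider()
df1 <- df %>%
  group_by(person_id) %>%
  # number the rows within each person_id: 1, 2, ...
  mutate(id = seq_along(person_id)) %>%
  # spread test_date into test_date1, test_date2, ...
  pivot_wider(names_from = id,
              values_from = test_date,
              names_prefix = "test_date") %>%
  group_by(person_id) %>%
  # keep the first non-NA value of every column
  summarize_all(list(~ .[!is.na(.)][1]))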
Output
> df1
# A tibble: 2 x 9
person_id serial_number freezer_number test_1 test_2 test_3 test_4 test_date1 test_date2
<chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <chr> <chr>
1 x c d positive positive NA NA 01/01/2010 05/01/2010
2 y e f positive NA NA NA 02/02/2020 NA
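The same two steps (number the rows per person and spread the dates wide, then take the first non-missing value of every other column) can be sketched in pandas. The data below is a hypothetical, trimmed-down version of the example with fewer columns; `cumcount()` stands in for seq_along() and `GroupBy.first()` skips missing values.

```python
import pandas as pd

# Hypothetical toy data: person "x" has two partial rows, "y" has one.
df = pd.DataFrame({
    "person_id": ["x", "x", "y"],
    "serial_number": ["c", None, "e"],
    "test_1": ["positive", None, "positive"],
    "test_2": [None, "positive", None],
    "test_date": ["01/01/2010", "05/01/2010", "02/02/2020"],
})

# Number rows within each person (1, 2, ...), like seq_along(person_id) per group.
n = df.groupby("person_id").cumcount() + 1

# Spread test_date into test_date1, test_date2, ... with one row per person.
dates = (
    df.assign(n=n)
      .pivot(index="person_id", columns="n", values="test_date")
      .add_prefix("test_date")
)

# For every other column keep the first non-missing value per person.
rest = df.drop(columns="test_date").groupby("person_id").first()

df1 = rest.join(dates).reset_index()
print(df1)
```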
How to merge rows from table based on a common ID? SAS EG
You will have to add some actual SAS code into your Enterprise Guide project to do that.
Create a new variable and use the CATX() function to build the string, together with BY-group processing:
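data want;
  length text $200;
  /* DOW loop: each DATA step iteration reads all rows of one id1 group */
  /* (QUERY_FOR_TABLE1 must be sorted or grouped by id1) */
  do until (last.id1);
    set QUERY_FOR_TABLE1;
    by id1;
    /* append text1 and text2, comma-separated; CATX skips blank values */
    text = catx(',', text, text1, text2);
  end;
  keep id1 text;
run;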
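For readers doing the same outside SAS, here is a rough pandas analogue of the CATX accumulation. The `catx` helper below is hypothetical: it only approximates SAS CATX by stripping each value, dropping missing or blank ones, and joining the rest. The input frame is a made-up stand-in for QUERY_FOR_TABLE1.

```python
import pandas as pd

# Hypothetical stand-in for QUERY_FOR_TABLE1: several rows per id1.
df = pd.DataFrame({
    "id1": [1, 1, 2],
    "text1": ["a", "c", "e"],
    "text2": ["b", None, " f "],
})

def catx(sep, values):
    """Hypothetical helper approximating SAS CATX: strip values, skip blanks/missing."""
    parts = (str(v).strip() for v in values if pd.notna(v))
    return sep.join(p for p in parts if p)

# Flatten each group's text1/text2 row by row (text1, text2, text1, ...) and join.
want = (
    df.groupby("id1")[["text1", "text2"]]
      .apply(lambda g: catx(",", g.to_numpy().ravel()))
      .rename("text")
      .reset_index()
)
print(want)
```

Flattening in row order mirrors the DOW loop, which appends text1 and text2 of each row before moving to the next row of the group.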
Pandas | merge rows with same id
Use:
- DataFrame.groupby: Group DataFrame or Series using a mapper or by a Series of columns.
- GroupBy.last: Compute last of group values.
- DataFrame.replace: Replace values given in to_replace with value.
Example:
import numpy as np
# turn empty or whitespace-only strings into NaN so last() skips them
df = df.replace(r'^\s*$', np.nan, regex=True)
df1 = df.groupby('id', as_index=False, sort=False).last()
print(df1)
id firstname lastname email updatedate
0 A1 wendy smith smith@mail.com 2019-02-03
1 A2 harry lynn harylynn@mail.com 2019-03-12
2 A3 tinna dickey tinna@mail.com 2013-06-12
3 A4 Tom Lee Tom@mail.com 2012-06-12
4 A5 Ella NaN Ella@mail.com 2019-07-12
5 A6 Ben Lang Ben@mail.com 2019-03-12
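A self-contained version of the same idea, on a hypothetical three-row frame, shows what `last()` does with gaps: it takes the last non-NaN value of each column separately, so an empty field in a later row does not overwrite an earlier value. The anchored pattern `^\s*$` matches only empty or whitespace-only strings; with `regex=True` and a non-string replacement value, pandas replaces any cell the pattern matches.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a later row for A1 fills in the email but has no firstname.
df = pd.DataFrame({
    "id": ["A1", "A2", "A1"],
    "firstname": ["wendy", "harry", ""],
    "email": ["", "harylynn@mail.com", "smith@mail.com"],
    "updatedate": ["2019-01-01", "2019-03-12", "2019-02-03"],
})

# Empty / whitespace-only strings become NaN.
clean = df.replace(r"^\s*$", np.nan, regex=True)

# Last non-NaN value per column; sort=False keeps ids in first-seen order.
df1 = clean.groupby("id", as_index=False, sort=False).last()
print(df1)
```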
MYSQL how to merge rows with same field id into a single row
GROUP_CONCAT supports DISTINCT and SEPARATOR:
CREATE TABLE table1 (
`rowid` VARCHAR(139),
`title` VARCHAR(139),
`author_f_name` VARCHAR(139),
`author_m_name` VARCHAR(139),
`author_l_name` VARCHAR(139),
`coauthor_first_name` VARCHAR(139),
`coauthor_middle_name` VARCHAR(139),
`coauthor_last_name` VARCHAR(139)
);
INSERT INTO table1
(`rowid`, `title`, `author_f_name`, `author_m_name`, `author_l_name`, `coauthor_first_name`, `coauthor_middle_name`, `coauthor_last_name`)
VALUES
('1.', 'Blog Title.', 'Roy', NULL, 'Thomas.', 'Joe', 'Shann', 'Mathews'),
('1.', 'Blog Title.', 'Thomas', 'NULL', 'Edison', 'Kunal', NULL, 'Shar');
SELECT
`rowid`
, GROUP_CONCAT(DISTINCT `title` SEPARATOR ' |||') title
, GROUP_CONCAT(DISTINCT `author_f_name` SEPARATOR ' |||') author_f_name
, GROUP_CONCAT(DISTINCT `author_m_name` SEPARATOR ' |||') author_m_name
, GROUP_CONCAT(DISTINCT `author_l_name` SEPARATOR ' |||') author_l_name
, GROUP_CONCAT(DISTINCT `coauthor_first_name` SEPARATOR ' |||') coauthor_first_name
, GROUP_CONCAT(DISTINCT `coauthor_middle_name` SEPARATOR ' |||') coauthor_middle_name
, GROUP_CONCAT(DISTINCT `coauthor_last_name` SEPARATOR ' |||') coauthor_last_name
FROM table1
GROUP BY `rowid`;
rowid | title | author_f_name | author_m_name | author_l_name | coauthor_first_name | coauthor_middle_name | coauthor_last_name
:---- | :---------- | :------------ | :------------ | :---------------- | :------------------ | :------------------- | :-----------------
1. | Blog Title. | Roy |||Thomas | NULL | Edison |||Thomas. | Joe |||Kunal | Shann | Mathews |||Shar
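To make the GROUP_CONCAT(DISTINCT ... SEPARATOR ...) semantics concrete, here is a rough pandas analogue on a hypothetical subset of table1, with None standing in for SQL NULL. The helper `group_concat_distinct` is made up for this sketch: it skips NULLs, drops duplicates, and returns None when a group has no non-NULL values, as GROUP_CONCAT does. It keeps first-seen order, whereas MySQL guarantees no particular order without ORDER BY inside GROUP_CONCAT.

```python
import pandas as pd

# Hypothetical subset of table1; None plays the role of SQL NULL.
table1 = pd.DataFrame({
    "rowid": ["1.", "1."],
    "title": ["Blog Title.", "Blog Title."],
    "author_f_name": ["Roy", "Thomas"],
    "author_m_name": [None, None],
    "coauthor_first_name": ["Joe", "Kunal"],
})

def group_concat_distinct(values, sep=" |||"):
    """Hypothetical helper: skip NULLs, drop duplicates (first-seen order), join."""
    kept = dict.fromkeys(v for v in values if pd.notna(v))
    return sep.join(kept) if kept else None

result = table1.groupby("rowid", as_index=False).agg(group_concat_distinct)
print(result)
```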
SELECT
`rowid`
, GROUP_CONCAT(DISTINCT `title` SEPARATOR ' |||') title
, GROUP_CONCAT(DISTINCT CONCAT(`author_f_name`,' ',COALESCE(`author_m_name`,''),' ',`author_l_name`) SEPARATOR ' |||') author_full_name
, GROUP_CONCAT(DISTINCT CONCAT(`coauthor_first_name`,' ',COALESCE(`coauthor_middle_name`,''),' ',`coauthor_last_name`) SEPARATOR ' |||') coauthor_full_name
FROM table1
GROUP BY `rowid`;
rowid | title | author_full_name | coauthor_full_name
:---- | :---------- | :--------------------------------- | :-------------------------------
1. | Blog Title. | Roy Thomas. |||Thomas NULL Edison | Joe Shann Mathews |||Kunal Shar
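The full-name variant builds one string per row first and only then collapses the group. A pandas sketch of that two-step shape follows, on hypothetical author columns. One deliberate difference: the `full_name` helper skips a missing part entirely, which behaves like MySQL's CONCAT_WS(' ', ...) (it skips NULL arguments) rather than CONCAT + COALESCE, so no double space appears when the middle name is NULL. Unlike CONCAT, it also does not turn the whole name into NULL when the first or last name is missing.

```python
import pandas as pd

# Hypothetical author columns; None stands in for SQL NULL.
table1 = pd.DataFrame({
    "rowid": ["1.", "1."],
    "author_f_name": ["Roy", "Thomas"],
    "author_m_name": [None, None],
    "author_l_name": ["Thomas.", "Edison"],
})

def full_name(row):
    """Hypothetical helper: join name parts with single spaces, skipping missing ones."""
    parts = [row["author_f_name"], row["author_m_name"], row["author_l_name"]]
    return " ".join(p for p in parts if pd.notna(p))

# Step 1: one full name per row.
names = table1.assign(author_full_name=table1.apply(full_name, axis=1))

# Step 2: one row per rowid, distinct names in first-seen order, " |||"-separated.
result = (
    names.groupby("rowid")["author_full_name"]
         .agg(lambda s: " |||".join(dict.fromkeys(s)))
         .reset_index()
)
print(result)
```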