merge rows pandas dataframe based on condition
That's totally possible:
df.groupby(((df.Start - df.End.shift(1)) > 10).cumsum()).agg({'Start': 'min', 'End': 'max', 'Value1': 'sum', 'Value2': 'sum'})
Explanation:
start_end_differences = df.Start - df.End.shift(1)  # shift moves the End series down one row, so each Start is compared with the previous row's End
threshold_selector = start_end_differences > 10  # boolean series; True marks a row where the gap is more than 10
groups = threshold_selector.cumsum()  # cumulative sum of the Trues (counted as 1) gives an integer group label starting from 0
df.groupby(groups).agg({'Start': 'min'})  # the aggregation is self-explanatory
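To see the intermediate steps on concrete numbers, here is a minimal sketch. The sample rows are hypothetical (the original question's data is not shown here); only the column names and the threshold of 10 come from the answer.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the answer.
df = pd.DataFrame({
    'Start':  [1, 12, 30, 100, 150],
    'End':    [10, 25, 42, 140, 162],
    'Value1': [2, 3, 5, 20, 16],
    'Value2': [10, 20, 20, 12, 10],
})

# Gap between each row's Start and the previous row's End (NaN for the first row).
gaps = df['Start'] - df['End'].shift(1)

# A gap strictly greater than 10 starts a new group; NaN compares as False.
groups = (gaps > 10).cumsum()

merged = df.groupby(groups).agg(
    {'Start': 'min', 'End': 'max', 'Value1': 'sum', 'Value2': 'sum'}
)
print(groups.tolist())
print(merged)
```

Note that the last row has a gap of exactly 10, which does not exceed the threshold, so it stays in the same group as the row before it.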
Here is a generalized solution that remains agnostic of the other columns:
cols = df.columns.difference(['Start', 'End'])
grps = df.Start.sub(df.End.shift()).gt(10).cumsum()
gpby = df.groupby(grps)
gpby.agg(dict(Start='min', End='max')).join(gpby[cols].sum())
   Start  End  Value1  Value2
0      1   42      10      50
1    100  162      36      22
Merge rows and values in R based on condition
This really depends on the rest of your strings, but you could use grep with ^ to match the beginning of each string.
df[grep('^ABC U', df$school), 'school'] <- 'ABC University'
df[grep('^DFG U', df$school), 'school'] <- 'DFG University'
Then aggregate as usual.
aggregate(cbind(applicant, students) ~ school, df, sum)
# school applicant students
# 1 ABC University 5100 2100
# 2 DFG University 2210 4300
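For readers working in pandas rather than R, a rough analogue of the same idea might look like the sketch below. The sample rows are hypothetical, and `str.startswith` stands in for the anchored `^` pattern.

```python
import pandas as pd

# Hypothetical sample data with inconsistent school spellings.
df = pd.DataFrame({
    'school': ['ABC Univ', 'ABC University', 'DFG U.', 'DFG University'],
    'applicant': [2000, 3100, 1200, 1010],
    'students': [1000, 1100, 2000, 2300],
})

# Normalise every name that begins with a known prefix.
df.loc[df['school'].str.startswith('ABC U'), 'school'] = 'ABC University'
df.loc[df['school'].str.startswith('DFG U'), 'school'] = 'DFG University'

# Sum the numeric columns per normalised school name.
out = df.groupby('school', as_index=False)[['applicant', 'students']].sum()
print(out)
```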
Python add / merge rows of a dataframe together based on multiple conditions
Use groupby with sum:
out = df.groupby(['Application ID', 'Test Phase'], as_index=False).sum()
print(out)
# Output
   Application ID Test Phase  Total Tests   A
0               9        SIT           36  36
1              11        UAT            5   5
Setup:
data = {'Application ID': [9, 9, 11],
'Test Phase': ['SIT', 'SIT', 'UAT'],
'Total Tests': [9, 27, 5],
'A': [9, 27, 5]}
df = pd.DataFrame(data)
print(df)
# Output
   Application ID Test Phase  Total Tests   A
0               9        SIT            9   9
1               9        SIT           27  27
2              11        UAT            5   5
Merge Row into one with condition and replace value in one row with value in the other R
A data.table option:
setDT(df)[
,
c(
lapply(
setNames(.(A, B), c("A", "B")),
function(x) if ("Winter" %in% D) replace(x, D == "Summer", x[D == "Winter"]) else x
),
.(D = D)
),
C
][
,
lapply(.SD, function(x) toString(unique(x))),
C
][,
.SD,
.SDcols = names(df)
]
gives
A B C D
1: X apple december Winter, Summer
2: Z apple june Winter, Summer
3: U pear march Summer
Data
> dput(df)
structure(list(A = c("X", "Y", "Z", "W", "U"), B = c("apple",
"pear", "apple", "pear", "pear"), C = c("december", "december",
"june", "june", "march"), D = c("Winter", "Summer", "Winter",
"Summer", "Summer")), class = "data.frame", row.names = c(NA,
-5L))
How to merge rows based on conditions with characters values? (Household data)
Here is one way to do it (though I admit it is pretty verbose). I created a reference dataframe (i.e., combos) in case you had more than 3 categories, which is then joined with the main dataframe (i.e., df_new) to bring in the PCS roman numerals.
library(dplyr)
library(tidyr)
# Create a dataframe with all of the combinations of PCS.
combos <- expand.grid(unique(df$PCS), unique(df$PCS))
combos <- unique(t(apply(combos, 1, sort))) %>%
as.data.frame() %>%
dplyr::mutate(PCS = as.roman(row_number()))
# Create another dataframe with the columns reversed (will make it easier to join to the main dataframe).
combos2 <- data.frame(V1 = c(combos$V2), V2 = c(combos$V1), PCS = c(combos$PCS)) %>%
dplyr::mutate(PCS = as.roman(PCS))
combos <- rbind(combos, combos2)
# Get the count of "Yes" for each HHnum group.
# Then, put the PCS into 2 columns to join together with "combos" df.
df_new <- df %>%
dplyr::group_by(HHnum) %>%
dplyr::mutate(work_night = sum(work_night == "Yes")) %>%
dplyr::group_by(grp = rep(1:2, length.out = n())) %>%
dplyr::ungroup() %>%
tidyr::pivot_wider(names_from = grp, values_from = PCS) %>%
dplyr::rename("V1" = 3, "V2" = 4) %>%
dplyr::left_join(combos, by = c("V1", "V2")) %>%
unique() %>%
dplyr::select(HHnum, PCS, work_night)
Combine rows in Dataframe column based on condition
Use Series.replace with an aggregate sum:
df['Edad'] = df['Edad'].replace({'Menos de 1 año':'De 1 a 4 años'})
df = df.groupby(['Causa de muerte','Sexo','Edad','Periodo'], as_index=False)['Total'].sum()
print (df)
Causa de muerte Sexo Edad Periodo Total
0 001-102 I-XXII.Todas las causas Total De 1 a 4 años 2016 1368
1 001-102 I-XXII.Todas las causas Total De 1 a 4 años 2017 1318
2 001-102 I-XXII.Todas las causas Total De 1 a 4 años 2018 1267
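A self-contained sketch of the same two steps follows. The rows and totals are made up for illustration; only the column names and the two age labels come from the answer.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the answer.
df = pd.DataFrame({
    'Causa de muerte': ['001-102 I-XXII.Todas las causas'] * 4,
    'Sexo': ['Total'] * 4,
    'Edad': ['Menos de 1 año', 'De 1 a 4 años', 'Menos de 1 año', 'De 1 a 4 años'],
    'Periodo': [2016, 2016, 2017, 2017],
    'Total': [10, 5, 8, 4],
})

# Relabel the category to be folded in, so both rows share one group key.
df['Edad'] = df['Edad'].replace({'Menos de 1 año': 'De 1 a 4 años'})

# Rows that are now identical on the key columns collapse into one summed row.
merged = df.groupby(
    ['Causa de muerte', 'Sexo', 'Edad', 'Periodo'], as_index=False
)['Total'].sum()
print(merged)
```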
merge some rows in two conditions
I solved this by blanking the merged row and then removing it from the CSV:
import csv
import pandas as pd

df = pd.read_csv('test.csv', encoding='utf-8')
with open('output.csv', mode='w', newline='', encoding='utf-16') as f:
    writer = csv.writer(f, delimiter=' ')
    for i, data in enumerate(df['Sentence']):
        if i + 1 == len(df['Sentence']):
            # Last row: nothing left to merge with.
            writer.writerow([data])
        elif len(df['Sentence'][i + 1]) < 19:
            # The next row is short: append it to this one, then blank it.
            writer.writerow([data + df['Sentence'][i + 1]])
            df.loc[i + 1, 'Sentence'] = ''
        else:
            writer.writerow([data])
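As a variant that avoids file I/O and leaves no blank rows behind, the merge rule can be sketched on a plain list. The function name `merge_short_followers` and the sample sentences are hypothetical; the length threshold of 19 is taken from the answer. Unlike the loop above, this version skips a row once it has been absorbed instead of blanking it.

```python
def merge_short_followers(sentences, min_len=19):
    """Append each short follower (len < min_len) to the sentence before it."""
    merged = []
    skip_next = False
    for i, text in enumerate(sentences):
        if skip_next:
            # This row was already appended to the previous one.
            skip_next = False
            continue
        has_next = i + 1 < len(sentences)
        if has_next and len(sentences[i + 1]) < min_len:
            merged.append(text + sentences[i + 1])
            skip_next = True
        else:
            merged.append(text)
    return merged


# Hypothetical input: the second item is shorter than 19 characters.
rows = ["This is a long sentence here.", " short", "Another sufficiently long sentence."]
result = merge_short_followers(rows)
print(result)
```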
Pandas: merging rows with condition
Use numpy.sort first and then GroupBy.agg:
df[['A','B']] = np.sort(df[['A','B']], axis=1)
df = df.groupby(['A','B'], as_index=False).agg({'C':'sum', 'D':'mean'})
print (df)
   A  B   C      D
0  1  2  12  0.625
1  3  4   5  0.300
If original values cannot be changed:
arr = np.sort(df[['A','B']], axis=1)
df = (df.groupby([arr[:, 0],arr[:, 1]])
.agg({'C':'sum', 'D':'mean'})
.rename_axis(('A','B'))
.reset_index())
print (df)
   A  B   C      D
0  1  2  12  0.625
1  3  4   5  0.300
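For a runnable check of the non-mutating variant, here is a small sketch. The input rows are hypothetical, chosen so that (1, 2) and (2, 1) should be treated as the same pair.

```python
import numpy as np
import pandas as pd

# Hypothetical input: rows 0 and 1 hold the same pair in a different order.
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [2, 1, 4],
    'C': [5, 7, 5],
    'D': [0.5, 0.75, 0.3],
})

# Sort each (A, B) pair within its row so the order no longer matters.
arr = np.sort(df[['A', 'B']].to_numpy(), axis=1)

# Group by the sorted pair without overwriting the original columns.
out = (df.groupby([arr[:, 0], arr[:, 1]])
         .agg({'C': 'sum', 'D': 'mean'})
         .rename_axis(['A', 'B'])
         .reset_index())
print(out)
```

Grouping on the sorted array rather than on the columns themselves keeps `df` untouched, which matters if the original ordering of A and B is needed later.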