How to Merge Rows on Specific Condition

Merge rows in a pandas DataFrame based on a condition

That's totally possible:

df.groupby(((df.Start - df.End.shift(1)) > 10).cumsum()).agg({'Start': 'min', 'End': 'max', 'Value1': 'sum', 'Value2': 'sum'})

Explanation:

start_end_differences = df.Start - df.End.shift(1)  # shift(1) moves End down one row, so each Start is compared with the previous row's End
threshold_selector = start_end_differences > 10  # boolean Series; True marks a row whose gap from the previous row is more than 10
groups = threshold_selector.cumsum()  # running count of the Trues, which gives an integer group label starting at 0
df.groupby(groups).agg({'Start': 'min'})  # aggregate within each group, here taking the smallest Start
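
To see the intermediate values, here is a minimal, self-contained sketch of the same steps. The sample rows are hypothetical (the question's data is not reproduced on this page); they were picked only so that two rows sit close together and the third starts after a gap larger than 10.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({
    'Start':  [1, 10, 100, 150],
    'End':    [8, 42, 140, 162],
    'Value1': [4, 6, 20, 16],
    'Value2': [20, 30, 10, 12],
})

# Gap between each row's Start and the previous row's End (NaN for the first row).
gaps = df['Start'] - df['End'].shift(1)

# NaN > 10 is False, so the first row always falls into group 0.
groups = (gaps > 10).cumsum()

merged = df.groupby(groups).agg(
    {'Start': 'min', 'End': 'max', 'Value1': 'sum', 'Value2': 'sum'}
)
print(groups.tolist())  # [0, 0, 1, 1]
print(merged)
```

Rows 0 and 1 end up in group 0, and the gap of 58 before the third row starts group 1.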

Here is a generalized solution that remains agnostic of the other columns:

cols = df.columns.difference(['Start', 'End'])
grps = df.Start.sub(df.End.shift()).gt(10).cumsum()
gpby = df.groupby(grps)
gpby.agg(dict(Start='min', End='max')).join(gpby[cols].sum())

   Start  End  Value1  Value2
0      1   42      10      50
1    100  162      36      22

Merge rows and values in R based on condition

This really depends on the rest of your strings, but you could use grep with ^ to match only values that begin with a given prefix.

df[grep('^ABC U', df$school), 'school'] <- 'ABC University'
df[grep('^DFG U', df$school), 'school'] <- 'DFG University'

Then aggregate as usual.

aggregate(cbind(applicant, students) ~ school, df, sum)
#           school applicant students
# 1 ABC University      5100     2100
# 2 DFG University      2210     4300
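
For readers working in pandas rather than R, the same idea (normalize the names by prefix, then sum per name) might look roughly like the sketch below. The input rows are hypothetical, since the question's data frame is not shown here, and str.startswith stands in for the anchored grep pattern.

```python
import pandas as pd

# Hypothetical input with inconsistent spellings of the same school.
df = pd.DataFrame({
    'school':    ['ABC University', 'ABC Univ', 'DFG University', 'DFG Uni.'],
    'applicant': [5000, 100, 2000, 210],
    'students':  [2000, 100, 4000, 300],
})

# Equivalent of grep('^ABC U', ...): rewrite every value starting with the prefix.
df.loc[df['school'].str.startswith('ABC U'), 'school'] = 'ABC University'
df.loc[df['school'].str.startswith('DFG U'), 'school'] = 'DFG University'

# Equivalent of aggregate(cbind(applicant, students) ~ school, df, sum).
out = df.groupby('school', as_index=False)[['applicant', 'students']].sum()
print(out)
```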

Python add / merge rows of a dataframe together based on multiple conditions

Use groupby followed by sum:

out = df.groupby(['Application ID', 'Test Phase'], as_index=False).sum()
print(out)

# Output
   Application ID Test Phase  Total Tests   A
0               9        SIT           36  36
1              11        UAT            5   5

Setup:

import pandas as pd

data = {'Application ID': [9, 9, 11],
        'Test Phase': ['SIT', 'SIT', 'UAT'],
        'Total Tests': [9, 27, 5],
        'A': [9, 27, 5]}
df = pd.DataFrame(data)
print(df)

# Output
   Application ID Test Phase  Total Tests   A
0               9        SIT            9   9
1               9        SIT           27  27
2              11        UAT            5   5

Merge Row into one with condition and replace value in one row with value in the other R

A data.table option

setDT(df)[
  ,
  c(
    lapply(
      setNames(.(A, B), c("A", "B")),
      function(x) if ("Winter" %in% D) replace(x, D == "Summer", x[D == "Winter"]) else x
    ),
    .(D = D)
  ),
  C
][
  ,
  lapply(.SD, function(x) toString(unique(x))),
  C
][,
  .SD,
  .SDcols = names(df)
]

gives

   A     B        C              D
1: X apple december Winter, Summer
2: Z apple     june Winter, Summer
3: U  pear    march         Summer

Data

> dput(df)
structure(list(A = c("X", "Y", "Z", "W", "U"), B = c("apple",
"pear", "apple", "pear", "pear"), C = c("december", "december",
"june", "june", "march"), D = c("Winter", "Summer", "Winter",
"Summer", "Summer")), class = "data.frame", row.names = c(NA,
-5L))
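
As a rough pandas counterpart (a sketch, not a translation endorsed by the answer's author), the two steps can be written as: copy the Winter row's A and B onto the Summer rows of the same C group, then collapse each group to its unique values. It assumes at most one Winter row per value of C, which holds for the data above.

```python
import pandas as pd

# Same rows as the dput() output above.
df = pd.DataFrame({
    'A': ['X', 'Y', 'Z', 'W', 'U'],
    'B': ['apple', 'pear', 'apple', 'pear', 'pear'],
    'C': ['december', 'december', 'june', 'june', 'march'],
    'D': ['Winter', 'Summer', 'Winter', 'Summer', 'Summer'],
})

# One Winter row per C (assumption), indexed by C for lookup.
winter = df[df['D'] == 'Winter'].drop_duplicates('C').set_index('C')
is_summer = df['D'] == 'Summer'

for col in ['A', 'B']:
    # Winter value for each row's group; NaN when the group has no Winter row.
    replacement = df['C'].map(winter[col])
    # Keep the original value unless this is a Summer row with a Winter value available.
    df[col] = df[col].where(~is_summer | replacement.isna(), replacement)

# Collapse each C group, joining the distinct values like toString(unique(x)).
out = (
    df.groupby('C', sort=False)
      .agg(lambda s: ', '.join(s.unique()))
      .reset_index()[['A', 'B', 'C', 'D']]
)
print(out)
```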

How to merge rows based on conditions with characters values? (Household data)

Here is one way to do it (though I admit it is pretty verbose). I created a reference dataframe (combos) in case you have more than 3 categories; it is then joined with the main dataframe (df_new) to bring in the PCS roman numerals.

library(dplyr)
library(tidyr)

# Create a dataframe with all of the combinations of PCS.
combos <- expand.grid(unique(df$PCS), unique(df$PCS))
combos <- unique(t(apply(combos, 1, sort))) %>%
  as.data.frame() %>%
  dplyr::mutate(PCS = as.roman(row_number()))
# Create another dataframe with the columns reversed (will make it easier to join to the main dataframe).
combos2 <- data.frame(V1 = c(combos$V2), V2 = c(combos$V1), PCS = c(combos$PCS)) %>%
  dplyr::mutate(PCS = as.roman(PCS))
combos <- rbind(combos, combos2)

# Get the count of "Yes" for each HHnum group.
# Then, put the PCS into 2 columns to join together with "combos" df.
df_new <- df %>%
  dplyr::group_by(HHnum) %>%
  dplyr::mutate(work_night = sum(work_night == "Yes")) %>%
  dplyr::group_by(grp = rep(1:2, length.out = n())) %>%
  dplyr::ungroup() %>%
  tidyr::pivot_wider(names_from = grp, values_from = PCS) %>%
  dplyr::rename("V1" = 3, "V2" = 4) %>%
  dplyr::left_join(combos, by = c("V1", "V2")) %>%
  unique() %>%
  dplyr::select(HHnum, PCS, work_night)

Combine rows in Dataframe column based on condition

Use Series.replace with aggregate sum:

df['Edad'] = df['Edad'].replace({'Menos de 1 año':'De 1 a 4 años'})

df = df.groupby(['Causa de muerte','Sexo','Edad','Periodo'], as_index=False)['Total'].sum()
print(df)
                   Causa de muerte   Sexo           Edad  Periodo  Total
0  001-102 I-XXII.Todas las causas  Total  De 1 a 4 años     2016   1368
1  001-102 I-XXII.Todas las causas  Total  De 1 a 4 años     2017   1318
2  001-102 I-XXII.Todas las causas  Total  De 1 a 4 años     2018   1267
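
A reduced, runnable sketch of the same relabel-then-sum pattern follows. The rows and totals are invented for illustration, and only two of the grouping columns are kept to make the effect easy to check by hand.

```python
import pandas as pd

# Hypothetical rows; the numbers are made up, not the question's data.
df = pd.DataFrame({
    'Edad':    ['Menos de 1 año', 'De 1 a 4 años', 'Menos de 1 año', 'De 1 a 4 años'],
    'Periodo': [2016, 2016, 2017, 2017],
    'Total':   [10, 3, 8, 2],
})

# Relabel one category so that it collides with the category it should merge into.
df['Edad'] = df['Edad'].replace({'Menos de 1 año': 'De 1 a 4 años'})

# Rows that now share the same keys are summed into one.
out = df.groupby(['Edad', 'Periodo'], as_index=False)['Total'].sum()
print(out)
```

After the replace, each period has two rows with identical keys, so the groupby leaves one row per period.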

Merge some rows based on two conditions

I solved this by appending the short row to the previous one, blanking it, and then removing the resulting empty rows from the CSV:

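import csv

import pandas as pd

df = pd.read_csv('test.csv', encoding='utf-8')
# Work on a plain list so that blanking a later entry is visible to the loop.
sentences = df['Sentence'].tolist()

with open('output.csv', mode='w', newline='', encoding='utf-16') as f:
    writer = csv.writer(f, delimiter=' ')
    for i, data in enumerate(sentences):
        if i + 1 == len(sentences):
            # Last row: nothing left to merge with.
            writer.writerow([data])
        elif len(sentences[i + 1]) < 19:
            # The next sentence is short: append it to this row, then blank it.
            writer.writerow([data + sentences[i + 1]])
            sentences[i + 1] = ''
        else:
            writer.writerow([data])
# The blanked entries are still written as empty rows and must be removed afterwards.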
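
If the empty rows are unwanted, one alternative is to build the merged list in memory first and write it out afterwards. The helper below is a hypothetical sketch (the name merge_short_sentences and its min_length parameter are not from the answer), and it is a slightly different rule: every sentence shorter than the threshold is appended to the row before it, including several short ones in a row.

```python
def merge_short_sentences(sentences, min_length=19):
    """Append each sentence shorter than min_length to the previous entry."""
    merged = []
    for sentence in sentences:
        if merged and len(sentence) < min_length:
            # Short fragment: glue it onto the last kept sentence.
            merged[-1] += sentence
        else:
            # Long enough (or the very first entry): keep it as its own row.
            merged.append(sentence)
    return merged


rows = merge_short_sentences([
    'This is a long enough sentence.',
    ' short',
    'Another sufficiently long sentence.',
])
print(rows)
```

The resulting list contains no blank entries, so it can be passed to csv.writer row by row without a cleanup step.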

Pandas: merging rows with condition

Use numpy.sort first and then GroupBy.agg:

df[['A','B']] = np.sort(df[['A','B']], axis=1)

df = df.groupby(['A','B'], as_index=False).agg({'C':'sum', 'D':'mean'})
print(df)
   A  B   C      D
0  1  2  12  0.625
1  3  4   5  0.300

If original values cannot be changed:

arr = np.sort(df[['A','B']], axis=1)

df = (df.groupby([arr[:, 0], arr[:, 1]])
        .agg({'C': 'sum', 'D': 'mean'})
        .rename_axis(('A', 'B'))
        .reset_index())
print(df)
   A  B   C      D
0  1  2  12  0.625
1  3  4   5  0.300
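
Since the input frame is not shown, here is a self-contained sketch of the second variant with hypothetical rows, in which (1, 2) and (2, 1) should count as the same pair. Sorting each row's A and B puts both spellings of a pair into the same order before grouping.

```python
import numpy as np
import pandas as pd

# Hypothetical input: rows 0 and 1 hold the same pair in different order.
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [2, 1, 4],
    'C': [5, 7, 5],
    'D': [0.5, 0.75, 0.3],
})

# Sort within each row so the smaller value always comes first.
pairs = np.sort(df[['A', 'B']].to_numpy(), axis=1)

# Group by the sorted pair without modifying the original columns.
out = (
    df.groupby([pairs[:, 0], pairs[:, 1]])
      .agg({'C': 'sum', 'D': 'mean'})
      .rename_axis(['A', 'B'])
      .reset_index()
)
print(out)
```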

