Consolidate Duplicate Rows

Consolidate duplicate rows

This works:

library(plyr)
ddply(df,"x",numcolwise(sum))

In words: (1) split the data frame df by the "x" column; (2) for each chunk, take the sum of each numeric-valued column; (3) stick the results back into a single data frame. (The dd in ddply stands for "take a data frame as input, return a data frame".)
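
For example, with a small made-up data frame df (grouping column x, numeric columns y and z, where z is added here purely for illustration), the call above collapses the duplicate x rows into one row per group:

library(plyr)

df <- data.frame(x = c("a", "a", "b"), y = c(1, 2, 3), z = c(10, 20, 30))
ddply(df, "x", numcolwise(sum))
#   x y  z
# 1 a 3 30
# 2 b 3 30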

Another, possibly clearer, approach:

aggregate(y ~ x, data = df, FUN = sum)
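
With the same toy df, aggregate sums only the column named on the left-hand side of the formula; to aggregate several numeric columns at once you can wrap them in cbind():

aggregate(y ~ x, data = df, FUN = sum)
#   x y
# 1 a 3
# 2 b 3

aggregate(cbind(y, z) ~ x, data = df, FUN = sum)
#   x y  z
# 1 a 3 30
# 2 b 3 30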

See "quick/elegant way to construct mean/variance summary table" for a related (slightly more complex) question.

Consolidate duplicate rows into one by applying a formula

If I am interpreting your question correctly, each row in your current data frame represents a measurement of diam at a particular location. The unique locations are defined by their x, y values, but some locations appear in multiple rows, representing repeat measurements at the same site. You would like to summarize the diam values at each unique location by applying a function that reduces that location's vector of measurements to a single value (such as sum or mean).
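
For concreteness, a small forest data frame of that shape might look like this (made-up values; the first two rows are repeat measurements at the location x = 1, y = 2):

forest <- data.frame(
  x    = c(1, 1, 3),
  y    = c(2, 2, 4),
  diam = c(6, 8, 5)
)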

You can do this easily with the dplyr package: group_by each unique location, then summarize all the diam values at each x, y location.

In the following example, I have used a simple sum of all the diameters, but you could change this to any function that takes a numeric vector as input and gives a single numeric output (such as max, mean, or median):

library(dplyr)
library(ggplot2)

forest %>%
  group_by(x, y) %>%
  summarize(diam = sum(diam)) %>%
  ggplot() +
  geom_point(aes(x, y, size = diam))

(Resulting plot: one point per unique location, sized by the summed diam.)


EDIT

The function for finding a single equivalent diameter from several individual diameters (i.e. the diameter of one circle whose area equals the combined cross-sectional area of the individual stems) would be:

sum_diams <- function(x) 2 * sqrt(sum((x / 2)^2))
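
As a quick sanity check with made-up diameters: stems of diameter 6 and 8 have cross-sectional areas of 9π and 16π, which together equal the 25π area of a single stem of diameter 10:

sum_diams <- function(x) 2 * sqrt(sum((x / 2)^2))
sum_diams(c(6, 8))
# [1] 10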

So your code would become:

library(dplyr)
library(ggplot2)

sum_diams <- function(x) 2 * sqrt(sum((x / 2)^2))

forest %>%
  group_by(x, y) %>%
  summarize(diam = sum_diams(diam)) %>%
  ggplot() +
  geom_point(aes(x, y, size = diam))

FURTHER EDIT

To store the modified data frame, you can do:

new_forest <- forest %>%
  group_by(x, y) %>%
  summarize(diam = sum_diams(diam))

If you want to plot it, you can do:

ggplot(new_forest) +
  geom_point(aes(x, y, size = diam))

And if you want to analyze it further, your data frame new_forest is still in memory.

How to combine duplicate rows in pandas?

I think you need sort_values with drop_duplicates:

df = df.sort_values(['c1','c2']).drop_duplicates(['c2'])
print(df)
     c1   c2
0  10.0  100
4  20.0  200
2  30.0  300

Or first remove rows with NaN in c1 using dropna:

df = df.dropna(subset=['c1']).drop_duplicates(['c2'])
print(df)
     c1   c2
0  10.0  100
2  30.0  300
4  20.0  200

df = df.dropna(subset=['c1']).drop_duplicates(['c1','c2'])
print(df)
     c1   c2
0  10.0  100
2  30.0  300
4  20.0  200

Combine duplicate rows in Apps Script

Here is the script:

const HDR = 1 // header height
const GROUP_COL = 1 // in which column is the grouping value

function onEdit(e){
  // we are interested only if the value is not empty and the content of column GROUP_COL has been changed
  if(!e.value || e.value===e.oldValue || e.range.columnStart!==GROUP_COL || e.range.rowStart<=HDR){
    return
  }

  const sh = SpreadsheetApp.getActiveSheet()
  const values = sh.getDataRange().getValues().slice(HDR-1)
  const existingRows = values.map((r,i)=>({ row:HDR+i, cells:r, found:r[GROUP_COL-1]==e.value })).filter(x=>x.found)
  console.log(existingRows)
  if(existingRows.length===0){
    return
  }

  const destRange = sh.getRange(e.range.rowStart,1,1,existingRows[0].cells.length)
  const destRow = destRange.getValues()[0]

  // merge content of the duplicate rows into the edited row
  existingRows.forEach(er=>{
    er.cells.forEach((c,i)=>{
      if(i===GROUP_COL-1 || er.row===e.range.rowStart){
        return
      }
      destRow[i] = (destRow[i] + "\n" + er.cells[i]).trim()
    })
  })

  destRange.setValues([destRow])

  // delete the merged duplicate rows, offsetting by the number of rows already removed
  let deleted = 0
  existingRows.forEach(er=>{
    if(er.row!==e.range.rowStart){
      sh.deleteRow(er.row - deleted)
      deleted++
    }
  })
}

Combine duplicate rows and SUM QTY by Time stamp

You may try grouping by TRK_ID and HRS before finding the sum of IN_QTY, e.g.:

SELECT TRK_ID, SUM(IN_QTY) AS IN_QTY, TRUNC(LOT_DTTM, 'hh') AS HRS
FROM TRK_ID_LOT
WHERE facility = 'DP1DM5'
  AND trk_id LIKE 'AE%'
  AND lot_dttm > sysdate - 1
GROUP BY TRK_ID, TRUNC(LOT_DTTM, 'hh')

Combine duplicate rows in a loop VBA

Create a composite primary key by joining the two columns with a tilde (~) and use a Dictionary object to locate duplicates.

Option Explicit

Private Sub CommandButton1_Click()

    Dim wb As Workbook, ws As Worksheet
    Dim iLastRow As Long, iRow As Long, iTarget As Long

    Set wb = ThisWorkbook
    Set ws = wb.Sheets("Sheet2")
    iLastRow = ws.Cells(ws.Rows.Count, "H").End(xlUp).Row

    Dim dict As Object, sKey As String
    Set dict = CreateObject("Scripting.Dictionary")

    ' build dictionary and
    ' consolidate any existing duplicates, scanning upwards
    For iRow = iLastRow To 3 Step -1

        ' create composite primary key
        sKey = LCase(ws.Cells(iRow, 1).Value) & "~" & Format(ws.Cells(iRow, 3).Value, "yyyy-mm-dd")

        If dict.exists(sKey) Then
            iTarget = dict(sKey)
            ' sum and delete
            ws.Cells(iTarget, 2) = ws.Cells(iTarget, 2) + ws.Cells(iRow, 2)
            ws.Cells(iTarget, 8) = ws.Cells(iTarget, 8) + ws.Cells(iRow, 8)
            ws.Rows(iRow).EntireRow.Delete
        Else
            dict(sKey) = iRow
        End If
    Next

    ' add new record from the form, using the dictionary to locate any existing row
    iLastRow = ws.Cells(ws.Rows.Count, "H").End(xlUp).Row
    sKey = LCase(TextBox1.Text) & "~" & Format(DateValue(TextBox3.Text), "yyyy-mm-dd")
    If dict.exists(sKey) Then
        iTarget = dict(sKey)
        ws.Cells(iTarget, 2) = ws.Cells(iTarget, 2) + TextBox2.Text
        ws.Cells(iTarget, 8) = ws.Cells(iTarget, 8) + TextBox2.Text
    Else
        iTarget = iLastRow + 1
        ws.Cells(iTarget, 1) = TextBox1.Text
        ws.Cells(iTarget, 2) = TextBox2.Text
        ws.Cells(iTarget, 3) = TextBox3.Text
        ws.Cells(iTarget, 8) = TextBox2.Text
    End If

End Sub
