Consolidate duplicate rows
This works:
library(plyr)
ddply(df,"x",numcolwise(sum))
In words: (1) split the data frame df by the "x" column; (2) for each chunk, take the sum of each numeric-valued column; (3) stick the results back into a single data frame. (The dd in ddply stands for "take a data frame as input, return a data frame".)
Another, possibly clearer, approach:
aggregate(y~x,data=df,FUN=sum)
See quick/elegant way to construct mean/variance summary table for a related (slightly more complex) question.
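For readers who want to see the split/apply/combine steps spelled out, here is a minimal sketch in plain Python. It is not what plyr or aggregate do internally; the rows and the consolidate helper are hypothetical stand-ins for df and the R calls above.

```python
from collections import defaultdict

# Hypothetical rows standing in for the data frame `df`.
rows = [
    {"x": "a", "y": 1, "z": 10},
    {"x": "b", "y": 2, "z": 20},
    {"x": "a", "y": 3, "z": 30},
]

def consolidate(rows, key):
    # (1) split: collect the rows that share the same key value
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # (2) apply: sum every numeric column within each group
    # (3) combine: build one output row per group
    result = []
    for value, chunk in groups.items():
        summed = {key: value}
        for col in chunk[0]:
            if col != key and isinstance(chunk[0][col], (int, float)):
                summed[col] = sum(r[col] for r in chunk)
        result.append(summed)
    return result

consolidated = consolidate(rows, "x")
print(consolidated)
```

The two "a" rows collapse into one row whose y and z are the column sums, while the single "b" row passes through unchanged.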
Consolidate duplicate rows into one by applying a formula
If I am interpreting your question correctly, each row in your current data frame represents a measurement of diam at a particular location. There are a number of unique locations, defined by their x, y values, but some of these locations have multiple rows in your data frame, representing multiple measurements at the same site. You would like to summarize the diam values at each unique location by taking that location's vector of diam measurements and applying some function that returns a single value (such as sum or mean).
You can do this very easily with the dplyr package. You can group_by each unique location, then summarize all the values of diam at each x, y location.
In the following example, I have used a simple sum of all the diameters, but you could change this to any function that takes a numeric vector as input and gives a single numeric output (such as max, mean, median, etc.):
library(dplyr)
library(ggplot2)
forest %>%
group_by(x, y) %>%
summarize(diam = sum(diam)) %>%
ggplot() +
geom_point(aes(x, y, size = diam))
EDIT
The function for finding a single equivalent diameter from several individual diameters would be:
sum_diams <- function(x) 2 * sqrt(sum((x / 2)^2))
So your code would become:
library(dplyr)
library(ggplot2)
sum_diams <- function(x) 2 * sqrt(sum((x / 2)^2))
forest %>%
group_by(x, y) %>%
summarize(diam = sum_diams(diam)) %>%
ggplot() +
geom_point(aes(x, y, size = diam))
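As a sanity check on the formula: my reading is that sum_diams treats every diameter as a circle and returns the diameter of the single circle with the same total area, since pi * (D/2)^2 = sum of pi * (d/2)^2 gives D = 2 * sqrt(sum((d/2)^2)). The sketch below transcribes the one-liner into Python and checks that property; the circle_area helper and the sample diameters are assumptions for illustration, not part of the question.

```python
import math

def sum_diams(diams):
    # Python transcription of the R one-liner above.
    return 2 * math.sqrt(sum((d / 2) ** 2 for d in diams))

def circle_area(d):
    # Area of a circle with diameter d (helper added for this check only).
    return math.pi * (d / 2) ** 2

diams = [3.0, 4.0]  # hypothetical measurements at one location
combined = sum_diams(diams)

# The combined circle should cover the same area as the individual ones.
total_area = sum(circle_area(d) for d in diams)
print(combined, math.isclose(circle_area(combined), total_area))
```

Note that the result is smaller than the plain sum of the diameters, which is the point of using an area-based combination.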
FURTHER EDIT
To store the modified data frame, you can do:
new_forest <- forest %>%
group_by(x, y) %>%
summarize(diam = sum_diams(diam))
If you want to plot it, you can do:
ggplot(new_forest) +
geom_point(aes(x, y, size = diam))
And if you want to analyze it further, your data frame new_forest is still in memory.
How to combine duplicate rows in pandas?
I think you need sort_values with drop_duplicates:
df = df.sort_values(['c1','c2']).drop_duplicates(['c2'])
print (df)
c1 c2
0 10.0 100
4 20.0 200
2 30.0 300
Or first remove the rows with NaNs using dropna:
df = df.dropna(subset=['c1']).drop_duplicates(['c2'])
print (df)
c1 c2
0 10.0 100
2 30.0 300
4 20.0 200
df = df.dropna(subset=['c1']).drop_duplicates(['c1','c2'])
print (df)
c1 c2
0 10.0 100
2 30.0 300
4 20.0 200
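To make explicit why sorting first helps, here is a rough pure-Python sketch of the same idea: sort so that missing c1 values go last, then keep the first row seen for each c2. The input rows are hypothetical (the question's original data is not shown here), and the helper name is mine; this only mimics the behaviour, it is not how pandas implements it.

```python
import math

# Hypothetical (c1, c2) rows; NaN marks a missing c1.
rows = [
    (10.0, 100),
    (float("nan"), 100),
    (30.0, 300),
    (float("nan"), 300),
    (20.0, 200),
]

def first_non_missing_per_key(rows):
    # Sort by c1 then c2, pushing rows with a missing c1 to the end,
    # which is where sort_values places NaN by default.
    def sort_key(row):
        c1, c2 = row
        missing = math.isnan(c1)
        return (missing, 0.0 if missing else c1, c2)

    seen = set()
    kept = []
    for c1, c2 in sorted(rows, key=sort_key):
        # Keep only the first row for each c2, like drop_duplicates(['c2']).
        if c2 not in seen:
            seen.add(c2)
            kept.append((c1, c2))
    return kept

deduped = first_non_missing_per_key(rows)
print(deduped)
```

As with the sort_values variant, a c2 value whose rows all have a missing c1 would still be kept; the dropna variant would drop it instead.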
Combine duplicate rows in Apps Script
Here is the script:
const HDR = 1 // header height
const GROUP_COL = 1 // in which column is the grouping value
function onEdit(e){
// we are interested only if the value is not empty and the content of column GROUP_COL has been changed
if(!e.value || e.value===e.oldValue || e.range.columnStart!==GROUP_COL || e.range.rowStart<=HDR){
return
}
const sh = SpreadsheetApp.getActiveSheet()
const values = sh.getDataRange().getValues().slice(HDR-1)
const existingRows = values.map((r,i)=>({ row:HDR+i, cells:r, found:r[GROUP_COL-1]==e.value })).filter(x=>x.found)
console.log(existingRows)
if(existingRows.length===0){
return
}
const destRange = sh.getRange(e.range.rowStart,1,1,existingRows[0].cells.length)
const destRow = destRange.getValues()[0]
// merge content into the new row
existingRows.forEach(er=>{
er.cells.forEach((c,i)=>{
if(i===GROUP_COL-1 || er.row===e.range.rowStart){
return
}
destRow[i] = (destRow[i] + "\n" + er.cells[i]).trim()
})
})
destRange.setValues([destRow])
// delete extra rows, bottom-up so that a deletion never shifts a row that is still waiting to be deleted
existingRows.filter(er=>er.row!==e.range.rowStart).reverse().forEach(er=>{
sh.deleteRow(er.row)
})
}
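The merge step in the middle of the script can be hard to follow inside the event handler, so here is a small standalone sketch of just that part in Python. The merge_into function and the sample rows are hypothetical; it only illustrates the "append each matching row's cells on a new line, then trim" logic and does not touch any spreadsheet API.

```python
def merge_into(dest_row, other_rows, key_index):
    """Append the cells of each matching row to dest_row, one line per value."""
    merged = list(dest_row)
    for other in other_rows:
        for i, cell in enumerate(other):
            if i == key_index:
                continue  # the grouping column is left as-is
            # Join with a newline, then trim so empty cells leave no stray newline.
            merged[i] = (str(merged[i]) + "\n" + str(cell)).strip()
    return merged

# Hypothetical sheet content: column 0 is the grouping column.
new_row = ["apple", "", "from row 4"]
existing = [["apple", "red", "from row 2"], ["apple", "green", ""]]

merged = merge_into(new_row, existing, key_index=0)
print(merged)
```

The trim is what keeps an empty cell from leaving a leading or trailing newline in the merged value.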
Combine duplicate rows and SUM QTY by Time stamp
You may try grouping by your TRK_ID and HRS before finding the sum of IN_QTY, e.g.:
Select TRK_ID, SUM(IN_QTY) as IN_QTY, TRUNC(LOT_DTTM, 'hh') as HRS
from TRK_ID_LOT
WHERE facility in 'DP1DM5'
and trk_id like ('AE%')
and lot_dttm > sysdate - 1
GROUP BY TRK_ID, TRUNC(LOT_DTTM, 'hh')
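If it helps to see what grouping on a truncated timestamp does, the sketch below reproduces the idea in Python: each timestamp is cut down to the start of its hour, and quantities are summed per (TRK_ID, hour) pair. The records are made up for illustration and sum_by_hour is a hypothetical helper, not something taken from your schema.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (trk_id, lot_dttm, in_qty) records.
records = [
    ("AE01", datetime(2024, 1, 5, 9, 12), 5),
    ("AE01", datetime(2024, 1, 5, 9, 48), 7),
    ("AE01", datetime(2024, 1, 5, 10, 3), 2),
    ("AE02", datetime(2024, 1, 5, 9, 30), 4),
]

def sum_by_hour(records):
    totals = defaultdict(int)
    for trk_id, lot_dttm, in_qty in records:
        # Same idea as TRUNC(LOT_DTTM, 'hh'): zero out minutes and below.
        hour = lot_dttm.replace(minute=0, second=0, microsecond=0)
        totals[(trk_id, hour)] += in_qty
    return dict(totals)

hourly = sum_by_hour(records)
for (trk_id, hour), qty in sorted(hourly.items()):
    print(trk_id, hour, qty)
```

The two AE01 rows from the 9 o'clock hour end up in one bucket, while the 10:03 row gets its own.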
Combine duplicate rows in a loop vba
Create a primary key by joining the 2 columns with tilde ~ and use a Dictionary Object to locate duplicates.
Option Explicit
Private Sub CommandButton1_Click()
Dim wb As Workbook, ws As Worksheet
Dim iLastRow As Long, iRow As Long, iTarget As Long
Set wb = ThisWorkbook
Set ws = wb.Sheets("Sheet2")
iLastRow = ws.Cells(Rows.Count, "H").End(xlUp).Row
Dim dict As Object, sKey As String
Set dict = CreateObject("Scripting.Dictionary")
' build dictionary and
' consolidate any existing duplicates, scan up
For iRow = iLastRow To 3 Step -1
' create composite primary key
sKey = LCase(ws.Cells(iRow, 1).Value) & "~" & Format(ws.Cells(iRow, 3).Value, "yyyy-mm-dd")
If dict.exists(sKey) Then
iTarget = dict(sKey)
' summate and delete
ws.Cells(iTarget, 2) = ws.Cells(iTarget, 2) + ws.Cells(iRow, 2)
ws.Cells(iTarget, 8) = ws.Cells(iTarget, 8) + ws.Cells(iRow, 8)
ws.Rows(iRow).EntireRow.Delete
Else
dict(sKey) = iRow
End If
Next
' add new record from form using dictionary to locate any existing
iLastRow = ws.Cells(Rows.Count, "H").End(xlUp).Row
sKey = LCase(TextBox1.Text) & "~" & Format(DateValue(TextBox3.Text), "yyyy-mm-dd")
If dict.exists(sKey) Then
iTarget = dict(sKey)
ws.Cells(iTarget, 2) = ws.Cells(iTarget, 2) + TextBox2.Text
ws.Cells(iTarget, 8) = ws.Cells(iTarget, 8) + TextBox2.Text
Else
iTarget = iLastRow + 1
ws.Cells(iTarget, 1) = TextBox1.Text
ws.Cells(iTarget, 2) = TextBox2.Text
ws.Cells(iTarget, 3) = TextBox3.Text
ws.Cells(iTarget, 8) = TextBox2.Text
End If
End Sub
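The composite-key lookup is the core of the macro, so here is the same technique reduced to a few lines of Python for comparison. The make_key and consolidate names and the sample rows are hypothetical; unlike the VBA, this sketch builds a new list instead of deleting worksheet rows in place, so it does not need to scan bottom-up.

```python
from datetime import date

def make_key(name, day):
    # Case-insensitive name joined to an ISO date with a tilde,
    # mirroring LCase(...) & "~" & Format(..., "yyyy-mm-dd").
    return f"{name.lower()}~{day.isoformat()}"

def consolidate(rows):
    index = {}   # composite key -> position of the row in `result`
    result = []
    for name, qty, day in rows:
        key = make_key(name, day)
        if key in index:
            # Duplicate: add the quantity to the row already stored.
            result[index[key]][1] += qty
        else:
            index[key] = len(result)
            result.append([name, qty, day])
    return result

# Hypothetical (name, quantity, date) rows.
rows = [
    ("Widget", 2, date(2024, 3, 1)),
    ("widget", 3, date(2024, 3, 1)),
    ("Widget", 1, date(2024, 3, 2)),
]

combined = consolidate(rows)
print(combined)
```

The first two rows differ only in letter case and share a date, so they merge; the third has a different date and stays separate.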