Consolidate Duplicate Rows

Consolidate duplicate rows

This works:

library(plyr)
ddply(df,"x",numcolwise(sum))

In words: (1) split the data frame df by the "x" column; (2) for each chunk, take the sum of each numeric-valued column; (3) stick the results back into a single data frame. (The dd in ddply stands for "take a data frame as input, return a data frame".)
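
For example, with a small made-up data frame df (grouping column x, numeric columns y and z, where z is added here purely for illustration), the call above collapses the duplicate x rows into one row per group:

library(plyr)

df <- data.frame(x = c("a", "a", "b"), y = c(1, 2, 3), z = c(10, 20, 30))
ddply(df, "x", numcolwise(sum))
#   x y  z
# 1 a 3 30
# 2 b 3 30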

Another, possibly clearer, approach:

aggregate(y ~ x, data = df, FUN = sum)
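
With the same toy df, aggregate sums only the column named on the left-hand side of the formula; to aggregate several numeric columns at once you can wrap them in cbind():

aggregate(y ~ x, data = df, FUN = sum)
#   x y
# 1 a 3
# 2 b 3

aggregate(cbind(y, z) ~ x, data = df, FUN = sum)
#   x y  z
# 1 a 3 30
# 2 b 3 30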

See "quick/elegant way to construct mean/variance summary table" for a related (slightly more complex) question.

Consolidate duplicate rows into one by applying a formula

If I am interpreting your question correctly, each row in your current data frame represents a measurement of diam at a particular location. The unique locations are defined by their x, y values, but some locations appear in multiple rows, representing repeat measurements at the same site. You would like to summarize the diam values at each unique location by applying a function that reduces that location's vector of measurements to a single value (such as sum or mean).
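
For concreteness, a small forest data frame of that shape might look like this (made-up values; the first two rows are repeat measurements at the location x = 1, y = 2):

forest <- data.frame(
  x    = c(1, 1, 3),
  y    = c(2, 2, 4),
  diam = c(6, 8, 5)
)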

You can do this easily with the dplyr package: group_by each unique location, then summarize all the diam values at each x, y location.

In the following example, I have used a simple sum of all the diameters, but you could change this to any function that takes a numeric vector as input and gives a single numeric output (such as max, mean, or median):

library(dplyr)
library(ggplot2)

forest %>%
  group_by(x, y) %>%
  summarize(diam = sum(diam)) %>%
  ggplot() +
  geom_point(aes(x, y, size = diam))

(Resulting plot: one point per unique location, sized by the summed diam.)


EDIT

The function for finding a single equivalent diameter from several individual diameters (i.e. the diameter of one circle whose area equals the combined cross-sectional area of the individual stems) would be:

sum_diams <- function(x) 2 * sqrt(sum((x / 2)^2))
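
As a quick sanity check with made-up diameters: stems of diameter 6 and 8 have cross-sectional areas of 9π and 16π, which together equal the 25π area of a single stem of diameter 10:

sum_diams <- function(x) 2 * sqrt(sum((x / 2)^2))
sum_diams(c(6, 8))
# [1] 10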

So your code would become:

library(dplyr)
library(ggplot2)

sum_diams <- function(x) 2 * sqrt(sum((x / 2)^2))

forest %>%
  group_by(x, y) %>%
  summarize(diam = sum_diams(diam)) %>%
  ggplot() +
  geom_point(aes(x, y, size = diam))

FURTHER EDIT

To store the modified data frame, you can do:

new_forest <- forest %>%
  group_by(x, y) %>%
  summarize(diam = sum_diams(diam))

If you want to plot it, you can do:

ggplot(new_forest) +
  geom_point(aes(x, y, size = diam))

And if you want to analyze it further, your data frame new_forest is still in memory.

How to combine duplicate rows in pandas?

I think you need sort_values with drop_duplicates:

df = df.sort_values(['c1','c2']).drop_duplicates(['c2'])
print(df)
     c1   c2
0  10.0  100
4  20.0  200
2  30.0  300

Or first remove rows with NaN in c1 using dropna:

df = df.dropna(subset=['c1']).drop_duplicates(['c2'])
print(df)
     c1   c2
0  10.0  100
2  30.0  300
4  20.0  200

df = df.dropna(subset=['c1']).drop_duplicates(['c1','c2'])
print(df)
     c1   c2
0  10.0  100
2  30.0  300
4  20.0  200

Combine duplicate rows in Apps Script

Here is the script:

const HDR = 1 // header height
const GROUP_COL = 1 // in which column is the grouping value

function onEdit(e){
  // we are interested only if the value is not empty and the content of column GROUP_COL has been changed
  if(!e.value || e.value===e.oldValue || e.range.columnStart!==GROUP_COL || e.range.rowStart<=HDR){
    return
  }

  const sh = SpreadsheetApp.getActiveSheet()
  const values = sh.getDataRange().getValues().slice(HDR-1)
  const existingRows = values.map((r,i)=>({ row:HDR+i, cells:r, found:r[GROUP_COL-1]==e.value })).filter(x=>x.found)
  console.log(existingRows)
  if(existingRows.length===0){
    return
  }

  const destRange = sh.getRange(e.range.rowStart,1,1,existingRows[0].cells.length)
  const destRow = destRange.getValues()[0]

  // merge content of the duplicate rows into the edited row
  existingRows.forEach(er=>{
    er.cells.forEach((c,i)=>{
      if(i===GROUP_COL-1 || er.row===e.range.rowStart){
        return
      }
      destRow[i] = (destRow[i] + "\n" + er.cells[i]).trim()
    })
  })

  destRange.setValues([destRow])

  // delete the merged duplicate rows, offsetting by the number of rows already removed
  let deleted = 0
  existingRows.forEach(er=>{
    if(er.row!==e.range.rowStart){
      sh.deleteRow(er.row - deleted)
      deleted++
    }
  })
}

Combine duplicate rows and SUM QTY by Time stamp

You may try grouping by TRK_ID and HRS before finding the sum of IN_QTY, e.g.:

SELECT TRK_ID, SUM(IN_QTY) AS IN_QTY, TRUNC(LOT_DTTM, 'hh') AS HRS
FROM TRK_ID_LOT
WHERE facility = 'DP1DM5'
  AND trk_id LIKE 'AE%'
  AND lot_dttm > sysdate - 1
GROUP BY TRK_ID, TRUNC(LOT_DTTM, 'hh')

Combine duplicate rows in a loop VBA

Create a composite primary key by joining the two columns with a tilde (~) and use a Dictionary object to locate duplicates.

Option Explicit

Private Sub CommandButton1_Click()

    Dim wb As Workbook, ws As Worksheet
    Dim iLastRow As Long, iRow As Long, iTarget As Long

    Set wb = ThisWorkbook
    Set ws = wb.Sheets("Sheet2")
    iLastRow = ws.Cells(ws.Rows.Count, "H").End(xlUp).Row

    Dim dict As Object, sKey As String
    Set dict = CreateObject("Scripting.Dictionary")

    ' build dictionary and
    ' consolidate any existing duplicates, scanning upwards
    For iRow = iLastRow To 3 Step -1

        ' create composite primary key
        sKey = LCase(ws.Cells(iRow, 1).Value) & "~" & Format(ws.Cells(iRow, 3).Value, "yyyy-mm-dd")

        If dict.exists(sKey) Then
            iTarget = dict(sKey)
            ' sum and delete
            ws.Cells(iTarget, 2) = ws.Cells(iTarget, 2) + ws.Cells(iRow, 2)
            ws.Cells(iTarget, 8) = ws.Cells(iTarget, 8) + ws.Cells(iRow, 8)
            ws.Rows(iRow).EntireRow.Delete
        Else
            dict(sKey) = iRow
        End If
    Next

    ' add new record from the form, using the dictionary to locate any existing row
    iLastRow = ws.Cells(ws.Rows.Count, "H").End(xlUp).Row
    sKey = LCase(TextBox1.Text) & "~" & Format(DateValue(TextBox3.Text), "yyyy-mm-dd")
    If dict.exists(sKey) Then
        iTarget = dict(sKey)
        ws.Cells(iTarget, 2) = ws.Cells(iTarget, 2) + TextBox2.Text
        ws.Cells(iTarget, 8) = ws.Cells(iTarget, 8) + TextBox2.Text
    Else
        iTarget = iLastRow + 1
        ws.Cells(iTarget, 1) = TextBox1.Text
        ws.Cells(iTarget, 2) = TextBox2.Text
        ws.Cells(iTarget, 3) = TextBox3.Text
        ws.Cells(iTarget, 8) = TextBox2.Text
    End If

End Sub
