Filter Group of Rows Based on Sum of Values from Different Column

Filter group of rows based on sum of values from different column

After grouping by 'HEADWORD', compute the sum of 'FREQUENCY' within each group and keep only the groups whose sum is greater than 5:

library(dplyr)

Words1 %>% 
     group_by(HEADWORD) %>% 
     filter(sum(FREQUENCY) > 5)
#   HEADWORD VARIANT FREQUENCY
#     <chr>   <chr>     <int>
#1   KNIGHT  knight         6
#2   KNIGHT   kniht         2 
#3   KNIGHT    knyt         1
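
For comparison, the same group-level filter can be sketched in pandas. This is only an illustration: the KNIGHT rows come from the output above, while the HORSE rows are hypothetical data added so that one group falls below the threshold.

```python
import pandas as pd

# KNIGHT rows are taken from the output above; HORSE rows are hypothetical.
words = pd.DataFrame({
    "HEADWORD": ["KNIGHT", "KNIGHT", "KNIGHT", "HORSE", "HORSE"],
    "VARIANT": ["knight", "kniht", "knyt", "hors", "horse"],
    "FREQUENCY": [6, 2, 1, 3, 1],
})

# Keep every row of a group whose FREQUENCY values sum to more than 5.
filtered = words.groupby("HEADWORD").filter(lambda g: g["FREQUENCY"].sum() > 5)
print(filtered)
```

As with the dplyr version, the condition is evaluated once per group, so a group is either kept whole or dropped whole.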

Calculate sum of a column after filtering by and grouping on other columns

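If I understand correctly, you can first select the rows where role is "senior" with query, then use groupby.transform to compute the sum per id. Because the result is assigned back by index, the rows that were filtered out (the junior rows) get NaN: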

df['sum'] = (df.query('role == "senior"')
             .groupby('id')['value'].transform('sum'))

print(df)

   id    role  value  sum
0   1  junior      2  NaN
1   1  senior      3  7.0
2   1  senior      4  7.0
3   2  junior      2  NaN
4   2  senior      6  8.0
5   2  senior      2  8.0
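
A self-contained variant, as a sketch: the frame below is rebuilt from the printed output, and the names is_senior and senior_sum are illustrative, not from the original answer. It masks the values with where instead of query, which also makes it easy to put the senior total on every row of the group if that is what you need.

```python
import pandas as pd

# Data reconstructed from the printed output above.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "role": ["junior", "senior", "senior", "junior", "senior", "senior"],
    "value": [2, 3, 4, 2, 6, 2],
})

is_senior = df["role"].eq("senior")

# Non-senior values become NaN and are skipped by the grouped sum,
# so every row of an id receives the senior total of that id.
senior_sum = df["value"].where(is_senior).groupby(df["id"]).transform("sum")

# Blank out the junior rows again to match the output shown above.
df["sum"] = senior_sum.where(is_senior)
print(df)
```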

Filter pandas column with current row values and sum another column to form a new column

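If I understand correctly, sort the data by date from most recent to oldest, then use groupby with expanding().sum() so that each row gets the sum of its own value and all later values in the same Area: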

# ensure datetime (although this format could be also sorted as string)
df['Date'] = pd.to_datetime(df['Date'])

df['sum'] = (df
 .sort_values(by='Date', ascending=False)      # reverse values
 .groupby(['Area'])['Value'].expanding().sum() # sum recent values
 .droplevel(0)
)

output:

        Date Area  Value   sum
0 2021-01-01  ABC     10  40.0
1 2021-02-01  BCD     20  45.0
2 2021-03-01  ABC     15  30.0
3 2021-04-01  BCD     25  25.0
4 2021-05-01  ABC     15  15.0
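
An equivalent result can probably be obtained with a grouped cumsum, which keeps the original index and so needs no droplevel. This is a sketch of that alternative, not the method used above; the input frame is rebuilt from the printed output.

```python
import pandas as pd

# Data reconstructed from the printed output above.
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01", "2021-05-01"],
    "Area": ["ABC", "BCD", "ABC", "BCD", "ABC"],
    "Value": [10, 20, 15, 25, 15],
})
df["Date"] = pd.to_datetime(df["Date"])

# Sort newest first, then accumulate within each Area.
# The result keeps the original row labels, so assignment aligns by index.
df["sum"] = (
    df.sort_values("Date", ascending=False)
      .groupby("Area")["Value"]
      .cumsum()
)
print(df)
```

Unlike expanding().sum(), cumsum keeps the integer dtype here, so the column shows 40 rather than 40.0.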

Group rows by column and sum another column within groups

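Use array_walk() if you want to modify the array in place; array_reduce() is meant to compute a single result from the array, not to change the array itself. Note that the PHP manual describes altering the array's structure (for example with unset()) from inside the array_walk() callback as undefined behaviour, so test the approach below on your PHP version.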

I would do something like this:

<?php

$array = [
    [
        'tag_id' => "6291",
        'az' => 5,
    ],
    [
        'tag_id' => "6291",
        'az' => 4,
    ],
    [
        'tag_id' => "6311",
        'az' => 4,
    ],
    [
        'tag_id' => "6427",
        'az' => 4,
    ]
];

$tag_id_indexes = []; // To store the index of the first tag_id found.

array_walk(
    $array,
    function ($sub_array, $index) use (&$array, &$tag_id_indexes) {
        // Store the index of the first tag_id found.
        if (!isset($tag_id_indexes[$sub_array['tag_id']])) {
            $tag_id_indexes[$sub_array['tag_id']] = $index;
        }
        else { // This tag_id already exists so we'll combine it.
            // Get the index of the previous tag_id.
            $first_tag_id_index = $tag_id_indexes[$sub_array['tag_id']];
            // Sum the az value.
            $array[$first_tag_id_index]['az'] += $sub_array['az'];
            // Remove this entry.
            unset($array[$index]);
        }
    }
);

print "The reduced array but with the original indexes:\n" . var_export($array, true) . "\n";

// If you want new indexes.
$array = array_values($array);

print "The reduced array with new indexes:\n" . var_export($array, true) . "\n";

You can test it here: https://onlinephp.io/c/58a11

This is the output:

The reduced array but with the original indexes:
array (
  0 => 
  array (
    'tag_id' => '6291',
    'az' => 9,
  ),
  2 => 
  array (
    'tag_id' => '6311',
    'az' => 4,
  ),
  3 => 
  array (
    'tag_id' => '6427',
    'az' => 4,
  ),
)
The reduced array with new indexes:
array (
  0 => 
  array (
    'tag_id' => '6291',
    'az' => 9,
  ),
  1 => 
  array (
    'tag_id' => '6311',
    'az' => 4,
  ),
  2 => 
  array (
    'tag_id' => '6427',
    'az' => 4,
  ),
)
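
The underlying idea (remember the first occurrence of each tag_id and add later az values onto it) can also be sketched with a plain dictionary. The Python version below is only an illustration of the grouping logic using the same sample rows; the variable names are not from the original answer.

```python
rows = [
    {"tag_id": "6291", "az": 5},
    {"tag_id": "6291", "az": 4},
    {"tag_id": "6311", "az": 4},
    {"tag_id": "6427", "az": 4},
]

# Accumulate az per tag_id; dicts preserve insertion order,
# so the first-seen order of the tag_ids is kept.
totals = {}
for row in rows:
    totals[row["tag_id"]] = totals.get(row["tag_id"], 0) + row["az"]

reduced = [{"tag_id": tag_id, "az": az} for tag_id, az in totals.items()]
print(reduced)
```

Building a new list avoids removing elements from the collection while iterating over it.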

How to groupby, and filter a dataframe based on the sum?

g['Trade Value (US$)'].min() >= 2000000 filters everything out, because it requires the smallest single value in each group to be at least 2000000, rather than the group's sum.
- Use pandas.Grouper to group 'Period' at a specified frequency.
- Use pandas.core.groupby.DataFrameGroupBy.filter to keep only the groups whose 'Trade Value (US$)' sum passes the threshold.
  - x['Trade Value (US$)'].sum() > 2000000 is the filter function. It can be moved into a separate def function, but that is not necessary.
- 'Commodity Code' can also be added to the groupby:
  - groupby(['Partner', 'Commodity Code', pd.Grouper(key='Period', freq='1M')])

import pandas as pd

# load the data
df = pd.read_csv('https://raw.githubusercontent.com/trenton3983/stack_overflow/master/data/so_data/2020-09-01%2063694704/comtrade.csv', dtype={'Commodity Code': str})

# select desired columns
df = df.loc[:, ['Period', 'Reporter', 'Partner', 'Commodity', 'Commodity Code', 'Trade Value (US$)']]

# convert Period to datetime format
df.Period = pd.to_datetime(df.Period, format='%Y%m')

# display(df.head(3))
      Period        Reporter    Partner                                                                               Commodity Commodity Code  Trade Value (US$)
0 2014-09-01  United Kingdom      World  Milk and cream; not concentrated nor containing added sugar or other sweetening matter           0401           33279381
1 2014-09-01  United Kingdom  Australia  Milk and cream; not concentrated nor containing added sugar or other sweetening matter           0401               4558
2 2014-09-01  United Kingdom    Austria  Milk and cream; not concentrated nor containing added sugar or other sweetening matter           0401                290

# groupby Partner and month, and filter by sum of Trade value > 2000000
# note: pandas 2.2+ deprecates the 'M' alias in favour of 'ME' (month end)
df_filtered = df.groupby(['Partner', pd.Grouper(key='Period', freq='1M')]).filter(lambda x: x['Trade Value (US$)'].sum() > 2000000)

# verify the period Trade Value sums per partner per month are > 2000000
df_filtered.groupby(['Partner', pd.Grouper(key='Period', freq='1M')]).agg({'Trade Value (US$)': 'sum'})

[out]:
                                 Trade Value (US$)
Partner              Period                       
Algeria              2014-01-31            4792662
                     2014-02-28            7220679
                     2014-03-31            9835523
                     2014-04-30           14875816
                     2014-05-31           19656679
                     2014-06-30           22411564
                     2014-07-31            3214364
                     2014-10-31            4074424
                     2014-11-30            2107597
                     2014-12-31            3464600
Angola               2014-03-31            2324977
                     2014-12-31            2030001
Belgium              2014-01-31           14531571
                     2014-02-28            6955784
                     2014-03-31            9576248
                     2014-04-30            8569745
                     2014-05-31            7635442
                     2014-06-30            5435766
                     2014-07-31            5128432
                     2014-08-31            5169545
                     2014-09-30            5707207
                     2014-10-31            4982965
                     2014-11-30            8547975
                     2014-12-31            5441072
China                2014-03-31            2460056
                     2014-07-31            2778780
                     2014-09-30            3008491
                     2014-10-31            4777912
                     2014-11-30            3774279
                     2014-12-31            3045122
China, Hong Kong SAR 2014-01-31            2170443
                     2014-07-31            2048469
                     2014-11-30            2049788
Côte d'Ivoire        2014-03-31            2842636
                     2014-06-30            2499308
                     2014-08-31            2173727
                     2014-09-30            2322223
Denmark              2014-01-31            2399943
                     2014-02-28            2136906
                     2014-03-31            2523950
                     2014-04-30            2523958
                     2014-05-31            2490132
                     2014-06-30            2191829
                     2014-07-31            3180516
                     2014-08-31            2497068
                     2014-09-30            3052401
                     2014-10-31            3019545
                     2014-11-30            2929672
                     2014-12-31            4497179
France               2014-01-31           12651302
                     2014-02-28           10284508
                     2014-03-31           14342231
                     2014-04-30           12846655
                     2014-05-31           12826328
                     2014-06-30           11756821
                     2014-07-31           13075198
                     2014-08-31            9966348
                     2014-09-30           10636585
                     2014-10-31           11120326
                     2014-11-30           10612800
                     2014-12-31            9512056
Germany              2014-01-31            9744449
                     2014-02-28            7688820
                     2014-03-31            8956210
                     2014-04-30           10604432
                     2014-05-31           10207829
                     2014-06-30           10104134
                     2014-07-31            7074641
                     2014-08-31            7768101
                     2014-09-30           12061074
                     2014-10-31           13060791
                     2014-11-30            8306606
                     2014-12-31            7132246
Ghana                2014-01-31            2389385
Guinea               2014-04-30            2098146
                     2014-05-31            2179330
Ireland              2014-01-31           57621249
                     2014-02-28           53529377
                     2014-03-31           52525722
                     2014-04-30           55134986
                     2014-05-31           57244611
                     2014-06-30           56814970
                     2014-07-31           52322023
                     2014-08-31           45421969
                     2014-09-30           51185200
                     2014-10-31           38818201
                     2014-11-30           37431831
                     2014-12-31           37494188
Lebanon              2014-07-31            2359805
Netherlands          2014-01-31           15376408
                     2014-02-28            9160546
                     2014-03-31           11064742
                     2014-04-30           15584558
                     2014-05-31           13182208
                     2014-06-30           14262841
                     2014-07-31           10843821
                     2014-08-31            7521907
                     2014-09-30            8164473
                     2014-10-31           13886896
                     2014-11-30           14965454
                     2014-12-31            6844463
Nigeria              2014-08-31            4676807
Poland               2014-09-30            2680608
                     2014-11-30            2694120
Spain                2014-01-31            2075305
                     2014-09-30            3185937
                     2014-10-31            2421800
                     2014-11-30            2318918
World                2014-01-31          139512730
                     2014-02-28          111789785
                     2014-03-31          131100878
                     2014-04-30          139406387
                     2014-05-31          144276262
                     2014-06-30          144420208
                     2014-07-31          117675469
                     2014-08-31          102032532
                     2014-09-30          117302843
                     2014-10-31          113368963
                     2014-11-30          106377174
                     2014-12-31           95273667
Yemen                2014-08-31            3311725
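
The same filter can be sketched without downloading the CSV. The example below uses a small hypothetical frame (partners A and B and their values are made up) and swaps in groupby.transform with a boolean mask instead of filter; it groups by calendar month via dt.to_period, which sidesteps the frequency alias.

```python
import pandas as pd

# Hypothetical data, only to illustrate the grouping logic.
df = pd.DataFrame({
    "Partner": ["A", "A", "A", "B", "B"],
    "Period": pd.to_datetime(
        ["2014-01-01", "2014-01-01", "2014-02-01", "2014-01-01", "2014-02-01"]
    ),
    "Trade Value (US$)": [1_500_000, 600_000, 100_000, 2_500_000, 1_999_999],
})

# Group key for the calendar month of each row.
month = df["Period"].dt.to_period("M")

# Sum per partner and month, broadcast back to every row of the group.
monthly_total = df.groupby(["Partner", month])["Trade Value (US$)"].transform("sum")

# Keep the rows that belong to a partner-month whose total exceeds 2000000.
df_filtered = df[monthly_total > 2_000_000]
print(df_filtered)
```

On large frames a transform plus mask is often faster than filter with a Python lambda, though that is worth measuring on your own data.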

Resources

Comtrade Data Analysis - This is where I found out how to get the data
UN Comtrade Database - Data available here
- Type of Product: goods
- Frequency: monthly
- Periods: all of 2014
- Reporter: United Kingdom
- Partners: all
- Flows: imports and exports
- HS (as reported) commodity codes: 0401 (Milk and cream, neither concentrated nor sweetened) and 0402 (Milk and cream, concentrated or sweetened)
- Clicking on 'Preview' results in a message that the data exceeds 500 rows. Data was downloaded using the Download CSV button and the download file renamed appropriately.

R: Calculate sum over a column based on groups for panel data where one group has no data

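Edit: the OP wants to keep the rows where Category is NA. One option is to summarise the non-NA groups, join the original columns back in, and then append the NA rows (data is defined in the reprex further down):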

data_noNA <- data %>%
  group_by(Category, Date) %>%
  dplyr::summarize(Sum_Size = sum(Size, na.rm = TRUE)) %>%
  filter(!is.na(Category)) %>%
  # add back in info from missing columns after summarize
  left_join(data, by = c("Category", "Date"))

data2 <- bind_rows(data_noNA, data %>% filter(is.na(Category))); data2
# A tibble: 18 x 5
# Groups:   Category [5]
   Category Date       Sum_Size Name   Size
      <int> <chr>         <int> <chr> <int>
 1        1 01.09.2018       34 A        34
 2        1 02.09.2018       23 A        23
 3        2 02.09.2018       23 C        23
 4        2 05.11.2021       12 C        12
 5        2 06.11.2021       35 A        23
 6        2 06.11.2021       35 C        12
 7        2 07.11.2021       53 A        53
 8        3 01.09.2018       23 B        23
 9        3 02.09.2018       54 B        54
10        3 03.09.2018       65 B        65
11        4 01.09.2018       45 C        45
12        4 07.11.2021       45 B        45
13       NA 03.09.2018       NA A        12
14       NA 05.11.2021       NA A        53
15       NA 05.11.2021       NA B        75
16       NA 06.11.2021       NA B        67
17       NA 03.09.2018       NA C        23
18       NA 07.11.2021       NA C        NA

Original answer: sum Size within each Category and Date, then drop the rows where Category is missing:

library(tidyverse)
data <- structure(list(
  Name = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B",
           "C", "C", "C", "C", "C", "C"),
  Date = c("01.09.2018", "02.09.2018", "03.09.2018", "05.11.2021", "06.11.2021", "07.11.2021",
           "01.09.2018", "02.09.2018", "03.09.2018", "05.11.2021", "06.11.2021", "07.11.2021",
           "01.09.2018", "02.09.2018", "03.09.2018", "05.11.2021", "06.11.2021", "07.11.2021"),
  Category = c(1L, 1L, NA, NA, 2L, 2L, 3L, 3L, 3L, NA, NA, 4L, 4L, 2L, NA, 2L, 2L, NA),
  Size = c(34L, 23L, 12L, 53L, 23L, 53L, 23L, 54L, 65L, 75L, 67L, 45L, 45L, 23L, 23L, 12L, 12L, NA)
), class = "data.frame", row.names = c(NA, -18L))
data2 <- data %>%
  group_by(Category, Date) %>%
  dplyr::summarize(Sum_Size = sum(Size, na.rm = TRUE)) %>%
  filter(!is.na(Category)); data2
#> `summarise()` has grouped output by 'Category'. You can override using the
#> `.groups` argument.
#> # A tibble: 11 x 3
#> # Groups:   Category [4]
#>    Category Date       Sum_Size
#>       <int> <chr>         <int>
#>  1        1 01.09.2018       34
#>  2        1 02.09.2018       23
#>  3        2 02.09.2018       23
#>  4        2 05.11.2021       12
#>  5        2 06.11.2021       35
#>  6        2 07.11.2021       53
#>  7        3 01.09.2018       23
#>  8        3 02.09.2018       54
#>  9        3 03.09.2018       65
#> 10        4 01.09.2018       45
#> 11        4 07.11.2021       45

Created on 2022-04-16 by the reprex package (v2.0.1)
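
For readers working in pandas, here is a sketch of both steps, using the same 18 rows as the dput above. It relies on groupby dropping NA keys by default; the names sums and out are illustrative.

```python
import pandas as pd

dates = ["01.09.2018", "02.09.2018", "03.09.2018",
         "05.11.2021", "06.11.2021", "07.11.2021"]

# Same rows as the R data frame above; None becomes NaN.
df = pd.DataFrame({
    "Name": list("AAAAAABBBBBBCCCCCC"),
    "Date": dates * 3,
    "Category": [1, 1, None, None, 2, 2, 3, 3, 3, None, None, 4,
                 4, 2, None, 2, 2, None],
    "Size": [34, 23, 12, 53, 23, 53, 23, 54, 65, 75, 67, 45,
             45, 23, 23, 12, 12, None],
})

# Sum Size per Category and Date; rows with a missing Category are dropped
# because groupby uses dropna=True by default.
sums = (
    df.groupby(["Category", "Date"], as_index=False)["Size"]
      .sum()
      .rename(columns={"Size": "Sum_Size"})
)

# Left-join the sums back so the NA-Category rows are kept with Sum_Size = NaN.
out = df.merge(sums, on=["Category", "Date"], how="left")
print(out)
```

The left join keeps all 18 original rows in their original order, so no separate bind_rows step is needed.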

R: Filter rows depending on value range across several columns

First test whether each value is greater than or equal to 5 and less than or equal to 10, then keep the rows in which at least 3 columns meet that condition.

dat[ rowSums( dat >= 5 & dat <= 10 ) >= 3, ]
  column1 column2 column3 column4 column5
1       7       4      10       9       2

Data

dat <- structure(list(column1 = c(7L, 4L), column2 = c(4L, 8L), column3 = c(10L, 
2L), column4 = c(9L, 6L), column5 = c(2, 2)), class = "data.frame", row.names = c(NA, 
-2L))
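
The same row-count test can be sketched in pandas, using the two rows of dat from above. A boolean frame is summed across columns, which counts the values in range per row.

```python
import pandas as pd

# Same two rows as the R data frame above.
dat = pd.DataFrame({
    "column1": [7, 4],
    "column2": [4, 8],
    "column3": [10, 2],
    "column4": [9, 6],
    "column5": [2, 2],
})

# True where a value lies in [5, 10]; summing along axis=1 counts hits per row.
in_range = (dat >= 5) & (dat <= 10)
result = dat[in_range.sum(axis=1) >= 3]
print(result)
```

Note that pandas uses a 0-based index by default, so the row kept here is labelled 0 rather than 1.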
