Inserting a New Row to Data Frame for Each Group Id


We can do this with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), then, grouped by 'id', get the last row of each group with tail, assign the new 'latitude' and 'longitude' values, rbind with the original dataset, and order by 'id'.

library(data.table)
rbind(setDT(df1),
      df1[, tail(.SD, 1), by = id
          ][, c("latitude", "longitude") := .(394681.4, 6017550)])[order(id)]
#    id  time latitude longitude
# 1:  A 11:10 381746.0   6008345
# 2:  A 11:11 381726.2   6008294
# 3:  A 11:11 394681.4   6017550
# 4:  B 10:56 381703.0   6008214
# 5:  B 10:57 381679.7   6008134
# 6:  B 10:57 394681.4   6017550
# 7:  C  4:30 381654.4   6008083
# 8:  C  4:31 381629.2   6008033
# 9:  C  4:31 394681.4   6017550

Or using dplyr with a similar methodology:

library(dplyr)
df1 %>%
  group_by(id) %>%
  summarise(time = last(time)) %>%
  mutate(latitude = 394681.4, longitude = 6017550) %>%
  bind_rows(df1, .) %>%
  arrange(id)
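
For comparison, the same idea translates directly to pandas: take the last row per group, overwrite the coordinates, append, and sort. This is a sketch with sample data invented to mirror df1, not the original dataset.

```python
import pandas as pd

# Sample data standing in for df1 (values copied from the output above)
df1 = pd.DataFrame({
    "id": ["A", "A", "B", "B"],
    "time": ["11:10", "11:11", "10:56", "10:57"],
    "latitude": [381746.0, 381726.2, 381703.0, 381679.7],
    "longitude": [6008345, 6008294, 6008214, 6008134],
})

# Last row of each group, with the new coordinates assigned
new_rows = (df1.groupby("id").tail(1)
               .assign(latitude=394681.4, longitude=6017550))

# Append and restore per-id ordering, as rbind(...)[order(id)] does;
# a stable sort keeps the new row after its group's original rows
out = (pd.concat([df1, new_rows])
         .sort_values("id", kind="stable")
         .reset_index(drop=True))
print(out)
```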

Add a new row for each id in dataframe for ALL variables

library(dplyr)
library(purrr)
df %>%
  mutate_if(is.factor, as.character) %>%
  group_split(id) %>%
  map_dfr(~ bind_rows(.x, data.frame(id = .x$id[1], trt = "base",
                                     stringsAsFactors = FALSE)))

# Note that group_modify is experimental
df %>%
  mutate_if(is.factor, as.character) %>%
  group_by(id) %>%
  group_modify(~ bind_rows(.x, data.frame(trt = "base", stringsAsFactors = FALSE)))
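
A pandas analogue of the same pattern, appending one "base" row per group that carries the group's id; the toy data here is invented for illustration:

```python
import pandas as pd

# Toy data standing in for df
df = pd.DataFrame({"id": [1, 1, 2, 2], "trt": ["a", "b", "a", "b"]})

def add_base(g):
    # Append one row carrying the group's id, with trt set to "base"
    extra = pd.DataFrame({"id": [g["id"].iloc[0]], "trt": ["base"]})
    return pd.concat([g, extra], ignore_index=True)

out = pd.concat([add_base(g) for _, g in df.groupby("id")],
                ignore_index=True)
print(out)
```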

How to add a row to each group and assign values

According to the documentation of group_modify, when you use a formula you must use "." or ".x to refer to the subset of rows of .tbl for the given group"; that's why .x is used inside the add_row call. To be entirely consistent, you should use it inside first as well.

df %>%
  group_by(id) %>%
  group_modify(~ add_row(A = 4, B = first(.x$B), .x))

# A tibble: 6 x 3
# Groups:   id [3]
  id        A     B
  <chr> <dbl> <dbl>
1 one       1     4
2 one       4     4
3 three     3     6
4 three     4     6
5 two       2     5
6 two       4     5

Using first(.$B) gives the same result, since "." and ".x" both refer to the group subset inside the formula; first(df$B), by contrast, would take the first value of B in the whole data frame rather than within each group.
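
The group_modify/add_row pattern can be sketched in pandas on the same data as the tibble above: build the new row from each group's first B and append it per group.

```python
import pandas as pd

# The same data as the tibble above
df = pd.DataFrame({"id": ["one", "two", "three"],
                   "A": [1, 2, 3],
                   "B": [4, 5, 6]})

def with_extra(g):
    # New row: A = 4 and B = the group's first B, like first(.x$B)
    row = pd.DataFrame({"id": [g["id"].iloc[0]],
                        "A": [4],
                        "B": [g["B"].iloc[0]]})
    return pd.concat([g, row], ignore_index=True)

# groupby sorts the keys, matching the grouped tibble order (one, three, two)
out = pd.concat([with_extra(g) for _, g in df.groupby("id")],
                ignore_index=True)
print(out)
```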

Add row for each group

You can try

library(tidyverse)
d <- data.frame(Serial_number=gl(5, 2), Amplification=1:10, Voltage=20:11)

d %>%
  split(.$Serial_number) %>%
  map(~ add_row(., Serial_number = unique(.$Serial_number),
                Amplification = NA, Voltage = NA)) %>%
  bind_rows()
   Serial_number Amplification Voltage
1              1             1      20
2              1             2      19
3              1            NA      NA
4              2             3      18
5              2             4      17
6              2            NA      NA
7              3             5      16
8              3             6      15
9              3            NA      NA
10             4             7      14
11             4             8      13
12             4            NA      NA
13             5             9      12
14             5            10      11
15             5            NA      NA

Using base R you can try

do.call("rbind", lapply(split(d, d$Serial_number),
  function(x) rbind(x, data.frame(Serial_number = x$Serial_number[1],
                                  Amplification = NA, Voltage = NA))))

Of course you can also rbind a new data frame, as mentioned in the comments by Shique:

rbind(d,
      data.frame(Serial_number = unique(d$Serial_number),
                 Amplification = NA, Voltage = NA))

Then order the data frame with d[order(d$Serial_number), ].
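
The rbind-then-order approach maps to pandas as well: append one row per serial number with the other columns missing, then do a stable sort. A sketch on data built to match d:

```python
import pandas as pd
import numpy as np

# Analogue of d: five serial numbers with two rows each
d = pd.DataFrame({
    "Serial_number": np.repeat([1, 2, 3, 4, 5], 2),
    "Amplification": np.arange(1, 11),
    "Voltage": np.arange(20, 10, -1),
})

# One row per serial number with the other columns absent; after the
# concat they become NaN, and a stable sort keeps each NaN row at the
# end of its group
na_rows = pd.DataFrame({"Serial_number": d["Serial_number"].unique()})
out = (pd.concat([d, na_rows])
         .sort_values("Serial_number", kind="stable")
         .reset_index(drop=True))
print(out)
```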

Add row in each group using dplyr and add_row()

If you want to use a grouped operation, you need do, as JasonWang described in his comment, since other functions such as mutate or summarise expect a result with the same number of rows as the grouped data frame (in your case, 50) or with one row (e.g. when summarising).

As you probably know, in general do can be slow and should be a last resort if you cannot achieve your result in another way. Your task is quite simple because it only involves adding extra rows in your data frame, which can be done by simple indexing, e.g. look at the output of iris[NA, ].

What you want is essentially to create a vector

indices <- c(NA, 1:50, NA, 51:100, NA, 101:150)

(since the first group is in rows 1 to 50, the second one in 51 to 100 and the third one in 101 to 150).

The result is then iris[indices, ].

A more general way of building this vector uses group_indices.

indices <- seq(nrow(iris)) %>%
  split(group_indices(iris, Species)) %>%
  map(~ c(NA, .x)) %>%
  unlist()

(map comes from purrr which I assume you have loaded as you have tagged this with tidyverse).
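
The same indexing trick works in pandas: reindex with a label that does not exist in the index yields an all-NaN row, so a sentinel label can play the role of R's NA index. A sketch with toy data and -1 as the (arbitrarily chosen) sentinel:

```python
import pandas as pd

# Two groups of three rows each, standing in for iris' structure
df = pd.DataFrame({"grp": list("aaabbb"), "x": range(6)})

# Like iris[c(NA, 1:3, NA, 4:6), ] in R: -1 is not a label in df's
# index, so reindex() produces an all-NaN row wherever it appears
indices = [-1, 0, 1, 2, -1, 3, 4, 5]
out = df.reindex(indices).reset_index(drop=True)
print(out)
```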

R insert row with mean after group of values

You can do this (although I can't really understand why you would want your data in this format):

bind_rows(
  dat %>% mutate(id = as.character(id)),
  dat %>%
    group_by(date) %>%
    summarize(price = mean(price)) %>%
    mutate(id = "mean")
) %>%
  arrange(date, id)

Output:

       id       date    price
   <char>     <IDat>    <num>
1:      1 2022-01-01 4.000000
2:      2 2022-01-01 2.000000
3:      3 2022-01-01 2.000000
4:   mean 2022-01-01 2.666667
5:      1 2022-01-02 5.000000
6:      2 2022-01-02 3.000000
7:      3 2022-01-02 1.000000
8:   mean 2022-01-02 3.000000

Perhaps better just to do this:

dat %>% group_by(date) %>% mutate(mean = mean(price))

Output:

     id date       price  mean
  <int> <date>     <int> <dbl>
1     1 2022-01-01     4  2.67
2     2 2022-01-01     2  2.67
3     3 2022-01-01     2  2.67
4     1 2022-01-02     5  3
5     2 2022-01-02     3  3
6     3 2022-01-02     1  3
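
For reference, the first variant (an explicit "mean" row per date) can be sketched in pandas with the same concat-and-sort pattern, on data mirroring dat:

```python
import pandas as pd

# Data mirroring dat in the answer
dat = pd.DataFrame({
    "id": [1, 2, 3, 1, 2, 3],
    "date": ["2022-01-01"] * 3 + ["2022-01-02"] * 3,
    "price": [4, 2, 2, 5, 3, 1],
})

# Mean price per date, labelled id = "mean"
means = (dat.groupby("date", as_index=False)["price"].mean()
            .assign(id="mean"))

# "mean" sorts after the digit ids, so arrange(date, id) carries over
out = (pd.concat([dat.assign(id=dat["id"].astype(str)), means])
         .sort_values(["date", "id"])
         .reset_index(drop=True))
print(out)
```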

Add rows of data to each group in a Spark dataframe

See my attempt below. It could have been made shorter, but I felt it should be as explicit as possible, so I didn't chain the solutions. Code below:

import sys

from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")

# Convert the week of the year to a date
s = data_df.withColumn("week", F.expr("cast(week as string)")) \
           .withColumn("date", F.to_date(F.concat("week", F.lit("6")), "yyyywwu"))

s = (s.groupby('item', 'store')
      .agg(F.collect_list('sales').alias('sales'),
           F.collect_list('date').alias('date'))  # put sales and dates in arrays
      .withColumn("id", F.sequence(F.lit(0), F.lit(6)))  # sequence ids for the required expansion range per group
)

# Explode the dataframe back, with each item/store combination in a row
s = s.selectExpr('item', 'store', 'inline(arrays_zip(date, id, sales))')

# Partition window spanning the whole frame for each item/store combination
w = Window.partitionBy('item', 'store').orderBy('id') \
          .rowsBetween(-sys.maxsize, sys.maxsize)

# Partition window from the start of each item/store/date combination to the
# current row; the purpose is to aggregate over the null dates as a group
w1 = Window.partitionBy('item', 'store', 'date').orderBy('id') \
           .rowsBetween(Window.unboundedPreceding, Window.currentRow)

s = (s.withColumn('increment',
                  F.when(F.col('date').isNull(),
                         F.row_number().over(w1) * 7).otherwise(0))  # increment values per item/store combination
      .withColumn('date1',
                  F.when(F.col('date').isNull(),
                         F.max('date').over(w)).otherwise(F.col('date')))  # last date in each item/store combination
)

# Compute the week of year and drop the columns that are no longer needed
s = s.withColumn("weekofyear",
                 F.expr("weekofyear(date_add(date1, cast(increment as int)))")) \
     .drop('date', 'increment', 'date1').na.fill(0)


s.show(truncate=False)

Outcome

+----+-----+---+-----+----------+
|item|store|id |sales|weekofyear|
+----+-----+---+-----+----------+
|1 |1 |0 |3 |5 |
|1 |1 |1 |5 |6 |
|1 |1 |2 |7 |7 |
|1 |1 |3 |2 |8 |
|1 |1 |4 |0 |9 |
|1 |1 |5 |0 |10 |
|1 |1 |6 |0 |11 |
|2 |2 |0 |3 |50 |
|2 |2 |1 |0 |51 |
|2 |2 |2 |1 |52 |
|2 |2 |3 |1 |1 |
|2 |2 |4 |0 |2 |
|2 |2 |5 |0 |3 |
|2 |2 |6 |0 |4 |
+----+-----+---+-----+----------+
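
The core expansion step, giving every item/store group the same id range 0..6 and filling the missing sales with 0, can be sketched in plain pandas for comparison; the toy sales data here is invented:

```python
import pandas as pd

# Toy weekly sales: item/store groups with differing numbers of rows
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2],
    "store": [1, 1, 1, 2, 2],
    "id": [0, 1, 2, 0, 1],
    "sales": [3, 5, 7, 3, 1],
})

# Cross-join every (item, store) pair with the full id range 0..6,
# then left-join the real sales and fill the gaps with 0
frame = (df[["item", "store"]].drop_duplicates()
           .merge(pd.DataFrame({"id": range(7)}), how="cross"))
out = (frame.merge(df, on=["item", "store", "id"], how="left")
            .fillna({"sales": 0}))
print(out)
```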

Pandas: add rows to each group until condition is met

A different approach with pandas:

  1. construct a DataFrame that is the Cartesian product of ID and second
  2. outer-join it back to the original data frame
  3. fill missing values based on your spec

No groupby(), no loops.

df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "second": ["1", "2", "3", "1", "2", "3", "4", "1", "2"],
    "speaker1": ["1", "1", "1", "1", "1", "1", "1", "1", "1"],
    "speaker2": ["1", "1", "1", "1", "1", "1", "1", "1", "1"],
    "company": ["name1", "name1", "name1", "name2", "name2", "name2", "name2",
                "name3", "name3"],
})

df2 = (pd.DataFrame({"ID": df["ID"].unique()}).assign(foo=1)
         .merge(pd.DataFrame({"second": df["second"].unique()}).assign(foo=1))
         .drop(columns="foo")
         .merge(df, on=["ID", "second"], how="outer"))

df2["company"] = df2["company"].ffill()
df2.fillna(0)

output

   ID second speaker1 speaker2 company
0   A      1        1        1   name1
1   A      2        1        1   name1
2   A      3        1        1   name1
3   A      4        0        0   name1
4   B      1        1        1   name2
5   B      2        1        1   name2
6   B      3        1        1   name2
7   B      4        1        1   name2
8   C      1        1        1   name3
9   C      2        1        1   name3
10  C      3        0        0   name3
11  C      4        0        0   name3
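
The Cartesian product can also be built with a MultiIndex, which avoids the dummy foo column entirely. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "second": ["1", "2", "3", "1", "2", "3", "4", "1", "2"],
    "speaker1": ["1"] * 9,
    "speaker2": ["1"] * 9,
    "company": ["name1"] * 3 + ["name2"] * 4 + ["name3"] * 2,
})

# Full ID x second grid as a MultiIndex, then reindex onto it;
# pairs absent from df come back as all-NaN rows
full = pd.MultiIndex.from_product(
    [df["ID"].unique(), df["second"].unique()], names=["ID", "second"])
out = df.set_index(["ID", "second"]).reindex(full).reset_index()

# company is constant within each ID and the grid is ordered by ID,
# so a forward fill covers the newly created rows
out["company"] = out["company"].ffill()
out = out.fillna(0)
print(out)
```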


