Inserting a new row to data frame for each group id
We can do this with data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'id', get the last row with tail, assign the 'latitude' and 'longitude' columns the new values, rbind with the original dataset, and order by 'id'.
library(data.table)
rbind(setDT(df1),
      df1[, tail(.SD, 1), by = id][, c("latitude", "longitude") := .(394681.4, 6017550)])[order(id)]
# id time latitude longitude
#1: A 11:10 381746.0 6008345
#2: A 11:11 381726.2 6008294
#3: A 11:11 394681.4 6017550
#4: B 10:56 381703.0 6008214
#5: B 10:57 381679.7 6008134
#6: B 10:57 394681.4 6017550
#7: C 4:30 381654.4 6008083
#8: C 4:31 381629.2 6008033
#9: C 4:31 394681.4 6017550
Or using dplyr, with a similar methodology:
library(dplyr)
df1 %>%
group_by(id) %>%
summarise(time = last(time)) %>%
mutate(latitude = 394681.4, longitude = 6017550) %>%
bind_rows(df1, .) %>%
arrange(id)
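For comparison, the same last-row-per-group idea can be sketched in pandas (the frame below is a small stand-in for df1, not the asker's data):

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": ["A", "A", "B", "B"],
    "time": ["11:10", "11:11", "10:56", "10:57"],
    "latitude": [381746.0, 381726.2, 381703.0, 381679.7],
    "longitude": [6008345, 6008294, 6008214, 6008134],
})
# Take the last row of each group and overwrite the coordinates
new_rows = df1.groupby("id", as_index=False).tail(1).copy()
new_rows["latitude"] = 394681.4
new_rows["longitude"] = 6017550
# Append and restore per-id ordering (stable sort keeps originals first)
out = (pd.concat([df1, new_rows])
         .sort_values("id", kind="stable")
         .reset_index(drop=True))
```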
Add a new row for each id in dataframe for ALL variables
library(dplyr)
library(purrr)
df %>% mutate_if(is.factor, as.character) %>%
group_split(id) %>%
map_dfr(~bind_rows(.x, data.frame(id=.x$id[1], trt="base", stringsAsFactors = FALSE)))
#Note that group_modify is Experimental
df %>% mutate_if(is.factor, as.character) %>%
group_by(id) %>%
group_modify(~bind_rows(.x, data.frame(trt="base", stringsAsFactors = FALSE)))
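The group_split/map_dfr pattern has a direct pandas analogue: split into groups, append the extra row in a plain function, and concatenate. A minimal sketch on toy data:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2, 2],
                   "trt": ["a", "b", "a", "b"],
                   "value": [10, 20, 30, 40]})

def add_base_row(g):
    # One extra row per group: keep the group's id, set trt to "base";
    # the missing "value" column comes back as NaN, like bind_rows
    extra = pd.DataFrame({"id": [g["id"].iloc[0]], "trt": ["base"]})
    return pd.concat([g, extra], ignore_index=True)

out = pd.concat([add_base_row(g) for _, g in df.groupby("id")],
                ignore_index=True)
```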
How to add a row to each group and assign values
According to the documentation of the function group_modify
, if you use a formula, you must use ".
or .x
to refer to the subset of rows of .tbl
for the given group;" that's why you used .x
inside the add_row
function. To be entirely consistent, you have to do it also within the first
function.
df %>%
group_by(id) %>%
group_modify(~ add_row(A=4, B=first(.x$B), .x))
# A tibble: 6 x 3
# Groups: id [3]
id A B
<chr> <dbl> <dbl>
1 one 1 4
2 one 4 4
3 three 3 6
4 three 4 6
5 two 2 5
6 two 4 5
Using first(.$B) gives the same result, since . and .x are synonyms in a purrr-style formula; first(df$B), by contrast, would take the first B of the whole data frame rather than of the current group.
Add row for each group
You can try
library(tidyverse)
d <- data.frame(Serial_number=gl(5, 2), Amplification=1:10, Voltage=20:11)
d %>%
split(.$Serial_number) %>%
map(~add_row(., Serial_number=unique(.$Serial_number), Amplification=NA, Voltage=NA)) %>%
bind_rows()
Serial_number Amplification Voltage
1 1 1 20
2 1 2 19
3 1 NA NA
4 2 3 18
5 2 4 17
6 2 NA NA
7 3 5 16
8 3 6 15
9 3 NA NA
10 4 7 14
11 4 8 13
12 4 NA NA
13 5 9 12
14 5 10 11
15 5 NA NA
Using base R you can try
do.call("rbind", lapply(split(d, d$Serial_number),
        function(x) rbind(x, data.frame(Serial_number = x$Serial_number[1], Amplification = NA, Voltage = NA))))
Of course you can rbind a new dataframe, as mentioned in the comments by Shique:
rbind(d,
data.frame(Serial_number=unique(d$Serial_number), Amplification=NA, Voltage=NA))
Then order the dataframe using d[order(d$Serial_number),]
Add row in each group using dplyr and add_row()
If you want to use a grouped operation, you need do like JasonWang described in his comment, as other functions like mutate or summarise expect a result with the same number of rows as the grouped data frame (in your case, 50) or with one row (e.g. when summarising).
As you probably know, in general do can be slow and should be a last resort if you cannot achieve your result in another way. Your task is quite simple because it only involves adding extra rows to your data frame, which can be done by simple indexing, e.g. look at the output of iris[NA, ].
What you want is essentially to create a vector
indices <- c(NA, 1:50, NA, 51:100, NA, 101:150)
(since the first group is in rows 1 to 50, the second one in 51 to 100 and the third one in 101 to 150). The result is then iris[indices, ].
A more general way of building this vector uses group_indices.
indices <- seq(nrow(iris)) %>%
split(group_indices(iris, Species)) %>%
map(~c(NA, .x)) %>%
unlist
(map comes from purrr, which I assume you have loaded as you have tagged this with tidyverse).
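The same NA-indexing trick carries over to pandas: reindexing with a sentinel label that is absent from the index yields an all-NaN row wherever the sentinel appears. A sketch on a toy frame standing in for iris:

```python
import pandas as pd

df = pd.DataFrame({"Species": ["setosa", "setosa", "virginica", "virginica"],
                   "Sepal_Length": [5.1, 4.9, 6.3, 5.8]})
# Put a sentinel label (-1, absent from the RangeIndex) before each
# group's row labels; reindex turns every sentinel into an all-NaN row,
# just like iris[indices, ] with NA entries in R
indices = []
for _, g in df.groupby("Species"):
    indices.extend([-1, *g.index])
out = df.reindex(indices).reset_index(drop=True)
```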
R insert row with mean after group of values
You can do this (although I can't really understand why you would want your data in this format):
bind_rows(
dat %>% mutate(id = as.character(id)),
dat %>% group_by(date) %>%
summarize(price=mean(price)) %>%
mutate(id = "mean")
) %>%
arrange(date,id)
Output:
id date price
<char> <IDat> <num>
1: 1 2022-01-01 4.000000
2: 2 2022-01-01 2.000000
3: 3 2022-01-01 2.000000
4: mean 2022-01-01 2.666667
5: 1 2022-01-02 5.000000
6: 2 2022-01-02 3.000000
7: 3 2022-01-02 1.000000
8: mean 2022-01-02 3.000000
Perhaps better just to do this:
dat %>% group_by(date) %>% mutate(mean = mean(price))
Output:
id date price mean
<int> <date> <int> <dbl>
1 1 2022-01-01 4 2.67
2 2 2022-01-01 2 2.67
3 3 2022-01-01 2 2.67
4 1 2022-01-02 5 3
5 2 2022-01-02 3 3
6 3 2022-01-02 1 3
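The mutate(mean = mean(price)) variant corresponds to groupby().transform() in pandas, which broadcasts the group statistic back onto every row. A minimal sketch on the same toy data:

```python
import pandas as pd

dat = pd.DataFrame({"id": [1, 2, 3, 1, 2, 3],
                    "date": ["2022-01-01"] * 3 + ["2022-01-02"] * 3,
                    "price": [4, 2, 2, 5, 3, 1]})
# Broadcast the per-date mean onto every row, like mutate(mean = mean(price))
dat["mean"] = dat.groupby("date")["price"].transform("mean")
```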
Add rows of data to each group in a Spark dataframe
See my attempt below. It could have been shorter, but I felt it should be as explicit as possible, so I didn't chain the solutions. Code below:
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
# Convert week of the year to a date (the Saturday of that week)
s = (data_df.withColumn("week", F.expr("cast(week as string)"))
            .withColumn("date", F.to_date(F.concat("week", F.lit("6")), "yyyywwu")))
# Put sales and dates in an array per item/store group
s = (s.groupby("item", "store")
      .agg(F.collect_list("sales").alias("sales"),
           F.collect_list("date").alias("date"))
      # Create sequence ids covering the required expansion range per group
      .withColumn("id", F.sequence(F.lit(0), F.lit(6))))
# Explode the dataframe back so each item/store/id combination is a row
s = s.selectExpr("item", "store", "inline(arrays_zip(date, id, sales))")
# Partition window spanning each whole item/store combination
w = (Window.partitionBy("item", "store").orderBy("id")
           .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing))
# Partition window per item/store/date; aggregates over the null dates as a group
w1 = (Window.partitionBy("item", "store", "date").orderBy("id")
            .rowsBetween(Window.unboundedPreceding, Window.currentRow))
s = (s.withColumn("increment",  # weeks to add for each null-date row
                  F.when(F.col("date").isNull(),
                         F.row_number().over(w1) * 7).otherwise(0))
      .withColumn("date1",      # last known date in each item/store combination
                  F.when(F.col("date").isNull(),
                         F.max("date").over(w)).otherwise(F.col("date"))))
# Compute the week of year and drop the helper columns
s = (s.withColumn("weekofyear",
                  F.expr("weekofyear(date_add(date1, cast(increment as int)))"))
      .drop("date", "increment", "date1").na.fill(0))
s.show(truncate=False)
Outcome
+----+-----+---+-----+----------+
|item|store|id |sales|weekofyear|
+----+-----+---+-----+----------+
|1 |1 |0 |3 |5 |
|1 |1 |1 |5 |6 |
|1 |1 |2 |7 |7 |
|1 |1 |3 |2 |8 |
|1 |1 |4 |0 |9 |
|1 |1 |5 |0 |10 |
|1 |1 |6 |0 |11 |
|2 |2 |0 |3 |50 |
|2 |2 |1 |0 |51 |
|2 |2 |2 |1 |52 |
|2 |2 |3 |1 |1 |
|2 |2 |4 |0 |2 |
|2 |2 |5 |0 |3 |
|2 |2 |6 |0 |4 |
+----+-----+---+-----+----------+
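A rough pandas analogue of the core idea, on toy data standing in for the post's data_df after the week-to-date step: expand each item/store group's id to the full 0..6 range and zero-fill the missing sales.

```python
import pandas as pd

data = pd.DataFrame({"item": [1, 1, 1, 1, 2, 2],
                     "store": [1, 1, 1, 1, 2, 2],
                     "id": [0, 1, 2, 3, 0, 2],
                     "sales": [3, 5, 7, 2, 3, 1]})
# For each item/store group, expand ids to the full 0..6 range
# and fill the sales of the newly created rows with 0
parts = []
for (item, store), g in data.groupby(["item", "store"]):
    g2 = g.set_index("id").reindex(range(7))
    g2["item"] = item
    g2["store"] = store
    g2["sales"] = g2["sales"].fillna(0)
    parts.append(g2.reset_index())
expanded = pd.concat(parts, ignore_index=True)
```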
Pandas: add rows to each group until condition is met
A different approach with pandas:
- construct a DataFrame that is the Cartesian product of ID and second
- outer join it back to the original data frame
- fill missing values based on your spec
No groupby(), no loops.
df = pd.DataFrame({"ID": ["A","A","A","B","B","B","B","C","C"],
                   "second": ["1","2","3","1","2","3","4","1","2"],
                   "speaker1": ["1","1","1","1","1","1","1","1","1"],
                   "speaker2": ["1","1","1","1","1","1","1","1","1"],
                   "company": ["name1","name1","name1","name2","name2","name2","name2","name3","name3"]})
df2 = (pd.DataFrame({"ID": df["ID"].unique()}).assign(foo=1)
         .merge(pd.DataFrame({"second": df["second"].unique()}).assign(foo=1))
         .drop(columns="foo")
         .merge(df, on=["ID", "second"], how="outer"))
df2["company"] = df2["company"].ffill()
df2.fillna(0)
output
ID second speaker1 speaker2 company
0 A 1 1 1 name1
1 A 2 1 1 name1
2 A 3 1 1 name1
3 A 4 0 0 name1
4 B 1 1 1 name2
5 B 2 1 1 name2
6 B 3 1 1 name2
7 B 4 1 1 name2
8 C 1 1 1 name3
9 C 2 1 1 name3
10 C 3 0 0 name3
11 C 4 0 0 name3
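As an aside, since pandas 1.2 merge supports how="cross", which removes the need for the foo=1 dummy key. A minimal sketch of the same Cartesian-product step on toy data:

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "A", "B"],
                   "second": ["1", "2", "1"],
                   "speaker1": [1, 1, 1]})
# Cartesian product of the key columns, without a dummy join key
grid = (df[["ID"]].drop_duplicates()
          .merge(df[["second"]].drop_duplicates(), how="cross"))
# Outer-join the data back and fill the rows the grid created
out = (grid.merge(df, on=["ID", "second"], how="outer")
           .sort_values(["ID", "second"])
           .reset_index(drop=True)
           .fillna(0))
```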