Fill in Data Frame with Values from Rows Above

Fill in data frame with values from rows above

If you only need to carry forward the values in the VALUE column, you can use the na.locf() function from the zoo package. Here is an example:

library(zoo)
a <- c(1, NA, NA, 2, NA)
na.locf(a)
[1] 1 1 1 2 2
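
For anyone curious what "last observation carried forward" does step by step, here is a minimal Python sketch of the same idea. The function name locf is hypothetical, None stands in for NA, and this is not how zoo implements na.locf(). It only mirrors the behaviour shown above. Leading gaps are left as None here, which may differ from zoo's defaults.

```python
def locf(values):
    """Carry the last non-missing value forward over None entries."""
    filled = []
    last = None  # nothing seen yet, so leading gaps stay None
    for v in values:
        if v is not None:
            last = v  # remember the most recent observed value
        filled.append(last)
    return filled


print(locf([1, None, None, 2, None]))  # [1, 1, 1, 2, 2]
```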

Fill subset of rows with values from row above

We can group_by id and use replace() to fill var1 where timepoint == 0 and var1 is NA with the value of var1 at timepoint == 1 in the same group. This assumes each id has exactly one row with timepoint == 1.

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(var2 = replace(var1, timepoint == 0 & is.na(var1), var1[timepoint == 1]))

#       id timepoint  var1  var2
#    <dbl>     <dbl> <dbl> <dbl>
#  1     1         0    NA     9
#  2     1         1     9     9
#  3     1         2     8     8
#  4     1         3    10    10
#  5     2         0    NA    10
#  6     2         1    10    10
#  7     2         2    NA    NA
#  8     2         3    12    12
#  9     3         0    NA    NA
# 10     3         1    NA    NA
# 11     3         2    12    12
# 12     3         3    11    11
# 13     4         0    NA    12
# 14     4         1    12    12
# 15     4         2    12    12
# 16     4         3    NA    NA
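
If you are working in pandas instead, the same group-wise fill can be sketched roughly like this. The sample frame copies the rows for ids 1 and 2 from the output above. The names at_t1 and fill_here are made up for illustration, and the sketch assumes each id has exactly one row with timepoint == 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id":        [1, 1, 1, 1, 2, 2, 2, 2],
    "timepoint": [0, 1, 2, 3, 0, 1, 2, 3],
    "var1":      [np.nan, 9, 8, 10, np.nan, 10, np.nan, 12],
})

# var1 at timepoint 1, indexed by id (assumes one such row per id)
at_t1 = df.loc[df["timepoint"] == 1].set_index("id")["var1"]

# Rows to fill: timepoint 0 with a missing var1
fill_here = (df["timepoint"] == 0) & df["var1"].isna()

# Keep var1 everywhere else; take the timepoint-1 value where fill_here is True
df["var2"] = df["var1"].where(~fill_here, df["id"].map(at_t1))
print(df)
```

Only the timepoint-0 gaps are filled. The NA at id 2, timepoint 2 is left alone, as in the R result.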

Fill blank cells in Pandas dataframe with value from cell above it

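You need pandas' DataFrame.ffill(), which returns a new DataFrame with each missing value replaced by the last valid value above it: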

df.ffill()

See the pandas documentation for parameters: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.ffill.html
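
Here is a small sketch of ffill() on made-up data, in case it helps to see the variants side by side. The column names and values are assumptions for illustration only. It shows filling the whole frame, capping the fill with the limit parameter, and filling a single column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name":  ["a", None, None, "b", None],
    "value": [1.0, np.nan, np.nan, 2.0, np.nan],
})

# Fill every column from the row above
filled = df.ffill()

# Fill at most one consecutive missing value per gap
capped = df.ffill(limit=1)

# Fill a single column and assign it back
df["value"] = df["value"].ffill()

print(filled)
print(capped)
```

Note that ffill() only touches missing values (NaN/None). Cells holding an empty string count as present and would need converting to NaN first.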

R: How to fill out values in a DF which are dependent on previous rows

You can use indexing to solve the first two problems:

> # Original code from question~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> set.seed(1234)
> df <- data.frame(DA = sample(1:3, 6, rep = TRUE), HB = sample(0:600, 6, rep = TRUE),
+ D = sample(1:5, 6, rep = TRUE), AD = sample(1:14, 6, rep = TRUE),
+ GM = sample(30:31, 6, rep = TRUE), GL = NA, R =NA, RM =0 )
> df$GL[1] = 646
> df$R[1] = 60
> df$DA[5] = 2
> #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

> # View df
> df
  DA  HB D AD GM  GL  R RM
1  2 399 4 13 30 646 60  0
2  2  97 4 10 31  NA NA  0
3  1 102 5  5 31  NA NA  0
4  3 325 4  2 31  NA NA  0
5  2  78 3 14 30  NA NA  0
6  1 269 4  8 30  NA NA  0

> # Solution below, based on indexing
> # 1. GL column
> df$GL <- cumsum(c(df$GL[1], df$HB[-1] + df$RM[-nrow(df)]))

> # 2. R column
> df$R[-1] <- (df$GL * df$D / df$GM * df$AD)[-1]
> # May be more clear like this (same result)
> df$R[-1] <- df$GL[-1] * df$D[-1] / df$GM[-1] * df$AD[-1]
> # Or did you mean for the last * to be part of the denominator? (The output below uses this version.)
> df$R[-1] <- (df$GL * df$D / (df$GM * df$AD))[-1]

The third problem can be solved with a loop.

> df$RM[1] <- df$R[1]
> for (i in 2:nrow(df)) {
+ df$RM[i] <- df$R[i] + df$RM[i-1] * (df$DA[i] != 2)
+ }

> df
  DA  HB D AD GM   GL         R         RM
1  2 399 4 13 30  646 60.000000  60.000000
2  2  97 4 10 31  743  9.587097   9.587097
3  1 102 5  5 31  845 27.258065  36.845161
4  3 325 4  2 31 1170 75.483871 112.329032
5  2  78 3 14 30 1248  8.914286   8.914286
6  1 269 4  8 30 1517 25.283333  34.197619

Do these results look correct?
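
To make the recurrence in that loop explicit, here is the same "running total that resets when DA is 2" logic as a plain Python sketch. The function name and the reset_value parameter are hypothetical. The DA and R values are copied (rounded) from the output above.

```python
def running_total_with_reset(r, da, reset_value=2):
    """RM[i] = R[i] + RM[i-1], except the carry-over is dropped when DA[i] == reset_value."""
    rm = [r[0]]
    for i in range(1, len(r)):
        carry = 0.0 if da[i] == reset_value else rm[i - 1]
        rm.append(r[i] + carry)
    return rm


da = [2, 2, 1, 3, 2, 1]
r = [60.0, 9.587097, 27.258065, 75.483871, 8.914286, 25.283333]

rm = running_total_with_reset(r, da)
for d, x, total in zip(da, r, rm):
    print(d, x, round(total, 6))
```

Because each RM depends on the previous RM, this step is not a simple vectorised expression, which is why a loop is used here.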

Update: Assuming RM should equal R unless DA = 1, in which case RM is the sum of the current R and the previous R values up to (not including) the row above with DA = 1, try the following loop.

df$RM[1] <- cs <- df$R[1]
for (i in 2:nrow(df)) {
  df$RM[i] <- df$R[i] + cs * (df$DA[i] == 1)
  cs <- cs * (df$DA[i] != 1) + df$R[i]
}

Fill select row above and below index value

Here's an approach with dplyr and tidyr:

library(dplyr)
df2 <- df %>%
  mutate(X = if_else(is.na(X), lead(X), X)) %>%
  tidyr::fill(X)
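
A rough pandas counterpart of those two steps, on made-up data, might look like this. The values in X and the column name X_filled are assumptions, since the question's data is not shown here. shift(-1) plays the role of lead(), and ffill() the role of tidyr::fill().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"X": [np.nan, 5.0, np.nan, np.nan, np.nan, 7.0, np.nan]})

# Step 1: a missing row takes the value from the row directly below it, if any
step1 = df["X"].fillna(df["X"].shift(-1))

# Step 2: any remaining gaps take the last value above them
df["X_filled"] = step1.ffill()

print(df)
```

In this sketch the row just above each value and all rows below it (until the next value's neighbourhood) end up filled.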

Fill rows with consecutive values and above rows using pandas

Here is one way, using set_index(), reindex() and ffill():

df.set_index('col1').reindex(range(df.col1.min(),df.col1.max()+1)).ffill().reset_index()

#df.set_index('col1').reindex(range(df.col1.min(),df.col1.max()+1),method='ffill')\
#.reset_index()

   col1 col2
0 1 A
1 2 A
2 3 B
3 4 B
4 5 B
5 6 A
6 7 A
7 8 A
8 9 A
9 10 C
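
For a runnable version, here is the same chain with a small input frame. The input values are an assumption, chosen so that the result matches the output shown above. The question's actual data may have looked different.

```python
import pandas as pd

# Assumed input: only some col1 values are present
df = pd.DataFrame({"col1": [1, 3, 6, 10], "col2": ["A", "B", "A", "C"]})

# Build the full run of consecutive col1 values, then fill col2 from the row above
full_range = range(df["col1"].min(), df["col1"].max() + 1)
out = (
    df.set_index("col1")
      .reindex(full_range)   # adds the missing col1 values with NaN in col2
      .ffill()               # carries col2 forward into the new rows
      .reset_index()
)
print(out)
```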

fill values from cell above in a given column

We can convert empty strings to NaN using Series.mask and then forward-fill:

df['Category']=df['Category'].mask(df['Category'].eq('')|df['Category'].isnull()).ffill()
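
Here is that one-liner split into steps on a small made-up frame, in case you want to check what each part does. The Category and Item values are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Fruit", "", None, "Veg", ""],
    "Item":     ["apple", "pear", "plum", "leek", "kale"],
})

# True where the cell is an empty string or already missing
blank = df["Category"].eq("") | df["Category"].isna()

# mask() replaces the flagged cells with NaN, ffill() copies the value from above
df["Category"] = df["Category"].mask(blank).ffill()

print(df)
```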

Add missing value in column with value from row above

The tidyr package has a fill() function that does the trick.

library(tidyr)
df1 <- data.frame(var1 = c("a", NA, NA, "b", NA), stringsAsFactors = FALSE)
df1 %>% fill(var1)

