Fill missing values rowwise (right / left)
We can gather into 'long' format, do the fill grouped by the row number, and then spread back to 'wide' format:
library(tidyverse)
rownames_to_column(d, 'rn') %>%
  gather(key, val, -rn) %>%
  group_by(rn) %>%
  fill(val) %>%
  spread(key, val) %>%
  ungroup %>%
  select(-rn)
# A tibble: 5 x 3
# c1 c2 c3
# <chr> <chr> <chr>
#1 a a a
#2 1 2 3
#3 2 2 4
#4 3 4 4
#5 4 5 6
Another option, without reshaping, is to do a rowwise fill with na.locf:
library(zoo)
d %>%
  mutate(c1 = as.character(c1)) %>%
  pmap_dfr(., ~ na.locf(c(...)) %>%
             as.list %>%
             as_tibble)
Also, na.locf runs columnwise, so the data can be transposed and na.locf applied directly:
d[] <- t(na.locf(t(d)))
d
# c1 c2 c3
#1 a a a
#2 1 2 3
#3 2 2 4
#4 3 4 4
#5 4 5 6
As @G.Grothendieck mentioned in the comments, in order to take care of elements that are NA at the beginning of a row, use na.locf0 instead of na.locf.
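To make the leading-NA point concrete, here is a minimal base-R sketch of last-observation-carried-forward that keeps the vector length, which is the behaviour the answer relies on when it recommends na.locf0. The helper name locf_keep and the small matrix m are hypothetical, and this is not zoo's implementation, only an illustration of the idea.

```r
# Hypothetical helper: carry the last non-NA value forward.
# NAs before the first observed value stay NA, so the length never changes.
locf_keep <- function(x) {
  seen <- cumsum(!is.na(x))       # count of non-NA values seen so far (0 = none yet)
  c(NA, x[!is.na(x)])[seen + 1]   # slot 1 holds the NA used for leading gaps
}

# Assumed toy data: the first row starts with NA
m <- matrix(c(NA, 2, NA,
              1, NA, NA),
            nrow = 2, byrow = TRUE)

# apply() returns each processed row as a column, so transpose back
filled <- t(apply(m, 1, locf_keep))
filled
```

Because the result always has as many elements as the input, the transposed result keeps the original shape even when a row begins with NA.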
Filling NA row values with nearest right side row value in R
Update
As there was a lot of confusion about the expected output, I am updating the answer, as suggested by @DavidArenburg, with a tidyverse solution:
library(dplyr)
library(tidyr)
df %>%
  # add_rownames() is deprecated; rownames_to_column() also creates a `rowname` column
  tibble::rownames_to_column() %>%
  gather(variable, value, -rowname) %>%
  filter(!is.na(value)) %>%
  group_by(rowname) %>%
  mutate(indx = row_number()) %>%
  select(-variable) %>%
  spread(indx, value)
# rowname `1` `2`
#* <chr> <dbl> <dbl>
#1 BAKERY_Total 28 84.04
#2 CHICKEN_PUFF 16 88.24
#3 VEG_PUFF 12 78.43
Another solution could be:
library(data.table)
temp <- apply(df, 1, function(x) data.frame(matrix(x[!is.na(x)], nrow = 1)))
rbindlist(temp, fill = TRUE)
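The idea behind this variant (keep each row's non-NA values in order and pad on the right) can also be sketched in base R without data.table. The helper left_align and the matrix m are hypothetical; the numbers are borrowed from the output shown above.

```r
# Hypothetical helper: move the non-NA values of a row to the front, pad with NA
left_align <- function(x) c(x[!is.na(x)], rep(NA, sum(is.na(x))))

# Assumed toy input: the observed values sit in different columns in each row
m <- matrix(c(12, NA, 78.43,
              NA, 16, 88.24),
            nrow = 2, byrow = TRUE)

# One left-aligned row per input row (transpose because apply() returns columns)
aligned <- t(apply(m, 1, left_align))
aligned
```

Unlike the rbindlist approach, this keeps the original number of columns, so trailing all-NA columns would have to be dropped separately if they are not wanted.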
Previous Answer
If I have understood you correctly, you are trying to replace NA values in a row with the nearest non-NA value to their right in the same row. We can use na.locf with fromLast set to TRUE:
t(apply(df, 1, function(x) na.locf(x, fromLast = TRUE, na.rm = FALSE)))
# c1 c2 c3 c4 c5
#VEG_PUFF 12 12 78.43 78.43 78.43
#CHICKEN_PUFF 16 16 88.24 88.24 NA
#BAKERY_Total 28 28 28.00 84.04 84.04
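For intuition about fromLast, a backward fill is just a forward fill run on the reversed vector. The following base-R sketch uses hypothetical helpers (locf_fwd, locf_bwd) and a made-up input row; it is not how zoo implements the option. Note that a trailing NA has nothing to its right and therefore stays NA, as in the CHICKEN_PUFF row above.

```r
# Hypothetical forward fill that keeps leading NAs
locf_fwd <- function(x) {
  seen <- cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[seen + 1]
}

# Backward fill: reverse, fill forward, reverse again
locf_bwd <- function(x) rev(locf_fwd(rev(x)))

x <- c(NA, 16, NA, 88.24, NA)   # assumed example row
locf_bwd(x)
```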
R: fill missing value with prior values
Using tidyr, we can use fill(data, vars):
library(tidyr)
fill(d, county)
Fill missing values with previous values by row using dplyr
One solution could be to use the na.locf function from package zoo, combined with the purrr::pmap function in a row-wise operation. na.locf takes the most recent non-NA value and replaces the NA values that follow with it. As a reminder, c(...) in both solutions captures the values of V1:V4 in each row on every iteration. I excluded the id column in both, as it is not involved in the calculations.
library(zoo)
library(purrr)
df %>%
  mutate(pmap_df(., ~ na.locf(c(...)[-1])))
id V1 V2 V3 V4
1 01 1 1 1 1
2 02 2 1 1 1
3 03 3 1 1 1
4 04 4 1 2 2
Or we can use the coalesce function from dplyr. We can replace every NA value in each row with the last non-NA value, something we did earlier with na.locf. However, this solution is a bit verbose:
df %>%
  mutate(pmap_df(., ~ {x <- c(...)[!is.na(c(...))];
                       coalesce(c(...), x[length(x)])}))
id V1 V2 V3 V4
1 01 1 1 1 1
2 02 2 1 1 1
3 03 3 1 1 1
4 04 4 1 2 2
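What the coalesce version does per row can be written out in base R as a small sketch. The helper fill_with_last is hypothetical and the vector is made up; it replaces every NA in a row with that row's last non-NA value, which differs from a forward fill when a gap sits between two different values.

```r
# Hypothetical helper: replace every NA with the last observed value of the row
fill_with_last <- function(x) {
  observed <- x[!is.na(x)]
  if (length(observed) == 0) return(x)        # all NA: nothing to fill from
  x[is.na(x)] <- observed[length(observed)]   # last non-NA value
  x
}

x <- c(4, NA, 1, NA, 2)   # assumed example row
fill_with_last(x)         # 4 2 1 2 2 (a forward fill would give 4 4 1 1 2)
```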
Or you could also use this:
library(purrr)
df %>%
  mutate(across(!id, ~ replace(., is.na(.), invoke(coalesce, rev(df[-1])))))
id V1 V2 V3 V4
1 01 1 1 1 1
2 02 2 1 1 1
3 03 3 1 1 1
4 04 4 1 2 2
The warning message can be ignored. It is produced because we have 6 NA values, but applying dplyr::coalesce to every vector yields 1 element per row, so there are only 4 elements to fill 6 slots.
Rowwise duplicate to missing for second degree neighbors
To get the desired output we could do:
df1 <- t(apply(df, 1, function(x) replace(x, duplicated(x), NA)))
x <- df1 %>%
  as_tibble() %>%
  pivot_longer(
    everything()
  ) %>%
  group_by(value) %>%
  mutate(id = row_number()-1,
         value = paste0("X.",value,"."),
         value = ifelse(value == "X.NA." & id > 0, paste0(NA, "..", id), value),
         value = ifelse(value == "X.NA.", NA, value)) %>%
  select(-id) %>%
  mutate(value = str_replace(value, " ", ".")) %>%
  pivot_wider(
    names_from = name,
    values_from = value
  )
colnames(df1) <- x
df1
X.Ashanti. X.Brong.Ahafo. X.Central. X.Eastern. X.Western. <NA> NA..1 NA..2 X.Northern. X.Volta. NA..3
[1,] "Ashanti" "Brong Ahafo" "Central" "Eastern" "Western" NA NA NA "Northern" "Volta" NA
Fill in NA column values with the last value that was not NA (na.locf by column)
Apply na.locf rowwise:
DF[] <- t(apply(DF, 1, zoo::na.locf, na.rm = FALSE))
DF
# A tibble: 20 x 7
# toberevised ...2 ...3 ...4 ...5 ...6 ...7
# <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 [Money amounts are in th… UNITED ST… UNITED ST… UNITED STATES … UNITED STATES … UNITED STATES … UNITED STATES…
# 2 NA NA NA NA NA NA NA
# 3 NA NA NA Size of adjust… Size of adjust… Size of adjust… Size of adjus…
# 4 NA NA NA NA NA NA NA
# 5 Item All retur… Under 50000 75000 100000 200000
# 6 NA NA $50,000 [… under under under or more
# 7 NA NA NA 75000 100000 200000 200000
# 8 NA 1 2 3 4 5 6
# 9 NA NA NA NA NA NA NA
#10 Number of returns 135257620 92150166 18221115 10499106 10797979 3589254
#11 Number of joint returns 52607676 20743943 11329459 8296546 9193700 3044028
#12 Number with paid prepare… 80455243 53622647 11025624 6260725 6678965 2867282
#13 Number of exemptions 273738434 159649737 44189517 28555195 30919226 10424759
#14 Adjusted gross income (A… 7364640131 1797097083 1119634632 905336768 1429575727 2112995921
#15 Salaries and wages in AG… 114060887 75422766 16299827 9520214 9782173 3035907
#16 Salaries and wages in AG… 5161583318 1541276272 896339313 721137490 1083175205 919655038
#17 Taxable interest: Number 59553985 28527550 10891905 7636612 9092673 3405245
#18 Taxable interest: Amount 161324824 39043002 16353293 12852148 23160862 69915518
#19 Ordinary dividends: Num… 31158675 13174923 5255958 4095938 5824522 2807334
#20 Ordinary dividends: Amou… 164247298 23867893 12810282 11524298 25842394 90202431
As suggested by @G. Grothendieck, na.locf0 is a better candidate here:
DF[] <- t(apply(DF, 1, zoo::na.locf0))
How to show names of missing variables rowwise?
Without a great view of what your data looks like, it is difficult to assess. However, you may try the sapply() function. This function can loop through variables in a data frame and return a list object, which is quite flexible in terms of what it stores. Here is an example that might fit your scenario:
# construct silly data.frame
temp <- data.frame("a"=1:10, "aa"=rep(1:5, 2), "b"=rnorm(10),
                   "c"=sample(c("good", "bad", "ugly"), 10, replace=TRUE))
# build in some missing values
temp$a[c(1,5)] <- NA
temp$b[c(3,7, 9)] <- NA
temp$c[c(2,5)] <- NA
# take a peek at the data
temp
# construct empty list to store names of missing vars
missingVars <- list()
# loop through observations
for (i in seq_len(nrow(temp))) {
  # subset to a one-row data set
  obs.row <- temp[i, ]
  # fill in missing var list with names of variables that are missing
  missingVars[[paste0("obs.", i)]] <-
    names(obs.row)[unlist(sapply(obs.row, is.na))]
}
This should work given what you have described. You can then extract the names of the missing variables either by using the row number:
missingVars[[1]]
or by using the name of the list element:
missingVars[["obs.1"]]
Both extract the names of the missing variables for the first observation.
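If the explicit loop feels heavy, the same list can be built from the logical matrix that is.na() returns for a data frame. This is only a sketch under assumed toy data: the small temp data frame and the name missing_vars below are made up for the example and stand in for the objects used above.

```r
# Assumed toy data with a few missing values
temp <- data.frame(a = c(NA, 2, 3),
                   b = c(1, NA, 3),
                   c = c(NA, "bad", "ugly"))

# is.na() on a data frame gives a logical matrix (rows x columns)
na_mat <- is.na(temp)

# For each row, keep the column names where the value is missing
missing_vars <- lapply(seq_len(nrow(temp)),
                       function(i) colnames(na_mat)[na_mat[i, ]])
names(missing_vars) <- paste0("obs.", seq_len(nrow(temp)))

missing_vars[["obs.1"]]   # names of the variables missing in the first row
```

Rows without missing values come back as character(0), so the list always has one element per observation.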
Replace NAs with previous day value for returns
You can replace NAs with the previous values at the start of your pipe using fill(), like this:
library(tidyverse)
df %>%
  fill(MDAXClosing) %>%
  dplyr::mutate(Date = as.Date(Date, format = "%d.%m.%Y"),
                week = cut.Date(Date, breaks = "1 week", labels = FALSE)) %>%
  dplyr::group_by(Underlying, week) %>%
  dplyr::summarise(Stockreturn = log(ClosingPrice[1] / ClosingPrice[n()]),
                   MDAXreturn = log(MDAXClosing[1] / MDAXClosing[n()]))
# A tibble: 3 x 4
# Groups: Underlying [1]
Underlying week Stockreturn MDAXreturn
<chr> <int> <dbl> <dbl>
1 DE0005089031 1 0.0354 0.0472
2 DE0005089031 2 0.117 0.0226
3 DE0005089031 3 -0.00780 0.0184
MDAXreturn can be calculated in the same summarise statement as Stockreturn.
Data
df <- tibble::tribble(
~Underlying, ~Date, ~ClosingPrice, ~MDAXClosing,
"DE0005089031", "04.01.2016", 49.501, 20256.14,
"DE0005089031", "05.01.2016", 49.7855, 20228.06,
"DE0005089031", "06.01.2016", 49.0595, 19989.88,
"DE0005089031", "07.01.2016", 47.7785, 19537.39,
"DE0005089031", "08.01.2016", 47.7435, 19321.93,
"DE0005089031", "09.01.2016", 47.816, NA,
"DE0005089031", "10.01.2016", 47.777, NA,
"DE0005089031", "11.01.2016", 48.8095, 19219.43,
"DE0005089031", "12.01.2016", 48.9545, 19627.76,
"DE0005089031", "13.01.2016", 48.0195, 19587.69,
"DE0005089031", "14.01.2016", 47.146, 19296.48,
"DE0005089031", "15.01.2016", 43.558, 18789.76,
"DE0005089031", "16.01.2016", 43.4, NA,
"DE0005089031", "17.01.2016", 43.4, NA,
"DE0005089031", "18.01.2016", 44.4815, 18662.69,
"DE0005089031", "19.01.2016", 45.6485, 19029.23,
"DE0005089031", "20.01.2016", 44.83, 18322.99
)
Replacing missing values
One dplyr and tidyr possibility could be:
df %>%
  group_by(quarter = substr(Period, 5, 6)) %>%
  mutate(Sales_temp = replace_na(Sales, last(na.omit(Sales)))) %>%
  group_by(quarter, na = is.na(Sales)) %>%
  mutate(constant = 1.05,
         Sales_temp = Sales_temp * cumprod(constant),
         Sales = coalesce(Sales, Sales_temp)) %>%
  ungroup() %>%
  select(1:2)
Period Sales
<chr> <dbl>
1 1999Q1 353.
2 1999Q2 426.
3 1999Q3 358.
4 1999Q4 364.
5 2000Q1 303.
6 2000Q2 394.
7 2000Q3 435.
8 2000Q4 388.
9 2001Q1 318.
10 2001Q2 414.
11 2001Q3 457.
12 2001Q4 407.
13 2002Q1 334.
14 2002Q2 435.
15 2002Q3 480.
16 2002Q4 428.
17 2003Q1 351.
18 2003Q2 456.
19 2003Q3 504.
20 2003Q4 449.
Or with just dplyr:
df %>%
  group_by(quarter = substr(Period, 5, 6)) %>%
  mutate(Sales_temp = if_else(is.na(Sales), last(na.omit(Sales)), Sales)) %>%
  group_by(quarter, na = is.na(Sales)) %>%
  mutate(constant = 1.05,
         Sales_temp = Sales_temp * cumprod(constant),
         Sales = coalesce(Sales, Sales_temp)) %>%
  ungroup() %>%
  select(1:2)
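The arithmetic both pipelines rely on is compound growth: within a quarter, the k-th missing value is the last observed value times 1.05^k, which is what multiplying by cumprod(constant) produces. A small base-R check follows, using the rounded 2000Q1 figure from the printed output as an assumed starting value (it is taken to be the last observed Q1 value, and the names last_obs, growth and projected are made up for the example).

```r
last_obs  <- 303                  # 2000Q1 as printed above (rounded, so approximate)
growth    <- rep(1.05, 3)         # one 5% factor per missing year
projected <- last_obs * cumprod(growth)

round(projected)                  # 318 334 351

# cumprod of a constant factor is the same as raising it to 1, 2, 3, ...
all.equal(projected, last_obs * 1.05^(1:3))
```

The rounded results line up with the 2001Q1 to 2003Q1 rows of the output, which suggests the pipelines extend each quarter's last known value by 5% per year.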