How do I remove NAs with the tidyr::unite function?
You could use regex to remove the NAs after they are created:
library(dplyr)
library(tidyr)
df <- tibble(a = paste0("A.", rep(1, 3)),  # tibble() replaces the deprecated data_frame()
             b = " ",
             c = c("C.1", "C.3", " "),
             d = "D.4", e = "E.5")
cols <- letters[2:4]
df[, cols] <- gsub(" ", NA_character_, as.matrix(df[, cols]))
tidyr::unite(df, new, all_of(cols), sep = ",") %>%
  # strip the "NA" entries (and the comma after each) that unite() pasted in
  dplyr::mutate(new = stringr::str_replace_all(new, 'NA,?', ''))
Output:
# A tibble: 3 x 3
a new e
<chr> <chr> <chr>
1 A.1 C.1,D.4 E.5
2 A.1 C.3,D.4 E.5
3 A.1 D.4 E.5
How to remove missing values (NA) when uniting columns?
You have a couple of problems:
1) The NAs are not real NAs; they are the string "NA" (check with is.na(df$Parent2)).
2) Your columns are factors.
While constructing the data frame, use stringsAsFactors = FALSE:
df <- data.frame(Name, Postalcode, Parent, Parent2, Parent3, Parent4,
Parent5, stringsAsFactors = FALSE)
and then replace the "NA" strings with real NAs and use unite:
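library(dplyr)
df %>%
  # na_if() no longer accepts a whole data frame in dplyr >= 1.1.0, so apply it per character column
  mutate(across(where(is.character), ~ na_if(.x, "NA"))) %>%
  tidyr::unite(Parent_full, Parent:Parent5, sep = "|", na.rm = TRUE)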
# Name Postalcode Parent_full
#1 Paul 4732 Mother
#2 Edward 9045 Father|Mother
#3 Mary 3476 Mother|Father|Stepmother
If the data is already loaded, we can convert the factor columns to character with mutate_if:
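df %>%
  mutate_if(is.factor, as.character) %>%
  # na_if() no longer accepts a whole data frame in dplyr >= 1.1.0, so apply it per character column
  mutate(across(where(is.character), ~ na_if(.x, "NA"))) %>%
  tidyr::unite(Parent_full, Parent:Parent5, sep = "|", na.rm = TRUE)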
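A self-contained sketch of the steps above may help. The question's data is not shown here, so the small `toy` data frame below is a hypothetical stand-in with fewer Parent columns; it assumes dplyr (1.0.0 or later, for `across`) and tidyr (1.0.0 or later, for `na.rm`) are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's data: factor columns holding the string "NA"
toy <- data.frame(Name    = c("Paul", "Edward"),
                  Parent  = c("NA", "Father"),
                  Parent2 = c("Mother", "Mother"),
                  stringsAsFactors = TRUE)

# The "NA" strings are not missing values, so is.na() finds nothing
any(is.na(toy$Parent))

res <- toy %>%
  mutate_if(is.factor, as.character) %>%                       # factors -> character
  mutate(across(where(is.character), ~ na_if(.x, "NA"))) %>%   # "NA" -> real NA
  unite(Parent_full, Parent:Parent2, sep = "|", na.rm = TRUE)  # drop NAs while pasting

res$Parent_full
```

With this toy input, the first row keeps only "Mother" and the second becomes "Father|Mother".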
R - Unite without NA values
We can use unite with na.rm = TRUE:
library(tidyverse)
mtcars %>%
rownames_to_column('rn') %>%
mutate_at(vars(starts_with("NA")), as.character) %>%
unite(Var1, NA_1, NA_2, na.rm = TRUE) %>%
mutate(Var1 = na_if(Var1, "")) %>%
column_to_rownames('rn')
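The NA_1 and NA_2 columns come from the original question and are not part of the built-in mtcars, so the snippet above will not run on a fresh session. As a minimal sketch, the same unite-then-na_if idea on a hypothetical tibble `toy` looks like this (assuming tidyr 1.0.0 or later for `na.rm`):

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two character columns with missing values
toy <- tibble(id   = 1:3,
              NA_1 = c("a", NA, NA),
              NA_2 = c("b", "c", NA))

res <- toy %>%
  unite(Var1, NA_1, NA_2, na.rm = TRUE) %>%  # NAs are skipped; an all-NA row becomes ""
  mutate(Var1 = na_if(Var1, ""))             # turn the empty string back into NA

res$Var1
```

The final na_if step matters only for rows where every united column is missing; without it those rows would hold an empty string rather than NA.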
Another option is coalesce instead of unite:
mtcars %>%
mutate(Var1 = str_c(coalesce(NA_1, NA_2), coalesce(NA_2, NA_1), sep="_"))
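Yet another option is to replace the NAs with empty strings, paste, and strip the leftover separators:
mtcars %>%
  mutate_at(vars(starts_with("NA")), list(~ replace_na(., ''))) %>%
  mutate(Var1 = str_remove(na_if(str_c(NA_1, NA_2, sep="_"), '_'), '^_|_$') ) %>%
  select(-NA_1, -NA_2)  # drop both source columns (the original dropped only NA_1)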
How to remove NAs with the conditions in R?
df_A %>%
group_by(product_name) %>%
filter(!is.na(id) |
is.na(id) & is.na(clicks))
Using the unite function in R and removing duplicated values
I am not sure if deduplicating is possible with unite; however, you can use apply row-wise.
input$ALL <- apply(input[-1], 1, function(x) toString(na.omit(unique(x))))
Or a tidyverse way could be using pmap:
library(tidyverse)
input %>%
mutate(ALL = pmap_chr(select(., -id), ~toString(unique(na.omit(c(...))))))
# id `2017` `2018` `2019` ALL
# <chr> <chr> <chr> <chr> <chr>
#1 aa tv tv NA tv
#2 ss NA web web web
#3 dd NA NA book book
#4 qq web NA tv web, tv
Or reshape the data to long format, summarise per id, and join back to the original:
input %>%
  pivot_longer(cols = -id, values_drop_na = TRUE) %>%
  group_by(id) %>%
  summarise(ALL = toString(unique(value))) %>%
  left_join(input, by = "id")
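For a reproducible check, `input` can be rebuilt from the printed output above. The sketch below does that (assuming plain character columns, which is what the `<chr>` header suggests) and runs the base R apply version:

```r
# input rebuilt from the printed tibble above; check.names = FALSE keeps the year column names
input <- data.frame(id     = c("aa", "ss", "dd", "qq"),
                    `2017` = c("tv", NA, NA, "web"),
                    `2018` = c("tv", "web", NA, NA),
                    `2019` = c(NA, "web", "book", "tv"),
                    check.names = FALSE, stringsAsFactors = FALSE)

# For each row: drop duplicates, drop NAs, and collapse what is left with ", "
input$ALL <- apply(input[-1], 1, function(x) toString(na.omit(unique(x))))

input$ALL
```

The values in ALL should match the ALL column printed above, with only the last row keeping two distinct entries.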
Collapsing columns and removing NAs
Using dplyr::coalesce we can do the following:
df %>%
mutate(Comb = coalesce(w,x,y,z)) %>%
select(A, Comb)
which gives the following output:
A Comb
<dbl> <dbl>
1 0.23 1
2 0.12 2
3 0.45 2
4 0.89 3
5 0.12 4
Combine column to remove NA's
A dplyr::coalesce-based solution could be:
data %>% mutate(mycol = coalesce(x,y,z)) %>%
select(a, mycol)
# a mycol
# 1 A 1
# 2 B 2
# 3 C 3
# 4 D 4
# 5 E 5
Data
data <- data.frame('a' = c('A','B','C','D','E'),
'x' = c(1,2,NA,NA,NA),
'y' = c(NA,NA,3,NA,NA),
'z' = c(NA,NA,NA,4,5))
Dealing with Spaces and NA's when Uniting Multiple Columns with Tidyr
From getAnywhere("unite_.data.frame"), unite calls do.call("paste", c(data[from], list(sep = sep))) under the hood, and paste, as far as I know, provides no way to omit NAs unless you implement it manually.
Nevertheless, you can clean up the result column with a regular expression, using gsub from base R:
gsub("^\\s;\\s|;\\s{2}", "", Days$BestDays)
# [1] "Monday" "Tuesday; Wednesday"
# [3] "Tuesday; Wednesday" "Monday; Wednesday"
# [5] "Monday; Tuesday; Thursday; Friday"
This removes either the ^\\s;\\s pattern or the ;\\s{2} pattern. The former handles a string that starts with a blank value: it removes that space together with the ;\\s that follows it. The latter removes ;\\s{2}, which handles blank values both in the middle of the string and at the end of the string.
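The Days data from the question is not reproduced here, so the following is a minimal sketch on a hypothetical three-column version, where a blank day is stored as a single space and the columns are pasted with "; " as the separator:

```r
# Hypothetical stand-in for the question's Days data frame; " " marks a blank value
Days <- data.frame(Mon = c("Monday", " ", "Monday", "Monday"),
                   Tue = c(" ", "Tuesday", " ", "Tuesday"),
                   Wed = c(" ", "Wednesday", "Wednesday", " "),
                   stringsAsFactors = FALSE)

# Same result as uniting the columns with sep = "; "
Days$BestDays <- paste(Days$Mon, Days$Tue, Days$Wed, sep = "; ")

# Remove a leading blank plus its separator, or a separator followed by a blank
cleaned <- gsub("^\\s;\\s|;\\s{2}", "", Days$BestDays)

cleaned
```

Each of the four rows exercises one case: blanks in the middle and at the end, a blank at the start, a blank in the middle only, and a blank at the end only.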