How to Have NAs Displayed First Using arrange()

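You could also sort on the logical vector !is.na(wt) first. FALSE sorts before TRUE, so the rows where wt is NA come first, and the remaining rows are then ordered by wt: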

m %>%
  arrange(!is.na(wt), wt) # @Spacedman's dataset
#     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# 1  18.7   8 360.0 175 3.15    NA 17.02  0  0    3    2
# 2  24.4   4 146.7  62 3.69    NA 20.00  1  0    4    2
# 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
# 4  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
# 5  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
# 6  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
# 7  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# 8  19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
# 9  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# 10 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
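
For a version that can be run as-is, here is a minimal sketch. The data frame m below is a hypothetical stand-in (ten mtcars rows with two weights set to NA), not the original dataset, and the second call shows one way to keep the NAs on top while sorting the rest in descending order.

```r
library(dplyr)

# Hypothetical stand-in for `m`: ten mtcars rows with two weights blanked out.
m <- head(mtcars, 10)
m$wt[c(3, 7)] <- NA

# NAs first, remaining rows ascending by wt.
na_first <- m %>%
  arrange(!is.na(wt), wt)

# NAs first, remaining rows descending by wt.
na_first_desc <- m %>%
  arrange(!is.na(wt), desc(wt))

na_first$wt
na_first_desc$wt
```

The logical key does the grouping (NA rows versus the rest), so the second key is free to sort in either direction.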

How to sort putting NAs first in dplyr?

We can arrange on the logical vector !is.na(val) first (FALSE sorts before TRUE, so the NA rows come first) and then on the 'val' column:

tbl %>%
  arrange(!is.na(val), val)
# A tibble: 10 × 2
#       id       val
#    <chr>     <dbl>
# 1      f        NA
# 2      i 0.1346666
# 3      c 0.2861395
# 4      g 0.5190959
# 5      e 0.6417455
# 6      j 0.6569923
# 7      h 0.7365883
# 8      d 0.8304476
# 9      a 0.9148060
# 10     b 0.9370754

How do I use dplyr::arrange to sort NA's first?

This was fixed by installing the latest version of dplyr (0.8.0) from GitHub:

devtools::install_github("tidyverse/dplyr")

dplyr arrange() function sort by missing values

We can wrap is.na() in desc() so that TRUE (missing) sorts before FALSE, which puts the missing values at the start:

flights %>%
  arrange(desc(is.na(dep_time)),
          desc(is.na(dep_delay)),
          desc(is.na(arr_time)),
          desc(is.na(arr_delay)),
          desc(is.na(tailnum)),
          desc(is.na(air_time)))

These are the only variables that contain NA values, as shown by:

names(flights)[colSums(is.na(flights)) >0]
#[1] "dep_time" "dep_delay" "arr_time" "arr_delay" "tailnum" "air_time"

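Instead of passing each variable name one at a time, we can also build the expressions as strings and use the standard-evaluation (SE) version, arrange_ (deprecated since dplyr 0.7.0):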

nm1 <- paste0("desc(is.na(", names(flights)[colSums(is.na(flights)) > 0], "))")

r1 <- flights %>%
  arrange_(.dots = nm1)

r1 %>%
  head()
#   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum
#  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>
#1  2013     1     2       NA           1545        NA       NA           1910        NA      AA    133    <NA>
#2  2013     1     2       NA           1601        NA       NA           1735        NA      UA    623    <NA>
#3  2013     1     3       NA            857        NA       NA           1209        NA      UA    714    <NA>
#4  2013     1     3       NA            645        NA       NA            952        NA      UA    719    <NA>
#5  2013     1     4       NA            845        NA       NA           1015        NA      9E   3405    <NA>
#6  2013     1     4       NA           1830        NA       NA           2044        NA      9E   3716    <NA>
# Variables not shown: origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
#   time_hour <time>.

Update

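With newer versions of the tidyverse (dplyr_0.7.3, rlang_0.1.2), we can also make use of arrange_at, arrange_all and arrange_if. Note that funs() was deprecated in dplyr 0.8.0 and that these scoped verbs were superseded by across() in dplyr 1.0.0.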

nm1 <- names(flights)[colSums(is.na(flights)) > 0]
r2 <- flights %>%
  arrange_at(vars(nm1), funs(desc(is.na(.))))

Or use arrange_if:

f <- rlang::as_function(~ any(is.na(.)))
r3 <- flights %>%
  arrange_if(f, funs(desc(is.na(.))))

identical(r1, r2)
#[1] TRUE

identical(r1, r3)
#[1] TRUE
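
On current dplyr (1.0.0 or later) the same idea can be written with across(). The following is a minimal sketch, not part of the original answer: it uses the built-in airquality data instead of flights so that it runs without extra packages, and the names na_cols and res are illustrative.

```r
library(dplyr)

# Columns that contain at least one NA (Ozone and Solar.R in airquality).
na_cols <- names(airquality)[colSums(is.na(airquality)) > 0]

# Sort by "is missing" for each of those columns, missing first.
res <- airquality %>%
  arrange(across(all_of(na_cols), ~ desc(is.na(.x))))

head(res)
```

Because the sort keys are applied in column order, rows missing Ozone come first, and among those the rows that are also missing Solar.R come first.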

Can't use dplyr::arrange() to sort a column in the form of a date in r

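A double-quoted name is treated as a constant string rather than as a column, so the rows are not reordered. Instead of double quotes, wrap the column name in backticks: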

library(dplyr)
values %>%
  dplyr::arrange(`2022-03-01`)

Output:

  2022-03-01
J        0.6
E        2.0
A        2.7
B        3.7
C        5.7
I        6.3
H        6.6
F        9.0
D        9.1
G        9.4

If we want to pass the column name as a string, either wrap it in across():

values %>%
  dplyr::arrange(across("2022-03-01"))

Output:

  2022-03-01
J        0.6
E        2.0
A        2.7
B        3.7
C        5.7
I        6.3
H        6.6
F        9.0
D        9.1
G        9.4

Or convert the string to a symbol and unquote it with !!:

values %>%
  dplyr::arrange(!! rlang::sym("2022-03-01"))

Output:

  2022-03-01
J        0.6
E        2.0
A        2.7
B        3.7
C        5.7
I        6.3
H        6.6
F        9.0
D        9.1
G        9.4

Or use the .data pronoun:

values %>%
  dplyr::arrange(.data[["2022-03-01"]])
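
To check that these approaches agree, here is a small sketch. The data frame values is an assumption rebuilt from the printed output above (the real data may have had more columns), and col is an illustrative variable holding the column name as a string.

```r
library(dplyr)

# Assumption: `values` reconstructed from the printed output; the real data may differ.
values <- data.frame(
  `2022-03-01` = c(2.7, 3.7, 5.7, 9.1, 2.0, 9.0, 9.4, 6.6, 6.3, 0.6),
  row.names = LETTERS[1:10],
  check.names = FALSE  # keep the non-syntactic column name as-is
)

col <- "2022-03-01"  # column name held in a string

by_backtick <- values %>% arrange(`2022-03-01`)
by_data     <- values %>% arrange(.data[[col]])
by_sym      <- values %>% arrange(!!rlang::sym(col))

identical(by_backtick, by_data)
identical(by_backtick, by_sym)
```

The .data form is often the most convenient one when the column name arrives as a function argument or a loop variable.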

How to move NA to the top of the column of an R data.frame?

Perhaps, something like this?

DF$F <- c(rep(NA, sum(is.na(DF$F))), na.omit(DF$F))

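This puts all the NAs first and then appends the non-NA values in their original order. Note that it reorders only column F and leaves the other columns of DF untouched.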
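
If whole rows should move together rather than a single column, base R's order() can do it. This is a hedged sketch of an alternative, not the approach from the answer above; the data frame DF here is hypothetical and only the column name F is taken from the question.

```r
# Hypothetical data frame; only the column name F comes from the answer above.
DF <- data.frame(id = 1:6, F = c(3, NA, 1, NA, 2, 5))

# Rows with NA in F first; other rows keep their original relative order.
na_top <- DF[order(!is.na(DF$F)), ]

# Rows with NA in F first, remaining rows sorted ascending by F.
na_top_sorted <- DF[order(DF$F, na.last = FALSE), ]

na_top
na_top_sorted
```

The first form only partitions the rows, while na.last = FALSE both moves the NAs to the top and sorts the rest.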

Arranging summarized NAs in descending order after calculating them in each column in tidyverse

I think the issue is caused by summarise() returning a single row with one column per variable (col_name = value) instead of two columns ("names" and "values"). One potential solution is to use the tidyr function pivot_longer() to reshape the output into those two columns, e.g.

library(tidyverse)

data(airquality)
dat1 <- airquality
dat1 %>%
  summarise(across(everything(), ~ sum(is.na(.)))) %>%
  pivot_longer(cols = everything(), names_to = "names", values_to = "values") %>%
  arrange(desc(values))
#> # A tibble: 6 x 2
#>   names   values
#>   <chr>    <int>
#> 1 Ozone       37
#> 2 Solar.R      7
#> 3 Wind         0
#> 4 Temp         0
#> 5 Month        0
#> 6 Day          0

Created on 2021-07-21 by the reprex package (v2.0.0)


