How to Have NAs Displayed First Using arrange()

How to have NA's displayed first using arrange()

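You could also arrange on !is.na(wt) first. It is FALSE for missing values, and FALSE sorts before TRUE, so the NA rows end up at the top: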

 m %>%
 arrange(!is.na(wt), wt) #@Spacedman's dataset
 #    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
 #1  18.7   8 360.0 175 3.15    NA 17.02  0  0    3    2
 #2  24.4   4 146.7  62 3.69    NA 20.00  1  0    4    2
 #3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
 #4  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
 #5  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
 #6  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
 #7  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
 #8  19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
 #9  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
 #10 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
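
To make the mechanism easier to see, here is a minimal sketch on a small made-up data frame. The name `toy` and its values are assumptions for illustration only; they are not the dataset used in the answer above.

```r
library(dplyr)

# Hypothetical toy data (not the `m` dataset from the answer).
toy <- data.frame(
  id = c("a", "b", "c", "d"),
  wt = c(3.2, NA, 2.6, NA)
)

# First key: !is.na(wt) is FALSE for missing values, and FALSE sorts
# before TRUE, so rows with NA in `wt` come first.
# Second key: `wt` orders the remaining rows in ascending order.
sorted <- toy %>%
  arrange(!is.na(wt), wt)

sorted
```

The first key only separates missing from non-missing rows; the second key does the actual ordering within the non-missing group.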

How to sort putting NAs first in dplyr?

We can arrange on the logical vector !is.na(val) first, and then on the 'val' column itself:

tbl %>%
   arrange(!is.na(val), val)
# A tibble: 10 × 2
#      id       val
#   <chr>     <dbl>
#1      f        NA
#2      i 0.1346666
#3      c 0.2861395
#4      g 0.5190959
#5      e 0.6417455
#6      j 0.6569923
#7      h 0.7365883
#8      d 0.8304476
#9      a 0.9148060
#10     b 0.9370754
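
The same idea should carry over to a descending sort, since the logical key and the value key are independent. A small sketch, assuming a made-up tibble `tbl_demo` (not the `tbl` from the answer):

```r
library(dplyr)

# Hypothetical data for illustration.
tbl_demo <- tibble(
  id  = c("a", "b", "c", "d"),
  val = c(0.9, NA, 0.3, 0.5)
)

# NA rows first (logical key), then the rest from largest to smallest.
by_desc <- tbl_demo %>%
  arrange(!is.na(val), desc(val))

by_desc$id
```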

How do I use dplyr::arrange to sort NA's first?

This was fixed by installing the latest development version of dplyr (0.8.0 at the time):

devtools::install_github("tidyverse/dplyr")
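
Before reinstalling, it may be worth checking which version is already installed. A minimal sketch, assuming dplyr is installed in the current library; the variable names are illustrative:

```r
# Compare the installed dplyr against 0.8.0, the version named above.
installed <- utils::packageVersion("dplyr")
has_fix <- installed >= "0.8.0"

has_fix
```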

dplyr arrange() function sort by missing values

We can wrap is.na() in desc() so that the missing values come first:

flights %>% 
    arrange(desc(is.na(dep_time)),
           desc(is.na(dep_delay)),
           desc(is.na(arr_time)), 
           desc(is.na(arr_delay)),
           desc(is.na(tailnum)),
           desc(is.na(air_time)))

Those are the only variables that contain NA values, as the following shows:

names(flights)[colSums(is.na(flights)) >0]
#[1] "dep_time"  "dep_delay" "arr_time"  "arr_delay" "tailnum"   "air_time"
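
For readers without the flights data at hand, the two steps above (finding the columns with NAs, then sorting on desc(is.na())) can be sketched on a tiny stand-in. The tibble `trips` and its values are made up for illustration:

```r
library(dplyr)

# Hypothetical stand-in for `flights`, with NAs in two columns.
trips <- tibble(
  id       = 1:4,
  dep_time = c(517L, NA, 533L, NA),
  tailnum  = c("N14228", NA, NA, "N619AA")
)

# Columns that contain at least one NA.
na_cols <- names(trips)[colSums(is.na(trips)) > 0]

# desc(is.na(x)) puts TRUE (missing) first; !is.na(x) does the same
# by putting FALSE (missing) first, so both give the same row order.
with_desc <- trips %>%
  arrange(desc(is.na(dep_time)), desc(is.na(tailnum)))
with_not <- trips %>%
  arrange(!is.na(dep_time), !is.na(tailnum))

with_desc$id
```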

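Instead of passing each variable name one at a time, we can also build the expressions as strings and pass them to arrange_, the standard-evaluation variant of arrange (deprecated in later dplyr releases)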

nm1 <- paste0("desc(is.na(", names(flights)[colSums(is.na(flights)) >0], "))")

r1 <- flights %>%
        arrange_(.dots = nm1) 

r1 %>%
   head()
#   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum
#  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>
#1  2013     1     2       NA           1545        NA       NA           1910        NA      AA    133    <NA>
#2  2013     1     2       NA           1601        NA       NA           1735        NA      UA    623    <NA>
#3  2013     1     3       NA            857        NA       NA           1209        NA      UA    714    <NA>
#4  2013     1     3       NA            645        NA       NA            952        NA      UA    719    <NA>
#5  2013     1     4       NA            845        NA       NA           1015        NA      9E   3405    <NA>
#6  2013     1     4       NA           1830        NA       NA           2044        NA      9E   3716    <NA>
#Variables not shown: origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
#  time_hour <time>.

Update

With newer versions of the tidyverse (dplyr_0.7.3, rlang_0.1.2), we can also make use of arrange_at, arrange_all, or arrange_if

nm1 <- names(flights)[colSums(is.na(flights)) >0]
r2 <- flights %>% 
          arrange_at(vars(nm1), funs(desc(is.na(.))))

Or use arrange_if

f <- rlang::as_function(~ any(is.na(.)))
r3 <- flights %>% 
          arrange_if(f, funs(desc(is.na(.))))

identical(r1, r2)
#[1] TRUE

identical(r1, r3)
#[1] TRUE
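
In dplyr 1.0.0 and later, the scoped verbs (arrange_at, arrange_if) and funs() have been superseded or deprecated in favour of across(). A rough equivalent of the approach above might look like this; it is a sketch on a made-up tibble (`trips_demo` is an assumption, not the flights data) rather than a drop-in replacement:

```r
library(dplyr)

# Hypothetical stand-in for `flights`, with NAs in two columns.
trips_demo <- tibble(
  id       = 1:4,
  dep_time = c(517L, NA, 533L, NA),
  tailnum  = c("N14228", NA, NA, "N619AA")
)

# where(anyNA) selects the columns that contain at least one NA;
# each selected column is replaced by desc(is.na(.x)) as a sort key,
# so rows with missing values move to the top.
sorted_demo <- trips_demo %>%
  arrange(across(where(anyNA), ~ desc(is.na(.x))))

sorted_demo$id
```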

Can't use dplyr::arrange() to sort a column in the form of a date in r

Instead of double-quoting the column name, wrap it in backticks:

library(dplyr)
values %>% 
   dplyr::arrange(`2022-03-01`)

-output

   2022-03-01
J        0.6
E        2.0
A        2.7
B        3.7
C        5.7
I        6.3
H        6.6
F        9.0
D        9.1
G        9.4

If we want to pass the column name as a string, we can either wrap it in across

values %>%
   dplyr::arrange(across("2022-03-01"))
  2022-03-01
J        0.6
E        2.0
A        2.7
B        3.7
C        5.7
I        6.3
H        6.6
F        9.0
D        9.1
G        9.4

Or convert the string to a symbol and unquote it with !!

values %>%
  dplyr::arrange(!! rlang::sym("2022-03-01"))
  2022-03-01
J        0.6
E        2.0
A        2.7
B        3.7
C        5.7
I        6.3
H        6.6
F        9.0
D        9.1
G        9.4

Or with .data

values %>% 
  dplyr::arrange(.data[["2022-03-01"]])
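
To check that these forms agree, here is a small self-contained sketch. The data frame `values_demo` and the variable `col` are made up for illustration; only the column name matches the example above.

```r
library(dplyr)

# Hypothetical data with a date-like column name.
# check.names = FALSE keeps the name from being rewritten by data.frame().
values_demo <- data.frame(
  id = c("A", "J", "E"),
  `2022-03-01` = c(2.7, 0.6, 2.0),
  check.names = FALSE
)

col <- "2022-03-01"  # column name held as a string

by_backtick <- values_demo %>% arrange(`2022-03-01`)
by_sym      <- values_demo %>% arrange(!!rlang::sym(col))
by_data     <- values_demo %>% arrange(.data[[col]])

by_backtick$id
```

The .data[[col]] form is often the most convenient when the column name arrives as a string, for example inside a function.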

How to move NA to the top of the column of an R data.frame?

Perhaps, something like this?

DF$F <- c(rep(NA, sum(is.na(DF$F))), na.omit(DF$F))

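This puts all the NAs first and then appends the non-NA values in their original order. Note that only column F is rearranged; the other columns of DF stay where they are.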
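
A minimal base R sketch of the line above, on a made-up data frame (`DF_demo` is an assumption for illustration). It also shows a variant using order() for the case where whole rows should move together with their NA:

```r
# Hypothetical data for illustration.
DF_demo <- data.frame(id = 1:5, F = c(3, NA, 1, NA, 2))

# Column-only version: NAs first, then the non-NA values in their
# original order. Only the vector is rearranged.
moved <- c(rep(NA, sum(is.na(DF_demo$F))), na.omit(DF_demo$F))

# Row-wise version: order() on !is.na() puts FALSE (missing) first and
# keeps ties in their original order, so entire rows are moved.
rows_moved <- DF_demo[order(!is.na(DF_demo$F)), ]

moved
rows_moved$id
```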

Arranging summarized NAs in descending order after calculating them in each column in tidyverse

I think the issue is caused by the summarise function returning a single wide row with one column per variable, instead of two columns (one holding the variable names and one holding the counts). One potential solution is to use the pivot_longer() tidyverse function to reshape the output into two columns, e.g.

library(tidyverse)

data(airquality)
dat1 <- airquality
dat1 %>%
  summarise(across(everything(), ~ sum(is.na(.)))) %>% 
  pivot_longer(cols = everything(), names_to = "names", values_to = "values") %>% 
  arrange(desc(values))
#> # A tibble: 6 x 2
#>   names   values
#>   <chr>    <int>
#> 1 Ozone       37
#> 2 Solar.R      7
#> 3 Wind         0
#> 4 Temp         0
#> 5 Month        0
#> 6 Day          0

Created on 2021-07-21 by the reprex package (v2.0.0)
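
As a quick cross-check of the counts above, the same numbers can be obtained in base R as a named vector. This is an alternative sketch, not the approach from the answer, and `na_counts` is an illustrative name:

```r
# Count NAs per column of the built-in airquality data, largest first.
na_counts <- sort(colSums(is.na(airquality)), decreasing = TRUE)

na_counts
```

The result is a named numeric vector rather than a tibble, so the pivot_longer() version is still the better fit when the counts feed into further dplyr steps.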
