R: Find Missing Columns, Add to Data Frame If Missing

Here's a straightforward approach:

df <- data.frame(a=1:4, e=4:1)
nms <- c("a", "b", "d", "e")   # Vector of columns you want in this data.frame

Missing <- setdiff(nms, names(df))  # Find names of missing columns
df[Missing] <- 0                    # Add them, filled with '0's
df <- df[nms]                       # Put columns in desired order (columns not in nms are dropped)
df
#   a b d e
# 1 1 0 0 4
# 2 2 0 0 3
# 3 3 0 0 2
# 4 4 0 0 1
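If you need this in several places, the three steps can be wrapped in a small helper. This is a minimal sketch; the name add_missing_cols() and its fill argument are hypothetical and not part of any package.

```r
# Hypothetical helper: make sure `df` has every column named in `nms`,
# filling any absent ones with `fill`, and return the columns in `nms` order.
add_missing_cols <- function(df, nms, fill = 0) {
  missing <- setdiff(nms, names(df))   # names requested but not present
  if (length(missing) > 0) {
    df[missing] <- fill                # new columns, recycled to nrow(df)
  }
  df[nms]                              # reorder; columns not in nms are dropped
}

df <- data.frame(a = 1:4, e = 4:1)
res <- add_missing_cols(df, c("a", "b", "d", "e"))
res
res_na <- add_missing_cols(df, c("a", "b"), fill = NA)   # fill with NA instead of 0
res_na
```

Because the last step indexes with nms, any column of df that is not listed in nms is dropped; index with c(nms, setdiff(names(df), nms)) instead if you want to keep the extra columns at the end.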

Add missing columns from different data.frame filled with 0

We can use setdiff() to find the columns of df1 that are not present in df2 and assign the value 0 to those columns.

df2[setdiff(names(df1), names(df2))] <- 0

#  a c b d
#1 5 6 0 0

If we want to keep the same column order as in df1, we can then do:

df2[names(df1)]
#  a b c d
#1 5 0 6 0
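A self-contained version, assuming df1 is the data frame that holds the full set of columns. The values in df1 are made up for illustration; df2 matches the output above. A common reason for doing this is to stack the two frames, since rbind() fails when they have different columns.

```r
# Hypothetical example data: df1 has all columns, df2 lacks b and d
df1 <- data.frame(a = 1, b = 2, c = 3, d = 4)
df2 <- data.frame(a = 5, c = 6)

# Add the columns of df1 that df2 lacks, filled with 0
df2[setdiff(names(df1), names(df2))] <- 0

# Reorder df2's columns to match df1
df2 <- df2[names(df1)]

# With the same column names, rbind() can now stack the rows
combined <- rbind(df1, df2)
combined
```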

Tidy way to add column if missing from data frame

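Assuming you don't want to overwrite the column if it is already present in your data, you can use add_column() from the tibble package together with an if condition that checks whether the column already exists (add_column() raises an error if you try to add a column whose name is already taken).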

library(dplyr)   # for %>%
library(tibble)  # for add_column()

df1 <- data.frame(a=c(1:3, NA), b=c(NA,2:4))
if(!'c' %in% names(df1)) df1 <- df1 %>% add_column(c = NA)
df1

#   a  b  c
#1  1 NA NA
#2  2  2 NA
#3  3  3 NA
#4 NA  4 NA
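To handle several columns at once, one option is to keep the desired columns and their fill values in a named list and add only the ones that are missing. This is a sketch: add_cols_if_missing() and the defaults list are hypothetical names, and do.call() is used to pass the list entries to tibble::add_column() as named arguments.

```r
library(tibble)

# Hypothetical helper: `defaults` is a named list of column = fill value.
# Only names absent from `df` are added, so existing columns are never
# overwritten (and add_column() never sees a duplicate name).
add_cols_if_missing <- function(df, defaults) {
  missing <- setdiff(names(defaults), names(df))
  if (length(missing) == 0) return(df)
  # do.call() turns each missing default into a named argument of add_column();
  # length-1 values are recycled to the number of rows
  do.call(add_column, c(list(df), defaults[missing]))
}

df1 <- data.frame(a = c(1:3, NA), b = c(NA, 2:4))
res <- add_cols_if_missing(df1, list(b = 0, c = NA, d = "x"))
res
```

Here b already exists, so it is left untouched even though a default of 0 was supplied.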

Checking all columns in data frame for missing values in R

The anyNA() function is built for this. You can apply it to every column of a data frame with sapply(books, anyNA), where books is the data frame from the question. To count the NA values in each column, akrun's suggestion of colSums(is.na(books)) works well.
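A small reproducible example, using a made-up stand-in for books since the original data isn't shown:

```r
# Hypothetical stand-in for the question's `books` data frame
books <- data.frame(
  title = c("A", "B", NA),
  pages = c(100, NA, NA),
  year  = c(2001, 2005, 2010)
)

has_na   <- sapply(books, anyNA)       # one TRUE/FALSE per column
na_count <- colSums(is.na(books))      # number of NAs per column
has_na
na_count
names(books)[has_na]                   # names of the columns that contain NA
```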

Filling missing column data based on other column data in R

You can replace the string 'NaN' with NA using na_if(), then sort (arrange) the data so that, within each DESCRIPTION, the rows whose GROUP and UCR are NA come last, and finally use fill() to copy down the values from the row above.

Example data df:

df <- structure(list(ID = c(0L, 1L, 2L, 3L, 4L, 245865L, 245866L, 245867L, 
245868L, 245869L), OFFENSE = c(3126L, 3831L, 724L, 301L, 619L, 
3115L, 619L, 2629L, 2629L, 3208L), GROUP = c("NaN", "NaN", "NaN", 
"NaN", "NaN", "Aggravated Assault", "Larceny", "Harassment", 
"Harassment", "Property Lost"), DESCRIPTION = c("ASSAULT", "PROPERTY DAMAGE", 
"AUTO THEFT", "ROBBERY", "LARCENY ALL OTHERS", "ASSAULT", "LARCENY ALL OTHERS", 
"HARASSMENT", "HARASSMENT", "PROPERTY - MISSING"), UCR = c("NaN", 
"NaN", "NaN", "NaN", "NaN", "Part One", "Part One", "Part Two", 
"Part Two", NA)), class = "data.frame", row.names = c(NA, 10L
))

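Code:

library(tidyr)
library(dplyr)

df %>%
  # na_if() works on vectors, so apply it column by column
  mutate(across(c(GROUP, UCR), ~ na_if(.x, 'NaN'))) %>%
  # within each DESCRIPTION, rows with a known GROUP/UCR sort first (NA sorts last)
  arrange(DESCRIPTION, GROUP, UCR) %>%
  # fill within each DESCRIPTION only, so values never leak into another description
  group_by(DESCRIPTION) %>%
  fill(GROUP, UCR, .direction = 'down') %>%
  ungroup()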

Note that fill only targets NA, hence the initial replacement of 'NaN' with NA.

Read table with missing values and columns in R

It worked with fill = TRUE:

data <- read.table("data.txt", sep="", fill = TRUE, header=FALSE)

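Now I get 5 columns and 4 rows, with the missing fields filled with NA (blank fields count as missing in logical, integer, numeric and complex columns; in character columns they stay as empty strings).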
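A self-contained sketch with made-up ragged data passed through the text argument, since the question's data.txt isn't shown:

```r
# Hypothetical ragged input: rows have 5, 3, 4 and 2 fields
txt <- "1 2 3 4 5
6 7 8
9 10 11 12
13 14"

# fill = TRUE pads the short rows with blank fields, which become NA here
data <- read.table(text = txt, sep = "", fill = TRUE, header = FALSE)
data
```

read.table() decides the number of columns from the first five lines of input, so if a longer row appears later, pass col.names explicitly to keep such rows from wrapping.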

Fill missing columns with NA while extracting from a data.frame

set.seed(42)
# The values printed below come from R < 3.6.0; sample() draws different
# numbers in later versions unless RNGkind(sample.kind = "Rounding") is set
DF <- setNames(as.data.frame(matrix(sample(1:15, 15, replace = TRUE), ncol = 3)),
               c('f', 'u', 'z'))

DF
#   f  u  z
#1 14  8  7
#2 15 12 11
#3  5  3 15
#4 13 10  4
#5 10 11  7

res <- do.call(data.frame, lapply(split(letters[4:26], letters[4:26]), function(x) {
  x1 <- match(x, colnames(DF))       # position of column x in DF, NA if absent
  if (!is.na(x1)) DF[, x1] else NA   # existing column, or a single NA (recycled)
}))

res
#   d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
#1 NA NA 14 NA NA NA NA NA NA NA NA NA NA NA NA NA NA  8 NA NA NA NA  7
#2 NA NA 15 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 12 NA NA NA NA 11
#3 NA NA  5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA  3 NA NA NA NA 15
#4 NA NA 13 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 10 NA NA NA NA  4
#5 NA NA 10 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 11 NA NA NA NA  7

Using dplyr

library(dplyr)

# do() is superseded as of dplyr 1.0.0 but still works
DF %>%
  do({
    # add one NA column for each wanted name that DF lacks
    x1 <- data.frame(., setNames(as.list(rep(NA, sum(!letters[4:26] %in% names(DF)))),
                                 setdiff(letters[4:26], names(DF))))
    # sort columns alphabetically (d to z, because the names are single letters)
    x1[, order(colnames(x1))]
  })
#   d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
#1 NA NA 14 NA NA NA NA NA NA NA NA NA NA NA NA NA NA  8 NA NA NA NA  7
#2 NA NA 15 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 12 NA NA NA NA 11
#3 NA NA  5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA  3 NA NA NA NA 15
#4 NA NA 13 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 10 NA NA NA NA  4
#5 NA NA 10 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 11 NA NA NA NA  7
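The setdiff() pattern from the first answer also covers this case without do() or lapply(). A sketch in base R, with DF rebuilt from the printed values above so the result doesn't depend on the random number generator:

```r
# DF rebuilt from the values printed above
DF <- data.frame(f = c(14, 15, 5, 13, 10),
                 u = c(8, 12, 3, 10, 11),
                 z = c(7, 11, 15, 4, 7))

wanted <- letters[4:26]                  # columns we want, in this order
res <- DF
res[setdiff(wanted, names(res))] <- NA   # add every absent column as NA
res <- res[wanted]                       # reorder (and drop anything not wanted)
res
```

Unlike the do() version, the column order comes from wanted itself rather than from sorting names, so it also works when the desired order is not alphabetical.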