R: Find missing columns, add to data frame if missing
Here's a straightforward approach
df <- data.frame(a=1:4, e=4:1)
nms <- c("a", "b", "d", "e") # Vector of columns you want in this data.frame
Missing <- setdiff(nms, names(df)) # Find names of missing columns
df[Missing] <- 0 # Add them, filled with '0's
df <- df[nms] # Put columns in desired order
# a b d e
# 1 1 0 0 4
# 2 2 0 0 3
# 3 3 0 0 2
# 4 4 0 0 1
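If you need the same step for several data frames, you can wrap it in a small function. The sketch below does that. The name `add_missing_cols` and its `fill` argument are hypothetical and do not appear in the original answer. Like the last line above, `df[cols]` keeps only the listed columns, so any column not in `cols` is dropped.

```r
# Hypothetical helper (not from the original answer): add every name in `cols`
# that `df` lacks, filled with `fill`, then reorder the columns to match `cols`.
# Note: df[cols] drops any column of `df` that is not listed in `cols`.
add_missing_cols <- function(df, cols, fill = 0) {
  missing <- setdiff(cols, names(df))     # names wanted but not present
  if (length(missing) > 0) df[missing] <- fill
  df[cols]
}

df <- data.frame(a = 1:4, e = 4:1)
out <- add_missing_cols(df, c("a", "b", "d", "e"))
out
```

To fill with NA instead of 0, pass `fill = NA`.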
Add missing columns from different data.frame filled with 0
We can use setdiff to find the columns that are present in df1 but not in df2, and assign the value 0 to those columns.
df2[setdiff(names(df1), names(df2))] <- 0
# a c b d
#1 5 6 0 0
If we want to keep the same column order as in df1, we can then subset by those names and assign the result back:
df2 <- df2[names(df1)]
df2
# a b c d
#1 5 0 6 0
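The answer does not show df1 or df2. The run-through below uses hypothetical one-row frames so the steps can be executed end to end. Only the column names of df1 matter here, so its values are arbitrary.

```r
# Hypothetical inputs; the original answer does not define df1 or df2
df1 <- data.frame(a = 1, b = 2, c = 3, d = 4)
df2 <- data.frame(a = 5, c = 6)

# Add the columns df2 is missing (b and d), filled with 0
df2[setdiff(names(df1), names(df2))] <- 0
# Reorder to match df1
df2 <- df2[names(df1)]
df2
```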
Tidy way to add column if missing from data frame
Assuming you don't want to overwrite the column if it is already present in your data, you can use add_column (from the tibble package) along with an if condition that checks whether the column is already present.
library(dplyr)   # provides %>%
library(tibble)  # provides add_column()
df1 <- data.frame(a=c(1:3, NA), b=c(NA,2:4))
if(!'c' %in% names(df1)) df1 <- df1 %>% add_column(c = NA)
df1
# a b c
#1 1 NA NA
#2 2 2 NA
#3 3 3 NA
#4 NA 4 NA
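For comparison, here is a base-R sketch of the same guard that does not load any packages. Looping over a vector of names extends it to several columns. The extra name "d" is purely illustrative.

```r
df1 <- data.frame(a = c(1:3, NA), b = c(NA, 2:4))

# Add each column only when it is absent, so existing data is never overwritten:
# "b" already exists and is left as-is; "c" and "d" are added as NA
for (nm in c("b", "c", "d")) {
  if (!nm %in% names(df1)) df1[[nm]] <- NA
}
df1
```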
Checking all columns in data frame for missing values in R
The anyNA function is built for this. You can apply it to all columns of a data frame with sapply(books, anyNA). To count NA values, akrun's suggestion of colSums(is.na(books)) is good.
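The following small run shows both calls side by side. The `books` data frame here is a hypothetical stand-in for the asker's data.

```r
# Hypothetical stand-in for the asker's `books` data
books <- data.frame(
  title = c("A", "B", NA),
  year  = c(2001, NA, NA),
  pages = c(100, 200, 300)
)

sapply(books, anyNA)              # title TRUE, year TRUE, pages FALSE
colSums(is.na(books))             # title 1, year 2, pages 0
names(which(sapply(books, anyNA)))  # "title" "year"
```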
Filling missing column data based on other column data in R
You can replace the string 'NaN' with NA using na_if(), then sort (arrange) the data by the desired columns so that NA values in GROUP and UCR come last, and finally fill each NA with the value from the row above.
Example data df:
df <- structure(list(ID = c(0L, 1L, 2L, 3L, 4L, 245865L, 245866L, 245867L,
245868L, 245869L), OFFENSE = c(3126L, 3831L, 724L, 301L, 619L,
3115L, 619L, 2629L, 2629L, 3208L), GROUP = c("NaN", "NaN", "NaN",
"NaN", "NaN", "Aggravated Assault", "Larceny", "Harassment",
"Harassment", "Property Lost"), DESCRIPTION = c("ASSAULT", "PROPERTY DAMAGE",
"AUTO THEFT", "ROBBERY", "LARCENY ALL OTHERS", "ASSAULT", "LARCENY ALL OTHERS",
"HARASSMENT", "HARASSMENT", "PROPERTY - MISSING"), UCR = c("NaN",
"NaN", "NaN", "NaN", "NaN", "Part One", "Part One", "Part Two",
"Part Two", NA)), class = "data.frame", row.names = c(NA, 10L
))
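code:
library(tidyr)
library(dplyr)
df %>%
  mutate(across(c(GROUP, UCR), ~ na_if(.x, 'NaN'))) %>%  # na_if() expects a vector, not a whole data frame
  arrange(DESCRIPTION, GROUP, UCR) %>%
  group_by(DESCRIPTION) %>%
  fill(GROUP, UCR, .direction = 'down') %>%
  ungroup()
Note that fill only targets NA, hence the initial replacement of 'NaN' with NA. Grouping by DESCRIPTION prevents fill from carrying a value from one description into the next, so a description with no known GROUP or UCR stays NA.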
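The tiny example below shows why the 'NaN' replacement is needed. The one-column frame `x` is hypothetical. fill() leaves the string "NaN" in place and even copies it into the NA below it. Once "NaN" has been converted with na_if(), fill() treats it as missing.

```r
library(dplyr)
library(tidyr)

x <- data.frame(g = c("A", "NaN", NA, "B", NA), stringsAsFactors = FALSE)

# fill() only replaces real NA values, so "NaN" survives and propagates
before <- fill(x, g, .direction = "down")
before$g  # "A" "NaN" "NaN" "B" "B"

# Convert the string to NA first, then fill
after <- x %>%
  mutate(g = na_if(g, "NaN")) %>%
  fill(g, .direction = "down")
after$g   # "A" "A" "A" "B" "B"
```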
Read table with missing values and columns in R
It worked with:
data <- read.table("data.txt", sep="", fill = TRUE, header=FALSE)
This gives 5 columns and 4 rows, with the empty values filled with NA.
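Here is a self-contained version of the same call. It uses `text =` with hypothetical ragged rows in place of data.txt. With fill = TRUE, the short rows are padded with NA.

```r
# Hypothetical contents standing in for data.txt; rows have unequal lengths
txt <- "1 2 3 4 5
6 7 8
9 10 11 12 13
14 15"

data <- read.table(text = txt, sep = "", fill = TRUE, header = FALSE)
dim(data)   # 4 rows, 5 columns
data
```

read.table works out the number of columns from the first five lines of input. If a longer row appears later in the file, it can still be wrapped. Passing col.names with the full set of names fixes the width explicitly.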
Fill missing columns with NA while extracting from a data.frame
set.seed(42)
DF <- setNames(as.data.frame(matrix(sample(1:15, 15, replace=TRUE), ncol=3)), c('f', 'u', 'z') )
DF
# f u z
#1 14 8 7
#2 15 12 11
#3 5 3 15
#4 13 10 4
#5 10 11 7
res <- do.call(`data.frame`,lapply(split(letters[4:26], letters[4:26]),
function(x){x1 <- match(x, colnames(DF)); if(!is.na(x1)) DF[,x1] else NA}))
res
# d e f g h i j k l m n o p q r s t u v w x y z
#1 NA NA 14 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 8 NA NA NA NA 7
#2 NA NA 15 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 12 NA NA NA NA 11
#3 NA NA 5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3 NA NA NA NA 15
#4 NA NA 13 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 10 NA NA NA NA 4
#5 NA NA 10 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 11 NA NA NA NA 7
Using dplyr
library(dplyr)
DF %>%
do({x1 <-data.frame(., setNames(as.list(rep(NA, sum(!letters[4:26] %in% names(DF)))),
setdiff(letters[4:26], names(DF))))
x1[,order(colnames(x1))] })
# d e f g h i j k l m n o p q r s t u v w x y z
#1 NA NA 14 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 8 NA NA NA NA 7
#2 NA NA 15 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 12 NA NA NA NA 11
#3 NA NA 5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3 NA NA NA NA 15
#4 NA NA 13 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 10 NA NA NA NA 4
#5 NA NA 10 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 11 NA NA NA NA 7
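The setdiff approach from the first answer also works here, without building the frame column by column. Here is a base-R sketch. It rebuilds DF as above so the block runs on its own. The names `wanted` and `res2` are illustrative.

```r
set.seed(42)
DF <- setNames(as.data.frame(matrix(sample(1:15, 15, replace = TRUE), ncol = 3)),
               c("f", "u", "z"))

wanted <- letters[4:26]                     # every column the result should have
res2 <- DF
res2[setdiff(wanted, names(res2))] <- NA    # add the absent ones as NA
res2 <- res2[wanted]                        # put columns in the requested order
res2
```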