Count number of non-NA values for every column in a dataframe
You can also call is.na on the entire data frame (implicitly coercing to a logical matrix) and call colSums on the negated result:
# make sample data
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100*5, TRUE), 100))
str(df)
#> 'data.frame': 100 obs. of 5 variables:
#> $ V1: int NA 1 NA NA 1 NA 1 1 1 NA ...
#> $ V2: int NA NA NA 1 NA 1 0 1 0 NA ...
#> $ V3: int 1 1 0 1 1 NA NA 1 NA NA ...
#> $ V4: int NA 0 NA 0 0 NA 1 1 NA NA ...
#> $ V5: int NA NA NA 0 0 0 0 0 NA NA ...
colSums(!is.na(df))
#> V1 V2 V3 V4 V5
#> 69 55 62 60 70
Count non-na values by row and save total to a new variable in pandas
You just need to use count() with axis=1:
df['Total'] = df.count(axis=1)
Yields:
x1 x2 x3 Total
0 Yes Yes NaN 2
1 Yes NaN NaN 1
2 No Yes No 3
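To make the result above reproducible, here is a minimal sketch. The input frame is an assumption reconstructed from the printed output, since the original data is not shown.

```python
import numpy as np
import pandas as pd

# Assumed input, rebuilt from the output shown above (NaN marks missing cells).
df = pd.DataFrame({
    "x1": ["Yes", "Yes", "No"],
    "x2": ["Yes", np.nan, "Yes"],
    "x3": [np.nan, np.nan, "No"],
})

# count(axis=1) counts the non-missing cells in each row.
# It is evaluated before "Total" exists, so the new column is not counted.
df["Total"] = df.count(axis=1)
print(df)
```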
How to count the number of non-NA observations of a dataframe using Dplyr R (like df.count() in Python Pandas)
funs is deprecated and summarise_all is superseded. You can do this with across:
library(dplyr)
mtcars %>% summarise(across(everything(), ~ sum(!is.na(.x))))
Or in base R -
colSums(!is.na(mtcars))
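For comparison, this is roughly what the pandas side of the question looks like. The data below is hypothetical and only there to make the sketch runnable.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not taken from the answers above.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, "x", "y"],
    "c": [1, 2, 3, 4],
})

counts = df.count()      # non-missing values per column
same = df.notna().sum()  # mirrors colSums(!is.na(df)) in base R
print(counts)
```

Both expressions should give the same per-column totals; count() is simply the shorter spelling.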
From an R dataframe: count non-NA values by column, grouped by one of the columns
We can use summarise_all (funs() is deprecated, so the function is passed as a formula instead):
library(dplyr)
litmus %>%
  group_by(grouping) %>%
  summarise_all(~ sum(!is.na(.)))
Count number of rows that are not NA
After grouping by the columns of interest, take the sum of a logical vector as the count. is.na(valor) returns a logical vector that is TRUE where the value is NA and FALSE otherwise. Negating it with ! reverses this, and sum then counts each TRUE (coerced to 1) as one non-NA element.
library(dplyr)
df1 %>%
  group_by(id_station, id_parameter, year, day, month) %>%
  summarise(Count = sum(!is.na(valor)))
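The same grouped count can be sketched in pandas. The rows below are hypothetical, and only a subset of the grouping columns from the answer is used to keep the example short.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column names follow the answer above.
df1 = pd.DataFrame({
    "id_station": ["A", "A", "A", "B", "B"],
    "year": [2020, 2020, 2021, 2020, 2020],
    "valor": [1.5, np.nan, 2.0, np.nan, np.nan],
})

# Series.count() skips NaN, so it plays the role of sum(!is.na(valor)).
counts = (
    df1.groupby(["id_station", "year"])["valor"]
       .count()
       .reset_index(name="Count")
)
print(counts)
```

Groups whose values are all missing still appear, with a count of 0.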
Find out the percentage of missing values in each column in the given dataset
How about this? I think I actually found something similar on here once before, but I'm not seeing it now...
import pandas as pd

percent_missing = df.isnull().sum() * 100 / len(df)
missing_value_df = pd.DataFrame({'column_name': df.columns,
                                 'percent_missing': percent_missing})
And if you want the missing percentages sorted, follow the above with:
missing_value_df.sort_values('percent_missing', inplace=True)
As mentioned in the comments, you may also be able to get by with just the first line in my code above, i.e.:
percent_missing = df.isnull().sum() * 100 / len(df)
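A slightly shorter variant, shown here on hypothetical data: because the mean of a boolean column is the fraction of True values, isna().mean() gives the missing share directly.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", "y", None, "z"],
    "c": [1, 2, 3, 4],
})

# Fraction of missing cells per column, scaled to a percentage.
percent_missing = df.isna().mean() * 100
print(percent_missing.sort_values(ascending=False))
```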
Select set of columns so that each row has at least one non-NA entry
Using a while loop, this should work to get the minimum set of variables with at least one non-NA per row.
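best <- function(df){
  # start with the column that has the most non-NA values
  best <- which.max(colSums(sapply(df, complete.cases)))
  while(any(rowSums(sapply(df[best], complete.cases)) == 0)){
    # rows that are still NA in every selected column
    uncovered <- rowSums(!is.na(df[best])) == 0
    best <- c(best, which.max(sapply(df[uncovered, ], \(x) sum(complete.cases(x)))))
  }
  best
}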
Testing:
best(df)
#d c
#4 3
df[best(df)]
# d c
#1 1 1
#2 1 NA
#3 1 NA
#4 1 NA
#5 NA 1
First, select the column with the fewest NAs (stored in best). Then, extend the vector with the column that has the most non-NA values on the remaining rows (those where the columns in best are still all NA), and repeat until every row has at least one non-NA value.
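The greedy procedure described above can be sketched in pandas as follows. The function name greedy_cover and the sample data are hypothetical, and a greedy pass like this is not guaranteed to find the smallest possible set in every case.

```python
import numpy as np
import pandas as pd

def greedy_cover(df):
    """Greedily pick columns until every row has a non-missing value."""
    present = df.notna()
    chosen = []
    uncovered = pd.Series(True, index=df.index)
    while uncovered.any():
        # How many still-uncovered rows would each column fill?
        gains = present[uncovered].sum()
        best = gains.idxmax()
        if gains[best] == 0:
            raise ValueError("some rows are missing in every column")
        chosen.append(best)
        # Rows filled by the new column are no longer uncovered.
        uncovered &= ~present[best]
    return chosen

# Hypothetical data: column d covers four rows, column c covers the last one.
df = pd.DataFrame({
    "a": [np.nan, 1, np.nan, np.nan, np.nan],
    "b": [np.nan, np.nan, 1, np.nan, np.nan],
    "c": [1, np.nan, np.nan, np.nan, 1],
    "d": [1, 1, 1, 1, np.nan],
})
print(greedy_cover(df))
```

The explicit check for a zero gain is a design choice of this sketch: it stops with an error instead of looping forever when a row is missing in every column.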
Count the number of non empty columns in R
We may use vectorized rowSums on a logical matrix:
data$Total <- rowSums(!is.na(data[1:3]))
data$Total
[1] 3 2 2 1 1