Count Number of Non-NA Values for Every Column in a Dataframe


You can also call is.na on the entire data frame (implicitly coercing it to a logical matrix) and call colSums on the negated result:

# make sample data
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100*5, TRUE), 100))

str(df)
#> 'data.frame': 100 obs. of 5 variables:
#> $ V1: int NA 1 NA NA 1 NA 1 1 1 NA ...
#> $ V2: int NA NA NA 1 NA 1 0 1 0 NA ...
#> $ V3: int 1 1 0 1 1 NA NA 1 NA NA ...
#> $ V4: int NA 0 NA 0 0 NA 1 1 NA NA ...
#> $ V5: int NA NA NA 0 0 0 0 0 NA NA ...

colSums(!is.na(df))
#> V1 V2 V3 V4 V5
#> 69 55 62 60 70
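The same per-column count can be done in pandas; a minimal sketch, using a small illustrative frame rather than the R sample above:

```python
import numpy as np
import pandas as pd

# small frame with some NaN values (illustrative data only)
df = pd.DataFrame({"V1": [1, np.nan, 0], "V2": [np.nan, np.nan, 1]})

# count() ignores NaN, so this is the pandas analogue of colSums(!is.na(df))
non_na_per_column = df.count()
print(non_na_per_column)  # V1 -> 2, V2 -> 1
```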

Count non-NA values by row and save the total to a new variable in pandas

You just need to use count() with axis=1:

df['Total'] = df.count(axis=1)

Yields:

    x1   x2   x3  Total
0  Yes  Yes  NaN      2
1  Yes  NaN  NaN      1
2   No  Yes   No      3
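A self-contained version of this example, with the data reconstructed from the output above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x1": ["Yes", "Yes", "No"],
    "x2": ["Yes", np.nan, "Yes"],
    "x3": [np.nan, np.nan, "No"],
})

# count(axis=1) counts the non-NA cells in each row
df["Total"] = df.count(axis=1)
print(df["Total"].tolist())  # [2, 1, 3]
```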

How to count the number of non-NA observations of a dataframe using Dplyr R (like df.count() in Python Pandas)

summarise_all() is superseded and funs() is deprecated. You can do this with across() -

library(dplyr)
mtcars %>% summarise(across(everything(), ~ sum(!is.na(.x))))

Or in base R -

colSums(!is.na(mtcars))

From an R dataframe: count non-NA values by column, grouped by one of the columns

We can use summarise_all (note that funs() is deprecated; pass a lambda instead)

library(dplyr)
litmus %>%
  group_by(grouping) %>%
  summarise_all(~ sum(!is.na(.)))
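The pandas analogue of a grouped non-NA count; a sketch, where the column names grouping/value are assumptions for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "grouping": ["a", "a", "b", "b"],
    "value":    [1.0, np.nan, 2.0, 3.0],
})

# groupby(...).count() counts non-NA values per column within each group
counts = df.groupby("grouping").count()
print(counts)  # value: a -> 1, b -> 2
```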

Count number of rows that are not NA

After grouping by the columns of interest, take the sum of a logical vector as the count: is.na(valor) returns TRUE where valor is NA and FALSE otherwise, so negating it (!) flags the non-NA elements, and summing the result counts them (each TRUE counts as 1).

library(dplyr)
df1 %>%
  group_by(id_station, id_parameter, year, day, month) %>%
  summarise(Count = sum(!is.na(valor)))
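The sum-of-a-boolean-mask idea translates directly to pandas; a sketch with hypothetical data (id_station and valor mirror the column names above, the values are invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id_station": ["s1", "s1", "s2"],
    "valor":      [4.2, np.nan, 3.1],
})

# summing the boolean notna() mask per group counts the non-NA rows,
# just like sum(!is.na(valor)) inside summarise()
counts = df.groupby("id_station")["valor"].apply(lambda s: s.notna().sum())
print(counts)  # s1 -> 1, s2 -> 1
```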

Find out the percentage of missing values in each column in the given dataset

How about this?

import pandas as pd

percent_missing = df.isnull().sum() * 100 / len(df)
missing_value_df = pd.DataFrame({'column_name': df.columns,
                                 'percent_missing': percent_missing})

And if you want the missing percentages sorted, follow the above with:

missing_value_df.sort_values('percent_missing', inplace=True)

As mentioned in the comments, you may also be able to get by with just the first line in my code above, i.e.:

percent_missing = df.isnull().sum() * 100 / len(df)
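A self-contained run of that one-liner, on a toy frame (the column names a/b and values are invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan, np.nan],
                   "b": [1.0, 2.0, 3.0, np.nan]})

# NA cells per column, scaled to a percentage of the row count
percent_missing = df.isnull().sum() * 100 / len(df)
print(percent_missing)  # a -> 75.0, b -> 25.0
```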

Select set of columns so that each row has at least one non-NA entry

Using a while loop, this should work to get a small set of variables with at least one non-NA per row (greedy selection does not guarantee the true minimum).

best <- function(df){
  # start from the column with the most complete cases
  best <- which.max(colSums(sapply(df, complete.cases)))
  # a row is uncovered while every chosen column is NA there
  while(any(rowSums(sapply(df[best], complete.cases)) == 0)){
    uncovered <- rowSums(sapply(df[best], complete.cases)) == 0
    best <- c(best, which.max(sapply(df[uncovered, ], \(x) sum(complete.cases(x)))))
  }
  best
}

Testing:

best(df)
#d c
#4 3

df[best(df)]
# d c
#1 1 1
#2 1 NA
#3 1 NA
#4 1 NA
#5 NA 1

First, select the column with the fewest NAs (stored in best). Then, repeatedly extend the vector with the column that has the most non-NA values on the remaining rows (where best still has only NAs), until every row has a complete case.
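The same greedy loop can be sketched in pandas; the frame below is invented toy data (columns a/b/c), and as above the greedy choice only yields a small set, not a provably minimal one:

```python
import numpy as np
import pandas as pd

def best_columns(df: pd.DataFrame) -> list:
    """Greedily pick columns until every row has at least one non-NA entry."""
    chosen = [df.notna().sum().idxmax()]  # start with the fullest column
    uncovered = df[chosen].notna().sum(axis=1) == 0
    while uncovered.any():
        # among still-uncovered rows, take the column with the most non-NA values
        chosen.append(df.loc[uncovered].notna().sum().idxmax())
        uncovered = df[chosen].notna().sum(axis=1) == 0
    return chosen

df = pd.DataFrame({
    "a": [np.nan, 1.0, np.nan],
    "b": [1.0, np.nan, np.nan],
    "c": [np.nan, np.nan, 2.0],
})
print(best_columns(df))  # ['a', 'b', 'c'] (ties resolve in column order)
```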

Count the number of non-empty columns in R

We may use vectorized rowSums on a logical matrix

data$Total <- rowSums(!is.na(data[1:3]))
data$Total
[1] 3 2 2 1 1

