Remove rows with all or some NAs (missing values) in data.frame
Also check complete.cases:
> final[complete.cases(final), ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
6 ENSG00000221312 0 1 2 3 2
na.omit is nicer for just removing all NAs. complete.cases allows partial selection by including only certain columns of the data frame:
> final[complete.cases(final[ , 5:6]),]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2
Your solution can't work. If you insist on using is.na, then you have to do something like:
> final[rowSums(is.na(final[ , 5:6])) == 0, ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2
but using complete.cases is considerably clearer, and faster.
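For comparison, the same partial-column filtering can be sketched in pandas, which appears later on this page. The frame below is hypothetical (column names are borrowed from the R example); dropna with a subset plays the role of complete.cases on selected columns.

```python
import pandas as pd
import numpy as np

# Hypothetical frame: gene has no NAs, the numeric columns may.
final = pd.DataFrame({
    "gene": ["g1", "g2", "g3"],
    "hsap": [np.nan, 0.0, 0.0],
    "rnor": [1.0, 2.0, np.nan],
    "cfam": [2.0, 2.0, 2.0],
})

# Rows with no NA in any column (like complete.cases(final)):
all_complete = final.dropna()

# Rows complete only in the chosen columns
# (like final[complete.cases(final[, 5:6]), ]):
subset_complete = final.dropna(subset=["rnor", "cfam"])
print(subset_complete)
```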
How to remove a row if it has an NA value in a certain column
The easiest solution is to use is.na():
df[!is.na(df$B), ]
which gives you:
A B C
1 NA 2 NA
2 1 2 3
4 1 2 3
Remove rows with a complete set of NAs
We can use dplyr. With the example by @lovalery:
library(dplyr)
df %>% filter(!if_all(V2:V3, is.na))
#> V1 V2 V3
#> 1 3 3 NA
#> 2 NA 1 NA
#> 3 3 5 NA
We can use many different selection statements inside if_all. Check the documentation for more examples.
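The same "drop only rows where all of a chosen set of columns are NA" idea can be sketched in pandas (the frame below is hypothetical, loosely mirroring the V1:V3 example above); dropna with how="all" on a column subset is the built-in form.

```python
import pandas as pd
import numpy as np

# Hypothetical frame: V3 is entirely NA, V2 is NA only in the last row.
df = pd.DataFrame({
    "V1": [3.0, np.nan, 3.0, np.nan],
    "V2": [3.0, 1.0, 5.0, np.nan],
    "V3": [np.nan, np.nan, np.nan, np.nan],
})

# Keep rows where NOT all of V2:V3 are NA
# (analogue of filter(!if_all(V2:V3, is.na))):
kept = df[~df[["V2", "V3"]].isna().all(axis=1)]

# Equivalent built-in form:
kept2 = df.dropna(subset=["V2", "V3"], how="all")
print(kept)
```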
Deleting rows corresponding to a particular string, but keeping NA values
We can create a condition with is.na, combined via |, to also return those elements that are NA:
subset(df, Col1 != 'string' | is.na(Col1))
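A pandas sketch of the same filter, using a hypothetical one-column frame: the explicit isna() term mirrors the is.na(Col1) part of the R condition.

```python
import pandas as pd
import numpy as np

# Hypothetical frame: drop rows equal to 'string' in Col1 but keep NAs
# (analogue of subset(df, Col1 != 'string' | is.na(Col1))).
df = pd.DataFrame({"Col1": ["string", "other", np.nan]})

kept = df[(df["Col1"] != "string") | df["Col1"].isna()]
print(kept)
```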
Remove rows which have NA values
I have never used datatable, but pandas.DataFrame has isna() to select rows with NA and drop() to remove rows (or it can use del for this), and I found similar functions for datatable.
- Official documentation: Delete Rows/Columns
- Other place (found with Google): Filtering out Empty Rows
datatable can use del to remove selected rows. It can also use .isna() or == None to select rows with NA. The problem is that it can filter on only one column at a time, so it may need a for-loop to check different columns.
columns = dt.f[:]
for n in range(1, 5):
    rows = (dt.f[f'B_lag_{n}'] == None)
    del df[rows, columns]
print(df)
This removes values from the datatable but not rows, and it creates empty rows like this:
| A B B_lag_1 B_lag_2 B_lag_3 B_lag_4
| int64 float64 float64 float64 float64 float64
-- + ----- --------- --------- --------- --------- ---------
0 | NA NA NA NA NA NA
1 | NA NA NA NA NA NA
2 | NA NA NA NA NA NA
3 | NA NA NA NA NA NA
4 | 0 -0.234153 1.52303 0.647689 -0.138264 0.496714
5 | NA NA NA NA NA NA
6 | NA NA NA NA NA NA
7 | NA NA NA NA NA NA
8 | NA NA NA NA NA NA
9 | 1 0.54256 -0.469474 0.767435 1.57921 -0.234137
[10 rows x 6 columns]
It may be better to keep the rows which don't have None:
columns = dt.f[:]
for n in range(1, 5):
    rows = (dt.f[f'B_lag_{n}'] != None)
    df = df[rows, columns]
print(df)
Result:
| A B B_lag_1 B_lag_2 B_lag_3 B_lag_4
| int64 float64 float64 float64 float64 float64
-- + ----- --------- --------- -------- --------- ---------
0 | 0 -0.234153 1.52303 0.647689 -0.138264 0.496714
1 | 1 0.54256 -0.469474 0.767435 1.57921 -0.234137
[2 rows x 6 columns]
But you can use & (as operator AND) and | (as operator OR) to do the same without a for-loop.
columns = dt.f[:]
rows = (dt.f['B_lag_1'] != None) & (dt.f['B_lag_2'] != None) & (dt.f['B_lag_3'] != None) & (dt.f['B_lag_4'] != None)
df = df[rows, columns]
print(df)
Later I found that datatable has dt.rowall() and dt.rowany() to work with many columns, so the code can be simpler.
- Other place (found with Google): Filtering across Multiple Columns
rowall() works like operator AND, rowany() works like operator OR.
columns = dt.f[:]
rows = dt.rowall(dt.f['B_lag_1', 'B_lag_2', 'B_lag_3', 'B_lag_4'] != None)
#rows = dt.rowall(dt.f['B_lag_1':'B_lag_4'] != None) # range of columns
#rows = dt.rowall(dt.f[:] != None) # all columns
df = df[rows, columns]
print(df)
Full working code (the frame-building code comes from my previous answer, Create many lagged variables):
import datatable as dt
import numpy as np

def test1(df):
    print('\n--- test 1 ---\n')
    df = df.copy()
    #columns = dt.f['A', 'B', 'B_lag_1', 'B_lag_2', 'B_lag_3', 'B_lag_4']
    #columns = df.keys()
    columns = dt.f[:]
    for n in range(1, 5):
        rows = (dt.f[f'B_lag_{n}'] == None)
        del df[rows, columns]
    print(df)

def test2(df):
    print('\n--- test 2 ---\n')
    df = df.copy()
    #columns = dt.f['A', 'B', 'B_lag_1', 'B_lag_2', 'B_lag_3', 'B_lag_4']
    #columns = df.keys()
    columns = dt.f[:]
    for n in range(1, 5):
        rows = (dt.f[f'B_lag_{n}'] != None)
        df = df[rows, columns]
    print(df)

def test3(df):
    print('\n--- test 3 ---\n')
    df = df.copy()
    rows = (dt.f['B_lag_1'] != None) & (dt.f['B_lag_2'] != None) & (dt.f['B_lag_3'] != None) & (dt.f['B_lag_4'] != None)
    columns = dt.f[:]
    df = df[rows, columns]
    print(df)

def test4(df):
    print('\n--- test 4 ---\n')
    df = df.copy()
    columns = dt.f[:]
    #rows = dt.rowall(dt.f['B_lag_1', 'B_lag_2', 'B_lag_3', 'B_lag_4'] != None) # use selected columns
    #rows = dt.rowall(dt.f['B_lag_1':'B_lag_4'] != None) # use columns in some range
    #rows = dt.rowall(dt.f[float] != None) # use columns which have float values
    rows = dt.rowall(dt.f[:] != None) # use all columns
    df = df[rows, columns]
    print(df)

# --- main ---

np.random.seed(42)

df = dt.Frame({
    "A": np.repeat(np.arange(0, 2), 5),
    "B": np.random.normal(0, 1, 10)
})

for n in range(1, 5):
    df[f'B_lag_{n}'] = df[:, dt.shift(dt.f.B, n), dt.by('A')]['B']

# --- tests ---

test1(df)
test2(df)
test3(df)
test4(df)
How to remove rows with an NA value?
dat <- data.frame(x1 = c(1,2,3, NA, 5), x2 = c(100, NA, 300, 400, 500))
na.omit(dat)
x1 x2
1 1 100
3 3 300
5 5 500
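The pandas counterpart of na.omit is dropna(), sketched here on the same data as the R example:

```python
import pandas as pd
import numpy as np

# Same data as the R example; dropna() removes any row containing an NA,
# like na.omit(dat).
dat = pd.DataFrame({
    "x1": [1, 2, 3, np.nan, 5],
    "x2": [100, np.nan, 300, 400, 500],
})
clean = dat.dropna()
print(clean)
```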
remove rows from dataframe based on value, ignoring NAs
We need to select the rows where d is not greater than 0 OR where d is NA:
df[with(df, !d > 0 | is.na(d)), ]
# a b c d
#1 1 2 3 0
#2 4 NA 1 NA
#4 NA 5 0 0
Or we can also use subset
subset(df, !d > 0 | is.na(d))
or dplyr filter
library(dplyr)
df %>% filter(!d > 0 | is.na(d))
The !d > 0 part can also be reversed: since d here holds whole numbers, subset(df, d < 1 | is.na(d)) gives the same result.
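A pandas sketch of the same filter, on a hypothetical frame with only the columns that matter here. One nuance: in pandas, NaN > 0 evaluates to False, so negating the comparison already keeps the NA rows; the explicit isna() term just makes the intent match the R condition.

```python
import pandas as pd
import numpy as np

# Hypothetical frame: keep rows where d is not greater than 0,
# treating NA as "keep" (analogue of !d > 0 | is.na(d)).
df = pd.DataFrame({
    "a": [1.0, 4.0, 2.0, np.nan],
    "d": [0.0, np.nan, 3.0, 0.0],
})

kept = df[~(df["d"] > 0) | df["d"].isna()]
print(kept)
```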
Remove rows where all columns except one have NA values?
We may use if_all inside filter: select the columns a to b in if_all and apply is.na (check for NA). The output will be TRUE for a row only if both a and b are NA; negate with ! to convert TRUE to FALSE and FALSE to TRUE.
library(dplyr)
df %>%
  filter(!if_all(a:b, is.na))
Output:
ID a b
1 1 ab <NA>
2 1 <NA> ab
Or instead of negating (!), we may use complete.cases with if_any
df %>%
  filter(if_any(a:b, complete.cases))
ID a b
1 1 ab <NA>
2 1 <NA> ab
Regarding the issue in the OP's code: the logic checks whether there is at least one NA (> 0), which is true for every row here. Instead, it should check that all values are NA and then negate:
na_rows <- df %>%
  select(-"ID") %>%
  is.na() %>%
  {rowSums(.) == ncol(.)}
data
df <- structure(list(ID = c(1L, 1L, 1L), a = c("ab", NA, NA), b = c(NA,
"ab", NA)), class = "data.frame", row.names = c(NA, -3L))
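A pandas sketch of the same rule, built on the same three-row data: drop rows where every column except ID is NA, using dropna restricted to the non-ID columns with how="all".

```python
import pandas as pd
import numpy as np

# Same data as the R example: ID never NA, a and b sometimes NA.
df = pd.DataFrame({
    "ID": [1, 1, 1],
    "a": ["ab", np.nan, np.nan],
    "b": [np.nan, "ab", np.nan],
})

# Drop rows where all of a and b are NA
# (analogue of filter(!if_all(a:b, is.na))):
kept = df.dropna(subset=["a", "b"], how="all")
print(kept)
```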
How to drop rows of Pandas DataFrame whose value in a certain column is NaN
Don't drop, just take the rows where EPS is not NA:
df = df[df['EPS'].notna()]
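The built-in dropna restricted to a column subset is equivalent; the frame below is hypothetical (the original question's data is not shown here).

```python
import pandas as pd
import numpy as np

# Hypothetical frame with an EPS column containing an NA.
df = pd.DataFrame({"STK_ID": ["a", "b", "c"], "EPS": [1.0, np.nan, 2.0]})

# Boolean-mask form from the answer above:
via_notna = df[df["EPS"].notna()]

# Equivalent built-in form:
via_dropna = df.dropna(subset=["EPS"])
print(via_dropna)
```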