How to Delete All Columns in Dataframe Except Certain Ones

Delete all but one column of pandas dataframe?

You can simply write:

df = df[['bob']]

and the other columns are garbage collected once nothing else references the original DataFrame.
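As a small illustration of why the double brackets matter, the sketch below uses a made-up three-column frame (the 'alice' and 'carol' columns are assumptions for the demo, not part of the question). A list of labels returns a DataFrame, while a single label returns a Series.

```python
import pandas as pd

# Hypothetical data; only the 'bob' column name comes from the answer above.
df = pd.DataFrame({"bob": [1, 2, 3], "alice": [4, 5, 6], "carol": [7, 8, 9]})

# Double brackets: a list of labels, so the result is still a DataFrame.
only_bob = df[["bob"]]

# Single brackets: one label, so the result is a Series.
bob_series = df["bob"]

print(type(only_bob).__name__)    # DataFrame
print(type(bob_series).__name__)  # Series
print(list(only_bob.columns))     # ['bob']
```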

Pandas: drop all columns with missing values except one column

This should do what you need:

import pandas as pd
import numpy as np

df = pd.DataFrame({
'Age' : [5,np.nan,12,43],
'Name' : ['Alice','Bob','Charly','Dan'],
'Sex' : ['F','M','M',np.nan]})

df_filt = df.loc[:,(~df.isnull().any()) | (df.columns.isin(['Age']))]

Explanation:

df.isnull().any() checks, for each column, whether any value is None or NaN. The ~ inverts that result, so only the columns that do not meet the criterion are selected.

df.columns.isin(['Age']) checks for all columns if their name is 'Age', so that this column is selected in any case.

Both conditions are connected by an OR (|) so that if either condition applies the column is selected.
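The same idea can be wrapped in a small helper so that several columns can be protected at once. This is only a sketch: the function name drop_na_columns_except and its keep parameter are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

def drop_na_columns_except(frame, keep):
    """Drop columns containing missing values, but always retain those in `keep`."""
    has_na = frame.isna().any()           # boolean Series indexed by column name
    protected = frame.columns.isin(keep)  # boolean array aligned with the columns
    return frame.loc[:, ~has_na | protected]

df = pd.DataFrame({
    "Age": [5, np.nan, 12, 43],
    "Name": ["Alice", "Bob", "Charly", "Dan"],
    "Sex": ["F", "M", "M", np.nan],
})

df_filt = drop_na_columns_except(df, ["Age"])
print(list(df_filt.columns))  # ['Age', 'Name']
```

'Age' survives despite its NaN because it is protected, 'Name' survives because it has no missing values, and 'Sex' is dropped.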

How to select all columns except one in pandas?

When the columns are not a MultiIndex, df.columns is simply an Index of column names, so comparing it with a label gives a boolean mask you can pass to .loc:

df.loc[:, df.columns != 'b']

          a         c         d
0  0.561196  0.013768  0.772827
1  0.882641  0.615396  0.075381
2  0.368824  0.651378  0.397203
3  0.788730  0.568099  0.869127
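For comparison, here is a short sketch of two closely related spellings, using a small made-up frame with columns a to d: dropping by label with drop(columns=...), and excluding several columns at once with isin.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 frame with columns a, b, c, d.
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("abcd"))

# Boolean mask over the column labels, as in the answer above.
by_mask = df.loc[:, df.columns != "b"]

# Equivalent: drop by label; this returns a new DataFrame and leaves df untouched.
by_drop = df.drop(columns="b")

# Excluding several columns at once with isin.
without_b_c = df.loc[:, ~df.columns.isin(["b", "c"])]

print(list(by_mask.columns))      # ['a', 'c', 'd']
print(by_mask.equals(by_drop))    # True
print(list(without_b_c.columns))  # ['a', 'd']
```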

Selecting/excluding sets of columns in pandas

You can either drop the columns you do not need or select the ones you need:

# Drop by position (here the 2nd and 3rd columns), modifying df in place
df.drop(df.columns[[1, 2]], axis=1, inplace=True)

# Drop by name, returning a new DataFrame
df1 = df1.drop(['B', 'C'], axis=1)

# Select the ones you want
df1 = df[['a', 'd']]
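A runnable sketch of both routes on one made-up frame (the data is an assumption; the column names follow the snippets above). It shows that dropping and selecting give the same result here, and that errors="ignore" lets drop skip labels that are absent.

```python
import pandas as pd

# Hypothetical frame mixing the column names used above.
df = pd.DataFrame({"a": [1, 2], "B": [3, 4], "C": [5, 6], "d": [7, 8]})

dropped = df.drop(columns=["B", "C"])  # remove what you do not need
selected = df[["a", "d"]]              # keep what you need

print(dropped.equals(selected))  # True
print(list(df.columns))          # ['a', 'B', 'C', 'd'] (original unchanged)

# errors="ignore" skips labels that are absent instead of raising KeyError.
safe = df.drop(columns=["B", "missing"], errors="ignore")
print(list(safe.columns))        # ['a', 'C', 'd']
```

Dropping is usually shorter when you keep most columns; selecting is clearer when you keep only a few.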

Delete all values except the column name row Pandas

A pandas column cannot be truly empty; the closest you can get is a column of null values.
To overwrite every value in a column with nulls (or any other value), assign to it directly:

df['suggested_long'] = np.nan
df['suggested_short'] = np.nan
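If the goal is instead to keep only the header row, that is, the column names with no data rows at all, one possible reading of the question can be sketched with an empty row slice. The values below are made up; only the column names come from the answer above.

```python
import pandas as pd

# Hypothetical values for the two columns named in the answer.
df = pd.DataFrame({"suggested_long": [1.5, 2.5], "suggested_short": [0.5, 0.7]})

# An empty row slice keeps the column labels and dtypes but no rows.
empty = df.iloc[0:0]

print(list(empty.columns))  # ['suggested_long', 'suggested_short']
print(len(empty))           # 0
```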

How to take all columns except one column in Data Frame Python?

You can combine np.r_ with df.iloc for position-based indexing; here pos_of_col is the 1-based position of the column to leave out:

pos_of_col= 17
df.iloc[:,np.r_[range(pos_of_col-1),range(pos_of_col,len(df.columns))]]

Demo, dropping the column at position 4 (col_3, since Python indexing starts at 0):

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 20, (5, 10))).add_prefix("col_")
print(df, '\n')

pos_of_col = 4
print(df.iloc[:, np.r_[range(pos_of_col - 1), range(pos_of_col, len(df.columns))]])



   col_0  col_1  col_2  col_3  col_4  col_5  col_6  col_7  col_8  col_9
0     12     15      0      3      3      7      9     19     18      4
1      6     12      1      6      7     14     17      5     13      8
2      9     19     16     19      5     15     15      0     18      3
3     17     19     19     19     14      7      0      1      9      0
4     10      3     11     18      2      0      0      4      5      6

   col_0  col_1  col_2  col_4  col_5  col_6  col_7  col_8  col_9
0     12     15      0      3      7      9     19     18      4
1      6     12      1      7     14     17      5     13      8
2      9     19     16      5     15     15      0     18      3
3     17     19     19     14      7      0      1      9      0
4     10      3     11      2      0      0      4      5      6
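Two alternative spellings of the same positional drop, sketched on the demo frame above. The first looks up the label at that position and drops by label; the second deletes the position from an array of all positions. Note that the label-based variant would remove every column sharing that label if the frame had duplicate column names.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 20, (5, 10))).add_prefix("col_")

pos_of_col = 4  # 1-based position, matching the demo above

# Look up the label at that position, then drop by label.
by_label = df.drop(columns=df.columns[pos_of_col - 1])

# Or delete the position from an array of all positions and index with iloc.
keep = np.delete(np.arange(df.shape[1]), pos_of_col - 1)
by_position = df.iloc[:, keep]

print(list(by_label.columns))
print(by_label.equals(by_position))  # True
```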

Remove rows where all columns except one have NA values?

In R with dplyr, we can use if_all inside filter. Select the columns a to b in if_all and apply is.na to check for NA; the result is TRUE for a row only if both a and b are NA. Negating it with ! flips TRUE and FALSE, so the rows where at least one of the two columns is not NA are kept.

library(dplyr)
df %>%
  filter(!if_all(a:b, is.na))

-output

  ID    a    b
1  1   ab <NA>
2  1 <NA>   ab

Or, instead of negating with !, we may use complete.cases with if_any:

df %>%
  filter(if_any(a:b, complete.cases))

-output

  ID    a    b
1  1   ab <NA>
2  1 <NA>   ab

As for the issue in the OP's code: its condition checks whether a row has at least one NA (> 0), which is true for every row here. Instead, it should test whether all of the non-ID columns are NA, and then negate that when filtering:

na_rows <- df %>% 
select(-"ID") %>%
is.na() %>%
{rowSums(.) == ncol(.)}
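For readers working in pandas rather than R, a rough counterpart of the row filter above can be sketched with dropna. The frame below mirrors the R example data; treating dropna(subset=..., how="all") as the equivalent of the dplyr filter is an assumption about intent, not something stated in the original answer.

```python
import numpy as np
import pandas as pd

# Same shape as the R example: ID plus two partly missing columns.
df = pd.DataFrame({
    "ID": [1, 1, 1],
    "a": ["ab", np.nan, np.nan],
    "b": [np.nan, "ab", np.nan],
})

# how="all" drops a row only when every column in `subset` is missing.
kept = df.dropna(subset=["a", "b"], how="all")

# The same mask written out explicitly: all NA across a and b, then negated.
all_na = df[["a", "b"]].isna().all(axis=1)
kept_explicit = df[~all_na]

print(len(kept))                   # 2
print(kept.equals(kept_explicit))  # True
```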

data

df <- structure(list(ID = c(1L, 1L, 1L), a = c("ab", NA, NA), b = c(NA, 
"ab", NA)), class = "data.frame", row.names = c(NA, -3L))

