How to Extract Certain Columns from a List of Data Frames

Extracting specific columns from a pandas.DataFrame


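import pandas as pd

input_file = "C:\\....\\consumer_complaints.csv"
# read_csv already returns a DataFrame, so no pd.DataFrame(...) wrapper is needed
df = pd.read_csv(input_file)
cols = [1, 2, 3, 4]          # zero-based column positions
df = df[df.columns[cols]]    # equivalently: df.iloc[:, cols]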

List the positions of the columns you want to select in cols. Column positions in a DataFrame start at 0, so cols = [1, 2, 3, 4] selects the second through fifth columns.

You can also select columns by name:

df = df[["Column Name","Column Name2"]]
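Both selections can also be written with the positional indexer iloc and with a list of labels, and a list comprehension applies either one to every frame in a list. This is a minimal sketch on a small hypothetical frame (the column names a to f are made up; the real CSV is not shown):

```python
import pandas as pd

# Hypothetical data standing in for consumer_complaints.csv
df = pd.DataFrame({c: list(range(3)) for c in "abcdef"})

cols = [1, 2, 3, 4]                 # zero-based positions: 2nd to 5th columns
by_position = df.iloc[:, cols]      # same result as df[df.columns[cols]]
by_name = df[["b", "c"]]            # select by label instead of position

# Apply the same positional selection to every frame in a list
frames = [df, df.copy()]
subsets = [f.iloc[:, cols] for f in frames]

print(list(by_position.columns))
print(list(by_name.columns))
```

iloc works purely on positions, while df[[...]] and loc work on labels, so the choice depends on whether you know the column names or only their order.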

Select columns from data frames in a list using a column names vectors list

With purrr:

purrr::map2(all_trials, trials_to_select, dplyr::select)
$data1
  trial1 trial3 trial4
1      1     11     16
2      2     12     17
3      3     13     18
4      4     14     19
5      5     15     20

$data2
  trial2 trial5
1      1      9
2      2     10
3      3     11
4      4     12
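For pandas users, a rough equivalent of map2() with select() pairs each frame with its own list of column names. The frames below are hypothetical stand-ins for all_trials and trials_to_select (the original data is not shown), and the sketch pairs frames and column lists by name through dict keys rather than by position as map2() does:

```python
import pandas as pd

# Hypothetical stand-ins for the R objects all_trials and trials_to_select
all_trials = {
    "data1": pd.DataFrame({"trial1": [1, 2], "trial2": [3, 4],
                           "trial3": [5, 6], "trial4": [7, 8]}),
    "data2": pd.DataFrame({"trial2": [1, 2], "trial5": [9, 10],
                           "trial6": [0, 0]}),
}
trials_to_select = {"data1": ["trial1", "trial3", "trial4"],
                    "data2": ["trial2", "trial5"]}

# Select each frame's own columns, similar to purrr::map2(.x, .y, dplyr::select)
selected = {name: df[trials_to_select[name]] for name, df in all_trials.items()}

for name, df in selected.items():
    print(name, list(df.columns))
```

Pairing by key avoids silently mismatching frames and column lists if the two collections are ever ordered differently.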

Extract certain columns from a data frame in R


base R

newdf <- df[, unique(c("x", names(which(
  sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z))
)))), drop = FALSE]
newdf
#    x s1 s3
# 1 x1  1  1
# 2 x2  2  1
# 3 x3  1  2
# 4 x4  2  2
# 5 x5  3  1

newdf[-1] <- lapply(newdf[-1], function(z) +(z == 1))
newdf
#    x s1 s3
# 1 x1  1  1
# 2 x2  0  1
# 3 x3  1  0
# 4 x4  0  0
# 5 x5  0  1

Walk-through:

  • first, determine which columns are numeric and contain the value 1 or 3:

    sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z))
    #     x    s1    s2    s3    s4
    # FALSE  TRUE FALSE  TRUE FALSE

    This excludes every non-numeric column, so a character column that contains a literal "1" or "3" is not retained. That is an assumption on my part; if you also want to match the string versions, remove the is.numeric(z) condition.

  • second, extract the names of the TRUE entries and prepend "x":

    c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))
    # [1] "x"  "s1" "s3"

  • wrap that in unique(.) in case "x" is itself numeric and contains 1 or 3, which would otherwise select it twice (this step is purely defensive; you may not need it)

  • select those columns, adding drop = FALSE so that if only one column is matched, the result is still a data.frame rather than a vector

  • replace every selected column except the first (which is "x") with 0 or 1: z == 1 returns a logical vector, and the unary + in +(...) converts FALSE to 0 and TRUE to 1.

dplyr

library(dplyr)
df %>%
  select(x, where(~ is.numeric(.) & any(c(1, 3) %in% .))) %>%
  mutate(across(-x, ~ +(. == 1)))
#    x s1 s3
# 1 x1  1  1
# 2 x2  0  1
# 3 x3  1  0
# 4 x4  0  0
# 5 x5  0  1
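For comparison, here is a pandas sketch of the same three steps as the base R and dplyr versions above. The data frame is partly hypothetical: x, s1 and s3 are taken from the first printed output, while s2 and s4 are invented because the original df is not shown. pd.api.types.is_numeric_dtype plays the role of is.numeric, and dict.fromkeys plays the role of unique():

```python
import pandas as pd

# x, s1, s3 mirror the printed output; s2 and s4 are invented for illustration
df = pd.DataFrame({
    "x": ["x1", "x2", "x3", "x4", "x5"],
    "s1": [1, 2, 1, 2, 3],
    "s2": [2, 2, 4, 5, 6],             # numeric, but contains neither 1 nor 3
    "s3": [1, 1, 2, 2, 1],
    "s4": ["1", "3", "a", "b", "c"],   # character column: excluded by the numeric check
})

# Step 1: numeric columns that contain 1 or 3 anywhere
keep = [c for c in df.columns
        if pd.api.types.is_numeric_dtype(df[c]) and df[c].isin([1, 3]).any()]

# Step 2: prepend "x", dropping duplicates like unique() does
newdf = df[list(dict.fromkeys(["x"] + keep))].copy()

# Step 3: recode every column except the first ("x") to 1 where it equals 1, else 0
rest = list(newdf.columns[1:])
newdf[rest] = (newdf[rest] == 1).astype(int)

print(newdf)
```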

R List of Dataframes - How to select certain columns in every entry?

This works fine for me.

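for (i in seq_along(data_list)) {
  # even-position columns (V2, V4, ...) hold the values to mask
  x2 <- data_list[[i]][, c(FALSE, TRUE)]
  # set a value to NA wherever the column right after it (V3, V5, ...) is <= 0
  x2[data_list[[i]][, c(TRUE, FALSE)][, -1] <= 0] <- NA
  data_list[[i]][, c(FALSE, TRUE)] <- x2
}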

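You can also use lapply. Note that lapply returns a new list rather than modifying data_list in place, so assign the result back:

data_list <- lapply(data_list, function(x) {
  x2 <- x[, c(FALSE, TRUE)]                  # value columns V2, V4, ...
  x2[x[, c(TRUE, FALSE)][, -1] <= 0] <- NA   # NA where the next column is <= 0
  x[, c(FALSE, TRUE)] <- x2
  x
})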

Result

> data_list
[[1]]
        V1 V2 V3 V4 V5
1 20000608 NA -1 10  1
2 20000609 NA  0 NA -1
3 20000610 15  1 NA  0
4 20000611 NA -1 40  1
5 20000612 NA  0 NA -1
6 20000613 30  1 NA  0

[[2]]
        V1   V2 V3   V4 V5
1 20030608   NA  0   NA -1
2 20100609   NA -1 14.4  1
3 20060610 34.4  1   NA  0
4 20040611   NA  0   NA -1
5 20009612   NA -1 48.6  1
6 20002613 80.0  1   NA  0

[[3]]
        V1    V2 V3   V4 V5
1 20030602    NA -1  9.0  1
2 20100606    NA  0   NA  0
3 20060610  57.4  1   NA -1
4 20040511    NA -1 31.8  1
5 20007612    NA  0   NA  0
6 20002624 133.0  1   NA -1
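The same masking rule can be sketched in pandas and applied to every frame in a list with a list comprehension. The helper name mask_by_next_column and the small frame are hypothetical; Series.mask stands in for R's logical-index assignment and sets values to NaN where the condition is True:

```python
import pandas as pd

# Hypothetical frame shaped like the V1..V5 tables above, before masking
df = pd.DataFrame({
    "V1": [20000608, 20000609, 20000610],
    "V2": [5.0, 7.0, 15.0],
    "V3": [-1, 0, 1],
    "V4": [10.0, 12.0, 9.0],
    "V5": [1, -1, 0],
})

def mask_by_next_column(frame):
    """Set each value column (V2, V4, ...) to NaN wherever the column
    immediately to its right (V3, V5, ...) is <= 0. Returns a new frame."""
    out = frame.copy()
    values = out.columns[1::2]   # positions 1, 3, ... -> V2, V4
    flags = out.columns[2::2]    # positions 2, 4, ... -> V3, V5
    for v, f in zip(values, flags):
        out[v] = out[v].mask(out[f] <= 0)
    return out

data_list = [df, df.copy()]
data_list = [mask_by_next_column(d) for d in data_list]
print(data_list[0])
```

Like lapply, the comprehension builds a new list and leaves the original frames untouched.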

Extracting specific columns from a data frame

Using the dplyr package, if your data.frame is called df1:

library(dplyr)

df1 %>%
select(A, B, E)

This can also be written without the %>% pipe as:

select(df1, A, B, E)

How can I extract from a list of pandas dataframes, a specific column?

First, I think you should pass index_col=0 to pd.read_csv.
As for accessing column 3: its label may be the integer 3 rather than the string "3", in which case df[3] or df.loc[:, 3] should work; if the label is a string, use df["3"].
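A small sketch of both cases, using made-up CSV text since the asker's files are not shown. With a header row, pandas keeps the labels as strings, so column 3 is df["3"]; with header=None, pandas numbers the columns as integers, so df[3] or df.loc[:, 3] works. A list comprehension then pulls that column from every frame in the list:

```python
import io

import pandas as pd

# Hypothetical CSV with a header row whose labels look like numbers
csv_text = "id,1,2,3\na,10,20,30\nb,11,21,31\n"
with_header = pd.read_csv(io.StringIO(csv_text), index_col=0)  # labels "1", "2", "3"

# Without a header, pandas labels the columns 0, 1, 2, ... as integers
no_header = pd.read_csv(io.StringIO("10,20,30,40\n11,21,31,41\n"), header=None)

# Extract column "3" from every frame in a list (one Series per frame)
dfs = [with_header, with_header.copy()]
col3_from_each = [df["3"] for df in dfs]

print([s.tolist() for s in col3_from_each])
print(no_header[3].tolist())
```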

R: Select columns from a list of dataframes while some columns do not exist in few dataframes

You can use the intersect function:

> intersect(c("a", "b", "c"), c("a", "b"))
[1] "a" "b"

That is, modify your function like this:

> lapply(ls, function(x) subset(x, select = intersect(keep, colnames(x))))
[[1]]
   b cc
1  5 10
2  6 11
3  7 12
4  8 13
5  9 14
6 10 15

[[2]]
   b
1  5
2  6
3  7
4  8
5  9
6 10
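In pandas, DataFrame.filter(items=...) behaves much like subset(x, select = intersect(keep, colnames(x))): it keeps the listed labels that exist, in the order given, and skips the rest without raising an error. A sketch with two hypothetical frames, one of which lacks the "cc" column:

```python
import pandas as pd

keep = ["b", "cc"]
frames = [
    pd.DataFrame({"a": [1, 2], "b": [5, 6], "cc": [10, 11]}),
    pd.DataFrame({"a": [1, 2], "b": [5, 6]}),   # has no "cc" column
]

# Keep only the requested columns that each frame actually has
subsets = [df.filter(items=keep) for df in frames]

for s in subsets:
    print(list(s.columns))
```

Plain df[keep] would instead raise a KeyError on the second frame, which is the same problem the intersect() call avoids in R.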

Extract the last column from a list of data frames

Instead of using lengths, you can get the number of columns of each data frame with ncol. This should work:

lapply(list_with_df,function(x) x[,ncol(x)])

Edit

To clarify: you got the same column number for every data frame because lengths(list_with_df)[1] always takes the first element of the lengths vector, which is the length (number of columns) of the first data.frame.
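The same idea in pandas, as a sketch with two hypothetical frames of different widths: computing the width of each frame separately (df.shape[1], the analogue of ncol(x)) picks a different last column for each one, and iloc also accepts -1 for the last position:

```python
import pandas as pd

list_with_df = [
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}),
]

# Ask each frame for its own width instead of reusing the first frame's
last_cols = [df.iloc[:, df.shape[1] - 1] for df in list_with_df]

# Shorter equivalent: negative positions count from the end
last_cols_neg = [df.iloc[:, -1] for df in list_with_df]

print([s.name for s in last_cols])
```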


