Extracting specific columns from pandas.dataframe
import pandas as pd
input_file = "C:\\....\\consumer_complaints.csv"
df = pd.read_csv(input_file)  # read_csv already returns a DataFrame
cols = [1, 2, 3, 4]  # zero-based positions of the columns to keep
df = df[df.columns[cols]]
Specify the positions of the columns you want to select here. DataFrame column positions start at index 0.
cols = []
You can also select columns by name. Just use the following line:
df = df[["Column Name","Column Name2"]]
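As a minimal sketch of the two approaches above, the following uses a small made-up DataFrame (the column names and values are hypothetical, not taken from consumer_complaints.csv). It selects the same columns once by position with iloc and once by name, and the two results should match.

```python
import pandas as pd

# Hypothetical stand-in data; the real CSV's columns are not known here.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "product": ["loan", "card", "mortgage"],
    "state": ["NY", "CA", "TX"],
    "complaints": [10, 20, 30],
})

# Select by zero-based position: columns 1 and 2.
by_position = df.iloc[:, [1, 2]]

# Select the same columns by name.
by_name = df[["product", "state"]]
```

iloc is used here instead of df[df.columns[cols]] only because it states the positional intent directly; both forms return the same columns when the column names are unique.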
Select columns from data frames in a list using a column names vectors list
With purrr:
purrr::map2(all_trials, trials_to_select, dplyr::select)
$data1
trial1 trial3 trial4
1 1 11 16
2 2 12 17
3 3 13 18
4 4 14 19
5 5 15 20
$data2
trial2 trial5
1 1 9
2 2 10
3 3 11
4 4 12
Extract certain columns from data frame R
base R
newdf <- df[, unique(c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))), drop = FALSE]
newdf
# x s1 s3
# 1 x1 1 1
# 2 x2 2 1
# 3 x3 1 2
# 4 x4 2 2
# 5 x5 3 1
newdf[-1] <- lapply(newdf[-1], function(z) +(z == 1))
newdf
# x s1 s3
# 1 x1 1 1
# 2 x2 0 1
# 3 x3 1 0
# 4 x4 0 0
# 5 x5 0 1
Walk-through:
First, we determine which columns are numeric and contain the numbers 1 or 3:
sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z))
# x s1 s2 s3 s4
# FALSE TRUE FALSE TRUE FALSE
This will exclude any column that is not numeric, meaning that a character column that contains a literal "1" or "3" will not be retained. This is complete inference on my end; if you want to accept the string versions, then remove the is.numeric(z) component.
Second, we extract the names of those that are true, and prepend "x":
c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))
# [1] "x" "s1" "s3"
Wrap that in unique(.) in case, for some reason, "x" is also numeric and contains 1 or 3 (this step is purely defensive; you may not strictly need it).
Select those columns, defensively adding drop=FALSE so that if only one column is matched, it still returns a full data.frame.
Replace just those columns (excluding the first column, which is "x") with 0 or 1; z == 1 returns logical, and the wrapping +(..) converts logical to 0 (false) or 1 (true).
dplyr
library(dplyr)
df %>%
select(x, where(~ is.numeric(.) & any(c(1, 3) %in% .))) %>%
mutate(across(-x, ~ +(. == 1)))
# x s1 s3
# 1 x1 1 1
# 2 x2 0 1
# 3 x3 1 0
# 4 x4 0 0
# 5 x5 0 1
R List of Dataframes - How to select certain columns in every entry?
This works fine for me.
for(i in 1:length(data_list)){
x2=data_list[[i]][,c(FALSE,TRUE)]
x2[data_list[[i]][,c(TRUE,FALSE)][,-1]<=0]<-NA
data_list[[i]][,c(FALSE,TRUE)]<-x2}
You can also just use lapply.
lapply(data_list,function(x){
x2=x[,c(FALSE,TRUE)]
x2[x[,c(TRUE,FALSE)][,-1]<=0]<-NA
x[,c(FALSE,TRUE)]<-x2
x})
Result
>data_list
[[1]]
V1 V2 V3 V4 V5
1 20000608 NA -1 10 1
2 20000609 NA 0 NA -1
3 20000610 15 1 NA 0
4 20000611 NA -1 40 1
5 20000612 NA 0 NA -1
6 20000613 30 1 NA 0
[[2]]
V1 V2 V3 V4 V5
1 20030608 NA 0 NA -1
2 20100609 NA -1 14.4 1
3 20060610 34.4 1 NA 0
4 20040611 NA 0 NA -1
5 20009612 NA -1 48.6 1
6 20002613 80.0 1 NA 0
[[3]]
V1 V2 V3 V4 V5
1 20030602 NA -1 9.0 1
2 20100606 NA 0 NA 0
3 20060610 57.4 1 NA -1
4 20040511 NA -1 31.8 1
5 20007612 NA 0 NA 0
6 20002624 133.0 1 NA -1
Extracting specific columns from a data frame
Using the dplyr package, if your data.frame is called df1:
library(dplyr)
df1 %>%
select(A, B, E)
This can also be written without the %>% pipe as:
select(df1, A, B, E)
How can I extract from a list of pandas dataframes, a specific column?
As a first remark, I think you should pass index_col=0 to pd.read_csv.
Regarding accessing column 3: the column label may be a number, in which case the following should work: df[3] or df.loc[:,3]
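To make this concrete for a list of DataFrames, here is a hedged sketch. It assumes the files have no header row, so pandas assigns integer column labels; the CSV text below is invented for illustration and read from memory rather than from disk. The names csv_texts, frames, third and third_loc are hypothetical.

```python
import io
import pandas as pd

# Hypothetical headerless CSV contents standing in for the real files.
csv_texts = ["a,1,2,3\nb,4,5,6\n", "c,7,8,9\nd,10,11,12\n"]

# header=None gives integer column labels; index_col=0 moves the first
# column into the index, leaving columns labelled 1, 2, 3.
frames = [
    pd.read_csv(io.StringIO(text), header=None, index_col=0)
    for text in csv_texts
]

# Pull the column labelled 3 out of every DataFrame in the list.
third = [frame[3] for frame in frames]

# Equivalent label-based form using .loc.
third_loc = [frame.loc[:, 3] for frame in frames]
```

If the files do have a header row, the labels are strings instead, and the lookup would need the column name (or iloc for a position) rather than the integer 3.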
R: Select columns from a list of dataframes while some columns do not exist in few dataframes
You can use the intersect function:
> intersect(c("a", "b", "c"), c("a", "b"))
[1] "a" "b"
I.e. modify your function like this:
> lapply(ls, function(x) subset(x, select = intersect(keep, colnames(x))))
[[1]]
b cc
1 5 10
2 6 11
3 7 12
4 8 13
5 9 14
6 10 15
[[2]]
b
1 5
2 6
3 7
4 8
5 9
6 10
Extract the last column from a list of data frames
Instead of using lengths, you can ask for the number of columns with ncol. This should work:
lapply(list_with_df,function(x) x[,ncol(x)])
Edit
Just for some clarification: the reason you got the same column number for each data frame is that you always selected the column number from the first element of the lengths vector by using lengths(list_with_df)[1]. That was always the length of the first data.frame.