Extracting specific columns from pandas.dataframe
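import pandas as pd
input_file = "C:\\....\\consumer_complaints.csv"
df = pd.read_csv(input_file)  # read_csv already returns a DataFrame, no extra wrapping needed
cols = [1, 2, 3, 4]           # zero-based positions of the columns to keep
df = df[df.columns[cols]]     # turn positions into labels, then select by label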
Put the positions of the columns you want to select in cols. Column positions in a DataFrame start at 0.
You can also select columns by name:
df = df[["Column Name","Column Name2"]]
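A self-contained sketch of the same idea, using a small made-up DataFrame in place of the CSV (the column names a to e are hypothetical). It also shows DataFrame.iloc, which selects by position directly instead of going through df.columns:

```python
import pandas as pd

# Small hypothetical frame standing in for consumer_complaints.csv
df = pd.DataFrame({
    "a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8], "e": [9, 10],
})

cols = [1, 2, 3, 4]                  # zero-based column positions
by_position = df[df.columns[cols]]   # positions -> labels -> selection
by_iloc = df.iloc[:, cols]           # purely positional selection
by_name = df[["b", "c", "d", "e"]]   # selection by column label

print(list(by_position.columns))     # ['b', 'c', 'd', 'e']
```

All three produce the same DataFrame here; iloc avoids the intermediate label lookup when you only know positions.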
How do I extract a single column from a data.frame as a data.frame?
Use drop=FALSE
> x <- df[,1, drop=FALSE]
> x
A
1 10
2 20
3 30
From the documentation (see ?"[") you can find:
If drop=TRUE the result is coerced to the lowest possible dimension.
Extract single column from Pandas DataFrame in two ways, difference?
Square brackets are important
df['floor_temperature'] returns a pd.Series. Series objects are one-dimensional; here the argument passed to pd.DataFrame.__getitem__ (for which [] is syntactic sugar) is a scalar.
df[['floor_temperature']] returns a pd.DataFrame. DataFrame objects are two-dimensional; here the argument is a list.
What you are seeing is the difference between a single isolated series and a dataframe with a single series.
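A quick way to see the difference is to inspect the type and shape of each result. This is a sketch with made-up values; only the column name floor_temperature comes from the question:

```python
import pandas as pd

# Hypothetical data; only the column name comes from the question
df = pd.DataFrame({"floor_temperature": [20.5, 21.0, 19.8], "other": [1, 2, 3]})

as_series = df["floor_temperature"]    # scalar key -> Series (1-D)
as_frame = df[["floor_temperature"]]   # list key -> DataFrame (2-D)

print(type(as_series).__name__, as_series.ndim, as_series.shape)  # Series 1 (3,)
print(type(as_frame).__name__, as_frame.ndim, as_frame.shape)     # DataFrame 2 (3, 1)
```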
Extract a row from a data frame, make it a new data frame, and use its values as the index
Use rename to convert the index:
s = df.iloc[0].rename(df.iloc[0])
print (s)
0 0
2 2
5 5
3 3
Name: 0, dtype: int32
If you need a one-column DataFrame:
df1=df.iloc[0].rename(df.iloc[0]).to_frame('col')
print (df1)
col
0 0
2 2
5 5
3 3
Or:
s = pd.Series(df.iloc[0].to_numpy(), index=df.iloc[0])
#for older pandas versions
#s = pd.Series(df.iloc[0].values, index=df.iloc[0])
print (s)
0 0
2 2
5 5
3 3
dtype: int32
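Both approaches can be checked on a small frame. This sketch assumes a hypothetical DataFrame whose first row is 0, 2, 5, 3, matching the output above. When rename receives a Series it treats it as a mapping from the old index labels (the column names) to new labels (that row's values):

```python
import pandas as pd

# Hypothetical frame whose first row is 0, 2, 5, 3, as in the output above
df = pd.DataFrame([[0, 2, 5, 3], [1, 1, 1, 1]])

row = df.iloc[0]
s1 = row.rename(row)                       # dict-like rename: column label -> row value
s2 = pd.Series(row.to_numpy(), index=row)  # row values become the index directly
df1 = s1.to_frame("col")                   # one-column DataFrame

print(df1)
```

The rename approach relies on the column labels being unique, since each one is looked up in the mapping.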
R - How to extract an element from a single column data frame?
To take only the first row of a single-column data frame and keep it a data frame, add drop = FALSE (without it, df[1, ] returns the bare element):
x <- df[1,, drop = FALSE]
Extracting specific columns from a data frame
Using the dplyr package, if your data.frame is called df1:
library(dplyr)
df1 %>%
select(A, B, E)
This can also be written without the %>% pipe as:
select(df1, A, B, E)
Extract row from a data frame and make it a new data frame
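Use DataFrame.iloc together with to_frame, which returns the row as a one-column DataFrame indexed by the original column names: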
df1 = df.iloc[0].to_frame('row')
For a Series:
s = df.iloc[0]
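to_frame turns the row into a column. If you want the new DataFrame to keep the row orientation and the original column names, passing a list of positions to iloc (df.iloc[[0]]) is one option. A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})  # hypothetical data

df1 = df.iloc[0].to_frame("row")  # 3x1: index A, B, C; single column "row"
s = df.iloc[0]                    # Series indexed by A, B, C
one_row = df.iloc[[0]]            # list of positions -> 1x3 DataFrame with the original columns

print(df1.shape, s.shape, one_row.shape)  # (3, 1) (3,) (1, 3)
```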
Subset one column from data frame keeping the subset as a data frame
We need drop = FALSE
X[, paste0("filter",c(selected_filters)), drop = FALSE]
# filter2
#1 1
#2 1
#3 0
If we look at ?Extract, the Usage section shows
x[i, j, ... , drop = TRUE]
and the description says:
drop - For matrices and arrays. If TRUE the result is coerced to the lowest possible dimension (see the examples). This only works for extracting elements, not for the replacement. See drop for further details.
Note that subset behaves differently because its default is drop = FALSE:
subset(X, select = paste0("filter",c(selected_filters)))
Python pandas: Keep selected column as DataFrame instead of Series
As @Jeff mentions there are a few ways to do this, but I recommend using loc/iloc to be more explicit (and raise errors early if you're trying something ambiguous):
In [10]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
In [11]: df
Out[11]:
A B
0 1 2
1 3 4
In [12]: df[['A']]
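In [13]: df[[0]]  # older pandas only; current versions raise KeyError because 0 is not a column label
In [14]: df.loc[:, ['A']]
In [15]: df.iloc[:, [0]]
Out[12-15]: # they return the same thing (In [13] only in older pandas):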
A
0 1
1 3
The latter two choices remove ambiguity in the case of integer column names (precisely why loc/iloc were created). For example:
In [16]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])
In [17]: df
Out[17]:
A 0
0 1 2
1 3 4
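In [18]: df[[0]] # ambiguous: older pandas treated 0 as a position and returned column A (below); current pandas returns the column labelled 0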
Out[18]:
A
0 1
1 3
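With loc and iloc the intent is explicit regardless of pandas version: loc always looks up labels, iloc always looks up positions. A sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", 0])

by_label = df.loc[:, [0]]      # label-based: the column named 0
by_position = df.iloc[:, [0]]  # position-based: the first column, "A"

print(by_label)
print(by_position)
```

Both results are DataFrames because the column selector is a list.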
Extract a pattern from column and make a new one in R data frame
You can extract the word after "gene_id" with sub() (this overwrites dat$y; assign the result to a new column instead to keep the original):
dat$y <- sub('.*gene_id\\s"(\\w+)";.*', '\\1', dat$y)
dat
# x y
#1 1 ENSG00000224818
#2 2 ENSG00000261067
#3 3 ENSG00000261067