How to Extract a Single Column from a Data.Frame as a Data.Frame

Extracting specific columns from pandas.dataframe

import pandas as pd

input_file = "C:\\....\\consumer_complaints.csv"
df = pd.read_csv(input_file)  # read_csv already returns a DataFrame
cols = [1, 2, 3, 4]           # column positions, counted from 0
df = df[df.columns[cols]]     # same selection as df.iloc[:, cols]

Put the positions of the columns you want to select in the cols list. Column positions in a DataFrame start at 0, so cols = [1, 2, 3, 4] selects the second through fifth columns.

You can also select columns by name:

df = df[["Column Name","Column Name2"]]
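As a minimal sketch, both selections can be checked on a small in-memory DataFrame instead of the CSV file; the column names and values below are made up for illustration and are not taken from consumer_complaints.csv.

```python
import pandas as pd

# Hypothetical stand-in for consumer_complaints.csv (column names are made up)
df = pd.DataFrame({
    "id": [1, 2, 3],
    "product": ["Mortgage", "Credit card", "Student loan"],
    "state": ["CA", "NY", "TX"],
    "company": ["Bank A", "Bank B", "Bank C"],
    "response": ["Closed", "Closed", "In progress"],
})

cols = [1, 2, 3, 4]  # column positions, counted from 0

# Selection by position: both expressions pick the same four columns
by_position = df[df.columns[cols]]
by_iloc = df.iloc[:, cols]

# Selection by name
by_name = df[["product", "state"]]

print(list(by_position.columns))    # ['product', 'state', 'company', 'response']
print(by_position.equals(by_iloc))  # True
print(list(by_name.columns))        # ['product', 'state']
```

df.iloc[:, cols] is the more direct way to write the positional selection; neither form modifies the original df.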

How do I extract a single column from a data.frame as a data.frame?

Use drop=FALSE

> x <- df[,1, drop=FALSE]
> x
A
1 10
2 20
3 30

From the documentation (see ?"[") you can find:

If drop=TRUE the result is coerced to the lowest possible dimension.

Extract single column from Pandas DataFrame in two ways, difference?

Square brackets are important

df['floor_temperature'] represents a series. pd.Series objects are one-dimensional. The argument feeding pd.DataFrame.__getitem__, for which [] is syntactic sugar, is a scalar.

df[['floor_temperature']] represents a dataframe. pd.DataFrame objects are two-dimensional; passing a list to [], even a one-element list, returns a DataFrame.

What you are seeing is the difference between a single isolated series and a dataframe with a single series.
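A short sketch of the difference, using a hypothetical DataFrame; only the column name floor_temperature comes from the question, the values are made up.

```python
import pandas as pd

# Hypothetical data; only the column name comes from the question
df = pd.DataFrame({"floor_temperature": [20.5, 21.0, 19.8],
                   "room": ["a", "b", "c"]})

series = df["floor_temperature"]   # scalar key -> one-dimensional Series
frame = df[["floor_temperature"]]  # list key -> two-dimensional DataFrame with one column

# A Series can also be turned into a one-column DataFrame afterwards
converted = series.to_frame()

print(type(series).__name__, series.shape)  # Series (3,)
print(type(frame).__name__, frame.shape)    # DataFrame (3, 1)
print(converted.equals(frame))              # True
```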

Extract row from a data frame and make it a new data frame and change its index as like column value

Use rename to turn the row's values into its index:

s = df.iloc[0].rename(df.iloc[0])
print (s)
0 0
2 2
5 5
3 3
Name: 0, dtype: int32

If you need a one-column DataFrame:

df1=df.iloc[0].rename(df.iloc[0]).to_frame('col')
print (df1)
col
0 0
2 2
5 5
3 3

Or:

s = pd.Series(df.iloc[0].to_numpy(), index=df.iloc[0])
# for older pandas versions without to_numpy (it was added in 0.24)
#s = pd.Series(df.iloc[0].values, index=df.iloc[0])
print (s)
0 0
2 2
5 5
3 3
dtype: int32
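The original DataFrame is not shown, so the sketch below assumes a hypothetical one whose first row holds 0, 2, 5, 3 under made-up column names a to d. It checks that the rename approach and the Series-constructor approach produce the same new index.

```python
import pandas as pd

# Hypothetical data: the first row holds the values to use as the new index
df = pd.DataFrame([[0, 2, 5, 3], [1, 1, 1, 1]], columns=["a", "b", "c", "d"])

row = df.iloc[0]  # Series indexed by the column names a, b, c, d

# rename with a dict-like argument (here a Series) maps each old index label
# to a new one: column name -> value of that column in row 0
s1 = row.rename(row)

# Alternative: build a new Series and pass the row values as the index
s2 = pd.Series(row.to_numpy(), index=row)

# One-column DataFrame version
df1 = row.rename(row).to_frame("col")

print(list(s1.index))  # [0, 2, 5, 3]
print(list(s2.index))  # [0, 2, 5, 3]
print(df1.shape)       # (4, 1)
```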

R - How to extract an element from a single column data frame?

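You can extract the first row and keep the result a data frame (for a single-column data frame, that row holds just the first element) like this:

x <- df[1, , drop = FALSE]

To get the element itself as a plain value, use df[1, 1] instead.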

Extracting specific columns from a data frame

Using the dplyr package, if your data.frame is called df1:

library(dplyr)

df1 %>%
select(A, B, E)

This can also be written without the %>% pipe as:

select(df1, A, B, E)

Extract row from a data frame and make it a new data frame

Use DataFrame.iloc to select the row, then Series.to_frame to turn it into a one-column DataFrame:

df1 = df.iloc[0].to_frame('row')

For Series:

s = df.iloc[0]
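Note that to_frame turns the row into a column: the original column names become the index. If you want to keep the row layout, selecting with a list, df.iloc[[0]], returns a one-row DataFrame. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"A": [1, 3], "B": [2, 4]})

s = df.iloc[0]                          # Series: index A, B; values 1, 2
as_column = df.iloc[0].to_frame("row")  # 2 x 1 DataFrame: the row becomes a column
as_row = df.iloc[[0]]                   # 1 x 2 DataFrame: keeps the row layout

print(as_column.shape)  # (2, 1)
print(as_row.shape)     # (1, 2)
```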

Subset one column from data frame keeping the subset as a data frame

We need drop = FALSE

X[, paste0("filter",c(selected_filters)), drop = FALSE]
# filter2
#1 1
#2 1
#3 0

If we look at ?Extract, the Usage shows

x[i, j, ... , drop = TRUE]

and in the description, it says

drop - For matrices and arrays. If TRUE the result is coerced to the lowest possible dimension (see the examples). This only works for extracting elements, not for the replacement. See drop for further details.


Note that subset() behaves differently: its drop argument defaults to FALSE, so selecting a single column still returns a data frame:

subset(X, select = paste0("filter",c(selected_filters)))

Python pandas: Keep selected column as DataFrame instead of Series

As @Jeff mentions there are a few ways to do this, but I recommend using loc/iloc to be more explicit (and raise errors early if you're trying something ambiguous):

In [10]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [11]: df
Out[11]:
A B
0 1 2
1 3 4

In [12]: df[['A']]

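In [13]: df[[0]]  # older pandas only; current pandas raises KeyError because no column is labelled 0

In [14]: df.loc[:, ['A']]

In [15]: df.iloc[:, [0]]

Out[12-15]: # they all return the same thing (In [13] only in older pandas):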
A
0 1
1 3

The latter two choices remove ambiguity in the case of integer column names (precisely why loc/iloc were created). For example:

In [16]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])

In [17]: df
Out[17]:
A 0
0 1 2
1 3 4

In [18]: df[[0]] # ambiguous
Out[18]: # older pandas fell back to position; current pandas returns the column labelled 0 (values 2, 4)
A
0 1
1 3
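To see the ambiguity with a recent pandas version, the sketch below reuses the DataFrame from In [16]. Plain [] treats 0 as a column label, while loc and iloc state explicitly whether you mean a label or a position.

```python
import pandas as pd

# Same data as In [16]: one string label and one integer label
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", 0])

by_label = df.loc[:, [0]]      # column whose label is 0 -> values 2, 4
by_position = df.iloc[:, [0]]  # first column (label "A") -> values 1, 3
plain = df[[0]]                # recent pandas: label-based, same as loc

print(by_label[0].tolist())       # [2, 4]
print(by_position["A"].tolist())  # [1, 3]
print(plain.equals(by_label))     # True
```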

Extract a pattern from column and make a new one in R data frame

You can extract the word after "gene_id" with sub():

dat$y <- sub('.*gene_id\\s"(\\w+)";.*', '\\1', dat$y)
dat

# x y
#1 1 ENSG00000224818
#2 2 ENSG00000261067
#3 3 ENSG00000261067
