Convert Type of Multiple Columns of a Dataframe At Once

pandas: to_numeric for multiple columns

UPDATE: you don't need to convert your values afterwards; you can do it on the fly when reading your CSV:

In [165]: df = pd.read_csv(url, index_col=0, na_values=['(NA)']).fillna(0)

In [166]: df.dtypes
Out[166]:
GeoName                     object
ComponentName               object
IndustryId                   int64
IndustryClassification      object
Description                 object
2004                         int64
2005                         int64
2006                         int64
2007                         int64
2008                         int64
2009                         int64
2010                         int64
2011                         int64
2012                         int64
2013                         int64
2014                       float64
dtype: object
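
As a minimal, self-contained illustration of the same idea (hypothetical inline data rather than the original url), the '(NA)' strings are treated as missing values during the read, so the year columns come out numeric right away:

import io
import pandas as pd

csv_data = "GeoName,2004,2005\nAlabama,100,(NA)\nAlaska,200,300\n"
df = pd.read_csv(io.StringIO(csv_data), na_values=['(NA)']).fillna(0)
print(df.dtypes)   # GeoName: object, 2004: int64, 2005: float64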

If you need to convert multiple columns to numeric dtypes, use the following technique:

Sample source DF:

In [271]: df
Out[271]:
     id    a  b  c  d  e    f
0  id_3  AAA  6  3  5  8    1
1  id_9    3  7  5  7  3  BBB
2  id_7    4  2  3  5  4    2
3  id_0    7  3  5  7  9    4
4  id_0    2  4  6  4  0    2

In [272]: df.dtypes
Out[272]:
id    object
a     object
b      int64
c      int64
d      int64
e      int64
f     object
dtype: object

Converting selected columns to numeric dtypes:

In [273]: cols = df.columns.drop('id')

In [274]: df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

In [275]: df
Out[275]:
     id    a  b  c  d  e    f
0  id_3  NaN  6  3  5  8  1.0
1  id_9  3.0  7  5  7  3  NaN
2  id_7  4.0  2  3  5  4  2.0
3  id_0  7.0  3  5  7  9  4.0
4  id_0  2.0  4  6  4  0  2.0

In [276]: df.dtypes
Out[276]:
id     object
a     float64
b       int64
c       int64
d       int64
e       int64
f     float64
dtype: object
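
If you would rather keep integer dtypes despite the NaN values that errors='coerce' introduces, one option (a sketch assuming the same df as above and a reasonably recent pandas) is the nullable Int64 dtype:

cols = df.columns.drop('id')
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce').astype('Int64')
df.dtypes   # a through f become Int64 (missing values show as <NA>); 'id' stays object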

PS: if you want to select all string (object) columns, use the following simple trick:

cols = df.columns[df.dtypes.eq('object')]
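
Combining the two, here is a minimal self-contained sketch (hypothetical data) that coerces every object column except 'id' to numeric:

import pandas as pd

df = pd.DataFrame({'id': ['id_1', 'id_2'], 'a': ['1', 'x'], 'b': [3, 4]})
cols = df.columns[df.dtypes.eq('object')].drop('id')   # object columns except 'id'
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
print(df.dtypes)   # id: object, a: float64, b: int64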

Convert multiple columns to string in pandas dataframe

To convert multiple columns to string, pass a list of the columns to astype:

df[['one', 'two', 'three']] = df[['one', 'two', 'three']].astype(str)
# add as many column names as you like.

That means one way to convert all columns is to construct the list of all column names like this:

all_columns = list(df) # Creates list of all column headers
df[all_columns] = df[all_columns].astype(str)

Note that the latter can also be done directly with df.astype(str).
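
For instance, a minimal sketch with hypothetical column names:

import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3.5, 4.5], 'three': [True, False]})
df = df.astype(str)     # convert every column in one call
print(df.dtypes)        # all columns are now object (Python str)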

change multiple columns in pandas dataframe to datetime

You can use apply to run pd.to_datetime on each column:

data.iloc[:, 7:12] = data.iloc[:, 7:12].apply(pd.to_datetime, errors='coerce')

As part of the changes in pandas 1.3.0, iloc/loc will no longer update the column dtype on assignment. Use column labels directly instead:

cols = data.columns[7:12]
data[cols] = data[cols].apply(pd.to_datetime, errors='coerce')
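
A minimal self-contained sketch of the label-based version (hypothetical column names; invalid strings become NaT):

import pandas as pd

data = pd.DataFrame({'start': ['2021-01-01', 'not a date'],
                     'end':   ['2021-02-01', '2021-03-15']})
cols = ['start', 'end']            # or data.columns[7:12] as above
data[cols] = data[cols].apply(pd.to_datetime, errors='coerce')
print(data.dtypes)                 # both columns become datetime64[ns]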

How to convert multiple columns from string to integer in pandas dataframe?

Try with replace():

df_all['1981'] = df_all['1981'].replace(',', '', regex=True)

Now try with the astype() method:

df_all['1981'] = df_all['1981'].astype('int64')

If you want to convert multiple columns at once:

df[df.columns[2:]] = df[df.columns[2:]].replace(',', '', regex=True).astype('int64')
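
A self-contained sketch of the same pattern (hypothetical data with thousands separators in the year columns):

import pandas as pd

df = pd.DataFrame({'country': ['A', 'B'],
                   'code':    [1, 2],
                   '1981':    ['1,234', '56,789'],
                   '1982':    ['2,000', '3,500']})
cols = df.columns[2:]              # the year columns
df[cols] = df[cols].replace(',', '', regex=True).astype('int64')
print(df.dtypes)                   # 1981 and 1982 are now int64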

changing data types of multiple columns at once in python/pandas

Another way would be to use astype in a for loop.

cat_cols = [col for col in df.columns if col not in ['col1', 'col5']]

for col in cat_cols:
    df[col] = df[col].astype('category')
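
The same conversion can also be done in a single call by passing a {column: dtype} mapping to astype (a self-contained sketch with hypothetical column names):

import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b'],
                   'col3': ['x', 'y'], 'col5': [0.1, 0.2]})
cat_cols = [col for col in df.columns if col not in ['col1', 'col5']]
df = df.astype({col: 'category' for col in cat_cols})   # one call instead of a loop
print(df.dtypes)   # col2 and col3 are category; col1 and col5 unchanged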

How to convert multiple columns into one column in pandas?

Use melt:

>>> df.melt(var_name='route', value_name='edge')
    route  edge
0  route1  19.0
1  route1  47.0
2  route1  56.0
3  route1  43.0
4  route2  51.0
5  route2  46.0
6  route2  37.0
7  route2   2.0

If you have columns you want to keep as-is, pass them as id_vars=['col1', 'col2', ...] so they are not melted.
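
For instance, a minimal sketch with a hypothetical 'id' column kept out of the melt:

import pandas as pd

df = pd.DataFrame({'id':     [1, 2],
                   'route1': [19.0, 47.0],
                   'route2': [51.0, 46.0]})
long_df = df.melt(id_vars=['id'], var_name='route', value_name='edge')
print(long_df)   # 'id' repeats for each route/edge pair instead of being melted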

How to convert datatype of multiple columns at once based on pattern

Iterating over pd.DataFrame.columns returns str objects, which have an endswith method but no contains; contains is a method of the pandas str accessor:

import pandas as pd

df = pd.DataFrame(columns=['a', 'btime', 'timec', 'timedtime', 'e'])
for c in df.columns:
    print(type(c), hasattr(c, 'endswith'), hasattr(c, 'contains'))

Output:

# type          endswith  contains
<class 'str'>   True      False
...

Also, df.filter(like='time').columns returns just the columns we want:

Index(['btime', 'timec', 'timedtime'], dtype='object')

From the docs:

like : string
    Keep labels from axis for which “like in label == True”.
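
Putting it together, a minimal sketch (hypothetical data) that converts every column whose name contains 'time' to datetime:

import pandas as pd

df = pd.DataFrame({'a':     [1, 2],
                   'btime': ['2021-01-01', '2021-06-01'],
                   'timec': ['2020-05-05', 'bad value'],
                   'e':     ['x', 'y']})
time_cols = df.filter(like='time').columns       # names containing 'time'
df[time_cols] = df[time_cols].apply(pd.to_datetime, errors='coerce')
print(df.dtypes)   # btime and timec are datetime64[ns]; a and e unchanged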

Convert type of multiple columns of a dataframe at once

Edit: See this related question for some simplifications and extensions of this basic idea.

My comment to Brandon's answer using switch:

convert.magic <- function(obj, types) {
  for (i in 1:length(obj)) {
    FUN <- switch(types[i],
                  character = as.character,
                  numeric = as.numeric,
                  factor = as.factor)
    obj[, i] <- FUN(obj[, i])
  }
  obj
}

out <- convert.magic(foo,c('character','character','numeric'))
> str(out)
'data.frame': 10 obs. of 3 variables:
$ x: chr "1" "2" "3" "4" ...
$ y: chr "red" "red" "red" "blue" ...
$ z: num 15254 15255 15256 15257 15258 ...

For truly large data frames you may want to use lapply instead of the for loop:

convert.magic1 <- function(obj, types) {
  out <- lapply(1:length(obj), FUN = function(i) {
    FUN1 <- switch(types[i],
                   character = as.character,
                   numeric = as.numeric,
                   factor = as.factor)
    FUN1(obj[, i])
  })
  names(out) <- colnames(obj)
  as.data.frame(out, stringsAsFactors = FALSE)
}

When doing this, be aware of some of the intricacies of coercing data in R. For example, converting from factor to numeric often requires as.numeric(as.character(...)). Also, be aware of data.frame()'s and as.data.frame()'s default behavior of converting character to factor.

Python Pandas - Changing some column types to categories

Sometimes, you just have to use a for-loop:

for col in ['parks', 'playgrounds', 'sports', 'roading']:
    public[col] = public[col].astype('category')
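
If you don't want to list the columns by hand, one variation (a sketch with hypothetical data) is to pick them by dtype first and loop over that selection:

import pandas as pd

public = pd.DataFrame({'parks': ['yes', 'no'], 'roading': ['no', 'yes'],
                       'score': [1, 2]})
obj_cols = public.select_dtypes(include='object').columns
for col in obj_cols:
    public[col] = public[col].astype('category')
print(public.dtypes)   # parks and roading are category; score stays int64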

