pandas: to_numeric for multiple columns
UPDATE: you don't need to convert your values afterwards; you can do it on the fly when reading your CSV:
In [165]: df=pd.read_csv(url, index_col=0, na_values=['(NA)']).fillna(0)
In [166]: df.dtypes
Out[166]:
GeoName object
ComponentName object
IndustryId int64
IndustryClassification object
Description object
2004 int64
2005 int64
2006 int64
2007 int64
2008 int64
2009 int64
2010 int64
2011 int64
2012 int64
2013 int64
2014 float64
dtype: object
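A minimal, self-contained sketch of the same idea, using made-up sample data rather than the original dataset: telling read_csv to treat '(NA)' as missing, then filling with 0 so the year columns come out numeric.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the real file behind `url`
csv_text = """GeoName,2013,2014
Alabama,100,(NA)
Alaska,(NA),250
"""

# '(NA)' is parsed as NaN at read time; fillna(0) replaces it afterwards
df = pd.read_csv(io.StringIO(csv_text), na_values=['(NA)']).fillna(0)
print(df.dtypes)
```

The year columns end up float64 here (they contained NaN before the fill), which matches the 2014 column in the output above.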
If you need to convert multiple columns to numeric dtypes - use the following technique:
Sample source DF:
In [271]: df
Out[271]:
id a b c d e f
0 id_3 AAA 6 3 5 8 1
1 id_9 3 7 5 7 3 BBB
2 id_7 4 2 3 5 4 2
3 id_0 7 3 5 7 9 4
4 id_0 2 4 6 4 0 2
In [272]: df.dtypes
Out[272]:
id object
a object
b int64
c int64
d int64
e int64
f object
dtype: object
Converting selected columns to numeric dtypes:
In [273]: cols = df.columns.drop('id')
In [274]: df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
In [275]: df
Out[275]:
id a b c d e f
0 id_3 NaN 6 3 5 8 1.0
1 id_9 3.0 7 5 7 3 NaN
2 id_7 4.0 2 3 5 4 2.0
3 id_0 7.0 3 5 7 9 4.0
4 id_0 2.0 4 6 4 0 2.0
In [276]: df.dtypes
Out[276]:
id object
a float64
b int64
c int64
d int64
e int64
f float64
dtype: object
PS if you want to select all string (object) columns use the following simple trick:
cols = df.columns[df.dtypes.eq('object')]
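The trick above can be sketched on a tiny made-up frame; select_dtypes is a built-in equivalent that yields the same columns.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'b'], 'x': [1, 2], 'y': ['3', '4']})

cols = df.columns[df.dtypes.eq('object')]          # the trick above
same = df.select_dtypes(include='object').columns  # built-in equivalent
print(list(cols))  # ['id', 'y']
```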
Convert multiple columns to string in pandas dataframe
To convert multiple columns to string, pass a list of columns to the same astype call:
df[['one', 'two', 'three']] = df[['one', 'two', 'three']].astype(str)
# add as many column names as you like.
That means that one way to convert all columns is to construct the list of columns like this:
all_columns = list(df) # Creates list of all column headers
df[all_columns] = df[all_columns].astype(str)
Note that the latter can also be done directly (see comments).
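The "directly" route in that note is presumably DataFrame.astype applied to the whole frame at once; a minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3.5, 4.5], 'three': [True, False]})

# One call converts every column, no column list needed
df = df.astype(str)
print(df.dtypes)  # all object
```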
change multiple columns in pandas dataframe to datetime
You can use apply to iterate through each column using pd.to_datetime:
data.iloc[:, 7:12] = data.iloc[:, 7:12].apply(pd.to_datetime, errors='coerce')
As part of the changes in pandas 1.3.0, iloc/loc will no longer update the column dtype on assignment. Use column labels directly instead:
cols = data.columns[7:12]
data[cols] = data[cols].apply(pd.to_datetime, errors='coerce')
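A runnable sketch of the label-based approach, with made-up column names standing in for positions 7:12; errors='coerce' turns unparseable entries into NaT.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['a', 'b'],
    'start': ['2021-01-01', 'not a date'],
    'end': ['2021-02-01', '2021-03-01'],
})

cols = ['start', 'end']  # stand-in for data.columns[7:12]
data[cols] = data[cols].apply(pd.to_datetime, errors='coerce')
print(data.dtypes)  # start and end are datetime64[ns]
```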
How to convert multiple columns from string to integer in pandas dataframe?
Try with replace():
df_all['1981'] = df_all['1981'].replace(',','',regex=True)
Now try with the astype() method:
df_all['1981'] = df_all['1981'].astype('int64')
If you want to convert multiple columns then:
df[df.columns[2:]]=df[df.columns[2:]].replace(',','',regex=True).astype('int64')
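The two steps combined, sketched on a small made-up frame (here the numeric columns start at index 1 rather than 2): strip the thousands separators, then cast.

```python
import pandas as pd

df = pd.DataFrame({'country': ['A', 'B'],
                   '1980': ['1,234', '56'],
                   '1981': ['7,890', '12,345']})

# regex=True makes replace() substitute the substring ',' inside each cell
df[df.columns[1:]] = df[df.columns[1:]].replace(',', '', regex=True).astype('int64')
print(df.dtypes)
```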
changing data types of multiple columns at once in python/pandas
Another way would be to use astype in a for loop.
cat_cols = [col for col in df.columns if col not in ['col1', 'col5']]
for col in cat_cols:
    df[col] = df[col].astype('category')
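The loop can also be collapsed into a single astype call with a dict of column-to-dtype mappings; a sketch with made-up columns:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b'],
                   'col3': ['x', 'y'], 'col5': [0.1, 0.2]})

cat_cols = [col for col in df.columns if col not in ['col1', 'col5']]
df = df.astype({col: 'category' for col in cat_cols})  # one call, no loop
print(df.dtypes)
```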
How to convert multiple columns into one column in pandas?
Use melt:
>>> df.melt(var_name='route', value_name='edge')
route edge
0 route1 19.0
1 route1 47.0
2 route1 56.0
3 route1 43.0
4 route2 51.0
5 route2 46.0
6 route2 37.0
7 route2 2.0
If you have some columns to protect, pass id_vars=['col1', 'col2', ...] so they are kept as identifier columns rather than melted.
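A self-contained sketch of the melt call on a small made-up frame with the same column names as the output above:

```python
import pandas as pd

df = pd.DataFrame({'route1': [19, 47], 'route2': [51, 46]})

# Column names become the 'route' column; cell values become 'edge'
long_df = df.melt(var_name='route', value_name='edge')
print(long_df)
```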
How to convert datatype of multiple columns at once based on pattern
Iterating over pd.DataFrame.columns yields str objects, which have an endswith method but not contains — contains is a method of the pandas .str accessor:
import pandas as pd
df = pd.DataFrame(columns=['a', 'btime', 'timec', 'timedtime', 'e'])
for c in df.columns:
    print(type(c), hasattr(c, 'endswith'), hasattr(c, 'contains'))
Output:
# type endswith contains
<class 'str'> True False
...
Also, df.filter(like='time').columns returns just what is desired:
Index(['btime', 'timec', 'timedtime'], dtype='object')
From the docs:
like : string
Keep labels from axis for which "like in label == True".
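Both routes in one runnable sketch: filter(like=...) and the .str.contains accessor on the column Index give the same selection.

```python
import pandas as pd

df = pd.DataFrame(columns=['a', 'btime', 'timec', 'timedtime', 'e'])

time_cols = df.filter(like='time').columns            # substring match
same = df.columns[df.columns.str.contains('time')]    # .str accessor route
print(list(time_cols))  # ['btime', 'timec', 'timedtime']
```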
Convert type of multiple columns of a dataframe at once
Edit See this related question for some simplifications and extensions on this basic idea.
My comment to Brandon's answer, using switch:
convert.magic <- function(obj, types){
  for (i in 1:length(obj)){
    FUN <- switch(types[i],
                  character = as.character,
                  numeric = as.numeric,
                  factor = as.factor)
    obj[,i] <- FUN(obj[,i])
  }
  obj
}
out <- convert.magic(foo,c('character','character','numeric'))
> str(out)
'data.frame': 10 obs. of 3 variables:
$ x: chr "1" "2" "3" "4" ...
$ y: chr "red" "red" "red" "blue" ...
$ z: num 15254 15255 15256 15257 15258 ...
For truly large data frames you may want to use lapply instead of the for loop:
convert.magic1 <- function(obj, types){
  out <- lapply(1:length(obj), FUN = function(i){
    FUN1 <- switch(types[i],
                   character = as.character,
                   numeric = as.numeric,
                   factor = as.factor)
    FUN1(obj[,i])
  })
  names(out) <- colnames(obj)
  as.data.frame(out, stringsAsFactors = FALSE)
}
When doing this, be aware of some of the intricacies of coercing data in R. For example, converting from factor to numeric often requires as.numeric(as.character(...)). Also, be aware of data.frame()'s and as.data.frame()'s default behavior of converting character to factor.
Python Pandas - Changing some column types to categories
Sometimes, you just have to use a for-loop:
for col in ['parks', 'playgrounds', 'sports', 'roading']:
    public[col] = public[col].astype('category')