Efficiently transform multiple columns of a data frame
If the function is vectorized and returns a data frame when applied to a data-frame subset (as log does), you can assign directly:
cols <- c("X.1","X.2")
df[cols] <- log(df[cols])
Otherwise you will need to use lapply or a loop over the columns. These solutions will be slower than the direct assignment above, so use them only if you must.
df[cols] <- lapply(df[cols], function(x) c(NA,diff(x)))
for(col in cols) {
  df[[col]] <- c(NA, diff(df[[col]]))
}
Transform multiple columns the tidy way
We can use
df %>%
mutate_at(vars(matches("iq")), log)
One advantage of matches is that it can take multiple patterns to be matched in a single call. For example, if we need to apply the function to columns whose names start with 'iq' (the ^ anchor) or (|) end with 'oq' (the $ anchor), a single pattern can be passed to matches:
df %>%
mutate_at(vars(matches('^iq|oq$')), log)
If the column names are completely different, so that n separate patterns would be needed for n columns, but the columns still occupy known positions, the column position numbers can be passed to vars(). In the current example, the 'iq' columns are the 1st and 2nd columns:
df %>%
mutate_at(1:2, log)
Similarly, if the 20 columns occupy the first 20 positions:
df %>%
mutate_at(1:20, log)
Or if the positions are 1 to 6, 8 to 12, and 41 to 50:
df %>%
mutate_at(vars(1:6, 8:12, 41:50), log)
Convert multiple columns to string in pandas dataframe
To convert multiple columns to string, include a list of columns to your above-mentioned command:
df[['one', 'two', 'three']] = df[['one', 'two', 'three']].astype(str)
# add as many column names as you like.
That means that one way to convert all columns is to construct the list of columns like this:
all_columns = list(df) # Creates list of all column headers
df[all_columns] = df[all_columns].astype(str)
Note that converting every column can also be done directly, without building the list, via df = df.astype(str) (see comments).
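To make the effect concrete, here is a minimal sketch on a hypothetical frame with mixed dtypes (the column names are illustrative, not from the original question):

```python
import pandas as pd

# Hypothetical frame with mixed dtypes
df = pd.DataFrame({'one': [1, 2], 'two': [1.5, 2.5], 'three': [True, False]})

# Convert a subset of columns to string
cols = ['one', 'two']
df[cols] = df[cols].astype(str)

# 'one' and 'two' now hold strings; 'three' keeps its bool dtype
print(df.dtypes)

# Converting the whole frame at once, as noted above:
df_all = df.astype(str)
```

After the subset conversion, only the listed columns change; df.astype(str) converts everything in one call.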
How to efficiently add multiple columns to pandas dataframe with values that depend on other columns
If you only have 50 conditions to check, it is probably better to iterate over the conditions and fill the cells in blocks rather than going through the whole frame row by row. By the way, .assign() doesn't accept only lambda functions, and the code can be made much more readable than in my previous suggestion. Below is a modified version that also fills the extra columns in place. If this data frame had 10,000,000 rows and I only wanted to apply different operations to 10 groups of value ranges in column A, this would be a very neat way of filling the extra columns.
import pandas as pd
import numpy as np
# Create data frame
rnd = np.random.randint(1, 10, 10)
rnd2 = np.random.randint(100, 1000, 10)
df = pd.DataFrame(
    {'A': rnd, 'B': rnd2, 'C': np.nan, 'D': np.nan, 'E': np.nan})
# Define different ways of filling the extra cells
def f1():
    return df['A'].mul(df['B'])

def f2():
    return np.log10(df['A'])

def f3():
    return df['B'] - df['A']

def f4():
    return df['A'].div(df['B'])

def f5():
    return np.sqrt(df['B'])

def f6():
    return df['A'] + df['B']
# First assign() dependent on a boolean mask
df[df['A'] < 50] = df[df['A'] < 50].assign(C = f1(), D = f2(), E = f3())
# Second assign() dependent on a boolean mask
df[df['A'] >= 50] = df[df['A'] >= 50].assign(C = f4(), D = f5(), E = f6())
print(df)
A B C D E
0 4.0 845.0 3380.0 0.602060 841
1 3.0 967.0 2901.0 0.477121 964
2 3.0 468.0 1404.0 0.477121 465
3 2.0 548.0 1096.0 0.301030 546
4 3.0 393.0 1179.0 0.477121 390
5 7.0 741.0 5187.0 0.845098 734
6 1.0 269.0 269.0 0.000000 268
7 4.0 731.0 2924.0 0.602060 727
8 4.0 193.0 772.0 0.602060 189
9 3.0 306.0 918.0 0.477121 303
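The same block-wise idea can also be sketched with .loc and boolean masks, which assigns only the target columns instead of reassigning whole rows. This is an alternative formulation under assumed data, not the answer's original code; the threshold A < 5 is hypothetical:

```python
import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({'A': rng.integers(1, 10, 10),
                   'B': rng.integers(100, 1000, 10)})

# Hypothetical condition splitting the frame into two blocks
mask = df['A'] < 5

# Fill column C block by block with different operations
df.loc[mask, 'C'] = df.loc[mask, 'A'] * df.loc[mask, 'B']
df.loc[~mask, 'C'] = df.loc[~mask, 'B'] - df.loc[~mask, 'A']
```

Each .loc assignment touches only the rows selected by its mask, so the two blocks can use entirely different formulas.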
converting multiple columns from character to numeric format in r
You could try
DF <- data.frame("a" = as.character(0:5),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6],
                 stringsAsFactors = FALSE)
# Check columns classes
sapply(DF, class)
# a b c
# "character" "character" "character"
cols.num <- c("a","b")
DF[cols.num] <- sapply(DF[cols.num],as.numeric)
sapply(DF, class)
# a b c
# "numeric" "numeric" "character"
Transforming a Column into Multiple Columns according to Their Values
Source DF:
In [204]: df
Out[204]:
Country
0 Italy
1 Indonesia
2 Canada
3 Italy
we can use pd.get_dummies():
In [205]: pd.get_dummies(df.Country)
Out[205]:
Canada Indonesia Italy
0 0 0 1
1 0 1 0
2 1 0 0
3 0 0 1
Or sklearn.feature_extraction.text.CountVectorizer (pd.SparseDataFrame was removed in pandas 1.0, so the sparse matrix is wrapped with pd.DataFrame.sparse.from_spmatrix instead):
In [211]: from sklearn.feature_extraction.text import CountVectorizer
In [212]: cv = CountVectorizer()
In [213]: r = pd.DataFrame.sparse.from_spmatrix(cv.fit_transform(df.Country),
                                                columns=cv.get_feature_names_out(),
                                                index=df.index)
In [214]: r
Out[214]:
canada indonesia italy
0 0 0 1
1 0 1 0
2 1 0 0
3 0 0 1
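If the goal is to keep any other columns and replace only Country, pd.get_dummies can also take the whole DataFrame with a columns= argument; a minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Italy', 'Indonesia', 'Canada', 'Italy']})

# Passing columns= encodes just that column (prefixing the new names with
# 'Country_') and leaves all other columns of the frame intact
dummies = pd.get_dummies(df, columns=['Country'])
print(dummies.columns.tolist())
# ['Country_Canada', 'Country_Indonesia', 'Country_Italy']
```

This avoids having to concatenate the dummy columns back onto the original frame by hand.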
Elegant and efficient way to replace values in multiple columns using pandas
You can clip:
cols = ["los_24", "los_48", "in_24", "in_48"]
f[cols] = f[cols].clip(lower=1)
to get
person_id test_id los_24 los_48 in_24 in_48 test
0 101 123 1.00 1.0 21.000 11.3 A
1 101 123 1.00 1.0 24.000 202.0 B
2 101 124 1.00 1.0 1.000 1.0 C
3 201 321 1.01 1.0 2.300 1.0 D
4 201 321 2.00 11.0 1.000 41.0 E
5 201 321 1.00 2.0 23.000 47.0 F
6 203 456 2.00 3.0 1.001 2.0 G
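For replacements that aren't a simple lower bound, the same column-wise pattern works with mask(); a minimal sketch assuming the same column names on made-up data (clip is still the right tool for the bounded case above):

```python
import pandas as pd

# Made-up data with the question's column names
f = pd.DataFrame({'los_24': [0.5, 2.0], 'los_48': [1.0, 0.2],
                  'in_24': [0.0, 3.0], 'in_48': [1.5, 0.9]})

cols = ['los_24', 'los_48', 'in_24', 'in_48']
# mask() replaces values where the condition is True; with a scalar
# replacement of 1 it reproduces clip(lower=1), but the condition and
# replacement can be arbitrary
f[cols] = f[cols].mask(f[cols] < 1, 1)
```

Values below 1 become 1 while everything else is left untouched, matching the clip result.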