Convert whole dataframe from lower case to upper case with Pandas
astype(str) converts each Series to strings, and the .str accessor then lets you call upper() on every element. Note that after this, the dtype of all columns changes to object.
In [17]: df
Out[17]:
     regiment company deaths  battles size
0  Nighthawks     1st    kkk        5    l
1  Nighthawks     1st     52       42   ll
2  Nighthawks     2nd     25        2    l
3  Nighthawks     2nd    616        2    m
In [18]: df.apply(lambda x: x.astype(str).str.upper())
Out[18]:
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M
You can later convert the 'battles' column to numeric again, using to_numeric():
In [42]: df2 = df.apply(lambda x: x.astype(str).str.upper())
In [43]: df2['battles'] = pd.to_numeric(df2['battles'])
In [44]: df2
Out[44]:
     regiment company deaths  battles size
0  NIGHTHAWKS     1ST    KKK        5    L
1  NIGHTHAWKS     1ST     52       42   LL
2  NIGHTHAWKS     2ND     25        2    L
3  NIGHTHAWKS     2ND    616        2    M
In [45]: df2.dtypes
Out[45]:
regiment    object
company     object
deaths      object
battles      int64
size        object
dtype: object
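If more than one column should go back to a numeric dtype, the to_numeric() step can be tried column by column. The following is a minimal sketch, not part of the original answer: the helper name restore_numeric is made up for illustration, and the sample frame is reconstructed from the output above on the assumption that 'deaths' holds strings.

```python
import pandas as pd

# Sample frame reconstructed from the output above (assumed contents).
df = pd.DataFrame({
    "regiment": ["Nighthawks"] * 4,
    "company": ["1st", "1st", "2nd", "2nd"],
    "deaths": ["kkk", "52", "25", "616"],
    "battles": [5, 42, 2, 2],
    "size": ["l", "ll", "l", "m"],
})

# Upper-case everything; every column becomes a string column.
upper = df.apply(lambda col: col.astype(str).str.upper())


def restore_numeric(col):
    """Hypothetical helper: return a numeric column if every value parses."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        # At least one value is not a number, so keep the strings.
        return col


restored = upper.apply(restore_numeric)
print(restored.dtypes)
```

Only 'battles' parses cleanly here; 'deaths' stays a string column because of the 'KKK' entry.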
Changing from upper to lower case in several data frames
Since you wish to keep all of your data frames in the global environment, this is a situation in which I would prefer using a for loop. This allows you to operate within the global environment (lapply requires that you return something to the global environment).
dfList <- c("df1", "df2", "df3", "df4")
for (i in dfList) {
  tmp <- get(i)
  assign(i, setNames(tmp, tolower(names(tmp))))
}
How do I set column names to lower case for multiple dataframes?
The following should work:
dfList <- lapply(lapply(dfs,get),function(x) {colnames(x) <- tolower(colnames(x));x})
Problems like this generally stem from the fact that you haven't placed all your data frames in a single data structure, and then are forced to use something awkward, like get.
Note that in my code, I use lapply and get to actually create a single list of data frames first, and then alter their colnames.
You should also be aware that your lowercols function is rather un-R-like. R functions generally aren't called in such a way that they return nothing, but have side effects. If you try to write functions that way (which is possible) you will probably make your life difficult and have scoping issues. Note that in my second lapply I explicitly return the modified data frame.
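For readers working in pandas rather than R, the same advice (keep the frames in one container and return modified copies instead of relying on side effects) might look roughly like this. It is only a sketch: the frame names and contents are placeholders, not data from the question.

```python
import pandas as pd

# Placeholder frames standing in for df1, df2, ... from the question.
frames = {
    "df1": pd.DataFrame({"ID": [1, 2], "Name": ["a", "b"]}),
    "df2": pd.DataFrame({"ID": [3], "Score": [0.5]}),
}

# rename() returns a new frame, so the originals are left untouched.
lowered = {
    name: frame.rename(columns=str.lower)
    for name, frame in frames.items()
}

print(list(lowered["df1"].columns))
```

This assumes every column label is a string, since str.lower is applied to each label.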
Convert from lowercase to uppercase all values in all character variables in dataframe
Starting with the following sample data:
df <- data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],stringsAsFactors=FALSE)
  v1 v2 v3
1  a  1  j
2  b  2  k
3  c  3  l
4  d  4  m
5  e  5  n
You can use:
data.frame(lapply(df, function(v) {
  if (is.character(v)) return(toupper(v))
  else return(v)
}))
Which gives:
  v1 v2 v3
1  A  1  J
2  B  2  K
3  C  3  L
4  D  4  M
5  E  5  N
How can I make pandas dataframe column headers all lowercase?
You can do it like this:
data.columns = list(map(str.lower, data.columns))
or
data.columns = [x.lower() for x in data.columns]
Example:
>>> data = pd.DataFrame({'A':range(3), 'B':range(3,0,-1), 'C':list('abc')})
>>> data
   A  B  C
0  0  3  a
1  1  2  b
2  2  1  c
>>> data.columns = list(map(str.lower, data.columns))
>>> data
   a  b  c
0  0  3  a
1  1  2  b
2  2  1  c
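Two other spellings are worth knowing, shown here as a short sketch rather than as part of the original answer: the .str accessor on the column Index, and rename() with a callable. The second frame is an invented example.

```python
import pandas as pd

data = pd.DataFrame({"A": range(3), "B": range(3, 0, -1), "C": list("abc")})

# Option 1: vectorized string method on the column Index, assigned back.
data.columns = data.columns.str.lower()

# Option 2: rename() applies the callable to each label and returns a new frame.
other = pd.DataFrame({"X": [1], "Y": [2]}).rename(columns=str.lower)

print(list(data.columns), list(other.columns))
```

Both assume the column labels are strings.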
Convert column values to lower case only if they are string
The test in your lambda function isn't quite right, though you weren't far off:
df.apply(lambda x: x.str.lower() if x.dtype == 'object' else x)
With the data frame and output:
>>> df = pd.DataFrame(
...     [
...         {'OS': 'Microsoft Windows', 'Count': 3},
...         {'OS': 'Mac OS X', 'Count': 4},
...         {'OS': 'Linux', 'Count': 234},
...         {'OS': 'Dont have a preference', 'Count': 0},
...         {'OS': 'I prefer Windows and Unix', 'Count': 3},
...         {'OS': 'Unix', 'Count': 2},
...         {'OS': 'VMS', 'Count': 1},
...         {'OS': 'DOS or ZX Spectrum', 'Count': 2},
...     ]
... )
>>> df = df.apply(lambda x: x.str.lower() if x.dtype == 'object' else x)
>>> print(df)
                          OS  Count
0          microsoft windows      3
1                   mac os x      4
2                      linux    234
3     dont have a preference      0
4  i prefer windows and unix      3
5                       unix      2
6                        vms      1
7         dos or zx spectrum      2
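One caveat with the dtype test: an object column can hold non-string values, and .str.lower() turns those into NaN. A possible workaround, sketched here with an invented frame and a made-up helper name (lower_strings), is to check each value instead of the column dtype.

```python
import pandas as pd

# Invented data: the 'OS' column mixes strings with an integer.
mixed = pd.DataFrame({"OS": ["Linux", 42, "VMS"], "Count": [1, 2, 3]})

# The .str accessor yields NaN for the non-string entry.
naive = mixed["OS"].str.lower()


def lower_strings(col):
    """Hypothetical helper: lower-case str values, leave everything else alone."""
    if pd.api.types.is_numeric_dtype(col):
        return col
    return col.map(lambda v: v.lower() if isinstance(v, str) else v)


result = mixed.apply(lower_strings)
print(result)
```

The element-wise map is slower than the vectorized .str methods, so it is only worth using when a column really is mixed.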
How to lowercase a pandas dataframe string column if it has missing values?
Use pandas vectorized string methods. As the documentation puts it, these methods exclude missing/NA values automatically. The very first example there is .str.lower():
>>> df['x'].str.lower()
0    one
1    two
2    NaN
Name: x, dtype: object
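A self-contained version of that session, as a small sketch; the frame contents are assumed, since the original df is not shown.

```python
import numpy as np
import pandas as pd

# Assumed input: a string column with one missing value.
df = pd.DataFrame({"x": ["ONE", "TWO", np.nan]})

# Missing values pass through .str.lower() untouched.
lowered = df["x"].str.lower()
print(lowered)
```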