Convert from Lowercase to Uppercase All Values in All Character Variables in Dataframe

Starting with the following sample data:

df <- data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],stringsAsFactors=FALSE)

  v1 v2 v3
1  a  1  j
2  b  2  k
3  c  3  l
4  d  4  m
5  e  5  n

You can use:

data.frame(lapply(df, function(v) {
  # Uppercase character columns; leave every other column unchanged
  if (is.character(v)) return(toupper(v))
  else return(v)
}), stringsAsFactors = FALSE)

Which gives:

  v1 v2 v3
1  A  1  J
2  B  2  K
3  C  3  L
4  D  4  M
5  E  5  N

Convert lowercase to uppercase in a mixed dataframe (character, vector, integer) while preserving data types in R?

To build on Jingxin's response and address Thomas Moore's follow-up question, you can also convert the column names to uppercase with the following addition:

library(dplyr)
library(stringr)

df %>%
  mutate(across(where(is.character), str_to_upper),
         across(where(is.factor), ~ factor(str_to_upper(.x)))) %>%
  rename_with(str_to_upper)

Convert whole dataframe from lower case to upper case with Pandas

Inside apply(), astype(str) casts each column to strings, the .str accessor exposes vectorized string methods on the converted column, and upper() converts every value to uppercase. Note that after this, the dtype of all columns changes to object, including columns that were numeric.

In [17]: df
Out[17]:
     regiment company deaths  battles size
0  Nighthawks     1st    kkk        5    l
1  Nighthawks     1st     52       42   ll
2  Nighthawks     2nd     25        2    l
3  Nighthawks     2nd    616        2    m

In [18]: df.apply(lambda x: x.astype(str).str.upper())
Out[18]:
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M

You can later convert the 'battles' column to numeric again, using to_numeric():

In [42]: df2 = df.apply(lambda x: x.astype(str).str.upper())

In [43]: df2['battles'] = pd.to_numeric(df2['battles'])

In [44]: df2
Out[44]:
     regiment company deaths  battles size
0  NIGHTHAWKS     1ST    KKK        5    L
1  NIGHTHAWKS     1ST     52       42   LL
2  NIGHTHAWKS     2ND     25        2    L
3  NIGHTHAWKS     2ND    616        2    M

In [45]: df2.dtypes
Out[45]:
regiment object
company object
deaths object
battles int64
size object
dtype: object
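
If you would rather not convert every column to strings and then cast some of them back, one alternative is to uppercase only the values that are actually strings. The following is a minimal sketch rather than part of the original answer: the helper name upper_strings is hypothetical, and the sample frame is rebuilt by hand from the output shown above.

```python
import pandas as pd

def upper_strings(df):
    """Return a copy of df in which every str value is uppercased.

    Hypothetical helper: numeric and boolean columns are skipped entirely,
    so they keep their original dtype.
    """
    def upper_if_str(value):
        return value.upper() if isinstance(value, str) else value

    out = df.copy()
    for name in out.columns:
        if not pd.api.types.is_numeric_dtype(out[name]):
            # Element-wise, so mixed columns such as 'deaths' keep their numbers
            out[name] = out[name].map(upper_if_str)
    return out

# Sample data reconstructed from the example above (an assumption)
df = pd.DataFrame({
    "regiment": ["Nighthawks"] * 4,
    "company": ["1st", "1st", "2nd", "2nd"],
    "deaths": ["kkk", 52, 25, 616],
    "battles": [5, 42, 2, 2],
    "size": ["l", "ll", "l", "m"],
})

result = upper_strings(df)
print(result)
print(result.dtypes)
```

With this approach the 'battles' column is never touched, so no later to_numeric() call is needed, and the numbers in the mixed 'deaths' column stay numbers instead of becoming strings.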

`stringr` to convert first letter only to uppercase in dataframe

Here is a base R solution using gsub:

words$word <- gsub("\\b([a-z])", "\\U\\1", words$word, perl=TRUE)

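This will replace the first lowercase letter of every word with its uppercase version. Note that the \b word boundary matches between any non-word character and a word character, so it matches a lowercase letter at the start of the column's value or after whitespace, but also after punctuation such as a hyphen or an apostrophe.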
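
To see where \b matches, here is a small sketch of the same pattern in Python using the standard re module. It is an illustration only, not part of the original R answer; the function name capitalize_words and the sample string are made up for the example.

```python
import re

def capitalize_words(text):
    """Uppercase each lowercase ASCII letter that directly follows a word boundary."""
    return re.sub(r"\b([a-z])", lambda m: m.group(1).upper(), text)

sample = "new york-based don't"
print(capitalize_words(sample))
```

The letters after the hyphen and the apostrophe are capitalized as well, because \b also matches between punctuation and a letter. If that is not what you want, a pattern anchored on the start of the string or on whitespace would be a safer choice.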

How to change a column in pandas dataframe to uppercase when it already has some uppercase values?

You have a mixed column of both string and boolean values (and maybe some other things too), and its dtype is almost surely 'object' - you should check, and please confirm.

Solution: You can (and should) specify the dtype of a problematic column when you read it in, and also specify ALL the true and false values at read time:

pd.read_csv(..., dtype={'use_ab': bool},
            true_values=['TRUE', 'True'], false_values=['FALSE', 'False'])

Note in particular that the string 'False' and the bool False are not the same thing, and that the .str accessor does not convert bools.

Re: df.dtypes. The dtype of your column doesn't seem to be string, but it doesn't seem to be boolean either, since the string accessor .str.upper() is throwing away most of your 'False' values, as value_counts() proves.

Also, since your series obviously has NaNs and you need to check that they're not being mishandled, use .value_counts(..., dropna=False) to include them in the counts.

import pandas as pd
import numpy as np

df = pd.Series(['True',np.nan,'FALSE','TRUE',np.nan,'False',False,True,True])

# Now note that the dtype is automatically assigned to pandas 'object'!
>>> df.dtype
dtype('O')

>>> df.value_counts(dropna=False)
True 2
NaN 2
FALSE 1
TRUE 1
True 1
False 1
False 1
dtype: int64

See how mistakenly trying to use .str.upper() accessor on this mixed column is trashing those values that are actually bools, while case-transforming the strings:

>>> df.str.upper()
0 TRUE
1 NaN
2 FALSE
3 TRUE
4 NaN
5 FALSE
6 NaN <-- bool False coerced to NaN!
7 NaN <-- bool True coerced to NaN!
8 NaN <-- bool True coerced to NaN!
dtype: object
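
If re-reading the file is not an option, one way to avoid that coercion is to uppercase only the elements that are real strings and pass everything else through unchanged. This is a minimal sketch under that assumption, not part of the original answer; the series is named s here instead of df, and upper_if_str is a hypothetical helper.

```python
import numpy as np
import pandas as pd

s = pd.Series(['True', np.nan, 'FALSE', 'TRUE', np.nan, 'False', False, True, True])

def upper_if_str(value):
    """Uppercase str values; return bools, NaN and anything else unchanged."""
    return value.upper() if isinstance(value, str) else value

result = s.map(upper_if_str)
print(result)
print(result.value_counts(dropna=False))
```

The column is still a mix of strings and bools afterwards, so this only fixes the case of the strings. Declaring the dtype at read time remains the cleaner fix.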

Converting the characters of strings in a single column into lowercase and getting the updated dataframe with all columns

The tolower function should do the trick like so:

df <- df %>%
  mutate(colB = tolower(colB))

Convert multiple dataframes in lower case

It looks like you're putting your DataFrames into a dictionary; that definitely helps.

But you have to assign the result of the .apply() operation to something.

As it is it's not being saved anywhere.

Try instead (with df renamed to be more clear):

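import pandas as pd

df_dict = {}
for i, f in enumerate(files):  # files: your existing list of CSV paths
    df_dict[str(i)] = pd.read_csv(f)
    # apply() returns a new DataFrame, so assign the result back
    df_dict[str(i)] = df_dict[str(i)].apply(lambda x: x.astype(str).str.lower())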

