How to Use gsub() on Each Element of a Data Frame

Can I use gsub() on each element of a data frame?

I think you could do it the following way, though I don't know whether it is better or cleaner than your approach:

df <- data.frame(tbl)
df[,-1] <- as.numeric(gsub("%", "", as.matrix(df[,-1])))

Which gives:

R> head(df)
Date Internet.Explorer Chrome Firefox Safari Opera Mobile
1 January 2013 30.71 36.52 21.42 8.29 1.19 14.13
2 December 2012 30.78 36.42 21.89 7.92 1.26 14.55
3 November 2012 31.23 35.72 22.37 7.83 1.39 13.08
4 October 2012 32.08 34.77 22.32 7.81 1.63 12.30
5 September 2012 32.70 34.21 22.40 7.70 1.61 12.03
6 August 2012 32.85 33.59 22.85 7.39 1.63 11.78
R> sapply(df, class)
Date Internet.Explorer Chrome Firefox
"factor" "numeric" "numeric" "numeric"
Safari Opera Mobile
"numeric" "numeric" "numeric"

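For readers who do not have the original tbl object, here is a small self-contained sketch of the same idea. The data frame is made up for illustration (the values are borrowed from the first two rows of the output above); only the gsub() / as.numeric() step comes from the answer.

```r
# Hypothetical stand-in for the original `tbl`: percentages stored as text.
df <- data.frame(
  Date    = c("January 2013", "December 2012"),
  Chrome  = c("36.52%", "36.42%"),
  Firefox = c("21.42%", "21.89%"),
  stringsAsFactors = FALSE
)

# as.matrix() turns the selected columns into one character matrix,
# gsub() strips the "%" from every cell, and as.numeric() returns a plain
# numeric vector that is written back column by column.
df[, -1] <- as.numeric(gsub("%", "", as.matrix(df[, -1]), fixed = TRUE))

str(df)
```

The Date column is excluded with -1 because it contains no percentages and would turn into NA under as.numeric().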
gsub() on all values in a dataframe with multiple replacements

lapply() returns a list. Assigning it back with Land_Use[] keeps the data frame structure (dimensions and column names) instead of replacing the data frame with a plain list.

Land_Use[] <- lapply(Land_Use, function(y) gsub("native forest", "forest", y))

Here gsub() is applied to every column of the data frame.

For a single column, assign the output back to that column rather than to the whole data frame.

Land_Use$`1972` <- gsub("native forest", "forest", Land_Use$`1972`)

If you want to collapse several values into one, have a look at the fct_collapse() function from forcats.

library(dplyr)
library(forcats)

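# everything() selects all columns explicitly; calling across() without
# .cols is deprecated as of dplyr 1.1.0.
Land_Use <- Land_Use %>%
  mutate(across(everything(),
                ~ fct_collapse(.x,
                               Forest = c("native forest", "exotic forest"),
                               water  = c("lake", "river", "ocean", "pond"))))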

Land_Use
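If you would rather stay in base R, several replacements can be chained inside the same lapply() call. This is only a sketch: the Land_Use data and the replacements vector below are invented for illustration, and the patterns are ordinary regular expressions joined with "|".

```r
# Hypothetical data: column names and values are invented for illustration.
Land_Use <- data.frame(
  `1972` = c("native forest", "lake", "exotic forest"),
  `1985` = c("river", "native forest", "pond"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Named vector (an assumption of this sketch): names are the patterns,
# values are the replacements.
replacements <- c(
  "native forest|exotic forest" = "Forest",
  "lake|river|ocean|pond"       = "water"
)

# Apply every replacement, in order, to every column.
Land_Use[] <- lapply(Land_Use, function(col) {
  for (pattern in names(replacements)) {
    col <- gsub(pattern, replacements[[pattern]], col)
  }
  col
})

Land_Use
```

Unlike fct_collapse(), this keeps the columns as character vectors, and the order of the replacements matters if one replacement could produce text that a later pattern matches.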

apply gsub over a certain column in a list of data frames

Solution with tidyverse

library(purrr)
library(dplyr)
library(stringr)

# For each data frame in the list, remove every ".." from the names column.
map(results1, ~ .x %>%
  mutate(names = str_replace_all(names, "\\.\\.", "")))

[[1]]
names coefficients
1 a15.pdf 1.27679608
2 a17.pdf 1.05090176
3 a18.pdf 1.51820192
4 a21.pdf 2.30296037
5 a2TTT.pdf 1.48568732
6 a5.pdf 0.49371310
7 B11.pdf 1.02705905
8 B12.pdf 0.99974736
9 B13.pdf 2.40828102
10 B22.pdf 0.69515213
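The same thing can probably be done without the tidyverse. The sketch below uses lapply() and gsub(); the results1 list is a made-up stand-in, since the original input is not shown.

```r
# Hypothetical input: two small data frames whose `names` column contains
# stray double dots.
results1 <- list(
  data.frame(names = c("..a15.pdf", "..a17.pdf"),
             coefficients = c(1.28, 1.05),
             stringsAsFactors = FALSE),
  data.frame(names = c("..B11.pdf", "..B12.pdf"),
             coefficients = c(1.03, 1.00),
             stringsAsFactors = FALSE)
)

# Modify only the `names` column and return the whole data frame each time.
cleaned <- lapply(results1, function(d) {
  d$names <- gsub("..", "", d$names, fixed = TRUE)  # literal "..", not a regex
  d
})

cleaned[[1]]
```

Returning d at the end of the function is what keeps the other columns; returning only the gsub() result would give a list of character vectors instead.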

Using gsub in list of dataframes with R

Maybe you can try something like the following:

lapply(rapply(lt, function(x) gsub("^-$", "", x), how = "list"),
       as.data.frame)
# [[1]]
# name1 name2
# 1 nd:f 21-12-2001
# 2 nd:i name
# 3 nd:c
# 4 nd:g 15
# 5 b:rd
#
# [[2]]
# name1 name2
# 1 nd:i 11-01-2001
# 2 nd:c name
# 3 nd:g 3
# 4 nd:y
# 5 a:nd

It seems that although rapply() can keep the data as a list, the data.frame class is lost (hence the extra lapply(..., as.data.frame)).

By using "^-$" as the pattern in gsub(), we match only values that consist of exactly one hyphen. The hyphens inside dates won't be affected.


Perhaps a better option, though, is to convert those "-"s into NA. For this, you can try my makemeNA function from my "SOfun" package.

To use this approach you would simply do:

library(SOfun)
lapply(lt, makemeNA, "-")
# [[1]]
# name1 name2
# 1 nd:f 21-12-2001
# 2 nd:i name
# 3 nd:c <NA>
# 4 nd:g 15
# 5 b:rd <NA>
#
# [[2]]
# name1 name2
# 1 nd:i 11-01-2001
# 2 nd:c name
# 3 nd:g 3
# 4 nd:y <NA>
# 5 a:nd <NA>
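If you would rather not install an extra package, something like the following base R sketch should give a similar result. The lt list here is a shortened, made-up version of the data shown above.

```r
# Hypothetical, shortened version of the list of data frames.
lt <- list(
  data.frame(name1 = c("nd:f", "nd:c"), name2 = c("21-12-2001", "-"),
             stringsAsFactors = FALSE),
  data.frame(name1 = c("nd:i", "nd:y"), name2 = c("11-01-2001", "-"),
             stringsAsFactors = FALSE)
)

lt_na <- lapply(lt, function(d) {
  # d == "-" is an exact comparison on every cell, so "21-12-2001" is untouched.
  d[d == "-"] <- NA
  d
})

lt_na[[1]]
```

This assumes the columns are character vectors, as in the sketch; with factor columns the "-" level would remain in the level set even after the values become NA.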

Applying gsub to various columns

You can use apply() to run it over every column of the data frame. Note that the result is a numeric matrix rather than a data frame:

apply(x, 2, function(y) as.numeric(gsub("%", "", y)))
x1 x2 x3
[1,] 10 60 1
[2,] 20 50 2
[3,] 30 40 3

Removing some text string and characters from a column in dataframe in R

We can match a literal dot (escaped as \\. because an unescaped . is a metacharacter that matches any character) followed by one or more digits (\\d+) up to the end of the string ($), and replace the match with an empty string (""). Wrapping that call in gsub() then removes every backtick ("`").

df$Regression <- gsub("`", "", sub("\\.\\d+$", '', df$Regression))
df$Regression
[1] "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A"
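To see the two steps separately, here is a small sketch. The input values are invented to match the shape of the output above, since the original data is not shown.

```r
# Hypothetical input: backtick-quoted names with a numeric suffix.
df <- data.frame(Regression = c("`TLC~7_A`.1", "`TLC~7_A`.12"),
                 stringsAsFactors = FALSE)

# Step 1: sub() drops a trailing dot followed by one or more digits.
step1 <- sub("\\.\\d+$", "", df$Regression)

# Step 2: gsub() removes every backtick that is left.
df$Regression <- gsub("`", "", step1, fixed = TRUE)

df$Regression
```

sub() is enough for the first step because the pattern is anchored to the end of the string and can match at most once, while the backticks occur twice per value and therefore need gsub().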

Using gsub or sub function to only get part of a string?

Following may help you here too.

sub("([^:]*):([^:]*).*","\\1:\\2",df$dat)

Output will be as follows.

> sub("([^:]*):([^:]*).*","\\1:\\2",df$dat)
[1] "WBU-ARGU*06:03" "WBU-ARDU*08:01" "WBU-ARFU*11:03" "WBU-ARFU*03:456b"

Where Input for data frame is as follows.

dat <- c("WBU-ARGU*06:03:04","WBU-ARDU*08:01:01","WBU-ARFU*11:03:05","WBU-ARFU*03:456b")
df <- data.frame(dat)

Explanation: the annotated breakdown below is for explanation purposes only and is not runnable as written.

sub("          ## sub() replaces only the first match in each element (gsub() would replace every match).
([^:]*)        ## First capture group: everything up to, but not including, the first colon.
:              ## The first colon itself.
([^:]*)        ## Second capture group: everything up to the next colon, or to the end of the string if there is none.
.*"            ## Whatever remains, i.e. the second colon and everything after it, if present.
,"\\1:\\2"     ## Replacement: the first group, a colon, then the second group.
,df$dat)       ## The vector to work on: the dat column of df.

How to replace '+' using gsub() function in R

Simply replace it using fixed = TRUE, which treats the pattern as a literal string, so there is no need to escape the + as you would in a regular expression. You do have to do the replacement for each column of the data frame by specifying the column name:

txtdf <- data.frame(job = c("government", "poli+tician", "parliament"))
txtdf

gives

          job
1 government
2 poli+tician
3 parliament

Now replace the "+":

txtdf$job <- gsub("+", "", txtdf$job, fixed = TRUE)
txtdf

The result is:

         job
1 government
2 politician
3 parliament
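For comparison, here is a sketch of the regex-based alternatives and of one way to cover every column at once. The party column is hypothetical and only there to show the multi-column case.

```r
txtdf <- data.frame(job   = c("government", "poli+tician", "parliament"),
                    party = c("a+b", "c", "d+"),  # hypothetical extra column
                    stringsAsFactors = FALSE)

# Three ways to match a literal "+".
literal <- gsub("+", "", txtdf$job, fixed = TRUE)  # no regex at all
escaped <- gsub("\\+", "", txtdf$job)              # escaped metacharacter
bracket <- gsub("[+]", "", txtdf$job)              # character class

# Apply the literal replacement to every column and keep the data frame.
txtdf[] <- lapply(txtdf, gsub, pattern = "+", replacement = "", fixed = TRUE)

txtdf
```

Without fixed = TRUE or escaping, "+" on its own is a quantifier with nothing to repeat, so it is not a valid way to match the plus sign.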

using gsub with a column on a dataframe

You will need to escape the . with either \\. or [.]. See ?regex. So the call becomes

sub("\\..*", "", dat$Dx1)

For example,

x <- c("F20.0", "F13.2", "F31.3", "F33.1")
sub("\\..*", "", x)
# [1] "F20" "F13" "F31" "F33"

We can use sub() instead of gsub() because the pattern \\..* matches from the first dot to the end of the string, so there is at most one match per element.
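A quick sketch of why the escaping matters and of how sub() and gsub() differ when there is more than one dot. The vector y is made up for the comparison.

```r
x <- c("F20.0", "F13.2", "F31.3", "F33.1")

escaped   <- sub("\\..*", "", x)  # literal dot, then the rest of the string
bracketed <- sub("[.].*", "", x)  # same thing, written as a character class
unescaped <- sub("..*", "", x)    # "." matches any character, so everything goes

# Hypothetical value with two dots, to compare sub() and gsub().
y <- "a.b.c"
first_only <- sub(".", "-", y, fixed = TRUE)   # replaces the first dot only
all_dots   <- gsub(".", "-", y, fixed = TRUE)  # replaces every dot

escaped
unescaped
first_only
all_dots
```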


