Replace Characters in Column Names Gsub

Rich Scriven had the answer:

Define

colClean <- function(x){ colnames(x) <- gsub("\\.\\.+", ".", colnames(x)); x }

and then do

a <- colClean(a)

to update a
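As a quick check of what colClean does, here is a minimal sketch. The data frame `a` and its column names are hypothetical, made up for illustration; only colClean itself comes from the answer above.

```r
# colClean as defined in the answer: collapse runs of two or more dots into one
colClean <- function(x) {
  colnames(x) <- gsub("\\.\\.+", ".", colnames(x))
  x
}

# Hypothetical data frame; check.names = FALSE keeps the names exactly as typed
a <- data.frame(`first..name` = 1, `total...count` = 2, ok.name = 3,
                check.names = FALSE)

a <- colClean(a)
colnames(a)
#> [1] "first.name"  "total.count" "ok.name"
```

Names that already contain only single dots pass through unchanged, because the pattern requires at least two consecutive dots.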

Replacing entire column name with sub or gsub

Here's a tidyverse solution using rename_with and the selection helper, starts_with.

library(tidyverse)

wide_data <- data.frame(
  `NominalYieldCurve.SpotRate(Govt,10,5)` = 1,
  CreditModel.SpotSpread123 = 1
)

wide_data %>%
  rename_with(~"Spot_Rate", .cols = starts_with("NominalYieldCurve.SpotRate")) %>%
  rename_with(~"Spot_Spread", .cols = starts_with("CreditModel.SpotSpread"))
#>   Spot_Rate Spot_Spread
#> 1         1           1
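If you would rather stay in base R, the same whole-name replacement can be sketched with sub. This is an alternative to the rename_with approach, not the answer's own method: it anchors the pattern at the start of the name and lets .* consume the rest, so the entire name is replaced rather than only the matched prefix.

```r
wide_data <- data.frame(
  `NominalYieldCurve.SpotRate(Govt,10,5)` = 1,
  CreditModel.SpotSpread123 = 1
)

nms <- names(wide_data)
# ^ anchors at the start; the trailing .* swallows everything after the prefix
nms <- sub("^NominalYieldCurve\\.SpotRate.*", "Spot_Rate", nms)
nms <- sub("^CreditModel\\.SpotSpread.*", "Spot_Spread", nms)
names(wide_data) <- nms

names(wide_data)
#> [1] "Spot_Rate"   "Spot_Spread"
```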

Removing numbers and characters from column names in R

We can change the pattern to match one or more whitespace characters (\\s+), followed by an opening parenthesis (\\(), one or more digits (\\d+), and any remaining characters (.*), and replace the match with an empty string ("")

colnames(data) <- sub("\\s+\\(\\d+.*", "", colnames(data))
colnames(data)
[1] "Subject" "ASE"     "ASD"     "AFD"

Or another option is trimws from base R

trimws(colnames(data), whitespace = "\\s+\\(.*")
[1] "Subject" "ASE"     "ASD"     "AFD"

In the OP's code, the pattern matches an uppercase letter followed by a space, but the ( is a regex metacharacter and is not escaped, so ([0-9]+) is treated as a capture group of digits rather than as literal parentheses. This does not match the column names, because the space is followed by a literal (, not a digit, so gsub returns the strings unchanged

gsub("[A-Z] ([0-9]+)","",colnames(data))
[1] "Subject"   "ASE (232)" "ASD (121)" "AFD (313)"
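To make the difference concrete, here is a small sketch comparing the unescaped and escaped patterns. The vector `nms` copies the column names of the example data; the string "ASE 232" is a made-up input that the unescaped pattern does match.

```r
nms <- c("Subject", "ASE (232)", "ASD (121)", "AFD (313)")

# Unescaped: ( ) form a capture group, so the pattern means "letter, space, digits"
unescaped <- gsub("[A-Z] ([0-9]+)", "", nms)
unescaped
#> [1] "Subject"   "ASE (232)" "ASD (121)" "AFD (313)"

# The unescaped pattern only matches when digits follow the space directly
no_parens <- gsub("[A-Z] ([0-9]+)", "", "ASE 232")
no_parens
#> [1] "AS"

# Escaped: \\( and \\) match the literal parentheses around the digits
cleaned <- gsub(" \\([0-9]+\\)", "", nms)
cleaned
#> [1] "Subject" "ASE"     "ASD"     "AFD"
```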

data

data <- structure(list(Subject = 1L, `ASE (232)` = "1.1.", `ASD (121)` = 1.2, 
    `AFD (313)` = 1.3), class = "data.frame", row.names = c(NA, 
-1L))

How to apply gsub or similar to change column names, but only if the column name contains a specific word

You can use

colnames(a) <- sub(".*CSF-([^._]*).*", "\\1", colnames(a))

Column names that do not contain CSF- are returned unchanged, because sub leaves non-matching strings as they are. Details:

.* - any zero or more chars as many as possible
CSF- - CSF- text
([^._]*) - capturing group 1 (\1 refers to the group value from the replacement pattern): any zero or more chars other than . and _
.* - the rest of the string.
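The breakdown above can be tried on a few made-up names. The vector `nms` below is hypothetical (the original column names are not shown); it mixes names with and without the CSF- marker to show that only the former change.

```r
# Hypothetical column names, some containing "CSF-" and some not
nms <- c("ID", "sample.CSF-ABC_01", "x_CSF-TNF.raw", "Age")

# Keep only the text after "CSF-" up to the next "." or "_"
renamed <- sub(".*CSF-([^._]*).*", "\\1", nms)
renamed
#> [1] "ID"  "ABC" "TNF" "Age"
```

"ID" and "Age" do not contain CSF-, so the pattern fails to match and sub returns them as they were.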

How to edit all column names to replace a certain character in R?

str_replace is vectorized, so there is no need to loop over the column names. Also, lapply returns a list rather than a vector, and the function was not passed to lapply as a lambda (when a function is passed with named arguments, it does not need the trailing ()).

library(stringr)
names(exampleData) <- str_replace(names(exampleData), "-", "_")
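One thing to watch: str_replace changes only the first match in each string, while str_replace_all changes every match. The base R pair sub/gsub behaves the same way, which the sketch below uses so that it runs without extra packages. The names in `nms` are hypothetical.

```r
# Hypothetical column names; the first contains two hyphens
nms <- c("first-name-raw", "zip-code", "id")

# sub replaces only the first hyphen (like str_replace)
first_only <- sub("-", "_", nms, fixed = TRUE)
first_only
#> [1] "first_name-raw" "zip_code"       "id"

# gsub replaces every hyphen (like str_replace_all)
all_matches <- gsub("-", "_", nms, fixed = TRUE)
all_matches
#> [1] "first_name_raw" "zip_code"       "id"
```

If column names can contain the character more than once, the "all" variant is usually the one you want.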

Or use clean_names from janitor

library(janitor)
exampleData <- exampleData %>%
  clean_names()

How can I remove certain characters from column headers in R?

We can use sub to match the . (metacharacter - so escape) followed by one or more digits (\\d+) at the end ($) of the string and replace with blank ("")

names(df) <- sub("\\.\\d+$", "", names(df))

NOTE: Stripping the suffix can leave several columns with the same name. Duplicate column names in a data.frame are not recommended
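A short sketch of both points, using made-up names: the pattern removes only a trailing dot-plus-digits suffix, and it can produce duplicates when two names differ only in that suffix.

```r
# Hypothetical names; only a trailing ".<digits>" is removed
nms <- c("height.1", "weight.2", "v1.0.3", "id")
stripped <- sub("\\.\\d+$", "", nms)
stripped
#> [1] "height" "weight" "v1.0"   "id"

# Names that differ only in the suffix collapse to the same value
dup <- sub("\\.\\d+$", "", c("x.1", "x.2"))
dup
#> [1] "x" "x"

# anyDuplicated returns a non-zero index when a duplicate exists
has_dups <- anyDuplicated(dup) > 0
has_dups
#> [1] TRUE
```

Checking with anyDuplicated before assigning the new names is one way to catch this early.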

gsub not working on colnames?

Note that df[, -1] returns all rows and every column except the first. To modify the column names, use colnames(df).

To replace the first literal space with a dot, use

colnames(df) <- sub(" ", ".", colnames(df), fixed=TRUE)

If there can be more than one whitespace, use a regex:

colnames(df) <- sub("\\s+", ".", colnames(df))

If you need to replace every sequence of whitespace in the column names with a single dot, use gsub:

colnames(df) <- gsub("\\s+", ".", colnames(df))
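The three variants differ only in how much they replace. The sketch below applies each to the same hypothetical column names (one of them has a double space) without modifying `df`, so the results can be compared side by side.

```r
# Hypothetical data frame; note the two spaces between "First" and "Name"
df <- data.frame(a = 1, b = 2)
colnames(df) <- c("First  Name Here", "Zip Code")

# First literal space only
first_literal <- sub(" ", ".", colnames(df), fixed = TRUE)
first_literal
#> [1] "First. Name Here" "Zip.Code"

# First run of whitespace only
first_run <- sub("\\s+", ".", colnames(df))
first_run
#> [1] "First.Name Here" "Zip.Code"

# Every run of whitespace
all_runs <- gsub("\\s+", ".", colnames(df))
all_runs
#> [1] "First.Name.Here" "Zip.Code"
```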

gsub() not working if I reference a column using a character vector?

gsub is being given a vector of strings, and it does what it knows: works on the strings. It doesn't know that they should be an indirect reference. (Nothing will know that it should be indirect.)

You have two options:

The canonical way in data.table for this is likely to use .SDcols.

preferences[, (cols) := lapply(.SD, gsub, pattern = "UN1", replacement = "A"), .SDcols = cols]
preferences
#                                      Pref_1
#                                      <char>
#  1:                                       A
#  2: Food and Agriculture Organization (F...
#  3: United Nations Educational, Scientif...
#  4: United Nations Development Programme...
#  5:      Commission on Narcotic Drugs (CND)
#  6:      Commission on Narcotic Drugs (CND)
#  7:              Human Rights Council (HRC)
#  8:                                       A
#  9:              Human Rights Council (HRC)
# 10:                                       A

This does two things: (i) the use of .SDcols for iterating over a dynamic set of columns is preferred and faster, and allows programmatic determination of those columns (what you need); (ii) using lapply allows you to do this to one or more columns. If you know you'll always do just one column, this still works well with very little overhead.

You can get/mget the data. This is the way to tell something to grab the contents of a variable identified in a string vector.
If you know that you will always have exactly one column, then you can use get:
```
preferences[, (cols) := gsub(get(cols), pattern = "UN1", replacement = "A")]
```
If there is even a chance that you'll have more than one, I strongly recommend mget. (Even if you think you'll always have one, this is still safe.)
```
preferences[, (cols) := lapply(mget(cols), gsub, pattern = "UN1", replacement = "A")]
```
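The same distinction between a column name and the column's contents shows up with a plain data.frame. The sketch below is a base R analogue, not the data.table method above: `preferences_df` is a hypothetical, shortened copy of the data, and the indirect lookup is done with `[` indexing instead of .SDcols or mget.

```r
# Hypothetical, shortened data.frame version of the example data
preferences_df <- data.frame(
  Pref_1 = c("UN1", "Human Rights Council (HRC)", "UN1"),
  stringsAsFactors = FALSE
)
cols <- "Pref_1"

# gsub on the character vector works on the name itself, not on the column
on_name <- gsub("UN1", "A", cols)
on_name
#> [1] "Pref_1"

# Index the data by name first, then apply gsub to each selected column
preferences_df[cols] <- lapply(preferences_df[cols], gsub,
                               pattern = "UN1", replacement = "A")
preferences_df$Pref_1
#> [1] "A"                          "Human Rights Council (HRC)"
#> [3] "A"
```

Because `cols` is used only for indexing, the same code works unchanged when it holds more than one column name.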

Data

library(data.table)
preferences <- setDT(structure(list(Pref_1 = c("UN1", "Food and Agriculture Organization (FAO)", "United Nations Educational, Scientific and Cultural Organization (UNESCO)", "United Nations Development Programme (UNDP)", "Commission on Narcotic Drugs (CND)", "Commission on Narcotic Drugs (CND)", "Human Rights Council (HRC)", "UN1", "Human Rights Council (HRC)", "UN1")), class = c("data.table", "data.frame"), row.names = c(NA, -10L)))
cols <- "Pref_1"

How to replace 2nd character of a string in a column in R

A naïve approach:

df[, 2] <- paste0(substr(df[, 2], 1, 1), df[, 1], substr(df[, 2], 3, 3))
df
#       Type    Subtype  
# [1,] "[C>A]" "A[C>A]A"
# [2,] "[C>G]" "A[C>G]T"
# [3,] "[C>T]" "A[C>T]C"
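The input is not shown above, so the sketch below assumes one: a character matrix whose Subtype column holds three-letter strings. Under that assumption, the paste0/substr line keeps the first and third characters and puts the Type value in place of the second.

```r
# Assumed input (hypothetical): a character matrix with three-letter subtypes
df <- cbind(
  Type    = c("[C>A]", "[C>G]", "[C>T]"),
  Subtype = c("ACA", "ACT", "ACC")
)

# first character + Type + third character
df[, 2] <- paste0(substr(df[, 2], 1, 1), df[, 1], substr(df[, 2], 3, 3))
df[, 2]
#> [1] "A[C>A]A" "A[C>G]T" "A[C>T]C"
```

substr(x, 3, 3) keeps only the third character, so this works as written only for three-character strings; for longer strings the tail would need substr(x, 3, nchar(x)) instead.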
