Replace characters in column names gsub
Rich Scriven had the answer:
Define
colClean <- function(x){ colnames(x) <- gsub("\\.\\.+", ".", colnames(x)); x }
and then do
a <- colClean(a)
to update a.
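As a quick check of the helper above, here is a minimal sketch on a made-up data frame (the column names are hypothetical, chosen only to contain runs of dots):

```r
# colClean as defined in the answer: collapse runs of two or more dots into one
colClean <- function(x) {
  colnames(x) <- gsub("\\.\\.+", ".", colnames(x))
  x
}

# Hypothetical input with doubled and tripled dots in the names
a <- data.frame(1, 2, 3)
colnames(a) <- c("x..y", "p...q", "r.s")

a <- colClean(a)
colnames(a)
# [1] "x.y" "p.q" "r.s"
```

Names that contain only single dots, such as r.s, are left untouched.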
Replacing entire column name with sub or gsub
Here's a tidyverse solution using rename_with and the selection helper starts_with.
library(tidyverse)
wide_data <- data.frame(
`NominalYieldCurve.SpotRate(Govt,10,5)` = 1,
CreditModel.SpotSpread123 = 1
)
wide_data %>%
rename_with(~"Spot_Rate", .cols = starts_with("NominalYieldCurve.SpotRate")) %>%
rename_with(~"Spot_Spread", .cols = starts_with("CreditModel.SpotSpread"))
#> Spot_Rate Spot_Spread
#> 1 1 1
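If loading the tidyverse is not an option, roughly the same renaming can be sketched in base R with startsWith. This is an alternative to the answer's approach, not part of it; check.names = FALSE is added here as an assumption so the original name with parentheses is kept verbatim.

```r
# Same two columns as above; check.names = FALSE keeps the name as typed
wide_data <- data.frame(
  `NominalYieldCurve.SpotRate(Govt,10,5)` = 1,
  CreditModel.SpotSpread123 = 1,
  check.names = FALSE
)

nms <- names(wide_data)
# Replace the whole name wherever it starts with the given prefix
nms[startsWith(nms, "NominalYieldCurve.SpotRate")] <- "Spot_Rate"
nms[startsWith(nms, "CreditModel.SpotSpread")] <- "Spot_Spread"
names(wide_data) <- nms

names(wide_data)
# [1] "Spot_Rate"   "Spot_Spread"
```

Like the rename_with version, this assumes each prefix matches exactly one column; if a prefix matched several columns they would all receive the same name.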
removing numbers and characters from column names r
We may change the code to match one or more spaces (\\s+) followed by the opening parenthesis (\\(), one or more digits (\\d+) and any other characters (.*), and replace the match with blank ("").
colnames(data) <- sub("\\s+\\(\\d+.*", "", colnames(data))
colnames(data)
[1] "Subject" "ASE" "ASD" "AFD"
Another option is trimws from base R:
trimws(colnames(data), whitespace = "\\s+\\(.*")
[1] "Subject" "ASE" "ASD" "AFD"
In the OP's code, the pattern matches an upper case letter followed by a space, but the ( is a regex metacharacter and is not escaped, so in regex mode it opens a capturing group around the digits (([0-9]+)). This doesn't match the column names, because after the space there is a literal (, which the pattern does not account for; thus gsub returns the strings unchanged.
gsub("[A-Z] ([0-9]+)","",colnames(data))
[1] "Subject" "ASE (232)" "ASD (121)" "AFD (313)"
data
data <- structure(list(Subject = 1L, `ASE (232)` = "1.1.", `ASD (121)` = 1.2,
`AFD (313)` = 1.3), class = "data.frame", row.names = c(NA,
-1L))
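To make the difference concrete, the sketch below runs the unescaped pattern and two escaped variants against the same column names as plain strings. The third pattern is an illustrative variant that is not in the answer.

```r
nms <- c("Subject", "ASE (232)", "ASD (121)", "AFD (313)")

# Unescaped "(" opens a capturing group, so the pattern expects
# "letter, space, digits" and never sees the literal parenthesis
unescaped <- gsub("[A-Z] ([0-9]+)", "", nms)
unescaped
# [1] "Subject"   "ASE (232)" "ASD (121)" "AFD (313)"

# Escaped version from the answer: whitespace, literal "(", digits, rest
escaped <- sub("\\s+\\(\\d+.*", "", nms)
escaped
# [1] "Subject" "ASE"     "ASD"     "AFD"

# Illustrative stricter variant: also require the closing parenthesis
strict <- gsub(" \\([0-9]+\\)", "", nms)
strict
# [1] "Subject" "ASE"     "ASD"     "AFD"
```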
How to apply gsub or similar to change column names but only if column name contain specific word
You can use
colnames(a) <- sub(".*CSF-([^._]*).*", "\\1", colnames(a))
See the regex demo. Details:
.* - any zero or more chars, as many as possible
CSF- - the literal text CSF-
([^._]*) - capturing group 1 (\1 in the replacement pattern refers to this group's value): any zero or more chars other than . and _
.* - the rest of the string.
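Because sub returns its input unchanged when the pattern does not match, names without CSF- are left alone. A minimal sketch with made-up column names (all four are hypothetical):

```r
# Hypothetical names: two contain "CSF-", two do not
nms <- c("ID", "Sample_CSF-AB12_norm", "X.CSF-TNF.raw", "Age")

renamed <- sub(".*CSF-([^._]*).*", "\\1", nms)
renamed
# [1] "ID"   "AB12" "TNF"  "Age"
```

The capture stops at the first . or _ after CSF-, which is why the _norm and .raw suffixes are dropped.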
How to edit all column names to replace a certain character in R?
str_replace is vectorized, so there is no need to loop over the column names. Also, the output of lapply is a list and not a vector, and there is no lambda call in the lapply (if we are passing named arguments, a lambda is not needed).
library(stringr)
names(exampleData) <- str_replace(names(exampleData), "-", "_")
Or use clean_names from janitor:
library(janitor)
exampleData <- exampleData %>%
clean_names()
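One thing worth checking: str_replace, like base sub, replaces only the first match in each string, while str_replace_all, like base gsub, replaces every match. The sketch below uses the base R pair so it runs without extra packages; the column names are hypothetical.

```r
# Hypothetical names, one of them with two hyphens
exampleData <- data.frame(`a-b-c` = 1, `d-e` = 2, check.names = FALSE)

# First hyphen only (the behaviour of sub and str_replace)
first_only <- sub("-", "_", names(exampleData), fixed = TRUE)
first_only
# [1] "a_b-c" "d_e"

# Every hyphen (the behaviour of gsub and str_replace_all)
all_hyphens <- gsub("-", "_", names(exampleData), fixed = TRUE)
all_hyphens
# [1] "a_b_c" "d_e"

names(exampleData) <- all_hyphens
```

If each column name contains at most one hyphen, both forms give the same result.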
How can I remove certain characters from column headers in R?
We can use sub to match the . (a metacharacter, so escape it) followed by one or more digits (\\d+) at the end ($) of the string and replace with blank ("").
names(df) <- sub("\\.\\d+$", "", names(df))
NOTE: If the data is a data.frame, duplicate column names are not recommended, and stripping the numeric suffix can create them.
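A small sketch of both the replacement and the duplicate-name caveat, using a made-up data frame (the names are hypothetical). It checks for duplicates before assigning the new names:

```r
# Hypothetical columns; a.1 and a.2 collapse to the same name
df <- data.frame(a.1 = 1, a.2 = 2, b.10 = 3, c = 4)

new_names <- sub("\\.\\d+$", "", names(df))
new_names
# [1] "a" "a" "b" "c"

# anyDuplicated returns the index of the first duplicate, or 0 if none
dup_index <- anyDuplicated(new_names)
dup_index
# [1] 2

# Only assign when the stripped names are still unique
if (dup_index == 0) names(df) <- new_names
```

Here the assignment is skipped, so df keeps its original names.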
gsub not working on colnames?
Note that df[, -1] gets you all rows and columns except the first column (see this reference). In order to modify the column names you should use colnames(df).
To replace the first literal space with a dot, use
colnames(df) <- sub(" ", ".", colnames(df), fixed=TRUE)
If there can be more than one whitespace, use a regex:
colnames(df) <- sub("\\s+", ".", colnames(df))
If you need to replace every whitespace sequence with a single dot in the column names, use gsub:
colnames(df) <- gsub("\\s+", ".", colnames(df))
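The three variants differ only on names with more than one space. A minimal sketch on hypothetical column names (the second one has two consecutive spaces followed by a single one):

```r
df <- data.frame(1, 2, 3)
colnames(df) <- c("first name", "a  b c", "id")

# First literal space only
fixed_first <- sub(" ", ".", colnames(df), fixed = TRUE)
fixed_first
# [1] "first.name" "a. b c"     "id"

# First run of whitespace only
regex_first <- sub("\\s+", ".", colnames(df))
regex_first
# [1] "first.name" "a.b c"      "id"

# Every run of whitespace
regex_all <- gsub("\\s+", ".", colnames(df))
regex_all
# [1] "first.name" "a.b.c"      "id"
```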
gsub() not working if I reference a column using a character vector?
gsub is being given a vector of strings, and it does what it knows: works on the strings. It doesn't know that they should be an indirect reference. (Nothing will know that it should be indirect.)
You have two options:
1. The canonical way in data.table for this is likely to use .SDcols.

preferences[, (cols) := lapply(.SD, gsub, pattern = "UN1", replacement = "A"), .SDcols = cols]
preferences
# Pref_1
# <char>
# 1: A
# 2: Food and Agriculture Organization (F...
# 3: United Nations Educational, Scientif...
# 4: United Nations Development Programme...
# 5: Commission on Narcotic Drugs (CND)
# 6: Commission on Narcotic Drugs (CND)
# 7: Human Rights Council (HRC)
# 8: A
# 9: Human Rights Council (HRC)
# 10: A

This does two things: (i) the use of .SDcols for iterating over a dynamic set of columns is preferred and faster, and allows programmatic determination of those columns (what you need); (ii) using lapply allows you to do this to one or more columns. If you know you'll always do just one column, this still works well with very little overhead.

2. You can get/mget the data. This is the way to tell something to grab the contents of a variable identified in a string vector.

If you know that you will always have exactly one column, then you can use get:

preferences[, (cols) := gsub(get(cols), pattern = "UN1", replacement = "A")]

If there is even a chance that you'll have more than one, I strongly recommend mget. (Even if you think you'll always have one, this is still safe.)

preferences[, (cols) := lapply(mget(cols), gsub, pattern = "UN1", replacement = "A")]
Data
preferences <- setDT(structure(list(Pref_1 = c("UN1", "Food and Agriculture Organization (FAO)", "United Nations Educational, Scientific and Cultural Organization (UNESCO)", "United Nations Development Programme (UNDP)", "Commission on Narcotic Drugs (CND)", "Commission on Narcotic Drugs (CND)", "Human Rights Council (HRC)", "UN1", "Human Rights Council (HRC)", "UN1")), class = c("data.table", "data.frame"), row.names = c(NA, -10L)))
cols <- "Pref_1"
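The same distinction between a column name and a column's contents can be sketched with a plain data.frame in base R. This is an analogue for illustration, not the data.table code above, and preferences_df is a shortened, hypothetical copy of the data.

```r
# Shortened, hypothetical copy of the data as a plain data.frame
preferences_df <- data.frame(
  Pref_1 = c("UN1", "Human Rights Council (HRC)", "UN1"),
  stringsAsFactors = FALSE
)
cols <- "Pref_1"

# gsub on the character vector works on the name itself, not the column
on_name <- gsub("UN1", "A", cols)
on_name
# [1] "Pref_1"

# Indexing the data frame by cols reaches the column contents instead
preferences_df[cols] <- lapply(preferences_df[cols], gsub,
                               pattern = "UN1", replacement = "A")
preferences_df$Pref_1
# [1] "A"                          "Human Rights Council (HRC)" "A"
```

Because cols is used only as an index, the same two lines work unchanged if cols holds several column names.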
How to replace 2nd character of a string in a column in r
A naïve approach:
df[, 2] <- paste0(substr(df[, 2], 1, 1), df[, 1], substr(df[, 2], 3, 3))
df
# Type Subtype
# [1,] "[C>A]" "A[C>A]A"
# [2,] "[C>G]" "A[C>G]T"
# [3,] "[C>T]" "A[C>T]C"
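The answer does not show its input, so the sketch below assumes a character matrix whose Subtype values are three-letter strings (ACA, ACT, ACC are guesses that reproduce the output above). It also adds a hypothetical helper, replace_second, that keeps everything after the second character rather than only the third.

```r
# Assumed input: a character matrix with three-letter subtypes
df <- cbind(Type = c("[C>A]", "[C>G]", "[C>T]"),
            Subtype = c("ACA", "ACT", "ACC"))

# First character, then the Type string, then the third character
df[, 2] <- paste0(substr(df[, 2], 1, 1), df[, 1], substr(df[, 2], 3, 3))
df[, "Subtype"]
# [1] "A[C>A]A" "A[C>G]T" "A[C>T]C"

# Hypothetical helper: replace the 2nd character and keep the whole tail
replace_second <- function(x, insert) {
  paste0(substr(x, 1, 1), insert, substring(x, 3))
}
longer <- replace_second("ACGT", "[C>A]")
longer
# [1] "A[C>A]GT"
```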