Remove parts of column name after certain characters in R
We can use sub
sub("_3.*", "", df1[,1])
#[1] "col1" "col2" "col3"
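The same pattern can be applied to the column names themselves rather than to the values of a column. The following is a minimal sketch; the data frame and its column names are hypothetical and chosen only so that the pattern "_3.*" has something to match.

```r
# Hypothetical data frame whose names carry a "_3..." suffix
df1 <- data.frame(col1_3a = 1, col2_3b = 2, col3_3c = 3)

# Drop "_3" and everything after it from each column name
names(df1) <- sub("_3.*", "", names(df1))
names(df1)
#[1] "col1" "col2" "col3"
```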
Remove character for all column names in a data frame
If we only need to remove 'v', the pattern for one or more digits (\\d+) at the end ($) is not needed, as the expected output also removes 'v' from the first column 'q_ve5'
library(dplyr)
library(stringr)
df %>%
rename_with(~ str_remove(., "v"), everything())
-output
# A tibble: 2 × 5
q_e5 q_f_1 q_f_2 q_e6 q_e8
<int> <int> <int> <int> <int>
1 1 3 3 5 5
2 2 4 4 6 6
Or without any packages
names(df) <- sub("v", "", names(df))
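Note that sub (like str_remove) only removes the first match in each name. If a name could contain several 'v' characters, gsub (or str_remove_all) removes all of them. A small sketch with made-up names, the third of which is hypothetical and included only to show the difference:

```r
# Made-up names; "v_vv" is there only to contrast sub and gsub
nms <- c("q_ve5", "q_vf_1", "v_vv")

first_only <- sub("v", "", nms)   # removes the first "v" in each name
all_v      <- gsub("v", "", nms)  # removes every "v" in each name

first_only
#[1] "q_e5"  "q_f_1" "_vv"
all_v
#[1] "q_e5"  "q_f_1" "_"
```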
Removing numbers and characters from column names in R
We may change the code to match one or more spaces (\\s+) followed by an opening parenthesis (\\(), one or more digits (\\d+) and any other characters (.*), and replace the match with blank ("")
colnames(data) <- sub("\\s+\\(\\d+.*", "", colnames(data))
colnames(data)
[1] "Subject" "ASE" "ASD" "AFD"
Or another option is trimws from base R
trimws(colnames(data), whitespace = "\\s+\\(.*")
[1] "Subject" "ASE" "ASD" "AFD"
In the OP's code, the pattern matches an uppercase letter followed by a space, and the ( is a metacharacter that is not escaped, so in regex mode it opens a capture group around the digits (([0-9]+)). This does not match the column names, because after the space there is a literal (, which the pattern does not account for, so the same string is returned
gsub("[A-Z] ([0-9]+)","",colnames(data))
[1] "Subject" "ASE (232)" "ASD (121)" "AFD (313)"
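As a sketch of the fix this explanation implies, escaping both parentheses lets gsub match the literal " (232)" style suffix. The vector nms just copies the column names from the example; the exact pattern is one possible choice, not necessarily the only one.

```r
# Column names copied from the example data
nms <- c("Subject", "ASE (232)", "ASD (121)", "AFD (313)")

# Unescaped parentheses form a capture group, so nothing matches
unchanged <- gsub("[A-Z] ([0-9]+)", "", nms)

# Escaped parentheses match the literal "(" and ")"
cleaned <- gsub(" \\([0-9]+\\)", "", nms)
cleaned
#[1] "Subject" "ASE"     "ASD"     "AFD"
```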
data
data <- structure(list(Subject = 1L, `ASE (232)` = "1.1.", `ASD (121)` = 1.2,
`AFD (313)` = 1.3), class = "data.frame", row.names = c(NA,
-1L))
How can I remove certain characters from column headers in R?
We can use sub to match the . (a metacharacter, so escape it) followed by one or more digits (\\d+) at the end ($) of the string and replace with blank ("")
names(df) <- sub("\\.\\d+$", "", names(df))
NOTE: If the data is a data.frame, duplicate column names are not recommended (data.frame() makes names unique by default), so check the result after stripping the suffix
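To illustrate the note, here is a small sketch on a hypothetical vector of names. It shows that only a trailing ".digits" suffix is removed, and one way to detect and repair duplicates afterwards with base R's anyDuplicated and make.unique.

```r
# Hypothetical column names with numeric suffixes
nms <- c("height.1", "height.2", "weight.1", "v1.5.10")

stripped <- sub("\\.\\d+$", "", nms)
stripped
#[1] "height" "height" "weight" "v1.5"

# Index of the first duplicated name (0 would mean no duplicates)
anyDuplicated(stripped)
#[1] 2

# One way to make the names unique again
fixed_names <- make.unique(stripped)
fixed_names
#[1] "height"   "height.1" "weight"   "v1.5"
```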
How to remove part of characters in data frame column
There are multiple ways of doing this:
- Using as.numeric on a column of your choice.
raw$Zipcode <- as.numeric(raw$Zipcode)
- If you want it to be a character, then you can use the stringr package.
library(stringr)
raw$Zipcode <- str_replace(raw$Zipcode, "^0+" ,"")
- There is another function called str_remove in the stringr package.
raw$Zipcode <- str_remove(raw$Zipcode, "^0+")
- You can also use sub from base R.
raw$Zipcode <- sub("^0+", "", raw$Zipcode)
But if you want to remove exactly n leading zeroes, replace + with {n}.
For instance, to remove two 0's use sub("^0{2}", "", raw$Zipcode).
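The options above behave differently on edge cases, which a short sketch makes visible. The zip vector is hypothetical sample data; note that as.numeric changes the type, while sub keeps character values.

```r
# Hypothetical zip codes stored as character
zip <- c("00123", "01234", "12345", "00000")

any_zeros <- sub("^0+", "", zip)     # strip all leading zeroes
two_zeros <- sub("^0{2}", "", zip)   # strip exactly two leading zeroes
as_number <- as.numeric(zip)         # converts to numeric instead

any_zeros
#[1] "123"   "1234"  "12345" ""
two_zeros
#[1] "123"   "01234" "12345" "000"
as_number
#[1]   123  1234 12345     0
```

An all-zero code becomes an empty string with "^0+", which may or may not be what you want.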
How to remove '.' from column names in a dataframe?
1) sqldf can deal with names having dots in them if you quote the names:
library(sqldf)
d0 <- read.csv(text = "A.B,C.D\n1,2")
sqldf('select "A.B", "C.D" from d0')
giving:
A.B C.D
1 1 2
2) When reading the data using read.table or read.csv use the check.names=FALSE argument.
Compare:
Lines <- "A B,C D
1,2
3,4"
read.csv(text = Lines)
## A.B C.D
## 1 1 2
## 2 3 4
read.csv(text = Lines, check.names = FALSE)
## A B C D
## 1 1 2
## 2 3 4
However, in this example it still leaves names that would have to be quoted in sqldf, since they contain embedded spaces.
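With check.names = TRUE, read.csv adjusts the names through base R's make.names, which is where the dots come from. A minimal sketch on a hypothetical vector of raw headers shows the conversion directly:

```r
# Hypothetical raw headers as they might appear in a file
raw_names <- c("A B", "C D", "1st", "ok_name")

# Invalid characters become ".", and names starting with a digit get an "X" prefix
valid <- make.names(raw_names)
valid
#[1] "A.B"     "C.D"     "X1st"    "ok_name"
```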
3) To simply remove the periods, if DF is a data frame:
names(DF) <- gsub(".", "", names(DF), fixed = TRUE)
or it might be nicer to convert the periods to underscores so that it is reversible:
names(DF) <- gsub(".", "_", names(DF), fixed = TRUE)
This last line could be alternatively done like this:
names(DF) <- chartr(".", "_", names(DF))
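The fixed = TRUE argument matters here because an unescaped "." is a regex wildcard. The sketch below, on a hypothetical pair of names, contrasts the variants:

```r
nms <- c("A.B", "C.D")  # hypothetical names containing periods

wildcard <- gsub(".", "", nms)                # "." matches every character
literal  <- gsub(".", "", nms, fixed = TRUE)  # "." treated as a literal period
escaped  <- gsub("\\.", "_", nms)             # escaped period in regex mode
mapped   <- chartr(".", "_", nms)             # character-for-character translation

wildcard
#[1] "" ""
literal
#[1] "AB" "CD"
escaped
#[1] "A_B" "C_D"
mapped
#[1] "A_B" "C_D"
```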
Remove specific characters from column names in R
Another option is to use strsplit:
sapply(strsplit(strings, "\\."), function(x)
paste0(x[c(2, 4)], collapse = "."))
[1] "loc1.tret1" "loc2.tret2" "loc100.tret100"
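Two variations on the same idea, offered as a sketch rather than as the answer's own method: vapply makes the return type explicit, and a single sub call with back-references avoids splitting altogether. Both assume every string has at least five period-separated fields, as in the sample data.

```r
strings <- c("drop.loc1.genom1.tret1.gwas2.a",
             "drop.loc2.genom1.tret2.gwas2.a",
             "drop.loc100.genom1.tret100.gwas2.a")

# Split on a literal period and keep fields 2 and 4
parts <- strsplit(strings, ".", fixed = TRUE)
by_split <- vapply(parts, function(x) paste(x[c(2, 4)], collapse = "."),
                   character(1))

# Capture fields 2 and 4 with a regex and rebuild the string
by_regex <- sub("^[^.]+\\.([^.]+)\\.[^.]+\\.([^.]+)\\..*$", "\\1.\\2", strings)

by_split
#[1] "loc1.tret1"     "loc2.tret2"     "loc100.tret100"
```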
Sample data
(From ManuelBickel's answer)
strings = c("drop.loc1.genom1.tret1.gwas2.a",
"drop.loc2.genom1.tret2.gwas2.a",
"drop.loc100.genom1.tret100.gwas2.a")
Removing characters in column titles after .
You can escape the period like this: \\.
x <- "ENSG00000124564.16"
sub("\\..*", "", x)
#[1] "ENSG00000124564"
update:
## it also works on a vector of strings
x <- c("ENSG00000124564.16", "ENSG00000257509.1")
sub("\\..*", "", x)
# [1] "ENSG00000124564" "ENSG00000257509"
## it also works for changing the column names
df <- data.frame(ENSG00000124564.16 = c(1, 2, 3), ENSG00000257509.1 = c(1, 1, 1))
names(df) <- sub("\\..*", "", names(df))
# ENSG00000124564 ENSG00000257509
#1 1 1
#2 2 1
#3 3 1
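Because .* is greedy and starts at the first period, the pattern above removes everything from the first period onward. If a name may contain several periods and only the last suffix should go, anchoring on the final period is one alternative. A sketch, where "a.b.c" is a hypothetical name added for contrast:

```r
x <- c("ENSG00000124564.16", "a.b.c")  # second value is hypothetical

from_first <- sub("\\..*", "", x)      # cut at the first period
from_last  <- sub("\\.[^.]*$", "", x)  # cut at the last period only

from_first
#[1] "ENSG00000124564" "a"
from_last
#[1] "ENSG00000124564" "a.b"
```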