Remove parts of column name after certain characters
We can use sub
sub("_3.*", "", df1[,1])
#[1] "col1" "col2" "col3"
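The data behind df1 is not shown, so here is a minimal sketch with a made-up data frame (the column names and values are assumptions for illustration) where the first column carries a "_3..." suffix:

```r
# Hypothetical data: the first column holds labels followed by "_3" and more text
df1 <- data.frame(id = c("col1_3a", "col2_3b", "col3_3c"),
                  value = 1:3,
                  stringsAsFactors = FALSE)

# Drop "_3" and everything after it in each element of the first column
cleaned <- sub("_3.*", "", df1[, 1])
cleaned
# "col1" "col2" "col3"
```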
Removing characters in column titles after .
You can escape the period like this: \\.
x <- "ENSG00000124564.16"
sub("\\..*", "", x)
#[1] "ENSG00000124564"
update:
## it also works on a vector of strings
x <- c("ENSG00000124564.16", "ENSG00000257509.1")
sub("\\..*", "", x)
# [1] "ENSG00000124564" "ENSG00000257509"
## it also works for changing column names
df <- data.frame(ENSG00000124564.16 = c(1, 2, 3), ENSG00000257509.1 = c(1, 1, 1))
names(df) <- sub("\\..*", "", names(df))
# ENSG00000124564 ENSG00000257509
#1 1 1
#2 2 1
#3 3 1
How can I remove certain characters from column headers in R?
We can use sub to match the . (a metacharacter, so escape it) followed by one or more digits (\\d+) at the end ($) of the string and replace it with blank (""):
names(df) <- sub("\\.\\d+$", "", names(df))
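NOTE: If the data is a data.frame, duplicate column names are not recommended (data.frame() itself makes names unique by default via check.names = TRUE), so check that the shortened names are still unique.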
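As a rough illustration of the note, the sketch below strips a trailing version suffix and then checks whether any shortened names collide. The name vector is made up, and the make.unique() step is only one possible way to handle collisions, not something the answer prescribes:

```r
# Hypothetical names: two versions of the same ID plus a name whose
# suffix after the dot is not numeric
nm <- c("ENSG00000124564.16", "ENSG00000124564.2", "gene.name")

# Strip only a trailing ".<digits>"; "gene.name" is left untouched
short <- sub("\\.\\d+$", "", nm)
short
# "ENSG00000124564" "ENSG00000124564" "gene.name"

# anyDuplicated() returns 0 when all names are unique
has_dups <- anyDuplicated(short) > 0
has_dups
# TRUE

# One option (an assumption): disambiguate repeats with a "_<n>" suffix
unique_names <- make.unique(short, sep = "_")
unique_names
# "ENSG00000124564" "ENSG00000124564_1" "gene.name"
```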
Remove part of string from column names
We can use sub
names(df1)[-1] <- sub(".*\\.", "", names(df1)[-1])
If we need the . as well, replace with . instead:
names(df1)[-1] <- sub(".*\\.", ".", names(df1)[-1])
To match the pattern exactly, we can also match zero or more characters that are not a dot ([^.]*) from the start (^) of the string followed by a dot (\\. - escape the dot as it is a metacharacter implying any character) and replace it with blank ("")
sub("^[^.]*\\.", "", names(df1)[-1])
#[1] "Gr_1" "Gr_10" "Gr_11" "Gr_12" "Gr_13" "Gr_14" "Gr_15" "Gr_16"
#[9] "Gr_17" "Gr_18" "Gr_19" "Gr_2" "Gr_20" "Gr_21"
As 'ToRemove' is already mentioned above, we can also match it as a fixed string:
sub("ToRemove.", "", names(df1)[-1], fixed = TRUE)
Also, if we need to remove all characters from the first . onward:
sub("\\..*", "", names(df1)[-1])
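To see how these patterns differ, here is a small side-by-side sketch. The names in nm are assumed (the original df1 is not shown), and the multi-dot string is added only to show where the greedy and the anchored patterns diverge:

```r
# Hypothetical column names with a prefix before the dot
nm <- c("ToRemove.Gr_1", "ToRemove.Gr_10")

after_last_dot  <- sub(".*\\.", "", nm)                      # "Gr_1" "Gr_10"
keep_dot        <- sub(".*\\.", ".", nm)                     # ".Gr_1" ".Gr_10"
after_first_dot <- sub("^[^.]*\\.", "", nm)                  # "Gr_1" "Gr_10"
fixed_prefix    <- sub("ToRemove.", "", nm, fixed = TRUE)    # "Gr_1" "Gr_10"
before_dot      <- sub("\\..*", "", nm)                      # "ToRemove" "ToRemove"

# With more than one dot the two "prefix" patterns give different results:
# the greedy .* runs up to the LAST dot, [^.]* stops at the FIRST one
greedy   <- sub(".*\\.", "", "a.b.c")      # "c"
anchored <- sub("^[^.]*\\.", "", "a.b.c")  # "b.c"
```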
How to remove part of characters in data frame column
There are multiple ways of doing this:
- Using as.numeric on a column of your choice (the column becomes numeric, so leading zeros are dropped).
raw$Zipcode <- as.numeric(raw$Zipcode)
- If you want it to stay a character, you can use the stringr package.
library(stringr)
raw$Zipcode <- str_replace(raw$Zipcode, "^0+" ,"")
- There is another function called str_remove in the stringr package.
raw$Zipcode <- str_remove(raw$Zipcode, "^0+")
- You can also use sub from base R.
raw$Zipcode <- sub("^0+", "", raw$Zipcode)
But if you want to remove exactly n leading zeroes, replace + with {n}. For instance, to remove two 0's, use sub("^0{2}", "", raw$Zipcode).
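A quick base R sketch of the difference between + and {n}, using made-up zip codes (the contents of raw are an assumption, since the original data is not shown):

```r
# Hypothetical zip codes stored as character strings
raw <- data.frame(Zipcode = c("00501", "02134", "90210"),
                  stringsAsFactors = FALSE)

# "^0+" removes every leading zero
all_zeros <- sub("^0+", "", raw$Zipcode)
all_zeros
# "501" "2134" "90210"

# "^0{2}" removes exactly two leading zeros, so "02134" is unchanged
two_zeros <- sub("^0{2}", "", raw$Zipcode)
two_zeros
# "501" "02134" "90210"

# as.numeric() also drops leading zeros, but the result is numeric
as_num <- as.numeric(raw$Zipcode)
as_num
# 501 2134 90210
```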
Remove part of column name
We can use sub to remove the .v1 from the end of the string. (If we only need to remove 'v1', just remove the \\. from the pattern to match, but I think a . at the end of a column name may not look that good.) Here, we match the dot (\\.) followed by one or more characters that are not a dot ([^.]+) until the end of the string ($) and replace it with "".
colnames(df) <- sub('\\.[^.]+$', '', colnames(df))
colnames(df)
#[1] "a.b.c" "d.e.f" "h.j.k"
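For a self-contained check, the sketch below uses column names assumed from the output above (the original df is not shown) and also shows what dropping the \\. from the pattern leaves behind:

```r
# Hypothetical column names ending in ".v1"
nm <- c("a.b.c.v1", "d.e.f.v1", "h.j.k.v1")

# Remove the last dot and everything after it
no_suffix <- sub("\\.[^.]+$", "", nm)
no_suffix
# "a.b.c" "d.e.f" "h.j.k"

# Without the escaped dot only "v1" is removed, leaving a trailing "."
trailing_dot <- sub("[^.]+$", "", nm)
trailing_dot
# "a.b.c." "d.e.f." "h.j.k."
```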
How to remove '.' from column names in a dataframe?
1) sqldf can deal with names having dots in them if you quote the names:
library(sqldf)
d0 <- read.csv(text = "A.B,C.D\n1,2")
sqldf('select "A.B", "C.D" from d0')
giving:
A.B C.D
1 1 2
2) When reading the data using read.table or read.csv, use the check.names=FALSE argument.
Compare:
Lines <- "A B,C D
1,2
3,4"
read.csv(text = Lines)
## A.B C.D
## 1 1 2
## 2 3 4
read.csv(text = Lines, check.names = FALSE)
## A B C D
## 1 1 2
## 2 3 4
However, in this example it still leaves names that would have to be quoted in sqldf, since the names have embedded spaces.
3) To simply remove the periods, if DF is a data frame:
names(DF) <- gsub(".", "", names(DF), fixed = TRUE)
or it might be nicer to convert the periods to underscores so that it is reversible:
names(DF) <- gsub(".", "_", names(DF), fixed = TRUE)
This last line could be alternatively done like this:
names(DF) <- chartr(".", "_", names(DF))
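The fixed = TRUE argument matters here because an unescaped . is a regex wildcard. A short sketch on an assumed name vector shows what happens without it, alongside the variants above:

```r
# Hypothetical column names containing periods
nm <- c("A.B", "C.D")

# Without fixed = TRUE, "." matches ANY character, so everything is removed
wildcard <- gsub(".", "", nm)
wildcard
# "" ""

# Literal match: only the periods are removed
removed <- gsub(".", "", nm, fixed = TRUE)
removed
# "AB" "CD"

# Escaping the dot in a regex is equivalent to fixed = TRUE here
escaped <- gsub("\\.", "_", nm)
escaped
# "A_B" "C_D"

# chartr() translates character for character, no regex involved
translated <- chartr(".", "_", nm)
translated
# "A_B" "C_D"
```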
Changing the column name based on a partial string or substring
Put the dataframes in a list and use lapply/map to change the name in every dataframe. Use list2env to transfer those changes from the list back to the individual dataframes.
library(dplyr)
library(purrr)
list_df <- lst(Apple, Mango, Banana, Potato, Tomato)
list_df <- map(list_df,
~.x %>% rename_with(~'Growth', matches('Growth Level Judgement')))
list2env(list_df, .GlobalEnv)
To run it on single dataframe you can do -
Apple %>% rename_with(~'Growth', matches('Growth Level Judgement'))
Or in base R -
names(Apple)[grep('Growth Level Judgement', names(Apple))] <- 'Growth'
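The list approach can also be sketched in base R only, using lapply instead of map. The two data frames and the helper rename_growth are hypothetical, made up here so the example runs on its own:

```r
# Hypothetical data frames whose column names contain the target substring
Apple <- data.frame(Item = "a1", `Apple Growth Level Judgement` = "High",
                    check.names = FALSE)
Mango <- data.frame(Item = "m1", `Mango Growth Level Judgement` = "Low",
                    check.names = FALSE)

# Hypothetical helper: rename any column containing the substring to "Growth"
rename_growth <- function(d) {
  hit <- grep("Growth Level Judgement", names(d), fixed = TRUE)
  names(d)[hit] <- "Growth"  # no-op when nothing matches
  d
}

# Apply the helper to every data frame in a named list
list_df <- list(Apple = Apple, Mango = Mango)
list_df <- lapply(list_df, rename_growth)

# Copy the renamed data frames back over the originals
invisible(list2env(list_df, envir = .GlobalEnv))

names(list_df$Apple)
# "Item" "Growth"
```

Note that if a data frame had several matching columns, they would all be renamed to "Growth", which produces duplicate names.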
remove . from end of column names in R
If your data is called df, you can use a regex to remove the trailing "." from the column names. Try:
names(df) <- sub('\\.$', '', names(df))
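A minimal sketch on an assumed name vector shows the effect. The "\\.+$" variant is an extra option, not part of the answer, for names that end in more than one dot:

```r
# Hypothetical column names, some ending in a period
nm <- c("a.", "b.c.", "d", "e..")

# Remove a single trailing dot; inner dots are untouched
one_dot <- sub("\\.$", "", nm)
one_dot
# "a" "b.c" "d" "e."

# Remove a whole run of trailing dots
all_dots <- sub("\\.+$", "", nm)
all_dots
# "a" "b.c" "d" "e"
```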