R Remove Parts of Column Name After Certain Characters

Remove parts of column name after certain characters

We can use sub to strip "_3" and everything after it from each value in the first column:

sub("_3.*", "", df1[,1])
#[1] "col1" "col2" "col3"
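
The same pattern can be applied to the column names themselves rather than to the values of a column. The following is a minimal sketch; the data frame and its column names are hypothetical and only serve to illustrate the call.

```r
# Hypothetical data frame whose column names carry a "_3..." suffix
df1 <- data.frame(col1_3a = 1:2, col2_3b = 3:4, col3_3c = 5:6)

# sub() replaces only the first match in each string:
# "_3" followed by anything up to the end of the name
new_names <- sub("_3.*", "", names(df1))
names(df1) <- new_names

print(names(df1))
```

Because sub stops after the first match, a name such as "col3_3c" is cut at the first "_3", not at the "3" inside "col3".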

Removing characters in column titles after .

Escape the period as \\. so that it matches a literal dot rather than any character:

x <- "ENSG00000124564.16"
sub("\\..*", "", x)
#[1] "ENSG00000124564"

update:

## it also works on a character vector of several strings
x <- c("ENSG00000124564.16", "ENSG00000257509.1")
sub("\\..*", "", x)
# [1] "ENSG00000124564" "ENSG00000257509"

## it also works for changing the column names
df <- data.frame(ENSG00000124564.16 = c(1, 2, 3), ENSG00000257509.1 = c(1, 1, 1))
names(df) <- sub("\\..*", "", names(df))
df
#  ENSG00000124564 ENSG00000257509
#1               1               1
#2               2               1
#3               3               1

How can I remove certain characters from column headers in R?

We can use sub to match the . (a metacharacter, so it must be escaped) followed by one or more digits (\\d+) at the end ($) of the string, and replace the match with blank (""):

names(df) <- sub("\\.\\d+$", "", names(df))

NOTE: If the data is a data.frame, make sure that stripping the suffix does not create duplicate column names. Base R lets you assign duplicates through names(), but they are not recommended.
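
To illustrate the note about duplicates, here is a small sketch that checks the stripped names before assigning them. The data frame is hypothetical, and using make.unique as the fallback is an assumption about what you might want, not part of the original answer.

```r
# Hypothetical data: two versions of the same identifier plus one other column
df <- data.frame(ENSG00000124564.1 = 1:2,
                 ENSG00000124564.2 = 3:4,
                 ENSG00000257509.1 = 5:6)

# Strip the trailing ".<digits>" version suffix
stripped <- sub("\\.\\d+$", "", names(df))

# anyDuplicated() returns 0 when all values are unique
has_dups <- anyDuplicated(stripped) > 0

# make.unique() appends <sep><number> to repeated values so names stay distinct
safe_names <- make.unique(stripped, sep = "_")
names(df) <- safe_names

print(has_dups)
print(names(df))
```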

Remove part of string from column names

We can use sub to remove everything up to and including the last dot:

names(df1)[-1] <- sub(".*\\.", "", names(df1)[-1])

If we need to keep the . as well, use "." as the replacement:

names(df1)[-1] <- sub(".*\\.", ".", names(df1)[-1])

To match the pattern exactly, we can also match zero or more characters that are not a dot ([^.]*) from the start (^) of the string, followed by a dot (\\. - escape the dot, as it is a metacharacter that matches any character), and replace the match with blank ("")

sub("^[^.]*\\.", "", names(df1)[-1])
#[1] "Gr_1" "Gr_10" "Gr_11" "Gr_12" "Gr_13" "Gr_14" "Gr_15" "Gr_16"
#[9] "Gr_17" "Gr_18" "Gr_19" "Gr_2" "Gr_20" "Gr_21"

Since the prefix to drop is literally 'ToRemove.', we can also match it as a fixed string:

sub("ToRemove.", "", names(df1)[-1], fixed = TRUE)

Also, if we need to remove the first . and all characters after it:

sub("\\..*", "", names(df1)[-1])
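
The patterns above behave the same on names with a single dot but differ once a name contains several. The sketch below compares them on a hypothetical vector of names (the values are made up for illustration).

```r
# Hypothetical names: one with a single dot, one with two dots
x <- c("ToRemove.Gr_1", "A.B.Gr_2")

# Greedy ".*" runs to the LAST dot, so only the final piece is kept
after_last_dot <- sub(".*\\.", "", x)

# "[^.]*" cannot cross a dot, so only the part up to the FIRST dot is removed
after_first_dot <- sub("^[^.]*\\.", "", x)

# Removes the first dot and everything after it
before_first_dot <- sub("\\..*", "", x)

print(after_last_dot)
print(after_first_dot)
print(before_first_dot)
```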

How to remove part of characters in data frame column

There are multiple ways of removing the leading zeroes:

  1. Using as.numeric on a column of your choice (the column becomes numeric).
raw$Zipcode <- as.numeric(raw$Zipcode)

  2. If you want it to stay a character column, you can use the stringr package.
library(stringr)
raw$Zipcode <- str_replace(raw$Zipcode, "^0+", "")

  3. There is another function called str_remove in the stringr package.
raw$Zipcode <- str_remove(raw$Zipcode, "^0+")

  4. You can also use sub from base R.
raw$Zipcode <- sub("^0+", "", raw$Zipcode)

To remove exactly n leading zeroes rather than all of them, replace + with {n}.

For instance, to remove two 0's use sub("^0{2}", "", raw$Zipcode).
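
The base R variants can be compared side by side. This is a sketch on a hypothetical vector of zip codes standing in for raw$Zipcode; the values are invented for illustration.

```r
# Hypothetical zip codes stored as character strings
zips <- c("00501", "02134", "90210")

# Drop all leading zeroes, result stays character
all_zeroes <- sub("^0+", "", zips)

# Converting to numeric also drops them, but changes the type
as_num <- as.numeric(zips)

# Drop exactly two leading zeroes; strings with fewer are left unchanged
two_zeroes <- sub("^0{2}", "", zips)

print(all_zeroes)
print(as_num)
print(two_zeroes)
```

Note that "02134" has only one leading zero, so the {2} pattern does not match it and the string is returned as-is.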

Remove part of column name

We can use sub to remove the .v1 from the end of the string. (If we only need to remove 'v1', just remove the \\. from the pattern, but a . at the end of a column name may not look that good.) Here, we match the dot (\\.) followed by one or more characters that are not a dot ([^.]+) until the end of the string ($) and replace the match with "".

colnames(df) <- sub('\\.[^.]+$', '', colnames(df))
colnames(df)
#[1] "a.b.c" "d.e.f" "h.j.k"
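
For a self-contained illustration, the sketch below runs both variants of the pattern on a character vector. The input names are an assumption, chosen so that the first result matches the output shown above.

```r
# Hypothetical column names ending in ".v1"
nm <- c("a.b.c.v1", "d.e.f.v1", "h.j.k.v1")

# Remove the last dot and everything after it
without_suffix <- sub("\\.[^.]+$", "", nm)

# Without the leading \\. only "v1" is removed and the trailing dot stays
trailing_dot <- sub("[^.]+$", "", nm)

print(without_suffix)
print(trailing_dot)
```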

How to remove '.' from column names in a dataframe?

1) sqldf can deal with names having dots in them if you quote the names:

library(sqldf)
d0 <- read.csv(text = "A.B,C.D\n1,2")
sqldf('select "A.B", "C.D" from d0')

giving:

  A.B C.D
1   1   2

2) When reading the data using read.table or read.csv use the check.names=FALSE argument.

Compare:

Lines <- "A B,C D
1,2
3,4"
read.csv(text = Lines)
##   A.B C.D
## 1   1   2
## 2   3   4
read.csv(text = Lines, check.names = FALSE)
##   A B C D
## 1   1   2
## 2   3   4

However, in this example the resulting names contain embedded spaces, so they would still have to be quoted in sqldf.

3) To simply remove the periods, if DF is a data frame:

names(DF) <- gsub(".", "", names(DF), fixed = TRUE)

or it might be nicer to convert the periods to underscores so that it is reversible:

names(DF) <- gsub(".", "_", names(DF), fixed = TRUE)

This last line could be alternatively done like this:

names(DF) <- chartr(".", "_", names(DF))
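
The fixed = TRUE argument matters here because an unescaped . is a regex wildcard. The sketch below shows the difference on a hypothetical vector of names.

```r
# Hypothetical column names containing dots
nm <- c("A.B", "C.D.E")

# Without fixed = TRUE the dot matches every character, so nothing is left
regex_dot <- gsub(".", "", nm)

# With fixed = TRUE only literal dots are removed
literal_dot <- gsub(".", "", nm, fixed = TRUE)

# Escaping the dot is an equivalent regex-based alternative
underscored <- gsub("\\.", "_", nm)

# chartr() translates character by character and never uses regex
translated <- chartr(".", "_", nm)

print(regex_dot)
print(literal_dot)
print(underscored)
print(translated)
```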

Changing the column name based on a partial string or substring

Put the dataframes in a list and use lapply/map to change the column name in every dataframe. Then use list2env to transfer those changes from the list back to the individual dataframes.

library(dplyr)
library(purrr)

list_df <- lst(Apple, Mango, Banana, Potato, Tomato)

list_df <- map(list_df,
~.x %>% rename_with(~'Growth', matches('Growth Level Judgement')))

list2env(list_df, .GlobalEnv)

To run it on a single dataframe you can do -

Apple %>% rename_with(~'Growth', matches('Growth Level Judgement'))

Or in base R -

names(Apple)[grep('Growth Level Judgement', names(Apple))] <- 'Growth'
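
The list-based approach can also be written with lapply alone, without dplyr or purrr. This is a sketch under assumptions: the two data frames and their columns are hypothetical stand-ins for Apple, Mango and the others.

```r
# Hypothetical data frames; check.names = FALSE keeps the spaces in the names
Apple <- data.frame(id = 1:2, `Growth Level Judgement` = c("A", "B"),
                    check.names = FALSE)
Mango <- data.frame(id = 3:4, `Growth Level Judgement (2021)` = c("C", "D"),
                    check.names = FALSE)

# A named list, so list2env() knows which object each element belongs to
list_df <- list(Apple = Apple, Mango = Mango)

# Rename any column whose name contains the substring
list_df <- lapply(list_df, function(d) {
  hit <- grep("Growth Level Judgement", names(d), fixed = TRUE)
  names(d)[hit] <- "Growth"
  d
})

# Copy the renamed data frames back over the originals
list2env(list_df, envir = .GlobalEnv)

print(lapply(list_df, names))
```

If more than one column in a data frame contains the substring, all of them would be renamed to "Growth", so it may be worth checking length(hit) first.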

remove . from end of column names in R

If your data is called df, you can use a regex to remove a single trailing "." from the column names. Try:

names(df) <- sub('\\.$', '', names(df))
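
The pattern \\.$ removes only one dot. If some names end in several dots, adding + removes the whole run. A minimal sketch on hypothetical names:

```r
# Hypothetical column names with zero, one, or two trailing dots
nm <- c("price.", "qty..", "id")

# Removes a single trailing dot
one_dot <- sub("\\.$", "", nm)

# Removes all consecutive trailing dots
all_dots <- sub("\\.+$", "", nm)

print(one_dot)
print(all_dots)
```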

