Removing Whitespace from a Whole Data Frame in R

If I understood you correctly, you want to remove all whitespace from the entire data frame. I guess the code you are using is good for removing spaces in the column names. I think you should try this:

 apply(myData,2,function(x)gsub('\\s+', '',x))

Hope this works.

This will return a matrix, however. If you want to convert it to a data frame, do:

as.data.frame(apply(myData,2,function(x)gsub('\\s+', '',x)))
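One caveat is worth checking before relying on the apply version: apply first converts the data frame to a matrix, so numeric columns come back as character. A minimal sketch, assuming a small made-up data frame (this myData is hypothetical, not the asker's data):

```r
# Hypothetical example data; not taken from the original question.
myData <- data.frame(name = c(" a b ", "c d"), n = 1:2,
                     stringsAsFactors = FALSE)

# apply() works on as.matrix(myData), which is a character matrix here.
res <- as.data.frame(apply(myData, 2, function(x) gsub("\\s+", "", x)),
                     stringsAsFactors = FALSE)

str(res)            # both columns are character
is.numeric(res$n)   # FALSE: the integer column lost its type
```

If the numeric columns matter, restricting the cleanup to character columns avoids the conversion.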

EDIT In 2020:

Using lapply with the trimws function (its default, which = "both", applies here) removes leading and trailing spaces but not spaces inside the strings. Since OP provided no input data, I am adding a dummy example to produce the results.

DATA:

df <- data.frame(val = c(" abc"," kl m","dfsd "),val1 = c("klm ","gdfs","123"),num=1:3,num1=2:4,stringsAsFactors = FALSE)

# Situation 1 (base R): remove only leading and trailing whitespace, not whitespace inside the string values, using trimws

cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], trimws)

# Situation 2 (base R): remove whitespace everywhere in the character columns (inside the strings as well as at the leading and trailing ends)

(This was the initial solution proposed using apply. Note that a solution using apply seems to work but would be very slow. Also, it is not clear from the question whether OP wanted to remove only leading/trailing blanks or every blank in the data.)

cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], function(x) gsub('\\s+', '', x))
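The two base R situations differ only in the cleaning function, so they can be wrapped in one small helper. This is a sketch assuming a plain data.frame (not a data.table); strip_ws, its inner argument, and df_demo are made-up names for illustration:

```r
# Hypothetical helper; assumes `data` is a plain data.frame.
# inner = FALSE trims leading/trailing whitespace only;
# inner = TRUE removes all whitespace, including inside the strings.
strip_ws <- function(data, inner = FALSE) {
  is_chr <- vapply(data, is.character, logical(1))
  clean <- if (inner) function(x) gsub("\\s+", "", x) else trimws
  if (any(is_chr)) {
    data[is_chr] <- lapply(data[is_chr], clean)
  }
  data
}

# Made-up demo data, similar in shape to the dummy example.
df_demo <- data.frame(val = c(" abc", " kl m", "dfsd "), num = 1:3,
                      stringsAsFactors = FALSE)

strip_ws(df_demo)$val                # "abc"  "kl m" "dfsd"
strip_ws(df_demo, inner = TRUE)$val  # "abc"  "klm"  "dfsd"
```

Non-character columns such as num are left untouched, so their types are preserved.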

## situation: 1 (Using data.table, removing only leading and trailing blanks)

library(data.table)
setDT(df)
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, trimws), .SDcols = cols_to_be_rectified]

Output from situation 1:

    val val1 num num1
1:  abc  klm   1    2
2: kl m gdfs   2    3
3: dfsd  123   3    4

## situation: 2 (Using data.table, removing every blank inside as well as leading/trailing blanks)

cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, function(x)gsub('\\s+', '', x)), .SDcols = cols_to_be_rectified]

Output from situation 2:

    val val1 num num1
1:  abc  klm   1    2
2:  klm gdfs   2    3
3: dfsd  123   3    4

Note the difference between the outputs of the two situations. In row 2 you can see that trimws removes only leading and trailing blanks, whereas the regex solution removes every blank.

I hope this helps. Thanks.

Removing the white space at the start of each variable IN A LIST

You can do:

sapply(input, trimws)

Note, however, that the result is a matrix (when the elements all have the same length), and trimws turns everything into character.
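If the list structure should be kept, lapply is a possible alternative to sapply. A small sketch on made-up input (the original question's list is not shown here, so input below is an assumption):

```r
# Hypothetical input list with leading spaces in every value.
input <- list(a = c(" x", " y"), b = c(" 1", " 2"))

as_matrix <- sapply(input, trimws)  # equal-length results simplify to a matrix
as_list   <- lapply(input, trimws)  # always a list, one element per input element

is.matrix(as_matrix)  # TRUE
is.list(as_list)      # TRUE
as_list$a             # "x" "y"
```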

Removing white space from data frame in R

You could try

gsub( "\\[[^]]*\\]\\W*", "", "[N] Team Name")
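To make the pattern easier to follow, here is the same call applied to a short vector with each piece annotated. Only the first string comes from the answer; the other two are made up for illustration:

```r
# Hypothetical team labels; only "[N] Team Name" appears in the answer above.
teams <- c("[N] Team Name", "[X]  Other Team", "No Prefix")

# \\[     a literal opening bracket
# [^]]*   any run of characters that are not a closing bracket
# \\]     a literal closing bracket
# \\W*    any non-word characters (here, the spaces) that follow
cleaned <- gsub("\\[[^]]*\\]\\W*", "", teams)

cleaned  # "Team Name"  "Other Team" "No Prefix"
```

Strings without a bracketed prefix pass through unchanged.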

Remove white space from a data frame column and add path

We can use gsub to remove the white space. We select one or more spaces (\\s+) and replace it with ''.

 df$names <- gsub('\\s+', '', df$names)
 df$names
 #[1] "stock1"       "stockstock12" "stock2"

Then, we use paste to join the strings together

  path <- "C:/Desktop/stock_files"
  df$names <- paste(path, df$names, sep="/")
  df$names
  #[1] "C:/Desktop/stock_files/stock1"       "C:/Desktop/stock_files/stockstock12"
  #[3] "C:/Desktop/stock_files/stock2"

Trim leading/trailing whitespaces from a data frame column where the column name comes as a variable

Use mutate_at

library(dplyr)
employ.data %>% mutate_at(abc, trimws)

#     employee salary  startdate
#1    John Doe  21000 2010-11-01
#2 Peter  Gynn  23400 2008-03-25
#3 Jolie  Hope  26800 2007-03-14

Or you can directly do, if you have only one column

employ.data[[abc]] <- trimws(employ.data[[abc]])

If there are multiple columns you can use lapply

employ.data[abc] <- lapply(employ.data[abc], trimws)
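A self-contained run of the multi-column case, as a sketch. The data frame is made up to resemble the output shown above, and the dept column is an assumption added so that abc can hold more than one name:

```r
# Hypothetical data; the dept column is not in the original example.
employ.data <- data.frame(
  employee = c(" John Doe", "Peter  Gynn ", " Jolie  Hope"),
  dept     = c(" sales ", "it ", " hr"),
  salary   = c(21000, 23400, 26800),
  stringsAsFactors = FALSE
)

abc <- c("employee", "dept")  # column names held in a variable

# `[` keeps a data frame (a list of columns), so lapply() visits each column.
employ.data[abc] <- lapply(employ.data[abc], trimws)

employ.data$employee  # "John Doe"    "Peter  Gynn" "Jolie  Hope"
```

Spaces inside the strings, such as the double space in "Peter  Gynn", are left alone.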

Unable to remove white space from data frame and hence was not able to find mean

Your strings contain a whitespace other than a regular ASCII space (decimal value 32). Thus, you need a regex that will match any Unicode whitespace. It is curious that a simple gsub("[[:space:]]*°C", "", newtemp) does not work in all R environments.

What usually works is a PCRE regex:

gsub("(*UCP)\\s*°C", "", newtemp, perl=TRUE)

Here, (*UCP) is a PCRE verb that makes the shorthand character classes Unicode-aware, so \s can match any Unicode whitespace. The perl=TRUE argument makes R use the PCRE regex engine rather than the default TRE regex engine.
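Putting it together with the mean from the question, as a sketch. The readings are made up, and the non-breaking space and degree sign are written as \u escapes so the code stays ASCII-only:

```r
# Hypothetical readings: "\u00a0" is a non-breaking space, "\u00b0" the degree sign.
newtemp <- c("21\u00a0\u00b0C", "19 \u00b0C", "23\u00b0C")

# (*UCP) lets \s match Unicode whitespace; perl = TRUE selects the PCRE engine.
stripped <- gsub("(*UCP)\\s*\u00b0C", "", newtemp, perl = TRUE)

temps <- as.numeric(stripped)  # would give NA wherever whitespace survived
mean(temps)
```

If as.numeric still yields NA for some element, tools::showNonASCII on the original strings can help reveal which character is left over.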

How can I trim leading and trailing white space?

Probably the best way is to handle the leading and trailing white space when you read your data file. If you use read.csv or read.table, you can set the parameter strip.white=TRUE.
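A quick way to see the effect, as a sketch: the CSV content below is made up and passed inline through the text argument instead of a file.

```r
# Hypothetical CSV content with stray spaces around the country names.
csv_text <- "country,pop\n Brazil ,1\nChile  ,2"

myDummy <- read.csv(text = csv_text, strip.white = TRUE,
                    stringsAsFactors = FALSE)

myDummy$country  # "Brazil" "Chile"
```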

If you want to clean strings afterwards you could use one of these functions:

# Returns string without leading white space
trim.leading <- function (x)  sub("^\\s+", "", x)

# Returns string without trailing white space
trim.trailing <- function (x) sub("\\s+$", "", x)

# Returns string without leading or trailing white space
trim <- function (x) gsub("^\\s+|\\s+$", "", x)

To use one of these functions on myDummy$country:

 myDummy$country <- trim(myDummy$country)
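On R 3.2.0 or later, base R's trimws covers the same three cases through its which argument. A sketch comparing it with the helpers above on made-up input (the helpers are redefined so the block runs on its own):

```r
trim.leading  <- function(x) sub("^\\s+", "", x)
trim.trailing <- function(x) sub("\\s+$", "", x)
trim          <- function(x) gsub("^\\s+|\\s+$", "", x)

x <- c("  Brazil ", "\tChile", "Peru  ")  # hypothetical values

identical(trim.leading(x),  trimws(x, which = "left"))   # TRUE
identical(trim.trailing(x), trimws(x, which = "right"))  # TRUE
identical(trim(x),          trimws(x, which = "both"))   # TRUE
```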

To 'show' the white space you could use:

 paste(myDummy$country)

which will show you the strings surrounded by quotation marks (") making white spaces easier to spot.

How to Trim all white space from a list of data frames in R

Use purrr and its map function to iterate over the list of data frames, then map_df to iterate over the columns in each data frame, which will return the results as data frames (tibbles).

library(purrr)
ParsedFile %>% map(~map_df(., ~trimws(.)))
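Because trimws returns character, the one-liner above also converts non-character columns to character. If that is unwanted, a base R sketch that only touches character columns could look like this (the ParsedFile contents and the helper name trim_chr_cols are made up for illustration):

```r
# Hypothetical list of data frames standing in for ParsedFile.
ParsedFile <- list(
  a = data.frame(x = c(" p ", "q "), n = 1:2, stringsAsFactors = FALSE),
  b = data.frame(y = c(" r", " s "), stringsAsFactors = FALSE)
)

# Illustrative helper: trim only the character columns of one data frame.
trim_chr_cols <- function(d) {
  is_chr <- vapply(d, is.character, logical(1))
  if (any(is_chr)) {
    d[is_chr] <- lapply(d[is_chr], trimws)
  }
  d
}

cleaned <- lapply(ParsedFile, trim_chr_cols)
str(cleaned)
```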

Types of Whitespace in R

You can use the tools::showNonASCII function to display non-ascii characters. Here's what I see:

> tools::showNonASCII(head(reps$District))
1: Alabama<c2><a0>1
2: Alabama<c2><a0>2
3: Alabama<c2><a0>3
4: Alabama<c2><a0>4
5: Alabama<c2><a0>5
6: Alabama<c2><a0>6

So these entries have the UTF-8 code C2 A0, which is a non-breaking space. You can convert it to a standard space using

reps$District <- sub("\ua0", " ", reps$District)

(UTF-8 C2 A0 is code point 00A0 according to http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=c2+a0&mode=bytes).
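Note that sub replaces only the first match in each string. If a value can contain several non-breaking spaces, gsub is the safer choice. A sketch on made-up values (not the actual reps data):

```r
# Hypothetical values; "\u00a0" is the non-breaking space (code point 00A0).
district <- c("Alabama\u00a01", "New\u00a0York\u00a012")

first_only <- sub("\u00a0", " ", district)                 # first match per string
all_nbsp   <- gsub("\u00a0", " ", district, fixed = TRUE)  # every match

all_nbsp  # "Alabama 1"   "New York 12"
```

Here first_only still contains a non-breaking space in its second element, which is why gsub is used for the final result.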

Your question title was "Types of Whitespace in R", which isn't really well defined. Different functions use different definitions. You'll have to read the documentation or source code to find out what each function thinks '\\s' means. Base R supports several regex styles; see ?regex.
