Handling special characters, e.g. accents, in R
You've read in a page encoded in UTF-8. If x is your column of names, use Encoding(x) <- "UTF-8".
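As a rough illustration of what that assignment does, the sketch below builds a string from raw UTF-8 bytes and then declares its encoding. The value "Hølmer" is an assumed example, not data from the original page. Note that Encoding<- only changes the mark on the string; it does not convert any bytes.

```r
# Assumed example value: "Hølmer" written as raw UTF-8 bytes (ø is 0xC3 0xB8).
x <- "H\xc3\xb8lmer"

# Strings built from \x escapes carry no encoding mark yet.
Encoding(x)

# Declare the bytes to be UTF-8; the bytes themselves are unchanged.
Encoding(x) <- "UTF-8"
Encoding(x)            # "UTF-8"
```

If the bytes were actually in another encoding, marking them as UTF-8 would not repair them; in that case iconv() with the real source encoding is the tool to reach for.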
Replace accented characters in R with non-accented counterpart (UTF-8 encoding)
The answers below are basically taken from elsewhere. The key is getting your unwanted_array in the right format. You might want it as a list:
unwanted_array = list( 'Š'='S', 'š'='s', 'Ž'='Z', 'ž'='z', 'À'='A', 'Á'='A', 'Â'='A', 'Ã'='A', 'Ä'='A', 'Å'='A', 'Æ'='A', 'Ç'='C', 'È'='E', 'É'='E',
'Ê'='E', 'Ë'='E', 'Ì'='I', 'Í'='I', 'Î'='I', 'Ï'='I', 'Ñ'='N', 'Ò'='O', 'Ó'='O', 'Ô'='O', 'Õ'='O', 'Ö'='O', 'Ø'='O', 'Ù'='U',
'Ú'='U', 'Û'='U', 'Ü'='U', 'Ý'='Y', 'Þ'='B', 'ß'='Ss', 'à'='a', 'á'='a', 'â'='a', 'ã'='a', 'ä'='a', 'å'='a', 'æ'='a', 'ç'='c',
'è'='e', 'é'='e', 'ê'='e', 'ë'='e', 'ì'='i', 'í'='i', 'î'='i', 'ï'='i', 'ð'='o', 'ñ'='n', 'ò'='o', 'ó'='o', 'ô'='o', 'õ'='o',
'ö'='o', 'ø'='o', 'ù'='u', 'ú'='u', 'û'='u', 'ü'='u', 'ý'='y', 'þ'='b', 'ÿ'='y' )
You can do this easily with iconv or chartr:
> iconv(string, to='ASCII//TRANSLIT')
[1] "Holmer"
> chartr(paste(names(unwanted_array), collapse=''),
         paste(unwanted_array, collapse=''),
         string)
[1] "Holmer"
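The calls above assume a variable named string already exists. Here is a self-contained sketch, assuming the input was something like "Hølmer" and using a deliberately tiny, hypothetical mapping (the names old and new are mine). chartr translates character by character, so the two strings must pair up one to one; a two-character replacement such as 'ß'='Ss' would shift every later pair. The result of iconv with //TRANSLIT depends on the platform's iconv implementation, so it is shown without an expected value.

```r
# Assumed input; \u escapes keep the script independent of the file encoding.
string <- "H\u00f8lmer"                  # "Hølmer"

# Hypothetical one-to-one mapping: each old character has exactly one new character.
old <- c("\u00f8", "\u00e9", "\u00fc")   # ø, é, ü
new <- c("o", "e", "u")

result <- chartr(paste(old, collapse = ""),
                 paste(new, collapse = ""),
                 string)
result                                   # "Holmer"

# Transliteration via iconv; the output varies between platforms.
iconv(string, from = "UTF-8", to = "ASCII//TRANSLIT")
```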
Otherwise you have to loop through all of the replacements, because mapply or similar wouldn't account for symbols already replaced by previous gsub operations:
# the loop: apply each replacement in turn to the running result
out <- string
for (i in seq_along(unwanted_array))
  out <- gsub(names(unwanted_array)[i], unwanted_array[[i]], out)
The result:
> out
[1] "Holmer"
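The loop can also be wrapped in a small helper. This is only a sketch, not the original author's code: the function name strip_accents and the three-entry mapping are hypothetical, and fixed = TRUE is my addition so that each key is matched literally rather than as a regular expression. Unlike chartr, gsub copes with replacements longer than one character.

```r
# Hypothetical helper: apply every replacement in `mapping` to `x`, one after another.
strip_accents <- function(x, mapping) {
  for (i in seq_along(mapping)) {
    x <- gsub(names(mapping)[i], mapping[[i]], x, fixed = TRUE)
  }
  x
}

# Small illustrative mapping (ø, ß, é); the ß entry expands to two characters.
mapping <- setNames(list("o", "ss", "e"), c("\u00f8", "\u00df", "\u00e9"))

out <- strip_accents(c("H\u00f8lmer", "Stra\u00dfe"), mapping)
out   # "Holmer" "Strasse"
```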
Handling xts objects that contain special characters
Object names with special characters can be backquoted. Also, if there are NA values, specify na.rm = TRUE along with na.action = NULL, as aggregate can remove the whole row if there is an NA in any of the columns:
out <- aggregate(`CL=F`$`CL=F.Close`, list(format(index(`CL=F`), "%Y-%m")),
mean, na.rm = TRUE, na.action = NULL)
-output
> head(out)
2000-08 32.54571
2000-09 33.87100
2000-10 32.97318
2000-11 34.26450
2000-12 28.35500
2001-01 29.26667
> tail(out)
2021-02 59.06105
2021-03 62.35739
2021-04 61.70381
2021-05 65.15700
2021-06 71.35273
2021-07 72.43048
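The backquoting rule is not specific to xts. A minimal base-R sketch with a made-up data frame follows; the values are invented for illustration and are not the prices shown above, and the names prices and monthly are mine.

```r
# Invented data; check.names = FALSE keeps the non-syntactic column name as-is.
prices <- data.frame(
  month = c("2021-06", "2021-06", "2021-07"),
  `CL=F.Close` = c(70, NA, 72),
  check.names = FALSE
)

# Backquotes are required because CL=F.Close is not a syntactic R name.
# na.rm = TRUE is passed on to mean(), so the NA in 2021-06 is skipped.
monthly <- aggregate(prices$`CL=F.Close`, list(month = prices$month),
                     mean, na.rm = TRUE)
monthly
```

Without the backquotes, R would parse CL=F.Close as an assignment rather than as a column name.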
RStudio: keeping special characters in a script
You don't say what OS you're using, but this kind of thing really only happens on Windows nowadays, so I'll assume that.
The problem is that Windows has a local encoding that is not UTF-8. It is commonly something like Latin1 in English-speaking countries. I'm not sure what encoding people use in German-speaking countries, if that's where you are. From the junk you saw, it looks as though you saved the file in UTF-8, then read it using your local encoding. The encodings for writing and reading have to match if you want things to work.
In RStudio you can try "Reopen with encoding..." and specify UTF-8, and you'll probably get your original back, as long as you haven't saved it after the bad read. If you did that, you've got a much harder cleanup to do.
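The mismatch can be reproduced outside RStudio. Here is a sketch, assuming a throwaway temporary file and the sample word "grüßen" (both are my choices, not from the question): the bytes are written as UTF-8, then read back once declaring UTF-8 and once declaring latin1.

```r
original <- "gr\u00fc\u00dfen"                # "grüßen", marked as UTF-8

# Write the exact UTF-8 bytes plus a trailing newline to a temporary file.
path <- tempfile(fileext = ".R")
writeBin(charToRaw(paste0(original, "\n")), path)

good <- readLines(path, encoding = "UTF-8")   # declared encoding matches the bytes
bad  <- readLines(path, encoding = "latin1")  # same bytes, wrong declaration

identical(good, original)                     # TRUE
identical(bad, original)                      # FALSE: every two-byte character is read as two
```

As long as the file on disk still holds the original bytes, reading it again with the right encoding recovers the text, which is why reopening works only if the badly read version was never saved.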