Weird Characters Added to First Column Name After Reading a Toad-Exported CSV File

Try this:

d <- read.csv("test_file.csv", fileEncoding="UTF-8-BOM")

This works in R 3.0.0 and later, and removes the BOM if one is present in the file (common for files generated by Microsoft applications such as Excel and SQL Server).
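
If you want to confirm the file really does start with a BOM, you can inspect the first three bytes directly (a quick check using the same file name as above; a UTF-8 BOM is the byte sequence EF BB BF):

readBin("test_file.csv", what = "raw", n = 3)
# a result of ef bb bf means a UTF-8 BOM is present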

Why is 'ï..' at the front of the first colname when importing a CSV into R?

Please review the previous answer here regarding weird characters when importing data from Excel: Weird characters added to first column name after reading a toad-exported csv file
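
If the BOM has already been read in and mangled into the column name, you can also strip it after the fact. A minimal sketch (assuming your data frame is named d and the prefix is the 'ï..' shown in the question):

names(d)[1] <- sub("^ï\\.\\.", "", names(d)[1])  # drop the mangled BOM prefix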

unrecognized character in header of csv

Yes, that's a BOM, U+FEFF BYTE ORDER MARK. The OP's file is probably encoded as UTF-8, but the OP appears to be decoding it as CP-1252.

I say that because the three-byte sequence for a UTF-8-encoded BOM is \xEF\xBB\xBF, which appears as ï»¿ when (wrongly?) decoded as CP-1252:

Encoding | Representation (hexadecimal) | Representation (decimal) | Bytes as CP-1252 characters
UTF-8    | EF BB BF                     | 239 187 191              | ï»¿
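
You can reproduce this mojibake directly in base R (a small demonstration, no assumptions beyond base R):

bom <- rawToChar(as.raw(c(0xEF, 0xBB, 0xBF)))  # the three BOM bytes
iconv(bom, from = "CP1252", to = "UTF-8")       # reinterpreted as CP-1252: "ï»¿"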

What does an 'â' mean when importing a CSV into R (and how do I get rid of it)?

Not sure if this is what you want:

data example:

df <- structure(list(
  V1 = c("", "Race3 and Hispanic Origin",
         "Whiteâ\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦",
         " White, not Hispanicâ\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦",
         "Blackâ\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦",
         "Asianâ\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦"),
  V2 = c("", "", "245,985", "195,221", "41,962", "18,879"),
  V3 = c("", "", "27,113", "17,263", "9,234", "1,908"),
  V4 = c("", "", "547", "493", "388", "175"),
  V5 = c("", "", "11.0", "8.8", "22.0", "10.1"),
  V6 = c("", "", "0.2", "0.3", "0.9", "0.9"),
  V7 = c("", "", "247,272", "195,256", "42,474", "19,475"),
  V8 = c("", "", "26,436", "16,993", "8,993", "1,953"),
  V9 = c("", "", "714", "571", "373", "190"),
  V10 = c("", "", "10.7", "8.7", "21.2", "10.0"),
  V11 = c("", "", "0.3", "0.3", "0.9", "1.0"),
  V12 = c("", "", "-677", "-270", "-241", "45"),
  V13 = c("", "", "*-0.3", "-0.1", "-0.8", "-0.1")
), row.names = c(NA, 6L), class = "data.frame")

remove the character:

df[] <- lapply(df, gsub, pattern = 'â\200¦', replacement = '')

results:

df
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
1
2 Race3 and Hispanic Origin
3 White 245,985 27,113 547 11.0 0.2 247,272 26,436 714 10.7 0.3 -677 *-0.3
4 White, not Hispanic 195,221 17,263 493 8.8 0.3 195,256 16,993 571 8.7 0.3 -270 -0.1
5 Black 41,962 9,234 388 22.0 0.9 42,474 8,993 373 21.2 0.9 -241 -0.8
6 Asian 18,879 1,908 175 10.1 0.9 19,475 1,953 190 10.0 1.0 45 -0.1
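
Rather than patching the strings afterwards, it is often cleaner to avoid the mojibake at import time: "â€¦" is just the UTF-8 ellipsis "…" decoded as CP-1252. A hedged alternative (assuming the hypothetical source file "data.csv" is UTF-8 encoded):

df <- read.csv("data.csv", fileEncoding = "UTF-8")  # keeps "…" intact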

Need help separating column names in an imported CSV file without manually doing it in Excel

data.table's fread reads this accurately:

url <- "http://barttorvik.com/2021_team_results.csv"
data <- data.table::fread(url, quote = '')

You may want to clean up the column names, though, because quotes are present in the last two column names.

colnames(data) <- gsub('"', '', names(data))
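
A quick sanity check that the cleanup worked (same data object as above):

grep('"', names(data), value = TRUE)  # character(0) if no quotes remain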

Sqldf in R - error with first column names

I figured it out by reading through the comments above.

I'm on a Windows 10 machine using Excel for Office 365. The special characters go away if I save the file as plain "CSV (Comma delimited)" instead of "CSV UTF-8 (Comma Delimited)".

PSQL to CSV with column alias leads to corrupted file

After testing various other configurations of the query, I found the issue. Apparently Excel interprets a file starting with "ID" as some SYLK format instead of CSV... Renaming the column alias to e.g. "MyID" fixed the issue.

Reference here: annalear.ca/2010/06/10/why-excel-thinks-your-csv-is-a-sylk
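
If you control the export from R rather than from psql, the same trap is easy to sidestep by not starting the header with "ID". A minimal sketch (hypothetical data and file name):

# Excel treats a file whose first cell is "ID" as SYLK; "MyID" avoids that
write.csv(data.frame(MyID = 1:3, value = c("a", "b", "c")),
          "out.csv", row.names = FALSE)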


