Weird characters added to first column name after reading a toad-exported csv file
Try this:
d <- read.csv("test_file.csv", fileEncoding="UTF-8-BOM")
This works in R 3.0.0+ and removes the BOM if one is present in the file (common for files generated by Microsoft applications such as Excel and SQL Server).
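A minimal sketch of the effect, assuming a small temporary file written with a leading BOM (the file contents and column names are made up for illustration):

```r
# Write a tiny CSV that starts with the UTF-8 BOM bytes EF BB BF.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("id,name\n1,a\n2,b\n"), con)
close(con)

# Reading with fileEncoding = "UTF-8-BOM" strips the BOM before parsing.
d <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(d)
```

With the BOM stripped, the first column name should come back as plain `id` rather than with stray leading characters.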
Why is 'ï..' at the front of the first colname when imported a csv into r?
See this earlier answer about weird characters when importing data from Excel: Weird characters added to first column name after reading a toad-exported csv file
unrecognized character in header of csv
Yes, that's a BOM: U+FEFF BYTE ORDER MARK. OP's file is probably encoded as UTF-8, but OP appears to be decoding it as CP-1252.
I say that because the three-byte sequence for a UTF-8-encoded BOM is \xEF\xBB\xBF, which appears as ï»¿ when (wrongly?) decoded as CP-1252:
Encoding | Representation (hexadecimal) | Representation (decimal) | Bytes as CP1252 characters |
---|---|---|---|
UTF-8 | EF BB BF | 239 187 191 | ï»¿ |
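The same bytes can be inspected from R. This is only a sketch, and it assumes the platform's iconv recognises the encoding name "CP1252":

```r
# The three bytes of a UTF-8-encoded BOM.
bom <- as.raw(c(0xEF, 0xBB, 0xBF))

# Decimal values of those bytes.
as.integer(bom)

# Reinterpret the same bytes as CP-1252 and convert to UTF-8 for display.
# Assumes the platform's iconv knows the name "CP1252".
misread <- iconv(rawToChar(bom), from = "CP1252", to = "UTF-8")
misread
```

The variable names `bom` and `misread` are illustrative only.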
What's an a-hat mean when importing a csv into r (and how do I get rid of it)?
Not sure if this is what you want:
data example:
df <- structure(list(V1 = c("", "Race3 and Hispanic Origin", "Whiteâ\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦",
" White, not Hispanicâ\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦",
"Blackâ\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦",
"Asianâ\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦â\200¦"
), V2 = c("", "", "245,985", "195,221", "41,962", "18,879"),
V3 = c("", "", "27,113", "17,263", "9,234", "1,908"), V4 = c("",
"", "547", "493", "388", "175"), V5 = c("", "", "11.0", "8.8",
"22.0", "10.1"), V6 = c("", "", "0.2", "0.3", "0.9", "0.9"
), V7 = c("", "", "247,272", "195,256", "42,474", "19,475"
), V8 = c("", "", "26,436", "16,993", "8,993", "1,953"),
V9 = c("", "", "714", "571", "373", "190"), V10 = c("", "",
"10.7", "8.7", "21.2", "10.0"), V11 = c("", "", "0.3", "0.3",
"0.9", "1.0"), V12 = c("", "", "-677", "-270", "-241", "45"
), V13 = c("", "", "*-0.3", "-0.1", "-0.8", "-0.1")), row.names = c(NA,
6L), class = "data.frame")
remove the character:
df[] <- lapply(df, gsub, pattern='â€¦', replacement='')
results:
df
                         V1      V2     V3  V4   V5  V6      V7     V8  V9  V10 V11  V12   V13
1
2 Race3 and Hispanic Origin
3                     White 245,985 27,113 547 11.0 0.2 247,272 26,436 714 10.7 0.3 -677 *-0.3
4       White, not Hispanic 195,221 17,263 493  8.8 0.3 195,256 16,993 571  8.7 0.3 -270  -0.1
5                     Black  41,962  9,234 388 22.0 0.9  42,474  8,993 373 21.2 0.9 -241  -0.8
6                     Asian  18,879  1,908 175 10.1 0.9  19,475  1,953 190 10.0 1.0   45  -0.1
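An alternative sketch, under the assumption that the stray characters are the UTF-8 bytes of a horizontal ellipsis (U+2026, bytes E2 80 A6) displayed in a single-byte encoding: marking the strings as UTF-8 lets you remove the real character rather than its garbled form. The sample vector `x` is made up for illustration.

```r
# Build strings that end in the raw bytes E2 80 A6 (a UTF-8 ellipsis).
ellipsis_bytes <- as.raw(c(0xE2, 0x80, 0xA6))
x <- c(
  rawToChar(c(charToRaw("White"), ellipsis_bytes, ellipsis_bytes)),
  rawToChar(c(charToRaw("Black"), ellipsis_bytes))
)

# Declare the bytes to be UTF-8 so R treats each triple as one character.
Encoding(x) <- "UTF-8"

# Remove the ellipsis character itself.
cleaned <- gsub("\u2026", "", x, fixed = TRUE)
cleaned
```

If the file is read with the correct encoding in the first place, the same `gsub` call on the real ellipsis may be all that is needed.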
Need help separating column names in imported csv file without manually doing it in excel
data.table's fread reads this accurately:
url <- "http://barttorvik.com/2021_team_results.csv"
data <- data.table::fread(url, quote = '')
You may want to clean up the column names though because quotes are present in the last 2 column names.
colnames(data) <- gsub('"', '', names(data))
Sqldf in R - error with first column names
So I figured it out by reading through the above comments.
I'm on a Windows 10 machine using Excel for Office 365. The special characters go away if I save the file as "CSV (Comma delimited)" instead of "CSV UTF-8 (Comma Delimited)".
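If re-saving the file is not an option, one way to check whether a given file starts with a UTF-8 BOM is to look at its first three bytes. A rough sketch follows; `has_utf8_bom` is a hypothetical helper name, not a function from any package, and the two demo files are temporary stand-ins for real exports.

```r
# Hypothetical helper: TRUE if the file starts with the UTF-8 BOM (EF BB BF).
has_utf8_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Demo on two temporary files, one with a BOM and one without.
with_bom <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("id,name\n1,a\n")), with_bom)

without_bom <- tempfile(fileext = ".csv")
writeBin(charToRaw("id,name\n1,a\n"), without_bom)

has_utf8_bom(with_bom)
has_utf8_bom(without_bom)
```

A file that tests positive can then be read with fileEncoding="UTF-8-BOM" instead of being re-exported.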
PSQL to CSV with column alias leads to corrupted file
After testing various other configurations of the query, I found the issue. Apparently Excel interprets a file starting with "ID" as some SYLK format instead of CSV... Renaming the column alias to e.g. "MyID" fixed the issue.
Reference here: annalear.ca/2010/06/10/why-excel-thinks-your-csv-is-a-sylk
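A quick pre-flight check along these lines can flag an export before it is opened in Excel. This is only a sketch: `starts_with_id` is a hypothetical helper, and the two files are temporary examples rather than real query output.

```r
# Hypothetical helper: TRUE if the file's first two bytes are uppercase "ID",
# the prefix that makes Excel treat a file as SYLK.
starts_with_id <- function(path) {
  first <- readBin(path, what = "raw", n = 2L)
  identical(first, charToRaw("ID"))
}

# A header beginning with "ID" versus one with a renamed alias.
risky <- tempfile(fileext = ".csv")
writeLines(c("ID,name", "1,a"), risky)

safe <- tempfile(fileext = ".csv")
writeLines(c("MyID,name", "1,a"), safe)

starts_with_id(risky)
starts_with_id(safe)
```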