How to check if CSV file has a comma or a semicolon as separator?
Here are a few approaches, assuming that the only difference between the file formats is whether the separator is a semicolon and the decimal mark is a comma, or the separator is a comma and the decimal mark is a point.
1) fread As mentioned in the comments, fread in the data.table package will automatically detect the separator for common separators and then read the file in using the separator it detected. It can also handle certain other changes in format, such as automatically detecting whether the file has a header.
2) grepl Look at the first line and see if it has a comma or semicolon and then re-read the file:
L <- readLines("myfile", n = 1)
if (grepl(";", L)) read.csv2("myfile") else read.csv("myfile")
3) count.fields We can assume semicolon and then count the fields in the first line. If there is one field then it is comma separated and if not then it is semicolon separated.
L <- readLines("myfile", n = 1)
numfields <- count.fields(textConnection(L), sep = ";")
if (numfields == 1) read.csv("myfile") else read.csv2("myfile")
Update Added (3) and made improvements to all three.
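The first-line check used in (2) and (3) is not specific to R. Below is a minimal Python sketch of the same idea. The function names guess_separator and read_first_line are hypothetical, the file is assumed to be UTF-8 encoded, and the sketch carries the same assumption as the approaches above: the file is either semicolon-separated or comma-separated.

```python
def guess_separator(first_line: str) -> str:
    """Guess the separator from the first line, as in approach (2)."""
    # A semicolon anywhere in the first line is taken to mean a
    # semicolon-separated file; otherwise a comma is assumed.
    return ";" if ";" in first_line else ","


def read_first_line(path: str) -> str:
    """Read only the first line of the file, like readLines(path, n = 1)."""
    # Assumption: the file is UTF-8 encoded.
    with open(path, newline="", encoding="utf-8") as f:
        return f.readline()
```

Calling guess_separator(read_first_line("myfile")) would then give the separator to pass to whichever reader is used.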
Python - How can I check if a CSV file has a comma or a semicolon as a separator?
Say that you would like to read an arbitrary CSV file named input.csv, and you do not know whether the separator is a comma or a semicolon.
You could open your file using the csv module. The Sniffer class is then used to deduce its format, as in the following code:
import csv
with open('input.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read())
For this module, the Dialect class is a container class whose attributes hold information on how to handle delimiters (among other things, such as double quotes and whitespace). You can check the delimiter attribute with the following code:
print(dialect.delimiter)
# This will be either a comma or a semicolon, depending on what the input is
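As a self-contained illustration of the delimiter attribute, the sketch below sniffs two small in-memory samples instead of a file. The sample data is made up, and passing delimiters=",;" to sniff is an optional restriction added here on the assumption that only those two separators are possible.

```python
import csv

# Two small in-memory samples stand in for real files (hypothetical data).
semicolon_sample = "name;price\napple;1,50\npear;2,25\n"
comma_sample = "name,price\napple,1.50\npear,2.25\n"

sniffer = csv.Sniffer()

# Restricting the candidates to "," and ";" keeps the sniffer from
# picking some other character that happens to repeat regularly.
semicolon_dialect = sniffer.sniff(semicolon_sample, delimiters=",;")
comma_dialect = sniffer.sniff(comma_sample, delimiters=",;")

print(repr(semicolon_dialect.delimiter))
print(repr(comma_dialect.delimiter))
```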
Therefore, in order to do a smart CSV reading, you could use something like the following:
import pandas as pd
if dialect.delimiter == ',':
    df = pd.read_csv('input.csv')  # Import the CSV with a comma as the separator
elif dialect.delimiter == ';':
    df = pd.read_csv('input.csv', sep=';')  # Import the CSV with a semicolon as the separator
More information can be found in the Python documentation for the csv module.
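The steps above can also be wrapped in a small helper that reads only a bounded sample and falls back to a default when sniffing fails. This is a sketch under stated assumptions, not a definitive implementation: the names detect_delimiter and read_rows, the 4096-character sample size, the UTF-8 encoding, and the comma fallback are all choices made for illustration.

```python
import csv


def detect_delimiter(path, candidates=",;", default=",", sample_size=4096):
    """Return the delimiter csv.Sniffer infers for the file at `path`.

    Hypothetical helper: only the characters in `candidates` are
    considered, and `default` is returned when sniffing fails.
    """
    # Assumption: the file is UTF-8 encoded.
    with open(path, newline="", encoding="utf-8") as csvfile:
        # Read a bounded sample instead of the whole file.
        sample = csvfile.read(sample_size)
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Raised when the sniffer cannot determine a delimiter,
        # for example on an empty or single-column file.
        return default


def read_rows(path):
    """Read all rows as lists of strings using the detected delimiter."""
    delimiter = detect_delimiter(path)
    with open(path, newline="", encoding="utf-8") as csvfile:
        return list(csv.reader(csvfile, delimiter=delimiter))
```

The value returned by detect_delimiter could equally be passed on to another reader, for example as the sep argument of pd.read_csv.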
CSV with comma or semicolon?
On Windows, it depends on the "Regional and Language Options" customize screen, where you will find a List separator setting. This is the character Windows applications expect to be the CSV separator.
Of course, this only has an effect in Windows applications. For example, Excel will not automatically split data into columns if the file does not use the separator mentioned above. All applications that use the Windows regional settings will behave this way.
If you are writing a program for Windows that will require importing the CSV in other applications, and you know that the list separator set on your target machines is a comma (,), then go for it. Otherwise I prefer a semicolon (;), since it causes fewer problems with decimal points and digit grouping and does not appear in much text.
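When a program writes the file itself, the separator can be set explicitly rather than left to a default. Here is a minimal Python sketch using the standard csv module. The function name write_semicolon_csv, the sample rows, and the decimal-comma formatting are illustrative assumptions and not part of the answer above.

```python
import csv
import io


def write_semicolon_csv(rows, out):
    """Write rows to the file-like object `out` using ';' as the separator."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerows(rows)


# Hypothetical data with a decimal comma. Such a value would have to be
# quoted in a comma-separated file, but not in a semicolon-separated one.
buffer = io.StringIO()
write_semicolon_csv([["item", "price"], ["apple", "1,50"]], buffer)
print(buffer.getvalue())
```

Whether the target application opens the result without extra steps still depends on the list separator configured on that machine.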
Data Importation in R but change the delimiter from comma to semicolon
Try without the delim=";"; also avoid the + before the delim=";".
Example:
library(readr)
example <- read_delim("example.csv", ";",
escape_double = FALSE, trim_ws = TRUE)
Read CSV file with semicolon as delimiter
If you're using semicolons (;) as your CSV file separator instead of commas (,), you can adjust that first line:
wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)
The problem with your list comprehension is that [x.split(';') for x in wine_data_] iterates over the column names.
That being the case, you have no need for the line with the list comprehension. You can read in your data and be done.
wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ',', header = None)
print(numpy.shape(wine_data))