How to Check If CSV File Has a Comma or a Semicolon as Separator

How to check if CSV file has a comma or a semicolon as separator?

Here are a few approaches. They assume that the files differ only in format: either the separator is a semicolon and the decimal mark is a comma, or the separator is a comma and the decimal mark is a point.

1) fread As mentioned in the comments, fread in the data.table package automatically detects common separators and then reads the file using the separator it detected. It also handles certain other format differences, such as automatically detecting whether the file has a header.

2) grepl Look at the first line to see whether it contains a semicolon, and then re-read the file with the matching function:

L <- readLines("myfile", n = 1)
if (grepl(";", L)) read.csv2("myfile") else read.csv("myfile")

3) count.fields Assume a semicolon separator and count the fields in the first line. If there is only one field, the file is comma-separated; otherwise it is semicolon-separated.

L <- readLines("myfile", n = 1)
numfields <- count.fields(textConnection(L), sep = ";")
if (numfields == 1) read.csv("myfile") else read.csv2("myfile")
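
For readers working in Python, the field-counting idea from (3) can be sketched with the standard csv module. This is only an illustration under the same assumption as above (the file is non-empty and uses either a comma or a semicolon); the function name read_rows is hypothetical, and the input is passed as a string rather than a file path to keep the example self-contained.

```python
import csv
import io

def read_rows(text):
    """Pick the separator from the header line, then parse every row."""
    first_line = text.splitlines()[0]
    # Parse the header assuming ';'. A single field means ';' is not the separator.
    num_fields = len(next(csv.reader([first_line], delimiter=";")))
    sep = "," if num_fields == 1 else ";"
    return sep, list(csv.reader(io.StringIO(text), delimiter=sep))

sep, rows = read_rows("name;price\napple;1,50\n")
print(sep, rows)
```

As with the R version, a file with a single column would be classified as comma-separated.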

Update: Added (3) and made improvements to all three.

Python - How can I check if a CSV file has a comma or a semicolon as a separator?

Say that you would like to read an arbitrary CSV, named input.csv, and you do not know whether the separator is a comma or a semicolon.

You could open the file and pass its contents to the Sniffer class of the csv module, which deduces the format, as in the following code:

import csv

with open('input.csv', newline='') as csvfile:
    # Let Sniffer infer the dialect (delimiter, quoting, ...) from the file contents
    dialect = csv.Sniffer().sniff(csvfile.read())

In this module, a Dialect is a container class whose attributes describe how to handle delimiters (among other things, such as double quotes and whitespace). You can check the delimiter attribute with the following code:

print(dialect.delimiter)
# This will be either a comma or a semicolon, depending on what the input is

Therefore, in order to do a smart CSV reading, you could use something like the following:

import pandas as pd

if dialect.delimiter == ',':
    df = pd.read_csv('input.csv')  # Import the CSV with a comma as the separator
elif dialect.delimiter == ';':
    df = pd.read_csv('input.csv', sep=';')  # Import the CSV with a semicolon as the separator
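
A slightly more defensive variant of the detection step might look like the sketch below. It reads only a sample instead of the whole file, restricts Sniffer to the two candidate delimiters through the delimiters argument of sniff, and falls back to a default when Sniffer raises csv.Error. The helper name detect_delimiter, the sample size, and the comma fallback are assumptions made for illustration, and an in-memory stream stands in for input.csv.

```python
import csv
import io

def detect_delimiter(text_stream, candidates=",;", sample_size=4096, default=","):
    """Guess the delimiter from the first `sample_size` characters of a text stream."""
    sample = text_stream.read(sample_size)
    text_stream.seek(0)  # rewind so the caller can parse from the start
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer could not decide; fall back to the assumed default
        return default

semicolon_data = io.StringIO("name;price\napple;1,50\npear;2,25\n")
delimiter = detect_delimiter(semicolon_data)
rows = list(csv.reader(semicolon_data, delimiter=delimiter))
print(delimiter, rows[1])
```

The detected value can then be passed straight to the reader (for pandas, as the sep argument of read_csv), which avoids a separate branch per delimiter.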


CSV with comma or semicolon?

On Windows, it depends on the "Regional and Language Options" customization screen, where you will find a List separator setting. This is the character that Windows applications expect to be the CSV separator.

Of course, this only affects Windows applications. For example, Excel will not automatically split data into columns if the file does not use the above-mentioned separator. All applications that use the Windows regional settings behave this way.

If you are writing a program for Windows whose CSV output will be imported into other applications, and you know that the list separator on your target machines is a comma (,), then go for it. Otherwise I prefer a semicolon (;), since it causes fewer problems with decimal points and digit grouping, and it does not appear in much text.
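
The decimal-point issue can be illustrated with a small Python sketch using the standard csv module. The row contents and the helper name write_row are made up for the example. With a comma separator, a value that contains a decimal comma has to be quoted by the writer; with a semicolon it is written unchanged.

```python
import csv
import io

def write_row(row, delimiter):
    """Serialize one row with the given delimiter and return the raw CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter).writerow(row)
    return buffer.getvalue()

row = ["Widget", "1,50"]  # hypothetical record with a decimal comma

comma_line = write_row(row, ",")      # the price has to be quoted
semicolon_line = write_row(row, ";")  # the price is written as-is

print(repr(comma_line))
print(repr(semicolon_line))
```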

Data Importation in R but change the delimiter from comma to semicolon

Try it without naming the argument as delim=";", and also remove the + before delim=";".

Example:

library(readr)

example <- read_delim("example.csv", ";",
                      escape_double = FALSE, trim_ws = TRUE)

Read CSV file with semicolon as delimiter

If you're using semicolons (;) as your CSV file separator instead of commas (,), you can adjust that first line:

wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)

The problem with your list comprehension is that [x.split(';') for x in wine_data_] iterates over the column names.

That being the case, you have no need for the line with the list comprehension. You can read in your data and be done.

import numpy
import pandas
wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ',', header = None)
print(numpy.shape(wine_data))

