How to Find Out If CSV File Fields Are Tab Delimited or Comma Delimited

How to determine the delimiter in CSV file

The univocity-parsers library (Java) can detect the delimiter automatically, along with line endings and quote characters, so you don't have to write the detection logic yourself:

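import com.univocity.parsers.csv.CsvFormat;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import java.io.File;
import java.util.List;

// Let the parser work out the delimiter, quote character and line endings
CsvParserSettings settings = new CsvParserSettings();
settings.detectFormatAutomatically();

CsvParser parser = new CsvParser(settings);
List<String[]> rows = parser.parseAll(new File("/path/to/your.csv"));

// if you want to see what it detected
CsvFormat format = parser.getDetectedFormat();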

Disclaimer: I'm the author of this library, and I made sure all sorts of corner cases are covered. It's open source and free (Apache 2.0 license).

Hope this helps.

How to determine a file is tab delimited in PowerShell?

Another approach is to use Select-String to check the first line of the file for a tab character and set the delimiter accordingly.

if (Get-Content $csvfile -First 1 | Select-String -Pattern "`t")
{
    $delim = "`t"
}
else
{
    $delim = ','
}

Import-Csv $csvfile -Delimiter $delim

How to check if CSV file has a comma or a semicolon as separator?

Here are a few approaches. They assume the files differ in only one way: either the separator is a semicolon and the decimal mark is a comma, or the separator is a comma and the decimal mark is a point.

1) fread. As mentioned in the comments, fread in the data.table package automatically detects common separators and then reads the file using the separator it found. It also copes with certain other format differences, such as automatically detecting whether the file has a header.

2) grepl. Read the first line, check whether it contains a semicolon, and then re-read the whole file with the matching function (read.csv2 expects a semicolon separator and a comma decimal mark):

L <- readLines("myfile", n = 1)
if (grepl(";", L)) read.csv2("myfile") else read.csv("myfile")

3) count.fields. Assume the separator is a semicolon and count the fields in the first line. If there is only one field, the file is comma separated; otherwise it is semicolon separated.

L <- readLines("myfile", n = 1)
numfields <- count.fields(textConnection(L), sep = ";")
if (numfields == 1) read.csv("myfile") else read.csv2("myfile")
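
The same field-counting idea extends to more than two candidate separators. Below is a minimal Python sketch of that extension, not part of the original answer: the function name first_splitting_delimiter and the default candidate order are assumptions made for illustration. It tries each candidate in turn and returns the first one that splits the first line into more than one field, mirroring the "assume semicolon, fall back to comma" logic of (3).

```python
import csv


def first_splitting_delimiter(first_line, candidates=("\t", ";", ",")):
    """Return the first candidate that splits first_line into 2+ fields.

    Candidates are tried in order, so put the separators least likely to
    appear inside ordinary values first. Returns None if nothing splits.
    """
    for delim in candidates:
        # csv.reader keeps a delimiter inside a quoted field from being counted
        fields = next(csv.reader([first_line], delimiter=delim), [])
        if len(fields) > 1:
            return delim
    return None
```

Trying the semicolon before the comma matters when a semicolon-separated file uses decimal commas: a line such as 1,5;2,5;3,5 is then classified as semicolon separated rather than comma separated.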

Update: Added (3) and made improvements to all three.

How should I detect which delimiter is used in a text file?

You could show users the parsed results in a preview window, similar to the way Excel does it. In that view it's pretty clear when the wrong delimiter is being used. You could then let them choose from a range of delimiters and have the preview update in real time.

Then you could just make a simple guess as to the delimiter to start with (e.g. does a comma or a tab come first).
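
A rough Python sketch of both pieces, the initial guess and the preview, might look like the following. The function names initial_guess and preview, the comma fallback, and the five-row default are assumptions made for illustration. A real UI would call preview again each time the user picks a different delimiter.

```python
import csv
import io


def initial_guess(first_line, candidates=(",", "\t")):
    """Return whichever candidate occurs earliest in first_line.

    Falls back to a comma when none of the candidates is present.
    """
    positions = {d: first_line.find(d) for d in candidates}
    found = {d: p for d, p in positions.items() if p >= 0}
    return min(found, key=found.get) if found else ","


def preview(text, delimiter, max_rows=5):
    """Parse at most max_rows rows of text using the given delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # zip stops after max_rows, so large inputs are not parsed in full
    return [row for _, row in zip(range(max_rows), reader)]
```

For a sample such as "name\tcity, state\nAnn\tAustin, TX\n", the initial guess is a tab because the tab appears before the comma. Previewing with a tab gives two clean columns per row, while previewing with a comma leaves the tab embedded in the first column, which is the kind of visible mismatch the preview is meant to expose.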


