Include Comma in a Field Value of CSV (Delimeter as Comma) in Java

Is there a way to include commas in CSV columns without breaking the formatting?

Enclose the field in quotes, e.g.

field1_value,field2_value,"field 3,value",field4, etc...

See wikipedia.

Updated:

To encode a quote, use ", one double quote symbol in a field will be encoded as "", and the whole field will become """". So if you see the following in e.g. Excel:

---------------------------------------
| regular_value |,,,"| ,"", |""" |"|
---------------------------------------

the CSV file will contain:

regular_value,",,,""",","""",","""""""",""""

A comma is simply encapsulated using quotes, so , becomes ",".

A comma and quote needs to be encapsulated and quoted, so "," becomes """,""".

read csv file with a delimiter specified in the value itself

You cannot just simply use split to parse CSV file. You can use a library (e.g. SuperCSV or OpenCSV) or you will have to follow all the CSV format rules. For instance, here you should ignore the delimiter (i.e. comma) between the double quotes (You can use regex for this).

Dealing with commas in a CSV file

As others have said, you need to escape values that include quotes. Here’s a little CSV reader in C♯ that supports quoted values, including embedded quotes and carriage returns.

By the way, this is unit-tested code. I’m posting it now because this question seems to come up a lot and others may not want an entire library when simple CSV support will do.

You can use it as follows:

using System;
public class test
{
public static void Main()
{
using ( CsvReader reader = new CsvReader( "data.csv" ) )
{
foreach( string[] values in reader.RowEnumerator )
{
Console.WriteLine( "Row {0} has {1} values.", reader.RowIndex, values.Length );
}
}
Console.ReadLine();
}
}

Here are the classes. Note that you can use the Csv.Escape function to write valid CSV as well.

using System.IO;
using System.Text.RegularExpressions;

public sealed class CsvReader : System.IDisposable
{
public CsvReader( string fileName ) : this( new FileStream( fileName, FileMode.Open, FileAccess.Read ) )
{
}

public CsvReader( Stream stream )
{
__reader = new StreamReader( stream );
}

public System.Collections.IEnumerable RowEnumerator
{
get {
if ( null == __reader )
throw new System.ApplicationException( "I can't start reading without CSV input." );

__rowno = 0;
string sLine;
string sNextLine;

while ( null != ( sLine = __reader.ReadLine() ) )
{
while ( rexRunOnLine.IsMatch( sLine ) && null != ( sNextLine = __reader.ReadLine() ) )
sLine += "\n" + sNextLine;

__rowno++;
string[] values = rexCsvSplitter.Split( sLine );

for ( int i = 0; i < values.Length; i++ )
values[i] = Csv.Unescape( values[i] );

yield return values;
}

__reader.Close();
}
}

public long RowIndex { get { return __rowno; } }

public void Dispose()
{
if ( null != __reader ) __reader.Dispose();
}

//============================================


private long __rowno = 0;
private TextReader __reader;
private static Regex rexCsvSplitter = new Regex( @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))" );
private static Regex rexRunOnLine = new Regex( @"^[^""]*(?:""[^""]*""[^""]*)*""[^""]*$" );
}

public static class Csv
{
public static string Escape( string s )
{
if ( s.Contains( QUOTE ) )
s = s.Replace( QUOTE, ESCAPED_QUOTE );

if ( s.IndexOfAny( CHARACTERS_THAT_MUST_BE_QUOTED ) > -1 )
s = QUOTE + s + QUOTE;

return s;
}

public static string Unescape( string s )
{
if ( s.StartsWith( QUOTE ) && s.EndsWith( QUOTE ) )
{
s = s.Substring( 1, s.Length - 2 );

if ( s.Contains( ESCAPED_QUOTE ) )
s = s.Replace( ESCAPED_QUOTE, QUOTE );
}

return s;
}


private const string QUOTE = "\"";
private const string ESCAPED_QUOTE = "\"\"";
private static char[] CHARACTERS_THAT_MUST_BE_QUOTED = { ',', '"', '\n' };
}

Apache commons CSV | How can I ignore/include semicolon, comma in a field?

You can configure TAB as delimiter instead of using DEFAULT delimiter -

CSVPrinter printer = new CSVPrinter(writer, CSVFormat.TDF.withHeader(HEADERS));

https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#TDF

How should I escape commas and speech marks in CSV files so they work in Excel?

We eventually found the answer to this.

Excel will only respect the escaping of commas and speech marks if the column value is NOT preceded by a space. So generating the file without spaces like this...

Reference,Title,Description
1,"My little title","My description, which may contain ""speech marks"" and commas."
2,"My other little title","My other description, which may also contain ""speech marks"" and commas."

... fixed the problem. Hope this helps someone!



Related Topics



Leave a reply



Submit