C++ CSV Line with Commas and Strings Within Double Quotes

You need to interpret each comma depending on whether you're between double quotes or not. This is too complex for getline() to handle on its own.

The solution is to read the full line with getline() and parse it by iterating through the string character by character, maintaining a flag that indicates whether you're between double quotes.

Here is a first "raw" example (double quotes are not removed in the fields and escape characters are not interpreted):

#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> csvColumn;  // collected fields
    std::string line;
    while (std::getline(std::cin, line)) {  // read full line
        const char* mystart = line.c_str();  // start of the current field
        bool instring{false};
        for (const char* p = mystart; *p; p++) {  // iterate through the string
            if (*p == '"')  // toggle flag when we cross a double quote
                instring = !instring;
            else if (*p == ',' && !instring) {  // comma OUTSIDE double quotes
                csvColumn.push_back(std::string(mystart, p - mystart));  // keep the field
                mystart = p + 1;  // and start parsing the next one
            }
        }
        csvColumn.push_back(std::string(mystart));  // last field ends at end of line, not at a comma
    }
}
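
The raw example keeps the double quotes in the fields and does not interpret escapes. As a minimal sketch of one way to go further, the function below removes the enclosing quotes and treats a doubled quote ("") inside a quoted field as one literal quote. The name parseCsvLine and the doubled-quote convention are assumptions for illustration; they are not part of the original answer, and other escape styles (such as backslashes) are not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split one CSV line into fields, dropping the enclosing
// double quotes and turning "" inside a quoted field into a single quote.
std::vector<std::string> parseCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    current += '"';  // escaped quote: keep one, skip the other
                    ++i;
                } else {
                    inQuotes = false;  // closing quote, not part of the value
                }
            } else {
                current += c;  // commas are ordinary characters in here
            }
        } else if (c == '"') {
            inQuotes = true;  // opening quote, not part of the value
        } else if (c == ',') {
            fields.push_back(current);  // field ends at an unquoted comma
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);  // last field ends at end of line
    return fields;
}
```

Called on the line 1,"Doe, John",x this returns the three fields 1, Doe, John and x, with the quotes removed.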

How do I parse a CSV with commas embedded in quoted fields?

This is far from a complete CSV parser and could be made more efficient, but it does the job, parses your file correctly and deals with double quotes as well.

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>

int main()
{
    std::string line;
    std::vector<std::vector<std::string>> lines;
    std::ifstream file("/Users/darla/Desktop/Programs/seals.csv");

    if (file)
    {
        while (std::getline(file, line))
        {
            size_t n = lines.size();
            lines.resize(n + 1);

            std::istringstream ss(line);
            std::string field, push_field("");
            bool no_quotes = true;

            while (std::getline(ss, field, ','))
            {
                if (static_cast<size_t>(std::count(field.begin(), field.end(), '"')) % 2 != 0)
                {
                    no_quotes = !no_quotes;
                }

                push_field += field + (no_quotes ? "" : ",");

                if (no_quotes)
                {
                    lines[n].push_back(push_field);
                    push_field.clear();
                }
            }
        }
    }

    for (auto line : lines)
    {
        for (auto field : line)
        {
            std::cout << "| " << field << " |";
        }

        std::cout << std::endl << std::endl;
    }

    return 0;
}

An explanation: the program reads the file line by line and splits each line at the commas, storing the results in a vector of vectors. If a fragment contains an odd number of double quotes, a quoted field has been opened, so further fragments are appended until the one carrying the closing quote is found; the complete field is then stored. If a fragment contains an even number of double quotes, or none, and no quoted field is open, it is stored straight away. Hope this helps.
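
To make the odd/even rule easier to follow, here is the same idea isolated into a single function, as a rough sketch. The name splitByQuoteParity is made up for this example. As in the program above, the quotes stay in the field, and a quoted field that is never closed is silently dropped.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split on every comma, then glue fragments back
// together while a quoted field is still open (odd number of quotes so far).
std::vector<std::string> splitByQuoteParity(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string fragment, pending;
    bool open = false;  // true while inside an unclosed quoted field
    while (std::getline(ss, fragment, ',')) {
        // An odd quote count in this fragment opens or closes a quoted field.
        if (std::count(fragment.begin(), fragment.end(), '"') % 2 != 0)
            open = !open;
        pending += fragment;
        if (open) {
            pending += ',';  // restore the comma that getline consumed
        } else {
            fields.push_back(pending);  // field is complete
            pending.clear();
        }
    }
    return fields;
}
```

For the line 7,"Corvallis, OR",x the fragments are 7, then "Corvallis (odd count, field opens), then  OR" (odd count, field closes), then x, so the result is the three fields 7, "Corvallis, OR" and x.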

Dealing with commas in a CSV file

As others have said, you need to escape values that include quotes. Here’s a little CSV reader in C♯ that supports quoted values, including embedded quotes and carriage returns.

By the way, this is unit-tested code. I’m posting it now because this question seems to come up a lot and others may not want an entire library when simple CSV support will do.

You can use it as follows:

using System;
public class test
{
    public static void Main()
    {
        using ( CsvReader reader = new CsvReader( "data.csv" ) )
        {
            foreach( string[] values in reader.RowEnumerator )
            {
                Console.WriteLine( "Row {0} has {1} values.", reader.RowIndex, values.Length );
            }
        }
        Console.ReadLine();
    }
}

Here are the classes. Note that you can use the Csv.Escape function to write valid CSV as well.

using System.IO;
using System.Text.RegularExpressions;

public sealed class CsvReader : System.IDisposable
{
    public CsvReader( string fileName ) : this( new FileStream( fileName, FileMode.Open, FileAccess.Read ) )
    {
    }

    public CsvReader( Stream stream )
    {
        __reader = new StreamReader( stream );
    }

    public System.Collections.IEnumerable RowEnumerator
    {
        get {
            if ( null == __reader )
                throw new System.ApplicationException( "I can't start reading without CSV input." );

            __rowno = 0;
            string sLine;
            string sNextLine;

            while ( null != ( sLine = __reader.ReadLine() ) )
            {
                while ( rexRunOnLine.IsMatch( sLine ) && null != ( sNextLine = __reader.ReadLine() ) )
                    sLine += "\n" + sNextLine;

                __rowno++;
                string[] values = rexCsvSplitter.Split( sLine );

                for ( int i = 0; i < values.Length; i++ )
                    values[i] = Csv.Unescape( values[i] );

                yield return values;
            }

            __reader.Close();
        }
    }

    public long RowIndex { get { return __rowno; } }

    public void Dispose()
    {
        if ( null != __reader ) __reader.Dispose();
    }

    //============================================

    private long __rowno = 0;
    private TextReader __reader;
    private static Regex rexCsvSplitter = new Regex( @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))" );
    private static Regex rexRunOnLine = new Regex( @"^[^""]*(?:""[^""]*""[^""]*)*""[^""]*$" );
}

public static class Csv
{
    public static string Escape( string s )
    {
        if ( s.Contains( QUOTE ) )
            s = s.Replace( QUOTE, ESCAPED_QUOTE );

        if ( s.IndexOfAny( CHARACTERS_THAT_MUST_BE_QUOTED ) > -1 )
            s = QUOTE + s + QUOTE;

        return s;
    }

    public static string Unescape( string s )
    {
        if ( s.StartsWith( QUOTE ) && s.EndsWith( QUOTE ) )
        {
            s = s.Substring( 1, s.Length - 2 );

            if ( s.Contains( ESCAPED_QUOTE ) )
                s = s.Replace( ESCAPED_QUOTE, QUOTE );
        }

        return s;
    }

    private const string QUOTE = "\"";
    private const string ESCAPED_QUOTE = "\"\"";
    private static char[] CHARACTERS_THAT_MUST_BE_QUOTED = { ',', '"', '\n' };
}
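
For readers who came here for C++, the Escape and Unescape helpers above could be ported roughly as follows. This is only a sketch: csvEscape and csvUnescape are hypothetical names, and the size check in csvUnescape is an added guard for a value that is a single quote character, which the C# version does not have.

```cpp
#include <cstddef>
#include <string>

// Hypothetical counterpart of Csv.Escape: double embedded quotes, then wrap
// the value in quotes if it contains a comma, a quote, or a newline.
std::string csvEscape(std::string s) {
    for (std::size_t pos = 0; (pos = s.find('"', pos)) != std::string::npos; pos += 2)
        s.insert(pos, 1, '"');  // " becomes ""
    if (s.find_first_of(",\"\n") != std::string::npos)
        s = '"' + s + '"';
    return s;
}

// Hypothetical counterpart of Csv.Unescape: strip the enclosing quotes and
// collapse each "" back into a single quote.
std::string csvUnescape(std::string s) {
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"') {
        s = s.substr(1, s.size() - 2);
        for (std::size_t pos = 0; (pos = s.find("\"\"", pos)) != std::string::npos; ++pos)
            s.erase(pos, 1);  // "" becomes "
    }
    return s;
}
```

With these, csvEscape("Corvallis, OR") yields "Corvallis, OR" including the quotes, a value without special characters is returned unchanged, and csvUnescape reverses csvEscape.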

Ignore Comma between double quotes while reading CSV file

You can fix this by replacing the Split call with Regex.Split, so that only commas outside double quotes are treated as separators.

Table.Rows.Add(row.Split(','));

should be replaced with

Table.Rows.Add(Regex.Split(row, ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"));

And add the namespace import at the top:

using System.Text.RegularExpressions;

This will fix your problem.
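
The same lookahead pattern can be tried in C++ with std::regex, whose default ECMAScript grammar supports (?=...). The following is a sketch rather than a recommended implementation: regexSplitCsv is a made-up name, the quotes stay in the fields, and std::regex can be slow on long rows.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split a row on commas that are followed by an even
// number of double quotes, i.e. commas that are outside quoted fields.
std::vector<std::string> regexSplitCsv(const std::string& row) {
    static const std::regex comma(R"re(,(?=(?:[^"]*"[^"]*")*[^"]*$))re");
    std::vector<std::string> fields;
    // Submatch index -1 selects the text between matches (the fields).
    std::sregex_token_iterator it(row.begin(), row.end(), comma, -1), end;
    for (; it != end; ++it)
        fields.push_back(it->str());
    return fields;
}
```

For the row 1,"Corvallis, OR",7679 this gives three fields, the second being "Corvallis, OR" with its quotes intact.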

Parse csv with quotes and commas

I found the solution in another post. All I had to do was add two parameters to read_csv: pd.read_csv('dataset.csv', escapechar='\\', encoding='utf-8'). It's working fine now.

How to split csv whose columns may contain comma

Use the Microsoft.VisualBasic.FileIO.TextFieldParser class. This will handle parsing a delimited file, TextReader or Stream where some fields are enclosed in quotes and some are not.

For example:

using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

string csv = "2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,\"Corvallis, OR\",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";

TextFieldParser parser = new TextFieldParser(new StringReader(csv));

// You can also read from a file
// TextFieldParser parser = new TextFieldParser("mycsvfile.csv");

parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");

string[] fields;

while (!parser.EndOfData)
{
    fields = parser.ReadFields();
    foreach (string field in fields)
    {
        Console.WriteLine(field);
    }
}

parser.Close();

This should result in the following output:


2
1016
7/31/2008 14:22
Geoff Dalgas
6/5/2011 22:21
http://stackoverflow.com
Corvallis, OR
7679
351
81
b437f461b3fd27387c5d8ab47a293d35
34

See Microsoft.VisualBasic.FileIO.TextFieldParser for more information.

You need to add a reference to Microsoft.VisualBasic in the Add References .NET tab.


