C#, regular expressions: how to parse comma-separated values, where some values might be quoted strings themselves containing commas
Try this regex:
"[^"\r\n]*"|'[^'\r\n]*'|[^,\r\n]*
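Regex regexObj = new Regex(@"""[^""\r\n]*""|'[^'\r\n]*'|[^,\r\n]*");
Match matchResults = regexObj.Match(input);
while (matchResults.Success)
{
    // The last alternative can match an empty string at each comma; skip those.
    // Note that this also drops genuinely empty fields.
    if (matchResults.Length > 0)
    {
        Console.WriteLine(matchResults.Value);
    }
    matchResults = matchResults.NextMatch();
}
Outputs: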
- cat
- dog
- "0 = OFF, 1 = ON"
- lion
- tiger
- 'R = red, G = green, B = blue'
- bear
Note: This regex solution will work for your case, but I recommend using a specialized library such as FileHelpers.
C# Regex Split - commas outside quotes
You could split on every comma that is followed by an even number of quotes, using the following regex to find them:
",(?=(?:[^']*'[^']*')*[^']*$)"
You'd use it like this:
var result = Regex.Split(samplestring, ",(?=(?:[^']*'[^']*')*[^']*$)");
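As a rough illustration of how that lookahead behaves, here is a minimal sketch. The class name, method name, and sample string are made up for the example and are not from the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteAwareSplitDemo
{
    // Splits on commas that are followed by an even number of single quotes,
    // i.e. commas that are not inside a '...' pair.
    public static string[] SplitOutsideSingleQuotes(string line)
    {
        return Regex.Split(line, ",(?=(?:[^']*'[^']*')*[^']*$)");
    }

    public static void Main()
    {
        // Hypothetical sample input.
        string sample = "cat,'R = red, G = green',bear";
        foreach (string field in SplitOutsideSingleQuotes(sample))
        {
            Console.WriteLine(field);
        }
    }
}
```

The comma inside the quoted field is followed by an odd number of quotes, so the lookahead fails there and the field stays in one piece. The quotes themselves are kept in the result.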
C# split comma separated values
It's because of the capture group. Just turn it into a non-capture group:
",(?=(?:[^""]*""[^""]*"")*[^""]*$)"
      ^^
The capture group is including the captured part in your results.
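var regexObj = new Regex(@",(?=(?:[^""]*""[^""]*"")*[^""]*$)");
// Select returns IEnumerable<string>, which has no ForEach; convert to a List first (needs using System.Linq).
regexObj.Split(input).Select(s => s.Trim('\"', ' ')).ToList().ForEach(Console.WriteLine);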
And just trim the results.
C# text split logic comma separator and string identifier
This is a situation where TextFieldParser, in the Microsoft.VisualBasic.FileIO namespace, is the best fit.
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // add this

static void Main(string[] args)
{
    string text = System.IO.File.ReadAllText(@"D://dtl.txt"); // note this
    List<string[]> param = new List<string[]>();
    string[] words; // add intermediary reference
    using (TextFieldParser parser = new TextFieldParser(new StringReader(text)))
    {
        parser.Delimiters = new string[] { "," }; // the parameter must be comma
        parser.HasFieldsEnclosedInQuotes = true;
        // ReadFields returns null once the end of the input is reached.
        while ((words = parser.ReadFields()) != null)
            param.Add(words);
    }
    var x = param; // for debug
}
This should give you what you need.
Output:
array :
[0] : "AWD_CODE","AWD_NAME","AWD_TYPE","ADF_REF","FLG_SUM","FLG"
[1] : "DMM","PETCH","01","REF 2/2015","",""
[2] : "TRR","TUCTH","01","REF 2/2015","WD_TRK","F"
[3] : "TGC","DHYTH","02","REF 3/2015","WD_TRK,WD_TRI","F"
To use it, you need to add a reference to Microsoft.VisualBasic in your project.
Using Regular Expressions for Pattern Finding with Replace
Personally, I'd avoid regexes here - assuming that there aren't nested quote marks, this is quite simple to write up as a for-loop, which I think will be more efficient:
var inQuotes = false;
var sb = new StringBuilder(someText.Length);
for (var i = 0; i < someText.Length; ++i)
{
    if (someText[i] == '"')
    {
        inQuotes = !inQuotes;
    }
    if (inQuotes && someText[i] == ',')
    {
        sb.Append('$');
    }
    else
    {
        sb.Append(someText[i]);
    }
}
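The same quote-toggling idea could also be used to split a line into fields directly, rather than replacing the quoted commas. The following is only a sketch under the same assumption (no escaped or nested quotes); the class name, method name, and sample input are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteToggleSplitter
{
    // Splits on commas that are outside double quotes.
    // Assumes no escaped or nested quotes.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        var inQuotes = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;  // entering or leaving a quoted field
                current.Append(c);     // keep the quote; trim later if unwanted
            }
            else if (c == ',' && !inQuotes)
            {
                fields.Add(current.ToString());  // field boundary
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());  // the last field has no trailing comma
        return fields;
    }

    public static void Main()
    {
        // Hypothetical sample input.
        foreach (string field in Split("cat,\"0 = OFF, 1 = ON\",lion"))
        {
            Console.WriteLine(field);
        }
    }
}
```

Unlike the regex approaches, this makes a single pass over the input and keeps empty fields, at the cost of not handling doubled quotes inside a quoted field.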