How to split but ignore separators in quoted strings, in python?
Most of the answers seem massively overcomplicated. You don't need back references. You don't need to depend on whether or not re.findall gives overlapping matches. Given that the input cannot be parsed with the csv module, a regular expression is pretty well the only way to go, and all you need is to call re.split with a pattern that matches a field.
Note that it is much easier here to match a field than it is to match a separator:
import re
data = """part 1;"this is ; part 2;";'this is ; part 3';part 4;this "is ; part" 5"""
PATTERN = re.compile(r'''((?:[^;"']|"[^"]*"|'[^']*')+)''')
print(PATTERN.split(data)[1::2])  # the fields are at the odd indices of the split result
and the output is:
['part 1', '"this is ; part 2;"', "'this is ; part 3'", 'part 4', 'this "is ; part" 5']
As Jean-Luc Nacif Coelho correctly points out, this won't handle empty fields correctly. Depending on the situation that may or may not matter. If it does matter, it may be possible to handle it by, for example, replacing ';;' with ';<marker>;', where <marker> would have to be some string (without semicolons) that you know does not appear in the data before the split. You also need to restore the data afterwards:
>>> marker = ";!$%^&;"
>>> [r.replace(marker[1:-1],'') for r in PATTERN.split("aaa;;aaa;'b;;b'".replace(';;', marker))[1::2]]
['aaa', '', 'aaa', "'b;;b'"]
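However, this is a kludge: it still loses empty fields at the start or end of the string, and runs of three or more separators. Any better suggestions?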
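One way to avoid the marker kludge, offered here as an alternative sketch rather than a drop-in replacement for PATTERN, is to scan for quoted strings and bare separators with re.finditer and slice out the text between separators, so empty fields fall out naturally. The name split_fields is hypothetical, and, like PATTERN, the sketch assumes quotes are never escaped inside quoted strings; an unmatched quote is treated as an ordinary character.

```python
import re

# Either a complete quoted string (double or single) or a bare separator.
TOKEN = re.compile(r'''"[^"]*"|'[^']*'|;''')

def split_fields(text):
    """Split text on ';' outside quotes, keeping empty fields."""
    fields, start = [], 0
    for m in TOKEN.finditer(text):
        # Quoted strings are consumed whole, so a ';' inside them is never
        # seen on its own; only a bare ';' ends the current field.
        if m.group() == ';':
            fields.append(text[start:m.start()])
            start = m.end()
    fields.append(text[start:])  # the last field runs to the end of the string
    return fields

print(split_fields("aaa;;aaa;'b;;b'"))  # ['aaa', '', 'aaa', "'b;;b'"]
print(split_fields(";a;;;b;"))          # ['', 'a', '', '', 'b', '']
```

Because every alternative in TOKEN consumes at least one character, there are no empty matches, so the result does not depend on how a given Python version handles empty matches in finditer.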
How to split but ignore separators in quoted and braced strings, in python?
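You can use a regular expression like so:
import re
s = '''1;2;"3;4"; [5;6];7'''
matcher = re.compile(r'''(\".+?\"|\[.+?\]|\(.+?\)|\{.+?\}|[^\"[({]+?)(?:;|$)''')
print(matcher.findall(s)) # returns ['1', '2', '"3;4"', '[5;6]', '7']
This regex supports grouping with double quotes, [ ], ( ) and { }, with ; as the delimiter. It does not handle nested brackets, and text that fits none of the alternatives (such as the space before [5;6]) is silently dropped.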
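If the brackets can be nested, a lazy pattern such as \[.+?\] stops at the first closing bracket. A minimal sketch that tracks nesting with a stack instead, assuming balanced brackets, only double quotes for quoting, and that surrounding whitespace should be stripped from each field (split_outside_brackets is a hypothetical name):

```python
OPENERS = {'[': ']', '(': ')', '{': '}'}

def split_outside_brackets(text, sep=';'):
    """Split on sep only when outside double quotes and all bracket pairs."""
    fields, buf = [], []
    stack = []        # closing characters we still expect, innermost last
    in_quote = False
    for ch in text:
        if in_quote:
            if ch == '"':
                in_quote = False
        elif ch == '"':
            in_quote = True
        elif ch in OPENERS:
            stack.append(OPENERS[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
        elif ch == sep and not stack:
            # A top-level separator: finish the current field.
            fields.append(''.join(buf).strip())
            buf = []
            continue
        buf.append(ch)
    fields.append(''.join(buf).strip())
    return fields

print(split_outside_brackets('1;2;"3;4"; [5;6];7'))  # ['1', '2', '"3;4"', '[5;6]', '7']
print(split_outside_brackets('a;[b;(c;d)];e'))       # ['a', '[b;(c;d)]', 'e']
```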
Python, split a string at commas, except within quotes, ignoring whitespace
You can use the regular expression
".+?"|[\w-]+
This matches a double quote followed by any characters up to the next double quote, OR a run of word characters and hyphens (so no commas, quotes or spaces; an unquoted field containing a space comes out as several matches).
https://regex101.com/r/IThYf7/1
import re
s = 'abc,def, ghi, "jkl, mno, pqr","stu"'
for r in re.findall(r'".+?"|[\w-]+', s):
    print(r)
If you want to get rid of the "s around the quoted sections, the best I could figure out, using the regex module (so that \K was usable), was:
(?:^"?|, ?"?)\K(?:(?<=").+?(?=")|[\w-]+)
https://regex101.com/r/IThYf7/3
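If you would rather stay with the standard re module, capturing groups give the same result without \K: put the quoted content and the bare word in separate groups and keep whichever one matched. A sketch using the same sample string:

```python
import re

s = 'abc,def, ghi, "jkl, mno, pqr","stu"'

# With two groups, findall returns a (quoted, bare) tuple per match;
# the group that did not take part in the match is an empty string.
fields = [quoted or bare for quoted, bare in re.findall(r'"(.+?)"|([\w-]+)', s)]
print(fields)  # ['abc', 'def', 'ghi', 'jkl, mno, pqr', 'stu']
```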
Split on comma not enclosed by quotes
You could try the code below:
>>> import re
>>> string = '"first, element", second element, third element, "fourth, element", fifth element'
>>> m = re.split(r', (?=(?:"[^"]*?(?: [^"]*)*))|, (?=[^",]+(?:,|$))', string)
>>> m
['"first, element"', 'second element', 'third element', '"fourth, element"', 'fifth element']
Regex stolen from here :-)
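For a single line like this, the standard csv module can do the same job, provided the space after each comma is skipped so that each quoted field starts with its quote. Unlike the regex, it also removes the surrounding quotes. A sketch:

```python
import csv

string = '"first, element", second element, third element, "fourth, element", fifth element'

# csv.reader accepts any iterable of lines; skipinitialspace=True ignores the
# space after each delimiter, so '"fourth, element"' is recognised as quoted.
row = next(csv.reader([string], skipinitialspace=True))
print(row)  # ['first, element', 'second element', 'third element', 'fourth, element', 'fifth element']
```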
Split a string by spaces -- preserving quoted substrings -- in Python
You want split, from the built-in shlex module.
>>> import shlex
>>> shlex.split('this is "a test"')
['this', 'is', 'a test']
This should do exactly what you want.
If you want to preserve the quotation marks, then you can pass the posix=False kwarg.
>>> shlex.split('this is "a test"', posix=False)
['this', 'is', '"a test"']
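shlex.split treats single quotes the same way, and since Python 3.8 shlex.join does the reverse, quoting any part that needs it, so the two round-trip. A short sketch:

```python
import shlex

# Single and double quotes both group words; the quotes themselves are removed.
parts = shlex.split("say 'hello world' \"and more\"")
print(parts)  # ['say', 'hello world', 'and more']

# shlex.join (Python 3.8+) re-quotes parts containing spaces or shell characters.
joined = shlex.join(parts)
print(joined)  # say 'hello world' 'and more'
print(shlex.split(joined) == parts)  # True
```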
split string on commas but ignore commas with in single quotes and create a dictionary after string split in python
Try this regular expression ,(?=(?:[^']*\'[^']*\')*[^']*$) for splitting:
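import re
# sample input, reconstructed from the output shown below
s = "someVariable1='9',someVariable2='some , value, comma,present',somevariable5='N/A',someVariable6='some text,comma,= present,'"
re.split(",(?=(?:[^']*\'[^']*\')*[^']*$)",s)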
# ["someVariable1='9'",
# "someVariable2='some , value, comma,present'",
# "somevariable5='N/A'",
# "someVariable6='some text,comma,= present,'"]
- This uses the lookahead syntax (?=...) to decide which commas to split on.
- The lookahead pattern is (?:[^']*\'[^']*\')*[^']*$ : $ matches the end of the string, and the [^']* before it optionally matches non-' characters.
- The non-capturing group (?:...) defines a quoted-pair pattern, [^']*\'[^']*\', which may repeat between the comma and the end of the string. A comma is therefore treated as a delimiter only if an even number of single quotes follows it, i.e. only if it is not inside a quoted value.
This assumes the quotes are always paired.
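The rule the lookahead encodes can be spelled out without a regex, which may help when checking what it does. This illustrative helper (hypothetical name split_even_quotes; quadratic in the string length, so not meant for production) splits at a comma only when an even number of single quotes follows it, and compares the result with the regex on a small made-up string:

```python
import re

def split_even_quotes(s):
    """Split on commas that are followed by an even number of single quotes."""
    parts, start = [], 0
    for i, ch in enumerate(s):
        # An even count of quotes to the right means this comma is outside
        # any quoted value (assuming the quotes are always paired).
        if ch == ',' and s.count("'", i + 1) % 2 == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts

s = "a='x, y',b='z,',c='1'"
print(split_even_quotes(s))  # ["a='x, y'", "b='z,'", "c='1'"]
print(re.split(",(?=(?:[^']*\'[^']*\')*[^']*$)", s) == split_even_quotes(s))  # True
```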
To convert the list returned by re.split into a dictionary, you can split each item on the = that is followed by a quote:
lst = re.split(",(?=(?:[^']*\'[^']*\')*[^']*$)",s)
dict_ = {k: v for exp in lst for k, v in [re.split("=(?=\')", exp)]}
dict_
# {'someVariable1': "'9'",
# 'someVariable2': "'some , value, comma,present'",
# 'someVariable6': "'some text,comma,= present,'",
# 'somevariable5': "'N/A'"}
dict_.get('someVariable2')
# "'some , value, comma,present'"
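Alternatively, in line with the point made in the first answer that matching a field is often easier than matching a separator, you can match the key='value' pairs directly and skip the split. This sketch assumes keys are word characters and values contain no single quotes; unlike the code above, it also strips the quotes from the values. The input string is the one reconstructed from the split output shown earlier:

```python
import re

s = "someVariable1='9',someVariable2='some , value, comma,present',somevariable5='N/A',someVariable6='some text,comma,= present,'"

# Each match is one key='value' pair; the value group stops at the next quote,
# so commas and '=' inside a value are kept.
pairs = dict(re.findall(r"(\w+)='([^']*)'", s))
print(pairs['someVariable2'])  # some , value, comma,present
```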