Parse a Tuple from a String

Parse a tuple from a string?

It already exists!

>>> from ast import literal_eval as make_tuple
>>> make_tuple("(1,2,3,4,5)")
(1, 2, 3, 4, 5)

Be aware of this corner case, though:

>>> make_tuple("(1)")
1
>>> make_tuple("(1,)")
(1,)

If your input format works differently from Python here, you need to handle that case separately or use another method, such as tuple(int(x) for x in tup_string[1:-1].split(',')), which assumes every element is an integer and that there is no trailing comma.
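Here is a minimal sketch of that manual approach. It assumes flat tuples of integers, and parse_int_tuple is a hypothetical helper name. Unlike literal_eval, it always returns a tuple, so "(1)" and "(1,)" give the same result:

```python
def parse_int_tuple(text):
    """Parse strings like '(1,2,3)', '(1)', '(1,)' or '()' into a tuple of ints.

    Assumes a flat tuple whose elements are all integer literals.
    """
    inner = text.strip()[1:-1]  # drop the surrounding parentheses
    # Skip empty pieces, e.g. the one produced by the trailing comma in "(1,)"
    return tuple(int(part) for part in inner.split(",") if part.strip())


print(parse_int_tuple("(1)"))        # (1,)
print(parse_int_tuple("(1,)"))       # (1,)
print(parse_int_tuple("(1, 2, 3)"))  # (1, 2, 3)
```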

Convert a string to a tuple

You could use the literal_eval function of the ast module:

ast.literal_eval(node_or_string)

Safely evaluate an expression node or a Unicode or Latin-1 encoded string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.

(That is the Python 2 wording; in Python 3 the allowed structures also include bytes and sets.)

Example:

>>> import ast
>>> ast.literal_eval("(255, 0, 0)")
(255, 0, 0)
>>>
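Because literal_eval only accepts literals, it is safe to call on untrusted strings. Anything else raises an exception instead of being executed. This small sketch turns those failures into None (safe_parse is a hypothetical helper name):

```python
import ast


def safe_parse(text):
    """Return the literal value of text, or None if it is not a plain literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # ValueError: valid syntax but not a literal (e.g. a function call)
        # SyntaxError: not valid Python at all
        return None


print(safe_parse("(255, 0, 0)"))       # (255, 0, 0)
print(safe_parse("__import__('os')"))  # None: function calls are rejected
print(safe_parse("(255, 0,"))          # None: unbalanced parenthesis
```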

Regarding pygame, note that the Color class can also take the name of a color as a string:

>>> import pygame
>>> pygame.color.Color('RED')
(255, 0, 0, 255)
>>>

so you may be able to simplify your code in general.

Also, you should not name your dict Color: pygame already has a Color class, and reusing the name will only lead to confusion.

Parse string as list of tuples

Try another approach, with regular expressions:

>>> import re
>>> s = '(tagname1, tagvalue1),(tagname2,tagvalue2), ( tagname3, tagvalue3 ), (tag name4,tag value4)'
>>> e = r'\(\s?(.*?)\s?,\s?(.*?)\s?\)'
>>> re.findall(e, s)
[('tagname1', 'tagvalue1'), ('tagname2', 'tagvalue2'), ('tagname3', 'tagvalue3'), ('tag name4', 'tag value4')]
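The \s? in the pattern above allows at most one space around each value. The variant below is a sketch with its own pattern. It uses \s* to tolerate any amount of padding and assumes that values never contain commas or parentheses:

```python
import re

# \s* (instead of \s?) absorbs any amount of padding; [^,()]*? stops a value
# from running past a comma or parenthesis
PAIR_RE = re.compile(r"\(\s*([^,()]*?)\s*,\s*([^,()]*?)\s*\)")

s = "(tagname1, tagvalue1),(  tagname2,tagvalue2  ), (tag name3,tag value3)"
pairs = PAIR_RE.findall(s)
print(pairs)
# [('tagname1', 'tagvalue1'), ('tagname2', 'tagvalue2'), ('tag name3', 'tag value3')]
```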

C++: parse string snippets into a tuple

There are many ways to parse the data: you can use std::stringstream, std::string::find, or something else. I believe what you are asking is how to store the values directly in a tuple. For that, use std::get, which returns a reference to the element in the tuple, so you can read straight into it.

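#include <sstream>
#include <stdexcept>
#include <string>
#include <tuple>

// Parameter s is the line to parse. Ex: "Dolly Davenell,8809903417,1 Toban Circle,Luozhou"
std::tuple<std::string, long long, std::string, std::string> parse_line(const std::string& s)
{
    std::stringstream ss(s);
    std::tuple<std::string, long long, std::string, std::string> t;

    // std::get<N>(t) returns a reference, so each field is read straight into the tuple
    if (std::getline(ss, std::get<0>(t), ',')) {
        if (ss >> std::get<1>(t)) {
            ss.ignore(1); // Skip comma
            if (std::getline(ss, std::get<2>(t), ',') && std::getline(ss, std::get<3>(t))) {
                return t;
            }
        }
    }
    // Reaching here means the line did not have the expected four fields
    throw std::invalid_argument("malformed line: " + s);
}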

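I changed int to long long, since the value in the example (8809903417) is too large for a 32-bit int, and long is also only 32 bits on some platforms (for example, 64-bit Windows).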

How to parse the string into list of tuples

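You can use Python's built-in eval() function, but only on trusted input: eval() executes arbitrary code, so ast.literal_eval() is the safer choice for strings that come from users or files.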

In this case, you can write your code like this:

s = " ( (4, 4), (11, 23), (8, 2), (12, 4), (7, 9) ) "
print(list(eval(s)))
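The same result can be obtained with ast.literal_eval, which only evaluates literals. This is a sketch of that swap. It calls strip() on the outer padding as a precaution before parsing:

```python
import ast

s = " ( (4, 4), (11, 23), (8, 2), (12, 4), (7, 9) ) "
# literal_eval accepts only literals (numbers, strings, tuples, lists, ...), so a
# string containing a function call raises ValueError instead of running it
pairs = list(ast.literal_eval(s.strip()))
print(pairs)  # [(4, 4), (11, 23), (8, 2), (12, 4), (7, 9)]
```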

Parsing a string which represents a list of tuples

>>> import ast
>>> print(ast.literal_eval("(8, 12.25), (13, 15), (16.75, 18.5)"))
((8, 12.25), (13, 15), (16.75, 18.5))
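Without surrounding brackets, the result is a tuple of tuples. A single pair such as "(8, 12.25)" comes back as a flat tuple instead, which is the same corner case as "(1)" vs "(1,)". The sketch below appends a trailing comma so the input always parses as a sequence of pairs. parse_pairs is a hypothetical helper name:

```python
import ast


def parse_pairs(text):
    """Parse '(a, b), (c, d), ...' into a list of tuples.

    A trailing comma turns even a single pair into a tuple of tuples;
    rstrip(",") avoids doubling a comma the input already has.
    """
    value = ast.literal_eval(text.strip().rstrip(",") + ",")
    return list(value)


print(parse_pairs("(8, 12.25), (13, 15), (16.75, 18.5)"))
# [(8, 12.25), (13, 15), (16.75, 18.5)]
print(parse_pairs("(8, 12.25)"))
# [(8, 12.25)]
```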

Convert a string representation of a list of tuples to a list when elements are not quoted

A general solution to this would require implementing a parser, but your simple example can be solved with a regex and a list comprehension:

>>> import re
>>> s = '[(a,b),(c,d),(e,f)]'
>>> [tuple(x.split(',')) for x in re.findall(r"\((.*?)\)", s)]
[('a', 'b'), ('c', 'd'), ('e', 'f')]

If you want to use Python's parser to do the parsing for you, you could do something like this:

>>> import ast
>>> parsed = ast.parse(s)
>>> [tuple(el.id for el in t.elts) for t in parsed.body[0].value.elts]
[('a', 'b'), ('c', 'd'), ('e', 'f')]

Keep in mind, though, that both approaches assume your input has a very particular structure.
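The sketch below is a somewhat more defensive version of the ast.parse approach. It walks the tree recursively, so nesting works, and it raises ValueError instead of an AttributeError when the input contains anything other than bare names, tuples and lists. names_to_strings and parse_unquoted are hypothetical helper names:

```python
import ast


def names_to_strings(node):
    """Recursively convert an AST of bare names, tuples and lists into Python values.

    Bare identifiers become strings; any other node type is rejected.
    """
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Tuple):
        return tuple(names_to_strings(el) for el in node.elts)
    if isinstance(node, ast.List):
        return [names_to_strings(el) for el in node.elts]
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def parse_unquoted(s):
    # mode="eval" yields an ast.Expression whose .body is the single expression
    return names_to_strings(ast.parse(s.strip(), mode="eval").body)


print(parse_unquoted("[(a,b),(c,d),(e,f)]"))
# [('a', 'b'), ('c', 'd'), ('e', 'f')]
```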


The most complete approach would be to implement a parser specific to the input format you expect, using a tool such as PLY (https://www.dabeaz.com/ply/).

Here is an example: you can put this parsing code in a module named parser.py:

# parser.py
import os

import ply.lex as lex
import ply.yacc as yacc


class ParserBase:
    """
    Base class for a lexer/parser that has the rules defined as methods
    """

    def __init__(self, **kw):
        self.debug = kw.get("debug", 0)
        modname = (
            os.path.split(os.path.splitext(__file__)[0])[1]
            + "_"
            + self.__class__.__name__
        )
        self.debugfile = modname + ".dbg"
        self.tabmodule = modname + "_" + "parsetab"

        # Build the lexer and parser
        lex.lex(module=self, debug=self.debug)
        yacc.yacc(
            module=self,
            debug=self.debug,
            debugfile=self.debugfile,
            tabmodule=self.tabmodule,
        )

    def parse(self, expression):
        return yacc.parse(expression)


class Parser(ParserBase):

    tokens = (
        "NAME",
        "COMMA",
        "LPAREN",
        "RPAREN",
        "LBRACKET",
        "RBRACKET",
    )

    # Tokens

    t_COMMA = r","
    t_LPAREN = r"\("
    t_RPAREN = r"\)"
    t_LBRACKET = r"\["
    t_RBRACKET = r"\]"
    t_NAME = r"[a-zA-Z_][a-zA-Z0-9_]*"

    # Skip spaces and tabs between tokens, so "(a, b)" parses like "(a,b)"
    t_ignore = " \t"

    def t_error(self, t):
        raise ValueError("Illegal character '%s'" % t.value[0])

    def p_expression(self, p):
        """
        expression : name
                   | list
                   | tuple
        """
        p[0] = p[1]

    def p_name(self, p):
        "name : NAME"
        p[0] = str(p[1])

    def p_list(self, p):
        """
        list : LBRACKET RBRACKET
             | LBRACKET arglist RBRACKET
        """
        if len(p) == 3:
            p[0] = []
        elif len(p) == 4:
            p[0] = list(p[2])

    def p_tuple(self, p):
        """
        tuple : LPAREN RPAREN
              | LPAREN arglist RPAREN
        """
        if len(p) == 3:
            p[0] = tuple()
        elif len(p) == 4:
            p[0] = tuple(p[2])

    def p_arglist(self, p):
        """
        arglist : arglist COMMA expression
                | expression
        """
        if len(p) == 4:
            p[0] = p[1] + [p[3]]
        else:
            p[0] = [p[1]]

    def p_error(self, p):
        if p:
            raise ValueError(f"Syntax error at '{p.value}'")
        else:
            raise ValueError("Syntax error at EOF")

Then use it this way:

>>> from parser import Parser
>>> p = Parser()
>>> p.parse('[(a,b),(c,d),(e,f)]')
[('a', 'b'), ('c', 'd'), ('e', 'f')]

This should work for arbitrarily-nested inputs:

>>> p.parse('[(a,b),(c,d),([(e,f,g),h,i],j)]')
[('a', 'b'), ('c', 'd'), ([('e', 'f', 'g'), 'h', 'i'], 'j')]

And will give you a nice error if your string doesn't match the parsing rules:

>>> p.parse('[a,b,c)')
...
ValueError: Syntax error at ')'
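If adding a dependency like PLY is not an option, the same grammar is small enough for a hand-written recursive-descent parser. The sketch below is a swapped-in technique, not part of the PLY approach, and tokenize and parse_nested are hypothetical names. Like the PLY version, it only accepts bare identifiers, tuples and lists, and it skips whitespace between tokens:

```python
import re

# A token is either an identifier or one of the punctuation characters ( ) [ ] ,
# The leading \s* skips any whitespace before each token.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_][A-Za-z0-9_]*)|([()\[\],]))")
PUNCTUATION = {"(", ")", "[", "]", ","}


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()  # trailing whitespace would otherwise leave an unmatched tail
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            bad = text[pos:].lstrip()[:1]
            raise ValueError(f"Illegal character {bad!r}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


def parse_nested(text):
    """Parse names, (tuples) and [lists], nested arbitrarily, into Python values."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def error():
        tok = peek()
        return ValueError("Syntax error at EOF" if tok is None else f"Syntax error at {tok!r}")

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise error()
        pos += 1

    def sequence(open_tok, close_tok):
        # open_tok close_tok  |  open_tok expression ("," expression)* close_tok
        expect(open_tok)
        items = []
        if peek() != close_tok:
            items.append(expression())
            while peek() == ",":
                expect(",")
                items.append(expression())
        expect(close_tok)
        return items

    def expression():
        nonlocal pos
        tok = peek()
        if tok == "[":
            return sequence("[", "]")
        if tok == "(":
            return tuple(sequence("(", ")"))
        if tok is not None and tok not in PUNCTUATION:
            pos += 1
            return tok
        raise error()

    result = expression()
    if pos != len(tokens):
        raise error()  # leftover tokens after a complete expression
    return result
```

For example, parse_nested('[(a,b),(c,d),([(e,f,g),h,i],j)]') returns the same nested structure as the PLY parser, and parse_nested('[a,b,c)') raises ValueError: Syntax error at ')'.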

