Can you add new statements to Python's syntax?
You may find this useful - Python internals: adding a new statement to Python, quoted here:
This article is an attempt to better understand how the front-end of Python works. Just reading documentation and source code may be a bit boring, so I'm taking a hands-on approach here: I'm going to add an until statement to Python.
All the coding for this article was done against the cutting-edge Py3k branch in the Python Mercurial repository mirror.
The until statement
Some languages, like Ruby, have an until statement, which is the complement to while (until num == 0 is equivalent to while num != 0). In Ruby, I can write:
num = 3
until num == 0 do
  puts num
  num -= 1
end
And it will print:
3
2
1
So, I want to add a similar capability to Python. That is, being able to write:
num = 3
until num == 0:
    print(num)
    num -= 1
A language-advocacy digression
This article doesn't attempt to suggest the addition of an until statement to Python. Although I think such a statement would make some code clearer, and this article displays how easy it is to add, I completely respect Python's philosophy of minimalism. All I'm trying to do here, really, is gain some insight into the inner workings of Python.
Modifying the grammar
Python uses a custom parser generator named pgen. It produces an LL(1) parser that converts Python source code into a parse tree. The input to the parser generator is the file Grammar/Grammar [1]. This is a simple text file that specifies the grammar of Python.
[1]: From here on, references to files in the Python source are given relative to the root of the source tree, which is the directory where you run configure and make to build Python.
Two modifications have to be made to the grammar file. The first is to add a definition for the until statement. I found where the while statement was defined (while_stmt), and added until_stmt below [2]:
compound_stmt: if_stmt | while_stmt | until_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
while_stmt: 'while' test ':' suite ['else' ':' suite]
until_stmt: 'until' test ':' suite
[2]: This demonstrates a common technique I use when modifying source code I'm not familiar with: work by similarity. This principle won't solve all your problems, but it can definitely ease the process. Since everything that has to be done for while also has to be done for until, it serves as a pretty good guideline.
Note that I've decided to exclude the else clause from my definition of until, just to make it a little bit different (and because frankly I dislike the else clause of loops and don't think it fits well with the Zen of Python).
The second change is to modify the rule for compound_stmt to include until_stmt, as you can see in the snippet above. Again, it goes right after while_stmt.
When you run make after modifying Grammar/Grammar, notice that the pgen program is run to re-generate Include/graminit.h and Python/graminit.c, and then several files get re-compiled.
Modifying the AST generation code
After the Python parser has created a parse tree, this tree is converted into an AST, since ASTs are much simpler to work with in subsequent stages of the compilation process.
So, we're going to visit Parser/Python.asdl, which defines the structure of Python's ASTs, and add an AST node for our new until statement, again right below the while:
| While(expr test, stmt* body, stmt* orelse)
| Until(expr test, stmt* body)
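For comparison, an unmodified CPython exposes the same node definitions through the standard ast module. The short sketch below is not part of the original article and runs on a stock interpreter, so only While is available; it lists the fields of the While node, which mirror the ASDL line above, and inspects a parsed loop.

```python
import ast

# Fields of the While node, as generated from the ASDL definition.
print(ast.While._fields)

# Parse an ordinary while loop and inspect the resulting node.
tree = ast.parse("while num != 0:\n    num -= 1\n")
loop = tree.body[0]
print(type(loop).__name__)        # While
print(type(loop.test).__name__)   # Compare
print(loop.orelse)                # [] because there is no else clause
```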
If you now run make, notice that before compiling a bunch of files, Parser/asdl_c.py is run to generate C code from the AST definition file. This (like Grammar/Grammar) is another example of the Python source code using a mini-language (in other words, a DSL) to simplify programming. Also note that since Parser/asdl_c.py is a Python script, this is a kind of bootstrapping: to build Python from scratch, Python already has to be available.
While Parser/asdl_c.py generated the code to manage our newly defined AST node (into the files Include/Python-ast.h and Python/Python-ast.c), we still have to write by hand the code that converts a relevant parse-tree node into it. This is done in the file Python/ast.c. There, a function named ast_for_stmt converts parse tree nodes for statements into AST nodes. Again, guided by our old friend while, we jump right into the big switch for handling compound statements and add a clause for until_stmt:
case while_stmt:
    return ast_for_while_stmt(c, ch);
case until_stmt:
    return ast_for_until_stmt(c, ch);
Now we should implement ast_for_until_stmt. Here it is:
static stmt_ty
ast_for_until_stmt(struct compiling *c, const node *n)
{
    /* until_stmt: 'until' test ':' suite */
    REQ(n, until_stmt);

    if (NCH(n) == 4) {
        expr_ty expression;
        asdl_seq *suite_seq;

        expression = ast_for_expr(c, CHILD(n, 1));
        if (!expression)
            return NULL;
        suite_seq = ast_for_suite(c, CHILD(n, 3));
        if (!suite_seq)
            return NULL;
        return Until(expression, suite_seq, LINENO(n), n->n_col_offset, c->c_arena);
    }

    PyErr_Format(PyExc_SystemError,
                 "wrong number of tokens for 'until' statement: %d",
                 NCH(n));
    return NULL;
}
Again, this was coded while closely looking at the equivalent ast_for_while_stmt, with the difference that for until I've decided not to support the else clause. As expected, the AST is created recursively, using other AST creating functions like ast_for_expr for the condition expression and ast_for_suite for the body of the until statement. Finally, a new node named Until is returned.
Note that we access the parse-tree node n using some macros like NCH and CHILD. These are worth understanding; their code is in Include/node.h.
Digression: AST composition
I chose to create a new type of AST for the until statement, but actually this isn't necessary. I could've saved some work and implemented the new functionality using composition of existing AST nodes, since:
until condition:
    # do stuff
Is functionally equivalent to:
while not condition:
    # do stuff
Instead of creating the Until node in ast_for_until_stmt, I could have created a While node whose test is a Not node wrapping the condition. Since the AST compiler already knows how to handle these nodes, the next steps of the process could be skipped.
Compiling ASTs into bytecode
The next step is compiling the AST into Python bytecode. The compilation has an intermediate result which is a CFG (Control Flow Graph), but since the same code handles it I will ignore this detail for now and leave it for another article.
The code we will look at next is Python/compile.c. Following the lead of while, we find the function compiler_visit_stmt, which is responsible for compiling statements into bytecode. We add a clause for Until:
case While_kind:
    return compiler_while(c, s);
case Until_kind:
    return compiler_until(c, s);
If you wonder what Until_kind is, it's a constant (actually a value of the _stmt_kind enumeration) automatically generated from the AST definition file into Include/Python-ast.h. Anyway, we call compiler_until which, of course, still doesn't exist. I'll get to it in a moment.
If you're curious like me, you'll notice that compiler_visit_stmt is peculiar. No amount of grepping the source tree reveals where it is called. When this is the case, only one option remains: C macro-fu. Indeed, a short investigation leads us to the VISIT macro defined in Python/compile.c:
#define VISIT(C, TYPE, V) {\
    if (!compiler_visit_ ## TYPE((C), (V))) \
        return 0; \
}
It's used to invoke compiler_visit_stmt in compiler_body. Back to our business, however...
As promised, here's compiler_until:
static int
compiler_until(struct compiler *c, stmt_ty s)
{
    basicblock *loop, *end, *anchor = NULL;
    int constant = expr_constant(s->v.Until.test);

    if (constant == 1) {
        return 1;
    }
    loop = compiler_new_block(c);
    end = compiler_new_block(c);
    if (constant == -1) {
        anchor = compiler_new_block(c);
        if (anchor == NULL)
            return 0;
    }
    if (loop == NULL || end == NULL)
        return 0;

    ADDOP_JREL(c, SETUP_LOOP, end);
    compiler_use_next_block(c, loop);
    if (!compiler_push_fblock(c, LOOP, loop))
        return 0;
    if (constant == -1) {
        VISIT(c, expr, s->v.Until.test);
        ADDOP_JABS(c, POP_JUMP_IF_TRUE, anchor);
    }
    VISIT_SEQ(c, stmt, s->v.Until.body);
    ADDOP_JABS(c, JUMP_ABSOLUTE, loop);

    if (constant == -1) {
        compiler_use_next_block(c, anchor);
        ADDOP(c, POP_BLOCK);
    }
    compiler_pop_fblock(c, LOOP, loop);
    compiler_use_next_block(c, end);

    return 1;
}
I have a confession to make: this code wasn't written based on a deep understanding of Python bytecode. Like the rest of the article, it was done in imitation of its sibling, the compiler_while function. By reading it carefully, however, keeping in mind that the Python VM is stack-based, and glancing into the documentation of the dis module, which has a list of Python bytecodes with descriptions, it's possible to understand what's going on.
That's it, we're done... Aren't we?
After making all the changes and running make, we can run the newly compiled Python and try our new until statement:
>>> num = 3
>>> until num == 0:
...     print(num)
...     num -= 1
...
3
2
1
Voila, it works! Let's see the bytecode created for the new statement by using the dis module as follows:
import dis

def myfoo(num):
    until num == 0:
        print(num)
        num -= 1

dis.dis(myfoo)
Here's the result:
  4           0 SETUP_LOOP              36 (to 39)
        >>    3 LOAD_FAST                0 (num)
              6 LOAD_CONST               1 (0)
              9 COMPARE_OP               2 (==)
             12 POP_JUMP_IF_TRUE        38

  5          15 LOAD_NAME                0 (print)
             18 LOAD_FAST                0 (num)
             21 CALL_FUNCTION            1
             24 POP_TOP

  6          25 LOAD_FAST                0 (num)
             28 LOAD_CONST               2 (1)
             31 INPLACE_SUBTRACT
             32 STORE_FAST               0 (num)
             35 JUMP_ABSOLUTE            3
        >>   38 POP_BLOCK
        >>   39 LOAD_CONST               0 (None)
             42 RETURN_VALUE
The most interesting operation is number 12: if the condition is true, we jump to after the loop. This is the correct semantics for until. If the jump isn't executed, the loop body keeps running until it jumps back to the condition at operation 35.
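On an unmodified interpreter the until keyword is not available, but the equivalent while-not loop can be inspected the same way. The sketch below is an addition for illustration, with countdown as a hypothetical function name. Opcode names and offsets differ between CPython versions, so it only filters for jump instructions instead of assuming the exact listing shown above.

```python
import dis

def countdown(num):
    # Stock-Python equivalent of `until num == 0:`
    seen = []
    while not num == 0:
        seen.append(num)
        num -= 1
    return seen

# Collect opcode names; none are hard-coded because they vary by version.
opnames = [ins.opname for ins in dis.get_instructions(countdown)]
jumps = [name for name in opnames if "JUMP" in name]
print(jumps)
print(countdown(3))   # [3, 2, 1]
```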
Feeling good about my change, I then tried running the function (executing myfoo(3)) instead of showing its bytecode. The result was less than encouraging:
Traceback (most recent call last):
  File "zy.py", line 9, in <module>
    myfoo(3)
  File "zy.py", line 5, in myfoo
    print(num)
SystemError: no locals when loading 'print'
Whoa... this can't be good. So what went wrong?
The case of the missing symbol table
One of the steps the Python compiler performs when compiling the AST is to create a symbol table for the code it compiles. The call to PySymtable_Build in PyAST_Compile calls into the symbol table module (Python/symtable.c), which walks the AST in a manner similar to the code generation functions. Having a symbol table for each scope helps the compiler figure out some key information, such as which variables are global and which are local to a scope.
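The kind of information a symbol table holds can be inspected from Python with the standard symtable module. The sketch below is an addition, run on a stock interpreter with a while loop standing in for until. It shows that num is classified as a local parameter and print as a global name, which is exactly the classification the code generator was missing in the failing run.

```python
import symtable

source = """
def myfoo(num):
    while num != 0:
        print(num)
        num -= 1
"""

module_table = symtable.symtable(source, "<example>", "exec")
function_table = module_table.get_children()[0]   # table for myfoo

num = function_table.lookup("num")
printer = function_table.lookup("print")

print(function_table.get_name())                  # myfoo
print(num.is_local(), num.is_parameter())         # True True
print(printer.is_local(), printer.is_global())    # False True
```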
To fix the problem, we have to modify the symtable_visit_stmt function in Python/symtable.c, adding code for handling until statements, after the similar code for while statements [3]:
case While_kind:
    VISIT(st, expr, s->v.While.test);
    VISIT_SEQ(st, stmt, s->v.While.body);
    if (s->v.While.orelse)
        VISIT_SEQ(st, stmt, s->v.While.orelse);
    break;
case Until_kind:
    VISIT(st, expr, s->v.Until.test);
    VISIT_SEQ(st, stmt, s->v.Until.body);
    break;
[3]: By the way, without this code there's a compiler warning for Python/symtable.c. The compiler notices that the Until_kind enumeration value isn't handled in the switch statement of symtable_visit_stmt and complains. It's always important to check for compiler warnings!
And now we really are done. Compiling the source after this change makes the execution of myfoo(3) work as expected.
Conclusion
In this article I've demonstrated how to add a new statement to Python. Albeit requiring quite a bit of tinkering in the code of the Python compiler, the change wasn't difficult to implement, because I used a similar and existing statement as a guideline.
The Python compiler is a sophisticated chunk of software, and I don't claim to be an expert in it. However, I am really interested in the internals of Python, and particularly its front-end. Therefore, I found this exercise a very useful companion to theoretical study of the compiler's principles and source code. It will serve as a base for future articles that will get deeper into the compiler.
References
I used a few excellent references for the construction of this article. Here they are, in no particular order:
- PEP 339: Design of the CPython compiler - probably the most important and comprehensive piece of official documentation for the Python compiler. Being very short, it painfully displays the scarcity of good documentation of the internals of Python.
- "Python Compiler Internals" - an article by Thomas Lee
- "Python: Design and Implementation" - a presentation by Guido van Rossum
- Python (2.5) Virtual Machine, A guided tour - a presentation by Peter Tröger
Add new statements to Python without customizing the compiler
One cannot introduce a new keyword without changing the language
The parser is the tool/program that reads through the code, and decides what makes sense and what doesn't.
Although it's a rather coarse definition, the consequence is that the language is defined by its parser.
The parser relies on the language's (formal) grammar, specified in the ast module documentation.
While defining a mere function only introduces a new feature without modifying the language, adding a keyword is tantamount to introducing a new syntax, which in turn changes the language's grammar.
Therefore, adding a new keyword, in the sense of adding a new syntax to a language, cannot be done without changing the language's grammar, which requires editing the compilation and execution chain.
However...
There might be some smart ways to introduce a new feature that looks like a new syntax but in fact only uses the existing syntax.
For instance, the goto module relies on a not-so-well-known property of the language, that the spaces around a dot in a qualified identifier are ignored.
You can try this by yourself:
>>> l = [1, 2, 3]
>>> l .append(4)
>>> l
[1, 2, 3, 4]
>>> l. append(5)
>>> l
[1, 2, 3, 4, 5]
This allows using the following, that looks like a new syntax, but really is not:
label .myLabel
goto .myLabel
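To confirm that nothing new is being parsed here, the standard ast module can show how the interpreter reads such a line. This small check is an addition, not taken from the goto module itself. The expression is an ordinary attribute access on a name called label.

```python
import ast

# "label .myLabel" is parsed exactly like "label.myLabel".
node = ast.parse("label .myLabel", mode="eval").body
print(type(node).__name__)   # Attribute
print(node.value.id)         # label
print(node.attr)             # myLabel
```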
Now, the goto module relies on the way the interpreter works internally to jump from a goto to a given label...
But that's another problem.
I'd like to add that Python is quite an open-minded language.
It provides a nice amount of seldom-used operators, for instance, @.
This operator, introduced in Python 3.5, was primarily meant for matrix multiplication, and is dispatched to a call to __matmul__.
I have to say, I've never seen it in code.
So, why not use it for your purpose?
Let's do it step-by-step.
I propose to define an r class that will behave as a regex.
import re

class r:
    def __init__(self, pattern):
        self.regex = re.compile(pattern)
Now, I want to be able to use the @ operator with this class, together with a string, with the semantics of a match between the string and the pattern.
I'll define the __matmul__ method, just as follows:
class r:
    def __matmul__(self, string):
        return bool(self.regex.match(string))
Now, I can do the following:
>>> r("hello") @ "hello"
True
>>> r("hello") @ "world"
False
Pretty nice, but not quite there yet.
I'll define the __rmatmul__ method as well, so it merely falls back to a call to __matmul__.
In the end, the r class looks like this:
class r:
    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def __matmul__(self, string):
        return bool(self.regex.match(string))

    def __rmatmul__(self, string):
        return self @ string
Now, the reverse operation works as well:
>>> "hello" @ r("hello")
True
>>> "123456" @ r(r"\d+")
True
>>> "abc def" @ r(r"\S+$")
False
This is very close to what you were attempting, except I didn't have to introduce a new keyword!
Of course, now the r identifier must be protected, just like str or list...
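The reason the reversed form works is worth spelling out: str does not implement __matmul__, so Python falls back to the right-hand operand's __rmatmul__. The sketch below repeats the r class from above only so that it runs on its own, and uses raw strings for the patterns to avoid invalid-escape warnings.

```python
import re

class r:
    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def __matmul__(self, string):
        return bool(self.regex.match(string))

    def __rmatmul__(self, string):
        # Reached when the left operand (a str) cannot handle @.
        return self @ string

print(hasattr(str, "__matmul__"))   # False
print("123456" @ r(r"\d+"))         # True

# Two plain strings have no handler at all, so @ raises TypeError.
try:
    "hello" @ "world"
except TypeError as exc:
    print("TypeError:", exc)
```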
Making your own statements
No, you cannot add new syntax within a Python program. The only way to alter the language is to edit and recompile the grammar file and supporting C code, to obtain a new altered interpreter, compiler and runtime.
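A quick way to see this on a stock interpreter is to hand the compiler a statement that is not in the grammar. The snippet below is an illustration added here, using the until statement discussed earlier as the unknown syntax. The source is rejected at compile time, before any of it runs.

```python
source = "until num == 0:\n    num -= 1\n"

try:
    compile(source, "<example>", "exec")
except SyntaxError as exc:
    rejected = True
    print("SyntaxError:", exc.msg)
else:
    rejected = False

print(rejected)   # True on an unmodified interpreter
```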
How can I put multiple statements in one line?
Unfortunately, what you want is not possible with Python (which makes Python close to useless for command-line one-liner programs). Even explicit use of parentheses does not avoid the syntax exception. You can get away with a sequence of simple statements, separated by semicolon:
for i in range(10): print("foo"); print("bar")
But as soon as you add a construct that introduces an indented block (like if), you need the line break. Also,
for i in range(10): print("i equals 9") if i == 9 else None
is legal and might approximate what you want.
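These rules can be checked without running anything, by asking the compiler whether a one-liner is syntactically valid. The sketch below is an addition written for Python 3, and compiles is a hypothetical helper name.

```python
def compiles(source):
    """Return True if `source` is syntactically valid (hypothetical helper)."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# Simple statements separated by semicolons may follow the loop header.
print(compiles("for i in range(3): print('foo'); print('bar')"))   # True
# A compound statement cannot follow a semicolon on the same line.
print(compiles("x = 1; if x: print(x)"))                            # False
# Nor can it sit directly after another compound statement's colon.
print(compiles("for i in range(3): if i: print(i)"))               # False
# A conditional expression is an expression, so it is allowed.
print(compiles("for i in range(3): print(i) if i == 2 else None"))  # True
```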
If you are still determined to use one-liners, see the answer by elecprog.
As for the try ... except thing: It would be totally useless without the except. try says "I want to run this code, but it might throw an exception". If you don't care about the exception, leave out the try. But as soon as you put it in, you're saying "I want to handle a potential exception". The pass then says you wish to not handle it specifically. But that means your code will continue running, which it wouldn't otherwise.
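If the goal is simply a shorter spelling of try ... except ... pass, the standard library offers contextlib.suppress, which expresses the same intent as a with block. This is offered as an alternative, not as something the answer above proposed; the input values are made up for the example.

```python
from contextlib import suppress

values = []
for text in ["1", "oops", "3"]:
    # Equivalent to: try: ... except ValueError: pass
    with suppress(ValueError):
        values.append(int(text))

print(values)   # [1, 3]
```

As with the explicit except clause, execution continues after the failing conversion instead of stopping the program.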
How to build a new kind of syntax for an object in python
Based on your clarification, maybe this would work for you:
import subprocess

while True:
    cmd = input("Enter a command: ")
    if cmd != "":
        cmd = cmd.split("!")[1]
        subprocess.call(cmd, shell=True)
    else:
        print("Input cannot be None")
is it possible to add syntax synonym to 'def' in python?
def is a keyword in Python, so it can't be changed to anything else.
From the docs:
The following identifiers are used as reserved words, or keywords of the language, and cannot be used as ordinary identifiers. They must be spelled exactly as written here:
and       del       from      not       while
as        elif      global    or        with
assert    else      if        pass      yield
break     except    import    print
class     exec      in        raise
continue  finally   is        return
def       for       lambda    try
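The list quoted above comes from the Python 2 documentation; in Python 3, print and exec are no longer keywords. The running interpreter's own list can be checked with the standard keyword module, as in this short addition, which also shows that def cannot be assigned to another name.

```python
import keyword

print(keyword.iskeyword("def"))      # True
print(keyword.iskeyword("print"))    # False in Python 3
print(keyword.iskeyword("define"))   # False

# A keyword is not an object, so it cannot be aliased by assignment.
try:
    compile("define = def", "<example>", "exec")
except SyntaxError:
    aliasable = False
    print("def cannot be rebound or aliased")
else:
    aliasable = True
```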