Parse a .py file, read the AST, modify it, then write back the modified source code

Pythoscope does this to the test cases it automatically generates, as does the 2to3 tool shipped with Python 2.6 (it converts Python 2.x source into Python 3.x source).
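For reference, 2to3 is usually run from the command line and rewrites a file in place when given the -w flag (example.py is just a placeholder filename):

2to3 -w example.py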

Both of these tools use the lib2to3 library, an implementation of the Python parser/compiler machinery that can preserve comments in source when it is round-tripped from source -> AST -> source.

The rope project may meet your needs if you want to do more refactoring-like transforms.

The ast module is your other option, and there's an older example of how to "unparse" syntax trees back into code (using the parser module). But the ast module is more useful when doing an AST transform on code that is then compiled to a code object.
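Here is a minimal sketch of that round trip with the modern ast module (assumes Python 3.9+, where ast.unparse was added, and a placeholder file example.py; unlike lib2to3, this path does not preserve comments or formatting):

import ast

source = open("example.py").read()
tree = ast.parse(source)

# Illustrative transform: rename every function named "foo" to "bar".
class RenameFoo(ast.NodeTransformer):
    def visit_FunctionDef(self, node):
        if node.name == "foo":
            node.name = "bar"
        return self.generic_visit(node)

tree = RenameFoo().visit(tree)
ast.fix_missing_locations(tree)

with open("example.py", "w") as f:
    f.write(ast.unparse(tree))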

The redbaron project may also be a good fit (h/t Xavier Combelle).
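A tiny sketch of the redbaron round trip (a hedged example assuming redbaron is installed; its main draw is that comments and formatting survive the dump, and the pattern below follows its tutorial):

from redbaron import RedBaron

red = RedBaron("some_value = 42  # keep this comment\n")
red[0].value = "1 + 4"  # edit the right-hand side of the assignment
print(red.dumps())      # comment and spacing are preserved in the output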

Can't get the right AST selector for history.push()

The following should work for history.push():

/* eslint no-restricted-syntax: ['error', 'CallExpression[callee.object.name="history"][callee.property.name="push"]'] */
history.push();
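The same restriction can live in your ESLint configuration instead of an inline comment; a sketch of the equivalent .eslintrc.json entry:

{
  "rules": {
    "no-restricted-syntax": [
      "error",
      "CallExpression[callee.object.name='history'][callee.property.name='push']"
    ]
  }
}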

Transform a code tokens list into valid string code

I am trying to merge the tokens back into valid source code using this script:

import re

def tokenize_for_bleu_eval(code):
    code = re.sub(r'([^A-Za-z0-9_])', r' \1 ', code)
    code = re.sub(r'([a-z])([A-Z])', r'\1 \2', code)
    code = re.sub(r'\s+', ' ', code)
    code = code.replace('"', '`')
    code = code.replace('\'', '`')
    tokens = [t for t in code.split(' ') if t]

    return tokens

def merge_tokens(tokens):
    code = ''.join(tokens)
    code = code.replace('`', "'")
    code = code.replace(',', ", ")

    return code

tokenize = tokenize_for_bleu_eval("struct.unpack('h', pS[0:2])")
print(tokenize)  # ['struct', '.', 'unpack', '(', '`', 'h', '`', ',', 'p', 'S', '[', '0', ':', '2', ']', ')']
merge_result = merge_tokens(tokenize)
print(merge_result)  # struct.unpack('h', pS[0:2])
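This round-trips the example above, but the plain ''.join drops any whitespace that the comma rule doesn't reinsert, so inputs containing ordinary spaces do not survive (a hypothetical input for illustration):

tokenize = tokenize_for_bleu_eval("x = value")
print(merge_tokens(tokenize))  # x=value -- the spaces around '=' are lost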

Edit:

I found this interesting idea to tokenize and merge:

import re

def tokenize_for_bleu_eval(code):
    tokens_list = []
    codes = code.split(' ')
    for i in range(len(codes)):
        code = codes[i]
        code = re.sub(r'([^A-Za-z0-9_])', r' \1 ', code)
        code = re.sub(r'([a-z])([A-Z])', r'\1 \2', code)
        code = re.sub(r'\s+', ' ', code)
        code = code.replace('"', '`')
        code = code.replace('\'', '`')
        tokens = [t for t in code.split(' ') if t]
        tokens_list.append(tokens)
        if i != len(codes) - 1:
            tokens_list.append([' '])

    flatten_list = []
    for tokens in tokens_list:
        for token in tokens:
            flatten_list.append(token)

    return flatten_list

def merge_tokens(flatten_list):
    code = ''.join(flatten_list)
    code = code.replace('`', "'")

    return code

test1 = "struct.unpack('h', pS[0:2])"
test2 = "items = [item for item in container if item.attribute == value]"

tokenize = tokenize_for_bleu_eval(test1)
print(tokenize)  # ['struct', '.', 'unpack', '(', '`', 'h', '`', ',', ' ', 'p', 'S', '[', '0', ':', '2', ']', ')']
merge_result = merge_tokens(tokenize)
print(merge_result)  # struct.unpack('h', pS[0:2])

tokenize = tokenize_for_bleu_eval(test2)
print(tokenize)  # ['items', ' ', '=', ' ', '[', 'item', ' ', 'for', ' ', 'item', ' ', 'in', ' ', 'container', ' ', 'if', ' ', 'item', '.', 'attribute', ' ', '=', '=', ' ', 'value', ']']
merge_result = merge_tokens(tokenize)
print(merge_result)  # items = [item for item in container if item.attribute == value]

This script also remembers each space from the input, so merging the tokens reproduces the original string.
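A quick round-trip check of that claim (assumes the two functions and test strings above are in scope; note that both quote characters are normalized to a backtick and restored as single quotes, so inputs containing double quotes come back slightly changed):

for src in [test1, test2]:
    assert merge_tokens(tokenize_for_bleu_eval(src)) == src
print("round trip OK")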


