How to Remove the ANSI Escape Sequences from a String in Python

How can I remove the ANSI escape sequences from a string in Python?

Delete them with a regular expression:

import re

# 7-bit C1 ANSI sequences
ansi_escape = re.compile(r'''
\x1B # ESC
(?: # 7-bit C1 Fe (except CSI)
[@-Z\\-_]
| # or [ for CSI, followed by a control sequence
\[
[0-?]* # Parameter bytes
[ -/]* # Intermediate bytes
[@-~] # Final byte
)
''', re.VERBOSE)
result = ansi_escape.sub('', sometext)

or, without the VERBOSE flag, in condensed form:

ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
result = ansi_escape.sub('', sometext)

Demo:

>>> import re
>>> ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
>>> sometext = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
>>> ansi_escape.sub('', sometext)
'ls\r\nexamplefile.zip\r\n'

The above regular expression covers all 7-bit ANSI C1 escape sequences, but not the 8-bit C1 escape sequence openers. The latter are never used in today's UTF-8 world, where the same range of bytes has a different meaning.

If you do need to cover the 8-bit codes too (and are then, presumably, working with bytes values) then the regular expression becomes a bytes pattern like this:

# 7-bit and 8-bit C1 ANSI sequences
ansi_escape_8bit = re.compile(br'''
(?: # either 7-bit C1, two bytes, ESC Fe (omitting CSI)
\x1B
[@-Z\\-_]
| # or a single 8-bit byte Fe (omitting CSI)
[\x80-\x9A\x9C-\x9F]
| # or CSI + control codes
(?: # 7-bit CSI, ESC [
\x1B\[
| # 8-bit CSI, 9B
\x9B
)
[0-?]* # Parameter bytes
[ -/]* # Intermediate bytes
[@-~] # Final byte
)
''', re.VERBOSE)
result = ansi_escape_8bit.sub(b'', somebytesvalue)

which can be condensed down to

# 7-bit and 8-bit C1 ANSI sequences
ansi_escape_8bit = re.compile(
br'(?:\x1B[@-Z\\-_]|[\x80-\x9A\x9C-\x9F]|(?:\x1B\[|\x9B)[0-?]*[ -/]*[@-~])'
)
result = ansi_escape_8bit.sub(b'', somebytesvalue)
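A quick check of the condensed bytes pattern on a made-up bytes value (the sample is hypothetical) that mixes a 7-bit CSI (ESC [) with an 8-bit CSI (the single byte 0x9B). Note that 0x9B is deliberately left out of the single-byte Fe class so that it is handled by the CSI branch together with its parameters:

```python
import re

# 7-bit and 8-bit C1 ANSI sequences (condensed bytes pattern from above)
ansi_escape_8bit = re.compile(
    br'(?:\x1B[@-Z\\-_]|[\x80-\x9A\x9C-\x9F]|(?:\x1B\[|\x9B)[0-?]*[ -/]*[@-~])'
)

# Hypothetical sample: a 7-bit CSI SGR sequence (ESC [ 01;31 m) and an
# 8-bit CSI SGR sequence (byte 0x9B, then 0 m)
raw = b'\x1b[01;31mred\x9b0m plain'
print(ansi_escape_8bit.sub(b'', raw))
```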

For more information, see:

  • the ANSI escape codes overview on Wikipedia
  • ECMA-48 standard, 5th edition (especially sections 5.3 and 5.4)

The example you gave contains 4 CSI (Control Sequence Introducer) codes, as marked by the \x1B[ or ESC [ opening bytes, and each contains an SGR (Select Graphic Rendition) code, because they each end in m. The parameters (separated by ; semicolons) in between tell your terminal what graphic rendition attributes to use. Across these \x1B[...m sequences, 3 codes are used:

  • 0 (or 00 in this example): reset, disable all attributes
  • 1 (or 01 in the example): bold
  • 31: red (foreground)
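To see which parameters each sequence carries, one option is to capture the parameter bytes of every SGR sequence (a CSI ending in m) and split them on the semicolons. The name sgr is just for illustration, and this pattern intentionally matches only SGR sequences, not every CSI code:

```python
import re

# CSI sequences whose final byte is "m" (SGR); group 1 holds the parameters
sgr = re.compile(r'\x1B\[([0-?]*)m')

sometext = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'

# Split each parameter string on ";" (e.g. "01;31" -> ["01", "31"])
params = [m.group(1).split(';') for m in sgr.finditer(sometext)]
print(params)
```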

However, there is more to ANSI than just CSI SGR codes. With CSI alone you can also control the cursor, clear lines or the whole display, or scroll (provided the terminal supports this, of course). And beyond CSI, there are codes to select characters from alternative character sets (SS2 and SS3), to send 'private messages' (PM; think passwords), to communicate with the terminal (DCS), the OS (OSC), or the application itself (APC, a way for applications to piggy-back custom control codes on to the communication stream), and further codes to help define strings (SOS, Start of String, ST String Terminator). The above regexes cover all of these. The one exception is resetting everything back to a base state (RIS): it is encoded as ESC c, an independent control function outside the Fe range, so the [@-Z\\-_] class does not match it.
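A small check that non-SGR CSI codes are stripped too. The input is a made-up progress-bar style line that erases the line (CSI 2K), moves the cursor to column 1 (CSI 1G) and then uses bold (SGR 1) and reset (SGR 0):

```python
import re

ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')

# Hypothetical terminal output: erase line, cursor to column 1, bold, reset
line = '\x1b[2K\x1b[1G\x1b[1mDownloading\x1b[0m 50%'
print(repr(ansi_escape.sub('', line)))
```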

Note, however, that the above regexes only remove the ANSI C1 codes themselves, and not any additional data those codes may be marking up (such as the string sent between an OSC opener and the terminating ST code). Removing that would require additional work outside the scope of this answer.
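For OSC in particular, here is a rough sketch of that extra work. The osc_string pattern is a hypothetical addition, not part of the regexes above; it assumes the payload contains no ESC or BEL bytes before its terminator, and it does not handle DCS, PM, APC or SOS strings. It has to run before the general pattern, which would otherwise remove only the ESC ] opener and leave the payload behind:

```python
import re

ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')

# Hypothetical extra pattern: a complete OSC string, from ESC ] up to and
# including its terminator, which is either BEL (0x07) or ST (ESC followed by a backslash)
osc_string = re.compile(r'\x1B\][^\x07\x1B]*(?:\x07|\x1B\\)')

# Sets the terminal window title, then prints "hello" and a bold "world"
text = '\x1b]0;my title\x07hello \x1b[1mworld\x1b[0m'

print(repr(ansi_escape.sub('', text)))                      # title payload survives
print(repr(ansi_escape.sub('', osc_string.sub('', text))))  # payload removed as well
```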

Remove ANSI escape sequences when redirecting Python output

It turns out this is because grep doesn't enable color output when it is piped. Its --color=auto option only enables color output if stdout is connected to a terminal. Running the same test with grep --color=always ... also results in ANSI escape characters being written to the text file.

I solved this problem by testing sys.stdout.isatty() before adding ANSI escape codes.
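A minimal sketch of that check, mirroring what grep's --color=auto does. The names colorize, RED and RESET are hypothetical, not taken from the original code:

```python
import sys

RED = '\x1b[31m'   # SGR 31: red foreground
RESET = '\x1b[0m'  # SGR 0: reset all attributes

def colorize(text, code=RED, stream=None):
    """Wrap text in an SGR color code only when stream is a terminal."""
    if stream is None:
        # Look up sys.stdout at call time, in case it has been replaced
        stream = sys.stdout
    if stream.isatty():
        return f'{code}{text}{RESET}'
    return text

# Colored on a terminal; plain text when redirected, e.g. `python script.py > out.txt`
print(colorize('error: something went wrong'))
```

Looking up sys.stdout inside the function rather than as a default argument means the check still follows the stream if it is redirected after the module is imported.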

Debugging a Python function to remove ANSI codes

Your code should clean up the ANSI codes from the string presented; are you sure you're calling it correctly?

Either way, it will strip only the selected codes and is not a particularly elegant or performant way to do it. I'd suggest using a regex instead and saving yourself some trouble:

import re

ANSI_CLEANER = re.compile(r"(\x9B|\x1B\[)[0-?]*[ -/]*[@-~]")

clean_string = ANSI_CLEANER.sub("", your_string)

print(repr(clean_string))
# prints '?- human(socrates)\n.\ntrue.\n\n?- '

How to remove all the escape sequences from a list of strings?

Something like this?

>>> from ast import literal_eval
>>> s = r'Hello,\nworld!'
>>> print(literal_eval("'%s'" % s))
Hello,
world!

Edit: ok, that's not what you want. What you want can't be done in general, because, as @Sven Marnach explained, strings don't actually contain escape sequences. Those are just notation in string literals.

You can filter all strings containing non-ASCII characters out of your list with the following (written for Python 3, where str objects have no decode() method):

def is_ascii(s):
    try:
        s.encode('ascii')
        return True
    except UnicodeEncodeError:
        return False

[s for s in ['william', 'short', '\x80', 'twitter', '\xaa',
             '\xe2', 'video', 'guy', 'ray']
 if is_ascii(s)]
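On Python 3.7 and later, str also has a built-in isascii() method, which does the same check without the try/except:

```python
words = ['william', 'short', '\x80', 'twitter', '\xaa',
         '\xe2', 'video', 'guy', 'ray']

# str.isascii() is True when every character is in the range U+0000..U+007F
ascii_only = [s for s in words if s.isascii()]
print(ascii_only)
```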

Exclude ANSI escape sequences from output log file

Use a regexp like \x1b\[[0-9;]*m to strip out the ANSI color codes when writing to the output.log object.

For example:

import re

ansi_re = re.compile(r'\x1b\[[0-9;]*m')

# ...

self.log.write(ansi_re.sub('', message))
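The self.log line above presumably sits in a write() method of a tee-style object. A self-contained sketch of such a class, assuming that setup (the Tee name and its attributes are hypothetical), with in-memory streams standing in for sys.stdout and output.log:

```python
import io
import re

ansi_re = re.compile(r'\x1b\[[0-9;]*m')

class Tee:
    """Hypothetical helper: pass messages through unchanged to one stream
    (e.g. sys.stdout) and write a copy without color codes to a log file."""

    def __init__(self, terminal, log):
        self.terminal = terminal
        self.log = log

    def write(self, message):
        self.terminal.write(message)
        self.log.write(ansi_re.sub('', message))

    def flush(self):
        self.terminal.flush()
        self.log.flush()

# In-memory streams stand in for sys.stdout and an open output.log here
terminal, log = io.StringIO(), io.StringIO()
tee = Tee(terminal, log)
tee.write('\x1b[32mOK\x1b[0m done\n')
print(repr(log.getvalue()))
```

One limitation of stripping per write() call: an escape sequence split across two separate writes will not be matched.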

How to replace in string without breaking ansi escape codes?

Using the regex to match ANSI escape sequences from an earlier answer, we can make a helper function that only replaces those parts of the text that do not belong to such a sequence.

Assuming this is utils.py:

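import re

# https://stackoverflow.com/a/14693789/18771
ANSICODE = re.compile(r'\x1B[@-_][0-?]*[ -/]*[@-~])

def replace_ansi(input_str, search_str, replace_str):
    """Replace search_str with replace_str only outside ANSI escape sequences."""
    pos = 0
    result = []
    for m in ANSICODE.finditer(input_str):
        # Plain text between the previous escape sequence and this one
        text = input_str[pos:m.start()]
        result.append(text.replace(search_str, replace_str))
        # Keep the escape sequence itself unchanged
        result.append(m.group())
        pos = m.end()

    # Plain text after the last escape sequence also needs replacing
    text = input_str[pos:]
    result.append(text.replace(search_str, replace_str))
    return ''.join(result)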

Usage:

from utils import replace_ansi

s1 = 'bla 1\x1b[1;31mbla 2\x1b[0mbla 3'
s2 = replace_ansi(s1, '1', 'X')
print(s1)
print(s2)

This prints (shown with the invisible ESC characters omitted):

bla 1[1;31mbla 2[0mbla 3
bla X[1;31mbla 2[0mbla 3
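A different technique for the same job, offered as an alternative sketch rather than the answer's own method: wrapping the pattern in a capturing group makes re.split() return the escape sequences along with the text, so plain text always lands at even indexes. The names ANSICODE_SPLIT and replace_outside_ansi are hypothetical:

```python
import re

# Same ANSI pattern as in utils.py, wrapped in a capturing group so that
# re.split() keeps the escape sequences in its result list
ANSICODE_SPLIT = re.compile(r'(\x1B[@-_][0-?]*[ -/]*[@-~])')

def replace_outside_ansi(input_str, search_str, replace_str):
    parts = ANSICODE_SPLIT.split(input_str)
    # With one capturing group, even indexes hold plain text (possibly empty)
    # and odd indexes hold the captured escape sequences
    for i in range(0, len(parts), 2):
        parts[i] = parts[i].replace(search_str, replace_str)
    return ''.join(parts)

s1 = 'bla 1\x1b[1;31mbla 2\x1b[0mbla 3'
print(repr(replace_outside_ansi(s1, '1', 'X')))
```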

How to strip the ANSI escape codes from a text file?

This is not a direct answer to your question as you asked it, but I just wanted to note that terraform plan has a -no-color option which will disable the control codes and just emit plain text at the source, avoiding the need to strip the codes out later.
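If the output has already been captured to a file, or a tool has no such option, the codes can still be stripped afterwards. A minimal sketch reusing the condensed 7-bit pattern from the first answer, assuming UTF-8 text files; strip_ansi_file and the file names are hypothetical:

```python
import re

# Condensed 7-bit C1 pattern from the first answer
ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')

def strip_ansi_file(src_path, dst_path):
    """Copy src_path to dst_path with ANSI escape sequences removed.
    newline='' keeps the original line endings (e.g. \\r\\n) untouched."""
    with open(src_path, encoding='utf-8', newline='') as src, \
         open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        for line in src:
            # CSI sequences never contain a newline, so per-line processing is safe
            dst.write(ansi_escape.sub('', line))

# Example usage (hypothetical file names):
# strip_ansi_file('plan.txt', 'plan-clean.txt')
```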

Python how to remove escape characters from a string

Maybe the re (regular expression) module is the way to go:

>>> s = 'test\x06\x06\x06\x06'
>>> s1 = 'test2\x04\x04\x04\x04'
>>> import re
>>> re.sub('[^A-Za-z0-9]+', '', s)
'test'
>>> re.sub('[^A-Za-z0-9]+', '', s1)
'test2'
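Note that [^A-Za-z0-9]+ also removes spaces, punctuation and accented letters. If only the control characters should go, a narrower class is an option. This sketch removes the C0 range (0x00 to 0x1F, which includes \n and \t) plus DEL (0x7F); for ANSI sequences it only removes the ESC byte and leaves the rest, so use the ANSI regexes above for those:

```python
import re

# ASCII control characters: C0 range 0x00-0x1F plus DEL 0x7F
control_chars = re.compile(r'[\x00-\x1f\x7f]+')

print(repr(control_chars.sub('', 'test\x06\x06\x06\x06')))   # 'test'
print(repr(control_chars.sub('', 'hello, wörld!\x04\x04')))  # 'hello, wörld!'
```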

Escaping color strings with regex in python

The pattern matches one or two color escape sequences, followed by (in a positive lookahead) uppercase alphabetic characters enclosed in square brackets:

pat = r'''((\[\d+m){1,2})(?=\[[A-Z]+\])'''

Works with this string:

s = '''[0m[ERROR] [1585551547.349979]: xyz xyz.
[0m[32m[INFO] [2020-03-29 23:58:34.695268] hjk hjk.[0m[32m[INFO] [2020-03-29 23:58:34.695268] foo bar foo'''

The positive lookahead prevents that last bit from being captured.


>>> print(re.sub(pat,'',s))
[ERROR] [1585551547.349979]: xyz xyz.
[INFO] [2020-03-29 23:58:34.695268] hjk hjk.[INFO] [2020-03-29 23:58:34.695268] foo bar foo
>>>

If you need to remove sequences specifying foreground and background colors, like

[2m[93m[0m[32m[INFO] [2020-03-29 23:58:34.695268] foo bar foo

use pat = r'''((\[\d+m){1,})(?=\[[A-Z]+\])''', which matches one or more sequences instead of one or two.


If there are also sequences with a 0; or 1; prefix, like these:

[0m[1;37m[ERROR] [1585551547.349979]: xyz xyz.
[0m[1;37m[0;32m[ERROR] [1585551547.349979]: xyz xyz.

use pat = r'''(\[([01];)?\d+m){1,}(?=\[[A-Z]+\])'''


Some of your example strings showed color sequences in the middle of the string, and your desired output showed them being removed, contrary to your comment:

all color codes in the beginning of lines.

These patterns will remove the sequence from the middle of a string.
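If color codes can appear anywhere and need not precede a [LEVEL] tag, a simpler sketch drops the lookahead. It assumes, as the samples suggest, that the ESC byte may have been lost when the text was copied, so it accepts either ESC [ params m or a bare [ followed by at least one digit and m. Without the ESC this can misfire on ordinary bracketed text such as "[5min]", which is why the lookahead versions above are stricter. The name sgr_anywhere is hypothetical:

```python
import re

# SGR sequence anywhere: ESC [ params m, or the ESC-less form [ digit(s) m
sgr_anywhere = re.compile(r'\x1b\[[0-9;]*m|\[\d[0-9;]*m')

samples = [
    '[0m[1;37m[0;32m[ERROR] [1585551547.349979]: xyz xyz.',
    'hjk hjk.[0m[32m[INFO] [2020-03-29 23:58:34.695268] foo bar foo',
]
for s in samples:
    print(sgr_anywhere.sub('', s))
```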


