Re.Sub Not Working for Me

Re.sub not working for me

You are assigning the result of re.sub back to a variable, right? e.g.

lines = re.sub(pattern, key[1], lines)

Strings are immutable in Python, so re.sub cannot change the original; it creates a new string and returns it. If you don't assign the result back to a name, it is lost.
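A minimal sketch of the difference, using a made-up input string and pattern (neither comes from the original question):

```python
import re

lines = "foo=1\nfoo=2\n"  # hypothetical input

# The return value is discarded here, so lines is left untouched.
re.sub(r"foo", "bar", lines)
unchanged = lines

# Rebinding the name keeps the new string.
lines = re.sub(r"foo", "bar", lines)
```

After this runs, unchanged still holds the original text and lines holds the substituted one.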

re.sub Not working for me with python

You have to re-assign it back to page:

page = re.sub("&",'',page)

re.sub not replacing the string

The symbols [ and ] have a special meaning in regular expressions, so you have to escape them:

>>> re.sub(r'as_Points\[0\]\.ub_X', '0x00', text)
'AFL_v_CalcOneIntAreas (%0x00%);\n'

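For instance, [a-z] matches any single lowercase letter. Square brackets denote a character class, «any one of the characters inside», so [01] matches 0 or 1.

In your case, the pattern 'as_Points[0].ub_X' actually matches 'as_Points0' followed by any one character and then 'ub_X'.

Note that . has a special meaning too: it matches any single character (except a newline by default), so you should escape it as well.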
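To see this concretely, here is a small sketch. The two test strings are made up for illustration:

```python
import re

# Unescaped: [0] is a character class and . is a wildcard.
pattern = 'as_Points[0].ub_X'

# The literal text does not match, because '[' is not '0'.
literal_hit = re.search(pattern, 'as_Points[0].ub_X')

# This unrelated text does match: '0', then any one character, then 'ub_X'.
odd_hit = re.search(pattern, 'as_Points0-ub_X')
```

Here literal_hit is None while odd_hit is a match object, which is the opposite of what was intended.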


If you don't know if your expression contains characters you should escape, you can use re.escape:

>>> someExpression = "as_Points[0].ub_X"
>>> re.escape(someExpression)
'as_Points\\[0\\]\\.ub_X'
>>> re.sub(re.escape(someExpression), '0x00', text)
'AFL_v_CalcOneIntAreas (%0x00%);\n'

But if you don't need regular expression power, strings have the replace method:

text.replace('as_Points[0].ub_X','0x00')

Python re.sub not working as expected

You're only stripping off spaces following <br> with that. You can instead use a positive lookahead to remove all <br>s that have another <br> immediately following:

re.sub(r'<br>(?=<br>)', '', _str)

You may handle whitespace between the <br> tags with:

re.sub(r'<br>(?=\s*<br>)', '', _str)
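A quick comparison of the two patterns on a made-up sample string (the sample is an assumption, not the asker's data):

```python
import re

_str = "a<br><br> <br>b<br>c"  # hypothetical sample

# Removes a <br> only when the next <br> follows immediately.
adjacent_only = re.sub(r'<br>(?=<br>)', '', _str)

# Also removes a <br> when only whitespace separates it from the next one.
with_spaces = re.sub(r'<br>(?=\s*<br>)', '', _str)
```

Note that the lookahead does not consume anything, so the whitespace between the tags stays in the result; only the earlier <br> is dropped.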

Why Does re.sub() Not Work in Python 3.6?

I would replace all this with a single call to str.translate, since you are only making single-character-to-single-character replacements.

You'll just need to define a single dict (which you can reuse for every call to str.translate) that maps each character to its replacement. Characters that stay the same do not need to be added to the mapping.

replacements = {}
replacements.update(dict.fromkeys(range(0x2000, 0x2070), " "))
replacements[0x1680] = ' '
# etc

string = string.translate(replacements)

You can also use str.maketrans to construct an appropriate translation table from a char-to-char mapping.
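For example, a sketch with str.maketrans. The three characters mapped here are picked for illustration only; your real table would cover whatever characters you need:

```python
# Hypothetical mapping: fold a few Unicode space characters to a plain space.
table = str.maketrans({"\u1680": " ", "\u2003": " ", "\u00a0": " "})

string = "a\u1680b\u2003c\u00a0d"
cleaned = string.translate(table)
```

str.maketrans converts the single-character keys to code points for you, so the table can be built once and reused for every call.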

Why re.sub() adds not matched string by default in Python?

You seem to have a misunderstanding of what sub does. It substitutes the text matched by the regex. The regex r'(size:)\D+(\d+)\D+(\d+)\D+(\d+)' matches only part of your string, so only the matching part is substituted; the capture groups do not affect this.
What you can do (if you don't want to add .* at the beginning and the end) is use re.findall, like this:

re.findall(
    r'(size:)\D+(\d+)\D+(\d+)\D+(\d+)',
    'START, size: 100Х200 x 50, END'
)

This returns [('size:', '100', '200', '50')], which you can then format as you wish.
One way to do it, as a one-liner with no error handling, is:

'{1}x{2}x{3}'.format(
    *re.findall(
        r'(size:)\D+(\d+)\D+(\d+)\D+(\d+)',
        'START, size: 100Х200 x 50, END')[0]
)
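For comparison, here is a sketch of what re.sub does with and without the surrounding .* mentioned above. The replacement template r'\2x\3x\4' is my assumption about the output you are after:

```python
import re

s = 'START, size: 100Х200 x 50, END'
pattern = r'(size:)\D+(\d+)\D+(\d+)\D+(\d+)'

# Only the matched span is replaced; the text around it is kept.
partial = re.sub(pattern, r'\2x\3x\4', s)

# With .* on both sides the match covers the whole line, so nothing else survives.
whole = re.sub(r'.*' + pattern + r'.*', r'\2x\3x\4', s)
```

The second form works here because the string has no newlines; with multi-line input, . would stop at each line break unless re.DOTALL is passed.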

Python re.sub() is not replacing every match

The site explains it well, hover and use the explanation section.

(.)(.*?)\1 does not match or remove every double occurrence. It matches one character, followed by as little as possible of anything, up to the next occurrence of that same character.

So, for abbcabb, the "sandwiched" portion is bbc, between the two a's.
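You can check this yourself with a short sketch that only inspects the matches, without doing any replacement:

```python
import re

pattern = r'(.)(.*?)\1'

# First match: 'a', then the shortest run up to the next 'a'.
m = re.search(pattern, 'abbcabb')
first_match = m.group(0)
sandwiched = m.group(2)

# findall resumes after the first match, so the remaining 'bb' is matched separately.
all_groups = re.findall(pattern, 'abbcabb')
```

The first match swallows abbca as a whole, which is why the b's inside it are never considered as a pair of their own.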

EDIT:
You can try something like this instead without regexes:

string = "abbcabb"
result = []
for i in string:
    if i not in result:
        result.append(i)
    else:
        result.remove(i)
print(''.join(result))

Note that this keeps each character that occurs an odd number of times at the position of its "last" odd occurrence, not its first.

For the "first" occurrence, you should use a counter as suggested in this answer. Just change the condition to check for odd counts, in pseudocode: count[letter] % 2 == 1.
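A rough sketch of that counter idea, as I read it. The function name is made up, and this is one possible interpretation of "first occurrence", not the code from the linked answer:

```python
from collections import Counter

def odd_chars_first_position(s):
    """Keep characters with an odd total count, in order of first appearance."""
    counts = Counter(s)
    seen = set()
    out = []
    for ch in s:
        # Emit each odd-count character once, at its first occurrence.
        if counts[ch] % 2 == 1 and ch not in seen:
            seen.add(ch)
            out.append(ch)
    return ''.join(out)

print(odd_chars_first_position("abbcabb"))
```

The difference from the loop above shows up on input like "abaa": the loop gives "ba", while this version gives "ab".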

Re.sub in python not working

While the other answer is technically correct, I don't think what it describes is what you want.

Instead, you might want to work with a match object:

m = re.search(r'href="([\w:/.]+)"', s, re.I)
print(m.expand(r"url: \1"))

which results in

url: http://google.com

without the <A before and the ID="test">blah</A> behind.

If you want to do more of these replacements, you might even want to reuse the regex by compiling it:

r = re.compile(r'href="([\w:/.]+)"', re.I)
ex = lambda st: r.search(st).expand(r"url: \1")
print(ex('<A HREF="http://www.google.com" ID="test">blah</A>'))
print(ex('<A HREF="http://www.yahoo.com" ID="test">blah</A>'))
# and so on.

If, however, you indeed want to keep the HTML around it, you'll have to work with lookahead and lookbehind expressions:

re.sub(r'(?<=href=")([\w:/.]+)(?=")', "url: " + r'\1', s, flags=re.I)
# -> '<A HREF="url: http://www.google.com" ID="test">blah</A>'

or simply by repeating the omitted stuff:

re.sub(r'href="([\w:/.]+)"', r'href="url: \1"', s, flags=re.I)
# -> '<A href="url: http://www.google.com" ID="test">blah</A>'

python re.sub not replacing all the occurance of string

I would use re.findall here, rather than trying to do a replacement to remove the portions you don't want:

src = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123|  http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"
matches = re.findall(r'https?://www\.\S+#([^|\s]+)', src)
output = '|'.join(matches)
print(output) # image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03

Note that if you want to be more specific and match only Google URLs, you may use the following pattern instead:

https?://www\.google\.\S+#([^|\s]+)

why does re.sub replaces none of the occurrences even there is already pattern, repl and string added

You need to assign the output of the re.sub back to the original variable.

data = re.sub(r"\b{}\b".format(oldstring), newstring, data)
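If oldstring is meant as literal text rather than as a regex, it may be safer to pass it through re.escape first. That is my addition, not part of the line above, and the sample values are invented:

```python
import re

data = "cat concat cat."            # hypothetical input
oldstring, newstring = "cat", "dog"  # hypothetical values

# re.escape neutralises any regex metacharacters in oldstring;
# \b restricts the match to whole words.
data = re.sub(r"\b{}\b".format(re.escape(oldstring)), newstring, data)
```

Keep in mind that \b only helps when oldstring starts and ends with a word character; for something like "c++" the trailing \b would not behave as a whole-word check.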

