Python Replace Single Quotes Except Apostrophes

To replace only the starting and ending ' of each word, what you really need
is a regex.
To match them, use:

  • ^' for starting ' (opensingle),
  • '$ for ending ' (closesingle).

Unfortunately, the str.replace method does not support regexes,
so you should use re.sub instead.

Below is an example program that prints your desired output
(in Python 3):

import re

# Sample input mixing apostrophes and quoting marks
text = "don't 'George ma'am end.' didn't.' 'Won't"
for word in text.split(" "):
    # A leading ' opens a quotation; a trailing ' closes one
    word = re.sub(r"^'", '<opensingle>\n', word)
    word = re.sub(r"'$", '\n<closesingle>', word)
    word = word.replace('.', '\n<period>')
    word = word.replace(',', '\n<comma>')
    print(word)

Replace the single quote (') character from a string

As for how to represent a single apostrophe as a string in Python, you can simply surround it with double quotes ("'") or you can escape it inside single quotes ('\'').

To remove apostrophes from a string, a simple approach is to just replace the apostrophe character with an empty string:

>>> "didn't".replace("'", "")
'didnt'

Replace single quotes with double with exclusion of some elements

First attempt

You can also use this regex:

(?:(?<!\w)'((?:.|\n)+?'?)'(?!\w))

This regex matches the whole quoted sentence or word, including both the opening and the closing quoting mark, and captures the quoted content in group 1, so you can replace the matched part with "\1".

  • (?<!\w) - negative lookbehind for a word character, to exclude words like "you'll", etc., while still allowing the regex to match quotations after characters like \n, :, ;, . or -. Assuming that there will always be whitespace before a quotation is risky.
  • ' - single quoting mark,
  • ((?:.|\n)+?'?) - capturing group 1: one or more of any character or
    newline (to match multiline sentences) with a lazy quantifier (to avoid
    matching from the first to the last single quoting mark), followed by
    an optional single quoting mark, in case there are two in a row,
  • '(?!\w) - single quote not followed by a word character, to exclude
    text like "i'm", "you're", etc., where the quoting mark is between letters.

The s' case

However, it still has a problem with sentences where an apostrophe occurs after a word ending with s, like: 'the classes' hours'. I think it is impossible to tell with a regex alone whether an s followed by ' should be treated as the end of a quotation or as a possessive s with an apostrophe. But I figured out a limited workaround for this problem, with this regex:

(?:(?<!\w)'((?:.|\n)+?'?)(?:(?<!s)'(?!\w)|(?<=s)'(?!([^']|\w'\w)+'(?!\w))))

It adds an alternative for the s' cases: (?<!s)'(?!\w)|(?<=s)'(?!([^']|\w'\w)+'(?!\w)), where:

  • (?<!s)'(?!\w) - if there is no s before ', match as in the regex above (first attempt),
  • (?<=s)'(?!([^']|\w'\w)+'(?!\w)) - if there is an s before ', end the match on this ' only if no other ' followed by a non-word
    character occurs in the following text, before the end or before another ' (but only a ' preceded by a letter other than s, or the opening of the next quotation). The \w'\w is there to include in such a match any ' which sits between letters, like in i'm, etc.

This regex should go wrong only if there are a couple of s' cases in a row. Still, it is far from a perfect solution.
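The difference between the two regexes can be seen on the 'the classes' hours' example. This is only a sketch: the names FIRST_ATTEMPT, S_AWARE and requote are hypothetical, and the patterns are copied from the answer as written.

```python
import re

# First attempt: closes the quotation at the first ' not followed by a word character.
FIRST_ATTEMPT = re.compile(r"(?:(?<!\w)'((?:.|\n)+?'?)'(?!\w))")

# Workaround: after an s, close only if no later closing candidate follows.
S_AWARE = re.compile(
    r"(?:(?<!\w)'((?:.|\n)+?'?)"
    r"(?:(?<!s)'(?!\w)|(?<=s)'(?!([^']|\w'\w)+'(?!\w))))"
)

def requote(pattern, text):
    """Replace each match with its group 1 wrapped in double quotes."""
    return pattern.sub(r'"\1"', text)

sample = "'the classes' hours'"
print(requote(FIRST_ATTEMPT, sample))  # "the classes" hours'
print(requote(S_AWARE, sample))        # "the classes' hours"
```

The first pattern ends the quotation at classes', while the second one keeps going because another closing candidate follows later in the text.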

Flaws of \w

Also, when using \w there is always a chance that ' occurs after a symbol, or after a character that is a letter but not in [a-zA-Z_0-9], like some local language character, and then it will be treated as the beginning of a quotation. This could be avoided by replacing (?<!\w) and (?!\w) with (?<!\p{L}) and (?!\p{L}), or with something like (?<=^|[,.?!)\s]), etc.: a positive lookaround for the characters which can occur in a sentence before a quotation. However, such a list could be quite long. Note that Python's built-in re module does not support \p{L}; the third-party regex module does.
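In Python 3, \w in a str pattern is Unicode-aware by default, so this flaw mostly appears when the re.ASCII flag (or a bytes pattern) is in use. The sketch below illustrates the difference; the pattern is a simplified stand-in built from the same lookarounds, and the French sample word is an assumption chosen for illustration.

```python
import re

# Simplified stand-in: a ' with no word character on at least one side.
pattern = r"(?<!\w)'|'(?!\w)"

word = "l'été"

# Default (Unicode) mode: é counts as a word character, so the apostrophe is kept.
print(re.sub(pattern, '"', word))                  # l'été

# ASCII mode: é is not in [a-zA-Z0-9_], so the apostrophe is mistaken for a quote.
print(re.sub(pattern, '"', word, flags=re.ASCII))  # l"été
```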

Replace single quotes in a string but not escaped single quotes

You need a negative lookbehind, not a negative lookahead ("no backslash before a quote"):

import re

result = '''{'key1': 4, 'key2': 'I\\'m home'}'''
print(re.sub(r"(?<!\\)'", '"', result))
# {"key1": 4, "key2": "I\'m home"}

Removing single quotes if they aren't in the middle of a word

Split the string, use strip() on each word to remove leading and trailing characters on it, then join it all back together.

>>> s = "'here is some stuff 'now there are quotes' now there's not'"
>>> print(' '.join(w.strip("'") for w in s.split()).lower())
here is some stuff now there are quotes now there's not

How to find all occurrences of a single quote not within a word with python regex

Try the following regex.

Regex: (?<![a-zA-Z])'|'(?![a-zA-Z]) and replace with "

Explanation:

  • (?<![a-zA-Z])' matches apostrophe not preceded by a letter.

  • '(?![a-zA-Z]) matches the apostrophe not followed by a letter.
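As a quick check, the regex above can be applied with re.sub. This is a minimal sketch; the function name replace_outer_quotes and the sample string are assumptions made for illustration.

```python
import re

# A ' not preceded by a letter, or a ' not followed by a letter.
OUTER_QUOTE = re.compile(r"(?<![a-zA-Z])'|'(?![a-zA-Z])")

def replace_outer_quotes(text):
    """Replace quotes at word boundaries with ", leaving in-word apostrophes alone."""
    return OUTER_QUOTE.sub('"', text)

sample = "'here is 'some' text, don't touch'"
print(replace_outer_quotes(sample))
# "here is "some" text, don't touch"
```

Because only ASCII letters are listed, an apostrophe next to a digit or an accented letter would also be replaced.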
