Regex; Eliminate All Punctuation Except

Python regex, remove all punctuation except hyphen for unicode string

[^\P{P}-]+

\P is the complement of \p, so \P{P} means "not punctuation". The negated class [^\P{P}-] therefore matches anything that is neither non-punctuation nor a hyphen, which leaves all punctuation except the hyphen.

Example: http://www.rubular.com/r/JsdNM3nFJ3

If you want a less convoluted alternative, use \p{P}(?<!-): it matches any punctuation character and then checks that the character was not a hyphen (using a negative lookbehind).

Working example: http://www.rubular.com/r/5G62iSYTdk
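For the Python side of the question, note that the standard re module does not accept \p{P}; the patterns above need an engine with Unicode property support (the Rubular demos run on Ruby). A minimal standard-library sketch of the same idea follows, assuming that "punctuation" means the Unicode general categories starting with P. The function name strip_punct_except and the sample text are illustrative and not part of the original answer.

```python
import unicodedata

def strip_punct_except(text, keep="-"):
    """Drop every Unicode punctuation character except those listed in `keep`."""
    return "".join(
        ch for ch in text
        # Categories Pc, Pd, Ps, Pe, Pi, Pf and Po all start with "P".
        if ch in keep or not unicodedata.category(ch).startswith("P")
    )

sample = "¿Qué tal? Well-known «quotes», done…"
print(strip_punct_except(sample))  # Qué tal Well-known quotes done
```

Like \p{P}, this leaves symbols such as $ or + alone, because they belong to the S categories rather than P.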

Remove all punctuation from string except full stop (.) and colon (:) in Python

You don't escape the special characters in string.punctuation before using them in your regex. You also forgot to remove : from the set.

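Use re.escape to escape the regex special characters in the punctuation string. Your final pattern will look like [\!\"\#\$\%\&\'\(\)\*\+\,\-\/\;\<\=\>\?\@\[\\\]\^_\`\{\|\}\~] (Python 3.7 and later escape fewer of these characters, but the resulting character class is equivalent).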

import string
import re
remove = string.punctuation

remove = remove.replace(".", "")
remove = remove.replace(":", "")

pattern = r"[{}]".format(re.escape(remove))

line = "NETWORK [listener] connection accepted from 127.0.0.1:59926 #4785 (3 connections now open)"
line = re.sub(pattern, "", line)

Output:

NETWORK listener connection accepted from 127.0.0.1:59926 4785 3 connections now open
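The same steps can be wrapped in a small helper so that the set of characters to keep becomes a parameter. This is only a sketch: the name remove_punctuation_except and its keep argument are hypothetical, and it assumes ASCII punctuation as defined by string.punctuation.

```python
import re
import string

def remove_punctuation_except(text, keep):
    """Remove ASCII punctuation from `text`, leaving the characters in `keep`."""
    remove = "".join(ch for ch in string.punctuation if ch not in keep)
    if not remove:
        # Nothing to strip; an empty character class "[]" would be invalid.
        return text
    # re.escape keeps characters such as ], ^, - and \ literal inside the class.
    pattern = "[{}]".format(re.escape(remove))
    return re.sub(pattern, "", text)

line = "NETWORK [listener] connection accepted from 127.0.0.1:59926 #4785 (3 connections now open)"
print(remove_punctuation_except(line, keep=".:"))
```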

Remove all punctuation except apostrophes in R

x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[^[:alnum:][:space:]']", "", x)

[1] "I like to chew gum but don't like bubble gum"

The above regex is much more straightforward. The leading caret negates the character class, so it replaces everything that is not an alphanumeric character, whitespace or an apostrophe with an empty string.
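For comparison, a rough Python counterpart of the negated class is sketched below. Python's re has no POSIX classes such as [:alnum:], so this version assumes that \w minus the underscore is an acceptable stand-in; the variable names are illustrative.

```python
import re

x = "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"

# Drop anything that is not a word character, whitespace or an apostrophe.
# \w also matches "_", so underscores are removed by the second alternative.
cleaned = re.sub(r"[^\w\s']|_", "", x)
print(cleaned)  # I like to chew gum but don't like bubble gum
```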

Regex; eliminate all punctuation except

It's not clear to me what you want the result to be, but you might be able to use negated character classes, as in this answer.

R> strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl=TRUE)[[1]]
[1] "I'm" "not" "that" "good" "at" "regex" "yet,"
[8] "but" "am" "getting" "better" "!"

How to remove punctuation from a string with exceptions using regex in bash

You can specify the punctuation marks you want removed, e.g.

>echo "Jiro. Inagaki' & Soul, Media_Breeze." | tr -d "[.,/\\-\=\+\{\[\]\}\!\@\#\$\%\^\*\'\\\(\)]"
Jiro Inagaki & Soul Media_Breeze

Or, alternatively,

>echo "Jiro. Inagaki' & Soul, Media_Breeze." | tr -dc '[:alnum:] &_'
Jiro Inagaki & Soul Media_Breeze
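The second command is a whitelist: it deletes everything that is not in the allowed set. A minimal Python sketch of that approach follows, assuming ASCII letters and digits stand in for [:alnum:]; the names ALLOWED and keep_only are hypothetical.

```python
import string

# Characters to keep: ASCII letters, digits, space, ampersand and underscore.
ALLOWED = set(string.ascii_letters + string.digits + " &_")

def keep_only(text, allowed=ALLOWED):
    """Delete every character that is not in `allowed` (like tr -dc)."""
    return "".join(ch for ch in text if ch in allowed)

print(keep_only("Jiro. Inagaki' & Soul, Media_Breeze."))  # Jiro Inagaki & Soul Media_Breeze
```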

Python 3 Regex: remove all punctuation, except special word pattern

Using the third-party regex module instead of re, with the backtracking verbs (*SKIP)(*FAIL):

import regex
text = 'Lorem Ipsum, simply dummy text -TOKEN_ABC-, yes! '
res = regex.sub(r'-[A-Z]+(?:_[A-Z]+)*-(*SKIP)(*FAIL)|[^\w\s]+', '', text)
print(res)

Output:

Lorem Ipsum simply dummy text -TOKEN_ABC- yes

Explanation:

    -          # a hyphen
    [A-Z]+     # 1 or more capital letters
    (?:        # start of non-capturing group
      _        #   an underscore
      [A-Z]+   #   1 or more capital letters
    )*         # end of group, may appear 0 or more times
    -          # a hyphen
    (*SKIP)    # skip past the text matched so far
    (*FAIL)    # and force this alternative to fail
    |          # OR
    [^\w\s]+   # 1 or more characters that are neither word characters nor whitespace
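If installing the regex module is not an option, a similar effect can be had with the standard re module by capturing the token and letting a replacement function put it back. This is a swapped-in technique rather than the (*SKIP)(*FAIL) approach itself, and the names TOKEN_OR_PUNCT and strip_punct_keep_tokens are illustrative.

```python
import re

# Alternative 1 captures a -TOKEN_LIKE_THIS- marker; alternative 2 matches
# runs of characters that are neither word characters nor whitespace.
TOKEN_OR_PUNCT = re.compile(r"(-[A-Z]+(?:_[A-Z]+)*-)|[^\w\s]+")

def strip_punct_keep_tokens(text):
    # If group 1 matched, return it unchanged; otherwise delete the match.
    return TOKEN_OR_PUNCT.sub(lambda m: m.group(1) or "", text)

text = 'Lorem Ipsum, simply dummy text -TOKEN_ABC-, yes! '
print(strip_punct_keep_tokens(text))
```

Because the token alternative is tried first at each position, the hyphens that belong to a token are consumed with it and never reach the punctuation alternative.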

How can I remove punctuation except ! and ? for sentiment analysis in text mining using Python

You can include the ? and ! characters in your regular expression:

text = re.sub(r"[^a-zA-Z!?]", " ", text)
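Replacing each removed character with a space tends to leave runs of spaces behind, and the class above also drops digits. A small sketch that applies the same class and then collapses whitespace is shown here; the function name keep_letters_and_marks and the sample sentence are assumptions for illustration.

```python
import re

def keep_letters_and_marks(text):
    """Keep ASCII letters, ! and ?; turn everything else into single spaces."""
    cleaned = re.sub(r"[^a-zA-Z!?]", " ", text)
    # split() with no arguments discards the runs of spaces left behind.
    return " ".join(cleaned.split())

print(keep_letters_and_marks("Great movie, isn't it?! 10/10"))  # Great movie isn t it?!
```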

Javascript regex to remove all punctuation except . and ?

Just use [^\w\s?.] for your character class.

Removing all punctuation except - and _ from a java string using RegEx

Use a character class subtraction (and add a + quantifier to match chunks of 1 or more punctuation chars):

name = name.replaceAll("[\\p{Punct}&&[^_-]]+", "");

See the Java demo.

The pattern [\\p{Punct}&&[^_-]]+ matches one or more chars from the \p{Punct} class, except _ and -.

The construction you found can also be used, but you'd need to put the - and _ into a character class, and use .replaceAll("(?![_-])\\p{Punct}", ""), or .replaceAll("(?:(?![_-])\\p{Punct})+", "").
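The lookahead form carries over to Python, which has no && intersection inside character classes. The sketch below assumes that string.punctuation is an adequate substitute for Java's ASCII-only \p{Punct}; the names PUNCT_EXCEPT and clean_name are hypothetical.

```python
import re
import string

# (?![_-]) rejects "_" and "-" before the punctuation class is tried, and the
# surrounding group with + removes whole runs of punctuation in one match.
PUNCT_EXCEPT = re.compile(
    r"(?:(?![_-])[{}])+".format(re.escape(string.punctuation))
)

def clean_name(name):
    return PUNCT_EXCEPT.sub("", name)

print(clean_name("my_file-name (v2)!.txt"))  # my_file-name v2txt
```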


