Python regex, remove all punctuation except hyphen for unicode string
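[^\P{P}-]+
\P is the complement of \p, so \P{P} means "not punctuation". The class therefore matches anything that is not (not punctuation or a hyphen), which leaves all punctuation except hyphens. Note that Python's built-in re module does not support \p{...}; in Python this pattern needs the third-party regex module.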
Example: http://www.rubular.com/r/JsdNM3nFJ3
If you want a less convoluted way, an alternative is \p{P}(?<!-): match any punctuation character, then check that it wasn't a hyphen (using a negative lookbehind).
Working example: http://www.rubular.com/r/5G62iSYTdk
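In Python, the same idea can be sketched with only the standard library by checking each character's Unicode general category instead of using \p{P}. This is a swapped-in technique, not the regex above; the helper name strip_punct_keep_hyphen is made up for illustration, and it assumes the ASCII hyphen-minus is the only character to keep.

```python
import unicodedata

def strip_punct_keep_hyphen(text, keep="-"):
    """Drop every character in a Unicode punctuation category (P*) unless it is in `keep`."""
    return "".join(
        ch for ch in text
        if ch in keep or not unicodedata.category(ch).startswith("P")
    )

print(strip_punct_keep_hyphen("Héllo, wörld - it's «fine»!"))
# Héllo wörld - its fine
```

As with \p{P}, symbols such as $ and + belong to the S categories rather than P, so they are left in place.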
Remove all punctuation from string except full stop (.) and colon (:) in Python
You don't escape the special characters in string.punctuation for your regex, and you also forgot to remove : from the set.
Use re.escape to escape the regex special characters in the punctuation string. Your final pattern will be [\!\"\#\$\%\&\'\(\)\*\+\,\-\/\;\<\=\>\?\@\[\\\]\^_\`\{\|\}\~]
import string
import re
remove = string.punctuation
remove = remove.replace(".", "")
remove = remove.replace(":", "")
pattern = r"[{}]".format(re.escape(remove))
line = "NETWORK [listener] connection accepted from 127.0.0.1:59926 #4785 (3 connections now open)"
line = re.sub(pattern, "", line)
output:
NETWORK listener connection accepted from 127.0.0.1:59926 4785 3 connections now open
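If you need this for more than one set of exceptions, the steps above can be wrapped in a small helper. This is only a sketch: the function name remove_punctuation_except and its keep parameter are hypothetical, and it covers ASCII punctuation only, since that is all string.punctuation contains.

```python
import re
import string

def remove_punctuation_except(text, keep=".:"):
    """Remove ASCII punctuation from `text`, leaving the characters in `keep` alone."""
    remove = "".join(ch for ch in string.punctuation if ch not in keep)
    if not remove:
        # Nothing left to strip; an empty character class would be a regex error.
        return text
    pattern = "[{}]".format(re.escape(remove))
    return re.sub(pattern, "", text)

line = "NETWORK [listener] connection accepted from 127.0.0.1:59926 #4785 (3 connections now open)"
print(remove_punctuation_except(line))
# NETWORK listener connection accepted from 127.0.0.1:59926 4785 3 connections now open
```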
Remove all punctuation except apostrophes in R
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[^[:alnum:][:space:]']", "", x)
[1] "I like to chew gum but don't like bubble gum"
The above regex is much more straightforward. It replaces everything that is not an alphanumeric character, whitespace, or an apostrophe (note the caret, which negates the class) with an empty string.
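For comparison, the same whitelist approach might look like this in Python. It is a rough equivalent rather than a translation: Python's re has no [:alnum:] class, so the sketch assumes ASCII letters and digits are what you want to keep.

```python
import re

x = "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"

# Keep ASCII letters, digits, whitespace and apostrophes; delete everything else.
cleaned = re.sub(r"[^A-Za-z0-9\s']", "", x)
print(cleaned)
# I like to chew gum but don't like bubble gum
```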
Regex; eliminate all punctuation except
It's not clear to me what you want the result to be, but you might be able to use a negated character class, like this:
R> strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl=TRUE)[[1]]
[1] "I'm" "not" "that" "good" "at" "regex" "yet,"
[8] "but" "am" "getting" "better" "!"
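A similar split can be sketched in Python. The value of X below is an assumption reconstructed from the output shown, since the original input is not given, and splitting on an empty lookahead match needs Python 3.7 or later. Unlike [:punct:], \w counts the underscore as a word character, so an underscore would not trigger a split here.

```python
import re

# Assumed input; the original value of X is not shown in the answer.
X = "I'm not that good at regex yet, but am getting better!"

# Split on whitespace, or at the empty position just before a character that is
# not a word character, whitespace, comma, or apostrophe.
parts = re.split(r"\s|(?=[^\w\s,'])", X)
print(parts)
# ["I'm", 'not', 'that', 'good', 'at', 'regex', 'yet,', 'but', 'am', 'getting', 'better', '!']
```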
How to remove punctuation from a string with exceptions using regex in bash
You can specify the punctuation marks you want removed, e.g.
>echo "Jiro. Inagaki' & Soul, Media_Breeze." | tr -d "[.,/\\-\=\+\{\[\]\}\!\@\#\$\%\^\*\'\\\(\)]"
Jiro Inagaki & Soul Media_Breeze
Or, alternatively,
>echo "Jiro. Inagaki' & Soul, Media_Breeze." | tr -dc '[:alnum:] &_'
Jiro Inagaki & Soul Media_Breeze
Python 3 Regex: remove all punctuation, except special word pattern
Use the regex module instead of re, with the verbs (*SKIP)(*FAIL):
import regex
text = 'Lorem Ipsum, simply dummy text -TOKEN_ABC-, yes! '
res = regex.sub(r'-[A-Z]+(?:_[A-Z]+)*-(*SKIP)(*FAIL)|[^\w\s]+', '', text)
print(res)
Output:
Lorem Ipsum simply dummy text -TOKEN_ABC- yes
Explanation:
- # a hyphen
[A-Z]+ # 1 or more capitals
(?: # non capture group
_ # underscore
[A-Z]+ # 1 or more capitals
)* # end group, may appear 0 or more times
- # a hyphen
(*SKIP) # forget the match
(*FAIL) # and fail
| # OR
[^\w\s]+ # 1 or more characters that are neither word characters nor whitespace
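If installing the third-party regex module is not an option, a comparable result can be sketched with the standard re module. This swaps the (*SKIP)(*FAIL) verbs for a different technique: the token is captured in a group and a replacement function puts it back, while any other punctuation match is replaced with an empty string.

```python
import re

text = 'Lorem Ipsum, simply dummy text -TOKEN_ABC-, yes! '

# Group 1 captures the protected token; the second alternative is the
# punctuation to delete.
pattern = re.compile(r'(-[A-Z]+(?:_[A-Z]+)*-)|[^\w\s]+')

# If group 1 matched, return it unchanged; otherwise return '' to drop the match.
res = pattern.sub(lambda m: m.group(1) or '', text)
print(res)
```

For this input the result is the same as the regex version, including the trailing space.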
how can i remove punctuation except ! and ? in sentiment analysis in text mining using python
You can include the ? and ! characters in the negated character class of your regular expression:
import re
text = re.sub(r"[^a-zA-Z!?]", " ", text)
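Because every removed character becomes a space, the result can contain runs of spaces. A minimal sketch that also collapses them is shown below; the helper name keep_letters_and_marks and the sample sentence are made up for illustration.

```python
import re

def keep_letters_and_marks(text):
    """Replace everything except ASCII letters, '!' and '?' with spaces, then tidy whitespace."""
    cleaned = re.sub(r"[^a-zA-Z!?]", " ", text)
    # split() with no arguments drops leading, trailing and repeated whitespace.
    return " ".join(cleaned.split())

print(keep_letters_and_marks("Wow!! Is this good? 10/10, yes."))
# Wow!! Is this good? yes
```

Note that digits are removed too, because the class only whitelists letters, ! and ?.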
Javascript regex to remove all punctuation except . and ?
Just use [^\w\s?.] for your character class.
Removing all punctuation except - and _ from a java string using RegEx
Use a character class subtraction (and add a + quantifier to match chunks of one or more punctuation characters):
name = name.replaceAll("[\\p{Punct}&&[^_-]]+", "");
The [\\p{Punct}&&[^_-]]+ pattern means: match one or more characters from the \p{Punct} class except _ and -.
The construction you found can also be used, but you'd need to put the - and _ into a character class and use .replaceAll("(?![_-])\\p{Punct}", ""), or .replaceAll("(?:(?![_-])\\p{Punct})+", "").
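For readers working in Python, whose re module has no && class operator, the lookahead form carries over directly. The sketch below assumes string.punctuation as a stand-in for Java's \p{Punct} (both list the same 32 ASCII punctuation characters); the sample name is invented.

```python
import re
import string

# Escaped ASCII punctuation, safe to place inside a character class.
PUNCT = re.escape(string.punctuation)

# (?![_-]) rejects '_' and '-' before the class gets a chance to match them.
pattern = re.compile(r"(?:(?![_-])[{}])+".format(PUNCT))

name = "my_file-name (final).v2!"
print(pattern.sub("", name))
# my_file-name finalv2
```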