Regular Expression to Remove Everything But Characters and Numbers

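There's probably a more concise regex, but this works for ASCII letters and digits. With Java's String.replaceAll, strings are immutable, so assign the result:

String cleaned = string.replaceAll("[^a-zA-Z0-9]", "");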
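For comparison, the same negated character class can be tried in Python. This is only a minimal sketch: the helper name `keep_alnum` and the sample string are made up for illustration, and the class covers ASCII letters and digits only.

```python
import re

def keep_alnum(text: str) -> str:
    """Drop every character that is not an ASCII letter or digit."""
    return re.sub(r"[^a-zA-Z0-9]", "", text)

print(keep_alnum("Hello, World! 123"))  # HelloWorld123
```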

Regex to Remove Everything but Numbers, Letters and Spaces in R

Try gsub with the negated character class [^[:alnum:] ], which matches any character that is not a letter, number, or space:

gsub("[^[:alnum:] ]", "", x)

Full code:

x <- "_kMDItemOwnerUserID = 99kMDItemAlternateNames = ( \"(500) Days of Summer     (2009).m4v\")kMDItemAudioBitRate = 163kMDItemAudioChannelCount =     2kMDItemAudioEncodingApplication = \"HandBrake 0.9.4 2009112300\"kMDItemCodecs =     ( \"H.264\", AAC, \"QuickTime Text\")"

gsub("[^[:alnum:] ]", "", x)
[1] "kMDItemOwnerUserID 99kMDItemAlternateNames 500 Days of Summer 2009m4vkMDItemAudioBitRate 163kMDItemAudioChannelCount 2kMDItemAudioEncodingApplication HandBrake 094 2009112300kMDItemCodecs H264 AAC QuickTime Text"
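Removing punctuation this way can leave runs of spaces behind. The sketch below shows one way to tidy them up in Python. It assumes an ASCII-only class in place of R's locale-aware [:alnum:], and the helper names `keep_alnum_and_spaces` and `squeeze_spaces` are hypothetical.

```python
import re

def keep_alnum_and_spaces(text: str) -> str:
    """Remove everything except ASCII letters, digits, and the space character."""
    return re.sub(r"[^A-Za-z0-9 ]", "", text)

def squeeze_spaces(text: str) -> str:
    """Collapse runs of spaces left behind by the removal and trim the ends."""
    return re.sub(r" {2,}", " ", text).strip()

sample = 'kMDItemCodecs =     ( "H.264", AAC, "QuickTime Text")'
cleaned = keep_alnum_and_spaces(sample)
print(squeeze_spaces(cleaned))  # kMDItemCodecs H264 AAC QuickTime Text
```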

How to remove everything but letters, numbers, space, exclamation and question mark from string?

You can use a regex:

myString.replace(/[^\w\s!?]/g,'');

This removes every character that is not a word character, whitespace, an exclamation mark, or a question mark.

Character Class: \w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.

\s stands for "whitespace character". It includes at least the space, tab, carriage return, and newline: [ \t\r\n].

If you don't want the underscore, you can use just [A-Za-z0-9].

myString.replace(/[^A-Za-z0-9\s!?]/g,'');
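The difference between the two classes is easiest to see on a string that contains an underscore. Here is a small Python sketch with a made-up sample string. Note that in Python 3, \w on a str also matches non-ASCII letters by default, which differs from JavaScript's \w.

```python
import re

text = "snake_case rocks! really? #yes"

# \w keeps the underscore; the explicit class does not.
with_word_class = re.sub(r"[^\w\s!?]", "", text)
with_explicit_class = re.sub(r"[^A-Za-z0-9\s!?]", "", text)

print(with_word_class)      # snake_case rocks! really? yes
print(with_explicit_class)  # snakecase rocks! really? yes
```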

For Unicode characters, you can add something like \u0000-\u0080 to the expression. Because the class is negated, that keeps all characters within that Unicode range. You'll have to specify the range for the characters you don't want removed. You can see all the codes on Unicode Map. Just add the characters, or ranges of characters, that you want kept.

For example:

myString.replace(/[^A-Za-z0-9\s!?\u0000-\u0080\u0082]/g,'');

This will allow all the previously mentioned characters, the range from \u0000-\u0080 and \u0082. It will remove \u0081.
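To check which code points survive, the same class can be run in Python, whose `re` module accepts \uXXXX escapes in str patterns. The sample string and the name `REMOVE_PATTERN` are illustrative assumptions.

```python
import re

# Matches every character that is NOT in the keep-list.
REMOVE_PATTERN = r"[^A-Za-z0-9\s!?\u0000-\u0080\u0082]"

sample = "ok\u0081\u0082\u00e9!"
cleaned = re.sub(REMOVE_PATTERN, "", sample)

# \u0081 and \u00e9 are removed; \u0082 is kept.
print([hex(ord(ch)) for ch in cleaned])  # ['0x6f', '0x6b', '0x82', '0x21']
```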

How to remove everything except "01 - 10" pattern and vice-versa?

You may use this regex:

^(?!\d{2}\h*[:-]\h*\d{2}\h*$).*[\r\n]+

RegEx Demo

RegEx Details:

  • ^: Start of a line
  • (?!: Start negative lookahead
    • \d{2}: Match 2 digits
    • \h*[:-]\h*: Match 0 or more horizontal whitespaces followed by : or - followed by 0 or more horizontal whitespaces
    • \d{2}: Match 2 digits
    • \h*: Match 0 or more horizontal whitespaces
    • $: End of the line
  • ): End negative lookahead
  • .*: Match the rest of the line
  • [\r\n]+: Match 1+ line breaks
  • Replacement is an empty string to remove all matching lines
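The breakdown above can be tried in Python with one adjustment: Python's `re` module does not support \h, so this sketch substitutes [ \t] and enables MULTILINE so that ^ and $ work per line. The sample text and the name `NOT_A_PAIR_LINE` are made up, and the sample uses \n line endings only.

```python
import re

# [ \t] stands in for \h, which Python's re module does not support.
NOT_A_PAIR_LINE = re.compile(
    r"^(?!\d{2}[ \t]*[:-][ \t]*\d{2}[ \t]*$).*[\r\n]+",
    re.MULTILINE,
)

text = "intro line\n01 - 10\nsome noise\n02:20\nfooter\n"
kept = NOT_A_PAIR_LINE.sub("", text)
print(kept, end="")
# 01 - 10
# 02:20
```

Because the pattern requires at least one line break, a final line without a trailing newline would not be removed.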

Reverse Removal

To remove digit pair lines you can use:

^(?=\d{2}\h*[:-]\h*\d{2}\h*$).*[\r\n]+

RegEx Demo 2
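Outside an editor, the reverse removal can also be done without a lookahead by filtering lines. This is a different technique from the regex above, not a translation of it. The function name `drop_pair_lines` and the sample text are hypothetical, and [ \t] again stands in for \h.

```python
import re

# A whole line made of two digits, a ':' or '-', and two more digits.
PAIR = re.compile(r"\d{2}[ \t]*[:-][ \t]*\d{2}[ \t]*")

def drop_pair_lines(text: str) -> str:
    """Return text without the lines that consist solely of a digit pair."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not PAIR.fullmatch(line.rstrip("\r\n"))
    ]
    return "".join(kept)

text = "intro line\n01 - 10\nsome noise\n02:20\nfooter\n"
print(drop_pair_lines(text), end="")
# intro line
# some noise
# footer
```

Flipping the `not` in the list comprehension keeps only the digit pair lines instead.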

Remove everything except a certain pattern

In order to remove anything but a specific text, you need to use .*(text_you_need_to_keep).* with . matching a newline.

In Notepad++, use

   Find: .*(phone=\S*?digits=1).*
Replace: $1

NOTE: . matches newline option must be checked.

I use \S*? instead of .* inside the capturing group because you only want to match as few non-whitespace characters as possible from phone= up to the closest digits=1. .* is too greedy and, with the DOTALL option on, may stretch across multiple lines.
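The same find-and-replace can be sketched in Python, where the DOTALL flag plays the role of the ". matches newline" checkbox and \1 replaces $1. The sample text is invented for illustration.

```python
import re

text = "junk before\nurl?phone=5551234&x=y&digits=1 trailing junk\nmore junk\n"

# DOTALL lets . cross line breaks, so the whole text collapses to group 1.
kept = re.sub(r".*(phone=\S*?digits=1).*", r"\1", text, flags=re.DOTALL)
print(kept)  # phone=5551234&x=y&digits=1
```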

UPDATE

When you want to keep some multiple occurrences of a pattern in a text, in Notepad++, you can use

.*?(phone=\S*?digits=1)

Replace with $1\n. With that, you will remove all the unwanted substrings except the text after the last occurrence of the subpattern you need.

You will need to remove the last chunk either manually or with

   FIND: (phone=\S*?digits=1).*
REPLACE: $1
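In a script, the two replace passes can be avoided by collecting the matches directly. The sketch below uses Python's `re.findall` rather than the Notepad++ approach described above, and its sample text is made up.

```python
import re

text = (
    "noise phone=111&digits=1 more noise\n"
    "phone=222&a=b&digits=1 tail\n"
    "no match here\n"
)

# Each match is one occurrence of the pattern to keep.
matches = re.findall(r"phone=\S*?digits=1", text)
print("\n".join(matches))
# phone=111&digits=1
# phone=222&a=b&digits=1
```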

How to remove everything but letters, numbers and ! ? . ; , @ ' using regex in Python pandas df?

Is this what you are looking for?

df.text.str.replace("(?i)[^0-9a-z!?.;,@' -]", '', regex=True)
Out:
0 hey guys! wuzup
1 hello p3ople!What's up?
2 hey, how- thing don
3 my name is bond, james b0nd
Name: text, dtype: object
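The pattern itself can be checked without a DataFrame by using the standard `re` module. The sample strings below are assumptions, not the asker's data. In pandas, the same pattern is passed to `Series.str.replace` with `regex=True`.

```python
import re

# (?i) makes a-z match both cases; space and hyphen are kept as well.
PATTERN = re.compile(r"(?i)[^0-9a-z!?.;,@' -]")

samples = ["hey guys!! #wuzup", "What's up? <3", "a_b*c"]
cleaned = [PATTERN.sub("", s) for s in samples]
print(cleaned)  # ['hey guys!! wuzup', "What's up? 3", 'abc']
```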

Remove everything except period and numbers from string regex in R

Try this

gsub("[\\c|\\(|\\)]", "",df$b)
#[1] "34.0522, 118.2437" "40.7128, 74.0059" "37.3382, 121.8863"
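Read literally, keeping only periods and numbers would also drop the comma and space, which glues the two values together. The Python sketch below contrasts that with extracting each number. The input shape "c(34.0522, 118.2437)" is an assumption inferred from the characters the pattern above removes.

```python
import re

raw = "c(34.0522, 118.2437)"  # assumed input shape

# Literal reading of the question: keep digits and periods only.
digits_and_periods = re.sub(r"[^0-9.]", "", raw)

# Alternative: pull out each number so the values stay separate.
numbers = re.findall(r"\d+(?:\.\d+)?", raw)

print(digits_and_periods)  # 34.0522118.2437
print(numbers)             # ['34.0522', '118.2437']
```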

Regex remove all special characters except numbers?

Use the global flag:

var name = name.replace(/[^a-zA-Z ]/g, "");
                                    ^

If you don't want to remove numbers, add it to the class:

var name = name.replace(/[^a-zA-Z0-9 ]/g, "");
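For readers porting this to Python: `re.sub` replaces every match by default, so there is no global flag to add, and `count=1` reproduces the behaviour of a JavaScript regex without /g. The sample name is made up.

```python
import re

name = "J0hn D03!"

first_only = re.sub(r"[^a-zA-Z ]", "", name, count=1)  # like omitting /g
everywhere = re.sub(r"[^a-zA-Z ]", "", name)           # like using /g

print(first_only)  # Jhn D03!
print(everywhere)  # Jhn D
```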

