Regular Expression: Match Start or Whitespace

Regular expression: match start or whitespace

Use the OR "|" operator:

>>> re.sub(r'(^|\W)GBP([\W\d])', r'\g<1>£\g<2>', text)
'£ 5 Off when you spend £75.00'
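
A self-contained version of the substitution can be sketched as follows. The sample input and the helper name replace_gbp are assumptions made for illustration; the original post does not show the value of text.

```python
import re

# (^|\W) lets GBP sit at the start of the string or after a non-word character;
# ([\W\d]) requires a non-word character or a digit right after it.
GBP_PATTERN = re.compile(r'(^|\W)GBP([\W\d])')

def replace_gbp(text):
    """Replace standalone 'GBP' with the pound sign (hypothetical helper)."""
    # Raw replacement string so \g<1> and \g<2> reach re.sub unchanged.
    return GBP_PATTERN.sub(r'\g<1>£\g<2>', text)

# Assumed sample input, chosen to reproduce the output shown above.
print(replace_gbp('GBP 5 Off when you spend GBP75.00'))
# 'GBP' embedded in a longer word is left alone.
print(replace_gbp('EGBPX stays as it is'))
```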

regex match if starts or ends with whitespaces

Modify the pattern to the following:

(^\s+)|(\s+$)

Based on the modified question, use the pattern ^\s*(.*?)\s*$ and look at capturing group #1:

^ # Start of string/line
\s # <whitespace character>
* # (zero or more)(greedy)
( # Capturing Group (1)
. # Any character except line break
*? # (zero or more)(lazy)
) # End of Capturing Group (1)
\s # <whitespace character>
* # (zero or more)(greedy)
$ # End of string/line
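
Both patterns from this answer can be tried in Python along these lines. This is a minimal sketch; the helper names and the sample string are assumptions, not part of the original answers.

```python
import re

EDGE_WHITESPACE = re.compile(r'(^\s+)|(\s+$)')  # leading OR trailing whitespace
TRIMMED_CONTENT = re.compile(r'^\s*(.*?)\s*$')  # group 1 holds the trimmed text

def has_edge_whitespace(s):
    """True if s starts or ends with whitespace (hypothetical helper)."""
    return EDGE_WHITESPACE.search(s) is not None

def trimmed(s):
    """Return capturing group 1, or None if the pattern does not match."""
    match = TRIMMED_CONTENT.match(s)
    return match.group(1) if match else None

sample = '  hello world  '  # assumed sample input
print(has_edge_whitespace(sample))
print(repr(trimmed(sample)))
```

Because . does not match a line break by default, trimmed returns None for text that spans several lines.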

Regular Expression match lines starting with a certain character OR whitespace and then that character

You can try

^\s*-
  • ^: start of string
  • \s*: zero or more whitespace characters
  • -: a literal - (you don't need to escape this outside a character class)
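
As a rough illustration, the pattern can be applied line by line in Python. The function name and the sample text are assumptions made for this sketch.

```python
import re

DASH_LINE = re.compile(r'^\s*-')

def dash_lines(text):
    """Return the lines whose first non-whitespace character is '-'."""
    # Matching one line at a time keeps \s* from running across line breaks.
    return [line for line in text.splitlines() if DASH_LINE.match(line)]

sample = "- first\n   - indented\nnot - a match\n-last"  # assumed input
print(dash_lines(sample))
```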

Regex that matches a string that starts with whitespace and has only digits after

The [ 0-9]{7} will match 7 digits or spaces in any order and this pattern can return partial matches since it is not anchored at the start/end of the string.

You can use a lookahead restricting the length of the string, and use the sequential subpatterns:

^(?=[\s\d]{7}$)\s*\d*$

The pattern breakdown:

  • ^ - start of string
  • (?=[\s\d]{7}$) - a lookahead that lets the match proceed only if the whole string is exactly 7 characters long and consists solely of whitespace and/or digits
  • \s* - 0+ whitespace symbols
  • \d* - 0+ digits
  • $ - end of string.
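
The breakdown above can be checked with a short Python sketch. The helper name is_padded_number and the test strings are assumptions chosen for illustration.

```python
import re

# Lookahead fixes the total length at 7; the rest forces spaces first, digits after.
SEVEN_WIDE = re.compile(r'^(?=[\s\d]{7}$)\s*\d*$')

def is_padded_number(s):
    """True for a 7-character string of optional whitespace followed by digits."""
    return SEVEN_WIDE.match(s) is not None

for candidate in ['   1234', '1234567', '12 3456', ' 12345 ', '  123']:
    print(repr(candidate), is_padded_number(candidate))
```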

Regex: Specify space or start of string and space or end of string

You can use any of the following:

\b      # A word boundary; it matches next to spaces and at the start or end of a line.
(^|\s)  # | means OR; () is a capturing group.

/\b(stackoverflow)\b/

Also, if you don't want to include the space in your match, you can use a lookbehind and a lookahead.

(?<=\s|^)         #to look behind the match
(stackoverflow) #the string you want. () optional
(?=\s|$) #to look ahead.
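
The difference between \b and the whitespace-based alternatives shows up when the word is glued to punctuation. The sketch below uses Python, where the lookbehind (?<=\s|^) is rejected because its alternatives differ in width, so a non-capturing (?:^|\s) is placed in front instead; that substitution and the sample text are assumptions of this example.

```python
import re

WORD_BOUNDARY = re.compile(r'\b(stackoverflow)\b')
# (?:^|\s) consumes the leading space but keeps it out of group 1;
# (?=\s|$) only checks what follows without consuming it.
SPACE_DELIMITED = re.compile(r'(?:^|\s)(stackoverflow)(?=\s|$)')

text = 'stackoverflow has meta.stackoverflow.com and plain stackoverflow'

# \b also matches next to '.', so the domain name counts as a hit.
print(len(WORD_BOUNDARY.findall(text)))
# Only occurrences delimited by whitespace or the string edges are found.
print(len(SPACE_DELIMITED.findall(text)))
```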

Regex match line starting with whitespace and first character is non-digit

^(?! +\d+ ).*\n*
  • ^ Assert position at the start of the line
  • (?! +\d+ ) Negative lookahead ensuring what follows is not one or more spaces, then one or more digits, then a space
  • .* Match any character (except \n) any number of times
  • \n* Matches any number of newline characters

Result:

    1   1/153  M0139 1:15:08 2:05:50 2:29:20   2:29:20  5:42 Eric
    2   2/153  M0139 1:15:07 2:06:29 2:29:56*  2:29:56  5:44 Bryan
   49   8/77   M4049 1:36:48 2:54:03 3:37:02   3:36:59  8:17 Joshua
   50  28/153  M0139 1:49:45 3:03:56 3:37:38#  3:37:22  8:18 Brian
   99   1/16   M6069 1:56:30 3:15:24 3:51:06   3:50:46  8:49 Paul
  100   3/35   F5059 1:50:06 3:11:37 3:51:03   3:50:47  8:49 Ashley
  101   4/35   F5059 1:55:26 3:16:37 3:56:03   3:55:57  9:14 Joan
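
One way to use the pattern is to delete every line it matches, which leaves only the result rows. The Python sketch below assumes that usage; the header and note lines in the sample are invented for illustration.

```python
import re

# MULTILINE makes ^ match at the start of every line, not only the string.
NON_RESULT_LINE = re.compile(r'^(?! +\d+ ).*\n*', re.MULTILINE)

def keep_result_lines(text):
    """Drop lines that do not start with spaces, a number, and a space."""
    return NON_RESULT_LINE.sub('', text)

sample = (
    "Place Div/Tot Name\n"      # assumed header line
    "    1   1/153  Eric\n"
    "\n"
    "notes here\n"              # assumed free-text line
    "    2   2/153  Bryan\n"
)
print(keep_result_lines(sample))
```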

Regular expression: matching words between white space

You seem to be working in Python: (?<=^|\s) is perfectly valid in PCRE, Java and Ruby (and .NET regex supports infinite-width lookbehind patterns), but Python's re module only accepts fixed-width lookbehinds.

Use

(?<!\S)\w+(?!\S)

It will match 1 or more word chars that are enclosed with whitespace or start/end of string.

Pattern details:

  • (?<!\S) - a negative lookbehind that fails the match once the engine finds a non-whitespace char immediately to the left of the current location
  • \w+ - 1 or more word chars
  • (?!\S) - a negative lookahead that fails the match once the engine finds a non-whitespace char immediately to the right of the current location.
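
A small Python sketch of the pattern details above follows. The sample string and the compiles helper are assumptions added for illustration; the helper only reports whether re accepts a pattern.

```python
import re

WHOLE_TOKEN = re.compile(r'(?<!\S)\w+(?!\S)')

sample = 'foo bar-baz qux! 42'  # assumed sample input
tokens = WHOLE_TOKEN.findall(sample)
# 'bar-baz' and 'qux!' touch non-whitespace characters, so they are skipped.
print(tokens)

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The variable-width lookbehind is not accepted by Python's re module.
print(compiles(r'(?<=^|\s)\w+'))
```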

Regular expression match fails if only whitespace after the - character

Something like:

^\d+\.\d+\.\d+(?:\s*-\s*\w+)?\/\d+\.\d+\.\d+\.\d+(?:\s*-\s*\w+)?\.txt$

Or you can combine the \.\d+ repetitions as

^\d+(?:\.\d+){2}(?:\s*-\s*\w+)?\/\d+(?:\.\d+){3}(?:\s*-\s*\w+)?\.txt$

Changes

  • .{1} When you want to repeat something once, there is no need for {}. It's implicit.

  • (?:\s*-\s*\w+) Matches zero or more whitespace characters (\s*), then -, then zero or more whitespace characters again, and then \w+, a description of at least one word character.

    • The ? at the end of this pattern makes it optional.
    • The same pattern is repeated at the end to match the second part.
  • ^ Anchors the regex at the start of the string.
  • $ Anchors the regex at the end of the string. These two anchors are necessary so that the string contains nothing else.
  • Don't group patterns with () unless you need to capture them, since capturing wastes memory. Use (?:..) if you want to group patterns without capturing them.
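
The combined pattern can be exercised in Python roughly as follows. In this sketch the dot before txt is escaped so that it matches only a literal dot, and the sample paths are made up for illustration.

```python
import re

FILE_PATTERN = re.compile(
    r'^\d+(?:\.\d+){2}(?:\s*-\s*\w+)?\/\d+(?:\.\d+){3}(?:\s*-\s*\w+)?\.txt$'
)

def is_valid_path(path):
    """True if path matches the version/version.txt layout (hypothetical helper)."""
    return FILE_PATTERN.match(path) is not None

# Assumed sample paths: the last two have only whitespace after the '-'.
for path in [
    '1.2.3/4.5.6.7.txt',
    '1.2.3 - alpha/4.5.6.7 - beta.txt',
    '1.2.3 - /4.5.6.7.txt',
    '1.2.3/4.5.6.7 - .txt',
]:
    print(repr(path), is_valid_path(path))
```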

