Python Regex - Finding Phone Number

Find USA phone numbers in python script

If you are interested in learning Regex, you could take a stab at writing it yourself. It's not quite as hard as it's made out to be. Sites like RegexPal allow you to enter some test data, then write and test a regular expression against that data. Using RegexPal, try adding some phone numbers in the various formats you expect to find them (with brackets, area codes, etc.), grab a Regex cheatsheet and see how far you can get. If nothing else, it will help in reading other people's expressions.

Edit:
Here is a modified version of your Regex, which should also match 7 and 10-digit phone numbers that lack any hyphens, spaces or dots. I added question marks after the character classes (the []s), which makes anything within them optional. I tested it in RegexPal, but as I'm still learning Regex, I'm not sure that it's perfect. Give it a try.

(\d{3}[-\.\s]??\d{3}[-\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4})

It matched the following values in RegexPal:

000-000-0000
000 000 0000
000.000.0000

(000)000-0000
(000)000 0000
(000)000.0000
(000) 000-0000
(000) 000 0000
(000) 000.0000

000-0000
000 0000
000.0000

0000000
0000000000
(000)0000000
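
To re-check the list above outside RegexPal, a minimal sketch along these lines could be used. It assumes each sample is meant to match in full, so it uses re.fullmatch; the names PHONE_RE, samples and unmatched are illustrative and not part of the original answer.

```python
import re

# The pattern from the answer above, split over three raw strings
# (one per alternative) purely for readability.
PHONE_RE = re.compile(
    r"(\d{3}[-\.\s]??\d{3}[-\.\s]??\d{4}"
    r"|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}"
    r"|\d{3}[-\.\s]??\d{4})"
)

# The sample values listed above.
samples = [
    "000-000-0000", "000 000 0000", "000.000.0000",
    "(000)000-0000", "(000)000 0000", "(000)000.0000",
    "(000) 000-0000", "(000) 000 0000", "(000) 000.0000",
    "000-0000", "000 0000", "000.0000",
    "0000000", "0000000000", "(000)0000000",
]

# Collect any sample that the pattern does not match in full.
unmatched = [s for s in samples if PHONE_RE.fullmatch(s) is None]
print(unmatched)
```

An empty list means every sample matched. When searching inside longer text with re.search or re.findall instead, the pattern can also match part of a longer digit run, so the result may differ from this full-string check.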

Python phone number regex

https://docs.python.org/3/library/re.html#re.findall

When the pattern contains more than one group, findall returns a list of tuples, with each tuple holding the groups from one match. You are grouping the whitespace, but you're not grouping the actual digits.

Try a regex that groups the digits too:

r"(\+420)?(\s*)?(\d{3})(\s*)?(\d{3})(\s*)?(\d{3})"

E.g.

import re

def detect_numbers(text):
    # Group the optional prefix and each block of three digits
    phone_regex = re.compile(r"(\+420)?\s*?(\d{3})\s*?(\d{3})\s*?(\d{3})")
    print(phone_regex.findall(text))

detect_numbers("so I need to match +420 123 123 123, also 123 123 123, also +420123123123 and also 123123123. Can y")

prints:

[('+420', '123', '123', '123'), ('', '123', '123', '123'), ('+420', '123', '123', '123'), ('', '123', '123', '123')]

You could then string-join the group matches to get the numbers, e.g.

import re

def detect_numbers(text):
    phone_regex = re.compile(r"(\+420)?\s*?(\d{3})\s*?(\d{3})\s*?(\d{3})")
    groups = phone_regex.findall(text)
    for g in groups:
        # Join the groups of one match into a single number string
        print("".join(g))

detect_numbers("so I need to match +420 123 123 123, also 123 123 123, also +420123123123 and also 123123123. Can y")

prints:

+420123123123
123123123
+420123123123
123123123
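
The shape of the findall result depends on how many capturing groups the pattern has. The following is a small sketch of that behaviour. The sample text and the variant pattern with a non-capturing prefix are assumptions made for illustration, not part of the original answer.

```python
import re

text = "call +420 123 123 123 or 123123123"

# Several capturing groups: findall returns one tuple of groups per match.
grouped = re.findall(r"(\+420)?\s*?(\d{3})\s*?(\d{3})\s*?(\d{3})", text)

# No capturing groups ((?:...) does not capture): findall returns one
# string per match, with the original spacing preserved.
whole = re.findall(r"(?:\+420\s*)?\d{3}\s*\d{3}\s*\d{3}", text)

print(grouped)
# [('+420', '123', '123', '123'), ('', '123', '123', '123')]
print(whole)
# ['+420 123 123 123', '123123123']
```

Which form is preferable depends on whether the numbers are wanted as written in the text or in a normalized form with the spaces removed.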

python regex - finding phone number

A quick fix for your pattern is

\+?\d+(?:[- \)]+\d+)+

Note the use of the non-capturing group, which avoids getting a list of tuples back from the re.findall call.

Details

  • \+? - an optional plus sign (1 or 0)
  • \d+ - 1+ digits
  • (?: - start of a non-capturing group:

    • [- )]+ - 1 or more -, space or ) chars
    • \d+ - 1+ digits
  • )+ - 1 or more repetitions (the whole (?:...) sequence of patterns is quantified this way, so both symbols and digits are required at least once and in that order).

Python demo:

import re
rx = r"\+?\d+(?:[- )]+\d+)+"
s = "+00 0000 0000 is my number and +44-787-77950 was my uk number"
print(re.findall(rx, s))
# => ['+00 0000 0000', '+44-787-77950']

Matching phone numbers, regex

I think you are looking for something like this:

(\(\d{3}\) \d{3}-\d{4})

From the Python docs:

{m}

Specifies that exactly m copies of the previous RE should be
matched; fewer matches cause the entire RE not to match. For example,
a{6} will match exactly six 'a' characters, but not five.

(\(\d\d\d\) \d\d\d-\d\d\d\d) would also work, but, as you said in your question, is rather repetitive. Your other suggested pattern, (\([0-9]+\) [0-9]+-[0-9]+), gives false positives on input such as (1) 2-3.
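
The difference between the fixed-count pattern and the + version can be checked with a short sketch like the one below. The variable names and the sample number (555) 123-4567 are made up for illustration; (1) 2-3 is the false positive mentioned above.

```python
import re

# Exact repetition counts: 3 digits, 3 digits, 4 digits.
strict = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

# "One or more" digits in each position, as in the other suggested pattern.
loose = re.compile(r"\([0-9]+\) [0-9]+-[0-9]+")

for candidate in ["(555) 123-4567", "(1) 2-3"]:
    print(
        candidate,
        bool(strict.fullmatch(candidate)),
        bool(loose.fullmatch(candidate)),
    )
# (555) 123-4567 True True
# (1) 2-3 False True
```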

Find a valid phone number using regular expression in python

Your pattern is not anchored, so it matches 123-111-1234 (everything except the first digit). Change your regex to ^\d{3}-\d{3}-\d{4}$ to make sure it only matches the whole input.
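
A minimal sketch of the effect of anchoring follows. The original input is not shown in the answer, so the string 9123-111-1234 (one extra leading digit) is an assumed example. re.fullmatch is shown as an alternative to writing ^ and $ by hand.

```python
import re

unanchored = r"\d{3}-\d{3}-\d{4}"
anchored = r"^\d{3}-\d{3}-\d{4}$"

bad_input = "9123-111-1234"   # assumed input with one digit too many
good_input = "123-111-1234"

# Without anchors, search finds a valid-looking number inside the bad input.
partial = re.search(unanchored, bad_input)
print(partial.group())                              # 123-111-1234

# With anchors, the bad input is rejected and the good one is accepted.
print(re.search(anchored, bad_input))               # None
print(re.search(anchored, good_input).group())      # 123-111-1234

# fullmatch requires the whole string to match, even without ^ and $.
print(re.fullmatch(unanchored, bad_input))          # None
```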

Extract phone number using regex with different formats python

You could use

\b(?:03|7[016])[- /]?\d{3} ?\d{3}\b

Explanation

  • \b A word boundary
  • (?:03|7[016]) Match one of 03, 70, 71 or 76
  • [- /]? Optionally match - a space or /
  • \d{3} ?\d{3} Match 6 digits with an optional space after the 3rd digit
  • \b A word boundary


For example

import re

regex = r"\b(?:03|7[016])[- /]?\d{3} ?\d{3}\b"
test_str = "Hi my name is marc and my phone number is 03-123456 and i would like 2 bottles of water 0.5L"
matches = re.search(regex, test_str)

if matches:
    print(matches.group())

Output

03-123456

Python regular expression for phone numbers

I suggest using this pattern:

(?:\B\+ ?49|\b0)(?: *[(-]? *\d(?:[ \d]*\d)?)? *(?:[)-] *)?\d+ *(?:[/)-] *)?\d+ *(?:[/)-] *)?\d+(?: *- *\d+)?

Note that it is written based on your comment saying the phone numbers start with +49 or a 0 and on the list of examples you provided. It may be considered "work in progress" since you have not provided more specific rules for phone number extraction.

Pattern details

  • (?:\B\+ ?49|\b0) - a +, optional space, 49 or a 0, both substrings cannot be preceded with a word char
  • (?: *[(-]? *\d(?:[ \d]*\d)?)? - an optional substring matching 0+ spaces, then an optional ( or -, 0+ spaces, a digit and then an optional sequence of digits/spaces followed with a digit
  • " *(?:[)-] *)?" (starting with a space) - 0+ spaces and then an optional sequence of ) or - followed with 0+ spaces
  • \d+ - 1+ digits
  • " *" (a space followed by *) - 0+ spaces
  • (?:[/)-] *)? - an optional sequence of /, ) or - followed with 0+ spaces
  • \d+ - 1+ digits
  • " *(?:[/)-] *)?" (starting with a space) - 0+ spaces and then an optional sequence of /, ) or - followed with 0+ spaces
  • \d+ - 1+ digits
  • (?: *- *\d+)? - an optional sequence: 0+ spaces, -, 0+ spaces, 1+ digits.
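
As a rough usage sketch, the pattern can be passed to re.findall. Since it contains only non-capturing groups, each result is the whole matched string. The sample text and both numbers below are invented for illustration and are not taken from the question's list of examples.

```python
import re

# The pattern from the answer above, as a raw string.
rx = (
    r"(?:\B\+ ?49|\b0)(?: *[(-]? *\d(?:[ \d]*\d)?)?"
    r" *(?:[)-] *)?\d+ *(?:[/)-] *)?\d+ *(?:[/)-] *)?\d+(?: *- *\d+)?"
)

# Hypothetical input with one +49 number and one number starting with 0.
text = "Office: +49 30 1234567, mobile: 030/1234567."

found = re.findall(rx, text)
print(found)
# ['+49 30 1234567', '030/1234567']

# A digit run that starts with neither +49 nor 0 is not picked up.
print(re.findall(rx, "order 12345"))
# []
```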

