Extract Phone Number from Noised String

How to get only phone number not email address in a query

You could add a NOT LIKE condition with a wildcard in it:
https://www.w3schools.com/sql/sql_wildcards.asp

 SELECT a.cus AS cus_num, b.X + '' + b.Y AS phone_number FROM table WHERE b.Y NOT LIKE '%@%'

Here "table" stands for whatever FROM/JOIN clause defines the aliases a and b in your query.
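
If the filtering has to happen in application code instead, the same idea (keep only values that do not contain an "@") could be sketched in Python like this. The function name and the sample values are made up for illustration:

```python
def without_emails(values):
    """Keep only the values that contain no '@', mirroring NOT LIKE '%@%'."""
    return [v for v in values if "@" not in v]


# Hypothetical sample data: two phone numbers and one e-mail address.
contacts = ["555-123-4567", "bob@example.com", "5559876543"]
phones = without_emails(contacts)
print(phones)
```

Like the SQL version, this only rules out values with an "@"; it does not check that what remains is really a phone number.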

Formatting 10 digit phone number from database to html view

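You could use the substring function. The function name and the concatenation operator vary by database, and phone here is a placeholder for the 10-digit column:

'(' + substr(phone, 1, 3) + ')' + substr(phone, 4, 3) + '-' + substr(phone, 7, 4)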
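
If you would rather format in the view layer, here is a minimal Python sketch of the same slicing. It assumes the stored value contains exactly 10 digits once any separators are stripped; format_phone is just an illustrative name:

```python
import re


def format_phone(raw):
    """Format a 10-digit phone number as (XXX)XXX-XXXX."""
    digits = re.sub(r"\D", "", raw)  # drop anything that is not a digit
    if len(digits) != 10:
        raise ValueError("expected exactly 10 digits, got %d" % len(digits))
    # Same three slices as the substr() calls: first 3, middle 3, last 4.
    return "(" + digits[:3] + ")" + digits[3:6] + "-" + digits[6:]


print(format_phone("5551234567"))
```

Raising on anything other than 10 digits is a choice made for this sketch; you might prefer to return the input unchanged.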

Redacting phone numbers

In Python, to replace three or more digits with three dots in string s:

import re
s = re.sub(r'\d{3,}', '...', s)

"Exotic notations of e-mail addresses" is hard for me to parse; maybe you mean something like

s = re.sub(r'[\w.]+@[\w.]+', '<email redacted>', s)
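
Putting the two substitutions together, a rough sketch could look like the following. The function name redact is mine, and running the e-mail pattern first is my own choice so that digits inside an address are not dotted out separately:

```python
import re


def redact(s):
    """Redact simple e-mail addresses, then any run of three or more digits."""
    # E-mail first, so an address like bob123@example.com is replaced whole.
    s = re.sub(r"[\w.]+@[\w.]+", "<email redacted>", s)
    # Then replace every run of 3+ digits with three dots.
    s = re.sub(r"\d{3,}", "...", s)
    return s


print(redact("call 5551234 or mail bob.smith@example.com"))
```

Both patterns are deliberately loose: the digit rule also hits years and order numbers, and the e-mail rule misses addresses with characters such as "+" or "-".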

Extracting an integer from a string?

This is "plain" JavaScript, but FWIW:

justNumsAndDots = "rofl1.50lmao".replace(/[^\d.]/g,"") // -> "1.50" (string)
asIntegral = parseInt("0" + justNumsAndDots, 10) // -> 1 (number)
asNumber = parseFloat("0" + justNumsAndDots) // -> 1.5 (number)
asTwoDecimalPlaces = (2 + asNumber).toFixed(2) // -> "3.50" (string)

Notes:

  1. Doesn't take localization into account.
  2. Radix (base-10) is passed to parseInt to avoid potential octal conversion (not sure if this "issue" plagues AS).
  3. "0" is added to the start of justNumsAndDots so parseInt/parseFloat will never return NaN here. (e.g. parseFloat(".") -> NaN, parseFloat("0.") -> 0). If NaNs are desired, alter to suit.
  4. Input like "rofl1.chopter50lolz" will be stripped to "1.50", it might be over-greedy, depending.
  5. Adapt to AS as necessary.

Happy coding.
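
For comparison, here is roughly the same approach in Python. This is only a sketch: Python's float() rejects trailing junk that parseFloat would ignore, so I take the longest numeric prefix with a regex first. extract_number is an illustrative name:

```python
import re


def extract_number(text):
    """Strip everything but digits and dots, then parse the leading number."""
    just_nums_and_dots = re.sub(r"[^\d.]", "", text)
    # Prefix "0" so there is always a digit to match, as in the JS version.
    prefix = re.match(r"\d+(?:\.\d*)?", "0" + just_nums_and_dots).group(0)
    return float(prefix)


as_number = extract_number("rofl1.50lmao")
as_integral = int(as_number)  # truncates toward zero
as_two_decimal_places = format(2 + as_number, ".2f")
print(as_number, as_integral, as_two_decimal_places)
```

The same caveats apply: no localization, and input like "rofl1.chopter50lolz" still collapses to 1.5.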

Extracting number from a description field for Hive

You could try using the split function like this:

SELECT
description,
split (description, ':\\s')[1] as Revenue
FROM task

Where :\\s is the regex pattern to match a colon followed by a space.

-------- EDIT: --------

If there are multiple colons in the data, you could try the following (untested, and assuming that the last split element always contains the digits):

SELECT
description,
split (description, ':\\s')[size(split (description, ':\\s')) - 1] as Revenue
FROM task

Also, your attempt at using Revenue\\s:\\s as the pattern may not be working because of the extra whitespace match before the colon; try Revenue:\\s instead.
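
To check the split logic outside Hive, the same two ideas (take element 1, or take the last element) can be sketched in Python. The sample descriptions are invented, and Python's re.split is only a stand-in for Hive's split:

```python
import re


def second_part(description):
    """Like split(description, ':\\s')[1]: the text after the first ': '."""
    return re.split(r":\s", description)[1]


def last_part(description):
    """Like indexing with size(...) - 1: the text after the last ': '."""
    return re.split(r":\s", description)[-1]


print(second_part("Revenue: 1200"))
print(last_part("Q1 report: Revenue: 1200"))
```

Note that second_part raises an IndexError when there is no ": " in the input, which is one reason the last-element form is the safer of the two.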

---------------------------

Or alternatively if the description doesn't always have the colon you could use the method regexp_extract(string subject, string pattern, int index)

Something like:

SELECT
description,
regexp_extract(description, '.*?(\\d+)$', 1) as Revenue
FROM task

Where the regex pattern .*?(\\d+)$ matches one or more digits at the end of the description (and only if they are at the end).

With the latter option you should be able to find a suitable pattern if the description is not always consistent.
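
If you want to try out a pattern before running it in Hive, a quick Python sketch such as this one can help. It uses Python's re module rather than Hive's regex engine, and returning None when nothing matches is this sketch's choice, not a statement about what regexp_extract returns:

```python
import re


def trailing_digits(description):
    """Return the run of digits at the very end of the string, or None."""
    match = re.search(r".*?(\d+)$", description)
    return match.group(1) if match else None


print(trailing_digits("Revenue 1200"))
print(trailing_digits("abc123def456"))
print(trailing_digits("1200 revenue"))
```

Only the final run of digits is captured, so "abc123def456" gives "456", and a description that does not end in a digit gives no match at all.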
