How to Get Domain from Email

How to select domain name from email address

Assuming the domain has a single label before the TLD, like gmail.com or yahoo.com, use the following (note that it returns only that label, e.g. gmail):

select (SUBSTRING_INDEX(SUBSTR(email, INSTR(email, '@') + 1),'.',1))

The inner SUBSTR gets the part of the email address after the @, and the outer SUBSTRING_INDEX cuts off the result at the first period.
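To check what the single-label query returns, here is a minimal Python sketch that mirrors the two MySQL calls step by step. The function name first_domain_label is hypothetical. Python's str.find returns -1 when there is no @, so adding 1 gives index 0 and the whole string is kept, which matches MySQL, where INSTR returns 0 and SUBSTR(email, 1) is the whole string.

```python
def first_domain_label(email: str) -> str:
    """Hypothetical helper mirroring SUBSTRING_INDEX(SUBSTR(email, INSTR(email, '@') + 1), '.', 1)."""
    # SUBSTR(email, INSTR(email, '@') + 1): everything after the first '@'
    after_at = email[email.find("@") + 1:]
    # SUBSTRING_INDEX(..., '.', 1): everything before the first '.'
    return after_at.split(".", 1)[0]


print(first_domain_label("jane@gmail.com"))       # gmail
print(first_domain_label("jane@mail.yahoo.com"))  # mail (only the first label survives)
```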

Otherwise, if the domain is expected to contain multiple labels, like mail.yahoo.com, use:

select (SUBSTR(email, INSTR(email, '@') + 1, LENGTH(email) - (INSTR(email, '@') + 1) - LENGTH(SUBSTRING_INDEX(email,'.',-1)))) 

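LENGTH(email) - (INSTR(email, '@') + 1) - LENGTH(SUBSTRING_INDEX(email,'.',-1)) computes the length of the domain without the TLD (the .com, .biz, etc. part). It takes the number of characters after the @, subtracts 1 for the final period, and subtracts the length of the TLD. SUBSTRING_INDEX with a negative count works from right to left, so SUBSTRING_INDEX(email,'.',-1) returns everything after the last period.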
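The length arithmetic is easy to get off by one, so here is a Python sketch that reproduces the multi-label expression term by term. The function name domain_without_tld is hypothetical. One caveat follows from the formula itself: only the last label is treated as the TLD, so an address like jane@example.co.uk yields example.co.

```python
def domain_without_tld(email: str) -> str:
    """Hypothetical helper mirroring the multi-label MySQL expression."""
    instr = email.find("@") + 1       # INSTR(email, '@'): 1-based position of '@'
    tld = email.rsplit(".", 1)[-1]    # SUBSTRING_INDEX(email, '.', -1): text after the last '.'
    # Characters after '@', minus 1 for the final '.', minus the TLD itself.
    length = len(email) - (instr + 1) - len(tld)
    # SUBSTR(email, instr + 1, length) is 1-based; the same start is index `instr` in Python.
    return email[instr:instr + length]


print(domain_without_tld("jane@mail.yahoo.com"))  # mail.yahoo
print(domain_without_tld("jane@gmail.com"))       # gmail
```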

Regex get domain name from email

[^@] means "match one character that is not an @ sign". That is not what you are looking for. Instead, use the lookbehind (?<=@) for the @ and your (?=\.) lookahead for the \. to extract the server name in the middle:

(?<=@)[^.]+(?=\.)

The middle portion [^.]+ means "one or more non-dot characters".
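The same pattern can be tried with Python's re module, which supports fixed-width lookbehind. This is a minimal sketch (domain_label is a hypothetical name). Because [^.]+ stops at the first dot, a multi-label domain yields only its first label, and an address with no dot after the @ does not match at all.

```python
import re
from typing import Optional

# (?<=@) lookbehind: preceded by '@'; [^.]+ one or more non-dot characters;
# (?=\.) lookahead: followed by a literal '.'.
DOMAIN_LABEL = re.compile(r"(?<=@)[^.]+(?=\.)")


def domain_label(email: str) -> Optional[str]:
    """Hypothetical helper: return the label between '@' and the next '.', or None."""
    match = DOMAIN_LABEL.search(email)
    return match.group(0) if match else None


print(domain_label("kkk@gmail.com"))        # gmail
print(domain_label("jane@mail.yahoo.com"))  # mail
print(domain_label("user@localhost"))       # None (no '.' after the '@')
```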

How to extract domain from email address with Pandas

I believe you need str.split, then select the second element of each resulting list by indexing with str[1]:

import pandas as pd

df = pd.DataFrame({'email': ['kkk@gmail.com', 'aa@yahoo.com']})

df['domain'] = df['email'].str.split('@').str[1]
# faster alternative if the column has no NaN values
# df['domain'] = [x.split('@')[1] for x in df['email']]
print(df)
           email     domain
0  kkk@gmail.com  gmail.com
1   aa@yahoo.com  yahoo.com

One-liner to extract domain from email address

Not a one-liner, and it only works on Scala 2.13 or later (string interpolator patterns were added in 2.13), but this seems very clear to me:

def extractDomain(email: String): Option[String] = email match {
  case s"${_}@${domain}" => Some(domain)
  case _ => None
}

(Note: if there is more than one @ sign, this will just split on the first one.)
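For comparison, here is a rough Python analogue of the Scala function, assuming the same split-on-the-first-@ behaviour the note describes. The name extract_domain is hypothetical. str.partition splits on the first separator only and returns an empty separator when there is no @, which maps naturally onto returning None.

```python
from typing import Optional


def extract_domain(email: str) -> Optional[str]:
    """Hypothetical analogue of the Scala extractDomain: text after the first '@', else None."""
    # partition splits on the FIRST '@' only; sep is '' when no '@' is present.
    _, sep, domain = email.partition("@")
    return domain if sep else None


print(extract_domain("hey@mycorp.com"))  # mycorp.com
print(extract_domain("no-at-sign"))      # None
```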

How to get domain from email

In Ruby (irb), split on the @ and take the last element:

>> "hey@mycorp.com".split("@").last
=> "mycorp.com"
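Taking the last piece rather than the first matters when the local part itself contains an @, which RFC 5322 permits inside a quoted local part. This Python sketch contrasts the two choices. Both function names are hypothetical, and rsplit("@", 1)[-1] only approximates Ruby's split("@").last for well-formed addresses (Ruby's split drops trailing empty fields, so inputs ending in @ behave differently).

```python
def domain_after_last_at(email: str) -> str:
    """Hypothetical helper: keep the text after the LAST '@' (like split("@").last for normal input)."""
    return email.rsplit("@", 1)[-1]


def domain_after_first_at(email: str) -> str:
    """Hypothetical helper, for comparison: keep the text after the FIRST '@'."""
    return email.split("@", 1)[-1]


quoted = '"john@work"@example.com'  # quoted local part that contains '@'
print(domain_after_last_at(quoted))   # example.com
print(domain_after_first_at(quoted))  # work"@example.com
```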

Spark: Extract domain from email address in DataFrame

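You can simply use the built-in regexp_extract function to get the domain name from the email address. Note that this pattern returns only the label between the @ and the first period (e.g. koko, not koko.com).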

// assumes a SparkSession named spark (as in spark-shell)
import org.apache.spark.sql.functions.regexp_extract
import spark.implicits._ // enables toDF and the $"col" syntax

// create an example DataFrame
val df = Seq(
  (1, "ii@koko.com"),
  (2, "lol@fsa.org"),
  (3, "kokojambo@mon.eu")
).toDF("id", "email")

// original DataFrame
df.show(false)
// output
// +---+----------------+
// |id |email           |
// +---+----------------+
// |1  |ii@koko.com     |
// |2  |lol@fsa.org     |
// |3  |kokojambo@mon.eu|
// +---+----------------+

// extract the label between '@' and the first '.' (group 0 = whole match)
val withDomain = df.withColumn("domain",
  regexp_extract($"email", "(?<=@)[^.]+(?=\\.)", 0))
withDomain.show(false)

// output
// +---+----------------+------+
// |id |email           |domain|
// +---+----------------+------+
// |1  |ii@koko.com     |koko  |
// |2  |lol@fsa.org     |fsa   |
// |3  |kokojambo@mon.eu|mon   |
// +---+----------------+------+
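The last argument of regexp_extract is a group index: 0 selects the whole match, and 1 would select the first capturing group; when the regex does not match, Spark returns an empty string. Here is a rough Python re analogue, not Spark itself, to illustrate those semantics. The helper name extract and the full-domain pattern @([^@]+)$ are assumptions added for illustration, the latter for when you want koko.com rather than koko.

```python
import re

LABEL_PATTERN = r"(?<=@)[^.]+(?=\.)"  # same pattern as the Spark example
FULL_PATTERN = r"@([^@]+)$"           # hypothetical: everything after the last '@' in group 1


def extract(pattern: str, email: str, idx: int) -> str:
    """Hypothetical analogue of regexp_extract(col, pattern, idx): '' when nothing matches."""
    m = re.search(pattern, email)
    return (m.group(idx) or "") if m else ""


for email in ["ii@koko.com", "lol@fsa.org", "kokojambo@mon.eu"]:
    print(email, extract(LABEL_PATTERN, email, 0), extract(FULL_PATTERN, email, 1))
```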
