In PHP, How to Extract Multiple E-Mail Addresses from a Block of Text and Put Them into an Array

In PHP, how do I extract multiple e-mail addresses from a block of text and put them into an array?

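Your code is almost perfect; you just need to replace preg_match(...) with preg_match_all(...). preg_match stops after the first match, while preg_match_all collects every match into the result array.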

http://www.php.net/manual/en/function.preg-match.php

http://www.php.net/manual/en/function.preg-match-all.php
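
To make the difference concrete, here is a minimal sketch. The sample text and the deliberately loose pattern are my own assumptions, not the asker's original code; any pattern that works with preg_match behaves the same way here.

```php
<?php
// Sample input and pattern are illustrative assumptions.
$text = 'Write to alice@example.com, bob@example.org or carol@example.net.';
$pattern = '/[a-z0-9._+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+/i';

// preg_match stops at the first hit; $first[0] holds that single address.
preg_match($pattern, $text, $first);

// preg_match_all keeps scanning; $all[0] is an array of every full match.
preg_match_all($pattern, $text, $all);

print_r($first[0]);
print_r($all[0]);
```

$all[0] is the array of addresses the question asks for; the other indexes of $all hold capture groups, if the pattern has any.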

Extract email address from string - php

Try this

<?php 
$string = 'Ruchika < ruchika@example.com >';
$pattern = '/[a-z0-9_\-\+\.]+@[a-z0-9\-]+\.([a-z]{2,4})(?:\.[a-z]{2})?/i';
preg_match_all($pattern, $string, $matches);
var_dump($matches[0]);
?>


Second method

<?php 
$text = 'Ruchika < ruchika@example.com >';
preg_match_all("/[\._a-zA-Z0-9-]+@[\._a-zA-Z0-9-]+/i", $text, $matches);
print_r($matches[0]);
?>


How to get email address from a long string

If you're not sure which part of the space-separated string is the e-mail address, you can split the string by spaces and use

filter_var($email, FILTER_VALIDATE_EMAIL)

on each substring.
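
A rough sketch of that idea follows. The sample line and the trim() step that strips surrounding angle brackets and punctuation are my own assumptions; filter_var would reject a token such as "<ruchika@example.com>" if the brackets were left on.

```php
<?php
// Sample input is an illustrative assumption.
$line = 'Ruchika <ruchika@example.com> wrote to support@example.org yesterday';

$emails = [];
// Split on any run of whitespace and skip empty pieces.
foreach (preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY) as $token) {
    // Assumption: strip wrapping punctuation before validating.
    $candidate = trim($token, '<>,;()"\'');
    if (filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false) {
        $emails[] = $candidate;
    }
}

print_r($emails);
```

This avoids maintaining your own email regex, at the cost of only finding addresses that are separated from the surrounding text by whitespace.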

How to make a PHP regular expression assertion apply to the whole pattern when finding email addresses?

You need a word boundary at the beginning of the regex to avoid matching the text partially, and you should use + instead of * for the username part of the email regex. Try this regex:

(?<!])\b([a-zA-Z0-9_\-\.]+@\S+\.\w+)(?!\[)
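
Here is a small sketch of how the lookarounds behave. The sample text is an assumption on my part (I am guessing the addresses to skip sit inside bracketed tags); the pattern is the one above, unchanged.

```php
<?php
// Sample input is an illustrative assumption.
$text = 'Write to alice@example.com or [mail]bob@example.org[/mail]';

// (?<!]) rejects a match that starts right after "]",
// \b stops the engine from starting in the middle of a word,
// (?!\[) rejects a match that is immediately followed by "[".
$pattern = '/(?<!])\b([a-zA-Z0-9_\-\.]+@\S+\.\w+)(?!\[)/';

preg_match_all($pattern, $text, $matches);

print_r($matches[1]);
```

Without the \b, the engine could skip the "b" of "bob" and match "ob@example.org" instead, which is the partial match the word boundary is there to prevent.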


How to change a pipe-forwarded email's From and To section in PHP

After a lot of failed attempts, I finally have a solution I can live with. The host wouldn't allow MailParse because it caused problems in their shared hosting environment, so I went with the Mail_mimeDecode and Mail_mime PEAR packages.

require_once 'Mail/mimeDecode.php';

// Read the whole message from STDIN
$fd = fopen("php://stdin", "r");
$input = "";
while (!feof($fd)) {
    $input .= fread($fd, 1024);
}
fclose($fd);

// Ask the decoder to include and decode both bodies and headers
$params = array();
$params['include_bodies'] = true;
$params['decode_bodies'] = true;
$params['decode_headers'] = true;
$decoder = new Mail_mimeDecode($input);
$structure = $decoder->decode($params);

// Get the From and To addresses and the subject from the headers
$From = ExtractEmailAddress($structure->headers['from'])[0];
$To = ExtractEmailAddress($structure->headers['to'])[0];
$Subject = $structure->headers['subject'];

ExtractEmailAddress uses a solution from "In PHP, how do I extract multiple e-mail addresses from a block of text and put them into an array?"
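
The body of ExtractEmailAddress is not shown above, so the following is only a sketch of what it might look like, assuming it wraps the preg_match_all approach from that question. The pattern is the first one quoted earlier on this page; the sample header value is made up.

```php
<?php
// Hypothetical reconstruction: the original helper's body is not shown.
function ExtractEmailAddress(string $text): array
{
    $pattern = '/[a-z0-9_\-\+\.]+@[a-z0-9\-]+\.([a-z]{2,4})(?:\.[a-z]{2})?/i';
    preg_match_all($pattern, $text, $matches);

    // $matches[0] holds the full matches, i.e. the addresses themselves.
    return $matches[0];
}

// A typical From header carries a display name plus the address.
$from = ExtractEmailAddress('"Jane Doe" <jane@example.com>')[0] ?? null;
echo $from, "\n";
```

The ?? null guard is worth keeping in real code, since a header without a recognizable address would otherwise raise an undefined offset warning.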

For the Body I used the following to find the text and html portions:

$HTML = "";
$TEXT = "";
// extract email body details
foreach($structure as $K => $V){
    if(is_array($V)){
        foreach($V as $KK => $VV){
            if(is_object($VV)){
                $bodyHTML = false;
                $bodyPLAIN = false;
                foreach($VV as $KKK => $VVV){
                    if(!is_array($VVV)){
                        if($KKK === 'ctype_secondary'){
                            if($VVV === 'html') { $bodyHTML = true; }
                            if($VVV === 'plain') { $bodyPLAIN = true; }
                        }
                        if($KKK === 'body'){
                            if($bodyHTML){
                                $bodyHTML = false;
                                $HTML .= quoted_printable_decode($VVV);
                            }
                            if($bodyPLAIN){
                                $bodyPLAIN = false;
                                $TEXT .= quoted_printable_decode($VVV);
                            }
                        }
                    }
                }
            }
        }
    }
}

Finally, I had the parts I needed so I used Mail_MIME to get the message out. I do my database lookup logic here and find the real destination and masked From email address using the From and To I extracted from the header.

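require_once 'Mail.php';
require_once 'Mail/mime.php';

$mime = new Mail_mime(array('eol' => "\r\n"));
$mime->setTXTBody($TEXT);
$mime->setHTMLBody($HTML);

$hdrs = array(
    'From' => $From,
    'Subject' => $Subject
);
// Build the body before the headers; Mail_mime expects get() to be called first
$body = $mime->get();
$headers = $mime->headers($hdrs);

$mail = Mail::factory('mail');
$mail->send($To, $headers, $body);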

I don't know if this will cover all cases of email bodies, but since my system is not using attachments I am ok for now.

Take note of quoted_printable_decode(); that is how I fixed the issue with the = characters in the body.
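
For anyone wondering what that = issue looks like, here is a tiny sketch with a made-up body fragment. In quoted-printable encoding a literal = is written as =3D, and a = at the end of a line is a soft line break that should disappear on decoding.

```php
<?php
// Made-up sample of a quoted-printable encoded body fragment.
$encoded = "Total =3D 5 items, see https://example.com/?id=3D42 for the full li=\r\nst.";

// Turns =3D back into "=" and removes the soft line break.
$decoded = quoted_printable_decode($encoded);

echo $decoded, "\n";
```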

The only remaining issue is a delay in mail delivery, but I'll deal with that separately.

How to extract qualifying emails from sql export text?

Based on your sample input and your criteria, this will tell regex your intentions:

/[^']+\.gov(?=')/

Match all non-apostrophe characters followed by .gov followed by (not capturing) '.

Then implode the output array with commas.
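
A minimal sketch of those two steps, using three rows taken from the sample data further down (the variable names are my own):

```php
<?php
// Rows copied from the sample export used later in this answer.
$rows = "(2845, '', 0, '', '', '', '', 'someMail@gmail.com', '', ''),\n"
      . "(2846, '', 0, '', '', '', '', 'else@gmail.gov', '', ''),\n"
      . "(2846, '', 0, '', '', '', '', 'nextElse@gmail.gov', '', '')";

// [^']+ cannot cross an apostrophe, so each match stays inside one quoted value.
preg_match_all("/[^']+\.gov(?=')/", $rows, $out);

// $out[0] holds the full matches; join them with commas.
$list = implode(',', $out[0]);

echo $list, "\n";
```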

Or fewer steps with:

/^(?:[^,]*, ){7}'\K[^']+\.gov(?=')/m

Pattern Demo: https://regex101.com/r/nmNjZb/1/

This will match more literally from the start of each line.

Code: (Demo: https://3v4l.org/por3g )

$csv="(2843, '', 0, '', '', '', '', 'mail@yahoo.gr', '', ''),
(2844, '', 0, '', '', '', '', 'mail1@washpost.com', '', ''),
(2845, '', 0, '', '', '', '', 'someMail@gmail.com', '', ''),
(2846, '', 0, '', '', '', '', 'else@gmail.gov', '', ''),
(2846, '', 0, '', '', '', '', 'nextElse@gmail.gov', '', '')";
echo implode(',',preg_match_all("/^(?:[^,]*, ){7}'\K[^']+\.gov(?=')/m",$csv,$out)?$out[0]:[]);

Output:

else@gmail.gov,nextElse@gmail.gov

PHP extracting data from text

I don't think a perfect solution is possible, but FWIW, maybe this is good enough for you.

Without a known / reliable delimiter between clients, I can't think of any good way you can get the notes without having the header stuff for the next company included, unless you can do something involving a big lookup table of all client names.

I do have (an ugly) regex that may reliably help as far as the other stuff though:

$content='[the contents of your file]';
preg_match_all('~(Ser Name|Route|Address|Frequency|Week/Day|City State Zip|Sched Time \(HH:MM\)|Ser Phone|Service|Bill to|Rate \(\$\)|Terms|notes):\s*((?:(?!Ser Name|Route|Address|Frequency|Week/Day|City State Zip|Sched Time \(HH:MM\)|Ser Phone|Service|Bill to|Rate \(\$\)|Terms|notes).)+)~is',$content,$matches);

This looks for a "header" and puts it into the first capture group, then matches everything up to the next "header" and puts that into the second capture group.
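
If the long repeated alternation is hard to maintain, one option is to build it from a list. This is only a sketch of the same technique on a shortened header list and a made-up snippet of input; the variable names and the PREG_SET_ORDER flag are my additions, not part of the regex above.

```php
<?php
// Shortened header list for illustration; the real list is in the regex above.
$headers = ['Ser Name', 'Route', 'Address', 'notes'];

// Escape each header once and reuse the alternation in both places.
$alt = implode('|', array_map(function ($h) {
    return preg_quote($h, '~');
}, $headers));
$pattern = '~(' . $alt . '):\s*((?:(?!' . $alt . ').)+)~is';

// Made-up sample input.
$content = "Ser Name: Block, Sunny Route: 1\nAddress: 3354 ASPEN RD.\nnotes: Sunny has a mean dog";

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$fields = [];
foreach ($matches as $m) {
    // $m[1] is the header, $m[2] is everything up to the next header.
    $fields[] = [$m[1], trim($m[2])];
}

print_r($fields);
```

A list of pairs is used rather than an associative array because headers such as Address repeat and would overwrite each other.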

Perhaps this is good enough for you, but TBH I can't think of anything better you can do, unless you can improve your extraction to a better format.

So your example data would output:

Array
(
[0] => Array
(
[0] => Ser Name: Block, Sunny
[1] => Route: 1

[2] => Address: 3354 ASPEN RD.
[3] => Frequency: Monthly

[4] => Address: ST PETE, GA 33333
[5] => Week/Day: First Monday

[6] => City State Zip: data
[7] => Sched Time (HH:MM): 10:00A

[8] => Ser Phone: 555-1212
[9] => Service: BASIC SERVICE

[10] => Bill to: BLOCK,SUNNY
[11] => Rate ($): 24.00

Company Name

Customer Information and Notes

Computed Monday, August 10 2015 Page 2

[12] => Address: 1123 Sligh
[13] => Terms: CASH

[14] => Address: Apt B

[15] => notes: Sunny has a mean dog
)

[1] => Array
(
[0] => Ser Name
[1] => Route
[2] => Address
[3] => Frequency
[4] => Address
[5] => Week/Day
[6] => City State Zip
[7] => Sched Time (HH:MM)
[8] => Ser Phone
[9] => Service
[10] => Bill to
[11] => Rate ($)
[12] => Address
[13] => Terms
[14] => Address
[15] => notes
)

[2] => Array
(
[0] => Block, Sunny
[1] => 1

[2] => 3354 ASPEN RD.
[3] => Monthly

[4] => ST PETE, GA 33333
[5] => First Monday

[6] => data
[7] => 10:00A

[8] => 555-1212
[9] => BASIC SERVICE

[10] => BLOCK,SUNNY
[11] => 24.00

Company Name

Customer Information and Notes

Computed Monday, August 10 2015 Page 2

[12] => 1123 Sligh
[13] => CASH

[14] => Apt B

[15] => Sunny has a mean dog
)

)

Use regex to strip out emails

If you already have a working regular expression, you can use PHP's preg_replace to replace all (non-overlapping) matches by a certain string, in our case "" (to remove them).

preg_replace($your_regex, "", $your_string)

This should strip all matches from your string.

Also, as @MonkeyZeus commented, if your regex contains the start anchor (^) or the end anchor ($), make sure to remove those before using preg_replace. Otherwise, the only match you can get will be the entire string, if it matches.
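
As a quick sketch, here is preg_replace with an unanchored email pattern borrowed from an earlier answer on this page. The sample text is made up, and the pattern is just a stand-in for whatever working regex you already have.

```php
<?php
// Stand-in pattern (no ^ or $ anchors) and made-up sample text.
$pattern = '/[a-z0-9_\-\+\.]+@[a-z0-9\-]+\.([a-z]{2,4})(?:\.[a-z]{2})?/i';
$text = 'Contact ruchika@example.com or admin@example.org today.';

// Every match is replaced by the empty string.
$stripped = preg_replace($pattern, '', $text);

echo $stripped, "\n";
```

Note that the spaces around each removed address stay behind, so you may want a second pass to collapse repeated whitespace.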


