I'm working on a PHP ticket system where I pipe in emails, grab their HTML, and insert it into a database.

I've added this line to my outgoing emails:

## If you reply, text above this line is added to the request ##

I saw this type of thing in an Upwork email, and it was easy enough to grab only the email/HTML BEFORE that unique string, using:

//now, get only the stuff before our "dividing" line starts
$html = strstr($html, '## If', true) ?: $html;

Anyway, I've noticed Gmail adds the following automatically to all email replies:

On Fri, Jun 7, 2019 at 2:40 PM Carson Wentz<[email protected]> wrote:

So after I do step one to only keep things before "## If you reply...," I now would like to search the remaining text/html to see if it has a string starting with "On" and ending with "wrote:". And if so, only grab the stuff before that (similar to step 1).

I'm having trouble finding anything clearly explaining how to search a longer string for a shorter string that BEGINS WITH something AND ENDS WITH something specific, regardless of what's in the middle. I imagine it would have to use REGEX?

However, as I write this, I just realized that it's pretty likely that at some point someone might start their reply with "On" in which case EVERYTHING would be removed. Ugh.

If anyone has any ideas on how this can be handled, please let me know. The more I think about it, the more I suspect I'll just have to let that Gmail-included line appear in all replies within the ticket system, since I don't think there's a foolproof way to match that exact string: it includes the date/time and name, which are obviously different every time.

Thanks for your time.

  • A person could also write "On Fri, Jun 7, 2019" as the intro to a sentence, which makes even a stricter regex wrong. Maybe start with On [A-Z][a-z]{2}, [A-Z][a-z]{2} \d{1,2}, \d{4} at \d?\d:\d?\d [AP]M [A-Za-z]+ [A-Za-z]+<.*?> wrote: and then change the first [A-Z][a-z]{2} to a group of day abbreviations. Do the same for the months, and change the digit patterns to valid hours and minutes. Commented Jun 7, 2019 at 19:13
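One way the stricter pattern from the comment above might be sketched in PHP. The function name `looksLikeGmailAttribution`, the placeholder address, and the exact sub-patterns are illustrative assumptions, and the sender-name part is relaxed to `.+?` rather than the two-word form in the comment. It also assumes the English-language attribution format shown in the question, which may not hold for every Gmail account.

```php
<?php
// Hypothetical helper (not from the original thread): returns true when a
// single line matches a stricter version of the "On ... wrote:" attribution.
function looksLikeGmailAttribution(string $line): bool
{
    // Alternation groups instead of the generic [A-Z][a-z]{2}.
    $day   = '(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)';
    $month = '(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)';
    // 12-hour clock: hours 1-12, minutes 00-59, then AM or PM.
    $time  = '(?:1[0-2]|0?[1-9]):[0-5]\d [AP]M';

    $pattern = '/^On ' . $day . ', ' . $month . ' \d{1,2}, \d{4} at ' . $time
             . ' .+?<[^<>]*> wrote:$/';

    return preg_match($pattern, $line) === 1;
}

// user@example.com is a placeholder address.
var_dump(looksLikeGmailAttribution(
    'On Fri, Jun 7, 2019 at 2:40 PM Carson Wentz<user@example.com> wrote:'
)); // bool(true)
var_dump(looksLikeGmailAttribution('On Fri, Jun 7, 2019 we spoke about this.')); // bool(false)
```

The stricter the pattern, the more likely a slightly different header (another locale, a 24-hour clock, a missing display name) slips through unmatched, so this trades false positives for false negatives.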

1 Answer

You can use preg_match and the following pattern:

/^(?:On .+?> wrote:)?((\R|.)+?)## If you reply, text above this line is added to the request ##/

This optionally matches a literal On at the start of the body string, then any characters up to > wrote: on that same line (without the s flag, . does not match a newline), and then captures everything up to the termination message, including newlines via \R.

Of course, you can go further and make the header pattern stricter, but it seems pretty unlikely that someone will write On [any characters...]> wrote: on exactly the first line, which is the false positive that would cause information to be lost. Going the strict route might wind up with edge cases where an unusual email address causes a false negative and the header is incorrectly considered part of the body.

The example below shows that if this header appears anywhere after the first line, it is treated as part of the body.

Use ^\s*On if there might be spaces before the On... begins.
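As a rough sketch of that variant, the pattern could be wrapped in a small function. `extractBody` is a hypothetical name, `user@example.com` is a placeholder address, and the use of `preg_quote` to build the divider part is a choice made here for illustration rather than something the pattern above requires.

```php
<?php
// Hypothetical helper: returns the text between an optional leading
// "On ... wrote:" line and the divider, or null if the divider is missing.
function extractBody(string $email): ?string
{
    $divider = '## If you reply, text above this line is added to the request ##';

    // ^\s* tolerates blank lines or spaces before the optional header.
    $pattern = '/^\s*(?:On .+?> wrote:)?((?:\R|.)+?)' . preg_quote($divider, '/') . '/';

    return preg_match($pattern, $email, $match) === 1 ? $match[1] : null;
}

$email = "  \nOn Fri, Jun 7, 2019 at 2:40 PM Carson Wentz<user@example.com> wrote:\n\n"
       . "Thanks, that fixed it.\n\n"
       . "## If you reply, text above this line is added to the request ##";

var_export(trim(extractBody($email))); // 'Thanks, that fixed it.'
```

Note that when no header is present, the leading `\s*` also swallows any whitespace at the start of the body, which is usually harmless if the result is trimmed anyway.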

<?php

$withGmailHeader = "On Fri, Jun 7, 2019 at 2:40 PM Carson Wentz<[email protected]> wrote:

Here's the text content of the email. We'd like to extract it.

On Fri, Jun 6, 2019 at 2:53 AM Bob Smith<[email protected]> wrote:
'hello'

## If you reply, text above this line is added to the request ##";
$withoutGmailHeader = "On Fri, Jun 7, 2019 at 2:40 PM Carson Wentz<[email protected]>  wrote:

Here's the text content of the email. We'd like to extract it.

On Fri, Jun 6, 2019 at 2:53 AM Bob Smith<[email protected]> wrote:
'hello'

## If you reply, text above this line is added to the request ##";

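// Optional Gmail header on the first line, then lazily capture everything up to the divider.
$pattern = "/^(?:On .+?> wrote:)?((\R|.)+?)## If you reply, text above this line is added to the request ##/";

// The first line is a well-formed header, so it is left out of the capture.
preg_match($pattern, $withGmailHeader, $match);
echo "\n=> With Gmail header:\n";
var_export($match[1]);

// The extra space breaks the header match, so the first line stays in the body.
echo "\n\n=> Without Gmail header (note the extra space after >):\n";
preg_match($pattern, $withoutGmailHeader, $match);
var_export($match[1]);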

Output:

=> With Gmail header:
'

Here\'s the text content of the email. We\'d like to extract it.

On Fri, Jun 6, 2019 at 2:53 AM Bob Smith<[email protected]> wrote:
\'hello\'

'

=> Without Gmail header (note the extra space after >):
'On Fri, Jun 7, 2019 at 2:40 PM Carson Wentz<[email protected]>  wrote:

Here\'s the text content of the email. We\'d like to extract it.

On Fri, Jun 6, 2019 at 2:53 AM Bob Smith<[email protected]> wrote:
\'hello\'

'
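If the reply text instead sits above the Gmail line, as the question describes, a two-step variant might look like the sketch below. `stripQuotedReply` is a hypothetical name and `user@example.com` is a placeholder address. The sketch assumes the plain-text part of the message, where the attribution occupies a line of its own. In the HTML part, the line is typically wrapped in tags and the angle brackets are entity-encoded, so the pattern would need adjusting.

```php
<?php
// Hypothetical helper: keeps only the new reply text that precedes
// both the divider and the first "On ... wrote:" line.
function stripQuotedReply(string $text): string
{
    // Step 1 (from the question): keep only what precedes the divider.
    $text = strstr($text, '## If', true) ?: $text;

    // Step 2: split at the first line that starts with "On " and ends
    // with "> wrote:". The "m" flag lets ^ and $ match at line boundaries.
    $parts = preg_split('/^On .+> wrote:\s*$/m', $text, 2);

    // preg_split() returns false on a regex error; fall back to the input.
    return rtrim($parts === false ? $text : $parts[0]);
}

$reply = "Thanks!\n\n"
       . "On Fri, Jun 7, 2019 at 2:40 PM Carson Wentz<user@example.com> wrote:\n"
       . "> old text\n"
       . "## If you reply, text above this line is added to the request ##\n";

var_export(stripQuotedReply($reply)); // 'Thanks!'
```

Because the pattern must cover a whole line that ends in `> wrote:`, a reply that merely begins with "On" is left alone. A reply whose own line happens to have that exact shape would still be cut there.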