I'm converting a PDF to text using xpdf's pdftotext, and it works great except for one thing: it converts paragraph symbols (¶) into the number 8. I need a way to match everything with this pattern:

preg_match_all('/\b8\d{1,2}-/', $text, $matches);

but only replace the "8" from that pattern. I've tried saving the matches into an array, but then how do I re-insert them into the text where they belong?

Ideally, the paragraph symbol would just convert properly, but I've tried several different encodings with no success; I think some of the PDFs have embedded fonts.

Any ideas on how I could replace just the "8" in that pattern? I can't just replace all 8's because the page or chapter of the article being referenced may be 8; but there is no danger of the paragraph being 80-something (which is why I check for a digit after the 8).

Thanks.

1 Answer

Capture the rest of the pattern in a group and put it back in place:

$str = preg_replace('/\b8(\d{1,2}-)/', 'replacement$1', $str);
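
To see the capture-group approach end to end, here is a minimal sketch. The sample string and the choice of "¶" as the replacement are assumptions for illustration; real input would be the text produced by the PDF conversion. It also shows a lookahead variant that matches only the 8, so nothing has to be put back.

```php
<?php
// Made-up sample text; in practice this would be the converted PDF text.
$text = "See 812-3 and 85-1 on page 8, chapter 18-2.";

// Capture the digits and hyphen that follow the stray 8, then restore them with $1.
$fixed = preg_replace('/\b8(\d{1,2}-)/', '¶$1', $text);

// Alternative: a lookahead asserts the digits and hyphen without consuming them,
// so only the 8 itself is replaced.
$fixedLookahead = preg_replace('/\b8(?=\d{1,2}-)/', '¶', $text);

echo $fixed, PHP_EOL;
```

In this sample, "812-3" and "85-1" become "¶12-3" and "¶5-1", while "page 8," and "18-2" are left alone: the lone 8 is not followed by digits and a hyphen, and the 8 in "18" does not sit at a word boundary.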

1 Comment

That's perfect! Thanks. I'll accept in 3 minutes when I'm allowed to.
