
I'm trying to develop a word-counting application that supports .pdf, .docx, .doc, .txt and other document types. I was able to read .doc files with PHP and load the plain text into a variable.

I'm using the following code to remove extra whitespace from the string:

$str = trim(preg_replace('/\s+/', ' ', $str));
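For reference, here is what that line does on a small sample. The sample string is made up for illustration and stands in for text extracted from a document: every run of whitespace (spaces, tabs, newlines) becomes a single space, and trim() removes the leading and trailing space.

```php
<?php
// Hypothetical sample standing in for text extracted from a document.
$str = "  Some dummy\ttext\n\nhere..  ";

// Collapse each run of whitespace into one space, then trim both ends.
$str = trim(preg_replace('/\s+/', ' ', $str));

echo $str, "\n"; // prints "Some dummy text here.."
```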

My issue is that Word documents with hyperlinks are parsed as: Some dummy text here.. HYPERLINK "http://domain.com/directory/page" other dummy text is here..

So I want to remove the HYPERLINK "http://domain.com/directory/page" part, or replace it with a space or something similar.

I'm not a regular expression expert, so I'd appreciate help solving this. Thanks!

1 Answer

HYPERLINK "http://domain.com/directory/page" will be matched by:

HYPERLINK "[^"]*"

That is: the literal word HYPERLINK, a space, a double quote, then any run of characters other than a double quote, then the closing double quote.
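A minimal sketch of how the pattern might be used together with the whitespace cleanup from the question. The sample string is the one quoted in the question; replacing the field with a space (rather than an empty string) is a choice made here so the surrounding words can never be glued together, and the following whitespace collapse removes the extra spaces.

```php
<?php
// Sample text as quoted in the question, with Word's hyperlink field code inline.
$str = 'Some dummy text here.. HYPERLINK "http://domain.com/directory/page" other dummy text is here..';

// Remove HYPERLINK followed by a quoted URL: a double quote,
// any run of non-quote characters, then the closing double quote.
$str = preg_replace('/HYPERLINK "[^"]*"/', ' ', $str);

// Collapse the leftover spaces into one and trim the ends.
$str = trim(preg_replace('/\s+/', ' ', $str));

echo $str, "\n"; // prints "Some dummy text here.. other dummy text is here.."
```

The order matters only loosely: running the whitespace collapse last ensures that whatever spacing surrounded the field code ends up as a single space.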


1 Comment

Hi, thanks for the answer, it worked for me. I used / HYPERLINK "[^"]*" / (with a space on each side) because the text I wanted to remove had two spaces before and after it.
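If the amount of whitespace around the field code varies, one possible variant (not taken from the answer or the comment) is to let the pattern itself swallow the surrounding whitespace with \s*, and allow one or more spaces between HYPERLINK and the opening quote with \s+. The helper name stripHyperlinkFields is hypothetical, and the sample assumes two spaces on each side of the field.

```php
<?php
// Hypothetical helper: removes Word HYPERLINK field codes together with
// any whitespace around them, leaving a single space in their place.
function stripHyperlinkFields(string $text): string
{
    // \s* on both sides consumes surrounding whitespace;
    // \s+ tolerates more than one space before the opening quote.
    $text = preg_replace('/\s*HYPERLINK\s+"[^"]*"\s*/', ' ', $text);

    // A field at the very start or end would leave a stray space; trim it.
    return trim($text);
}

$sample = 'Some dummy text here..  HYPERLINK "http://domain.com/directory/page"  other dummy text is here..';
echo stripHyperlinkFields($sample), "\n";
// prints "Some dummy text here.. other dummy text is here.."
```

This only normalizes whitespace next to the field code; other runs of whitespace in the document still need the general preg_replace('/\s+/', ' ', ...) step from the question.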
