1

I am trying to use the following code to count the occurrences of the whole word "the" in a file, but it keeps returning zero. How can I make this work?

totalthe=length(regexp(strcat(lines{:}),'\bthe\b'))

4
  • 1
    Can you give an example of the string you might be using? Commented Dec 4, 2013 at 3:50
  • I have the lines of a file read into a cell. Commented Dec 4, 2013 at 3:52
  • Check \s instead of '\b': totalthe=length(regexp(strcat(lines{:}),'\sthe\s')) Commented Dec 4, 2013 at 3:53
  • 1
    Other than using the proper MATLAB word boundary escape sequences (\< and \>), consider using regexpi instead of regexp for case insensitive matching (you probably don't want to miss The at the beginning of sentences!) Commented Dec 4, 2013 at 7:36

3 Answers

1

Sorry, I may have led you astray in a previous answer. It turns out that the word boundaries in MATLAB are \< and \> (for the start and end of a word respectively), not \b. I learnt something new today too.

Note that this is preferable to using \s (whitespace), as otherwise you might miss matches at the start and end of the line.
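As a minimal sketch of the difference (the sample string `str` is made up for illustration), the two patterns can be compared side by side. With a single output, `regexp` returns the start index of each match, so `numel` gives the count:

```matlab
% Hypothetical sample text: "the" appears at the start, in the middle, and
% at the end of the string; "there" must not be counted.
str = 'the cat saw the dog there and the';

% \< and \> anchor the match to the start and end of a word.
nBoundary = numel(regexp(str, '\<the\>'));

% \s requires a whitespace character on both sides, so the occurrences at
% the very start and very end of the string are missed.
nWhitespace = numel(regexp(str, '\sthe\s'));

fprintf('word boundaries: %d, whitespace: %d\n', nBoundary, nWhitespace);
```

Here the word-boundary pattern should find all three standalone occurrences, while the whitespace pattern only finds the one in the middle.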


0

Summarizing all comments:

totalthe=length(regexpi(strvcat(lines{:}),'\<the\>'))

Use strvcat instead of strcat so that a leading The is not joined to the word at the end of the previous line.

1 Comment

Fails for lines = {'In the'; 'hat'}
0

Here we go, based on the other answers, comments and some trial and error:

Suppose these are your lines:

lines = {'In the cell on the island'; 'there is the man.';'The end'}

Then this will count the occurrences of 'the', case-insensitively:

x = regexpi(lines,'\<the\>')
numel([x{:}])
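To double-check the per-line approach, here is a small sketch that also reports the count for each line and runs the edge case from the comment above. The variable names `perLine`, `total`, `edge` and `edgeTotal` are just for illustration:

```matlab
% Sample lines from this answer.
lines = {'In the cell on the island'; 'there is the man.'; 'The end'};

% regexpi on a cell array returns one vector of match start indices per
% line, so lines are never glued together and a word at the end of one line
% cannot merge with the start of the next.
perLine = cellfun(@numel, regexpi(lines, '\<the\>'));
total = sum(perLine);

% Edge case from the comments: 'the' ends one line and 'hat' starts the
% next; this must count as exactly one match, not as the word 'that'.
edge = {'In the'; 'hat'};
edgeTotal = sum(cellfun(@numel, regexpi(edge, '\<the\>')));

fprintf('total: %d, edge case: %d\n', total, edgeTotal);
```

Summing per-line counts with `cellfun` is equivalent to `numel([x{:}])`; it just keeps the per-line breakdown around in case it is useful.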
