I am trying to use the following code to count occurrences of the whole word "the" in a file. It keeps returning zero. How would I make this work?
totalthe=length(regexp(strcat(lines{:}),'\bthe\b'))
Sorry, it turns out I may have led you astray in a previous answer. The word boundaries in MATLAB are \< and \> (for the start and end of a word respectively), not \b. I learnt something new today too.
Note that this is preferable to using \s (whitespace), as otherwise you might miss matches at the start and end of the line.
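To illustrate the difference, here is a minimal sketch comparing the two patterns. The sample string is made up for this example and is not from the original question.

```matlab
% Hypothetical sample text (an assumption for illustration only).
str = 'the cat saw the other bathe then the';

% \< and \> are zero-width word boundaries, so whole-word matches at the
% very start and end of the string are counted as well.
nBoundary = numel(regexp(str, '\<the\>'));

% \s needs a real whitespace character on both sides, so the first and
% last "the" are not matched.
nSpace = numel(regexp(str, '\sthe\s'));
```

With this string, nBoundary should come out as 3 and nSpace as 1, since only the middle "the" has whitespace on both sides.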
Summarizing all comments:
totalthe=length(regexpi(strvcat(lines{:}),'\<the\>'))
Use strvcat instead of strcat so that a leading The is not stuck to the word at the end of the previous line, for example with:
lines = {'In the'; 'hat'}
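As a rough sketch of why plain strcat loses matches with that example, and of one alternative that avoids concatenation altogether: passing the cell array straight to regexpi is a swapped-in technique here, not what the answer above proposes, and the variable names are only illustrative.

```matlab
% Example cell array from the answer above.
lines = {'In the'; 'hat'};

% strcat glues the lines together with no separator: 'In thehat'.
joined = strcat(lines{:});
nJoined = numel(regexpi(joined, '\<the\>'));   % "the" is no longer a whole word

% Alternative (an assumption, not the answer's method): pass the cell array
% itself, so each line is searched separately and the counts are summed.
starts = regexpi(lines, '\<the\>');            % one cell of start indices per line
nPerLine = sum(cellfun(@numel, starts));
```

Here nJoined should be 0 and nPerLine should be 1, because searching line by line never merges the last word of one line with the first word of the next.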
\s instead of '\b':
totalthe=length(regexp(strcat(lines{:}),'\sthe\s'))
In addition to the word boundaries (\< and \>), consider using regexpi instead of regexp for case-insensitive matching (you probably don't want to miss The at the beginning of sentences!)