
I'm trying to build a JavaScript function capable of parsing a sentence and returning a number.

Here is a jsFiddle I've set up for the test cases below:

  1. 'I have 1 pound' -> 1
  2. 'I have £3.50 to spend' -> 3.50
  3. 'I have 23.00 pounds' -> 23
  4. '£27.33' -> 27.33
  5. '$4345.85' -> 4345.85
  6. '3.00' -> 3
  7. '7.0' -> 7
  8. 'Should have 2.0.' -> 2
  9. 'Should have 15.20.' -> 15.20
  10. '3.15' -> 3.15
  11. 'I only have 5, not great.' -> 5
  12. ' 34.23' -> 34.23
  13. 'sdfg545.14sdfg' -> 545.14
  14. 'Yesterday I spent £235468.13. Today I want to spend less.' -> 235468.13
  15. 'Yesterday I spent 340pounds.' -> 340
  16. 'I spent £14.52 today, £17.30 tomorrow' -> 14.52
  17. 'I have 0 trees, £11.33 tomorrow' -> 0

Cases 16 and 17 indicate that it should find the first number. I understand that some of the test cases may be tough, but I welcome anything that gets me reasonable coverage.

Here is the format I'm using for my function:

function parseSentenceForNumber(sentence){

    return number; //The number from the string
}

I think I could get 60-80% of the way myself, but I expect a regular expression might be the best solution here, and I've never been great at them. Hopefully I have enough test cases, but feel free to add any I might have missed.

Your help is much appreciated.

**UPDATE**

Loads of working answers, and I need to spend some time looking at them in more detail. Mike Samuel mentioned commas and .5, which leads me to add another couple of test cases:

  18. 'I have 1,000 pound' -> 1000
  19. '.5' -> 0.5

And jsalonen mentioned adding a test case for no numbers:

  20. 'This sentence contains no numbers' -> null

Here is the updated fiddle using jsalonen's solution. Without my changes to the spec I'd be 100% there; with the changes I'm at 95%. Can anyone offer a solution to number 18, with commas?

**UPDATE**

I added a statement to jsalonen's function to strip out commas, and I'm at 100%.

Here is the final function:

function parseSentenceForNumber(sentence){
    var matches = sentence.replace(/,/g, '').match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/);
    return matches && matches[0] || null;
}

And the final Fiddle

Really appreciate the help, and I have improved my regular expression knowledge along the way. Thanks!
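For reference, here is a sketch of a plain table-driven check of the final function against all 20 cases, without QUnit or jsFiddle. The `cases` array and the `runCases` helper are hypothetical names introduced here. One thing it makes visible: the function returns the matched text (for example the string "3.50") rather than a number, so the loop converts with `Number()` before comparing; QUnit's `equal` compares with `==`, which hides that difference.

```javascript
// The final function from the update, unchanged: it returns the matched text (a string) or null.
function parseSentenceForNumber(sentence){
    var matches = sentence.replace(/,/g, '').match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/);
    return matches && matches[0] || null;
}

// The 20 test cases from the question, as [input, expected] pairs.
var cases = [
  ['I have 1 pound', 1],
  ['I have £3.50 to spend', 3.50],
  ['I have 23.00 pounds', 23],
  ['£27.33', 27.33],
  ['$4345.85', 4345.85],
  ['3.00', 3],
  ['7.0', 7],
  ['Should have 2.0.', 2],
  ['Should have 15.20.', 15.20],
  ['3.15', 3.15],
  ['I only have 5, not great.', 5],
  [' 34.23', 34.23],
  ['sdfg545.14sdfg', 545.14],
  ['Yesterday I spent £235468.13. Today I want to spend less.', 235468.13],
  ['Yesterday I spent 340pounds.', 340],
  ['I spent £14.52 today, £17.30 tomorrow', 14.52],
  ['I have 0 trees, £11.33 tomorrow', 0],
  ['I have 1,000 pound', 1000],
  ['.5', 0.5],
  ['This sentence contains no numbers', null]
];

// Hypothetical helper: run every case and collect the failures.
// The string result is converted with Number() so "3.50" counts as 3.5
// and "0" counts as 0; null is passed through unchanged.
function runCases(fn) {
  var failures = [];
  cases.forEach(function (c) {
    var raw = fn(c[0]);
    var actual = raw === null ? null : Number(raw);
    if (actual !== c[1]) {
      failures.push(c[0] + ' -> ' + actual + ' (expected ' + c[1] + ')');
    }
  });
  return failures;
}

console.log(runCases(parseSentenceForNumber).length); // 0 failures
```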

  • Give it a try and see how it feels! Commented Jul 26, 2013 at 15:58
  • Rules 2 and 3 conflict: do you want the decimals or not? Commented Jul 26, 2013 at 16:00
  • RegExr is probably a better site than jsfiddle for RegEx test cases. Commented Jul 26, 2013 at 16:08
  • @FritsvanCampen, it's outputting numbers; 23 is a 'decimal'. This is not a regex problem. Commented Jul 26, 2013 at 16:38
  • @FritsvanCampen I think if it's "xy.00", he wants xy. But if it's "xy.zw", he wants xy.zw. Or, in layman's terms, he just wants the whole number if the decimal portion is only zeroes; otherwise he wants the entire thing including the decimals. Commented Jul 26, 2013 at 16:38

6 Answers

2

Answer that matches all negative and positive numbers with any number of digits:

function parseSentenceForNumber(sentence){
    var matches = sentence.match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/);
    return matches && matches[0] || null;
}

Consider adding negative test cases too, like testing what happens when a string does not have numbers:

test("Test parseSentenceForNumber('This sentence contains no numbers')", function() {
  equal( parseSentenceForNumber('This sentence contains no numbers'), null );
});

Full fiddle: http://jsfiddle.net/cvw8g/6/


2 Comments

Should it handle ".5"?
@MikeSamuel Perhaps, but he didn't ask for it. Tests pass = DONE :)
2

The regular expression:

\d+(?:\.\d+)?

should do it.

  • \d+ matches a sequence of digits.
  • \.\d+ matches a decimal point followed by digits.
  • (?:...)? makes that (non-capturing) group optional.

This doesn't deal with the special case where the fraction is all zeroes and you don't want the fraction included in the result. That's difficult to do with a regexp (I'm not sure it can even be done, although I'm willing to be proven wrong); it should be easier to handle after matching the number with the decimal in it.

Once you've matched the number in the string, use parseFloat() to convert it to a number, and toFixed(2) to get 2 decimal places.
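A minimal sketch of that post-processing, assuming the goal is either a plain number or display text. `extractFirstNumber` and its field names are hypothetical. Note that `toFixed(2)` returns a string, and the all-zero-fraction case can be handled with a small replace on the matched text instead of inside the matching regexp.

```javascript
// Hypothetical helper: match the first number with this answer's regexp,
// then post-process the matched text rather than doing it all in the regexp.
function extractFirstNumber(sentence) {
  var match = sentence.match(/\d+(?:\.\d+)?/);
  if (!match) {
    return null; // no digits anywhere in the sentence
  }
  var text = match[0];
  var value = parseFloat(text); // "23.00" -> 23, "3.50" -> 3.5
  return {
    text: text,
    value: value,
    // toFixed(2) returns a string with exactly two decimals: "23.00", "3.50"
    fixed: value.toFixed(2),
    // Drop the fraction only when it is all zeroes: "23.00" -> "23", "3.50" unchanged
    trimmed: text.replace(/\.0+$/, '')
  };
}

console.log(extractFirstNumber('I have 23.00 pounds'));
// { text: '23.00', value: 23, fixed: '23.00', trimmed: '23' }
```

If only a number is needed, `value` alone is enough, since as a number 23 and 23.00 are the same.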

4 Comments

If you store it as a number, there's no difference between 32 and 32.00 in JavaScript
@NullUserException That's true, but it doesn't look like that's what he's doing. Notice that in case 2 he says 3.50, not 3.5. So he seems to care about trailing zeroes when the fraction is non-zero.
I don't care about trailing zeros, but I really wasn't clear. In the jsFiddle test cases I compare the results to numbers, so that answers it there. I should have been more explicit, though; thanks.
I put your expression in the fiddle and sure enough, it works. jsfiddle.net/cvw8g/8
2

The general form of a number in computer-readable form is:

/[+\-]?((?:[1-9]\d*|0)(?:\.\d*)?|\.\d+)([eE][+-]?\d+)?/

based on the grammar

number            := optional_sign (integer optional_fraction | fraction) optional_exponent;
optional_sign     := '+' | '-' | ε;
integer           := decimal_digit optional_integer;
optional_integer  := integer | ε;
optional_fraction := '.' optional_integer | ε;
fraction          := '.' integer;
optional_exponent := ('e' | 'E') optional_sign integer | ε;

so you can do:

function parseSentenceForNumber(sentence){
  var match = sentence.match(
      /[+\-]?((?:[1-9]\d*|0)(?:\.\d*)?|\.\d+)([eE][+-]?\d+)?/);
  return match ? +match[0] : null; //The number from the string
}

but this doesn't account for:

  1. Locales that use fraction separators other than '.' as in "π is 3,14159..."
  2. Commas to separate groups of digits as in 1,000,000
  3. Fractions
  4. Percentages
  5. Natural language descriptions like "a dozen" or "15 million pounds"

To handle those cases you might search for "entity extraction" since that's the overarching field that tries to find phrases that specify structured data within unstructured text.
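A sketch comparing this pattern with the shorter one from the accepted answer on a number in scientific notation. The helper `firstNumber` and the two variable names are made up here, and commas are stripped first as in the asker's final version. The grammar-based pattern keeps the exponent, while the shorter pattern stops at the mantissa.

```javascript
// Pattern from the accepted answer: optional sign, then digits with an optional fraction, or a bare fraction.
var simplePattern = /(\+|-)?((\d+(\.\d+)?)|(\.\d+))/;
// Grammar-based pattern from this answer, which also allows an exponent.
var scientificPattern = /[+\-]?((?:[1-9]\d*|0)(?:\.\d*)?|\.\d+)([eE][+-]?\d+)?/;

// Hypothetical helper: strip commas, take the first match, convert it to a number.
function firstNumber(pattern, sentence) {
  var match = sentence.replace(/,/g, '').match(pattern);
  return match ? Number(match[0]) : null;
}

var sentence = "Avogadro's number is 6.02e23.";
console.log(firstNumber(simplePattern, sentence));     // 6.02
console.log(firstNumber(scientificPattern, sentence)); // 6.02e+23
```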

4 Comments

I tried your function here (jsfiddle.net/tyULg) and it says the regular expression is invalid. I have lots that work now anyway, but your answer still deserves my upvote, as I hadn't thought of commas and would probably need a solution for that. Thanks
@Ben, Fixed it. There was an extraneous close parenthesis.
Perfect, I added a call to strip out the commas and it also passes all 20 tests here: jsfiddle.net/RPTAb. I marked jsalonen's as the answer, as he got in first. Can you see any major cases where your answers would behave differently? Thanks
@Ben, jsalonen's answer looks fine. The major difference is numbers in scientific notation, such as those generated by sprintf("%e", ...) and sprintf("%g", ...). For example, in "Avogadro's number is 6.02e23." it finds 6.02 instead of 6.02e23, which may be irrelevant to your goals.
1

One more possible regex:

/\d+\.?\d{0,2}/

This means:

  • \d+: one or more digits
  • \.?: zero or one period
  • \d{0,2}: up to 2 digits

http://jsfiddle.net/cvw8g/7/
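A quick sketch of how this pattern behaves on a few inputs, including cases 18 and 19 from the question's update and one extra input (`'12.345'`) that is not from the question. `firstMatch` is a hypothetical helper. Because of `\d{0,2}`, a third decimal digit is silently dropped, and a leading `.` is never part of the match.

```javascript
// The pattern from this answer: digits, an optional point, then at most two digits.
var twoDecimals = /\d+\.?\d{0,2}/;

// Hypothetical helper: return the first match as text, or null if there is none.
function firstMatch(sentence) {
  var m = sentence.match(twoDecimals);
  return m ? m[0] : null;
}

console.log(firstMatch('I have £3.50 to spend')); // 3.50
console.log(firstMatch('12.345'));                // 12.34 (third decimal dropped)
console.log(firstMatch('.5'));                    // 5 (leading point not matched)
console.log(firstMatch('I have 1,000 pound'));    // 1 (stops at the comma)
```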


1

No regex; it uses parseFloat as well (so it will return NaN if no number is found). It finds the first digit in the string, then attempts to parse a number from that point.

It passes all of your tests and returns a number, not a string, so you can immediately use it for comparisons or arithmetic.

function parseSentenceForNumber(str) {
    //tacked on to support the new "1,000" -> 1000 case
    //(the /g flag removes every comma, not just the first one)
    str = str.replace(/,/g, '');

    var index;
    //find the first digit
    for (index = 0; index < str.length; ++index) {
        if (str.charAt(index) >= '0' && str.charAt(index) <= '9')
            break;
    }

    //checking for negative or decimal point (for '.5')
    if (index > 0 && (
        str.charAt(index - 1) === '-' ||
        str.charAt(index - 1) === '.'
    ))
        //go back one character
        --index;

    //get the rest of the string, accepted by native parseFloat
    return parseFloat(str.substring(index));
}
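The approach leans on the fact that `parseFloat` parses the longest valid numeric prefix of its argument and ignores whatever follows. A small sketch of that behaviour on the kinds of substrings the function ends up passing to it; the inputs are derived from the test cases, and `samples`/`results` are just local names.

```javascript
// parseFloat reads the longest numeric prefix of a string and ignores the rest.
var samples = [
  '545.14sdfg',                             // what is left of 'sdfg545.14sdfg'
  '235468.13. Today I want to spend less.', // tail of test case 14
  '2.0.',                                   // tail of 'Should have 2.0.'
  '.5',                                     // case 19, after stepping back onto the '.'
  ''                                        // no digit at all: substring(str.length)
];

var results = samples.map(function (s) {
  return parseFloat(s);
});

console.log(results.join(', ')); // 545.14, 235468.13, 2, 0.5, NaN
```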


1

Passes all tests and I think it is a lot more readable:

function parseSentenceForNumber(sentence){
    return parseFloat(sentence.replace(/,(?=\d)/g,"").match(/-?\.?\d.*/g));
}

...well, almost all tests: it returns NaN instead of null when there is no number in the sentence. But I think NaN is more informative than a simple null.

Here is the jsFiddle: http://jsfiddle.net/55AXf/
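If the `null` result from test case 20 matters, here is a sketch of a wrapper that keeps the same regexps but returns `null` instead of `NaN`. `parseSentenceForNumberOrNull` is a hypothetical name, and dropping the `/g` flag so that `match()` returns a single match is a choice made here, not part of the answer. The lookahead `(?=\d)` removes only commas followed by a digit, which handles "1,000" but would also merge a decimal comma such as "3,14159" (the locale caveat raised in the grammar-based answer above).

```javascript
// Hypothetical variant of the one-liner that returns null instead of NaN.
function parseSentenceForNumberOrNull(sentence) {
  // Remove a comma only when a digit follows it: "1,000" -> "1000",
  // while "5, not great." keeps its comma.
  var cleaned = sentence.replace(/,(?=\d)/g, '');
  // Same pattern as the one-liner, but without /g so match() gives one match.
  var match = cleaned.match(/-?\.?\d.*/);
  // Every match is an optional sign and point followed by a digit,
  // so parseFloat always finds a number in it.
  return match ? parseFloat(match[0]) : null;
}

console.log(parseSentenceForNumberOrNull('I have 1,000 pound'));                // 1000
console.log(parseSentenceForNumberOrNull('This sentence contains no numbers')); // null
console.log(parseSentenceForNumberOrNull('π is 3,14159...'));                   // 314159 (decimal comma lost)
```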
