
This question is a continuation of an earlier question. The problem is that the regex "[-+]?\\d*\\.?\\d+([eE][-+]?\\d+)?" doesn't correctly find doubles.

For instance, the input sdf9.99e.23 contains no doubles, because an [eE] MUST be followed by [+-] or by a digit [0-9].

So I need some kind of "if" in the regex. In pseudo-code it would look like this: if (char[i] is 'e' or 'E') and char[i+1] is not '+', '-' or a digit, then return null.
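One way to read that pseudo-code is to keep the original regex and do the "if" in plain Java after each match. The sketch below is only an interpretation of the rule stated above: the class and method names (StrictDoubleScanner, findDoubles) are hypothetical, and returning null for the whole input as soon as a dangling e/E is seen is an assumption taken from the "return null" in the pseudo-code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: names and the "null on dangling exponent" rule are assumptions.
class StrictDoubleScanner {
    // The regex from the question, unchanged.
    private static final Pattern NUMBER =
            Pattern.compile("[-+]?\\d*\\.?\\d+([eE][-+]?\\d+)?");

    static List<String> findDoubles(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            int end = m.end();
            // If the match is immediately followed by e/E, the exponent part
            // could not be matched, so the exponent is malformed.
            if (end < input.length()) {
                char next = input.charAt(end);
                if (next == 'e' || next == 'E') {
                    return null; // mirrors "else return null" in the pseudo-code
                }
            }
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findDoubles("sdf9.99e.23")); // prints null
        System.out.println(findDoubles("x1.5e-3 y"));   // prints [1.5e-3]
    }
}
```

Doing the check outside the regex sidesteps backtracking: a lookahead such as (?![eE]) appended to the pattern would let the engine give back a digit and still report a shorter match like 9.9.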

  • 7
    So would sdf9.99f.23 contain a double (or two)? I'd argue that sdf9.99e.23 contains two double values: 9.99 and .23. The whole problem of parsing unstructured text is that it's simply far too open to interpretation. Unless you have a very specific definition you can always find a case that could be argued over. Commented Mar 1, 2012 at 7:54
  • 3
    @Helgus: but why is e so special that it "breaks" the string. Why doesn't f do the same? Or an empty space? Why do you choose to ignore any other malformed non-number, but that specific case should cause your algorithm to return an error? In 1a you find 1, in 1b you find 1, in 1c you find 1, in 1d you find 1, but in 1e you return an error. Why is that? Commented Mar 1, 2012 at 8:38
  • 1
    What about e1e2e3e4e5e? Does this have five, three, two or no doubles? Commented Mar 1, 2012 at 9:06
  • 1
    @Helgus but e could also be a plain e Commented Mar 1, 2012 at 9:07
  • 1
    This is not an exact definition. Unfortunately for you, your notion of "correctly" (judging by the comments) diverges from everyone else's. Therefore you need to provide an exact definition. "A string like 9.99e.23 is wrong" isn't a full and exact definition. Sorry, but if I (or others) don't quibble over every word, you'll get an inexact or wrong algorithm, or none at all. Commented Mar 2, 2012 at 8:55

1 Answer


Using some guesswork to extend your rule, so that "digits, dot, non-digit" is not accepted as a number, I can suggest these three regexes. Run them consecutively on the same string and combine (concatenate) the results.

"[+-]?\\d+((?![\\d.])|$)" // ±digits with no digit or dot after them (i.e. an integer)
"[+-]?\\d+\\.\\d+((?![\\deE])|$)" // ±digits, dot, digits with no digit or [eE] after them
"[+-]?\\d+\\.\\d+[eE][+-]?\\d+" // full variant: ±digits, dot, digits, "e", ±digits

I tried a few approaches to combining these into a single regex, but unfortunately none of them worked.
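A minimal sketch of running the three passes and concatenating the results might look like this. The class and method names (ThreePassFinder, findAll) are hypothetical. The patterns are the three listed above, with every \d written as \\d so that they compile as Java string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the three patterns from this answer.
class ThreePassFinder {
    private static final Pattern[] PASSES = {
        Pattern.compile("[+-]?\\d+((?![\\d.])|$)"),          // integer
        Pattern.compile("[+-]?\\d+\\.\\d+((?![\\deE])|$)"),  // digits, dot, digits
        Pattern.compile("[+-]?\\d+\\.\\d+[eE][+-]?\\d+")     // with exponent
    };

    static List<String> findAll(String input) {
        List<String> results = new ArrayList<>();
        // Each pass scans the whole input independently; results are
        // appended in pass order, not in order of position in the input.
        for (Pattern pass : PASSES) {
            Matcher m = pass.matcher(input);
            while (m.find()) {
                results.add(m.group());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(findAll("x 42 y"));   // prints [42]
        System.out.println(findAll("v=3.14;"));  // prints [14, 3.14]
    }
}
```

Because the passes are independent, the integer pattern can also report a digit run that sits inside a longer number: for "v=3.14;" the first pass yields 14 and the second yields 3.14. Filtering out such overlapping matches would be an extra step that is not described here.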
