
I have a string with the following value:

TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00

I need to split this string into key/value pairs:

TOTAL DUE-STATEMENT $240.05
911 Fee $10.00
FRANCHISE TAX $.17
2VSALES TAX $.53
LOCAL-TAX $.23
SERVICE DISCOUNT -$50.00
PAYMENT - THANK YOU -$100.00
HBO+STARLET $100.00

My string value is always dynamic, and so is every description except 911 Fee. I wrote the following regex:

([911 a-zA-Z |911 a-zA-Z|a-zA-Z |a-zA-Z \\-? a-zA-Z|! ?|+? ]+)(-?\\$[0-9|,]*\\.[0-9][0-9])

I am getting the key/value pairs correctly, except when the description contains a mix of digits, letters, and special characters. My output is as follows:

TOTAL DUE-STATEMENT $240.05
911 Fee $10.00
FRANCHISE TAX $.17
SALES TAX $.53   ** Which is wrong** (expected key is 2VSALES TAX)
LOCAL-TAX $.23
SERVICE DISCOUNT -$50.00
PAYMENT - THANK YOU-  $100.00   ** Which is wrong** (the "-" ends up in the key; expected key is PAYMENT - THANK YOU)
STARLET $100.00   ** Which is wrong** (expected key is HBO+STARLET)

Could someone please tell me what I need to change in this regex?

This is a brilliant question. It has a clear objective, sample text which covers all the edge cases the requester could find, desired output, and my +1 vote. Commented Jul 2, 2013 at 5:02

6 Answers


Example: http://regexr.com?35dsq

Use this RegEx

/([-]{0,1}\$\d*\.\d\d)/g

It finds an optional minus sign, then a $, then any number of digits, then a ., then exactly two digits.

Then use this as the replacement string:

 \1\n
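In Java, the same replacement can be sketched with `String.replaceAll`. Note that Java writes the backreference in the replacement string as `$1` rather than `\1`. The class name `ReplaceDemo` and method name `splitLines` are only illustrative.

```java
class ReplaceDemo {
    // Appends a newline after every amount:
    // optional minus, "$", any digits, ".", exactly two digits.
    static String splitLines(String in) {
        return in.replaceAll("(-?\\$\\d*\\.\\d\\d)", "$1\n");
    }

    public static void main(String[] args) {
        String in = "TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00";
        System.out.print(splitLines(in));
    }
}
```

Each entry ends up on its own line, but the description and the amount are still adjacent (for example `2VSALES TAX$.53`), so a second step is needed if you want them separated.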

3 Comments

You'll miss the negative sign that appears sometimes before $.
@dda Thanks for that. Should be fixed now.
Why the down-vote? In the example, input matches expected output exactly.

Description

This regular expression solution assumes the money column sometimes has a - prefix but always contains a $ followed by zero or more digits, a dot, and exactly 2 digits. The rest of the characters are part of a name.

([^$]*?)(-?\$\d*\.\d{2})

For each match, capture group 1 holds the name and capture group 2 holds the dollar value.

Examples:

Working example: http://www.rubular.com/r/9ODCQXyFoZ

Sample Text

TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00

Java Code

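import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Module1 {
  public static void main(String[] asd) {
    String sourcestring = "TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00";
    // Group 1: lazily take non-'$' characters (the name).
    // Group 2: optional '-', '$', digits, '.', exactly two digits (the amount).
    Pattern re = Pattern.compile("([^$]*?)(-?\\$\\d*\\.\\d{2})", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);
    Matcher m = re.matcher(sourcestring);
    int mIdx = 0;
    while (m.find()) {
      // Print the whole match (group 0), then each capture group.
      for (int groupIdx = 0; groupIdx < m.groupCount() + 1; groupIdx++) {
        System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
      }
      mIdx++;
    }
  }
}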

Capture Groups

$matches Array:
(
    [0] => Array
        (
            [0] => TOTAL DUE-STATEMENT$240.05
            [1] => 911 Fee$10.00
            [2] => FRANCHISE TAX$.17
            [3] => 2VSALES TAX$.53
            [4] => LOCAL-TAX$.23
            [5] => SERVICE DISCOUNT-$50.00
            [6] => PAYMENT - THANK YOU-$100.00
            [7] => HBO+STARLET$100.00
        )

    [1] => Array
        (
            [0] => TOTAL DUE-STATEMENT
            [1] => 911 Fee
            [2] => FRANCHISE TAX
            [3] => 2VSALES TAX
            [4] => LOCAL-TAX
            [5] => SERVICE DISCOUNT
            [6] => PAYMENT - THANK YOU
            [7] => HBO+STARLET
        )

    [2] => Array
        (
            [0] => $240.05
            [1] => $10.00
            [2] => $.17
            [3] => $.53
            [4] => $.23
            [5] => -$50.00
            [6] => -$100.00
            [7] => $100.00
        )

)
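Since the question asks for key/value pairs, the two capture groups can be collected straight into a map. This is a minimal sketch, assuming every description is unique (a repeated description would overwrite the earlier entry in a `Map`); `LinkedHashMap` keeps the original order. The names `KeyValueDemo` and `parse` are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueDemo {
    // Same pattern as above: group 1 = name, group 2 = amount.
    private static final Pattern ENTRY = Pattern.compile("([^$]*?)(-?\\$\\d*\\.\\d{2})");

    static Map<String, String> parse(String in) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(in);
        while (m.find()) {
            // Assumes descriptions are unique; a duplicate key would be overwritten.
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String in = "TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00";
        for (Map.Entry<String, String> e : parse(in).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```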

Comments


Since there are always two decimal places, your regex could be simplified to

.+?[$]\d*[.]\d{2}

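You need to find matches of the regex above rather than split on it:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
String regex = ".+?[$]\\d*[.]\\d{2}";
Matcher m = Pattern.compile(regex).matcher(input);
while (m.find())
{
    // each match is one description followed by its amount
    System.out.println(m.group());
}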

4 Comments

You'll miss the negative sign that appears sometimes before $.
@dda this RegEx will match the lines the OP wants to separate. The only issue is that the OP's desired output has a space before the $, which this will not easily allow.
@DavidStarkey have you tried it? I am using .+?, and since the regex engine goes from left to right it matches perfectly. Try it first, then comment.
@Anirudh I have tried. I mentioned that your RegEx WORKS; the only issue is that the OP wants a space before each dollar amount (TOTAL DUE-STATEMENT$240.05 becomes TOTAL DUE-STATEMENT $240.05), which your RegEx will not easily accomplish.
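One way to get that space while keeping the whole-match regex is to split each match at its amount afterwards. This is a minimal sketch, assuming each match contains exactly one `$` (true for the sample text) and that a `-` directly before the `$` belongs to the amount; `WholeMatchDemo` and `format` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeMatchDemo {
    static List<String> format(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(".+?[$]\\d*[.]\\d{2}").matcher(input);
        while (m.find()) {
            String entry = m.group();
            // Assumes exactly one '$' per entry: the amount starts there...
            int split = entry.lastIndexOf('$');
            // ...or one character earlier when the '$' is preceded by a minus sign.
            if (split > 0 && entry.charAt(split - 1) == '-') {
                split--;
            }
            out.add(entry.substring(0, split) + " " + entry.substring(split));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00";
        format(input).forEach(System.out::println);
    }
}
```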

Since the price format is known, search for it; everything in between is the description:

    String in = "TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00";
    // A price: optional minus sign, "$", digits, ".", exactly two digits
    Pattern price = Pattern.compile("-?\\$\\d*\\.\\d{2}");
    Matcher matcher = price.matcher(in);
    int offset = 0;
    // find(offset) resets the matcher and searches from offset;
    // the text between the previous price and this one is the description
    while (matcher.find(offset)) {
        String description = in.substring(offset, matcher.start());
        String value = matcher.group();
        System.out.println(description + " " + value);
        offset = matcher.end();
    }

Comments

class Main {
    public static void main(String[] args) {
        String test = "TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00";
        java.util.regex.Pattern p = java.util.regex.Pattern.compile("(?<KEY>.+?(?=-?\\$[\\d,]*\\.\\d{2}))(?<VAL>-?\\$[\\d,]*\\.\\d{2})");
        java.util.regex.Matcher m = p.matcher(test);
        while(m.find()) {
            System.out.println(m.group("KEY") + " : " + m.group("VAL"));
        }
    }
}

You just need a non-greedy match for the KEY (.+?) followed by a lookahead for the VALUE, which always ends in a decimal point and two digits for the cents.
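To see why the non-greedy `.+?` matters, the sketch below counts matches for the pattern above and for the same pattern with a greedy `.+`. With the greedy version, the KEY runs all the way to the last amount, so the whole string comes back as a single match. `LazyVsGreedyDemo` and `countMatches` are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedyDemo {
    static final String VALUE = "-?\\$[\\d,]*\\.\\d{2}";
    // Identical to the answer's pattern: lazy KEY, then lookahead for VALUE.
    static final String LAZY = "(?<KEY>.+?(?=" + VALUE + "))(?<VAL>" + VALUE + ")";
    // Same pattern but with a greedy KEY, for comparison.
    static final String GREEDY = "(?<KEY>.+(?=" + VALUE + "))(?<VAL>" + VALUE + ")";

    static int countMatches(String regex, String in) {
        Matcher m = Pattern.compile(regex).matcher(in);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String in = "TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00";
        System.out.println("lazy:   " + countMatches(LAZY, in));   // 8 matches
        System.out.println("greedy: " + countMatches(GREEDY, in)); // 1 match
    }
}
```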

Comments


This should do it:

^(.+) (-?\$\d*\.\d\d)$

The second half of the regex matches the dollar amount, including the optional - sign. The first part takes everything else except the separating space.
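A minimal sketch of applying this regex in Java, assuming the input has already been split into one entry per line with a space before each amount (the format of the OP's desired output, not the original run-together string). `LineSplitDemo` and `split` are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineSplitDemo {
    // Greedy (.+) backtracks to the last space that is followed by an amount.
    private static final Pattern LINE = Pattern.compile("^(.+) (-?\\$\\d*\\.\\d\\d)$");

    // Returns {description, amount}, or null if the line does not have that shape.
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] lines = {"TOTAL DUE-STATEMENT $240.05", "SERVICE DISCOUNT -$50.00", "PAYMENT - THANK YOU -$100.00"};
        for (String line : lines) {
            String[] kv = split(line);
            System.out.println(kv[0] + " => " + kv[1]);
        }
    }
}
```

On the original single string (no spaces before the amounts) this pattern finds nothing, which is why the comment below notes it works only on separate lines.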

1 Comment

Works for me with the text provided by the OP -- separate lines.
