
I'm not going to lie, I'm really bad at writing regular expressions. I'm currently trying to parse a text file that is giving me a lot of issues. The goal is to extract the data between the respective "tags/titles". The file in question is a .qbo file laid out as follows (personal information replaced with "DATA"). The parts that I care about retrieving are between the "STMTTRN" and "/STMTTRN" tags; I don't plan on putting the rest in my database, but I figured including it would help others see the file content I'm working with. I apologize for any confusion prior to this update.

OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE

<OFX>
<SIGNONMSGSRSV1><SONRS>
    <STATUS><CODE>0</CODE><SEVERITY>INFO</SEVERITY></STATUS>
    <DTSERVER>20190917133617.000[-4:EDT]</DTSERVER>
    <LANGUAGE>ENG</LANGUAGE>
    <FI>
        <ORG>DATA</ORG>
        <FID>DATA</FID>
    </FI>
    <INTU.BID>DATA</INTU.BID>
    <INTU.USERID>DATA</INTU.USERID>
</SONRS></SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
    <TRNUID>0</TRNUID>
    <STATUS><CODE>0</CODE><SEVERITY>INFO</SEVERITY></STATUS>
    <STMTRS>
        <CURDEF>USD</CURDEF>
        <BANKACCTFROM>
            <BANKID>DATA</BANKID>
            <ACCTID>DATA</ACCTID>
            <ACCTTYPE>CHECKING</ACCTTYPE>
            <NICKNAME>FREEDOM CHECKING</NICKNAME>
        </BANKACCTFROM>
        <BANKTRANLIST>
            <DTSTART>20190717</DTSTART><DTEND>20190917</DTEND>
            <STMTTRN><TRNTYPE>POS</TRNTYPE><DTPOSTED>20190717071500</DTPOSTED><TRNAMT>-5.81</TRNAMT><FITID>3893120190717WO</FITID><NAME>DATA</NAME><MEMO>POS Withdrawal</MEMO></STMTTRN>
            <STMTTRN><TRNTYPE>DIRECTDEBIT</TRNTYPE><DTPOSTED>20190717085000</DTPOSTED><TRNAMT>-728.11</TRNAMT><FITID>4649920190717WE</FITID><NAME>CHASE CREDIT CRD</NAME><MEMO>DATA</MEMO></STMTTRN>
            <STMTTRN><TRNTYPE>ATM</TRNTYPE><DTPOSTED>20190717160900</DTPOSTED><TRNAMT>-201.99</TRNAMT><FITID>6674020190717WA</FITID><NAME>DATA</NAME><MEMO>ATM Withdrawal</MEMO></STMTTRN>
        </BANKTRANLIST>
        <LEDGERBAL><BALAMT>2024.16</BALAMT><DTASOF>20190917133617.000[-4:EDT]</DTASOF></LEDGERBAL>
        <AVAILBAL><BALAMT>2020.66</BALAMT><DTASOF>20190917133617.000[-4:EDT]</DTASOF></AVAILBAL>
    </STMTRS>
</STMTTRNRS>
</BANKMSGSRSV1>
</OFX>

I want to be able to end with data that looks or acts like the following so that each row of data can easily be added to a database: Example Parse

  • It looks like an XML file; you can parse it as XML and get the required information. Regex may not be required. If it is a single string, you can use regex to get the information. Commented Nov 14, 2019 at 16:57
  • Check out the documentation for XMLReader: docs.oracle.com/javase/7/docs/api/org/xml/sax/XMLReader.html Commented Nov 14, 2019 at 17:01
  • Not only is regex not required, it is absolutely the wrong tool for the job. Use a real XML parser. Commented Nov 14, 2019 at 18:20
  • Thank you for the help! I edited the post to specify that it is a .qbo file NOT XML. I'm new here and not a great programmer Commented Nov 14, 2019 at 18:39

3 Answers

As David has already answered, it is good to parse the POS output XML using Java. If you are more interested in using a regex to get all the information, you can use this regular expression:

<[^>]+>|\\n+

You can test it on the following sites:

https://rubular.com/ https://www.regextester.com/
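
The pattern above matches the tags themselves, so it is mostly useful for stripping or splitting on them. As a swapped-in alternative (not the pattern from this answer), here is a minimal sketch that captures tag/value pairs inside each STMTTRN block with java.util.regex. The class name StmtTrnRegex and the method parse are hypothetical names chosen for illustration, and the sketch assumes every field has a matching closing tag, as in the sample file in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; assumes every field has a closing tag.
class StmtTrnRegex {

    // One whole <STMTTRN>...</STMTTRN> block; group 1 is the content between the tags.
    private static final Pattern TRANSACTION =
            Pattern.compile("<STMTTRN>(.*?)</STMTTRN>", Pattern.DOTALL);

    // One <TAG>value</TAG> pair; the backreference \1 forces the closing tag to match the opening one.
    private static final Pattern FIELD =
            Pattern.compile("<([A-Z.]+)>([^<]*)</\\1>");

    // Returns one map per transaction, with fields in the order they appear in the file.
    static List<Map<String, String>> parse(String content) {
        List<Map<String, String>> rows = new ArrayList<>();
        Matcher transaction = TRANSACTION.matcher(content);
        while (transaction.find()) {
            Map<String, String> row = new LinkedHashMap<>();
            Matcher field = FIELD.matcher(transaction.group(1));
            while (field.find()) {
                row.put(field.group(1), field.group(2).trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two transactions copied from the sample file in the question.
        String sample = "<BANKTRANLIST>\n"
                + "<STMTTRN><TRNTYPE>POS</TRNTYPE><DTPOSTED>20190717071500</DTPOSTED>"
                + "<TRNAMT>-5.81</TRNAMT><FITID>3893120190717WO</FITID>"
                + "<NAME>DATA</NAME><MEMO>POS Withdrawal</MEMO></STMTTRN>\n"
                + "<STMTTRN><TRNTYPE>ATM</TRNTYPE><DTPOSTED>20190717160900</DTPOSTED>"
                + "<TRNAMT>-201.99</TRNAMT><FITID>6674020190717WA</FITID>"
                + "<NAME>DATA</NAME><MEMO>ATM Withdrawal</MEMO></STMTTRN>\n"
                + "</BANKTRANLIST>";
        for (Map<String, String> row : parse(sample)) {
            System.out.println(row);
        }
    }
}
```

Each returned map corresponds to one database row, keyed by tag name (TRNTYPE, DTPOSTED, TRNAMT and so on). If a bank's export omits closing tags, the FIELD pattern would need to be adapted.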


1 Comment

I did look into using what David mentioned as it sounded like it would work. It ended up not working correctly, I believe because it is a .qbo file that I failed to mention in the original post. I appreciate the response though as I'm a newbie!

Given this is XML, I would do one of two things:

  • either use the Java DOM objects to marshal/unmarshal to/from Java objects (nodes and elements), or
  • use JAXB to achieve something similar but with a better POJO representation.

Mkyong has tutorials for both. Try the DOM parsing or JAXB tutorial; they are simple and easy to follow.

JAXB requires more work and dependencies. So try DOM first.
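
For a .qbo file like the one in the question, the DOM route might look like the sketch below. It rests on two assumptions: the plain-text header lines before <OFX> are skipped first (they are not XML, and text before the root element is what triggers "Content is not allowed in prolog"), and every element has a closing tag, as in the sample. The class name QboDomSketch and its methods are hypothetical names for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; assumes the body from <OFX> onward is well-formed XML.
class QboDomSketch {

    // Drops everything before <OFX> (the plain-text header) and parses the rest with DOM.
    static Document parseBody(String fileContent) {
        int start = fileContent.indexOf("<OFX>");
        if (start < 0) {
            throw new IllegalArgumentException("no <OFX> element found");
        }
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(fileContent.substring(start))));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse the OFX body as XML", e);
        }
    }

    // Returns the trimmed text of the first descendant element named tag, or null if there is none.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the file in the question: header lines, then the body.
        String content = "DATA:OFXSGML\nVERSION:102\n\n"
                + "<OFX><BANKTRANLIST>"
                + "<STMTTRN><TRNTYPE>POS</TRNTYPE><TRNAMT>-5.81</TRNAMT>"
                + "<MEMO>POS Withdrawal</MEMO></STMTTRN>"
                + "<STMTTRN><TRNTYPE>ATM</TRNTYPE><TRNAMT>-201.99</TRNAMT>"
                + "<MEMO>ATM Withdrawal</MEMO></STMTTRN>"
                + "</BANKTRANLIST></OFX>\n";

        NodeList transactions = parseBody(content).getElementsByTagName("STMTTRN");
        for (int i = 0; i < transactions.getLength(); i++) {
            Element trn = (Element) transactions.item(i);
            System.out.println(childText(trn, "TRNTYPE") + " | "
                    + childText(trn, "TRNAMT") + " | " + childText(trn, "MEMO"));
        }
    }
}
```

Exports that leave elements unclosed would still fail in the XML parser, so this only helps when the body is well-formed, as it appears to be here.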

1 Comment

I'm sorry, I should've specified that the file type is .qbo; I will edit my post to reflect that. I'm new to posting on Stack Overflow. I did try using the DOM method, and I don't know if it will work for .qbo files. I received the following: "[Fatal Error] testFile.qbo:1:1: Content is not allowed in prolog".

I would propose the following approach.

Read file line by line with Files:

final List<String> lines = Files.readAllLines(Paths.get("/path/to/file"));

At this point you have every line of the file as a separate string, ready to be converted into something more useful. Before doing that, you need a class to hold the data.

Create a class for the data in each line, something like:

public class STMTTRN {
   private String TRNTYPE;
   private String DTPOSTED;
   ...
   ...
   //constructors
   //getters and setters
}

Now that you have the data in separate strings and a class to hold it, you can convert lines to objects with Jackson:

final XmlMapper xmlMapper = new XmlMapper();
// lines is a List<String>, so use get(0) rather than array indexing
final STMTTRN stmttrn = xmlMapper.readValue(lines.get(0), STMTTRN.class);

You may want to use a loop, or a stream with a mapper and a collector, to get the list of STMTTRN objects:

final List<STMTTRN> stmttrnData = lines.stream().map(this::mapLine).collect(Collectors.toList());

Where the mapper might be:

private STMTTRN mapLine(final String line) {
    final XmlMapper xmlMapper = new XmlMapper();

    try {
        return xmlMapper.readValue(line, STMTTRN.class);

    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
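
Because the file also contains header lines and other elements, mapping every line would fail on anything that is not a transaction. A minimal sketch of a filtering step is shown below, assuming each transaction sits on a single line as in the sample file; the class name StmtTrnLines and the method transactionLines are hypothetical names for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; assumes one complete <STMTTRN> element per line.
class StmtTrnLines {

    // Keeps only the lines that hold a complete <STMTTRN>...</STMTTRN> element, trimmed of indentation.
    static List<String> transactionLines(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .filter(line -> line.startsWith("<STMTTRN>") && line.endsWith("</STMTTRN>"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the result of Files.readAllLines on the .qbo file.
        List<String> lines = Arrays.asList(
                "VERSION:102",
                "<STMTTRNRS>",
                "        <BANKTRANLIST>",
                "            <STMTTRN><TRNTYPE>POS</TRNTYPE><TRNAMT>-5.81</TRNAMT></STMTTRN>",
                "        </BANKTRANLIST>",
                "</STMTTRNRS>");
        transactionLines(lines).forEach(System.out::println);
    }
}
```

The check includes the closing angle bracket, so the enclosing <STMTTRNRS> line is not mistaken for a transaction. The filtered list can then be passed to the mapping step instead of the raw lines.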

2 Comments

Will this work for a non XML file? I updated the post to specify that it is a .qbo file as I failed to mention the file type. I apologize for any confusion.
Well, not quite; however, you could pre-process the file, i.e. extract the necessary lines, and then use the approach I have described.
