I am working on some code to parse text into XML. I am currently using java and jaxb to handle the XML and the in-program representation of my data. I need to setup an easily expandable and adaptable method to parse the info from my text files into my java classes. The data will for the most part stay the same, but I need to be able to support later changes in the text input format. (I am parsing airline pilot flight schedules, and I want to support the schedules of other airlines down the road.) It seems like regular expressions are the way to go, but the little I have worked with java RE makes it seem a poor solution compared to python - named captures specifically. But, I know less about python than I do about Java!
So, I am looking for a modular system to parse text data that I can easily adapt, extend, and distribute later on. I am willing to learn more python if I need it, but my time and abilities are limited. Any suggestions? An example of the text I am parsing follows.
=================================================================================================
8122 TU REPORT AT 06.45/N EFFECTIVE JUN 08-JUN 29
1 CAPT, 1 F/O
DAY FLT. EQP DEPARTS ARRIVES BLK. BLK. DUTY CR. LAYOVER MO TU WE TR FR SA SU
TU 180 320 PHX 0745 SAN 0857* 1.12 -- -- -- -- -- --
TU 005 320 SAN 0950 PHX 1106 1.16 -- 8 -- -- -- -- --
TU 592 L 320 PHX 1215 MCI 1652 2.37 -- 15 -- -- -- -- --
Radisson A/P 5.05 8.22 5.05 MCI 12.18 -- 22 -- -- -- -- --
(816) 464-2423 -- 29 --
WE 403 B 320 MCI 0610 PHX 0657 2.47
WE 149 320 PHX 0859 CMH 1547 3.48
Holiday Inn City Center 6.35 9.37 6.35 CMH 15.13
(614) 221-3281
TH 335 B 320 CMH 0800 PHX 0913 4.13
TH 343 L 320 PHX 1029 PVR 1508 2.39
Marriott Casamagna 6.52 9.23 6.52 PVR 15.52
52-322-2260000 TRANS: Hotel Shuttle
FR 621 320 PVR 0815 PHX 0839 2.24
2.24 3.39 2.24
CREDIT HRS. 21.00 BLK. HRS. 20.56 LDGS: 8 TAFB 74.24
=================================================================================================
pyparsing, and a parser generator like ANTLR. You'll need to decide which of these to use based on how much work you want to do.