I need to scrape an old mainframe text file containing Printer Control Language (PCL) for a data import. Altering the mainframe functions isn't an option. The printout contains product sales information and has a hierarchical layout.
My hope is to set up a SQL Server Integration Services (SSIS) import. Ultimately the data will be imported through an ASP.NET MVC 3 website with a SQL Server 2005 database, so we could avoid SSIS altogether. I currently build C# ASP.NET MVC 3 websites, so using related technologies should be manageable.
Has anyone succeeded in parsing a text report back into a usable data import using text patterns (such as regular expressions) in C# or SSIS? Are there any examples out there that use the state design pattern?
Most of the answers I find cover only a small part of the problem: how to load a text file and take the nth column in C#. This is more involved. I need to identify each line's type with a pattern, and which pattern applies depends on the import state I am in. Off-the-shelf software would be even better.
Text file example:
this part may be a header for the page which needs to be skipped
this part may be a header for the page which needs to be skipped
this part may be a header for the page which needs to be skipped
first line containing prices
second line containing product description for the first line
third line containing a related product (listing all flavors)
fourth line containing a description for the third line
[third and fourth may repeat]
[product set summary line]
[ repeat for next product]
this part may be a footer for the page that needs to be skipped
this part may be a footer for the page that needs to be skipped
At any point, a product's lines can span pages,
with header and footer lines appearing in between the product data.
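
To make the question concrete, here is a minimal sketch of the kind of state-driven parser I have in mind for the layout above. It uses a plain enum and a switch rather than the full state design pattern. Everything specific in it is an assumption for illustration only: the regular expressions, the "REL" and "TOTAL" markers, and the Product, RelatedProduct and ReportParser names are placeholders, not the real report layout.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical record types; the real report fields will differ.
public class RelatedProduct { public string Code; public string Description; }

public class Product
{
    public string Code;
    public decimal Price;
    public string Description;
    public decimal Total;
    public List<RelatedProduct> Related = new List<RelatedProduct>();
}

public enum ParseState { ExpectPrice, ExpectDescription, ExpectRelatedOrSummary, ExpectRelatedDescription }

public static class ReportParser
{
    // Made-up patterns: substitute whatever identifies each line type in the real file.
    static readonly Regex PageNoise = new Regex(@"^(PAGE\s+\d+|SALES REPORT|\*\*\*)");
    static readonly Regex PriceLine = new Regex(@"^(?<code>\d{5})\s+(?<price>\d+\.\d{2})\s*$");
    static readonly Regex RelatedLine = new Regex(@"^\s+REL\s+(?<code>\d{5})\s*$");
    static readonly Regex SummaryLine = new Regex(@"^TOTAL\s+(?<total>\d+\.\d{2})\s*$");

    public static List<Product> Parse(IEnumerable<string> lines)
    {
        var products = new List<Product>();
        var state = ParseState.ExpectPrice;
        Product current = null;
        RelatedProduct related = null;
        int lineNo = 0;

        foreach (var raw in lines)
        {
            lineNo++;
            // Header, footer and blank lines are dropped without touching the state,
            // so a product that spans a page break resumes where it left off.
            if (string.IsNullOrWhiteSpace(raw) || PageNoise.IsMatch(raw)) continue;

            Match m;
            switch (state)
            {
                case ParseState.ExpectPrice:
                    m = PriceLine.Match(raw);
                    if (!m.Success) throw Fail(lineNo, state, raw);
                    current = new Product { Code = m.Groups["code"].Value, Price = ToDecimal(m.Groups["price"].Value) };
                    state = ParseState.ExpectDescription;
                    break;

                case ParseState.ExpectDescription:
                    // Free text: only the state tells us this is a description.
                    current.Description = raw.Trim();
                    state = ParseState.ExpectRelatedOrSummary;
                    break;

                case ParseState.ExpectRelatedOrSummary:
                    if ((m = RelatedLine.Match(raw)).Success)
                    {
                        related = new RelatedProduct { Code = m.Groups["code"].Value };
                        state = ParseState.ExpectRelatedDescription;
                    }
                    else if ((m = SummaryLine.Match(raw)).Success)
                    {
                        current.Total = ToDecimal(m.Groups["total"].Value);
                        products.Add(current);
                        state = ParseState.ExpectPrice;
                    }
                    else throw Fail(lineNo, state, raw);
                    break;

                case ParseState.ExpectRelatedDescription:
                    related.Description = raw.Trim();
                    current.Related.Add(related);
                    state = ParseState.ExpectRelatedOrSummary;
                    break;
            }
        }
        return products;
    }

    static decimal ToDecimal(string s) { return decimal.Parse(s, CultureInfo.InvariantCulture); }

    static FormatException Fail(int lineNo, ParseState state, string line)
    {
        return new FormatException("Line " + lineNo + " does not fit state " + state + ": " + line);
    }
}

public static class Program
{
    public static void Main()
    {
        // Invented sample that mirrors the layout above, including a page break mid-product.
        var sample = new[]
        {
            "PAGE 1", "SALES REPORT",
            "10001   12.50", "Cola 12 oz can",
            "   REL 10002", "Cherry cola 12 oz can",
            "*** CONTINUED ***", "PAGE 2",
            "   REL 10003", "Vanilla cola 12 oz can",
            "TOTAL   37.50",
        };
        foreach (var p in ReportParser.Parse(sample))
            Console.WriteLine(p.Code + " " + p.Description + ": " + p.Related.Count + " related, total " + p.Total);
    }
}
```

In a real import the lines would come from something like File.ReadLines(path) instead of the in-memory sample. One weakness of this approach is that a description line that happens to look like a header or footer would be silently skipped, so the page-noise pattern needs to be as strict as the report allows.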