I need to scrape an old mainframe text file containing Printer Control Language (PCL) codes for a data import. Altering the mainframe functions isn't an option. The printout contains product sales information in a hierarchical layout.

My hope is to set up a SQL Server Integration Services (SSIS) import. Ultimately this will be a data import feature in an ASP.NET MVC 3 website with a SQL Server 2005 database, so we could avoid SSIS. I currently build C# ASP.NET MVC 3 websites, so using related technologies should be manageable.

Has anyone succeeded in parsing a text report back into a usable data import using text patterns (like regular expressions) in C# or SSIS? Are there any examples out there that use the State design pattern?

Most of the answers I find cover only a small part of the problem: how to load a text file and take the nth column in C#. This is more involved: I need to identify each line type with a pattern that depends on the import state I am currently in. Off-the-shelf software would be even better.

Text file example:

this part may be a header for the page which needs to be skipped
this part may be a header for the page which needs to be skipped
this part may be a header for the page which needs to be skipped

first line containing prices
  second line containing product description for the first line
    third line containing a related product (listing all flavors)
      fourth line containing a description for the third line
    [third and fourth may repeat]
  [product set summary line]
[ repeat for next product]

this part may be a footer for the page that needs to be skipped
this part may be a footer for the page that needs to be skipped

at any point, a product may span pages,
with header and footer lines between its product data.

2 Answers

I've done a lot of parsing in C#. However, here, it's not clear to me what kind of text you need to parse (your example doesn't appear to show the actual text). Obviously, you need some way to identify the type of each line.
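To illustrate identifying the type of each line based on the current import state, here is a minimal sketch of a state machine driven by regular expressions. Since the actual report text isn't shown, every pattern below (the `REPORT`/`PAGE`/`***` header and footer markers, the six-digit SKU, the `TOTAL` summary line, and the indentation widths) is a hypothetical placeholder, as are the `Product`, `RelatedProduct`, and `ReportParser` names. It also assumes any PCL escape sequences have already been stripped from the lines.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public class RelatedProduct { public string Sku; public string Description; }

public class Product
{
    public string Sku;
    public decimal Price;
    public string Description;
    public decimal Total;
    public List<RelatedProduct> Related = new List<RelatedProduct>();
}

public static class ReportParser
{
    enum State { ExpectPrice, ExpectDescription, ExpectRelatedOrSummary, ExpectRelatedDescription }

    // All patterns are hypothetical; replace them with ones derived from the real report.
    static readonly Regex PageNoise   = new Regex(@"^(REPORT|PAGE|\*\*\*)");
    static readonly Regex PriceLine   = new Regex(@"^(?<sku>\d{6})\s+(?<price>\d+\.\d{2})$");
    static readonly Regex Indent2Text = new Regex(@"^ {2}(?<text>\S.*)$");
    static readonly Regex RelatedLine = new Regex(@"^ {4}(?<sku>\d{6})$");
    static readonly Regex Indent6Text = new Regex(@"^ {6}(?<text>\S.*)$");
    static readonly Regex SummaryLine = new Regex(@"^ {2}TOTAL\s+(?<total>\d+\.\d{2})$");

    public static List<Product> Parse(IEnumerable<string> lines)
    {
        var products = new List<Product>();
        Product current = null;
        RelatedProduct related = null;
        var state = State.ExpectPrice;
        int lineNo = 0;

        foreach (var raw in lines)
        {
            lineNo++;
            var line = raw.TrimEnd();
            // Blank lines and page headers/footers can occur in any state,
            // so they are skipped without changing the state.
            if (line.Length == 0 || PageNoise.IsMatch(line)) continue;

            Match m;
            switch (state)
            {
                case State.ExpectPrice:
                    m = Require(PriceLine, line, lineNo, state);
                    current = new Product
                    {
                        Sku = m.Groups["sku"].Value,
                        Price = decimal.Parse(m.Groups["price"].Value, CultureInfo.InvariantCulture)
                    };
                    state = State.ExpectDescription;
                    break;

                case State.ExpectDescription:
                    m = Require(Indent2Text, line, lineNo, state);
                    current.Description = m.Groups["text"].Value;
                    state = State.ExpectRelatedOrSummary;
                    break;

                case State.ExpectRelatedOrSummary:
                    // The summary line closes the product set; otherwise a related product must follow.
                    m = SummaryLine.Match(line);
                    if (m.Success)
                    {
                        current.Total = decimal.Parse(m.Groups["total"].Value, CultureInfo.InvariantCulture);
                        products.Add(current);
                        state = State.ExpectPrice;
                        break;
                    }
                    m = Require(RelatedLine, line, lineNo, state);
                    related = new RelatedProduct { Sku = m.Groups["sku"].Value };
                    current.Related.Add(related);
                    state = State.ExpectRelatedDescription;
                    break;

                case State.ExpectRelatedDescription:
                    m = Require(Indent6Text, line, lineNo, state);
                    related.Description = m.Groups["text"].Value;
                    state = State.ExpectRelatedOrSummary;
                    break;
            }
        }

        if (state != State.ExpectPrice)
            throw new FormatException("Input ended in the middle of a product (state " + state + ").");
        return products;
    }

    // Fails loudly when a line does not fit the current state, which helps when tuning patterns.
    static Match Require(Regex pattern, string line, int lineNo, State state)
    {
        var m = pattern.Match(line);
        if (!m.Success)
            throw new FormatException("Line " + lineNo + " does not fit state " + state + ": " + line);
        return m;
    }
}
```

Calling `ReportParser.Parse(File.ReadLines(path))` would then yield one `Product` per product set. Note that the description and summary lines share the same indentation in this sketch; only the current state tells them apart, which is the reason for tracking state rather than classifying each line in isolation.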

Here are a couple of articles that may help:

A Text Parsing Helper Class

A sscanf() Replacement for .NET

I worked for some years on COBOL integrations, where I had to break text strings apart based on a COBOL "book" (copybook) that held the field specifications.

You can use AGPC.FixedLayout to help with the integration, without needing substrings to extract the information for each field.

The NuGet package is at https://www.nuget.org/packages/AGPC.FixedLayout
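For anyone unfamiliar with the idea of a field layout, here is a rough sketch of the general technique in plain C#. This is not the AGPC.FixedLayout API; the `FieldSpec` and `FixedWidth` names and the column positions are made up for illustration. The point is that the column positions live in one layout table instead of being scattered across the parsing code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of one fixed-width field: zero-based start column and length.
public class FieldSpec
{
    public string Name;
    public int Start;
    public int Length;

    public FieldSpec(string name, int start, int length)
    {
        Name = name;
        Start = start;
        Length = length;
    }
}

public static class FixedWidth
{
    // Cuts a line into named, trimmed fields according to the layout table.
    public static Dictionary<string, string> Split(string line, IEnumerable<FieldSpec> layout)
    {
        var result = new Dictionary<string, string>();
        foreach (var field in layout)
        {
            // A line may be shorter than the layout (for example, if trailing
            // blanks were trimmed), so clamp instead of letting Substring throw.
            string raw = field.Start >= line.Length
                ? ""
                : line.Substring(field.Start, Math.Min(field.Length, line.Length - field.Start));
            result[field.Name] = raw.Trim();
        }
        return result;
    }
}
```

A layout such as `new[] { new FieldSpec("Sku", 0, 6), new FieldSpec("Name", 6, 10), new FieldSpec("Price", 16, 7) }` could then be passed to `FixedWidth.Split` for every line of a given record type.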
