5

If you had to design a file processing component or system that could take in a wide variety of file formats (including proprietary formats such as Excel), parse and validate the data, and store it in a DB, how would you do it?

NOTE: 95% of the time, one line of input data will equal one record in the database, but not always.

Currently I'm using some custom software I designed to parse, validate, and store customer data in our database. The system identifies a file by its location in the file system (from an FTP drop) and then loads an XML "definition" file. (The correct XML is chosen based on where the input file was dropped off.)

The XML specifies things like file layout (delimited or fixed width) and field-specific items (length, data type (numeric, alpha, alphanumeric), and which DB column to store the field in).

   <delimiter><![CDATA[ ]]></delimiter>
   <numberOfItems>12</numberOfItems>
   <dataItems>
    <item>
     <name>Member ID</name>
     <type>any</type>
     <minLength>0</minLength>
     <maxLength>0</maxLength>
     <validate>false</validate>
     <customValidation/>
     <dbColumn>MembershipID</dbColumn>
    </item>

Because of this design, the input files must be text (fixed width or delimited) and have a one-to-one relation between input file data fields and DB columns.
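For readers following along, here is a minimal Python sketch of how a definition like the fragment above can drive parsing of a delimited line. It is not the asker's actual software: the `definition` root element, the `|` delimiter, the second `Last Name` item, and the function names `load_definition` and `parse_line` are all assumptions made up to complete the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, completed version of the definition fragment. The root
# element, the "|" delimiter, and the second item are assumptions.
DEFINITION_XML = """
<definition>
  <delimiter><![CDATA[|]]></delimiter>
  <numberOfItems>2</numberOfItems>
  <dataItems>
    <item>
      <name>Member ID</name>
      <type>any</type>
      <minLength>0</minLength>
      <maxLength>0</maxLength>
      <validate>false</validate>
      <customValidation/>
      <dbColumn>MembershipID</dbColumn>
    </item>
    <item>
      <name>Last Name</name>
      <type>alpha</type>
      <minLength>1</minLength>
      <maxLength>30</maxLength>
      <validate>true</validate>
      <customValidation/>
      <dbColumn>LastName</dbColumn>
    </item>
  </dataItems>
</definition>
"""


def load_definition(xml_text):
    """Return (delimiter, list of field descriptions) from the XML."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.find("dataItems").findall("item"):
        items.append({
            "name": item.findtext("name"),
            "type": item.findtext("type"),
            "min": int(item.findtext("minLength")),
            "max": int(item.findtext("maxLength")),
            "validate": item.findtext("validate") == "true",
            "column": item.findtext("dbColumn"),
        })
    return root.findtext("delimiter"), items


def parse_line(line, delimiter, items):
    """Split one input line and map each field to its DB column."""
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(items):
        raise ValueError(f"expected {len(items)} fields, got {len(values)}")
    row = {}
    for value, item in zip(values, items):
        if item["validate"]:
            if not item["min"] <= len(value) <= item["max"]:
                raise ValueError(f"{item['name']}: bad length {len(value)}")
            if item["type"] == "alpha" and not value.isalpha():
                raise ValueError(f"{item['name']}: expected letters only")
            if item["type"] == "numeric" and not value.isdigit():
                raise ValueError(f"{item['name']}: expected digits only")
        row[item["column"]] = value
    return row


delimiter, items = load_definition(DEFINITION_XML)
row = parse_line("A123|Smith\n", delimiter, items)
```

The one-field-to-one-column limitation is visible in `parse_line`: each value lands in exactly one column, so anything richer needs a different mapping step.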

I'd like to extend the capabilities of our file processing system to take in Excel, or other file formats.

There are at least half a dozen ways I can proceed, but I'm stuck right now because I don't have anyone to bounce the ideas off.

Again: if you had to design a file processing component that could take in a wide variety of file formats (including proprietary formats such as Excel), parse and validate the data, and store it in a DB, how would you do it?

3
  • Serverfault.com is a site where you could discuss system design. Commented Oct 12, 2009 at 14:50
  • No. Serverfault.com has server-related questions only. Commented Oct 12, 2009 at 14:51
  • @Andrejs - Serverfault is designed for system admins and IT professionals, so it depends on what his question about system design covers. Commented Oct 12, 2009 at 14:54

3 Answers

1

Well, a straightforward design is something like...

+-----------+
| reader1   |
|           |---
+-----------+   \---
                    \---   +----------------+               +-------------+
                        \--|  validation    |               |  DB         |
                       /---|                |---------------|             |
+-----------+    /-----    +----------------+               +-------------+
| reader2   |----
|           |
+-----------+

Readers take care of file validation (does the data exist?) and parsing, the validation section takes care of any business logic, and the DB... is a DB.
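As a rough illustration of the diagram, the following Python sketch wires two readers into one shared validation step and a database. Everything specific here is an assumption for the example: the `payments` table, the `member_id` and `amount` fields, the validation rules, and the in-memory SQLite database standing in for the real DB.

```python
import csv
import io
import sqlite3


def read_delimited(stream, delimiter=","):
    """Reader 1: delimited text with a header row -> dicts."""
    yield from csv.DictReader(stream, delimiter=delimiter)


def read_fixed_width(stream, layout):
    """Reader 2: fixed-width text; layout is [(name, start, end), ...]."""
    for line in stream:
        line = line.rstrip("\n")
        if line:
            yield {name: line[start:end].strip() for name, start, end in layout}


def validate(record):
    """Business rules shared by every reader (the rules are made up)."""
    errors = []
    if not record.get("member_id"):
        errors.append("member_id is required")
    if not (record.get("amount") or "").isdigit():
        errors.append("amount must be numeric")
    return errors


def store(conn, records):
    """Validate each record, insert the good ones, collect the bad ones."""
    accepted, rejected = 0, []
    for record in records:
        errors = validate(record)
        if errors:
            rejected.append((record, errors))
            continue
        conn.execute(
            "INSERT INTO payments (member_id, amount) VALUES (?, ?)",
            (record["member_id"], int(record["amount"])),
        )
        accepted += 1
    conn.commit()
    return accepted, rejected


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (member_id TEXT, amount INTEGER)")

csv_file = io.StringIO("member_id,amount\nA1,100\nA2,abc\n")
fixed_file = io.StringIO("B1   00250\nB2   00075\n")
layout = [("member_id", 0, 5), ("amount", 5, 10)]

csv_result = store(conn, read_delimited(csv_file))
fixed_result = store(conn, read_fixed_width(fixed_file, layout))
```

Neither `validate` nor `store` knows which reader produced a record, which is the point of the design: a new format only needs a new reader.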

So part of what you'd have to design is the generic reader-to-validator (GR2V) data container. That's more of a business-logic kind of container. I suspect you want the same kind of data regardless of the input format, so the GR2V container is not going to be too hard to design.
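One possible shape for such a container, sketched in Python, is below. The `Record` class, its fields, and the `household;member,member` line format are hypothetical. The example also shows a reader emitting several records from one input line, which covers the case where a line does not map to exactly one database record.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Format-neutral unit handed from any reader to the validator."""
    source: str    # file the record came from
    line_no: int   # input line number, kept for error reporting
    fields: dict = field(default_factory=dict)  # DB column -> raw value


def read_household_lines(source, lines):
    """Hypothetical reader: a line 'household;member,member,...'
    produces one Record per member."""
    for line_no, line in enumerate(lines, start=1):
        household, members = line.strip().split(";")
        for member in members.split(","):
            yield Record(
                source,
                line_no,
                {"HouseholdID": household, "MembershipID": member},
            )


records = list(read_household_lines("drop/a.txt", ["H1;M1,M2", "H2;M3"]))
```

Because the validator only ever sees `Record` objects, it does not need to know whether a record came from a whole line, part of a line, or a spreadsheet row.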

You can make this polymorphic by designing a GR2V superclass with the Validator method and the data members; each reader then subclasses GR2V and fills in the data with its own ReadParseFile method. That introduces a bit more coupling than a strictly procedural approach, though. I'd go procedural here, since the data is being processed procedurally in the conceptual design.
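A procedural version can be as small as a table of reader functions keyed by file extension, as in this Python sketch. The names `READERS`, `read_csv`, `read_tsv`, and `read_any` are made up for illustration, and choosing the reader by extension is an assumption; the asker's system picks its definition by drop location instead.

```python
import csv
import io
import os


def read_csv(stream):
    """Comma-delimited text -> list of rows."""
    return list(csv.reader(stream))


def read_tsv(stream):
    """Tab-delimited text -> list of rows."""
    return list(csv.reader(stream, delimiter="\t"))


# Plain lookup table instead of a class hierarchy. Supporting a new
# format means writing one function and adding one entry here.
READERS = {".csv": read_csv, ".tsv": read_tsv}


def read_any(filename, stream):
    """Pick a reader by file extension and return the parsed rows."""
    ext = os.path.splitext(filename)[1].lower()
    try:
        reader = READERS[ext]
    except KeyError:
        raise ValueError(f"no reader registered for {ext!r}") from None
    return reader(stream)


rows = read_any("members.CSV", io.StringIO("id,name\n1,Ann\n"))
```

An Excel reader would be registered the same way, with the function body delegating to whatever third-party spreadsheet library is in use.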

0

You may want to start a blog; then, if you are on something like LinkedIn, you can point the discussion to your blog or start a discussion on LinkedIn, as some of the discussions there go on for a while.

0

SO is good for specifics, but true discussion is not so easily done here. Comments are too small for an interchange of ideas. I would tend to go elsewhere.

Although such discussions should be technology-agnostic, I suspect you'll find that the Java and .NET camps don't mix much. I would look at The Server Side, but I do Java and hence look for Java stuff.
