I am a little rusty on my C# and I am trying to figure out a simple script to use in SSIS that will comb through a text file and extract values based on a specific set of patterns and a specific sequence of said patterns.

I need to pick out the individual rows of values from this text input and pass them to a text file as output. Contract no, bank num, etc. are the headers, and each row is a wrapped value within the file. I just need to be able to comb through the file and identify the rows for output. I was thinking that a regular expression could do the trick, but I am unsure how to put something like that together. Is it possible to have it identify each row by looking for value patterns in a particular sequence?

For example:

Pattern1 = [0-9]{9} for contract num
Pattern2 = [a-z][0-9]{6} for bank num

but only match instances where Pattern1 comes before Pattern2?

I hope that makes sense.

Any and all help is greatly appreciated.

Thanks.

Sample Text

  • Your first pattern should only match if not preceded with a letter, use Pattern1 = "(?<![a-z])[0-9]{9}" Commented Feb 12, 2018 at 20:53
  • Not completely clear on your requirements. Can you perhaps provide a sample showing what the expected output is? Commented Feb 12, 2018 at 20:53
  • Perhaps 0759651386 X08 606 0209784104 BURTON Commented Feb 12, 2018 at 21:39
  • Or 0759651386 | X08 606| 0209784104| BURTON| Commented Feb 12, 2018 at 21:39
  • Your example in the comments has had its duplicate spaces truncated, so the actual sample is unknown to us. Note, however, that you can just passively look for the most important stuff first within a list of like values. What I mean by passively is without the use of assertive constraints. Example: (?:[a-z][0-9]{6}|[0-9]{9}) contains a list of alternations where the most important is listed first. It's roughly equivalent to (?:abcd|abc|ab|a), but you get the idea. Commented Feb 12, 2018 at 21:49

1 Answer

The file you're working with appears to be fixed-width: whoever wrote the program that generates it was communicating the meaning of each field by its position. So it is best that your program consumes the information the way it was passed, interpreting the data by its position rather than by whether it happens to match a particular regular expression. That being said, regular expressions are a great way to validate the data after it is parsed.
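
For comparison, here is a minimal sketch of the sequence-matching idea from the question: a single regular expression that only matches when a contract number is followed by a bank number. The field shapes (10 digits, then a letter, two digits, whitespace, three digits) are assumptions read off the sample line in the comments, and the names SequenceMatchSketch and TryExtract are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SequenceMatchSketch
{
    // Assumed shapes, taken from the sample line in the comments:
    // a 10-digit contract number, then a bank number such as "X08  606".
    // The lookbehind keeps the 10 digits from matching in the middle of a longer token.
    private static readonly Regex RowPattern = new Regex(
        @"(?<![A-Za-z0-9])(?<contract>\d{10})\s+(?<bank>[A-Z]\d{2}\s+\d{3})(?!\d)");

    // Returns true only when the contract number appears before the bank number.
    public static bool TryExtract(string line, out string contract, out string bank)
    {
        Match m = RowPattern.Match(line);
        contract = m.Success ? m.Groups["contract"].Value : null;
        bank = m.Success ? m.Groups["bank"].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        string contract, bank;
        if (TryExtract(" 0759651386    X08  606 0209784104 BURTON", out contract, out bank))
        {
            Console.WriteLine("{0} | {1}", contract, bank);
        }
    }
}
```

This works on the sample, but it would also accept the two values at any column, which is why I would still parse by position and use patterns only for validation.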

To work with this kind of data, I would probably build a class that represents a single record, and give it methods for parsing and validating. Here is something I came up with pretty quickly:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class DetailRecord
{
    private readonly string _originalText;

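    // Maps each field name to a function that slices it out of the line (zero-based offset, width).
    private static readonly Dictionary<string, Func<string,string>> _map = new Dictionary<string, Func<string,string>>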
    {
        { "ContractNo", s => s.Substring( 1  ,10 )        },
        { "BankNum",    s => s.Substring( 15 , 8 )        },
        { "ShortName",  s => s.Substring( 35 ,10 ).Trim() }
    };

    public DetailRecord(string originalText)
    {
        _originalText = originalText;
    }

    public string this[string key]
    {
        get
        {
            return _map[key](_originalText);
        }
    }
    public string BankNum
    {
        get { return this["BankNum"]; }
    }
    public string ContractNo
    {
        get { return this["ContractNo"]; }
    }
    public string ShortName
    {
        get { return this["ShortName"]; }
    }

    public bool IsValid
    {
        get
        {
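            long dummy;

            // long rather than int: a 10-digit contract number can exceed int.MaxValue.
            if (!long.TryParse(this.ContractNo, out dummy)) return false;
            // Anchors make it explicit that the whole 8-character field must match.
            if (!Regex.IsMatch(this.BankNum, @"^[A-Z]\d\d\s\s\d\d\d$")) return false;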

            return true;
        }
    }
}

You'll notice this class keeps a static dictionary (the _map) that maps each field name to a function for parsing that field out of the line.
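
One thing to watch with a map like this: Substring throws an ArgumentOutOfRangeException when a line is shorter than the offsets expect. The sketch below shows one possible way to guard against that, and how adding a field is just one more dictionary entry. The Slice helper, the FieldMapSketch name, and the "SecondNumber" field (guessed from columns 24 to 33 of the sample line) are hypothetical; the real layout may differ.

```csharp
using System;
using System.Collections.Generic;

public static class FieldMapSketch
{
    // Returns the requested slice, or whatever part of it exists, instead of
    // throwing ArgumentOutOfRangeException when the line is too short.
    public static string Slice(string s, int start, int length)
    {
        if (s == null || start >= s.Length) return string.Empty;
        return s.Substring(start, Math.Min(length, s.Length - start));
    }

    // Same offsets as _map, plus a hypothetical "SecondNumber" field
    // guessed from the sample line; it is only here to show the pattern.
    public static readonly Dictionary<string, Func<string, string>> Map =
        new Dictionary<string, Func<string, string>>
        {
            { "ContractNo",   s => Slice(s, 1, 10) },
            { "BankNum",      s => Slice(s, 15, 8) },
            { "SecondNumber", s => Slice(s, 24, 10) },
            { "ShortName",    s => Slice(s, 35, 10).Trim() }
        };

    public static void Main()
    {
        string line = " 0759651386    X08  606 0209784104 BURTON";
        foreach (var field in Map)
        {
            Console.WriteLine("{0}: '{1}'", field.Key, field.Value(line));
        }
    }
}
```

With this variant a truncated line produces empty or partial fields rather than an exception, so the validation step is what decides whether the record is usable.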

Also notice there is an IsValid property which uses a regular expression to validate the bank number. The contract number appears to be straight numeric, and it validates that too.

Test program:

public class Program
{
    public static void Main()
    {
        var input = " 0759651386    X08  606 0209784104 BURTON                                             3334.24";

        var line = new DetailRecord(input);
        if (line.IsValid)
        {
            Console.WriteLine("Contract number: '{0}'", line.ContractNo);
            Console.WriteLine("Bank number: '{0}'", line.BankNum);
            Console.WriteLine("Short name: '{0}'", line.ShortName);
        }
    }
}

Output:

Contract number: '0759651386'
Bank number: 'X08  606'
Short name: 'BURTON'

See my code on DotNetFiddle
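
Since the goal in the question is to write the matching rows to an output file, here is a rough, self-contained sketch of how the same position-based checks could be applied to a whole file. It assumes the same offsets as above; the file names, the pipe-delimited output format (borrowed from the asker's comment), and the names FileFilterSketch, LooksLikeDetailLine and ExtractRows are all placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class FileFilterSketch
{
    // Mirrors the checks in IsValid, but guards against short lines first.
    // 23 = offset 15 + width 8, the end of the bank number field.
    public static bool LooksLikeDetailLine(string line)
    {
        if (line == null || line.Length < 23) return false;
        return Regex.IsMatch(line.Substring(1, 10), @"^\d{10}$")
            && Regex.IsMatch(line.Substring(15, 8), @"^[A-Z]\d\d\s\s\d\d\d$");
    }

    // Turns each detail line into "contract|bank|name"; all other lines are skipped.
    public static IEnumerable<string> ExtractRows(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            if (!LooksLikeDetailLine(line)) continue;
            string name = line.Length > 35
                ? line.Substring(35, Math.Min(10, line.Length - 35)).Trim()
                : string.Empty;
            yield return string.Join("|", line.Substring(1, 10), line.Substring(15, 8), name);
        }
    }

    public static void Main(string[] args)
    {
        // Hypothetical file names; in SSIS these would more likely come from package variables.
        string inputPath = args.Length > 0 ? args[0] : "input.txt";
        string outputPath = args.Length > 1 ? args[1] : "output.txt";
        File.WriteAllLines(outputPath, ExtractRows(File.ReadLines(inputPath)));
    }
}
```

File.ReadLines streams the input lazily, so header and wrapped lines that fail the checks are dropped without loading the whole file into memory.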

1 Comment

John Wu, thanks so much for the input! This will help immensely! The source file is generated using some pretty antiquated stuff, based on what I've been told. These files are very confusing and hard to follow when read on their own. I think that your approach is definitely the right way to go. Thanks again!
