1

This might sound like a very basic question, but it's one that's given me quite a lot of trouble in C#.

Assume I have, for example, the following Strings as my chosenTarget.title values:

2008/SD128934 - Wordz aaaaand more words (1233-26-21)
20998/AD1234 - Wordz and less words (1263-21-21)
208/ASD12345 - Wordz and more words (1833-21-21)

Now as you can see, all three Strings are different in some ways.

What I need is to extract a very specific part of these Strings, but getting the subtleties right is what confuses me, and I was wondering if some of you knew better than I.

What I know is that the Strings will always come in the following pattern:

yearNumber + "/" + aFewLetters + theDesiredNumber + " - " + descriptiveText + " (" + someDate + ")"

In the above example, what I want returned is:

128934
1234
12345

I need to extract theDesiredNumber.

Now, I'm not (that) lazy so I have made a few attempts myself:

var a = chosenTarget.title.Substring(chosenTarget.title.IndexOf("/") + 1, chosenTarget.title.Length - chosenTarget.title.IndexOf("/") - 1);

What this does is slice off yearNumber and the /, leaving me with aFewLetters in front of theDesiredNumber.

I have a hard time properly removing the rest however, and I was wondering if any of you could aid me in the matter?

6
  • What years are 208 and 20998? Really 208 and 20998? Commented Feb 18, 2016 at 10:50
  • Heh, I just made those Strings up to suit the example. I think what I copied in was 2008 and I just added and removed number for variety and to demonstrate that I don't know the length of the year beforehand. Commented Feb 18, 2016 at 10:52
  • For a case like this, a Regex solution might be best. Commented Feb 18, 2016 at 10:54
  • I would prefer regex. Try this one for example: stackoverflow.com/questions/841883/… and this one: stackoverflow.com/questions/21410065/… Commented Feb 18, 2016 at 10:54
  • Avoid Regex. WAY overkill. From what you currently have, just find the first index for which char.IsDigit returns true, and add another Substring. Commented Feb 18, 2016 at 10:57
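
The char.IsDigit suggestion in the last comment can be sketched roughly as follows. This is only an illustration: TitleParser and ExtractNumber are hypothetical names, and the code assumes the title always follows the pattern from the question, so that the first digit after the / really does belong to theDesiredNumber.

```csharp
using System;

public static class TitleParser
{
    // Hypothetical helper: returns the first run of digits found after the first '/'.
    // Returns null when there is no '/' or no digit after it.
    public static string ExtractNumber(string title)
    {
        int slash = title.IndexOf('/');
        if (slash < 0) return null;

        // Skip the letters: advance to the first digit after the slash.
        int start = slash + 1;
        while (start < title.Length && !char.IsDigit(title[start])) start++;

        // Consume the consecutive digits that follow.
        int end = start;
        while (end < title.Length && char.IsDigit(title[end])) end++;

        return end > start ? title.Substring(start, end - start) : null;
    }
}
```

Calling TitleParser.ExtractNumber("2008/SD128934 - Wordz aaaaand more words (1233-26-21)") should give "128934". If a title had no digits between the letters and the " - ", this sketch would pick up digits from later in the string, so it relies on the stated pattern holding.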

6 Answers

3

It sounds as if you only need the number that follows the first / and ends before the " - " separator. You could use a combination of string methods and LINQ (System.Linq):

int startIndex = str.IndexOf("/");
string number = null;
if (startIndex >= 0 )
{
    int endIndex = str.IndexOf(" - ", startIndex);
    if (endIndex >= 0)
    {
        startIndex++;
        string token = str.Substring(startIndex, endIndex - startIndex); // SD128934
        number = String.Concat(token.Where(char.IsDigit)); // 128934
    }
}

Another mainly LINQ approach using String.Split:

number = String.Concat(
            str.Split(new[] { " - " }, StringSplitOptions.None)[0]
              .Split('/')
              .Last()
              .Where(char.IsDigit));

1 Comment

Good answers all around, but since I don't know much about Regex (which I will be researching more now!) and Tim kept the answer within my limited understanding, the point goes to him.
1

Try this:

 int indexSlash = chosenTarget.title.IndexOf("/");
 int indexDash = chosenTarget.title.IndexOf("-");
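 // "out" is a reserved keyword in C#, so the result is named "result"; Where needs System.Linq
 string result = new string(chosenTarget.title.Substring(indexSlash, indexDash - indexSlash).Where(c => Char.IsDigit(c)).ToArray());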


1

You can use a regex:

var pattern = @"[0-9]+/[A-Za-z]+([0-9]+)"; // verbatim string; group 1 captures the desired number
var matcher = new Regex(pattern);
var result = matcher.Matches(yourEntireSetOfLinesInAString); // read each match's Groups[1].Value

Or you can loop over every line and use Match instead of Matches. In that case, build the "matcher" once outside the loop rather than in every iteration.
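
A minimal sketch of that per-line variant, with the Regex built once and reused. The LineMatcher class, the ExtractNumbers method, and the pattern itself are assumptions made for illustration, based on the title format described in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineMatcher
{
    // Built once and shared by every call, instead of once per line.
    // Assumed pattern: digits, '/', letters, then the captured number.
    private static readonly Regex Matcher = new Regex(@"[0-9]+/[A-Za-z]+([0-9]+)");

    // Hypothetical helper: returns the captured number of each line that matches.
    public static List<string> ExtractNumbers(IEnumerable<string> lines)
    {
        var numbers = new List<string>();
        foreach (var line in lines)
        {
            Match match = Matcher.Match(line);
            if (match.Success)
            {
                numbers.Add(match.Groups[1].Value);
            }
        }
        return numbers;
    }
}
```

Lines that do not fit the pattern are simply skipped here; whether that or an exception is the better behavior depends on how trustworthy the input is.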


1

Regex is your friend:

(new [] {"2008/SD128934 - Wordz aaaaand more words (1233-26-21)",
"20998/AD1234 - Wordz and less words (1263-21-21)",
"208/ASD12345 - Wordz and more words (1833-21-21)"})
.Select(x => new Regex(@"\d+/[A-Z]+(\d+)").Match(x).Groups[1].Value)


1

The pattern you identified is the key. Here is a solution:

const string pattern = @"\d+\/[a-zA-Z]+(\d+).*$";
string s1 = @"2008/SD128934 - Wordz aaaaand more words (1233-26-21)";
string s2 = @"20998/AD1234 - Wordz and less words (1263-21-21)";
string s3 = @"208/ASD12345 - Wordz and more words (1833-21-21)";
var strings = new List<string> { s1, s2, s3 };
var desiredNumbers = new List<string>();

foreach (var s in strings)
{
    var match = Regex.Match(s, pattern);
    if (match.Success)
    {
        // Collect every result; a single variable would keep only the last match
        desiredNumbers.Add(match.Groups[1].Value);
    }
}


1

I would use a RegEx for this. The string you're looking for is in Match.Groups[1]:

        string composite = "2008/SD128934 - Wordz aaaaand more words (1233-26-21)";
        Match m = Regex.Match(composite, @"^\d+\/[a-zA-Z]+(\d+)");
        if (m.Success) Console.WriteLine(m.Groups[1]);

The breakdown of the RegEx is as follows:

"^\d+\/[a-zA-Z]+(\d+)"

^           - Anchors the match at the beginning of the string
\d+         - One or more digits (the year, whose length is not known in advance)
\/          - A literal /
[a-zA-Z]+   - One or more letters
(\d+)       - One or more digits (the parentheses capture this part as a group, in this case group 1)
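
To try an anchored pattern of this kind against all three sample titles, something along these lines may help. It is a sketch under assumptions: CompositeParser and TryExtract are made-up names, and the year is matched with \d+ because the question's samples show years of varying length.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompositeParser
{
    // ^\d+ accepts a year of any length (an assumption taken from the question's samples).
    private static readonly Regex Pattern = new Regex(@"^\d+/[a-zA-Z]+(\d+)");

    // Hypothetical helper: returns false and an empty string when the input does not fit.
    public static bool TryExtract(string composite, out string number)
    {
        Match m = Pattern.Match(composite);
        number = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

For instance, CompositeParser.TryExtract("208/ASD12345 - Wordz and more words (1833-21-21)", out var n) should return true with n set to "12345". The Try-style signature is just one option; it keeps the "no match" case explicit without throwing.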
