TL;DR summary: I want a formula that will find the Nth "_" (for any N) in a string, and return its index; OR to find the Nth substring, separated by "_". I have VBA to do this, but it's slow.

Long version: I am working with advertising campaign data. My marketers (fortunately) use a consistent naming scheme for their campaigns. Unfortunately, it's very long.

The campaign names contain exactly 1 piece of data that I cannot otherwise get from reports.

For reference, campaign names are of the format:

ADV_CO_BG_Product_UniqueID_XX_mm.dd.yyyy_mm.dd.yyyy_TYP_NUM

... and I have a column of about 200K of them (growing by a couple hundred each week).

Edit:
The important part is that the campaign name has multiple parts, with _ as the delimiter between them. In this case I want the 9th part, but I want an option flexible enough that I don't have to add or remove lines to change which part I target.

Other questions suggest using a nested formula like:

=MID(
  Data_OLV[@Campaign],
  FIND("_",Data_OLV[@Campaign],
    FIND("_",Data_OLV[@Campaign],
      FIND("_",Data_OLV[@Campaign],
        FIND("_",Data_OLV[@Campaign],
          FIND("_",Data_OLV[@Campaign],
            FIND("_",Data_OLV[@Campaign],
              FIND("_",Data_OLV[@Campaign],
                FIND("_",Data_OLV[@Campaign])+1)
              +1)
            +1)
          +1)
        +1)
      +1)
    +1)
  +1,
3)

... but that is hard to modify if I need something in a different position.

I have a UDF called StringSplit (see below) that provides the desired results, but it's extremely slow (and only works if you enable macros, which not all of my audience does).

Is there a better way to do what I'm trying to do?

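    Public Function StringSplit(input_ As String, delimiter_ As String, index_ As Integer) As Variant
        ' Returns the index_-th (1-based) piece of input_ after splitting on delimiter_.
        Dim parts As Variant
        On Error GoTo ErrHandler

        parts = Split(input_, delimiter_, -1, vbTextCompare)
        StringSplit = parts(index_ - 1)
        Exit Function
    ErrHandler:
        ' Error 9 = subscript out of range: index_ is outside the split result.
        If Err.Number = 9 Then
            StringSplit = CVErr(xlErrRef)
            Exit Function
        End If
        StringSplit = Err.Description
    End Function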
  • Are there carriage returns at the end? You could try something along these lines: split(split(strInput,"[typ]")(1),chr(10))(0) Commented Mar 27, 2019 at 10:26
  • "but that seems a bit absurd". But if it works, does that matter? :) Commented Mar 27, 2019 at 10:28
  • Here you can find an answer to your question: exceljet.net/formula/find-nth-occurrence-of-character Commented Mar 27, 2019 at 10:50
  • I had a complete formula with ADV_CO_BG_Product_UniqueID_XX_mm.dd.yyyy_mm.dd.yyyy_TYP_NUM (ADV: Advertiser (abbreviated); CO: Country (2-letter code); BG: Business (2- or 3-letter code); Product: Product Line (arbitrary string); UniqueID: structured text (variable length); XX: 2-letter code; mm.dd.yyyy: Start / End of campaign; TYP: The type of advertisement; NUM: An internal identifier for a specific advertiser (not always present)), and now I see I wasted my time. So you want to get the text between the last two underscores (if NUM is present) or after the last one (if NUM is not present)? Commented Mar 27, 2019 at 12:57
  • The formula I posted can help to a certain extent: you don't need to modify the formula itself, only the Nth-instance value, and you can also change the delimiter if required. The drawback is that if the Nth instance changes, you have to count it manually and change the value. Commented Mar 27, 2019 at 13:12

5 Answers

3

If, for example, you want to locate the third instance of ? in cell A1, try:

=FIND(CHAR(1),SUBSTITUTE(A1,"?",CHAR(1),3))

NOTE:

We assume that CHAR(1) does not appear in the original string.
To get the last instance, use:

=FIND(CHAR(1),SUBSTITUTE(A1,"?",CHAR(1),(LEN(A1)-LEN(SUBSTITUTE(A1,"?","")))))
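To see what the SUBSTITUTE/FIND combination is doing, here is a minimal Python sketch of the same steps. Python is used only as a model of the formula, not as a replacement for it. The function names nth_position and last_position are hypothetical, "\x01" stands in for CHAR(1), and the last-instance count assumes a one-character delimiter, as in the formula above.

```python
def nth_position(text: str, delimiter: str, n: int) -> int:
    """Model of FIND(CHAR(1), SUBSTITUTE(text, delimiter, CHAR(1), n)).

    Returns the 1-based position of the n-th delimiter, as Excel would.
    Raises ValueError where the formula would produce an error.
    """
    marker = "\x01"  # stands in for CHAR(1); assumed absent from text
    if n < 1:
        raise ValueError("n must be at least 1")

    # SUBSTITUTE(..., n): locate the n-th non-overlapping occurrence.
    start = 0
    pos = -1
    for _ in range(n):
        pos = text.find(delimiter, start)
        if pos == -1:
            raise ValueError("fewer than n delimiters in text")
        start = pos + len(delimiter)

    # Replace only that occurrence with the marker.
    substituted = text[:pos] + marker + text[pos + len(delimiter):]

    # FIND(marker, substituted): convert Python's 0-based index to 1-based.
    return substituted.index(marker) + 1


def last_position(text: str, delimiter: str) -> int:
    """Model of the last-instance formula for a one-character delimiter."""
    # LEN(text) - LEN(SUBSTITUTE(text, delimiter, "")) counts the delimiters.
    count = len(text) - len(text.replace(delimiter, ""))
    return nth_position(text, delimiter, count)
```

For the string "a?b?c?d", the third ? sits at position 6, and since there are exactly three of them the last-instance variant gives the same position.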

3

I think this is the formula you are looking for:

=MID(A2, FIND(CHAR(1), SUBSTITUTE(A2, B2, CHAR(1), C2))+1, FIND(CHAR(1), SUBSTITUTE(A2, B2, CHAR(1), C2+1)) -  FIND(CHAR(1), SUBSTITUTE(A2, B2, CHAR(1), C2))-1)

Here A2 holds the text to search, B2 is the delimiter, and C2 is the occurrence number N. The formula returns the text between the Nth and (N+1)th delimiter, so you only need to change B2 and C2 to adapt it.
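As a rough illustration of what this MID formula computes, here is a small Python model. The names find_nth and text_after_nth are hypothetical, and the sample campaign name (dates, "OLV", "123") is made up to match the format in the question.

```python
def find_nth(text: str, delimiter: str, n: int) -> int:
    """Return the 0-based index of the n-th delimiter, or raise ValueError."""
    if n < 1:
        raise ValueError("n must be at least 1")
    index = -1
    start = 0
    for _ in range(n):
        index = text.find(delimiter, start)
        if index == -1:
            raise ValueError("fewer than n delimiters in text")
        start = index + len(delimiter)
    return index


def text_after_nth(text: str, delimiter: str, n: int) -> str:
    """Model of the formula: text between the n-th and (n+1)-th delimiter."""
    # Skip past the n-th delimiter (the formula's +1 for a one-character delimiter).
    begin = find_nth(text, delimiter, n) + len(delimiter)
    # The (n+1)-th delimiter must exist, otherwise the formula errors out.
    end = find_nth(text, delimiter, n + 1)
    return text[begin:end]


# Made-up sample in the question's format.
campaign = "ADV_CO_BG_Product_UniqueID_XX_01.02.2019_03.04.2019_OLV_123"
ninth_part = text_after_nth(campaign, "_", 8)  # text after the 8th underscore
```

Note that the 9th part follows the 8th underscore, so C2 would be 8 for the question's case. Because the formula needs an (N+1)th delimiter to measure the length, the final part of the string cannot be extracted this way; the model raises ValueError there, where Excel would show an error.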

0

If I understand correctly, the data you receive is always in the format you posted, and you consistently want to extract the TYP data.

Why not search for TYP in the string, and additionally search for NUM, since that marks the start of the next field?

Then you would end up with a formula such as

=TRIM(MID(W20,SEARCH("TYP",W20),SEARCH("NUM",W20)-SEARCH("TYP",W20)))

In this formula, cell W20 holds the entire data string. Naturally, you can change this reference or paste the whole string in its place.

EDIT

Since OP mentioned the title strings are not consistent:

=TRIM(MID(W20,SEARCH(A1,W20),IF(A2="",LEN(W20),SEARCH(A2,W20)-SEARCH(A1,W20))))

Cell A1 holds the title string of the data to be extracted, in this case TYP.

Cell A2 holds the title string of the next field. If A2 is empty, the formula returns everything from the match of A1 to the end of the string.

9 Comments

I am not sure whether they would be in the same order, so not sure searching for NUM would work, @farfromunique, could you clarify, as this will help.
They are always in the same order, but the text "TYP" isn't always there -- it can be (for instance) "OLV", "GDN", "Banner", "TVC", or others. Also, NUM is not always present.
So perhaps it would be a sound choice to place the 'titles', so to say, which need to be searched for in two separate cells and refer to those instead of the strings as I wrote it now? That way you'd be able to search dynamically with just a little user input.
@Farfromunique Ok, TYP can change, and NUM is not always present, but order is always the same. If NUM is not present, is there any other tag after TYP? or TYP would be the last one if there is no NUM?
Oh, that changes everything
0

As Egan Wolf commented, there is a solution at http://exceljet.net/formula/find-nth-occurrence-of-character:

=MID([@[Campaign]],FIND(CHAR(160),SUBSTITUTE([@[Campaign]],"_",CHAR(160),9))+1,4)

Or, more generally:

=MID(TextToSearch,FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber))+1,LengthOfDesiredSection)

LengthOfDesiredSection can, of course, be found with a subsection of the first formula, like so (line breaks added for clarity):

  =MID(TextToSearch,
   FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber))+1,
   IFERROR(
    FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber+1))-
    FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber))-1,
    LEN(TextToSearch)-
    FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber))))

The IFERROR() protects against situations where the Delimiter only appears InstanceNumber times in the TextToSearch.
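A minimal Python model of this logic, including the fallback for the last section, might look like the following. The names find_nth and section_after_nth are hypothetical, the try/except plays the role of IFERROR(), and the sample campaign names are made up to match the format in the question.

```python
def find_nth(text: str, delimiter: str, n: int) -> int:
    """Return the 0-based index of the n-th delimiter, or raise ValueError."""
    if n < 1:
        raise ValueError("n must be at least 1")
    index = -1
    start = 0
    for _ in range(n):
        index = text.find(delimiter, start)
        if index == -1:
            raise ValueError("fewer than n delimiters in text")
        start = index + len(delimiter)
    return index


def section_after_nth(text: str, delimiter: str, n: int) -> str:
    """Model of the MID/IFERROR formula for a one-character delimiter."""
    # 1-based position of the n-th delimiter, like FIND. Not protected by
    # IFERROR, so a missing n-th delimiter still raises an error.
    start = find_nth(text, delimiter, n) + 1
    try:
        # Normal case: distance to the (n+1)-th delimiter, minus one.
        length = (find_nth(text, delimiter, n + 1) + 1) - start - 1
    except ValueError:
        # IFERROR branch: no (n+1)-th delimiter, so take the rest of the text.
        length = len(text) - start
    # MID(text, start + 1, length), expressed as a 0-based slice.
    return text[start:start + length]


# Made-up samples in the question's format, with and without the NUM part.
with_num = "ADV_CO_BG_Product_UniqueID_XX_01.02.2019_03.04.2019_OLV_123"
without_num = "ADV_CO_BG_Product_UniqueID_XX_01.02.2019_03.04.2019_OLV"
```

With these samples, asking for the section after the 8th underscore gives "OLV" in both cases, and asking for the section after the 9th underscore in the first sample gives "123" through the fallback branch.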

0

One way to find the nth element of an underscore-delimited string, and return that substring, is with this formula:

=TRIM(MID(SUBSTITUTE(A1,"_",REPT(" ",999)),MAX(1,999*(n-1)),999))

where n is the instance you are looking for.

But, of course, this requires that the elements are present in the same order, and are always present (or replaced by an underscore if they are not).
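The padding trick can be a little opaque, so here is a small Python model of it. The function name nth_field_padded and the sample string are hypothetical. The idea is that every underscore becomes 999 spaces, so a 999-character window starting near 999*(n-1) contains only the nth element plus spaces, which TRIM then removes.

```python
def nth_field_padded(text: str, n: int, width: int = 999) -> str:
    """Model of TRIM(MID(SUBSTITUTE(text,"_",REPT(" ",width)),MAX(1,width*(n-1)),width))."""
    # SUBSTITUTE(text, "_", REPT(" ", width)): blow each delimiter up to `width` spaces.
    padded = text.replace("_", " " * width)
    # MAX(1, width*(n-1)): 1-based start of the window, as passed to MID.
    start = max(1, width * (n - 1))
    # MID(padded, start, width), expressed as a 0-based slice.
    chunk = padded[start - 1:start - 1 + width]
    # TRIM: drop leading/trailing spaces and collapse inner runs of spaces.
    return " ".join(piece for piece in chunk.split(" ") if piece)


# Made-up sample in the question's format.
campaign = "ADV_CO_BG_Product_UniqueID_XX_01.02.2019_03.04.2019_OLV_123"
ninth_part = nth_field_padded(campaign, 9)
```

This relies on the whole string being much shorter than the padding width. If the text in front of the wanted element is longer than 999 characters, the window lands inside an earlier element and the result is wrong, so the 999 would need to be increased for very long strings.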

If you are using a version of Excel with the FILTERXML function, you can use this formula:

=INDEX(FILTERXML("<t><s>" & SUBSTITUTE(A1,"_","</s><s>") & "</s></t>","//s"),n)

I'm not sure which one would be more efficient (faster) on a large database.
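For readers unfamiliar with the XML approach, the FILTERXML formula can be modeled roughly like this in Python, using the standard library's xml.etree.ElementTree in place of Excel's XML parser. The function name nth_field_xml and the sample string are hypothetical.

```python
import xml.etree.ElementTree as ET


def nth_field_xml(text: str, n: int, delimiter: str = "_") -> str:
    """Model of INDEX(FILTERXML("<t><s>" & SUBSTITUTE(...) & "</s></t>", "//s"), n)."""
    # Turn each delimiter into a node boundary: a_b -> <t><s>a</s><s>b</s></t>
    xml = "<t><s>" + text.replace(delimiter, "</s><s>") + "</s></t>"
    # Parse and select every <s> node (the "//s" XPath in the formula).
    nodes = ET.fromstring(xml).findall("s")
    # INDEX(..., n) is 1-based.
    if not 1 <= n <= len(nodes):
        raise IndexError("n is outside the list of elements")
    return nodes[n - 1].text or ""


# Made-up sample in the question's format.
campaign = "ADV_CO_BG_Product_UniqueID_XX_01.02.2019_03.04.2019_OLV_123"
ninth_part = nth_field_xml(campaign, 9)
```

One caveat the model makes visible: characters such as & or < in the text make the generated XML invalid, so the Python parser raises an error. I would expect FILTERXML to fail in a similar way on such input, but that is an assumption worth testing on real campaign names.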
