
I have a list of file names stored as character vectors. Here's an example:

'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
'1mil_0_1_lb100_ks_drivers_sorted.csv'
'1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
'1mil_1_1_lb100_ks_drivers_sorted.csv'

From these names I would like to extract whatever is between the two substrings '1mil_' and '_ks_drivers_sorted.csv'.

So in this case the output would be:

0,1_1_1_lb200
0_1_lb100
1_1_lb2_100_100
1_1_lb100

I'm using MATLAB, so I thought of using regexp, but I can't work out which regular expression would be correct.

Alternatively, is there a way to do this without regexp?

  • 1mil_(.*)_ks_drivers_sorted\.csv and use captured group #1 Commented Sep 8, 2017 at 13:45
  • @anubhava What do you mean by "use captured group #1"? Commented Sep 8, 2017 at 13:48
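"Use captured group #1" means that the part of the pattern inside parentheses is a capturing group, and regexp returns its contents when you ask for 'tokens' instead of 'match'. A minimal sketch of the suggested pattern, assuming the names are stored in a column cell array as in the question:

```matlab
% File names from the question, as a column cell array
x = {'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
     '1mil_0_1_lb100_ks_drivers_sorted.csv'
     '1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
     '1mil_1_1_lb100_ks_drivers_sorted.csv'};

% The parentheses define capturing group #1
pattern = '1mil_(.*)_ks_drivers_sorted\.csv';

% 'tokens' returns the captured groups instead of the whole match;
% with 'once', each element of tok is a 1x1 cell holding group #1
tok = regexp(x, pattern, 'tokens', 'once');

% Unwrap group #1 from each element into a plain cell array of char
result = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
```

Here the delimiters are part of the overall match, but only the group's contents are returned, so no lookbehind or lookahead is needed.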

2 Answers


Let the data be:

x = {'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
     '1mil_0_1_lb100_ks_drivers_sorted.csv'
     '1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
     '1mil_1_1_lb100_ks_drivers_sorted.csv'};

You can use lookbehind and lookahead to find the two limiting substrings, and match everything in between:

result = cellfun(@(c) regexp(c, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match'), x);

Or, since the regular expression produces only one match per name, the following simpler alternative can be used (thanks @excaza for noticing):

result = regexp(x, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match', 'once');

In your example, either of the above gives

result =
  4×1 cell array
    '0,1_1_1_lb200'
    '0_1_lb100'
    '1_1_lb2_100_100'
    '1_1_lb100'
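With 'once', a name that does not contain both delimiters yields an empty character array instead of an error, so such names are easy to filter out. A sketch, where 'other_file.txt' is a hypothetical non-matching entry added for illustration:

```matlab
% Hypothetical input: the second name does not follow the expected pattern
names = {'1mil_0_1_lb100_ks_drivers_sorted.csv'
         'other_file.txt'};

result = regexp(names, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match', 'once');

% With 'once', a name without a match gives an empty char array
noMatch = cellfun(@isempty, result);

% Keep only the extracted parts
result(noMatch) = [];
```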

5 Comments

Exactly what I needed. Thank you!
result = regexp(x, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)','match'); [result{:}].' (if you want to avoid cellfun)
@SardarUsama Good idea. But I'm not sure that result = [result{:}].' is much clearer than result = cellfun(@(c) ..., x)
I agree with that though :)
@SardarUsama If you use regexp with 'once' you don't need to denest the cells.

For me, the easy way is to replace the parts of the string you don't need with nothing (an empty string); what remains is what you need.

If you have a list, you can do this in a loop.

Example that replaces '1mil_' with '' and '_ks_drivers_sorted.csv' with '':

newChr = strrep(chr, '1mil_', '');
newChr = strrep(newChr, '_ks_drivers_sorted.csv', '');  % apply to newChr so the first replacement is kept
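strrep also accepts a cell array of character vectors, so the loop can be dropped. Note that strrep removes every occurrence of a substring, not only the prefix and suffix; if that matters, regexprep with ^ and $ anchors removes only the leading and trailing parts. A sketch on the names from the question:

```matlab
x = {'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
     '1mil_0_1_lb100_ks_drivers_sorted.csv'
     '1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
     '1mil_1_1_lb100_ks_drivers_sorted.csv'};

% strrep works element-wise on a cell array; chain the two calls
newX = strrep(x, '1mil_', '');
newX = strrep(newX, '_ks_drivers_sorted.csv', '');

% Anchored alternative: ^ matches only at the start, $ only at the end,
% so the substrings are not removed if they appear in the middle
anchored = regexprep(x, '^1mil_|_ks_drivers_sorted\.csv$', '');
```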
