I have HTML content that contains <script> tags. Each of those <script> tags contains a URL pointing to a video.

What I want is to replace those HTML tags with my own tag, which uses this pattern: [VIDEO]MY_URL[/VIDEO]

I'm using hpple to parse the HTML content.

I'm using this XPath query: //script

When the parser finds a result for my query, I use this code to extract the video URL and replace the tag:

NSDataDetector* detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:nil];
NSArray* matches         = [detector matchesInString:raw options:0 range:NSMakeRange(0, [raw length])];
NSString *finalUrl = [self urlMatchingRegexResults:matches withExtensionArray:[self videosExtensionsArray]];

if (finalUrl) {

        NSString *replacement = [NSString stringWithFormat:@"[%@]%@[/%@]",tag,finalUrl,tag];
        NSString *pattern = [NSString stringWithFormat:@"<script.*>.*%@.*</script>",finalUrl];

        NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:nil];
        NSArray *matches = [regex matchesInString:self.store options:0 range:NSMakeRange(0, self.store.length)];
        modifiedString = [regex stringByReplacingMatchesInString: modifiedString options:0 range:NSMakeRange(0, modifiedString.length) withTemplate:replacement];
}

where "raw" is the result of [TFHppleElement raw] and [self videosExtensionsArray] returns an array of video file extensions:

- (NSArray *)videosExtensionsArray {


    static NSArray *videosExtensionsArray;
    static dispatch_once_t onceToken;

    dispatch_once(&onceToken, ^{

        videosExtensionsArray =  @[@"mp4",@"mov",@"avi",@"flv",@"mkv"];

    });

    return videosExtensionsArray;
}

The problem is that if I have multiple <script> tags in my HTML content, my regex matches from the first opening tag all the way to the last closing tag.

How can I modify my regex to avoid this issue? This is the pattern I am using:

NSString *pattern = [NSString stringWithFormat:@"<script.*>.*%@.*<\\/script>",finalUrl];

EDIT :

Content of the HTML :

<html><body><p style="text-align: center;"><a href="http://www.tuxboard.com/nba-jam-avec-gerald-green/gerald-green-nba-jam/" rel="attachment wp-att-171429">[IMG]http://www.tuxboard.com/photos/2014/03/Gerald-Green-NBA-Jam.jpg[/IMG]
</a>
</p>
<p><span id="more-171399"/><br/>
Si le jeu <strong>NBA Jam</strong> était édité cette année, le joueur des Phoenix Suns <strong>Gerald Green</strong> serait la star en couverture. L’arrière des Suns est à la fois un immense dunkeur avec une <a href="http://www.tuxboard.com/la-detente-de-gerald-green/" target="_blank">détente phénoménale</a>, mais aussi une fine gâchette.</p>
<p style="text-align: center;"><a href="http://www.tuxboard.com/nba-jam-avec-gerald-green/video-nba-jam-gerald-green/" rel="attachment wp-att-171431">[IMG]http://www.tuxboard.com/photos/2014/03/Video-NBA-Jam-Gerald-Green.jpg[/IMG]
</a>
</p>
<p>L’équipe de Phoenix l’a intégré dans le jeu <strong>NBA Jam</strong>, suite à ses <a href="http://www.tuxboard.com/plus-lourde-defaite-de-lhistoire-des-lakers-et-duel-spurs-heat/" target="_blank">performances hors normes face au Thunder</a> avec notamment 41 pts.    </p>
<p>On vous laisse savourer cette vidéo, avec une jolie pépite à la fin (on n’en dit pas plus…)</p>
<div id="tuxplayer">Chargement du player …</div>
<p><script type="text/javascript"><![CDATA[jwplayer("tuxplayer").setup({ flashplayer: "http://medias.tuxboard.com/playerv2.swf", file: "http://medias2.tuxboard.com/NBA_Jam_Gerald_Green.mp4",image: "http://www.tuxboard.com/photos/2014/03/NBA-Jam-Gerald-Green-on-Fire-640x357.jpg", height: 370,width: '100%', 'plugins': 'sharing-3'});]]></script></p>
<p>
Les dernières actions du bonhomme qui devrait remporter le titre du joueur ayant le plus progressé !</p>
<p style="text-align: center;">[IMG]http://www.tuxboard.com/photos/2014/03/Gerald-Green-Poster-Mason-Plumlee.gif[/IMG]
</p>
<p style="text-align: center;">[IMG]http://www.tuxboard.com/photos/2013/11/Dunk-Gerald-Green.gif[/IMG]
</p>
<p style="text-align: center;">[IMG]http://www.tuxboard.com/photos/2014/01/gerald-green-windmill.gif[/IMG]
</p>
<p><iframe width="640" height="360" src="http://www.youtube.com/embed/xnzQ3FWc7Oo?feature=oembed" frameborder="0" allowfullscreen=""/></p>
<p><iframe width="640" height="360" src="http://www.youtube.com/embed/Yyr6mkAbCQw?feature=oembed" frameborder="0" allowfullscreen=""/></p>
<p>Et surement son plus beau dunk :</p>
<p style="text-align: center;">
</p><div id="Gerald">Chargement du player …</div>
<p><script type="text/javascript"><![CDATA[
jwplayer("Gerald").setup({ flashplayer: "http://medias.tuxboard.com/playerv2.swf", file: "http://medias2.tuxboard.com/Gerald_Green_Windmill_Alley-Oop.mp4",image: "http://www.tuxboard.com/photos/2012/03/Video-Gerald-GreenAlley-Oop.jpg", height: 390,width: 640, 'plugins': 'sharing-3'});]]></script></p>
</body></html>

Log of the pattern :

<script.*?>.*http://medias2.tuxboard.com/NBA_Jam_Gerald_Green.mp4.*?</script>

1 Answer

By default, quantifiers are greedy and find the longest match. You need the shortest one, which is requested with *? (zero or more, as few as possible). See the Regular Expressions section of the ICU User Guide, which is referenced by Apple's NSRegularExpression documentation.
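To illustrate the difference, here is a minimal sketch that counts how many matches a greedy and a lazy pattern produce on one line holding two script elements. The helper name CountMatches and the sample string kSample are made up for this example and are not part of the question's code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for illustration: number of matches of `pattern` in `text`.
static NSUInteger CountMatches(NSString *pattern, NSString *text) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        return 0; // the pattern failed to compile
    }
    return [regex numberOfMatchesInString:text
                                  options:0
                                    range:NSMakeRange(0, text.length)];
}

// Made-up sample: two script elements on a single line.
static NSString *const kSample = @"<script>a</script><p>x</p><script>b</script>";
```

With these definitions, CountMatches(@"<script.*>.*</script>", kSample) should report a single match that swallows both elements, while the lazy form CountMatches(@"<script.*?>.*?</script>", kSample) should report two separate matches.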

6 Comments

Like this: @"<script.*?>.*?%@.*?<\\/script>" ?
I see you are matching "<\/script>". Is the backslash meant to be there? Failing that, place your original HTML into TextWrangler (or another good editor) and build up your RE there. Once it matches what you need, transfer it to code (translating the RE syntax if needed).
I don't understand... On freeformatter.com/regex-tester.html the regex seems to work, but not when I execute my code, even if I remove the "\" from the closing <script> tag.
Add to your question a sample of the HTML containing what you are trying to match, plus the result of NSLogging the pattern. Then someone might spot the remaining issue.
If you are using hpple and doing the query //script, then you should get an array of elements, one for each script tag. So your RE cannot match the content of more than one script tag at a time if you are applying it to these elements. To have a match spanning more than one script tag, you must be operating on the full raw HTML. Even then, as dot does not by default match a newline, both script tags would need to be on the same line for a match to span them. (cont...)
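Building on the point about newlines, the following is a sketch of one possible way to replace only the script element that contains a given URL when working on the full raw HTML. It is an assumption-laden illustration, not the commenter's method: the helper name ReplaceScriptContainingURL is hypothetical, and the "tempered" lookahead (?:(?!</script>).)*? is swapped in because lazy quantifiers alone do not stop a match from starting at an earlier <script> tag. NSRegularExpressionDotMatchesLineSeparators lets the dot cross line breaks, and escapedPatternForString: keeps the dots in the URL from acting as wildcards.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of hpple or the question's code): replaces the
// one <script> element whose body contains `url` with [tag]url[/tag].
static NSString *ReplaceScriptContainingURL(NSString *html, NSString *url, NSString *tag) {
    // Escape regex metacharacters in the URL so it is matched literally.
    NSString *escapedURL = [NSRegularExpression escapedPatternForString:url];

    // (?:(?!</script>).)*? consumes characters lazily but never steps over a
    // closing tag, so the match stays inside a single script element.
    NSString *pattern = [NSString stringWithFormat:
        @"<script\\b[^>]*>(?:(?!</script>).)*?%@(?:(?!</script>).)*?</script>",
        escapedURL];

    NSRegularExpressionOptions options =
        NSRegularExpressionCaseInsensitive | NSRegularExpressionDotMatchesLineSeparators;
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:options
                                                    error:&error];
    if (regex == nil) {
        return html; // leave the input untouched if the pattern is invalid
    }

    // Escape "$" and "\" so the URL is inserted verbatim by the template.
    NSString *replacement = [NSString stringWithFormat:@"[%@]%@[/%@]", tag, url, tag];
    NSString *templ = [NSRegularExpression escapedTemplateForString:replacement];

    return [regex stringByReplacingMatchesInString:html
                                           options:0
                                             range:NSMakeRange(0, html.length)
                                      withTemplate:templ];
}
```

Called once per URL found by the data detector, for example ReplaceScriptContainingURL(html, finalUrl, @"VIDEO"), this should leave the other script elements in place. A regex is still a fragile tool for HTML, so treat it as a workaround for this specific markup rather than a general solution.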