2

Below is my sample HTML file:

some text here <img src="http://site.com/7b399e20/77165/5fa/2a31ffb8.jpg"/> sometext here

some text here <img src="http://site.com/7b399e20/2a31ffb8.jpg"/> sometext here

some text here <img src="http://site.com/7b399e20/2a31ffb8.png"/> sometext here

some text here <img src="http://site.com/2a31ffb8.jpeg"/> sometext here

How do I transform it into the following?

some text here <img src="web/2a31ffb8.jpg"/> sometext here

some text here <img src="web/2a31ffb8.jpg"/> sometext here

some text here <img src="web/2a31ffb8.png"/> sometext here

some text here <img src="web/2a31ffb8.jpeg"/> sometext here

Thanks

3 Answers

3

I'll use Perl, because I know the syntax without having to look it up, but it would be very similar in awk or sed, as tekknolagi says:

perl -pi -e 's|http://site\.com/(?:[^"/]+/)*([^"/]+)"/>|web/$1"/>|g'  <filename>

This preserves everything between the last / and the closing ", i.e. the file name.


7 Comments

It will fail on lines where there are " characters after the closing quote of the img src="..." part, because this regex is greedy.
+1. Perl's the way to go for this kind of thing, although your regexp would munge other URLs in the file as well, which might not be desired.
@ZsoltBotykai: true. I have edited the script to make it pickier.
@ColinFine Still no good. It will match the whole string, e.g.: http://site.com/valamisemmi.jpg" akarmi " barmi " semmi "
@ZsoltBotykai: No, only where the " is immediately followed by />
1
sed -i 's:\(img src="\).*\(/[^"/]\+\.[^"]\+"\):\1web\2:' INPUTFILE

This should do it in place (the \+ and the bare -i are GNU sed syntax).

HTH
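Since in-place editing overwrites the file, it may be worth keeping a backup while testing. A rough sketch, assuming a sed that accepts -E and -i with an attached suffix (GNU and BSD sed both do); the temporary directory and the file name sample.html are made up for the example, and the pattern is a variant that uses [^"]* instead of .* so it cannot cross the closing quote:

```shell
# Work on a throwaway copy so the sketch is safe to run anywhere.
tmpdir=$(mktemp -d)
file="$tmpdir/sample.html"   # hypothetical file name
printf '%s\n' 'some text here <img src="http://site.com/7b399e20/2a31ffb8.png"/> sometext here' > "$file"

# -i.bak edits in place and keeps the original as sample.html.bak.
# [^"]*/ consumes the URL up to its last slash without crossing the quote.
sed -E -i.bak 's:(img src=")[^"]*/([^"/]+"):\1web/\2:g' "$file"

result=$(cat "$file")
backup=$(cat "$file.bak")
printf '%s\n' "$result"
rm -r "$tmpdir"
```

If the output looks wrong, the untouched original is still available in the .bak file.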

-1

What about using a Perl script? I put your sample text into a file, foo.txt, and here is the result:

$ perl -pe 's#http://.*/([a-z0-9A-Z]*\.)#web/$1#' foo.txt
some text here <img src="web/2a31ffb8.jpg"/> sometext here
some text here <img src="web/2a31ffb8.jpg"/> sometext here
some text here <img src="web/2a31ffb8.png"/> sometext here
some text here <img src="web/2a31ffb8.jpeg"/> sometext here

1 Comment

This won't work correctly if there are other, non-img links on the same line, because the regex is greedy.
