Explanation
My log contains many lines like these:
"[14/Oct/2014:13:02:15 +0200]","70","-","192.168.1.1","/API-1.2/testeo_keyword/vcn,ge/channel,rateber/site,bla_.de/keyword,null/px2.js","?ts=0.3054514767395726", "200","+", "http://www.bla.de/Arzt/Baden-W%C3%BCrttemberg/328-Heidelberg/Neurochirurgie/","Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50527; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.2; MS-RTC LM 8)","-"0/hurlau,superman;tile,4;status,0/pxl.js","?ts=0.3001205851715877", "200","+", "http://www.super.de/news/audio-video/carl-zeiss-praesentiert-3d-brille-100-euro-742545.html","Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0","-"
What to capture?
From field n-2 (the one containing the URL) I need to capture the domain name, and for every line whose domain name is super.de I need to collect the whole URL.
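Because every field is quoted and some fields contain commas, one option that avoids a hand-written regex is to treat each line as CSV and take the third-from-last field (field n-2). The following is a minimal sketch under some assumptions: each record sits on its own line, and every record has the same trailing fields as the first sample. The helper names (referrer_url, domain_of, collect_urls) are hypothetical. SUPER_LINE is also hypothetical. Only its URL and user agent come from the second record above, whose beginning is truncated, and its leading fields are made up.

```python
import csv
from urllib.parse import urlsplit

TARGET_DOMAIN = "super.de"  # domain whose full URLs should be collected


def referrer_url(line: str) -> str:
    """Return field n-2 (third from the end) of one quoted, comma-separated line."""
    # csv keeps commas inside quotes together; skipinitialspace handles ", " separators.
    fields = next(csv.reader([line], skipinitialspace=True))
    return fields[-3]


def domain_of(url: str) -> str:
    """Host name of the URL with an optional leading 'www.' removed."""
    host = urlsplit(url).hostname or ""  # "" for values such as "-"
    return host[4:] if host.startswith("www.") else host


def collect_urls(lines, target=TARGET_DOMAIN):
    """Collect the whole URL from every line whose field n-2 is on `target`."""
    return [url for url in map(referrer_url, lines) if domain_of(url) == target]


# First record from the question (bla.de, so it must be ignored).
BLA_LINE = (
    '"[14/Oct/2014:13:02:15 +0200]","70","-","192.168.1.1",'
    '"/API-1.2/testeo_keyword/vcn,ge/channel,rateber/site,bla_.de/keyword,null/px2.js",'
    '"?ts=0.3054514767395726", "200","+", '
    '"http://www.bla.de/Arzt/Baden-W%C3%BCrttemberg/328-Heidelberg/Neurochirurgie/",'
    '"Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; '
    '.NET CLR 2.0.50527; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; '
    '.NET4.0C; .NET4.0E; InfoPath.2; MS-RTC LM 8)","-"'
)

# Hypothetical super.de record: the leading fields are invented for illustration.
SUPER_LINE = (
    '"[14/Oct/2014:13:02:15 +0200]","70","-","192.168.1.1","/pxl.js",'
    '"?ts=0.3001205851715877", "200","+", '
    '"http://www.super.de/news/audio-video/carl-zeiss-praesentiert-3d-brille-100-euro-742545.html",'
    '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0","-"'
)

if __name__ == "__main__":
    print(collect_urls([BLA_LINE, SUPER_LINE]))
```

Stripping "www." after parsing keeps the optional prefix out of the comparison, so both www.super.de and super.de count as super.de.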
What do I have?
I have this regex: http://regexr.com/39q1b. It captures everything I need, but is the way I am doing it, with nested groups ((match)match), correct? Later, wherever the domain name is "super.de", I need to collect the whole URL. The www prefix is optional. Note: the first URL occurrence (www.bla.de) must be ignored.
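Nesting groups as in ((match)match) is a normal technique. Groups are numbered by their opening parenthesis, so an outer group can hold the whole URL while an inner group holds only the domain. The following is a minimal sketch in Python `re` syntax; regexr uses the JavaScript flavor, where named groups are written `(?<name>...)` instead of `(?P<name>...)`. It assumes the URL is the quoted field followed by exactly two more quoted fields that contain no quote characters. Anchoring on `$` is what selects field n-2 rather than other URL-like text in the line. The names REFERRER_RE and urls_for_domain are hypothetical, and SUPER_LINE is an invented record built around the super.de URL from the question.

```python
import re

# Outer group "url" = the whole URL; inner group "domain" = host without an optional "www.".
REFERRER_RE = re.compile(
    r'"(?P<url>https?://(?:www\.)?(?P<domain>[^/:?#"]+)[^"]*)"'  # field n-2 (URL)
    r',\s*"[^"]*"'  # field n-1 (user agent)
    r',\s*"[^"]*"\s*$'  # field n (last field, end of line)
)


def urls_for_domain(lines, domain="super.de"):
    """Return the whole URL from every line whose field n-2 is on `domain`."""
    found = []
    for line in lines:
        m = REFERRER_RE.search(line)
        if m and m.group("domain") == domain:
            found.append(m.group("url"))
    return found


# Shortened, hypothetical records: only the last three fields matter to the pattern.
BLA_LINE = (
    '"[14/Oct/2014:13:02:15 +0200]","70","-","192.168.1.1","/px2.js",'
    '"?ts=0.3054514767395726", "200","+", '
    '"http://www.bla.de/Arzt/Baden-W%C3%BCrttemberg/328-Heidelberg/Neurochirurgie/",'
    '"Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)","-"'
)
SUPER_LINE = (
    '"[14/Oct/2014:13:02:15 +0200]","70","-","192.168.1.1","/pxl.js",'
    '"?ts=0.3001205851715877", "200","+", '
    '"http://www.super.de/news/audio-video/carl-zeiss-praesentiert-3d-brille-100-euro-742545.html",'
    '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0","-"'
)

if __name__ == "__main__":
    print(urls_for_domain([BLA_LINE, SUPER_LINE]))
```

The domain test is a plain string comparison on the inner group rather than hard-coding "super.de" into the pattern. This keeps one regex for extraction and lets the filter change without touching it.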