How to extract a file's full name from a URL in Java

Question

I need a library to extract a file's full name from its URL (a direct download link). I want a powerful library. I use FileNameUtils from Apache Commons, but this class does not support a lot of URLs.

I want a library that supports URLs like these:

https://example.cdn.com/mp4/7/9/5/file_795f32460d111df334849ee8336e56ca.mp4?e=1535545105&h=4772d27a70cd9b1c665b712f62592c47&download=1

name : file_795f32460d111df334849ee8336e56ca.mp4

http://example.cdn.comr/post/93/3/Jozve-Kamele-arbi.abp.zip

name : Jozve-Kamele-arbi.abp.zip

http://cdl.example.com/?b=dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar

name : dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar

https://www.google.com/url?sa=t&source=web&rct=j&url=http://www.pdf995.com/samples/pdf.pdf&ved=2ahUKEwjV096X-ZHdAhVQzlkKHTpUBV4QFjAAegQIARAB&usg=AOvVaw3HFvAQ7GNf5QjsUo05ot-j

name: pdf.pdf

Can anyone help me? Thanks.

I apologize in advance if my grammar is not correct; I can't speak English well.

You can use a regex.

– Derrick, Commented Aug 29, 2018 at 11:55

gil.fernandes · Accepted Answer · 2018-08-29 12:03:43Z

You could also solve this problem with a regular expression, for example (?i)([^=/&?]+\\.(" + EXTENSIONS + "))\\b, if you have a list of the file extensions you are interested in.

Here is an example of such a method, which extracts a file name from a URL:

import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pipe-separated list of the file extensions to recognize (used as a regex alternation).
private static final String EXTENSIONS = "ez|aw|atom|atomcat|atomsvc|ccxml|cdmia|cdmic|cdmid|cdmio|cdmiq|cu|davmount|dbk|dssc|xdssc|ecma|emma|epub|exi|pfr|gml|gpx|gxf|stk|ipfix|jar|ser|class|js|json|jsonml|lostxml|hqx|cpt|mads|mrc|mrcx|mathml|mbox|mscml|metalink|meta4|mets|mods|mp4s|mp4|mxf|oda|opf|ogx|omdoc|oxps|xer|pdf|pgp|prf|p10|p7s|p8|ac|cer|crl|pkipath|pki|pls|cww|pskcxml|rdf|rif|rnc|rl|rld|rs|gbr|mft|roa|rsd|rss|rtf|sbml|scq|scs|spq|spp|sdp|setpay|setreg|shf|rq|srx|gram|grxml|sru|ssdl|ssml|tfi|tsd|plb|psb|pvb|tcap|pwn|aso|imp|acu|air|fcdt|xdp|xfdf|ahead|azf|azs|azw|acc|ami|apk|cii|fti|atx|mpkg|m3u8|swi|iota|aep|mpm|bmi|rep|cdxml|mmd|cdy|cla|rp9|c11amc|c11amz|csp|cdbcmsg|cmc|clkx|clkk|clkp|clkt|clkw|wbs|pml|ppd|car|pcurl|dart|rdz|fe_launch|dna|mlp|dpg|dfac|kpxx|ait|svc|geo|mag|nml|esf|msf|qam|slt|ssf|ez2|ez3|fdf|mseed|gph|ftc|fnc|ltf|fsc|oas|oa2|oa3|fg5|bh2|ddd|xdw|xbd|fzs|txd|ggb|ggt|gxt|g2w|g3w|gmx|kml|kmz|gac|ghf|gim|grv|gtm|tpl|vcg|hal|zmm|hbci|les|hpgl|hpid|hps|jlt|pcl|pclxl|sfd-hdstx|mpy|irm|sc|igl|ivp|ivu|igm|i2g|qbo|qfx|rcprofile|irp|xpr|fcs|jam|rms|jisp|joda|karbon|chrt|kfo|flw|kon|ksp|htke|kia|sse|lasxml|lbd|lbe|123|apr|pre|nsf|org|scm|lwp|portpkg|mcd|mc1|cdkey|mwf|mfm|flo|igx|mif|daf|dis|mbk|mqy|msl|plc|txf|mpn|mpc|xul|cil|cab|xlam|xlsb|xlsm|xltm|eot|chm|ims|lrm|thmx|cat|stl|ppam|pptm|sldm|ppsm|potm|docm|dotm|wpl|xps|mseq|mus|msty|taglet|nlu|nnd|nns|nnw|ngdat|n-gage|rpst|rpss|edm|edx|ext|odc|otc|odb|odf|odft|odg|otg|odi|oti|odp|otp|ods|ots|odt|odm|ott|oth|xo|dd2|oxt|pptx|sldx|ppsx|potx|xlsx|xltx|docx|dotx|mgp|dp|esa|paw|str|ei6|efif|wg|plf|pbd|box|mgz|qps|ptid|bed|mxl|musicxml|cryptonote|cod|rm|rmvb|link66|st|see|sema|semd|semf|ifm|itp|iif|ipk|mmf|teacher|dxp|sfs|sdc|sda|sdd|smf|sgl|smzip|sm|sxc|stc|sxd|std|sxi|sti|sxm|sxw|sxg|stw|svd|xsm|bdm|xdm|tao|tmo|tpt|mxs|tra|utz|umj|unityweb|uoml|vcx|vis|vsf|wbxml|wmlc|wmlsc|wtb|nbp|wpd|wqd|stf|xar|xfdl|hvd|hvs|hvp|osf|osfpvg|saf|spf|cmp|zaz|vxml|wgt|hlp|wsdl|wspolicy|7z|abw|ace|dmg|aam|aas|bcpio|torrent|bz|"
        + "vcd|cfs|chat|pgn|nsc|cpio|csh|dgc|wad|ncx|dtb|res|dvi|evy|eva|bdf|gsf|psf|pcf|snf|arc|spl|gca|ulx|gnumeric|gramps|gtar|hdf|install|iso|jnlp|latex|mie|application|lnk|wmd|wmz|xbap|mdb|obd|crd|clp|mny|pub|scd|trm|wri|nzb|p7r|rar|ris|sh|shar|swf|xap|sql|sit|sitx|srt|sv4cpio|sv4crc|t3|gam|tar|tcl|tex|tfm|obj|ustar|src|fig|xlf|xpi|xz|xaml|xdf|xenc|dtd|xop|xpl|xslt|xspf|yang|yin|zip|adp|s3m|sil|eol|dra|dts|dtshd|lvp|pya|ecelp4800|ecelp7470|ecelp9600|rip|weba|aac|caf|flac|mka|m3u|wax|wma|rmp|wav|xm|cdx|cif|cmdf|cml|csml|xyz|ttc|otf|ttf|woff|woff2|bmp|cgm|g3|gif|ief|ktx|png|btif|sgi|psd|sub|dwg|dxf|fbs|fpx|fst|mmr|rlc|mdi|wdp|npx|wbmp|xif|webp|3ds|ras|cmx|ico|sid|pcx|pnm|pbm|pgm|ppm|rgb|tga|xbm|xpm|xwd|dae|dwf|gdl|gtw|mts|vtu|appcache|css|csv|n3|dsc|rtx|tsv|ttl|vcard|curl|dcurl|mcurl|scurl|sub|fly|flx|gv|3dml|spot|jad|wml|wmls|java|nfo|opml|etx|sfv|uu|vcs|vcf|3gp|3g2|h261|h263|h264|jpgv|ogv|dvb|fvt|pyv|viv|webm|f4v|fli|flv|m4v|mng|vob|wm|wmv|wmx|wvx|avi|movie|smv|ice";

private static final Pattern FILE_DETECT = Pattern.compile("(?i)([^=/&?]+\\.(" + EXTENSIONS + "))\\b");

public static Optional<String> extractFileFrom(String url) {
    Matcher matcher = FILE_DETECT.matcher(url);
    return (matcher.find()) ? Optional.of(matcher.group(1)) : Optional.empty();
}

And here is a test which demonstrates how to use the method above:

// Requires java.util.Arrays and java.util.List.
public static void main(String[] args) {
    List<String> strings = Arrays.asList(
            "https://example.cdn.com/mp4/7/9/5/file_795f32460d111df334849ee8336e56ca.mp4?e=1535545105&h=4772d27a70cd9b1c665b712f62592c47&download=1",
            "http://example.cdn.comr/post/93/3/Jozve-Kamele-arbi.abp.zip",
            "http://cdl.example.com/?b=dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar",
            "https://www.google.com/url?sa=t&source=web&rct=j&url=http://www.pdf995.com/samples/pdf.pdf&ved=2ahUKEwjV096X-ZHdAhVQzlkKHTpUBV4QFjAAegQIARAB&usg=AOvVaw3HFvAQ7GNf5QjsUo05ot-j",
            "https://www.google.com/url?sa=t&source=web&rct=j&url=http://www.pdf995.com/samples/pdf.PDF&ved=2ahUKEwjV096X-ZHdAhVQzlkKHTpUBV4QFjAAegQIARAB&usg=AOvVaw3HFvAQ7GNf5QjsUo05ot-j");
    // Print the extracted name (wrapped in an Optional) for each URL.
    strings.stream().map(s -> extractFileFrom(s)).forEach(System.out::println);
}

If you execute the main method you will see this on the console:

Optional[file_795f32460d111df334849ee8336e56ca.mp4]
Optional[Jozve-Kamele-arbi.abp.zip]
Optional[Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar]
Optional[pdf.pdf]
Optional[pdf.PDF]
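
If the long pipe-separated string is awkward to maintain, one option is to assemble the same kind of pattern from a list. The following is only a minimal sketch and not part of the answer above: the class name ExtensionPatternSketch and the four-entry extension list are placeholders I made up, and Pattern.quote is added on the assumption that some extension might one day contain a regex metacharacter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ExtensionPatternSketch {

    // Hypothetical short list; replace it with the extensions you care about.
    static final List<String> EXTENSIONS = Arrays.asList("mp4", "zip", "rar", "pdf");

    // Same shape as the regex above: a run of characters that are not
    // '=', '/', '&' or '?', then a dot, then one of the known extensions.
    static final Pattern FILE_DETECT = Pattern.compile(
            "(?i)([^=/&?]+\\.(?:"
            + EXTENSIONS.stream().map(Pattern::quote).collect(Collectors.joining("|"))
            + "))\\b");

    public static Optional<String> extractFileFrom(String url) {
        Matcher matcher = FILE_DETECT.matcher(url);
        // find() stops at the first match, so group(1) is the first candidate name.
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractFileFrom(
                "http://cdl.example.com/?b=dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar"));
    }
}
```

As with the full list, a URL whose file extension is not in the list yields Optional.empty(), so the list has to cover every extension you expect.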

Khemraj Sharma · Answer · 2018-08-29 10:20:51Z


I use this method; I hope it helps you too. It also strips anything after a question mark (query string) or a hash (fragment).

public static String parseFileNameFromUrl(String url) {
    if (url == null) {
        return "";
    }
    try {
        URL res = new URL(url);
        String resHost = res.getHost();
        if (resHost.length() > 0 && url.endsWith(resHost)) {
            // handle ...example.com
            return "";
        }
    } catch (MalformedURLException e) {
        e.printStackTrace();
        return "";
    }

    int startIndex = url.lastIndexOf('/') + 1;
    int length = url.length();

    // find end index for ?
    int lastQuestionMarkPos = url.lastIndexOf('?');
    if (lastQuestionMarkPos == -1) {
        lastQuestionMarkPos = length;
    }

    // find end index for #
    int lastHashPos = url.lastIndexOf('#');
    if (lastHashPos == -1) {
        lastHashPos = length;
    }

    // calculate the end index
    int endIndex = Math.min(lastQuestionMarkPos, lastHashPos);
    return url.substring(startIndex, endIndex);
}


1 Comment

Hadi Over a year ago

Thank you so much, but this method throws java.lang.StringIndexOutOfBoundsException: String index out of range: -57 for the third and fourth URLs.
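
The exception most likely comes from calling lastIndexOf('/') on the whole URL: when the query string itself contains a slash, the start index ends up after the end index. A possible way around it, sketched here under my own assumptions rather than taken from the answer, is to let java.net.URI separate the path from the query first. The class name UrlFileNameSketch, the helper lastSegment, and the "a file name contains a dot" heuristic are all hypothetical choices, and percent-encoded names are returned undecoded.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UrlFileNameSketch {

    // Text after the last '/', or the whole string if there is no '/'.
    private static String lastSegment(String s) {
        return s.substring(s.lastIndexOf('/') + 1);
    }

    public static String parseFileNameFromUrl(String url) {
        if (url == null) {
            return "";
        }
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return "";
        }

        // 1. Prefer the last path segment when it looks like a file name.
        String path = uri.getRawPath();
        String fromPath = (path == null) ? "" : lastSegment(path);
        if (fromPath.contains(".")) {
            return fromPath;
        }

        // 2. Otherwise take the first query value whose last segment contains a dot.
        String query = uri.getRawQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                String value = pair.substring(pair.indexOf('=') + 1);
                String candidate = lastSegment(value);
                if (candidate.contains(".")) {
                    return candidate;
                }
            }
        }

        // 3. Fall back to the (possibly empty) last path segment.
        return fromPath;
    }

    public static void main(String[] args) {
        System.out.println(parseFileNameFromUrl(
                "http://cdl.example.com/?b=dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar"));
    }
}
```

Unlike the regex approach, this needs no extension list, but the dot heuristic can pick up query values that merely contain a dot, such as version numbers, so treat it as a starting point.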
