410

In Java, I want to convert this:

https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type

To this:

https://mywebsite/docs/english/site/mybook.do?request_type

This is what I have so far:

class StringUTF 
{
    public static void main(String[] args) 
    {
        try{
            String url = 
               "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do" +
               "%3Frequest_type%3D%26type%3Dprivate";

            System.out.println(url+"Hello World!------->" +
                new String(url.getBytes("UTF-8"),"ASCII"));
        }
        catch(Exception E){
        }
    }
}

But it doesn't work right. What are these %3A and %2F formats called and how do I convert them?

4
  • @Stephen .. Why can't a url be UTF-8 encoded String .. ? Commented May 26, 2011 at 12:14
  • The problem is that just because the URL can be UTF-8, the question really has nothing to do with UTF-8. I've edited the question suitably. Commented May 26, 2011 at 12:19
  • It could be (in theory) but the string in your example is not a UTF-8 encoded String. It is a URL-encoded ASCII string. Hence the title is misleading. Commented May 26, 2011 at 12:20
  • It is also worth noting that all the characters in the url string are ASCII, and this is also true after the string has been URL decoded. '%' is an ASCII char and %xx represents an ASCII char if xx is less than (hexadecimal) 80. Commented May 26, 2011 at 12:34
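To make the %xx notation from the comment above concrete, here is a minimal sketch of a percent-decoder. The class and method names (PercentDecodeSketch, percentDecode) are made up for illustration, and the sketch assumes the escaped bytes are UTF-8; in real code the JDK classes are the better choice.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class PercentDecodeSketch {

    // Hypothetical helper: turns each %xx escape into the byte with hex value xx,
    // then interprets the collected bytes as UTF-8. Malformed escapes are kept as-is.
    public static String percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo); // e.g. "3A" -> 0x3A, which is ':'
                    i += 3;
                    continue;
                }
            }
            // Not an escape: copy the code point through as its UTF-8 bytes.
            int cp = s.codePointAt(i);
            byte[] raw = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            out.write(raw, 0, raw.length);
            i += Character.charCount(cp);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(percentDecode("https%3A%2F%2Fmywebsite")); // https://mywebsite
    }
}
```

Unlike java.net.URLDecoder, this sketch leaves '+' untouched, because it only handles %xx escapes.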

11 Answers 11

775

This does not have anything to do with character encodings such as UTF-8 or ASCII. The string you have there is URL encoded. This kind of encoding is something entirely different than character encoding.

Try something like this:

try {
    String result = java.net.URLDecoder.decode(url, StandardCharsets.UTF_8.name());
} catch (UnsupportedEncodingException e) {
    // not going to happen - value came from JDK's own StandardCharsets
}

Java 10 added direct support for Charset to the API, meaning there's no need to catch UnsupportedEncodingException:

String result = java.net.URLDecoder.decode(url, StandardCharsets.UTF_8);

Note that a character encoding (such as UTF-8 or ASCII) is what determines the mapping of characters to raw bytes. For a good intro to character encodings, see this article.
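For completeness, here is a small runnable sketch that applies the two-argument decode call to the full string from the question. The class name UrlDecodeExample and the wrapper method decode are illustrative only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecodeExample {

    // Wraps the checked exception; UTF-8 support is mandatory on every JVM.
    public static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        String url = "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do"
                + "%3Frequest_type%3D%26type%3Dprivate";
        System.out.println(decode(url));
        // prints: https://mywebsite/docs/english/site/mybook.do?request_type=&type=private
    }
}
```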


10 Comments

The methods on URLDecoder are static so you don't have to create a new instance of it.
@Trismegistos Only the version where you don't specify the character encoding (the second parameter, "UTF-8") is deprecated according to the Java 7 API documentation. Use the version with two parameters.
If using java 1.7+ you can use the static version of the "UTF-8" string: StandardCharsets.UTF_8.name() from this package: java.nio.charset.StandardCharsets. Relevant to this: link
Be careful with this. As noted here: blog.lunatech.com/2009/02/03/… This is not about URLs, but for HTML form encoding.
Doesn't work if there is a '+' in url. See bugs.openjdk.java.net/browse/JDK-8179507
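The '+' caveat in the last comment comes from form decoding: URLDecoder turns a literal '+' into a space. The sketch below shows that behaviour and one possible workaround, escaping '+' as %2B before decoding. The class and method names are hypothetical, and whether the workaround is appropriate depends on how the input was encoded in the first place.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PlusSignDemo {

    // Plain form decoding: '+' becomes a space, %2B becomes '+'.
    public static String formDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    // Workaround sketch: protect literal '+' characters by escaping them first.
    public static String decodeKeepingPlus(String s) {
        return formDecode(s.replace("+", "%2B"));
    }

    public static void main(String[] args) {
        System.out.println(formDecode("a+b%2Bc"));        // a b+c
        System.out.println(decodeKeepingPlus("a+b%2Bc")); // a+b+c
    }
}
```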
82

The string you've got is in application/x-www-form-urlencoded encoding.

Use URLDecoder to convert it to Java String.

URLDecoder.decode( url, "UTF-8" );

Comments

59

This has been answered before (although this question was first!):

"You should use java.net.URI to do this, as the URLDecoder class does x-www-form-urlencoded decoding which is wrong (despite the name, it's for form data)."

As URL class documentation states:

The recommended way to manage the encoding and decoding of URLs is to use URI, and to convert between these two classes using toURI() and URI.toURL().

The URLEncoder and URLDecoder classes can also be used, but only for HTML form encoding, which is not the same as the encoding scheme defined in RFC2396.

Basically:

String url = "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type";
System.out.println(new java.net.URI(url).getPath());

will give you:

https://mywebsite/docs/english/site/mybook.do?request_type

8 Comments

In Java 1.7 the URLDecoder.decode(String, String) overload is not deprecated. You must be referring to the URLDecoder.decode(String) overload without the encoding. You might want to update your post for clarification.
This answer is misleading; that block quote has nothing to do with the deprecation. The Javadoc of the deprecated method states, and I actually quote @deprecated The resulting string may vary depending on the platform's default encoding. Instead, use the decode(String,String) method to specify the encoding.
getPath() for URIs only returns the path part of the URI, as noted above.
Unless I'm mistaken, the "path" is known to be that part of a URI after the authority part (see: en.wikipedia.org/wiki/Uniform_Resource_Identifier for definition of path) - it seems to me the behaviour I am seeing is the standard/correct behaviour. I'm using java 1.8.0_101 (on Android Studio). I'd be curious to see what you get as "getAuthority()" is called. Even this article/example seems to indicate that path is only the /public/manual/appliances part of their URI:quepublishing.com/articles/article.aspx?p=26566&seqNum=3
@Pelpotronic The code in the post actually does print the output that it shows (at least for me). I think the reason for this is that, because of the URL encoding, the URI constructor is actually treating the entire string, (https%3A%2F...), as just the path of a URI; there is no authority, or query, etc. This can be tested by calling the respective get methods on the URI object. If you pass the decoded text to the URI constructor: new URI("https://mywebsite/do....."), then calling getPath() and other methods will give correct results.
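The point made in the last comment can be checked with a short sketch that prints the components java.net.URI reports for the encoded and the decoded string. The class name UriPartsDemo and the helper describe are made up for this illustration.

```java
import java.net.URI;

public class UriPartsDemo {

    // Summarises how java.net.URI splits the given string into components.
    public static String describe(String s) {
        URI uri = URI.create(s);
        return "scheme=" + uri.getScheme()
                + ", authority=" + uri.getAuthority()
                + ", path=" + uri.getPath()
                + ", query=" + uri.getQuery();
    }

    public static void main(String[] args) {
        // Fully encoded: there is no literal ':' or '/', so everything is treated as a relative path.
        System.out.println(describe(
                "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type"));
        // Already decoded: scheme, authority, path and query are split as expected.
        System.out.println(describe(
                "https://mywebsite/docs/english/site/mybook.do?request_type"));
    }
}
```

For the encoded input, the scheme, authority and query are null and getPath() returns the whole decoded string, which is why the answer's code prints what it does.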
19

%3A and %2F are URL-encoded (percent-encoded) characters. Use this Java code to convert them back into : and /

String decoded = java.net.URLDecoder.decode(url, "UTF-8");

2 Comments

It does not convert %2C either; that is a comma (,).
this needs to be wrapped in a try/catch block.. read more about checked exceptions (this one) vs unchecked stackoverflow.com/questions/6115896/…
8
public String decodeString(String url) {
    String urlString = "";
    try {
        urlString = URLDecoder.decode(url, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        // Cannot happen: every JVM is required to support UTF-8.
    }
    return urlString;
}

1 Comment

Could you please elaborate on your answer by adding a short description of the solution you provide?
7
try {
    String result = URLDecoder.decode(urlString, "UTF-8");
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Comments

6

I use apache commons

String decodedUrl = new URLCodec().decode(url);

The default charset is UTF-8

Comments

2
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

public class URLDecoding {

    // URLDecoder does x-www-form-urlencoded decoding (despite the name, it's for form data).
    public String decodeMethod(String url) throws UnsupportedEncodingException {
        return URLDecoder.decode(url, "UTF-8");
    }

    // "You should use java.net.URI to do this": getPath() returns the decoded path component.
    public String getPathMethod(String url) throws URISyntaxException {
        return new URI(url).getPath();
    }

    public static void main(String[] args) throws UnsupportedEncodingException, URISyntaxException {
        URLDecoding decoder = new URLDecoding();
        System.out.println("Here is your decoded URL with decode method: "
                + decoder.decodeMethod("https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type"));
        System.out.println("Here is your decoded URL with getPath method: "
                + decoder.getPathMethod("https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest"));
    }
}

You can select your method wisely :)

Comments

2

If the value is an integer, we also have to catch NumberFormatException.

try {
    Integer result = Integer.valueOf(URLDecoder.decode(urlNumber, "UTF-8"));
} catch (NumberFormatException | UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Comments

1

Using java.net.URI class:

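public String getDecodedURL(String encodedUrl) {
    try {
        URI uri = new URI(encodedUrl);
        String decoded = uri.getSchemeSpecificPart();
        // A fully encoded input such as "https%3A%2F%2F..." contains no literal ':',
        // so getScheme() returns null and must not be prepended.
        return uri.getScheme() == null ? decoded : uri.getScheme() + ":" + decoded;
    } catch (Exception e) {
        return "";
    }
}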

Please note that the exception handling could be better, but that is not very relevant to this example.

Comments

-1

I had this problem too and came here looking for an answer, but the code from the accepted answer didn't work for me. I tried something different and it worked, so I'm sharing the following line of code in case it helps.

URLDecoder.decode(URLDecoder.decode(url, StandardCharsets.UTF_8), StandardCharsets.UTF_8)
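One situation where a second decoding pass is needed is input that was encoded twice, so that every % was itself escaped as %25. That is an assumption about this case, since the answer does not show its input. The sketch below uses a made-up double-encoded string and hypothetical names (DoubleDecodeDemo, decodeOnce, decodeTwice).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DoubleDecodeDemo {

    public static String decodeOnce(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    // Only appropriate when the input is known to have been encoded twice.
    public static String decodeTwice(String s) {
        return decodeOnce(decodeOnce(s));
    }

    public static void main(String[] args) {
        String doubleEncoded = "https%253A%252F%252Fmywebsite"; // illustrative input
        System.out.println(decodeOnce(doubleEncoded));  // https%3A%2F%2Fmywebsite
        System.out.println(decodeTwice(doubleEncoded)); // https://mywebsite
    }
}
```

Decoding twice unconditionally can corrupt data that legitimately contains a percent sign after the first pass, so it is worth checking how the value was produced before relying on it.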

Comments
