Consider this method:
import javax.ws.rs.core.UriBuilder;
import java.net.URI;

public class AnyParamRedirecter {

    /**
     * Generate a URI to target.com with one query parameter "extra"
     */
    public URI generateNestedRedirectUrl(String extraParam) {
        return UriBuilder.fromPath("http://target.com")
                // encodes all special characters, except those that already look percent-encoded
                .queryParam("extra", extraParam)
                .build();
    }
}
This code appears to have a bug: when the argument extraParam already contains percent-encoded values and therefore needs to be encoded a second time (double-encoding), queryParam() does not do the right thing.
(Note that queryParam does single encoding of all characters that need encoding.)
As an example:
@Test
public void doubleEncodeTest() {
    final String deeplyEmbeddedUrlEncoded = URLEncoder.encode("http://other2.com?x=1?y=2", StandardCharsets.UTF_8);
    // sanity check: the innermost URL is percent-encoded once
    assertEquals("http%3A%2F%2Fother2.com%3Fx%3D1%3Fy%3D2", deeplyEmbeddedUrlEncoded);

    final String embeddedUrl = "http://other.com?c=" + deeplyEmbeddedUrlEncoded;
    var redirecter = new AnyParamRedirecter();
    URI result = redirecter.generateNestedRedirectUrl(embeddedUrl);

    // getQuery() returns the decoded query; strip the "extra=" prefix
    String nestedUrl = result.getQuery().substring("extra=".length());
    // assert we get back the same thing that we put in
    assertEquals(embeddedUrl, nestedUrl);
}
Gives:
Expected :http://other.com?c=http%3A%2F%2Fother2.com%3Fx%3D1%3Fy%3D2
Actual :http://other.com?c=http://other2.com?x=1?y=2
so the x and y parameters now evidently belong to the "other.com" URL, not to the "other2.com" URL.
I could simply always pre-encode the value, i.e. .queryParam("extra", URLEncoder.encode(extraParam, StandardCharsets.UTF_8)), and thus always double-encode. But why would the queryParam method have such odd logic that it refuses to encode already percent-encoded values again?
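To illustrate why unconditional pre-encoding round-trips, here is a minimal sketch that uses only the JDK. It deliberately swaps UriBuilder for plain string concatenation plus java.net.URI, so it runs without a JAX-RS implementation; the class name PreEncodeRoundTrip and the method roundTrip are made up for this example, and it does not prove anything about what a particular UriBuilder implementation does.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of JAX-RS.
class PreEncodeRoundTrip {

    // Always encode the value once more, build the URI by hand, and read the value back.
    static String roundTrip(String extraParam) {
        // Every '%' in the input becomes "%25", so existing escapes survive one decode.
        String encoded = URLEncoder.encode(extraParam, StandardCharsets.UTF_8);
        URI uri = URI.create("http://target.com?extra=" + encoded);
        // getQuery() decodes exactly one level of percent-escapes.
        return uri.getQuery().substring("extra=".length());
    }

    public static void main(String[] args) {
        String embeddedUrl = "http://other.com?c=http%3A%2F%2Fother2.com%3Fx%3D1%3Fy%3D2";
        System.out.println(embeddedUrl.equals(roundTrip(embeddedUrl)));
    }
}
```

One caveat of this sketch: URLEncoder writes a space as "+", which URI.getQuery() does not turn back into a space, so values containing spaces would need a form-aware decoder such as java.net.URLDecoder on the receiving side.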
The javadoc for abstract class javax.ws.rs.core.UriBuilder says:
URI template-aware utility class for building URIs from their components. See {@link javax.ws.rs.Path#value} for an explanation of URI templates.
Builder methods perform contextual encoding of characters not permitted in
the corresponding URI component following the rules of the
<a href="http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1">application/x-www-form-urlencoded</a>
media type for query parameters...
Note that only characters not permitted in a particular component are subject to encoding so,
e.g., a path supplied to one of the {@code path} methods may contain matrix parameters or
multiple path segments since the separators are legal characters and will not be encoded.
Percent encoded values are also recognized where allowed and will not be double encoded.
To me this looks like a specification bug: the class is described as a "utility class for building URIs from their components", yet it carves out the exception "Percent encoded values are also recognized where allowed and will not be double encoded".
A class can either promise to build URIs from their components, or admit that it will NOT always encode strings in a way that can be decoded correctly later; it cannot do both. Encoding special characters while leaving percent-encoded sequences untouched only makes sense if every value is expected to contain either only unescaped special characters or only escaped ones. (And if that were the assumption, it would have been more useful to fail when it does not hold than to return an irreversible, incorrect hodge-podge.)
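The irreversibility can be shown with a small JDK-only sketch of the rule "encode special characters, but leave anything that already looks like %XX alone". This is my own reading of the quoted Javadoc sentence, not the code of any JAX-RS implementation; the class ContextualEncodingSketch and the method encodeKeepingEscapes are hypothetical names, and the set of characters left unencoded (letters, digits, '-', '.', '_', '~') is an assumption chosen for simplicity.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of "contextual" encoding; not the JAX-RS implementation.
class ContextualEncodingSketch {

    private static final String HEX = "0123456789ABCDEF";

    static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static boolean isUnreserved(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') || (cp >= '0' && cp <= '9')
                || cp == '-' || cp == '.' || cp == '_' || cp == '~';
    }

    // Percent-encode everything except unreserved characters and existing %XX escapes.
    static String encodeKeepingEscapes(String value) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < value.length()) {
            char c = value.charAt(i);
            if (c == '%' && i + 2 < value.length()
                    && isHex(value.charAt(i + 1)) && isHex(value.charAt(i + 2))) {
                out.append(value, i, i + 3); // looks already encoded: copy it unchanged
                i += 3;
                continue;
            }
            int cp = value.codePointAt(i);
            if (isUnreserved(cp)) {
                out.appendCodePoint(cp);
            } else {
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append('%').append(HEX.charAt((b >> 4) & 0xF)).append(HEX.charAt(b & 0xF));
                }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two different inputs: one with a raw nested URL, one with it already encoded.
        String raw = encodeKeepingEscapes("http://other.com?c=http://other2.com");
        String pre = encodeKeepingEscapes("http://other.com?c=http%3A%2F%2Fother2.com");
        // Both collapse to the same output, so no decoder can tell them apart.
        System.out.println(raw.equals(pre));
    }
}
```

Because two distinct inputs map to the same encoded string, the mapping is not injective, and the receiver cannot recover which one was originally passed in.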
The method javadoc itself gives zero warning of this behavior:
Append a query parameter to the existing set of query parameters. If multiple values are supplied the parameter will be added once per value.
Parameters:
name - the query parameter name, may contain URI template parameters
values - the query parameter value(s), each object will be converted to a String using its toString() method. Stringified values may contain URI template parameters.
Returns:
the updated UriBuilder
Throws:
java.lang.IllegalArgumentException - if name or values is null