When setting the pathname of a URL, when should you encode the value you are setting it to?

When I say URL I mean this API: https://developer.mozilla.org/en-US/docs/Web/API/URL

When I say "setting the pathname" I mean to do this:

url.pathname = 'some/path/to/a/resource.html';

Based on the MDN documentation, I would think the answer is "you shouldn't need to", as there is an example covering this case:

URLs are encoded according to the rules found in RFC 3986. For instance:

url.pathname = 'démonstration.html';
console.log(url.href); // "http://www.example.com/d%C3%A9monstration.html"

However, I have run into a case where it seems I do need to encode the value I am setting pathname to:

url.pathname = 'atest/New Folder1234/!@#$%^&*().html';
console.log(url.href);

I would expect this to output: http://example.com/atest/New%20Folder1234/!%40%23%24%25%5E%26*().html

But instead I am getting: https://example.com/atest/New%20Folder1234/!@%23$%^&*().html

It seems to get what I expect I have to do:

url.pathname = 'atest/New Folder1234/!@#$%^&*().html'.split('/').map(encodeURIComponent).join('/');
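
For reference, here is that workaround as a small self-contained sketch. The helper name `encodePathSegments` and the base URL `http://example.com/` are assumptions made for illustration; they are not part of the URL API.

```javascript
// Hypothetical helper: percent-encode each path segment with
// encodeURIComponent, keeping '/' as the segment separator.
function encodePathSegments(path) {
  return path.split('/').map(encodeURIComponent).join('/');
}

// Assumed base URL, for illustration only.
const exampleUrl = new URL('http://example.com/');
exampleUrl.pathname = encodePathSegments('atest/New Folder1234/!@#$%^&*().html');

console.log(exampleUrl.href);
// "http://example.com/atest/New%20Folder1234/!%40%23%24%25%5E%26*().html"
```

The pre-encoded value survives the setter unchanged because the percent sign itself is not re-encoded, so sequences such as %40 are kept as they are.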

What is going on here? I cannot find anything on the MDN doc page for either URL or pathname that explains this. I took a quick look through RFC 3986, but that just seems to describe the URI syntax. I have run some experiments to look for a pattern in which characters get encoded, but nothing stands out to me.

1 Answer

See the specification for path state, in particular...

UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.

with the path percent-encode set being defined as...

the query percent-encode set and U+003F (?), U+0060 (`), U+007B ({), and U+007D (}).

and the query percent-encode set being...

the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).

You can keep diving down the rabbit hole if you want, but I feel that's enough.

Note that none of these sets include @$%^& which are the characters you pointed out.

Compare these to the specification for Encode (the operation that encodeURIComponent is built on), which escapes a much larger set of characters.
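
To make the comparison concrete, here is a minimal sketch of a membership test for the sets quoted above, set against what encodeURIComponent does. The function name is hypothetical. The definition of the C0 control percent-encode set (C0 controls plus every code point above U+007E) is not quoted in this answer; it is my reading of the URL Standard. The live specification may also have changed since the text quoted here.

```javascript
// Hypothetical helper: is `ch` in the path percent-encode set as quoted above?
function inQuotedPathPercentEncodeSet(ch) {
  const cp = ch.codePointAt(0);
  // Assumption: C0 control percent-encode set = C0 controls (U+0000-U+001F)
  // and all code points greater than U+007E.
  const inC0ControlSet = cp <= 0x1f || cp > 0x7e;
  const queryExtras = ' "#<>'; // added by the query percent-encode set
  const pathExtras = '?`{}';   // added by the path percent-encode set
  return inC0ControlSet || queryExtras.includes(ch) || pathExtras.includes(ch);
}

// For each character: is it in the quoted path set, and does
// encodeURIComponent escape it?
for (const ch of ' #@$%^&') {
  const escapedByEncodeURIComponent = encodeURIComponent(ch) !== ch;
  console.log(
    JSON.stringify(ch),
    inQuotedPathPercentEncodeSet(ch),
    escapedByEncodeURIComponent
  );
}
```

Under these definitions the space and # are in both sets, while @, $, %, ^ and & are escaped only by encodeURIComponent, which matches the output reported in the question.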

3 Comments

Thanks for linking that doc. I think reading through it closes this case for me. It seems likely that the URL parser is terminating partway through modifying the URL.
@rolledback Not really sure what you mean. What makes you think it is terminating early?
Sorry, I misread the algorithm. It is not terminating. It is just not encoding those characters because, as you said, they aren't in the set to be encoded. So while that answers the question of "how is this happening", do you have any insight into why? If I have a file on a server at the example path, I have to put the expected value in my browser address bar to reach it. Why would the standard be written to give a wrong result (wrong in at least this case)? Why don't the docs mention situations like this?
