JavaScript `URL`: when to encode when setting `pathname`?

Question

When setting the pathname of a URL, when should you encode the value you are setting it to?

When I say URL I mean this API: https://developer.mozilla.org/en-US/docs/Web/API/URL

When I say "setting the pathname" I mean to do this:

url.pathname = 'some/path/to/a/resource.html';

Based on the MDN documentation, I would think the answer is "you shouldn't need to", as there is an example covering this case:

URLs are encoded according to the rules found in RFC 3986. For instance:
url.pathname = 'démonstration.html';
console.log(url.href); // "http://www.example.com/d%C3%A9monstration.html"

However, I have run into a case where it seems I do need to encode the value I am setting pathname to:

const url = new URL('https://example.com');
url.pathname = 'atest/New Folder1234/!@#$%^&*().html';
console.log(url.href);

I would expect this to output: https://example.com/atest/New%20Folder1234/!%40%23%24%25%5E%26*().html

But instead I am getting: https://example.com/atest/New%20Folder1234/!@%23$%^&*().html

It seems to get what I expect I have to do:

url.pathname = 'atest/New Folder1234/!@#$%^&*().html'.split('/').map(encodeURIComponent).join('/');

What is going on here? I cannot find anything on the MDN doc page for either URL or pathname that explains this. I took a quick look through RFC 3986, but that just seems to describe the URI syntax. I have run some experiments in an effort to find some sort of pattern to this problem, but nothing stands out to me.

Phil · Accepted Answer · 2022-02-15 01:25:26Z

See the specification for path state, in particular...

UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.

with the path percent-encode set being defined as...

the query percent-encode set and U+003F (?), U+0060 (`), U+007B ({), and U+007D (}).

and the query percent-encode set being...

the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).

(You can keep diving down the rabbit hole if you want, but I feel that's enough.)

Note that none of these sets include @$%^& which are the characters you pointed out.
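To see these sets in action, here is a minimal sketch that pushes single characters through the `pathname` setter and prints how each one comes out. The helper name `viaPathnameSetter` and the `https://example.com/` base are made up for illustration. `^` is left out on purpose, since its handling may differ between implementations.

```javascript
// Hypothetical helper: returns how the pathname setter serializes one character.
function viaPathnameSetter(ch) {
  const url = new URL('https://example.com/');
  url.pathname = 'x' + ch + 'y';
  // The result looks like "/x<serialized ch>y"; strip "/x" and "y".
  return url.pathname.slice(2, -1);
}

// Characters from the quoted sets, followed by some that are in none of them.
for (const ch of [' ', '"', '#', '<', '>', '?', '@', '$', '%', '&']) {
  console.log(JSON.stringify(ch), '->', viaPathnameSetter(ch));
}
// For example: " " -> %20, "#" -> %23, "@" -> @
```

Characters in the quoted sets come back percent-encoded, while `@`, `$`, `%` and `&` pass through unchanged.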

Compare these to the specification for Encode (the abstract operation behind `encodeURIComponent`), which is much more thorough.
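If you want the stricter encoding, one option is the approach from the question: encode each segment yourself before assigning. This is only a sketch. The helper name `encodePathSegments` is made up, and it assumes every `/` in the input is a separator and that the input has not already been percent-encoded (otherwise `%` gets encoded a second time).

```javascript
// Hypothetical helper: percent-encode each path segment, keeping "/" as the separator.
function encodePathSegments(path) {
  return path.split('/').map(encodeURIComponent).join('/');
}

const url = new URL('https://example.com/');
url.pathname = encodePathSegments('atest/New Folder1234/!@#$%^&*().html');
console.log(url.href);
// https://example.com/atest/New%20Folder1234/!%40%23%24%25%5E%26*().html
```

The setter does not touch the `%XX` sequences produced by `encodeURIComponent`, because `%` is not in the path percent-encode set.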

edited Feb 15, 2022 at 1:25

answered Feb 15, 2022 at 1:01


3 Comments

rolledback Over a year ago

Thanks for linking that doc. I think reading through that closes this case for me. It seems likely that the URL parser is probably terminating mid modification of the URL.

Phil Over a year ago

@rolledback Not really sure what you mean. What makes you think it is terminating early?

rolledback Over a year ago

Sorry, I misread the algorithm. It is not terminating. It is just not encoding those characters because they aren't in the set to be encoded, like you said. So while that answers the question of "how is this happening", do you have any insight into why? If I have a file on a server at the example path, I have to put the expected value in my browser address bar to reach it. Why would the standard be written to give a wrong (wrong in at least this case) result? Why don't the docs mention situations like this?
