
I converted a Drupal website to a static site using SiteSucker (a Mac tool). Locally I am running MAMP 6. Some images are missing on the live site, probably because of an encoding issue.

For instance, I have this file:

Akzeptanz_GoA3+_Vorschaubild_Symbolbild_pexels-jéshoots-253647.jpg

And both locally and live it is embedded exactly like this:

<img src="/path/Akzeptanz_GoA3+_Vorschaubild_Symbolbild_pexels-je%CC%81shoots-253647.jpg">

Locally the image loads; live, it returns a 404. What could cause this difference in behaviour, and how can I fix it? The live server is a regular LAMP stack.

This behavior persists even when I empty both .htaccess files.

  • The reason is probably either a different character encoding used by the underlying file system, or a problem with Unicode normalization. The character é in UTF-8, URL-encoded, would result in %C3%A9. In your URL, however, you have a "standard" e followed by %CC%81, which is the UTF-8 encoding of U+0301 COMBINING ACUTE ACCENT. Commented Jan 24, 2024 at 11:57
  • @CBroe you again! :) Thanks! Do you have any idea how I could go on and debug this? Commented Jan 24, 2024 at 11:59
  • @CBroe I found out that locally I run Darwin, while live is Linux (checked with uname). Commented Jan 24, 2024 at 12:02
  • Does it work on your live system, with /path/Akzeptanz_GoA3+_Vorschaubild_Symbolbild_pexels-j%C3%A9shoots-253647.jpg? Commented Jan 24, 2024 at 12:04
  • @CBroe yes it loads. Chrome shows Akzeptanz_GoA3+_Vorschaubild_Symbolbild_pexels-jéshoots-253647.jpg in the address bar, though. When I change it in the src attribute, it stays like that and loads. Commented Jan 24, 2024 at 12:05
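To make the difference CBroe describes concrete, here is a small sketch using only the Python standard library. It builds the precomposed (NFC) and decomposed (NFD) forms of the same filename and percent-encodes each one. The variable names are just for illustration. One plausible reason the image works locally (an assumption worth verifying on your own machines) is that macOS file systems generally treat the two forms as the same name, whereas typical Linux file systems compare the raw bytes, so a request for the NFD bytes does not find a file stored with NFC bytes.

```python
import unicodedata
from urllib.parse import quote

# The filename with a precomposed "é" (U+00E9), written as an escape for clarity.
name = "Akzeptanz_GoA3+_Vorschaubild_Symbolbild_pexels-j\u00e9shoots-253647.jpg"

nfc = unicodedata.normalize("NFC", name)  # "é" stays one code point: U+00E9
nfd = unicodedata.normalize("NFD", name)  # "é" becomes "e" + U+0301 (combining acute)

# Percent-encode the UTF-8 bytes; "+" is kept literal, as in the src attribute.
nfc_url = quote(nfc, safe="+")
nfd_url = quote(nfd, safe="+")

print(nfc_url)     # ends with j%C3%A9shoots-253647.jpg
print(nfd_url)     # ends with je%CC%81shoots-253647.jpg
print(nfc == nfd)  # False: same glyphs on screen, different code points
```

The two strings look identical when rendered, but they differ by one code point, and therefore by their bytes on disk and in the URL.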

1 Answer


"Fixing" the URL encoding in the HTML source would be the preferred solution, as mentioned in comments (from e%CC%81 to %C3%A9). (Or even "normalising" the filename, so URL-encoding is not an issue.)
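If you want to fix the HTML source in bulk, one way is to decode each src URL, NFC-normalize it, and re-encode it. The following is a minimal Python sketch of that idea, not a tested migration tool. The names nfc_url_path, fix_html, SAFE and SRC_RE are hypothetical, the regex is a rough match for src="..." attributes rather than a real HTML parser, and the set of characters left unencoded is an assumption you may need to adjust.

```python
import re
import unicodedata
from urllib.parse import quote, unquote

# Characters left unencoded when re-encoding (an assumption; adjust for your URLs).
SAFE = "/+:@!$&'()*,;=?#~"

def nfc_url_path(url: str) -> str:
    """Decode a percent-encoded URL, NFC-normalize it, and percent-encode it again."""
    decoded = unquote(url, encoding="utf-8")
    return quote(unicodedata.normalize("NFC", decoded), safe=SAFE)

# Rough match for src="..." attributes; a regex, not a real HTML parser.
SRC_RE = re.compile(r'(src=")([^"]*)(")')

def fix_html(html: str) -> str:
    """Rewrite every src="..." value in an HTML string to its NFC-encoded form."""
    return SRC_RE.sub(lambda m: m.group(1) + nfc_url_path(m.group(2)) + m.group(3), html)

page = '<img src="/path/Akzeptanz_GoA3+_Vorschaubild_Symbolbild_pexels-je%CC%81shoots-253647.jpg">'
print(fix_html(page))
```

This assumes the src values do not contain encoded reserved characters such as %2F or %3F, because the decode/encode round trip would turn those back into literal / and ?. The same unicodedata.normalize("NFC", ...) call could also be applied to the filenames themselves if you go the "normalising" route instead.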

However, if that is not an option then you could perhaps do a search/replace in .htaccess.

For example, near the top of the root .htaccess file:

# Replace "e%CC%81" with "%C3%A9"
RewriteCond %{THE_REQUEST} ^GET\s/(.*)e%CC%81([^\s]*)
RewriteRule ^path/ %1\%C3\%A9%2 [NE,L]

Include a /path/ in the RewriteRule pattern (as above) to limit the number of URLs tested.

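We match (and capture) the requested URL in the preceding condition against the THE_REQUEST server variable. THE_REQUEST contains the full HTTP request line (e.g. GET /path/file.jpg HTTP/1.1), so it preserves the URL-encoded URL exactly as it was requested, whereas the URL-path that the RewriteRule pattern matches against has already been URL-decoded.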

The literal % characters in the substitution string are backslash-escaped so that they cannot be interpreted as backreferences (although, since none of them is followed by a digit, that should not happen anyway).
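To see what %1 and %2 capture, the RewriteCond regex can be tried out in Python, whose re module behaves like Apache's PCRE for a simple pattern like this. This is only a simulation of the capture and substitution; substitute is a hypothetical helper, and it does not model anything else mod_rewrite does (such as adding the directory prefix to the relative result).

```python
import re
from typing import Optional

# The RewriteCond pattern from the .htaccess example, tested against THE_REQUEST.
COND = re.compile(r"^GET\s/(.*)e%CC%81([^\s]*)")

def substitute(the_request: str) -> Optional[str]:
    """Return what the substitution (%1, then %C3%A9, then %2) would expand to,
    or None if the condition does not match."""
    m = COND.match(the_request)
    if m is None:
        return None
    return m.group(1) + "%C3%A9" + m.group(2)

req = ("GET /path/Akzeptanz_GoA3+_Vorschaubild_Symbolbild_pexels-"
       "je%CC%81shoots-253647.jpg HTTP/1.1")
print(substitute(req))
# path/Akzeptanz_GoA3+_Vorschaubild_Symbolbild_pexels-j%C3%A9shoots-253647.jpg
```

Because (.*) is greedy, only the last e%CC%81 in a request is replaced. Also, THE_REQUEST is not updated by an internal rewrite, so in .htaccess the condition can match again on the next pass through the rules; that is one plausible (unverified) cause of a rewrite loop, and it is the kind of extra pass the END flag prevents.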


> Chrome shows Akzeptanz_GoA3+_Vorschaubild_Symbolbild_pexels-jéshoots-253647.jpg in the address bar though. When I change it in the src attribute it stays like that and loads

Chrome will often show the more friendly URL-decoded URL in the address bar, even if the underlying request is URL-encoded.

When you change it in the src attribute (to the URL-decoded version, I assume), the browser automatically "fixes" it by URL-encoding the URL in the HTTP request (you can examine the network traffic in the browser dev tools).
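Why the two URLs look the same in the address bar can be illustrated by decoding both in Python. This is not how Chrome implements its display; it is only a sketch showing that the decoded strings render identically while still being different code point sequences.

```python
import unicodedata
from urllib.parse import unquote

nfd_url = "pexels-je%CC%81shoots-253647.jpg"  # encoding used in the original src
nfc_url = "pexels-j%C3%A9shoots-253647.jpg"   # encoding that loads on the live server

nfd_text = unquote(nfd_url)  # "e" followed by U+0301
nfc_text = unquote(nfc_url)  # single code point U+00E9

print(nfd_text, nfc_text)    # both typically render as pexels-jéshoots-253647.jpg
print(nfd_text == nfc_text)  # False
print(unicodedata.normalize("NFC", nfd_text) == nfc_text)  # True
```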


2 Comments

Thanks so far! The RewriteRules throw a 500 error, though.
@Alex Try changing the L flag to END. You could also try (temporarily) changing this to an external redirect (e.g. RewriteRule ^path/ /%1\%C3\%A9%2 [R,NE,L]; note the slash prefix on the substitution string). However, I can't immediately see an error with this rule. Can you please check your server's error log for the details of the 500 error? Aside: If this only applies to .jpg files, then the regex could be further restricted.
