Replacing specific strings in HTML file

Question

I need to translate some of the content of HTML pages. I have many HTML documents as a list of files, and a map of translations, like this:

List<File> files
Map<String, String> translations

Only strings inside specific tags (p, h1..h6, li) should be translated. I want to end up with the same files I started with, but with the strings replaced.

Two approaches that don't work:

Plain find-and-replace: I don't want to translate strings inside comments or JavaScript. Another problem is that one original string can be a substring of another original string.
Parsing libraries like Jsoup: they clean up and fix the DOM structure, and I want the HTML structure left unmodified.

Any solutions?

You have basically described a 'write your own' effort. So what have you tried, and what didn't work?
– Randy, Commented Dec 9, 2013 at 15:17
How about an XML parser? Also, could you post an example of data that you want replaced and data that you want left alone?
– Pshemo, Commented Dec 9, 2013 at 15:19

aditsu quit because SE is EVIL · Accepted Answer · 2013-12-09 15:30:57Z

1

You pretty much have to use a proper HTML parser (which fixes the DOM structure), because otherwise there's no way to tell where an element starts and where it ends. There are all sorts of special cases and kinds of broken HTML, and if you want to handle them all, you are basically implementing a full HTML parser.
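To illustrate the point, here is a minimal sketch of what goes wrong without a parser. It is not a recommended solution: the class name `NaiveTagReplacer` and the sample strings are made up for this example, and the regex deliberately assumes that every `<p>` is closed by the next `</p>`.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class NaiveTagReplacer {
    // Naive assumption: a paragraph is whatever sits between <p> and the next </p>.
    private static final Pattern P_TAG = Pattern.compile("<p>(.*?)</p>", Pattern.DOTALL);

    static String translate(String html, Map<String, String> translations) {
        Matcher m = P_TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String text = m.group(1);
            // Replace the text only when the whole element body is a known key.
            String replaced = "<p>" + translations.getOrDefault(text, text) + "</p>";
            m.appendReplacement(out, Matcher.quoteReplacement(replaced));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> t = Map.of("Hello", "Hallo");
        // Well-formed input behaves as hoped.
        System.out.println(translate("<p>Hello</p>", t));
        // Markup inside a comment is translated too, which the question rules out.
        System.out.println(translate("<!-- <p>Hello</p> -->", t));
        // An unclosed <p> makes the match span both paragraphs, so nothing is translated.
        System.out.println(translate("<p>Hello\n<p>Hello</p>", t));
    }
}
```

Each extra rule added to cover such cases (comments, scripts, optional end tags, attributes) moves the code one step closer to being an HTML parser.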

The only other way I can think of (and one that is often used) is to put placeholders in the original files, such as <h1>${title}</h1> <p>${introduction}</p>, and find and replace them directly. I guess that would take a lot of work, though, if your files aren't already in this form.
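The placeholder approach could be sketched roughly as follows. This is only an outline under some assumptions: the class name `PlaceholderFiller` is hypothetical, placeholder names are assumed to consist of letters, digits and underscores, and reading from and writing back to the files is left out.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class PlaceholderFiller {
    // Matches ${name}; the allowed name characters are an assumption.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    static String fill(String html, Map<String, String> translations) {
        Matcher m = PLACEHOLDER.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            // Unknown keys are left untouched so missing translations stay visible.
            String value = translations.getOrDefault(key, m.group());
            // quoteReplacement keeps '$' and '\' in the translation literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> de = Map.of("title", "Willkommen", "introduction", "Hallo Welt");
        // Prints: <h1>Willkommen</h1> <p>Hallo Welt</p>
        System.out.println(fill("<h1>${title}</h1> <p>${introduction}</p>", de));
    }
}
```

Because only the placeholders are touched, the rest of each file (comments, scripts, broken markup) passes through byte for byte, which is the property the question asks for.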

