
Suppose I have this piece of HTML:

<p>And finally, how about some <a href="http://www.yahoo.com/">Links?</a></p>

and I want to access and modify the "And finally, how about some" part only, and get this:

<p>new text <a href="http://www.yahoo.com/">Links?</a></p>

I can't seem to figure out how. Here's what I've tried so far:

Document doc = null;
try {
    doc = Jsoup.connect("http://csb.stanford.edu/class/public/pages/sykes_webdesign/05_simple.html").userAgent("Mozilla").get();
} catch (IOException e1) {
    e1.printStackTrace();
}
Elements d = doc.body().children();
Element e = d.get(20); // Assuming the HTML line in question is found at index 20
e.text("new text"); // just outputs <p>new text</p>, which is not good for me

It seems that I can access it by

Element e = d.get(20);
System.out.println("\n"+e.ownText()); //outputs: And finally, how about some

but modifying it doesn't work.

Element e = d.get(20);
String s = e.toString().replace(e.ownText(), "new text");
e.text(s);
System.out.println(e.toString());

The output for the code above is

<p>&lt;p&gt;changed &lt;a href=&quot;http://www.yahoo.com/&quot;&gt;Links?&lt;/a&gt;&lt;/p&gt;</p>

It seems to be treating the tags as literal text, but I want them as real < and > characters, because I then have to rebuild the webpage with the new text.

Any kind of help will be hugely appreciated.

1 Answer

How about something like

Element e = d.get(20);
e.text("new text"); 
e.append("<a href=\"http://www.yahoo.com/\">Links?</a>");//lets you add HTML.

If the link is dynamic and you don't want to change it, you can store it first and append it again afterwards:

Element e = d.get(20);
Element link = e.child(0);
e.text("new text"); 
e.append(link.toString());
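
An alternative that avoids removing and re-appending the link is to edit the element's direct text nodes instead. The following is only a minimal sketch: the class name OwnTextDemo and the helper replaceOwnText are made up for illustration, and the HTML is parsed from a string rather than fetched from the page in the question. It relies on jsoup's Element.textNodes() and TextNode.text(String).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class OwnTextDemo {

    // Hypothetical helper: replaces the first non-blank text node that is a
    // direct child of the element. Child elements such as <a> are not touched.
    static void replaceOwnText(Element element, String newText) {
        for (TextNode node : element.textNodes()) {
            if (!node.isBlank()) {
                node.text(newText);
                return;
            }
        }
    }

    public static void main(String[] args) {
        String html = "<p>And finally, how about some "
                + "<a href=\"http://www.yahoo.com/\">Links?</a></p>";
        Document doc = Jsoup.parseBodyFragment(html);
        Element p = doc.select("p").first();

        // The trailing space keeps the new text separated from the link.
        replaceOwnText(p, "new text ");

        System.out.println(p.outerHtml());
    }
}
```

Because only the text node changes, the link keeps its attributes and position, so nothing has to be serialized to a string and parsed again.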

4 Comments

Not exactly what I was looking for, but your answer kinda gave me an idea. So, your help was indirectly appreciated, thank you. :)
Can I ask what solution you were looking for?
You see, the code I posted was hardcoded; I only used it to test where I was going wrong. My real aim is to extract the text from a website, translate it, and rebuild the webpage locally. I'm using a loop that extracts the children of each element and translates them, but that way I couldn't seem to access the parent element's own text. For example, for a list like "This is a list: 1. Item A 2. Item B", I was only getting "This is a list: 1. new text A 2. new text B", where "This is a list:" should also have been set to "new text".
@SHA33 OK, makes sense. So basically my answer solves the problem described in the question, but only points you in a direction for your real case. I was just wondering if I could answer your question better, but without that additional information it seems very unlikely.
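
For the translation use case described in the comments, one possible approach is to visit every element and rewrite only its direct text nodes, so that a parent's own text and its children's text are each handled exactly once. This is a hedged sketch, not the asker's actual code: the class TranslateTextNodes, the method translateAll, and the upper-casing stand-in for a real translator are all assumptions made for illustration.

```java
import java.util.function.UnaryOperator;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class TranslateTextNodes {

    // Hypothetical helper: applies `translate` to every non-blank text node
    // under `root`. Each text node is a direct child of exactly one element,
    // so no piece of text is processed twice.
    static void translateAll(Element root, UnaryOperator<String> translate) {
        for (Element element : root.getAllElements()) {
            for (TextNode node : element.textNodes()) {
                if (!node.isBlank()) {
                    node.text(translate.apply(node.text()));
                }
            }
        }
    }

    public static void main(String[] args) {
        String html = "<div>This is a list: "
                + "<ol><li>Item A</li><li>Item B</li></ol></div>";
        Document doc = Jsoup.parseBodyFragment(html);

        // Stand-in for a real translation call: just upper-cases the text.
        translateAll(doc.body(), String::toUpperCase);

        System.out.println(doc.body().html());
    }
}
```

Since the markup itself is never converted to a string and back, tags stay as tags and are not escaped into &lt; and &gt;.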
