
I am scraping a site and found this markup:

<table>
  <tr>
    <td>
      <b>Status:</b>ACTIVE;
      <b>Type:</b>CN - CONSTRUCTION
      <b>Added:</b>02/24/2012
    </td>
  </tr>
</table>

How do I get status, type, and added individually?

I know I will get downvotes for not posting any code I have tried, but I can't even think of what to try!

This website has poor HTML structure, and I can't find a way to do it.

  • What is the meaning of "I have the main TD as object"? Commented Aug 30, 2016 at 17:37
  • Sorry, ignore that; it was confusing. I removed it from the question, so please read my question again. Commented Aug 30, 2016 at 17:38
  • I am on mobile, so I can't post a full solution because it's difficult to type code, but here is the basic idea. Take the td's innerHTML and split the string on <b>. After the leading whitespace at index 0, you get Status:</b>ACTIVE; at index 1, Type:</b>CN - CONSTRUCTION at index 2, and so on. Then split each of those on </b>; the string at index 1 of each result is the value you need. Commented Aug 30, 2016 at 17:39
  • @Reddy great ... I am trying now. Commented Aug 30, 2016 at 17:41
  • You can also use a regex for this. Run it against the innerHTML to match all the text between </b> and <b>; that gives you an array of matches containing the required data. Commented Aug 30, 2016 at 17:48
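The two suggestions in the comments above (splitting the cell's innerHTML on <b> and </b>, or running a regex over it) can be sketched as plain string functions. This is only a rough sketch under stated assumptions: parseBySplit, parseByRegex, and the html sample string are hypothetical names, the sample is copied from the question rather than read from a live page, and the trailing semicolon after ACTIVE is assumed to be a separator, not part of the value.

```javascript
// Sample markup copied from the question; in a real scraper this
// would be the td's innerHTML (an assumption for illustration).
const html = `
      <b>Status:</b>ACTIVE;
      <b>Type:</b>CN - CONSTRUCTION
      <b>Added:</b>02/24/2012
`;

// Approach 1: split on <b>, then split each piece on </b>.
function parseBySplit(markup) {
  const result = {};
  // slice(1) drops the whitespace that precedes the first <b>.
  markup.split('<b>').slice(1).forEach(function (chunk) {
    const parts = chunk.split('</b>');
    if (parts.length < 2) return; // skip malformed chunks
    const key = parts[0].replace(/:\s*$/, '').trim();
    const value = parts[1].trim().replace(/;$/, '');
    result[key] = value;
  });
  return result;
}

// Approach 2: a regex that captures each label and the text after it,
// up to the next tag or the end of the string.
function parseByRegex(markup) {
  const result = {};
  const pattern = /<b>([^<]*?):?<\/b>([^<]*)/g;
  let match;
  while ((match = pattern.exec(markup)) !== null) {
    result[match[1].trim()] = match[2].trim().replace(/;$/, '');
  }
  return result;
}

console.log(parseBySplit(html));
console.log(parseByRegex(html));
```

Both functions return the same object for this sample. The regex stops each value at the next "<", so it would need adjusting if a value could itself contain markup.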

2 Answers

  • Use jQueryElement.text() to grab all the text.
  • Use String#split to split the string.

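// .text() drops the <b> tags, leaving one "Key:Value" pair per line.
var text = $('#content').text();
var lines = text.trim().split('\n');
lines.forEach(function(line) {
  // Split each line into its key and value at the colon.
  var pair = line.split(':');
  console.log("Key:  " + pair[0].trim() + "   Value:  " + pair[1].trim());
});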
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
<table>
  <tr>
    <td id="content">
      <b>Status:</b>ACTIVE;
      <b>Type:</b>CN - CONSTRUCTION
      <b>Added:</b>02/24/2012
    </td>
  </tr>
</table>
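
For reuse, the same text-splitting idea can be wrapped in a function that returns an object instead of logging. This is a minimal sketch, not part of the answer itself: parseCellText is a hypothetical name, the sample string mimics what .text() would return for the cell, and the sketch additionally assumes that a trailing semicolon is a separator rather than part of the value.

```javascript
// Hypothetical helper: turns "Key:Value" lines into an object.
function parseCellText(text) {
  var result = {};
  text.split('\n').forEach(function (line) {
    var idx = line.indexOf(':');   // split on the first colon only
    if (idx === -1) return;        // skip blank or key-less lines
    var key = line.slice(0, idx).trim();
    var value = line.slice(idx + 1).trim().replace(/;$/, '');
    result[key] = value;
  });
  return result;
}

// Sample mimicking $('#content').text() for the markup above (an assumption).
var sample = '\n      Status:ACTIVE;\n      Type:CN - CONSTRUCTION\n      Added:02/24/2012\n    ';
var fields = parseCellText(sample);
console.log(fields.Status, '|', fields.Type, '|', fields.Added);
// prints "ACTIVE | CN - CONSTRUCTION | 02/24/2012"
```

Splitting on the first colon only means a value that itself contains a colon, such as a time, is kept whole.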


The JavaScript nextSibling property returns the node immediately following an element, which here is the text node after each <b>. Select the b elements in the td and read the text that follows each one.

// For each <b> label, print it with the text node that follows it.
$("td > b").each(function() {
    console.log(this.innerText + " = " + this.nextSibling.nodeValue.trim());
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table>
  <tr>
    <td>
      <b>Status:</b>ACTIVE;
      <b>Type:</b>CN - CONSTRUCTION
      <b>Added:</b>02/24/2012
    </td>
  </tr>
</table>
