
I have a scenario where I am trying to extract the values from the following HTML and store them in variables. So far I have tried Cheerio, but it doesn't seem to work.

HTML:

var htmlbody = '<table style="width:100%; border: 1px solid #cccccc; border-collapse: collapse;" border=1 cellspacing="0" cellpadding="4"><tr><td style="background-color: #eeeeee; width: 200px;">Improvement Date (first date)</td><td>Nov 5, 2019 1:57:00 PM UTC</td></tr><tr><td style="background-color: #eeeeee">Document Call existed at</td><td>Nov 5, 2019 3:40:00 PM UTC</td></tr><tr><td style="background-color: #eeeeee">Document creation at</td><td>not available</td></tr><tr><td style="background-color: #eeeeee; width: 200px;">First document sent</td><td>not available</td></tr></table>';

What I have tried:

   const cheerio = require('cheerio')
   var html = htmlbody
   const txt = $(html).text()
   console.log(txt)

I want to extract the values below from the HTML, in the exact order they appear, and store each one in its own variable:

Nov 5, 2019 1:57:00 PM UTC
Nov 5, 2019 3:40:00 PM UTC
not available
not available

Note: the HTML snippet I have will not have any class or id attributes assigned.

1 Answer


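This can be achieved by loading the HTML with Cheerio and walking through the table rows. Each row has two cells: the first holds the label and the second holds the value. Please refer to the code below.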

const cheerio = require('cheerio');

const htmlbody = '<table style="width:100%; border: 1px solid #cccccc; border-collapse: collapse;" border=1 cellspacing="0" cellpadding="4"><tr><td style="background-color: #eeeeee; width: 200px;">Improvement Date (first date)</td><td>Nov 5, 2019 1:57:00 PM UTC</td></tr><tr><td style="background-color: #eeeeee">Document Call existed at</td><td>Nov 5, 2019 3:40:00 PM UTC</td></tr><tr><td style="background-color: #eeeeee">Document creation at</td><td>not available</td></tr><tr><td style="background-color: #eeeeee; width: 200px;">First document sent</td><td>not available</td></tr></table>';

// Parse the HTML; cheerio.load() returns the $ function used for all queries
const $ = cheerio.load(htmlbody);

// Visit each row in document order: the first <td> is the label, the second is the value
const val = {};
$('table tr').each((i, row) => {
    const cells = $(row).find('td');
    val[cells.eq(0).text().trim()] = cells.eq(1).text().trim();
});

// The extracted values are stored as key/value pairs:
// 'Improvement Date (first date)': 'Nov 5, 2019 1:57:00 PM UTC',
// 'Document Call existed at': 'Nov 5, 2019 3:40:00 PM UTC',
// 'Document creation at': 'not available',
// 'First document sent': 'not available'
console.log(val);

8 Comments

It's giving the error ReferenceError: $ is not defined
This is based on jQuery framework. You have to import the jquery script by adding the script - <script src="ajax.googleapis.com/ajax/libs/jquery/3.4.1/…>
@Thomas I have put up the code that will help you get the text above. See if this helps.
Two questions about your code: 1) Does the loop run in the same order as the elements appear in the HTML? 2) I want to store the values individually, not all together in one structure. How can I do that?
The code runs one loop over the rows, in the order they appear in the HTML, and within each row it reads the cells (columns) by position. As you iterate over the rows, you can extract the values sequentially and store them in whatever way you like. Let me know if this answers your question.
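To illustrate both follow-up questions, here is a minimal sketch in plain JavaScript. It assumes `val` is the object built by the answer's loop, reproduced here as a literal so the snippet runs on its own, and the four variable names are hypothetical. Because the keys are non-numeric strings, `Object.values` returns the values in insertion order, which is the order the rows were visited.

```javascript
// Stand-in for the object produced by the Cheerio loop in the answer
// (assumption: rows were inserted in document order, as that loop does).
const val = {
  'Improvement Date (first date)': 'Nov 5, 2019 1:57:00 PM UTC',
  'Document Call existed at': 'Nov 5, 2019 3:40:00 PM UTC',
  'Document creation at': 'not available',
  'First document sent': 'not available',
};

// Non-integer string keys keep insertion order, so Object.values()
// returns the values in the same order as the table rows.
const values = Object.values(val);

// Destructure into individual variables (names are hypothetical).
const [improvementDate, documentCallExistedAt, documentCreationAt, firstDocumentSent] = values;

console.log(improvementDate);
console.log(documentCallExistedAt);
console.log(documentCreationAt);
console.log(firstDocumentSent);
```

If you would rather look a value up by its label than by position, `val['Document creation at']` works just as well and does not depend on row order.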
