
I have a follow-up question to my previous post:

How to extract texts from html markup

Oriol's answer helped me a lot with splitting the HTML markup around the table structure.

However, there is another issue.

var project =[''];

$('#htmlData').contents().each(function(){
    if($(this).is('table')){
         //do something with table
         project.push('end of table');  // this line is the problem
    }else{
        project[project.length-1] += (
            this.nodeType === 3  ?  $(this).text()  :
            (this.nodeType === 1  ?  this.outerHTML  :  '')
        );
    }
});

for(var i=0; i<project.length; ++i){
    project[i] = project[i].replace(/\s+/g,' ') // Collapse whitespaces
    .replace(/^\s/,'') // Remove whitespace at the beginning
    .replace(/\s$/,''); // Remove whitespace at the end
}

Let's say I have HTML data like the following:

<em>first part</em> of texts here

    <table>
    ......
    ......
    </table>

<em>second part</em> of texts

My project array ends up like:

    // 2 elements
    ['<em>first part</em> of texts here', 'end of table <em>second part</em> of texts']

but my desired result is:

    // 3 elements
    ['<em>first part</em> of texts here', 'end of table', '<em>second part</em> of texts']

'end of table' is what I push to the array when the loop reaches a table element.

How do I accomplish this? Thanks for the help!

1 Answer

The problem is that you never create a new position in the array after the table has been processed. Once 'end of table' has been pushed, project.length-1 refers to that entry, so the next non-table content is simply concatenated onto it.
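To see the behaviour in isolation, here is a minimal sketch that uses plain strings in place of the DOM nodes (the string values are placeholders, not your real data):

```javascript
var project = [''];
project[project.length - 1] += 'first part';   // appended to index 0
project.push('end of table');                  // array now has 2 entries
project[project.length - 1] += ' second part'; // length - 1 is now index 1

console.log(project);
```

After the push, the last index is the 'end of table' entry, so the second append lands there instead of in a new slot.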

Try this:

var project = [''],
    j = 0; // index of the entry that non-table content is appended to

$('#htmlData').contents().each(function(){
    if($(this).is('table')){
        //do something with table
        project.push('end of table');
        j = project.length; // next non-table content starts a new entry
    }else{
        if (project[j] === undefined) project[j] = ""; // create the new entry on first use
        project[j] += (
            this.nodeType === 3  ?  $(this).text()  :
            (this.nodeType === 1  ?  this.outerHTML  :  '')
        );
    }
});
for(var i=0; i<project.length; ++i){
    project[i] = project[i].replace(/\s+/g,' ') // Collapse whitespaces
    .replace(/^\s/,'') // Remove whitespace at the beginning
    .replace(/\s$/,''); // Remove whitespace at the end
}
console.log(project);

I'm sure there's a cleaner way, but this should give you the idea.
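One possible cleaner variant, as a rough sketch only: collect the text into a buffer and flush it whenever a table is reached. The function name splitAroundTables and its marker parameter are made up for this example. It takes a plain array of nodes, so it does not depend on jQuery itself. Unlike the code above, it skips chunks that are empty after whitespace cleanup, which may or may not be what you want.

```javascript
// Hypothetical helper: splits a list of sibling nodes into cleaned-up chunks,
// adding a marker entry wherever a <table> element occurs.
function splitAroundTables(nodes, marker) {
    var chunks = [];
    var current = '';

    function flush() {
        // Collapse runs of whitespace and trim both ends
        var cleaned = current.replace(/\s+/g, ' ').trim();
        if (cleaned !== '') chunks.push(cleaned); // assumption: drop empty chunks
        current = '';
    }

    nodes.forEach(function (node) {
        if (node.nodeType === 1 && node.nodeName.toLowerCase() === 'table') {
            // do something with the table here
            flush();
            chunks.push(marker);
        } else if (node.nodeType === 3) {
            current += node.textContent; // text node
        } else if (node.nodeType === 1) {
            current += node.outerHTML;   // any other element
        }
    });

    flush(); // whatever follows the last table
    return chunks;
}
```

With jQuery it could be called as splitAroundTables($('#htmlData').contents().toArray(), 'end of table').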
