I am trying to count the words in a Microsoft Word document using JavaScript. I have managed to count the words in a plain text file. Is there a way to do the same for a Microsoft Word file, for example with the "JavaScript API for Office" or any other method?

See this Plunker: https://plnkr.co/edit/5TJfNiPxv275GuimdIlj?p=preview

<!DOCTYPE html>
<html>

  <head>
    <link rel="stylesheet" href="style.css">
    <script src="script.js"></script>
  </head>

  <body>
    <h2>Microsoft Word Document Count Words! Using JavaScript?</h2>
    <input type="file" accept=".doc,.txt,.docx" onchange="calculateWords()" id="textDoc"/>
    <div>
      <h1 id="fileInformation">File word Count after choose</h1>
    </div>
  </body>

</html>

JavaScript Code

function calculateWords() {
    if (window.File && window.FileReader && window.FileList && window.Blob) {
        var doc = document.getElementById("textDoc");
        var f = doc.files[0];
        if (!f) {
            alert("Failed to load file");
        } else {
            // TODO: validate the file type; only plain text is counted correctly.
            var r = new FileReader(); // create the file reader object

            // Attach the handler before starting the read.
            r.onload = function (e) {
                var contents = e.target.result.trim();
                // Split on any run of whitespace so newlines, tabs and
                // repeated spaces do not distort the count.
                var count = contents === "" ? 0 : contents.split(/\s+/).length;
                console.log(count);
                var info = document.getElementById("fileInformation");
                info.textContent = "Word count = " + count;
            };

            r.readAsText(f); // read the file as text
        }
    } else {
        alert("The File APIs are not fully supported by your browser.");
    }
}

1 Answer

Microsoft Word documents are not like plain text files: they are binary files.

As such, you would have to decode them into plain text, strip all formatting, remove headers and footers, and only then count the words. This is a significant challenge.

Just as a simple example, this is a piece of an RTF file:

{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard
This is some {\b bold} text.\par
}
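
To give a feel for the "strip all formatting" step, here is a minimal sketch that only copes with a tiny sample like the one above. The helper names `stripRtf` and `countWords` are made up for illustration, and the regular expressions are assumptions that ignore most of RTF (nested groups, escaped characters, Unicode, embedded objects), so treat it as a demonstration of the problem rather than a real parser.

```javascript
// Naive RTF-to-text sketch (hypothetical helper, not a real RTF parser).
function stripRtf(rtf) {
  return rtf
    // Drop the font table group; assumes it contains no nested braces.
    .replace(/\{\\fonttbl[^{}]*\}/g, " ")
    // Treat paragraph breaks as whitespace.
    .replace(/\\par\b/g, " ")
    // Remove remaining control words such as \rtf1, \ansi, \f0, \pard, \b.
    .replace(/\\[a-z]+-?\d*[ ]?/gi, "")
    // Replace group braces with spaces.
    .replace(/[{}]/g, " ");
}

// Count whitespace-separated words (hypothetical helper).
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

const sample = String.raw`{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard
This is some {\b bold} text.\par
}`;

console.log(countWords(stripRtf(sample))); // 5
```

Even for this toy input, the font name "Helvetica" would be counted as a word if the font table were not removed first, which hints at how much bookkeeping a real document needs.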

.DOC files are binary and much more complicated. DOCX files are different again: each one is a ZIP archive of XML parts.

So, the simple answer is: no, you can't do it.

3 Comments

Yes, I know it's a binary file, but did you take a look at the JavaScript API for Office? Can't we use it to work with Office files? dev.office.com/reference/add-ins/javascript-api-for-office
That question is far too broad for Stack Overflow. Try it and let us know if you run into a specific question.
Currently I am studying it. I will edit my question when I get some results. Thank you very much.
