
I'm trying to extract the comments from a JS file. My plan is a function that reads a .js file with fs.readFile(), runs a RegExp match with string.match(), and outputs an array of strings.

Here's an over-simplified example:

I have two files: class.js (the file to read) and parse.js (which performs the text parsing).

class.js:

/*
    by: Mike Freudiger
*/

/**
* one
* @returns 'Hello World'
*/
function one () {
        return 'Hello World';
}

alert();

/* end of file */

parse.js:

var fs = require('fs');

var file = fs.readFile('C:\\Users\\mikef\\Desktop\\node_regex_test\\class.js', 'utf8', function(err, doc) {
    var comments = doc.match(/(\/\*\*(.|\n)+?\*\/)/g);
    console.log(comments);
});

When I run node parse.js, the console output is null.

However, when I run the same regex match on a multiline string, I get the expected output:

var doc = `/*
        by: Mike Freudiger
    */

    /**
    * one
    * @returns 'Hello World'
    */
    function one () {
            return 'Hello World';
    }

    alert();

    /* end of file */`

Any idea why the readFile() string would behave differently than a string literal?

Also, I realize there may be a better way to get these comments out (another npm package, for example), but right now I just want to know why these two strings are different.

    Could it be that your file uses \r\n or \r as line separators? Commented Feb 11, 2019 at 23:32

1 Answer

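As mentioned by vsemozhetbyt, it seems that the newlines in class.js are either \r\n or \r. In a JavaScript regex, . does not match line terminators (\r included), so (.|\n) cannot get past a \r and the whole match fails.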

One of the simplest (and fastest) ways to match these newlines is to use [\s\S] instead of (.|\n) in your regex.

Thus you get:

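var fs = require('fs');

// fs.readFile() is asynchronous and returns nothing, so there is no value to assign.
fs.readFile('C:\\Users\\mikef\\Desktop\\node_regex_test\\class.js', 'utf8', function(err, doc) {
    if (err) throw err; // e.g. the file is missing or unreadable
    // [\s\S] matches any character, including \r and \n.
    var comments = doc.match(/(\/\*\*[\s\S]+?\*\/)/g);
    console.log(comments);
});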
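
To see the difference without touching the file system, here is a minimal sketch. It assumes class.js uses \r\n line endings, and the sample string and variable names are made up for illustration:

```javascript
// Hypothetical sample: the JSDoc comment from class.js with Windows (\r\n) line endings.
const crlfDoc = "/**\r\n* one\r\n* @returns 'Hello World'\r\n*/\r\nfunction one () {}\r\n";
// The same text with Unix (\n) line endings.
const lfDoc = crlfDoc.replace(/\r\n/g, "\n");

const original = /(\/\*\*(.|\n)+?\*\/)/g; // pattern from the question
const fixed = /(\/\*\*[\s\S]+?\*\/)/g;    // pattern with [\s\S]

// "." never matches \r, and the \n alternative does not either,
// so the original pattern stops at the first \r.
const originalOnCrlf = crlfDoc.match(original);
const originalOnLf = lfDoc.match(original);
const fixedOnCrlf = crlfDoc.match(fixed);

console.log(originalOnCrlf); // null
console.log(originalOnLf);   // array with the one JSDoc comment
console.log(fixedOnCrlf);    // array with the one JSDoc comment (still containing \r\n)
```

Note that the matched strings keep their \r characters; if you need clean lines afterwards, you would have to strip or normalize them yourself.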

3 Comments

That works! Does that mean newlines are represented differently in a file than in a multiline string? Could anyone point to documentation?
Template literals always use \n: even if you copy text from a \r\n file and paste it into a template literal in your source code, after parsing it will contain \n separators.
You can check it here: exploringjs.com/es6/… Or, for the more formal description, see the ECMAScript specification: ecma-international.org/ecma-262/6.0/…
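
A quick way to observe this normalization is sketched below. It uses eval only so that the \r\n characters inside the template literal's source text are explicit; the variable names are made up for illustration:

```javascript
// Source text of a template literal that contains a real CR LF pair.
// The escapes in this ordinary string produce the actual \r and \n characters.
const source = "`a\r\nb`";

// Parsing that source as a template literal normalizes \r\n to \n.
const fromTemplate = eval(source);

// An ordinary string (like what fs.readFile returns for a CRLF file) is left untouched.
const fromFile = "a\r\nb";

console.log(JSON.stringify(fromTemplate)); // "a\nb"
console.log(JSON.stringify(fromFile));     // "a\r\nb"
```

So the pasted template literal and the string read from disk can differ by exactly the \r characters that (.|\n) cannot match.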
