I would be curious to see what the average metrics of written English look like on one side, and of code on the other:

  • length of paragraphs
  • length of lines
  • size of words
  • chars used
  • ratio between alphabetic, numeric, and symbol characters
  • number of symbols per word
  • etc.

Maybe that alone could already discriminate between code and the rest. At the very least, I believe code, regardless of language, would show noticeably different metrics in many cases.

The good news is: you already have plenty of data to build your statistics upon.
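
For illustration, something along these lines could compute most of those metrics for a single block of text. This is just a rough Python sketch, not anything I actually ran; the function name and the choice of string.punctuation as the "symbol" class are placeholders, not a definitive recipe.

    import string

    def text_metrics(block: str) -> dict:
        """Compute a few of the metrics listed above for one block of text."""
        lines = [ln for ln in block.splitlines() if ln.strip()]  # non-empty lines
        words = block.split()
        chars = len(block)

        # Rough character classes: letters, digits, and ASCII punctuation as "symbols".
        alpha = sum(c.isalpha() for c in block)
        digit = sum(c.isdigit() for c in block)
        symbol = sum(c in string.punctuation for c in block)

        return {
            "chars_per_word": chars / max(len(words), 1),
            "chars_per_line": chars / max(len(lines), 1),
            "words_per_line": len(words) / max(len(lines), 1),
            "alpha_ratio": alpha / max(chars, 1),
            "digit_ratio": digit / max(chars, 1),
            "symbol_ratio": symbol / max(chars, 1),
            "symbols_per_word": symbol / max(len(words), 1),
        }

Run that over the prose parts and the code parts of a pile of existing posts and you would already have the baseline statistics I am talking about.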


Ok I'm back with some data to back my assumptions up. :-)

I did a quick and dirty test on your own post and on the first post I found on StackOverflow, with a pretty advanced tool: wc.

Here is what I got after running wc on the text part and on the code part of those two examples:

First, let's look at the English part:

  • The English part of your post (2635 chars, 468 words, 32 lines)
    • 5 chars/word, 82 chars/line, 14 words/line
  • The English part of the other post (1499 chars, 237 words, 12 lines)
    • 6 chars/word, 124 chars/line, 19 words/line

Pretty similar, don't you think?

Now let's take a look at the code part!

  • The code part of your post (174 chars, 13 words, 3 lines)
    • 13 chars/word, 58 chars/line, 4 words/line
  • The code part of the other post (4181 chars, 287 words, 151 lines)
    • 14 chars/word, 27 chars/line, 2 words/line

See how similar those metrics are to each other, and, more importantly, how different they are from the English metrics? And this is with just a very limited tool. I am now convinced you could get something really accurate by measuring more metrics (I'm thinking in particular of character statistics).
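
Just to show how crude a first cut could be, here is a minimal sketch of turning wc-style counts into a code/English guess. The cutoff values are made up from the four data points above, nothing more, so treat them as assumptions rather than tuned thresholds.

    def looks_like_code(chars: int, words: int, lines: int,
                        chars_per_word_cutoff: float = 9.0,
                        words_per_line_cutoff: float = 8.0) -> bool:
        """Naive guess: code tends to have long 'words' (identifiers, operators
        glued together) and few words per line. Cutoffs are arbitrary."""
        chars_per_word = chars / max(words, 1)
        words_per_line = words / max(lines, 1)
        return chars_per_word > chars_per_word_cutoff and words_per_line < words_per_line_cutoff

    # Counts reported above (chars, words, lines):
    print(looks_like_code(2635, 468, 32))   # English part of your post   -> False
    print(looks_like_code(4181, 287, 151))  # code part of the other post -> True

A real classifier would combine many more features, character statistics included, but even this two-number test separates the four samples above.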

I can haz cookie?
