I'm trying to count the actual characters in a string as written, regardless of their meaning. For example, I want \n to count as two characters (a backslash and an n), not as a single <EOL> character.

So a string like a\nb should have length 4, not 3.

Details

(1) \n in JavaScript

var a = `a
b`;

console.log(a.length)
>>> 3

But in Python:

>>> a = r'a\nb'
>>> len(a)
4

(2) Smart quotes

JavaScript

var a = 'a“b';
console.log(a.length);
>>> 3

Python

>>> b = 'a“b'
>>> len(b)
5

I've tried many approaches (like breaking the string into an array, but \n still ends up in a single cell).

Any ideas?

Comments

  • You can't count escape characters in a string. If you have a string literal with \t it is compiled identically to a string with a literal tab character, for example. Commented Sep 27, 2018 at 14:35
  • There isn't a slash character in the string. It only appears in the JavaScript source code. Commented Sep 27, 2018 at 14:35
  • Consider two strings, "\u00e6" and "æ". These are identical strings. Once defined, there is no way to reverse the process. See: jsbin.com/mubajuf/edit?html,output Commented Sep 27, 2018 at 14:47
  • Regular Python 2 strings are *not* unicode, they are just plain bytes. So “ is three bytes long but 1 character long. Commented Sep 27, 2018 at 14:50
  • You can do String.raw`a\nb`.length (the same way as Python's r prefix). Commented Sep 27, 2018 at 14:57

2 Answers

OK, I've tried to answer in the comment but it wasn't pleasant to read.

The issue splits into two separate problems:

  • counting raw chars
  • counting UTF-8 byte length instead of UTF-16 length

I will address both issues with examples.

counting raw chars

The only way to treat '\n' in source code as two chars, one backslash and one letter n, is to use a tag function with a template literal.

const rawlength = tpl => tpl.raw.join('').length;

`a\nb`.length;   // 3
rawlength`a\nb`; // 4

You can copy and paste the code above and see the two different results. Bear in mind that calling rawlength without parentheses is not a typo; it is how tagged template literals work.

Also bear in mind if you use a template literal like the following one

`a
b`

its length will still be 3, because there is no backslash in there: the line break is a single newline character, as it should be.

In Python, the equivalent would be

len("""a
b""")

That's 3 as well.

Edit: the Python r in JavaScript

The equivalent of r in JavaScript would be:

const r = (t, ...v) => {
  const result = [t.raw[0]];
  const length = t.length;
  for (let i = 1; i < length; i++)
    result.push(v[i - 1], t.raw[i]);
  return result.join('');
};

So that:

r`a\nb`

Would produce what you expect.
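For comparison, here is a minimal sketch using the built-in String.raw tag, which one of the comments on the question also suggests. It should behave like the hand-written r above for these inputs; the variable names are only illustrative.

```javascript
// String.raw is a standard built-in tag function that keeps
// escape sequences exactly as they were typed in the source.
const raw = String.raw`a\nb`; // backslash and n stay separate characters
const cooked = `a\nb`;        // \n is translated to one newline character

console.log(raw.length);    // 4
console.log(cooked.length); // 3

// Interpolated values are inserted as-is; only the literal parts stay raw.
const name = 'x';
const withValue = String.raw`${name}\t${name}`;
console.log(withValue.length); // 4: x, backslash, t, x
```

This only helps at definition time: the tag has to be applied to the literal itself, not to a string that has already been created.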

You can wrap result.join('') in the following trick to get the length in bytes instead.

counting UTF-8 byte length instead of UTF-16 length

This is an old trick to always count bytes:

unescape(encodeURIComponent('a“b')).length;

That's a 5, because encodeURIComponent returns a UTF-8, URL-friendly version of the text, and unescape creates one char for each %XX it encounters.

In this case 'a“b' becomes a%E2%80%9Cb, which is a and b plus 3 URL-encoded bytes.
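Since unescape is a deprecated legacy function, the same byte count can also be obtained with TextEncoder. This is a sketch assuming an environment where TextEncoder is available as a global (modern browsers and recent Node.js versions); byteLength is a hypothetical helper name, not something from the answer.

```javascript
// Hypothetical helper: number of bytes in the UTF-8 encoding of a string.
// TextEncoder always encodes to UTF-8.
const byteLength = (text) => new TextEncoder().encode(text).length;

console.log('a“b'.length);      // 3 (UTF-16 code units)
console.log(byteLength('a“b')); // 5 (a, b, plus 3 bytes for the smart quote)
```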

3 Comments

So rawlength`a\nb`; // 4 is exactly what I need. The problem is that the string is inside a variable :/ So how do I run this with a variable a?
Like in Python, where you need to put r up front (I've edited my answer to make clear how to do that in JS too): if you assign s = 'a\nb', it has already been translated into a 3-char string. So you either use a template literal tag at definition time, or you are out of luck, because escaped chars are translated. Just as \x27 is a quote in JS and not a 4-char string, \n and the others are no different. However, if you literally type a\nb into an input field, it already arrives as a 4-char string, so I'm not sure what your issue is.
I'll accept the answer, as r is a helpful match for Python's r prefix, and I understand what you were trying to say now! Thank you very much!

Did you try replacing the \n with a two-character placeholder? Something like:

'a\nb'.replace('\n', '--').length // returns 4

1 Comment

Not exactly, because \n was just an example; it can be any hidden character out there, and I don't want to map them manually in code.
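For a string that is already in a variable, one way to avoid mapping hidden characters by hand is to re-escape it with JSON.stringify and measure the result. This is only a sketch: escapedLength is a hypothetical helper, and what it measures is one possible escaped spelling of the string, not the original source text (for instance, '\x0a' and '\n' both come back as \n). Double quotes and backslashes get escaped as well, so they count as two characters each.

```javascript
// Hypothetical helper: length of the JSON-escaped form of a string,
// without the surrounding double quotes that JSON.stringify adds.
const escapedLength = (text) => JSON.stringify(text).slice(1, -1).length;

const a = 'a\nb';
console.log(a.length);               // 3
console.log(escapedLength(a));       // 4: a, backslash, n, b
console.log(escapedLength('a\tb'));  // 4: tab is re-escaped as \t
console.log(escapedLength('a“b'));   // 3: smart quotes are not escaped
```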
