1

My goal is to detect the exact kind of newline a string object is using.

When you open a file, you can enable universal newline support with the 'U' or 'rU' mode. But suppose you need to work on string objects that do not come from files. The re module would do, but it seems like overkill.

Is it possible to determine the kind of newline of a string object?

Of the many EOL representations, I'm interested in three: the usual "\n", "\r\n" for Windows, DOS, CP/M and OS/2, and "\r" for legacy Macs (before version 10).

3 Answers

2

While writing this question I found the answer that eluded me before.

The built-in string method str.splitlines(True) keeps the line endings on each chunk, which lets you determine the newline. From the docs:

For example, 'ab c\n\nde fg\rkl\r\n'.splitlines()
returns ['ab c', '', 'de fg', 'kl'],

while the same call with splitlines(True)
returns ['ab c\n', '\n', 'de fg\r', 'kl\r\n'].

Note: It is not exactly what I was looking for, since the newline is appended to chunks, so if you know a better way, please tell!
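The endings appended to each chunk can still be peeled off and tallied. The following is only a minimal sketch of that idea, not something from the docs: the name `newline_counts` is hypothetical, and it assumes you only care about the three sequences from the question.

```python
from collections import Counter


def newline_counts(s):
    """Tally the '\\r\\n', '\\r' and '\\n' endings found by splitlines(True)."""
    counts = Counter()
    for chunk in s.splitlines(True):
        # Test '\r\n' first so it is not mistaken for a lone '\n'.
        for ending in ('\r\n', '\r', '\n'):
            if chunk.endswith(ending):
                counts[ending] += 1
                break
        # Chunks with no ending (the last line), or ending on one of the
        # other boundaries Python 3 splits on (e.g. '\x0c'), are skipped.
    return counts


print(newline_counts('ab c\n\nde fg\rkl\r\n'))
```

A string that uses a single convention yields a counter with one key; several keys mean the string mixes newline styles.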

4 Comments

Wow, I have been using Python for a very long time and I never knew about this function. +1!
So do you just want to split the string into lines, or to know which newline sequence is used? And if the latter, what if several different sequences are used, as in your sample?
@AntonSavin I want to know what newline sequence is used. If there are different sequences, I'm interested in whatever comes closest to this task from the built-in functions and modules.
@naxa so what solution did you adopt finally?
1

Can't you check for the presence of possible line endings in an unsplit string? e.g.

def find_line_ending(s):
    if '\r\n' in s:  # Check this one first
        return '\r\n'
    if '\r' in s:
        return '\r'
    if '\n' in s:
        return '\n'
    return None  # No line endings in string

That at least means you know what will happen in case more than one type occurs in the same string.
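If you would rather have the mixed case resolve to whichever ending appears first in the string, instead of the fixed priority above, a variant could look like the sketch below. The name `first_line_ending` is hypothetical, and the "first one wins" rule is an assumption about what you want.

```python
def first_line_ending(s):
    """Return the first line ending that occurs in s, or None."""
    for i, c in enumerate(s):
        if c == '\n':
            return '\n'
        if c == '\r':
            # Slicing avoids an IndexError when '\r' is the last character.
            return '\r\n' if s[i + 1:i + 2] == '\n' else '\r'
    return None  # No line endings in string


print(repr(first_line_ending('one\ntwo\r\nthree')))
```

Unlike the `in` checks, this stops scanning at the first ending it finds.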

1

Here is a simple function that counts the number of occurrences of all three types of newline sequences in one pass through the string:

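def countNewlines(s):
    # Counts of '\r\n', lone '\r' and lone '\n'
    numRN = 0
    numR = 0
    numN = 0
    prev = ''
    for c in s:
        if c == '\n':
            if prev == '\r':
                numRN += 1
            else:
                numN += 1
        elif prev == '\r':
            # Previous '\r' was not followed by '\n'
            numR += 1
        prev = c
    if prev == '\r':
        # A trailing '\r' at the very end of the string
        numR += 1
    return (numRN, numR, numN)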
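Once you have the counts, one way to use them is to pick the most frequent sequence. The sketch below is an assumption about how you might do that, not part of the function above: `dominant_newline` is a hypothetical name, and it gets the same three counts from `str.count` (several passes instead of one) to stay self-contained.

```python
def dominant_newline(s):
    """Return the most frequent of '\\r\\n', '\\r', '\\n' in s, or None."""
    rn = s.count('\r\n')
    counts = {
        '\r\n': rn,
        # Every '\r\n' contains one '\r' and one '\n', so subtract those
        # to get the lone occurrences.
        '\r': s.count('\r') - rn,
        '\n': s.count('\n') - rn,
    }
    # On a tie, max() keeps the earliest key: '\r\n', then '\r', then '\n'.
    best = max(counts, key=counts.get)
    return best if counts[best] else None


print(repr(dominant_newline('a\nb\nc\r\n')))
```

The tie-breaking order is arbitrary; change the key order in `counts` if you prefer a different default.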
