Porting a Python 2.x file-like object to Python 3

Question

I'm working on improving Python 3.X support for PyFilesystem. It's an abstraction for filesystems. Each filesystem object has an open method that returns a file-like object.

The problem I'm facing is that the open method works like open on Python 2.X, but I would like it to work like io.open, which returns one of a number of binary or text mode streams.

What I could use is a way of taking a Python 2.X file-like object and returning an appropriate io stream object that reads from and writes to the underlying file-like object (but handles buffering, unicode, etc. if required).

I was thinking something like the following:

def make_stream(file_object, mode, buffering, encoding):
    # return an io stream instance
    pass

I can't see any straightforward way of doing that with the stdlib. But it strikes me as something the io module must be doing under the hood, since it's a software layer that provides the buffering/unicode functionality.

Martijn Pieters · Accepted Answer · 2013-03-25 13:44:27Z

Python 2 includes the same io library too.

Use from io import open to work the same across Python versions.
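As a rough illustration of what io.open hands back, the sketch below opens a scratch file in three modes and records the class of each stream. The temporary file and the variable names are only for demonstration and are not part of PyFilesystem.

```python
from __future__ import print_function

import io
import os
import tempfile

# Create a small scratch file to open in different modes.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with io.open(path, 'wb') as f:
        f.write(b'hello\n')

    # Unbuffered binary mode returns the raw stream itself.
    with io.open(path, 'rb', buffering=0) as raw:
        raw_type = type(raw).__name__

    # Buffered binary mode wraps the raw stream in a buffered reader.
    with io.open(path, 'rb') as binary:
        binary_type = type(binary).__name__

    # Text mode adds a decoding layer on top of the buffered reader.
    with io.open(path, 'r', encoding='utf-8') as text:
        text_type = type(text).__name__
        first_line = text.readline()
finally:
    os.remove(path)

print(raw_type, binary_type, text_type)
```

The three layers (raw, buffered, text) are the ones a custom open() would need to assemble around its own raw object.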

Your API should then offer an open() equivalent (called open() or make_stream()) that uses the io class library to provide the same functionality.

All that you need to do is create a class that implements the io.RawIOBase ABC, then use the other classes provided by the library to add buffering and text handling as needed:

import io

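class MyFileObjectWrapper(io.RawIOBase):
    def __init__(self, fileobj):
        super(MyFileObjectWrapper, self).__init__()
        # keep a reference to the underlying file-like object
        self._fileobj = fileobj

    def close(self):
        if not self.closed:
            # close the underlying file; `closed` is a read-only property
            # on io.IOBase, so let the base class update it
            self._fileobj.close()
            super(MyFileObjectWrapper, self).close()

    # ... etc. for what is needed (e.g. def readable(self),
    # def readinto(self, b), def write(self, b))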

def open(fileobj, mode='r', buffering=-1, encoding=None, errors=None, newline=None):
    # Mode parsing and validation adapted from the io/_iomodule.c module
    reading = writing = appending = updating = False
    text = binary = universal = False

    for c in mode:
        if c == 'r':
            reading = True
            continue
        if c == 'w':
            writing = True
            continue
        if c == 'a':
            appending = True
            continue
        if c == '+':
            updating = True
            continue
        if c == 't':
            text = True
            continue
        if c == 'b':
            binary = True
            continue
        if c == 'U':
            universal = reading = True
            continue
        raise ValueError('invalid mode: {!r}'.format(mode))

    rawmode = []
    if reading:   rawmode.append('r')
    if writing:   rawmode.append('w')
    if appending: rawmode.append('a')
    if updating:  rawmode.append('+')
    rawmode = ''.join(rawmode)

    if universal and (writing or appending):
        raise ValueError("can't use U and writing mode at once")

    if text and binary:
        raise ValueError("can't have text and binary mode at once")

    # exactly one of the three must be set, otherwise no buffered class is chosen below
    if reading + writing + appending != 1:
        raise ValueError("must have exactly one of read/write/append mode")

    if binary:
        if encoding is not None:
            raise ValueError("binary mode doesn't take an encoding argument")
        if errors is not None:
            raise ValueError("binary mode doesn't take an errors argument")
        if newline is not None:
            raise ValueError("binary mode doesn't take a newline argument")

    raw = MyFileObjectWrapper(fileobj)

    if buffering == 1:
        buffering = -1
        line_buffering = True
    else:
        line_buffering = False

    if buffering < 0:
        # any suitable default works; io.DEFAULT_BUFFER_SIZE is the stdlib's fallback
        buffering = io.DEFAULT_BUFFER_SIZE

    if not buffering:
        if not binary:
            raise ValueError("can't have unbuffered text I/O")

        return raw

    if updating:
        buffered_class = io.BufferedRandom
    elif writing or appending:
        buffered_class = io.BufferedWriter
    elif reading:
        buffered_class = io.BufferedReader

    buffer = buffered_class(raw, buffering)

    if binary:
        return buffer

    return io.TextIOWrapper(buffer, encoding, errors, newline, line_buffering)

The above code is mostly adapted from the Modules/_io/_iomodule.c io_open function, but with the raw file object replaced by the MyFileObjectWrapper subclass of the io.RawIOBase ABC.
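To make the wrapper idea concrete, here is a minimal read-and-write sketch. RawWrapper and LegacyFile are hypothetical names invented for this example, and LegacyFile only stands in for whatever Python 2 style object a filesystem returns; it assumes that object's read() and write() deal in byte strings.

```python
import io


class RawWrapper(io.RawIOBase):
    """Hypothetical adapter exposing a file-like object as a raw io stream."""

    def __init__(self, fileobj, readable=True, writable=False):
        super(RawWrapper, self).__init__()
        self._f = fileobj
        self._readable = readable
        self._writable = writable

    def readable(self):
        return self._readable

    def writable(self):
        return self._writable

    def readinto(self, b):
        # The buffered layer calls readinto(); fill its buffer from read().
        data = self._f.read(len(b))
        n = len(data)
        b[:n] = data
        return n

    def write(self, b):
        self._f.write(bytes(b))
        return len(b)

    def close(self):
        if not self.closed:
            try:
                self._f.close()
            finally:
                # The base class marks the stream as closed.
                super(RawWrapper, self).close()


class LegacyFile(object):
    """Stand-in for a Python 2 style file-like object holding bytes."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.closed = False

    def read(self, size=-1):
        if size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

    def close(self):
        self.closed = True


legacy = LegacyFile(u'caf\xe9\nsecond line\n'.encode('utf-8'))
stream = io.TextIOWrapper(io.BufferedReader(RawWrapper(legacy)),
                          encoding='utf-8')
lines = stream.readlines()
stream.close()
```

Only the raw layer is custom here; buffering and decoding come from the stock io.BufferedReader and io.TextIOWrapper classes, and closing the outermost stream propagates down to the wrapped object.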

edited Mar 25, 2013 at 13:44

answered Mar 25, 2013 at 10:55

Martijn Pieters

6 Comments

Will McGugan Over a year ago

Yes, I know. From Python 2.6 onwards. But I still need to provide the io interface for file-like objects. These aren't actual file objects. The data comes from a variety of different sources.

Martijn Pieters Over a year ago

@WillMcGugan: Right, I misunderstood you then. The io library open function is just a factory. It returns instances of a series of classes. There is nothing magical about that; you can implement the same thing in your own library.

Martijn Pieters Over a year ago

@WillMcGugan: You can use the io library abstract base classes as a template, of course, to make your objects match expectations. You can also reuse the io buffer and text wrapper classes, if you provide your own raw file object implementation.

Will McGugan Over a year ago

I had considered that, but it looks like a lot of work to provide the shim classes for each combination of mode/buffering. Guess I was hoping for something simpler.

Eryk Sun Over a year ago

You can also refer to the 2.6 implementation. It was ported to C in 2.7.
