I wrote a piece of Python code to practice Python coroutines, following the explanation by A. Jesse Jiryu Davis.

  • First, I define a coroutine named 'get' to fetch the content of some URL.
  • Then I define a Task class to drive the coroutine to completion.
  • Then I create two Tasks which open two different URLs.

But I got the error message KeyError: '368 (FD 368) is already registered' on the line selector.register(s.fileno(), EVENT_WRITE).

This error occurs because the two calls to socket.socket() return the same file descriptor. File descriptor 368 was already allocated by the first call, yet the second call returns it again.

  • Then I add an expression which modifies an outer variable.

This time the error message is gone! If you want to run the code yourself, you can un-comment arr.append(self.__init__) in the Task.step method to see the error-free output.

EDIT: If I explicitly trigger Python garbage collection, the error occasionally goes away. But WHY only occasionally?

After several days of searching and reading the Python documentation, I still have no idea why this is happening. Did I just miss some Python 'gotcha'?

I'm testing with Python 3.6. The code is below; I have removed everything irrelevant so that it stays minimal and on topic:

#! /usr/bin/python

from selectors import DefaultSelector, EVENT_WRITE
import socket
import gc

selector = DefaultSelector()
arr = [1, 2, 3]

class Task:

    def __init__(self, gen):
        self.gen = gen
        self.step()

    def step(self):
        next(self.gen)
        # arr.append(self.__init__)

def get(path, count = 0):
    s = socket.socket()
    print(count, 'fileno:', s.fileno())
    s.connect(('www.baidu.com', 80))
    selector.register(s.fileno(), EVENT_WRITE)
    yield

Task(get('/foo',1))
gc.collect()
Task(get('/bar',2))
  • Immediately after creating and connecting each socket, the function exits without saving any reference to the socket object. It therefore gets automatically closed as a part of garbage collection, and its fileno is immediately available for reuse. Commented Oct 20, 2017 at 16:03
  • This can't explain why adding some other operation like 'object.append(self.step)' helps get a new fileno, while 'object.append(1234)' does not. Commented Oct 20, 2017 at 16:08
  • Task() -> self.step -> object -> Task(). You built a circular reference and one of those objects has a reference to the generator, hence it will only be cleaned by the next garbage collector run. Which makes the bug appear non-deterministically. Commented Oct 20, 2017 at 16:24
  • I think it's Task() -> self.step -> object -> arr , where arr is [1, 2, 3] from the original code. Do you mean that after I uncomment 'object.append(self.__init__)', it's a 'Task() -> self.step -> object -> Task()' circular reference? Commented Oct 20, 2017 at 16:39
  • Anyway, using selector.register(s, EVENT_WRITE) should solve your problem. In that case, selector holds the reference to the socket object. Commented Oct 20, 2017 at 16:58

2 Answers

Note:

  1. Your get('/foo', 1) is an anonymous generator object, and Task(get('/foo', 1)) is an anonymous Task object.
  2. GC is short for Python garbage collection/collector.

The reference chain of your original code is:

selector --> # socket_fd
anonymous Task(get('/foo', 1)) --> anonymous get('/foo', 1) --> s

So the anonymous Task(get('/foo', 1)) object is collected by GC as soon as the statement finishes, because nothing holds a reference to it:

Python GC reclaims an object as soon as it finds that the object's reference count is 0. But Python GC does not run as a thread, so this may not happen at the very moment the reference count drops to 0.

The anonymous get('/foo', 1) generator is collected next, and then s. At this point s has been collected and closed, and its socket fd number (#368 in your example) has been released.

But the socket fd number (#368) is still registered with selector.

Then you run Task(get('/bar', 2)), and a new socket s asks for a "new" fd. Because #368 is available again (as long as nothing else has claimed it in the meantime), you get #368 as the socket fd again, and registering it a second time raises the KeyError.
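
The release-and-reuse sequence described above can be reproduced without any network access. The following is a minimal sketch, assuming CPython (which closes a socket as soon as its last reference disappears) and a POSIX system (where a new descriptor is typically the lowest free number). The helper names open_and_drop and fd_pair are hypothetical and not part of the original code.

```python
import socket

def open_and_drop():
    """Create a socket, report its fd, and drop the only reference to it."""
    s = socket.socket()
    return s.fileno()  # s dies on return; CPython closes the fd right then

def fd_pair():
    """Return the fd of a dropped socket and the fd of the next socket."""
    first_fd = open_and_drop()
    s2 = socket.socket()  # free to reuse the number released above
    try:
        return first_fd, s2.fileno()
    finally:
        s2.close()

first_fd, second_fd = fd_pair()
print(first_fd, second_fd)  # on Linux with CPython these are usually equal
```

If the two numbers match, a selector that still has the first number registered would reject the second registration, which is the KeyError from the question.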

un-comment arr.append(self.__init__) in Task.step()

After you un-comment arr.append(self.__init__) in the Task.step() method, the global arr holds a reference (through the bound method self.__init__) to Task(get('/foo', 1)). Task(get('/foo', 1)) in turn holds a reference to get('/foo', 1), which holds a reference to your local socket s. The reference chain looks like this:

arr --> anonymous Task(get('/foo', 1)) --> anonymous get('/foo', 1) --> s

arr stays alive for the whole program, so s will not be collected by GC. The later s = socket.socket() call will not get the same fd, because that fd is still held by the first s.
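
The effect of that reference chain can be observed with a weak reference. This is only a sketch under CPython's reference-counting behaviour: Resource is a hypothetical stand-in for the socket, and the keep parameter and resource_alive helper are illustrative additions that do not appear in the original code.

```python
import weakref

class Resource:
    """Hypothetical stand-in for the socket created inside get()."""

def get(refs):
    r = Resource()
    refs.append(weakref.ref(r))  # lets us observe r without keeping it alive
    yield

class Task:
    def __init__(self, gen, keep=None):
        self.gen = gen
        next(self.gen)
        if keep is not None:
            keep.append(self.__init__)  # bound method -> Task -> gen -> r

def resource_alive(keep=None):
    """Run one anonymous Task and report whether its Resource survived."""
    refs = []
    Task(get(refs), keep)
    return refs[0]() is not None

arr = [1, 2, 3]
print(resource_alive())     # False on CPython: nothing references the Task
print(resource_alive(arr))  # True: arr keeps the whole chain alive
```

Appending a plain value such as 1234 would not have this effect, because an integer holds no reference back to the Task.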

use s instead of s.fileno()

If you use selector.register(s, ...) instead of selector.register(s.fileno(), ...), the global selector holds a reference to your local s. The reference chain is:

selector --> s

Although the two anonymous objects are gone, the sockets s created in get('/foo', 1) and get('/bar', 2) are still held by the global selector, so the two fds will not collide.
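
A small sketch of this fix, with no network access: registering the socket object itself keeps it open, so the second socket must receive a different fd. The helper names open_registered and register_two are hypothetical, and the cleanup loop is added here only to leave no sockets behind.

```python
import socket
from selectors import DefaultSelector, EVENT_WRITE

def open_registered(sel):
    """Create a socket, register the object itself, and drop the local name."""
    s = socket.socket()
    sel.register(s, EVENT_WRITE)  # the selector key stores s as key.fileobj
    return s.fileno()

def register_two():
    sel = DefaultSelector()
    try:
        fd1 = open_registered(sel)
        fd2 = open_registered(sel)  # first socket is still open, so no reuse
        return fd1, fd2, len(sel.get_map())
    finally:
        # Clean up: unregister and close every socket the selector holds.
        for key in list(sel.get_map().values()):
            sel.unregister(key.fileobj)
            key.fileobj.close()
        sel.close()

fd1, fd2, registered = register_two()
print(fd1 != fd2, registered)  # True 2
```

The trade-off is that the selector now owns the sockets' lifetime, so they should be unregistered and closed explicitly once they are no longer needed.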

cyclic reference?

The answer is No. Your situation has nothing to do with cyclic reference.

gc.collect?

Well, if you replace it with time.sleep(0.02), you will observe the same phenomenon. This may be caused by:

  1. Sockets coming and going, driven by other processes on your system.
  2. The Python GC thread taking some time to "find" that s should be collected, or being in the middle of a collection.

@allen He, Thank you so much for your reply.

  1. I researched further, and the conclusion is: the problem is not caused by gc.

The "garbage collector" referred to in gc is only used for resolving circular references. In Python (at least in the main C implementation, CPython) the main method of memory management is reference counting. In my code, the result of Task() has no references, so it will always be disposed of immediately. There's no way of preventing that, whether you use gc.disable() or anything else.

This is quoted from @Daniel Roseman; see:

how to prevent Python garbage collection for anonymous objects?

  2. Also, Python's GC is not a thread. Instead, it runs synchronously.

See: Why python doesn't have Garbage Collector thread?

  3. So the final answer to the question in my title is:

No, socket.socket() will not return an occupied file descriptor. If it returns a seemingly "occupied" fd, that implies the former one has already been released.
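
The difference between reference counting and the gc module quoted above can be checked directly. This is a minimal sketch for CPython only; Node and refcount_vs_gc are hypothetical names used for illustration.

```python
import gc
import weakref

class Node:
    """Hypothetical plain object used only for this demonstration."""

def refcount_vs_gc():
    was_enabled = gc.isenabled()
    gc.disable()  # turns off only the automatic cyclic collector
    try:
        a = Node()
        ref_a = weakref.ref(a)
        del a  # last reference gone: freed by reference counting
        plain_freed = ref_a() is None

        b = Node()
        b.me = b  # reference cycle: the count never reaches 0 on its own
        ref_b = weakref.ref(b)
        del b
        cycle_survives = ref_b() is not None

        gc.collect()  # an explicit collection still works while disabled
        cycle_freed = ref_b() is None
        return plain_freed, cycle_survives, cycle_freed
    finally:
        if was_enabled:
            gc.enable()

print(refcount_vs_gc())  # (True, True, True) on CPython
```

The first value shows that gc.disable() does not keep an unreferenced object alive, which is why the socket in the question is closed immediately regardless of the gc settings.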
