
Problem:

I have one folder (json_folder_large) that holds more than 200,000 JSON files, and another folder (json_folder_small) that holds 10,000 JSON files.

import os
lst_file = os.listdir("tmp/json_folder_large") # this returns an OSError
OSError: [Errno 5] Input/output error: 'tmp/json_folder_large'

I get an OSError when I call listdir on the directory path. I am sure there is no problem with the path itself, because the same call on the other folder works without this OSError.

lst_file = os.listdir("tmp/json_folder_small") # no error with this
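One way to separate a bad path from a read failure (my own diagnostic sketch, not from the discussion; `probe_dir` is a made-up name) is to stat the directory first and only then force a single read of its entries. With an EIO like the one above, the stat typically succeeds while reading the entries fails:

```python
import errno
import os

def probe_dir(path):
    """Classify access to a directory: 'missing', 'stat-ok-read-failed', or 'ok'."""
    try:
        os.stat(path)  # resolves the path without reading any entries
    except FileNotFoundError:
        return "missing"
    try:
        with os.scandir(path) as it:
            next(it, None)  # force one readdir; an EIO would surface here
    except OSError as e:
        if e.errno == errno.EIO:
            return "stat-ok-read-failed"
        raise
    return "ok"
```

If this reports "stat-ok-read-failed" for json_folder_large, the path is fine and the failure happens while reading directory entries.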

Env:

The problem above occurs with a Docker image as the PyCharm interpreter.

When the interpreter is a conda env, there are no errors.

The only difference I can see is that in Docker Preferences > Resources > Advanced, I set 4 CPUs (max is 6) and 32 GB of memory (max is 64).

What I tried (under Docker):

1. With pathlib

import pathlib
pathlib.Path('tmp/json_folder_large').iterdir() # this returns a generator <generator object Path.iterdir at 0x7fae4df499a8>
for x in pathlib.Path('tmp/json_folder_large').iterdir():
    print("hi")
    break

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/lib/python3.7/pathlib.py", line 1074, in iterdir
    for name in self._accessor.listdir(self):
OSError: [Errno 5] Input/output error: 'tmp/json_folder_large'

2. With os.scandir

os.scandir("tmp/json_folder_large") # this returns an iterator <posix.ScandirIterator object at 0x7fae4c48f510>
for x in os.scandir("tmp/json_folder_large"):
    print("hi")
    break
Traceback (most recent call last):
  File "<input>", line 1, in <module>
OSError: [Errno 5] Input/output error: 'tmp/json_folder_large'
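This timing matches how scandir works: the directory is opened when os.scandir is called, but entries are read lazily, so a read error only surfaces on iteration. A minimal sketch making the two steps explicit (the `first_entry` helper is hypothetical, my own name):

```python
import os

def first_entry(path):
    """Return the first directory entry name, or None if the directory is empty."""
    with os.scandir(path) as it:  # opendir happens here; usually succeeds
        entry = next(it, None)    # readdir happens here; an EIO raises now
    return entry.name if entry is not None else None
```

This is why the call itself "returns a generator" with no error, and the exception only appears the moment the loop starts.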

3. Connect the PyCharm terminal to the Docker container, then run ls

docker exec -it 21aa095da3b0 bash
cd json_folder_large
ls

Then I got an error (when the terminal is not connected to the Docker container, the code above raises no error!):

ls: reading directory '.': Input/output error

Questions:

  1. Is it really because of a memory issue?
  2. Is it possible to solve this error while keeping everything in the same directory? (I see we could split those files into different directories.)
  3. Why does my code raise an error under Docker but not in the conda env?

Thanks in advance.

  • Hmm. Did you try os.listdir("/tmp/json_folder_large")? Commented Mar 22, 2021 at 14:17
  • Hi, @mutantkeyboard, I am sure the path should be 'tmp/json_folder_large'; if I do os.listdir("/tmp/json_folder_large"), I get a FileNotFoundError: [Errno 2] No such file or directory: '/tmp/json_folder_large'. By the way, my current working directory is '/opt/project'. Commented Mar 22, 2021 at 14:19
  • Does this answer your question? IOError: [Errno 5] Input/output error Commented Mar 22, 2021 at 14:44

1 Answer


You can use os.scandir or glob.iglob. They make use of iterators and avoid loading the entire list into memory.
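As a sketch of what that looks like (the `iter_json_paths` helper is my own name, not from the answer):

```python
import glob
import os

def iter_json_paths(dirpath):
    """Yield paths of .json files one at a time, never building a full list."""
    with os.scandir(dirpath) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(".json"):
                yield entry.path

# Equivalent lazy traversal with glob.iglob:
# for path in glob.iglob(os.path.join(dirpath, "*.json")):
#     ...
```

Note, though, that laziness only helps with memory: if the kernel returns EIO from the underlying readdir, as in the question, a lazy iterator fails the same way a full listing does.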


6 Comments

I just tried with os.scandir and Path from pathlib; nothing works in my case. I will update my question.
You mean you get the I/O error when using os.scandir also?
Hi @lllrnr101, I updated the part with os.scandir; the moment I iterate through the generator it raises an error.
Then I think memory is not your problem. I would put a sleep(60), then attach the process to strace and see if I get any clue.
Do the calls for scandir or listdir with a pattern also give you an error? Like trying to get only a subset of files from that dir?
