I have a cat and dog image dataset, split into two folders (cats and dogs), each containing roughly 10000 images. I don't need all 10000; I only want to keep 2000 images in each folder. How can I automate this in Python?
I know that to delete a file X, I can use os.remove(X),
and similarly, to delete an empty folder, os.rmdir(dir_).
But I'm wondering how I can efficiently delete n randomly chosen files in each folder.
So far I have tried:
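import os
import numpy as np

# list the file names in each folder
dogs_dir = os.listdir('dogs')
cats_dir = os.listdir('cats')
# pick 8000 file names at random from each folder
selected_dogs = np.random.choice(dogs_dir, 8000)
selected_cats = np.random.choice(cats_dir, 8000)
for file_ in selected_dogs:
    os.remove(os.path.join('dogs', file_))
for file_ in selected_cats:
    os.remove(os.path.join('cats', file_))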
The above code does the job for me, but I'm wondering whether there is a more efficient way, so that I can reduce the complexity of my code.
Any help would be appreciated.
I'm using Ubuntu 17.10. For now a Linux-based solution is sufficient, but if it also works on Windows, that would be even better.
1) np.random.choice samples with replacement by default; pass replace=False to avoid picking the same file twice. 2) If you want, you can avoid using NumPy for this task by just using random.sample.
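To illustrate the second suggestion, here is a minimal sketch using only the standard library. The function name prune_to and its keep and seed parameters are made up for this example, and it assumes each folder contains only the image files directly inside it (subfolders are ignored). Instead of hard-coding 8000, it computes how many files to delete from the actual folder size, which helps since the folders hold only "roughly" 10000 images. random.sample picks without replacement, so no file is selected twice, and pathlib keeps the code portable between Linux and Windows.

```python
import random
from pathlib import Path


def prune_to(folder, keep, seed=None):
    """Randomly delete files in `folder` until at most `keep` remain.

    Hypothetical helper for illustration; returns the deleted paths.
    """
    # Only regular files directly inside the folder are considered.
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    excess = len(files) - keep
    if excess <= 0:
        return []  # nothing to delete, folder is already small enough
    rng = random.Random(seed)  # pass a seed for a reproducible selection
    doomed = rng.sample(files, excess)  # sampling WITHOUT replacement
    for path in doomed:
        path.unlink()  # permanently deletes the file
    return doomed
```

For the folders in the question this would be called as prune_to('dogs', keep=2000) and prune_to('cats', keep=2000). Deletion is permanent, so it may be worth trying it on a copy of the data first.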