Is there a simpler way to split list into sublists randomly without repeating elements in python?

Question

I would like to split a list into 3 sublists (train, validation, test) using pre-defined ratios. The items should be assigned to the sublists randomly and without repetition. (My original list contains the names of images in a folder, which I want to process after the split.) I found a working method, but it seems complicated. Is there a simpler way to do this? My method is:

list the files in the folder,
define the necessary size of sublists,
randomly fill in the first sublist,
remove the used items from the original list,
randomly fill in the second sublist from the remaining list,
remove the used items to get the third sublist.

This is my code:

import random
import os 

# list files in folder
files = os.listdir("C:/.../my_folder")

# define the size of the sets: ~30% validation, ~20% test, ~50% training (remaining goes to training set)
validation_count = int(0.3 * len(files))
test_count = int(0.2 * len(files))
training_count = len(files) - validation_count - test_count

# randomly choose ~20% of the files for the test set
test_set = random.sample(files, k = test_count)

# remove already chosen files from original list
files_wo_test_set = [f for f in files if f not in test_set]

# randomly choose the validation set (~30% of all files) from the remaining files
validation_set = random.sample(files_wo_test_set, k = validation_count)

# the remaining files go into the training set
training_set = [f for f in files_wo_test_set if f not in validation_set]

What does the asker mean by "cleaner" when saying "I found a working method, but it seems complicated"? What would make it cleaner? What makes it complicated?
– mece1390, Commented Dec 11, 2020 at 16:10
Hello, thanks for the comments. I already got 2 answers that are simpler and more elegant in my opinion, which was my goal. Thanks for the answers!
– user14421092, Commented Dec 12, 2020 at 10:41

user14421092 · Accepted Answer · 2020-12-11 16:48:43Z

4

I think the answer is self-explanatory, so I am not adding any explanation.

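import random

random.shuffle(files)  # shuffles the list in place
k = test_count  # ~20% of the files
set1 = files[:k]  # test set
set2 = files[k:k + validation_count]  # validation set (~30%)
set3 = files[k + validation_count:]  # training set (the rest)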
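
For what it's worth, the shuffle-and-slice idea can be wrapped in a small helper. This is only a sketch: the function name `split_three_ways`, its parameters, and the `seed` argument are made up for illustration and are not part of the original answer, and the file names are placeholders. It shuffles a copy so the caller's list is left untouched, and it uses the same `int(ratio * len(...))` rounding as the question.

```python
import random


def split_three_ways(items, validation_ratio=0.3, test_ratio=0.2, seed=None):
    """Shuffle a copy of items and slice it into (training, validation, test)."""
    shuffled = list(items)  # copy, so the input list is not modified
    random.Random(seed).shuffle(shuffled)

    validation_count = int(validation_ratio * len(shuffled))
    test_count = int(test_ratio * len(shuffled))

    test_set = shuffled[:test_count]
    validation_set = shuffled[test_count:test_count + validation_count]
    training_set = shuffled[test_count + validation_count:]  # whatever is left
    return training_set, validation_set, test_set


# Hypothetical file names standing in for os.listdir(...)
files = [f"img_{i:03d}.png" for i in range(10)]
training_set, validation_set, test_set = split_three_ways(files, seed=0)
print(len(training_set), len(validation_set), len(test_set))
```

Because every item appears exactly once in the shuffled copy, the three slices cannot overlap. Passing a fixed `seed` makes the split repeatable between runs, which may or may not be what you want.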

edited Dec 11, 2020 at 16:48

user14421092


answered Dec 11, 2020 at 16:05

Shadowcoder


Alan · Accepted Answer · 2020-12-11 16:08:00Z

1

I'd recommend looking into the scikit-learn library, which contains the train_test_split function that does this for you. However, to answer your question using just the random library:

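import os
import random

# First shuffle the list randomly (in place)
files = os.listdir("C:/.../my_folder")
random.shuffle(files)

# Then just slice
test_count = int(len(files) * 0.2)  # 20%
val_count = int(len(files) * 0.3)  # 30%
test_set = files[:test_count]
val_set = files[test_count:test_count + val_count]
train_set = files[test_count + val_count:]  # remaining ~50%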

answered Dec 11, 2020 at 16:08

Alan


Joy · Accepted Answer · 2022-08-31 23:28:31Z

0

I hope this can help someone. scikit-learn has a function that does this easily:

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split

>>> X = np.arange(15).reshape((5, 3))
>>> X
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

>>> X_train, X_test = train_test_split(X, test_size=0.3, random_state=42)

>>> X_train
array([[ 6,  7,  8],
       [ 0,  1,  2],
       [ 9, 10, 11]])

>>> X_test
array([[ 3,  4,  5],
       [12, 13, 14]])
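
The example above only produces two parts. One possible way to get the three sets from the question is to call train_test_split twice, as in the sketch below. The file names are placeholders and the random_state value is arbitrary; both are assumptions for illustration. An integer test_size is read as an absolute number of items, which is used here so that the validation set is 30% of the original list rather than 30% of what remains after the first split.

```python
from sklearn.model_selection import train_test_split

# Hypothetical file names standing in for os.listdir(...)
files = [f"img_{i:03d}.png" for i in range(10)]

# First split off the ~20% test set ...
remaining, test_set = train_test_split(files, test_size=0.2, random_state=42)

# ... then take ~30% of the *original* size from what is left.
# An int test_size means "this many items", not a fraction.
validation_count = int(0.3 * len(files))
training_set, validation_set = train_test_split(
    remaining, test_size=validation_count, random_state=42
)

print(len(training_set), len(validation_set), len(test_set))
```

Each call shuffles by default, so no explicit random.shuffle is needed beforehand.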

answered Aug 31, 2022 at 23:28

Joy
