I am trying to create a custom dataset for a deep learning project from JPG images, and I need to read them all into a single batch. With the code below, the resulting array has shape (100, 1, 224, 224, 3) instead of (100, 224, 224, 3). Any suggestions?

import os
import cv2
import numpy as np

path = '/content/drive/My Drive/Dataset/Training'
X=[]
for img in os.listdir(path):
    pic = cv2.imread(os.path.join(path,img))
    pic = cv2.cvtColor(pic,cv2.COLOR_BGR2RGB)
    pic = cv2.resize(pic,(224,224))
    X.append([pic])
X=np.array(X)
print(X.shape)
(100, 1, 224, 224, 3)
  • Use X.append(pic). It's the [] that adds the size-1 dimension. Commented Mar 20, 2021 at 15:48
  • @hpaulj, did you read my answer? I wrote exactly the same thing as you commented now. Commented Mar 20, 2021 at 15:53

1 Answer

In general, you can use NumPy's squeeze to remove unused dimensions (those of length 1) from an array.

For example:

print(np.squeeze(X).shape)

gives you:

(100, 224, 224, 3)
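
As a small sketch of the squeeze approach, the snippet below uses zero-filled stand-in arrays instead of real images (an assumption for illustration only). It also shows the axis argument, which restricts squeeze to one specific dimension:

```python
import numpy as np

# Stand-in for the array from the question: 100 images, each wrapped in an extra axis.
X = np.zeros((100, 1, 224, 224, 3), dtype=np.uint8)

# Remove only axis 1; np.squeeze raises ValueError if that axis is not of length 1.
X_fixed = np.squeeze(X, axis=1)
print(X_fixed.shape)  # (100, 224, 224, 3)

# Without axis, every length-1 dimension is dropped, including a batch of one image.
single = np.zeros((1, 1, 224, 224, 3), dtype=np.uint8)
print(np.squeeze(single).shape)  # (224, 224, 3)
```

Passing axis=1 is the safer choice here, because a plain squeeze would also drop the batch dimension if the folder happened to contain a single image.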

In your case, though, it is probably enough to write X.append(pic) instead of X.append([pic]) on line 7 of your code (please check this).
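
To illustrate the difference, here is a minimal sketch that replaces the cv2 loading steps with synthetic arrays (the images list is a hypothetical stand-in, so no files are read):

```python
import numpy as np

# Hypothetical stand-ins for the decoded and resized images.
images = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(5)]

X = []
for pic in images:
    X.append(pic)  # append the array itself, not [pic]
X = np.array(X)
print(X.shape)  # (5, 224, 224, 3)

# Wrapping each image in a list reproduces the extra length-1 axis.
wrapped = np.array([[pic] for pic in images])
print(wrapped.shape)  # (5, 1, 224, 224, 3)
```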

Tip: try to avoid lists when using NumPy. Regarding @hpaulj's comment, you can use NumPy's concatenate function instead of a list:

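# initialization, replacing X = []; the shape matches the resized RGB images
# (pic does not exist yet at this point, and dtype keeps the result as uint8)
X = np.zeros((0, 224, 224, 3), dtype=np.uint8)
...
# append: give pic a leading axis of length 1, then join along axis 0
X = np.concatenate((X, pic[np.newaxis]), axis=0)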
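
A runnable sketch of this concatenate pattern might look like the following. It assumes synthetic arrays in place of the images, so no files are read, and the value n_images is hypothetical:

```python
import numpy as np

n_images = 5  # hypothetical number of images

# Start from an empty batch whose trailing dimensions match one image.
X = np.zeros((0, 224, 224, 3), dtype=np.uint8)

for i in range(n_images):
    # Stand-in for a loaded image; filled with i so the order can be checked.
    pic = np.full((224, 224, 3), i, dtype=np.uint8)
    # Add a leading axis and join along axis 0; this builds a new array each time.
    X = np.concatenate((X, pic[np.newaxis]), axis=0)

print(X.shape)  # (5, 224, 224, 3)
print(X.dtype)  # uint8
```

Note that np.concatenate copies the whole accumulated array on every call, so it is worth timing this against the list version for your own dataset size.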

2 Comments

Lists are best when collecting values iteratively.
@hpaulj I only wrote "try to avoid" (this is not a restriction), as using lists can increase the calculation time.
