I'm using Python Pandas to read a CSV file:
col1\tcol2\tcol3\tcol4\tcol5
So, in principle, this file contains one row and five columns separated by a tab character ('\t').
While reading the file, I specify a list of names, like so (I assume my file should have 3 columns, not 5 as the file above):
df = pd.read_csv("test.txt", sep = "\t", names = ["COL1", "COL2", "COL3"])
Pandas doesn't complain about this. In fact, it seems to take the first 3 columns and read them as a single first column, so when I print the DataFrame I get the following:
print(df.head())
COL1 COL2 COL3
col1 col2 col3 col4 col5
To me this means that the file is wrongly formatted, but I don't know how to catch this programmatically. When I check the number of columns, it returns 3 (the number of names I defined), and the shape of the DataFrame also reports 3 columns.
My question is: how can I detect whether the file I try to load with read_csv contains a certain number of columns? Of course, I could just read the first line of the file in a traditional way, parse it, and check what it is, but is there a way to do this with Pandas?
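For reference, the "traditional" check mentioned above could look roughly like the sketch below. It uses only the standard library csv module. The helper name count_columns is made up for illustration, and the sketch assumes the first line is representative of the whole file.

```python
import csv


def count_columns(path, delimiter="\t"):
    """Return the number of fields on the first line of a delimited file.

    Hypothetical helper for illustration; only the first line is inspected.
    """
    with open(path, newline="") as handle:
        # next() with a default avoids StopIteration on an empty file
        first_row = next(csv.reader(handle, delimiter=delimiter), [])
    return len(first_row)
```

Comparing count_columns("test.txt") with the length of the names list before calling read_csv would flag the mismatch described above.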
pd.read_csv("test.txt", sep = "\t", usecols = [0, 1, 2])
The parameter should be usecols, not names. Column names or index positions can be passed to usecols.
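If the goal is to detect the mismatch rather than silently select columns, one possible approach that stays within Pandas is to read only the first row without any names and compare its width to the expected count. This is a minimal sketch, not a definitive solution: the function name read_with_expected_columns is hypothetical, and it only validates the first row of the file.

```python
import pandas as pd


def read_with_expected_columns(path, names, sep="\t"):
    """Read a delimited file only if its first row has len(names) fields.

    Hypothetical helper for illustration; later rows are not validated here.
    """
    # header=None keeps the first line as data, nrows=1 avoids loading the whole file
    probe = pd.read_csv(path, sep=sep, header=None, nrows=1)
    found = probe.shape[1]
    if found != len(names):
        raise ValueError(
            f"{path}: expected {len(names)} columns, found {found}"
        )
    return pd.read_csv(path, sep=sep, names=names)
```

With the five-column file from the question, calling read_with_expected_columns("test.txt", ["COL1", "COL2", "COL3"]) would raise a ValueError instead of returning a misleading three-column DataFrame.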