When reading a CSV file as a pandas dataframe, an error is raised when trying to select a subset of columns based on original column names (usecols=) and renaming the selected columns (names=). Passing renamed column names to usecols works, but all columns must be passed to names to correctly select columns.
# read the entire CSV
df1a = pd.read_csv(folder_csv+'test_read_csv.csv')
# select a subset of columns while reading the CSV
df1b = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['Col1','Col3'])
# rename columns while reading the CSV
df1c = pd.read_csv(folder_csv+'test_read_csv.csv', names=['first', 'second', 'third'], header=0)
# select a subset of columns and rename them while reading the CSV;
# throws error "ValueError: Usecols do not match columns, columns expected but not found: ['Col3', 'Col1']"
df1d = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['Col1','Col3'], names=['first','third'])
# selects columns 1 and 2, calling them 1 and 3
df1e = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['first','third'], names=['first','third'])
# selects columns 1 and 3 correctly
df1f = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['first','third'], names=['first','second','third'])
The CSV file test_read_csv.csv is:
Col1,Col2,Col3
val1a,val2a,val3a
val1b,val2b,val3b
val1c,val2c,val3c
val1d,val2d,val3d
val1e,val2e,val3e
Wouldn't it be a fairly common use case to select certain columns based on the original column names and then renaming only those columns while reading the data?
Of course, it is possible to select the columns and rename them after loading the entire CSV file:
df1 = df1[['Col1','Col3']]
df1.columns = ['first', 'third']
But I don't know how and whether this can be integrated directly when reading the data. The same holds also for pd.read_excel().
usecolsarg. While withoutusecolsI do understand that all columns must be renamed if the columns to be renamed are not provided (after all,namesis a list and not a dictionary like the one used indf.rename(columns=)), having columns selected withusecolsshould make it clear which columns should be selected, provided the renaming is done after the selection.