You're working too hard. You concatenate all of the CSV rows anyway, so LINES= has no value. Just read the CSV, and choose an easy way to identify invalid rows - such as rows whose last column is NaN due to having insufficient columns.
Since you're using Pandas, your dictionary format is probably not a good idea. Consider instead forming a second dataframe whose column names are based on the keys. The key-value separation can be done with Pandas in a vectorised manner:
Suggested
from io import StringIO
import pandas as pd
def read_mixed(fname) -> tuple[pd.DataFrame, pd.DataFrame]:
df = pd.read_csv(fname) # Comment and data rows are mixed in
df is_data = df[df[dfdf[df.columns[-1]].notna() # Assume that "short" rows are comments, "long" rows are data
data = df[is_data] # All data rows in one frame
comments = df[~is_data] # Comment rows are "everything left"
comments.dropna(axis='columns', inplace=True) # Drop NA columns
first_col = comments.columns[0] # Name of the first column, disposable
comments[first_col] = comments[first_col].str.removeprefix('Comment line: ') # Drop the prefix on all rows
comments.columns = comments.iloc[0, :].str.split('=', 1).str[0] # Set columns to be keys from comment cells
for col in comments.columns:
comments[col] = comments[col].str.split('=', 1).str[1]
return data, comments
# This is a tester that illustrates the format of the real file(s)
if __name__ == "__main__":
fstr = StringIO("""a1,a2,a3,b1,b2,b3,c1,c2,c3
Comment line: LINES=5, CAR1=Top Fuel, CAR2=Funny, CAR3=Pro Stock
12,1,23,5,23,5,3,4,5
12,1,23,5,23,5,3,4,5
12,1,23,5,23,5,3,4,5
12,1,23,5,23,5,3,4,5
12,1,23,5,23,5,3,4,5
Comment line: LINES=5, CAR1=Funny, CAR2=Pro Stock, CAR3=Top Fuel
24,2,46,10,46,10,6,8,10
24,2,46,10,46,10,6,8,10
24,2,46,10,46,10,6,8,10
24,2,46,10,46,10,6,8,10
24,2,46,10,46,10,6,8,10
Comment line: LINES=3, CAR1=Funny, CAR2=Top Fuel, CAR3=Pro Stock
12,1,23,5,23,5,3,4,5
12,1,23,5,23,5,3,4,5
12,1,23,5,23,5,3,4,5
""")
output = read_mixed(fstr)
print(output)
This produces a result equivalent to that of your pd.concat and may be faster (test this).
Output
Or if this is too variable to rely on, you could filter by rows whose first column is numeric.
a1 a2 a3 b1 b2 b3 c1 c2 c3
1 12 1 23 5 23.0 5.0 3.0 4.0 5.0
2 12 1 23 5 23.0 5.0 3.0 4.0 5.0
3 12 1 23 5 23.0 5.0 3.0 4.0 5.0
4 12 1 23 5 23.0 5.0 3.0 4.0 5.0
5 12 1 23 5 23.0 5.0 3.0 4.0 5.0
7 24 2 46 10 46.0 10.0 6.0 8.0 10.0
8 24 2 46 10 46.0 10.0 6.0 8.0 10.0
9 24 2 46 10 46.0 10.0 6.0 8.0 10.0
10 24 2 46 10 46.0 10.0 6.0 8.0 10.0
11 24 2 46 10 46.0 10.0 6.0 8.0 10.0
13 12 1 23 5 23.0 5.0 3.0 4.0 5.0
14 12 1 23 5 23.0 5.0 3.0 4.0 5.0
15 12 1 23 5 23.0 5.0 3.0 4.0 5.0
You'll still need to find your Comment line by way of regex or similar.
0 LINES CAR1 CAR2 CAR3
0 5 Top Fuel Funny Pro Stock
6 5 Funny Pro Stock Top Fuel
12 3 Funny Top Fuel Pro Stock