I am wondering if there is a more performant way to iterate through a pandas dataframe and concatenate values in different columns.
For example, I have the following working code:
import pandas as pd
from pathlib import Path
data = {'subdir': ['tom', 'phil', 'ava'],
'filename':['9.wav', '8.wav', '7.wav'],
'text':['Pizza','Strawberries and yogurt', 'potato']}
df = pd.DataFrame(data, columns = ['subdir', 'filename', 'text'])
df.head()
example_path = Path(r"C:\Hello\World")
for index, row in df.iterrows():
    full_path = example_path.joinpath(row['subdir'], row['filename'])
    print(full_path)
    text = row['text']
    print(text)
Output:
C:\Hello\World\tom\9.wav
Pizza
C:\Hello\World\phil\8.wav
Strawberries and yogurt
C:\Hello\World\ava\7.wav
potato
However, I have a large number of rows and would like to do this as fast as possible. What is the best way to do it? I am taking pieces of a path (the subdirectory and the base file name) and concatenating them as I iterate through the dataframe.
I will also likely be reading values from adjacent columns (like 'text' in the example) and storing them as I iterate, so I would like to do everything in a single pass. Once I have gathered the data in lists or Series-like structures, I will use those pieces to build a dictionary or DataFrame.
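For what it's worth, here is a rough sketch of one direction I have been considering: zipping the relevant columns together instead of calling iterrows(), which as I understand it avoids building a Series object for every row. I have not benchmarked this, so treat it as an assumption rather than a known improvement. The names full_paths, texts, and result are just illustrative, and I use PureWindowsPath here (instead of Path) only so the sketch prints the same thing on any operating system.

```python
import pandas as pd
from pathlib import PureWindowsPath

data = {'subdir': ['tom', 'phil', 'ava'],
        'filename': ['9.wav', '8.wav', '7.wav'],
        'text': ['Pizza', 'Strawberries and yogurt', 'potato']}
df = pd.DataFrame(data, columns=['subdir', 'filename', 'text'])

# Assumption: PureWindowsPath keeps the backslash-separated output
# identical regardless of the OS the sketch runs on.
example_path = PureWindowsPath(r"C:\Hello\World")

# Walk the two path columns in lockstep rather than row by row.
full_paths = [example_path.joinpath(subdir, filename)
              for subdir, filename in zip(df['subdir'], df['filename'])]

# Adjacent columns can be pulled out as plain lists in the same pass.
texts = df['text'].tolist()

# Collect everything into a new DataFrame at the end.
result = pd.DataFrame({'full_path': [str(p) for p in full_paths],
                       'text': texts})
print(result)
```

Is something along these lines reasonable, or is there a better vectorized option?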
Thank you.