What if you use the SparkSession and SparkContext to read all the files in the S3 directory at once with the wholeTextFiles method, and then loop through the results? You can use the s3a connector in the URL, which lets Spark read from S3 through Hadoop.
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('S3Example').getOrCreate()
s3_bucket = 'your-bucket'
s3_path = f's3a://{s3_bucket}/my-directory/'
# List the files in the S3 directory (note: wholeTextFiles also reads file contents)
file_list = spark.sparkContext.wholeTextFiles(s3_path).map(lambda x: x[0]).collect()
for file_path in file_list:
    print(file_path)
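For the s3a:// scheme to work, the Hadoop AWS bindings need to be on the classpath and credentials need to be configured. Here's a minimal sketch; the hadoop-aws version and the placeholder keys are assumptions, and on EMR or with an instance profile you can usually skip the explicit keys:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName('S3Example')
    # hadoop-aws version is an assumption; match it to your Hadoop build
    .config('spark.jars.packages', 'org.apache.hadoop:hadoop-aws:3.3.4')
    # placeholder credentials; prefer an instance profile or env vars in practice
    .config('spark.hadoop.fs.s3a.access.key', 'YOUR_ACCESS_KEY')
    .config('spark.hadoop.fs.s3a.secret.key', 'YOUR_SECRET_KEY')
    .getOrCreate()
)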
Please note, above I've only retrieved the file paths. If you want both the path and the file contents, drop the .map(lambda x: x[0]) and keep the full (path, content) pairs:
file_tuple = spark.sparkContext.wholeTextFiles(s3_path)  # RDD of (path, content) pairs
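As a quick usage sketch (the sample size of 5 is arbitrary):

# Inspect a few (path, content) pairs without collecting the whole RDD to the driver
for path, content in file_tuple.take(5):
    print(path, len(content))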
If you only need to list the files without reading their contents, a lighter-weight option is boto3. Otherwise, you might want to look into wildcarding, something like spark.read.parquet('s3://my_bucket/*/*/*').
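A minimal boto3 sketch for listing, assuming credentials are already configured (the bucket name and prefix are placeholders):

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
# Iterate over all keys under the prefix, handling pagination automatically
for page in paginator.paginate(Bucket='your-bucket', Prefix='my-directory/'):
    for obj in page.get('Contents', []):
        print(obj['Key'])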