I'm using functools.partial to bind two parameters that are not iterables, so they don't have to be passed through map(). I'm also using ThreadPoolExecutor because the task here is I/O bound.
The problem is that inside get_the_text() (called through get_the_text_par) I have a for loop that should go through all the rows and send a request for each row (link), but it only handles the first row and skips the rest. How can I fix this, or what am I missing?
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Bind the two non-iterable arguments; map() then supplies only the chunk.
get_the_text_par = partial(get_the_text, _link_column=link, _firms=firms)
with ThreadPoolExecutor() as executor:
    # chunk_size = len(results) // 10
    chunk_size = len(results) if len(results) < 10 else len(results) // 10
    chunks = [results.iloc[i:i + chunk_size] for i in range(0, len(results), chunk_size)]
    result = list(executor.map(get_the_text_par, chunks))
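To check that the partial and executor.map() part behaves as intended on its own, here is a minimal sketch under a few assumptions: a plain list of made-up example.com links stands in for the DataFrame, and count_links and split_into_chunks are hypothetical helpers, not part of my real code. It uses the same chunk-size rule as above, with an added guard for empty input.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def count_links(chunk, _firms, _link_column):
    # Hypothetical stand-in for get_the_text: returns how many links the chunk holds.
    return len(chunk)


def split_into_chunks(items, n_chunks=10):
    # Same sizing rule as in the question, with a guard for empty input
    # (a chunk size of 0 would make range() raise ValueError).
    if len(items) == 0:
        return []
    chunk_size = len(items) if len(items) < n_chunks else len(items) // n_chunks
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


# Made-up links, only for illustration.
links = [f'http://example.com/{i}' for i in range(25)]

# The keyword arguments are bound once; map() passes each chunk as the first argument.
worker = partial(count_links, _firms=['mercedes', 'vw'], _link_column='SOURCEURLS')

with ThreadPoolExecutor() as executor:
    links_per_chunk = list(executor.map(worker, split_into_chunks(links)))

print(links_per_chunk)
```

In this sketch every chunk reaches the worker whole, which suggests the skipped rows are more likely caused by the loop inside the worker than by the way the chunks are dispatched.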
get_the_text implementation:
def get_the_text(_df, _firms: list, _link_column: str):
    '''
    Send a request to receive the text of the articles.

    Parameters
    ----------
    _df : DataFrame

    Returns
    -------
    DataFrame with the text of the articles
    '''
    _df.reset_index(inplace=True)
    print(_df)
    for k, link in enumerate(_df[[f'{_link_column}']]):
        print(k, '\n', _df.loc[k, f'{_link_column}'])
        if link:
            website_text = list()
            # print(link, '\n', 'K:', k)
            try:
                page_status_code, page_content, page_url = send_two_requests(_df.loc[k, f'{_link_column}'])
......
.....
...
..
.
To import the data:
import pandas as pd

data = {
    'index': [1366, 4767, 6140, 11898],
    'DATE': ['2014-01-12', '2014-01-12', '2014-01-12', '2014-01-12'],
    'SOURCES': ['go.com', 'bloomberg.com', 'latimes.com', 'usatoday.com'],
    'SOURCEURLS': [
        'http://abcnews.go.com/Business/wireStory/mercedes-recalls-372k-suvs-21445846',
        'http://www.bloomberg.com/news/2014-01-12/vw-patent-application-shows-in-car-gas-heater.html',
        'http://www.latimes.com/business/autos/la-fi-hy-autos-recall-mercedes-20140112-story.html',
        'http://www.usatoday.com/story/money/cars/2014/01/12/mercedes-recall/4437279/'
    ],
    'Tone': [-0.375235, -1.842752, 1.551724, 2.521008],
    'Positive_Score': [2.626642, 1.228501, 3.275862, 3.361345],
    'Negative_Score': [3.001876, 3.071253, 1.724138, 0.840336],
    'Polarity': [5.628518, 4.299754, 5.0, 4.201681],
    'Activity_Reference_Density': [22.326454, 18.918919, 22.931034, 19.327731],
    'Self_Group_Reference_Density': [0.0, 0.0, 0.344828, 0.840336],
    'Year': [2014, 2014, 2014, 2014],
    'Month': [1, 1, 1, 1],
    'Day': [12, 12, 12, 12],
    'Hour': [0, 0, 0, 0],
    'Minute': [0, 0, 0, 0],
    'Second': [0, 0, 0, 0],
    'Mentioned_firms': ['mercedes', 'vw', 'mercedes', 'mercedes'],
    'text': ['', '', '', '']
}
# Creating a DataFrame
df = pd.DataFrame(data)
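To narrow the symptom down, here is a minimal sketch that compares iterating over a double-bracket selection with iterating over a single-bracket one. The frame named sample and its example.com links are made up for illustration and are not the real data. It suggests the loop may be running once over the column label rather than once per row, although the elided part of get_the_text could matter as well.

```python
import pandas as pd

# Made-up frame with the same column name as the real data.
sample = pd.DataFrame({
    'SOURCEURLS': ['http://example.com/a', 'http://example.com/b', 'http://example.com/c'],
    'text': ['', '', ''],
})

# Double brackets select a one-column DataFrame; iterating it yields column labels.
from_frame = list(enumerate(sample[['SOURCEURLS']]))

# Single brackets select a Series; iterating it yields the row values.
from_series = list(enumerate(sample['SOURCEURLS']))

print(from_frame)
print(from_series)
```

With three rows, the first loop produces a single item (the column label) while the second produces one item per link, which matches the "only the first row" behaviour described above.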