feat: add search configuration parameters and improve pipeline resilience #1011
Summary
This PR enhances the search functionality in ScrapeGraphAI with configurable parameters and improves pipeline resilience by adding robust error handling.
Changes Made
- Added `region`, `language`, and `timelimit` parameters to `SearchConfig` and the `search_on_web()` function for better control over search results
- Added error handling to `GraphIteratorNode` to prevent pipeline crashes when individual sites fail to scrape
- Extended `SearchInternetNode` with verbose debug output showing original prompts, LLM-generated queries, and found URLs
- Added `SERVICES_GUIDE.md` with a complete reference for all 25+ graph services and their parameters
Why These Changes?
Problem: When using SearchGraph with Italian pharmaceutical queries (e.g., "Gruppo Chiesi"), the system returned generic news homepages instead of specific articles. Additionally, individual site failures (like HTTP/2 errors) crashed the entire scraping pipeline.
Solution:
- Search parameters (`timelimit`, `region`, `language`) allow users to fine-tune search quality and get relevant, recent results instead of generic homepages
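For the resilience half of the problem (one failing site crashing the whole pipeline), the general technique can be sketched as follows. This is a minimal illustration of isolating per-site failures in concurrent async work, not the code in `graph_iterator_node.py`; `scrape_site`, `scrape_all`, and the example URLs are hypothetical names invented for this sketch.

```python
import asyncio


async def scrape_site(url: str) -> dict:
    """Hypothetical stand-in for running one scraping sub-graph on one URL."""
    await asyncio.sleep(0)  # simulate an awaitable network call
    if "broken" in url:
        # Simulate the kind of per-site failure described above.
        raise ConnectionError(f"HTTP/2 error while fetching {url}")
    return {"url": url, "content": f"content of {url}"}


async def scrape_all(urls: list[str]):
    """Run all scrapes concurrently and separate successes from failures."""
    # return_exceptions=True makes gather() return raised exceptions as
    # values instead of propagating the first one to the caller.
    results = await asyncio.gather(
        *(scrape_site(url) for url in urls), return_exceptions=True
    )
    successes, failures = [], []
    for url, result in zip(urls, results):
        if isinstance(result, BaseException):
            failures.append((url, result))  # keep the error for logging
        else:
            successes.append(result)
    return successes, failures


urls = [
    "https://example.com/a",
    "https://broken.example/page",
    "https://example.com/b",
]
successes, failures = asyncio.run(scrape_all(urls))
for url, error in failures:
    print(f"skipped {url}: {error}")
print(f"{len(successes)} of {len(urls)} sites scraped")
```

With this pattern the remaining sites still produce results, and the failures stay available for logging or retry.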
Files Modified
- `scrapegraphai/utils/research_web.py` - Added `timelimit` parameter to search configuration
- `scrapegraphai/nodes/search_internet_node.py` - Added `region`, `language`, `timelimit` parameters and debug logging
- `scrapegraphai/nodes/graph_iterator_node.py` - Added error handling in async graph execution
- `SERVICES_GUIDE.md` - New comprehensive documentation file
Breaking Changes
None - all new parameters are optional and backward compatible.
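To illustrate why optional parameters keep existing callers working, here is a small sketch. `SearchOptions` and `build_search_kwargs` are hypothetical and are not the `SearchConfig` class from this PR; the values `"it-it"`, `"it"`, and `"w"` are illustrative assumptions only, and the accepted values depend on the search backend.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SearchOptions:
    """Hypothetical options object; not the PR's actual SearchConfig."""

    query: str
    max_results: int = 10
    # New, optional fields: defaulting to None keeps old call sites valid.
    region: Optional[str] = None
    language: Optional[str] = None
    timelimit: Optional[str] = None


def build_search_kwargs(options: SearchOptions) -> dict:
    """Forward only the options that were explicitly set."""
    return {key: value for key, value in asdict(options).items() if value is not None}


# An existing caller that knows nothing about the new fields is unaffected.
legacy = build_search_kwargs(SearchOptions(query="Gruppo Chiesi"))

# A new caller opts in to the extra filters (illustrative values).
tuned = build_search_kwargs(
    SearchOptions(query="Gruppo Chiesi", region="it-it", language="it", timelimit="w")
)
print(legacy)
print(tuned)
```

Dropping unset fields before calling the backend means the request an old caller produces is the same as before the new fields existed.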
Testing
Tested with Italian pharmaceutical company searches: