madnz8 commented Oct 3, 2025

Summary

This PR enhances the search functionality in ScrapeGraphAI with configurable parameters and improves pipeline resilience by adding robust error handling.

Changes Made

  • Add search configuration parameters: Added region, language, and timelimit parameters to SearchConfig and the search_on_web() function for finer
    control over search results
  • 🛡️ Improve error handling: Added try/except blocks in GraphIteratorNode to prevent pipeline crashes when individual sites fail to scrape
  • 🐛 Add debug logging: Enhanced SearchInternetNode with verbose debug output showing original prompts, LLM-generated queries, and found URLs
  • 📚 Add comprehensive documentation: Created SERVICES_GUIDE.md with complete reference for all 25+ graph services and their parameters
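
For reviewers who want a feel for the debug output described above, here is a minimal sketch using only the standard library. It is not the code in SearchInternetNode: the function names `format_search_debug` and `log_search_debug`, the logger name, and the exact message wording are all assumptions made for illustration.

```python
import logging
from typing import List

# Hypothetical logger name; the real node may use a different logger.
logger = logging.getLogger("search_debug_sketch")


def format_search_debug(prompt: str, query: str, urls: List[str]) -> List[str]:
    """Build the debug lines: original prompt, generated query, found URLs."""
    lines = [
        f"Original prompt: {prompt}",
        f"Search query: {query}",
        f"Found {len(urls)} URL(s)",
    ]
    # Number the URLs so truncated or duplicated results are easy to spot.
    lines.extend(f"  {i}. {url}" for i, url in enumerate(urls, start=1))
    return lines


def log_search_debug(prompt: str, query: str, urls: List[str], verbose: bool = False) -> None:
    """Emit the debug lines only when verbose output is requested."""
    if not verbose:
        return
    for line in format_search_debug(prompt, query, urls):
        logger.debug(line)
```

Keeping the formatting separate from the logging call makes the output easy to check without configuring a log handler.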

Why These Changes?

Problem: When using SearchGraph with Italian pharmaceutical queries (e.g., "Gruppo Chiesi"), the system returned generic news homepages instead of specific
articles. Additionally, individual site failures (like HTTP/2 errors) crashed the entire scraping pipeline.

Solution:

  1. Configurable search parameters (timelimit, region, language) allow users to fine-tune search quality and get relevant, recent results instead of generic
    homepages
  2. Error handling ensures the pipeline continues processing other sites even when one fails
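
The error-isolation idea in point 2 can be sketched as follows. This is an illustrative pattern under stated assumptions, not the actual change in GraphIteratorNode: `run_isolated`, `scrape_all`, and `fake_scrape` are hypothetical names, and the failure is simulated with a plain `ConnectionError`.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

Scraper = Callable[[str], Awaitable[str]]


async def run_isolated(url: str, scrape: Scraper) -> Optional[str]:
    """Run one scrape; on failure, report it and return None instead of raising."""
    try:
        return await scrape(url)
    except Exception as exc:  # one failing site must not abort the others
        print(f"Skipping {url}: {exc!r}")
        return None


async def scrape_all(urls: List[str], scrape: Scraper) -> List[str]:
    """Scrape all URLs concurrently and keep only the successful results."""
    results = await asyncio.gather(*(run_isolated(u, scrape) for u in urls))
    return [r for r in results if r is not None]


async def fake_scrape(url: str) -> str:
    """Stand-in scraper (assumption): fails for any URL containing 'bad'."""
    if "bad" in url:
        raise ConnectionError("simulated HTTP/2 failure")
    return f"content of {url}"
```

Calling `asyncio.run(scrape_all(["https://ok.example", "https://bad.example"], fake_scrape))` returns only the result for the first URL and prints a skip message for the second.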

Files Modified

  • scrapegraphai/utils/research_web.py - Added timelimit parameter to search configuration
  • scrapegraphai/nodes/search_internet_node.py - Added region, language, timelimit parameters and debug logging
  • scrapegraphai/nodes/graph_iterator_node.py - Added error handling in async graph execution
  • SERVICES_GUIDE.md - New comprehensive documentation file

Breaking Changes

None - all new parameters are optional and backward compatible.
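
One way to keep new search parameters backward compatible is to make them keyword-only with a default of `None` and forward only the ones that were set. The sketch below shows that pattern; `search_sketch` is a hypothetical stand-in, not the real `search_on_web()` signature, and the example values `"it-it"` and `"w"` are illustrative assumptions.

```python
from typing import Any, Dict, Optional


def search_sketch(
    query: str,
    max_results: int = 10,
    *,
    region: Optional[str] = None,
    language: Optional[str] = None,
    timelimit: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a search request, adding optional filters only when provided."""
    request: Dict[str, Any] = {"query": query, "max_results": max_results}
    optional = {"region": region, "language": language, "timelimit": timelimit}
    for name, value in optional.items():
        if value is not None:
            # Unset options are omitted, so existing callers see no change.
            request[name] = value
    return request
```

An existing call such as `search_sketch("Gruppo Chiesi")` produces the same request as before, while `search_sketch("Gruppo Chiesi", region="it-it", timelimit="w")` adds the two filters.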

Testing

Tested with Italian pharmaceutical company searches:

  • ✅ Search parameters correctly filter to recent, relevant articles
  • ✅ Pipeline continues when individual sites fail
  • ✅ Debug logging provides visibility into search process

madnz8 and others added 2 commits October 2, 2025 14:34
Enhance search functionality with configurable parameters and improve pipeline resilience:

- Add region, language, and timelimit parameters to search configuration
- Add error handling in GraphIteratorNode to prevent pipeline crashes from individual site failures
- Add debug logging in SearchInternetNode showing original prompt, search query, and found URLs
- Pass new search parameters through research_web.py and SearchInternetNode

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@dosubot (bot) added the size:XL (this PR changes 500-999 lines, ignoring generated files), documentation, and enhancement labels Oct 3, 2025
@madnz8 madnz8 closed this Oct 3, 2025
@madnz8 madnz8 deleted the feature/search-fixes branch October 3, 2025 07:21