I would like to scrape the problems from these Go (board game) books and convert them into SGFs, if they aren't in that format already. For now, I would be satisfied with just the problems themselves: only the initial setup, without the answer variations.
The link above might not work because some of the pages require you to be logged in. A standalone question link like this one seems to work without logging in for now, but that depends on the whims of the website's developers.
That website uses a <canvas> element to draw the problems, but I can't find where the data is. I don't think they are using SGF (a text format for encoding game trees, and the standard file type for Go) but rather their own coordinate system in JSON. There's a var qqdata in one of the <script> tags at the end of the HTML file, but I'm not sure how to translate it into SGF coordinates.
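To make the target format concrete, here is a minimal sketch of the SGF side of the conversion. It assumes the stones have already been recovered as 0-based column/row pairs counted from the top-left corner; the `Stone` shape and the function names are hypothetical, and nothing here is taken from the website's own code. SGF writes a point as two lowercase letters (column, then row, with "aa" at the top-left) and lists setup stones under `AB` (black) and `AW` (white).

```typescript
// Hypothetical stone shape: 0-based column (x) and row (y) from the top-left corner.
type Color = "B" | "W";
interface Stone { x: number; y: number; color: Color; }

// SGF encodes a point as two lowercase letters: column then row, "aa" = top-left.
function toSgfPoint(x: number, y: number): string {
  const a = "a".charCodeAt(0);
  return String.fromCharCode(a + x) + String.fromCharCode(a + y);
}

// Build a setup-only SGF: AB lists the black stones, AW lists the white stones.
function toSgf(stones: Stone[], size: number = 19): string {
  const points = (c: Color): string =>
    stones
      .filter((s) => s.color === c)
      .map((s) => "[" + toSgfPoint(s.x, s.y) + "]")
      .join("");
  const black = points("B");
  const white = points("W");
  return (
    "(;GM[1]FF[4]SZ[" + size + "]" +
    (black ? "AB" + black : "") +
    (white ? "AW" + white : "") +
    ")"
  );
}

// Made-up position, only to show the shape of the output.
console.log(
  toSgf([
    { x: 3, y: 3, color: "B" },
    { x: 15, y: 3, color: "B" },
    { x: 3, y: 15, color: "W" },
  ])
);
// prints "(;GM[1]FF[4]SZ[19]AB[dd][pd]AW[dp])"
```

So the open question is really how to get from qqdata to a list of stones like the one above; the SGF output itself is straightforward.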
This other project already extracts the data from these webpages (although I haven't yet been able to reproduce it), but I think it does things visually from the <canvas>?
I would prefer an answer in TypeScript, if possible, but Python would also be OK.
What would be the best way of scraping the data from that website?
qqdata variable. Can you clarify what you're looking for please? SGF isn't a common-knowledge format so maybe an example of what you're hoping to get would be helpful. Thanks!
parseSgfPtLst from the main .js file, you can potentially plug in whatever data is the input into it. Scraping canvas seems tough because you'd probably need to play the move to move onto the next board, repeat until solved.
qqdata.c / qqdata.content have base64 that decodes to a binary format that seems to include point data
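A rough sketch of the first step suggested in the comments, pulling qqdata out of the page and base64-decoding one of its fields, might look like the following. It rests on several unverified assumptions: that the page contains a statement of the form `var qqdata = {...};`, that the object literal is valid JSON, and that `content` (or `c`) holds a base64 string. The function names and the sample HTML are made up for illustration, and the layout of the decoded bytes is not interpreted at all.

```typescript
// Find `var qqdata = {...};` in the page HTML and parse the object literal.
// Assumption: the literal is valid JSON and the statement ends with a semicolon.
function extractQqdata(html: string): Record<string, unknown> | null {
  const match = html.match(/var\s+qqdata\s*=\s*(\{[\s\S]*?\})\s*;/);
  if (!match) return null;
  try {
    return JSON.parse(match[1]) as Record<string, unknown>;
  } catch {
    return null; // not plain JSON, so this approach does not apply
  }
}

// Decode a base64 string into raw bytes (atob is global in browsers and recent Node).
function decodeBase64(b64: string): Uint8Array {
  const bin = atob(b64);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) {
    bytes[i] = bin.charCodeAt(i);
  }
  return bytes;
}

// Placeholder HTML, not real data from the website.
const sampleHtml = '<script>var qqdata = {"content":"AQID"};</script>';
const data = extractQqdata(sampleHtml);
const raw = data ? (data["content"] ?? data["c"]) : undefined;
if (typeof raw === "string") {
  console.log(decodeBase64(raw)); // bytes 1, 2, 3 for this placeholder
}
```

The decoded bytes would still have to be mapped to board points, either by reverse-engineering the binary layout or by reusing the site's own parsing function as the comment above suggests.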