I am trying to scrape a website using C#. At some point in the process, the website returns a page containing JavaScript that I need to execute. The script generates some arguments and then posts a request using the generated arguments as query variables.

This is the JavaScript file: https://jsfiddle.net/7aw5vr59/

The result generated by the browser looks like this:

<imimxxxyyy id="ActiveX"></imimxxxyyy><form action="/home/" method="post"><input name="TS013a5875_id" value="3" type="hidden"><input name="TS013a5875_cr" value="085d52524cab2800109920a8877032c63ff20a076afde32d3949a9c0cc832e2a409e921dbd0f04b390bc9a36f79f4d080873a7f6848948001fe9d70f9af2fa1f81ba0cb687810509e2df6f37950961d59dba504d18b2e08237af58ac5683f65a8b9a4c978624319575ee9b400ae2307cbb314a0f32ecca4464cdc6b2082f7352" type="hidden"><input name="TS013a5875_76" value="085d52524cab2800109920a8877032c63ff20a076afde32d3949a9c0cc832e2a409e921dbd0f04b390bc9a36f79f4d080873a7f68488b000c2ff7c505061da44dff5459af7ebe2f604b8d36bdeeeca3eead0e146af07190233b9414ca790443d2453827dc161e073eb63ed4d10c070e405848b2ccb2dc1c4412b93dff97f978c6f1caecff07f6d4c23e1ade1bfb2f715409cf4d5f1f91a826e092193a1407539ec35c80a0d82032163abc93f6876c7c1cecded7400c11873a90a0ad58c3d18b0a55b0a0430c50575d7f535fd9b414c06b1c3b11ab326b07356737269137f2610cf26df27c7e0bcd5" type="hidden"><input name="TS013a5875_86" value="085d52524cab2800109920a8877032c63ff20a076afde32d3949a9c0cc832e2a409e921dbd0f04b390bc9a36f79f4d080873a7f68486600098382373b7447eebb69eb2b508714f7fb748b827881d272fff290b8bcf8bef6184c2a8c9f1236e71539573e709a14a158df0bb128ca0ba6e196a5b4a979b28a93e07d7089584e53a1ae51612c25ee3012964be00bc312836a58d7543f2cd825f" type="hidden"><input name="TS013a5875_md" value="1" type="hidden"><input name="TS013a5875_rf" value="0" type="hidden"><input name="TS013a5875_ct" value="0" type="hidden"><input name="TS013a5875_pd" value="0" type="hidden"></form>

As you can see, the form contains variables whose names start with TS013a5875. I need to generate and post the same variables from my code. Can someone help me do that?
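To make the posting step concrete, here is a minimal sketch of collecting the hidden inputs from the form above into a POST body, using only the .NET base class library. It assumes the generated HTML is already available as a string (it does not execute the JavaScript) and that the attributes appear in the order shown in the sample. `ChallengeForm`, `ExtractFields` and `ToPostBody` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.RegularExpressions;

static class ChallengeForm
{
    // Matches <input name="..." value="..." type="hidden"> in the attribute order
    // shown in the sample; other orderings would need a real HTML parser.
    private static readonly Regex HiddenInput = new Regex(
        "<input\\s+name=\"([^\"]*)\"\\s+value=\"([^\"]*)\"\\s+type=\"hidden\"\\s*/?>",
        RegexOptions.IgnoreCase);

    // Returns the name/value pairs of all hidden inputs, in document order.
    public static List<KeyValuePair<string, string>> ExtractFields(string html)
    {
        var fields = new List<KeyValuePair<string, string>>();
        foreach (Match m in HiddenInput.Matches(html))
        {
            fields.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
        }
        return fields;
    }

    // Wraps the extracted fields as an application/x-www-form-urlencoded body.
    public static FormUrlEncodedContent ToPostBody(string html)
    {
        return new FormUrlEncodedContent(ExtractFields(html));
    }
}

static class Program
{
    static void Main()
    {
        // Shortened stand-in for the generated markup; the values are placeholders.
        string html = "<form action=\"/home/\" method=\"post\">" +
                      "<input name=\"TS013a5875_id\" value=\"3\" type=\"hidden\">" +
                      "<input name=\"TS013a5875_md\" value=\"1\" type=\"hidden\"></form>";

        // Prints TS013a5875_id=3 and TS013a5875_md=1, one per line.
        foreach (var field in ChallengeForm.ExtractFields(html))
        {
            Console.WriteLine(field.Key + "=" + field.Value);
        }
    }
}
```

The result of `ToPostBody` could then be passed to `HttpClient.PostAsync` with the form's action URL (`/home/` in the sample). This covers only the final post; producing the values still requires running the script.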

I tried the options below, but had no luck. Also, the application is too tightly coupled to add more external dependencies.

  1. Using Jurassic Engine
  2. ScrapySharp
  3. WebBrowser Class
  • I would prefer to use an actual web browser, i.e. Chrome or Firefox, to do that. And for scraping, I would use Selenium WebDriver. Commented Nov 7, 2016 at 13:39
  • How about using something like Selenium WebDriver + PhantomJS? Commented Nov 7, 2016 at 13:39
  • @AdnanUmer Can you give more details on Selenium Web Driver or any references where I can understand it more clearly? Commented Nov 7, 2016 at 13:43
  • Also, how smooth would the Selenium integration into a .NET project be? I am completely new to Selenium, which means there is a lot of learning required. Commented Nov 7, 2016 at 13:44
  • @PJS scraping.pro/… Commented Nov 7, 2016 at 13:47

1 Answer

The website you are scraping probably uses an anti-scraping technology called BIG-IP, developed by F5.com.

You should use a browser that can execute JavaScript and has real browser capabilities, such as canvas rendering. You can try a headless browser like PhantomJS, but it probably won't work.
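As a rough sketch of the real-browser approach, the following drives a regular (non-headless) Chrome through Selenium WebDriver so that the challenge script runs in the browser, then copies the resulting cookies into a `CookieContainer` that existing `HttpClient` code could reuse. It assumes the Selenium.WebDriver and Selenium.Support NuGet packages and a matching chromedriver are installed. The URL, the `readySelector` parameter, and the names `BrowserSession` and `GetCookies` are hypothetical, and whether the site accepts the copied cookies is not guaranteed.

```csharp
using System;
using System.Net.Http;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

static class BrowserSession
{
    // Loads the page in a real Chrome window so the challenge script can run,
    // then returns the browser's cookies for use by ordinary HTTP code.
    public static System.Net.CookieContainer GetCookies(string url, string readySelector)
    {
        var container = new System.Net.CookieContainer();
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl(url);

            // Assumption: readySelector matches an element that exists only on the
            // real page, i.e. after the challenge form has been submitted.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(30));
            wait.Until(d => d.FindElements(By.CssSelector(readySelector)).Count > 0);

            // Copy Selenium's cookies into System.Net cookies.
            foreach (var c in driver.Manage().Cookies.AllCookies)
            {
                container.Add(new System.Net.Cookie(c.Name, c.Value, c.Path, c.Domain));
            }
        }
        return container;
    }
}

static class Program
{
    static void Main()
    {
        // Placeholder URL and selector; replace with the real site's values.
        var cookies = BrowserSession.GetCookies("https://example.com/home/", "body");

        var handler = new HttpClientHandler { CookieContainer = cookies };
        using (var client = new HttpClient(handler))
        {
            Console.WriteLine("Cookies copied: " + cookies.Count);
            // Further requests through client would send the copied cookies.
        }
    }
}
```

Keeping the browser behind a single method like this limits how much of the application has to know about Selenium, which may help if adding dependencies is a concern.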
