I'd like to scrape real-time data from a website, and I decided to use the websocket-sharp library. My problem is that the same code can read the data from one website but not from another.

The program throws this exception: WebSocket.connect:0|WebSocketSharp.WebSocketException: Not a WebSocket handshake response.

using (var wss = new WebSocket("wss://..."))
{
    wss.SslConfiguration.EnabledSslProtocols = System.Security.Authentication.SslProtocols.Tls12;
    wss.Origin = "https://www.blabla.com";
           
    wss.CustomHeaders = new Dictionary<string, string>
    {
        { "Accept-Encoding", "gzip, deflate, br" },
        { "Accept-Language", "el-GR,el;q=0.9,en;q=0.8" },
        { "Cache-Control", "no-cache" },
        { "Connection", "Upgrade" },
        { "Host", "blabla.com" },
        { "Origin", "https://www.bla.com" },
        { "Pragma", "no-cache" },
        //{ "Sec-WebSocket-Key", secWebSocketKey },
        //{ "Sec-WebSocket-Protocol", "zap-protocol-v1" },
        { "Sec-WebSocket-Extensions", "permessage-deflate; client_max_window_bits" },
        { "Sec-WebSocket-Version", "13" },
        { "Upgrade", "websocket" },
        { "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36" }
    };

    //wss.OnOpen += Ws_OnOpen;
    wss.OnMessage += (sender, e) => Console.WriteLine($"Server: {e.Data}");
    wss.OnError += (sender, e) => Console.WriteLine($"Error: {e.Message}");

    wss.Connect();

    Console.ReadKey();
}

I tried both with and without the custom headers.

What do I have to do to make a valid handshake?

(P.S.: On the first website I can read the data without any custom headers.)

UPDATE

The URL contains a uid parameter: wss://blabla.com/zap/?uid=5829062969032768

This uid changes on every refresh of the web page. I think it's required for the handshake. Is there any way to reproduce it?

  • Does the second website support websocket connections? You can't use a websocket to any random page on any website - the server also needs to want that connection to be a websocket, rather than a normal web request. Commented Sep 25, 2020 at 8:47
  • @James Thorpe Yes, it does. I can see the stream in Chrome: both the data the client sends and the data received from the server. Commented Sep 25, 2020 at 8:49
  • OK - in that case it'll be down to a mismatch in your request in some fashion. Are you able to see the actual response the server is sending to your code - might tell you why it's refusing it? Or dig into the websocket request in Chrome - see if it's sending other headers (perhaps a needed cookie etc?). Or worst case use fiddler/wireshark etc to compare your request to the one the website itself uses. Commented Sep 25, 2020 at 8:53
  • @JamesThorpe I used all the request headers exactly as they appear in the Google Chrome inspector. Commented Sep 25, 2020 at 9:01
  • So any more details available in the exception when it happens then? It ought to show what the response from the server actually was somewhere I think. If not it's off to fiddler to compare... Commented Sep 25, 2020 at 9:03
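
As a reference point for the comparison suggested in the comments: under RFC 6455, a server accepts the upgrade with status 101, an Upgrade: websocket header, a Connection: Upgrade header, and a Sec-WebSocket-Accept value derived from the client's Sec-WebSocket-Key. The sketch below checks those generic conditions against a response captured with a tool such as Fiddler. The HandshakeCheck class, its method names, and the already-parsed headers dictionary are hypothetical helpers for illustration; this is not websocket-sharp's internal validation code.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class HandshakeCheck
{
    // Fixed GUID defined by RFC 6455 for deriving Sec-WebSocket-Accept.
    private const string WebSocketGuid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Computes the Sec-WebSocket-Accept value the server must return
    // for a given Sec-WebSocket-Key: Base64(SHA-1(key + GUID)).
    public static string ComputeAccept(string secWebSocketKey)
    {
        using (var sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(Encoding.ASCII.GetBytes(secWebSocketKey + WebSocketGuid));
            return Convert.ToBase64String(hash);
        }
    }

    // Returns null when the captured response satisfies the RFC 6455 checks,
    // otherwise a short reason. 'headers' is assumed to be a case-insensitive
    // dictionary built by the caller from the raw response (hypothetical input).
    public static string Explain(int statusCode, IDictionary<string, string> headers, string sentKey)
    {
        if (statusCode != 101)
            return $"Expected status 101 (Switching Protocols), got {statusCode}";
        if (!HeaderContains(headers, "Upgrade", "websocket"))
            return "Missing or wrong Upgrade header";
        if (!HeaderContains(headers, "Connection", "Upgrade"))
            return "Missing or wrong Connection header";
        if (!headers.TryGetValue("Sec-WebSocket-Accept", out var accept)
            || accept == null
            || accept.Trim() != ComputeAccept(sentKey))
            return "Sec-WebSocket-Accept is missing or does not match the key that was sent";
        return null;
    }

    // Case-insensitive substring match on a header value.
    private static bool HeaderContains(IDictionary<string, string> headers, string name, string token)
    {
        return headers.TryGetValue(name, out var value)
            && value != null
            && value.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

If the captured status is something other than 101 (for example a 400 or 403), the server rejected the request before any WebSocket framing started, which points to a difference in the request itself, such as the URL, cookies, or Origin.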

1 Answer

This uid changes every time the page loads. The site obfuscates its JavaScript, which made the code too difficult for me to understand, so I used the Selenium 4 DevTools support instead and was finally able to scrape the real-time data.

First, initialize the Chrome DevTools session:

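// Opens a DevTools session on the driver and enables the Network domain,
// which must be on before WebSocket frame events are raised.
public static async Task<DevToolsSession> InitializeChromeDevTools(IWebDriver driver)
{
    // Only drivers that implement IDevTools (such as ChromeDriver) can open a session.
    var devTools = (driver as IDevTools)
        ?? throw new ArgumentException("The driver does not support DevTools.", nameof(driver));
    var output = devTools.CreateDevToolsSession();
    await output.Network.Enable(new OpenQA.Selenium.DevTools.Network.EnableCommandSettings());

    return output;
}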

Then subscribe to the WebSocket frame events:

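// Inside an async method: open the session and subscribe to incoming frames.
var session = await ChromeDriverSettings.InitializeChromeDevTools(driver);
session.Network.WebSocketFrameReceived += Network_WebSocketFrameReceived;

// Raised for each WebSocket frame the page receives.
private static void Network_WebSocketFrameReceived(object sender, OpenQA.Selenium.DevTools.Network.WebSocketFrameReceivedEventArgs e)
{
    // PayloadData carries the content of the received frame.
    var message = e.Response.PayloadData;
}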
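
The handler above only assigns the payload to a local variable, so the data still has to be handed to the rest of the program. One possible way to do that, assuming the event may be raised on a different thread than the one that processes the data, is a small thread-safe buffer. FrameBuffer and its members are hypothetical names introduced for this sketch; they are not part of Selenium.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: collects frame payloads from the event handler
// and lets another thread consume them in arrival order.
public sealed class FrameBuffer : IDisposable
{
    private readonly BlockingCollection<string> _frames = new BlockingCollection<string>();

    // Called from the event handler; empty or null payloads are skipped.
    public void Add(string payload)
    {
        if (!string.IsNullOrEmpty(payload))
            _frames.Add(payload);
    }

    // Signals that no more frames will arrive, so consumers can finish.
    public void Complete()
    {
        _frames.CompleteAdding();
    }

    // Blocks while waiting for frames and ends after Complete() once the buffer is drained.
    public IEnumerable<string> Consume(CancellationToken token = default(CancellationToken))
    {
        return _frames.GetConsumingEnumerable(token);
    }

    public void Dispose()
    {
        _frames.Dispose();
    }
}
```

With a shared FrameBuffer instance, the handler body would become buffer.Add(e.Response.PayloadData), and a separate loop such as foreach (var message in buffer.Consume()) could parse each message as it arrives.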

7 Comments

I am unable to use your InitializeChromeDevTools method. The output.Network property is not available. I have installed the nuget package Selenium.WebDriver v4.0.0-alpha07. Any hints?
@PetterT Try with -alpha05. I just updated to alpha07 and have the same problem. Something probably changed in this new version
Excellent @ggeorge, that made the difference. By the way: are you also able to send Websocket messages by using devtools? (Not much documentation for these alpha releases around :-) )
By the way, the output.Network.Enable method is async, so you should await it.
@PetterT Thanks for the tip. I will update my answer. As for sending WebSocket messages, I don't know, because I only used DevTools as a proxy to capture the network traffic.