Is it possible to capture websocket traffic using selenium and python? - python-3.x

We are using Selenium with Python and, as part of our automation, we need to capture the messages that a sample website sends and receives after the web page has loaded completely.
I have checked here, where it is stated that what we want to do is achievable using BrowserMobProxy, but when we tested it the websocket connection did not work on the website, and the certificate errors were also cumbersome.
Another post here states that this can be done using the loggingPrefs of Chrome, but it seemed that we only get the logs up to the time the website loads and not the data after that.
Is it possible to capture websocket traffic using only Selenium?

It turned out that this can be done using pyppeteer. The following code captures all the live websocket traffic of a sample website:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(
        headless=True,
        args=['--no-sandbox'],
        autoClose=False
    )
    page = await browser.newPage()
    await page.goto('https://www.tradingview.com/symbols/BTCUSD/')
    # Open a Chrome DevTools Protocol session for this page
    cdp = await page.target.createCDPSession()
    await cdp.send('Network.enable')
    await cdp.send('Page.enable')

    def printResponse(response):
        print(response)

    cdp.on('Network.webSocketFrameReceived', printResponse)  # Calls printResponse when a websocket frame is received
    cdp.on('Network.webSocketFrameSent', printResponse)  # Calls printResponse when a websocket frame is sent
    await asyncio.sleep(100)

asyncio.get_event_loop().run_until_complete(main())
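The callback above prints the whole event. As a minimal sketch, assuming the event is the usual Chrome DevTools Protocol dictionary whose frame text sits under response.payloadData, a small helper can pull out just the message. The name extract_ws_payload and the sample event are made up for illustration, and the sketch only uses the standard library:

```python
import json

def extract_ws_payload(event):
    """Return the payload of a webSocketFrameReceived/Sent event.

    JSON text payloads are decoded; anything else is returned unchanged.
    """
    payload = event["response"]["payloadData"]
    try:
        return json.loads(payload)
    except (TypeError, ValueError):
        return payload

# Hypothetical event, shaped like the dictionaries the CDP session emits.
sample_event = {
    "requestId": "1000.1",
    "timestamp": 12345.678,
    "response": {"opcode": 1, "mask": False, "payloadData": '{"price": 42}'},
}
print(extract_ws_payload(sample_event))
```

A helper like this could be called from printResponse in place of printing the raw event.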

Related

binance websocket not working in aws instance

I am trying to get the live prices of crypto tokens from Binance. I used websockets for that.
Code:
import asyncio
import websockets

async def hello():
    async with websockets.connect('wss://fstream.binance.com/ws/!markPrice@arr') as websocket:
        print("connected!")
        while True:
            print("Debug")
            greeting = await websocket.recv()
            print(greeting)
    # No explicit close is needed: the async with block closes the connection on exit.

asyncio.run(hello())
This code works fine on my local Linux machine (that is, it prints the data coming from the websocket, which is stored in the greeting variable).
When I run the same code, unchanged, on an AWS EC2 Ubuntu instance, it only prints connected! and Debug. After that nothing is printed to the console and no error is raised.
I installed the latest version of websockets.
I was experiencing a very similar issue. I found out that it has something to do with the newest release of the websockets library. As a workaround, downgrading from 10.0 to 9.1 helped me.
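Because the script hangs without raising anything, it can help to make the stall visible while investigating. The following is only a sketch of that idea using asyncio.wait_for from the standard library. It does not address the cause of the problem, and recv_or_report, silent_recv and quick_recv are hypothetical names standing in for websocket.recv():

```python
import asyncio

async def recv_or_report(recv, timeout):
    """Await one message from recv(), or return None if nothing arrives in time."""
    try:
        return await asyncio.wait_for(recv(), timeout)
    except asyncio.TimeoutError:
        print(f"no message received within {timeout} s")
        return None

async def silent_recv():
    # Hypothetical stand-in for websocket.recv() on a stalled connection.
    await asyncio.sleep(3600)

async def quick_recv():
    # Hypothetical stand-in for a connection that delivers a message promptly.
    return "tick"

result_stalled = asyncio.run(recv_or_report(silent_recv, timeout=0.1))
result_ok = asyncio.run(recv_or_report(quick_recv, timeout=0.1))
print(result_stalled, result_ok)
```

In the loop from the question, the call would presumably look like await recv_or_report(websocket.recv, timeout=30), with the timeout value chosen to suit the stream.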

Discord.py: how to create a custom function notify and call trigger the discord client to invoke it

I want a Discord client to execute a specific function WITHOUT it being triggered by an on message or on ready event, and without a LOOP that repeats itself.
Basically, I want a function notify(msg) that makes the Discord client send a message to a channel.
I have tried the following without success:
async def notify(message, client):
    await client.wait_until_ready()
    channel = client.get_channel(os.getenv('CHANNEL_ID'))
    await channel.send(message)

client.run(os.getenv('CLIENT_TOKEN'))
# Notice that I try to run the task after the client run
# Because I may call notify multiple times after the client has been created
client.loop.create_task(notify())
I want to invoke this function after the client has been created and without looping indefinitely.
I also tried to run
async def main():
    await client.connect()
    await client.log(client.run(os.getenv('CLIENT_TOKEN')))
    await notify('My message', client)

# then
if __name__ == "__main__":
    asyncio.run(main())
This just doesn't run; I get errors with HTTP requests and so on.
client.loop.create_task(notify(*args))
This passes the function to the client's loop for execution, and it will be called only once, whenever you want.
What I was looking for are webhooks.
The role of the client is to act on predefined events (like on start, on message received ... etc).
If we want to trigger something externally, Discord provides webhooks:
https://discordpy.readthedocs.io/en/stable/api.html?highlight=webhooks#discord.TextChannel.webhooks
Basically, webhooks are unique URLs for Discord channels to which we can send data. The minimal data is of the form {'content': "my message"}
which sends a simple message. You can improve this further by including embeds.
Here is a basic example using aiohttp
from discord import Webhook, AsyncWebhookAdapter
import aiohttp

async def foo():
    async with aiohttp.ClientSession() as session:
        webhook = Webhook.from_url('url-here', adapter=AsyncWebhookAdapter(session))
        await webhook.send('Hello World', username='Foo')
And here is another example using the Requests library
import requests
from discord import Webhook, RequestsWebhookAdapter
webhook = Webhook.partial(123456, 'abcdefg', adapter=RequestsWebhookAdapter())
webhook.send('Hello World', username='Foo')
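The minimal payload described above can also be put together without discord.py. This is a sketch using only the standard library: build_webhook_request is a hypothetical helper, the URL is a placeholder, and the request is built but not sent. Discord may also expect extra headers (for example a User-Agent), which has not been verified here.

```python
import json
import urllib.request

def build_webhook_request(webhook_url, content):
    """Build (but do not send) a POST request carrying the minimal payload."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder URL: replace it with the webhook URL of your channel.
request = build_webhook_request("https://example.invalid/webhook-url-here", "my message")
print(request.get_method(), request.data)
# To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from the sending makes the payload easy to inspect before anything goes over the network.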

Is it possible to use puppeteer to pass a Javascript object to nodejs?

Background
I am using Posenet (see the in browser demo here) for keypoint detection. I have set it up to run on a WebRTC MediaStream, s.t.:
Client: Runs in a chrome tab on machine A. Initializes WebRTC connection and sends a MediaStream to Server. Receives back real time keypoint data from Server via WebRTC's DataChannel.
Server: Runs in a chrome tab on machine B, receives a WebRTC stream and passes the corresponding MediaStream to Posenet. Posenet does its thing and computes keypoints. This keypoint data is then sent back to the client via WebRTC's DataChannel (if you have a better idea, I'm all ears).
Problem: I would like to have the server receive multiple streams from various clients and run Posenet on each, sending real time keypoint data to all clients. Though I'm not thrilled about the server utilizing Chrome, I am fine with using puppeteer and Chrome's headless mode for now, mainly to abstract away WebRTC's complexity.
Approaches
I have tried two approaches, being heavily in favor of approach #2:
Approach #1
Run @tensorflow/tfjs inside the puppeteer context (i.e. inside a headless chrome tab). However, I cannot seem to get the PoseNet Browser Demo working in headless mode, due to some WebGL error (it does work in non-headless mode though). I tried the following (passing args to puppeteer.launch() to enable WebGL, though I haven't had any luck - see here and here for reference):
const puppeteer = require('puppeteer');

async function main() {
    const browser = await puppeteer.launch({
        headless: true,
        args: ['--enable-webgl-draft-extensions', '--enable-webgl-image-chromium', '--enable-webgl-swap-chain', '--enable-webgl2-compute-context']
    });
    const page = await browser.newPage();
    await page.goto('https://storage.googleapis.com/tfjs-models/demos/posenet/camera.html', {
        waitUntil: 'networkidle2'
    });
    // Make chromium console calls available to nodejs console
    page.on('console', msg => {
        for (let i = 0; i < msg.args().length; ++i)
            console.log(`${i}: ${msg.args()[i]}`);
    });
}

main();
In headless mode, I am receiving this error message.
0: JSHandle:Initialization of backend webgl failed
0: JSHandle:Error: WebGL is not supported on this device
This leaves me with question #1: How do I enable WebGL in puppeteer?
Approach #2
Preferably, I would like to run posenet using the @tensorflow/tfjs-node backend, to accelerate computation. Therefore, I would like to link puppeteer and @tensorflow/tfjs-node, s.t.:
The puppeteer-chrome-tab talks WebRTC with the client. It makes a MediaStream object available to node.
node takes this MediaStream and passes it to posenet (and thus @tensorflow/tfjs-node), where the machine learning magic happens. node then passes detected keypoints back to puppeteer-chrome-tab, which uses its RTCDataChannel to communicate them back to the client.
Problem
The problem is that I cannot seem to get access to puppeteer's MediaStream object within node, to pass this object to posenet. I'm only getting access to JSHandles and ElementHandles. Is it possible to pass the javascript object associated with the handle to node?
Concretely, this error is thrown:
UnhandledPromiseRejectionWarning: Error: When running in node, pixels must be an HTMLCanvasElement like the one returned by the `canvas` npm package
    at NodeJSKernelBackend.fromPixels (/home/work/code/node_modules/@tensorflow/tfjs-node/dist/nodejs_kernel_backend.js:1464:19)
    at Engine.fromPixels (/home/work/code/node_modules/@tensorflow/tfjs-core/dist/engine.js:749:29)
    at fromPixels_ (/home/work/code/node_modules/@tensorflow/tfjs-core/dist/ops/browser.js:85:28)
    at Object.fromPixels (/home/work/code/node_modules/@tensorflow/tfjs-core/dist/ops/operation.js:46:29)
    at toInputTensor (/home/work/code/node_modules/@tensorflow-models/posenet/dist/util.js:164:60)
    at /home/work/code/node_modules/@tensorflow-models/posenet/dist/util.js:198:27
    at /home/work/code/node_modules/@tensorflow/tfjs-core/dist/engine.js:349:22
    at Engine.scopedRun (/home/work/code/node_modules/@tensorflow/tfjs-core/dist/engine.js:359:23)
    at Engine.tidy (/home/work/code/node_modules/@tensorflow/tfjs-core/dist/engine.js:348:21)
    at Object.tidy (/home/work/code/node_modules/@tensorflow/tfjs-core/dist/globals.js:164:28)
When I log the pixels argument that is passed to NodeJSKernelBackend.prototype.fromPixels = function (pixels, numChannels) {..}, it evaluates to an ElementHandle. I am aware that I can access serializable properties of a JavaScript object using puppeteer's page.evaluate. However, if I were to pass the CanvasRenderingContext2D's imageData (obtained with the method getImageData()) to node by calling puppeteer.evaluate(..), this would mean stringifying an entire raw image and then reconstructing it in node's context.
This leaves me with question #2: Is there any way to make an object from puppeteer's context accessible (read-only) directly inside node, without having to go through e.g. puppeteer.evaluate(..)?
I recommend another approach, which is to ditch the idea of using puppeteer on the server-side and instead implement an actual WebRTC client in Node.js which then directly uses PoseNet via @tensorflow/tfjs-node.
Why not to use puppeteer on the server-side
Using puppeteer on the server-side introduces a lot of complexity. On top of active WebRTC connections to multiple clients you now also have to manage one browser (or one tab at least) per connection. So, not only do you have to think about what happens when the connection to the clients fails, but you also have to prepare for other scenarios like browser crashes, page crashes, WebGL support (per page), document in the browser not loading, memory/CPU usage of the browser instances, ...
That said, let's go over your approaches.
Approach 1: Running Tensorflow.js inside puppeteer
You should be able to get this running by using only the cpu backend. You can set the backend like this before using any other code:
tf.setBackend('cpu');
You might also be able to get WebGL running (as you are not the only one having problems with WebGL and puppeteer). But even if you get it running, you are now running a Node.js script to start a Chrome browser that starts a WebRTC session and Tensorflow.js training inside a website. Complexity-wise, this will be very hard to debug if any problems occur...
Approach 2: Transferring the data between puppeteer and Node.js
This approach will be nearly impossible without a large slowdown (regarding the sending and receiving of frames). puppeteer needs to serialize any exchanged data. There is no such thing as shared memory or shared data objects between the Node.js and the browser environment. This means you would have to serialize each frame (all the pixels...) to transfer them from the browser environment to Node.js. Performance-wise, this might work okay for small images, but will become worse the bigger your images are.
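To get a feel for the cost, here is a rough back-of-the-envelope sketch. Everything in it is an assumption made for illustration: a 640x480 frame, 4 bytes per pixel (RGBA), and base64 as the text encoding used to move the pixels between the two environments. The real overhead depends on how the data is actually serialized.

```python
import base64

def raw_rgba_size(width, height):
    """Bytes in one uncompressed RGBA frame (4 bytes per pixel)."""
    return width * height * 4

def base64_size(num_bytes):
    """Length of the base64 text needed to carry num_bytes of binary data."""
    return 4 * ((num_bytes + 2) // 3)

frame = bytes(raw_rgba_size(640, 480))  # an all-zero stand-in frame
encoded = base64.b64encode(frame)
print(len(frame), len(encoded))
```

Under these assumptions a single frame is about 1.2 MB raw and about 1.6 MB as text, so at an assumed 30 frames per second roughly 49 MB of text would have to be encoded, transferred and decoded every second.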
All in all, you are introducing a lot of complexity if you want to go with one of your two approaches. Therefore, let's look at the alternative.
Alternative approach: Send your video stream directly to your server
Instead of using puppeteer to establish a WebRTC connection, you can directly implement a WebRTC peer. I read from your question that you fear the complexity, but it is probably worth the hassle.
To implement a WebRTC server, you can use the library node-webrtc, which allows you to implement a WebRTC peer on the server-side. There are multiple examples, one of which is very interesting for your use case. This is the video-compositing example, which establishes a connection between client (browser) and server (Node.js) to stream a video. The server then modifies the frames it receives and puts a "watermark" on top of them.
Code Sample
The following code shows the most relevant lines from the video-compositing example. The code reads a frame from the input stream and creates a node-canvas object from it.
const lastFrameCanvas = createCanvas(lastFrame.width, lastFrame.height);
const lastFrameContext = lastFrameCanvas.getContext('2d', { pixelFormat: 'RGBA24' });
const rgba = new Uint8ClampedArray(lastFrame.width * lastFrame.height * 4);
const rgbaFrame = createImageData(rgba, lastFrame.width, lastFrame.height);
i420ToRgba(lastFrame, rgbaFrame);
lastFrameContext.putImageData(rgbaFrame, 0, 0);
context.drawImage(lastFrameCanvas, 0, 0);
You now have a canvas object, which you can feed into PoseNet like this:
const net = await posenet.load();
// ...
const input = tf.browser.fromPixels(lastFrameCanvas);
const pose = await net.estimateSinglePose(input, /* ... */);
The resulting data now needs to be transferred back to the client which can be done by using a data channel. There is also an example (ping-pong) in the repository regarding that, which is much simpler than the video example.
Although you might fear the complexity of using node-webrtc, I recommend giving this approach and node-webrtc-examples a try. You can check out the repository first. All examples are ready to try and play around with.

What does "Cannot write to closing transport" mean?

I get the exception "Cannot write to closing transport" raised from aiohttp.http_writer.StreamWriter#_write, but only in a fraction of cases.
The relevant snippet.
session: aiohttp.ClientSession
async with session.get(url, timeout=60) as response:
    txt = await response.text()
    response.close()
return txt
What is going on? I don't think the server side is closing the socket.
Answer: We should create a new session on each request. There is also no need to close the response explicitly, as the context manager handles that. Opening the session in its own context manager ensures that it is closed as well.
async with aiohttp.ClientSession() as session:
    async with session.get(url, timeout=60) as response:
        txt = await response.text()
return txt
It means that your connection is already closed. It occurs when the client breaks the connection but the server still tries to respond to it.
Remove response.close() from your code.
This happens if a client prematurely disconnects before reading some or all of the response. You may encounter that case frequently if you're either dealing with mobile clients (which may switch between WiFi and mobile networks) or if you have views that take some time, but clients have a lower timeout. Since you can't control how clients talk to your service, it's probably safe to ignore this.
aiohttp 3.6.0 introduces code to silence this exception.

How can I monitor all the outgoing ajax request with selenium-webdriver [IN NODEJS]

I have seen some answers in Java but I couldn't understand them, and Selenium's documentation for Node.js is abysmal.
I am scraping a webpage which has just a few lines (15-20) of source code; everything else is run by JS. The webpage makes certain requests with specific queries (which change from time to time). I want to monitor all the outgoing ajax requests on this page.
If there are better alternatives than Selenium, please let me know.
const {Builder, By, Key, until} = require('selenium-webdriver');
let driver = new Builder()
    .forBrowser('chrome')
    .build();
driver.get('https://example.com');
// driver monitor all outgoing ajax request of example.com
