Chrome Extension - make a full page screenshot - TypeError: chrome.tabs.captureTab is not a function

I'm trying to capture a full-page screenshot of a web page through a Chrome extension. I have managed to capture a screenshot with chrome.tabs.captureVisibleTab, but I'd like to capture the full page.
I have tried chrome.tabs.captureTab, but I always end up with the same error message:
TypeError: chrome.tabs.captureTab is not a function
Here is the code:
async function createTab(url) {
  return new Promise(resolve => {
    chrome.tabs.create({ url }, tab => resolve(tab));
  });
}

(async () => {
  let tab = await createTab('https://www.laurentwillen.be');
  chrome.tabs.captureTab(tab.id, { format: "png" }, function (image) {
    chrome.downloads.download({
      url: image,
      filename: "screenshot.png",
      saveAs: false
    });
  });
})();
I'm using Chrome 108 under Windows. Is chrome.tabs.captureTab supported by this version?
Thanks
Laurent
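For context, as far as I can tell captureTab is a Firefox WebExtensions method (browser.tabs.captureTab) that is not implemented in Chrome's chrome.tabs API in any version, which is why the call fails at runtime; captureVisibleTab is the supported Chrome call. A minimal sketch of the viewport capture mentioned above (it grabs only the visible area, not the full page):

// captureVisibleTab only captures the visible viewport, not the full page;
// it needs a permission that grants access to the tab, e.g. activeTab
chrome.tabs.captureVisibleTab({ format: "png" }, function (image) {
  chrome.downloads.download({
    url: image,
    filename: "screenshot.png",
    saveAs: false
  });
});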

Related

Blocking specific resources (css, images, videos, etc) using crawlee and playwright

I'm using crawlee#3.0.3 (not released yet, from GitHub), and I'm trying to block specific resources from loading with playwrightUtils.blockRequests (which isn't available in previous versions). When I try the code suggested in the official repo, it works as expected:
import { launchPlaywright, playwrightUtils } from 'crawlee';

const browser = await launchPlaywright();
const page = await browser.newPage();
await playwrightUtils.blockRequests(page, {
  // extraUrlPatterns: ['adsbygoogle.js'],
});
await page.goto('https://cnn.com');
await page.screenshot({ path: 'cnn_no_images.png' });
await browser.close();
From the screenshot, I can see that the images aren't loaded. My problem is that I'm using PlaywrightCrawler:
const crawler = new PlaywrightCrawler({
  maxRequestsPerCrawl: 3,
  async requestHandler({ page, request }) {
    console.log(`Processing: ${request.url}`);
    await playwrightUtils.blockRequests(page);
    await page.screenshot({ path: 'cnn_no_images2.png' });
  },
});
This way, I'm not able to block specific resources. My guess is that blockRequests needs launchPlaywright to work, and I don't see a way to pass that to PlaywrightCrawler. blockRequests has been available for Puppeteer, so maybe someone has tried this before.
Also, I've tried route interception, but again, I couldn't make it work with PlaywrightCrawler.
You can set up any listeners or run any code before navigation by using preNavigationHooks, like this:
const crawler = new PlaywrightCrawler({
  maxRequestsPerCrawl: 3,
  preNavigationHooks: [async ({ page }) => {
    await playwrightUtils.blockRequests(page);
  }],
  async requestHandler({ page, request }) {
    console.log(`Processing: ${request.url}`);
    await page.screenshot({ path: 'cnn_no_images2.png' });
  },
});
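If you also need the extraUrlPatterns option from the standalone example, it should presumably be accepted the same way inside the hook (a sketch, assuming the same crawlee build as above; the pattern is just the one from the earlier example):

const crawler = new PlaywrightCrawler({
  maxRequestsPerCrawl: 3,
  preNavigationHooks: [async ({ page }) => {
    await playwrightUtils.blockRequests(page, {
      // illustrative pattern, reused from the standalone example above
      extraUrlPatterns: ['adsbygoogle.js'],
    });
  }],
  async requestHandler({ page, request }) {
    await page.screenshot({ path: 'cnn_no_images2.png' });
  },
});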

How to keep track of changes of an .aspx page using Node.js?

Imagine keeping track of a page like this (open it with Chrome, then right-click and select Translate to English):
http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=35366681030756042
When you press F12 and select the Network tab, you can see responses arriving roughly once per second, containing the latest prices and trades, with HTTP header details like these:
{
...
connection: keep-alive
cookies: fooCookie
...
}
I have tried the GOT package with a keep-alive config:
// Assumption: HttpAgent / HttpsAgent presumably come from the 'agentkeepalive'
// package, which exports them as the default export and the HttpsAgent property
const HttpAgent = require('agentkeepalive');
const { HttpsAgent } = require('agentkeepalive');

const gotOption = {
  keepAlive: true,
  maxSockets: 10,
};

await got.get(url, {
  agent: {
    http: new HttpAgent(gotOption),
    https: new HttpsAgent(gotOption),
  },
});
I get just the first response, but how can I get new responses?
Is it possible to use Puppeteer for this purpose?
Well, there is a new XHR request being made every 3 to 5 seconds.
You can run a function that triggers on that specific event, intercepting .aspx responses and running your script. Here is a minimal snippet.
let puppeteer = require('puppeteer');

(async () => {
  let browser = await puppeteer.launch({
    headless: true,
  });
  let page = await browser.newPage();
  (await browser.pages())[0].close();
  let res = 0;
  page.on('response', async (response) => {
    if (response.url().includes('.aspx')) {
      res++;
      console.log('\u001b[1;36m' + `Response ${res}: ${new Date(Date.now())}` + '\u001b[0m');
    }
  });
  await page.goto('http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=35366681030756042');
  // await browser.close();
})();
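If you also need the payload rather than just the timing, the body of each intercepted response can be read inside the same handler; a minimal sketch (assuming the endpoint returns text, which is worth verifying):

page.on('response', async (response) => {
  if (response.url().includes('.aspx')) {
    // text() resolves once the body has been received; it can throw for
    // redirect responses, so failures are swallowed here
    const body = await response.text().catch(() => null);
    if (body) console.log(body.slice(0, 200));
  }
});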

Chrome Extension - Monitoring network traffic with body data in background [duplicate]

This seems to be a difficult (or maybe impossible?) problem.
I want to read the HTTP response triggered by an HTTP request in the browser, from a watching Chrome extension background script.
We can get the HTTP request body this way:
chrome.webRequest.onBeforeRequest.addListener(function (data) {
  // data contains requestBody
}, { 'urls': [] }, ['requestBody']);
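For reference, the body arrives either as parsed formData or as raw byte chunks that you decode yourself; a minimal sketch of both cases:

chrome.webRequest.onBeforeRequest.addListener(function (data) {
  if (data.requestBody) {
    if (data.requestBody.formData) {
      console.log(data.requestBody.formData); // already parsed form fields
    } else if (data.requestBody.raw) {
      // raw is an array of { bytes: ArrayBuffer } chunks
      const decoder = new TextDecoder('utf-8');
      console.log(data.requestBody.raw.map(c => decoder.decode(c.bytes)).join(''));
    }
  }
}, { urls: ['<all_urls>'] }, ['requestBody']);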
I also checked these Stack Overflow questions:
Chrome extensions - Other ways to read response bodies than chrome.devtools.network?
Chrome extension to read HTTP response
Is there any clever way to get the HTTP response body in a Chrome extension?
I can't find a better way than this answer:
Chrome extension to read HTTP response
That answer explains how to get response headers and display them in another page, but there is no body info in the response object (see the Network.responseReceived event). If you want to get the response body without opening another page, try this.
var currentTab;
var version = "1.0";

chrome.tabs.query({ // get the current tab
  currentWindow: true,
  active: true
}, function (tabArray) {
  currentTab = tabArray[0];
  chrome.debugger.attach({ // attach the debugger to the current tab
    tabId: currentTab.id
  }, version, onAttach.bind(null, currentTab.id));
});

function onAttach(tabId) {
  chrome.debugger.sendCommand({ // first enable the Network domain
    tabId: tabId
  }, "Network.enable");
  chrome.debugger.onEvent.addListener(allEventHandler);
}

function allEventHandler(debuggeeId, message, params) {
  if (currentTab.id != debuggeeId.tabId) {
    return;
  }
  if (message == "Network.responseReceived") { // a response arrived
    chrome.debugger.sendCommand({
      tabId: debuggeeId.tabId
    }, "Network.getResponseBody", {
      "requestId": params.requestId
    }, function (response) {
      // you get the response body here!
      // you can close the debugger banner with:
      chrome.debugger.detach(debuggeeId);
    });
  }
}
I think it's useful enough for me, and you can use chrome.debugger.detach(debuggeeId) to close the ugly banner.
Sorry, maybe not helpful... ^ ^
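One prerequisite the snippet above doesn't show: the chrome.debugger API only works if the extension declares the "debugger" permission in its manifest. A minimal sketch (name and version are illustrative):

{
  "name": "response-body-reader",
  "version": "1.0",
  "manifest_version": 2,
  "permissions": ["debugger"]
}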
There is now a way in a Chrome Developer Tools extension, and sample code can be seen here: blog post.
In short, here is an adaptation of his sample code:
chrome.devtools.network.onRequestFinished.addListener(request => {
  request.getContent((body) => {
    if (request.request && request.request.url) {
      if (request.request.url.includes('facebook.com')) {
        // continue with custom code
        var bodyObj = JSON.parse(body); // etc.
      }
    }
  });
});
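Note that chrome.devtools.network is only available inside a DevTools extension page, i.e. one declared through the devtools_page key in the manifest, and the listener only fires while DevTools is open for the inspected tab.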
This is definitely something that is not provided out of the box by the Chrome extension ecosystem, but I could find a couple of ways to get around it. Both come with their own set of drawbacks.
The first way is:
Use a content script to inject our own custom script.
Use the custom script to extend XHR's native methods to read the response.
Add the response to the web page's DOM inside a hidden (not display: none) element.
Use the content script to read the hidden response.
The second way is to create a DevTools extension, which is the only extension type that provides an API to read each request.
I have written up both methods in detail in a blog post here.
Let me know if you face any issues! :)
To get an XHR response body, you can follow the instructions in this answer.
To get a fetch response body, you can check Solution 3 in this article and also this answer. Both get the response body without using chrome.debugger.
In a nutshell, you need to inject the following function into the page from the content script, using the same method as for the XHR requests.
const constantMock = window.fetch;
window.fetch = function () {
  return new Promise((resolve, reject) => {
    constantMock.apply(this, arguments)
      .then((response) => {
        if (response) {
          // the response body is a ReadableStream, which can only be read
          // once; that's why we clone it here and work with the clone
          response.clone().json()
            .then((json) => {
              console.log(json);
              // do whatever you want with the json
              resolve(response);
            })
            .catch((error) => {
              console.log(error);
              reject(response);
            });
        } else {
          console.log(arguments);
          console.log('Undefined Response!');
          reject(response);
        }
      })
      .catch((error) => {
        console.log(error);
        reject(response);
      });
  });
};
If response.clone().json() does not work, you can try response.clone().text().
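For completeness, the XHR counterpart referenced above is typically a monkey-patch of XMLHttpRequest.prototype.open; a sketch of the idea (the linked answer may differ in details):

const origOpen = XMLHttpRequest.prototype.open;
XMLHttpRequest.prototype.open = function (...args) {
  // read the body once the page's own request completes; responseText is
  // only readable when responseType is '' or 'text'
  this.addEventListener('load', function () {
    console.log(this.responseURL, this.responseText);
  });
  return origOpen.apply(this, args);
};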
I'll show my completed code in case it is of some help. I added Underscore to pass the request URL down to the callback. Thanks!
// background.js
import _, { map } from 'underscore';

var currentTab;
var version = "1.0";

chrome.tabs.onActivated.addListener(activeTab => {
  currentTab && chrome.debugger.detach({ tabId: currentTab.tabId });
  currentTab = activeTab;
  chrome.debugger.attach({ // attach the debugger to the current tab
    tabId: currentTab.tabId
  }, version, onAttach.bind(null, currentTab.tabId));
});

function onAttach(tabId) {
  chrome.debugger.sendCommand({ // first enable the Network domain
    tabId: tabId
  }, "Network.enable");
  chrome.debugger.onEvent.addListener(allEventHandler);
}

function allEventHandler(debuggeeId, message, params) {
  if (currentTab.tabId !== debuggeeId.tabId) {
    return;
  }
  if (message === "Network.responseReceived") { // a response arrived
    chrome.debugger.sendCommand({
      tabId: debuggeeId.tabId
    }, "Network.getResponseBody", {
      "requestId": params.requestId
      // use underscore's _.partial to give the callback an extra argument,
      // passing params down to it
    }, _.partial(function (response, params) {
      // you get the response body here!
      console.log(response.body, params.response.url);
      // you can close the debugger banner with:
      // chrome.debugger.detach(debuggeeId);
    }, _, params));
  }
}
I also found what looks like a bug in chrome.debugger.sendCommand. If I have two requests with the same URI but different arguments, such as:
request 1: https://www.example.com/orders-api/search?limit=15&offer=0
request 2: https://www.example.com/orders-api/search?limit=85&offer=15
the second one does not get the correct responseBody; it shows:
Chrome Extension: "Unchecked runtime.lastError: {"code":-32000,"message":"No resource with given identifier found"}
But when I run the command directly in the background page's DevTools, it gets the second body correctly:
chrome.debugger.sendCommand({ tabId: 2 }, "Network.getResponseBody", { requestId: "6932.574" }, function (response) { console.log(response.body); })
So there is no problem with the tabId and requestId.
When I wrap chrome.debugger.sendCommand in setTimeout, it gets both the first and second responseBody correctly:
if (message === "Network.responseReceived") { //response return
console.log(params.response.url,debuggeeId.tabId,params.requestId)
setTimeout(()=>{
chrome.debugger.sendCommand({
tabId: debuggeeId.tabId
}, "Network.getResponseBody", {
"requestId": params.requestId
//use underscore to add callback a more argument, passing params down to callback
}, _.partial(function(response,params,debuggeeId) {
// you get the response body here!
console.log(response.body,params.response.url);
// you can close the debugger tips by:
// chrome.debugger.detach(debuggeeId);
},_,params,debuggeeId));
},800)
}
I don't think setTimeout is the perfect solution; can someone help?
Thanks.
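One possible replacement for the setTimeout (a sketch, untested against the code above): the body is often not ready yet when Network.responseReceived fires, but the Network.loadingFinished event signals that the full body has been received, so you can stash the response metadata and only fetch the body on loadingFinished:

const pending = new Map(); // requestId -> response metadata

function allEventHandler(debuggeeId, message, params) {
  if (currentTab.tabId !== debuggeeId.tabId) return;
  if (message === "Network.responseReceived") {
    pending.set(params.requestId, params.response);
  }
  if (message === "Network.loadingFinished" && pending.has(params.requestId)) {
    const response = pending.get(params.requestId);
    pending.delete(params.requestId);
    chrome.debugger.sendCommand({
      tabId: debuggeeId.tabId
    }, "Network.getResponseBody", {
      requestId: params.requestId
    }, function (body) {
      body && console.log(response.url, body.body);
    });
  }
}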

Getting changes in object with puppeteer

I'm trying to learn how to track changes in a div. I found a post that showed the following code:
const puppeteer = require('puppeteer'); // assumed import, not shown in the original snippet

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.exposeFunction('onCustomEvent', text => console.log(text));
  await page.goto('https://www.time.ir', { waitUntil: 'networkidle0' });
  await page.evaluate(() => {
    $('#digitalClock').bind("DOMSubtreeModified", function (e) {
      window.onCustomEvent(e.currentTarget.textContent.trim());
    });
  });
})();
When running this, it pulls the time from the web page and console.logs the new time every second, which is exactly what I'm looking for. However, I'm having issues with any other page for some reason. For example, the very similar code below gives me an error:
(node:1801) UnhandledPromiseRejectionWarning: Error: Evaluation failed: ReferenceError: $ is not defined
await page.exposeFunction('onCustomEvent', text => console.log(text));
await page.goto('https://www.clocktab.com', { waitUntil: 'networkidle0' });
await page.evaluate(() => {
  $('#digit2').bind("DOMSubtreeModified", function (e) {
    window.onCustomEvent(e.currentTarget.textContent.trim());
  });
});
I'm not sure what the difference between them is, other than the page I navigate to and the element I'm watching for the changing value. Additionally, I read somewhere that DOMSubtreeModified is deprecated now, so if there's a better way to get what I'm looking for, that would be great!
Thanks in advance
The difference is that the second website does not include jQuery, so when your evaluation function runs, $ is not defined.
Replace it with vanilla JS:
document.querySelector('#digit2').addEventListener("DOMSubtreeModified", function (e) {
  window.onCustomEvent(e.currentTarget.textContent.trim());
});
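Since DOMSubtreeModified is deprecated, a MutationObserver is the usual replacement; a minimal sketch of the same idea inside page.evaluate:

await page.evaluate(() => {
  const target = document.querySelector('#digit2');
  const observer = new MutationObserver(() => {
    window.onCustomEvent(target.textContent.trim());
  });
  // watch for text changes anywhere inside the element
  observer.observe(target, { childList: true, characterData: true, subtree: true });
});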
Suggestion: when I debug a Puppeteer evaluate function, I copy-paste it into the browser console on the page itself.

Open Puppeteer with specific configuration (download PDF instead of PDF viewer)

I would like to open Chromium with a specific configuration.
I am looking for the configuration to activate the following option :
Settings => Site Settings => Permissions => PDF documents => "Download PDF files instead of automatically opening them in Chrome"
I searched the flags on this command line switch page, but the only parameter that deals with PDFs is --print-to-pdf, which does not correspond to my need.
Do you have any ideas?
There is no option you can pass into Puppeteer to force PDF downloads. However, you can use the Chrome DevTools Protocol to add a content-disposition: attachment response header, which forces a download.
The flow: intercept each response, inspect its content type, and re-fulfill the request with the extra header so the browser downloads the file instead of rendering it.
I'll include full example code below. In the example, PDF and XML files will be downloaded in headful mode.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
  });
  const page = await browser.newPage();
  const client = await page.target().createCDPSession();

  await client.send('Fetch.enable', {
    patterns: [
      {
        urlPattern: '*',
        requestStage: 'Response',
      },
    ],
  });

  client.on('Fetch.requestPaused', async (reqEvent) => {
    const { requestId } = reqEvent;

    let responseHeaders = reqEvent.responseHeaders || [];
    let contentType = '';
    for (let elements of responseHeaders) {
      if (elements.name.toLowerCase() === 'content-type') {
        contentType = elements.value;
      }
    }

    if (contentType.endsWith('pdf') || contentType.endsWith('xml')) {
      responseHeaders.push({
        name: 'content-disposition',
        value: 'attachment',
      });

      const responseObj = await client.send('Fetch.getResponseBody', {
        requestId,
      });

      await client.send('Fetch.fulfillRequest', {
        requestId,
        responseCode: 200,
        responseHeaders,
        body: responseObj.body,
      });
    } else {
      await client.send('Fetch.continueRequest', { requestId });
    }
  });

  await page.goto('https://pdf-xml-download-test.vercel.app/');
  await page.waitFor(100000);

  await client.send('Fetch.disable');
  await browser.close();
})();
For a more detailed explanation, please refer to the Git repo I've set up with comments. It also includes example code for Playwright.
Puppeteer currently does not support navigating to (or downloading) PDFs in headless mode that easily. Quote from the docs for the page.goto function:
NOTE Headless mode doesn't support navigation to a PDF document. See the upstream issue.
What you can do though, is detect if the browser is navigating to the PDF file and then download it yourself via Node.js.
Code sample
const puppeteer = require('puppeteer');
const http = require('http');
const fs = require('fs');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  page.on('request', req => {
    if (req.url() === '...') {
      const file = fs.createWriteStream('./file.pdf');
      http.get(req.url(), response => response.pipe(file));
    }
  });

  await page.goto('...');
  await browser.close();
})();
This navigates to a URL and monitors the ongoing requests. If the "matched request" is found, Node.js will manually download the file via http.get and pipe it into file.pdf. Please be aware that this is a minimal working example. You will want to catch errors when downloading and might also want to use something more sophisticated than http.get depending on the situation.
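For instance, the http module only handles http:// URLs; a rough sketch of picking the client by protocol (still without the error handling mentioned above):

// inside the 'request' handler shown above
const client = req.url().startsWith('https') ? require('https') : require('http');
client.get(req.url(), response => response.pipe(fs.createWriteStream('./file.pdf')));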
Future note
In the future, there might be an easier way to do it. When Puppeteer supports response interception, you will be able to simply force the browser to download a document, but right now this is not supported (May 2019).
