Intercept document download with Puppeteer and Extract CSV Data - node.js

I would like to download a .csv file from the browser, intercept it, and extract the data to convert it into a JSON object. Most answers say to use response.buffer(); however, my situation seems unusual in that the buffer always comes back empty even though the file downloads.
I have tried to pull response.buffer():
downloadPage.on('request', request => {
    console.log(request.isNavigationRequest());
    console.log(initialRequest);
    if (request.isNavigationRequest() && !initialRequest) {
        return request.abort();
    }
    initialRequest = false;
    request.continue();
});
downloadPage.on('response', async (response) => {
    // response.buffer() returns a Promise, so it has to be awaited
    const buffer = await response.buffer();
    console.log(buffer);
    file_data = JSON.parse(buffer.toString());
});
await Promise.all([
    downloadPage.goto('https://clients.messagelabs.com/Tools/Track-And-Trace/DownloadCsv.ashx?sessionid=' + json_data.request.SessionId).catch(err => console.log(err)),
    page.waitForNavigation()
]);
Since I am on a corporate network, it won't let me upload my images, but...
When I navigate to the link above via downloadPage.goto(...), a .csv file download begins automatically. Then the page closes. I think the page closing is clearing the buffer, but I can't seem to intercept the response to grab the file data before this happens. Any ideas are appreciated.
Please do not link me to yet another GitHub thread that tells me to use response.buffer(), as I have tried many variations.
Error: Protocol error (Network.getResponseBody): No data found for resource with given identifier
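For what it's worth, once a buffer can be obtained in the 'response' handler (it must be awaited, since response.buffer() returns a Promise), the CSV-to-JSON step itself is simple. Below is a minimal sketch; csvToJson is a hypothetical helper that assumes a plain comma-separated file with a header row and no quoted fields:

```javascript
// In the 'response' handler you would first do something like:
//   const text = (await response.buffer()).toString('utf8');
// A naive conversion of that CSV text into an array of objects:
function csvToJson(text) {
    const lines = text.trim().split(/\r?\n/);
    const headers = lines[0].split(',');
    return lines.slice(1).map(line => {
        const values = line.split(',');
        return Object.fromEntries(headers.map((h, i) => [h, values[i]]));
    });
}

console.log(csvToJson('id,name\n1,Alice\n2,Bob'));
// → [ { id: '1', name: 'Alice' }, { id: '2', name: 'Bob' } ]
```

For real-world CSVs with quoted or escaped fields, a proper parser (e.g. a dedicated CSV library) would be safer than splitting on commas.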

Related

Saving image in local file system using node.js

I was working on some simple code to download missing images from a site and save them to the local filesystem, given their complete URLs. I am able to get the data in binary format as the response, and I am able to save it. But when I try to open an image, the system reports that the format is not supported. I tried saving some js and css files and they are saved properly and I am able to view them as well. But I am having this problem with all image formats.
Here is the code I wrote:
try {
    response = await axios.get(domain + pathOfFile);
    console.log(response);
    fs.writeFile(localBasePath + pathOfFile, response.data, "binary", (error) => {
        if (error) console.log("error while writing file", error.message);
    });
} catch (error) {
    console.log("error in getting response", error.message);
}
domain: contains the base domain of the site
pathOfFile: contains the path of the file on that domain
localBasePath: the base folder where I need to store the image
I even tried to store the response in a buffer and then tried to save the image, but still I am facing the same problem.
Any suggestions would be appreciated.
You need to set responseEncoding when calling the axios.get method.
Change your line to:
response = await axios.get(domain + pathOfFile, {responseEncoding: "binary"});

downloading an image using https.get in node produces a corrupt image when used in pdflatex

I have set up a node server that handles requests to convert LaTeX to a rendered PDF using pdflatex. If the document requires extra assets, e.g. images, I download them first and then start the pdflatex process. If the images are downloaded using the http.get client and saved using fs.createWriteStream, they appear correctly in the final file.
This URL, for example, works fine: image over http
Now if I serve the same image over https: image over https, it corrupts the file that is included in the final PDF. The image is greatly distorted, but displayed. The file that is written to the filesystem, however, is correct and not corrupted.
The code that downloads the file looks like this:
const http = require('http');
const https = require('https');
const fs = require('fs-promise');
/**
 * Downloads a single asset
 *
 * @param url Url to an image
 * @param dest Path where it will be saved
 * @returns {Promise}
 */
function download (url, dest) {
    console.log('Downloading Asset from: ' + url + ' to ' + dest);
    return new Promise((resolve, reject) => {
        let file = fs.createWriteStream(dest);
        let link = new URL(url);
        let client = (link.protocol.includes('https')) ? https : http;
        client.get(url, function (response) {
            response.pipe(file);
            file.on('close', function () {
                setTimeout(function () {
                    resolve();
                }, 10);
            });
        }).on('error', function (err) {
            fs.unlink(dest);
            reject(err);
        });
    });
}
As you can see I even tried a timeout to delay the resolve event, just in case 'close' gets fired a bit early. But this does not help.
I start the PdfLateX process after all Download Processes are done, e.g. after the function below resolves.
/**
 * Downloads all needed assets to /assets/
 *
 * @param assetUrls Array of objects that contain urls and names
 * @returns {Promise}
 */
function downloadAssets(assetUrls) {
    return new Promise((resolve, reject) => {
        let requests = [];
        if (assetUrls) {
            assetUrls = JSON.parse(assetUrls);
            for (let asset of assetUrls) {
                requests.push(download(asset.url, './assets/' + asset.name));
            }
        }
        Promise.all(requests)
            .catch((err) => {
                console.log(err);
                reject('Download of one or more assets failed!');
            })
            .then(() => {
                resolve();
            });
    });
}
This is running with Node 12.16.2 inside a Docker Container.
I tried several methods of downloading the images, but every time an image is served over https it ends up corrupted in the final PDF, yet intact when looking at it directly in the file system.
One more thing to note: the image always corrupts in the same way. There is no variation; a specific image served over https will always break in the same way.
Any clue what could cause this would be greatly appreciated, as I could not find any solutions when searching the web.
This has sorted itself out. The problem did not lie in the creation of the document, but rather in how I passed the data on between services. At one point I had a filter in place where raw text data would be parsed for PHP variables starting with $. By mistake the raw PDF data went through this filter and thus got corrupted. I assume only the images downloaded via https contain $ signs in the raw data string, which is why the corruption only appeared when using https.
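A minimal sketch of the workaround this implies: base64-encode binary payloads before handing them to any text-processing stage, since the base64 alphabet contains no `$` (or other characters a variable-substitution filter would touch). The payload below is illustrative only:

```javascript
// Binary data passed between services as a raw string can be mangled
// by text filters; base64 uses only [A-Za-z0-9+/=], so it passes
// through such filters untouched and decodes back byte-for-byte.
const raw = Buffer.from('PDF $data with $ signs \x00\xff', 'binary');
const encoded = raw.toString('base64');
const decoded = Buffer.from(encoded, 'base64');
console.log(decoded.equals(raw)); // → true
```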

Trouble downloading mp3 files from S3 using Amplify/Node

I'm quite confused on how to use the Amplify library to actually download an mp3 file stored in my s3 bucket. I am able to list the bucket contents and parse it all out into a tree viewer for users to browse the various files, but once I select a file I can't get it to trigger a download.
I'm confident my amplify configuration is correct since I can see all my expected directories and when I select the file I want to download, I see the response size being correct:
You can see it takes 2+ seconds and appears to be downloading the data/mp3 file, but the user is never prompted to save the file and it's not in my Downloads folder.
Here is a capture of my file metadata setup from my bucket:
And the method I'm calling:
getFile (fileKey) {
    Storage.get(fileKey, {download: true})
}
Without the download: true configuration, I get the pre-signed URL back in the response. I'd like to avoid making a second request using that URL to download the file, if possible. Anything else I may have missed? Is it better for S3 operations to go back to the standard aws-sdk? Thanks in advance!
I ended up using a combination of this answer:
https://stackoverflow.com/a/36894564
and this snippet:
https://gist.github.com/javilobo8/097c30a233786be52070986d8cdb1743
So the file gets downloaded in the response data (result). I added more metadata tags to the files to get the file name and title. Finally, adding a link to the DOM and executing a click() on it saves the file, correctly named. Full solution below:
getFile (fileKey) {
    Storage.get(fileKey, {download: true}).then(result => {
        console.log(result)
        let mimeType = result.ContentType
        let fileName = result.Metadata.filename
        if (mimeType !== 'audio/mp3') {
            throw new TypeError("Unexpected MIME Type")
        }
        try {
            let blob = new Blob([result.Body], {type: mimeType})
            // downloading the file depends on the browser;
            // IE handles it differently than chrome/webkit
            if (window.navigator && window.navigator.msSaveOrOpenBlob) {
                window.navigator.msSaveOrOpenBlob(blob, fileName)
            } else {
                let objectUrl = URL.createObjectURL(blob);
                let link = document.createElement('a')
                link.href = objectUrl
                link.setAttribute('download', fileName)
                document.body.appendChild(link)
                link.click()
                document.body.removeChild(link)
            }
        } catch (exc) {
            console.log("Save Blob method failed with the following exception.");
            console.log(exc);
        }
    })
}

Internal server error on Azure when writing file from buffer to filesystem

Context
I am working on a Proof of Concept for an accounting bot. Part of the solution is the processing of receipts. User makes picture of receipt, bot asks some questions about it and stores it in the accounting solution.
Approach
I am using the BotFramework nodejs example 15.handling attachments that loads the attachment into an arraybuffer and stores it on the local filesystem. Ready to be picked up and send to the accounting software's api.
async function handleReceipts(attachments) {
    const attachment = attachments[0];
    const url = attachment.contentUrl;
    const localFileName = path.join(__dirname, attachment.name);
    try {
        const response = await axios.get(url, { responseType: 'arraybuffer' });
        if (response.headers['content-type'] === 'application/json') {
            response.data = JSON.parse(response.data, (key, value) => {
                return value && value.type === 'Buffer' ? Buffer.from(value.data) : value;
            });
        }
        fs.writeFile(localFileName, response.data, (fsError) => {
            if (fsError) {
                throw fsError;
            }
        });
    } catch (error) {
        console.error(error);
        return undefined;
    }
    return (`success`);
}
Running locally it all works like a charm (also thanks to mdrichardson - MSFT). Stored on Azure, I get
There was an error sending this message to your bot: HTTP status code InternalServerError
I narrowed the problem down to the second part of the code, the part that writes to the local filesystem (fs.writeFile). Small files and big files result in the same error on Azure: fs.writeFile seems unable to find the file.
What is happening according to the stream logs:
Attachment uploaded by user is saved on Azure
{ contentType: 'image/png',contentUrl:
'https://webchat.botframework.com/attachments//0000004/0/25753007.png?t=< a very long string>',name: 'fromClient::25753007.png' }
localFilename (the destination of the attachment) resolves into
localFileName: D:\home\site\wwwroot\dialogs\fromClient::25753007.png
Axios loads the attachment into an arraybuffer. Its response:
response.headers.content-type: image/png
This is interesting because locally it is 'application/octet-stream'
fs throws an error:
fsError: Error: ENOENT: no such file or directory, open 'D:\home\site\wwwroot\dialogs\fromClient::25753007.png
Some assistance would be really appreciated.
Removing the fromClient:: prefix from attachment.name solved it. As @Sandeep mentioned in the comments, the special characters were probably the issue. Not sure what its purpose is. Will mention it in the BotFramework sample library GitHub repository.
[Update] The team will fix this. It was caused by the Direct Line service.

Display video from Gridfs storage in react app

I am using multer-gridfs-storage and gridfs-stream to store my video in the backend (Express/Node). When I try to retrieve the file to play on my front end (React) the player refuses to recognize the source.
I am using Video-React to display the video on download. The download is successful; I get a binary string back from the backend, which I converted to a Blob.
try {
    fileBlob = new Blob([res.data], {type: res.headers['content-type']});
} catch (err) {
    console.log('Error converting to blob');
    console.log(err);
}
This is my Video-React player being rendered
<Player
    autoPlay
    ref="player"
>
    <source src={this.state.fileURL} />
    <ControlBar autoHide={false} />
</Player>
Then I tried two techniques
readAsDataURL
let reader = new FileReader();
reader.onload = function (event) {
    // rThis is a reference to the parent component's `this`
    rThis.setState({fileURL: reader.result}, () => {
        rThis.refs.player.load();
    });
};
try {
    reader.readAsDataURL(fileBlob);
} catch (err) {
    console.log('Error trying readAsDataURL');
    console.log(err);
}
src is being set correctly but the video never loads
URL.createObjectURL
let vidURL = URL.createObjectURL(fileBlob);
rThis.setState({fileURL: vidURL}, () => {
    rThis.refs.player.load();
});
src is set to a blob: url but still nothing
Is this an issue with Video-React, or should I be doing something else? Any pointers to references I could look at would also help. What am I doing wrong? readAsDataURL works for images (I checked), but not for video.
So after some more reading, I finally figured out the problem. Since I'm using gridfs-stream, I'm actually piping the response from the server. So I was never getting the whole file, and trying to convert res.data, which is just a chunk, was a mistake. Instead, in my res object, I found the source URL within the config property.
res.config.url
This contained my source url to which my server was piping the chunks. Should have figured it out earlier, considering I picked GridFS storage for precisely this reason.
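A minimal sketch of that idea: instead of buffering chunks client-side, hand the player the streaming URL itself. sourceFromResponse is a hypothetical helper, and the response shape mimics what axios exposes:

```javascript
// Prefer the URL the server is piping chunks to over the partial body;
// the <source src=...> then lets the browser stream the video itself.
function sourceFromResponse(res) {
    return (res.config && res.config.url) ? res.config.url : null;
}

console.log(sourceFromResponse({ config: { url: '/api/video/abc123' } }));
// → '/api/video/abc123'
// e.g. rThis.setState({fileURL: sourceFromResponse(res)}, () => rThis.refs.player.load());
```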
