Why is Chrome treating this file as document, while Firefox as Image? - node.js

I have a download GET endpoint in my express app. For now it simply reads a file from the file system and streams it after setting some headers.
When I open the endpoint in Chrome, I can see that the response is treated as a "document", while in Firefox it is treated as type png.
I can't seem to understand why it is being treated differently.
Chrome: title bar - "download"
Firefox: title bar - "image name"
In Chrome, this also leads to no caching of the image if I refresh the address bar.
In Firefox it is being cached just fine.
This is my express code:
app.get("/download", function(req, res) {
    let file = `${__dirname}/graph-colors.png`;
    var mimetype = "image/png";
    res.set("Content-Type", mimetype);
    res.set("Cache-Control", "public, max-age=1000");
    res.set("Content-Disposition", "inline");
    res.set("Vary", "Origin");
    var filestream = fs.createReadStream(file);
    filestream.pipe(res);
});
Also attaching images for Browser network tabs.

This all comes down to Chrome's behavior; you can test it on another site, for example Example.png on Wikipedia.
Chrome always treats whatever you open from the address bar as a document, regardless of what it really is. You can even load a CSS file this way and the network tab will still report it as document.
As for the title, it reads download because your path is /download. According to this SO thread, you cannot change it.
As for caching, Chrome apparently bypasses the cache for whatever you reload directly, whether it is a page or an image. Try the Wikipedia Example.png: you will get a 304 instead of "(from cache)". (A 304 means the request was sent with a validator such as If-None-Match, and the server, which implements ETag or a similar technique, confirmed that the cached copy is still current.)
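To make the 304 case concrete, here is a minimal sketch of the decision a server makes when a browser revalidates with If-None-Match. The helper names makeEtag and statusFor are hypothetical, and the comparison is deliberately simplified: it handles a single ETag or "*", not comma-separated lists or weak validators.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derive an ETag from the response body.
function makeEtag(body) {
  const hash = crypto.createHash("sha1").update(body).digest("hex");
  return `"${hash}"`;
}

// Decide the status code for a GET, given the request's If-None-Match value.
// Simplified: a single ETag or "*" only.
function statusFor(ifNoneMatch, etag) {
  if (ifNoneMatch === "*" || ifNoneMatch === etag) {
    return 304; // validator matches: answer with headers only, no body
  }
  return 200; // no validator, or it no longer matches: send the full body
}

const etag = makeEtag(Buffer.from("fake png bytes"));
console.log(statusFor(undefined, etag)); // first visit, nothing to validate
console.log(statusFor(etag, etag));      // reload with a matching validator
```

In an Express handler the same idea would mean setting an ETag header on the response and comparing it with req.headers["if-none-match"] before piping the file.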

Related

How to disable or cancel all downloads

I'm currently scraping a public webpage in the event that it goes down, and this site has some files where opening them in Chromium will usually download the file to your downloads folder automatically. For example, accessing https://www.7-zip.org/a/7z2201-x64.exe downloads a file instead of showing you the binary.
My code is fairly complicated, but the main part is this:
const page = await browser.newPage();
page.on("response", async response => {
    // Saves the file to a place I want it, but doesn't cancel the Chrome-based download.
    const buffer = Buffer.from(new Uint8Array(await page.evaluate(function(x:string) {
        return fetch(x).then(r=>r.arrayBuffer())
    }, response.url())));
    fs.writeFileSync('path', buffer);
    return void 0;
});
await page.goto('https://www.7-zip.org/a/7z2201-x64.exe', { waitUntil: "load", timeout: 120000 });
I can't just assume the mime type either, the page could go to any URL from an html file to a zip file, so is it possible to disable downloads or rewire it to /dev/null? I've looked into response intercepting and it doesn't seem to be a thing based on this.
After reading a bit about /dev/null and seeing this answer, I figured out that I can do this:
const page = await browser.newPage();
await (await page.target().createCDPSession()).send("Page.setDownloadBehavior", {
    behavior: "deny",
    downloadPath: "/dev/null"
});
Setting the download path to /dev/null is redundant when the behavior is "deny". However, if you don't want to fully deny downloads for a tab but also don't want files going to your downloads folder, /dev/null will essentially discard whatever it receives on the spot.
I set the download behavior before navigating to a page. Note that this relies on behavior specific to Chromium-based browsers, not just on MIME types.

Selenium firefox webdriver: set download.dir for a pdf

I tried several solutions I found around, but none of them really works, or they are simply outdated.
Here is my webdriver profile:
let firefox = require('selenium-webdriver/firefox');
let profile = new firefox.Profile();
profile.setPreference("pdfjs.disabled", true);
profile.setPreference("browser.download.dir", 'C:\\MYDIR');
profile.setPreference("browser.helperApps.neverAsk.saveToDisk", "application/pdf","application/x-pdf", "application/acrobat", "applications/vnd.pdf", "text/pdf", "text/x-pdf", "application/vnd.cups-pdf");
I simply want to download a file and set the destination path. It looks like browser.download.dir is ignored.
That's the way I download the file:
function _getDoc(i){
    driver.sleep(1000)
    .then(function(){
        driver.get(`http://mysite/pdf_showcase/${i}`);
        driver.wait(until.titleIs('here my pdf'), 5000);
    })
}
for(let i=1;i<5;i++){
    _getDoc(i);
}
The page contains an iframe with a pdf. I could read the iframe's src attribute, but with pdfjs.disabled=true simply visiting the page with driver.get() triggers the download (so I'm OK with that).
The only problem is that the download dir is ignored and the file is saved in Firefox's default download directory.
Side question: if I call _getDoc() in a for loop over the parameter i, how can I be sure I won't flood the server? If I use the same driver instance (as most people do), are the requests sequential?

NodeJs web crawler file extension handling

I'm developing a web crawler in nodejs. I've created a unique list of the urls found in the crawled website's body. But some of them have extensions like jpg, mp3, mpeg ... I want to avoid crawling the ones that have such extensions. Is there any simple way to do that?
Two options stick out.
1) Use path to check every URL
As stated in comments, you can use path.extname to check for a file extension. Thus, this:
var test = "http://example.com/images/banner.jpg"
path.extname(test); // '.jpg'
This would work, but this feels like you'll wind up having to create a list of file types you can crawl or you must avoid. That's work.
Side note -- be careful using path. Typically, url is your best tool for parsing links because path is aimed at files/directories, not urls. On some systems (Windows), using path to manipulate a url can result in drama because of the slashes involved. Fair warning!
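Following the side note, here is one way to combine the two: let the WHATWG URL class isolate the pathname, then apply path.posix.extname so the result does not depend on the operating system's separator. The shouldSkip name and the skip list are assumptions for illustration, not part of the original answer.

```javascript
const path = require("path");

// Assumed skip list; extend it to match your own crawl.
const SKIPPED_EXTENSIONS = new Set([
  ".jpg", ".jpeg", ".png", ".gif", ".mp3", ".mpeg", ".zip", ".pdf"
]);

// Returns true when the link should NOT be crawled.
function shouldSkip(link) {
  let pathname;
  try {
    // URL drops the query string and fragment from the pathname for us.
    pathname = new URL(link).pathname;
  } catch (err) {
    return true; // not an absolute URL, so there is nothing to crawl
  }
  // posix variant: always treats "/" as the separator, even on Windows
  const ext = path.posix.extname(pathname).toLowerCase();
  return SKIPPED_EXTENSIONS.has(ext);
}

console.log(shouldSkip("http://example.com/images/banner.jpg"));
console.log(shouldSkip("http://example.com/articles/content.html?img=a.jpg"));
```

Parsing first means an extension that only appears in the query string is not mistaken for the file type.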
2) Get the HEAD for each link & see if content-type is set to text/html
You may have reasons to avoid making more network calls. If so, this isn't an option. But if it is OK to make additional calls, you could grab the HEAD for each link and check the MIME type stored in content-type.
Something like this:
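var http = require("http");

var headersOptions = {
    method: "HEAD",
    // host takes a bare hostname, without the "http://" scheme
    host: "example.com",
    path: "/articles/content.html"
};
var req = http.request(headersOptions, function (res) {
    // you will probably need to also do things like check
    // HTTP status codes so you handle 404s, 301s, and so on
    var contentType = res.headers['content-type'] || "";
    if (contentType.indexOf("text/html") > -1) {
        // do something like queue the link up to be crawled
        // or parse the link or put it in a database or whatever
    }
});
// network failures (DNS errors, refused connections) arrive here
req.on("error", function (err) {
    console.error(err.message);
});
req.end();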
One benefit is that you only grab the HEAD, so even if the file is a gigantic video or something, it won't clog things up. You get the HEAD, see the content-type is a video or whatever, then move along because you aren't interested in that type.
Second, you don't have to keep track of file names because you're using a standard MIME type to differentiate html from other data formats.
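A content-type value often carries parameters such as a charset, and some servers omit the header entirely. The check can therefore be pulled into a small helper; this is a sketch, and the isHtml name is hypothetical.

```javascript
// Returns true only when the header's media type is exactly text/html.
// Handles a missing header and parameters such as "; charset=utf-8".
function isHtml(contentType) {
  if (typeof contentType !== "string") {
    return false; // header absent: treat the link as not crawlable
  }
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  return mediaType === "text/html";
}

console.log(isHtml("text/html; charset=utf-8"));
console.log(isHtml("video/mp4"));
```

Inside the response callback this would be called as isHtml(res.headers['content-type']).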

Getting the mime type from a request in nodeJS

I'm learning nodejs but I ran into a roadblock. I'm trying to set up a simple server that will serve static files. My problem is that unless I explicitly type the extension in the url, I cannot get the file extension. The HTTP header 'content-type' comes in as undefined.
Here is my code, pretty simple:
var http = require("http"),
    path = require("path"),
    fs = require("fs");
var server = http.createServer(function(request, response){
    console.log([path.extname(request.url), request.headers['content-type']]);
    var fileName = path.basename(request.url) || "index.html",
        filePath = __dirname + '/public/' + fileName;
    console.log(filePath);
    fs.readFile( filePath, function(err,data){
        if (err) throw err
        response.end(data);
    });
})
server.listen(3000)
Any ideas?
FYI, I don't want to just dive into connect or similar; I want to know what's going on before I skip the groundwork and go straight to modules.
So static web servers generally don't do any deep magic. For example, nginx has a small mapping of file extensions to mime types here: http://trac.nginx.org/nginx/browser/nginx/conf/mime.types
There's likely also some fallback logic defaulting to html. You can also use a database of file "magic numbers" as is used by the file utility to look at the beginning of the file data and guess based on that.
But there's no magic here. It's basically:
1) go by the file extension when available
2) maybe go by the beginning of the file content as the next option
3) use a default of html, because normally only html resources have URLs with no extension, whereas images, css, js, fonts, multimedia, etc. almost always use file extensions in their URIs
Also note that browsers generally have a fairly robust set of checks that will interpret files correctly even when the Content-Type header is mismatched with the actual response body data.
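A minimal sketch of steps 1 and 3 follows (step 2, content sniffing, is left out). The contentTypeFor name and the small table are assumptions for illustration, as is the application/octet-stream fallback for unknown extensions. Note also that request.headers['content-type'] describes the body of the request, which a plain GET does not have, so the server has to choose the response type itself.

```javascript
const path = require("path");

// Tiny illustrative table; real servers ship a much longer one.
const MIME_TYPES = new Map([
  [".html", "text/html"],
  [".css", "text/css"],
  [".js", "text/javascript"],
  [".png", "image/png"],
  [".jpg", "image/jpeg"],
  [".ico", "image/x-icon"]
]);

// Pick a Content-Type for the response from the request URL.
function contentTypeFor(requestUrl) {
  const pathname = requestUrl.split("?")[0]; // ignore the query string
  const ext = path.posix.extname(pathname).toLowerCase();
  if (ext === "") {
    return "text/html"; // no extension: default to html
  }
  // Assumed fallback for extensions missing from the table.
  return MIME_TYPES.get(ext) || "application/octet-stream";
}

console.log(contentTypeFor("/"));
console.log(contentTypeFor("/style.css?v=2"));
```

In the server from the question, the result would be passed to response.setHeader("Content-Type", ...) before calling response.end(data).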

Unable to Change Favicon with Express.js

This is a really basic question, but I'm trying to change the favicon of my node.js/Express app with
app.use(express.favicon(__dirname + '/public/images/favicon.ico'));
and I'm still getting the default favicon. This is in my app.configure function, and yes, I've verified that there is a favicon.ico at /public/images/favicon.ico. There's nothing about a favicon.ico in the console, either, which leads me to believe that this line of code is being ignored. Everything else in the function (setting the port, the views directory, the template engine, etc.) seems to be working fine, so why would this line of code not be executing?
What I tried
Emptying browser cache
Restarting Terminal and running node app.js again
Adding { maxAge: 2592000000 }, as described here
Thanks in advance.
Update: I got it to work. See my answer below for more information.
I tried visiting the site in Safari for the first time (I normally use Chrome) and noticed that it was showing the correct favicon. I tried clearing my cache in Chrome again (twice) to no avail, but after more searching, I found that apparently favicons aren't stored in the cache. I "refreshed my favicon" using the method described here and it worked!
Here's the method (modified from the above link), in case the link goes dead:
Open Chrome/the problematic browser
Navigate to the favicon.ico file directly, e.g. http://localhost:3000/favicon.ico
Refresh the favicon.ico URL by pressing F5 or the appropriate browser Refresh (Reload) button
Close the browser and open your website - voila, your favicon has been updated!
What worked for me finally:
Make sure that
app.use(express.favicon(__dirname + '/public/images/favicon.ico'));
is at the beginning of the app configuration function. Previously I had it at the end. As the Express doc says: 'The order of which middleware are "defined" using app.use() is very important, they are invoked sequentially, thus this defines middleware precedence.'
I didn't need to set any maxAge.
To test it:
Restart the server with node app.js
Clear the browser cache
Refresh the favicon by opening it directly at "localhost:3000/your_path_to_the_favicon/favicon.ico" and reloading
The above answer is no longer valid.
If you use
app.use(express.favicon(__dirname + '/public/images/favicon.ico'));
You'll get this error:
Error: Most middleware (like favicon) is no longer bundled with Express and must be installed separately
What you're going to need to do is get serve-favicon.
run
npm install serve-favicon --save
then add this to your app
var express = require('express');
var favicon = require('serve-favicon');
var app = express();
app.use(favicon(__dirname + '/public/images/favicon.ico'));
smiley favicon to prevent error:
var favicon = new Buffer('AAABAAEAEBAQAAAAAAAoAQAAFgAAACgAAAAQAAAAIAAAAAEABAAAAAAAgAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAA/4QAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEREQAAAAAAEAAAEAAAAAEAAAABAAAAEAAAAAAQAAAQAAAAABAAAAAAAAAAAAAAAAAAAAAAAAAAEAABAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD//wAA//8AAP//AAD8HwAA++8AAPf3AADv+wAA7/sAAP//AAD//wAA+98AAP//AAD//wAA//8AAP//AAD//wAA', 'base64');
app.get("/favicon.ico", function(req, res) {
res.statusCode = 200;
res.setHeader('Content-Length', favicon.length);
res.setHeader('Content-Type', 'image/x-icon');
res.setHeader("Cache-Control", "public, max-age=2592000"); // expires after a month
res.setHeader("Expires", new Date(Date.now() + 2592000000).toUTCString());
res.end(favicon);
});
To change the icon in the code above:
Make an icon, for example at http://www.favicon.cc/ or http://favicon-generator.org
Convert it to base64, for example at http://base64converter.com/
Then replace the icon's base64 value in the code.
General information on how to create a personalized favicon:
Icons are usually made in Photoshop or Inkscape, or in Inkscape first and then Photoshop for vibrance and color correction (in the Image > Adjustments menu).
For a quick icon, go to http://www.clker.com/, pick some vector clip art, and download it as SVG.
Then open it in Inkscape and change colors or delete parts, or add something from another vector clip art image. To export, select the parts you want and click File > Export, then pick a size such as 16x16 or 32x32 for a favicon, or 128x128 or 256x256 for further editing. An .ico file can hold several icon sizes, so alongside the 16x16 favicon it can carry higher-quality icons for links to the website.
Then optionally enhance the image in Photoshop: vibrance, bevel, rounded mask, anything.
Then upload the image to one of the websites that generate favicons.
There are also Windows programs for editing icons (search for something like "windows icon editor opensource", and figure out how to store two images of different sizes inside a single icon).
To add the favicon to a website, just put favicon.ico as a file in the root folder of your domain, for example in the public folder that contains the static files in a Node.js app. It doesn't need anything special like the code above; a simple file is enough.
What worked for me follows. Set express to serve your static resources as usual, for example
app.use(express.static('public'));
Put the favicon inside your public folder. Then add a query string to your icon URL, for example
<link rel="icon" type="image/x-icon" href="favicon.ico?v="+ Math.trunc(Math.random()*999)>
In this case, Chrome is the misbehaving browser; IE, Firefox, and Safari (all on Windows) worked fine WITHOUT the above trick.
Simplest way I could come up with (valid only for local dev, of course) was to host the server on a different port
PORT=3001 npm run start
Have you tried clearing your browser's cache? Maybe the old favicon is still in the cache.
How to do this without express:
if (req.method == "GET") {
    if (/favicon\.ico/.test(req.url)) {
        fs.readFile("home/usr/path/favicon.ico", function(err, data) {
            if (err) {
                console.log(err);
                // answer anyway so the request does not hang
                res.statusCode = 404;
                res.end();
            } else {
                res.setHeader("Content-Type","image/x-icon");
                res.end(data);
            }
        });
    }
}
