I am using QNetworkAccessManager to fetch files from a server. What I cannot figure out is whether the files are compressed with standard gzip compression during the transfer, and if not, how to get them to download compressed.
How would I go about checking?
I just ran a quick test by adding:
request.setRawHeader("Accept-Encoding", "gzip,deflate");
to the QNetworkRequest, and the returned data looks compressed (it's ~20% smaller and unusable as-is).
It appears that QNetworkAccessManager and QNetworkReply are not intelligent as far as decompression is concerned. It looks like I have to implement gzip and/or deflate decompression on the returned QByteArray myself.
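If you do end up inflating the payload yourself, here is a rough sketch (my own, not taken from Qt or the original post) of decompressing a gzip- or zlib-wrapped QByteArray with raw zlib; the helper name and buffer size are arbitrary choices:

#include <QByteArray>
#include <cstring>
#include <zlib.h>

// Hypothetical helper: inflate a gzip- (or zlib-) wrapped QByteArray.
// windowBits = 15 + 32 lets zlib auto-detect a gzip or zlib header.
QByteArray gunzip(const QByteArray &compressed)
{
    QByteArray out;
    z_stream strm;
    std::memset(&strm, 0, sizeof(strm));
    if (inflateInit2(&strm, 15 + 32) != Z_OK)
        return out;

    strm.next_in = reinterpret_cast<Bytef *>(const_cast<char *>(compressed.data()));
    strm.avail_in = static_cast<uInt>(compressed.size());

    char buffer[16384];
    int ret = Z_OK;
    do {
        strm.next_out = reinterpret_cast<Bytef *>(buffer);
        strm.avail_out = sizeof(buffer);
        ret = inflate(&strm, Z_NO_FLUSH);
        if (ret != Z_OK && ret != Z_STREAM_END)
            break;                        // corrupt or truncated input
        out.append(buffer, sizeof(buffer) - strm.avail_out);
    } while (ret != Z_STREAM_END);

    inflateEnd(&strm);
    return ret == Z_STREAM_END ? out : QByteArray();
}

(That said, if the payload is only compressed because the Accept-Encoding header was set by hand, the simpler fix, as the answers below note, is to drop that header and let Qt decompress the reply itself.)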
When you set a custom Accept-Encoding raw header on a QNetworkRequest (for example via an overridden QNetworkAccessManager::createRequest()), Qt will no longer decompress the reply for you. From the source code of qhttpnetworkconnection.cpp:
// If the request had a accept-encoding set, we better not mess
// with it. If it was not set, we announce that we understand gzip
// and remember this fact in request.d->autoDecompress so that
// we can later decompress the HTTP reply if it has such an
// encoding.
value = request.headerField("accept-encoding");
if (value.isEmpty()) {
#ifndef QT_NO_COMPRESS
    request.setHeaderField("Accept-Encoding", "gzip, deflate");
    request.d->autoDecompress = true;
#else
    // if zlib is not available set this to false always
    request.d->autoDecompress = false;
#endif
You should use a packet sniffer / network analyzer and check for yourself.
QNetworkAccessManager does support receiving compressed HTTP replies, so in theory it should work if the HTTP server is correctly set up.
I read this elsewhere and haven't tested it myself: just don't set the Accept-Encoding header yourself, and QNAM should handle it transparently (i.e. return the decompressed payload).
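An untested sketch along those lines (a stock QNetworkAccessManager with no Accept-Encoding header set by hand; the URL is just a placeholder):

#include <QCoreApplication>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QNetworkRequest>
#include <QUrl>
#include <QDebug>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);
    QNetworkAccessManager manager;

    // Note: no setRawHeader("Accept-Encoding", ...) here, so Qt announces
    // "gzip, deflate" itself and can decompress the reply before readAll().
    QNetworkRequest request(QUrl("http://example.com/file.dat"));
    QNetworkReply *reply = manager.get(request);

    QObject::connect(reply, &QNetworkReply::finished, [&]() {
        qDebug() << "Content-Encoding:" << reply->rawHeader("Content-Encoding");
        qDebug() << "Bytes delivered to the application:" << reply->readAll().size();
        reply->deleteLater();
        app.quit();
    });
    return app.exec();
}

Whether the Content-Encoding header is still visible after Qt has decompressed the body can depend on the Qt version, so a packet sniffer (as suggested above) remains the definitive way to see what actually went over the wire.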
Considering the following sentence from the documentation, I would say no, but they can be:
The downloadProgress() signal is also emitted when data is received, but the number of bytes contained in it may not represent the actual bytes received, if any transformation is done to the contents (for example, decompressing and removing the protocol overhead).
You can find it here: https://doc.qt.io/qt/qnetworkreply.html
I didn't test it, though!
To compress, if I remember correctly, you work with QByteArray, and for that kind of object Qt provides qCompress() (with a matching qUncompress()).
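As a side note of my own (not from the original answer): qCompress()/qUncompress() produce zlib-format data with a Qt-specific 4-byte length prefix, not standard gzip, so they are mainly useful between two Qt endpoints. A tiny sketch:

#include <QByteArray>
#include <QDebug>

int main()
{
    QByteArray original("some payload worth compressing, repeated, repeated, repeated");

    // qCompress() wraps zlib and prepends a 4-byte expected-size header
    // (Qt framing, not HTTP gzip Content-Encoding).
    QByteArray packed = qCompress(original, 9);   // 9 = best compression
    QByteArray unpacked = qUncompress(packed);

    qDebug() << original.size() << "->" << packed.size()
             << "round-trips OK:" << (unpacked == original);
    return 0;
}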
You could also have a look at some Qt examples, like:
https://doc.qt.io/qt/qtnetwork-broadcastsender-example.html
I didn't look at all of them, but maybe you'll find some interesting things!
Hope it helps a bit!
I am building a web server from an ESP8266 that will send environmental data to any web client as a web page. I'm using the Arduino IDE.
The problem is that the data can get rather large at times, and all of the examples I can find show assembling a web page in memory and sending it all at once to the client via ESP8266WebServer.send(). This is ok for small web pages, but won't work with the amount of data I need to send.
What I want to do is send the first part of the web page, then send the data out directly as I gather it, then send the closing parts of the web page. Is this even possible? I've looked unsuccessfully for documentation and there doesn't seem to be any examples anywhere.
For future reference, I think I figured out how to do it, with help from this page: https://gist.github.com/spacehuhn/6c89594ad0edbdb0aad60541b72b2388
The gist of it is that you still use ESP8266WebServer.send(), but you first send an empty string with the Content-Length header set to the size of your data, like this:
server.sendHeader("Content-Length", (String)fileSize);
server.send(200, "text/html", "");
Then you send buffers of data using ESP8266WebServer.sendContent() repeatedly until all of the data is sent.
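A minimal sketch of that pattern (illustrative only; dataAvailable() and readNextChunk() are hypothetical placeholders for however you gather the data, and fileSize must be the exact total byte count):

// Inside an existing ESP8266WebServer request handler.
server.sendHeader("Content-Length", String(fileSize));
server.send(200, "text/html", "");         // send the headers with an empty body

while (dataAvailable()) {                   // hypothetical: more data to send?
  String chunk = readNextChunk();           // hypothetical: gather the next piece
  server.sendContent(chunk);                // pushes the bytes to the open client
}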
Hope this helps someone else.
I was having a big issue, and a headache, serving big strings concatenated together with other string variables from the ESP32 Arduino web server with
server.send(200, "text/html", BIG_WEBPAGE);
which often resulted in a blank page, as I reported in my initial error.
What was happening was this error:
E (369637) uart: uart_write_bytes(1159): buffer null
I don't recommend using the above server.send() function.
After quite a lot of research I found this piece of code that simply works like a charm. I just split my web page into chunks (WEBPAGE_BIG_0 through WEBPAGE_BIG_5) as you see below.
server.sendHeader("Cache-Control", "no-cache, no-store, must-revalidate");
server.sendHeader("Pragma", "no-cache");
server.sendHeader("Expires", "-1");
server.setContentLength(CONTENT_LENGTH_UNKNOWN);
// here begin chunked transfer
server.send(200, "text/html", "");
server.sendContent(WEBPAGE_BIG_0);
server.sendContent(WEBPAGE_BIG_1);
server.sendContent(WEBPAGE_BIG_2);
server.sendContent(WEBPAGE_BIG_3);
server.sendContent(WEBPAGE_BIG_4);
server.sendContent(WEBPAGE_BIG_5);
server.client().stop();
I really owe much to this post. Hope the answer helps someone else.
After some more experiments I realized the code is faster and more efficient if you do not feed a string variable into the server.sendContent() function; instead, just put the actual string literal there.
server.sendContent("<html><head>my great page</head><body>");
server.sendContent("my long body</body></html>");
It is very important that when you chunk the web page you don't split HTML tags and you don't split a JavaScript expression (like cutting a while or an if in half). When chunking scripts, split only after a semicolon, or better, between two function declarations.
Chunked transfer encoding is probably what you want, and it's helpful in the situation where the web page you are sending is being dynamically created on-the-fly and is also too large to fit into memory. In this situation, you have two problems. One, you can't send it all at once, and two, you don't know ahead of time how big the result is going to be. Both problems can be fixed like this:
String webPageChunk = "some html";
server.setContentLength(CONTENT_LENGTH_UNKNOWN);
server.send ( 200, "text/html", webPageChunk);
while (<page is being generated>) {
webPageChunk = "some more html";
server.sendContent(webPageChunk);
}
server.sendContent("");
Sending an empty string (a zero-length chunk) tells the client the transfer is complete. Be careful not to send one in your loop before you're done generating the whole page.
I'm trying to use a ServerSocket to receive HTTP messages from a client which is POSTing them as chunked. I just want to capture the text content being posted (plain text). Any input on how to do this better would be welcome.
Here is my existing and rather inelegant solution. When dealing with ServerSocket, one has to handle the headers and chunk boundaries manually, and this is what I came up with. I looked at the filterLine() method on the reader; maybe that's part of a solution, I'm not sure. I don't know how to elegantly/reliably identify the chunk boundaries.
socket.withStreams { input, output ->
    BufferedReader reader = new BufferedReader(new InputStreamReader(input))
    while (currentLineCount < processor.newLineCount) {
        line = reader.readLine()
        if (line && line.size() > 3) {
            processor.processFormats(line)
        }
        currentLineCount++
    }
}
Caveats:
I have been trying to process line by line to minimize memory impact, rather than buffering the whole collection. I'd like to keep it that way.
These four Jetty libraries are available on the classpath, so I could leverage them, but I can't add other libraries:
compile 'org.eclipse.jetty:jetty-server:8.1.2.v20120308'
compile 'org.eclipse.jetty:jetty-continuation:8.1.2.v20120308'
compile 'org.eclipse.jetty:jetty-io:8.1.2.v20120308'
compile 'org.eclipse.jetty:jetty-util:8.1.2.v20120308'
I would be willing to rebuild the whole HTTP server and handler in Jetty, but I've never used Jetty and can't find any good examples that fit my situation. Any suggestions with code samples for that would be great!
I'm developing a web crawler in Node.js. I've created a unique list of the URLs found in the crawled page body, but some of them have extensions like jpg, mp3, mpeg... I want to avoid crawling those that have such extensions. Is there any simple way to do that?
Two options stick out.
1) Use path to check every URL
As stated in the comments, you can use path.extname to check for a file extension:
var path = require('path');
var test = "http://example.com/images/banner.jpg";
path.extname(test); // '.jpg'
This would work, but it feels like you'll wind up having to maintain a list of file types you can crawl or must avoid. That's work.
Side note -- be careful using path. Typically, url is your best tool for parsing links because path is aimed at files/directories, not urls. On some systems (Windows), using path to manipulate a url can result in drama because of the slashes involved. Fair warning!
2) Get the HEAD for each link & see if content-type is set to text/html
You may have reasons to avoid making more network calls. If so, this isn't an option. But if it is OK to make additional calls, you could grab the HEAD for each link and check the MIME type stored in content-type.
Something like this:
var http = require('http');

var headersOptions = {
    method: "HEAD",
    host: "example.com",            // host name only, no protocol prefix
    path: "/articles/content.html"
};
var req = http.request(headersOptions, function (res) {
    // you will probably need to also do things like check
    // HTTP status codes so you handle 404s, 301s, and so on
    if (res.headers['content-type'].indexOf("text/html") > -1) {
        // do something like queue the link up to be crawled
        // or parse the link or put it in a database or whatever
    }
});
req.end();
One benefit is that you only grab the HEAD, so even if the file is a gigantic video or something, it won't clog things up. You get the HEAD, see the content-type is a video or whatever, then move along because you aren't interested in that type.
Second, you don't have to keep track of file names because you're using a standard MIME type to differentiate html from other data formats.
So this is the setup I'm working with:
I am on an express server which must stream an archived binary payload to a browser (does not matter if it is zip, tar or tar.gz - although zip would be nice).
On this server, I have a websocket open that connects to another server which is sending me binary payloads of individual files in a directory. I get these payloads streamed, piece-by-piece, as buffers, and I'm doing this serially (that is - file-by-file - there aren't multiple websockets open at one time, and there is one websocket per file). This is the websocket library I'm using: https://github.com/einaros/ws
I would like to go through each file, open a websocket, and then append the buffers to an archiver as they come through the websocket. As data is appended to the archiver, it would be nice if I could stream the output of the archiver to the browser (via the response object with response.write). So, basically, as I'm getting the payload from the websocket, I would like that payload streamed through an archiver and then to the response. :-)
Some things I have looked into:
node-zipstream - This is nice because it gives me an output stream I can pipe directly to response.write. However, it doesn't appear to support nested files/folders, and, more importantly, it only accepts an input stream. I have looked at the source code (which is quite terse and readable), and it seems as though, if I were able to have access to the update function within ZipStream.prototype.addFile, I could just call that each time on the message event when I get a binary buffer from the websocket. This is quite messy/hacky though, and, given that this library already doesn't seem to support nested files/folders, I'm not sure I will be going with it.
node-archiver - This suffers from the same issue as node-zipstream (probably because it was inspired by it) where it allows me to pipe the output, but I cannot append multiple buffers for the same file within the archive (and then manually signal when the last buffer has been appended for a given file). However, it does allow me to have nested folders, which is a clear win over node-zipstream.
Is there something I'm not aware of, or is this just a really crazy thing that I want to do?
The only alternative I see at this point is to wait for the entire payload to be streamed through a websocket and then append with node-archiver, but I really would like to reap the benefit of true streaming/archiving on-the-fly.
I've also thought about the possibility of creating a read stream of sorts just to serve as a proxy object that I can pass into node-archiver and then just append the buffers I get from the websocket to this read stream. Looking at various read streams, I'm not sure how to do this though. The only way I could think of was creating a writestream, piping buffers to it, and having a readstream read from that writestream. Am I on the correct thought process here?
As always, thanks for any help/direction you can offer, SO community.
EDIT:
Since I just opened this question, and I'm new to node, there may be a better answer than the one I provided. I will keep this question open and accept a better answer if one presents itself within a few days. As always, I will upvote any other answers, even if they're ridiculous, as long as they're correct and allow me to stream on-the-fly as mine does.
I figured out a way to get this working with node-archiver. :-)
It was based on my hunch of creating a temporary "proxy stream" of sorts, inspired by this SO question: How to create streams from string in Node.Js?
The basic gist is (coffeescript syntax):
archiver = require 'archiver'        # node-archiver
{Stream} = require 'stream'          # core stream module, used for the proxy stream

archive = archiver 'zip'
archive.pipe response                # where response is the http response

# and then for each file...
fileName = ...                       # known file name
fileSize = ...                       # known file size
ws = ...                             # create websocket

proxyStream = new Stream()
numBytesStreamed = 0
archive.append proxyStream, name: fileName

ws.on 'message', (dataBuffer) ->
  numBytesStreamed += dataBuffer.length
  proxyStream.emit 'data', dataBuffer
  if numBytesStreamed is fileSize
    proxyStream.emit 'end'
    # function/indicator to do this for the next file in the folder

# and then when you're completely done...
archive.finalize (err, bytesOfArchive) ->
  if err?
    # do whatever
  else
    # unless you somehow knew this ahead of time
    response.addTrailers
      'Content-Length': bytesOfArchive
    response.end()
Note that this is not the complete solution I implemented. There is still a lot of logic dealing with getting the files, their paths, etc. Not to mention error-handling.
I'm very new to Linux and to working with Node.js; it's just my 2nd day. I use node-curl for cURL requests. In the link below I found an example with a GET request. Can anybody provide a POST request example using node-curl?
https://github.com/jiangmiao/node-curl/blob/master/examples/low-level.js
You need to use setopt in order to specify POST options for a cURL request. The options you should start looking at first are CURLOPT_POST and CURLOPT_POSTFIELDS. From the libcurl documentation linked from node-curl:
CURLOPT_POST
A parameter set to 1 tells the library to do a regular HTTP post. This will also make the library use a "Content-Type: application/x-www-form-urlencoded" header. (This is by far the most commonly used POST method).
Use one of CURLOPT_POSTFIELDS or CURLOPT_COPYPOSTFIELDS options to specify what data to post and CURLOPT_POSTFIELDSIZE or CURLOPT_POSTFIELDSIZE_LARGE to set the data size.
Optionally, you can provide data to POST using the CURLOPT_READFUNCTION and CURLOPT_READDATA options but then you must make sure to not set CURLOPT_POSTFIELDS to anything but NULL. When providing data with a callback, you must transmit it using chunked transfer-encoding or you must set the size of the data with the CURLOPT_POSTFIELDSIZE or CURLOPT_POSTFIELDSIZE_LARGE option. To enable chunked encoding, you simply pass in the appropriate Transfer-Encoding header, see the post-callback.c example.
CURLOPT_POSTFIELDS
... [this] should be the full data to post in a HTTP POST operation. You must make sure that the data is formatted the way you want the server to receive it. libcurl will not convert or encode it for you. Most web servers will assume this data to be url-encoded.
This POST is a normal application/x-www-form-urlencoded kind (and libcurl will set that Content-Type by default when this option is used), which is the most commonly used one by HTML forms. See also the CURLOPT_POST. Using CURLOPT_POSTFIELDS implies CURLOPT_POST.
If you want to do a zero-byte POST, you need to set CURLOPT_POSTFIELDSIZE explicitly to zero, as simply setting CURLOPT_POSTFIELDS to NULL or "" just effectively disables the sending of the specified string. libcurl will instead assume that you'll send the POST data using the read callback!
With that information, you should be able add the following options to the low-level example to have it make a POST request:
var fieldsStr = '{}';
curl.setopt('CURLOPT_POST', 1); // true?
curl.setopt('CURLOPT_POSTFIELDS', fieldsStr);
You will need to tweak the contents of fieldsStr to match the format the server is expecting. Per the documentation you may also need to url-encode the data - which should be as simple as using encodeURIComponent according to this post.