nodejs vs. ruby / understanding request processing order - node.js

I have a simple utility that I use to resize images on the fly via URL params.
Having some trouble with the Ruby image libraries (CMYK to RGB conversion is, how to say… "unavailable"), I gave it a shot via Node.js, which solved that issue.
Basically, if the image does not exist, Node or Ruby transforms it. Otherwise, when the image has already been requested/transformed, the Ruby or Node processes aren't touched and the image is returned statically.
The Ruby version works perfectly. It is a bit slow when a lot of transforms are requested at once, but it is very stable and always goes through, whatever the amount (I see the images arriving on the page one after another).
The Node version also works perfectly for a single image. But when a large amount of images is requested for a single page load, the first image is transformed, then all the other requests return the very same image (the last transformed one). If I refresh the page, the first image (already transformed) is returned right away, the second one is returned correctly transformed, but then all the other images returned are the same as the one just newly transformed. And it goes on like that for every refresh. Not optimal: the requests are "merged" at some point and all return the same image, for reasons I don't understand.
(By "large amount" I mean more than 1.)
The Ruby version:
get "/:commands/*" do |commands,remote_path|
path = "./public/#{commands}/#{remote_path}"
root_domain = request.host.split(/\./).last(2).join(".")
url = "https://storage.googleapis.com/thebucket/store/#{remote_path}"
img = Dragonfly.app.fetch_url(url)
resized_img = img.thumb(commands).to_response(env)
return resized_img
end
The Node.js version:
app.get('/:transform/:id', function (req, res, next) {
  parser.parse(req.params, function (resized_img) {
    // the transforms are done via lovell/sharp
    // parser.parse parses the params, writes the file,
    // and returns the file path
    // then:
    fs.readFile(resized_img, function (error, data) { // readFileSync takes no callback; readFile is the async form
      res.write(data);
      res.end();
    });
  });
});
It feels like I'm missing a crucial point in Node here. I expected the same behaviour from Node as from Ruby, but obviously the same pattern transposed into the Node world just does not work as expected. Node is not waiting for a request to finish processing; rather, it processes requests in an order that is not clear to me.
I also understand that I'm not putting the right words to the issue, but I'm hoping it might speak to some experienced users who can provide clarifications and a better understanding of what happens behind the Node scenes.
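For what it's worth, one classic way this exact symptom arises (a hypothetical sketch, since the internals of parser.parse aren't shown) is shared mutable state: if the parser stores the output path in a module-level variable, every in-flight request overwrites it before the earlier async callbacks fire, so all callbacks see the last value written:

// Hypothetical sketch, NOT the actual parser code.
var currentPath; // module-level: shared by ALL in-flight requests

function transformImage(params, done) {
  setImmediate(done); // stand-in for the real async sharp pipeline
}

function parse(params, callback) {
  currentPath = '/tmp/' + params.transform + '-' + params.id + '.jpg';
  transformImage(params, function () {
    // By the time this fires, a later request may have reassigned
    // currentPath, so every callback returns the last path written.
    callback(currentPath);
  });
}

// Fixed: keep the path in a local variable so each request's callback
// closes over its own copy.
function parseFixed(params, callback) {
  var path = '/tmp/' + params.transform + '-' + params.id + '.jpg';
  transformImage(params, function () {
    callback(path);
  });
}

This would also explain why the Ruby version is immune: its variables are all local to the request block, so concurrent requests cannot stomp on each other's state.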

Related

MarkLogic 8 - XQuery write large result set to a file efficiently

UPDATE: See MarkLogic 8 - Stream large result set to a file - JavaScript - Node.js Client API for someone's answer on how to do this in JavaScript. This question is specifically asking about XQuery.
I have a web application that consumes rest services hosted in node.js.
Node simply proxies the request to XQuery which then queries MarkLogic.
These queries already have paging setup and work fine in the normal case to return a page of data to the UI.
I need to have an export feature such that when I put a URL parameter of export=all on a request, it doesn't lookup a page anymore.
At that point it should get the whole result set, even if it's a million records, and save it to a file.
The actual request needs to return immediately saying, "We will notify you when your download is ready."
One suggestion was to use xdmp:spawn to call the XQuery in the background which would save the results to a file. My actual HTTP request could then return immediately.
For the spawn piece, I think the idea is that I run my query with different options in order to get all results instead of one page. Then I would loop through the data and create a string variable to call xdmp:save with.
Some questions: is this a good idea? Is there a better way? If I loop through the result set and it does happen to be very large (gigabytes), it could cause memory issues.
Is there no way to directly stream the results to a file in XQuery?
Note: Another idea I had was to intercept the request at the proxy (Node) layer, do an xdmp:estimate to get the record count, and then loop through, querying each page and flushing it to disk. In this case I would need to find some way to return my request immediately yet process in the background in Node, for which there seem to be some ideas here: http://www.pubnub.com/blog/node-background-jobs-async-processing-for-async-language/
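A minimal sketch of that "respond now, work later" shape in Node (hypothetical names throughout; runExportJob stands in for whatever pages through MarkLogic and flushes each page to disk):

var express = require('express');
var app = express();

function runExportJob(params) {
  // hypothetical: estimate the count, then loop through the pages
  // of results and append each one to a file on disk
}

app.get('/records', function (req, res) {
  if (req.query.export === 'all') {
    res.send('We will notify you when your download is ready.'); // return immediately
    setImmediate(function () { runExportJob(req.query); });      // keep working after responding
    return;
  }
  // ...normal paged lookup...
});

app.listen(3000);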
One possible strategy would be to use a self-spawning task that, on each iteration, gets the next page of the results for a query.
Instead of saving the results directly to a file, however, you might want to consider using xdmp:http-post() to send each page to a server:
http://docs.marklogic.com/xdmp:http-post?q=xdmp:http-post&v=8.0&api=true
In particular, the server could be a Node.js server that appends each page, as it arrives, to a file or any other data sink.
That way, Node.js could handle the long-running asynchronous IO with minimal load on the database server.
When a self-spawned task hits the end of the query, it can again use an HTTP request to notify Node.js to close the file and report that the export is finished.
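A rough sketch of what that receiving side could look like (hypothetical routes and paths; assumes Express, and that each xdmp:http-post body is one page of results):

var express = require('express');
var fs = require('fs');
var app = express();

// Each spawned MarkLogic task POSTs one page of results here.
app.post('/export/:jobId/page', function (req, res) {
  var out = fs.createWriteStream('/tmp/export-' + req.params.jobId + '.txt',
                                 { flags: 'a' }); // append this page to the file
  req.pipe(out); // stream the body straight to disk, no buffering in memory
  out.on('finish', function () { res.sendStatus(200); });
  out.on('error', function () { res.sendStatus(500); });
});

// The last self-spawned task hits this to say the export is complete.
app.post('/export/:jobId/done', function (req, res) {
  // hypothetical: notify the user that their download is ready
  res.sendStatus(200);
});

app.listen(3000);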
Hoping that helps,

WifiLock under-locked my_lock

I'm trying to download an offline map pack. I'm trying to reverse engineer the example project from the Skobbler support website; however, when trying to start a download, the download manager crashes.
My use case: show a list of available countries (within the EUR continent), let the user select a single one, and download that one right away. So far I have gotten a list where those options are available. Upon selecting an item (and starting the download) it crashes.
For the sake of the question I commented out some things.
Relevant code:
// Get the information about where to obtain the files from
SKPackageURLInfo urlInfo = SKPackageManager.getInstance().getURLInfoForPackageWithCode(pack.packageCode);
// Steps: SKM, ZIP, TXG
List<SKToolsFileDownloadStep> downloadSteps = new ArrayList<>();
downloadSteps.add(new SKToolsFileDownloadStep(urlInfo.getMapURL(), pack.file, pack.skmsize)); // SKM
//downloadSteps.add(); // TODO ZIP
//downloadSteps.add(); // TODO TXG
List<SKToolsDownloadItem> downloadItems = new ArrayList<>(1);
downloadItems.add(new SKToolsDownloadItem(pack.packageCode, downloadSteps, SKToolsDownloadItem.QUEUED, true, true));
mDownloadManager.startDownload(downloadItems); // This is where the crash is
I am noticing a running download, since onDownloadProgress() is getting triggered (a callback from the manager). However, the SKToolsDownloadItem that it takes as a parameter says that the stepIndex starts at 0. I don't know how this can be, since I manually put that at (byte) 0, just like the example does.
Also, the logs throw a warning on SingleClientConnManager, telling me:
Invalid use of SingleClientConnManager: connection still allocated.
This is code that gets called from somewhere within the manager. I am thinking there are some vital setup steps missing from the documentation and the example project.

Avoiding repetition with Flask - but is it too DRY?

Let us assume I serve data to colleagues in-office with a small Flask app, and let us also assume that it is a project I am not explicitly 'paid to do', so I don't have all the time in the world to write code.
It has occurred to me in my experimentation with pet projects at home that instead of decorating every last route with @app.route('/some/local/page') I can do the following:
from flask import Flask, render_template, url_for, redirect, abort
from collections import OrderedDict

goodURLS = OrderedDict([('/index', 'Home'),   # can be passed to the template
                        ('/about', 'About'),  # to create the navigation bar
                        ('/foo', 'Foo'),
                        ('/bar', 'Bar'),      # hence the use of OrderedDict,
                        ('/eggs', 'Eggs'),    # to have a set order for that nav bar
                        ('/spam', 'Spam')])

app = Flask(__name__)

@app.route('/<destination>')
def goThere(destination):
    availableRoutes = goodURLS.keys()
    if "/" + destination in availableRoutes:
        return render_template('/%s.html' % destination, goodURLS=goodURLS)
    else:
        abort(404)

@app.errorhandler(404)
def notFound(e):
    return render_template('/notFound.html'), 404
Now all I need to do is update my one list, and both my navigation bar and route-handling function stay in lock-step.
Alternatively, I've written a method to determine the viable file locations by using os.walk in conjunction with file.endswith('.aGivenFileExtension') to locate every file which I mean to make accessible. The user's request can then be compared against the list this function returns (which obviously changes the serveTheUser() function).
from os import path, walk

def fileFinder(directory, extension=".html"):
    """Returns a list of files with a given file extension at a given path.

    By default .html files are returned.
    """
    foundFilesList = []
    if path.exists(directory):
        for p, d, files in walk(directory):
            for file in files:
                if file.endswith(extension):
                    foundFilesList.append(file)
    return foundFilesList

goodRoutes = fileFinder('./templates/someFolderWithGoodRoutes/')
The question is, Is This Bad?
There are many aspects of Flask I'm just not using (mainly because I haven't needed to know about them yet) - so maybe this is actually limiting, or redundant when compared against a built-in feature of Flask. Does my lack of explicitly decorating each route rob me of a great feature of Flask?
Additionally, is either of these methods more or less safe than the other? I really don't know much about web security - and like I said, right now this is all in-office stuff, the security of my data is assured by our IT professional, and there are no incoming requests from outside the office - but in a real-world setting, would either of these be detrimental? In particular, if I am using the backend to os.walk a location on the server's local disk, I'm not asking to have it abused by some ne'er-do-well, am I?
EDIT: I've offered this as a bounty, because if it is not a safe or constructive practice I'd like to avoid using it for things that I'd want to like push to Heroku or just in general publicly serve for family, etc. It just seems like decorating every viable route with app.route is a waste of time.
There isn't anything really wrong with your solution, in my opinion. The problem is that with this kind of setup the things you can do are pretty limited.
I'm not sure if you simplified your code to show here, but if all you are doing in your view function is to gather some data and then select one of a few templates to render it, then you might as well render the whole thing in a single page and maybe use a JavaScript tab control to divide it up into sections on the client.
If each template requires different data, then the logic that obtains and processes the data for each template will have to be in your view function, and that is going to look pretty messy because you'll have a long chain of if statements to handle each template. Between that and separate view functions per template I think the latter will be quicker, even more so if you also consider the maintenance effort.
Update: based on the conversation in the comments, I stand by my answer, with some minor reservations.
I think your solution works and has no major problems. I don't see a security risk because you are validating the input that comes from the client before you use it.
You are just using Flask to serve files that can be considered static if you ignore the navigation bar at the top. You should consider compiling the Flask app into a set of static files using an extension like Frozen-Flask, then you just host the compiled files with a regular web server. And when you need to add/remove routes you can modify the Flask app and compile it again.
Another thought is that your Flask app structure will not scale well if you need to add server-side logic. Right now you don't have any logic in the server, everything is handled by jQuery in the browser, so having a single view function works just fine. If at some point you need to add server logic for these pages then you will find that this structure isn't convenient.
I hope this helps.
I assume, based on your code, that all the routes have a corresponding template file of the same name (destination maps to destination.html) and that the goodURLS menu bar is changed manually. An easier method would be to try to render the template at request time and return your 404 page if it doesn't exist.
from jinja2 import TemplateNotFound
from werkzeug import secure_filename
....

@app.route('/<destination>')
def goThere(destination):
    destTemplate = secure_filename("%s.html" % destination)
    try:
        return render_template(destTemplate, goodURLS=goodURLS)
    except TemplateNotFound:
        abort(404)

@app.errorhandler(404)
def notFound(e):
    return render_template('/notFound.html'), 404
This is adapted from the answer to the Stack Overflow question "How do I create a 404 page?".
Edit: Updated to make use of Werkzeug's secure_filename to clean user input.

Scraping URLs from a node.js data stream on the fly

I am working with a node.js project (using Wikistream as a basis, so not totally my own code) which streams real-time Wikipedia edits. The code breaks each edit down into its component parts and stores it as an object (see the gist at https://gist.github.com/2770152). One of the parts is a URL. I am wondering if it is possible, when parsing each edit, to scrape that URL, which shows the differences between the pre-edit and post-edit Wikipedia page, grab the difference (inside a span class called 'diffchange diffchange-inline', for example), and add that as another property of the object. Right now it could just be a string; it does not have to be fully structured.
I've tried using nodeio and have some code like this (I am specifically trying to scrape only edits that have been marked in the comments (m[6]) as possible vandalism):
if (m[6].match(/vandal/) && namespace === "article") {
  nodeio.scrape(function () {
    this.getHtml(m[3], function (err, $) {
      //console.log('getting HTML, boss.');
      console.log(err);
      var output = [];
      $('span.diffchange.diffchange-inline').each(function (scraped) {
        output.push(scraped.text);
      });
      vandalContent = output.toString();
    });
  });
} else {
  vandalContent = "no content";
}
When it hits the conditional statement, it scrapes one time and then the program closes out. It does not store the desired content as a property of the object. If the condition is not met, it does store a vandalContent property set to "no content".
What I am wondering is: is it even possible to scrape like this on the fly? Is the scraping bogging the program down? Are there other suggested ways to get a similar result?
I haven't used nodeio yet, but the signature looks to be an async callback, so from the program-flow perspective it happens in the background and therefore does not block the next statement from occurring (the next statement being whatever comes after your if block).
It looks like you're trying to do it sequentially, which means you need to either rethink what you want your callback to do, or force it to be sequential by putting the whole thing in a while loop that exits only when you have vandalContent (which I wouldn't recommend).
For a test, try doing a console.log on your vandalContent in the callback and see what it spits out.
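As a sketch of the first option (hypothetical: edit is the object being built for the current edit, and storeEdit() stands in for whatever you do with the finished object), move everything that depends on the scraped text into the callback itself:

if (m[6].match(/vandal/) && namespace === "article") {
  nodeio.scrape(function () {
    this.getHtml(m[3], function (err, $) {
      var output = [];
      $('span.diffchange.diffchange-inline').each(function (scraped) {
        output.push(scraped.text);
      });
      edit.vandalContent = output.toString();
      storeEdit(edit); // only use the property once the scrape has finished
    });
  });
} else {
  edit.vandalContent = "no content";
  storeEdit(edit);
}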

Node JS Express Boilerplate and rendering

I am trying out Node and its Express framework via the Express boilerplate installation. It took me a while to figure out I needed Redis installed (btw, if you're making a boilerplate, either include all required software with it or warn about the requirement for certain software - Redis was never mentioned as required) and to find my way around the server.js file.
Right now I'm still a stranger to how I could build a site in this.
There is one problem that bugs me specifically: when I run the server.js file, it says it's all good. When I try to access it in the browser, it says 'transferring data from localhost' and never ends - it's like the render doesn't finish sending and never sends the headers. No errors, no logs, no nothing - res.render('index') just hangs. The file exists, and the script finds it, but nothing ever happens. I don't have a callback defined in the render, so the headers should get sent as usual.
If, on the other hand, I replace the render command with a simple write('Hello world'); followed by res.end();, it works like a charm.
What am I doing wrong with the rendering? I haven't changed a thing from the original installation, btw. The file in question is index.ejs, it's in views/, and I even called app.register('.ejs', require('ejs')); just in case before the render itself. EJS is installed.
Also worth noting: if I do res.render('index'); and then res.write('Hello'); immediately afterwards, followed by res.end();, I do get "Hello" on the screen, but the render never happens - it just hangs and says "Transferring data from localhost". So the application doesn't really die or hang, it just never finishes the render.
Edit: Interesting turn of events. If I define a callback in the render, the response does end. There is no more "Transferring data...", but the view is never rendered, and neither is the layout. The source is completely empty upon inspection. There are no errors whatsoever, and no exceptions.
Problem fixed. It turns out render() has to be the absolute last command in a routing chain. Putting res.write('Hello'); and res.end(); after it was exactly what broke it.
I deleted everything and wrote simply res.render('index') and it worked like a charm. Learn from my fail, newbies - no outputting anything after rendering!
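A minimal sketch of the fix described above, for anyone hitting the same thing:

// Broken: writing to the response after render() corrupts the response.
app.get('/', function (req, res) {
  res.render('index');
  res.write('Hello');
  res.end();
});

// Working: render() is the last thing the handler does.
app.get('/', function (req, res) {
  res.render('index');
});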
