I am using kue.js, which is a redis-backed priority queue for node, for pretty straightforward job-queue stuff (sending mails, tasks for database workers).
As part of the same application (albeit in a different service), I now want to use redis to manually store some mappings for a url-shortener. Does concurrent manual use of the same redis instance and database as kue.js interfere with kue, i.e., does kue require exclusive access to its redis instance?
Or can I use the same redis instance manually as long as I, e.g., avoid certain key prefixes?
I do understand that I could use multiple databases on the same instance, but I found a lot of chatter from various sources discouraging the use of the database feature, as well as talk of it being deprecated in the future, which is why I would like to use the same database for now if it is safe to do so.
Any insight on this, as well as considerations or advice on why this might or might not be a bad idea, is very welcome. Thanks in advance!
I hope I am not too late with this answer; I just came across this post ...
It should be perfectly safe. See the README, especially the section on redis connections.
You will notice that each queue can have its own prefix (the default is q), so as long as you are aware of how prefixes are used in your system, you should be fine. I am not sure why it would be a bad idea, as long as you know about the prefixes and the load the various apps put on the Redis server. Can you reference a post/page where this was described as a bad idea?
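For illustration, here is a minimal sketch of what this could look like, using the older callback-style node redis client. The url-shortener key scheme is hypothetical, and the exact createQueue options depend on the kue version you are running:

    var kue = require('kue');
    var redis = require('redis');

    // Kue keeps all of its data under the configured prefix (default "q"),
    // e.g. q:job:ids, q:jobs, ... so it does not need exclusive access.
    var queue = kue.createQueue({
      prefix: 'q',
      redis: { port: 6379, host: '127.0.0.1' }
    });

    // The url-shortener can use its own key namespace in the same database.
    var client = redis.createClient(6379, '127.0.0.1');
    client.set('shorturl:abc123', 'https://example.com/some/long/path');
    client.get('shorturl:abc123', function (err, url) {
      console.log(url);
    });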
We can run more than one node app from a single code base; all we need to do is start each one on a different port, but I am not sure whether doing so is a good idea or not.
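For example, a minimal sketch of the idea (app.js is a hypothetical entry point; the port is taken from an environment variable so every instance can pick its own):

    // app.js -- same code base for every instance
    var http = require('http');
    var port = process.env.PORT || 3000;

    http.createServer(function (req, res) {
      res.end('served by the instance on port ' + port);
    }).listen(port);

    // start several instances, e.g.:
    //   PORT=3001 node app.js
    //   PORT=3002 node app.js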
I can see the following pros & cons of this approach
Pros:
multiple domains like sub1.domain.com, sub2.domain.com, and so on, sharing the same code base.
code updates happen in a single place.
Any other pros you would like to mention?
Cons:
It might cause deadlocks when reading some files, or other multi-process issues.
Any other cons you would like to mention?
Is it a good move to share the code base?
Please share your experience.
Thank You
You are essentially spawning several instances of your application, which is not a bad or a good thing in itself; it depends on what your application does. If the application does not access any resources shared with other instances of itself, it is not a problem, and you can spawn as many instances as you like, for whatever purpose you see fit.
BUT if your application uses any shared resources, such as a database or flat files, you need to take race conditions and deadlocks into account. This is handled very well by ACID-compliant databases; on document-oriented databases it is not as mature and requires you to have a good grasp of the techniques and languages used.
If there is no obvious reason to run multiple instances of your application, do not do it.
Once you start going down the route of multiple instances, you have to design around bottlenecks, network traffic, backups, and a lot of other things that give people headaches. Do not do it just because you can.
I am looking for a way to share the same data structure (which contains functions, so JSON is not an option) across all cluster instances within NodeJS. I have a data structure called 'Users' that tracks user sessions and contains the functions they have access to. I need to be able to share this data structure across all node processes, or I need an alternative design pattern. Does anyone know of any solutions to this issue? Thanks
I realize this is old and answered, but it may be beneficial to others to note an alternative. The recommended way to handle a situation like this is to place the data structure, with its functions, in a separate file and require it when needed. This essentially pulls in the "code/functions", while you store (serialize/deserialize) the data itself in any data store.
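A minimal sketch of that pattern (file names, key names, and the user shape are all hypothetical): the behaviour lives in a module that every process requires, while only plain data goes into the store.

    // users-model.js -- the functions live here, in code
    module.exports = {
      toJSON: function (user) {
        return { name: user.name, permissions: user.permissions };
      },
      fromJSON: function (data) {
        return {
          name: data.name,
          permissions: data.permissions,
          can: function (action) {
            return this.permissions.indexOf(action) !== -1;
          }
        };
      }
    };

    // in any worker process
    var users = require('./users-model');
    var redis = require('redis').createClient();

    var someUser = { name: 'alice', permissions: ['post'] };
    redis.set('user:42', JSON.stringify(users.toJSON(someUser)));
    redis.get('user:42', function (err, raw) {
      var user = users.fromJSON(JSON.parse(raw));
      console.log(user.can('post')); // functions from the module, data from Redis
    });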
There are multiple options for setting up proper IPC (inter process communication) on nodejs:
using a document/key-value storage solution like Redis (key-value) or MongoDB (NoSQL Document-Storage)
using the integrated IPC functionality of the cluster module (see send method)
Deciding which one of those solutions fits best depends on your requirements and your project setup. For our last project, I decided to use both methods:
IPC for triggering jobs and dispatching partial tasks to different nodejs instances (a short sketch follows this list)
Redis for centralized session- and API-token management
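Here is a rough sketch of the cluster module's built-in messaging (the message shapes are made up for illustration):

    var cluster = require('cluster');

    if (cluster.isMaster) {
      var worker = cluster.fork();

      // master -> worker: dispatch a partial task
      worker.send({ cmd: 'process-job', jobId: 1 });

      // worker -> master: report the result
      worker.on('message', function (msg) {
        console.log('worker finished job', msg.jobId);
      });
    } else {
      process.on('message', function (msg) {
        // ... do the actual work for msg.jobId here ...
        process.send({ cmd: 'job-done', jobId: msg.jobId });
      });
    }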
If you are using Express, I highly recommend you use the Redis middleware connect-redis. This session middleware automatically handles centralized session management for Express-based applications (which also means you can store complex JS objects and have access to them from all your instances).
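Roughly like this (the exact wiring depends on your Express and connect-redis versions; the session secret is a placeholder):

    var express = require('express');
    var session = require('express-session');
    var RedisStore = require('connect-redis')(session);

    var app = express();

    app.use(session({
      store: new RedisStore({ host: '127.0.0.1', port: 6379 }),
      secret: 'keyboard cat',   // placeholder secret
      resave: false,
      saveUninitialized: false
    }));

    // req.session is now shared by every instance pointing at the same Redis
    app.get('/', function (req, res) {
      req.session.views = (req.session.views || 0) + 1;
      res.send('views: ' + req.session.views);
    });

    app.listen(3000);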
The Setup:
Imagine a 'twitter like' service where a user submits a post, which is then read by many (hundreds, thousands, or more) users.
My question is regarding the best way to architect the cache and database to optimize for quick access and many reads, while still keeping the historical data so that users may (if they want) see older posts. The assumption here is that 90% of users would only be interested in the new stuff and that the old stuff will be accessed occasionally. The other assumption is that we want to optimize for the 90%, and it's OK if the older 10% takes a little longer to retrieve.
With this in mind, my research seems to strongly point in the direction of using a cache for the 90%, and then also storing the posts in another, longer-term persistent system. So my idea thus far is to use Redis for the cache. The advantage is that Redis is very fast, and it also has built-in pub/sub, which would be perfect for publishing posts to many people. And then I was considering using MongoDB as a more permanent data store to hold the same posts, which will be accessed as they expire off of Redis.
Questions:
1. Does this architecture hold water? Is there a better way to do this?
2. Regarding the mechanism for storing posts in both Redis and MongoDB, I was thinking about having the app do two writes: first, write to Redis, so the post is immediately available to subscribers; second, after successfully storing to Redis, write to MongoDB immediately (a rough sketch of this follows). Is this the best way to do it? Should I instead have Redis push the expired posts to MongoDB itself? I thought about this, but I couldn't find much information on pushing to MongoDB from Redis directly.
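To make the dual write concrete, here is a rough sketch under a few assumptions: the key, channel, and database names are invented, the cached list length is arbitrary, and it uses the older callback-style MongoDB driver and node redis client.

    var redis = require('redis').createClient();
    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://127.0.0.1:27017/app', function (err, db) {
      if (err) throw err;
      var posts = db.collection('posts');

      function createPost(post, callback) {
        var json = JSON.stringify(post);

        // 1st write: cache the post and notify subscribers immediately
        redis.lpush('posts:recent', json, function (err) {
          if (err) return callback(err);
          redis.ltrim('posts:recent', 0, 999);   // keep only the "hot" posts
          redis.publish('posts:new', json);

          // 2nd write: persist to MongoDB for the long tail
          posts.insert(post, callback);
        });
      }

      createPost({ author: 'alice', text: 'hello' }, function (err) {
        if (err) console.error(err);
      });
    });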
It is actually sensible to associate Redis and MongoDB: they are good team players. You will find more information here:
MongoDB with redis
One critical point is the resiliency level you need. Both Redis and MongoDB can be configured to achieve an acceptable level of resiliency, and these considerations should be discussed at design time. It may also put constraints on the deployment options: if you want master/slave replication for both Redis and MongoDB, you need at least 4 boxes (Redis and MongoDB should not be deployed on the same machine).
Now, it may be a bit simpler to keep Redis for queuing, pub/sub, etc ... and store the user data in MongoDB only. The rationale is that you do not have to design similar data access paths (the difficult part of this job) for two stores featuring different paradigms. Also, MongoDB has built-in horizontal scalability (replica sets, auto-sharding, etc ...) while Redis only has do-it-yourself scalability.
Regarding the second question, writing to both stores would be the easiest way to do it. There is no built-in feature to replicate Redis activity to MongoDB. Designing a daemon listening to a Redis queue (where activity would be posted) and writing to MongoDB is not that hard though.
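As a rough sketch of such a daemon (the list and database names are hypothetical; this uses the older callback-style MongoDB driver and node redis client):

    // archive-daemon.js -- drain a Redis list of expired posts into MongoDB
    var redis = require('redis').createClient();
    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://127.0.0.1:27017/archive', function (err, db) {
      if (err) throw err;
      var posts = db.collection('posts');

      (function next() {
        // BRPOP blocks until something is pushed onto "posts:archive"
        redis.brpop('posts:archive', 0, function (err, reply) {
          if (err) return console.error(err);
          var post = JSON.parse(reply[1]);   // reply is [listName, value]
          posts.insert(post, function (err) {
            if (err) console.error(err);
            next();                          // wait for the next post
          });
        });
      })();
    });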
I'm writing a piece to a project that's responsible for processing tasks outside of the main application facing data server, which is written in javascript using Node.js. It needs to handle tasks which are scheduled in the future and potentially handle tasks that are "right now". The "right now" just means the next time a worker becomes available it will operate on that task, so that bit might not matter. The workers are going to all talk to external resources, an example job would be to send an email. We are a small shop and we don't have a ton of resources so one thing I don't want to do is start mixing languages at this point in the process, and I already see that Node can do this for us pretty easily, so that's what we're going to go with unless I see a compelling reason not to before I start coding, which is soon.
All that said, I can't tell if there is a compelling reason to use an AMQP based server, like OpenAMQ or RabbitMQ over something like Kue or Beanstalkd with a node client. So, here we go:
Is there a compelling reason to use an AMQP based server over something like beanstalkd or redis with Kue? If yes, which AMQP based server would fit best with the architecture that I laid out? If no, which nosql solution (beanstalkd, redis/Kue) would be easiest to set up and fastest to deploy?
FWIW, I'm not accepting my answer yet, I'm going to explain what I've decided and why. If I don't get any answers that appear to be better than what I've decided, I'll accept my own later.
I decided on Kue. It supports multiple workers running asynchronously, and with cluster it can take advantage of multicore systems. It is easily extended to provide security. It's backed by Redis, which is used all over for this exact thing, so I know I'm not backing my job-processing server with unproven software (that's not to say that any of the others are unproven).
The most compelling reason I picked Kue is that it provides a JSON API, so the client applications (the first client is going to be a web-based application, but we're planning on making smartphone apps as well) can add jobs easily without going through the main application-facing node instance, which lets me stay completely out of the way of the rest of my team as I write this. I don't need a route, I don't need anything; it's all provided for me, so I don't need to write anything to support this. It has another advantage: with an extension to provide login/password security, only authorized clients can add jobs, so I don't have to expose my Redis server to client applications directly. It also has a built-in web console, and the API allows the client to pull back lists of jobs associated with a given user very easily, so we can show the user all of their scheduled tasks in a nifty calendar view with zero effort on my part.
The other compelling reason is the lack of a steep learning curve in getting Redis and Kue going. I've set up Redis before, and Kue is simple and effective.
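For reference, a minimal sketch of what that setup looks like (sendMail is a hypothetical helper, and the delay/attempts values are arbitrary):

    var kue = require('kue');
    var queue = kue.createQueue();

    // worker: process queued email jobs, a few at a time
    queue.process('email', 5, function (job, done) {
      sendMail(job.data, done);   // hypothetical mail helper
    });

    // producer: any process can schedule work for later
    queue.create('email', { to: 'user@example.com', subject: 'hi' })
      .delay(60 * 1000)           // run roughly a minute from now
      .attempts(3)
      .save();

    // built-in web console and JSON API (GET /jobs, POST /job, ...)
    kue.app.listen(3001);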
Yes, I'm a lazy developer, but I'm the good kind of lazy developer.
UPDATE:
I have it working and doing jobs, and the throughput is amazing. I split the task-marshaling logic out into its own node instance; basically, all I have to do is deploy my repo to a new machine and run node task-server.js to scale out my workers. I may need to add some more job-searching calls to Kue because of how I implemented a few things, but that will be easy.
I'm writing a server and decided to split the work between different processes running node.js, because I heard node.js was single-threaded and figured this would parallelize better. The application is going to be a game. I have one process serving html pages, and then other processes dealing with the communication between clients playing the game. The clients will be placed into "rooms" and then use sockets to talk to each other, relayed through the server.
The problem I have is that the html server needs to be aware of how full the different rooms are to place people correctly, and the socket servers need to update this information so that an accurate representation of the various rooms is maintained. So, as far as I see it, the html server and the room servers need to share some objects in memory. I am planning to run it on one (multicore) machine. Does anyone know of an easy way to do this? Any help would be greatly appreciated
Node currently doesn't support shared memory directly, and that's a reflection of JavaScript's complete lack of semantics or support for threading/shared memory handling.
With node 0.7, only recently usable even experimentally, the ability to run multiple event loops and JS contexts in a single process has become a reality (utilizing V8's concept of isolates and large changes to libuv to allow multiple event loops per process). In this case it's possible, but still not directly supported or easy, to have some kind of shared memory. In order to do that you'd need to use a Buffer or ArrayBuffer (both of which represent a chunk of memory outside of JavaScript's heap but accessible from it in a limited manner) and then some way to share a pointer to the underlying V8 representation of the foreign object. I know it can be done from a minimal native node module, but I'm not sure if it's possible from JS alone yet.
Regardless, the scenario you described is best fulfilled by simply using child_process.fork and sending the (seemingly minimal) amount of data through the communication channel it provides (which uses serialization).
http://nodejs.org/docs/latest/api/child_processes.html
Edit: it'd be possible from JS alone assuming you used node-ffi to bridge the gap.
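A small sketch of the fork approach (file names and message shapes are invented for illustration): a room server keeps the authoritative room counts and pushes updates to the forked html server over the built-in channel.

    // rooms-server.js (parent) -- tracks occupancy, forks the html server
    var fork = require('child_process').fork;
    var htmlServer = fork('./html-server.js');

    var rooms = { lobby: 0 };

    // call this whenever a socket client joins or leaves a room
    function updateRoom(name, delta) {
      rooms[name] = (rooms[name] || 0) + delta;
      htmlServer.send({ type: 'rooms', rooms: rooms });  // serialized over IPC
    }

    updateRoom('lobby', +1);

    // html-server.js (child) -- keeps a read-only copy for placing players
    var latestRooms = {};
    process.on('message', function (msg) {
      if (msg.type === 'rooms') latestRooms = msg.rooms;
    });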
You may want to try using a database like Redis for this. You can have a process subscribed to a channel listening for new connections, and publish from the web server whenever you need to.
You can also have multiple processes waiting for users: use a list and BRPOP to block until a player is available.
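A rough sketch of the BRPOP variant with the callback-style node redis client (the list name and player shape are made up):

    var redis = require('redis');

    // web server side: push each waiting player onto a list
    var producer = redis.createClient();
    producer.lpush('waiting-players', JSON.stringify({ id: 42, room: 'lobby' }));

    // room server side: block until a player is available, then repeat
    var consumer = redis.createClient();
    (function waitForPlayer() {
      consumer.brpop('waiting-players', 0, function (err, reply) {
        if (err) return console.error(err);
        var player = JSON.parse(reply[1]);   // reply is [listName, value]
        // ... place the player into a room here ...
        waitForPlayer();
      });
    })();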
Sounds like you want to not do that.
Serving and message-passing are both IO-bound, which Node is very good at doing with a single thread. If you need long-running calculations on those messages, those might be worth doing separately, but even so, you might be surprised at how well you do with a single thread.
If not, look into Workers.
zeromq is also becoming quite popular as an inter-process communication method. Might be worth a look. http://www.zeromq.org/ and https://github.com/JustinTulloss/zeromq.node