Unfortunately, I'm not really familiar with Linux and Docker, but maybe one of you can help me.
I need a lightweight Docker image that can do multiple byte stream conversions (e.g. encoding, encryption, compression, ...) in parallel. The process should be called several times outside of the image (e.g. from another image). If I understand it correctly, I need some kind of server that receives and processes the requests.
How do you implement this and how would the program then be called?
Many Thanks
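One way to picture the "some kind of server" idea is a tiny HTTP service that accepts a byte stream per request and handles each request in its own thread. The following is only a minimal sketch using the Python standard library; the operation names (`base64`, `gzip`, `gunzip`) and the URL layout (`POST /<operation>`) are assumptions made for illustration, and encryption is left out.

```python
import base64
import gzip
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical operation names; each maps bytes to bytes.
CONVERTERS = {
    "base64": base64.b64encode,
    "gzip": gzip.compress,
    "gunzip": gzip.decompress,
}


class ConvertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The URL path selects the conversion, e.g. POST /gzip
        func = CONVERTERS.get(self.path.strip("/"))
        if func is None:
            self.send_error(404, "unknown conversion")
            return
        length = int(self.headers.get("Content-Length", 0))
        result = func(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), ConvertHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def convert(host, port, op, data):
    """Client side: send bytes, get converted bytes back."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("POST", f"/{op}", body=data)
        return conn.getresponse().read()
    finally:
        conn.close()
```

Inside a container the server would typically listen on `0.0.0.0` with a published port, and another container would call it the way `convert()` does here. `ThreadingHTTPServer` serves each request in a separate thread, which is what lets several conversions run at the same time.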
Related
I'm developing a lightweight framework to work as a coordinator in a robotics competition I compete in.
My idea is to have programs that know nothing about the whole system, each with just inputs that might trigger outputs. I then connect those outputs to inputs, and can get different behaviours from the same modules without much work.
I'm planning on doing this with Node.js and WebKit, to allow a nice UI for modifying the process. However, each "module" might not really be code wrapped in some JavaScript class-like function; it might be a real thread, maybe running some native C++ code (without Node.js), or even a Python program.
What I'm looking for now is a fast and generic way to exchange data among processes. I have read about it, but haven't come to any conclusions...
Here are the 3 methods I found out:
Local Socket: Uses localhost to send data to a port
Unix Socket: Maybe more efficient than the above (but uses the filesystem?)
Stdin/Stdout communication: When a process is launched by Node.js, its stdin and stdout can be bound and used to communicate with the program.
So, I have those 3 ways of doing it, what should I use mostly? I need things to communicate REALLY fast (data might go through 5 different processes, and I need that not to exceed 2ms)
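For a budget like this it is usually easier to measure than to guess. Below is a rough sketch that times echo round trips over a Unix domain socket. It is written in Python only for illustration, the echo server runs in a thread as a stand-in for a separate process, and the function names, payload size, and iteration count are all made up for the example.

```python
import os
import socket
import tempfile
import threading
import time


def serve_once(path, ready):
    """Accept one client on a Unix socket and echo every message back."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)
        srv.listen(1)
        ready.set()  # tell the client the socket file now exists
        conn, _ = srv.accept()
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:  # client closed the connection
                    break
                conn.sendall(data)


def measure_round_trips(n=1000, payload=b"x" * 64):
    """Return the mean round-trip time in seconds over n echoes."""
    path = os.path.join(tempfile.mkdtemp(), "ipc.sock")
    ready = threading.Event()
    server = threading.Thread(target=serve_once, args=(path, ready), daemon=True)
    server.start()
    ready.wait()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(path)
        start = time.perf_counter()
        for _ in range(n):
            cli.sendall(payload)
            received = b""
            while len(received) < len(payload):
                received += cli.recv(4096)
        elapsed = time.perf_counter() - start
    server.join(timeout=5)
    os.unlink(path)
    return elapsed / n
```

Calling `measure_round_trips()` and multiplying the result by the number of hops gives a first estimate to compare against the 2ms target. Swapping `AF_UNIX` for a TCP socket on localhost, or for pipes, in the same harness would let the three options be compared on the actual machine.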
I am running a webservice to convert ODT documents to PDF using OpenOffice on an Ubuntu server.
Sadly, OpenOffice occasionally chokes when more than one request is made simultaneously (converting to PDF takes around 500-1000ms). This is a real threat since my webservice is multithreaded and jobs are mostly issued in batches.
What I am looking for is a way to hand off the conversion task from my webservice to an intermediate process that queues all requests and feeds them one by one to OpenOffice.
However, sometimes I want to be able to issue a high priority conversion that gets processed immediately (after the current one, if busy) and have the webservice wait (block) for that. This seems a tricky addition that makes most simple scheduling techniques obsolete.
What you're after is some kind of message/work queue system.
One of the simplest work queueing systems I've used, that also supports prioritisation, is beanstalkd.
You would have a single process running on your server, that will run your conversion process when it receives a work request from beanstalkd, and you will have your web application push a work request onto beanstalkd with relevant information.
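To make the pattern concrete without needing a beanstalkd server, here is a small in-process sketch that uses Python's `queue.PriorityQueue` as a stand-in for the queue. It is not beanstalkd's API; `Job`, `submit`, `stop`, and `worker` are names invented for this example. It shows the two behaviours asked for: jobs run strictly one at a time, and a caller can block until its high priority job is done.

```python
import itertools
import queue
import threading

HIGH, NORMAL = 0, 1  # lower number is served first
_counter = itertools.count()  # tie-breaker: FIFO order within one priority


class Job:
    def __init__(self, payload):
        self.payload = payload
        self.done = threading.Event()  # set when the job has finished
        self.result = None


def submit(q, payload, priority=NORMAL):
    """Queue a job and return it so the caller may wait on job.done."""
    job = Job(payload)
    q.put((priority, next(_counter), job))
    return job


def stop(q):
    """Queue a sentinel that is served after all pending jobs."""
    q.put((float("inf"), next(_counter), None))


def worker(q, convert):
    """Run jobs one at a time, most urgent first, until the sentinel."""
    while True:
        _, _, job = q.get()
        if job is None:
            break
        job.result = convert(job.payload)
        job.done.set()
```

The web application would call `submit(q, path)` for batch work, and `submit(q, path, HIGH)` followed by `job.done.wait()` when it needs to block for an urgent conversion. The urgent job jumps ahead of everything still waiting but does not interrupt the conversion already running. beanstalkd follows the same convention that smaller priority numbers are more urgent.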
The guys at DigitalOcean have written up a very nice intro to it here:
https://www.digitalocean.com/community/tutorials/how-to-install-and-use-beanstalkd-work-queue-on-a-vps
We use clustering with our express apps on multi cpu boxes. Works well, we get the maximum use out of AWS linux servers.
We inherited an app we are fixing up. It's unusual in that it has two processes. It has an Express API portion to take incoming requests. But the process that acts on those requests can run for several minutes, so it was built as a separate background process, with node calling python and maya.
Originally the two were tightly coupled, with the python script called by the request to upload the data. But this of course was suboptimal, as it would leave the client waiting for a response for the time it took to run, so it was rewritten as a background process that runs in a loop, checking for new uploads, and processing them sequentially.
So my question is this: if we have this separate node process running in the background, and we run clusters which start up a process for each CPU, how is that going to work? Are we not going to get two node processes competing for the same CPU? We were getting a bit of weird behaviour and crashing yesterday, without a lot of error messages (god I love node), so it's a bit concerning. I'm assuming Linux will just swap the processes in and out as they are being used. But I wonder if it will be problematic, and I also wonder about someone getting their web session swapped out for several minutes while the longer running process runs.
The smart thing to do would be to rewrite this to run on two different servers, but the files that maya uses/creates are on the server's file system, and we were not given the budget to rebuild the way we should. So, we're stuck with this architecture for now.
Any thoughts on possible problems and how to avoid them would be appreciated.
From an overall architecture perspective, spawning 1 nodejs process per core is a great way to go. You have a lot of interdependencies though: the nodejs processes are calling maya, which may use multiple threads (keep that in mind).
The part that is concerning to me is your random crashes and your "process that runs in a loop". If that process is just checking the file system you probably have a race condition where the nodejs processes are competing to work on the same input/output files.
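If several workers really are polling the same directory, one common way to avoid that race is to have each worker claim a file by renaming it before touching it, since a rename within one filesystem is atomic on Linux. This is a sketch of the idea in Python rather than in your node code; the `.processing` suffix and the function names are assumptions for illustration only.

```python
import os


def claim(path):
    """Try to claim `path` by renaming it; return the new path or None."""
    claimed = f"{path}.{os.getpid()}.processing"
    try:
        os.rename(path, claimed)  # atomic on the same filesystem
    except FileNotFoundError:
        return None  # another worker renamed it first
    return claimed


def claim_next(upload_dir):
    """Claim the first unclaimed file in upload_dir, or return None."""
    for name in sorted(os.listdir(upload_dir)):
        if name.endswith(".processing"):
            continue  # already claimed by some worker
        claimed = claim(os.path.join(upload_dir, name))
        if claimed is not None:
            return claimed
    return None
```

Only the worker whose rename succeeds goes on to process the file, so two processes never work on the same upload.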
In theory, 1 nodejs process per core will work great and should help you make use of all your CPUs. Linux always swaps processes in and out, so that is not an issue. You could start multiple nodejs processes per core and still not have an issue.
One last note: be sure to keep an eye on your memory usage. Several Linux distributions on EC2 do not have a swap file enabled by default, and running out of memory can be another silent app killer. It is best to add a swap file in case you run into memory issues.
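A quick way to check both points on a Linux box is to read `/proc/meminfo`. The helper names below are made up for this sketch; the field names `SwapTotal` and `MemAvailable` are the ones the kernel reports.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (mostly kB)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def swap_enabled(info):
    """True if the system has any swap configured."""
    return info.get("SwapTotal", 0) > 0


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live values (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

For example, `swap_enabled(read_meminfo())` returning `False` means there is no swap at all, and logging `read_meminfo()["MemAvailable"]` periodically would show whether the crashes line up with memory running low.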
I've been looking for a decent example to answer my question, but I'm not sure if it's possible at this point.
I'm curious if it's possible to upload an image or any other file and stream it to a separate server. In my case I would like to stream it to imgur.
I ask this because I want to reduce the bandwidth hit of having all the files come to the actual nodejs server and then be uploaded from there. Again, I'm not sure if this is possible or if I'm reaching, but some insight or an example would help a lot.
I took a look at Binary.js, which may do what I'm looking for, but it's IE10+ so no dice with that...
EDIT: based on comment on being vague
When the file is uploaded to the node server, it takes a bandwidth hit. When node takes the file and uploads it to the remote server, it takes another hit (if I'm not mistaken). I want to know if it's possible to pipe it to the remote service (imgur in this case) and just use the node server as a liaison. Again, I'm not sure if this is possible, which is why I'm attempting to articulate the question. I'm attempting to reduce the amount of bandwidth and storage space used.
I am in the process of creating a site which enables users to upload audio. I just figured out how to use ffmpeg with PHP to convert audio files (from WAV to MP3) on the fly.
I don't have any real experience with ffmpeg and I wanted to know what's the best way to convert the files. I'm not going to convert them upon page load, I will put the conversions in a queue and process them separately.
I have queries about how best to process the queue. What is a suitable interval to convert these files without overloading the server? Should I process files simultaneously or one by one? How many files should I convert at each interval to allow the server to function efficiently?
Server specs
Core i3 2.93GHz
4GB RAM
CentOS 64-bit
I know these questions are very vague but if anyone has any experience with a similar concept, I would really love to hear what works for them and what common problems I could face in the road ahead.
Really appreciate all the help!
I suggest you use a work queue like beanstalkd. When there is a new file to convert simply place a message into the queue (the filename maybe). A daemon that works as beanstalkd client fetches the message and converts the audio file properly (the daemon can be written in any language that has a beanstalkd library).
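On the question of how many files to convert at once, the daemon can simply cap the number of ffmpeg processes it runs in parallel. Here is a rough sketch in Python (the daemon could equally be PHP); a plain list stands in for messages fetched from beanstalkd, the function names are invented, and the default of 2 workers is only a starting guess to tune on the actual server.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def ffmpeg_command(src):
    """Build an ffmpeg call that writes an .mp3 next to the source file."""
    dst = str(Path(src).with_suffix(".mp3"))
    # -y overwrites an existing output; the format follows from the extension
    return ["ffmpeg", "-y", "-i", str(src), dst]


def convert_all(files, max_workers=2, run=subprocess.run):
    """Convert files with at most max_workers ffmpeg processes at once.

    `run` is injectable so the function can be tried without ffmpeg.
    Returns a list of (source, return code) pairs in input order.
    """
    def convert_one(src):
        result = run(ffmpeg_command(src))
        return src, result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert_one, files))
```

Because the worker pulls jobs as soon as a slot is free, there is no need to pick a polling interval: the cap on `max_workers` is what keeps the server from being overloaded. A non-zero return code marks a file that should be retried or reported.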