I have tasks running on a VM and the following sequence of events. For scaling purposes I need to be able to run operations on demand, and possibly in parallel.
A simple sequence of events:
1. Execute the task.
2. The task creates a dataset file.
3. Start up a container instance (Linux).
4. In the container, execute operations on the dataset.
5. Write the updated dataset.
6. The VM consumes the dataset.
The environment is Azure.
Azure Files for exchanging the dataset (steps 2, 5).
PowerShell for creating and starting the container.
PowerShell could also be used for step 4.
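Step 3 could also be driven from Python via the azure-mgmt-containerinstance SDK if PowerShell turns out to be too platform-bound; a minimal sketch, with all names, keys, and the image as placeholders and not validated end to end:

    # Minimal sketch of step 3: start a one-shot Linux container that mounts
    # the Azure Files share (steps 2/5) and runs the dataset operations (step 4).
    # Subscription, resource group, share, account, and image are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.containerinstance import ContainerInstanceManagementClient
    from azure.mgmt.containerinstance.models import (
        AzureFileVolume, Container, ContainerGroup, ContainerGroupRestartPolicy,
        OperatingSystemTypes, ResourceRequests, ResourceRequirements,
        Volume, VolumeMount,
    )

    client = ContainerInstanceManagementClient(
        DefaultAzureCredential(), "<subscription-id>"
    )

    share = Volume(
        name="dataset",
        azure_file=AzureFileVolume(
            share_name="datasets",                     # placeholder share
            storage_account_name="<storage-account>",
            storage_account_key="<storage-key>",
        ),
    )

    worker = Container(
        name="dataset-worker",
        image="myregistry.azurecr.io/worker:latest",   # placeholder image
        command=["/app/run.sh"],                       # step 4 happens in here
        resources=ResourceRequirements(
            requests=ResourceRequests(cpu=1.0, memory_in_gb=1.5)
        ),
        volume_mounts=[VolumeMount(name="dataset", mount_path="/mnt/dataset")],
    )

    group = ContainerGroup(
        location="westeurope",
        containers=[worker],
        os_type=OperatingSystemTypes.LINUX,
        restart_policy=ContainerGroupRestartPolicy.NEVER,  # one-shot job
        volumes=[share],
    )

    # Create and start the container group; several can run in parallel.
    client.container_groups.begin_create_or_update(
        "dataset-jobs", "dataset-job-1", group
    ).result()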
I do not wish to use platform-specific event handlers, as it may be necessary to port to other runtime environments. This is a simple use case which I guess many have touched on before. Does anyone have an idea whether HashiCorp Nomad could bring value here? Any tips for other tooling that could add value?
Related
I'm trying to move some computations to Azure cloud services. One of the steps of the workflow I'm trying to implement includes running a Win32 desktop application that generates a file. Obviously, we cannot have user interaction in cloud calculations, so the application is launched with command-line arguments. The process starts, generates a file, and then exits. At the moment I cannot refactor the code and move this functionality into a windowless command-line utility.
First, I chose Azure Functions because they are intended for event-driven, short calculations, which is exactly what I need. They are also cheap. But I ran into a problem: processes in Azure Functions execute inside a sandbox that blocks User32/GDI32 system calls, which prevents me from launching desktop applications.
Another solution I came up with is preparing a virtual machine disk with all the needed Visual C++ redistributables installed and then using Azure Batch with nodes based on that pre-configured disk. But this solution has other drawbacks, since it takes minutes to bring up a new node. Of course, I could keep some nodes always active, but scaling further is still slow, and keeping nodes active is not so cheap. I also have a feeling that Azure Batch is a bit of an overkill, because there is no need for HPC in my case; Azure Functions' computation capabilities are enough for me.
Is there some kind of compromise solution, one with fast scaling and quick responses, but without having to stand up Azure Batch on top of Azure Virtual Machines?
Many GDI32 calls are available now, but only in containerized form.
So you can deploy a function with the desktop application inside it, packaged as a Docker container.
Refer to the following article for more explanation.
Refer to the following documentation on how to deploy a containerized function.
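As a rough illustration of the shape of such a function (Python is used only to keep the examples in one language; the entry point, app path, arguments, and timeout are assumptions, and in practice this would run in a Windows container image with the app and its redistributables baked in):

    # Sketch of the function body: launch the desktop app headlessly and
    # hand back its output. Paths and arguments are placeholders.
    import subprocess
    import azure.functions as func

    APP = r"C:\app\generator.exe"          # placeholder path to the desktop app

    def main(req: func.HttpRequest) -> func.HttpResponse:
        out_path = r"C:\work\result.dat"   # placeholder output location
        # The app is assumed to accept command-line args and exit on its own.
        proc = subprocess.run(
            [APP, "--output", out_path],
            capture_output=True,
            timeout=300,                   # fail fast if the app hangs
        )
        if proc.returncode != 0:
            return func.HttpResponse(proc.stderr, status_code=500)
        return func.HttpResponse(f"generated {out_path}")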
I need an Azure VM (Ubuntu) to do some task (a Java application) every 10 minutes. Because the task usually lasts less than a minute, I would save money if I could start the machine every 10 minutes and stop it when the task completes. I learned that I can schedule start and stop times in an Automation account, but it would be more optimal to stop the VM at the very moment the task finishes. Is there a simple way to do that?
This really sounds like a job for Azure Batch. If you are looking for an IaaS solution, Azure Batch will do the job for you. Have a look at it: https://azure.microsoft.com/en-gb/services/batch/#overview.
It allows you to use VMs with your preferred OS (in Azure Batch a VM is called a node) and run a set of tasks. Once finished, the VM is deallocated.
A pool contains a set of nodes; you submit jobs to a pool, and each job contains tasks. A task can be, for example, a command line that runs a specific app. So for instance you could run example.exe 1 2 on a Windows node, or the equivalent command line on an Ubuntu node.
The power here is that Batch allocates the tasks to run on the VMs when you add them to the job, and the VMs are disposed of once finished, so you only pay for the compute time.
The disadvantage is that the VM is stateless, so anything you need installed or stored has to be handled by other means. Azure Batch allows you to pre-install a program (for example, your Java application) each time a node initializes. Also, if you are using files and/or expecting files to be created, you need blob storage to support this: if you expect the task to use a certain set of files, store them in blob storage, and have your program write its results back to blob storage.
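As a minimal sketch of that file exchange, assuming the azure-storage-blob package and placeholder names:

    # Sketch: pull an input file from blob storage before the task runs and
    # push the result back afterwards, so it survives node deallocation.
    # Connection string, container, and blob names are placeholders.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<connection-string>")
    container = service.get_container_client("batch-files")

    # Download the input the task expects.
    with open("input.dat", "wb") as f:
        f.write(container.download_blob("input.dat").readall())

    # ... run the program that consumes input.dat and produces output.dat ...

    # Write the result back before the node goes away.
    with open("output.dat", "rb") as f:
        container.upload_blob("output.dat", f, overwrite=True)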
Finally, your scheduler: this really depends on how you want to deal with it. If you have a local server, or a server on Azure that is already running 24/7, you can add a scheduled job that submits the task to Azure Batch. Or, if you don't mind using Azure Functions, you can add a timer-triggered function that adds a task to the job (a sketch follows). There are multiple ways of dealing with this, and you may already have an existing solution.
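For the timer-function route, submitting a task to an existing job is only a few lines with the azure-batch package; a sketch, with account details, job ID, and command line as placeholders:

    # Sketch: submit one task to an existing Azure Batch job, e.g. from a
    # timer-triggered function. Account URL, keys, and IDs are placeholders.
    import uuid
    from azure.batch import BatchServiceClient
    from azure.batch.batch_auth import SharedKeyCredentials
    from azure.batch.models import TaskAddParameter

    credentials = SharedKeyCredentials("<account-name>", "<account-key>")
    client = BatchServiceClient(
        credentials, batch_url="https://<account>.<region>.batch.azure.com"
    )

    task = TaskAddParameter(
        id=f"run-{uuid.uuid4().hex[:8]}",        # unique task id per run
        command_line="example.exe 1 2",          # the command the node runs
    )
    client.task.add(job_id="my-job", task=task)  # Batch schedules it on the pool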
Hope you find this useful!
I have an old algorithm that executes two steps, say A and B. A connects to an external service and retrieves data points; it needs to be executed n times, and all the data gathered is passed as input to B. To help scale the A part, it was implemented with multi-threading: 10 threads are spawned, and each connects to n/10 endpoints. Once all threads complete execution, the complete dataset is provided as input to B.
We are planning to create a Docker image of this algorithm. While we do that, I would like to explore whether we can do away with the multi-threading and instead deploy multiple containers. This gives us better scalability, since n is a variable that is sometimes very small and sometimes very large.
I could use Kubernetes to orchestrate these containers. However, I see two challenges:
1. How do I gather all the data points back into my core algorithm?
2. How do I know that all containers have finished processing, so that I can move on to step B?
Any pointers or help is appreciated.
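For challenge 2, one pattern I am considering is to fan A out as a Kubernetes Job with n completions, have each pod write its data points to shared storage, and let the core algorithm poll the Job status before starting B. A minimal sketch with the official Python client (job name, namespace, count, and the result location are placeholders):

    # Sketch: block until every A-container has finished, then gather results.
    # Assumes each pod wrote its output to a shared location.
    import time
    from kubernetes import client, config

    config.load_kube_config()              # or load_incluster_config() in-cluster
    batch = client.BatchV1Api()

    N = 50                                 # how many A-jobs were fanned out

    while True:
        job = batch.read_namespaced_job("step-a", "default")
        if (job.status.succeeded or 0) >= N:   # all pods completed successfully
            break
        if job.status.failed:                  # ignoring retry policy for brevity
            raise RuntimeError("some A-pods failed")
        time.sleep(5)

    # All containers are done; collect the per-pod outputs for step B,
    # e.g. read /shared/results/part-0 .. part-49 from a shared volume or bucket.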
I've recently encountered a problem: processing an 8-gigabyte pickle file with a Python script on VMs in Google Cloud Compute Engine takes too long, and I am searching for ways to decrease the processing time. One possible solution could be splitting the work in the script across processes, or mapping it across the CPUs of several VMs. If somebody knows how to do this, please share!
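Within a single VM, the standard-library route would be something like this minimal multiprocessing sketch, where process_record stands in for the real per-item work and the file names are placeholders:

    # Sketch: spread the per-item work across all CPUs of one VM.
    import pickle
    from multiprocessing import Pool

    def process_record(record):
        ...  # placeholder: the expensive per-item computation

    if __name__ == "__main__":
        with open("data.pkl", "rb") as f:
            records = pickle.load(f)           # the large dataset

        with Pool() as pool:                   # defaults to one worker per CPU
            results = pool.map(process_record, records, chunksize=1000)

        with open("results.pkl", "wb") as f:
            pickle.dump(results, f)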
You can use clusters for large-scale technical computing on Google Cloud Platform (GCP). Open-source software like ElastiCluster provides cluster management and supports provisioning nodes on Google Compute Engine (GCE).
After the cluster is operational, a workload manager handles task execution and node allocation. There are a variety of popular commercial and open-source workload managers, such as HTCondor from the University of Wisconsin, Slurm from SchedMD, Univa Grid Engine, and LSF Symphony from IBM.
This article is also helpful.
It looks like an HPC problem. Have a look at this link: https://cloud.google.com/solutions/architecture/highperformancecomputing.
There are a lot of possible solutions to your problem, but the right one depends on the details of your case. A first simple approach could be to logically split your task into small jobs, and then assign a subset of those jobs to each GCE instance in a group of dedicated instances.
You could create a group with a predefined number of instances. Each instance can rely on a startup script to reach out for the job it must execute; when the job finishes, the instance can be deleted and substituted by a new one (a Google Compute Engine managed instance group will create the replacement automatically). You only have to manage when the group should start and stop.
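A rough sketch of that lifecycle, run from the startup script (the job logic is a placeholder; the metadata-server lookup and the gcloud delete call are standard):

    # Sketch: fetch this instance's identity from the metadata server, do the
    # assigned job, then delete the instance so the managed instance group
    # replaces it with a fresh one.
    import subprocess
    import urllib.request

    def metadata(path):
        req = urllib.request.Request(
            f"http://metadata.google.internal/computeMetadata/v1/{path}",
            headers={"Metadata-Flavor": "Google"},
        )
        return urllib.request.urlopen(req).read().decode()

    name = metadata("instance/name")
    zone = metadata("instance/zone").rsplit("/", 1)[-1]

    # ... reach out for this instance's job and execute it (placeholder) ...

    # Delete this instance; the managed instance group creates a new one.
    subprocess.run(
        ["gcloud", "compute", "instances", "delete", name,
         "--zone", zone, "--quiet"],
        check=True,
    )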
Furthermore, you can consider preemptible instances, which are cheaper.
Hope this helps you.
Bye
I have a task (a couple of C# functions) that updates the DB at a set interval. How do I achieve this on Windows Azure (assuming that after deployment the DB would also move to SQL Azure)?
There are several options:
- use a 3rd party job scheduler to initiate the process remotely
- deploy a single "worker instance" that uses the Task Scheduler built into Server 2008 to schedule the processes (this will require startup tasks)
- deploy a timer process as part of another role; just make sure you put in a traffic-cop or singleton-style pattern to prevent multiple instances from simultaneously trying to execute the same process (see the sketch below)
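A common way to implement that traffic cop is a lease on a shared blob: only the instance that currently holds the lease runs the task. A rough sketch of the pattern, shown with the modern azure-storage-blob Python package purely for illustration (names are placeholders, and the lock blob is assumed to have been created once up front):

    # Sketch: blob-lease singleton. Whichever instance acquires the lease runs
    # the job this cycle; every other instance skips it.
    from azure.core.exceptions import HttpResponseError
    from azure.storage.blob import BlobClient

    def run_db_update():
        ...  # placeholder: the actual periodic DB work

    blob = BlobClient.from_connection_string(
        "<connection-string>", container_name="locks", blob_name="db-updater"
    )

    try:
        lease = blob.acquire_lease(lease_duration=60)  # seconds (15-60, or infinite)
    except HttpResponseError:
        lease = None                                   # another instance holds it

    if lease:
        try:
            run_db_update()
        finally:
            lease.release()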
You can develop and deploy a Windows Azure Compute Worker Role. This would be the right tool for long running and background operations hosted in Azure. Depending on what your task is doing (how CPU intensive it is) you could choose a very small role size to minimize cost.
You could probably also put such a task in a preexisting web or worker role (but that might not be a clean solution depending on what your task is doing and how reliably it should run).