Deploying a containerised Node.js application through Mesos-Marathon - node.js

I am using Marathon to deploy my Docker-containerised Node.js application. My Marathon app spec is as follows:
{
  "id": "<some-name>",
  "cmd": null,
  "cpus": 1,
  "mem": 2800,
  "disk": 30720,
  "instances": 1,
  "container": {
    "docker": {
      "image": "<some-docker-registry-IP>:5000/<repo>",
      "network": "BRIDGE",
      "privileged": true,
      "forcePullImage": true,
      "parameters": [
        {
          "key": "net",
          "value": "host"
        }
      ],
      "portMappings": [
        {
          "containerPort": <some-port>,
          "hostPort": <some-port>,
          "protocol": "tcp",
          "name": null
        }
      ]
    },
    "type": "DOCKER"
  }
}
The problem, however, is that once the application runs out of memory, the server it is deployed on ends up being restarted. I need my services to listen on the private IP of the host machine, which is why I am using --net=host.
Is it possible to just kill the task and free up the memory so that Marathon can re-spawn it, without restarting/shutting down the server? Or is there any other way to make the Docker container routable to the outside world without using --net=host?

Basically, I think there is a problem with your Node application if it shows memory-leaking behaviour. That's the first point I'd address.
Second, you should use something like pm2 in your application's Docker image, which will take care of restarting your application (inside the container itself) when it encounters a problem.
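For example, a minimal sketch of a pm2 process file baked into the image (the file name process.json, the app name and the memory limit are placeholders, not taken from your setup):
{
  "apps": [
    {
      "name": "my-node-app",
      "script": "server.js",
      "instances": 1,
      "autorestart": true,
      "max_memory_restart": "2500M"
    }
  ]
}
With pm2 installed in the image, the container can then be started with pm2-runtime process.json, so a crash or an out-of-memory restart happens inside the container instead of taking the whole task down.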
Furthermore, you could implement a health endpoint and a Marathon health check, so that Marathon can recognize when the application actually has problems.
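A rough sketch, assuming your application exposes an HTTP endpoint such as /health (the path and the timing values here are illustrative, not from your spec), would be to add this to the app definition:
"healthChecks": [
  {
    "protocol": "HTTP",
    "path": "/health",
    "portIndex": 0,
    "gracePeriodSeconds": 300,
    "intervalSeconds": 30,
    "timeoutSeconds": 20,
    "maxConsecutiveFailures": 3
  }
]
Marathon will then kill and re-spawn only the unhealthy task instead of leaving it in a broken state.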
To achieve some redundancy, I'd strongly advise that you run at least two instances of the application and use Mesos DNS together with a load balancer like marathon-lb on the public slave node(s), which will take care of the routing. This also allows you to use bridged networking if you want to.
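As a sketch of that setup: keep the bridged portMappings from your spec (optionally with "hostPort": 0 so Mesos picks a free port, and a fixed "servicePort"), drop the net=host parameter, and add an app-level label so marathon-lb picks the service up (the group name "external" is the usual default, but it is configurable):
"labels": {
  "HAPROXY_GROUP": "external"
}
marathon-lb on the public node then exposes the app on its servicePort, which gives you a stable, routable address without --net=host.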

Related

Creating a nested Python dictionary

I'm working on converting the nested Python dictionary of whatportistoll to a new format with a new organisation of keys, but I'm encountering the following issue:
- key skipping problem
Here is the format of the existing nested dictionary:
{
  "_default": {
    "0": {
      "name": "Service name",
      "port": "Port Number",
      "protocol": "Transport Protocol",
      "description": "Description"
    },
    "1": {
      "name": "",
      "port": "0",
      "protocol": "tcp,udp",
      "description": "Port 0 is reserved by IANA, it is technically invalid to use, but possible. It is sometimes used to fingerprint machines, because different operating systems respond to this port in different ways. Some ISPs may block it because of exploits. Port 0 can be used by applications when calling the bind() command to request the next available dynamically allocated source port number."
    },
    ...
  }
Here is the targeted format:
{
  "0": {
    "tcp": {
      "name": "",
      "port": "0",
      "protocol": "tcp,udp",
      "description": "Port 0 is reserved by IANA, it is technically invalid to use, but possible. It is sometimes used to fingerprint machines, because different operating systems respond to this port in different ways. Some ISPs may block it because of exploits. Port 0 can be used by applications when calling the bind() command to request the next available dynamically allocated source port number."
    },
    "udp": {
      "name": "",
      "port": "0",
      "protocol": "tcp,udp",
      "description": "Port 0 is reserved by IANA, it is technically invalid to use, but possible. It is sometimes used to fingerprint machines, because different operating systems respond to this port in different ways. Some ISPs may block it because of exploits. Port 0 can be used by applications when calling the bind() command to request the next available dynamically allocated source port number."
    }
  },
  "1": {
    "tcp": {
      "name": "tcpmux",
      "port": "1",
      "protocol": "tcp",
      "description": "Scans against this port are commonly used to test if a machine runs SGI Irix (as SGI is the only system that typically has this enabled). This service is almost never used in practice.RFC1078 - TCPMUX acts much like Sun's portmapper, or Microsoft's end-point mapper in that it allows services to run on arbitrary ports. In the case of TCPMUX, however, after the \"lookup\" phase, all further communication continues to run over that port.builtins.c in Xinetd before 2.3.15 does not check the service type when the tcpmux-server service is enabled, which exposes all enabled services and allows remote attackers to bypass intended access restrictions via a request to tcpmux port 1 (TCP/UDP). References: [CVE-2012-0862] [BID-53720] [OSVDB-81774]Trojans that use this port: Breach.2001, SocketsDeTroieAlso see: CERT: CA-95.15.SGI.lp.vul"
    },
    "udp": {}
  },
  ...
}
Could you please help me resolve this issue?
Your help is much appreciated.
Best Regards,
Jihane

Node.js - cron jobs in cluster mode or behind a load balancer executing multiple times

I have a NestJS server running in cluster mode on an EC2 instance using pm2.
I have successfully set up cron jobs that execute only once in cluster mode, by starting the server with multiple named app configurations and using the name of the pm2 process.
{
  "apps": [
    {
      "script": "dist/main.js",
      "instances": "1",
      "exec_mode": "cluster",
      "name": "queue"
    },
    {
      "script": "dist/main.js",
      "instances": "1",
      "exec_mode": "cluster",
      "name": "coco"
    }
  ]
}
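The guard inside the NestJS service is conceptually something like the following sketch (RUN_CRON is a stand-in for the actual check on the pm2 process name, and would be set only in the "queue" app's env block; the job body is a placeholder):
import { Injectable } from '@nestjs/common';
import { Cron, CronExpression } from '@nestjs/schedule';

@Injectable()
export class JobsService {
  // Runs every minute, but only the pm2 process that has RUN_CRON set
  // actually executes the job, so it fires once even with several processes.
  @Cron(CronExpression.EVERY_MINUTE)
  async handleQueue(): Promise<void> {
    if (process.env.RUN_CRON !== 'true') {
      return; // the other pm2 processes skip the job
    }
    // ...fetch and process the data from the remote database here
  }
}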
But I want to know how to handle this case when multiple instances are behind a load balancer, as the jobs are scheduled on each instance and executed multiple times with the same data from the remote database.
Any help would be appreciated.

Refresh IP address for Azure VM via REST API

I am trying to use the REST API to change the IP of my Ubuntu virtual machine on Azure.
In the web interface, stopping and starting the VM usually causes the public IP to change. However, just stopping and starting the VM with curl requests to the API does not trigger an IP change.
I can request the current status of the IP configuration using a GET request (see the docs here), but I cannot find any function to refresh it. I also tried setting the IP to static and back to dynamic before turning the VM back on, that also did not work.
I found this similar question here, but when I tried that approach, I got the following error message:
{ "error": {
"code": "IpConfigDeleteNotSupported",
"message": "IP Configuration ipconfig1 cannot be deleted. Deletion and renaming of primary IP Configuration is not supported",
"details": [] }
I have also created a secondary IP configuration. The first one is called ipconfig1, the second I named "alternative". This seems to be a second network interface. I have associated a second IP address with that second network interface. But I am still getting the same error.
My final request looks like this:
curl -X PUT -H "Authorization: Bearer MYTOKEN" -H "Content-Type: application/json" -d '{ "name": "NETWORKINTERFACE542", "id": "GROUP", "location": "westeurope", "properties": { "provisioningState": "Succeeded", "ipConfigurations": [ { "name": "alternative", "properties": { "privateIPAllocationMethod": "Dynamic", "subnet": { "id": "/subscriptions/xx-xx-xx-xx/resourceGroups/GROUP/providers/Microsoft.Network/virtualNetworks/GROUP-vnet/subnets/default" }, "primary": true, "privateIPAddressVersion": "IPv4" } } ], "dnsSettings": { "dnsServers": [], "appliedDnsServers": [] }, "enableAcceleratedNetworking": true, "enableIPForwarding": false }, "type": "Microsoft.Network/networkInterfaces" }' https://management.azure.com/subscriptions/xx-xx-xx-xx/resourceGroups/GROUP/providers/Microsoft.Network/networkInterfaces/NETWORKINTERFACE542?api-version=2020-07-01
(Where the CAPS terms are stand-ins for my actual variable names)
I am still getting the same error, even though I am not even referencing ipconfig1 in my request.
Is there any way to achieve an IP reset?
As you mentioned: "In the web interface, stopping and starting the VM usually causes the public IP to change."
Generally, the stop operation in the web UI actually performs a deallocate operation, so you need to use the REST API Deallocate and Start operations to trigger the public IP address change.
Virtual Machines - Deallocate
POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Compute/virtualMachines/{vmName}/deallocate?api-version=2020-12-01
Virtual Machines - Start
POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Compute/virtualMachines/{vmName}/start?api-version=2020-12-01
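For example, with curl (MYTOKEN, GROUP and MYVM are placeholders, in the same style as your request above):
# Deallocate the VM so the dynamic public IP is released
curl -X POST -d '' -H "Authorization: Bearer MYTOKEN" "https://management.azure.com/subscriptions/xx-xx-xx-xx/resourceGroups/GROUP/providers/Microsoft.Compute/virtualMachines/MYVM/deallocate?api-version=2020-12-01"
# Start it again; a VM whose public IP allocation method is Dynamic gets a new address
curl -X POST -d '' -H "Authorization: Bearer MYTOKEN" "https://management.azure.com/subscriptions/xx-xx-xx-xx/resourceGroups/GROUP/providers/Microsoft.Compute/virtualMachines/MYVM/start?api-version=2020-12-01"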

DC/OS 1.9 VIP load balancing not working for advertised ports

When I publish a service with a VIP, the advertised address does not route properly to the advertised port. For example, for a MariaDB Galera 3-node cluster service with a VIP specified as:
"labels": {
"VIP_0": "/mariadb-galera:3306"
}
On the configuration tab of the service page (and according to the docs), the load balanced address is:
mariadb-galera.marathon.l4lb.thisdcos.directory:3306
I can ping the DNS name just fine, but...
When I try to connect a front-end service (Drupal7, wordpress) to consume this load balanced address:port combination, there will be numerous connection failures and timeouts. It isn't that it never works but that it works quite sporadically, if at all. Drupal7 dies almost immediately and starts kicking up Bad Gateway errors.
What I have found through experimentation is that if I specify a hostPort for the service in question, the load balanced address will work as long as I use the hostPort value, and not the advertised load balanced service port as above. In this specific case I specified a hostPort of 3310.
"network":"USER",
"portMappings": [
{
"containerPort": 3306,
"hostPort": 3310,
"servicePort": 10000,
"name": "mariadb-galera",
"labels": {
"VIP_0": "/mariadb-galera:3306"
}
}
Then if I use the load balanced address (mariadb-galera.marathon.l4lb.thisdcos.directory) with the host port value (3310) in my Drupal7 settings.php, the front end connects and works fine.
I've noticed similar behaviour with custom applications connecting to mongodb backends also in a DC/OS environment... it seems the load balanced address/port combination specified never works reliably... but if you substitute the hostPort value, it does.
The docs clearly state that:
address and port is load balanced as a pair rather than individually.
(from https://docs.mesosphere.com/1.9/networking/dns-overview/)
Yet I am unable to connect reliably when I specify the VIP-designated port, while it DOES work when I use the hostPort (and will not work at all unless I designate a specific hostPort in the service definition JSON). Whether or not this approach is actually load balanced remains a question to me, based on the wording in the documentation.
I must be doing something wrong, but I am at a loss... any help is appreciated.
My cluster nodes are VMWare virtual machines.
The VIP label shouldn't start with a slash:
"container": {
"portMappings": [
{
"containerPort": 3306,
"name": "mariadb-galera",
"labels": {
"VIP_0": "mariadb-galera:3306"
}
}
}
The service should then be available as <VIP label>.marathon.l4lb.thisdcos.directory:<VIP port>, in this case:
mariadb-galera.marathon.l4lb.thisdcos.directory:3306
you can test it using nc:
nc -z -w5 mariadb-galera.marathon.l4lb.thisdcos.directory 3306; echo $?
The command should return 0.
When you're not sure about the exported DNS names, you can list all of them from any DC/OS node:
curl -s http://198.51.100.1:63053/v1/records | grep mariadb-galera

CouchDB replications not starting when new replication added

When I add a replication to CouchDB, it doesn't start, i.e. I get the following doc after saving:
{
  "_id": "xxx",
  "_rev": "yyy",
  "target": "https://user:pswd.domain/db",
  "source": "db",
  "create_target": true,
  "continuous": true,
  "user_ctx": {
    "name": "admin",
    "roles": [
      "_admin"
    ]
  },
  "owner": "admin"
}
Usually after creating a replication, the replication is triggered and the doc is updated to include:
"_replication_state": "triggered" or "error",
"_replication_state_time": "some time",
"_replication_id": "some ID"
I am using CouchDB 1.6.0 on Ubuntu 16.04. What could cause this to happen? Replication was working fine until about an hour ago when 80 of 140 or so replications failed at once.
There are 60 replications that are seen as 'triggered' in Couch, but the _active_tasks endpoint only shows 46.
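(For reference, I am counting the active replication tasks with a plain query against that endpoint, roughly like this; the default port and admin credentials are assumed:)
# count replication tasks reported by _active_tasks
curl -s http://admin:password@localhost:5984/_active_tasks | grep -o '"type":"replication"' | wc -l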
As it turned out, our server was experiencing high traffic from super dodgy origins. This was causing replications to time out and not allowing them to restart. The Nginx access logs seemed to show repeated attempts at fishing for insecure PHP settings and open MySQL databases.
