I'm trying to set up a Jenkins system where a certain program has to be run on a board on the network, accessed using telnet. We're talking about hundreds of such jobs here, therefore we will be setting up multiple boards. Therefore, each job has to be allocated a board, but the catch is that only one job can have a certain board at the same time, otherwise the program fails.
The solution I have right now is using a master-slave set-up where I connect to the same machine using SSH (so a master and multiple slaves on the same machine). Each of the slave nodes then has a label for the IP address the program has to telnet to. This works, scheduling wise, but it might cause issues because all nodes connect using SSH to the same machine. Connecting to the boards using SSH is not an option.
Is there any way to get the same functionality as above, but then without using SSH to connect to the same machine? So basically I want to be able to say: we have n available machines, when a job comes in give it one of those machines and pass it a label belonging to that machine (its IP address in this case); now there are n-1 machines left.
Mutual exclusion comes close, but does not allow the above functionality, and jobs waiting for a resource take up one of the executors of a node.
Thanks a lot!
I realize your problem is probably solved already years ago, but in case someone else is looking for the answer and runs into this.
You can use "Lockable resources" plugin and set the ip address as the name of the resource and use label such use test-board-ip.It is simple and easy to use.
Another possibility is to use "External resources dispatcher" plugin. It provides a bit more possibilities, but it has a bug that causes it to hang sometimes. And it seems there is no maintenance any more (last updates from 2013).
Maybe you should hava a look at the Lock and Latches Plugin. You are able to lock a resource with this plugin with only requireing the job to lock the board you want to.
https://wiki.jenkins-ci.org/display/JENKINS/Locks+and+Latches+plugin
Related
Let's say I have a cluster of 3 nodes for ScyllaDB in my local network (it can be AWS VPC).
I have my Java application running in the same local network.
I am concerned how to properly connect app to DB.
Do I need to specify all 3 IP addresses of DB nodes for the app?
What if over time one or several nodes die and get resurrected on other IPs? Do I have to manually reconfigure application?
How is it done properly in big real production cases with tens of DB servers, possibly in different data centers?
I would be much grateful for a code sample of how to connect Java app to multi-node cluster.
You need to specify contact points (you can use DNS names instead of IPs) - several nodes (usually 2-3), and driver will connect to one of them, and will discover the all nodes of the cluster after connection (see the driver's documentation). After connection is established, driver keeps the separate control connection opened, and via it receives the information about nodes that are going up & down, joining or leaving the cluster, etc., so it's able to keep information about cluster topology up-to-date.
If you're specifying DNS names instead of the IP addresses, then it's better to specify configuration parameter datastax-java-driver.advanced.resolve-contact-points as true (see docs), so the names will be resolved to IPs on every reconnect, instead of resolving at the start of application.
Alex Ott's answer is correct, but I wanted to add a bit more background so that it doesn't look arbitrary.
The selection of the 2 or 3 nodes to connect to is described at
https://docs.scylladb.com/kb/seed-nodes/
However, going forward, Scylla is looking to move away from differentiating between Seed and non-Seed nodes. So, in future releases, the answer will likely be different. Details on these developments at:
https://www.scylladb.com/2020/09/22/seedless-nosql-getting-rid-of-seed-nodes-in-scylla/
Answering the specific questions:
Do I need to specify all 3 IP addresses of DB nodes for the app?
No. Your app just needs one to work. But it might not be a bad idea to have a few, just in case one is down.
What if over time one or several nodes die and get resurrected on other IPs?
As long as your app doesn't stop, it maintains its own version of gossip. So it will see the new nodes being added and connect to them as it needs to.
Do I have to manually reconfigure application?
If you're specifying IP addresses, yes.
How is it done properly in big real production cases with tens of DB servers, possibly in different data centers?
By abstracting the need for a specific IP, using something like Consul. If you wanted to, you could easily build a simple restful service to expose an inventory list or even the results of nodetool status.
So, I have a collection of Windows Server 2016 virtual machines that are used to run some tests in pairs. To perform these tests, I copy a selection of scripts and files from the network on to the machine, before performing the tests.
I'm basically using a selection of scripts that have existed around here since before my time and whilst i would like to use other methods, so much of our infrastructure relies on these scripts that overhauling the system would be a colossal task.
First up, i sort out the mapped drives with
net use X: \\network\location1 /user:domain\user password
net use Y: \\network\location2 /user:domain\user password
and so on
Soon after, i use rsync to copy files from a location in /cygdrive/y/somewhere to /cygdrive/c/somewhere_else
During the rsync, i will get errors that "files have vanished" (I'm currently unable to post the exact error, I will edit this later to include this). When i check what's currently in the /cygdrive directory, all i see is /cygdrive/c and everything else has disappeared.
I've tried making a symbolic link to /cygdrive/y in a different location, I've tried including persistent:yes on the net use command, I've changed the power settings on the network card to not sleep. None of these work.
I'm currently looking into the settings for the virtual machines themselves at this point, but I have some doubts as we have other virtual windows machines that do not seem to have this issue.
Has anyone has heard of anything similar and/or knows of a decent method to troubleshoot this?
Right, so I've been working on this all day and finally noticed a positive change, but since my systems are in VMware's vCloud, this may not work for some people. It's was simply a matter of having the VM turned off and upgrading the Virtual Hardware Version to the latest version. I have noticed with this though, that upon a restart, one of the first messages that comes up mentions that the computer is "disabling group policies".
I did a bit of research into this and found out that Windows 8 and 10 (no mention of any Windows Server machines) both automatically update Group Policies in the background, disconnecting and reconnecting mapped drives to recreate them.
It's possible that changing the Group Policy drive from "recreate" to "update" should fix this issue, and that the Virtual Hardware update happened to resolve this in a similar manner.
I just installed Matlab's Distributed Computing Server on a bunch of machines and it works, but only for those physically connected to the cluster's network. For remote access those machines are 2 SSH hops away. How this problem is usually solved? I thought in setting up a VPN, but to me this seems like last resort.
What I want is that everybody in the lab, using their own versions of Matlab, with the correct Toolbox, just run their code in the cluster somewhat effortlessly. I guess I could ask to everybody just tar-ball their files and access a remote installation of matlab, somehow forwarding the GUI session (VNC or X-Forward), but that seem ugly.
Any help?
It is possible to set up "remote access" to a cluster running MDCS so that clients without direct access can submit jobs there. The documentation for this starts here:
http://www.mathworks.com/help/mdce/configure-parallel-computing-products-for-a-generic-scheduler.html
I'm not quite sure how to configure things so that the submission can work across two SSH connections - the example integration scripts shipping with MDCS all presume only one. However, it should be possible providing that:
The client can put the job and task files somewhere the execution nodes can see them
The client can trigger the appropriate qsub or whatever on the cluster headnode
You might also consider simply contacting MathWorks installation support.
I'm researching configuration management software like Puppet. My primary concern is preventing a single entry point to all of our internal servers. Take this scenario for example.
Somehow, access is gained into the master configuration server. From there a user would then be able to gain relatively easy access to manipulate or ultimately gain access to other servers controlled by the master.
The primary goal is to prevent a single point entry into the network, even if said master configuration is not available to the public internet.
tl;dr How can I prevent single point access to all other servers in a master/agent configuration management setup?
If you are thinking about delegating the task of defining the Puppet rules to other people (eg. technician), you can have create a Puppet master (Master A), have a test machine connected to Master A, then make them commit the code to Git or SVN.
You control a second Puppet master (Master B), which you pull the code from Git or SVN. All your machine in network connect to Master B. Once you are happy with the code, you can ask Puppet to push it to all your machine.
This way, access to all machine configuration only on Master B, which only you handle and access.
You can't with the default configuration.
The way Puppet is designed is to have agents to contact the master. Even if you are behind multiple firewalls you need to allow the agents to enter into your internal network. Even if you routinelly allow connections from the DMZ to the internal network, you may still need to manage machines in the open internet. What Puppet requires is to open your internal network to the open internet.
The risk of this client-pull design is that if you can hack into a machine with an agent you can contact the master, and if the master have any vulnerability you can hack into it and from them you can control all machines with agents, plus you can mount an attack into your internal network. So, if a vulnerability is exploited in the Puppet master communication channel with the agents, then Puppet becomes an attack vector (a huge one, as you maybe managing all your infrastructure with it, and you have allowed access from outside to your LAN).
With a master-push design this could be minimized, as the master would be one single point to protect and would be inside the safe internal network, with connections only going from inside to outside.
There is a pending feature request (4 years old!) in PuppetLabs (http://projects.puppetlabs.com/issues/2045) titled Push functionality in puppetmaster to clients. Reading the comments on that feature request and finding things like the following comment makes me wonder if the Puppet developers really understant what is the problem:
Ultimately, it isn’t all that high a priority, either – almost every risk that opening the port to the master exposes is also exposed by having the master reach out and contact the client. There is little or no change in actual risk to the model proposed.
However, while the developers realize the problem, others are designing their own solutions (like https://github.com/tomas-edwardsson/puppet-push).
Update:
I've found a presentation by Bernd Strößenreuther titled Best practices on how to turn Your environment into a Puppet managed environment available as PDF at http://stroessenreuther.info/pub/Puppet_getting_started.pdf
He suggest to establish ssh connections from the master to the agents and open a reverse tunnel so that the agents can connect to the master. These connections could be started in a cron job periodically. In this way you don't have to open your internal network for incoming connections yet the agents have access to the master data.
Now, regarding the pull mechanism, it may seem like a bad design but actually it is essential to allow very automated environments to work. For example, in an elastic network (like EC2 with autoscaling) where servers are started and halted automatically, the servers need to be able to configure themselves right away, so they boot up and the first thing they do is contact the master for an updated configuration. That would be harder if you have to push the configuration to each server periodically, because they would need to wait for the master (seconds, minutes, or hours; that is unacceptable in some applications).
I'm currently managing a cluster of PHP-FPM servers, all of which tend to get out of sync with each other. The application that I'm using on top of the app servers (Magento) allows for admins to modify various files on the system, but now that the site is in a clustered set up modifying a file only modifies it on a single instance (on one of the app servers) of the various machines in the cluster.
Is there an open-source application for Linux that may allow me to keep all of these servers in sync? I have no problem with creating a small VM instance that can listen for changes from machines to sync. In theory, the perfect application would have small clients that run on each machine to be synced, which would talk to the master server which would then decide how/what to sync from each machine.
I have already examined the possibilities of running a centralized file server, but unfortunately my app servers are spread out between EC2 and physical machines, which makes this unfeasible. As there are multiple app servers (some of which are dynamically created depending on the load of the site), simply setting up a rsync cron job is not efficient as the cron job would have to be modified on each machine to send files to every other machine in the cluster, and that would just be a whole bunch of unnecessary data transfers/ssh connections.
I'm dealing with setting up a similar solution. I'm half way there. I would recommend you use lsyncd, which basically monitors the disk for changes and then immediately (or whatever interval you want) automatically syncs files to a list of servers using rsync.
The only issue I'm having is keeping the server lists up to date, since I can spin up additional servers at any time, I would need to have each machine in the cluster notified whenever a machine is added or removed from the cluster.
I think lsyncd is a great solution that you should look into. The issue I'm having may turn out to be a problem for you as well, and that remains to be solved.
Instead of keeping tens or hundreds of servers cross-synchronized it would be much more efficient, reliable, and most of all simple maintaining just one "admin node" and replicating changes from that to all your "worker nodes".
For instance at our company we use a Development server -> Staging server -> Live backends workflow where all the changes are transferred across servers using a custom php+rsync front end. That allows the developers to push updates to a Staging server in the live environment, test out changes, and roll them to Live backends incrementally.
A similar approach could very well work in your case as well. Obviously it's not a plug-and-play solution, but I see it as the easiest way to go - both in terms of maintainability and scalability.