How to raise or lower the log level in the Puppet master?

I am using Puppet 3.2.3, Passenger, and Apache on CentOS 6. I have 680 compute nodes in a cluster, along with 8 gateways that users use to log in to the cluster and submit jobs. All the nodes and gateways are under Puppet control. I recently upgraded from 2.6. The master logs to syslog as desired, but how to change the log level for the master escapes me. I appear to have the choice of --debug or nothing: debug logs far too much detail, while not using that switch simply logs each time Passenger/Apache launches a new worker to handle incoming connections.
I find nothing in the online docs about doing this. What I want is to log each time a node hits the server, but I do not need to see the compiled catalogue or resources in /var/log/messages.
How is this accomplished?

This is a hack, but here is how I solved the problem. In the file (config.ru) that Passenger uses to launch Puppet via Rack middleware, which on my system lives in /usr/share/puppet/rack/puppetmasterd, I noticed these lines:
require 'puppet/util/command_line'
run Puppet::Util::CommandLine.new.execute
So I edited this to become:
require 'puppet/util/command_line'
Puppet::Util::Log.level = :info
run Puppet::Util::CommandLine.new.execute
I suppose other choices for Log.level, such as :warning or :notice, would work as well.

Related

elasticsearch applying security on a running cluster

I have an ELK stack 7.6.2 with Logstash, an Elasticsearch cluster with 3 nodes, and Kibana. I would like to add security, but the only docs I can find always start 'from scratch'. I would like to have an example on an already running cluster in order not to mess it up. Thanks for your help.
Guillaume
You cannot enable security features on an already running cluster. Security settings are classified as static, meaning that they cannot be dynamically updated on the fly:
static:
These settings must be set at the node level, either in the elasticsearch.yml file, or as an environment variable or on the command line when starting a node. They must be set on every relevant node in the cluster.
dynamic:
These settings can be dynamically updated on a live cluster with the cluster-update-settings API.
See https://www.elastic.co/guide/en/elasticsearch/reference/7.6/modules.html for reference and for all settings that can be dynamically updated (you won't find security settings there).
Also, from this guide (https://www.elastic.co/guide/en/elasticsearch/reference/current/get-started-enable-security.html) one can tell that you need to stop your running elasticsearch and kibana instances in order to enable security.
I hope I could help you.

Submit & monitor spark jobs via java in cluster mode

I have a Java class which manages jobs and executes them via Spark (using 1.6).
I am using the API sparkLauncher.startApplication(SparkAppHandle.Listener... listeners) in order to monitor the state of the job.
The problem is that I moved to a real cluster environment, and this approach can't work when the master and workers are not on the same machine, as the internal implementation makes use of localhost only (loopback) to open a port for the workers to bind to.
The API sparkLauncher.launch() works but doesn't let me monitor the status.
What is the best practice for a cluster environment using Java code?
I also saw the option of the hidden REST API; is it mature enough? Should I enable it in Spark somehow (I am getting access denied, even though the port is open from outside)?
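For reference, here is a minimal sketch of the kind of launcher code in question; the jar path, main class, and master URL are placeholders for illustration, not the real values from my setup:

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LaunchAndMonitor {
    public static void main(String[] args) throws Exception {
        // Listener invoked by the handle whenever the application's state or info changes.
        SparkAppHandle.Listener listener = new SparkAppHandle.Listener() {
            @Override
            public void stateChanged(SparkAppHandle handle) {
                System.out.println("State changed to: " + handle.getState());
            }

            @Override
            public void infoChanged(SparkAppHandle handle) {
                System.out.println("App id: " + handle.getAppId());
            }
        };

        // All of these values are placeholders.
        SparkAppHandle handle = new SparkLauncher()
                .setAppResource("/path/to/my-job.jar")      // placeholder jar
                .setMainClass("com.example.MyJob")          // placeholder main class
                .setMaster("spark://master-host:7077")      // placeholder master URL (or "yarn")
                .setDeployMode("cluster")
                .startApplication(listener);

        // Wait until the job reaches a terminal state (FINISHED, FAILED, KILLED, ...).
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
    }
}

This works when everything runs on one machine; the problem described above is that the state updates stop arriving once the driver runs elsewhere in the cluster.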
REST API
In addition to viewing the metrics in the UI, they are also available as JSON. This gives developers an easy way to create new visualizations and monitoring tools for Spark. The JSON is available both for running applications and in the history server. The endpoints are mounted at /api/v1. E.g., for the history server they would typically be accessible at http://<server-url>:18080/api/v1, and for a running application at http://localhost:4040/api/v1.
You can find more details in the Spark monitoring documentation.
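As a rough illustration, these JSON endpoints can be read from plain Java with no extra dependencies; the host name below is a placeholder for wherever your history server (or running driver) is reachable:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class ListApplications {
    public static void main(String[] args) throws Exception {
        // Placeholder host: point at the history server (port 18080) or a running driver (port 4040).
        URL url = new URL("http://history-host:18080/api/v1/applications");
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // raw JSON list of applications
            }
        }
    }
}

Pointing the URL at port 4040 on the driver host gives the same kind of listing for a currently running application.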
Every SparkContext launches a web UI, by default on port 4040, that displays useful information about the application. This includes:
A list of scheduler stages and tasks
A summary of RDD sizes and memory usage
Environmental information.
Information about the running executors
You can access this interface by simply opening http://driver-node:4040 in a web browser. If multiple SparkContexts are running on the same host, they will bind to successive ports beginning with 4040 (4041, 4042, etc).
Again, you can find more details in the Spark monitoring documentation.

CoreOS alternative to /usr/lib/systemd/system-shutdown/

I recently stumbled across the fact that on shutdown/reboot, any script in /usr/lib/systemd/system-shutdown/ gets executed immediately before the actual halt/reboot.
Paraphrasing - https://www.freedesktop.org/software/systemd/man/systemd-halt.service.html
With the /usr filesystem being read-only on CoreOS, I cannot put any of my shutdown scripts in /usr/lib/systemd/system-shutdown. I'm hoping someone more knowledgeable about CoreOS and systemd knows an alternative directory path on CoreOS nodes that would give me the same results, or a configuration I can adjust to point the directory to /etc/systemd/system-shutdown or something else.
Optionally, any pointers on creating a custom service that does the same thing as systemd-shutdown would be appreciated.
My use case is that I have a few scripts that I want to execute when a node shuts down, for example: remove the node from the monitoring system, unschedule the node in Kubernetes, and drain any running pods while allowing in-flight transactions to finish.

Setting up Puppet in the first place

I am trying to understand the best practice for setting up Puppet in the first place. Let's say I have 1000 existing servers that need to be managed by Puppet.
Do I manually install the Puppet agent on each, or is there a better way?
Sorry if this question is too generic; I just want to get some idea.
1000 servers could be a lot for a single master instance. Of course, it will depend on the master's specs and other factors related to the Puppet runs.
There are a few questions you need to answer first to determine how you are going to go about it, such as:
Puppet Enterprise or Open Source?
What is the current configuration nightmare you are trying to solve?
What is the current configuration data related to the challenge or problem you have?
What are the current business roles (e.g. web server, load balancer, database, etc.) related to the problem you have? What makes a role in terms of configurations?
I would suggest that you start small first to learn more about the Puppet DSL and its ecosystem (master, agent, PuppetDB, console/dashboard). I also recommend you start with the free 10-node Puppet Enterprise, as it will let you focus more on the problem at hand rather than on how to configure the Puppet masters and agents, how to scale them, etc.
One more thing: install the Puppet agent everywhere if you can, in noop/disabled mode, to at least get facts, and run it in a masterless fashion using puppet apply when you need to. I find noop mode more useful as it tells you what needs to be changed; you can also enforce changes using --no-noop.
Hope that will get you started.
To answer your question: yes, the Puppet agent would need to be installed on every node. If you are managing 1000 nodes, I would assume you have your own OS image. In this case, it's best to add the agent to the OS image and use that image on all 1000 nodes.

Accessing Matlab MDCS Cluster over SSH

I just installed Matlab's Distributed Computing Server on a bunch of machines and it works, but only for those physically connected to the cluster's network. For remote access, those machines are two SSH hops away. How is this problem usually solved? I thought of setting up a VPN, but to me this seems like a last resort.
What I want is for everybody in the lab, using their own version of Matlab with the correct toolbox, to run their code on the cluster somewhat effortlessly. I guess I could ask everybody to just tarball their files and access a remote installation of Matlab, somehow forwarding the GUI session (VNC or X forwarding), but that seems ugly.
Any help?
It is possible to set up "remote access" to a cluster running MDCS so that clients without direct access can submit jobs there. The documentation for this starts here:
http://www.mathworks.com/help/mdce/configure-parallel-computing-products-for-a-generic-scheduler.html
I'm not quite sure how to configure things so that the submission can work across two SSH connections - the example integration scripts shipping with MDCS all presume only one. However, it should be possible providing that:
The client can put the job and task files somewhere the execution nodes can see them
The client can trigger the appropriate qsub or whatever on the cluster headnode
You might also consider simply contacting MathWorks installation support.
