I've installed Graphite+Carbon to collect metrics from several hosts. These hosts send Apache Spark and Java metrics. I can't distinguish metrics from different hosts on the Graphite side. What is the right approach? I want to group metrics by host.
The "master" is located on one remote host, the "workers" are located on three other remote hosts, and I can't distinguish the incoming numbers. I don't understand the right way to add a host identifier to a metric.
Graphite has the notion of namespaces, e.g. host.app.metric.dimension. If you don't send your metrics in this way, you have no way of distinguishing them from one another.
Depending on your library, there should be a way to prefix the metrics you send with some kind of identifier. I recommend using a unique internal identifier (such as the hostname) and building the namespace from there.
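As an illustration of what such a prefix looks like on the wire, here is a minimal Java sketch that sends one metric to Carbon's plaintext listener (port 2003 by default) with the hostname baked into the metric path. The Carbon hostname and the metric name are placeholders, not anything from your setup:

```java
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class CarbonSender {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port; 2003 is Carbon's default plaintext listener.
        String carbonHost = "graphite.example.com";
        int carbonPort = 2003;

        // Prefix every metric with the sending host so Graphite namespaces them,
        // e.g. "hosts.worker-01.spark.executor.jvm.heap.used".
        String hostname = InetAddress.getLocalHost().getHostName().replace('.', '_');
        String metricPath = "hosts." + hostname + ".spark.executor.jvm.heap.used";
        long value = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
        long timestamp = System.currentTimeMillis() / 1000L;

        try (Socket socket = new Socket(carbonHost, carbonPort);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            // Carbon's plaintext protocol: "<metric.path> <value> <unix_timestamp>\n"
            out.write(metricPath + " " + value + " " + timestamp + "\n");
            out.flush();
        }
    }
}
```

With this layout you can then group by host in Graphite with wildcards such as hosts.*.spark.executor.jvm.heap.used.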
I set up two identical servers with the same configuration. I linked the same templates to both hosts on the Zabbix server, but the hosts have different counts of items, triggers, and graphs. Why?
It depends on the number of services, filesystems, processes, etc. discovered by the discovery rules in the templates, so the number of items and triggers will not be the same. Here is the documentation:
https://www.zabbix.com/documentation/devel/en/manual/web_interface/frontend_sections/configuration/templates
https://www.zabbix.com/documentation/devel/en/manual/web_interface/frontend_sections/configuration/templates/items
https://www.zabbix.com/documentation/devel/en/manual/web_interface/frontend_sections/configuration/templates/triggers
https://www.zabbix.com/documentation/devel/en/manual/web_interface/frontend_sections/configuration/templates/discovery
Let's say I have a cluster of 3 nodes for ScyllaDB in my local network (it could be an AWS VPC).
I have my Java application running in the same local network.
I am wondering how to properly connect the app to the DB.
Do I need to specify all 3 IP addresses of DB nodes for the app?
What if over time one or several nodes die and get resurrected on other IPs? Do I have to manually reconfigure the application?
How is it done properly in big real production cases with tens of DB servers, possibly in different data centers?
I would be very grateful for a code sample showing how to connect a Java app to a multi-node cluster.
You need to specify contact points (you can use DNS names instead of IPs) for several nodes (usually 2-3), and the driver will connect to one of them and discover all the nodes of the cluster after connecting (see the driver's documentation). After the connection is established, the driver keeps a separate control connection open, through which it receives information about nodes going up and down, joining or leaving the cluster, etc., so it is able to keep its view of the cluster topology up to date.
If you're specifying DNS names instead of IP addresses, it's better to set the configuration parameter datastax-java-driver.advanced.resolve-contact-points to false (see the docs), so the names are re-resolved to IPs every time the driver opens a new connection, instead of being resolved only once at application start.
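Since a code sample was requested: here is a minimal sketch using the DataStax Java driver 4.x, which also speaks CQL to ScyllaDB (Scylla additionally ships its own shard-aware fork of the driver). The contact-point hostnames, port, and datacenter name below are placeholders for your own values:

```java
import java.net.InetSocketAddress;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

public class ScyllaConnect {
    public static void main(String[] args) {
        // Contact points are only used for the initial connection; after that the
        // driver discovers the rest of the cluster topology on its own.
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("scylla-node1.example.com", 9042))
                .addContactPoint(new InetSocketAddress("scylla-node2.example.com", 9042))
                .withLocalDatacenter("datacenter1") // must match the DC name shown by nodetool status
                .build()) {

            ResultSet rs = session.execute("SELECT release_version FROM system.local");
            Row row = rs.one();
            System.out.println("Connected, server version: " + row.getString("release_version"));
        }
    }
}
```

The resolve-contact-points behaviour mentioned above is not set in code; it goes into the driver's application.conf (or is supplied programmatically through a config loader).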
Alex Ott's answer is correct, but I wanted to add a bit more background so that it doesn't look arbitrary.
The selection of the 2 or 3 nodes to connect to is described at
https://docs.scylladb.com/kb/seed-nodes/
However, going forward, Scylla is looking to move away from differentiating between Seed and non-Seed nodes. So, in future releases, the answer will likely be different. Details on these developments at:
https://www.scylladb.com/2020/09/22/seedless-nosql-getting-rid-of-seed-nodes-in-scylla/
Answering the specific questions:
Do I need to specify all 3 IP addresses of DB nodes for the app?
No. Your app just needs one to work. But it might not be a bad idea to have a few, just in case one is down.
What if over time one or several nodes die and get resurrected on other IPs?
As long as your app doesn't stop, it maintains its own version of gossip. So it will see the new nodes being added and connect to them as it needs to.
Do I have to manually reconfigure application?
If you're specifying IP addresses, yes.
How is it done properly in big real production cases with tens of DB servers, possibly in different data centers?
By abstracting away the need for a specific IP, using something like Consul. If you wanted to, you could easily build a simple RESTful service that exposes an inventory list or even the results of nodetool status (a sketch follows below).
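Such an inventory endpoint can be very small. Here is a hedged Java sketch; the node list and port are made up, and in practice you would populate the list from Consul, your provisioning system, or parsed nodetool status output:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;

import com.sun.net.httpserver.HttpServer;

public class NodeInventoryService {
    public static void main(String[] args) throws IOException {
        // Hypothetical inventory; replace with a lookup against your real source of truth.
        List<String> nodes = List.of("10.0.1.11", "10.0.1.12", "10.0.1.13");

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/nodes", exchange -> {
            byte[] body = String.join("\n", nodes).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        System.out.println("Inventory service listening on http://localhost:8080/nodes");
    }
}
```

Your application can then fetch the current node list at startup instead of hard-coding IP addresses.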
When should I use multiple inputs for my Graylog? Do you have a use case?
For instance, I have different Symfony (3.x) applications with different environments (integration, prod, ...) and I want all of them in my Graylog.
What is the best way (or best practices) to send all of them to my Graylog and easily create streams based on these environments?
The way I always understood this is that you create separate inputs for "kinds of logs": one rsyslog input for all machines sending logs in syslog format, a second for all GELF applications, a third for capturing NetFlow, etc.
You can send logs from an arbitrary number of applications and systems (i.e. environments) to Graylog (even on the same input).
Simply configure your applications and systems to send logs to Graylog and create an appropriate input for them.
See http://docs.graylog.org/en/2.4/pages/sending_data.html for some hints.
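In a Symfony application you would typically use a Monolog GELF handler for this, but to show what actually goes over the wire, here is a hedged Java sketch that sends one uncompressed GELF 1.1 message to a Graylog GELF UDP input. The Graylog hostname, port, and field values are placeholders; custom fields like _environment are what you would later match on when creating per-environment streams:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class GelfUdpExample {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port of the Graylog GELF UDP input; adjust to your setup.
        String graylogHost = "graylog.example.com";
        int gelfPort = 12201;

        // Minimal uncompressed GELF 1.1 payload. Fields prefixed with "_" are custom
        // fields, e.g. _environment and _application for routing into streams.
        String message = String.format(
                "{\"version\":\"1.1\",\"host\":\"%s\",\"short_message\":\"%s\"," +
                "\"level\":6,\"_environment\":\"%s\",\"_application\":\"%s\"}",
                InetAddress.getLocalHost().getHostName(),
                "User logged in",
                "prod",
                "my-symfony-app");

        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(graylogHost), gelfPort));
        }
    }
}
```

A single GELF input can receive messages like this from all of your applications and environments; the streams then split them by the _environment field.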
Situation: I have to prepare to manage a large number of remote servers.
Problem: They are going to be in different private networks. So I won't be able to reach them from outside, but they can easily reach my master node.
Is it sufficient that my client nodes know how to reach my master node in order for them to communicate?
Absolutely.
We have exactly that: an "all in cloud" server infrastructure on multiple cloud providers, plus a number of Puppet-managed workstations on different continents, and one Puppet server responsible for hundreds of nodes, plus an additional Puppet Dashboard server. They all communicate without any problems across the Internet.
Something similar to this:
Puppet Infrastructure
I want to store logs of applications like uWSGI ("/var/log/uwsgi/uwsgi.log") on a device that can be accessed from multiple instances, with each instance saving its logs to that device under its own instance-name directory.
So does AWS provide any solution to do that?
There are a number of approaches you can take here. If you want an experience that is like writing directly to the filesystem, then you could look at using something like s3fs to mount a common S3 bucket on each of your instances. This would give you a more or less real-time log merge, though honestly I would be concerned about the performance of such a setup in a high-volume application.
You could process the logs at some regular interval to push the data to some common store. This would not be real time, but would likely be a pretty simple solution. The problem here is that it may be difficult to interleave your log entries from different servers if you need to have them arranged in time order.
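As a sketch of that second approach (periodic push to a common store), here is a hedged Java example using the AWS SDK v2 that uploads a local uWSGI log file to S3 under a per-instance prefix. The bucket name, instance name, and key layout are assumptions, and in practice you would run this from cron or a scheduler:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class LogUploader {
    public static void main(String[] args) {
        // Placeholder bucket and instance name; the instance name could also be
        // read from EC2 instance metadata.
        String bucket = "my-app-logs";
        String instanceName = System.getenv().getOrDefault("INSTANCE_NAME", "app-server-1");
        Path logFile = Paths.get("/var/log/uwsgi/uwsgi.log");

        // Key layout: <instance>/<timestamp>-uwsgi.log, so each instance gets its own "directory".
        String key = instanceName + "/" + Instant.now().getEpochSecond() + "-uwsgi.log";

        try (S3Client s3 = S3Client.create()) {
            s3.putObject(
                    PutObjectRequest.builder().bucket(bucket).key(key).build(),
                    RequestBody.fromFile(logFile));
            System.out.println("Uploaded " + logFile + " to s3://" + bucket + "/" + key);
        }
    }
}
```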
Personally, I set up a Graylog server for each instance cluster I have, to which I log all my access logs, error logs, etc. It is UDP based, so it is fire and forget from the application servers' standpoint. It provides nice search/querying tools as well. I like this approach as it removes log management from the application servers altogether.
Two options that I've used:
Use syslog (or Syslog-NG) to log to a centralized location. We do this to ship our AWS log data offsite to our datacenter. Syslog-NG is more reliable than plain ole' Syslog and allows us to use MongoDB as a backing store.
Use logrotate to push your logs to S3. It's not real-time like the Syslog solution, but it's a lot easier to set up and manage, especially if you have a lot of instances and aren't using a VPC.
Loggly and Splunk Storm are also two interesting SaaS products intended to solve this problem.