Graphite not graphing statsd requests - node.js

I've got Graphite and statsd (Node.js 0.6.2) set up on Ubuntu 11.04, running nginx 1.010 with uwsgi.
I can confirm that Graphite is set up correctly: when I run the example Python client, data starts appearing on the graph as it should. However, when I start statsd (it starts without error) and then start my app, which just loops and sends stats, I don't see any stats being graphed.
I've run tcpdump on port 8125 and I can see the requests coming in. Any thoughts?

|your script| -> |statsd:8125|
Edit the statsd config file and change the backend to 'console'. Now start statsd and your script in parallel. The statsd terminal will start dumping output. (The default flushInterval is 10000ms)
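To rule out the application itself at this first stage, a single hand-built packet can be sent to statsd. The following is a minimal Python sketch, assuming statsd listens on UDP port 8125 on the local machine and accepts the usual "name:value|c" counter format; the metric name debug.test_counter is a hypothetical placeholder.

```python
import socket


def format_counter(name, value=1):
    """Build a statsd counter packet of the form <name>:<value>|c."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Send one counter increment to statsd over UDP and return the packet."""
    packet = format_counter(name, value)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
    return packet


# Example call (the metric name is a made-up placeholder):
# send_counter("debug.test_counter")
```

With the console backend enabled, the counter should show up in the statsd terminal on the next flush. If it does not, the problem lies before statsd rather than in carbon.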
|statsd:8125| -> |carbon/whisper|
Tail the log files in "/opt/graphite/storage/log/carbon-cache/carbon-cache-a". The current ones are console.log, creates.log, listener.log and query.log. Of these, creates.log records the .wsp files being created. Check that the files are being created; they reside in "/opt/graphite/storage/whisper/stats".
To inspect the schema and retention settings of the stored data, use whisper-dump.py to read a .wsp file.
Sample output:
Meta data:
  aggregation method: average
  max retention: 157784400
  xFilesFactor: 0.5
Archive 0 info:
  offset: 52
  seconds per point: 1
  points: 10080
  retention: 10080
  size: 120960
Now ensure that the statsd config specifies "localhost" and "2003" as the addr and port.
Open localhost in your browser. You should have graphite. Select your parameters from the tab on left. You should have your graphs.

Related

stackdriver logging agent not showing logs read from a custom log file in stackdriver logging viewer on Google cloud platform

I decided to post this question because I have run out of debugging ideas. Even just ideas are golden, since I know it can be difficult to help debug a virtual instance from here (debugging code is hard enough, haha). I have created a virtual machine in Compute Engine, along with a log file that I populate from a Python script, for example with the following code; let's call the script logging.py:
import logging
logging.basicConfig(filename='app.log', level=logging.INFO, format=' %(asctime)s - %(name)s - %(levelname)s - %(message)s')
logging.info('Some message ' + str(type(variable)))
Every time I run python3 logging.py, app.log is indeed populated. (logging.py and app.log are in the same directory, /home/username/.)
I want Stackdriver to show this log in the logging viewer every time it is written, so I installed the Stackdriver agent as follows, from the virtual machine's command line:
$ curl -sSO https://dl.google.com/cloudagents/install-logging-agent.sh
$ sudo bash install-logging-agent.sh
No errors that I can see are reported here; in fact, you can see the messages obtained here.
Messages in the Stackdriver viewer:
After this, I created a .conf file at /etc/google-fluentd/config.d/app.conf
with these parameters:
<source>
  type tail
  format none
  path /home/username/app.log
  pos_file /var/lib/google-fluentd/pos/app.pos
  read_from_head true
  tag whatever-tag
</source>
Once that file is created, I run sudo service google-fluentd restart.
After I then execute python3 logging.py, no logs are added to Stackdriver's logging viewer.
So, where might I have gone wrong?
Things I have tried/checked:
-I have more than 13 gigabytes of RAM available.
-If I run logger "some message" on the command line, a log entry with "some message" is indeed added to the log viewer.
-If I run
ps ax | grep fluentd
I obtain :
3033 ? Sl 0:09 /opt/google-fluentd/embedded/bin/ruby /usr/sbin/google-fluentd --log /var/log/google-fluentd/google-fluentd.log --no-supervisor
3309 pts/0 S+ 0:00 grep --color=auto fluentd
-Both my user, and the service account I use, have logger admin permission in IAM roles.
-This is the documentation I have based myself on:
https://cloud.google.com/logging/docs/agent/troubleshooting?hl=es-419
https://cloud.google.com/logging/docs/reference/v2/rest/v2/entries/list?hl=es-419
https://cloud.google.com/logging/docs/agent/configuration?hl=es-419
https://medium.com/google-cloud/how-to-log-your-application-on-google-compute-engine-6600d81e70e3
https://cloud.google.com/logging/docs/agent/installation
-If I run sudo service google-fluentd status , the agent appears active.
-My instance has access to all the APIs. It's an n1-standard-4 (4 vCPUs, 15 GB of memory) running Ubuntu Linux 18.04.
So, what else can I check to debug this? I'm out of ideas here; I hope I'm not being an idiot here :(
Based on my understanding, I think that you are looking for the following fluentd resource types:
generic_node
“A generic node identifies a machine or other computational resource for which no more specific resource type is applicable. The label values must uniquely identify the node.”
generic_task
“A generic task identifies an application process for which no more specific resource is applicable, such as a process scheduled by a custom orchestration system. The label values must uniquely identify the task.”
The source of my information can be found here.
This document explains how to send logs from your application in different ways:
Cloud Logging API
Cloud Logging Agent
Generic fluentd
As you mentioned having installed fluentd, let me provide more focused documentation about the Cloud Logging Agent. I also found some Python client library documentation that may interest you.
Finally, I found an nginx/Apache use-case guide that you may use as a reference.
For some reason, if I change both the path in the .conf file and the location of the log to /var/logs/, so that the final path is /var/logs/app.logs, it works correctly. Possibly a configuration issue causes the logging agent to capture logs only from specific predetermined folders, or a permissions issue stops it from working when the log is in the user's home directory.
I found this solution by chance, however (basically random testing). I did not find anything that could point me in the right direction in the main articles that are supposed to explain how to configure the logging agent, namely these:
https://cloud.google.com/logging/docs/agent/troubleshooting?hl=es-419
https://cloud.google.com/logging/docs/reference/v2/rest/v2/entries/list?hl=es-419
https://cloud.google.com/logging/docs/agent/configuration?hl=es-419
https://medium.com/google-cloud/how-to-log-your-application-on-google-compute-engine-6600d81e70e3
https://cloud.google.com/logging/docs/agent/installation
If I needed it to work in my home directory, it is not clear from these articles how to do it, which configuration file I would need to change, or where to start, so I recommend that Google improve that aspect of the docs.
This documentation you have sent https://docs.fluentd.org/quickstart is pretty interesting, maybe I can find the explanation there, thank you for your help.
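One way to test the permissions hypothesis mentioned above is to inspect the "other" permission bits along the path to the log file. This is only a rough Python sketch: it assumes the agent runs as a user that is neither the owner nor in the owning group, it ignores ACLs, and the function name blocked_for_others is made up for illustration. If the agent runs as root, this check would not explain the behaviour.

```python
import stat
from pathlib import Path


def blocked_for_others(log_path):
    """List path components that a non-owner, non-group user cannot use.

    Directories need the 'other' execute bit to be traversed, and the log
    file itself needs the 'other' read bit to be read.
    """
    target = Path(log_path).resolve()
    blocked = []
    # Walk from the filesystem root down to the file's parent directory.
    for directory in list(target.parents)[::-1]:
        if not directory.stat().st_mode & stat.S_IXOTH:
            blocked.append(str(directory))
    if not target.stat().st_mode & stat.S_IROTH:
        blocked.append(str(target))
    return blocked


# Example call, using the path from the question:
# print(blocked_for_others("/home/username/app.log"))
```

An empty list means the plain permission bits do not block other users along that path, so the cause would have to be sought elsewhere.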

Spring Data GemFire Server java.net.BindException in Linux

I have a Spring Boot app that I am using to start a Pivotal GemFire CacheServer.
When I jar up the file and run it locally:
java -jar gemfire-server-0.0.1-SNAPSHOT.jar
It runs fine without issue. The server is using the default properties
spring.data.gemfire.cache.log-level=info
spring.data.gemfire.locators=localhost[10334]
spring.data.gemfire.cache.server.port=40404
spring.data.gemfire.name=CacheServer
spring.data.gemfire.cache.server.bind-address=localhost
spring.data.gemfire.cache.server.host-name-for-clients=localhost
If I deploy this to a CentOS distribution and run it with the same command, but passing the "test" profile:
java -jar gemfire-server-0.0.1-SNAPSHOT.jar -Dspring.profiles.active=test
with my test profile application-test.properties looking like this:
spring.data.gemfire.cache.server.host-name-for-clients=server.centralus.cloudapp.azure.com
I can see during startup that the server finds the Locator already running on the host (I start it through a separate process with Gfsh).
The server even joins the cluster for about a minute. But then it shuts down because of a bind exception.
I have checked to see if there is anything running on that port (40404) - and nothing shows up
EDIT
Apparently I DO get this exception locally - it just takes a lot longer.
It is almost instant when I start it up on the CentOS distribution. On my Mac it takes around 2 minutes before the process throws the exception:
Adding a few more images of this:
Two bash windows - left is monitoring GF locally and right I use to check the port and start the Java process:
The server is added to the cluster. Note the timestamp of 16:45:05.
Here is the server added and it appears to be running:
Finally, the exception after about two minutes - again look at the timestamp on the exception - 16:47:09. The server is stopped and dropped from the cluster.
Did you start other servers using Gfsh? That is, with a Gfsh command similar to...
gfsh>start server --name=ExampleGfshServer --log-level=config
Gfsh will start CacheServers listening on the default CacheServer port of 40404.
You have a few options.
1) First, you can disable the default CacheServer when starting a server with Gfsh like so...
gfsh>start server --name=ExampleGfshServer --log-level=config --disable-default-server
2) Alternatively, you can change the CacheServer port when starting other servers using Gfsh...
gfsh>start server --name=ExampleGfshServer --log-level=config --server-port=50505
3) If you are starting multiple instances of your Spring Boot, Pivotal GemFire CacheServer class, then you can vary the spring.data.gemfire.cache.server.port property by declaring the property as a System property when you startup.
For instance, you can, in the Spring Boot application.properties, do...
#application.properties
...
spring.data.gemfire.cache.server.port=${gemfire.cache.server.port:40404}
And then when starting the application from the command-line...
java -Dgemfire.cache.server.port=48484 -jar ...
Of course, you could just set the SDG property from the command line too...
java -Dspring.data.gemfire.cache.server.port=48484 -jar ...
Anyway, I guarantee you that you have another process (e.g. Pivotal GemFire CacheServer) with a ServerSocket listening on port 40404, running. netstat -a | grep 40404 should give you better results.
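As a complement to netstat, the port can also be probed directly by trying to bind to it. The following Python sketch assumes the check runs on the same host as the server; the function name port_in_use is hypothetical, and any OSError from bind is treated as "in use", which is a simplification (a permission error on a privileged port would look the same).

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding to (host, port) fails, i.e. the port is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False


# Example call for the default CacheServer port:
# print(port_in_use(40404))
```

Running it just before starting the Spring Boot server, and again while the server is up, shows whether something else grabs the port in between.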
Hope this helps.
Regards,
John

How to setup ELK with node.js

I want to log errors from my Node.js server to another server. I'm thinking of using Elasticsearch, Logstash and Kibana. I want to know how to set up ELK with my Node server.
I had exactly this use case at my previous organization. A basic tutorial for getting started with Beats + ELK: https://www.elastic.co/guide/en/beats/libbeat/current/getting-started.html
This is basically how it works: your Node.js app writes logs to files (you can use bunyan for this) at different levels such as error, warning and info. Filebeat tails these log files and sends the messages to the Logstash server. The Logstash input.conf has some input filters (in your case, error filters). When a log message passes these filters, Logstash forwards it to the endpoint specified in the output.conf file.
Here is what we did -
Initial architecture - Install filebeat (earlier we used logstash forwarder) client to tail the logs on nodejs server and forward it to logstash machine. Logstash will do some processing on input logs and send them to ES cluster (can be on same machine as Logstash). Kibana is just a visualization on this ES.
Final Architecture - Initial setup was cool for small traffic but we realized that logstash can be single point of failure and may result in log loss when traffic increased. So we integrated Kafka along with Logstash so that system scales smoothly. Here is an article - https://www.elastic.co/blog/logstash-kafka-intro
Hope this helps!
It is possible to use logstash without agents running to collect logs from the application.
Logstash has input plugins (https://www.elastic.co/guide/en/logstash/current/input-plugins.html), which can be configured in the pipeline. One basic setup is to configure the TCP (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-tcp.html) or UDP (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-udp.html) input plugin, so that Logstash listens on the port configured in the plugin.
Then the application can send the log directly to the logstash server. The pipeline can then transform and forward to ES.
By configuring Logstash pipeline to be durable, data loss can be avoided. This approach is better when the application servers are ephemeral ( as in containers).
For nodejs, https://www.npmjs.com/package/winston-logstash is a package which is quite active. This gist https://gist.github.com/jgoodall/6323951 provides a good example for the overall approach in other languages.
This is the sample (minimal) TCP input plugin configuration
input {
  tcp {
    'port' => '9563'
  }
}
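To try the TCP input without a logging library, one newline-terminated JSON event can be pushed to the port by hand. The sketch below is in Python and makes several assumptions: Logstash listens on port 9563 as in the configuration above, the tcp input is given a JSON-lines codec (otherwise the JSON text simply arrives as a plain message string), and the field names "level" and "service" are hypothetical.

```python
import json
import socket
from datetime import datetime, timezone


def build_event(level, message, **fields):
    """Serialize one log event as a newline-terminated JSON line."""
    event = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
    }
    event.update(fields)
    return (json.dumps(event) + "\n").encode("utf-8")


def send_event(payload, host="127.0.0.1", port=9563):
    """Open a TCP connection, send the payload, and close the connection."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)


# Example call (field values are placeholders):
# send_event(build_event("error", "something failed", service="wiki"))
```

Opening a connection per event is fine for a smoke test; an application under load would normally keep the connection open or use a library that does.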
You can install Logstash on the Node.js server, and then create a configuration file that takes the location of the log file(s) as input and outputs to your Elasticsearch host.
Below is a sample configuration file (custom.conf), which has to be created in your Logstash directory.
input {
  file {
    path => "/path to log"
    start_position => beginning
  }
}
output {
  stdout { }
  elasticsearch {
    type => "stdin-type"
    embedded => false
    host => "192.168.0.23"
    port => "9300"
    cluster => "logstash-cluster"
    node_name => "logstash"
  }
}
Run Logstash:
logstash -f custom.conf
Reference: https://www.elastic.co/guide/en/logstash/current/config-examples.html
If you are planning to customize a Node.js application to send error logs, you can install an ELK Stack Node.js wrapper and post error logs from within your application. ELK Stack Node.js wrapper - https://www.npmjs.com/package/elksdk

How to monitor multiple devices with fw1-loggrabber

I am currently working on a logging system where I need to pull logs out of Checkpoint devices.
I use fw1-loggrabber with OPSEC LEA, and I have successfully pulled logs from a Checkpoint firewall.
Now let's say I have 100 devices.
Do I need to configure and run fw1-loggrabber 100 times, or can I use one lea.conf and one fw1-loggrabber.conf to configure all the devices I want to monitor and run it once?
My currently configured files:
lea.conf:
lea_server auth_type sslca
lea_server ip 255.255.255.255
lea_server auth_port 18184
lea_server port 18184
opsec_sic_name "CN=Test,O=test..hi7arv"
lea_server opsec_entity_sic_name "cn=tt_mgmt,o=test..hi7arv"
opsec_sslca_file /opt/pkg_rel/p12_cert_file
fw1-loggrabber.conf
DEBUG_LEVEL="0"
FW1_LOGFILE="fw.log"
FW1_OUTPUT="logs"
FW1_TYPE="ng"
FW1_MODE="normal"
ONLINE_MODE="yes"
SHOW_FIELDNAMES="yes"
DATEFORMAT="std"
SYSLOG_FACILITY="LOCAL1"
RESOLVE_MODE="no"
RECORD_SEPARATOR="|"
LOGGING_CONFIGURATION=file
OUTPUT_FILE_PREFIX="/var/log/testFolder/Checkpoint/fw1"
OUTPUT_FILE_ROTATESIZE=1048576
If not possible to configure and run all from one configuration file (or two), any alternatives for pulling logs using Checkpoint OPSEC LEA?
Thanks.
When you run fw1-loggrabber, simply pass it as many lea.conf files as you like; it will pull from as many devices as you want.
Example:
/usr/local/fw1-loggrabber/bin/fw1-loggrabber \
  -c /usr/local/fw1-loggrabber/fw1-loggrabber.conf \
  -l /usr/local/fw1-loggrabber/lea1.conf \
  -l /usr/local/fw1-loggrabber/lea2.conf
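If a single invocation with several -l options does not behave as described in your build, a possible fallback is to start one fw1-loggrabber process per device. The Python sketch below only assumes the -c and -l flags shown above and a directory holding files named lea*.conf; the helper names build_commands and launch_all are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(binary, main_conf, lea_conf_dir):
    """Return one argument list per lea*.conf file found in lea_conf_dir."""
    lea_confs = sorted(Path(lea_conf_dir).glob("lea*.conf"))
    return [
        [binary, "-c", main_conf, "-l", str(lea_conf)]
        for lea_conf in lea_confs
    ]


def launch_all(commands):
    """Start every command as its own process and return the handles."""
    return [subprocess.Popen(argv) for argv in commands]


# Example usage with the paths from the answer above:
# commands = build_commands(
#     "/usr/local/fw1-loggrabber/bin/fw1-loggrabber",
#     "/usr/local/fw1-loggrabber/fw1-loggrabber.conf",
#     "/usr/local/fw1-loggrabber",
# )
# processes = launch_all(commands)
```

Since every process would read the same fw1-loggrabber.conf, they would also share the same OUTPUT_FILE_PREFIX, so a per-device copy of that file may be needed to keep the output separate.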

connect EADDRNOTAVAIL in nodejs under high load - how to faster free or reuse TCP ports?

I have a small wiki-like web application based on the Express framework which uses Elasticsearch as its back end. For each request it basically just goes to the Elasticsearch DB, retrieves the object and returns it rendered by the Handlebars template engine. The communication with Elasticsearch is over HTTP.
This works great as long as I have only one Node.js instance running. After I updated my code to use cluster (as described in the Node.js documentation), I started to encounter the following error: connect EADDRNOTAVAIL
This error shows up when I have 3 or more Python scripts running which constantly retrieve some URL from my server. With 3 scripts I can retrieve ~45,000 pages; with 4 or more scripts running it is between 30,000 and 37,000 pages. Running only 2 scripts or 1 script, I stopped them after half an hour, when they had retrieved 310,000 pages and 160,000 pages respectively.
I've found this similar question and tried changing http.globalAgent.maxSockets but that didn't have any effect.
This is the part of the code which listens for the URLs and retrieves the data from elastic search.
app.get('/wiki/:contentId', (req, res) ->
  http.get(elasticSearchUrl(req.params.contentId), (innerRes) ->
    if (innerRes.statusCode != 200)
      res.send(innerRes.statusCode)
      innerRes.resume()
    else
      body = ''
      innerRes.on('data', (bodyChunk) ->
        body += bodyChunk
      )
      innerRes.on('end', () ->
        res.render('page', {'title': req.params.contentId, 'content': JSON.parse(body)._source.html})
      )
  ).on('error', (e) ->
    console.log('Got error: ' + e.message) # the error is reported here
  )
)
UPDATE:
After looking into it more, I now understand the root of the problem. I ran the command netstat -an | grep -e tcp -e udp | wc -l several times during my test runs to see how many ports were in use, as described in the post Linux: EADDRNOTAVAIL (Address not available) error. I observed that at the time I received the EADDRNOTAVAIL error, 56677 ports were in use (instead of ~180 normally).
Also, when using only 2 simultaneous scripts, the number of used ports saturates at around 40,000 (+/- 2,000). That means ~20,000 ports are used per script (that is the point at which Node.js cleans up old ports before new ones are created), and with 3 scripts running it would need ~60,000 ports, which exceeds the 56677 limit. This explains why it fails with 3 scripts requesting data, but not with 2.
So now my question changes to: how can I force Node.js to free up the ports quicker, or to reuse the same port all the time (which would be the preferable solution)?
Thanks
For now, my solution is setting the agent of my request options to false. According to the documentation, this
opts out of connection pooling with an Agent, defaults request to Connection: close.
As a result, my number of used ports doesn't exceed 26,000. This is still not a great solution, especially since I don't understand why reusing ports doesn't work, but it solves the problem for now.
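The effect of connection reuse on port consumption can be illustrated outside Node.js. The following Python sketch is only a demonstration of the idea, not a fix for the Express app: it starts a throwaway local HTTP server and counts how many distinct client ports it sees when requests share one keep-alive connection versus when each request opens a new one. All names in it (PortRecorder, count_client_ports) are made up for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PortRecorder(BaseHTTPRequestHandler):
    """Records the client-side port of every request it serves."""

    protocol_version = "HTTP/1.1"  # allows persistent (keep-alive) connections
    seen_ports = []

    def do_GET(self):
        PortRecorder.seen_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def count_client_ports(reuse_connection, requests=5):
    """Return how many distinct client ports the server saw."""
    PortRecorder.seen_ports = []
    server = ThreadingHTTPServer(("127.0.0.1", 0), PortRecorder)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        shared = http.client.HTTPConnection(host, port) if reuse_connection else None
        for _ in range(requests):
            conn = shared if shared is not None else http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()
            if shared is None:
                conn.close()  # one TCP connection (and one port) per request
        if shared is not None:
            shared.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(set(PortRecorder.seen_ports))
```

With a shared connection the server sees a single client port, while the per-request variant consumes a new ephemeral port each time, and each closed connection typically lingers in TIME_WAIT for a while on the side that closed it. In later Node.js versions, an http.Agent created with keepAlive: true is the usual way to get the first behaviour for outgoing requests.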
