The problem statement: I have an application running on Windows. I want to ship logs files from this application to ELK fronted by Kafka.
The challenge: This application writes a lot of process metadata to disk under a directory location. This information is important for the application's recovery and hence is stored on a network storage to support DR. The application also writes logs to the same directory location and we do not have the ability to separate the logs from the other process metadata. As a result logs are written to network share.
I want to ship the logs to Elastic. We typically use beats to do this. However, Filebeat does not recommending shipping logs from network storage on Windows. Ref: https://www.elastic.co/guide/en/beats/filebeat/7.11/filebeat-network-volumes.html. I have also read various git issues and SO posts where people have complained about Filbeat stopping harvesting on rollover.
Since this is a network share, I was also not able to create a symlink or a Junction link to trick my application to write the logs to the hard disk.
Has anyone solved this issue?
P.S.: I also read somewhere that logstash has better handling of files on network share. However, I do not need logstash and would like to avoid it if possile. Also, logstash official documentation mentions that reading files from NFS is only occasionally tested. It is not thoroughly tested.
Related
I need to build a log collection system.
I find out that common log collection schemes include elk and Hadoop / hive.
1、As a front-end developer, can I spend a certain amount of time (for example, one week) to complete simple construction without a service-side foundation?
2、Can I use nodejs, mongodb and other technology stacks to build a log system?
ELK already includes LogStash or Filebeat exactly for this purpose; you don't need Node to collect log files.
For Hadoop (and Elasticsearch), it uses Log4j, which offer different appenders to write to external locations and/or log files on disk. With files on disk, you'd use Filebeat (or FluentD) to process those files and send to external locations, such as Apache Kafka topics, which can be read into Hadoop and/or Elasticsearch (or Splunk) for analytics. Mongo generally isn't well suited for log analysis or plaintext log searching.
For handling logs from Node services, they can also write to disk using standard NPM libraries, then tools listed above would treat those the same as logs from other services.
I have a azure service fabric development cluster running locally with two applications.
After a two week holiday I come back and see that my hard drive is completely full, consequently nothing really works anymore.
the sfdevcluster\log\traces folder has many *.etl files all larger than 100MB.
And all kinds of other log files > 250 MB are present
So my questions: how to disable tracing/logging on azure service fabric and are there tools to administer log files?
The powerShell script file that does the cluster setup magic is:
Program Files\Microsoft SDKs\Service Fabric\ClusterSetup\DevClusterSetup.ps1
Looking inside, there is a function called DeployNodeConfiguration which sets the logs and data path using the PowerShell command New-ServiceFabricNodeConfiguration. Unfortunately, It does not seem that there is a way to limit the size of those folders.
I believe that your slowness / freeze is due to insufficient space on the OS drive (happened to me too haha). A workaround can be to set the location of those folders to a non-OS drive with a limited amount of space.
Hope this helps
This turned out to be a bug in Service Fabric, upgrade your local cluster to the latest version 6.1.472.9494 which will fix the issue. more details here
I know that we can use the VM Depot to get started with the Neo4J in Azur but one thing that is not clear is where should we physically store the DB files. I tried to look around in the net if there are any recommendations on where the physical files would be stored so that then a VM crashes or restarts, the data is not lost.
can someone share their thoughts or point me to a address where some more details can be found on do and don'ts of Neo4j on Azure for a production environment.
Regards
Kiran
When you set up a Neo4j VM via VM Depot, that image, by default, configures the database files to reside within the same VM as the server itself. The location is specified in neo4j-server.properties. This lets you simply spin up the VM and start using Neo4j immediately.
However: You'll soon discover that your storage space is limited (I believe the VM instances are set up with a 127GB disk). To work with larger databases, you'll need to attach an additional disk (or disks), each disk up to 1TB in size. These disks, as well as the main VM disk, are backed by blob storage, meaning they're durable - persistent disks.
How you ultimately configure this is up to you, depending on the size of the database and its purpose. The only storage to avoid, if you need persistence, is the scratch disk provided (which is a locally-attached drive with no durability).
The documentation announcing that VM doesn't say. But when you install neo4j as a package on to other similar linux systems (the VM in question is a linux VM) then the data usually goes into /var/lib/neo4j/data. Here's an example:
user#host:/var/lib/neo4j/data$ pwd
/var/lib/neo4j/data
user#host:/var/lib/neo4j/data$ ls
graph.db keystore log neo4j-service.pid README.txt rrd
user#host:/var/lib/neo4j/data$ cat README.txt
Neo4j Data
=======================================
This directory contains all live data managed by this server, including
database files, logs, and other "live" files.
The main directory you really have to have is the "graph.db" directory. That's going to contain the bulk of the data. May as well back up the entirety of this directory. Some of the files (like the .pid file and the README.txt) of course aren't needed.
Now, there's no guarantee that in the VM that it's going to be /var/lib/neo4j/data but it's going to be something very similar. And what you're going to want is going to be a directory whose name ends in .db since that's the default for new neo4j databases.
To narrow down further, once you get that VM running, just run updatedb then locate *.db | grep neo4j and that's almost certain to find it quickly.
I am using syslog on an embedded Linux device (Debian-arm) that has a relatively smaller storage (~100 MB storage). If we assume the system will be up for 30 years and it logs all possible activities, would there be a case that the syslog fills up the storage memory? If it is the case, is syslog intelligent enough to remove old logs as there would be less space on the storage medium?
It completely depends how much stuff gets logged, but if you only have ~100MB, I would imagine that it's certainly likely that your storage will fill up before 30 years!
You didn't say which syslog server you're using. If you're on an embedded device you might be using the BusyBox syslogd, or you may be using the regular syslogd, or you may be using rsyslog. But in general, no syslog server rotates log files all by itself. They all depend on external scripts run from cron to do it. So you should make sure you have such scripts installed.
In non-embedded systems the log rotation functionality is often provided by a software package called logrotate, which is quite elaborate and has configuration files to say how and when which log files should be rotated. In embedded systems there is no standard at all. The most common configuration (especially when using BusyBox) is that logs are not written to disk at all, only to a memory ring buffer. The next most common configuration is idiosyncratic ad-hoc scripts built and installed by the embedded system integrator. So you just have to scan the crontabs and see if you can see anything that's configured to be invokes that looks like a log rotater.
I want to store logs of applications like uWSGI ("/var/log/uwsgi/uwsgi.log") on a device that can be accessed from
multiple instances and can save their logs to that particular device under their own instance name dir.
So does AWS provides any solution to do that....
There are a number of approaches you can take here. If you want to have an experience that is like writing directly to the filesystem, then you could look at using something like s3fs to mount a common S3 bucket to each of your instances. This would give you more or less a real-time log merge though honestly I would be concerned over the performance of such a set up in a high volume application.
You could process the logs at some regular interval to push the data to some common store. This would not be real time, but would likely be a pretty simple solution. The problem here is that it may be difficult to interleave your log entries from different servers if you need to have them arranged in time order.
Personally, I set up a Graylog server for each instance cluster I have, to which I log all my access logs, error logs, etc. It is UDP based, so it is fire and forget from the application servers' standpoint. It provides nice search/querying tools as well. Personally I like this approach as it removes log management from the application servers altogether.
Two options that I've used:
Use syslog (or Syslog-NG) to log to a centralized location. We do this to ship our AWS log data offsite to our datacenter. Syslog-NG is more reliable than plain ole' Syslog and allows us to use MongoDB as a backing store.
Use logrotate to push your logs to S3. It's not real-time like the Syslog solution, but it's a lot easier to set up and manage, especially if you have a lot of instances and aren't using a VPC
Loggly and Splunk Storm are also two interesting SaaS products intended to solve this problem.