How to implement continuous logging with C# EventSource and ETW?

We implement structured logging with System.Diagnostics.Tracing.EventSource and use inline provider manifests when collecting traces, to avoid the installation headaches of EventRegister and wevtutil. We have designed our EventSources to log at a sufficiently low volume for continuous, persistent logging. I'm struggling to implement collection with the array of ETW controllers available from Microsoft. I want to define an ETW session that:
1. Sets a max file size for my ETL files.
2. Rolls to a new file with incremented version / timestamp info as ETL files reach the max size. I do not want circular logs that overwrite old events; we'll manage archiving on our own.
3. Maintains manifest data from the beginning of the event stream in each new file - remember, we're using inline provider manifests.
I got close using the following logman command (the provider GUID specified is from our custom EventSource):
logman start "Session" -p "{55a51afc-22e0-5581-6db2-01d5bbe42500}" -mode newfile -max 1 -o .\test%d.etl -ets
Viewed in PerfView, the first ETL file generated looks great.
Each subsequent file does not decode properly, presumably because the manifest data is lost.
Is there an option I can provide to logman to meet my 3rd requirement? Vance Morrison hinted on his blog that ETW supports a CaptureState command to re-send the manifest data:
The solution to the circular buffer case is to ask the EventSource to dump its manifest again at the end of the trace (where it will be sure to be in the window). This is just one example of a bunch of 'rundown' that is needed. ETW supports this with the 'CaptureState' command.
If logman can't do this, can one of the other ETW controllers meet all my requirements? I'm open to PerfView, Windows Performance Recorder, xperf, tracelog, or any other I've missed.
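For reference, a CaptureState request can also be issued by a small controller of your own. The following is only a sketch, assuming the Microsoft.Diagnostics.Tracing.TraceEvent NuGet package (the ETW library PerfView is built on); it attaches to the session started by the logman command above and asks the provider to re-emit its manifest:

// Sketch only: attach to the ETW session that logman started ("Session") and
// send a CaptureState request so the EventSource re-emits its manifest into
// the currently active ETL file.
// Assumes the Microsoft.Diagnostics.Tracing.TraceEvent NuGet package.
using System;
using Microsoft.Diagnostics.Tracing.Session;

class CaptureStateSketch
{
    static void Main()
    {
        // Provider GUID of the custom EventSource from the logman command above.
        var providerGuid = new Guid("55a51afc-22e0-5581-6db2-01d5bbe42500");

        // Attach to the existing session rather than creating a new one.
        using (var session = new TraceEventSession("Session", TraceEventSessionOptions.Attach))
        {
            // Don't stop the logman-owned session when this controller exits.
            session.StopOnDispose = false;

            // Ask the provider for rundown: it re-sends its manifest, which then
            // lands in whichever test%d.etl file is currently active.
            session.CaptureState(providerGuid);
        }
    }
}

Running something like this after each rollover (for example, from a watcher that fires when a new test%d.etl appears) is one way the third requirement might be met.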

Related

Databricks autoloader methods: file notifications vs directory listing

When using the Databricks Auto Loader functionality, you can choose either the file notifications method or the default directory listing method.
I realise that file notifications is the superior method, and that having Azure Functions and Event Grid identify file updates means the solution will scale out "infinitely" and so on (I've also read that this is the recommended method in prod?). But I have a use case where the frequency of incoming files is quite small, and I am wondering if I can just use the directory listing method in the prod environment as well. Has anyone used the directory listing method with Auto Loader in prod? Any thoughts/concerns? The idea at the moment is to start with the directory listing method and then switch to the file notifications method as we start ingesting more files, more frequently.
Directory Listing Mode:
Auto Loader discovers new files in directory listing mode by listing the input directory. Other than access to your data on cloud storage, directory listing mode needs no extra setup, so it lets you start Auto Loader streams quickly.
In Databricks Runtime 9.1 and later, Auto Loader can automatically detect whether files are arriving in your cloud storage with lexical ordering, which drastically reduces the number of API calls needed to discover new files.
The main drawback of directory listing mode is that discovering new files means repeatedly listing the input directory, so as the directory (or the number of incoming files) grows, those listing calls become slower and more expensive.
File Notification Mode:
File notification mode uses notification and queue services in your cloud provider account. Auto Loader can automatically set up a notification service and a queue service that subscribe to file events from the input directory.
For big input directories or a high volume of files, file notification mode is more efficient and scalable, but it requires additional cloud permissions to set up.
For improved scalability and quicker performance as your directory grows, you might want to switch to the file notification mode.

Is it possible to use cf event as an input in logstash?

I'd like to build the following system: once an event occurs in Cloud Foundry, it is loaded into Elasticsearch. Using Logstash would be fine, but I explored its input plugins and couldn't find anything I could use. What is the best solution for this scenario? At the moment I can think of writing a script that would continuously pull the data using the CF API and load it into Elasticsearch. Is there a better way of doing it?
I can think of two solutions:
1. Create a "drain" (e.g., via the drain CLI) for the app you would like to see events for and drain it to your ELK deployment. This should forward each event (formatted as RFC 5424 syslog) to Elasticsearch.
2. If you are using the Loggregator Firehose to write data into Elasticsearch (e.g., via firehose-to-syslog), then you will get events as log messages. This has the downside that everything ends up in your ELK deployment.
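If neither of those approaches fits and you fall back to the polling script mentioned in the question, a rough sketch of that approach (shown in C# purely for illustration; the CF API host, OAuth token, Elasticsearch URL, and index name are all placeholders to adapt) might look like this:

// Rough sketch of "poll the CF API and push events into Elasticsearch".
// All endpoints, the index name, and the OAuth token are placeholders.
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class CfEventsToElasticsearch
{
    static async Task Main()
    {
        var http = new HttpClient();
        // Placeholder token; in practice obtain one via `cf oauth-token` / UAA.
        http.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("bearer", "<cf-oauth-token>");

        while (true)
        {
            // Pull recent events from the CF events endpoint (placeholder host).
            // A real script would track the last-seen timestamp/GUID and only
            // push events it hasn't indexed yet.
            var page = await http.GetStringAsync("https://api.example.com/v2/events");

            // Index into Elasticsearch (placeholder URL and index); a real
            // script would index each event in the page individually.
            await http.PostAsync(
                "http://localhost:9200/cf-events/_doc",
                new StringContent(page, Encoding.UTF8, "application/json"));

            await Task.Delay(TimeSpan.FromSeconds(30));
        }
    }
}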

syslog: does it remove old logs when there is not enough space on the storage?

I am using syslog on an embedded Linux device (Debian on ARM) that has relatively small storage (~100 MB). If we assume the system will be up for 30 years and it logs all possible activity, could syslog fill up the storage? If so, is syslog intelligent enough to remove old logs as space runs low on the storage medium?
It completely depends on how much gets logged, but if you only have ~100 MB, it's certainly likely that your storage will fill up well before 30 years!
You didn't say which syslog server you're using. If you're on an embedded device you might be using the BusyBox syslogd, or you may be using the regular syslogd, or you may be using rsyslog. But in general, no syslog server rotates log files all by itself. They all depend on external scripts run from cron to do it. So you should make sure you have such scripts installed.
In non-embedded systems the log rotation functionality is often provided by a software package called logrotate, which is quite elaborate and has configuration files specifying how and when each log file should be rotated. In embedded systems there is no standard at all. The most common configuration (especially when using BusyBox) is that logs are not written to disk at all, only to a memory ring buffer. The next most common configuration is idiosyncratic ad-hoc scripts built and installed by the embedded system integrator. So you just have to scan the crontabs and see whether anything configured to be invoked there looks like a log rotator.

AWS EC2 instance Application logs

I want to store logs of applications like uWSGI ("/var/log/uwsgi/uwsgi.log") on a device that can be accessed from multiple instances, with each instance saving its logs to that shared device under its own instance-name directory.
So does AWS provide any solution to do that?
There are a number of approaches you can take here. If you want an experience that is like writing directly to the filesystem, then you could look at using something like s3fs to mount a common S3 bucket on each of your instances. This would give you a more or less real-time log merge, though honestly I would be concerned about the performance of such a setup in a high-volume application.
You could process the logs at some regular interval to push the data to some common store. This would not be real time, but would likely be a pretty simple solution. The problem here is that it may be difficult to interleave your log entries from different servers if you need to have them arranged in time order.
Personally, I set up a Graylog server for each instance cluster I have, to which I log all my access logs, error logs, etc. It is UDP based, so it is fire and forget from the application servers' standpoint. It provides nice search/querying tools as well. Personally I like this approach as it removes log management from the application servers altogether.
Two options that I've used:
1. Use syslog (or syslog-ng) to log to a centralized location. We do this to ship our AWS log data offsite to our datacenter. syslog-ng is more reliable than plain ole' syslog and allows us to use MongoDB as a backing store.
2. Use logrotate to push your logs to S3. It's not real-time like the syslog solution, but it's a lot easier to set up and manage, especially if you have a lot of instances and aren't using a VPC.
Loggly and Splunk Storm are also two interesting SaaS products intended to solve this problem.
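As a rough illustration of the "push logs to a common store on a schedule" idea above, here is a C# sketch (assuming the AWSSDK.S3 NuGet package; the bucket name and log path are placeholders) that uploads a log file to S3 under a per-instance prefix:

// Sketch: ship a log file to S3 under a per-instance prefix on a schedule
// (e.g. from cron). The bucket name and log path are placeholders.
// Assumes the AWSSDK.S3 NuGet package and credentials from the instance role.
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Transfer;

class ShipLogsToS3
{
    static async Task Main()
    {
        // The EC2 instance metadata endpoint returns the instance ID
        // (IMDSv1 shown for brevity; newer setups may require IMDSv2 tokens).
        string instanceId;
        using (var http = new HttpClient())
            instanceId = await http.GetStringAsync(
                "http://169.254.169.254/latest/meta-data/instance-id");

        // Region and credentials come from the environment / instance profile.
        var transfer = new TransferUtility(new AmazonS3Client());

        // Prefix the key with the instance ID so each instance gets its own
        // "directory" in the bucket, plus a timestamp so uploads don't collide.
        var key = $"{instanceId}/uwsgi-{DateTime.UtcNow:yyyyMMddHHmmss}.log";
        await transfer.UploadAsync("/var/log/uwsgi/uwsgi.log", "my-log-bucket", key);
    }
}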

Moving messages between destinations in a WebSphere Application Server ND 7 system

This relates to the management of messages in a WebSphere Application Server ND 7.0 system. We are looking for a robust tool for viewing / moving / deleting messages between (JMS) destinations. In particular, we need advice about a tool that can help us efficiently manage large numbers of messages between destinations. Please advise.
Assuming that you are using SIBus (aka WebSphere platform default messaging) as JMS provider, you may want to have a look at IBM Service Integration Bus Destination Handler.
A small note from my side. I got the tool (version 1.1.5) to work quickly (read, export to file, import from file, move), but I discovered that the tool is of limited use.
I could not find a setting that will export the full message body for a message that is queued on a JMS subscription. When I export messages from a JMS subscription, it only fetches 1024 bytes of data.
It did, however, export the full message on a regular queue destination. However, when I put the message back and exported it again, there were small differences, as evidenced by a Beyond Compare file comparison.
The tool looks promising, with scripts that can be exported, but it is necessary to test your use-case scenario.
The tool could use a revision and better testing before being put on the internet.
Hope that helps.

Resources