Which is a better approach in logging - files or DB? - multithreading

Okay, here's the scenario. I have a utility that processes tons of records, and enters information to the Database accordingly.
It works on these records in multi-threaded batches. Each such batch writes to the same log file for creating a workflow trace for each record. Potentially, we could be making close to a million log writes in a day.
Should this log be made into a database residing on another server? Considerations:
The obvious disadvantage of multiple threads writing to the same log file is that the log messages are shuffled amongst each other. In the database, they can be grouped by batch id.
Performance: which would slow down the batch processing more, writing to a local file or sending log data to a database on another server on the same network? Theoretically, the log file is faster, but is there a gotcha here?
Are there any optimizations that can be done on either approach?
Thanks.

The interesting question, should you decide to log to the database, is where do you log database connection errors?
If I'm logging to a database, I always have a secondary log location (file, event log, etc) in case there are communication errors. It really does make it easier to diagnose issues later on.
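A minimal sketch of that "database first, file second" idea, using Python's standard logging module. The class name `DbHandlerWithFallback` and the `connect` callable are made up for illustration, and sqlite3 merely stands in for whatever remote database you would really use:

```python
import logging
import sqlite3


class DbHandlerWithFallback(logging.Handler):
    """Try the database first; if that fails, write to a fallback handler."""

    def __init__(self, connect, fallback):
        super().__init__()
        self._connect = connect    # hypothetical: callable returning a DB connection
        self._fallback = fallback  # e.g. logging.FileHandler("fallback.log")

    def emit(self, record):
        try:
            # A connection per record keeps the sketch short; real code would reuse one.
            conn = self._connect()
            try:
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS log "
                    "(created REAL, level TEXT, message TEXT)"
                )
                conn.execute(
                    "INSERT INTO log VALUES (?, ?, ?)",
                    (record.created, record.levelname, record.getMessage()),
                )
                conn.commit()
            finally:
                conn.close()
        except Exception as exc:
            # Record why the database write failed, then keep the original message.
            problem = logging.makeLogRecord({
                "name": record.name,
                "levelno": logging.ERROR,
                "levelname": "ERROR",
                "msg": f"database logging failed: {exc!r}",
            })
            self._fallback.handle(problem)
            self._fallback.handle(record)
```

With this shape the connection error itself ends up in the secondary log, right next to the message that could not be stored.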

One thing that comes to mind is that you could have each thread writing to its own log file and then do a daily batch run to combine them.
If you are logging to database you probably need to do some tuning and optimization, especially if the DB will be across the network. At the least you will need to be reusing the DB connections.
Furthermore, do you have any specific needs to have the log in database? If all you need is a "grep " then I don't think you gain much by logging into database.
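For the one-file-per-thread idea, the daily combine step could look roughly like this in Python. It is only a sketch: `merge_logs` is a made-up name, and it assumes every line starts with a sortable timestamp (ISO 8601, for example) and ends with a newline:

```python
import heapq
from contextlib import ExitStack


def merge_logs(paths, out_path):
    """Merge log files whose lines start with a sortable timestamp.

    Each input must already be in time order, which holds for a file
    that was only ever appended to by a single thread.
    """
    with ExitStack() as stack:
        files = [stack.enter_context(open(p, encoding="utf-8")) for p in paths]
        out = stack.enter_context(open(out_path, "w", encoding="utf-8"))
        # heapq.merge reads the inputs lazily, so large files are not loaded whole.
        out.writelines(heapq.merge(*files))
```

Because the merge is lazy, memory use stays flat no matter how big the per-thread files get.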

I second the other answers here: it depends on what you are doing with the data.
We have two scenarios here:
The majority of the logging is to a DB since admin users for the products we build need to be able to view them in their nice little app with all the bells and whistles.
We log all of our diagnostics and debug info to file. We have no need for really "prettifying" it and TBH, we don't even often need it, so we just log and archive for the most part.
I would say if the user is doing anything with it, then log to DB; if it's for you, then a file will probably suffice.

Not sure if it helps, but there's also a utility called Microsoft LogParser that you can supposedly use to parse text-based log files and use them as if they were a database. From the website:
Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart. Most software is designed to accomplish a limited number of specific tasks. Log Parser is different... the number of ways it can be used is limited only by the needs and imagination of the user. The world is your database with Log Parser.
I haven't used the program myself, but it seems quite interesting!

Or how about logging to a queue? That way you can switch out pollers whenever you like to log to different things. It makes things like rolling over and archiving log files very easy. It's also nice because you can add pollers that log to different things, for example:
a poller that looks for error messages and posts them to your FogBugz account
a poller that looks for access violations ('x tried to access /foo/y/bar.html') to a 'hacking attempts' file
etc.
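To make the queue idea concrete, here is a rough in-process version using Python's standard `QueueHandler` and `QueueListener`. This is an assumption on my part about what "queue" means here: an external message queue would need a different transport, but the shape is the same. The helper name `start_queue_logging` is made up:

```python
import logging
import logging.handlers
import queue


def start_queue_logging(logger, *consumers):
    """Route a logger through an in-process queue.

    Worker threads only pay for a queue put; a background thread hands each
    record to every consumer handler (the "pollers"). Returns the listener
    so the caller can stop() it, which also drains the queue.
    """
    q = queue.Queue(-1)  # unbounded
    logger.addHandler(logging.handlers.QueueHandler(q))
    listener = logging.handlers.QueueListener(
        q, *consumers, respect_handler_level=True
    )
    listener.start()
    return listener
```

A consumer is just a handler, so one could be a rotating file handler that sees everything and another could be set to `logging.ERROR` and forward only errors to a bug tracker. Swapping them does not touch the code that logs.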

Database - since you mentioned multiple threads. Synchronization as well as filtered retrieval are my reasons for my answer.
See if you have a performance problem before deciding to switch to files
"Knuth: Premature optimization is the root of all evil" I didn't get any further in that book... :)

There are ways you can work around the limitations of file logging.
You can always start each log entry with a thread id of some kind, and grep out the individual thread ids. Or a different log file for each thread.
I've logged to database in the past, in a separate thread at a lower priority. I must say, queryability is very valuable when you're trying to figure out what went wrong.
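As an illustration of the thread-id prefix, Python's logging module can stamp every line with the thread name for you. The two helper names below are hypothetical; `%(threadName)s` is a standard format field:

```python
import logging


def make_thread_tagged_handler(stream):
    """Build a handler whose lines carry the name of the thread that logged them."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(threadName)s] %(levelname)s %(message)s")
    )
    return handler


def lines_for_thread(lines, thread_name):
    """The 'grep' step: keep only the lines written by one thread."""
    tag = f"[{thread_name}]"
    return [line for line in lines if tag in line]
```

If the batch threads are named after their batch id, the shuffled file can be untangled per batch with a plain text search.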

How about logging to database-file, say a SQLite database? I think it can handle multi-threaded writes - although that may also have its own performance overheads.

I think it depends greatly on what you are doing with the log files afterwards.
Of the two operations writing to the log file will be faster - especially as you are suggesting writing to a database on another server.
However if you are then trying to process and search the log files on a regular basis then the best place to do this would be a database.
If you use a logging framework like log4net, it often provides simple config-file-based ways of redirecting output to a file or database.

I like Gaius' answer. Put all the log statements in a threadsafe queue and then process them from there. For DB you could batch them up, say 100 log statements in one batch and for file you could just stream them into the file as they come into the queue.
File or Db? As many others say; it depends on what you need the log file for.
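The batching part might be sketched like this in Python. Everything here is an assumption for illustration: sqlite3 stands in for the real database, and `db_writer`, `_STOP` and the `log` table are made-up names. Worker threads put `(batch_id, message)` tuples on the queue and a single writer thread owns the connection:

```python
import queue
import sqlite3
import threading

_STOP = object()  # sentinel telling the writer to finish


def db_writer(q, db_path, batch_size=100):
    """Drain (batch_id, message) tuples from q and insert them in batches."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS log (batch_id INTEGER, message TEXT)")
    batch = []
    running = True
    while running:
        item = q.get()
        if item is _STOP:
            running = False
        else:
            batch.append(item)
        # Flush when the batch is full, the queue is momentarily empty, or on shutdown.
        if batch and (len(batch) >= batch_size or q.empty() or not running):
            with conn:  # one transaction per batch
                conn.executemany("INSERT INTO log VALUES (?, ?)", batch)
            batch.clear()
    conn.close()
```

Start it with `threading.Thread(target=db_writer, args=(q, "log.db"))` and put `_STOP` on the queue at shutdown. Since only the writer thread touches the connection, the workers never wait on the database.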

Related

What are the reasons to use a logging system/module/library?

I'm trying to evaluate the reasons to use a logging system like Winston in node.js vs just writing my own logging method. It seems like logging libraries don't really offer much.
Some logging systems (like log4j) have logging hierarchies where if you log to a.b.c it logs to a.b and a as well (unless you have other complicated stop-propagation configurations). Is this kind of stuff usually overkill? What situation would you need that for?
I'm considering just writing a logging function that writes logs to a mongo database, which I'll then be able to pretty easily query and search through. Presumably a logging library can do that, but it seems like it would be just as much work to use a library for that as to write it from scratch.
So in short: what are the benefits to using a logging system?
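For anyone unsure what the hierarchy behaviour looks like in practice, here is a small illustration. It uses Python's standard logging module rather than log4j or Winston, since Python uses the same dotted-name scheme; the `Collect` handler and the `attach` and `demo` helpers are made up for the example:

```python
import logging


class Collect(logging.Handler):
    """Handler that just remembers what it was given."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def emit(self, record):
        self.seen.append(f"{record.name}: {record.getMessage()}")


def attach(name):
    """Attach a collecting handler to the named logger and return it."""
    handler = Collect()
    logging.getLogger(name).addHandler(handler)
    return handler


def demo():
    on_a = attach("a")
    on_ab = attach("a.b")
    logging.getLogger("a").setLevel(logging.INFO)

    logging.getLogger("a.b.c").warning("disk almost full")  # reaches a.b and a
    logging.getLogger("a.b").propagate = False              # stop at a.b from now on
    logging.getLogger("a.b.c").warning("disk full")         # reaches a.b only
    return on_a.seen, on_ab.seen
```

The typical use is one catch-all destination near the root plus extra destinations for specific subsystems, without the subsystem code knowing about either.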
I don't know about log4j, and not too much about Winston; haven't used it for more than 3 minutes.
But here are the few advantages I'd like to see in a logging system:
Error levels
I must be able to specify the log level I'd like to write to. It's good to have some defaults also (warning, error, debug, etc).
Streaming
You are able to do everything you want when something gets logged: Write it to a file, write it to the database, etc. It's up to you.
Customization
I'd like to be able to:
Timestamped messages
Colored messages when writing to process.stdout (super important while developing!)
Possibility of prefixing the message with the level (for files), or with anything else (when launching various loggers within the same process). This is useful for differentiating between various levels/logger instances that write to the same stream.
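A rough sketch of those three wishes (levels, timestamps, colored and prefixed output), written with Python's standard logging module rather than Winston. `ColorFormatter` and `make_console_logger` are made-up names, and the color table is an arbitrary choice:

```python
import logging
import sys

# ANSI escape codes per level; levels not listed are printed uncolored.
COLORS = {
    logging.DEBUG: "\033[36m",    # cyan
    logging.WARNING: "\033[33m",  # yellow
    logging.ERROR: "\033[31m",    # red
}
RESET = "\033[0m"


class ColorFormatter(logging.Formatter):
    """Wrap the formatted line in a color chosen by the record's level."""

    def format(self, record):
        text = super().format(record)
        color = COLORS.get(record.levelno)
        return f"{color}{text}{RESET}" if color else text


def make_console_logger(name, level=logging.INFO):
    """Logger that prints timestamped, level-prefixed, colored lines to stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        ColorFormatter("%(asctime)s %(levelname)s [%(name)s] %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

The `[%(name)s]` prefix is what tells several logger instances apart when they share one stream.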

Event logging with distributed database for node.js (MongoDB?)

I am looking for a system or library for node.js that can log information about client access on every remote server and automatically gather that information on a central log server for later analysis. The remote servers will have write-only access, while the central server will accumulate a lot of data to read.
I hope there is solution using distributed [NoSQL] database, like MongoDB.
However I have not found how to set it up.
For example I hope that cleaning old data can be initiated on central log server (when data has been processed) and entries on old dates can be removed on remote server with little overhead.
Currently we have logging into files and Hadoop system for log analysis.
But I think we need to accumulate data in database.
Winston, currently the best logging framework for node.js, has option to log into MongoDB or CouchDB.
Scribe could be what you're looking for. There are node packages too
I have never checked it out so I'd be interested in reading your thoughts in the comments if you investigate it and find it good/bad, easy/hard to setup, etc.
MongoDB or any other distributed database will not solve the problem.
An in-house project must be created.
Some features of MongoDB for consideration:
Capped collections are actually a way to lose data. They may be good for a short history.

Setup log4net appenders to create a file per context property or context stack level?

I am using log4net for logging calls to an API. Many calls. The methods I am calling have multiple megabytes of data for request/response pairs, and it is very hard to read logs that have multiple calls written to the same file, no matter what logging pattern I use. So, I feel the best approach is to log to multiple files.
I am having a hard time figuring out how to get log4net to do this, or if it even supports it.
From the Log4Net FAQ - Can the outputs of multiple client request go to different log files?
Many developers are confronted with the problem of distinguishing the log output originating from the same class but different client requests. They come up with ingenious mechanisms to fan out the log output to different files. In most cases, this is not the right approach.
It is simpler to use a context property or stack (ThreadContext) ... Thereafter, log output will automatically include the context data so that you can distinguish logs from different client requests even if they are output to the same file.
I looked at the documentation on Contexts and Context Properties. It seemed Event Context fit best, but I tried reading docs for other Contexts too. It seems they just allow me to put additional properties that end up in my log files, rather than being a component of a log file name, or allow me to automatically append to different files.
Is there a way to configure appenders to create different files for different context properties or context stack levels, etc?
Edit:
I am using log4net via Castle Windsor Logging facility, and I'm considering switching to NLog to solve this problem.
NLog seems to support this behavior by using the ${logger} layout renderer in the File target's fileName property. I can effectively set this property by making a child logger with Windsor's ILogger.CreateChildLogger method, and setting ${logger:shortName=true}.
See:
http://nlog-project.org/forum#nabble-td1685989
I'd still prefer to use log4net if possible, since the project I am testing uses it. Maybe my NLog example can give someone inspiration on how this could be done on log4net, and maybe they can help me figure it out :)
This article may be of interest to you: Log4Net: Programmatically specify multiple loggers (with multiple file appenders)
Also, if you are only worried about readability, there may be log file viewers that can separate out log entries by thread name.
Another possibility you have is to log the entries in a database including your thread name and these entries are easily filtered using sql.

IIS 7 Logs Vs Custom

I want to log some information about my visitors. Is it better to use the IIS generated log or to create my own in an SQL 2008 db?
I know I should probably provide more information about my specific scenario, but I'd like just generally, pros and cons of either proposal.
You can add additional information to the IIS logs from ASP.NET using HttpResponse.AppendToLog. Additionally, you could use the Advanced Logging Module to create your own logs with custom filters and custom data, including data from Performance Counters, and more.
It all depends on what information you want to analyse.
If you're doing aggregations and rollups then you'd want to pull this data into a database for analysis. Pulling your data into a database will give you access to indexes and better querying tools.
If you're doing infrequent one-off simple queries then LogParser might be sufficient for your needs. However you'll be constantly scanning unindexed flat files looking for data which is I/O intensive.
But as you say, without knowing more about your specific scenario it's hard to say what would be best.

Web site logs (IIS7, hosted)

I have an ASP.NET MVC app hosted at webhost4life
What's a good way to save logs?
I have access to the FTP account I upload the site to. Should I effectively just do this:
File.AppendAllText("log.txt", "Ooops, we have an error" + e.Message);
Or is there a better way? Send e-mail? save log into a database?
I always try to log to a database and fall back on a file if the database is inaccessible (perhaps that's the cause of the exception). This allows you to run queries and reporting against the log directly and find out what the problem is immediately. You can also run a "health check" against the application by storing critical exceptions and marking them, etc.
Avoid writing to the file system; this can generate collisions/race conditions between threads that are attempting to write to the same file. Databases are wonderful solutions for this problem, and provide some nice benefits such as being able to generate reports easily from normalized data.
Also, what sort of information are you logging? The IIS logs are very detailed. Saving information that is already available in those logs duplicates work (the server writes its logs, and then you write your own), which of course incurs a performance hit.
