Jenkins build logs are too large - linux

I have some jobs in Jenkins that each create logs around 300 MB in size. The build logs are written on my Solaris M6 machine.
Fact: I cannot change the job because of a business process; it must stay as it is.
My question: how do I maintain such huge logs? Is there any way for Jenkins to zip the logs by itself and then unzip them when a user tries to read the Console Output (the log itself, via Jenkins)? Because if I zip the log manually on Solaris, it is no longer readable via Jenkins.

I found out that Hudson (Jenkins) has support for the .gz format, which means you can gzip all the logs and Jenkins will still be able to read them, which is SUPER AWESOME! That way I saved tons of GBs of storage; it shrank my 600 MB logs to just 2 MB.
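
A minimal sketch of how the gzipping could be automated from cron, assuming the standard JENKINS_HOME layout where each build directory keeps its console output in a plain-text file named "log" (the home path, glob pattern, and age threshold below are assumptions, not part of the original post):

    import gzip
    import shutil
    import time
    from pathlib import Path

    JENKINS_HOME = Path("/var/lib/jenkins")  # assumed location; adjust for your install
    MAX_AGE_DAYS = 7                         # only touch builds older than a week
    cutoff = time.time() - MAX_AGE_DAYS * 86400

    # A finished build normally keeps its console output in .../builds/<n>/log
    for log_file in JENKINS_HOME.glob("jobs/*/builds/*/log"):
        if log_file.stat().st_mtime > cutoff:
            continue  # skip recent builds that may still be running
        gz_file = log_file.with_suffix(".gz")
        with open(log_file, "rb") as src, gzip.open(gz_file, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Per the answer above, Jenkins still serves the gzipped console output.
        log_file.unlink()

Multibranch and folder-style jobs keep their builds in deeper directories, so the glob pattern would need adjusting for those.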

Related

Shipping logs from network share using Filebeat on Windows

The problem statement: I have an application running on Windows. I want to ship log files from this application to ELK, fronted by Kafka.
The challenge: this application writes a lot of process metadata to disk under one directory. That information is important for the application's recovery, so it is kept on network storage to support DR. The application also writes its logs to the same directory, and we have no way to separate the logs from the other process metadata. As a result, the logs end up on a network share.
I want to ship the logs to Elastic, and we typically use Beats for this. However, Elastic recommends against shipping logs from network storage with Filebeat on Windows. Ref: https://www.elastic.co/guide/en/beats/filebeat/7.11/filebeat-network-volumes.html. I have also read various GitHub issues and SO posts where people complain about Filebeat stopping harvesting on rollover.
Since this is a network share, I was also not able to create a symlink or a junction link to trick the application into writing the logs to the local disk.
Has anyone solved this issue?
P.S.: I also read somewhere that Logstash handles files on network shares better. However, I do not need Logstash and would like to avoid it if possible. Also, the official Logstash documentation mentions that reading files from NFS is only occasionally tested, not thoroughly tested.

How can I move catalina.out to Azure Blob Storage with log4j?

How can this be achieved? I have a catalina.out log on a prod server that is growing fast: 6.7 GB in a couple of days. My initial idea was a cron job, executed two or three days a week, that runs a script to copy the catalina log to Azure Blob Storage and then wipe it with a command like echo "" > file. But moving 2 GB to Azure every time the cron job runs doesn't seem like the best idea either; the file is way too big.
Is there a way to keep the logs on another server or in Azure storage? Where should I configure that?
I read something about using log4j with Tomcat; is it possible to have catalina.out moved to another server that way? How can I achieve this? I know the development team should also look into why this file is growing and logging so fast, but in the meantime I need a solution to implement.
thanks!!
I read something about using log4j with Tomcat; is it possible to have catalina.out moved to another server that way?
I think what you are describing is log rotation. If you want to go that way, here is a blog about how to configure it.
My initial idea was a cron job, executed two or three days a week, that runs a script to copy the catalina log to Azure Blob Storage
Yes, you could manage the log this way, but there is something to watch out for: I think you may get an error when uploading such a large file in one go. You need to split the large file into multiple blocks. In this article, under the heading Upload a file in blocks programmatically, there is a detailed description.
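
As a rough illustration of the block-upload approach in Python (assuming the azure-storage-blob v12 SDK; the connection string, container name, and file path are placeholders, and the article referenced above may use a different language or SDK):

    import os
    import uuid

    from azure.storage.blob import BlobBlock, BlobServiceClient

    CONNECTION_STRING = os.environ["AZURE_STORAGE_CONNECTION_STRING"]  # placeholder
    CONTAINER = "tomcat-logs"                                          # placeholder
    CHUNK_SIZE = 4 * 1024 * 1024                                       # 4 MB blocks

    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    blob = service.get_blob_client(CONTAINER, "catalina.out")

    block_ids = []
    with open("/opt/tomcat/logs/catalina.out", "rb") as f:  # placeholder path
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            block_id = uuid.uuid4().hex  # block ids must be unique and of equal length
            blob.stage_block(block_id=block_id, data=chunk)
            block_ids.append(BlobBlock(block_id=block_id))

    blob.commit_block_list(block_ids)  # assemble the staged blocks into the final blob

As far as I know, the SDK's upload_blob() also splits large uploads into blocks automatically, so the explicit staging above mainly matters when you want control over block size and retries.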
From your description you are not using Azure Web Apps, but if you do move to Azure Web Apps, you could also use Azure Functions or WebJobs to run the cron job.
If you still have other questions, please let me know.

Stop Spark executor logs from getting gzipped

I have a Spark job with some very long-running tasks. When the tasks start I can go to the Executors tab, see all my executors and their tasks, and click the stderr link to see the logs for those tasks, which helps a lot with monitoring. However, after a few hours the stderr link stops working; clicking it gives java.lang.Exception: Cannot find this log on the local disk..
I dug into it a bit, and the issue seems to be that something has decided to gzip the logs. That is, I can still find the log manually by ssh-ing to the worker node and looking in the right directory (e.g. /mnt/var/log/hadoop-yarn/containers/application_1486407288470_0005/container_1486407288470_0005_01_000002/stderr.gz). It's annoying that this happens, since I now can't monitor my job from the UI. Also, the files are pretty tiny, so the compression doesn't seem helpful (40k uncompressed).
It seems like a lot of things could be causing this: YARN, a log-roller cron job, the log4j config in my YARN/Spark distro, AWS (since EMR zips logs and saves them to S3), etc., so I'm hoping someone can point me in the right direction so I don't have to search a ton of docs.
I'm using AWS EMR at emr-5.3.0 without any custom bootstrap steps.
Just had a similar issue. I haven't looked into how to stop the gzipping, but you can still access the logs through the Hadoop web interface.
In the left menu, go to Tools > Local logs, then browse to find the log you are interested in.
In my case, the gzipped log the GUI pointed at was /node/containerlogs/container_1498033803655_0037_01_000001/hadoop/stderr.gz/?start=-4096, and through the Local logs menu it was at /logs/containers/application_1498033803655_0037/container_1498033803655_0037_01_000001/stderr.gz.
Hope it helps.
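
If you only need to read a rotated file on the worker node itself, the .gz can be opened directly. A minimal sketch (the container path is the example from the question; yours will differ):

    import gzip

    # Example path taken from the question; substitute your own application/container IDs.
    path = ("/mnt/var/log/hadoop-yarn/containers/"
            "application_1486407288470_0005/"
            "container_1486407288470_0005_01_000002/stderr.gz")

    with gzip.open(path, "rt", errors="replace") as f:
        for line in f:
            print(line, end="")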

Moving files from multiple Linux servers to a central Windows storage server

I have multiple Linux servers with limited storage space that create very big daily logs. I need to keep these logs but can't afford to keep them on the servers for long before they fill up. The plan is to move them to a central Windows server that is mirrored.
I'm looking for suggestions on the best way to do this. What I've considered so far is rsync, or writing a script in Python or something similar.
The ideal backup method is for the files to be copied from the Linux servers to the Windows server, verified for size/integrity, and then deleted from the Linux servers. Can rsync do that? If not, can anyone suggest a better method?
You may want to look into using rsyslog on the Linux servers to send logs elsewhere. I don't believe you can configure it to delete logged lines with a verification step, and I'm not sure you'd want to either. Instead, you might be best off with an aggressive logrotate schedule plus rsyslog.
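
If you do want the copy, verify, then delete workflow described in the question, a minimal sketch is below. It assumes the Windows share is already mounted on the Linux side (e.g. via CIFS); the directory names are placeholders:

    import hashlib
    import shutil
    from pathlib import Path

    SRC_DIR = Path("/var/log/myapp")         # placeholder: local log directory
    DST_DIR = Path("/mnt/logarchive/host1")  # placeholder: mounted Windows share

    def sha256(path: Path) -> str:
        """Hash a file in 1 MB chunks so large logs don't need to fit in memory."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    DST_DIR.mkdir(parents=True, exist_ok=True)

    for src in SRC_DIR.glob("*.log.*"):      # rotated logs only, not the live file
        dst = DST_DIR / src.name
        shutil.copy2(src, dst)
        # Only delete the original once size and checksum both match.
        if dst.stat().st_size == src.stat().st_size and sha256(dst) == sha256(src):
            src.unlink()
        else:
            dst.unlink(missing_ok=True)      # bad copy: keep the source and retry later

rsync can also do this end to end (for example with --checksum together with --remove-source-files), which is worth trying before maintaining a custom script.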

syslog: does it remove old logs when storage space runs low?

I am using syslog on an embedded Linux device (Debian on ARM) that has relatively small storage (~100 MB). If we assume the system will be up for 30 years and logs all possible activity, could syslog fill up the storage? If so, is syslog intelligent enough to remove old logs as space on the storage medium runs low?
It completely depends on how much gets logged, but if you only have ~100 MB, it's certainly likely that your storage will fill up well before 30 years!
You didn't say which syslog server you're using. If you're on an embedded device you might be using the BusyBox syslogd, or you may be using the regular syslogd, or you may be using rsyslog. But in general, no syslog server rotates log files all by itself. They all depend on external scripts run from cron to do it. So you should make sure you have such scripts installed.
In non-embedded systems, log rotation is usually provided by a package called logrotate, which is quite elaborate and has configuration files saying how and when each log file should be rotated. In embedded systems there is no standard at all. The most common configuration (especially with BusyBox) is that logs are not written to disk at all, only to an in-memory ring buffer. The next most common is idiosyncratic ad-hoc scripts built and installed by the embedded system integrator. So you just have to scan the crontabs and see whether anything configured to be invoked there looks like a log rotator.
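
For illustration, the kind of ad-hoc cron-driven rotation script the answer describes might look like the sketch below (the log path, size limit, and number of generations are made-up examples; where logrotate is available, its copytruncate option does the same thing more robustly):

    import gzip
    import shutil
    from pathlib import Path

    LOG = Path("/var/log/messages")   # placeholder log file
    KEEP = 4                          # compressed generations to keep
    MAX_BYTES = 1 * 1024 * 1024       # rotate once the file passes ~1 MB

    def rotate() -> None:
        if not LOG.exists() or LOG.stat().st_size < MAX_BYTES:
            return

        # Shift older generations: messages.3.gz -> messages.4.gz, and so on.
        # (Path.rename overwrites the target on Linux, so the oldest copy is dropped.)
        for i in range(KEEP - 1, 0, -1):
            older = LOG.with_name(f"{LOG.name}.{i}.gz")
            if older.exists():
                older.rename(LOG.with_name(f"{LOG.name}.{i + 1}.gz"))

        # Compress the current file, then truncate it in place so syslogd keeps
        # writing to the same inode. Lines written during the copy can be lost,
        # which is the same caveat logrotate's copytruncate has.
        with open(LOG, "rb") as src, gzip.open(LOG.with_name(f"{LOG.name}.1.gz"), "wb") as dst:
            shutil.copyfileobj(src, dst)
        with open(LOG, "w"):
            pass

    if __name__ == "__main__":
        rotate()   # intended to be run from cron, e.g. hourly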
