How to debug python beaver? - logstash

I have the Python beaver service running on my machine. It's configured along with logstash to push logs to a Kibana dashboard. For some reason beaver does not collect the services' logs for the first 15 minutes. I want to debug beaver but am not sure how to do so.
I tried running the command:
/usr/bin/beaver -c /etc/beaver/conf
The output I get is:
[2014-12-18 16:42:06,084] INFO Starting worker...
[2014-12-18 16:42:06,085] INFO Working...
[2014-12-18 16:42:06,092] INFO [fe01g1e15e8] - watching logfile <some-log-file>
[2014-12-18 16:42:06,092] INFO [fe01g1e15ed] - watching logfile <some-log-file>
[2014-12-18 16:42:06,093] INFO [fe01g14105c] - watching logfile <some-log-file>
[2014-12-18 16:42:06,193] INFO Starting queue consumer
The functionality works just fine. But how do I debug what happened during the first 15 minutes? Also, there are no log files for beaver.

I found that the best way to debug beaver is to work closely with its open source code.
First, run it with the --debug parameter to get more log information. Then look in the code for the parts that collect the data, or for the source of any error messages that show up in the debug output. You can also add your own print statements and temporarily replace the code on your server to understand where the problematic part might be.
From what I know, your problem can be in one of two parts: the first part consumes the data into an internal queue, and the second part takes messages from the queue and sends them using the selected transport method (in your case, to logstash).
I already have a pull request waiting for approval that adds debug-mode prints of the number of messages in the queue and the number of messages transported, which I guess can really help you understand which part is not working.
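To make the two stages concrete, here is a minimal sketch in Python of a producer that fills an internal queue and a consumer that drains it through a transport, with debug counters on both sides. This is not beaver's actual code: run_pipeline, the stats dictionary, and the transport callable are hypothetical names used only for illustration.

```python
import logging
import queue
import threading

logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")


def run_pipeline(lines, transport):
    """Push lines through a queue to transport and count both stages.

    Hypothetical illustration only; beaver's internals differ.
    """
    q = queue.Queue()
    stats = {"queued": 0, "sent": 0}
    sentinel = object()  # marks the end of input for the consumer

    def producer():
        # Stage 1: collect data into the internal queue.
        for line in lines:
            q.put(line)
            stats["queued"] += 1
            log.debug("queued=%d", stats["queued"])
        q.put(sentinel)

    def consumer():
        # Stage 2: take items off the queue and hand them to the transport.
        while True:
            item = q.get()
            if item is sentinel:
                break
            transport(item)
            stats["sent"] += 1
            log.debug("sent=%d", stats["sent"])

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats


delivered = []
result = run_pipeline(["line 1", "line 2", "line 3"], delivered.append)
```

If the queued counter keeps growing while sent stays flat, the transport side is the part to look at; if queued never moves, the problem is on the collecting side.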


Structlog different ways to log: msg versus info and debug

I see different ways to use Structlog and I was wondering what the exact difference is.
Let's say I want to log something using Structlog, you could for example use:
logger.msg("My log message")
But there are other ways to log, like info and debug (as in the standard Python logging library), which let you say something about the importance of a message (and which you can filter by log level):
logger.info("This is an info message")
logger.debug("This is a debug message")
The question is: what is the advantage of using logger.msg as compared to the other ways to log like info and debug? Why would I choose logger.msg?
msg() is a remnant from the original generic BoundLogger that tried to have both stdlib and Twisted log methods (msg() hailing from the Twisted end).
If you use structlog's internal filtering system via structlog.make_filtering_bound_logger(), it's equivalent to the info log level.
You can safely ignore it.

How can I prune executors' logs in spark streaming

I'm working on a Spark Streaming job that runs in standalone mode. By default the executors append their logs to the $SPARK_HOME/work/app_idxxxx/stderr and stdout files. The problem comes when the app runs for a long time, say a month or more, and generates a lot of logs in the stderr file. I would like to roll the stderr file daily, keep a week of files, and archive (delete) anything older. I switched the appender to org.apache.log4j.RollingFileAppender and directed the logs to a file instead of stderr, but the file doesn't respect the rolling and keeps growing.
Creating a cron job for this doesn't work either, since Spark holds a handle to that specific file and renaming it probably has no effect.
I couldn't find any documentation for these specific logs. I would really appreciate any help.
After digging more, I finally found how to resolve the issue, and I'm posting it here so that the next person doesn't have to go through all this suffering and trial and error.
The settings for those logs are in two different places. First, in $SPARK_HOME/conf/spark-defaults.conf, add these three lines on each executor:
spark.executor.logs.rolling.time.interval daily
spark.executor.logs.rolling.strategy time
spark.executor.logs.rolling.maxRetainedFiles 7
The other file that you need to change on each executor is $SPARK_HOME/conf/ and there you add the following line:
SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800
-Dspark.executor.logs.rolling.maxRetainedFiles=7 "
After these changes it started working properly. Hope this helps some people :)
If you are in standalone mode, exporting an environment variable is enough:
export SPARK_WORKER_OPTS="-Dspark.executor.logs.rolling.strategy=time -Dspark.executor.logs.rolling.time.interval=daily -Dspark.executor.logs.rolling.maxRetainedFiles=7"

python script gets killed by test for stdout

I'm writing a CGI script that is supposed to send data to a user until they disconnect, then run logging tasks afterwards.
THE PROBLEM: When the client disconnects (detected by the inability to write to the stdout buffer), the break does not execute and the logging never completes. Instead the script ends or is killed, and I cannot find any logs anywhere showing how this exit occurs.
Here is a snippet of the code:
for block in r.iter_content(262144):
    if stopRecord:
        break
    if not block:
        break
    if not sys.stdout.buffer.write(block):  # The code fails here after a client disconnects
        break
#### write data to other logs and exit gracefully ####
I have tried using "except:" as well as "except SystemExit:" but to no avail. Has anyone been able to solve this problem? (It is for a CGI script which is supposed to log when the client terminates their connection)
UPDATE: I have now tried using signal to interrupt the kill process in the script, which also didn't work. Where can I see an error log? I know exactly which line fails and under which conditions, but there is no error log or anything like I would get if I ran a script which failed in a terminal.
When you say it kills the program, do you mean the main Python process exits, and not through some thrown exception? That's kind of weird. A workaround might be to run the task in a separate thread or process, monitor it until it dies, and then execute the second task.
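One thing worth checking first: on a buffered stream such as sys.stdout.buffer, write() returns the number of bytes accepted, so the "if not ... write(block)" test never fires for a non-empty block. A closed pipe normally shows up as a BrokenPipeError, often only when the buffer is flushed. Below is a minimal sketch of a loop built on that assumption. stream_blocks and the logger name are hypothetical, and if the web server kills the CGI process with a signal that cannot be caught, no Python code will get a chance to run the cleanup.

```python
import io
import logging

log = logging.getLogger("cgi_stream")


def stream_blocks(blocks, out):
    """Write blocks to out until they run out or the reader goes away.

    Returns (bytes_sent, disconnected).
    """
    sent = 0
    disconnected = False
    try:
        for block in blocks:
            if not block:
                break
            out.write(block)
            out.flush()  # force the write so a closed pipe surfaces here
            sent += len(block)
    except (BrokenPipeError, ConnectionResetError):
        disconnected = True
    finally:
        # Post-disconnect logging goes here; it runs on both exit paths.
        log.info("sent %d bytes, disconnected=%s", sent, disconnected)
    return sent, disconnected


class FlakyOut:
    """Stand-in for a client that disconnects after two successful writes."""

    def __init__(self):
        self.writes = 0

    def write(self, data):
        self.writes += 1
        if self.writes > 2:
            raise BrokenPipeError("client went away")
        return len(data)

    def flush(self):
        pass


ok_result = stream_blocks([b"abcd"] * 5, io.BytesIO())
broken_result = stream_blocks([b"abcd"] * 5, FlakyOut())
```

In the real script, out would be sys.stdout.buffer and blocks would be r.iter_content(262144).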

Ellipsis in remote rsyslog output

I have an rsyslog daemon running on a server, receiving and aggregating messages from a number of other servers. Occasionally I see a line written that looks like the start of one message, an ellipsis (...), and the end of another, different message.
It doesn't appear to have anything to do with the length of the message, as longer messages get through without problems.
I have looked through the rsyslog documentation without success, and searching Google for ... is not useful. Have I just missed something in the documentation, or is this a bug?
The ellipsis is actually coming from the log4j syslog appender implementation. If the line is "too long", it will be truncated and an ellipsis will be written instead.

activemq start suppresses stdout/stderr

When using AMQ 5.6 and starting the broker with ./activemq start, where does the stdout/stderr go?
I expected it to show up in the /data/activemq.log file, but it doesn't. Is there a way around this, perhaps with a tweak to the log4j or JavaServiceWrapper config?
When I start in console mode using ./activemq console, the stdout/stderr messages are displayed as expected. In particular, I need to get output from e.printStackTrace() to show up in the logs when running in this mode.
It seems to just get redirected to /dev/null. I changed the /bin/activemq script to redirect to ../data/start.log instead and, sure enough, the stdout/stderr output is there. Not sure why this isn't the default behavior, to be honest.
If I remember correctly, there is another file called wrapper.log. Look for it in the same directory as wrapper.conf.
