Is there any way to split the Spark Streaming log file, so that instead of writing to a single file it rolls over at 50 MB?
I have added the configuration below in /etc/spark/conf/log4j.properties.
# Setting for Spark Log Split
log4j.rootLogger=INFO, rolling,console
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5
log4j.appender.rolling.file=/spark2-history
log4j.appender.rolling.encoding=UTF-8
log4j.logger.org.apache.spark=WARN
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.com.anjuke.dm=${dm.logging.level}
but it fails with a FileNotFound error for "/spark2-history".
I also added the property below in yarn-site.xml:
yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds=3600
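For reference, RollingFileAppender's file property must point to a log file whose parent directory already exists, not to a bare directory; /spark2-history looks like a directory, which would explain the FileNotFound error. A minimal sketch of the corrected line, assuming a hypothetical writable path /var/log/spark:
# must be a file path, not a directory; /var/log/spark is a hypothetical, pre-existing dir
log4j.appender.rolling.file=/var/log/spark/spark-streaming.log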
Related
I have a service that reads the logs from STDOUT for further analysis. There seems to have been some struggle with writing Spark logs to STDOUT; by default, log4j sends every kind of log to STDERR.
Is there a way to change this behavior?
What changes need to be made specifically to move logs from STDERR to STDOUT?
Here's what my log4j file looks like:
log4j.rootLogger=INFO, FILE
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=stderr
log4j.appender.FILE.ImmediateFlush=true
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.conversionPattern=....
log4j.appender.FILE.MaxFileSize=5248997
log4j.appender.FILE.MaxBackupIndex=10
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
When you do a spark-submit, add 2>&1 at the end of the command. This combines stderr (2) and stdout (1) into the stdout stream.
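For example, with placeholder names for the application and the downstream reader:
spark-submit --class com.example.StreamJob stream-job.jar 2>&1 | log-consumer
Here com.example.StreamJob, stream-job.jar, and log-consumer are hypothetical; the point is only that the redirection happens before the pipe, so the consumer sees both streams on its STDIN.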
Alternatively, to do it through the log4j.properties file, try adding the properties below.
# Log everything INFO and above to stdout
log4j.rootLogger=INFO,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Threshold=INFO
log4j.appender.console.Target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=[%d] %-5p %.26c [%X{testName}] [%X{akkaSource}] - %m%n
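If Spark doesn't pick up the custom file on its own, one commonly used approach is to ship it with --files and point the driver and executors at it via the extraJavaOptions confs. A sketch, where /path/to/log4j.properties and the trailing "..." (the rest of the spark-submit command) are placeholders:
spark-submit --files /path/to/log4j.properties --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" ...
The executors use the relative file: path because --files places a copy in each executor's working directory.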
I have large gzip files stored at an HDFS location:
- /dataset1/sample.tar.gz --> contains 1.csv, 2.csv, ... and so on
I would like to extract:
/dataset1/extracted/1.csv
/dataset1/extracted/2.csv
/dataset1/extracted/3.csv
.........................
.........................
/dataset1/extracted/1000.csv
Are there any HDFS commands that can extract a tar.gz file (without copying it to the local machine), or can it be done with Python/Scala Spark?
I tried using Spark, but Spark cannot parallelize reading a gzip file, and the file is very large, around 50 GB.
I want to split the gzip and use the pieces for Spark aggregations.
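As far as I know there is no plain HDFS shell command that extracts a tar.gz in place, but the archive can be streamed straight out of HDFS and its entries written back without ever touching the local filesystem. A minimal Java sketch, assuming Hadoop's FileSystem API and Apache Commons Compress are on the classpath (the class name and buffer size are my own; the paths come from the question). The read itself stays single-threaded, since gzip is not splittable, but once the individual .csv files exist they can be read in parallel:
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsTarGzExtract {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path archive = new Path("/dataset1/sample.tar.gz"); // input archive on HDFS
        Path outDir = new Path("/dataset1/extracted");      // output directory on HDFS

        // Stream the archive directly from HDFS; nothing is copied to local disk.
        try (TarArchiveInputStream tarIn =
                 new TarArchiveInputStream(new GZIPInputStream(fs.open(archive)))) {
            TarArchiveEntry entry;
            while ((entry = tarIn.getNextTarEntry()) != null) {
                if (!entry.isFile()) continue; // skip directory entries
                try (OutputStream out = fs.create(new Path(outDir, entry.getName()))) {
                    // Copy only the current entry's bytes; keep tarIn open for the next entry.
                    IOUtils.copyBytes(tarIn, out, 4096, false);
                }
            }
        }
    }
}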
Hello, we have applications that write to a file called xfiles.log. Log4j does a daily rename to xfiles.log.date, for example xfiles.log.2009-04-12. I would like to remove the '.log' and just have a date extension, so the new file would be xfiles.2009-04-12. Below is our current log4j setup. Thanks.
log4j.rootCategory=INFO, DAILY
log4j.logger.org.springframework=WARN
log4j.appender.DAILY=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DAILY.file=${LOGDIR}/xfiles.log
log4j.appender.DAILY.datePattern='.'yyyy-MM-dd
log4j.appender.DAILY.append=true
log4j.appender.DAILY.layout=org.apache.log4j.PatternLayout
log4j.appender.DAILY.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c %m %n
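The stock DailyRollingFileAppender always names the rolled file as the active file name plus the date pattern, so it cannot drop the '.log' part on rollover. One possibility, assuming the apache-log4j-extras companion jar is available, is its rolling appender with a time-based policy, which lets the active and rolled names differ. A sketch:
log4j.appender.DAILY=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.DAILY.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.DAILY.rollingPolicy.ActiveFileName=${LOGDIR}/xfiles.log
log4j.appender.DAILY.rollingPolicy.FileNamePattern=${LOGDIR}/xfiles.%d{yyyy-MM-dd}
log4j.appender.DAILY.layout=org.apache.log4j.PatternLayout
log4j.appender.DAILY.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c %m %n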
I used to have the following file under /src, but it didn't seem to be working. So I took it out (making sure to back it up first), and I am still getting a lot of debug info.
The file is:
# Log levels
# TRACE < DEBUG < INFO < WARN < ERROR < FATAL
log4j.rootLogger=ERROR
# Appender Configuration
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
# Pattern to output the caller's file name and line number
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%5p [%t] (%F:%L) - %m%n
# Rolling File Appender
log4j.appender.R=org.apache.log4j.RollingFileAppender
# Path and file name to store the log file
log4j.appender.rollingFile.File=/home/gtl/workspace/hrm_agent/log/log.out
log4j.appender.rollingFile.MaxFileSize=2MB
# Number of backup files
log4j.appender.R.MaxBackupIndex=10
# Layout for Rolling File Appender
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%
Am I doing something wrong?
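For what it's worth, a few inconsistencies stand out in the snippet above: the root logger names no appenders (log4j.rootLogger=ERROR attaches nothing, so nothing in this file ever writes anywhere, and another log4j.properties on the classpath may be winning), the file appender is declared as R while its File and MaxFileSize lines use the prefix rollingFile, and the last pattern ends in %m% instead of %m%n. A corrected sketch under those assumptions:
log4j.rootLogger=ERROR, CONSOLE, R
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%5p [%t] (%F:%L) - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=/home/gtl/workspace/hrm_agent/log/log.out
log4j.appender.R.MaxFileSize=2MB
log4j.appender.R.MaxBackupIndex=10
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
Note also that log4j looks for the file on the runtime classpath (e.g. the build output directory), not in /src itself.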
One of the irritating things with log4j is that it always wants to dump stuff to the screen. I don't need that if I'm logging to a file. I'm sure it's in how I set up the log4j.properties file. Getting all of this configuration stuff ironed out is frustrating! :-)
For a program I'm currently calling Balancer, this is how I'm doing my logger initialization. Perhaps it is wrong or something.
static Logger log = Logger.getLogger(Balancer.class);
A partial dump of my log4j.properties:
log4j.rootLogger=fatal, stdout
log4j.logger.Balancer=fatal, rollingLog
# I still don't understand how category stuff works yet
log4j.category.Balancer=info, BalancerLog
#### First appender writes to console
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p %d [%t] (%F:%L) - %m%n
#### Second appender writes to a file
# Control the maximum log file size
# Archive log files (ten backups here)
log4j.appender.rollingLog=org.apache.log4j.RollingFileAppender
log4j.appender.rollingLog.File=default.log
log4j.appender.rollingLog.MaxFileSize=10000KB
log4j.appender.rollingLog.MaxBackupIndex=10
log4j.appender.rollingLog.layout=org.apache.log4j.PatternLayout
log4j.appender.rollingLog.layout.ConversionPattern=%5p %d [%t] (%F:%L) - %m%n
log4j.appender.BalancerLog=org.apache.log4j.RollingFileAppender
log4j.appender.BalancerLog.File=Balancer.log
log4j.appender.BalancerLog.MaxFileSize=100000KB
log4j.appender.BalancerLog.MaxBackupIndex=10
log4j.appender.BalancerLog.layout=org.apache.log4j.PatternLayout
log4j.appender.BalancerLog.layout.ConversionPattern=%5p %d [%t] (%F:%L) - %m%n
I get how the rootLogger sends stuff to the stdout appender. Is there a /dev/null appender? You have to have at least one appender.
Anyway, if nothing else, my basic work-around now is to send screen output to /dev/null. BTW, my Java programs run in a scheduled batch environment (no GUI). Having to clean up spooled files (yes this is on an AS/400) is a bit of a pain, although that can be automated as well.
Is there a /dev/null appender?
Yes.
log4j.appender.devnull=org.apache.log4j.varia.NullAppender
log4j.rootLogger=fatal, devnull
I'm not sure where you got log4j.category.* from, but it's not something I've seen before; I would stick to just using appender and logger.
log4j.logger.Balancer=fatal, rollingLog, BalancerLog
would send fatal level messages for the logger named Balancer (with no package prefix) to both the rollingLog and BalancerLog appenders. If you change the logger level to info
log4j.logger.Balancer=info, rollingLog, BalancerLog
then it would send messages of level info and above to both appenders. You can't restrict it so that BalancerLog gets info and above but rollingLog gets only fatal messages on a per-logger basis, but you can set a threshold on the rollingLog appender so that it only records fatal messages (regardless of the logger they came from):
log4j.appender.rollingLog=org.apache.log4j.RollingFileAppender
log4j.appender.rollingLog.Threshold=fatal
log4j.appender.rollingLog.File=default.log
# other parameters as before
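One more knob worth knowing about here, since the original complaint was unwanted console output: loggers are additive by default, so anything the Balancer logger accepts is also handed up to the root logger's appenders (the stdout console appender above). That can be switched off per logger:
log4j.additivity.Balancer=false
With that set, Balancer's messages go only to the appenders attached to the Balancer logger itself.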