I am using version 5.1 of Active Pivot but plan to upgrade to 5.2. I would like to read data in using the CsvSource and receive real-time updates.
Introduction
This article explains a few ways to read data from Hadoop into Active Pivot. It was tested with Active Pivot 5.1 and 5.2.
In short, you have two ways to fill the gap:
Using a mounted HDFS, which makes your HDFS behave like a disk
Using Hadoop Java API
Using a mounted HDFS
Certain Hadoop distributions let you mount your HDFS easily (e.g. mounting HDFS with Cloudera CDH 5 is straightforward).
After doing so you will have a mount point on your Active Pivot server linked to your HDFS, and it will behave like a regular disk (at least for reading; writing has some limitations).
For instance, if you had CSV files on your HDFS, you would be able to use the Active Pivot CSV Source directly, as sketched below.
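Once the mount is in place, plain Java I/O is enough to read such a file. A minimal sketch, assuming a hypothetical mount point /mnt/hdfs and file name:
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Read a CSV file through the HDFS mount point as if it were a local file.
// "/mnt/hdfs" and the file name are hypothetical; use your own mount path.
try (BufferedReader reader = Files.newBufferedReader(Paths.get("/mnt/hdfs/user/quartetfs/data/trades.csv")))
{
  String line;
  while ((line = reader.readLine()) != null)
  {
    System.out.println(line);
  }
} catch (IOException e) {
  e.printStackTrace();
}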
Using Hadoop Java API
Another way is to use Hadoop Java API: http://hadoop.apache.org/docs/current/api/
A few main classes to use:
org.apache.hadoop.fs.FileSystem - Used for common operations with Hadoop.
org.apache.hadoop.conf.Configuration - Used to configure the FileSystem object.
org.apache.hadoop.hdfs.client.HdfsAdmin - Can be used to watch events (Ex: a new file added to HDFS)
Note: Watching for events is available for Hadoop 2.6.0 and higher.
For earlier Hadoop versions you could either build your own watcher or use a mounted HDFS with an existing file watcher, as in the sketch below.
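For instance, with a mounted HDFS you could rely on the standard Java WatchService to be notified of new files. A minimal sketch; the mount path is an assumption, and whether the mount actually delivers native file-system events depends on the mount implementation, so you may have to fall back to polling the directory:
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

// Watch the mounted HDFS directory for newly created files.
Path watchedDir = Paths.get("/mnt/hdfs/user/quartetfs/data"); // hypothetical mount path
try (WatchService watcher = FileSystems.getDefault().newWatchService())
{
  watchedDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
  while (true)
  {
    WatchKey key = watcher.take(); // blocks until events are available
    for (WatchEvent<?> event : key.pollEvents())
    {
      if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE)
      {
        System.out.println("New file: " + watchedDir.resolve((Path) event.context()));
      }
    }
    if (!key.reset())
    {
      break; // the directory is no longer accessible
    }
  }
} catch (Exception e) {
  e.printStackTrace();
}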
Dependencies
You will need a few Hadoop dependencies.
Beware that there can be conflicts between the Hadoop dependencies and the Active Pivot ones on JAXB.
In the following pom.xml, the solution was to exclude the JAXB artifacts from the Hadoop dependencies.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-auth</artifactId>
<version>2.6.0</version>
</dependency>
<!-- These 2 dependencies have conflicts with ActivePivotCva on Jaxb -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0</version>
<exclusions>
<exclusion>
<groupId>com.sun.xml.bind</groupId>
<artifactId>jaxb-impl</artifactId>
</exclusion>
<exclusion>
<groupId>javax.xml.bind</groupId>
<artifactId>jaxb-api</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
<exclusions>
<exclusion>
<groupId>com.sun.xml.bind</groupId>
<artifactId>jaxb-impl</artifactId>
</exclusion>
<exclusion>
<groupId>javax.xml.bind</groupId>
<artifactId>jaxb-api</artifactId>
</exclusion>
</exclusions>
</dependency>
Properties
You will need to define at least 2 properties:
Hadoop address (Ex: hdfs://localhost:9000)
HDFS path to your files (Ex: /user/quartetfs/data/)
If your cluster is secured, you will also need to work out how to access it remotely in a secure way.
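For instance, these could be externalised in a small properties file loaded at startup. A minimal sketch, where the file name and property keys are assumptions:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Load the two HDFS-related properties from a file (hypothetical file name and keys).
Properties props = new Properties();
try (InputStream in = new FileInputStream("hadoop.properties"))
{
  props.load(in);
} catch (IOException e) {
  e.printStackTrace();
}
String hdfsUri = props.getProperty("hdfs.uri", "hdfs://localhost:9000");
String hdfsDir = props.getProperty("hdfs.dir", "/user/quartetfs/data/");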
Example of reading a file from Hadoop
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Configuring
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://localhost:9000");
FileSystem hdfs = FileSystem.get(conf);
Path filePath = new Path("/user/username/input/file.txt");

// Reading line by line
try (BufferedReader bfr = new BufferedReader(new InputStreamReader(hdfs.open(filePath))))
{
  String str;
  while ((str = bfr.readLine()) != null)
  {
    System.out.println(str);
  }
}
Hadoop Source
Once you are able to read from your HDFS, you can write your Hadoop source as you would any other source.
For instance you could create a HadoopSource implementing ISource.
You could then start it in your SourceConfig, where you would retrieve your properties from your environment. A simple way to discover the files the source should process is sketched below.
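The ISource wiring itself is specific to ActivePivot, so it is not shown here, but a minimal sketch of listing the CSV files the source should process could look like this (reusing the FileSystem hdfs from the example above; the directory and *.csv pattern are assumptions):
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

// List the CSV files sitting in the HDFS data directory (hypothetical path and pattern).
Path dataDir = new Path("/user/quartetfs/data/");
FileStatus[] statuses = hdfs.globStatus(new Path(dataDir, "*.csv"));
if (statuses != null)
{
  for (FileStatus status : statuses)
  {
    if (status.isFile())
    {
      // hand the file over to the parsing / publishing logic of your source
      System.out.println("Found file: " + status.getPath());
    }
  }
}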
Watching for events (Ex: new files)
If you want to retrieve files as they are stored on HDFS, you can create another class watching for events.
An example is the following code, in which you would plug in your own methods handling certain events (here: onCreation(), onAppend()).
import java.io.IOException;
import java.util.logging.Logger;

import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.Event;
import org.apache.hadoop.hdfs.inotify.Event.AppendEvent;
import org.apache.hadoop.hdfs.inotify.Event.CreateEvent;
import org.apache.hadoop.hdfs.inotify.MissingEventsException;

private static final Logger LOGGER = Logger.getLogger("HdfsEventWatcher");

protected HdfsAdmin admin;
protected String threadName;

public void run()
{
  DFSInotifyEventInputStream eventStream;
  try
  {
    eventStream = admin.getInotifyEventStream();
    LOGGER.info(" - Thread: " + this.threadName + " - Starting to catch events.");
    while (true)
    {
      try
      {
        Event event = eventStream.take();
        // Possible event types: CREATE, APPEND, CLOSE, RENAME, METADATA, UNLINK
        switch (event.getEventType())
        {
          case CREATE:
            CreateEvent createEvent = (CreateEvent) event;
            onCreation(createEvent.getPath());
            break;
          case APPEND:
            AppendEvent appendEvent = (AppendEvent) event;
            onAppend(appendEvent.getPath());
            break;
          default:
            break;
        }
      } catch (InterruptedException e) {
        // Restore the interrupt flag and stop watching.
        Thread.currentThread().interrupt();
        return;
      } catch (MissingEventsException e) {
        e.printStackTrace();
      }
    }
  } catch (IOException e1) {
    LOGGER.severe(" - Thread: " + this.threadName + " - Failed to start the event stream.");
    e1.printStackTrace();
  }
}
What I did in my onCreation method (not shown) was to store newly created files in a concurrent queue so my HadoopSource could retrieve several files in parallel, along the lines of the sketch below.
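A minimal sketch of that hand-off between the watcher and the source (the field and method names are made up):
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Shared between the event watcher and the HadoopSource (hypothetical names).
private final Queue<String> pendingFiles = new ConcurrentLinkedQueue<>();

// Called by the watcher thread when a CREATE event is received.
protected void onCreation(String path)
{
  if (path.endsWith(".csv"))
  {
    pendingFiles.offer(path);
  }
}

// The source drains the queue, possibly from several worker threads in parallel.
public String pollNextFile()
{
  return pendingFiles.poll(); // null when nothing is pending
}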
If I have not been clear enough on certain aspects or if you have questions feel free to ask.
Related
I'm using:
Quarkus 1.6.1.Final
Vertx 3.9.1 (provided by quarkus-vertx dependency, see pom.xml below)
And I can't get the clustered EventBus working. I've followed the instructions listed here:
https://vertx.io/docs/vertx-hazelcast/java/
I've also enabled clustering in Quarkus:
quarkus.vertx.cluster.clustered=true
quarkus.vertx.cluster.port=8081
quarkus.vertx.prefer-native-transport=true
quarkus.http.port=8080
And here is my pom.xml:
<dependencies>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-resteasy</artifactId>
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-resteasy-mutiny</artifactId>
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-vertx</artifactId>
</dependency>
<dependency>
<groupId>io.vertx</groupId>
<artifactId>vertx-hazelcast</artifactId>
<version>3.9.2</version>
<exclusions>
<exclusion>
<groupId>io.vertx</groupId>
<artifactId>vertx-core</artifactId>
</exclusion>
<!-- <exclusion>-->
<!-- <groupId>com.hazelcast</groupId>-->
<!-- <artifactId>hazelcast</artifactId>-->
<!-- </exclusion>-->
</exclusions>
</dependency>
<!-- <dependency>-->
<!-- <groupId>com.hazelcast</groupId>-->
<!-- <artifactId>hazelcast-all</artifactId>-->
<!-- <version>3.9</version>-->
<!-- </dependency>-->
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-transport-native-epoll</artifactId>
<classifier>linux-x86_64</classifier>
</dependency>
</dependencies>
And the error I get is the following:
Caused by: java.lang.ClassNotFoundException: com.hazelcast.core.MembershipListener
As you can see in my pom.xml, I've also tried adding the dependency hazelcast-all:3.9 and excluding the hazelcast dependency from vertx-hazelcast:3.9.2 (both shown commented out above); that makes this error disappear, but another one comes up:
Caused by: com.hazelcast.config.InvalidConfigurationException: cvc-complex-type.2.4.a: Invalid content was found starting with element '{"http://www.hazelcast.com/schema/config":memcache-protocol}'. One of '{"http://www.hazelcast.com/schema/config":public-address, "http://www.hazelcast.com/schema/config":reuse-address, "http://www.hazelcast.com/schema/config":outbound-ports, "http://www.hazelcast.com/schema/config":join, "http://www.hazelcast.com/schema/config":interfaces, "http://www.hazelcast.com/schema/config":ssl, "http://www.hazelcast.com/schema/config":socket-interceptor, "http://www.hazelcast.com/schema/config":symmetric-encryption, "http://www.hazelcast.com/schema/config":member-address-provider}' is expected.
Am I doing something wrong or forgetting something, or is this simply a bug in Quarkus or in Vert.x?
Thanks in advance for any help.
I think the most probable reason for your issue is that you are using the quarkus-universe-bom, which enforces a version of Hazelcast (we have a Hazelcast extension there) that is not compatible with vertx-hazelcast.
Check your dependency tree with mvn dependency:tree and make sure the Hazelcast artifacts are of the version required by vertx-hazelcast (for example, mvn dependency:tree -Dincludes=com.hazelcast shows only the Hazelcast artifacts).
Another option would be to simply use the quarkus-bom, which does not enforce a Hazelcast version, and let vertx-hazelcast drag the dependency in by itself.
It seems like a bug in Quarkus and this issue is related to:
https://github.com/quarkusio/quarkus/issues/10889
Bringing this from its winter sleep...
I am looking to use Quarkus 2 + Vert.x 4 and use either the Vert.x shared data API or the Vert.x cluster manager in order to achieve an in-process, distributed cache (as opposed to an external distributed cache cluster).
What's unclear to me, also after looking at the GitHub issue linked above (which is still open), is whether I can count on these APIs working for me at this time with the versions I mentioned.
Any comments will be great!
Thanks in advance...
[UPDATE]: looks like the clustered cache works with no issues using the shared data API along with Quarkus, Vert.x, Hazelcast and the Mutiny bindings for Vert.x (all at their latest versions).
All I needed to do was set quarkus.vertx.cluster.clustered=true in the Quarkus properties file, use the vertx.sharedData().getClusterWideMap implementation for the distributed cache, and add the Gradle/Maven dependency 'io.vertx:vertx-hazelcast:4.3.1'.
In general, that's all it took for a small proof-of-concept; a sketch of the shared data usage is below.
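For reference, here is a minimal sketch of that shared data usage with the plain core API (the Mutiny bindings return Uni instead of Future); the map and key names are made up, and vertx is the clustered instance Quarkus manages once quarkus.vertx.cluster.clustered=true is set:
import io.vertx.core.Future;
import io.vertx.core.Vertx;

// Put a value into the cluster-wide map and read it back.
Future<String> readBack(Vertx vertx)
{
  return vertx.sharedData().<String, String>getClusterWideMap("my-cache")
      .compose(map -> map.put("greeting", "hello")
          .compose(v -> map.get("greeting")));
}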
thanks
I am connecting with Hive using Spark 2.x and I am running following Spark Query:
spark.sql("""DROP TABLE IF EXISTS db_name.table_name""")
spark.sql("""Create TABLE IF NOT EXISTS db_name.table_name""")
If the table doesn't exist, the first query throws a "Table does not exist" exception.
If the table exists and I run the second query first, it throws a "Table already exists" exception.
This means the IF EXISTS and IF NOT EXISTS conditions are not working.
I read somewhere that there may be a DataNucleus dependency issue, so below are the dependencies I am using for DataNucleus:
<dependency>
<groupId>org.datanucleus</groupId>
<artifactId>datanucleus-rdbms</artifactId>
<version>3.2.9</version>
</dependency>
<dependency>
<groupId>org.datanucleus</groupId>
<artifactId>datanucleus-core</artifactId>
<version>3.2.10</version>
</dependency>
<dependency>
<groupId>org.datanucleus</groupId>
<artifactId>datanucleus-api-jdo</artifactId>
<version>3.2.6</version>
</dependency>
The exact Exception is as follows
com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [varchar <-> java.math.BigDecimal]
These are the versions of Software I am using
Spark 1.5
Datastax-cassandra 3.2.1
CDH 5.5.1
The code I am trying to execute is a Spark program using the Java API; it basically reads data (CSVs) from HDFS and loads it into Cassandra tables. I am using the spark-cassandra-connector. I initially had a lot of issues with the Google Guava library conflict, which I was able to resolve by shading the Guava library and building a snapshot jar with all the dependencies.
However, I was able to load data for some files, while for others I get the codec exception. When I researched this issue I found the following threads on the same problem.
https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/yZyaOQ-wazk
After going through these discussions, what I understand is that either I am using a wrong version of the cassandra-driver, or there is still a classpath issue related to the Guava library, since Cassandra 3.0 and later versions use Guava 16.0.1 and the discussions above say a lower version of Guava might be present on the classpath.
Here is pom.xml file
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.5.0</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.10</artifactId>
<version>1.5.0-M3</version>
</dependency>
<dependency>
<groupId>org.apache.cassandra</groupId>
<artifactId>cassandra-clientutil</artifactId>
<version>3.2.1</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<relocations>
<relocation>
<pattern>com.google</pattern>
<shadedPattern>com.pointcross.shaded.google</shadedPattern>
</relocation>
</relocations>
<minimizeJar>false</minimizeJar>
<shadedArtifactAttached>true</shadedArtifactAttached>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
and these are the dependencies that were downloaded using the above pom
spark-core_2.10-1.5.0.jar
spark-cassandra-connector-java_2.10-1.5.0-M3.jar
spark-cassandra-connector_2.10-1.5.0-M3.jar
spark-repl_2.10-1.5.1.jar
spark-bagel_2.10-1.5.1.jar
spark-mllib_2.10-1.5.1.jar
spark-streaming_2.10-1.5.1.jar
spark-graphx_2.10-1.5.1.jar
guava-16.0.1.jar
cassandra-clientutil-3.2.1.jar
cassandra-driver-core-3.0.0-alpha4.jar
Above are some of the main dependencies in my snapshot jar.
Why is the CodecNotFoundException thrown? Is it because of the classpath (Guava), or the cassandra-driver (cassandra-driver-core-3.0.0-alpha4.jar for Datastax Cassandra 3.2.1), or because of the code?
Another point: all the dates I am inserting go into columns whose data type is timestamp.
Also, when I do a spark-submit I see the classpath in the logs; there are other Guava versions under the Hadoop libs. Are these causing the problem?
How do we specify a user-specific classpath when doing a spark-submit? Will that help?
Would be glad to get some pointers on these.
Thanks
Following is the stacktrace
com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [timestamp <-> java.lang.String]
at com.datastax.driver.core.CodecRegistry.notFound(CodecRegistry.java:689)
at com.datastax.driver.core.CodecRegistry.createCodec(CodecRegistry.java:550)
at com.datastax.driver.core.CodecRegistry.findCodec(CodecRegistry.java:530)
at com.datastax.driver.core.CodecRegistry.codecFor(CodecRegistry.java:485)
at com.datastax.driver.core.AbstractGettableByIndexData.codecFor(AbstractGettableByIndexData.java:85)
at com.datastax.driver.core.BoundStatement.bind(BoundStatement.java:198)
at com.datastax.driver.core.DefaultPreparedStatement.bind(DefaultPreparedStatement.java:126)
at com.cassandra.test.LoadDataToCassandra$1.call(LoadDataToCassandra.java:223)
at com.cassandra.test.LoadDataToCassandra$1.call(LoadDataToCassandra.java:1)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1027)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I also got
com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [Math.BigDecimal <-> java.lang.String]
When you call bind(params...) on a PreparedStatement, the driver expects you to provide values with Java types that map to the CQL types.
This error ([timestamp <-> java.lang.String]) is telling you that there is no codec registered that maps a Java String to a CQL timestamp. In the Java driver, the timestamp type maps to java.util.Date. So you have 2 options here:
Where the column being bound is for a timestamp, provide a Date-typed value instead of a String.
Create a codec that maps timestamp <-> String. To do so you could create a subclass of MappingCodec, as described on the documentation site, that maps String to timestamp:
public class TimestampAsStringCodec extends MappingCodec<String, Date> {
    public TimestampAsStringCodec() {
        super(TypeCodec.timestamp(), String.class);
    }

    @Override
    protected Date serialize(String value) { ... }

    @Override
    protected String deserialize(Date value) { ... }
}
You then would need to register the Codec:
cluster.getConfiguration().getCodecRegistry()
.register(new TimestampAsStringCodec());
A better solution is provided here.
The correct mappings that the driver offers out of the box for temporal types are:
DATE <-> com.datastax.driver.core.LocalDate : use getDate()
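For completeness, a minimal sketch of option 1, converting the CSV string to a java.util.Date before binding; the session, table, column names and date pattern are all assumptions:
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Bind a timestamp column with a java.util.Date instead of the raw CSV string.
void insertRow(Session session, String id, String csvDate) throws ParseException {
    // In real code, prepare the statement once and reuse it for every row.
    PreparedStatement ps = session.prepare(
        "INSERT INTO my_keyspace.my_table (id, created_at) VALUES (?, ?)"); // hypothetical table
    Date createdAt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(csvDate); // hypothetical format
    BoundStatement bound = ps.bind(id, createdAt);
    session.execute(bound);
}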
I am trying to execute a Hive query using Spark 1.5.1 in standalone mode and Hive JDBC version 1.2.0.
Here is my piece of code:
private static final String HIVE_DRIVER = "org.apache.hive.jdbc.HiveDriver";
private static final String HIVE_CONNECTION_URL = "jdbc:hive2://localhost:10000/idw";
private static final SparkConf sparkconf = new SparkConf().set("spark.master", "spark://impetus-i0248u:7077").set("spark.app.name", "sparkhivesqltest")
.set("spark.cores.max", "1").set("spark.executor.memory", "512m");
private static final JavaSparkContext sc = new JavaSparkContext(sparkconf);
private static final SQLContext sqlContext = new SQLContext(sc);
public static void main(String[] args) {
//Data source options
Map<String, String> options = new HashMap<String, String>();
options.put("driver", HIVE_DRIVER);
options.put("url", HIVE_CONNECTION_URL);
options.put("dbtable", "(select * from idw.emp) as employees_name");
DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
}
I am getting the below error at sqlContext.read().format("jdbc").options(options).load():
Exception in thread "main" java.sql.SQLException: Method not supported
at org.apache.hive.jdbc.HiveResultSetMetaData.isSigned(HiveResultSetMetaData.java:143)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:135)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:60)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
I am running spark 1.5.1 in standalone mode
Hadoop version is 2.6
Hive version is 1.2.0
Here are the dependencies that I have added to the Java project's pom.xml:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.5.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.5.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.0</version>
<exclusions>
<exclusion>
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.0</version>
</dependency>
Can anyone help me out with this?
If somebody has used Spark 1.5.1 with Hive JDBC, can you please tell me the compatible Hive version for Spark 1.5.1?
Thank you in advance..!
As far as I can tell, you're unfortunately out of luck using the JDBC connector until it's fixed upstream; the "Method not supported" in this case is not just a version mismatch: the method is explicitly not implemented in the Hive JDBC library branch-1.2, and even if you look at the Hive JDBC master branch or branch-2.0 it's still not implemented:
public boolean isSigned(int column) throws SQLException {
throw new SQLException("Method not supported");
}
Looking at the Spark callsite, isSigned is called during resolveTable in Spark 1.5 as well as at master.
That said, most likely the real reason this "issue" remains is that when interacting with Hive, you're expected to connect to the Hive metastore directly rather than mess around with JDBC connectors; see the Hive Tables in Spark documentation for how to do this. Essentially, you want to think of Spark as an equal/replacement of Hive rather than a consumer of Hive.
This way, pretty much all you do is add hive-site.xml to your Spark's conf/ directory and make sure the datanucleus jars under lib_managed/jars are available to all Spark executors, and then Spark talks directly to the Hive metastore for schema info and fetches data directly from your HDFS in a way amenable to nicely parallelized RDDs.
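For example, with Spark 1.5.x the JDBC block from the question could be replaced with a HiveContext. A minimal sketch, which additionally assumes the spark-hive_2.10 artifact is on the classpath and that hive-site.xml points at your metastore:
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

// Query Hive through the metastore instead of the Hive JDBC driver.
JavaSparkContext sc = new JavaSparkContext(sparkconf); // or reuse the JavaSparkContext from the question
HiveContext hiveContext = new HiveContext(sc.sc());
DataFrame employees = hiveContext.sql("SELECT * FROM idw.emp");
employees.show();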
I work on a project that uses Log4J. One of the requirements is to create a separate log file for each thread; this itself was an odd issue, somewhat sorted by creating a new FileAppender on the fly and attaching it to the Logger instance:
Logger logger = Logger.getLogger(<thread dependent string>);
FileAppender appender = new FileAppender();
appender.setFile(fileName);
appender.setLayout(new PatternLayout(lp.getPattern()));
appender.setName(<thread dependent string>);
appender.setThreshold(Level.DEBUG);
appender.activateOptions();
logger.addAppender(appender);
Everything went fine until we realised that another library we use - Spring Framework v3.0.0 (which uses Commons Logging) - does not play ball with the technique above: the Spring logging data is "seen" only by Appenders initialised from the log4j configuration file, not by the Appenders created at runtime.
So, back to square one.
After some investigation, I found out that the new and improved LogBack has an appender - SiftingAppender – which does exactly what we need i.e. thread level logging on independent files.
At the moment, moving to LogBack is not an option, so, being stuck with Log4J, how can I achieve SiftingAppender-like functionality and keep Spring happy as well?
Note: Spring is only used for JdbcTemplate functionality, no IOC; in order to “hook” Spring’s Commons Logging to Log4J I added this line in the log4j.properties file:
log4j.logger.org.springframework=DEBUG
as instructed here.
In Log4j2, we can now use RoutingAppender:
The RoutingAppender evaluates LogEvents and then routes them to a subordinate Appender. The target Appender may be an appender previously configured and may be referenced by its name or the Appender can be dynamically created as needed.
From their FAQ:
How do I dynamically write to separate log files?
Look at the RoutingAppender. You can define multiple routes in the configuration, and put values in the ThreadContext map that determine which log file subsequent events in this thread get logged to.
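For example, assuming a RoutingAppender in log4j2.xml that routes on a ThreadContext key named "processId" (the key, logger name and tag value below are assumptions), the per-thread code boils down to:
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.ThreadContext;

// Tag everything this thread logs so the RoutingAppender can pick the right file.
Logger log = LogManager.getLogger("worker");   // hypothetical logger name
String processId = "job-42";                   // hypothetical thread-specific tag
ThreadContext.put("processId", processId);
try {
    log.info("This message ends up in the log file routed for {}", processId);
} finally {
    ThreadContext.remove("processId");         // always clean up the thread context
}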
LogBack is accessed via the slf4j API. There is an adapter library called jcl-over-slf4j which exposes the Commons Logging interface but routes all logging to the slf4j API, which goes directly to the implementation - LogBack. If you are using Maven, here are the dependencies:
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.5.8</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>jcl-over-slf4j</artifactId>
<version>1.5.8</version>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-core</artifactId>
<version>0.9.18</version>
</dependency>
(and add the commons-logging to the exclusion list, see here)
I struggled for a while to find SiftingAppender-like functionality in log4j (we couldn't switch to logback because of some dependencies), and ended up with a programmatic solution that works pretty well, using the MDC and adding appenders at runtime:
// this can be any thread-specific string
String processID = request.getProcessID();
Logger logger = Logger.getRootLogger();
// append a new file logger if no logger exists for this tag
if(logger.getAppender(processID) == null){
try{
String pattern = "%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n";
String logfile = "log/"+processID+".log";
FileAppender fileAppender = new FileAppender(
new PatternLayout(pattern), logfile, true);
fileAppender.setName(processID);
// add a filter so we can ignore any logs from other threads
fileAppender.addFilter(new ProcessIDFilter(processID));
logger.addAppender(fileAppender);
}catch(Exception e){
throw new RuntimeException(e);
}
}
// tag all child threads with this process-id so we can separate out log output
MDC.put("process-id", processID);
//whatever you want to do in the thread
LOG.info("This message will only end up in "+processID+".log!");
MDC.remove("process-id");
The filter added above just checks for a specific process id:
import org.apache.log4j.spi.Filter;
import org.apache.log4j.spi.LoggingEvent;

public class ProcessIDFilter extends Filter {
    private final String processId;

    public ProcessIDFilter(String processId) {
        this.processId = processId;
    }

    @Override
    public int decide(LoggingEvent event) {
        Object mdc = event.getMDC("process-id");
        if (processId.equals(mdc)) {
            return Filter.ACCEPT;
        }
        return Filter.DENY;
    }
}
Hope this helps a bit.
I like to include all of the slf4j facades/re-routers/whateveryoucallthem. Also note the "provided" hack, which keeps dependencies from pulling in commons logging; previously I was using a fake empty commons logging library called version-99.0-does-not-exist.
Also see http://blog.springsource.com/2009/12/04/logging-dependencies-in-spring/
<dependencies>
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<!-- use provided scope on real JCL instead -->
<!-- <version>99.0-does-not-exist</version> -->
<version>1.1.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging-api</artifactId>
<!-- use provided scope on real JCL instead -->
<!-- <version>99.0-does-not-exist</version> -->
<version>1.1</version>
<scope>provided</scope>
</dependency>
<!-- the slf4j commons-logging replacement -->
<!-- if any package is using jakarta commons logging this will -->
<!-- re-route it through slf4j. -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>jcl-over-slf4j</artifactId>
<version>${version.slf4j}</version>
</dependency>
<!-- the slf4j log4j replacement. -->
<!-- if any package is using log4j this will re-route -->
<!-- it through slf4j. -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
<version>${version.slf4j}</version>
</dependency>
<!-- the slf4j java.util.logging replacement. -->
<!-- if any package is using java.util.logging this will re-route -->
<!-- it through slf4j. -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>jul-to-slf4j</artifactId>
<version>${version.slf4j}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>${version.slf4j}</version>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<version>${version.logback}</version>
</dependency>
</dependencies>
<properties>
<version.logback>0.9.15</version.logback>
<version.slf4j>1.5.8</version.slf4j>
</properties>
Have you looked at log4j NDC and MDC? These at least allow you to tag thread-specific data onto your logging. Not exactly what you're asking for, but it might be useful. There's a discussion here.