Error in submitting the es-injector.flux topology - stormcrawler

I set up the StormCrawler project following this Medium story https://medium.com/analytics-vidhya/web-scraping-and-indexing-with-stormcrawler-and-elasticsearch-a105cb9c02ca, but when I tried to submit es-injector.flux, I received this error:
Exception in thread "main" java.lang.IllegalArgumentException: Couldn't find a suitable constructor for class 'com.digitalpebble.stormcrawler.util.StringTabScheme' with arguments '[DISCOVERED]'.
at org.apache.storm.flux.FluxBuilder.buildObject(FluxBuilder.java:358)
at org.apache.storm.flux.FluxBuilder.buildComponents(FluxBuilder.java:421)
at org.apache.storm.flux.FluxBuilder.buildTopology(FluxBuilder.java:101)
at org.apache.storm.flux.Flux.runCli(Flux.java:158)
at org.apache.storm.flux.Flux.main(Flux.java:103)
The command that I ran is:
storm jar target/project-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local es-injector.flux
Can someone please tell me what this error means and how I can get rid of it?

The latest ES tutorial is probably a better starting point; I'd recommend using it instead.
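For background on what the message means: Flux builds each component listed in the .flux file by reflection, and this error says it found no constructor on StringTabScheme that accepts the constructorArgs given in the file ('DISCOVERED'). That would be consistent with the flux file having been written for a different StormCrawler version than the one packaged in the jar, although that is an assumption about this particular setup. The sketch below is not Flux's actual lookup code; NoArgScheme and ConstructorLookupDemo are hypothetical names used only to show how a reflective constructor lookup fails when no signature matches.

```java
import java.lang.reflect.Constructor;

/** Hypothetical stand-in for a component class that only offers a no-argument constructor. */
class NoArgScheme {
    NoArgScheme() {}
}

final class ConstructorLookupDemo {
    /** Returns true if clazz declares a constructor with exactly the given parameter types. */
    static boolean hasConstructor(Class<?> clazz, Class<?>... parameterTypes) {
        try {
            Constructor<?> constructor = clazz.getDeclaredConstructor(parameterTypes);
            return constructor != null;
        } catch (NoSuchMethodException e) {
            // No constructor with this signature: the situation the error message reports.
            return false;
        }
    }

    public static void main(String[] args) {
        // A lookup with no arguments succeeds.
        System.out.println("no-arg constructor found: "
                + hasConstructor(NoArgScheme.class));
        // A lookup with one String argument fails, because no such constructor is declared.
        System.out.println("String-arg constructor found: "
                + hasConstructor(NoArgScheme.class, String.class));
    }
}
```

If the same kind of mismatch is the cause here, aligning the flux file with the StormCrawler version in the pom (for example by starting from the current tutorial, as suggested above) should make the constructor arguments match again.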

Related

Exception while running the UploadJars utility in OIM

When I run the UploadJars utility and supply all the parameters, it fails with the following exception:
Logging configuration class "oracle.core.ojdl.logging.LoggingConfiguration" failed
java.lang.ClassNotFoundException: oracle.core.ojdl.logging.LoggingConfiguration
Error occurred in performing the operation:
Exception in thread "main" java.lang.NullPointerException
at oracle.iam.platformservice.utils.JarUploadUtility.main(JarUploadUtility.java:232)
Any help will be appreciated :)
We had the same problem.
We added
ORACLE_COMMON/modules/oracle.odl_11.1.1/ojdl.jar
(where ORACLE_COMMON is your oracle_common directory) to the start of our CLASSPATH and the error went away.
I think UploadJars.sh worked anyway; it was only reporting that it could not log what it was doing.
Note that you might need to use UpdateJars.sh if an earlier version of the file already exists. You can check whether the upload/update succeeded by looking at the date in OIMHOME_JARS.UPDATED_ON.
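Since the fix depends on ojdl.jar being at the start of the CLASSPATH, it can help to print the classpath entries in order. The following is a minimal sketch; ClasspathOrder and its methods are hypothetical helper names, and it only reports the classpath of the JVM it runs in, so it is only meaningful if launched with the same CLASSPATH that the utility uses.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ClasspathOrder {
    /** Splits a classpath string into its entries, keeping order and skipping empty entries. */
    static List<String> entries(String classpath, String separator) {
        List<String> result = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    /** Returns the index of the first entry ending with jarName, or -1 if there is none. */
    static int indexOfJar(List<String> entries, String jarName) {
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i).endsWith(jarName)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String classpath = System.getProperty("java.class.path", "");
        List<String> entries = entries(classpath, File.pathSeparator);
        for (int i = 0; i < entries.size(); i++) {
            System.out.println(i + ": " + entries.get(i));
        }
        // Index 0 means the jar is first on the classpath; -1 means it is absent.
        System.out.println("ojdl.jar position: " + indexOfJar(entries, "ojdl.jar"));
    }
}
```

Run it as, for example, java -cp "$CLASSPATH:." ClasspathOrder after compiling it in the current directory.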

Spark 1.4 image for Google Cloud?

With bdutil, the latest tarball I can find is for Spark 1.3.1:
gs://spark-dist/spark-1.3.1-bin-hadoop2.6.tgz
There are a few new DataFrame features in Spark 1.4 that I want to use. Is there any chance a Spark 1.4 image will be made available for bdutil, or is there a workaround?
UPDATE:
Following the suggestion from Angus Davis, I downloaded spark-1.4.1-bin-hadoop2.6.tgz and pointed to it. The deployment went well; however, I ran into an error when calling SqlContext.parquetFile(). I cannot explain why this exception is possible, since GoogleHadoopFileSystem should be a subclass of org.apache.hadoop.fs.FileSystem. I will continue to investigate this.
Caused by: java.lang.ClassCastException: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem cannot be cast to org.apache.hadoop.fs.FileSystem
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2595)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:354)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:112)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:144)
at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:504)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:523)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:397)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:356)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4944)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:171)
I asked a separate question about the exception here.
UPDATE:
The error turned out to be a Spark defect; resolution/workaround provided in the above question.
Thanks!
Haiying
If a local workaround is acceptable, you can copy spark-1.4.1-bin-hadoop2.6.tgz from an Apache mirror into a bucket that you control. You can then edit extensions/spark/spark-env.sh and change SPARK_HADOOP2_TARBALL_URI='<your copy of spark 1.4.1>' (make certain that the service account running your VMs has permission to read the tarball).
Note that I haven't done any testing to see if Spark 1.4.1 works out of the box right now, but I'd be interested in hearing your experience if you decide to give it a go.

Practical JXTA II examples: exception when NetManager.startNetwork() is executed

I have been trying to run the examples from Practical JXTA II, but I am getting this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/jboss/netty/channel/ChannelFactory
The exception is thrown when this line is executed:
PeerGroup TheNetPeerGroup = NetManager.startNetwork();
I used jxse-2.6 to run the examples and followed all the steps mentioned in the book.
Where can I find the jxse-2.7.jar file? I could not find it.
I am running the examples in NetBeans and have included netty-3.1.5.GA.jar in the library.
You need to add Netty to your classpath. You can get it from the netty.io website.
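A NoClassDefFoundError at this point means the JVM could not load org.jboss.netty.channel.ChannelFactory at run time, even if a Netty jar was added to the project. A quick way to confirm whether the class is actually visible is to try loading it by name. This is a minimal sketch; ClassPresenceCheck is a hypothetical helper, and it has to be run with the same classpath as the failing example for the result to mean anything.

```java
final class ClassPresenceCheck {
    /** Returns true if the named class can be loaded by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate and load the class without running its static initializers.
            Class.forName(className, false, ClassPresenceCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the exception in the question; pass another name as an argument.
        String name = args.length > 0 ? args[0] : "org.jboss.netty.channel.ChannelFactory";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

If this prints false when run from the same NetBeans project, the jar is most likely attached to the compile-time libraries only, or to a different project or run configuration.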

Error running Cassandra word count example

I am trying to run the Cassandra word count example in Eclipse. I have loaded all the requisite jar files, but I am still getting some errors in the file CassandraDemonThread.java:
TNonblockingServer.Args serverArgs = new TNonblockingServer.Args(serverTransport)
        .inputTransportFactory(inTransportFactory)
        .outputTransportFactory(outTransportFactory)
        .inputProtocolFactory(tProtocolFactory)
        .outputProtocolFactory(tProtocolFactory)
        .processor(processor);
It fails with the compilation error: TNonblockingServer.Args cannot be resolved to a type
Can somebody tell me whether I am missing a file that needs to be linked?
Thanks for the help.
Sounds like you don't have lib/*.jar on your runtime classpath, or less likely you have an old Thrift jar somewhere else that's getting used instead of the right one.
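To tell the two cases apart (no Thrift jar at all versus an older one shadowing the right one), you can ask the JVM which jar a class was actually loaded from. The sketch below is only an illustration; WhichJar is a hypothetical helper, and the default class name assumes the nested Args class from the question lives in org.apache.thrift.server.

```java
import java.security.CodeSource;

final class WhichJar {
    /** Returns the location (jar or directory) a class was loaded from, if it is known. */
    static String locationOf(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes from the JDK itself usually have no code source.
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Assumed binary name of the nested class from the question; override via argument.
        String name = args.length > 0
                ? args[0]
                : "org.apache.thrift.server.TNonblockingServer$Args";
        try {
            Class<?> clazz = Class.forName(name, false, WhichJar.class.getClassLoader());
            System.out.println(name + " loaded from " + locationOf(clazz));
        } catch (ClassNotFoundException | LinkageError e) {
            System.out.println(name + " could not be loaded: " + e);
        }
    }
}
```

If the outer TNonblockingServer class loads but the nested Args class does not, that would point to an older Thrift jar being picked up first; if neither loads, the jars under lib/ are probably not on the classpath at all.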

TaskMemoryManager is disabled

I am trying to run the TaskTracker on Cygwin, but the following error occurs:
mapred.TaskTracker: Process Tree implementation is missing on this system. TaskMemoryManager is disabled.
Everything else (i.e. NameNode, SecondaryNameNode, JobTracker and DataNode) works properly under Cygwin; the issue is only with the TaskTracker. I am using Hadoop version hadoop-19.0.1.
How do I get rid of this error? If anybody knows, please help!
Your help will be appreciated!
I haven't encountered this specific problem, but:
Make sure that you are using the same Hadoop version that is in use on the cluster.
Update Hadoop to a more recent version if possible.
The following patches may (or may not) address your problem:
https://issues.apache.org/jira/browse/HADOOP-6230
https://issues.apache.org/jira/browse/MAPREDUCE-834

Resources