STREAM keyword in a Pig script that runs in Amazon MapReduce

I have a Pig script that invokes another Python program.
I was able to make this work in my own Hadoop environment, but it always fails when I run the script on the Amazon Elastic MapReduce web service.
The log says:
org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: '' failed with exit status: 127
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:347)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:288)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:260)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:142)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:321)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2216)
Any ideas?

Have you made sure that the script is sent along to the Elastic MapReduce job?

Problem solved!
All I needed was to use the CACHE('s3://') option when defining the streaming command.
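For reference, a minimal sketch of what that can look like (the bucket, paths, and script name are made-up examples, and the script is assumed to be executable with a proper shebang line):

-- Ship the Python script to every task node through the distributed cache,
-- so the STREAM command can find it (exit status 127 means "command not found").
DEFINE my_stream `my_script.py` CACHE('s3://my-bucket/scripts/my_script.py#my_script.py');
raw = LOAD 's3://my-bucket/input/' USING PigStorage('\t');
processed = STREAM raw THROUGH my_stream;
STORE processed INTO 's3://my-bucket/output/';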

Related

Want to run the ccloud command to create a topic using Python

I want to run the command ccloud create topic mytopic from Python.
You can create topics using an actual Python Kafka client, so you shouldn't need this command.
But assuming you still want to use it, you'd use the subprocess module to run exactly that string, as in the sketch below.
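For example, a rough sketch using the subprocess module (the exact ccloud subcommand syntax depends on your CLI version, so treat the argument list as a placeholder):

import subprocess

# Run the ccloud CLI exactly as you would type it in a shell
# (assumes the ccloud binary is on PATH and you are already logged in).
result = subprocess.run(
    ["ccloud", "create", "topic", "mytopic"],
    capture_output=True,
    text=True,
)

print("exit code:", result.returncode)
print(result.stdout)
print(result.stderr)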

AzCopy - what are the return values?

I use AzCopy in a .CMD batch file, and I need to know whether it exited successfully or errored out. Specifically, when the transfer is interrupted, I need to restart the command.
What are the exit codes for AzCopy?
OK, it's not implemented yet, but hopefully it will be some day. Track this issue: https://github.com/Azure/azure-storage-azcopy/issues/289
For AzCopy v10 (the latest), AzCopy prints the strings UPLOADFAILED, COPYFAILED, and DOWNLOADFAILED for failures, along with the paths and the error reason. You can find more details in the GitHub issue linked above.
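Until real exit codes exist, one workaround in a .CMD file is to scan AzCopy's output for those failure markers and retry; a rough sketch (the source path, destination URL, and SAS token are placeholders):

@echo off
rem Retry the transfer while AzCopy v10 reports failed files in its output.
:retry
azcopy copy "C:\data\*" "https://myaccount.blob.core.windows.net/mycontainer?%SAS%" > azcopy_out.txt 2>&1
findstr /C:"UPLOADFAILED" /C:"COPYFAILED" /C:"DOWNLOADFAILED" azcopy_out.txt > nul
if %ERRORLEVEL%==0 (
    echo Some transfers failed, restarting...
    goto retry
)
echo Transfer finished without reported failures.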

How to clear the console in JMeter using Groovy

I have a question about the printing I do in JMeter. I use System.out.println and the console fills up with data.
The problem is that I want to clear the console on each iteration; how can I do that?
The console is full of data from the last run, or from the previous thread, and I just want to clear it.
Regards. (It is the black console that opens in a new window.)
You should log messages using log.info("Hello World"); (log is a script global variable) instead of using OUT in your scripts. That way the messages go to the jmeter.log file.
See the documentation on the JSR223 Sampler for more information.
For Windows you can use something like:
new ProcessBuilder('cmd', '/c', 'cls').inheritIO().start().waitFor()
For other operating systems it depends on the shell implementation; you will need to amend the command line accordingly, most likely using the clear command (see the sketch below).
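For instance, on Linux or macOS the equivalent might look like this (a sketch, assuming a POSIX shell and the clear utility are available in the terminal JMeter was started from):

// Hypothetical non-Windows counterpart of the Windows snippet above:
new ProcessBuilder('sh', '-c', 'clear').inheritIO().start().waitFor()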
References:
ProcessBuilder documentation
Groovy is the New Black

Error while inserting rows into Kudu using the Spark shell

I am new to Apache Kudu. I installed it on my Ubuntu system and later created a table in it using the Apache Spark shell. Now I am trying to insert data into that table using insertRows(), with the command below:
kuduContext.insertRows(customersDF, "spark_kudu_tbl")
where customersDF is a DataFrame and spark_kudu_tbl is a table in the Kudu database. I am getting the error below:
java.lang.NoSuchMethodError: org.apache.kudu.spark.kudu.KuduContext.insertRows(Lorg/apache/spark/sql/Dataset;Ljava/lang/String;)V
... 70 elided
I have tried different options, but none of them worked. Can anyone suggest a solution?
From the error message it appears that you are using the wrong kudu-spark artifact; you should use kudu-spark2_2.11. Please start your spark-shell as below (replace the last part with your Kudu version):
spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.3.0
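Once the shell starts with the matching artifact, a minimal sketch of the insert could look like this (the Kudu master address and the example data are made up, and the KuduContext constructor differs slightly between Kudu versions):

import org.apache.kudu.spark.kudu._

// Point the context at your Kudu master; on some older kudu-spark versions
// the constructor takes only the master address string.
val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)

// A tiny example DataFrame whose schema must match the Kudu table.
val customersDF = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")

// Insert the rows into the existing Kudu table.
kuduContext.insertRows(customersDF, "spark_kudu_tbl")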

Printing Spark messages in YARN and cluster mode

I need to print some messages when running Spark in YARN mode. Obviously println(message) doesn't work... I want to find a way to collect the message. Can someone point me to the right method, for example using collect?
How do I use collect?
Does the code below work?
val c = message.collect()
println(c)
You can achieve this with the command below:
message.foreach(println)
You will find the output of that call in the executor logs (see the sketch below).
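To make the difference concrete, here is a short sketch (message is assumed to be an RDD or Dataset of strings):

// Option 1: collect() brings all elements back to the driver, so println runs
// there and the output shows up in the driver / YARN application log.
val collected = message.collect()
collected.foreach(println)

// Option 2: foreach(println) runs on the executors, so the output is written
// to the individual executor logs (visible through the Spark/YARN web UI).
message.foreach(println)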
