I have written a simple Spark Streaming custom receiver, but I am having trouble processing the stream. The data is received, but it is not processed by Spark Streaming.
public class JavaCustomReceiver extends Receiver<String> {
private static final Pattern SPACE = Pattern.compile(" ");
public static void main(String[] args) throws Exception {
// if (args.length < 2) {
// System.err.println("Usage: JavaCustomReceiver <hostname> <port>");
// System.exit(1);
// }
// StreamingExamples.setStreamingLogLevels();
LogManager.getRootLogger().setLevel(Level.WARN);
Log log = LogFactory.getLog("EXECUTOR-LOG:");
// Create the context with a 10 second batch interval
SparkConf sparkConf = new SparkConf().setAppName("JavaCustomReceiver").setMaster("local[4]");
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new Duration(10000));
// Create an input stream with the custom receiver on target ip:port and count the
// words in input stream of \n delimited text (eg. generated by 'nc')
JavaReceiverInputDStream<String> lines = ssc.receiverStream(
new JavaCustomReceiver("localhost", 9999));
// JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(SPACE.split(""))).iterator();
JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(SPACE.split(x)).iterator());
words.foreachRDD( x-> {
x.collect().stream().forEach(n-> System.out.println("item of list: "+n));
});
words.foreachRDD( rdd -> {
if (!rdd.isEmpty()) System.out.println("RDD is not empty"); });
JavaPairDStream<String, Integer> wordCounts = words.mapToPair(s -> new Tuple2<>(s, 1))
.reduceByKey((i1, i2) -> i1 + i2);
wordCounts.count();
System.out.println("WordCounts == " + wordCounts);
wordCounts.print();
log.warn("This is a test message");
log.warn(wordCounts.count());
ssc.start();
ssc.awaitTermination();
}
// ============= Receiver code that receives data over a socket
// ==============
String host = null;
int port = -1;
public JavaCustomReceiver(String host_, int port_) {
super(StorageLevel.MEMORY_AND_DISK_2());
host = host_;
port = port_;
}
@Override
public void onStart() {
// Start the thread that receives data over a connection
new Thread(this::receive).start();
}
@Override
public void onStop() {
// There is nothing much to do as the thread calling receive()
// is designed to stop by itself if isStopped() returns false
}
/** Create a socket connection and receive data until receiver is stopped */
private void receive() {
try {
Socket socket = null;
BufferedReader reader = null;
try {
// connect to the server
socket = new Socket(host, port);
reader = new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
// Until stopped or connection broken continue reading
String userInput;
while (!isStopped() && (userInput = reader.readLine()) != null) {
System.out.println("Received data '" + userInput + "'");
store(userInput);
}
} finally {
Closeables.close(reader, /* swallowIOException = */ true);
Closeables.close(socket, /* swallowIOException = */ true);
}
// Restart in an attempt to connect again when server is active
// again
restart("Trying to connect again");
} catch (ConnectException ce) {
// restart if could not connect to server
restart("Could not connect", ce);
} catch (Throwable t) {
restart("Error receiving data", t);
}
}
}
Here are my logs. You can see the test data being received, but after that I don't see the contents of the result being printed at all.
I have set the master to local[2] and local[4], but neither works.
Received data 'testdata'
17/10/04 11:43:14 INFO MemoryStore: Block input-0-1507131793800 stored as values in memory (estimated size 80.0 B, free 912.1 MB)
17/10/04 11:43:14 INFO BlockManagerInfo: Added input-0-1507131793800 in memory on 10.99.1.116:50088 (size: 80.0 B, free: 912.2 MB)
17/10/04 11:43:14 WARN BlockManager: Block input-0-1507131793800 replicated to only 0 peer(s) instead of 1 peers
17/10/04 11:43:14 INFO BlockGenerator: Pushed block input-0-1507131793800
17/10/04 11:43:20 INFO JobScheduler: Added jobs for time 1507131800000 ms
17/10/04 11:43:20 INFO JobScheduler: Starting job streaming job 1507131800000 ms.0 from job set of time 1507131800000 ms
17/10/04 11:43:20 INFO SparkContext: Starting job: collect at JavaCustomReceiver.java:61
17/10/04 11:43:20 INFO DAGScheduler: Got job 44 (collect at JavaCustomReceiver.java:61) with 1 output partitions
17/10/04 11:43:20 INFO DAGScheduler: Final stage: ResultStage 59 (collect at JavaCustomReceiver.java:61)
17/10/04 11:43:20 INFO DAGScheduler: Parents of final stage: List()
17/10/04 11:43:20 INFO DAGScheduler: Missing parents: List()
17/10/04 11:43:20 INFO DAGScheduler: Submitting ResultStage 59 (MapPartitionsRDD[58] at flatMap at JavaCustomReceiver.java:59), which has no missing parents
17/10/04 11:43:20 INFO MemoryStore: Block broadcast_32 stored as values in memory (estimated size 2.7 KB, free 912.1 MB)
17/10/04 11:43:20 INFO MemoryStore: Block broadcast_32_piece0 stored as bytes in memory (estimated size 1735.0 B, free 912.1 MB)
17/10/04 11:43:20 INFO BlockManagerInfo: Added broadcast_32_piece0 in memory on 10.99.1.116:50088 (size: 1735.0 B, free: 912.2 MB)
17/10/04 11:43:20 INFO SparkContext: Created broadcast 32 from broadcast at DAGScheduler.scala:1012
17/10/04 11:43:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 59 (MapPartitionsRDD[58] at flatMap at JavaCustomReceiver.java:59)
17/10/04 11:43:20 INFO TaskSchedulerImpl: Adding task set 59.0 with 1 tasks
17/10/04 11:43:20 INFO TaskSetManager: Starting task 0.0 in stage 59.0 (TID 60, localhost, partition 0, ANY, 5681 bytes)
17/10/04 11:43:20 INFO Executor: Running task 0.0 in stage 59.0 (TID 60)
Found the answer here: Why doesn't the Flink SocketTextStreamWordCount work?
I changed my program to save the output to a text file instead (sketched below), and it was saving the streaming data perfectly.
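As a rough illustration, here is a minimal sketch of that change, assuming the wordCounts stream from the code above; the output path prefix is hypothetical and not from the original post:
// Write each micro-batch to a timestamped output directory instead of print()
wordCounts.foreachRDD((rdd, time) -> {
    if (!rdd.isEmpty()) {
        rdd.saveAsTextFile("wordcounts-" + time.milliseconds()); // illustrative path prefix
    }
});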
Thanks
Adi
I am trying to send a JSON object that contains a byte[] from a Chrome extension to a console application using native messaging.
Some of the calls work without any problems, but I run into an issue when I try to send a byte array.
The behavior is the following:
The Chrome extension creates a JSON object that contains a list and a byte[].
The native app receives it but then hangs at
reader.Read(buffer, 0, buffer.Length);
I will post a version of the code here that I hope helps.
Native App.
Program.cs
public static void Main(string[] args)
{
m_Logger = LogManager.Setup().GetCurrentClassLogger();
m_Logger.Info("App Started!");
try
{
JObject command = Read();
if (command != null)
{
var ceCommand = command.ToObject<CommandModel>();
if (ceCommand == null)
{
m_Logger.Warn("Could not deserialize command from extention");
return;
}
//do other things
}
}
catch (Exception ex)
{
throw;
}
}
public static JObject Read()
{
try
{
m_Logger.Debug("Read started");
var stdin = Console.OpenStandardInput();
m_Logger.Debug("StDin initialized");
var length = 0;
var lengthBytes = new byte[4];
stdin.Read(lengthBytes, 0, 4);
length = BitConverter.ToInt32(lengthBytes, 0);
m_Logger.Debug($"Message length: {length}");
var buffer = new char[length];
m_Logger.Debug($"Buffer lenght: {buffer.Length}");
using (var reader = new StreamReader(stdin))
{
m_Logger.Debug("Reading from stream...");
while (reader.Peek() >= 0)
{
m_Logger.Debug("We are in reading process...please wait");
int result = reader.Read(buffer, 0, buffer.Length);
m_Logger.Debug($"Read {result} numer of chars");
}
m_Logger.Debug("Read finished");
}
var stringMessage = new string(buffer);
m_Logger.Info($"Recieved from CE {stringMessage}");
return JsonConvert.DeserializeObject<JObject>(stringMessage);
}
catch (Exception ex)
{
m_Logger.Error(ex, $"Error at reading input data");
return null;
}
}
public static void Write(JToken data)
{
var json = new JObject();
json["data"] = data;
var bytes = System.Text.Encoding.UTF8.GetBytes(json.ToString(Formatting.None));
var stdout = Console.OpenStandardOutput();
stdout.WriteByte((byte)((bytes.Length >> 0) & 0xFF));
stdout.WriteByte((byte)((bytes.Length >> 8) & 0xFF));
stdout.WriteByte((byte)((bytes.Length >> 16) & 0xFF));
stdout.WriteByte((byte)((bytes.Length >> 24) & 0xFF));
stdout.Write(bytes, 0, bytes.Length);
stdout.Flush();
}
/// <summary>
/// Decodes the first count bytes of the array as UTF-8 and returns the resulting string.
/// </summary>
public static string ByteToString(byte[] array, int count)
{
char[] chars = Encoding.UTF8.GetChars(array, 0, count);
return new string(chars);
}
I deduced that the problem is at reader.Read(buffer, 0, buffer.Length); based on logs that look like this:
2022-11-17 16:34:14.9604 INFO App Started!
2022-11-17 16:34:15.0837 DEBUG Read started
2022-11-17 16:34:15.0837 DEBUG StDin initialized
2022-11-17 16:34:15.0837 DEBUG Message length: 1862
2022-11-17 16:34:15.0837 DEBUG Buffer lenght: 1862
2022-11-17 16:34:15.0837 DEBUG Reading from stream...
2022-11-17 16:34:15.0837 DEBUG We are in reading process...please wait
2022-11-17 16:34:15.0837 DEBUG Read 1862 numer of chars
2022-11-17 16:34:44.2786 DEBUG Read finished
2022-11-17 16:34:44.2787 DEBUG Message length: 0
2022-11-17 16:34:44.2787 DEBUG Buffer lenght: 0
2022-11-17 16:34:44.2786 INFO Recieved from CE {"CommandId":2,"CommandValue":"{\"ImportCertificates\":[],\"byteArray\":\"MIIKWAIBAzCCChQGCSqGSIb3DQEHAaCCCgUEggoBMIIJ/TCCBe4GCSqGSIb3DQEHAaCCBd8EggXbMIIF1zCCBdMGCyqGSIb3DQEMCgECoIIE7jCCBOowHAYKKoZIhvcNAQwBAzAOBAjslY0EWPiyRwICB9AEggTItTgbudKRaEBCdOn1rMgGUoGjVIOxTgT/5Oo5D0gfBe/0yorvQKenVimcfDmWCXZGlsvou5Km7g3yjmK7PTZh3IUuX86bJeyECrJBZa+4ZXyrbkVV7R2GEwy99ACOkevHxBZ7H6deVKdRFDsCSC+cdLpsSGoNNi3Coowqw8i7lzXDPzph64L2Rre7cky/QJdkXKIEUGjxKYUR4cpOOmlLXfbQMR3fOChT5FrxbOnTRTAvVELtwFTh8Gxq55Et8rLgIktQ2eL8FIx43sGspukhZvm5bH05aEwVz5df0T3e3cMsZ13t4oc72phNKx9HiZmPvmi4lUdBwn6qXEJSRg+P5MWGD41StCAoZ/pDtMTPSV5DalLpDpKF6XfxByOSXfn27KYWFZoiEdxCmabS7eomqeD03uWEzWbgPW1hha69Bg1DcmSYZgDfISugtow6p+ozQSjaNnELlbz1SsiBMPbkoNj944IxSKkHgfiqwOyQm3JqHGjDH6Hp9OB3RyuQHoeWLrW9ulmWvnT/Dn+LC0T26YHeVeTO30U86r9ElBiv4iXa5JdflWqFecbjQC1Xs29yBkkFRNrfgJ4Txd/xOz0rhzFxZmcIwiJsQgh+WxSWLG3em02xmtyO7Rcxc6g+onVpEFXKiaYDqTWNDKE2wDSDPZauheyiiluj42gi5A086rSjuqqyidzWQb6FKYs73ZUM8drnYK36RT7zG8TFpV3RdrXbnui8eFRZ4bFHNlptHTwvqpZY/sDDP1R1q1EgrckMbZhOpHhKW/1GRd95eHlSGJPdTROtR3uEARZD0+1zBarw68p1k650DWua88cMrOQPKH1X8Ce++KIdFG2RtfI2P3lhPsD0gdpet0gDEC1peUXfrZVQYD3G8IfEjpYgDpgPa6zjblScidem628MPz4dnTnWrp1S7P2ILlGYinb9uCzSR17IvPoVITQvXk9KpTp1uzuWTnRQHMfUyNvgjHSAOwt3bmx5kEyC/04ujhUvQkiyPMCrDtiO/fLYkrWuqOLNr74l/VAJYU06YR7yNMfG1kEpRYkf96v0s3XK1tZEQBt26lZFPIMl/NOHYpbCfxCGZkpOGIFDojjo72p4VUuyGD/MZ41B99jcNVB2/Sqmc9XwALl/jjkJz6+aCmvLUePeLmhg2Bxrjkmf8iDfDnKByXM5lam3D\"}","SessionKey":"QXCUUKNB0T"}
2022-11-17 16:34:44.2787 DEBUG Reading from stream...
2022-11-17 16:34:44.2787 DEBUG Read finished
2022-11-17 16:34:44.2787 INFO Recieved from CE
However, this is not a problem if the byteArray property is smaller in size:
2022-11-17 16:32:49.7558 INFO App Started!
2022-11-17 16:32:49.8809 DEBUG Read started
2022-11-17 16:32:49.8809 DEBUG StDin initialized
2022-11-17 16:32:49.8809 DEBUG Message length: 546
2022-11-17 16:32:49.8809 DEBUG Buffer lenght: 546
2022-11-17 16:32:49.8809 DEBUG Reading from stream...
2022-11-17 16:32:49.8809 DEBUG We are in reading process...please wait
2022-11-17 16:32:49.8809 DEBUG Read 546 numer of chars
2022-11-17 16:32:49.8809 DEBUG Read finished
2022-11-17 16:32:49.8809 INFO Recieved from CE {"CommandId":2,"CommandValue":"{\"ImportCertificates\":[],\"byteArray\":\"test/wqewq+./qweqeq//eqwq\"}","SessionKey":"0SZFlRVrQR"}
As you can see from these logs, the app goes through the full flow, starting, reading, and parsing my message, without a problem.
I suspected it was something related to the size of the array, but from what I've read you can send up to 1 MB of data in one native messaging call, and the data I want to send is about 4 KB.
Not even close to the max allowed size.
Any ideas on what I am doing wrong?
I will also post the JavaScript code that sends this, but I don't believe the problem lies in the JavaScript code.
export const syncCertificates = async () => {
var requestToHost = await api.getInfo();
var responsefromHost = await nativeService.syncCertificates(requestToHost);
return responsefromHost;
}
export const syncCertificates = async (certs) => {
var command = commands[1];
await sendNativeMessageV2(command.Id, certs);
}
async function sendNativeMessageV2(command, commandValue) {
commandInProgress = command;
var sessionKey = await storage.getSessionKey();
var message =
{
"CommandId": command,
"CommandValue": JSON.stringify(commandValue),
"SessionKey": sessionKey
}
return new Promise((resolve, reject) => {
chrome.runtime.sendNativeMessage('com.allianz.usercertificateautoenrollment', message,
async function (result) {
var response = await onNativeMessage(result);
resolve(response);
}
);
});
}
After more investigation I found this: Why is my C# native messaging host hanging when reading a message sent by a Firefox browser extension?
Changing the loop as answered there did the trick: Read() does not guarantee that it fills the whole buffer in one call, so the read has to be repeated while tracking the offset.
Change the loop as follows:
using (var reader = new StreamReader(stdin))
{
var offset = 0;
while (offset < length && reader.Peek() >= 0)
{
offset += reader.Read(buffer, offset, length - offset);
}
}
I have a Spark program with a metrics source defined like this:
object CustomESMetrics {
lazy val metrics = new CustomESMetrics
}
class CustomESMetrics extends Source with Serializable {
lazy val metricsPrefix = "dscc_harmony_sync_handlers"
override lazy val sourceName: String = "CustomMetricSource"
override lazy val metricRegistry: MetricRegistry = new MetricRegistry
lazy val inputKafkaRecord: Counter =
metricRegistry.counter(MetricRegistry.name("input_kafka_record"))
}
Now I have Spark executor code that operates on a Dataset:
val kafkaDs: Dataset[A] = ............. // get from kafka
kafkaDs.map { ka =>
// executor code starts
SparkEnv.get.metricsSystem.registerSource(CustomESMetrics.metrics) // line X
CustomESMetrics.metrics.inputKafkaRecord.inc() // line Y
// do something else
........
}
For line Y to increment properly, line X has to be executed for each element of the Dataset, which is not efficient. It also produces a warning:
{"timestamp":"18/10/2022 16:05:05","logLevel":"INFO","class":"MetricsSystem","thread":"Executor task launch worker for task 2.0 in stage 0.0 (TID 2)","message":"Metrics already registered"}
Is there a way to execute line X only once per executor?
I have a situation where Spark can stream and get messages from only one partition of a two-partition Kafka topic.
My topic:
C:\bigdata\kafka_2.11-0.10.1.1\bin\windows>kafka-topics --create --zookeeper localhost:2181 --partitions 2 --replication-factor 1 --topic test4
Kafka Producer:
public class KafkaFileProducer {
// kafka producer
Producer<String, String> producer;
public KafkaFileProducer() {
// configs
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
//props.put("group.id", "testgroup");
props.put("batch.size", "16384");
props.put("auto.commit.interval.ms", "1000");
props.put("linger.ms", "0");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("block.on.buffer.full", "true");
// instantiate a producer
producer = new KafkaProducer<String, String>(props);
}
/**
* @param filePath
*/
public void sendFile(String filePath) {
FileInputStream fis;
BufferedReader br = null;
try {
fis = new FileInputStream(filePath);
//Construct BufferedReader from InputStreamReader
br = new BufferedReader(new InputStreamReader(fis));
int count = 0;
String line = null;
while ((line = br.readLine()) != null) {
count ++;
// dont send the header
if (count > 1) {
producer.send(new ProducerRecord<String, String>("test4", count + "", line));
Thread.sleep(10);
}
}
System.out.println("Sent " + count + " lines of data");
} catch (Exception e) {
e.printStackTrace();
}finally{
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
producer.close();
}
}
}
Spark Structured Streaming:
System.setProperty("hadoop.home.dir", "C:\\bigdata\\winutils");
final SparkSession sparkSession = SparkSession.builder().appName("Spark Data Processing").master("local[2]").getOrCreate();
// create kafka stream to get the lines
Dataset<Tuple2<String, String>> stream = sparkSession
.readStream()
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("subscribe", "test4")
.option("startingOffsets", "{\"test4\":{\"0\":-1,\"1\":-1}}")
.option("failOnDataLoss", "false")
.load().selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").as(Encoders.tuple(Encoders.STRING(), Encoders.STRING()));
Dataset<String> lines = stream.map((MapFunction<Tuple2<String, String>, String>) (Tuple2<String, String> tuple) -> tuple._2, Encoders.STRING());
Dataset<Row> result = lines.groupBy().count();
// Start running the query that prints the running counts to the console
StreamingQuery query = result//.orderBy("callTimeBin")
.writeStream()
.outputMode("complete")
.format("console")
.start();
// wait for the query to finish
try {
query.awaitTermination();
} catch (StreamingQueryException e) {
e.printStackTrace();
}
When I run the producer to send a file of 100 lines, the query only returns 51 lines. I read the Spark debug logs and noticed the following:
17/02/15 10:52:49 DEBUG StreamExecution: Execution stats: ExecutionStats(Map(),List(),Map(watermark -> 1970-01-01T00:00:00.000Z))
17/02/15 10:52:49 DEBUG StreamExecution: Starting Trigger Calculation
17/02/15 10:52:49 DEBUG KafkaConsumer: Pausing partition test4-1
17/02/15 10:52:49 DEBUG KafkaConsumer: Pausing partition test4-0
17/02/15 10:52:49 DEBUG KafkaSource: Partitions assigned to consumer: [test4-1, test4-0]. Seeking to the end.
17/02/15 10:52:49 DEBUG KafkaConsumer: Seeking to end of partition test4-1
17/02/15 10:52:49 DEBUG KafkaConsumer: Seeking to end of partition test4-0
17/02/15 10:52:49 DEBUG Fetcher: Resetting offset for partition test4-1 to latest offset.
17/02/15 10:52:49 DEBUG Fetcher: Fetched {timestamp=-1, offset=49} for partition test4-1
17/02/15 10:52:49 DEBUG Fetcher: Resetting offset for partition test4-1 to earliest offset.
17/02/15 10:52:49 DEBUG Fetcher: Fetched {timestamp=-1, offset=0} for partition test4-1
17/02/15 10:52:49 DEBUG Fetcher: Resetting offset for partition test4-0 to latest offset.
17/02/15 10:52:49 DEBUG Fetcher: Fetched {timestamp=-1, offset=51} for partition test4-0
17/02/15 10:52:49 DEBUG KafkaSource: Got latest offsets for partition : Map(test4-1 -> 0, test4-0 -> 51)
17/02/15 10:52:49 DEBUG KafkaSource: GetOffset: ArrayBuffer((test4-0,51), (test4-1,0))
17/02/15 10:52:49 DEBUG StreamExecution: getOffset took 0 ms
17/02/15 10:52:49 DEBUG StreamExecution: triggerExecution took 0 ms
I don't know why test4-1 is always reset to the earliest offset.
If someone knows how to get all messages from all partitions, I would greatly appreciate it.
Thanks
There is a known Kafka issue in the 0.10.1.* client: https://issues.apache.org/jira/browse/KAFKA-4547
Right now you can use the 0.10.0.1 client as a workaround; it can talk to a Kafka 0.10.1.* cluster.
See https://issues.apache.org/jira/browse/SPARK-18779 for more details.
In order to increase parallelism, as recommended in the Spark Streaming programming guide, I'm setting up multiple receivers and trying to union a list of them. This code works as expected:
private JavaDStream<SparkFlumeEvent> getEventsWorking(JavaStreamingContext jssc, List<String> hosts, List<String> ports) {
List<JavaReceiverInputDStream<SparkFlumeEvent>> receivers = new ArrayList<>();
for (String host : hosts) {
for (String port : ports) {
receivers.add(FlumeUtils.createStream(jssc, host, Integer.parseInt(port)));
}
}
JavaDStream<SparkFlumeEvent> unionStreams = receivers.get(0)
.union(receivers.get(1))
.union(receivers.get(2))
.union(receivers.get(3))
.union(receivers.get(4))
.union(receivers.get(5));
return unionStreams;
}
But I don't actually know how many receivers my cluster will have until runtime. When I try to do this in a loop, I get an NPE.
private JavaDStream<SparkFlumeEvent> getEventsNotWorking(JavaStreamingContext jssc, List<String> hosts, List<String> ports) {
List<JavaReceiverInputDStream<SparkFlumeEvent>> receivers = new ArrayList<>();
for (String host : hosts) {
for (String port : ports) {
receivers.add(FlumeUtils.createStream(jssc, host, Integer.parseInt(port)));
}
}
JavaDStream<SparkFlumeEvent> unionStreams = null;
for (JavaReceiverInputDStream<SparkFlumeEvent> receiver : receivers) {
if (unionStreams == null) {
unionStreams = receiver;
} else {
unionStreams.union(receiver);
}
}
return unionStreams;
}
ERROR:
16/09/15 17:05:25 ERROR JobScheduler: Error in job generator
java.lang.NullPointerException
at org.apache.spark.streaming.DStreamGraph$$anonfun$getMaxInputStreamRememberDuration$2.apply(DStreamGraph.scala:172)
at org.apache.spark.streaming.DStreamGraph$$anonfun$getMaxInputStreamRememberDuration$2.apply(DStreamGraph.scala:172)
at scala.collection.TraversableOnce$$anonfun$maxBy$1.apply(TraversableOnce.scala:225)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.reduceLeft(IndexedSeqOptimized.scala:68)
at scala.collection.mutable.ArrayBuffer.reduceLeft(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.maxBy(TraversableOnce.scala:225)
at scala.collection.AbstractTraversable.maxBy(Traversable.scala:105)
at org.apache.spark.streaming.DStreamGraph.getMaxInputStreamRememberDuration(DStreamGraph.scala:172)
at org.apache.spark.streaming.scheduler.JobGenerator.clearMetadata(JobGenerator.scala:270)
at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:182)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
16/09/15 17:05:25 INFO MemoryStore: ensureFreeSpace(15128) called with curMem=520144, maxMem=555755765
16/09/15 17:05:25 INFO MemoryStore: Block broadcast_24 stored as values in memory (estimated size 14.8 KB, free 529.5 MB)
Exception in thread "main"
java.lang.NullPointerException
at org.apache.spark.streaming.DStreamGraph$$anonfun$getMaxInputStreamRememberDuration$2.apply(DStreamGraph.scala:172)
at org.apache.spark.streaming.DStreamGraph$$anonfun$getMaxInputStreamRememberDuration$2.apply(DStreamGraph.scala:172)
at scala.collection.TraversableOnce$$anonfun$maxBy$1.apply(TraversableOnce.scala:225)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.reduceLeft(IndexedSeqOptimized.scala:68)
at scala.collection.mutable.ArrayBuffer.reduceLeft(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.maxBy(TraversableOnce.scala:225)
at scala.collection.AbstractTraversable.maxBy(Traversable.scala:105)
at org.apache.spark.streaming.DStreamGraph.getMaxInputStreamRememberDuration(DStreamGraph.scala:172)
at org.apache.spark.streaming.scheduler.JobGenerator.clearMetadata(JobGenerator.scala:270)
at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:182)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
What's the correct way to do this?
Can you please try out the code below? It should solve your problem:
private JavaDStream<SparkFlumeEvent> getEventsNotWorking(JavaStreamingContext jssc, List<String> hosts, List<String> ports) {
List<JavaDStream<SparkFlumeEvent>> receivers = new ArrayList<JavaDStream<SparkFlumeEvent>>();
for (String host : hosts) {
for (String port : ports) {
receivers.add(FlumeUtils.createStream(jssc, host, Integer.parseInt(port)));
}
}
return jssc.union(receivers.get(0), receivers.subList(1, receivers.size()));
}
My prime member:
public static void main(String[] args) throws InterruptedException {
Config config = new Config();
config.setProperty(GroupProperty.ENABLE_JMX, "true");
config.setProperty(GroupProperty.BACKPRESSURE_ENABLED, "true");
config.setProperty(GroupProperty.SLOW_OPERATION_DETECTOR_ENABLED, "true");
config.getSerializationConfig().addPortableFactory(1, new MyPortableFactory());
HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
IMap<Integer, Rule> ruleMap = hz.getMap("ruleMap");
// TODO generate rule map data ; more than 100,000 entries
generateRuleMapData(ruleMap);
logger.info("generate rule finised!");
// TODO rule map index
// health check
PartitionService partitionService = hz.getPartitionService();
LocalMapStats mapStatistics = ruleMap.getLocalMapStats();
while (true) {
logger.info("isClusterSafe:{},isLocalMemberSafe:{},number of entries owned on this node = {}",
partitionService.isClusterSafe(), partitionService.isLocalMemberSafe(),
mapStatistics.getOwnedEntryCount());
Thread.sleep(1000);
}
}
Logs:
2016-06-28 13:53:05,048 INFO [main] b.PrimeMember (PrimeMember.java:41) - isClusterSafe:true,isLocalMemberSafe:true,number of entries owned on this node = 997465
2016-06-28 13:53:06,049 INFO [main] b.PrimeMember (PrimeMember.java:41) - isClusterSafe:true,isLocalMemberSafe:true,number of entries owned on this node = 997465
2016-06-28 13:53:07,050 INFO [main] b.PrimeMember (PrimeMember.java:41) - isClusterSafe:true,isLocalMemberSafe:true,number of entries owned on this node = 997465
My slave member:
public static void main(String[] args) throws InterruptedException {
Config config = new Config();
config.setProperty(GroupProperty.ENABLE_JMX, "true");
config.setProperty(GroupProperty.BACKPRESSURE_ENABLED, "true");
config.setProperty(GroupProperty.SLOW_OPERATION_DETECTOR_ENABLED, "true");
HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
IMap<Integer, Rule> ruleMap = hz.getMap("ruleMap");
PartitionService partitionService = hz.getPartitionService();
LocalMapStats mapStatistics = ruleMap.getLocalMapStats();
while (true) {
logger.info("isClusterSafe:{},isLocalMemberSafe:{},number of entries owned on this node = {}",
partitionService.isClusterSafe(), partitionService.isLocalMemberSafe(),
mapStatistics.getOwnedEntryCount());
Thread.sleep(1000);
}
}
Logs:
2016-06-28 14:05:53,543 INFO [main] b.SlaveMember (SlaveMember.java:31) - isClusterSafe:false,isLocalMemberSafe:false,number of entries owned on this node = 412441
2016-06-28 14:05:54,556 INFO [main] b.SlaveMember (SlaveMember.java:31) - isClusterSafe:false,isLocalMemberSafe:false,number of entries owned on this node = 412441
2016-06-28 14:05:55,563 INFO [main] b.SlaveMember (SlaveMember.java:31) - isClusterSafe:false,isLocalMemberSafe:false,number of entries owned on this node = 412441
2016-06-28 14:05:56,578 INFO [main] b.SlaveMember (SlaveMember.java:31) - isClusterSafe:false,isLocalMemberSafe:false,number of entries owned on this node = 412441
My question is: why does the number of entries owned by the prime member not change after the cluster adds one slave member?
The statistics have to be fetched every second: getLocalMapStats() returns a point-in-time snapshot, so it must be called inside the loop rather than once before it.
while (true) {
LocalMapStats mapStatistics = ruleMap.getLocalMapStats();
logger.info(
"isClusterSafe:{},isLocalMemberSafe:{},rulemap.size:{}, number of entries owned on this node = {}",
partitionService.isClusterSafe(), partitionService.isLocalMemberSafe(), ruleMap.size(),
mapStatistics.getOwnedEntryCount());
Thread.sleep(1000);
}
Another option is to make use of localKeySet(), which returns the locally owned set of keys:
IMap.localKeySet().size()
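A minimal sketch of that option, reusing ruleMap, logger, and the one-second interval from the question; everything else is only illustrative:
// Report the locally owned key count once per second via localKeySet()
while (true) {
    int locallyOwnedKeys = ruleMap.localKeySet().size();
    logger.info("number of keys owned on this node = {}", locallyOwnedKeys);
    Thread.sleep(1000);
}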