Reactive parallelization doesn't work - multithreading

Using Project Reactor 3.0.4.RELEASE. Conceptually, this should be the same in RxJava as well.
public Mono<Map<String, Boolean>> refreshPods(List<String> apps) {
    return pods(apps)
            .filter(this::isRunningAndNotThisApp)
            .groupBy(Item::getName)
            .flatMap(g -> g
                    .distinct(Item::getIp)
                    .collectList()
                    // TODO: This doesn't seem to be working as expected
                    .subscribeOn(Schedulers.newParallel("par-grp"))
                    .flatMap(client::refreshPods))
            .flatMap(m -> Flux.fromIterable(m.entrySet()))
            .collectMap(Map.Entry::getKey, Map.Entry::getValue);
}
The idea is to run client.refreshPods in a separate thread for each group.
Edit: I had tried publishOn before posting this question, and again after the answers given here, but the output doesn't change.
Client:
public class MyServiceClientImpl implements MyServiceClient {
    private final RestOperations restOperations;
    private final ConfigRefreshProperties configRefreshProperties;

    public Mono<Map<String, Boolean>> refreshPods(List<Item> pods) {
        return Flux.fromIterable(pods)
                .zipWith(Flux.interval(Duration.ofSeconds(configRefreshProperties.getRefreshDelaySeconds())),
                        (x, delay) -> x)
                .flatMap(this::refreshWithRetry)
                .collectMap(Tuple2::getT1, Tuple2::getT2);
    }

    private Mono<Tuple2<String, Boolean>> refreshWithRetry(Item pod) {
        return Mono.<Boolean>create(emitter -> {
            try {
                log.info("Attempting to refresh pod: {}.", pod);
                ResponseEntity<String> tryRefresh = refresh(pod);
                if (!tryRefresh.getStatusCode().is2xxSuccessful()) {
                    log.error("Failed to refresh pod: {}.", pod);
                    emitter.success();
                } else {
                    log.info("Successfully refreshed pod: {}.", pod);
                    emitter.success(true);
                }
            } catch (Exception e) {
                emitter.error(e);
            }
        })
        .map(b -> Tuples.of(pod.getIp(), b))
        .log(getClass().getName(), Level.FINE)
        .retryWhen(errors -> {
            int maxRetries = configRefreshProperties.getMaxRetries();
            return errors.zipWith(Flux.range(1, maxRetries + 1), (ex, i) -> Tuples.of(ex, i))
                    .flatMap(t -> {
                        Integer retryCount = t.getT2();
                        if (retryCount <= maxRetries && shouldRetry(t.getT1())) {
                            int retryDelaySeconds = configRefreshProperties.getRetryDelaySeconds();
                            long delay = (long) Math.pow(retryDelaySeconds, retryCount);
                            return Mono.delay(Duration.ofSeconds(delay));
                        }
                        log.error("Done retrying to refresh pod: {}.", pod);
                        return Mono.<Long>empty();
                    });
        });
    }

    private ResponseEntity<String> refresh(Item pod) {
        return restOperations.postForEntity(buildRefreshEndpoint(pod), null, String.class);
    }

    private String buildRefreshEndpoint(Item pod) {
        return UriComponentsBuilder.fromUriString("http://{podIp}:{containerPort}/refresh")
                .buildAndExpand(pod.getIp(), pod.getPort())
                .toUriString();
    }

    private boolean shouldRetry(Throwable t) {
        boolean clientError = ThrowableAnalyzer.getFirstOfType(t, HttpClientErrorException.class)
                .map(HttpClientErrorException::getStatusCode)
                .filter(s -> s.is4xxClientError())
                .isPresent();
        boolean timeoutError = ThrowableAnalyzer.getFirstOfType(t, TimeoutException.class)
                .isPresent();
        return timeoutError || !clientError;
    }
}
The problem is that the log statement Attempting to refresh pod is printed on the same thread for every group. What am I missing here?
Logs from test run:
2017-02-07 10:12:55.348 INFO 33905 --- [ timer-1] c.n.d.cloud.config.MyServiceClientImpl : Attempting to refresh pod: Item(name=news, ip=127.0.0.1, port=8888, podPhase=Running).
2017-02-07 10:12:55.357 INFO 33905 --- [ timer-1] c.n.d.cloud.config.MyServiceClientImpl : Successfully refreshed pod: Item(name=news, ip=127.0.0.1, port=8888, podPhase=Running).
2017-02-07 10:12:55.358 INFO 33905 --- [ timer-1] c.n.d.cloud.config.MyServiceClientImpl : Attempting to refresh pod: Item(name=parking, ip=127.0.0.1, port=8888, podPhase=Running).
2017-02-07 10:12:55.363 INFO 33905 --- [ timer-1] c.n.d.cloud.config.MyServiceClientImpl : Successfully refreshed pod: Item(name=parking, ip=127.0.0.1, port=8888, podPhase=Running).
2017-02-07 10:12:55.364 INFO 33905 --- [ timer-1] c.n.d.cloud.config.MyServiceClientImpl : Attempting to refresh pod: Item(name=localsearch, ip=127.0.0.1, port=8888, podPhase=Running).
2017-02-07 10:12:55.368 INFO 33905 --- [ timer-1] c.n.d.cloud.config.MyServiceClientImpl : Successfully refreshed pod: Item(name=localsearch, ip=127.0.0.1, port=8888, podPhase=Running).
2017-02-07 10:12:55.369 INFO 33905 --- [ timer-1] c.n.d.cloud.config.MyServiceClientImpl : Attempting to refresh pod: Item(name=auth, ip=127.0.0.1, port=8888, podPhase=Running).
2017-02-07 10:12:55.372 INFO 33905 --- [ timer-1] c.n.d.cloud.config.MyServiceClientImpl : Successfully refreshed pod: Item(name=auth, ip=127.0.0.1, port=8888, podPhase=Running).
2017-02-07 10:12:55.373 INFO 33905 --- [ timer-1] c.n.d.cloud.config.MyServiceClientImpl : Attempting to refresh pod: Item(name=log, ip=127.0.0.1, port=8888, podPhase=Running).
2017-02-07 10:12:55.377 INFO 33905 --- [ timer-1] c.n.d.cloud.config.MyServiceClientImpl : Successfully refreshed pod: Item(name=log, ip=127.0.0.1, port=8888, podPhase=Running).
2017-02-07 10:12:55.378 INFO 33905 --- [ timer-1] c.n.d.cloud.config.MyServiceClientImpl : Attempting to refresh pod: Item(name=fuel, ip=127.0.0.1, port=8888, podPhase=Running).
2017-02-07 10:12:55.381 INFO 33905 --- [ timer-1] c.n.d.cloud.config.MyServiceClientImpl : Successfully refreshed pod: Item(name=fuel, ip=127.0.0.1, port=8888, podPhase=Running).

Edit: As made more explicit thanks to your newly provided log, and as picked up by David in the issue you created, the root cause is that you use an interval here. This switches context to the default TimedScheduler (which will be the same for all groups). That's why anything done before the call to refreshPods seems to be ignored (work is done on the interval thread), but publishOn/subscribeOn after the interval operator should work. In a nutshell, my advice of using subscribeOn directly after the create still stands.
You trigger a blocking behavior (refresh(pod)) that you wrap as a Mono in refreshWithRetry.
Unless you have a strong need for being concurrency-agnostic at this level, I'd advise you to immediately chain your subscribeOn next to the create.
This way, no surprise when using that Mono: it respects the contract and doesn't block. Like this:
return Mono.<Boolean>create(emitter -> {
    //...
})
.subscribeOn(Schedulers.newParallel("par-grp"))
.map(b -> Tuples.of(pod.getIp(), b))
If you want the method to return a concurrency-agnostic publisher, then you'd need to put the subscribeOn closer to your blocking publisher, so you'd need to expand the flatMap lambda:
.flatMap(pods -> client.refreshPods(pods)
        .subscribeOn(Schedulers.newParallel("par-grp"))
)

In your code you put publishOn before flatMap. Methods that combine observables, like flatMap or zip, do their own re-scheduling when working with async sources, and interval is such an async source in your case. That is why you get all results on the 'timer' thread.
1) Use publishOn right before the operation you wish to run in parallel. The operation itself should not involve re-scheduling: e.g. map is a good candidate, flatMap is not.
2) Use another publishOn right after it to reschedule the results. Otherwise the subscriber's thread might interfere.
Flux.range(1, 100)
    .groupBy(i -> i % 5)
    .flatMap(group -> group
        .publishOn(Schedulers.newParallel("grp", 8))
        .map(v -> {
            // processing here
            String threadName = Thread.currentThread().getName();
            logger.info("processing {} from {} on {}", v, group.key(), threadName);
            return v;
        })
        .publishOn(Schedulers.single())
    )
    .subscribe(v -> logger.info("got {}", v));
In case you want to make sure all of a group's items run on the same thread, see this answer: https://stackoverflow.com/a/41697348/697313
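One way to get that behavior (a sketch of the idea, not the linked answer verbatim; assumes reactor-core 3.1+, where Scheduler implements Disposable) is to give each group its own single-threaded scheduler:

import reactor.core.publisher.Flux;
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

public class PerGroupThreadDemo {
    public static void main(String[] args) {
        Flux.range(1, 20)
            .groupBy(i -> i % 4)
            .flatMap(group -> {
                // one dedicated single-threaded scheduler per group
                Scheduler perGroup = Schedulers.newSingle("grp-" + group.key());
                return group
                    .publishOn(perGroup)
                    .doOnNext(v -> System.out.printf("group %s item %d on %s%n",
                        group.key(), v, Thread.currentThread().getName()))
                    .doFinally(sig -> perGroup.dispose()); // release the thread
            })
            .blockLast();
    }
}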

I'm posting an answer myself for completeness. With help from @simon-baslé and @akarnokd, I got it right. Both of the following work. See reactor-core#421 for details.
Solution 1:
zipWith(Flux.interval(Duration.ofSeconds(groupMemberDelaySeconds)),
        (x, delay) -> x)
    .publishOn(Schedulers.newParallel("par-grp"))
    .flatMap(this::refreshWithRetry)
Solution 2:
zipWith(Flux.intervalMillis(1000 * groupMemberDelaySeconds, Schedulers.newTimer("par-grp")),
        (x, delay) -> x)
    .flatMap(this::refreshWithRetry)
No subscribeOn or publishOn is necessary in the refreshPods method.
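For anyone who wants to reproduce the fix, here is a compact self-contained variant of Solution 1 (hypothetical pod names; assumes reactor-core on the classpath): the per-item work lands on par-grp threads instead of the shared timer thread that drives the interval.

import java.time.Duration;
import java.util.Arrays;

import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;

public class GroupedParallelDemo {
    public static void main(String[] args) throws InterruptedException {
        Flux.fromIterable(Arrays.asList("news", "parking", "localsearch", "auth", "log", "fuel"))
            .groupBy(name -> name.charAt(0) % 3)
            .flatMap(group -> group
                .zipWith(Flux.interval(Duration.ofMillis(100)), (x, tick) -> x)
                // hop off the shared timer thread onto a parallel scheduler
                .publishOn(Schedulers.newParallel("par-grp"))
                .doOnNext(name -> System.out.printf("refreshing %s on %s%n",
                    name, Thread.currentThread().getName())))
            .subscribe();

        Thread.sleep(1500); // let the interval-driven work finish
    }
}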

Related

Is tracing intended to be used as I am using it? How can I trace the error chain?

I'm using async-graphql and axum.
This is a reproduction of the issue: https://github.com/frederikhors/iss-async-graphql-error-handling.
To start:
cargo run
If you open the GraphiQL client at http://localhost:8000, you can use the below query to simulate what I'm trying to understand:
mutation {
  mutateWithError
}
The backend response is:
{
  "data": null,
  "errors": [
    {
      "message": "I cannot mutate now, sorry!",
      "locations": [
        /*...*/
      ],
      "path": ["mutateWithError"]
    }
  ]
}
I like this, but what I don't understand is the tracing part:
2022-09-29T17:01:14.249236Z INFO async_graphql::graphql:84: close, time.busy: 626µs, time.idle: 14.3µs
in async_graphql::graphql::parse
in async_graphql::graphql::request
2022-09-29T17:01:14.252493Z INFO async_graphql::graphql:108: close, time.busy: 374µs, time.idle: 8.60µs
in async_graphql::graphql::validation
in async_graphql::graphql::request
2022-09-29T17:01:14.254592Z INFO async_graphql::graphql:146: error, error: I cannot mutate now, sorry!
in async_graphql::graphql::field with path: mutateWithError, parent_type: Mutation, return_type: String!
in async_graphql::graphql::execute
in async_graphql::graphql::request
2022-09-29T17:01:14.257389Z INFO async_graphql::graphql:136: close, time.busy: 2.85ms, time.idle: 30.8µs
in async_graphql::graphql::field with path: mutateWithError, parent_type: Mutation, return_type: String!
in async_graphql::graphql::execute
in async_graphql::graphql::request
2022-09-29T17:01:14.260729Z INFO async_graphql::graphql:122: close, time.busy: 6.31ms, time.idle: 7.80µs
in async_graphql::graphql::execute
in async_graphql::graphql::request
2022-09-29T17:01:14.264606Z INFO async_graphql::graphql:56: close, time.busy: 16.1ms, time.idle: 22.6µs
in async_graphql::graphql::request
Do you see the INFO async_graphql::graphql:146: error, error: I cannot mutate now, sorry!?
Why is it an INFO event? I would expect it to be an ERROR event.
And where is the inner error, this is a DB error!?
Code
The code is very simple:
pub struct Mutation;

#[Object]
impl Mutation {
    async fn mutate_with_error(&self) -> async_graphql::Result<String> {
        let new_string = mutate_with_error().await?;
        Ok(new_string)
    }
}

async fn mutate_with_error() -> anyhow::Result<String> {
    match can_i_mutate_on_db().await {
        Ok(s) => Ok(s),
        Err(err) => Err(err.context("I cannot mutate now, sorry!")),
    }
}

async fn can_i_mutate_on_db() -> anyhow::Result<String> {
    bail!("this is a DB error!")
}

async fn graphql_handler(
    schema: Extension<Schema<Query, Mutation, EmptySubscription>>,
    req: GraphQLRequest,
) -> GraphQLResponse {
    schema.execute(req.into_inner()).await.into()
}

How to consume 2 Azure Event Hubs in Spring Cloud Stream

I want to consume messages from the following 2 connection strings:
Endpoint=sb://region1.servicebus.windows.net/;SharedAccessKeyName=abc;SharedAccessKey=123;EntityPath=my-request
Endpoint=sb://region2.servicebus.windows.net/;SharedAccessKeyName=def;SharedAccessKey=456;EntityPath=my-request
It is very simple with the Java API:
EventHubConsumerAsyncClient client = new EventHubClientBuilder()
    .connectionString("Endpoint=sb://region1.servicebus.windows.net/;SharedAccessKeyName=abc;SharedAccessKey=123;EntityPath=my-request")
    .buildAsyncConsumerClient();
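(As an aside, the v5 SDK also requires a consumer group before buildAsyncConsumerClient can be called. A slightly fuller sketch of the Java route, where the consumer group and the printing subscriber are my additions for illustration:)

import com.azure.messaging.eventhubs.EventHubClientBuilder;
import com.azure.messaging.eventhubs.EventHubConsumerAsyncClient;

public class ConsumerDemo {
    public static void main(String[] args) throws InterruptedException {
        EventHubConsumerAsyncClient client = new EventHubClientBuilder()
            .connectionString("Endpoint=sb://region1.servicebus.windows.net/;SharedAccessKeyName=abc;SharedAccessKey=123;EntityPath=my-request")
            .consumerGroup(EventHubClientBuilder.DEFAULT_CONSUMER_GROUP_NAME)
            .buildAsyncConsumerClient();

        // receive() emits events from all partitions as a reactive stream
        client.receive()
            .subscribe(event -> System.out.println(event.getData().getBodyAsString()));

        Thread.sleep(10_000); // keep the demo alive long enough to see events
    }
}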
However, how do I make this work in a YAML file using Spring Cloud Stream (equivalent to the Java code above)? I have tried all the tutorials found online, and none of them works.
spring:
  cloud:
    stream:
      function:
        definition: consumeRegion1;consumeRegion2
      bindings:
        consumeRegion1-in-0:
          destination: my-request
          binder: eventhub1
        consumeRegion2-in-0:
          destination: my-request
          binder: eventhub2
      binders:
        eventhub1:
          type: eventhub
          default-candidate: false
          environment:
            spring:
              cloud:
                azure:
                  eventhub:
                    connection-string: Endpoint=sb://region1.servicebus.windows.net/;SharedAccessKeyName=abc;SharedAccessKey=123;EntityPath=my-request
        eventhub2:
          type: eventhub
          default-candidate: false
          environment:
            spring:
              cloud:
                azure:
                  eventhub:
                    connection-string: Endpoint=sb://region2.servicebus.windows.net/;SharedAccessKeyName=def;SharedAccessKey=456;EntityPath=my-request
@Bean
public Consumer<Message<String>> consumeRegion1() {
    return message -> {
        System.out.println(message.getPayload());
    };
}

@Bean
public Consumer<Message<String>> consumeRegion2() {
    return message -> {
        System.out.println(message.getPayload());
    };
}
<dependency>
    <groupId>com.azure.spring</groupId>
    <artifactId>azure-spring-cloud-stream-binder-eventhubs</artifactId>
    <version>2.5.0</version>
</dependency>
Error log:
2021-10-14 21:12:26.760 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.integration.config.IntegrationManagementConfiguration' of type [org.springframework.integration.config.IntegrationManagementConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2021-10-14 21:12:26.882 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'integrationChannelResolver' of type [org.springframework.integration.support.channel.BeanFactoryChannelResolver] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2021-10-14 21:12:26.884 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'integrationDisposableAutoCreatedBeans' of type [org.springframework.integration.config.annotation.Disposables] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2021-10-14 21:12:29.587 WARN 1 --- [ main] a.s.c.a.e.AzureEventHubAutoConfiguration : Can't construct the EventHubConnectionStringProvider, namespace: null, connectionString: null
2021-10-14 21:12:29.611 INFO 1 --- [ main] a.s.c.a.e.AzureEventHubAutoConfiguration : No event hub connection string provided.
2021-10-14 21:12:30.290 INFO 1 --- [ main] c.a.s.i.eventhub.impl.EventHubTemplate : Started EventHubTemplate with properties: {checkpointConfig=CheckpointConfig{checkpointMode=RECORD, checkpointCount=0, checkpointInterval=null}, startPosition=LATEST}
2021-10-14 21:12:32.934 INFO 1 --- [ main] c.f.c.c.BeanFactoryAwareFunctionRegistry : Can't determine default function definition. Please use 'spring.cloud.function.definition' property to explicitly define it.
As the log says, use the spring.cloud.function.definition property.
Refer to the docs.
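For example, moving the definition to the exact property named in the log would look like this (a sketch; the bindings and binders sections stay as in the question):

spring:
  cloud:
    function:
      definition: consumeRegion1;consumeRegion2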

Cannot write to Druid through SparkStreaming and Tranquility

I am trying to write the results from a Spark Streaming job to a Druid datasource. Spark successfully completes its jobs and hands off to Druid. Druid starts indexing but does not write anything.
My code and logs are as follows:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import scala.util.parsing.json._
import com.metamx.tranquility.spark.BeamRDD._
import org.joda.time.{DateTime, DateTimeZone}

object MyDirectStreamDriver {
  def main(args: Array[String]) {
    val sc = new SparkContext()
    val ssc = new StreamingContext(sc, Minutes(5))
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "[$hadoopURL]:6667",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "use_a_separate_group_id_for_each_stream",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )
    val eventStream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Array("events_test"), kafkaParams))
    val t = eventStream.map(record => record.value)
      .flatMap(_.split("(?<=\\}),(?=\\{)"))
      .map(JSON.parseRaw(_).getOrElse(new JSONObject(Map("" -> ""))).asInstanceOf[JSONObject])
      .map(x => (new DateTime(), x.obj.getOrElse("OID", "").asInstanceOf[String],
        x.obj.getOrElse("STATUS", "").asInstanceOf[Double].toInt))
      .map(x => MyEvent(x._1, x._2, x._3))
    t.saveAsTextFiles("/user/username/result", "txt")
    t.foreachRDD(rdd => rdd.propagate(new MyEventBeamFactory))
    ssc.start
    ssc.awaitTermination
  }
}
case class MyEvent(time: DateTime, oid: String, status: Int) {
  @JsonValue
  def toMap: Map[String, Any] = Map(
    "timestamp" -> (time.getMillis / 1000),
    "oid" -> oid,
    "status" -> status
  )
}

object MyEvent {
  implicit val MyEventTimestamper = new Timestamper[MyEvent] {
    def timestamp(a: MyEvent) = a.time
  }

  val Columns = Seq("time", "oid", "status")

  def fromMap(d: Dict): MyEvent = {
    MyEvent(
      new DateTime(long(d("timestamp")) * 1000),
      str(d("oid")),
      int(d("status"))
    )
  }
}
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.retry.BoundedExponentialBackoffRetry
import io.druid.granularity._
import io.druid.query.aggregation.LongSumAggregatorFactory
import com.metamx.common.Granularity
import org.joda.time.Period

class MyEventBeamFactory extends BeamFactory[MyEvent] {
  // Return a singleton, so the same connection is shared across all tasks in the same JVM.
  def makeBeam: Beam[MyEvent] = MyEventBeamFactory.BeamInstance

  object MyEventBeamFactory {
    val BeamInstance: Beam[MyEvent] = {
      // Tranquility uses ZooKeeper (through the Curator framework) for coordination.
      val curator = CuratorFrameworkFactory.newClient(
        "{IP_2}:2181",
        new BoundedExponentialBackoffRetry(100, 3000, 5)
      )
      curator.start()

      val indexService = DruidEnvironment("druid/overlord") // Your overlord's druid.service, with slashes replaced by colons.
      val discoveryPath = "/druid/discovery" // Your overlord's druid.discovery.curator.path
      val dataSource = "events_druid"
      val dimensions = IndexedSeq("oid")
      val aggregators = Seq(new LongSumAggregatorFactory("status", "status"))

      // Expects simpleEvent.timestamp to return a Joda DateTime object.
      DruidBeams
        .builder((event: MyEvent) => event.time)
        .curator(curator)
        .discoveryPath(discoveryPath)
        .location(DruidLocation(indexService, dataSource))
        .rollup(DruidRollup(SpecificDruidDimensions(dimensions), aggregators, QueryGranularities.MINUTE))
        .tuning(
          ClusteredBeamTuning(
            segmentGranularity = Granularity.HOUR,
            windowPeriod = new Period("PT10M"),
            partitions = 1,
            replicants = 1
          )
        )
        .buildBeam()
    }
  }
}
This is the Druid indexing task log (index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0):
2017-12-28T13:05:19,299 INFO [main] io.druid.indexing.worker.executor.ExecutorLifecycle - Running with task: {
"type" : "index_realtime",
"id" : "index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0",
"resource" : {
"availabilityGroup" : "events_druid-2017-12-28T13:00:00.000Z-0000",
"requiredCapacity" : 1
},
"spec" : {
"dataSchema" : {
"dataSource" : "events_druid",
"parser" : {
"type" : "map",
"parseSpec" : {
"format" : "json",
"timestampSpec" : {
"column" : "timestamp",
"format" : "iso",
"missingValue" : null
},
"dimensionsSpec" : {
"dimensions" : [ "oid" ],
"spatialDimensions" : [ ]
}
}
},
"metricsSpec" : [ {
"type" : "longSum",
"name" : "status",
"fieldName" : "status",
"expression" : null
} ],
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "HOUR",
"queryGranularity" : {
"type" : "duration",
"duration" : 60000,
"origin" : "1970-01-01T00:00:00.000Z"
},
"rollup" : true,
"intervals" : null
}
},
"ioConfig" : {
"type" : "realtime",
"firehose" : {
"type" : "clipped",
"delegate" : {
"type" : "timed",
"delegate" : {
"type" : "receiver",
"serviceName" : "firehose:druid:overlord:events_druid-013-0000-0000",
"bufferSize" : 100000
},
"shutoffTime" : "2017-12-28T14:15:00.000Z"
},
"interval" : "2017-12-28T13:00:00.000Z/2017-12-28T14:00:00.000Z"
},
"firehoseV2" : null
},
"tuningConfig" : {
"type" : "realtime",
"maxRowsInMemory" : 75000,
"intermediatePersistPeriod" : "PT10M",
"windowPeriod" : "PT10M",
"basePersistDirectory" : "/tmp/1514466313873-0",
"versioningPolicy" : {
"type" : "intervalStart"
},
"rejectionPolicy" : {
"type" : "none"
},
"maxPendingPersists" : 0,
"shardSpec" : {
"type" : "linear",
"partitionNum" : 0
},
"indexSpec" : {
"bitmap" : {
"type" : "concise"
},
"dimensionCompression" : "lz4",
"metricCompression" : "lz4",
"longEncoding" : "longs"
},
"buildV9Directly" : true,
"persistThreadPriority" : 0,
"mergeThreadPriority" : 0,
"reportParseExceptions" : false,
"handoffConditionTimeout" : 0,
"alertTimeout" : 0
}
},
"context" : null,
"groupId" : "index_realtime_events_druid",
"dataSource" : "events_druid"
}
2017-12-28T13:05:19,312 INFO [main] io.druid.indexing.worker.executor.ExecutorLifecycle - Attempting to lock file[/apps/druid/tasks/index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0/lock].
2017-12-28T13:05:19,313 INFO [main] io.druid.indexing.worker.executor.ExecutorLifecycle - Acquired lock file[/apps/druid/tasks/index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0/lock] in 1ms.
2017-12-28T13:05:19,317 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Running task: index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0
2017-12-28T13:05:19,323 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0] location changed to [TaskLocation{host='hadooptest9.{host}', port=8100}].
2017-12-28T13:05:19,323 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0] status changed to [RUNNING].
2017-12-28T13:05:19,327 INFO [main] org.eclipse.jetty.server.Server - jetty-9.3.19.v20170502
2017-12-28T13:05:19,350 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Creating plumber using rejectionPolicy[io.druid.segment.realtime.plumber.NoopRejectionPolicyFactory$1@7925d517]
2017-12-28T13:05:19,351 INFO [task-runner-0-priority-0] io.druid.server.coordination.CuratorDataSegmentServerAnnouncer - Announcing self[DruidServerMetadata{name='hadooptest9.{host}:8100', host='hadooptest9.{host}:8100', maxSize=0, tier='_default_tier', type='realtime', priority='0'}] at [/druid/announcements/hadooptest9.{host}:8100]
2017-12-28T13:05:19,382 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Expect to run at [2017-12-28T14:10:00.000Z]
2017-12-28T13:05:19,392 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Starting merge and push.
2017-12-28T13:05:19,392 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] segments. Attempting to hand off segments that start before [1970-01-01T00:00:00.000Z].
2017-12-28T13:05:19,392 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] sinks to persist and merge
2017-12-28T13:05:19,451 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.EventReceiverFirehoseFactory - Connecting firehose: firehose:druid:overlord:events_druid-013-0000-0000
2017-12-28T13:05:19,453 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.EventReceiverFirehoseFactory - Found chathandler of class[io.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider]
2017-12-28T13:05:19,453 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Registering Eventhandler[firehose:druid:overlord:events_druid-013-0000-0000]
2017-12-28T13:05:19,454 INFO [task-runner-0-priority-0] io.druid.curator.discovery.CuratorServiceAnnouncer - Announcing service[DruidNode{serviceName='firehose:druid:overlord:events_druid-013-0000-0000', host='hadooptest9.{host}', port=8100}]
2017-12-28T13:05:19,502 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider as a provider class
2017-12-28T13:05:19,502 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering com.fasterxml.jackson.jaxrs.smile.JacksonSmileProvider as a provider class
2017-12-28T13:05:19,502 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering io.druid.server.initialization.jetty.CustomExceptionMapper as a provider class
2017-12-28T13:05:19,502 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering io.druid.server.StatusResource as a root resource class
2017-12-28T13:05:19,505 INFO [main] com.sun.jersey.server.impl.application.WebApplicationImpl - Initiating Jersey application, version 'Jersey: 1.19.3 10/24/2016 03:43 PM'
2017-12-28T13:05:19,515 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Registering Eventhandler[events_druid-013-0000-0000]
2017-12-28T13:05:19,515 INFO [task-runner-0-priority-0] io.druid.curator.discovery.CuratorServiceAnnouncer - Announcing service[DruidNode{serviceName='events_druid-013-0000-0000', host='hadooptest9.{host}', port=8100}]
2017-12-28T13:05:19,529 WARN [task-runner-0-priority-0] org.apache.curator.utils.ZKPaths - The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT will be used instead.
2017-12-28T13:05:19,535 INFO [task-runner-0-priority-0] io.druid.server.metrics.EventReceiverFirehoseRegister - Registering EventReceiverFirehoseMetric for service [firehose:druid:overlord:events_druid-013-0000-0000]
2017-12-28T13:05:19,536 INFO [task-runner-0-priority-0] io.druid.data.input.FirehoseFactory - Firehose created, will shut down at: 2017-12-28T14:15:00.000Z
2017-12-28T13:05:19,574 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.initialization.jetty.CustomExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton"
2017-12-28T13:05:19,576 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider to GuiceManagedComponentProvider with the scope "Singleton"
2017-12-28T13:05:19,583 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.smile.JacksonSmileProvider to GuiceManagedComponentProvider with the scope "Singleton"
2017-12-28T13:05:19,845 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.http.security.StateResourceFilter to GuiceInstantiatedComponentProvider
2017-12-28T13:05:19,863 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.http.SegmentListerResource to GuiceInstantiatedComponentProvider
2017-12-28T13:05:19,874 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.QueryResource to GuiceInstantiatedComponentProvider
2017-12-28T13:05:19,876 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.segment.realtime.firehose.ChatHandlerResource to GuiceInstantiatedComponentProvider
2017-12-28T13:05:19,880 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.query.lookup.LookupListeningResource to GuiceInstantiatedComponentProvider
2017-12-28T13:05:19,882 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.query.lookup.LookupIntrospectionResource to GuiceInstantiatedComponentProvider
2017-12-28T13:05:19,883 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.StatusResource to GuiceManagedComponentProvider with the scope "Undefined"
2017-12-28T13:05:19,896 WARN [main] com.sun.jersey.spi.inject.Errors - The following warnings have been detected with resource and/or provider classes:
WARNING: A HTTP GET method, public void io.druid.server.http.SegmentListerResource.getSegments(long,long,long,javax.servlet.http.HttpServletRequest) throws java.io.IOException, MUST return a non-void type.
2017-12-28T13:05:19,905 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@2fba0dac{/,null,AVAILABLE}
2017-12-28T13:05:19,914 INFO [main] org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@25218a4d{HTTP/1.1,[http/1.1]}{0.0.0.0:8100}
2017-12-28T13:05:19,914 INFO [main] org.eclipse.jetty.server.Server - Started @6014ms
2017-12-28T13:05:19,915 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.server.listener.announcer.ListenerResourceAnnouncer.start()] on object[io.druid.query.lookup.LookupResourceListenerAnnouncer@426710f0].
2017-12-28T13:05:19,919 INFO [main] io.druid.server.listener.announcer.ListenerResourceAnnouncer - Announcing start time on [/druid/listeners/lookups/__default/hadooptest9.{host}:8100]
2017-12-28T13:05:20,517 WARN [task-runner-0-priority-0] io.druid.segment.realtime.firehose.PredicateFirehose - [0] InputRow(s) ignored as they do not satisfy the predicate
This is the index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0 payload:
{
  "task": "index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0",
  "payload": {
    "id": "index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0",
    "resource": {
      "availabilityGroup": "events_druid-2017-12-28T13:00:00.000Z-0000",
      "requiredCapacity": 1
    },
    "spec": {
      "dataSchema": {
        "dataSource": "events_druid",
        "parser": {
          "type": "map",
          "parseSpec": {
            "format": "json",
            "timestampSpec": { "column": "timestamp", "format": "iso", "missingValue": null },
            "dimensionsSpec": { "dimensions": ["oid"], "spatialDimensions": [] }
          }
        },
        "metricsSpec": [{ "type": "longSum", "name": "status", "fieldName": "status", "expression": null }],
        "granularitySpec": {
          "type": "uniform",
          "segmentGranularity": "HOUR",
          "queryGranularity": { "type": "duration", "duration": 60000, "origin": "1970-01-01T00:00:00.000Z" },
          "rollup": true,
          "intervals": null
        }
      },
      "ioConfig": {
        "type": "realtime",
        "firehose": {
          "type": "clipped",
          "delegate": {
            "type": "timed",
            "delegate": {
              "type": "receiver",
              "serviceName": "firehose:druid:overlord:events_druid-013-0000-0000",
              "bufferSize": 100000
            },
            "shutoffTime": "2017-12-28T14:15:00.000Z"
          },
          "interval": "2017-12-28T13:00:00.000Z/2017-12-28T14:00:00.000Z"
        },
        "firehoseV2": null
      },
      "tuningConfig": {
        "type": "realtime",
        "maxRowsInMemory": 75000,
        "intermediatePersistPeriod": "PT10M",
        "windowPeriod": "PT10M",
        "basePersistDirectory": "/tmp/1514466313873-0",
        "versioningPolicy": { "type": "intervalStart" },
        "rejectionPolicy": { "type": "none" },
        "maxPendingPersists": 0,
        "shardSpec": { "type": "linear", "partitionNum": 0 },
        "indexSpec": {
          "bitmap": { "type": "concise" },
          "dimensionCompression": "lz4",
          "metricCompression": "lz4",
          "longEncoding": "longs"
        },
        "buildV9Directly": true,
        "persistThreadPriority": 0,
        "mergeThreadPriority": 0,
        "reportParseExceptions": false,
        "handoffConditionTimeout": 0,
        "alertTimeout": 0
      }
    },
    "context": null,
    "groupId": "index_realtime_events_druid",
    "dataSource": "events_druid"
  }
}
This is the end of the Spark job stderr:
17/12/28 14:50:09 INFO ZooKeeper: Client environment:os.version=3.10.0-514.10.2.el7.x86_64
17/12/28 14:50:09 INFO ZooKeeper: Client environment:user.name=yarn
17/12/28 14:50:09 INFO ZooKeeper: Client environment:user.home=/home/yarn
17/12/28 14:50:09 INFO ZooKeeper: Client environment:user.dir=/data1/hadoop/yarn/local/usercache/hdfs/appcache/application_1512485869804_6924/container_e58_1512485869804_6924_01_000002
17/12/28 14:50:09 INFO ZooKeeper: Initiating client connection, connectString={IP2}:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@5967905
17/12/28 14:50:09 INFO ClientCnxn: Opening socket connection to server {IP2}/{IP2}:2181. Will not attempt to authenticate using SASL (unknown error)
17/12/28 14:50:09 INFO ClientCnxn: Socket connection established, initiating session, client: /{IP6}:42704, server: {IP2}/{IP2}:2181
17/12/28 14:50:09 INFO ClientCnxn: Session establishment complete on server {IP2}/{IP2}:2181, sessionid = 0x25fa4ea15980119, negotiated timeout = 40000
17/12/28 14:50:10 INFO ConnectionStateManager: State change: CONNECTED
17/12/28 14:50:10 INFO Version: HV000001: Hibernate Validator 5.1.3.Final
17/12/28 14:50:10 INFO JsonConfigurator: Loaded class[class io.druid.guice.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, directory='extensions', hadoopDependenciesDir='hadoop-dependencies', hadoopContainerDruidClasspath='null', loadList=null}]
17/12/28 14:50:10 INFO LoggingEmitter: Start: started [true]
17/12/28 14:50:11 INFO FinagleRegistry: Adding resolver for scheme[disco].
17/12/28 14:50:11 INFO CachedKafkaConsumer: Initial fetch for spark-executor-use_a_separate_group_id_for_each_stream events_test 0 6658
17/12/28 14:50:12 INFO ClusteredBeam: Global latestCloseTime[2017-12-28T12:00:00.000Z] for identifier[druid:overlord/events_druid] has moved past timestamp[2017-12-28T12:00:00.000Z], not creating merged beam
17/12/28 14:50:12 INFO ClusteredBeam: Turns out we decided not to actually make beams for identifier[druid:overlord/events_druid] timestamp[2017-12-28T12:00:00.000Z]. Returning None.
17/12/28 14:50:12 WARN MapPartitioner: Cannot partition object of class[class MyEvent] by time and dimensions. Consider implementing a Partitioner.
17/12/28 14:50:12 INFO ClusteredBeam: Global latestCloseTime[2017-12-28T12:00:00.000Z] for identifier[druid:overlord/events_druid] has moved past timestamp[2017-12-28T12:00:00.000Z], not creating merged beam
17/12/28 14:50:12 INFO ClusteredBeam: Turns out we decided not to actually make beams for identifier[druid:overlord/events_druid] timestamp[2017-12-28T12:00:00.000Z]. Returning None.
17/12/28 14:50:12 INFO ClusteredBeam: Global latestCloseTime[2017-12-28T12:00:00.000Z] for identifier[druid:overlord/events_druid] has moved past timestamp[2017-12-28T12:00:00.000Z], not creating merged beam
17/12/28 14:50:12 INFO ClusteredBeam: Turns out we decided not to actually make beams for identifier[druid:overlord/events_druid] timestamp[2017-12-28T12:00:00.000Z]. Returning None.
17/12/28 14:50:12 INFO ClusteredBeam: Global latestCloseTime[2017-12-28T12:00:00.000Z] for identifier[druid:overlord/events_druid] has moved past timestamp[2017-12-28T12:00:00.000Z], not creating merged beam
17/12/28 14:50:12 INFO ClusteredBeam: Turns out we decided not to actually make beams for identifier[druid:overlord/events_druid] timestamp[2017-12-28T12:00:00.000Z]. Returning None.
17/12/28 14:50:12 INFO ClusteredBeam: Global latestCloseTime[2017-12-28T12:00:00.000Z] for identifier[druid:overlord/events_druid] has moved past timestamp[2017-12-28T12:00:00.000Z], not creating merged beam
17/12/28 14:50:12 INFO ClusteredBeam: Turns out we decided not to actually make beams for identifier[druid:overlord/events_druid] timestamp[2017-12-28T12:00:00.000Z]. Returning None.
17/12/28 14:50:16 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1541 bytes result sent to driver
I have also written the result to a text file to make sure data is coming in and is well formatted. Here are a few lines of the text file:
MyEvent(2017-12-28T16:10:00.387+03:00,0010,1)
MyEvent(2017-12-28T16:10:00.406+03:00,0030,1)
MyEvent(2017-12-28T16:10:00.417+03:00,0010,1)
MyEvent(2017-12-28T16:10:00.431+03:00,0010,1)
MyEvent(2017-12-28T16:10:00.448+03:00,0010,1)
MyEvent(2017-12-28T16:10:00.464+03:00,0030,1)
Help is much appreciated. Thanks.
This problem was solved by adding a timestampSpec to DruidBeams, as below. The task payload above shows the timestamp column being parsed with "format" : "iso", while MyEvent.toMap emits POSIX seconds (time.getMillis / 1000), so declaring the format as posix lets the indexing task parse the rows:
DruidBeams
  .builder((event: MyEvent) => event.time)
  .curator(curator)
  .discoveryPath(discoveryPath)
  .location(DruidLocation(indexService, dataSource))
  .rollup(DruidRollup(SpecificDruidDimensions(dimensions), aggregators, QueryGranularities.MINUTE))
  .tuning(
    ClusteredBeamTuning(
      segmentGranularity = Granularity.HOUR,
      windowPeriod = new Period("PT10M"),
      partitions = 1,
      replicants = 1
    )
  )
  .timestampSpec(new TimestampSpec("timestamp", "posix", null))
  .buildBeam()

File Tail Inbound Channel Adapter stderr and stdout

I am trying to tail a file using Spring Integration, and it is working as in the code below, but I have two questions.
@Configuration
public class RootConfiguration {

    @Bean(name = PollerMetadata.DEFAULT_POLLER)
    public PollerMetadata defaultPoller() {
        PollerMetadata pollerMetadata = new PollerMetadata();
        pollerMetadata.setTrigger(new PeriodicTrigger(10));
        return pollerMetadata;
    }

    @Bean
    public MessageChannel input() {
        return new QueueChannel(50);
    }

    @Bean
    public FileTailInboundChannelAdapterFactoryBean tailInboundChannelAdapterParser() {
        FileTailInboundChannelAdapterFactoryBean x = new FileTailInboundChannelAdapterFactoryBean();
        x.setAutoStartup(true);
        x.setOutputChannel(input());
        x.setTaskExecutor(taskExecutor());
        x.setNativeOptions("-F -n +0");
        x.setFile(new File("/home/shahbour/Desktop/file.txt"));
        return x;
    }

    @Bean
    @ServiceActivator(inputChannel = "input")
    public LoggingHandler loggingHandler() {
        return new LoggingHandler("info");
    }

    @Bean
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setCorePoolSize(4);
        taskExecutor.afterPropertiesSet();
        return taskExecutor;
    }
}
Per the log below, I have 4 threads used for tailing the file. Do I need all of them, or can I disable some? Why do I have a thread for Monitoring process java.lang.UNIXProcess@b37e761, one for Reading stderr, and one for Reading stdout?
I am asking this because I am going to run the program on a VoIP switch, and I want to use the minimal resources possible.
2016-12-10 13:22:55.666 INFO 14862 --- [ taskExecutor-1] t.OSDelegatingFileTailingMessageProducer : Starting tail process
2016-12-10 13:22:55.665 INFO 14862 --- [ main] t.OSDelegatingFileTailingMessageProducer : started tailInboundChannelAdapterParser
2016-12-10 13:22:55.682 INFO 14862 --- [ main] o.s.i.endpoint.PollingConsumer : started rootConfiguration.loggingHandler.serviceActivator
2016-12-10 13:22:55.682 INFO 14862 --- [ main] o.s.c.support.DefaultLifecycleProcessor : Starting beans in phase 2147483647
2016-12-10 13:22:55.701 INFO 14862 --- [ main] c.t.SonusbrokerApplication : Started SonusbrokerApplication in 3.84 seconds (JVM running for 4.687)
2016-12-10 13:22:55.703 DEBUG 14862 --- [ taskExecutor-2] t.OSDelegatingFileTailingMessageProducer : Monitoring process java.lang.UNIXProcess@b37e761
2016-12-10 13:22:55.711 DEBUG 14862 --- [ taskExecutor-3] t.OSDelegatingFileTailingMessageProducer : Reading stderr
2016-12-10 13:22:55.711 DEBUG 14862 --- [ taskExecutor-4] t.OSDelegatingFileTailingMessageProducer : Reading stdout
My second question is: is it possible to start reading the file from the beginning and continue to tail? I was thinking of using native options for this, e.g. -n 1000.
Note: the real code will monitor a folder for new files as they are created and then start the tail process.
The process monitor is needed to waitFor() the process - it doesn't use any resources except a bit of memory.
The stdout reader is needed to actually handle the data that is produced by the tail command.
The starting thread (in your case taskExecutor-1) exits after it has done its work of starting the other threads.
There is not currently an option to disable the stderr reader, but it would be easy for us to add one so there are only 2 threads at runtime.
Feel free to open a JIRA 'improvement' Issue and, of course, contributions are welcome.
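On the second question: tail's native options can already do this. With GNU tail, -n +0 (or -n +1) emits the whole file from the first line, and -F then keeps following it; the configuration in the question uses exactly that. A minimal sketch of such a bean (same factory API as in the question, same file path):

@Bean
public FileTailInboundChannelAdapterFactoryBean tailFromStart() {
    FileTailInboundChannelAdapterFactoryBean factory = new FileTailInboundChannelAdapterFactoryBean();
    factory.setOutputChannel(input());
    // "-n +0" starts output at the beginning of the file; "-F" keeps following it.
    // "-n 1000" would instead replay only the last 1000 lines before following.
    factory.setNativeOptions("-F -n +0");
    factory.setFile(new File("/home/shahbour/Desktop/file.txt"));
    return factory;
}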

Griffon: Error when executing code for JInternalFrame

I was searching Google recently and wanted to try the code below in Griffon 1.2.0, but when I run it, it gives me an error like this:
2013-03-23 12:13:01,877 [main] DEBUG griffon.plugins.i18n.I18nEnhancer - Enhancing org.codehaus.groovy.runtime.HandleMetaClass@b7ea5c[groovy.lang.ExpandoMetaClass@b7ea5c[class griffon.swing.SwingApplication]] with griffon.plugins.i18n.MessageSourceHolder@7b7bee
2013-03-23 12:13:01,909 [main] ERROR griffon.util.GriffonExceptionHandler - Uncaught Exception
groovy.lang.GroovyRuntimeException: Cannot add new method [getMessage] for arguments [[class java.lang.String, class java.util.Locale]]. It already exists!
at griffon.plugins.i18n.I18nEnhancer.enhance(I18nEnhancer.groovy:34)
at griffon.plugins.i18n.I18nEnhancer.enhance(I18nEnhancer.groovy)
at griffon.plugins.i18n.I18nEnhancer$enhance.call(Unknown Source)
at I18nSupportGriffonAddon.addonPostInit(I18nSupportGriffonAddon.groovy:40)
at griffon.core.GriffonAddon$addonPostInit.call(Unknown Source)
at griffon.core.GriffonAddon$addonPostInit.call(Unknown Source)
at org.codehaus.griffon.runtime.util.AddonHelper$_handleAddonsAtStartup_closure3.doCall(AddonHelper.groovy:110)
at org.codehaus.griffon.runtime.util.AddonHelper.handleAddonsAtStartup(AddonHelper.groovy:108)
at org.codehaus.griffon.runtime.core.DefaultAddonManager.doInitialize(DefaultAddonManager.java:33)
at org.codehaus.griffon.runtime.core.AbstractAddonManager.initialize(AbstractAddonManager.java:101)
at org.codehaus.griffon.runtime.util.GriffonApplicationHelper.initializeAddonManager(GriffonApplicationHelper.java:394)
at org.codehaus.griffon.runtime.util.GriffonApplicationHelper.prepare(GriffonApplicationHelper.java:149)
at org.codehaus.griffon.runtime.core.AbstractGriffonApplication.initialize(AbstractGriffonApplication.java:231)
at griffon.swing.AbstractSwingGriffonApplication.bootstrap(AbstractSwingGriffonApplication.java:75)
at griffon.swing.AbstractSwingGriffonApplication.run(AbstractSwingGriffonApplication.java:132)
at griffon.swing.SwingApplication.run(SwingApplication.java:45)
at griffon.swing.SwingApplication.main(SwingApplication.java:37)
[delete] Deleting directory G:\latihan\outer\staging\windows
Here's the code; it was originally made by aalmiray:
--- OuterController ---
package outer

class OuterController {
    def model
    def view
    def builder

    def newFrame = {
        String id = 'inner-' + System.currentTimeMillis()
        def (m, v, c) = createMVCGroup('inner', id, title: "Frame ${model.count++}")
        builder.desktopPane(view.desktop) {
            widget(v.innerFrame)
        }
    }
}
--- OuterModel ---
package outer

class OuterModel {
    int count = 1
}
--- OuterView ---
package outer

application(title: 'outer',
        preferredSize: [320, 240],
        pack: true,
        locationByPlatform: true,
        iconImage: imageIcon('/griffon-icon-48x48.png').image,
        iconImages: [imageIcon('/griffon-icon-48x48.png').image,
                     imageIcon('/griffon-icon-32x32.png').image,
                     imageIcon('/griffon-icon-16x16.png').image]) {
    borderLayout()
    button(newFrameAction, constraints: NORTH)
    desktopPane(id: 'desktop', constraints: CENTER)
}
--- InnerController ---
package outer

class InnerController {
    private String id
    def view

    void mvcGroupInit(Map args) {
        id = args.mvcName
    }

    void mvcGroupDestroy() {
        execAsync {
            def desktop = view.innerFrame.parent
            desktop.remove view.innerFrame
            desktop.invalidate()
            desktop.repaint()
        }
    }

    def close = {
        destroyMVCGroup id
    }
}
--- InnerView ---
package outer

internalFrame(title: title, size: [200, 200], id: 'innerFrame',
        visible: true, iconifiable: true, maximizable: true,
        resizable: true, closable: true) {
    gridLayout(cols: 1, rows: 2)
    label 'Content goes here'
    button closeAction
}
I need your help.
Regards,
Hendra
Given the error message, I'd say you have the i18n and i18n-support plugins installed. These plugins are no longer required since Griffon 1.1.0. Uninstalling the plugins will solve your problem.
The reason is that the functionality provided by those plugins is now available in Griffon core since 1.1.0, so those plugins create a conflict.
