I would like to know If it's possible?
here is the code: numStreams I get it by using AmazonKinesisClient API
// Create the Kinesis DStreams
List<JavaDStream<byte[]>> streamsList = new ArrayList<>(numStreams);
for (int i = 0; i < numStreams; i++) {
streamsList.add(
KinesisUtils.createStream(jssc, kinesisAppName, streamName, endpointUrl, regionName,
InitialPositionInStream.TRIM_HORIZON, kinesisCheckpointInterval,
StorageLevel.MEMORY_AND_DISK_2(),accessesKey,secretKey)
);
}
I tried looking through the API and I just couldn't find any reference to disabling Apache Streaming CloudWatch.
here is the Warnings that i try getting rid of:
17/01/23 17:46:29 WARN CWPublisherRunnable: Could not publish 16 datums to CloudWatch
com.amazonaws.AmazonServiceException: User: arn:aws:iam:::user/Kinesis_Service is not authorized to perform: cloudwatch:PutMetricData (Service: AmazonCloudWatch; Status Code: 403; Error Code: AccessDenied; Request ID: *****)
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1377)
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:923)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:701)
at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:453)
at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:415)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:364)
at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:984)
at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:954)
at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.putMetricData(AmazonCloudWatchClient.java:853)
at com.amazonaws.services.kinesis.metrics.impl.DefaultCWMetricsPublisher.publishMetrics(DefaultCWMetricsPublisher.java:63)
at com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable.runOnce(CWPublisherRunnable.java:144)
at com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable.run(CWPublisherRunnable.java:90)
at java.lang.Thread.run(Unknown Source)
Preface : I know this is kind of old question, but just faced this so posting a solution for anyone who encounter this issue with Spark <= 2.3.3
It is possible to disable Cloudwatch metrics reporting at KCL (Kinesis Client) library level with withMetrics methods when building the client.
Unfortunately, Spark KinesisInputDStream method does not expose a way to change this setting and to make things worse, the default level is "DETAILED" which send 10s of metric every 10 seconds.
The way I took in order to disable it is to provide invalid credential to the method cloudWatchCredentials from KinesisInputDStream. IE : .cloudWatchCredentials(SparkAWSCredentials.builder.basicCredentials("DISABLED", "DISABLED").build())
Then comes the issue for CloudWatchAsyncClient logging warning at each tick, which I disabled by setting the following in spark log4j.properties config :
# Set Kinesis logging metrics to Warn - Since we intentionally provide
# wrong credentials in order to disable cloudwatch logging. Bad credential
# warning are logged at WARN level - so we still get errors.
log4j.logger.com.amazonaws.services.kinesis.metrics=ERROR
This will suppress warning for the metrics package class only (such as the one you mentioned) but will not suppress the error, in case those are needed.
This is nowhere close to an ideal solution, but this allowed us deploying a solution while existing Spark version deployed.
Next steps : open a ticket to Spark so they can hopefully allow us to disable it for the next versions.
Edit - created: https://issues.apache.org/jira/browse/SPARK-27420 for tracking
Related
I am generating logs for my client application where there is very limited internet connectivity. I am storing the offline logs and generating it to application insights once the user is back online. The problem I am facing is out of all the logs only request logs are coming rest are getting discarded. This is happening because of sampling even though I have already disabled the sampling from Startup.cs. Here is my code:
var aiOptions = new Microsoft.ApplicationInsights.AspNetCore.Extensions.ApplicationInsightsServiceOptions();
aiOptions.EnableAdaptiveSampling = false;
services.AddApplicationInsightsTelemetry(aiOptions);
Any Suggestions how to completely remove the sampling so that I can have all the logs in application insight.
Check this document to see different log levels. if you have latest version of sdk than ILogger Can capture without required action.
It will capture log level.
Here is the configuration of logging level.
.ConfigureLogging(
builder =>
{
builder.AddApplicationInsights("ikey");
builder.AddFilter<Microsoft.Extensions.Logging.ApplicationInsights.ApplicationInsightsLoggerProvider>("", LogLevel.Information); // this will capture Info level traces and above.
}
For complete information check this SO thread.
I'm working on upgrading our service to use 3.63.0 (upgrading from 3.57.0) and I've noticed the following warning (with stack trace) shows up in the logs that wasn't there on the previous version:
2022-02-18 14:03:41.038 WARN 1088 --- [ main] c.s.c.s.c.c.AbstractHttpClientCache : Could not get HttpClient cache.
com.sap.cloud.sdk.cloudplatform.thread.exception.ThreadContextAccessException: No ThreadContext available for thread id=1.
at com.sap.cloud.sdk.cloudplatform.thread.ThreadLocalThreadContextFacade.lambda$tryGetCurrentContext$0(ThreadLocalThreadContextFacade.java:39) ~[cloudplatform-core-3.63.0.jar:na]
at io.vavr.Value.toTry(Value.java:1414) ~[vavr-0.10.4.jar:na]
at com.sap.cloud.sdk.cloudplatform.thread.ThreadLocalThreadContextFacade.tryGetCurrentContext(ThreadLocalThreadContextFacade.java:37) ~[cloudplatform-core-3.63.0.jar:na]
at io.vavr.control.Try.flatMapTry(Try.java:490) ~[vavr-0.10.4.jar:na]
at io.vavr.control.Try.flatMap(Try.java:472) ~[vavr-0.10.4.jar:na]
at com.sap.cloud.sdk.cloudplatform.thread.ThreadContextAccessor.tryGetCurrentContext(ThreadContextAccessor.java:84) ~[cloudplatform-core-3.63.0.jar:na]
at com.sap.cloud.sdk.cloudplatform.connectivity.RequestScopedHttpClientCache.getCache(RequestScopedHttpClientCache.java:28) ~[cloudplatform-connectivity-3.63.0.jar:na]
at com.sap.cloud.sdk.cloudplatform.connectivity.AbstractHttpClientCache.tryGetOrCreateHttpClient(AbstractHttpClientCache.java:78) ~[cloudplatform-connectivity-3.63.0.jar:na]
at com.sap.cloud.sdk.cloudplatform.connectivity.AbstractHttpClientCache.tryGetHttpClient(AbstractHttpClientCache.java:46) ~[cloudplatform-connectivity-3.63.0.jar:na]
at com.sap.cloud.sdk.cloudplatform.connectivity.HttpClientAccessor.tryGetHttpClient(HttpClientAccessor.java:153) ~[cloudplatform-connectivity-3.63.0.jar:na]
at com.sap.cloud.sdk.cloudplatform.connectivity.HttpClientAccessor.getHttpClient(HttpClientAccessor.java:131) ~[cloudplatform-connectivity-3.63.0.jar:na]
at com.octanner.mca.service.MarketingCloudApiContactService.uploadContacts(MarketingCloudApiContactService.java:138) ~[classes/:na]
...
This happens when the following calls are made...
Using the lower level API
HttpClient httpClient = HttpClientAccessor.getHttpClient(destination); // warning happens here
ODataRequestResultMultipartGeneric batchResult = requestBatch.execute(httpClient);
Using the higher level API
service
.getAllContactOriginData()
.withQueryParameter("$expand", "AdditionalIDs")
.top(size)
.filter(filter)
.executeRequest(destination)); // warning happens here
Even though this warning shows up in the logs the service requests do continue to work as expected. It's just a little concerning to see this and I'm wondering if maybe I have something misconfigured. I reviewed all of the java docs and the troubleshooting page and didn't see anything out of the ordinary other than how I am fetching my destination, but even using the DestinationAccessor didn't seem to make a difference. Also, I'm not doing any asynchronous or multi-tenant processing.
Any help you or guidance you can give on this would be appreciated!
Cheers!
Such an issue is often the result of missing Spring Boot annotations - especially in synchronous executions.
Please refer to our documentation to learn more about the SAP Cloud SDK Spring Boot integration.
Edit Feb. 28th 2022
It is safe to ignore the logged warning if your application does not need any of the SAP Cloud SDK's multitenancy features.
Error Cause
The SAP Cloud SDK for Java recently (in version 3.63.0) introduced a change to the thread propagation behavior of the HttpClientCache.
With that change, we also adapted the logging in case the propagation didn't work as expected - this is often caused by not using the ThreadContextExecutor for wrapping asynchronous operations.
This is the reason for logs like the one described by the issue author.
Planned Mitigation
In the meanwhile, we realized that these WARN logs are causing confusion on the consumer side.
We are working on improving the situation by degrading the log level to INFO for the message and to DEBUG for the exception.
I am trying to read data from a secured Kafka cluster using spark structured streaming.
Also I am using the below library to read the data - "spark-sql-kafka-0-10_2.12":"3.0.0-preview" since it has the feature to specify our custom group id (instead of spark setting its own custom group id)
Dependency used in code:
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql-kafka-0-10_2.12</artifactId>
<version>3.0.0-preview</version>
I am getting the below error - even after specifying the required JAAS configuration in spark options.
Caused by: java.lang.IllegalArgumentException: requirement failed: Delegation token must exist for this connector.
at scala.Predef$.require(Predef.scala:281)
at org.apache.spark.kafka010.KafkaTokenUtil$.isConnectorUsingCurrentToken(KafkaTokenUtil.scala:299)
at org.apache.spark.sql.kafka010.KafkaDataConsumer.getOrRetrieveConsumer(KafkaDataConsumer.scala:533)
at org.apache.spark.sql.kafka010.KafkaDataConsumer.$anonfun$get$1(KafkaDataConsumer.scala:275)
Following document specifies that we can disable the feature of obtaining delegation token - https://spark.apache.org/docs/3.0.0-preview/structured-streaming-kafka-integration.html
I tried setting this property spark.security.credentials.kafka.enabled to false in spark config, but it is still failing with the same error.
Apparently there seems to be a bug on the preview release and has been fixed on the GA Spark 3.x release.
Reference :
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-30495
Now, we can specify our custom consumer group name while fetching the data from Kafka (Even though it's not recommended and we will see a warning message while specifying it).
I'm new to service bus, I'm curious about RetryPolicy and how it works, as per the documentation, retry had happened automatically for transient exceptions(MessagingExcepitons, ServerBusy), and the default retry count is 3, but we can set out custom retry policy using RetryExponential class.
I want to see the logs does the RetryPolicy did actually trying to connect or not when exception occurs.
How can I check this, how to replicate MessagingExcepitons, ServerBusy exceptions, so that I can see the logs. I'm using azure service bus java sdk.
Can any one help me to understand this. Thanks in advance
The Java SDK is open source and looking for retryPolicy in these files shows how the underlying implementation uses it
CoreMessageSender
CoreMessageReceiver
For example, here's the flow for CoreMessageSender when an error is thrown
When an error occurs and if its a ServiceBusException, a retry is scheduled - See line
After waiting, it ensures the link is still open and increments the retry count - See line
This continues and on successful completion it resets the count - See line
As for logging, the Java SDK uses SLF4J and you can see the required logs with a line like this in your code
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
Logger.getLogger("com.microsoft.azure.servicebus").setLevel(Level.WARN);
I am stuck in one problem which I need to resolve quickly. I have gone through many posts and tutorial about spark cluster deploy mode, but I am clueless about the approach as I am stuck for some days.
My use-case :- I have lots of spark jobs submitted using 'spark2-submit' command and I need to get the application id printed in the console once they are submitted. The spark jobs are submitted using cluster deploy mode. ( In normal client mode , its getting printed )
Points I need to consider while creating solution :- I am not supposed to change code ( as it would take long time, cause there are many applications running ), I can only provide log4j properties or some custom coding.
My approach:-
1) I have tried changing the log4j levels and various log4j parameters but the logging still goes to the centralized log directory.
Part from my log4j.properties:-
log4j.logger.org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend=ALL,console
log4j.appender.org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.Target=System.out
log4j.logger.org.apache.spark.deploy.SparkSubmit=ALL
log4j.appender.org.apache.spark.deploy.SparkSubmit=console
log4j.logger.org.apache.spark.deploy.SparkSubmit=TRACE,console
log4j.additivity.org.apache.spark.deploy.SparkSubmit=false
log4j.logger.org.apache.spark.deploy.yarn.Client=ALL
log4j.appender.org.apache.spark.deploy.yarn.Client=console
log4j.logger.org.apache.spark.SparkContext=WARN
log4j.logger.org.apache.spark.scheduler.DAGScheduler=INFO,console
log4j.logger.org.apache.hadoop.ipc.Client=ALL
2) I have also tried to add custom listener and I am able to get the spark application id after the applications finishes , but not to console.
Code logic :-
public void onApplicationEnd(SparkListenerApplicationEnd arg0)
{
for (Thread t : Thread.getAllStackTraces().keySet())
{
if (t.getName().equals("main"))
{
System.out.println("The current state : "+t.getState());
Configuration config = new Configuration();
ApplicationId appId = ConverterUtils.toApplicationId(getjobUId);
// some logic to write to communicate with the main thread to print the app id to console.
}
}
}
3) I have enabled the spark.eventLog to true and specified a directory in HDFS to write the event logs from spark-submit command .
If anyone could help me in finding an approach to the solution, it would be really helpful. Or if I am doing something very wrong, any insights would help me.
Thanks.
After being stuck at the same place for some days, I was finally able to get a solution to my problem.
After going through the Spark Code for the cluster deploy mode and some blogs, few things got clear. It might help someone else looking to achieve the same result.
In cluster deploy mode, the job is submitted via a Client thread from the machine from which the user is submitting. Actually I was passing the log4j configs to the driver and executors, but missed out on the part that the log 4j configs for the "Client" was missing.
So we need to use :-
SPARK_SUBMIT_OPTS="-Dlog4j.debug=true -Dlog4j.configuration=<location>/log4j.properties" spark-submit <rest of the parameters>
To clarify:
client mode means the Spark driver is running on the same machine you ran spark submit from
cluster mode means the Spark driver is running out on the cluster somewhere
You mentioned that it is getting logged when you run the app in client mode and you can see it in the console. Your output is also getting logged when you run in cluster mode you just can't see it because it is running on a different machine.
Some ideas:
Aggregate the logs from the worker nodes into one place where you can parse them to get the app ID.
Write the appIDs to some shared location like HDFS or a database. You might be able to use a Log4j appender if you want to keep log4j.