How to access statistics endpoint for a Spark Streaming application? - apache-spark

As of Spark 2.2.0, there are new endpoints in the API for getting information about streaming jobs.
I run Spark on EMR clusters, using Spark 2.2.0 in cluster mode.
When I hit the endpoint for my streaming jobs, all it gives me is the error message:
no streaming listener attached to <stream name>
I've dug through the Spark codebase a bit, but this feature is not very well documented. So I'm curious: is this a bug? Is there some configuration I need to do to get this endpoint working?
This appears to be an issue specifically when running on the cluster. The same code running on Spark 2.2.0 on my local machine shows the statistics as expected, but gives that error message when run on the cluster.

I'm using the very latest Spark 2.3.0-SNAPSHOT, built today from master, so YMMV. It worked fine for me.
Is there some configuration I need to do to get this endpoint working?
No. It's supposed to work fine with no changes to the default configuration.
Make sure you use the host and port of the driver (you may be tempted to use port 18080 of the Spark History Server, which exposes the same endpoints and shows the same jobs running, but has no streaming listener attached).
As you can see in the source code where the error message lives, it can only happen when ui.getStreamingJobProgressListener has not been registered (the lookup ends up in case None).
So the question now is why would that SparkListener not be registered?
That leads us to the streamingJobProgressListener var, which is set using the setStreamingJobProgressListener method exclusively while StreamingTab is being instantiated (which is why I asked whether you can see the Streaming tab).
In other words, if you see the Streaming tab in web UI, you have the streaming metric endpoint(s) available. Check the URL to the endpoint which should be in the format:
http://[driverHost]:[port]/api/v1/applications/[appId]/streaming/statistics
I tried to reproduce your case and did the following, which led me to a working setup.
Started one of the official examples of Spark Streaming applications.
$ ./bin/run-example streaming.StatefulNetworkWordCount localhost 9999
I did run nc -lk 9999 first.
Opened the web UI at http://localhost:4040/streaming to make sure the Streaming tab is there.
Made sure http://localhost:4040/api/v1/applications/ responds with application ids.
$ http http://localhost:4040/api/v1/applications/
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 266
Content-Type: application/json
Date: Wed, 13 Dec 2017 07:58:04 GMT
Server: Jetty(9.3.z-SNAPSHOT)
Vary: Accept-Encoding, User-Agent
[
  {
    "attempts": [
      {
        "appSparkVersion": "2.3.0-SNAPSHOT",
        "completed": false,
        "duration": 0,
        "endTime": "1969-12-31T23:59:59.999GMT",
        "endTimeEpoch": -1,
        "lastUpdated": "2017-12-13T07:53:53.751GMT",
        "lastUpdatedEpoch": 1513151633751,
        "sparkUser": "jacek",
        "startTime": "2017-12-13T07:53:53.751GMT",
        "startTimeEpoch": 1513151633751
      }
    ],
    "id": "local-1513151634282",
    "name": "StatefulNetworkWordCount"
  }
]
Accessed the endpoint for the Spark Streaming application at http://localhost:4040/api/v1/applications/local-1513151634282/streaming/statistics.
$ http http://localhost:4040/api/v1/applications/local-1513151634282/streaming/statistics
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 219
Content-Type: application/json
Date: Wed, 13 Dec 2017 08:00:10 GMT
Server: Jetty(9.3.z-SNAPSHOT)
Vary: Accept-Encoding, User-Agent
{
  "avgInputRate": 0.0,
  "avgProcessingTime": 30,
  "avgSchedulingDelay": 0,
  "avgTotalDelay": 30,
  "batchDuration": 1000,
  "numActiveBatches": 0,
  "numActiveReceivers": 1,
  "numInactiveReceivers": 0,
  "numProcessedRecords": 0,
  "numReceivedRecords": 0,
  "numReceivers": 1,
  "numRetainedCompletedBatches": 376,
  "numTotalCompletedBatches": 376,
  "startTime": "2017-12-13T07:53:54.921GMT"
}

TL;DR
Just go to:
http://localhost:4040/streaming
Had the same issue. I ran a Spark application from a PyCharm Python virtual environment. Spark reported that port 4040 was taken:
Spark context Web UI available at http://192.168.100.221:4042
but I saw no jobs there, and the Streaming tab was missing.
Then I went to
http://localhost:4040/streaming
and behold, everything was there.

If you look at the output of PyCharm in the console window, it will show which port the streaming UI is on. I was assuming it was 4040, but when I checked the output carefully the port was 4041. Here is the output:
WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Then you can open localhost:4041 in any web browser and you should see the streaming output (you can also print sc.uiWebUrl from the SparkContext to confirm the actual address). Hope this helps!

Related

ECONNREFUSED when attempting to POST to emulator from within local Docker container

TL;DR:
Can't POST to the local Cosmos Emulator. Can POST to Azure Cosmos, but not with @azure/cosmos-sign, only with @azure/cosmos (which seems utterly bizarre, as the latter is supposedly built upon the former). This is not ideal (the message-signing portion alone is very lightweight with the REST API directly). Bug, or user error? Why do the instructions for enabling networking/HTTPS not seem to work?
Details:
I have a Node.js based app, and am using the @azure/cosmos-sign package to generate the correct headers via the generateHeaders method to save a JSON object in the local Cosmos Emulator.
Upon trying to post from the Node app to the URI provided in the Emulator Quickstart (https://localhost:8081), the error returned is...
Error: connect ECONNREFUSED 127.0.0.1:8081 : https://localhost:8081
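For context, the POSTing code looks roughly like this (a simplified sketch with hypothetical values; I'm assuming the generateHeaders(key, verb, resourceType, resourceId) shape from the package README and Node 18+'s global fetch, so check those against your setup):

import { generateHeaders } from "@azure/cosmos-sign";

// Hypothetical values for illustration; not my real key or IDs.
const masterKey = "<emulator key>";
const resourceLink = "dbs/SampleDb/colls/SampleCollection";

async function saveDoc(doc: object): Promise<unknown> {
  // Assumed argument order per the package README: (key, verb, resourceType, resourceId).
  const headers = generateHeaders(masterKey, "POST", "docs", resourceLink);
  const res = await fetch(`https://localhost:8081/${resourceLink}/docs`, {
    method: "POST",
    headers: { ...headers, "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  });
  return res.json();
}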
As per these instructions...
Enable access to emulator on a local network
If you have multiple machines using a single network, and you set up the emulator on one machine and want to access it from another machine, you need to enable access to the emulator on the local network.
You can run the emulator on a local network. To enable network access,
specify the /AllowNetworkAccess option at the command-line, which
also requires that you specify /Key=key_string or
/KeyFile=file_name. You can use /GenKeyFile=file_name to generate
a file with a random key upfront. Then you can pass that to
/KeyFile=file_name or /Key=contents_of_file.
To enable network access for the first time, the user should shut down
the emulator and delete the emulator's data directory
%LOCALAPPDATA%\CosmosDBEmulator.
-https://learn.microsoft.com/en-us/azure/cosmos-db/local-emulator?tabs=cli%2Cssl-netstd21#enable-access-to-emulator-on-a-local-network
...I thought perhaps I needed to enable the networking functionality. It is all on the same (Windows) host, with the Node.js application running in Docker on the same host the Emulator is installed on. But this caused more problems with no benefit. With the generated key, I can load the included UI for managing the local emulator instance, but I then can't create databases or containers (without resetting the emulator and starting it again normally, i.e. without the AllowNetworkAccess and related settings).
Attempting to use the included Explorer to create a Database returns...
Error while creating database SampleDb:
{
  "code": 401,
  "body": {
    "code": "Unauthorized",
    "message": "The input authorization token can't serve the request. Please check that the expected payload is built as per the protocol, and check the key being used. Server used the following payload to sign: 'post\ndbs\n\nmon, 29 mar 2021 23:33:45 gmt\n\n'\r\nActivityId: 29e4e700-d1b7-4d59-bdea-5931e4d6622d, Microsoft.Azure.Documents.Common/2.11.0"
  },
  "headers": {
    "access-control-allow-credentials": "true",
    "access-control-allow-origin": "https://localhost:8081",
    "access-control-expose-headers": "Access-Control-Allow-Origin,Access-Control-Allow-Credentials,Content-Type,x-ms-activity-id,x-ms-gatewayversion",
    "content-type": "application/json",
    "date": "Mon, 29 Mar 2021 23:33:45 GMT",
    "server": "Microsoft-HTTPAPI/2.0",
    "x-firefox-spdy": "h2",
    "x-ms-activity-id": "29e4e700-d1b7-4d59-bdea-5931e4d6622d",
    "x-ms-gatewayversion": "version=2.11.0",
    "x-ms-throttle-retry-count": 0,
    "x-ms-throttle-retry-wait-time-ms": 0
  },
  "activityId": "29e4e700-d1b7-4d59-bdea-5931e4d6622d"
}
I did see this somewhat similar SO question, but it was abandoned.
This one, however, seems to imply they simply reverted the KeyFile steps mentioned in the MS docs. It seems odd that I am getting the same error from the Node.js POST regardless of whether I use the /AllowNetworkAccess switch or not.
Using the /NoFirewall switch as recommended here didn't fix the POSTs, but did allow the Explorer UI to keep working properly. The upvoted answer for that question is what I have already tried (/AllowNetworkAccess /KeyFile=....), and it is not working, as explained above.
The docs here indicate that TLS (https) is in fact required...
"The Azure Cosmos DB Emulator supports only secure communication via TLS"
However, here they seem to indicate that, in the Node SDK (which relies on the same cosmos-sign library I am using)...
"TLS verification is disabled. By default the Node.js SDK(version 1.10.1 or higher) for the SQL API will not try to use the TLS/SSL certificate when connecting to the local emulator."
I tried adjusting the start script for my Node Docker image as suggested here...
If connecting to the Cosmos DB Emulator, disable TLS verification for your node process:
process.env.NODE_TLS_REJECT_UNAUTHORIZED = "0";
const client = new CosmosClient({ endpoint, key });
...and changed the start script in my package.json from...
"start": "node $NODE_OPTIONS node_modules...."
...to...
"start": "NODE_TLS_REJECT_UNAUTHORIZED=0 node $NODE_OPTIONS node_modules...."
...and rebuilt my images, but still receive the same ECONNREFUSED error from the Node client/app.
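(In hindsight, one thing worth checking at this point: from inside a Docker container, localhost resolves to the container itself, not to the Windows host running the emulator, which would produce exactly this ECONNREFUSED. A minimal sketch of the client-side change, assuming Docker Desktop's host.docker.internal alias for the host:)

// Inside the container, "localhost" is the container itself, not the Windows
// host running the emulator, so the connection is refused before TLS matters.
// Docker Desktop exposes the host under the special name "host.docker.internal".
process.env.NODE_TLS_REJECT_UNAUTHORIZED = "0"; // emulator cert is self-signed
const endpoint = "https://host.docker.internal:8081"; // instead of https://localhost:8081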
As I was reading the documentation for the REST API, I was reminded that, as opposed to using the CosmosClient (which just needs the base URL), a POST to the API needs a fully formed URL, as indicated here...
Method: POST
Request URI: https://{databaseaccount}.documents.azure.com/dbs/{db-id}/colls/{coll-id}/docs
Description: The {databaseaccount} is the name of the Azure Cosmos DB account created under your subscription. The {db-id} value is the user-generated name/ID of the database, not the system-generated ID (rid). The {coll-id} value is the name of the collection that contains the document.
After appending /dbs/SampleDB/colls/SampleCollection/docs (yes, my entities are CamelCase) to the base URL offered by the Emulator UI's Quickstart URI (https://localhost:8081)... I am still getting the ECONNREFUSED error on HTTP POSTs.
Hmm... I retargeted the Node app to point to a collection in my Azure Cosmos DB, and I am still having no luck.
400: Invalid API version. Ensure a valid x-ms-version header value is passed. Please update to the latest version of Azure Cosmos DB SDK. ActivityId: bfdeb339-8fef-4ba9-a03d-444a8664c02b, Microsoft.Azure.Documents.Common/2.11.0
Added x-ms-version and set it to 2018-12-31 (latest, as per here).
Now I am getting (after trying both my secondary and primary keys... just in case)...
401: The input authorization token can't serve the request. Please check that the expected payload is built as per the protocol, and check the key being used. Server used the following payload to sign: 'post\ndocs\ndbs/TopHand/colls/SampleTbl\ntue, 30 mar 2021 02:54:25 gmt\n\n' ActivityId: bb258bb4-f5a8-4495-b0b5-b54fa8b7c46f, Microsoft.Azure.Documents.Common/2.11.0
I verified that the required headers are all present. What can possibly be left?!
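For reference, the master-key token the server expects is documented in the REST API auth docs; here is a rough sketch of that documented scheme as I understand it (my own code, not the library's actual implementation):

import { createHmac } from "crypto";

// Sketch of the documented master-key token: HMAC-SHA256 over
// verb + "\n" + resourceType + "\n" + resourceLink + "\n" + date + "\n" + "\n",
// where verb, resourceType and date are lowercased and the resource link keeps its case.
function authToken(verb: string, resourceType: string, resourceLink: string,
                   date: string, masterKey: string): string {
  const payload =
    `${verb.toLowerCase()}\n${resourceType.toLowerCase()}\n${resourceLink}\n${date.toLowerCase()}\n\n`;
  const sig = createHmac("sha256", Buffer.from(masterKey, "base64"))
    .update(payload, "utf8")
    .digest("base64");
  return encodeURIComponent(`type=master&ver=1.0&sig=${sig}`);
}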
The base URI for Azure Cosmos had a trailing /, which ended up duplicated when the rest of the path was appended. After fixing the URL string, I am still getting the 401.
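(For anyone else hitting this, a trivial helper avoids the duplicated slash; hypothetical code, just to illustrate:)

// Join base URI and resource path without duplicating "/".
const joinUri = (base: string, path: string): string =>
  `${base.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;

joinUri("https://localhost:8081/", "dbs/SampleDb/colls/SampleTbl/docs");
// => "https://localhost:8081/dbs/SampleDb/colls/SampleTbl/docs"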
A github issue pointed me to what may have been an error in the URL/REST path I was posting to. Rather than posting to (what I had previously)...
dbs/SampleDb/colls/SampleTbl/docs
...I changed it to...
dbs/SampleDb/colls/SampleTbl
...and am now getting error 405, MethodNotAllowed, RequestHandler.Post. 405 isn't even listed as a code returned by the Cosmos REST service.
This example in the MS docs definitely uses the /docs string at the end of the URL/REST path.
Example
POST https://querydemo.documents.azure.com/dbs/1KtjAA==/colls/1KtjAImkcgw=/docs HTTP/1.1
x-ms-documentdb-partitionkey: ["Andersen"]
x-ms-date: Tue, 29 Mar 2016 02:28:29 GMT
authorization: type%3dmaster%26ver%3d1.0%26sig%3d92WMAkQv0Zu35zpKZD%2bcGSH%2b2SXd8HGxHIvJgxhO6%2fs%3d
Cache-Control: no-cache
User-Agent: Microsoft.Azure.Documents.Client/1.6.0.0
x-ms-version: 2015-12-16
Accept: application/json
Host: querydemo.documents.azure.com
Cookie: x-ms-session-token#0=602; x-ms-session-token=602
Content-Length: 344
Expect: 100-continue
{
  "id": "AndersenFamily",
  "LastName": "Andersen",
}
I contacted MS support and was given some info that unblocked me (but it doesn't entirely address the issues noted above).
For my own use-case, simply setting a key and allowing network access to the emulator was sufficient.
Note: this doesn't address the issue of the Emulator's Data Explorer becoming nonfunctional.
The feedback I received from the support personnel regarding the command-line switches disabling the UI was...
By changing the key to something other than the default one, you also protect your emulator data from being seen via the Data Explorer.
Apparently the key alone isn't enough to protect the data, and disabling the UI is a "feature".
Solution: Simply executing...
.\Microsoft.Azure.Cosmos.Emulator.exe /AllowNetworkAccess /Key={insert your base64 encoded 64+ character string}
...enabled network access for systems on the same host as the emulator, and avoided the whole certificate/key generation/importing headache.
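(If you need a key string for /Key=, any sufficiently long base64-encoded string appears to work, per the switch's description above; e.g. one generated with Node:)

import { randomBytes } from "crypto";
// Print a random base64-encoded string long enough for the /Key= switch.
console.log(randomBytes(64).toString("base64"));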
Note that you must use the non-loopback IP of the host the emulator is running on when connecting to it (writes/reads/etc.).

Azure IoT Device Offline Commands Issue

I have an Azure IoT Device in an IoT Central application.
We don't want it to execute offline commands. Is there any way to switch off this offline-command execution capability?
Based on my test (with a sync command), the "offline command" behavior works as expected: when the device is disconnected from the Azure IoT Central app, the error Not Found is returned after 30 seconds. See my example:
{
  "error": {
    "code": "NotFound",
    "message": "Could not connect to device in order to send command. You can contact support at https://aka.ms/iotcentral-support. Please include the following information. Request ID: cic9xs38, Time: Sun, 09 Aug 2020 05:08:00 GMT.",
    "requestId": "cic9xs38",
    "time": "Sun, 09 Aug 2020 05:08:00 GMT"
  }
}
A screen snippet of the command history in the IoT Central app showed the failed command. Note that in the present version there is no feature for re-executing (retrying) a sync or async command on a re-connected device. If the device is not connected, the command completes with a failed status = NotFound; in other words, the command is invoked synchronously. See more details here.

How to run a http server on EMR master node of a Spark application

I have a Spark Streaming application (Spark 2.4.4) running on AWS EMR 5.28.0. In the driver application on the master node, besides setting up the Spark Streaming job, I am also running an HTTP server (Akka HTTP 10.1.6) which can query the driver application for data. I bind to port 6161 like the following:
val bindingFuture: Future[ServerBinding] = Http().bindAndHandle(myapiroutes, "127.0.0.1", 6161)
try {
  bindingFuture.map { serverBinding =>
    log.info(s"AlertRestApi bound to ${serverBinding.localAddress}")
  }
} catch {
  case ex: Exception => {
    log.error(s"Failed to bind to 127.0.0.1:6161")
    system.terminate()
  }
}
Then I start Spark Streaming:
ssc.start()
When I test this with local Spark, I am able to access http://localhost:6161/myapp/v1/data and get data from Spark Streaming; everything is good so far.
However, when I run this application on AWS EMR, I cannot access port 6161. I ssh into the driver node and try to curl my URL, and it gives me the error message:
[hadoop@ip-xxx-xx-xx-x ~]$ curl http://xxx.xx.xx.x:6161/myapp/v1/data
curl: (7) Failed to connect to xxx.xx.xx.x port 6161: Connection refused
When I look into the log on the driver node, I do see the port is bound (why the host shows 0:0:0:0:0:0:0:0 I don't know; that is also what I see in my dev testing, where it works and I am able to access the URL):
20/04/13 16:53:26 INFO MyApp: MyRestApi bound to /0:0:0:0:0:0:0:0:6161
So my question is: what should I do so that I can access the API at port 6161 on the driver node? I realize the YARN resource manager may be involved, but I know nothing about it that would point me to where to investigate.
Please help. Thanks
Are you using 127.0.0.1 as the host name, or 0.0.0.0?
127.0.0.1 is the loopback address; it works when you test locally against localhost, but it will not accept connections arriving on the instance's real IP in AWS. In that case you need to use 0.0.0.0 as the host name, i.e. bind with Http().bindAndHandle(myapiroutes, "0.0.0.0", 6161).
Also make sure the port is open and access is allowed from your IP. To do that, go to the inbound rules of your instance's security group and add port 6161 under a custom TCP rule if not done already.
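You can also verify on the master node which address the server actually bound to, with something like:
sudo netstat -tlnp | grep 6161
If it shows 127.0.0.1:6161 rather than 0.0.0.0:6161 (or :::6161), it only accepts loopback connections.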
Let me know if this makes any difference

Kafka Zookeeper Security Authentication & Authorization(JAAS) Using SASL

Regarding Kafka-ZooKeeper security using DIGEST-MD5 authentication, I am trying to rotate/change the credentials/password for both the server (ZooKeeper) and client (Kafka) JAAS config files.
We have a cluster with 3 ZooKeeper nodes and 3 Kafka broker nodes, using the JAAS configuration files below.
kafka.conf
Client {
org.apache.zookeeper.server.auth.DigestLoginModule required
username="super"
password="password";
};
zookeeper.conf
Server {
org.apache.zookeeper.server.auth.DigestLoginModule required
user_super="password";
};
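Both files are passed to the respective JVMs via the standard JAAS system property (assuming the usual setup), e.g.:
-Djava.security.auth.login.config=/path/to/kafka.conf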
To rotate, we do a rolling restart of the server (ZooKeeper) instances after updating the credential (password). Then, while rolling-restarting the clients (Kafka instances) one at a time after updating the same credential/password for the super user, we notice these INFO-level messages in the server logs:
[2019-06-15 17:17:38,929] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-06-15 17:17:38,929] INFO [ZooKeeperClient] Connected. (kafka.zookeeper.ZooKeeperClient)
This eventually results in an unclean shutdown and restart of the broker, which impacts writes and reads for longer than expected. I have tried commenting out requireClientAuthScheme=sasl in ZooKeeper's zoo.cfg (https://cwiki.apache.org/confluence/display/ZOOKEEPER/Client-Server+mutual+authentication) to allow any client to authenticate to ZooKeeper, but with no success.
As an alternative approach, I also tried to update the credential/password in the JAAS config file dynamically using sasl.jaas.config, and I get the same exception documented in this JIRA (reference: https://issues.apache.org/jira/browse/KAFKA-8010).
Does anyone have any suggestions? Thanks in advance.

Host resolution error while using node-rdkafka

I'm running node-rdkafka in a Node.js application. The consumer hangs indefinitely without pulling any messages from Kafka (it works on localhost).
It emits the error below:
{ Error: Local: Host resolution failure
  origin: 'local',
  message: 'host resolution failure',
  code: -1,
  errno: -1,
  stack: 'Error: Local: Host resolution failure' }
The application works up to the point of receiving data from Kafka. The Kafka instance itself is fine, validated by producing and consuming messages using the console tools.
Any help with debugging why this is occurring is much appreciated.
Sample consumer code here - https://github.com/Blizzard/node-rdkafka/blob/master/examples/consumer-flow.md
This issue happens because your client and the broker are on different networks. The simple hack is to add a hosts-file entry matching the advertised.listeners hostname.
For example,
advertised.listeners=PLAINTEXT://kafka:9092
Then add an entry in /etc/hosts with your Kafka broker's IP. For example, if the broker IP is 192.168.1.1:
192.168.1.1 kafka
You can use kafkacat utility to check your broker's IP.
kafkacat -b kafka:9092 -L
It will return metadata about the brokers. Check whether the returned broker IP is reachable from your machine.
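Once the name resolves, point the client at the advertised name; with node-rdkafka that looks roughly like this (a minimal sketch with hypothetical topic and group names):

import * as Kafka from "node-rdkafka";

// Minimal consumer pointing at the advertised broker name from above.
const consumer = new Kafka.KafkaConsumer(
  { "group.id": "test-group", "metadata.broker.list": "kafka:9092" },
  {}
);
consumer.connect();
consumer.on("ready", () => {
  consumer.subscribe(["my-topic"]);
  consumer.consume();
});
consumer.on("data", (msg) => console.log(msg.value?.toString()));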
For a better understanding of this issue, you can refer to https://www.confluent.io/blog/kafka-listeners-explained/
I had this exact problem when running Kafka locally using the quickstart instructions from https://kafka.apache.org/quickstart
For me, adding the following two lines to config/server.properties before starting the Kafka server solved the issue:
listeners=PLAINTEXT://localhost:9092
advertised.listeners=PLAINTEXT://localhost:9092
