Is there a way to check if there exists a Pulsar producer with the same name on the same topic? - apache-pulsar

Pulsar allows multiple producers to subscribe to the same topic only if they have different producer names. Is there a way to check if a producer with the same name (and same topic) already exists?

You can use the stats command from the pulsar-admin CLI tool to list all of the producers attached to the topic as follows, then just look inside the publishers section of the JSON output for the producerName
root#6b40ffcc05ec:/pulsar# ./bin/pulsar-admin topics stats persistent://public/default/test-topic
{
"msgRateIn" : 19.889469865137894,
"msgThroughputIn" : 1253.0366015036873,
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"bytesInCounter" : 65442,
"msgInCounter" : 1002,
"bytesOutCounter" : 0,
"msgOutCounter" : 0,
"averageMsgSize" : 63.0,
"msgChunkPublished" : false,
"storageSize" : 65442,
"backlogSize" : 0,
"publishers" : [ {
"msgRateIn" : 19.889469865137894,
"msgThroughputIn" : 1253.0366015036873,
"averageMsgSize" : 63.0,
"chunkedMessageRate" : 0.0,
"producerId" : 0,
"metadata" : { },
"producerName" : "standalone-3-1",
"connectedSince" : "2020-08-06T15:51:48.279Z",
"clientVersion" : "2.6.0",
"address" : "/127.0.0.1:53058"
} ],
"subscriptions" : { },
"replication" : { },
"deduplicationStatus" : "Disabled"
}

Related

Structured Spark streaming metrics retrieval

I have an application with structured Spark streaming and I would like to get some metrics like scheduling delay, latency, etc. Usually, such metrics can be found in Spark UI Streaming tab, however, such functionality does not exist for structured streaming as far as I know.
So how can I get these metrics values?
For now, I have tried to use query progress but not all of the required metrics can be found in the results:
QueryProgress {
"timestamp" : "2019-11-19T20:14:07.011Z",
"batchId" : 1,
"numInputRows" : 8,
"inputRowsPerSecond" : 0.8429038036034138,
"processedRowsPerSecond" : 1.1210762331838564,
"durationMs" : {
"addBatch" : 6902,
"getBatch" : 1,
"getEndOffset" : 0,
"queryPlanning" : 81,
"setOffsetRange" : 20,
"triggerExecution" : 7136,
"walCommit" : 41
},
"stateOperators" : [ {
"numRowsTotal" : 2,
"numRowsUpdated" : 2,
"memoryUsedBytes" : 75415,
"customMetrics" : {
"loadedMapCacheHitCount" : 400,
"loadedMapCacheMissCount" : 0,
"stateOnCurrentVersionSizeBytes" : 17815
}
} ],
"sources" : [ {
"description" : "KafkaV2[Subscribe[tweets]]",
"startOffset" : {
"tweets" : {
"0" : 579
}
},
"endOffset" : {
"tweets" : {
"0" : 587
}
},
"numInputRows" : 8,
"inputRowsPerSecond" : 0.8429038036034138,
"processedRowsPerSecond" : 1.1210762331838564
} ]

How to group a document with the same name that has different values for a specific attribute in one array using Mongodb?

If I have these objects :
{
"_id" : ObjectId("5caf2c1642e3731464c2c79d"),
"requested" : [],
"roomNo" : "E0-1-09",
"capacity" : 40,
"venueType" : "LR(M)",
"seatingType" : "TB",
"slotStart" : "8:30AM",
"slotEnd" : "9:50AM",
"__v" : 0
}
/* 2 */
{
"_id" : ObjectId("5caf2deb4a7f5222305b55d5"),
"requested" : [],
"roomNo" : "E0-1-09",
"capacity" : 40,
"venueType" : "LR(M)",
"seatingType" : "TB",
"slotStart" : "10:00AM",
"slotEnd" : "11:20AM",
"__v" : 0
}
is it possible to get something like this using aggregate in mongodb?
[{ roomNo: "E0-1-09" , availability : [{slotStart : "8:30AM", slotEnd: "9:50AM"} ,
{slotStart: "10:00AM", slotEnd : "11:20AM"}]
what im using currently:
db.getDB().collection(collection).aggregate([
{ $group: {_id:{roomNo: "$roomNo", availability :[{slotStart:"$slotStart", slotEnd:"$slotEnd"}]}}}
])
actually getting it twice like so :
[{ roomNo: "E0-1-09" , availability : [{slotStart : "8:30AM", slotEnd: "9:50AM"}]
[{ roomNo: "E0-1-09" , availability : [{slotStart: "10:00AM", slotEnd : "11:20AM"}]
You have to use $push accumulator
db.collection.aggregate([
{ "$group": {
"_id": "$roomNo",
"availability": {
"$push": {
"slotEnd": "$slotEnd",
"slotStart": "$slotStart"
}
}
}}
])

Number of input rows in spark structured streaming with custom sink

I'm using a custom sink in structured stream (spark 2.2.0) and noticed that spark produces incorrect metrics for number of input rows - it's always zero.
My stream construction:
StreamingQuery writeStream = session
.readStream()
.schema(RecordSchema.fromClass(TestRecord.class))
.option(OPTION_KEY_DELIMITER, OPTION_VALUE_DELIMITER_TAB)
.option(OPTION_KEY_QUOTE, OPTION_VALUE_QUOTATION_OFF)
.csv(s3Path.toString())
.as(Encoders.bean(TestRecord.class))
.flatMap(
((FlatMapFunction<TestRecord, TestOutputRecord>) (u) -> {
List<TestOutputRecord> list = new ArrayList<>();
try {
TestOutputRecord result = transformer.convert(u);
list.add(result);
} catch (Throwable t) {
System.err.println("Failed to convert a record");
t.printStackTrace();
}
return list.iterator();
}),
Encoders.bean(TestOutputRecord.class))
.map(new DataReinforcementMapFunction<>(), Encoders.bean(TestOutputRecord.clazz))
.writeStream()
.trigger(Trigger.ProcessingTime(WRITE_FREQUENCY, TimeUnit.SECONDS))
.format(MY_WRITER_FORMAT)
.outputMode(OutputMode.Append())
.queryName("custom-sink-stream")
.start();
writeStream.processAllAvailable();
writeStream.stop();
Logs:
Streaming query made progress: {
"id" : "a8a7fbc2-0f06-4197-a99a-114abae24964",
"runId" : "bebc8a0c-d3b2-4fd6-8710-78223a88edc7",
"name" : "custom-sink-stream",
"timestamp" : "2018-01-25T18:39:52.949Z",
"numInputRows" : 0,
"inputRowsPerSecond" : 0.0,
"processedRowsPerSecond" : 0.0,
"durationMs" : {
"getOffset" : 781,
"triggerExecution" : 781
},
"stateOperators" : [ ],
"sources" : [ {
"description" : "FileStreamSource[s3n://test-bucket/test]",
"startOffset" : {
"logOffset" : 0
},
"endOffset" : {
"logOffset" : 0
},
"numInputRows" : 0,
"inputRowsPerSecond" : 0.0,
"processedRowsPerSecond" : 0.0
} ],
"sink" : {
"description" : "com.mycompany.spark.MySink#f82a99"
}
}
Do I have to populate any metrics in my custom sink to be able to track progress? Or could it be a problem in FileStreamSource when it reads from s3 bucket?
The problem was related to using dataset.rdd in my custom sink that creates a new plan so that StreamExecution doesn't know about it and therefore is not able to get metrics.
Replacing data.rdd.mapPartitions with data.queryExecution.toRdd.mapPartitions fixes the issue.

mongodb taking too much time for old entries

i am new in mongodb and i am facing an issue, i have around millions of documents in my collectionand i am trying to find single entry using findOne({}) command and when i am trying to find recent entries then response comes in miliseconds but when i am trying to fetch older entries around 600 millionth document then it takes around 2 minutes on mongo shell and my node server gives
{ MongoErro : connection 1 to 127.0.0.1:27017 timed out }
and my nodejs server sends an empty response. can any one tell me what should i do to resolve this issueThanks in advance
explain gives me
db.contacts.find({"phoneNumber":"9165900137"}).explain("executionStats")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "meanApp.contacts",
"indexFilterSet" : false,
"parsedQuery" : {
"phoneNumber" : {
"$eq" : "9165900137"
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"phoneNumber" : {
"$eq" : "9165900137"
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 321188,
"totalKeysExamined" : 0,
"totalDocsExamined" : 495587806,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"phoneNumber" : {
"$eq" : "9165900137"
}
},
"nReturned" : 1,
"executionTimeMillisEstimate" : 295230,
"works" : 495587808,
"advanced" : 1,
"needTime" : 495587806,
"needYield" : 0,
"saveState" : 3871779,
"restoreState" : 3871779,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 495587806
}
},
"serverInfo" : {
"host" : "li1025-15.members.linode.com",
"port" : 27017,
"version" : "3.2.16",
"gitVersion" : "056bf45128114e44c5358c7a8776fb582363e094"
},
"ok" : 1
}
As indicated in the explain plan results, the current query is doing Collection Scan. This means it has to scan every document in collection to produce the match and you have got about half a billion documents.
Try adding this index and it might take a bit to create it.
db.contacts.createIndex( { phoneNumber: 1 }, { background: true } )
Run the query once the index creation is successful, you must see a dramatic improvement in performance. To be certain whether index got picked up, try explain again and it should no longer say COLLSCAN.

Query time mongoose occasionally taking 3-4 seconds

I've written a back end node server for a multiplayer game I'm developing and most of the time each request takes about 20-100ms to resolve. However, sometimes (Maybe 1 out of 50 requests) I will do the same request and it will take 2000+ms to resolve.
The server is written entirely in node.js and is hosted on heroku. I am using mongoose to make the calls to the database.
Here is a screenshot of the logs, at the top you can see how queries normally function. The request comes in at 19:03:03.68 and the response is sent out at 19:03:03.73, saving all the data finishes at at 19:03:03.74. Heroku logs the request as taking 58ms which is the desired and expect outcome.
Below that is when the issue occurs. You can see multiple requests come in from two separate clients (Each client sends ~1 request per second which is correct) However the requests build up and after about 2000-5000ms they will all quickly resolve one after another. I’ve tried narrowing down the issue without much luck, but I believe it’s related to when I query the database as you can see multiple requests come in but the first query to the database doesn’t actually resolve until around 2300ms later. As far as I can tell these requests are identical to the ones that resolve in 20-100ms and occur completely at random.
The actual code is similar to this on the server (Simplified for the sake of this question):
console.log (“request received”);
Game.findOne({‘id’: gameID}, function(err, theGame){
console.log("First Query");
I also opened up the mongo shell for the database to look for queries taking an excessive amount of time (>2000ms) with this code:
db.system.profile.find( {millis: {$gt : 2000} } ).sort( { ts: 1} );
Here are the slightly modified results which should include everything relevant:
{ "op" : "update", "ns" : "theDb.players", "query" :
{ "_id" : ObjectId("572b8eb242d70903005df0df")
}, "updateobj" :
{ "$set" :
{ "lastSeen" : ISODate("2016-05-05T18:19:30.761Z"), "timeElapsed" : 16
}
}, "nscanned" : 1, "nscannedObjects" : 1, "nMatched" : 1, "nModified" : 1, "fastmod" : true, "keyUpdates" : 0, "writeConflicts" : 0, "numYield" : 0, "locks" :
{ "Global" :
{ "acquireCount":
{ "r" : NumberLong(2), "w" : NumberLong(2) }
}, "MMAPV1Journal" :
{ "acquireCount" :
{ "w" : NumberLong(2) }, "acquireWaitCount" :
{ "w" : NumberLong(1) }, "timeAcquiringMicros" :
{ "w" : NumberLong(7294179) }
}, "Database" :
{ "acquireCount" :
{ "w" : NumberLong(2) }
}, "Collection" :
{ "acquireCount" :
{ "W" : NumberLong(1) }
}, "oplog" :
{ "acquireCount" :
{ "w" : NumberLong(1) }
}
}, "milli" : 2298, "execStats" : {}, "ts" : ISODate("2016-05-05T18:19:33.060Z")
Second Result:
{ "op" : "update", "ns" : "theDb.connections", "query" :
{ "_id" : ObjectId("572b8eaf42d70903005df0dd")
}, "updateobj" :
{ "$set" :
{ "internalCounter" : 3, "lastCount" : 3, "lastSeen" : ISODate("2016-05-05T18:19:30.761Z"), "playerID" : 128863276517, "sinceLast" : 0
}
}, "nscanned" : 1, "nscannedObjects" : 1, "nMatched" : 1, "nModified" : 1, "keyUpdates" : 0, "writeConflicts" : 0, "numYield" : 0, "locks" :
{ "Global" :
{ "acquireCount" :
{ "r" : NumberLong(2), "w" : NumberLong(2)
}
}, "MMAPV1Journal" :
{ "acquireCount" :
{ "w" : NumberLong(2) }, "acquireWaitCount" :
{ "w" : NumberLong(1) }, "timeAcquiringMicros" :
{ "w" :NumberLong(7294149) }
}, "Database" :
{ "acquireCount" :
{ "w" : NumberLong(2) }
}, "Collection" :
{ "acquireCount" :
{ "W" : NumberLong(1) }
}, "oplog" :
{ "acquireCount" :
{ "w" : NumberLong(1) }
}
}, "millis" : 2299, "execStats" : {},"ts" : ISODate("2016-05-05T18:19:33.061Z")
I really need to ensure the latency for any request never exceeds 500ms otherwise it extremely irritating in the game itself. I’m really at a loss for what might be causing this and how to figure out more.
I'm assuming the cause for the issue is that timeAcquiringMicros is so long. I'm unsure of what is causing this though.
*Note, the client is requesting the data with just standard http requests, I’m not currently using any sockets.
Alright, I've finally solved the issue. The problem wasn't actually connected to anything that I had done. I was using the sandbox plan that mlab offers in connection to heroku which had my application competing for processing time with other people also using the sandbox plan. Their queries were slowing down the database causing those spikes in response times.
The solution: I had to upgrade to their shared cluster plan. Since upgrading I haven't had any irregularities in query times.

Resources