I am running an MPI job and am getting this warning message:
[comet-05-08.sdsc.edu:mpi_rank_10][async_thread] Got unknown event 17 ... continuing ...
I am compiling with icc (ICC) 15.0.2 20150121 using MVAPICH 2.1.
What does the message mean? Is it harmful?
From this mailing list:
this error message is being printed by the asynchronous progress
thread because of receiving an IBV_EVENT_CLIENT_REREGISTER event (event
#17).
It is suggested that you update to the latest version. The mail I linked to suggests MVAPICH2 1.4 (which is actually older than your version), but the mail is from 2009; the advice today is simply to move to the latest release.
The code that probably generates this message is:
switch (event.event_type) {
    ...
        break;
    default:
        NEM_IB_ERR("Got unknown event %d ... continuing ...",
                   event.event_type);
}
where you can find the full code here.
As indicated in the comment section:
IBV_EVENT_CLIENT_REREGISTER
The SM requests that the client reregister to all subscriptions
previously requested from this port, for example (but not limited to)
joining a multicast group. This event may be generated when the SM
suffers a failure which causes it to lose its records, or when
there is a new SM in the subnet.
This event will be generated by the device only if the bit that
indicates that client reregister is supported is set in
port_attr.port_cap_flags.
Source
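If you want to check whether your HCA port even advertises client-reregister support (the port_cap_flags bit the quote refers to), a minimal libibverbs sketch along these lines could do it. This is only an illustration: it looks at port 1 of the first device only and assumes a libibverbs new enough to define IBV_PORT_CLIENT_REG_SUP.

#include <stdio.h>
#include <infiniband/verbs.h>

/* Build with: gcc check_rereg.c -o check_rereg -libverbs */
int main(void)
{
    int num_devices = 0;
    struct ibv_device **devices = ibv_get_device_list(&num_devices);
    if (!devices || num_devices == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devices[0]);
    struct ibv_port_attr port_attr;

    /* Query port 1 of the first device only; adjust for multi-port HCAs. */
    if (ctx && ibv_query_port(ctx, 1, &port_attr) == 0) {
        if (port_attr.port_cap_flags & IBV_PORT_CLIENT_REG_SUP)
            printf("client reregister advertised: event 17 is expected after an SM restart\n");
        else
            printf("client reregister not advertised on this port\n");
    }

    if (ctx)
        ibv_close_device(ctx);
    ibv_free_device_list(devices);
    return 0;
}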
I wouldn't be happy with that event, so if I were you, I would update. If the issue persists, I would contact the MVAPICH2 people.
When I try to shut down my spring-integration process, the flow using an inbound Jms.messageDrivenChannelAdapter throws the following error message:
"org.springframework.jms.listener.DefaultMessageListenerContainer - Rejecting received message because of the listener container having been stopped in the meantime"
My inbound adapter is defined as follows:
Jms.messageDrivenChannelAdapter(
Jms.container(jmsConnectionFactory, destinationName)
.concurrency(highConcurrency)
.get()
)
I believe that my problem is that the default "receiveTimeout" on my JMS container is too small and that I need to increase that value to cater for my "high-concurrency" (right?), as "receiveTimeout" seems to be the only value the container's "doShutdown" method cares about.
Now, the source code for the receiveTimeout property says "this value needs to be smaller than the transaction timeout". Also, the Spring Integration documentation regarding inbound JMS adapters says "if you want the entire flow to be transactional [...] consider using a jms-message-driven-channel-adapter with acknowledge set to transacted (the default)", which seems to imply that the JMS adapter is transactional by default.
Hence, my main question is: even though I'm not using any explicit transaction manager, do I need to not only explicitly set "receiveTimeout" on my container but also "transactionTimeout", with transactionTimeout > receiveTimeout?
Thanks a lot in advance for your expertise and your time.
Best Regards
That is not "throws". That is just warn:
protected void doExecuteListener(Session session, Message message) throws JMSException {
    if (!isAcceptMessagesWhileStopping() && !isRunning()) {
        if (logger.isWarnEnabled()) {
            logger.warn("Rejecting received message because of the listener container " +
                    "having been stopped in the meantime: " + message);
        }
        rollbackIfNecessary(session);
        throw new MessageRejectedWhileStoppingException();
    }
And pay attention to that rollbackIfNecessary(session);. So, even if a received message somehow slips into this listener while the container is stopping, the framework makes sure that state is not broken and data is not lost: the session is rolled back.
The transactionTimeout does not make sense if you don't use a transactionManager. Spring Integration makes the container transacted by default exactly for the use case we see around that warn log.
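If you still want a longer receiveTimeout for shutdown purposes, you can set it directly on the container you already build via .get(). This is only a sketch, assuming Spring Integration 5.x package names; jmsConnectionFactory, destinationName and highConcurrency are the same references as in your snippet, and the 5-second value is arbitrary:

import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.jms.dsl.Jms;
import org.springframework.jms.listener.DefaultMessageListenerContainer;

public IntegrationFlow inboundFlow() {
    DefaultMessageListenerContainer container =
            Jms.container(jmsConnectionFactory, destinationName)
                    .concurrency(highConcurrency)
                    .get();

    // Plain Spring JMS setter; the framework default is 1000 ms.
    container.setReceiveTimeout(5000);

    return IntegrationFlows
            .from(Jms.messageDrivenChannelAdapter(container))
            .handle(message -> { /* your handler */ })
            .get();
}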
The documentation for the connect method says:
Connects to the socket at the given remote address and returns immediately. The connection will be made asynchronously in the background.
But await does not seem to be applicable, as shown in their subscriber example.
subscriber.js
const zmq = require("zeromq")
async function run() {
  const sock = new zmq.Subscriber

  sock.connect("tcp://127.0.0.1:3000") // Happens async; can we await this?
  sock.subscribe("kitty cats")
  console.log("Subscriber connected to port 3000")

  for await (const [topic, msg] of sock) {
    console.log("received a message related to:", topic, "containing message:", msg)
  }
}

run()
Also, what error(s) may be raised by the connect() method? I provided an "obscene" port number, such as 8124000, to connect. I was hoping for some error to be raised.
Q : "what error(s) may be raised by the connect() method?"
The Error(s) part
The ZeroMQ native API distinguishes (unchanged since v2.1) these errors for this call:
EINVAL
The endpoint supplied is invalid.
EPROTONOSUPPORT
The requested transport protocol is not supported.
ENOCOMPATPROTO
The requested transport protocol is not compatible with the socket type.
ETERM
The ØMQ context associated with the specified socket was terminated.
ENOTSOCK
The provided socket was invalid.
EMTHREAD
No I/O thread is available to accomplish the task.
Yet what you actually observe depends on how zeromq.js re-wraps these native states, so the best next step is to re-read the wrapper source code to see how these native API error states are actually handled inside the zeromq.js wrapper.
The remarks:
The following socket events can be generated. This list may be different depending on the ZeroMQ version that is used.
Note that the error event is avoided by design, since this has a special behaviour in Node.js causing an exception to be thrown if it is unhandled.
Other error names are adjusted to be as close as possible to other networking-related event names in Node.js and/or to the corresponding ZeroMQ.js method call. Events (including any errors) that correspond to a specific operation are namespaced with a colon :, e.g. bind:error or connect:retry.
are nevertheless quite a warning, aren't they?
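If you want to see those namespaced events for yourself, a sketch along these lines should work with the zeromq.js v6 beta, assuming sock.events can be consumed in the event-emitter style the quoted remarks describe (endpoint and topic reused from the question):

const zmq = require("zeromq")

async function run() {
  const sock = new zmq.Subscriber()

  // Low-level socket state transitions are reported on the events observer.
  sock.events.on("connect", details => {
    console.log("handshake completed with", details.address)
  })
  sock.events.on("connect:retry", details => {
    console.log("connect failed, retrying", details.address)
  })

  sock.connect("tcp://127.0.0.1:3000")
  sock.subscribe("kitty cats")

  for await (const [topic, msg] of sock) {
    console.log("received:", topic.toString(), msg.toString())
  }
}

run()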
The await part
The MCVE code (as-is) does not reproduce the live session, so the best next step is to adapt the MCVE code into something runnable, and then we can proceed further on this.
I'm using NodeJS to manage a Twilio Taskrouter workflow. My goal is to have a task assigned to an Idle worker in the main queue identified with queueSid, unless one of the following is true:
No workers in the queue are set to Idle
Reservations for the task have already been rejected by every worker in the queue
In these cases, the task should fall through to the next queue identified with automaticQueueSid. Here is how I construct the JSON for the workflow (it includes a filter such that an inbound call from an agent should not generate an outbound call to that same agent):
configurationJSON(){
    var config={
        "task_routing":{
            "filters":[
                {
                    "filter_friendly_name":"don't call self",
                    "expression":"1==1",
                    "targets":[
                        {
                            "queue":queueSid,
                            "expression":"(task.caller!=worker.contact_uri) and (worker.sid NOT IN task.rejectedWorkers)",
                            "skip_if": "workers.available == 0"
                        },
                        {
                            "queue":automaticQueueSid
                        }
                    ]
                }
            ],
            "default_filter":{
                "queue":queueSid
            }
        }
    }
    return config;
}
This results in no reservation being created after the task reaches the queue. My event logger shows that the following events have occurred:
workflow.target-matched
workflow.entered
task.created
That's as far as it gets and just hangs there. When I replace the line
"expression":"(task.caller!=worker.contact_uri) and (worker.sid NOT IN task.rejectedWorkers)"
with
"expression":"(task.caller!=worker.contact_uri)
Then the reservation is correctly created for the next available worker, or sent to automaticQueueSid if no workers are available when the call comes in, so I guess the skip_if is working correctly. So maybe there is something wrong with how I wrote the target expression?
I tried working around this by setting a worker to unavailable once they reject a reservation, as follows:
clientWorkspace
    .workers(parameters.workerSid)
    .reservations(parameters.reservationSid)
    .update({
        reservationStatus: 'rejected'
    })
    .then(reservation => {
        // this function sets the worker's Activity to Offline
        var updateResult = worker.updateWorkerFromSid(parameters.workerSid, process.env.TWILIO_OFFLINE_SID);
    })
    .catch(err => console.log("/agent_rejects: error rejecting reservation: " + err));
But what seems to be happening is that as soon as the reservation is rejected, before worker.updateWorkerFromSid() is called, Taskrouter has already generated a new reservation and assigned it to that same worker, and my Activity update fails with the following error:
Error: Worker [workerSid] cannot have its activity updated while it has 1 pending reservations.
Eventually, it seems that the worker is naturally set to Offline and the task does time out and gets moved into the next queue, as shown by the following events/descriptions:
worker.activity.update
Worker [friendly name] updated to Offline Activity
reservation.timeout
Reservation [sid] timed out
task-queue.moved
Task [sid] moved out of TaskQueue [friendly name]
task-queue.timeout
Task [sid] timed out of TaskQueue [friendly name]
After this point the task is moved into the next queue automaticQueueSid to be handled by available workers registered with that queue. I'm not sure why a timeout is being used, as I haven't included one in my workflow configuration.
I'm stumped--how can I get the task to successfully move to the next queue upon the last worker's reservation rejection?
UPDATE: although #philnash's answer helped me correctly handle the worker.sid NOT IN task.rejectedWorkers issue, I ultimately ended up implementing this feature using the RejectPendingReservations parameter when updating the worker's availability.
Twilio developer evangelist here.
rejectedWorkers is not an attribute that is automatically handled by TaskRouter. You reference this answer by my colleague Megan in which she says:
For example, you could update TaskAttributes to have a rejected worker SID list, and then in the workflow say that worker.sid NOT IN task.rejectedWorkerSids.
So, in order to filter by a rejectedWorkers attribute you need to maintain one yourself, by updating the task before you reject the reservation.
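A sketch of what "maintain one yourself" could look like with the Node helper library. This is only an illustration: it assumes clientWorkspace is the same workspace client as in the question, that taskSid and reservationSid are available from the reservation webhook, and that the attribute key matches the one used in the workflow expression (rejectedWorkers):

async function rejectAndRemember(workerSid, taskSid, reservationSid) {
  // Read the current task attributes and append the rejecting worker.
  const task = await clientWorkspace.tasks(taskSid).fetch();
  const attributes = JSON.parse(task.attributes);
  attributes.rejectedWorkers = attributes.rejectedWorkers || [];
  attributes.rejectedWorkers.push(workerSid);

  // Persist the updated attributes BEFORE rejecting the reservation,
  // so the workflow expression can see the new list.
  await clientWorkspace.tasks(taskSid).update({
    attributes: JSON.stringify(attributes)
  });

  // Now reject the reservation.
  await clientWorkspace
    .tasks(taskSid)
    .reservations(reservationSid)
    .update({ reservationStatus: 'rejected' });
}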
Let me know if that helps at all.
I am developing an OPOS interface for a CAT (Credit Authorization Terminal) using VC++ 2010. My question is about the ClearOutput method, if anyone here has tried coding OPOS controls. When I call that function it returns code 106, which means OPOS_E_ILLEGAL.
Here is the sequence of my code:
OPOSCAT.Open()
OPOSCAT.ClaimDevice()
OPOSCAT.DeviceEnabled = True
OPOSCAT.AsyncMode = True
Perform SALES (this fires events; wait until it finishes and OutputCompleteEvent is delivered)
OPOSCAT.ClearOutput()
OPOSCAT.DeviceEnabled = False
OPOSCAT.AsyncMode = False
OPOSCAT.ReleaseDevice()
OPOSCAT.Close()
Click here for more reference:
See Chapter 5
Thanks
First, that PDF is for 1.6, which is over 15 years out of date. See http://monroecs.com/unifiedpos.htm for the current version of OPOS, which is 1.14. If nothing else you should upgrade for PCI compliance reasons.
OPOS Common Controls are pretty generic, and wouldn't have a lot of reasons to return that error on the clearOutput() method. The only thing I can think of would be if it's not in a legal state (claimed) to call it. Perhaps the previous call created an error condition so bad the device changed the state from claimed to released?
If that's not it, it's probably a device specific error. Contact the device vendor who provided the service object. You could try downloading the debug version of OPOS and enabling logging, which would provide more evidence to support this claim.
For details, please check the specification of the CAT service object you are using.
It is probably because ClearOutput was called after OutputCompleteEvent, that is, after the asynchronous processing of SALES had already ended.
In general, the ClearOutput method is called to cancel processing that is still in progress during asynchronous execution.
In your code, that window is the "wait until it's finished" part.
If you call the ClearOutput method while waiting for this completion, the ClearOutput method will succeed and the SALES operation (such as AuthorizeSales) will be canceled.
However, depending on the specification of the CAT service object you are using, the service object may not support the ClearOutput method.
In that case, an error occurs regardless of the time of the call.
The Azure Service Bus supports a built-in retry mechanism which makes an abandoned message immediately visible for another read attempt. I'm trying to use this mechanism to handle some transient errors, but the message is made available immediately after being abandoned.
What I would like to do is make the message invisible for a period of time after it is abandoned, preferably based on an exponentially incrementing policy.
I've tried to set the ScheduledEnqueueTimeUtc property when abandoning the message, but it doesn't seem to have an effect:
var messagingFactory = MessagingFactory.CreateFromConnectionString(...);
var receiver = messagingFactory.CreateMessageReceiver("test-queue");

receiver.OnMessageAsync(async brokeredMessage =>
{
    await brokeredMessage.AbandonAsync(
        new Dictionary<string, object>
        {
            { "ScheduledEnqueueTimeUtc", DateTime.UtcNow.AddSeconds(30) }
        });
});
I've considered not abandoning the message at all and just letting the lock expire, but this would require having some way to influence how the MessageReceiver specifies the lock duration on a message, and I can't find anything in the API to let me change this value. In addition, it wouldn't be possible to read the delivery count of the message (and therefore decide how long to wait for the next retry) until after the lock has already been acquired.
Can the retry policy in the Service Bus be influenced in some way, or can a delay be artificially introduced in some other way?
Careful here, because I think you are confusing the retry feature with the automatic Complete/Abandon mechanism for the OnMessage event-driven message handling. The built-in retry mechanism comes into play when a call to the Service Bus fails. For example, if you call to set a message as complete and that fails, then the retry mechanism kicks in. If an exception occurs in your own code while you are processing a message, that will NOT trigger a retry through the retry feature. Your question isn't explicit about whether the error comes from your code or from attempting to contact the Service Bus.
If you are indeed after modifying the retry policy that applies when an error occurs while communicating with the Service Bus, you can modify the RetryPolicy that is set on the MessageReceiver itself. There is a RetryExponential which is used by default, as well as an abstract RetryPolicy you can derive your own from.
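For completeness, a sketch of adjusting that policy with the same Microsoft.ServiceBus.Messaging API used in the question; connectionString stands in for whatever you pass to CreateFromConnectionString, and the back-off values and retry count are arbitrary:

var messagingFactory = MessagingFactory.CreateFromConnectionString(connectionString);

// This governs retries of calls made *to* Service Bus (Send, Complete, Abandon, ...),
// not retries of your own message processing.
messagingFactory.RetryPolicy = new RetryExponential(
    TimeSpan.FromSeconds(1),    // minimum back-off
    TimeSpan.FromSeconds(30),   // maximum back-off
    5);                         // maximum retry count

var receiver = messagingFactory.CreateMessageReceiver("test-queue");
receiver.RetryPolicy = messagingFactory.RetryPolicy;  // can also be set per entity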
What I think you are after is more control over what happens when you get an exception doing your processing, and you want to push off working on that message. There are a few options:
When you create your message handler you can set up OnMessageOptions. One of the properties is "AutoComplete". By default this is set to true, which means that as soon as processing for the message is completed the Complete method is called automatically. If an exception occurs then Abandon is automatically called, which is what you are seeing. By setting AutoComplete to false you are required to call Complete on your own from within the message handler. Failing to do so will cause the message lock to eventually run out, which is one of the behaviors you are looking for.
So, you could write your handler so that if an exception occurs during your processing you simply do not call Complete. The message would then remain on the queue until its lock runs out and then would become available again. The standard dead-lettering mechanism applies, and after x number of tries it will be put into the dead-letter queue automatically.
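A sketch of that first option with the same API as in the question; ProcessAsync and TransientDependencyException are placeholders for your own processing code and for whatever exception you treat as transient:

var options = new OnMessageOptions
{
    AutoComplete = false,      // we decide when (or whether) to complete
    MaxConcurrentCalls = 1
};

receiver.OnMessageAsync(async brokeredMessage =>
{
    try
    {
        await ProcessAsync(brokeredMessage);     // placeholder for your own processing
        await brokeredMessage.CompleteAsync();   // only complete on success
    }
    catch (TransientDependencyException)         // placeholder exception type
    {
        // Do nothing: the lock eventually expires, DeliveryCount is incremented,
        // and the queue's MaxDeliveryCount / dead-letter rules still apply.
    }
}, options);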
A caution of handling this way is that any type of exception will be treated this way. You really need to think about what types of exceptions are doing this and if you really want to push off processing or not. For example, if you are calling a third party system during your processing and it gives you an exception you know is transient, great. If, however, it gives you an error that you know will be a big problem then you may decide to do something else in the system besides just bailing on the message.
You could also look at the "Defer" method. This method will not allow that message to be processed off the queue unless it is specifically pulled by its sequence number. Your code would have to remember the sequence number value and pull it. This isn't quite what you described though.
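A sketch of the Defer route with the question's API; where you persist the sequence number (a database, another queue, ...) is up to you:

// Inside the message handler: park the message and remember how to get it back.
long sequenceNumber = brokeredMessage.SequenceNumber;
await brokeredMessage.DeferAsync();

// Later, after whatever back-off you choose, pull it back by sequence number.
BrokeredMessage deferred = await receiver.ReceiveAsync(sequenceNumber);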
Another option is to move away from the OnMessage, event-driven style of processing messages. While it is very helpful, you don't get a lot of control over things. Instead, hook up your own processing loop and handle the abandon/complete on your own. You'll also need to deal with some of the threading/concurrent call management that the OnMessage pattern gives you. This can be more work, but you have the ultimate in flexibility.
Finally, I believe the reason the call you made to AbandonAsync with the properties you wanted to modify didn't work is that the dictionary you pass there only sets custom (user) properties on the message; it does not set system properties such as ScheduledEnqueueTimeUtc on the BrokeredMessage.
I actually asked this same question last year (implementation aside) with the three approaches I could think of from looking at the API. @ClemensVasters, who works on the SB team, responded that using Defer with some kind of re-receive is really the only way to control this precisely.
You can read my comment to his answer for a specific approach: use a secondary queue to store messages that indicate which primary messages have been deferred and need to be re-received from the main queue. You can then control exactly how long you wait before the retry by setting ScheduledEnqueueTimeUtc on those secondary messages.
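A sketch of that secondary-queue idea, again with the older BrokeredMessage API; retryQueueClient is assumed to be a QueueClient for the secondary queue, and the back-off formula is only an example:

// Park the real message, then schedule a "retry ticket" that becomes visible later.
long sequenceNumber = brokeredMessage.SequenceNumber;
await brokeredMessage.DeferAsync();

var ticket = new BrokeredMessage(sequenceNumber.ToString())
{
    // Exponential back-off based on how often the primary message has been delivered.
    ScheduledEnqueueTimeUtc = DateTime.UtcNow.AddSeconds(
        10 * Math.Pow(2, brokeredMessage.DeliveryCount))
};
await retryQueueClient.SendAsync(ticket);

// A listener on the secondary queue then re-receives the deferred message:
// var original = await receiver.ReceiveAsync(long.Parse(ticketBody));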
I ran into a similar issue where our order picking system is legacy and goes into maintenance mode each night.
Using the ideas in this article (https://markheath.net/post/defer-processing-azure-service-bus-message), I created a custom property to track how many times a message has been resubmitted, and I manually dead-letter the message after 10 tries. If the message is under 10 retries, the code clones the message, increments the custom property, and schedules the enqueue time of the new message.
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;
using Newtonsoft.Json;

public class PickQueue
{
    // QUEUE_CONN_STRING and QUEUE_NAME come from configuration.
    private readonly QueueClient queueClient;

    public PickQueue()
    {
        queueClient = new QueueClient(QUEUE_CONN_STRING, QUEUE_NAME);
    }

    public async Task QueueMessageAsync(int orderId)
    {
        string body = JsonConvert.SerializeObject(orderId);
        var message = new Message(Encoding.UTF8.GetBytes(body));
        await queueClient.SendAsync(message);
    }

    public async Task ReQueueMessageAsync(Message message, DateTime utcEnqueueTime)
    {
        int resubmitCount = message.UserProperties.ContainsKey("ResubmitCount")
            ? (int)message.UserProperties["ResubmitCount"] + 1
            : 1;

        if (resubmitCount > 10)
        {
            await queueClient.DeadLetterAsync(message.SystemProperties.LockToken);
        }
        else
        {
            // Schedule the clone, not the original (the original is settled by the handler).
            Message clone = message.Clone();
            clone.UserProperties["ResubmitCount"] = resubmitCount;
            await queueClient.ScheduleMessageAsync(clone, utcEnqueueTime);
        }
    }
}
This question asks how to implement exponential backoff in Azure Functions. If you do not want to use the built-in RetryPolicy (only available when autoComplete = false), here's the solution I've been using:
// IMessageSession comes from Microsoft.Azure.ServiceBus.
public static async Task ExceptionHandler(IMessageSession MessageSession, string LockToken, int DeliveryCount)
{
    // Globals.MaxDeliveryCount and Globals.ExponentialBackoff are our own settings.
    if (DeliveryCount < Globals.MaxDeliveryCount)
    {
        // Hold the lock while waiting, then abandon so the message is redelivered later.
        var DelaySeconds = Math.Pow(Globals.ExponentialBackoff, DeliveryCount);
        await Task.Delay(TimeSpan.FromSeconds(DelaySeconds));
        await MessageSession.AbandonAsync(LockToken);
    }
    else
    {
        await MessageSession.DeadLetterAsync(LockToken);
    }
}