Changing the size of status message in MS CRM Workflows - dynamics-crm-2011

We are having some workflows crash because the workflow status message is too large. The messages are set when we stop a workflow with status cancelled and then set the status message.
Is there a way to increase the size of this field?

I don't believe that can be customised. You are probably better off using a shorter message.

My solution was to create a plugin that takes the message and truncates it to 2000 characters, adding "..." on the end.
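A rough sketch of that idea is below. The "statusmessage" attribute name and the registration target are assumptions - adjust them to wherever you actually set the status message:

using System;
using Microsoft.Xrm.Sdk;

// Sketch only: truncates an over-long status message before it hits the platform limit.
public class TruncateStatusMessagePlugin : IPlugin
{
    private const int MaxLength = 2000;

    public void Execute(IServiceProvider serviceProvider)
    {
        var context = (IPluginExecutionContext)serviceProvider.GetService(typeof(IPluginExecutionContext));

        if (!context.InputParameters.Contains("Target") || !(context.InputParameters["Target"] is Entity))
            return;

        var target = (Entity)context.InputParameters["Target"];

        // "statusmessage" is a placeholder attribute name.
        if (target.Contains("statusmessage") && target["statusmessage"] is string)
        {
            var message = (string)target["statusmessage"];
            if (message.Length > MaxLength)
            {
                // Keep the first part of the message and append "..." so the limit is never exceeded.
                target["statusmessage"] = message.Substring(0, MaxLength - 3) + "...";
            }
        }
    }
}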

Related

DocuSign Bulk Send API Queue/Sent/Failed count is not accurate

Recently I have been working heavily with the DocuSign API, especially the Bulk Send REST API method, since we have a requirement to send 30K envelopes in 3-4 hours. Given the API rate limits, we decided to leverage the bulk send feature.
Since bulk send has some limitations, such as its own queue mechanism whose queue size cannot exceed 2000, I am implementing my solution to respect this limit.
To do that, I divided my bulk recipient file (30K recipients) into 30 CSV files.
Then I loop over the files, and inside the loop I check whether the queued item count for the batch has dropped to 0; if it did, I would send the next batch. But during my many tests, even though every email reached my inbox, I never saw the queued property become 0, so I could never do that.
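The control flow looks roughly like this (sketch only: SendBulkBatch, GetBatchStatus and BatchStatus are placeholders for the actual DocuSign REST calls and response, not real SDK names):

using System;
using System.IO;
using System.Threading;

// Sketch of the batching loop: submit one CSV at a time and wait for the
// bulk send queue to drain before submitting the next one.
class BatchStatus { public int Queued; public int Sent; public int Failed; }

void SendAllBatches(string batchFolder)
{
    foreach (var csvFile in Directory.GetFiles(batchFolder, "*.csv"))
    {
        string batchId = SendBulkBatch(csvFile); // placeholder: submits one recipient file

        BatchStatus status;
        do
        {
            Thread.Sleep(TimeSpan.FromSeconds(30)); // poll politely to respect the rate limits
            status = GetBatchStatus(batchId);       // placeholder: returns the queued/sent/failed counts
        } while (status.Queued > 0);                // this is the condition that never becomes true for me
    }
}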
Below is a screenshot I took from the API Explorer.
When I look deeper into Trial 3 to see what those 24 queued items are, as seen below,
I get the following response.
As you can see from the latest screenshot, even though the queued property indicates that there are pending items, the resultSetSize property shows 0, although I queried only the queued items.
For this reason I am not able to build my logic on the sent, queued and failed property values, even though I thought I could rely on them. If not, how can I overcome this problem? Any help would be appreciated.
Thank you in advance

Azure Application Insights Alerts work only once

I am testing Azure Application Insights alert functionality. It seems to be either buggy or I don't know how to use it.
If I create a new alert based on the metric 'Server Exceptions', it seems to work once and then never again. Once it fires, it seems to go into an 'Active' state, shown with an orange triangle and an exclamation mark. See the image below. I created a new one that I haven't triggered, and as can be seen in the image it has a green circle with a tick.
This sort of implies to me that an alert won't fire again until one 'acknowledges' the alert, which is not a bad idea, but I can't see how to do that.
Edit :
I have just tried to use the 'Exception Rate' as suggested, but I think the minimum threshold to fire the alert would be an average of 1 exception per second over a 5 minute period.
I must say it seems strange that my use case isn't handled. I have a lightweight Web API service that is so simple it should never fail, but it could, and if an exception does occur I want to receive an alert straight away.
The alert is supposed to resolve, and its state is supposed to go back to green, when the alert condition is no longer fulfilled.
This is exceptionally hard to achieve with "Count" metrics because they only ever go up and almost never come down. It means that, once fired, the alert won't resolve because the value of the metric stays over the threshold all the time.
You can try to set an alert on the "Rate" metric instead and you should see that the state is returning to green when the "Rate" is within the limits you set.
This is now fixed. Please let us know if you see any issues. Some things to keep in mind:
Alert rules are evaluated on a sliding window: an alert would trigger/resolve based on how the condition evaluates on a sliding window from the instant a sample arrives.
A caveat to the above for exception count based alert rules: we will resolve an alert if there are no exceptions reported for the time window configured in the rule.
Note: this is different from metrics based rules – lack of data does not result in the alert being resolved for those.
"Server exception" metric works as OP expects now in 2018. My use case below:
For the goal of getting an email whenever an Exception happened.
Use "Server exception" metric.
That metric is smart enough to auto-resolve after waiting the period's length of time after the initial alert, if the error has not occurred again.
So you'll have the initial "Alert", then 5 minutes later of no Exceptions, it returns a "Healthy" state.
And since it auto-resolved, if the error happens again tomorrow it will do the "Alert" again.
Note this was using App Insights with a Function App. The Function App Failure metric had problems and wasn't reliable for this (Azure kept logging 0.2 Exception/s and thinking that was over the 1 in 5 min threshold...)

ServiceStack Message queue .outq max size is 100?

I'm setting up a message queue using ServiceStack-v3 that looks like this
ClaimImport -> Validation -> Success
I've added hundreds of ClaimImports with no problem, and the .inq count is correct. The issue is that I want to see how many claims were imported by checking ClaimsImport.outq, but it never seems to go past 101. Is there some other way I could check this, or is this max limit intentional?
This is the default limit, set by RedisMessageQueueClient.MaxSuccessQueueSize. The purpose of the .outq is to be a rolling log of recently processed messages.
Clients can subscribe to the QueueNames.TopicOut to get notified when a message is published to the .outq.
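For example, something along these lines could keep an accurate processed count without depending on the rolling .outq (a rough sketch: the counter key name is made up, and the namespaces/constructors may differ slightly between ServiceStack versions):

using ServiceStack.Messaging;
using ServiceStack.Redis;
using ServiceStack.Redis.Messaging; // namespaces as of v3

var redisManager = new PooledRedisClientManager("localhost:6379");

// Option 1: keep your own counter by listening on the out-topic.
using (var redis = redisManager.GetClient())
using (var subscription = redis.CreateSubscription())
{
    subscription.OnMessage = (channel, msg) =>
    {
        // One notification per message published to an .outq
        using (var counter = redisManager.GetClient())
            counter.IncrementValue("counters:claimimport:processed"); // made-up counter key
    };
    subscription.SubscribeToChannels(QueueNames.TopicOut); // blocks while subscribed
}

// Option 2: raise the rolling-log cap on the MQ client itself.
var mqClient = new RedisMessageQueueClient(redisManager)
{
    MaxSuccessQueueSize = 1000 // default is 100
};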

Windows Azure staging <--> production causing conflicts & errors on table storage

We had a terrible problem/experience yesterday when trying to swap our staging <--> production roles.
Here is our setup:
We have a worker role picking up messages from the queue. These messages are processed on the role (table storage inserts, DB selects, etc.). This can take 1-3 seconds per queue message, depending on how many table storage inserts it needs to make. It deletes the message when everything is finished.
Problem when swapping:
When our staging project went online, our production worker role started erroring.
When the role tried to process a queue message it produced a constant stream of 'EntityAlreadyExists' errors. Because of these errors the queue messages weren't getting deleted, so they were put back in the queue, processed again, and so on....
When looking inside these queue messages and analysing what happened to them, we saw they had actually been processed but not deleted.
The problem wasn't over after deleting these faulty messages. New queue messages weren't processed either while the faulty ones were still pending, and no table storage records were added, which seems very strange.
After deleting both staging and production and publishing to production again, everything started to work just fine.
Possible problem(s)?
We honestly have little to no idea what actually happened.
Maybe both roles picked up the same messages, and one did the insert while the other errored?
...???
Possible solution(s)?
We have some ideas on how to solve this 'problem'.
Make a poison message failover system? When the dequeue count gets over X we should just delete that queue message or place it into a separate 'poison' queue.
Catch the EntityAlreadyExists error and just delete that queue message or put it in a separate queue.
...????
Multiple roles
I suppose we will have the same problem when putting up multiple roles?
Many thanks.
EDIT 24/02/2012 - Extra information
We actually use GetMessage().
Every item in the queue is unique and will generate unique messages in table storage. A little more information about the process: a user posts something, which has to be distributed to certain other users. The message generated by that user gets a unique id (GUID). This message is posted into the queue and picked up by the worker role. The message is distributed over several other tables (PartitionKey -> UserId, RowKey -> some timestamp in ticks plus the unique message id), so there is almost no chance the same messages will be posted in a normal situation.
The invisibility timeout COULD be a logical explanation, because some messages are distributed to 10-20 tables. This means 10-20 inserts without the batch option. Can you set or extend this invisibility timeout?
Not deleting the queue message because of an exception COULD be an explanation as well, because we haven't implemented any poison message failover YET ;).
Regardless of the Staging vs. Production issue, having a mechanism that handles poison messages is critical. We've implemented an abstraction layer over Azure queues that automatically moves messages over to a poison queue once processing has been attempted a configurable number of times.
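A rough sketch of that kind of wrapper, using the storage client's DequeueCount property (ProcessMessage, the queue names and the retry threshold are just placeholders):

using System;
using Microsoft.WindowsAzure.Storage.Queue;

void ProcessNextMessage(CloudQueue workQueue, CloudQueue poisonQueue)
{
    const int MaxAttempts = 5; // threshold is just an example

    CloudQueueMessage message = workQueue.GetMessage(); // hides the message for the visibility timeout
    if (message == null)
        return;

    if (message.DequeueCount > MaxAttempts)
    {
        // The message has failed too often: park it for inspection instead of
        // letting it crash the role over and over again.
        poisonQueue.AddMessage(new CloudQueueMessage(message.AsString));
        workQueue.DeleteMessage(message);
        return;
    }

    try
    {
        ProcessMessage(message);          // placeholder for the existing table storage work
        workQueue.DeleteMessage(message); // delete only after successful processing
    }
    catch (Exception)
    {
        // Do NOT delete: the message becomes visible again after the invisibility
        // timeout, and the DequeueCount check above eventually routes it to the
        // poison queue.
    }
}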
You clearly have a fault in handling duplicate messages. The fact that your id is unique doesn't mean that the message will not be processed twice on some occasions, such as:
The role dying with partially finished work, so the message reappears in the queue for processing
The role crashing unexpectedly, so the message ends up back in the queue
The fabric controller migrating or moving your role when you don't have code to handle this situation, so the message ends up back in the queue
In all cases, you need code that handles the fact that the message will reappear. One way is to use the DequeueCount property and check how many times the message has been removed from the queue and received for processing. Make sure you have code that handles partial processing of a message.
Now, what probably happened during the swap was that when the production environment became staging and staging became production, both of them were trying to receive the same messages, so they were basically competing for those messages. That in itself is not bad, because this is a known pattern that works, but when you killed your old production (now staging), every message that had been received for processing but wasn't finished ended up back in the queue, and your new production environment picked it up for processing again. With no code to handle this scenario, and with messages that were partially processed (some records already existed in the tables), you got the behavior you noticed.
There are a few possible causes:
How are you reading the queue messages? If you are doing a Peek Message then the message will still be visible to be picked up by another role instance (or your staging environment) before the message is deleted. You want to make sure you are using Get Message so the message is invisible until it can be deleted.
Is it possible that your first role crashed after doing the work for the message but prior to deleting the message? This would cause the message to become visible again and get picked up by another role instance. At that point the message will be a poison message which will cause your instances to constantly crash.
This problem almost certainly has nothing to do with Staging vs Production, but is most likely caused by having multiple instances reading from the same queue. You can probably reproduce the same problem by specifying 2 instances, or by deploying the same code to 2 different production services, or by running the code locally on your dev machine (still pointing to Azure storage) using 2 instances.
In general you do need to handle poison messages so you need to implement that logic anyways, but I would suggest getting to the root cause of this problem first, otherwise you are just going to run into a lot more problems later on.
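To illustrate the Peek vs. Get distinction from the first point above (sketch, using the storage client's queue API; the table storage work is elided):

// PeekMessage does not hide the message, so another instance (or the other
// deployment) can pick it up immediately.
CloudQueueMessage peeked = queue.PeekMessage();

// GetMessage hides the message for the visibility timeout, so nobody else can
// process it while this instance is working on it.
CloudQueueMessage received = queue.GetMessage(TimeSpan.FromMinutes(1));

// ... do the table storage work ...

queue.DeleteMessage(received); // must happen before the visibility timeout expires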
With queues you need to code with idempotency in mind and expect and handle 'EntityAlreadyExists' as a viable response.
As others have suggested, possible causes are:
Multiple messages in the queue with the same identifier.
Peeking at the message instead of reading it from the queue, so it is never made invisible.
Not deleting the message because an exception was thrown before you could delete it.
Taking too long to process the message, so it cannot be deleted (because the invisibility timeout expired) and it appears again.
Without looking at the code, I am guessing that it is either option 3 or 4 that is occurring.
If you cannot detect the issue with a code review, consider adding time-based logging and try/catch wrappers to get a better understanding.
Using queues effectively, in a multi-role environment, requires a slightly different mindset and running into such issues early is actually a blessing in disguise.
Appended 2/24
Just to clarify, modifying the invisibility timeout is not a generic solution to this type of problem. Also note that this feature, although available in the REST API, may not be available in the queue client.
Other options involve writing to table storage asynchronously to speed up your processing time, but again this is a stop-gap measure which does not really address the underlying paradigm of working with queues.
So, the bottom line is to be idempotent. You can try using the table storage upsert (update or insert) feature to avoid getting the 'EntityAlreadyExists' error, if that works for your code. If all you are doing is inserting new entities into Azure table storage, then the upsert should solve your problem with minimal code change.
If you are doing updates, then it is a different ball game altogether. One pattern is to pair updates with dummy inserts in the same table with the same partition key, so that the dummy insert errors out if the update occurred previously and you can skip the update. Later, after the message is deleted, you can delete the dummy inserts. However, all this adds complexity, so it is much better to revisit the architecture of the product; for example, do you really need to insert/update into so many tables?
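For the insert-only case, the upsert idea above looks roughly like this with the table storage client (sketch; the entity type and property names are made up to match the scenario described in the question):

using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical entity matching the partition/row key scheme described above.
class DistributedMessageEntity : TableEntity
{
    public string Body { get; set; }
}

void StoreMessage(CloudTable table, string userId, string rowKey, string body)
{
    var entity = new DistributedMessageEntity
    {
        PartitionKey = userId,
        RowKey = rowKey, // timestamp in ticks + unique message id
        Body = body
    };

    // InsertOrReplace never throws EntityAlreadyExists: replaying the same queue
    // message simply overwrites the same row, which makes the handler idempotent.
    table.Execute(TableOperation.InsertOrReplace(entity));
}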
Without knowing what your worker role is actually doing I'm taking a guess here, but it sounds like when you have two instances of your worker role running you are getting conflicts while trying to write to an Azure table. It is likely to be because you have code that looks something like this:
var queueMessage = GetNextMessageFromQueue();

// Look the entity up first, and only insert it if it doesn't exist yet.
Foo myFoo = GetFooFromTableStorage(queueMessage.FooId);
if (myFoo == null)
{
    myFoo = new Foo
    {
        PartitionKey = queueMessage.FooId
    };
    AddFooToTableStorage(myFoo); // throws "Entity already exists" if another instance got there first
}

DeleteMessageFromQueue(queueMessage); // never reached when the insert above throws
If you have two adjacent messages in the queue with the same FooId, it is quite likely that both instances will check whether the Foo exists, not find it, and then try to create it. Whichever instance is the last to try to save the item will get the "Entity already exists" error. Because it errored, it never reaches the delete-message part of the code, so the message becomes visible on the queue again after a period of time.
As others have said, dealing with poison messages is a really good idea.
Update 27/02
If it's not subsequent messages (which, based on your partition/row key scheme, I would say is unlikely), then my next bet would be that it's the same message appearing back in the queue after the visibility timeout. By default, if you're using .GetMessage() the timeout is 30 seconds. It has an overload which allows you to specify how long that time frame is. There is also the .UpdateMessage() function that allows you to extend that timeout while you're processing the message. For example, you could set the initial visibility to 1 minute, then, if you're still processing the message 50 seconds later, extend it for another minute.
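In code that looks roughly like this (sketch; MessageUpdateFields.Visibility tells the service to bump only the timeout, not the message content):

// Take the message with a 1 minute visibility window instead of the 30 second default.
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(1));

// ... start distributing the message across the tables ...

// Still busy after ~50 seconds? Extend the visibility timeout by another minute
// so no other instance picks the message up mid-processing.
queue.UpdateMessage(msg, TimeSpan.FromMinutes(1), MessageUpdateFields.Visibility);

// ... finish the remaining inserts ...

queue.DeleteMessage(msg); // finally remove it for good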

Active vs. Running Workflow

At SharePoint Saturday in Lisle, IL this weekend, Robert Bogue said there's a difference between active and running workflows. I've looked on the web, but can someone clarify?
If I can have up to millions of active workflows on the server, why can I only have 15 or so running workflows per server?
Yes, there is a difference:
"Running" Workflows are all which currently are doing something (i.e. executing an activity).
"Active" Workflows are simply all which are "running" but currently are not doing anything - e.g. waiting for OnItemChanged or DelayActivity.
The key to understand this is WorkflowEventDeliveryThrottle (here for SP2007, because the documentation for 2010 doesn't exist). The standard value for this is property is 15. That means that there are only 15 concurrent workflow which can run at the same time. After this limit is reached the workflows get queued to the OWSTimer which executes the workflows after some arbitrary time (I think the workflow timer job is set to every 5 minutes).
This Throttle can be changed by using stsadm (AFAIK Powershell doesn't work - you can change the property via code of course setting SPWebService.WorkflowEventDeliveryThrottle):
stsadm -o setproperty -pn workflow-eventdelivery-throttle -pv "20"
Now the maximum number of "running" workflows (more precisely, the "maximum number of workflow events that can be processed simultaneously") would be 20. See this other SO post where someone experiments with the parameter.
There is a nice technical blog post to understand Workflow Event Processing: About the “workflow-eventdelivery-throttle” parameter.
Similar to the throttle is the WorkflowEventDeliveryBatchSize which denotes the maximum number of workflow events that are processed in a batch.
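If you prefer code over stsadm, both properties can be set on the farm-wide content service (sketch; run it with sufficient farm permissions, and keep in mind that raising these values increases load on the web front ends):

using Microsoft.SharePoint.Administration;

// Both settings live on the farm-wide content service.
SPWebService contentService = SPWebService.ContentService;

contentService.WorkflowEventDeliveryThrottle = 20;   // default is 15
contentService.WorkflowEventDeliveryBatchSize = 125; // raise the batch size alongside the throttle if needed

contentService.Update(); // persist the change to the configuration database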
Summary:
You can have thousands of active workflows, e.g. all waiting for the workflow item to be changed. They are not running, not finished - simply active.
There is a limited number of workflow events that can be processed at the same time (you called it "running" workflows)
You could also have thousands of "running" workflows, e.g. all of them triggered by a delay activity set to 5 minutes, but only a limited number of them runs simultaneously; the rest are queued for later execution.
