I'd like to know about specific problems you - the SO reader - have solved using Workflow Engines and what libraries/frameworks you used if you didn't roll your own. I'd also like to know when a Workflow Engine wasn't the best choice and if/how you chose something simpler, like a TaskList/WorkList/Task-Management type application using state machines.
Questions:
What problems have you used workflow engines to solve?
What libraries/frameworks did you use?
When did a simpler State Machine/Task Management like system suffice?
Bonus: How did/do you make the distinction between Task Management and Workflow Engine?
I'm looking for first-hand experiences.
Some of the resources I've checked out:
Ruote and State Machine != Workflow Engine
StonePath and Docs
Creating and Managing Worklist Task Plans with Oracle
Design and Implementation of a Workflow Engine - Thesis
What to use Windows Workflow Foundation For
JBoss jBPM Docs
I'm biased as well, as I am the main author of StonePath.
I have developed workflow applications for the U.S. State Department, the Geneva Centre for Humanitarian Demining, several Fortune 500 clients, and most recently the Washington DC Public School System. Every time I have seen a 'workflow engine' that tried to be the one master reference for business processes, I have seen an organization fighting itself to work around the tool. This may be because these solutions have always been vendor/product driven, and then end up with a tactical team of 'consultants' constantly feeding the app... but because of this, I tend to react negatively when I hear the benefits of process-based tools that promise to 'centralize the workflow definitions in one place and make them repeatable'.
I very much like Ruote - I have been following that project for some time, and should I need that kind of solution, it will be the next tool I try. StonePath has a very different purpose than Ruote:
Ruote is useful to Ruby in general,
StonePath is aimed at Rails, the web framework written in Ruby.
Ruote is about long-lived business processes and their associated definitions (Note - active development on ruote ceased).
StonePath is about managing State-based workflow and tasking.
Frankly, I think the distinction from the outside looking in might be subtle - many times the same kinds of business processes can be represented either way - the state-and-task-based model tends to map to my mental model though.
Let me describe the highlights of a state-based workflow.
States
Imagine a workflow revolving around the processing of something like a mortgage loan or a passport renewal. As the document moves 'around the office', it travels from state to state.
If you are responsible for the document, and your boss asked you for a status update you'd say things like
"It is in data entry"...
"We are checking the applicant's credentials now"...
"we are awaiting quality review"...
"We are done"... and so on.
These are the states in a state-based workflow. We move from state to state via transitions - like "approve", "apply", "kickback", "deny", and so on. These tend to be action verbs. Things like this are modeled all the time in software as a state machine.
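To make the idea concrete, here is a minimal sketch in Python. It is not StonePath code; the state names and the transition table are invented for illustration, loosely following the loan example above.

```python
# Hypothetical transition table: (current state, transition verb) -> next state.
TRANSITIONS = {
    ("data_entry", "apply"): "credential_check",
    ("credential_check", "approve"): "quality_review",
    ("credential_check", "kickback"): "data_entry",
    ("credential_check", "deny"): "done",
    ("quality_review", "approve"): "done",
    ("quality_review", "kickback"): "credential_check",
}


class WorkItem:
    """A document that moves from state to state via named transitions."""

    def __init__(self, state="data_entry"):
        self.state = state

    def fire(self, transition):
        key = (self.state, transition)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {transition!r} while in {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state


loan = WorkItem()
loan.fire("apply")    # data_entry -> credential_check
loan.fire("approve")  # credential_check -> quality_review
loan.fire("approve")  # quality_review -> done
```

Asking "where is it?" is then just reading `loan.state`, which is the status update you would give your boss.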
Tasks
The next part of a state/task-based workflow is the creation of tasks.
A Task is a unit of work, typically with a due date and handling instructions, that connects a work item (the loan application or passport renewal, for instance) to a user's "in box".
Tasks can happen in parallel with each other or sequentially.
Tasks can be created automatically when we enter states.
Tasks can be created manually as people realize work needs to get done.
Tasks can be required to be completed before we can move on to a new state.
This kind of behavior is optional, and part of the workflow definition.
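A rough Python sketch of the task side, again with invented names rather than StonePath's API. It assumes one particular (optional) rule: every open task blocks the move to the next state.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Task:
    """A unit of work that links a work item to a user's in-box."""
    assignee: str
    instructions: str
    due: date
    done: bool = False


@dataclass
class WorkItem:
    state: str = "data_entry"
    tasks: list = field(default_factory=list)

    def add_task(self, assignee, instructions, due):
        task = Task(assignee, instructions, due)
        self.tasks.append(task)
        return task

    def move_to(self, new_state):
        # Assumed rule for this sketch: all open tasks block the transition.
        open_tasks = [t for t in self.tasks if not t.done]
        if open_tasks:
            raise RuntimeError(f"{len(open_tasks)} task(s) still open")
        self.state = new_state
        self.tasks = []


def inbox(items, user):
    """All open tasks for one user across every work item."""
    return [t for item in items for t in item.tasks
            if t.assignee == user and not t.done]
```

The in-box is a query over tasks, not a separate store, which is why tasks and work items sit comfortably side by side.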
The rabbit hole can go a lot deeper than this, and I wrote an article about it for Issue #4 of PragPub, the Pragmatic Programmer's Magazine. Check out the repo link above for an updated PDF of that article.
In working with StonePath the last few months, I have found that the state-based model maps really well to restful web architectures - in particular, the tasks and state transitions map nicely as nested resources. Expect to see future writing from me on this subject.
I'm biased, I'm one of the authors of ruote.
variant 1) state machine attached to a resource (document, order, invoice, book, piece of furniture).
variant 2) state machine attached to a virtual resource named a task
variant 3) workflow engine interpreting workflow definitions
Your question is tagged "BPM", which can be expanded into "Business Process Management". How does that kind of management occur in each of the variants?
In variant 1, the business process (or workflow) is scattered in the application. The state machine attached to the resource enforces some of the aspects of the workflow, but only those related to the resource. There may be other resources with their own state machine following the same business process.
In variant 2, the workflow can be concentrated around the task resource and represented by the state machine around that resource.
In variant 3, the workflow is enacted by interpreting a resource called a workflow definition (or business process definition).
What happens when the business process changes ? Is it worth having a workflow engine where business processes are manageable resources ?
Most state machine libraries have one set of states + transitions. Most workflow engines are workflow definition interpreters, and they allow multiple different workflows to run together.
What will be the cost of changing the workflow ?
The variants are not mutually exclusive. I have seen many examples where a workflow engine changes the state of multiple resources, some of them guarded by state machines.
I also use variant 3 + 2 a lot, for human tasks : the workflow engine, at some points when running a process instance, hands a task (workitem) to a human participant (resource task is created and placed in state 'ready').
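Here is a toy Python sketch of variant 3 + 2. It is not ruote and borrows nothing from its API; the definitions, participants and method names are all made up. The point is only that the definitions are data, several of them run side by side, and each step hands a task in state 'ready' to a participant.

```python
# Hypothetical workflow definitions: an ordered list of (participant, task title).
DEFINITIONS = {
    "loan": [("clerk", "enter data"), ("analyst", "check credentials"),
             ("reviewer", "quality review")],
    "passport": [("clerk", "enter data"), ("officer", "approve renewal")],
}


class Engine:
    """Interprets definitions; one engine runs instances of many workflows."""

    def __init__(self, definitions):
        self.definitions = definitions
        self.instances = {}  # instance id -> (definition name, current step index)
        self.tasks = []      # variant 2: task resources with a tiny state machine

    def launch(self, instance_id, name):
        self.instances[instance_id] = (name, 0)
        self._hand_out(instance_id)

    def _hand_out(self, instance_id):
        name, step = self.instances[instance_id]
        participant, title = self.definitions[name][step]
        self.tasks.append({"instance": instance_id, "participant": participant,
                           "title": title, "state": "ready"})

    def complete(self, instance_id):
        """Finish the instance's ready task and advance to the next step, if any."""
        task = next(t for t in self.tasks
                    if t["instance"] == instance_id and t["state"] == "ready")
        task["state"] = "done"
        name, step = self.instances[instance_id]
        if step + 1 < len(self.definitions[name]):
            self.instances[instance_id] = (name, step + 1)
            self._hand_out(instance_id)
        else:
            del self.instances[instance_id]  # process instance is over
```

Changing a business process here means editing an entry in `DEFINITIONS`, not the engine, which is the trade-off the questions above are about.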
You can go a long way with variant 2 alone (the task manager variant).
We could also mention variant 0), where there is no state machine, no workflow engine, and the business process(es) are scattered and/or hardcoded in the application.
You can ask many questions, but if you don't take the time to read the answers and don't take the time to try out and experiment, you won't go very far, and will never acquire any flair for when to use this or that tool.
On a previous project I was working on, I added some workflow-type rules to a set of government forms in the healthcare industry.
Forms needed to be filled out by the end user , and depending on some answers other Forms were scheduled to be filled out at a later date. There were also external events that would cancel scheduled Forms or schedule new ones.
Sample Flow :
Patient Admitted -> Schedule Initial Assessment Form -> Schedule Quarterly Review Form -> Patient Died -> Cancel Review -> Schedule Discharge Assessment Form
Many other rules were based on things such as Patient age, where they were being admitted etc.
This was an ASP.NET app, the rules were basically a table in the database. I added scripting, so a script would run on Form completion to determine what to do next. This was a horrid design, and would have been perfect for a proper Workflow engine.
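For readers who have not seen this kind of design, a simplified Python sketch of a rule table follows. The event and form names are guesses based on the sample flow, and a dict stands in for the database table; the original scripts are not reproduced here.

```python
# Hypothetical rule table: event -> list of (action, form).
RULES = {
    "patient_admitted": [("schedule", "initial_assessment"),
                         ("schedule", "quarterly_review")],
    "patient_died": [("cancel", "quarterly_review"),
                     ("schedule", "discharge_assessment")],
}


def apply_event(scheduled, event):
    """Return the set of scheduled forms after applying the rules for one event."""
    result = set(scheduled)
    for action, form in RULES.get(event, []):
        if action == "schedule":
            result.add(form)
        elif action == "cancel":
            result.discard(form)
    return result


forms = apply_event(set(), "patient_admitted")
forms = apply_event(forms, "patient_died")
print(sorted(forms))  # ['discharge_assessment', 'initial_assessment']
```

It looks harmless at this size. Once rules depend on patient age, admission location and earlier answers, the table plus ad hoc scripts turn into a home-grown workflow engine without the tooling.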
I'm one of the authors of the open source Temporal Workflow Engine, which we initially developed at Uber as Cadence. The difference between Temporal and the majority of the existing workflow engines is that it is developer focused and is extremely flexible and scalable (to tens of thousands of updates per second and up to billions of open workflows). The workflows are written as object oriented programs and the engine ensures that the state of the workflow objects, including thread stacks and local variables, is fully preserved in case of host failures.
What problems have you used workflow engines to solve?
Temporal is used for practically any backend application that lives beyond a single request reply. Examples of usage are:
Distributed CRON jobs
Managing ML/Data pipelines
Reacting to business events. For example trip events at Uber. The workflow can accumulate state based on events received and execute activities when necessary.
Services Deployment to Mesos/ Kubernetes
CI Pipeline implementation
Ensuring that multiple service calls complete when a request is received. Including SAGA pattern implementation
Managing human worker tasks (similar to Amazon MTurk)
Media processing
Customer Support Ticket Routing
Order processing
Testing service similar to ChaosMonkey
and many others
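As an illustration of the SAGA item in the list above, here is a plain-Python sketch. It deliberately uses no Temporal SDK calls; `run_saga` and the step names are hypothetical. Each step is paired with a compensation, and a failure undoes the completed steps in reverse order.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; on failure, undo in reverse."""
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()
        return False
    return True


log = []


def fail_shipping():
    raise RuntimeError("shipping service unavailable")


ok = run_saga([
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"), lambda: log.append("refund card")),
    (fail_shipping, lambda: log.append("cancel shipment")),
])
```

What this in-memory version lacks is durability: if the process dies between "charge card" and the refund, nothing resumes the compensation. Preserving that state across host failures is the part a workflow engine takes on.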
The other set of use cases is based on porting existing workflow engines to run on Temporal. Practically any existing engine workflow specification language can be ported to run on Temporal. This way a single backend service can power multiple domain specific workflow systems.
What libraries/frameworks did you use?
Temporal is a self-contained service written in Go with Go, Java, PHP, and TypeScript client-side SDKs (.NET and Python are coming in 2022). The only external dependency is storage. Cassandra, MySQL, and PostgreSQL are supported. Elasticsearch can be used for advanced indexing.
Temporal also supports asynchronous cross-region (using AWS terminology) replication.
When did a simpler State Machine/Task Management like system suffice?
The open source Temporal service can be self-hosted, or the temporal.io cloud offering can be used. So the overhead of building any custom state machine/task management is always higher than using Temporal. Outside the company the service and storage for it need to be set up. If you already have an SQL database, the service deployment is trivial through a Docker image. The Docker image is also used to run a local Temporal service for development on a personal computer or laptop.
I am one of the authors of Imixs-Workflow. Imixs-Workflow is an open source workflow engine based on BPMN 2.0 and fully integrated into the Java EE technology stack.
I have been developing workflow engines for more than 10 years. I will try to answer your questions briefly:
> What problems have you used workflow engines to solve?
My personal goal when I started to think about workflow engines was to avoid hard-coding the business logic within my application. Many things in a business application can be reused, so it makes sense to keep them configurable. For example:
sending out a notification
viewing open tasks
assigning a task to a person
describing the current task
From this function list you can see I am talking about human-centric workflows. In short: A human-centric workflow engine answers the questions: Who is responsible for a task and who needs to be informed next? And these are the typical questions in business requirements.
>What libraries/frameworks did you use?
5 years ago we started reimplementing Imixs-Workflow engine focusing on BPMN 2.0. BPMN is the common standard for process modeling. And the surprising thing for me was that we were suddenly able to describe even highly complex business processes that could be visualized and executed. I recommend everyone to use BPMN for modeling business processes.
> When did a simpler State Machine/Task Management like system suffice?
A simple state machine is sufficient if you just want to track the status of a business object. This is the case when you begin to introduce the 'status' attribute into your object model. But in case you need business processes with responsibilities, logging and flow control, then a state machine is no longer sufficient.
> Bonus: How did/do you make the distinction between Task Management and Workflow Engine?
This is exactly the point where many workflow engines mentioned here differ. For a human-centric workflow you typically need a task management to distribute tasks between human actors. For a process automation, this point is not so relevant. It is sufficient if the engine performs certain tasks. Task management and workflow engines can not be compared because task management is always a function of a workflow engine.
Check the rails_workflow gem - I think it is close to what you are searching for.
I have experience using the Activiti BPMN 2.0 engine for handling high-performance and high-throughput data transfer processes in an infrastructure of network nodes. The basic task was to allow configuration and monitoring of such transfer processes and to control each network node (i.e. request node1 to send a data file to node2 via a specific transport layer).
There could be thousands of processes running at a time and overall tens or low hundreds of thousands processes per day.
There were a bunch of different process definitions, but it was not necessarily required that an operator of the system could create custom workflows. So the primary requirement for the BPM engine itself was to be robust and scalable and to allow monitoring of each process flow.
In the end it basically worked but what we learned from that project was that a BPMN platform, or rather the Activiti engine specifically, was not the best bet for such a high-throughput system.
The main challenges were task execution prioritization, DB locking, and execution retries, to name a few concerning the BPM itself. So we had to develop custom handling of these, for example:
Handling of retries in the BPM for cases when a node had no free worker for given task, or when the node was not running at all.
Execution of parallel transfer tasks in a single process and synchronization of the results (success/failure).
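The two items above can be sketched in a few lines of Python. This is not Activiti code and not what we shipped; the function names, the backoff and the thread pool are assumptions chosen to keep the example short.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(task, attempts=3, delay=0.01):
    """Call task() until it succeeds or attempts run out; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff


def run_parallel(tasks, attempts=3):
    """Run named tasks concurrently and collect a success/failure flag per name."""
    def safe(task):
        try:
            with_retries(task, attempts)
            return True
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {name: pool.submit(safe, task) for name, task in tasks.items()}
        # Synchronization point: wait for every transfer before reporting.
        return {name: future.result() for name, future in futures.items()}
```

A caller passes something like `{"node1": send_to_node1, "node2": send_to_node2}` (hypothetical callables) and decides from the returned flags whether the overall process succeeded.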
I don't know if other BPMN engines would be more suitable for such a scenario, since BPMN is mostly intended for long-running business tasks involving user interaction, where performance is probably not as much of an issue as it was in our case.
I rolled my own workflow engine to support phased processing of documents - cataloging, sending for image processing (we work with redaction software), if needed sending to validation, then release and finally shipping back to the client. In our case we have a truckload of documents to process, so sometimes we need to run each service separately to control delivery and resource usage. Simple in concept, but high performance and distributed processing were needed, and we couldn't find any off-the-shelf product that fit the bill for us.
Related
Currently, our product is a web application with SQL Server as DBMS, ASP.NET backend, and classic HTML/JavaScript/CSS frontend. The product is actively developed and each month we have to deploy a new version of it to production.
During this deployment, we update all the components listed above (apply some SQL scripts, update binaries, and client files) but we deploy only the delta (set of files which were changed since the last release). It has some benefits like we do not reset custom data/configs/client adjustments.
Now we are going to move to a cloud such as Azure or AWS, adjust the product architecture to be compatible with Docker/Kubernetes, and provide the product as SaaS.
And now the question itself: "Which approach to deployment is recommended in the cloud?" Can we keep applying only the delta? Or do we have to reorganize the process to always deploy from scratch?
If there are some Internet resources I have missed, please share.
This question is extremely broad but maybe some clarification could steer you in the right direction anyway:
Source code deployments (like applying deltas) and container deployments are two very different directions in the sense that the tooling you invest in during the entire SDLC CAN differ substantially. Some testing pipelines/products focus heavily (or exclusively) on working with one or the other. There will be tools that can handle both, of course.
They also differ in the problems they're attempting to solve and come with some pros and cons:
Source Code Deployments/Apply Diffs:
Good for small teams and quick deployments as they're simple to understand and setup.
Starts to introduce risk when you need to upgrade the Host OS or application dependencies
Starts to introduce risk when the Hosts in production begin to drift (have more differing files than expected) more dramatically over time
Slack has a good write up of their experience here.
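To make "delta" and "drift" concrete, here is a small Python sketch. It is only an illustration with made-up function names, not a deployment tool: it hashes file contents into a manifest, diffs two manifests, and checks a host against the manifest it is supposed to match.

```python
import hashlib


def manifest(files):
    """Map each path to a SHA-256 digest of its content (files: path -> bytes)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def delta(old, new):
    """Compare two manifests and list what a delta deployment would touch."""
    return {
        "changed": sorted(p for p in new if p in old and new[p] != old[p]),
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
    }


def drift(expected, actual_on_host):
    """Paths on a host that no longer match the release manifest."""
    return sorted(p for p, digest in expected.items()
                  if actual_on_host.get(p) != digest)
```

With delta deployments you need something like `drift` to know what is really running on each host; with an immutable image that question mostly goes away.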
Container deployments
Provides isolation from the application (developer space) and the Host OS (sysadmin/ops space). This usually means they can work with each other independently.
Gives an "artifact" that won't change between deployments, ie the container tagged v1 will always be the same unless you do something really funky. You can't really guarantee this
The practice of isolating stateless components makes autoscaling those components very easy, and you can eventually spend more time on the harder ones (usually stateful).
Introduces a new abstraction with new concerns that your team will have to mature into. Testing pipelines, dev tooling, monitoring/logging architectures might all need to be adjusted over time and that comes with cost and risk.
Stateful containers are hardly a solved problem (i.e. shoving an existing database in a container can be a surprising challenge).
In order to work with Kubernetes, you need to have a containerized application. That doesn't mean you need to containerize your entire product over night. Splitting out the front end to deploy with cloudfront/s3, and containerizing a stateless app will get your feet wet.
Some books that talk about devops philosophies (in which this transition plays a part)
The Devops Handbook
Accelerate
Effective Devops
SRE book
I am working on a monolith system. All of its code is in one repository (Web API and background workers). The system is written in Node.js, and MongoDB (Mongoose) is used as a data store. My goal is to set a new path for how the project should evolve. At first I was wondering if I could move towards a microservices-based architecture.
Monolith architecture creates some problems:
If my background workers need to scale, I have to deploy the whole project to the server despite only using a small fraction of it.
All system must be redeployed when code changes. What if payment processor calls webhook while system is being redeployed?
The advantages of using microservices are quite obvious:
Smaller code base for individual microservice. Easier to reason about it.
Ability to select programming tools best for particular use case.
Easier to scale.
Looking at the current code I noticed that Mongoose ODM (Object Document Mapper) models are used across the whole project to create, query and update models in the database. As a principle of good programming, all such interactions with the database should be abstracted. Business logic should not leak into other system layers. I could do that by introducing the REPOSITORY pattern (Domain Driven Design). While code is still being shared between the web API and its background workers, it is not a hard task to do.
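A minimal sketch of the repository idea, written in Python rather than Node.js to keep it short. The names (`UserRepository`, `deactivate_user`) are hypothetical; in the real project a Mongoose-backed class would implement the same two methods.

```python
from typing import Optional, Protocol


class UserRepository(Protocol):
    """The only interface business logic is allowed to see."""

    def get(self, user_id: str) -> Optional[dict]: ...

    def save(self, user: dict) -> None: ...


class InMemoryUserRepository:
    """Test double; a database-backed class would expose the same methods."""

    def __init__(self):
        self._rows = {}

    def get(self, user_id):
        row = self._rows.get(user_id)
        return dict(row) if row is not None else None

    def save(self, user):
        self._rows[user["id"]] = dict(user)


def deactivate_user(repo: UserRepository, user_id: str) -> bool:
    """Business rule that knows nothing about the underlying data store."""
    user = repo.get(user_id)
    if user is None:
        return False
    user["active"] = False
    repo.save(user)
    return True
```

Both the web API and the background workers can import the same repository module, so the abstraction pays off even before any service is split out.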
If I decide to extract repositories into standalone microservices, then a whole bunch of problems arises:
Some sort of query language must be introduced to accommodate complex search queries.
Interface must provide a way to iterate over search results (cursor based navigation) without returning all database documents over network.
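The second problem, cursor-based navigation, could look roughly like this. It is a Python sketch over an in-memory list with a hypothetical `search` function; a real service would push the filter, the ordering and the limit down to the database instead of filtering in memory.

```python
def search(documents, predicate, after_id=None, limit=2):
    """Return one page of matches and a cursor for the next page.

    documents must be sorted by "id". The cursor is the last id returned,
    or None when there are no further matches.
    """
    matches = [d for d in documents
               if predicate(d) and (after_id is None or d["id"] > after_id)]
    page = matches[:limit]
    next_cursor = page[-1]["id"] if len(matches) > limit else None
    return page, next_cursor


docs = [{"id": i} for i in range(1, 6)]
page, cursor = search(docs, lambda d: True)                   # ids 1, 2
page, cursor = search(docs, lambda d: True, after_id=cursor)  # ids 3, 4
```

The client only ever holds one page and an opaque cursor, so the full result set never crosses the network at once.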
Since the project is in its early stage and I am the only developer, going to a microservices-based architecture seems like overkill. Maybe there are other approaches I should consider?
Extracting business logic and interaction with database into separate repository and sharing among services to avoid complex communication protocols between services?
Based on my experience working with microservices for the last few years, it seems like overkill in the current scenario but pays off in the long term.
Based on the information stated above, my thoughts are:
Code Structure - Applying Microservices Architecture (MSA) in the above context does not mean separating DAO, business logic, etc.; it is more about designing the system around business functions. For example, in an eCommerce application you can have shipping, cart, and search as separate services, which can further be divided into smaller services. Read more about domain-driven design here.
Deployment Unit - Keeping microservices apps as an independent deployment unit is a key principle. Hence, keep a vertical slice of the application and package them as Docker Image with Application Code, App Server (if any), Database and OS (Linux etc.)
Communication - With MSA, communication between services become a key and hence general practice is to remain with the message-oriented approach for communication (read about the reactive system and reactive programming for more insight).
PaaS Solution - There are multiple PaaS solutions available, which you can apply so that you don't need to worry about all the other aspects like container management, container orchestration, auto-scaling, configuration management, log management and monitoring etc. See following PaaS solutions:
https://www.nanoscale.io/ by TIBCO
https://fabric8.io/ - by RedHat
https://openshift.io - by RedHat
Cloud Vendor Platforms - AWS, Azure & Google Cloud all of them have specific support for Microservices App from the deployment perspective, which we can use as an alternative solution if you don't want to deploy PaaS solution in your organization.
I hope these pointers help in understanding the overall landscape so that you can structure your architecture for future needs.
I am working on a monolith system... My goal is to set a new path how project should evolve. At first I was wondering if I could move towards microservices based architecture.
In what ways do you need to evolve the project? Will it be mostly bugfixes, adding features, improving performance and/or scalability? Do you anticipate other developers collaborating in the future? Are you currently having maintenance issues? The answers to these questions (and many more) should be considered in guiding your choices.
You seem to be doing your homework around the pros and cons of a microservice architecture, so if you haven't asked yourself why you're even doing this in the first place, now would be a good time to do so.
Maybe there are other approaches I should consider?
There's always the good old don't-break-what's-going ;)
Background
We are looking at porting a 'monolithic' 3-tier web app to a microservices architecture. The web app displays listings to a consumer (think Craigslist).
The backend consists of a REST API that calls into a SQL DB and returns JSON for a SPA app to build a UI (there's also a mobile app). Data is written to the SQL DB via background services (ftp + worker roles). There's also some pages that allow writes by the user.
Information required:
I'm trying to figure out whether (and how) Azure Service Fabric would be a good fit for a microservices architecture in my scenario. I know the pros/cons of microservices vs monolith, but I'm trying to figure out the application of various microservice programming models to our current architecture.
Questions
Is Azure Service Fabric a good fit for this? If not, other recommendations? Currently I'm leaning towards a bunch of OWIN-based .NET web sites, split up by area/service, each hosted on their own machine and tied together by an API gateway.
Which Service Fabric programming model would I go for? Stateless services with their own backing DB? I can't see how the Stateful or Actor model would help here.
If I went with Stateful services/Actor, how would I go about updating data as part of a maintenance/ad-hoc admin request? Traditionally we would simply log in to the DB and update the data, and the API would return the new data - but if it's persisted in-memory/across nodes in a cluster, how would we update it? Would I have to expose this all via methods on the service? Similarly, how would I import my existing SQL data into a stateful service?
For Stateful services/actor model, how can I 'see' the data visually, with an object explorer/UI? Our data is our gold, and I'm concerned about the lack of control/visibility of it in the reliable services models.
Basically, is there some documentation on the decision path towards which programming model to go for? I could model a "listing" as an Actor, and have millions of those - sure, but I could also have a Stateful service that stores the listing locally, and I could also have a Stateless service that fetches it from the DB. How does one decide which is the best approach for a given use case?
Thanks.
What is it about your current setup that isn't meeting your requirements? What do you hope to gain from a more complex architecture?
Microservices aren't a magic bullet. You mainly get four benefits:
You can scale and distribute pieces of your overall system independently. Service Fabric has very sophisticated tools and advanced capabilities for this.
You can deploy and upgrade pieces of your overall system independently. Service Fabric again has advanced capabilities for this.
You can have a polyglot system - each service can be written in a different language/platform.
You can use conflicting dependencies - each service can have its own set of dependencies, like different framework versions.
All of this comes at a cost and introduces complexity and new ways your system can fail. For example: your fast, compile-time checked in-proc method calls now become slow (by comparison to an in-proc function call), failure-prone network calls. And these are not specific to Service Fabric, btw; this is just what happens when you go from in-proc method calls to cross-machine I/O - it doesn't matter what platform you use. The decision path here is a pro/con list specific to your application and your requirements.
To answer your Service Fabric questions specifically:
Which programming model do you go for? Start with stateless services with ASP.NET Core. It's going to be the simplest translation of your current architecture that doesn't require mucking around with your data layer.
Stateful has a lot of great uses, but it's not necessarily a replacement for your RDBMS. A good place to start is hot data that can be stored in simple key-value pairs, is accessed frequently and needs to be low-latency (you get local reads!), and doesn't need to be datamined. Some examples include user session state, cache data, a "snapshot" of the most recent items in a data stream (like the most recent stock quote in a stream of stock quotes).
Currently the only way to see or query your data is programmatically directly against the Reliable Collection APIs. There is no viewer or "management studio" tool. You have to write (and secure) an API in each service that can display and query data.
Finally, the actor model is a very niche model. It serves specific purposes but if you just treat it as a data store it will not work for you. Like in your example, a listing per actor probably wouldn't work because you can't query across that list, or even have multiple users reading the same listing simultaneously.
We need to build a couple applications that require fairly advanced workflow functionality. The plan is to store the data in SQL Server, use Windows Workflow Foundation as the workflow engine, and build the frontend using an RIA technology such as Flex or Silverlight.
We already have Sharepoint 2007 set up, and some of us (including me) have a little bit of experience creating custom Sharepoint workflows that work with data in Sharepoint lists.
My question is, would it make sense to use Sharepoint for the workflow, while the actual data is stored outside of Sharepoint in a separate database? We need the task, authentication, and email functionality of Sharepoint, but our data model is a bit complex so we'd rather not store the data in Sharepoint. We'd rather not start from scratch with Workflow Foundation, because Sharepoint already gives us 90% of the functionality we need.
Any thoughts / advice?
I think that this is a great example of using SharePoint as a platform. I don't see any conceptual problems using it in the way that you describe. I see SharePoint as a development platform. One thing you might want to keep in mind is that if you want the workflow to continue on events happening in the separate database, you might have to update, for instance, the workflow's task items from an external program.
Your use case is a perfect fit and one that SharePoint adds great value to. I would highly recommend using SharePoint to host your workflows.
I have developed many SharePoint-hosted WF workflows and the only real problem that I ever experienced was making calls to long-running web services (asynchronous operations), as SharePoint's WF host has some limitations on the type of external providers it can listen for events from.
The solution that I developed (which was a bit of a hack at first but ended up being of some value to my customers) was to create a service proxy (WCF) that sat outside of SharePoint and would route calls to remote services and wait for their response. In parallel to making that asynchronous call, a parallel activity would create a SharePoint task associated with the asynchronous operation. Then the WF would stop on an OnTaskCompleted activity, which causes the WF resources to be released and the state to be persisted to SQL (the workflow is dehydrated). As the long-running operation sent back status updates or a completion notification, the external service would update the related SharePoint task. Once the task is marked completed, the WF is rehydrated and continues executing. The neat thing about this approach was that I could then create a dashboard that showed the status of all the long-running processes going on outside of SharePoint. Lastly I packaged all of this stuff up into a composite activity so that it didn't clutter up my pretty workflow diagrams.
SharePoint is ideally suited for this scenario. I would suggest using a Business Data Catalog (BDC) to access external data sources. It provides a tremendous benefit, primarily by making your data source searchable as well as providing OOB web parts to display the data with master-child relationships, filtering and a rich API.
I would caution against making workflows too complex and instead break up the process into stages using smaller workflows, InfoPath and user actions to facilitate the entire process. This is where SharePoint really shines, as you can give others in the organization visibility of the process stages using dashboards (if it makes sense for your scenario) as well as collaboration, approvals ... the list goes on.
I agree that SP can provide a nice WF engine, but let me ask this... are you storing anything IN SharePoint? (tasks, data sources, etc)
I ask because it may be as easy (and more appropriate) to run your own WF engine. If you are running all native WF functionality, and just need an engine, you can write a quick console app that can start workflows.
If you are using SP for anything beyond WF, then I absolutely agree to use SP.
Do you do automated testing on a complex workflow system like K2?
We are building a system with extensive integration between Sharepoint 2007 and K2. I can't even imagine where to start with automated testing as the workflow involves multiple users interacting with Sharepoint, K2 workflows and custom web pages.
Has anyone done automated testing on a workflow server like K2? Is it more effort than it's worth?
I'm having a similar problem testing a workflow-heavy MOSS-based application. Workflows in our case are based on WWF.
My idea is to mock pretty much everything that you can't control from unit tests - documents storage, authentication, user rights and actions, sharepoint-specific parts of workflows for sharepoint (these mocks should be thoroughly tested to mirror behavior of real components).
You use inversion of control to make code choose which component to use at runtime - real or mock.
Then you can write system-wide tests to test workflows behavior - setting up your own environment, checking how workflow engine reacts. These tests are too big to call them unit-tests, still it is automated testing.
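The shape of that idea, reduced to a Python sketch (our code is .NET, and every class name here is made up): the workflow receives its document store from outside, so a test can hand it a fake instead of the SharePoint-backed one.

```python
class SharePointDocumentStore:
    """Placeholder for the real component; it would talk to SharePoint."""

    def save(self, name, content):
        raise NotImplementedError("requires a SharePoint server")


class FakeDocumentStore:
    """Mock that records what the workflow asked for."""

    def __init__(self):
        self.saved = {}

    def save(self, name, content):
        self.saved[name] = content


class ApprovalWorkflow:
    """The store is injected, so the workflow never chooses real vs. mock."""

    def __init__(self, store):
        self.store = store
        self.state = "draft"

    def approve(self, name, content):
        self.store.save(name, content)
        self.state = "approved"


# In a test, wire in the fake and assert on both the state and the side effect.
store = FakeDocumentStore()
workflow = ApprovalWorkflow(store)
workflow.approve("contract.docx", b"signed")
```

The fake itself needs tests against the real component's behavior, otherwise the workflow tests prove little.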
This approach seems to work on trivial cases, but I still have to prove it is worthy to use in real-world workflows.
Here's the solution I use. This is a simple wrapper around the runtime that allows executing single activity, simplifies passing the parameters, blocks the invoking thread until the workflow or activity is done, and translates / rethrows exceptions if any. Since my workflow only sends or waits for messages through a custom workflow service, I can mock out the service to expect certain messages from workflow and post certain messages to it and here I'm having real unit-tests for my WF! The credit for technology goes to Michael Kennedy.
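The wrapper itself is .NET code that is not shown here. As a language-neutral illustration of the same idea, here is a Python sketch; `run_and_wait` is a hypothetical name. It runs a callable on another thread, blocks the caller until it finishes, and re-raises any exception on the calling thread so the test framework sees it.

```python
import threading


def run_and_wait(workflow, timeout=5.0, **params):
    """Run workflow(**params) on another thread, block until done, re-raise errors."""
    outcome = {}
    finished = threading.Event()

    def worker():
        try:
            outcome["result"] = workflow(**params)
        except Exception as exc:
            outcome["error"] = exc
        finally:
            finished.set()

    threading.Thread(target=worker, daemon=True).start()
    if not finished.wait(timeout):
        raise TimeoutError("workflow did not finish in time")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["result"]
```

A test can then call `run_and_wait(some_activity, amount=100)` (with `some_activity` being whatever is under test) and use ordinary assertions on the return value or the raised exception.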
If you are going to do unit testing, Typemock Isolator is the only tool that can currently mock SharePoint objects.
And by the way, Richard Fennell is working on a workflow mocking solution here.
We've just today written an application that monitors our K2 worklist, picks up certain tasks from it, fills in some data and submits the tasks for completion. This allows us to perform automated testing, find regressions, and run through as many different paths of the workflow as we like in a fraction of the time that it would take people to do it. I'd imagine a similar program could be written to pretend to be SharePoint.
As for the unit testing of the workflow items themselves, we have a DLL referenced from K2 which contains all of our line rule and processing logic. We don't have any code in the K2 workflows themselves; it is all referenced from these DLLs. This allows us to easily write unit tests on them to test all of the individual line rules.
I've done automated integration testing on K2 workflows using the K2ROM API (probably SourceCode.Workflow.Client if you're using K2 blackpearl).
Basically you start a process on a test server with a known folio (I generate a GUID), then use the management API to delete it afterwards. I wrote helper methods like AssertAtClientActivity (basically calls ProvideWorkItem with criteria).
Use the IsSynchronous parameter to StartProcessInstance, WorklistItem.Finish, etc. so that relevant method calls will not return until the process instance has reached a stable state.
Expect tests to be slow and to occasionally fail. These are not unit tests.
If you want to write unit tests against other systems, you'll probably want to wrap the K2 API.
Consider looking at Windows Workflow 4 and the new workflow features in SharePoint 2010. You may not need K2.