Is complete regression testing achievable with Behavior Driver Development. (Jbehave/Cucmber)

Is complete regression testing achievable with Behavior Driver Development. (Jbehave/Cucmber) - cucumber

Can we achieve regression tesing coverage with BDD using JBehave/Cucumber?
Please share your inputs that the complete regression testing is achievable with Behavior Driver Development. (Jbehave/Cucmber).

In all but the most trivial products, it's impossible to perform complete regression testing.
Consider these acceptance criteria:
Items can be replaced or refunded.
This leads to two scenarios; one where we refund the item, and one where we return it. Now let's add a bit more to that:
Items are put in stock when returned or refunded, unless faulty.
Now we have four scenarios:
The one where we replace the item and it's faulty
The one where we refund the item and it's faulty
The one where we replace the item and put it back in stock
The one where we refund the item and put it back in stock.
Now let's add the criteria that a receipt must be in date. We need to check that refunds and replacements are both refused, but also that the item doesn't accidentally go back into stock, nor that any fault label is printed. So now we have eight scenarios.
Now let's think about the scenarios where we have a discount, and the ones where we can't scan the barcode, so we manually input the number, and the ones where the customer lost the receipt so we have to look it up using his loyalty card, and the ones where he paid by gift certificate...
Every scenario could, if the code was poorly designed, affect every other scenario. The number of potential combinations becomes exponential, very quickly.
We hope that the code is well-designed, and that the different aspects of behaviour are well-encapsulated. We hope that all the scenarios had been considered. However, if that was the case, we wouldn't be accidentally changing behaviour we didn't mean to, and we wouldn't need regression testing at all. So we know that at least some of the time, in most teams, changes to one scenario do affect changes in another.
Thinking about the responsibility of each piece of code can help to reduce this, which is why most teams practice both BDD and TDD (or BDD at a class level).
Additionally, it's impossible to ensure that every scenario has been thought of up-front, especially since every software project involves something new (or you wouldn't be doing it).
The only thing we can do is get confidence that the code works.
BDD is pretty good at giving us confidence. Not only does it help people to understand what the code does - so they are less likely to make mistakes and write bugs - but it also helps with automating the scenarios, so that there's less work for the testers, and they can focus more on looking for scenarios nobody's thought of yet (exploratory testing).
So, BDD can definitely help with regression testing... but nothing, not even BDD, can perform complete regression test coverage.

Related

How do we gather and document non-functional requirements in Agile

I know in waterfall, they are gathered and documented at an early stage of SDLC, I believe very first stage. Therefore, they are captured and documented before development and testing even starts.
But I am confused how is that done in Agile?
If I understand correctly, user stories should be written with acceptance criteria which capture non-functional requirements. But in Agile, we pick project, create it, and start working on it right away.
So, my guess is that someone (perhaps product owner) goes through user stories and collects acceptance criteria into a formatted document which then becomes Non-Functional-Requirements document?

First, to answer your question, I must be clear that no Agile frameworks or methodologies attempt to define everything that a team might need to do (especially Scrum) so there is nothing wrong with adding extra artifacts or practices that the team finds useful as long as they aren't contradicting a defined practice.
There are a few places I typically see non-functional requirements recorded. Here are a few of the most common ones:
Definition of Done
The definition of done contains standards for quality that should be applied across all backlog items that come through. Often times this includes things like "n% unit test coverage of code", "code and configuration changes have been peer reviewed", and "all automated regression tests have been run and pass". I've sometimes seen broader non-functional requirements like "no changes cause the application load time to exceed X ms".
Architectural Design Documents
You can still have these in Agile. Rather than establishing the finished architecture at the beginning of the project, they introduce constraints that the architecture has to stay within. As the project progresses and architectural decisions are made or changed, these documents are updated to reflect that information. Examples of constraints may include "System X is considered to be the authoritative source of customer personal data" or "Details needed for payment processing should never be available to a public-facing server in order to reduce attack opportunities on that data."
Product Chartering
Depending on the project, "starting right away" is a bit fluid. On very large projects or products, it is not uncommon to take a few days (in my experience, 1 - 3 is a good number) to charter the project. This would include identifying personas, making sure business stakeholders and team members have a shared understanding of the vision, talk through some expected user experiences and problems at a high level, etc. It is very common that non-functional needs come out here and should be recorded either in the DoD, existing architectural documents, or in some cases, in backlog items. One good example of this happening is something called a trade-off matrix. When building a tradeoff matrix, we talk about constraints on the project like performance, adaptability, feature set, budget, time, etc. We identify one as a primary constraint, two as secondary, and all others are considered tertiary. This isn't a hard-and-fast rule, but it establishes an general understanding of how trade-offs on non-functional needs will be decided in the work.
Backlog Items
Ok, last one. Not all backlog items have to be User Stories. If you have an actionable non-functional requirement (set up a server, reconfigure a firewall, team needs to convert to a new version of the IDE) there is nothing that stops you from creating a backlog item for this. It isn't a User Story, but that's ok. I will warn that most teams find a correlation between the number of items in the backlog that are User Stories and their ability to effectively deliver value and adapt to changes along the way, so don't get carries away. But I'd rather see a team put in a non-US in their backlog than try to pass off those things as user stories like "As a firewall, I want to be updated, so we don't get h#XX0rD" <- real backlog item I saw.
As a final note: remember that in Agile, we strive to adapt to change, so don't worry about getting the DoD or architectural document perfect the first time. It can change as you learn more.

Will different website A/B tests interfere with either test's results?

I have a question about running an A/B test against different pages on a website and if I should worry about them interfering with either test's results. Not that it matters, but I'm using Visual Website Optimizer to do the testing.
For example, if I have two A/B tests running on different pages in the order placement flow, should I worry about the tests having an effect on one anothers goal conversion rate for the same conversion goal? For example, I have two tests running on a website, one against the product detail page and another running on the shopping cart. Ultimately I want to know if a variation of either page affects the order placement conversion rate. I'm not sure if I should be concerned with the different test's results interfering with one another if they are run at the same time.
My gut is telling me we don't have to worry about it, as the visitors on each page will be distributed across each variation of the other page. So the product detail page version A visitors will be distributed across the A and B variations of the cart, therefore the influence of the product detail page's variation A on order conversion will still be measured correctly even though the visitor sees different versions of the cart from the other test. Of course, I may be completely wrong, and hopefully someone with a statistics background can answer this question more precisely.
The only issue I can think of, is a combination between one page's variation and another page's variation worked together better than other combinations. But this seems unlikely.
I'm not sure if I'm explaining the issue clearly enough, so please let me know if my question makes sense. I searched the web and Stackoverflow for an answer, but I'm not having any luck finding anything.

I understand your problem and there is no quick answer to it and it depends on the types of test you are running. There are times that A/B tests on different pages influence each other, specially if they are within the same sequence of actions, e.g. checkout.
A simple example, if on your first page, variation A says "Click here to view pricing" and variation B says "Click here to get $500 cash". You may find that click through on B is higher and declare that one successful. Once the user clicks, on the following page, there are asked to enter their credit card details, with variations being "Pay" button being either green or red. In a situation like this, people from variation A might have a better chance of actually entering their CC details and converting as opposed to variation B who may feel cheated.
I have noticed when websites are in their seminal stages and they are trying to get a feel of what customers respond to well, drastic changes are made these multivariate tests are more important. When there is some stability and traffic, however, the changes tend to be very subtle and overall message and flow are the same and A/B tests become more micro refinements. In those cases, there might be less value in multi page cross testings (does background colour on page one means anything three pages down the process? probably not!).
Hope this answer helps!

Giving up Agile, Switching to waterfall - Is this right? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
I am working in an Agile environment and things have gone to the state where the client feels that they would prefer Waterfall due to the failures (that's what they think) of the current Agile scenario. The reason that made them think like this would be the immense amount of design level changes that happened during the end stages of the sprints which we (developers) could not complete within the time they specified.
As usual, we both were blaming each other. From our perspective, the changes said at the end were too many and design/code alterations were too much. Whereas from the client's perspective, they complain that we (developers) are not understanding the requirements fully and coming up with solutions that were 'not' what they intended in the requirement. (like they have asked us to draw a tiger, and we drew a cat).
So, the client felt (not us) that Agile process is not correct and they want to switch to a Waterfall mode which IMHO would be disastrous. The simple reason being their satisfaction levels in a Agile mode itself were not enough, then how are they going to tolerate the output after spending so much time during the design phase of a Waterfall development?
Please give your suggestions.

First off - ask yourself are you really doing Agile? If you are then you should have already delivered a large portion of usable functionality to the client which satisfied their requirements in the earlier sprints. In theory, the "damage" should be limited to the final sprint where you discovered you needed large design changes. That being the case you should have proven your ability to deliver and now need a dialogue with the client to plan the changes now required.
However given your description I suspect you have fallen into the trap of just developing on a two week cycle without actually delivering into production each time and have a fixed end date in mind for the first proper release. If this is the case then you're really doing iterative waterfall without the requirements analysis/design up front - a bad place to be usually.
Full waterfall is not necessarily the answer (there's enough evidence to show what the problems are with it), but some amount of upfront planning and design is generally far preferable in practice to the "pure" Agile ethos of emergent architecture (which fits with a Lean approach actually). Big projects simply cannot hope to achieve a sensible stable architectural foundation if they just start hacking at code and hope it'll all come good some number of sprints down the line.
In addition to the above another common problem with "pure" Agile is client expectation management. Agile is sold as this wonderful thing that means the client can defer decisions, change their mind and add new requirements as they see fit. HOWEVER that doesn't mean the end date / budget / effort required remains fixed, but people always seem to miss that part.

The agile development methodologies are particularly appropriate when you have unclear requirements and when you may need to make design changes at later stages in your project. Waterfall is a less appropriate approach in this case. The waterfall approach is appropriate for projects which are well understood and when the requirements are unlikely to change during the project's lifetime. It doesn't sound like that is the case here.
How long are your sprints? An alternative approach might be to decrease the sprint length - at least at the start of the project. Deliver new versions to the customer more often and discuss the changes with the customer. If you aren't doing what they want this will become apparent more quickly so less time will be wasted on implementing solutions that don't meet the customer's requirements.

I'm not sure what kind of shop you run, so it's hard for me to come up with good recommendations. I can offer two guiding principles though:
If you have bad communication with the customer, no development methodology will save you.
It's none of the diner's business how a chef organizes the kitchen, as long as the meal is tasty.

It sounds like you have serious project management and architecture/design issues, and it sounds like your communications have also broken down. Fundamentally I don't think changing your dev methodology is going to fix any of that, and is therefore the wrong thing to be doing (though it may restore some client confidence).
I would be especially concerned about moving towards waterfall since you are now choosing to essentially capture the requirements just once (which we know you have a problem with) with no capacity for input. That rigidity is good for inflexible delivery targets, but it's completely inappropriate here where you have changes all the time - that's agile!
Short term I'd step back and double check your requirements at this stage with them. Renegotiate and confirm your current state in relation to those.
Medium term, I'd open up more communications with the client - try and get them involved in a daily scrum for a while (until you restore confidence, then you can be more flexible).
Long term, you have to be worried about how your PM's and senior devs have managed to get you into this position. If the client is being unreasoanable that's one thing (but it's still up to the PM to manage that, so you're not absolved). It's not reasonable to complain about having too many changes, that just means you screwed up in determining requirements (which is a dialogue, not a monologue) or that you have to have more numerous, but probably shorter sprints.
Above all, I can't see moving towards waterfall is possibly correct. It doesn't fix anything directly and I can only see it exacerbating the problems you've already highlighted.
Caveat: I'm not really capable of a balanced view on waterfall since I've never seen it work effectively and imho it's just completely outdated for enterprise projects.

Agile development does not save you from the burden of actually coming up with a design which both you and the customer understand similarily. Agile just makes it possible to come up with the design in smaller increments and not all at once. And, in the case of a difficult customer, coming up with a proper design takes time.
So, I would spend more effort in sitting down with the customer, with a whiteboard, going over what is it that they actually want. I don't think it really matters in this case if the development process is agile or waterfall.

Agile or waterfall are just words. There are only things that work, and things that don't.
Software development seems virtual to many people and they don't understand why it's hard to change a small thing they request.
Your customers should understand that building a software is just like building a house : when you have built all the foundations and walls, it's hard to change all the house final plan, and room design.
Some practices helps avoid this kind of problem : data modeling, data dictionary, data flow diagrams... the goal being to know every requirement in complete detail. Cutting your product in many independant blocks help starting coding while continuing designing or specifying other parts of your final product.
See Steve McConnell book : "Rapid Software Development : taming wild software schedule" for all the practices that work.

The reason that made them think like this would be the immense amount of design level changes that happened during the end stages of the sprints which we (developers) could not complete within the time they specified.
Scrum is in a way a "short waterfall", and you should be isolated from changing requirements for the sprint duration. It seems that this is not happening! Therefore, don't see you will gain anything from switching to traditional waterfall, but you should stick to freezing requirements for the sprint duration.
Maybe your iterations are too long?
(I assume you follow Scrum, since you mention sprints).
Talk to your clients and agree the following:
- Shorter iterations, up to 3 weeks max.
- No changes in requirements during the iteration.
- Features are planned at the beginning of the iteration
- Every iteration ends with deliverable: fully functional software with all features that are fully operational
- Iteration length does not change. Unfinished features are left for the next iteration (or maybe discarded if client changes his mind).
- Number of "feature points" you can deliver in a single iteration should be based on the team metric, not client insistence. This is your "capacity".
- Client decides what features (but not how many of them) are planned for the iteration
Another thing you should ask yourself is why there are so many "design level changes" in your application. By now, you should have basic architecture and design in place. Maybe you should review the actual design and try to impose some design guidelines and implement some patterns. For example, in a typical enterprise web app, you will probably end up using something like DAO. When you add new features, you create new DAO, but basic architecture and design will not change.
It seems however, that you are not delivering what the client wants. In that case, it is of outermost importance to deliver working product to the client, so he could provide sensible feedback for the next iteration.
Regarding
"we (developers) could not complete
within the time they specified."
The client should not be the one to specify the iteration time-frame. Iteration length should be always the same. The requirements that enter into the iteration should be obtain as a result of client prioritization, but the amount of requirements that is planned for the iteration should be based on the estimation that team performs and number of "points" you are able to deliver during iteration.

For me it sounds as if there was no "Big Plan[TM]" in the agile project. Using an agile process does not mean that there is no long term plan, it is more about to deal with the increasing uncertainty in the farer future. For example there should be a release plan with the planned features for all releases in the next 2 months (and a lesser detailed plan with features for the releases after that), so it is clear to the customer when to expect a feature, and when there is a possibility change requirements.
Also to me it seems that there was not (enough) customer involvement in the process. I know that this is a very problematic point, but it helps a lot if the current progress can be discussed with the customer at the end of each iteration. As #Mark Byers already wrote, the more feedback you can get from your customer the better you are.
Also try to not assign blame, as this keeps people to block. Try to use the inspect-and-adopt approach to get a better process instead.

It's not clear what sort of design changes you mean. Graphical design? User experience design? Code design?
In any event, the best solution is more, and earlier, discussions with the client. Jointly develop explicit, concrete examples that satisfy the client's requirements. You can turn these examples into regression tests to ensure that you continue to satisfy them.
Also, continue the discussions as you progress. Show your output as it is available--don't wait until near the end of the sprint. And work on the part most likely to generate problems first. Also look at ways to make it easier to change the things you're finding often change.
The point is to get the client more involved, even to the iteration of a design. Perhaps you'll want to have some discussions focused only on the design.

Your client does not know about how to develop software, or how to manage the software development process. Don't expect the client to provide meaningful instruction on these matters. As a special case, the client does not really know what terms such as 'waterfall' and 'agile' mean; don't expect them to provide meaningful input on your development methodology. Moreover, the client will not really care about these details, as long as the requirements are met within the agreed budget and timeframe. Don't expect them to care, and don't confuse them with lots of inadequate builds and irrelevant information on your internal process.
Here is what the client does care about, and is trying to talk to you about (partly using your own technical jargon): their requirements, their disappointed expectations, and the way you communicate with them. On these matters, the client is the absolute authority. Interpret what they are saying as being about your relationship and the product, not as usable commentary on internal process. Don't cloud the water with your internal deadlines and processes, discuss progress and expectations and the relationship. (If they insist on talking about internals you can remap the terms: e.g. what they understand as being 'the next release' may be internally known as 'the next major release', or whatever).
It sounds to me like the client may want a higher threshold before they get asked for feedback or play with a bad build. It's worth verifying if this is true. If so, you should honor that - and still use agile methods internally if that is what your team feels is best. If they say "waterfall," you may be able to interpret that internally as meaning "we set a deadline for requirements, and then we don't allow more features to be added for a while." Discuss with the client whether it will suit them to have a requirements deadline followed by this sort of freeze.
Someone on your team needs to be the client advocate, and sit on top of the client's issues and fight for them. This advocate must not be sidelined, nor can they take the team's side against the client; they should be the proxy-boss. Then you can separate the internal process communication (team to advocate) from the external communication (advocate to client). The advocate can in some measure insulate the client from the chatter and the builds they don't appreciate, without artificially imposing a certain sort of management or scheduling on your internal process.
To clarify, I do not at all think that you should be secretive or distant with the client, but you should (A) listen to what the client is saying about the relationship and how you are communicating and honor that, (B) keep that separate from internal development process, which should be managed in whatever way will ultimately meet client's expectations.

Fire the client. Even if it is your fault for not understanding what they mean, waterfall would give them 1 chance to give you feedback instead of a chance at the end of each sprint. Some people/clients are literally so stupid that they are not worth working for. Fire them, or tell them that you're using Waterfall without actually switching.

Obvious problem here is communication with customer. If you really want to do agile you have to communicate with customer on daily basics. Only customer should be able to make decision. If you communicate with customer only during mid spring and at the end of the sprint it is natural that later on you will found problems in your application. Also features implemented in sprint has to be accepted and tested by customer. Until that features are not completed.
I'm writing this because I have similar problem on my current project but I know where we failed.

If the communication issue between the Team and the Customer is not fixed, the situation could be worse with waterfall, if the customer only sees the product once it is complete (tunnel effect).
You commented changes from sprints 6-7 started to cause rework of tasks achieved in earlier sprints. Those changes should have been detected earlier - during the Sprint Review.
If there is a misunderstanding in a feature description, and the Team does not implement what the customer is expecting, this should be detected no later than the Sprint where the feature is implemented, and ideally fixed in the current Sprint.
If the customer changed it's mind, the new ideas shall be added to the Product Backlog, prioritized and selected for a Sprint, as any other backlog item. This should not been deemed as rework.
Do you deliver the software to the customer after each sprint, or are you just demoing it ?
The origin of the miscommunication could be at the Sprint Planning: the Team should only commit on Backlog Item that are clearly defined. The definition of the items should comprises the acceptance criteria. Is the customer the Product Owner, and is it the Product Owner ?

Remote debugging of a development process is sufficiently difficult that I would hesitate to offer any opinion about what you should do. It seems to me noone outside your team can plausibly have enough information to make a very useful judgement about that.
A lesser jump to a conclusion would be to make a guess as to what went wrong. From your description, it sounds like early deliverables, which you thought were progress in the bank, ended up being majorly reworked.
One common cause of that is the late discovery/creation of 'all' requirements, things that are supposed to be true about everything in the scope of the project. These can be pretty fatal if taken seriously: something as simple as 'all dialog boxes must be resizable' is, for example, apparently beyond the capability of Microsoft to retrofit to Windows.
A classic account of this kind of failure (albeit in a non-agile project) can be found here
"Once they saw the product of the code we wrote, then they would say, 'Oh, we've got to change this. That isn't what I meant,'" said SAIC's Reynolds. "And that's when we started logging change request after change request after change request."
For example, according to SAIC engineers, after the eight teams had completed about 25 percent of the VCF, the FBI wanted a "page crumb" capability added to all the screens. Also known as "bread crumbs," a name inspired by the Hansel and Gretel fairy tale, this navigation device gives users a list of URLs identifying the path taken through the VCF to arrive at the current screen. This new capability not only added more complexity, the SAIC engineers said, but delayed development because completed threads had to be retrofitted with the new feature.
The key phrase there is 'all the screens'. In the face of changes of that nature, then, unless you have some pre-existing tool support you can just switch on (changing all background colours really should be trivial), you are in trouble. The progress you think you had made up to that point will have retroactively turned out to be illusory.
The only known approach to such issues is to get them right first time. If that fails, live with having them wrong.

A lot of shops add Agile trimmings to make themselves "look Agile" to customers who expect it. Maybe you just need to add some Waterfall trimmings, and show them the product once every 2 sprints.

I believe your client is wrong to move to waterfall. It's curing the symptom, not the disease.
The problem you describe is one of communication - the client wants a tiger, you're giving them a cat.
The waterfall model includes many steps to verify that the requirements as written are being delivered - but it doesn't ensure that the written requirements are what the business meant.
I would look at techniques like impact mapping, behaviour-driven development (BDD) and story mapping to improve communication.

Two questions regarding Scrum [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 5 years ago.
Improve this question
I have two related question regarding Scrum.
Our company is trying to implement it and sure we are jumping over hoops.
Both question are about "done means Done!"
1) It's really easy to define "Done" for tasks which are/have
- clear test acceptance criterias
- completely standalone
- tested at the end by testers
What should be done with tasks like:
- architecture design
- refactoring
- some utility classes development
The main issue with it, that it's almost completely internal entity
and there is no way to check/test it from outside.
As example feature implementation is kind of binary - it's done (and
passes all test cases) or it's not done (don't pass some test cases).
The best thing which comes to my head is to ask another developer to review
that task. However, it's any way doesn't provide a clear way to determine
is it completely done or not.
So, the question is how do you define "Done" for such internal tasks?
2) Debug/bugfix task
I know that agile methodology doesn't recommend to have big tasks. At least
if task is big, it should be divided on smaller tasks.
Let say we have some quite large problem - some big module redesign (to
replace new outdate architecture with new one). Sure, this task is divided
on dozens of small tasks. However, I know that at the end we will have
quite long session of debug/fix.
I know that's usually the problem of waterfall model. However, I think
it's hard to get rid of it (especially for quite big changes).
Should I allocate special task for debug/fix/system integrations
and etc?
In the case, if I do so, usually this task is just huge comparing to
everything else and it's kind of hard to divide it on smaller tasks.
I don't like this way, because of this huge monolith task.
There is another way. I can create smaller tasks (associated with bugs),
put them in backlog, prioritize and add them to iterations at the end
of activity, when I will know what are the bugs.
I don't like this way, because in such case the whole estimation will became
fake. We estimate the task, mark it ask complete at any time. And we will
open the new tasks for bugs with new estimates. So, we will end up with
actual time = estimate time, which is definitely not good.
How do you solve this problem?
Regards,
Victor

For the first part " architecture design - refactoring - some utility classes development" These are never "done" because you do them as you go. In pieces.
You want to do just enough architecture to get the first release going. Then, for the next release, a little more architecture.
Refactoring is how you find utility classes (you don't set out to create utility classes -- you discover them during refactoring).
Refactoring is something you do in pieces, as needed, prior to a release. Or as part of a big piece of functionality. Or when you have trouble writing a test. Or when you have trouble getting a test to pass and need to "debug".
Small pieces of these things are done over and over again through the life of the project. They aren't really "release candidates" so they're just sprints (or parts of sprints) that gets done in the process of getting to a release.

"Should I allocate special task for debug/fix/system integrations and etc?"
Not the same way you did with a waterfall methodology where nothing really worked.
Remember, you're building and testing incrementally. Each sprint is tested and debugged separately.
When you get to a release candidate, you might want to do some additional testing on that release. Testing leads to bug discovery which leads to backlog. Usually this is high-priority backlog that needs to be fixed before the release.
Sometimes integration testing reveals bugs that become low-priority backlog that doesn't need to be fixed before the next release.
How big is that release test? Not very. You've already tested each sprint... There shouldn't be too many surprises.

I would argue that if an internal activity has a benefit to the application (which all backlog items within scrum should have), done is the benefit is realized. For instance, "Design architecture" is too generic to identify the benefit of an activity. "Design architecture for user story A" identifies the scope of your activity. When you've created an architecture for story A, you're done with that task.
Refactoring should likewise be done in context of achieving a user story. "Refactor Customer class to enable multiple phone numbers to support Story B" is something that can be identified as done when the Customer class supports multiple phone numbers.

Third Question "some big module redesign (to replace new outdate architecture with new one). Sure, this task is divided on dozens of small tasks. However, I know that at the end we will have quite long session of debug/fix."
Each sprint creates something that can be released. Maybe it won't be, but it could be.
So, when you have major redesign, you have to eat the elephant one small piece at a time. First, look at the highest value -- most important -- biggest return to the users that you can do, get done, and release.
But -- you say -- there is no such small piece; each piece requires massive redesign before anything can be released.
I disagree. I think you can create a conceptual architecture -- what it will be when you're done -- but not implement the entire thing at once. Instead you create temporary interfaces, bridges, glue, connectors that will get one sprint done.
Then you modify the temporary interfaces, bridges and glue so you can finish the next sprint.
Yes, you've added some code. But, you've also created sprints that you can test and release. Sprints which are complete and any one can be a candidate release.

Sounds like you're blurring the definition of user story and task. Simply:
User stories add value. They're
created by a product owner.
Tasks are activities undertaken to create that
value. They're created by the
engineers.
You nailed key parts of the user story by saying they must have clear acceptance criteria, they're standalone, and they can be tested.
Architecture, design, refactoring, and utility classes development are tasks. They're what's done to complete a user story. It's up to each development shop to set different standards for these, but at our company, at least one other developer must have looked at the code (pair programming, code reading, code review).
If you have user stories which are "refactor class X" and "design feature Y", you're on the wrong track. It may be necessary to refactor X or design Y before you write code, but those could be tasks necessary to accomplish the user story "create new login widget".

We've run into similar issues with "behind-the-scenes" code. By "behind-the-scenes" I mean, has no apparent or testable business value.
In those cases, we've decided to define the developers of that portion of the code were the true "users". By creating sample applications and documentation that developers could use and test we had some "done" code.
Usually with scrum though, you would be looking for a piece of business functionality that used a piece of code to determine "done".

For technical tasks such as refactoring, you can check if the refactoring was really done, e.g. call X does no more have any f() method, or no more foobar() function.
There should be Trust towards the team and inside the team as well. Why do you want to review if the task is actually done ? did you encounter situations where someone claim a task were done ans it wasn't ?
For your second question, you should first really strive to break it into several smaller stories (backlog items). For instance, if you are re-architecturing the system, see if the new and the old architecture can coexist the time to do the portation of all your components from one to the other.
If this is really not possible, then this shall be done separately of the rest of the sprint backlog items, and not integrated before it is "done done". If the sprint ends before the completion of all the tasks of the item, then you have to estimate the remaining amount of work and replan it for the next iteration.
Here are twenty ways to split a story that could help having several smaller backlog items, with really is the recommended and safest way.

How do you measure if an interface change improved or reduced usability?

For an ecommerce website how do you measure if a change to your site actually improved usability? What kind of measurements should you gather and how would you set up a framework for making this testing part of development?

Multivariate testing and reporting is a great way to actually measure these kind of things.
It allows you to test what combination of page elements has the greatest conversion rate, providing continual improvement on your site design and usability.
Google Web Optimiser has support for this.

Similar methods that you used to identify the usability problems to begin with-- usability testing. Typically you identify your use-cases and then have a lab study evaluating how users go about accomplishing certain goals. Lab testing is typically good with 8-10 people.
The more information methodology we have adopted to understand our users is to have anonymous data collection (you may need user permission, make your privacy policys clear, etc.) This is simply evaluating what buttons/navigation menus users click on, how users delete something (i.e. changing quantity - are more users entering 0 and updating quantity or hitting X)? This is a bit more complex to setup; you have to develop an infrastructure to hold this data (which is actually just counters, i.e. "Times clicked x: 138838383, Times entered 0: 390393") and allow data points to be created as needed to plug into the design.

To push the measurement of an improvement of a UI change up the stream from end-user (where the data gathering could take a while) to design or implementation, some simple heuristics can be used:
Is the number of actions it takes to perform a scenario less? (If yes, then it has improved). Measurement: # of steps reduced / added.
Does the change reduce the number of kinds of input devices to use (even if # of steps is the same)? By this, I mean if you take something that relied on both the mouse and keyboard and changed it to rely only on the mouse or only on the keyboard, then you have improved useability. Measurement: Change in # of devices used.
Does the change make different parts of the website consistent? E.g. If one part of the e-Commerce site loses changes made while you are not logged on and another part does not, this is inconsistent. Changing it so that they have the same behavior improves usability (preferably to the more fault tolerant please!). Measurement: Make a graph (flow chart really) mapping the ways a particular action could be done. Improvement is a reduction in the # of edges on the graph.
And so on... find some general UI tips, figure out some metrics like the above, and you can approximate usability improvement.
Once you have these design approximations of user improvement, and then gather longer term data, you can see if there is any predictive ability for the design-level usability improvements to the end-user reaction (like: Over the last 10 projects, we've seen an average of 1% quicker scenarios for each action removed, with a range of 0.25% and standard dev of 0.32%).

The first way can be fully subjective or partly quantified: user complaints and positive feedbacks. The problem with this is that you may have some strong biases when it comes to filter those feedbacks, so you better make as quantitative as possible. Having some ticketing system to file every report from the users and gathering statistics about each version of the interface might be useful. Just get your statistics right.
The second way is to measure the difference in a questionnaire taken about the interface by end-users. Answers to each question should be a set of discrete values and then again you can gather statistics for each version of the interface.
The latter way may be much harder to setup (designing a questionnaire and possibly the controlled environment for it as well as the guidelines to interpret the results is a craft by itself) but the former makes it unpleasantly easy to mess up with the measurements. For example, you have to consider the fact that the number of tickets you get for each version is dependent on the time it is used, and that all time ranges are not equal (e.g. a whole class of critical issues may never be discovered before the third or fourth week of usage, or users might tend not to file tickets the first days of use, even if they find issues, etc.).

Torial stole my answer. Although if there is a measure of how long it takes to do a certain task. If the time is reduced and the task is still completed, then that's a good thing.
Also, if there is a way to record the number of cancels, then that would work too.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string