Voting economy: balancing credits properly - economics

Many websites today (including stackoverflow) and games allow people to perform voting, give feedback, enable additional features etc, according to a score: eg. reputation, or MMORPG credits.
As a programmer that will probably need to implement a community based website in the near future, I am interested in knowing about the existence of basic algorithms and decisions to be made so that everything is balanced. For example, the fact that one vote up grants 10 reputations and one down grants -2 was arbitrary or properly weighted ? How to decide the price of a given item and the rewards in a MMORPG, so that everything is balanced? I guess that WoW designers relied on their experience, but I am also sure that this experience can be found somewhere written down. Although this is a social problem, the pricing of a given feature and the reward for a given task are technical/mathematical ones, as you need to give a value to each feature according to some mathematical criteria (although not easy to devise, I guess)
Of course, this question could bring us far in terms of theory of economics, but I am sort of hoping that there are well defined and known simplified patterns and rules for this issue. I just don't know the keywords to query for.

Probably the most important thing to point out here is that this is a social problem not a technical one.
By that I mean that you could use the exact same system as SO on an MMORPG and it would flop or have really undesirable side effects. Whether a system works or not depends on the community you drop it into and the intended purpose. It can also depend on some luck whether people latch onto it or not. You may get early negative behaviour that sets the tone for future negativity and discourages positive involvement. Or it could go completely the other way.
There is no magic formula that made the vote/rep weighting what it is on SO other than long discussions about how to encourage certain behaviour and then some testing and fine-tuning. For example, a downvote costs 1 rep and is -2 rep to the recipient. The guiding principle was that downvotes should cost. After that, it was trial by error.
You might want to read The Value of Downvoting, or, How Hacker News Gets It Wrong and Vote Fraud for some of Jeff's and Joel's thoughts on that subject. Joel's Tech Talk on Stackoverflow at Google is also enlightening.

Voting is actually a very difficult problem. There are so many models of voting, and they all produce different results. For example, choosing your one favorite candidate versus ranking candidates produces a different result. Choosing your LEAST favorite candidate produces a different result. Organizing choices into good/bad produces different results.
Balancing then becomes something that can be done by asking the community. It's very difficult to balance games of that magnitude, simply because even your most exhaustive tests wont cover all of the cases. Having a properly established forum where users can give their opinions as well as having testers who watch out for balancing issues is probably the best way to go.
Oh, and if you want an abstract about the voting problem I mentioned, it's here:
http://www.cs.rochester.edu/~lane/computational-politics.html

Related

Are transformer-based language models overfitting on the paraphrase identification task? What tools overcome this?

I've been working on a sentence transformation task that involves paraphrase identification as a critical step: if we are confident enough that the state of the program (a sentence repeatedly modified) has become a paraphrase of a target sentence, stop transforming. The overall goal is actually to study potential reasoning in predictive models that can generate language prior to a target sentence. The approach is just one specific way of reaching that goal. Nevertheless, I've become interested in the paraphrase identification task itself, as it's received some boost from language models recently.
The problem I run into is when I manipulate sentences from examples or datasets. For example, in this HuggingFace example, if I negate either sequence or change the subject to Bloomberg, I still get a majority "is paraphrase" prediction. I started going through many examples in the MSRPC training set and negating one sentence in a positive example or making one sentence in a negative example a paraphrase of the other, especially when doing so would be a few word edit. I found to my surprise that various language models, like bert-base-cased-finetuned-mrpc and textattack/roberta-base-MRPC, don't change their confidences much on these sorts of changes. It's surprising as these models claim an f1 score of 0.918+. The dataset is clearly missing a focus on negative examples and small perturbative examples.
My question is, are there datasets, techniques, or models that deal well when given small edits? I know that this is an extremely generic question, much more than is typically asked on StackOverflow, but my concern is in finding practical tools. If there is a theoretical technique, then it might not be suitable as I'm in the category of "available tools define your approach" rather than vice-versa. So I hope that the community would have a recommendation on this.
Short answer to the question: yes, they are overfitting. Most of the important NLP data sets are not actually well-crafted enough to test what they claim to test, and instead test the ability of the model to find subtle (and not-so-subtle) patterns in the data.
The best tool I know for creating data sets that help deal with this is Checklist. The corresponding paper, "Beyond Accuracy: Behavioral Testing of NLP models with CheckList" is very readable and goes into depth on this type of issue. They have a very relevant table... but need some terms:
We prompt users to evaluate each capability with
three different test types (when possible): Minimum Functionality tests, Invariance, and Directional Expectation tests... A Minimum Functionality test (MFT), is a collection of simple examples (and labels) to check a
behavior within a capability. MFTs are similar to
creating small and focused testing datasets, and are
particularly useful for detecting when models use
shortcuts to handle complex inputs without actually
mastering the capability.
...An Invariance test (INV) is when we apply
label-preserving perturbations to inputs and expect
the model prediction to remain the same.
A Directional Expectation test (DIR) is similar,
except that the label is expected to change in a certain way. For example, we expect that sentiment
will not become more positive if we add “You are
lame.” to the end of tweets directed at an airline
(Figure 1C).
I haven't been actively involved in NLG for long, so this answer will be a bit more anecdotal than SO's algorithms would like. Starting with the fact that in my corner of Europe, the general sentiment toward peer review requirements for any kind of NLG project are higher by several orders of magnitude compared to other sciences - and likely not without reason or tensor thereof.
This makes funding a bigger challenge, so wherever you are, I wish you luck on that front. I'm not sure of how big of a deal this site is in the niche, but [Ehud Reiter's Blog][1] is where I would start looking into your tooling ideas.
Maybe even reach out to them/him personally, because I can't think of another source that has an academic background and a strong propensity for practical applications of NLG, at least based on the kind of content they've been putting out over the years.
Your background, environment/funding, and seniority level/control you have over the project will eventually compose your vector decision for you. I's just how it goes on the bleeding edge of anything. What I will add, though, is not to limit yourself to a single language or technology in this phase because of those precise reasons you've mentioned. I'd recommend the same in terms of potential open source involvement but if your profile information is accurate, that probably won't happen, no matter what you do and accomplish.
But yeah, in the grand scheme of things, your question is far from too broad, in my view. It identifies a rather unmistakable problem pattern that not all branches of science are as lackadaisical to approach as NLG-adjacent fields seem to be right now. In that regard, it's not broad enough and will need to be promulgated far and wide before community-driven tooling will give you serious options on a micro level.
Blasphemy, sure, but the performance is already stacked against you As for the question potentially being too broad, I'd posit it is not broad enough, so long as we collectively remain in a "oh, I was waiting for you to start doing something about it" phase.
P.S. I'd eliminate any Rust and ECMAScript alternatives prior to looking into Python, blapshemous as this might sound to a 2021 data scientist
. Some might ARight nowccounting forr the ridicule this would receive xou sltrsfx hsbr s fszs drz zhsz s mrnzsl rcrtvidr, sz lrsdz
due to performance easons.
[1]: https://ehudreiter.com/2016/12/18/nlg-vs-templates/

nlp: alternate spelling identification

Help by editing my question title and tags is greatly appreciated!
Sometimes one participant in my corpus of "conversations" will refer to another participant using a nickname, usually an abbreviation or misspelling, but hereafter I'll just say "nicknames". Let's say I'm willing to manually tell my software whether or not I think various possible nicknames are in fact nicknames, but I want software to come up with a list of possible matches between the handle's that identify people, and the potential nicknames. How would I go about doing that?
Background on me and then my corpus: I have no experience doing natural language processing but I'm a competent data analyst with R. My data is produced by 70 teams, each forecasting the likelihood of 100 distinct events occurring some time in the future. The result that I have 70 x 100 = 7000 text files, containing the stream of forecasts participants make and the comments they include with their forecasts. I'll paste a very short snip of one of these text files below, this one had to do with whether the Malian government would enter talks with the MNLA:
02/12/2013 20:10: past_returns answered Yes: (50%)
I hadn't done a lot of research when I put in my previous
placeholder... I'm bumping up a lot due to DougL's forecast
02/12/2013 19:31: DougL answered Yes: (60%)
Weak President Traore wants talks if MNLA drops territorial claims.
Mali's military may not want talks. France wants talks. MNLA sugggests
it just needs autonomy. But in 7 weeks?
02/12/2013 10:59: past_returns answered No: (75%)
placeholder forecast...
http://www.irinnews.org/Report/97456/What-s-the-way-forward-for-Mali
My initial thoughts: Obviously I can start by providing the names I'm looking to match things up with... in the above example they would be past_returns and DougL (though there is no use of nicknames in the above). I wouldn't think it'd be that hard to get a computer to guess at minor misspellings (though I wouldn't personally know where to start). I can imagine that other tricks could be used, like assuming that a string is more likely to be a nickname if it is used much much more by one team, than by other teams. A nickname is more likely to refer to someone who spoke recently than someone who spoke long ago, or not at all on regarding this question. And they should be used in sentences in a manner similar to the way the full name/screenname is typically used in the corpus. But I'm interested to hear about simple approaches, as well as ones that try to consider more sophisticated techniques.
This could get about as complicated as you want to make it. From the semi-linguistic side of things, research topics would include Levenshtein Distance (for detecting minor misspellings of known names/nicknames) and Named Entity Recognition (for the task of detecting names/nicknames in the first place). Actually, NER's worth reading about, but existing systems might not help you much in your domain of forum handles and nicknames.
The first rough idea that comes to mind is that you could run a tokenized version of your corpus against an English dictionary (perhaps a dataset compiled from Wiktionary or something like WordNet) to find words that are candidates for names, then filter those through some heuristics (do they start with the same letters as known full names? Do they have a low Levenshtein distance from known names? Are they used more than once?).
You could also try some clustering or supervised ML algorithms against the non-word tokens. That might reveal some non-"word" tokens that often occur in the same threads as a given username; again, heuristics could help rule out some false positives.
Good luck; sounds like a fun problem - hope I mentioned at least one thing you hadn't already thought of.

Giving up Agile, Switching to waterfall - Is this right? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
I am working in an Agile environment and things have gone to the state where the client feels that they would prefer Waterfall due to the failures (that's what they think) of the current Agile scenario. The reason that made them think like this would be the immense amount of design level changes that happened during the end stages of the sprints which we (developers) could not complete within the time they specified.
As usual, we both were blaming each other. From our perspective, the changes said at the end were too many and design/code alterations were too much. Whereas from the client's perspective, they complain that we (developers) are not understanding the requirements fully and coming up with solutions that were 'not' what they intended in the requirement. (like they have asked us to draw a tiger, and we drew a cat).
So, the client felt (not us) that Agile process is not correct and they want to switch to a Waterfall mode which IMHO would be disastrous. The simple reason being their satisfaction levels in a Agile mode itself were not enough, then how are they going to tolerate the output after spending so much time during the design phase of a Waterfall development?
Please give your suggestions.
First off - ask yourself are you really doing Agile? If you are then you should have already delivered a large portion of usable functionality to the client which satisfied their requirements in the earlier sprints. In theory, the "damage" should be limited to the final sprint where you discovered you needed large design changes. That being the case you should have proven your ability to deliver and now need a dialogue with the client to plan the changes now required.
However given your description I suspect you have fallen into the trap of just developing on a two week cycle without actually delivering into production each time and have a fixed end date in mind for the first proper release. If this is the case then you're really doing iterative waterfall without the requirements analysis/design up front - a bad place to be usually.
Full waterfall is not necessarily the answer (there's enough evidence to show what the problems are with it), but some amount of upfront planning and design is generally far preferable in practice to the "pure" Agile ethos of emergent architecture (which fits with a Lean approach actually). Big projects simply cannot hope to achieve a sensible stable architectural foundation if they just start hacking at code and hope it'll all come good some number of sprints down the line.
In addition to the above another common problem with "pure" Agile is client expectation management. Agile is sold as this wonderful thing that means the client can defer decisions, change their mind and add new requirements as they see fit. HOWEVER that doesn't mean the end date / budget / effort required remains fixed, but people always seem to miss that part.
The agile development methodologies are particularly appropriate when you have unclear requirements and when you may need to make design changes at later stages in your project. Waterfall is a less appropriate approach in this case. The waterfall approach is appropriate for projects which are well understood and when the requirements are unlikely to change during the project's lifetime. It doesn't sound like that is the case here.
How long are your sprints? An alternative approach might be to decrease the sprint length - at least at the start of the project. Deliver new versions to the customer more often and discuss the changes with the customer. If you aren't doing what they want this will become apparent more quickly so less time will be wasted on implementing solutions that don't meet the customer's requirements.
I'm not sure what kind of shop you run, so it's hard for me to come up with good recommendations. I can offer two guiding principles though:
If you have bad communication with the customer, no development methodology will save you.
It's none of the diner's business how a chef organizes the kitchen, as long as the meal is tasty.
It sounds like you have serious project management and architecture/design issues, and it sounds like your communications have also broken down. Fundamentally I don't think changing your dev methodology is going to fix any of that, and is therefore the wrong thing to be doing (though it may restore some client confidence).
I would be especially concerned about moving towards waterfall since you are now choosing to essentially capture the requirements just once (which we know you have a problem with) with no capacity for input. That rigidity is good for inflexible delivery targets, but it's completely inappropriate here where you have changes all the time - that's agile!
Short term I'd step back and double check your requirements at this stage with them. Renegotiate and confirm your current state in relation to those.
Medium term, I'd open up more communications with the client - try and get them involved in a daily scrum for a while (until you restore confidence, then you can be more flexible).
Long term, you have to be worried about how your PM's and senior devs have managed to get you into this position. If the client is being unreasoanable that's one thing (but it's still up to the PM to manage that, so you're not absolved). It's not reasonable to complain about having too many changes, that just means you screwed up in determining requirements (which is a dialogue, not a monologue) or that you have to have more numerous, but probably shorter sprints.
Above all, I can't see moving towards waterfall is possibly correct. It doesn't fix anything directly and I can only see it exacerbating the problems you've already highlighted.
Caveat: I'm not really capable of a balanced view on waterfall since I've never seen it work effectively and imho it's just completely outdated for enterprise projects.
Agile development does not save you from the burden of actually coming up with a design which both you and the customer understand similarily. Agile just makes it possible to come up with the design in smaller increments and not all at once. And, in the case of a difficult customer, coming up with a proper design takes time.
So, I would spend more effort in sitting down with the customer, with a whiteboard, going over what is it that they actually want. I don't think it really matters in this case if the development process is agile or waterfall.
Agile or waterfall are just words. There are only things that work, and things that don't.
Software development seems virtual to many people and they don't understand why it's hard to change a small thing they request.
Your customers should understand that building a software is just like building a house : when you have built all the foundations and walls, it's hard to change all the house final plan, and room design.
Some practices helps avoid this kind of problem : data modeling, data dictionary, data flow diagrams... the goal being to know every requirement in complete detail. Cutting your product in many independant blocks help starting coding while continuing designing or specifying other parts of your final product.
See Steve McConnell book : "Rapid Software Development : taming wild software schedule" for all the practices that work.
The reason that made them think like this would be the immense amount of design level changes that happened during the end stages of the sprints which we (developers) could not complete within the time they specified.
Scrum is in a way a "short waterfall", and you should be isolated from changing requirements for the sprint duration. It seems that this is not happening! Therefore, don't see you will gain anything from switching to traditional waterfall, but you should stick to freezing requirements for the sprint duration.
Maybe your iterations are too long?
(I assume you follow Scrum, since you mention sprints).
Talk to your clients and agree the following:
- Shorter iterations, up to 3 weeks max.
- No changes in requirements during the iteration.
- Features are planned at the beginning of the iteration
- Every iteration ends with deliverable: fully functional software with all features that are fully operational
- Iteration length does not change. Unfinished features are left for the next iteration (or maybe discarded if client changes his mind).
- Number of "feature points" you can deliver in a single iteration should be based on the team metric, not client insistence. This is your "capacity".
- Client decides what features (but not how many of them) are planned for the iteration
Another thing you should ask yourself is why there are so many "design level changes" in your application. By now, you should have basic architecture and design in place. Maybe you should review the actual design and try to impose some design guidelines and implement some patterns. For example, in a typical enterprise web app, you will probably end up using something like DAO. When you add new features, you create new DAO, but basic architecture and design will not change.
It seems however, that you are not delivering what the client wants. In that case, it is of outermost importance to deliver working product to the client, so he could provide sensible feedback for the next iteration.
Regarding
"we (developers) could not complete
within the time they specified."
The client should not be the one to specify the iteration time-frame. Iteration length should be always the same. The requirements that enter into the iteration should be obtain as a result of client prioritization, but the amount of requirements that is planned for the iteration should be based on the estimation that team performs and number of "points" you are able to deliver during iteration.
For me it sounds as if there was no "Big Plan[TM]" in the agile project. Using an agile process does not mean that there is no long term plan, it is more about to deal with the increasing uncertainty in the farer future. For example there should be a release plan with the planned features for all releases in the next 2 months (and a lesser detailed plan with features for the releases after that), so it is clear to the customer when to expect a feature, and when there is a possibility change requirements.
Also to me it seems that there was not (enough) customer involvement in the process. I know that this is a very problematic point, but it helps a lot if the current progress can be discussed with the customer at the end of each iteration. As #Mark Byers already wrote, the more feedback you can get from your customer the better you are.
Also try to not assign blame, as this keeps people to block. Try to use the inspect-and-adopt approach to get a better process instead.
It's not clear what sort of design changes you mean. Graphical design? User experience design? Code design?
In any event, the best solution is more, and earlier, discussions with the client. Jointly develop explicit, concrete examples that satisfy the client's requirements. You can turn these examples into regression tests to ensure that you continue to satisfy them.
Also, continue the discussions as you progress. Show your output as it is available--don't wait until near the end of the sprint. And work on the part most likely to generate problems first. Also look at ways to make it easier to change the things you're finding often change.
The point is to get the client more involved, even to the iteration of a design. Perhaps you'll want to have some discussions focused only on the design.
Your client does not know about how to develop software, or how to manage the software development process. Don't expect the client to provide meaningful instruction on these matters. As a special case, the client does not really know what terms such as 'waterfall' and 'agile' mean; don't expect them to provide meaningful input on your development methodology. Moreover, the client will not really care about these details, as long as the requirements are met within the agreed budget and timeframe. Don't expect them to care, and don't confuse them with lots of inadequate builds and irrelevant information on your internal process.
Here is what the client does care about, and is trying to talk to you about (partly using your own technical jargon): their requirements, their disappointed expectations, and the way you communicate with them. On these matters, the client is the absolute authority. Interpret what they are saying as being about your relationship and the product, not as usable commentary on internal process. Don't cloud the water with your internal deadlines and processes, discuss progress and expectations and the relationship. (If they insist on talking about internals you can remap the terms: e.g. what they understand as being 'the next release' may be internally known as 'the next major release', or whatever).
It sounds to me like the client may want a higher threshold before they get asked for feedback or play with a bad build. It's worth verifying if this is true. If so, you should honor that - and still use agile methods internally if that is what your team feels is best. If they say "waterfall," you may be able to interpret that internally as meaning "we set a deadline for requirements, and then we don't allow more features to be added for a while." Discuss with the client whether it will suit them to have a requirements deadline followed by this sort of freeze.
Someone on your team needs to be the client advocate, and sit on top of the client's issues and fight for them. This advocate must not be sidelined, nor can they take the team's side against the client; they should be the proxy-boss. Then you can separate the internal process communication (team to advocate) from the external communication (advocate to client). The advocate can in some measure insulate the client from the chatter and the builds they don't appreciate, without artificially imposing a certain sort of management or scheduling on your internal process.
To clarify, I do not at all think that you should be secretive or distant with the client, but you should (A) listen to what the client is saying about the relationship and how you are communicating and honor that, (B) keep that separate from internal development process, which should be managed in whatever way will ultimately meet client's expectations.
Fire the client. Even if it is your fault for not understanding what they mean, waterfall would give them 1 chance to give you feedback instead of a chance at the end of each sprint. Some people/clients are literally so stupid that they are not worth working for. Fire them, or tell them that you're using Waterfall without actually switching.
Obvious problem here is communication with customer. If you really want to do agile you have to communicate with customer on daily basics. Only customer should be able to make decision. If you communicate with customer only during mid spring and at the end of the sprint it is natural that later on you will found problems in your application. Also features implemented in sprint has to be accepted and tested by customer. Until that features are not completed.
I'm writing this because I have similar problem on my current project but I know where we failed.
If the communication issue between the Team and the Customer is not fixed, the situation could be worse with waterfall, if the customer only sees the product once it is complete (tunnel effect).
You commented changes from sprints 6-7 started to cause rework of tasks achieved in earlier sprints. Those changes should have been detected earlier - during the Sprint Review.
If there is a misunderstanding in a feature description, and the Team does not implement what the customer is expecting, this should be detected no later than the Sprint where the feature is implemented, and ideally fixed in the current Sprint.
If the customer changed it's mind, the new ideas shall be added to the Product Backlog, prioritized and selected for a Sprint, as any other backlog item. This should not been deemed as rework.
Do you deliver the software to the customer after each sprint, or are you just demoing it ?
The origin of the miscommunication could be at the Sprint Planning: the Team should only commit on Backlog Item that are clearly defined. The definition of the items should comprises the acceptance criteria. Is the customer the Product Owner, and is it the Product Owner ?
Remote debugging of a development process is sufficiently difficult that I would hesitate to offer any opinion about what you should do. It seems to me noone outside your team can plausibly have enough information to make a very useful judgement about that.
A lesser jump to a conclusion would be to make a guess as to what went wrong. From your description, it sounds like early deliverables, which you thought were progress in the bank, ended up being majorly reworked.
One common cause of that is the late discovery/creation of 'all' requirements, things that are supposed to be true about everything in the scope of the project. These can be pretty fatal if taken seriously: something as simple as 'all dialog boxes must be resizable' is, for example, apparently beyond the capability of Microsoft to retrofit to Windows.
A classic account of this kind of failure (albeit in a non-agile project) can be found here
"Once they saw the product of the code we wrote, then they would say, 'Oh, we've got to change this. That isn't what I meant,'" said SAIC's Reynolds. "And that's when we started logging change request after change request after change request."
For example, according to SAIC engineers, after the eight teams had completed about 25 percent of the VCF, the FBI wanted a "page crumb" capability added to all the screens. Also known as "bread crumbs," a name inspired by the Hansel and Gretel fairy tale, this navigation device gives users a list of URLs identifying the path taken through the VCF to arrive at the current screen. This new capability not only added more complexity, the SAIC engineers said, but delayed development because completed threads had to be retrofitted with the new feature.
The key phrase there is 'all the screens'. In the face of changes of that nature, then, unless you have some pre-existing tool support you can just switch on (changing all background colours really should be trivial), you are in trouble. The progress you think you had made up to that point will have retroactively turned out to be illusory.
The only known approach to such issues is to get them right first time. If that fails, live with having them wrong.
A lot of shops add Agile trimmings to make themselves "look Agile" to customers who expect it. Maybe you just need to add some Waterfall trimmings, and show them the product once every 2 sprints.
I believe your client is wrong to move to waterfall. It's curing the symptom, not the disease.
The problem you describe is one of communication - the client wants a tiger, you're giving them a cat.
The waterfall model includes many steps to verify that the requirements as written are being delivered - but it doesn't ensure that the written requirements are what the business meant.
I would look at techniques like impact mapping, behaviour-driven development (BDD) and story mapping to improve communication.

Agile - User Story Definitions [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
I'm writing a small app for my friend's business, and thought I'd take the opportunity to brush up on some Agile Project Management training I did at the start of the year.
I (and I think, my current organisation!) have always struggled with gathering requirements in the form of User Stories, which take the form:
As a [User Type] I want [feature] so that [some benefit]
I'm always tempted to miss out the beginning and end, and just leave the feature - but this then just becomes requirements gathering the old way!
But I don't want to just make it fit, so that I can say 'I'm doing Agile'.... for example, if I know that the user is to be presented with a list of items, then the reason is self-evident, is it not?
e.g.
As a [Store Manager] I want [to see a list of Stock Items] so that ... ?
Is it normal practice to leave out the [so that] clause?
We used to miss it out as well. And by leaving it out we missed a lot.
To understand the feature properly and not just do the thing right but DO THE RIGHT THING it is key to know WHY the feature, and for that the next key is WHO (the role)
In DDD terms, stakeholder. Stakeholders can be different, everyone who cares. From programmers and db admins to all the types of users.
So, first understand, who is the stakeholder, then you know 50% of WHY he cares, then the benefit, and then it is already almost obviously WHAT to implement.
Try to not just write "as a user". Specify. "as store manager", or even "as the lead of the shift responsible for closing the day", i need....so that....
Maybe you can implement something different which will give the same stakeholder even better benefit!!!
Try, To Achieve [Business Value] As [User] I need [Feature].
The goal is to focus on the value the feature delivers. It helps you think in vertical slices, which reduces pure "technical tasks" that aren't visible. It's not an easy transition, but when you start thinking vertically you start really being able to reduce the waste in your process.
Another way is to thinking of the acceptance tests that your customer could write to ensure the feature would work. It's a short jump to then using something like FitNesse to automated those tests.
No, it's actually not obvious - there are a lot of reasons to want to see a list, a lot of things you might want to with it - scan it for some info, get an overview, print it, copy and paste it into a word document etc. And what exactly it is will give you valuable hints on reasonable implementation details - formatting of the list, exact content; or even a hint that a different feature might be a better idea to satisfy that need. Don't be surprised to find out that the reason actually is "so that I can count the number of entries"...
Of course, this might in fact not apply to you. My actual point in fact is that there are reasons that people came up with this template - and there are also reasons that a lot of experienced people don't actually use it. And when you are new to the practice, you are not in a good position to assess all the pros and cons of following a practice, so I'd highly recommend to simply try to follow it closely for some time. You might be surprised by the usefulness of it - or not, in which case you still learned something and can drop it with a clear concise... :)
User Stories is another way of saying you need to interview your users to find out what they want and what problems they are trying to solve. That the heart of having this in agile development. If the form is not working for your then take a step back and try a different approach that feels more natural to you or better suited to your capabilities as a writer.
In short don't feel like you have to be in a straight jacket. The important thing is that you follow the spirit of the methodology.
In this specific case you want to get a list of what problems the user has, why they are problems, and what they think will help them.
I think you should really try to get a reason defined, even if it may seem obvious. If you can't come up with a reason then why build the feature in the first place? Also the reason may point out other deficiencies in the design that could trigger improvements in other areas.
I often categorize my stories by the user/persona that it primarily relates to, thus I don't put the user's identity in the story title. My stories also are bigger than some agile methodologies suggest. Usually, I start with a title. I use it for planning purposes. Once I get close to actually working on that story, I flesh it out with some details -- basic idea, constraints, assumptions, related stories -- so that I capture more of the information that I know about it. I also keep my stories in a wiki, not on note cards. I understand the trade-off -- i.e., I may spend too much time on details before I need them, but I am able to capture and share it with, typically, off-site customers easily.
The bottom line for me is that Agile is a philosophy, rather than a specification. There are particular implementations that may (strongly) suggest that you do things a certain way and may be non-negotiable on some items. For example, it's hard to say you're doing XP if you don't pair program. In general, though, I would say that most agilists would say that you ought to do those things that work for you, in the way that they work for you -- as long as they are consistent with the general principles, you can still call yourself agile. The general principles would include things like release early/release often, unit testing, short iterations, acknowledge that change will happen, delay detailed planning until you are ready to implement, ...
Bottom line for me: if the stories work for you without the user and rationale -- as long as you understand who the user is and why they want something -- do it however you want. Just don't require a complete specification before you start implementing.

How do you measure if an interface change improved or reduced usability?

For an ecommerce website how do you measure if a change to your site actually improved usability? What kind of measurements should you gather and how would you set up a framework for making this testing part of development?
Multivariate testing and reporting is a great way to actually measure these kind of things.
It allows you to test what combination of page elements has the greatest conversion rate, providing continual improvement on your site design and usability.
Google Web Optimiser has support for this.
Similar methods that you used to identify the usability problems to begin with-- usability testing. Typically you identify your use-cases and then have a lab study evaluating how users go about accomplishing certain goals. Lab testing is typically good with 8-10 people.
The more information methodology we have adopted to understand our users is to have anonymous data collection (you may need user permission, make your privacy policys clear, etc.) This is simply evaluating what buttons/navigation menus users click on, how users delete something (i.e. changing quantity - are more users entering 0 and updating quantity or hitting X)? This is a bit more complex to setup; you have to develop an infrastructure to hold this data (which is actually just counters, i.e. "Times clicked x: 138838383, Times entered 0: 390393") and allow data points to be created as needed to plug into the design.
To push the measurement of an improvement of a UI change up the stream from end-user (where the data gathering could take a while) to design or implementation, some simple heuristics can be used:
Is the number of actions it takes to perform a scenario less? (If yes, then it has improved). Measurement: # of steps reduced / added.
Does the change reduce the number of kinds of input devices to use (even if # of steps is the same)? By this, I mean if you take something that relied on both the mouse and keyboard and changed it to rely only on the mouse or only on the keyboard, then you have improved useability. Measurement: Change in # of devices used.
Does the change make different parts of the website consistent? E.g. If one part of the e-Commerce site loses changes made while you are not logged on and another part does not, this is inconsistent. Changing it so that they have the same behavior improves usability (preferably to the more fault tolerant please!). Measurement: Make a graph (flow chart really) mapping the ways a particular action could be done. Improvement is a reduction in the # of edges on the graph.
And so on... find some general UI tips, figure out some metrics like the above, and you can approximate usability improvement.
Once you have these design approximations of user improvement, and then gather longer term data, you can see if there is any predictive ability for the design-level usability improvements to the end-user reaction (like: Over the last 10 projects, we've seen an average of 1% quicker scenarios for each action removed, with a range of 0.25% and standard dev of 0.32%).
The first way can be fully subjective or partly quantified: user complaints and positive feedbacks. The problem with this is that you may have some strong biases when it comes to filter those feedbacks, so you better make as quantitative as possible. Having some ticketing system to file every report from the users and gathering statistics about each version of the interface might be useful. Just get your statistics right.
The second way is to measure the difference in a questionnaire taken about the interface by end-users. Answers to each question should be a set of discrete values and then again you can gather statistics for each version of the interface.
The latter way may be much harder to setup (designing a questionnaire and possibly the controlled environment for it as well as the guidelines to interpret the results is a craft by itself) but the former makes it unpleasantly easy to mess up with the measurements. For example, you have to consider the fact that the number of tickets you get for each version is dependent on the time it is used, and that all time ranges are not equal (e.g. a whole class of critical issues may never be discovered before the third or fourth week of usage, or users might tend not to file tickets the first days of use, even if they find issues, etc.).
Torial stole my answer. Although if there is a measure of how long it takes to do a certain task. If the time is reduced and the task is still completed, then that's a good thing.
Also, if there is a way to record the number of cancels, then that would work too.

Resources