What is a salami attack? - security

I searched online and could only come up with this gibberish definition:
Salami attack refers to a series of many small actions, often
performed by clandestine means, that as an accumulated whole produces
a much larger action or result that would be difficult or unlawful to
perform all at once
I could understand that it's a series of small attacks
What does "clandestine means" refer to?
How does the attack take place?
And what are the countermeasures for it?

An example may help.
There are some relatively famous cases of, e.g., employees installing software or hardware and manipulating transactions in such minuscule amounts that no one identifies the fraud for a long time.
One relatively well-known example is a case where chips were installed on gasoline pumps, overcharging customers only ever so slightly - this is also what "clandestine" refers to. The stolen sum becomes very large over time, but the individual transactions are so small that they are barely noticeable.
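To make the mechanism concrete, here is a toy sketch (not taken from any real case) of the classic rounding-skim variant of salami fraud: each transaction is rounded down to the cent and the leftover fraction is quietly diverted. All the amounts are made up for illustration.

```python
from decimal import Decimal, ROUND_DOWN

# Toy illustration of a "salami" rounding skim: every payment is rounded
# DOWN to the cent and the shaved-off fraction is diverted.
payments = [Decimal("10.3471"), Decimal("7.2685"), Decimal("3.9999")] * 10_000

skimmed = Decimal("0")
for amount in payments:
    credited = amount.quantize(Decimal("0.01"), rounding=ROUND_DOWN)  # customer sees this
    skimmed += amount - credited                                      # attacker keeps this

print("Per-transaction slice: well under a cent")
print(f"Total skimmed over {len(payments)} transactions: {skimmed:.2f}")
```

Each individual slice is far too small to notice, but over tens of thousands of transactions the total is substantial - which is exactly the pattern described above.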
To transfer this to a non-IT example:
Not salami:
Bank robbery, stealing 2 million directly.
Salami:
Being a bank employee and stealing minuscule amounts over a long period of time, accumulating 2 million.
Further reading:
http://www.mekabay.com/nwss/116p--salami_fraud.pdf

Related

Procedural modelling of classical Chinese visions of political order

The problem I’m dealing with at the moment involves a system described in the Guanzi. A large section of the book is about how governments should work to extract a surplus from the economy which they can redistribute to ensure the loyalty of existing followers and gain new ones. Under this system, whoever can redistribute the most wealth becomes the overall leader. However, he also has to out-compete the other individuals in the system: they are all busy trying to establish their own redistribution networks.
The result is a series of pyramid-shaped redistribution networks, both independent and nested.
[Image: simplified visual representation of the expected outcome]
These are dynamic across time and space. Gaining resources lets you acquire more followers, which in turn gives you access to more resources. There is also a random component involved: a bad harvest or a war may wipe out your resources. If one leader runs out of resources (whether as a result of a disaster or because he redistributed them too generously among his followers), he will either be supplanted by a follower or his network will collapse and its members leave to join other networks.
I think it is possible to model this algorithmically.
We can assume that willingness to share resources is innate.
Generosity = propensity score
An individual acquires followers as a function of both the surplus resources he possesses and his willingness to share them.
Followers[t] = Surplus[t-1] * Generosity
It is worth noting that growth is endogenous in this model. It is a product of whatever economic growth coefficient is deemed realistic given technology and natural resources (a), as well as of the previous cycle’s surplus and the number of followers an individual has, on the basis that these constitute factors of production. (Note: I'm not interested in getting actual monetary values out of this, just modelling the relationships. I understand that if you plugged real numbers into it people would end up redistributing more than they own.)
Growth[t] = a * (Surplus[t-1] * Followers[t-1])
At T=0 the surplus enjoyed by each individual in the system must be generated randomly.
Surplus[0] = randomly generated number
Followers generate additional resources for their leader, but they also need to be remunerated, meaning that they simultaneously deplete their leader’s resources, proportional to his generosity propensity score. A random component must also be included, as mentioned above, to account for famines, bumper crops, wars etc.
Surplus[t] = RandomComponent * (Surplus[t-1] + Growth[t]) - (Followers[t-1] * Generosity)
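A minimal sketch of those update rules as Python functions; the uniform shock distribution is my own assumption, since the text only says a random component is needed:

```python
import random

def followers(prev_surplus, generosity):
    # Followers[t] = Surplus[t-1] * Generosity
    return prev_surplus * generosity

def growth(a, prev_surplus, prev_followers):
    # Growth[t] = a * Surplus[t-1] * Followers[t-1]
    return a * prev_surplus * prev_followers

def surplus(prev_surplus, growth_t, prev_followers, generosity):
    # Surplus[t] = shock * (Surplus[t-1] + Growth[t]) - Followers[t-1] * Generosity
    shock = random.uniform(0.5, 1.5)   # stands in for famines, wars, bumper crops
    return shock * (prev_surplus + growth_t) - prev_followers * generosity
```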
Once these relationships have been defined, the algorithm is relatively simple (a code sketch of these steps follows the list below):
T1:
Each individual checks the Surplus*Generosity score of the nearest individual who is not already following him. If Individual A’s SG > Individual B’s SG, then Individual B moves closer to Individual A and becomes his follower. (Note: If individual B has followers of his own, he carries them with him. Also: Followers automatically re-check their leader's SG in every round, since he is the closest individual to them. They will leave his network to become free agents once more if his SG drops below their own.)
Otherwise, he does nothing.
T2:
Each individual’s stats (Followers, Surplus) are recalculated based on the new situation.
Step 1 is repeated.
T3:
Repeat the previous step.
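Here is a rough, self-contained sketch of how those steps could be coded as an agent-based loop. Everything below is an assumption made for illustration, not part of the original description: the number of agents, the growth coefficient, the shock distribution, counting actual followers rather than using the Followers formula, and letting agents move alone rather than dragging their own followers with them.

```python
import random

A = 0.05                     # assumed economic growth coefficient
N_AGENTS = 50
N_ROUNDS = 100

class Agent:
    def __init__(self):
        self.x, self.y = random.random(), random.random()
        self.generosity = random.random()        # innate propensity to share
        self.surplus = random.uniform(1, 10)     # Surplus at t = 0
        self.leader = None
        self.followers = 0                       # direct followers only

    def sg(self):
        return self.surplus * self.generosity

def nearest_non_follower(agent, agents):
    candidates = [b for b in agents if b is not agent and b.leader is not agent]
    if not candidates:
        return None
    return min(candidates, key=lambda b: (b.x - agent.x) ** 2 + (b.y - agent.y) ** 2)

agents = [Agent() for _ in range(N_AGENTS)]

for t in range(N_ROUNDS):
    # T1: follow (or abandon) the nearest individual with a higher S*G score
    for a in agents:
        if a.leader is not None and a.leader.sg() < a.sg():
            a.leader = None                      # leader's score dropped: leave
        elif a.leader is None:
            b = nearest_non_follower(a, agents)
            if b is not None and b.sg() > a.sg():
                a.leader = b
                a.x, a.y = (a.x + b.x) / 2, (a.y + b.y) / 2   # move closer

    # T2: recalculate Followers and Surplus based on the new situation
    for a in agents:
        a.followers = sum(1 for b in agents if b.leader is a)
    for a in agents:
        growth = A * a.surplus * a.followers
        shock = random.uniform(0.5, 1.5)         # famine, war, bumper crop
        a.surplus = max(0.0, shock * (a.surplus + growth) - a.followers * a.generosity)
        if a.surplus == 0.0:                     # resources exhausted: network dissolves
            for b in agents:
                if b.leader is a:
                    b.leader = None

biggest = max(agents, key=lambda a: a.followers)
print(f"Largest network: {biggest.followers} followers, "
      f"leader generosity {biggest.generosity:.2f}")
```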
One would expect the individuals with the optimal generosity score to build the biggest networks, as they acquire followers without completely depleting their resources.
I suspect – but am not sure – that this model’s characteristics are similar to those of an L-system model.
Individuals are programmed with a simple instruction: “If the person closest to you has a higher S*G score than you do, approach and follow him.”
On the basis of this, the individuals form structures (from the perspective of the individual with the optimal S*G score, they appear to cluster around him in a semi-structured way).
These structures grow with every successive time period
They collapse after depleting their own resources, or when a random disaster strikes.
After a collapse, the process automatically begins again.
However, I'm not a maths or a computing guy (I'm a Chinese philosophy guy) so I'm not sure if I'm just being fooled by a superficial resemblance or not. Is this a genuine example of string rewriting or am I just convincing myself it is because you get tree-like structures out of it? Is this even a model that can work at all? Have I totally messed up my equations? (I haven't done this since high school, so it's highly probable.)
All help is gratefully received.

A/B testing for product features

If Yelp wanted to understand whether ratings help users pick a listing, and we use CTR as the success metric for the A/B test, how do we know that a significant change in CTR is due to just the ratings and not other parts of the listing, like the reviews?
Do we have to do some kind of user segmentation instead of randomly assigning users before running the A/B test?
Randomization takes care of all other variables but the treatment. A test of statistical significance takes care of the choice between the treatment and chance. It's only when you can't do a randomized trial that you need to control for other differentiators.
You generally want to trust randomization for most experiments. Randomization is an unbiased process that, with enough users, controls for all possible confounding factors, both known (e.g. age, gender and OS) and unknown (e.g. personality, hair color and sophistication), making comparisons between test and control groups balanced and fair. Since both groups are exposed and measured simultaneously, A/B testing also corrects for temporal and seasonal effects. Statistically significant differences between the test and control groups can be directly attributed to the change being tested. I wrote more about this in a blog post.
Going with custom user segmentation is usually reserved for rare instances where randomization can be expected to produce unbalanced groups. An example: you split a room of 100 people into two groups, but Bill Gates and Elon Musk are in this room. Depending on what metric you want to measure, they can severely skew things, and randomization will put both billionaires in the same group half the time. This is a scenario where it's worth doing a custom segmentation and enforcing that they end up in different groups. But this sort of situation is rare and rarely affects binary metrics like CTR.
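As a concrete illustration of the "statistical significance" piece, here is a minimal sketch of a two-proportion z-test on CTR for the control and treatment groups. The click and user counts are made up; a real experiment would normally rely on a statistics library or the experimentation platform's own engine.

```python
from math import sqrt, erfc

# Hypothetical counts: (clicks, users) for control and for the ratings treatment.
clicks_a, users_a = 1_150, 20_000     # control CTR  ~5.75%
clicks_b, users_b = 1_320, 20_000     # treatment CTR ~6.60%

p_a, p_b = clicks_a / users_a, clicks_b / users_b
p_pool = (clicks_a + clicks_b) / (users_a + users_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / users_a + 1 / users_b))

z = (p_b - p_a) / se
p_value = erfc(abs(z) / sqrt(2))      # two-sided p-value from the normal tail

print(f"CTR control={p_a:.4f}, treatment={p_b:.4f}, z={z:.2f}, p={p_value:.4f}")
```

If the p-value is below your chosen threshold (commonly 0.05), the CTR difference between the randomized groups is attributed to the change being tested rather than chance.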

Diff Algorithm for Legislation

As part of an ambitious project, I am attempting to better understand the legislative text that is written into bills introduced in the U.S. Congress. I have electronic versions of recent bills, and am attempting to implement an algorithm that would compare a bill with prior bills, looking for similarities. The hypothesis is that many bills that fail end up getting co-opted into other bills.
Obviously, this is a large task. Many questions exist regarding difference engines, but my issue is slightly different. Many times bills are introduced that package several ideas together. So the difference engine would need to compare portions of bills, not the entire bills.
Any recommendations on difference algorithms or a method to go about doing this? I have access to serious computational power, but do keep in mind that I will be using a dataset of about 100,000 bills.
Take a look at Simian - Similarity Analyser. It works for plain text as well as code.
Very interesting idea. I would start by looking into longest common subsequence algorithms, and see about adapting them to (1) report any common sequence over some threshold, say 20 words, and (2) handle a bit of fuzziness, in case a word or two gets changed. I'd also suggest looking at the diff source code to start.
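A minimal sketch of that idea using Python's difflib, which gives you longest-common-subsequence-style matching blocks without writing the algorithm yourself. The 20-word threshold comes from the suggestion above; handling a changed word or two would need something fuzzier than this exact-match version.

```python
from difflib import SequenceMatcher

def shared_passages(bill_a: str, bill_b: str, min_words: int = 20):
    """Return word runs of at least `min_words` that appear in both bills."""
    a_words, b_words = bill_a.split(), bill_b.split()
    matcher = SequenceMatcher(None, a_words, b_words, autojunk=False)
    return [
        " ".join(a_words[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]

# Usage sketch: normalise case and punctuation first, then compare a new bill
# against prior bills (and index or shingle the corpus first, since comparing
# all 100,000 bills pairwise would be expensive).
# print(shared_passages(open("bill_a.txt").read(), open("bill_b.txt").read()))
```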

Estimation technique for small business owners with small budget [closed]

Summary
We are a startup providing software development services. We develop Windows, web, services and mobile applications. We are aware of agile and we are Scrum-certified developers. We do user-story-based estimation and task planning. No issues there.
Issue
We are approached by many small customers. A customer describes very high-level features or gives a few words about the concept of their dream project, then asks for effort and cost estimation. Mostly they are interested in cost.
For each customer we created the user stories, estimated them, and based on the story points estimated the effort in days, converting the days to a cost using our hourly rate. We involve a team of 3 or 4 people to get the estimation done and spend at least 20 to 30 hours of total team time on it (a team of 4 discussing for 5-6 hours).
The problem is that many of these customers never come back. We do not want to spend 20-30 hours of team effort, and we don't want to use the exact user-story estimation process that we follow for signed projects.
Question
What could be done to provide an approximate estimate for small customers with small budgets?
I don't know that there is a solution, other than finding 'better' customers. It sounds to me like you're doing it right. Non-technical customers often want you to spend 30 minutes on the phone with them and then give them a price for the whole thing, so it's good that you take the time to do it properly. However, you then often waste your time.
Maybe you need to say 'no' to customers who you don't think are serious. Or charge for the time spent doing highly skilled estimation work.
By 'better' customers I mean bigger companies, who are more experienced with software (and also probably have bigger budgets). The downside is more paperwork - you are much more 'free' dealing with small firms but also more at risk.
You don't have to stick to a fixed-price contract; for requirements that are vague you should look at doing time and materials.
Basically you need to spread the risk of the cost of overrun between you and the client.
A hybrid option might be to do some T&M proof of concept work then fixed price for the rest when you understand it better.
Alternatively, if your client has a pot of money, then use your agile strengths to work with the customer to incrementally deliver functionality until they run out of money.
"We do not want to spend 20-30 hours of team effort."
Then don't.
If your estimating method is too costly, stop doing it.
"Customers ... are interested in Cost."
Then get them a cost more quickly. Do less work. Don't use a team of 3-4 for 20-30 hours. Have one person do it quickly.
One person can create a spreadsheet with stories, story points, priority, hours and cost. That's the project backlog for Scrum. That's the estimate. That's enough to start a conversation. It's not a fixed-price estimate.
A simple spreadsheet with stories, story points, price and priority is all you need. Then you can work with the customer to adjust priorities to determine how many of those stories they can actually afford to buy.
If they want a fixed price, you simply need to review each story summary to see if the points are right. You already have the spreadsheet, and the priorities, and the formula to compute price.
The unfortunate short answer is you are going to have to significantly reduce your estimation costs. The only way to do that is to reduce the number of people to one and use a formula approach.
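A minimal sketch of what that one-person, formula-driven estimate could look like. The velocity, hours per day and hourly rate below are placeholders, not figures from the question; substitute your own historical numbers.

```python
# Placeholder figures -- substitute your own history.
POINTS_PER_DAY = 2        # assumed velocity per developer-day
HOURS_PER_DAY = 8
HOURLY_RATE = 50          # assumed billing rate

def quick_estimate(backlog):
    """backlog: list of (story, story_points, priority) tuples."""
    total_points = sum(points for _, points, _ in backlog)
    days = total_points / POINTS_PER_DAY
    cost = days * HOURS_PER_DAY * HOURLY_RATE
    return days, cost

backlog = [
    ("User registration", 3, "must"),
    ("Product catalogue", 5, "must"),
    ("Reporting dashboard", 8, "nice-to-have"),
]
days, cost = quick_estimate(backlog)
print(f"~{days:.1f} developer-days, ~${cost:,.0f} (ball park, not a fixed price)")
```

The spreadsheet version of this is exactly the backlog described above; the point is that one person can produce it quickly and then use it to start the conversation about priorities and affordability.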
Take a best guess. Put reasonable assumptions in the contract. If you do a decent job, some will be a little high, some will be a little low and it will average out in the end. If you are off by too much, track changes within the project and charge for them. The key will be in the assumptions placed in the contract, usually in the form of a statement of work.
This is normal business, from small projects to enterprise projects. Just a matter of scale. It doesn't have to be done this way, but it often is.
This reminds me of the challenge of balancing time, money, people and overall quality. Some people can see this easily and others may struggle with the idea. Part of the key point here is to understand what kind of expectations you want to set and what kind of leeway you have with the customer. For example, how are bugs in the software or overall support factored into the project?
You may want to consider how much work you want to do upfront and at what scale you are spending the 20-30 hours estimating a project. Compare the cost of that much time spent generating an estimate (which at $40/hour is $800-1,200, by the way) to what the revenue from the project would be. If the entire project is $400, was it worth spending twice that coming up with an estimate? On the flip side, for million-dollar projects it may well make sense to spend that kind of money.
My suggestion would be to see if there is a cookie-cutter approach that could be taken for their projects, so that there isn't as much variability between them, if that is possible.
There are two sides to estimation. First, creating the initial 'ball park' figure: this should be done relatively quickly, and it should be emphasised that it is a ball-park figure and the start of your conversation with your customer, not a contract. Second, you do your more detailed team-based estimation.
This is how this consulting company does it - Ball Park Estimating
Create a template spreadsheet with times and costs for typical pieces of work, look at the initial requirement and update your template. Start the conversation with: this is the ball park, but we will need to work together to confirm a final price and get a more accurate estimate; that more accurate estimation process will take x hours and cost x dollars.

With agile estimating, is it true some say to choose intervals like 1/2 to 1.5 days only? [closed]

It tends to be a good rule of thumb (agile or not) that your tasks should be broken down into at most 1 - 2 day increments.
The idea is that if you have larger chunks than that, then you haven't broken the task down enough, and you will be more likely to miss the estimate, and miss it by larger amounts of time, than had you broken it down. Often when you break a task down you discover your initial estimate was off, and since you have broken it into more concrete tasks, your estimate is now more accurate, more trackable and more meaningful.
For tasks that are coming up on your to-do list soon you should pay attention to this, but for long-range planning, where you haven't necessarily thought the feature out in detail, I think larger estimates / tasks not yet broken out are OK.
Here's a link to Joel Spolsky talking about this. Take a look at item #5 about half way down the page.
http://www.joelonsoftware.com/articles/fog0000000245.html
In my experience, any estimate that's longer than 2 days is probably hiding serious work that should be broken down further. Such estimates have a very high probability of going over. Try to break everything down into smaller chunks so that no individual chunk costs more than 1-2 days.
There are advantages to keeping the estimates short. It forces you to break up large tasks into small, discrete tasks that can be measured and discussed quickly, which helps promote the entire Agile development process.
That being said, I almost never keep a "rule" as a hard and fast rule with things like this. I'd say this is a good guideline, however.
My team consists of junior programmers (university students) and we've found that it's generally easier if we break all the large tasks down into a bunch of smaller ones. It involves more forward-thinking, but in the end we are more productive and it's easier to evaluate our progress. It also brings a sense of achievement when you have something completed at the end of the day.
I would agree with that guideline. Anytime I have ever taken on a 5 day task, it has degenerated to a three week nightmare. Large estimates indicate you didn't learn enough about the problem up front to know what is involved, because if you had, you could have found ways to break it up better.
I don't agree. If a team's iterations are two weeks long, then of those 10 days, 1 day would be spent on iteration close (show & tell), iteration planning and tasking or planning poker.
When playing planning poker, a team uses either geometric or Fibonacci progressions for estimates. For example, cards would contain values such as 1, 2, 4, 8, 16 or 1, 2, 3, 5, 8, 13. Each number reflects the number of days of development for a pair of programmers.
For each card, once discussion has occurred, each member simultaneously plays the card that reflects their estimate. If the majority of the team converges on the same estimate, the estimate is accepted. If there is much variation in the estimates, further discussion occurs (members explain the reason for their estimates) and another round of voting takes place. This occurs until consensus is reached.
If a number greater than 8 is picked, the card is deemed too big and is refactored into at least 2 smaller cards. The reason is that such a large estimate indicates the card is too big to be completed in a single iteration, and any estimate for it is very likely to be inaccurate.
Using such a method brings commitment from the team members to deliver everything they have committed to, and even for a new team the estimates become accurate enough that carry-over of cards soon becomes a low risk.
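A rough sketch of the voting rules just described; the card deck and the "majority" convergence criterion are my reading of this answer, not a standard from any framework.

```python
from collections import Counter

CARDS = [1, 2, 3, 5, 8, 13]   # Fibonacci-style planning poker deck
MAX_CARD = 8                  # anything above this gets split

def vote_round(estimates):
    """estimates: the card each team member played, e.g. [3, 5, 5, 5]."""
    value, count = Counter(estimates).most_common(1)[0]
    if count * 2 <= len(estimates):
        return "no consensus: discuss and re-vote"
    if value > MAX_CARD:
        return "too big: split the card into at least two smaller ones"
    return f"accepted at {value}"

print(vote_round([5, 5, 5, 3]))      # accepted at 5
print(vote_round([3, 8, 13, 2]))     # no consensus: discuss and re-vote
print(vote_round([13, 13, 13, 13]))  # too big: split the card
```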
You can find a very good post about agile estimation and planning on the agile42 blog: Just enough, just in time
A lot of good answers here, so I'll play devil's advocate and approach it from a different side.
There's a possible problem with breaking down things into very small estimates (# of hours) when doing things such as release planning. David Anderson discusses it in his (excellent) book Agile Management for Software Engineering.
Basically, the idea is that with a task that is very small, a developer will pad his estimate by a fair bit (say, turning a half hour into an hour, or doubling it) because of a certain amount of ego that would be bruised if the developer failed to complete such a small task in the estimated time. These local buffers add up quite a bit and result in a global buffer that's far bigger than it needs to be.
This is less of a problem if you stick with 0.5 days as a minimum - it's basically assumed that there's some buffer in there, so you won't need to pad it any more.
I feel there is a bit of mixing of information and overlap in this thread... allow me to make my point :-)
1) The Fibonacci sequence, which is very much used in the Planning Poker technique from Mike Cohn, is about estimating the "complexity" of user stories, which are - as Cam said - normally written on cards and entail more than one task, at least all of those needed to make a story shippable (Ken Schwaber, Alistair Cockburn, Mike Cohn...).
2) The tasks needed to complete a story are normally estimated in ideal hours or Pomodori (Francesco Cirillo, "The Pomodoro Technique"). If you estimate in ideal hours, the rule of thumb is normally to keep them between 1/2 day (3 ideal hours) and 2 days (12 ideal hours) of work. The reason is that this gives the team more qualitative status information, with a team member reporting a task as done at least every two days, which is much more "valuable" than a 60% done. If you use Pomodori, they are implicitly timeboxed to 25 minutes each.
The reason to keep tasks small comes basically from empirical process control theory, in which, through transparency and regular inspection and adaptation, you can better check the progress of your work by quantifying it. The goal of having smaller tasks is to be able to clearly describe and envision in detail what will actually be done, without adding too much guessing on top of the natural uncertainty that comes from having to predict the future. Moreover, defining an outcome and a shorter timebox allows people to keep focused, with enough sense of urgency to make it a challenging and motivating experience.
I would also pick up the point about "motivation" and "ego" - from Chris - by adding that a good way to keep people committed and motivated is to define the expected outcome of a task, so that you can measure the results upon completion and celebrate the success. This idea is encapsulated in the Pomodoro Technique, but can also be achieved using ideal hours for estimation. Another interesting part of the Pomodoro Technique is that breaks are considered "first-class citizens" and planned regularly, which is very important, especially in creative and brain-intensive activities :-)
What do you think?
Best
ANdreaT
