Procedural modelling of classical Chinese visions of political order

The problem I’m dealing with at the moment involves a system described in the Guanzi. A large section of the book is about how governments should work to extract a surplus from the economy which they can redistribute to ensure the loyalty of existing followers and gain new ones. Under this system, whoever can redistribute the most wealth becomes the overall leader. However, he also has to out-compete the other individuals in the system: they are all busy trying to establish their own redistribution networks.
The result is a series of pyramid-shaped redistribution networks, both independent and nested.
(Figure: simplified visual representation of the expected outcome.)
These are dynamic across time and space. Gaining resources lets you acquire more followers, which in turn gives you access to more resources. There is also a random component involved: a bad harvest or a war may wipe out your resources. If one leader runs out of resources (whether as a result of a disaster or because he redistributed them too generously among his followers), he will either be supplanted by a follower or his network will collapse and its members leave to join other networks.
I think it is possible to model this algorithmically.
We can assume that willingness to share resources is innate.
Generosity = innate propensity-to-share score (fixed for each individual)
An individual acquires followers as a function of both the surplus resources he possesses and his willingness to share them.
Followers[t] = Surplus[t-1] * Generosity
It is worth noting that growth is endogenous in this model. It is a product of whatever economic growth coefficient is deemed realistic given technology and natural resources (a), as well as of the previous cycle’s surplus and the number of followers an individual has, on the basis that these constitute factors of production. (Note: I'm not interested in getting actual monetary values out of this, just modelling the relationships. I understand that if you plugged real numbers into it people would end up redistributing more than they own.)
Growth[t] = a * Surplus[t-1] * Followers[t-1]
At T=0 the surplus enjoyed by each individual in the system must be generated randomly.
Surplus[0] = randomly generated number
Followers generate additional resources for their leader, but they also need to be remunerated, meaning that they simultaneously deplete their leader’s resources, proportional to his generosity propensity score. A random component must also be included, as mentioned above, to account for famines, bumper crops, wars etc.
Surplus[t] = RandomComponent * (Surplus[t-1] + Growth[t]) - (Followers[t-1] * Generosity)
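To pin the relationships down, here is a rough Python sketch of the update rules exactly as written above. The constants (growth coefficient, shock range) and the starting values are arbitrary placeholders of mine, not values implied by the Guanzi model:

```python
import random

GROWTH_COEFF = 0.05       # "a": arbitrary placeholder for the growth coefficient
SHOCK_RANGE = (0.5, 1.5)  # arbitrary range for the famine/bumper-crop/war multiplier

def update(surplus_prev, followers_prev, generosity):
    """One time step, following the three equations above literally."""
    growth = GROWTH_COEFF * surplus_prev * followers_prev
    shock = random.uniform(*SHOCK_RANGE)
    surplus = shock * (surplus_prev + growth) - followers_prev * generosity
    followers = surplus_prev * generosity
    return surplus, followers

# Example: one individual starting with a random surplus and no followers.
surplus, followers = random.uniform(1, 10), 0
for t in range(5):
    surplus, followers = update(surplus, followers, generosity=0.3)
    print(t, round(surplus, 2), round(followers, 2))
```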
Once these relationships have been defined, then the algorithm is relatively simple:
T1:
Each individual checks the Surplus*Generosity score of the nearest individual who is not already following him. If Individual A’s SG > Individual B’s SG, then Individual B moves closer to Individual A and becomes his follower. (Note: If individual B has followers of his own, he carries them with him. Also: Followers automatically re-check their leader's SG in every round, since he is the closest individual to them. They will leave his network to become free agents once more if his SG drops below their own.)
Otherwise, he does nothing.
T2:
Each individual’s stats (Followers, Surplus) are recalculated based on the new situation.
Step 1 is repeated.
T3:
Repeat the previous step.
One would expect the individuals with the optimal generosity score to build the biggest networks, as they acquire followers without completely depleting their resources.
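For what it is worth, here is a rough agent-based sketch of the whole loop in Python, with several simplifications of my own: a one-dimensional "geography", arbitrary constants, follower numbers read off the actual leader links rather than the Followers = Surplus * Generosity rule, and no nesting (a defecting follower does not carry his own followers with him):

```python
import random

N_AGENTS = 50
N_STEPS = 20
GROWTH_COEFF = 0.05       # "a" in the growth equation (illustrative value)
SHOCK_RANGE = (0.5, 1.5)  # famine / bumper-crop / war multiplier (illustrative)

class Agent:
    def __init__(self, idx):
        self.idx = idx
        self.pos = random.uniform(0, 100)     # 1-D "geography"
        self.generosity = random.random()     # innate propensity to share
        self.surplus = random.uniform(1, 10)  # Surplus[0]: randomly generated
        self.followers = 0
        self.leader = None                    # index of the current leader, if any

    def sg(self):
        return self.surplus * self.generosity

def step(agents):
    # T2: recalculate each individual's stats from the new situation.
    counts = [0] * len(agents)
    for a in agents:
        if a.leader is not None:
            counts[a.leader] += 1
    for a in agents:
        a.followers = counts[a.idx]
        growth = GROWTH_COEFF * a.surplus * a.followers
        shock = random.uniform(*SHOCK_RANGE)
        a.surplus = shock * (a.surplus + growth) - a.followers * a.generosity

    # T1: the follower-switching rule.
    for a in agents:
        if a.leader is not None:
            leader = agents[a.leader]
            # Followers re-check their leader every round; they defect if his
            # S*G drops below their own or if his resources run out entirely.
            if leader.sg() < a.sg() or leader.surplus <= 0:
                a.leader = None
            continue
        # Nearest individual who is not already following this agent.
        candidates = [b for b in agents if b is not a and b.leader != a.idx]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda b: abs(b.pos - a.pos))
        if nearest.sg() > a.sg():
            a.leader = nearest.idx
            a.pos += 0.5 * (nearest.pos - a.pos)  # move closer to the new leader

agents = [Agent(i) for i in range(N_AGENTS)]
for _ in range(N_STEPS):
    step(agents)

# Inspect the resulting networks: who follows whom after N_STEPS rounds.
for a in agents:
    print(a.idx, "->", a.leader, round(a.sg(), 2))
```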
I suspect – but am not sure – that this model’s characteristics are similar to those of an L-system model.
Individuals are programmed with a simple instruction: “If the person closest to you has a higher S*G score than you do, approach and follow him.”
On the basis of this, the individuals form structures (from the perspective of the individual with the optimal S*G score, they appear to cluster around him in a semi-structured way).
These structures grow with every successive time period.
They collapse after depleting their own resources, or when a random disaster strikes.
After a collapse, the process automatically begins again.
However, I'm not a maths or a computing guy (I'm a Chinese philosophy guy) so I'm not sure if I'm just being fooled by a superficial resemblance or not. Is this a genuine example of string rewriting or am I just convincing myself it is because you get tree-like structures out of it? Is this even a model that can work at all? Have I totally messed up my equations? (I haven't done this since high school, so it's highly probable.)
All help is gratefully received.

Related

What is a salami attack?

I searched online and all I could find was this gibberish definition:
"Salami attack refers to a series of many small actions, often performed by clandestine means, that as an accumulated whole produces a much larger action or result that would be difficult or unlawful to perform all at once."
I understand that it's a series of small attacks.
What does "clandestine means" refer to?
How does the attack take place?
And what are the countermeasures for it?
An example may help.
There are some relatively famous cases of, for example, employees installing software or hardware that manipulates transactions by such minuscule amounts that no one identifies the fraud for a long time.
One relatively well-known example is a case where chips were installed on gasoline pumps, overcharging customers ever so slightly; this covert installation is also what "clandestine" refers to. The stolen sum becomes very large over time, but each individual transaction is so small that it is barely noticeable.
To transfer this to a non-IT example:
Not salami:
Bank robbery, stealing 2 million directly.
Salami:
Being a bank employee and stealing minuscule amounts over a long period of time, accumulating 2 million.
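To make the accumulation concrete, here is a toy sketch (with entirely made-up numbers) of the classic interest-rounding variant, in which the sub-cent remainder of each payment is silently siphoned off:

```python
from decimal import Decimal, ROUND_DOWN

# Toy illustration only: skim the sub-cent remainder of each interest payment.
accounts = [Decimal("1234.56"), Decimal("987.65"), Decimal("5432.10")] * 100_000
RATE = Decimal("0.0137")  # made-up interest rate

skimmed = Decimal("0")
for balance in accounts:
    interest = balance * RATE
    credited = interest.quantize(Decimal("0.01"), rounding=ROUND_DOWN)
    skimmed += interest - credited  # the "slice" nobody notices

print(f"Total skimmed across {len(accounts)} payments: {skimmed:.2f}")
```

Each slice is less than a cent, but over hundreds of thousands of transactions the attacker's balance grows to a very noticeable sum.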
Further reading:
http://www.mekabay.com/nwss/116p--salami_fraud.pdf

Detecting damaged car parts

I am trying to build a system that, given an image of a car, can assess its damage percentage and also find out which parts of the car are damaged.
Is there any possible way to do this using Python and OpenCV or TensorFlow?
The GitHub repositories I found that were relevant to my work are these:
https://github.com/VakhoQ/damage-car-detector/tree/master/DamageCarDetector
https://github.com/neokt/car-damage-detective
But what they provide is a qualitative output (e.g. they say the car damage is high or low), whereas I want to print a quantitative output (percentage of damage) along with the names of the individual parts that are damaged.
Is this possible ?
If so please help me out.
Thank you.
To extend the good answers given by @yves-daoust: this is not a trivial task, and you should not try to solve it all at once with a single approach.
You should ask yourself how a human with a comparable task (say, an expert who reviews these cars at the end of a leasing contract) proceeds. Then you have to formulate requirements and also restrictions for your system.
For instance, an expert first checks for any visible defects and rates them; then they may check technical issues which may well be hidden from optical sensors (e.g. whether the car is drivable, by driving a lap and estimating whether the engine runs smoothly and whether the steering geometry is aligned, i.e. whether the car manages to stay in line, and whether there are any minor vibrations that should not be there); and they may also apply force (manually shaking the wheels to check whether the bearings are OK).
If you restrict your measurement system to just a normal camera sensor, you are somewhat limited in what your system can deliver.
If you just want to spot cosmetic damage, i.e. classify scratches in paint and on rims, I'd say a state-of-the-art machine vision application should be able to help you to some extent:
First, you'd need to detect the scratches. Bear in mind that the visibility of scratches, especially in the field under changing conditions (sunlight), may make this a very hard to impossible task for a cheap sensor. For instance, to cope with reflections a system might need polarizing filters, and special-effect paints may interfere with your optics to the point where you cannot spot anything at all.
Secondly, after you detect the position and dimensions of these scratches in camera coordinates, you need to transform them into real-world coordinates to determine their actual size. It would also be of great use to know the exact location of each scratch on the car, which would require a digital twin of the car and is no longer a trivial task.
After determining the extent of the scratch and its position on the car, you need to apply a cost model, because some parts are easy to fix: a scratch in the bumper just means respraying the bumper, but a scratch in the C-pillar can easily mean repainting the whole back quarter if it is not to remain noticeable.
The same goes for bigger scratches and cracks: the optical detection model needs to be able to distinguish between scratches and cracks (which is very hard to do just by looking at them), and the cost model can then infer the cost, e.g. whether a bumper needs just a respray or a complete replacement (because it is cracked and not merely scratched). This cost model may seem easy, but bear in mind that it needs to be adapted to every car you "scan": damage that is cheap to fix on one car body might be very hard to fix on another. I'd say this might even be harder than spotting the initial scratches, because you'd need to obtain the construction plans and repair part lists of every vehicle you want to quote (repair handbooks and part lists are mostly accessible if you are a registered mechanic, but they may cost licensing fees).
You see, this is a very complex problem composed of multiple hard sub-problems. The easiest, and probably the best, way to tackle it is a bottom-up approach: start with a simple "scratch detector" that just spots scratches in paint, then go from there, and you will quickly see what is possible and what is not.
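As a concrete starting point for such a "scratch detector", here is a minimal classical-CV sketch with OpenCV (Python, OpenCV 4). The file name and all thresholds are placeholders you would have to tune, and a real system would quickly need a trained model instead:

```python
import cv2

# Minimal, classical-CV starting point: find thin, elongated edge features
# that *might* be scratches. All thresholds are rough guesses to tune.
img = cv2.imread("car.jpg")  # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blurred, 50, 150)

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
candidates = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    aspect = max(w, h) / max(1, min(w, h))
    # Keep long, thin features and discard blobs. Panel gaps and reflections
    # will still produce false positives; this is only a first filter.
    if aspect > 4 and max(w, h) > 40:
        candidates.append((x, y, w, h))

for (x, y, w, h) in candidates:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imwrite("car_scratch_candidates.jpg", img)
print(f"{len(candidates)} scratch-like contours found")
```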

A/B testing for product features

If Yelp wanted to understand whether ratings help users pick a listing, and we use CTR as the success metric for the A/B test, how do we know that a significant change in CTR is due to just the ratings and not other parts of the listing, like the reviews?
Do we have to do some kind of user segmentation instead of randomly assigning users before running the A/B test?
Randomization takes care of all variables other than the treatment. A test of statistical significance takes care of distinguishing the treatment effect from chance. It's only when you can't run a randomized trial that you need to control for other differentiators.
You generally want to trust randomization for most experiments. Randomization is an unbiased process that, with enough users, controls for all possible confounding factors, both known (e.g. age, gender, and OS) and unknown (e.g. personality, hair color, and sophistication), making comparisons between test and control groups balanced and fair. Since both groups are exposed and measured simultaneously, A/B testing also corrects for temporal and seasonal effects. Statistically significant differences between the test and control groups can be directly attributed to the change being tested. I wrote more about this in a blog post.
Going with custom user segmentation is usually reserved for rare instances where randomization can be expected to produce unbalanced groups. For example, suppose you split a room of 100 people into two groups, but Bill Gates and Elon Musk are in the room. Depending on what metric you want to measure, they can severely skew things, and randomization will put both billionaires in the same group half the time. That is a scenario where it's worth doing custom segmentation and enforcing that they end up in different groups. But this sort of situation is rare and rarely affects binary metrics like CTR.
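As a concrete illustration of the significance test mentioned above, here is a two-proportion z-test on CTR between the two arms; the counts are made up, and statsmodels is just one common choice:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results: listings with ratings shown (treatment) vs. hidden (control).
clicks = [1580, 1495]         # clicks in treatment, control
impressions = [20000, 20000]  # users (or impressions) in each arm

stat, p_value = proportions_ztest(count=clicks, nobs=impressions)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("CTR difference is statistically significant at the 5% level")
else:
    print("No significant CTR difference detected")
```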

Crowdsourcing reliability measurements - spam/fraud detection

I'd like to collect some kind of geographical information from website users: for a given set of data, they will mark a checkbox indicating whether a place has or does not have a given property. Are there any tools/frameworks for detecting fraudulent or spam submissions based on the whole collected data set (and possibly other information)? I'd like to get filtered, more reliable data.
Not sure if that's exactly what you're asking for, but here are some tips from my experience using Amazon Mechanical Turk:
There are several academic papers dealing with such problems; here is a good one.
In addition, based on the following general recommendations, I've created a custom procedure which worked on my data:
a. Include an open question, and filter out cases where it wasn't answered. It's harder to answer such a question automatically, and it might also be more time-consuming, thus less attractive, for a fraudster.
b. If possible, don't use a binary scale (i.e. a checkbox), but some grade (e.g. 1-4 or 1-6). This would give you more data to work with.
c. If available, filter out cases where the time spent filling in your form was too short (this is especially useful if you include that open question).
d. If you have multiple inputs per user, check for repetitive answers and for users who consistently give far-from-average answers.
If each user submits only a single "form", consider putting more than a single element/question in it, so that you get multiple submissions per user.
e. If you have only a single submission per user or user ID, your options are more limited. I can suggest filtering out outliers (e.g. data points farther than 3 standard deviations from the average), provided you have enough data.
f. After all the filtering, check the agreement or disagreement in your data (e.g. by checking what proportion of your data points fall within x standard deviations from the average). In case of agreement, use the average; in case of disagreement, collect some more data. (A rough sketch of steps c, e and f follows below.)
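A minimal sketch of what steps c, e and f could look like in code, assuming each submission is a row with a numeric answer and the time spent; the file name, column names and thresholds are all made up:

```python
import pandas as pd

# Hypothetical input: one row per submission with columns
# user_id, item_id, answer (e.g. a 1-6 grade) and seconds_spent.
df = pd.read_csv("submissions.csv")  # placeholder file name

# (c) Drop suspiciously fast submissions.
df = df[df["seconds_spent"] >= 10]  # 10 seconds is an arbitrary cutoff

# (e) Drop outliers: answers farther than 3 standard deviations
# from the per-item mean.
mean = df.groupby("item_id")["answer"].transform("mean")
std = df.groupby("item_id")["answer"].transform("std")
df = df[(df["answer"] - mean).abs() <= 3 * std]

# (f) Agreement check: the share of remaining answers within one standard
# deviation of their item's mean; keep only items with enough agreement.
def agreement(answers):
    return ((answers - answers.mean()).abs() <= answers.std()).mean()

per_item = df.groupby("item_id")["answer"].apply(agreement)
reliable_items = per_item[per_item >= 0.7].index  # 0.7 is an arbitrary cutoff
consensus = df[df["item_id"].isin(reliable_items)].groupby("item_id")["answer"].mean()
print(consensus.head())
```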
Hope it helps,

How do you measure if an interface change improved or reduced usability?

For an ecommerce website how do you measure if a change to your site actually improved usability? What kind of measurements should you gather and how would you set up a framework for making this testing part of development?
Multivariate testing and reporting is a great way to actually measure these kinds of things.
It allows you to test what combination of page elements has the greatest conversion rate, providing continual improvement on your site design and usability.
Google Web Optimiser has support for this.
The same methods you used to identify the usability problems to begin with: usability testing. Typically you identify your use cases and then run a lab study evaluating how users go about accomplishing certain goals. Lab testing is typically good with 8-10 people.
The more informative methodology we have adopted to understand our users is anonymous data collection (you may need user permission; make your privacy policies clear, etc.). This simply means evaluating which buttons and navigation menus users click on, and how users delete something (e.g. when changing quantity, are more users entering 0 and updating the quantity, or hitting X?). This is a bit more complex to set up: you have to develop an infrastructure to hold this data (which is really just counters, e.g. "Times clicked X: 138838383, Times entered 0: 390393") and allow data points to be created as needed to plug into the design.
To push the measurement of a UI improvement upstream from the end user (where data gathering could take a while) to design or implementation, some simple heuristics can be used:
Is the number of actions it takes to perform a scenario lower? (If yes, it has improved.) Measurement: # of steps reduced/added.
Does the change reduce the number of kinds of input devices needed (even if the # of steps is the same)? By this I mean: if you take something that relied on both the mouse and the keyboard and change it to rely only on the mouse or only on the keyboard, then you have improved usability. Measurement: change in # of devices used.
Does the change make different parts of the website consistent? E.g. if one part of the e-commerce site loses changes made while you are not logged in and another part does not, this is inconsistent. Changing them so that they have the same behavior (preferably the more fault-tolerant one!) improves usability. Measurement: make a graph (really a flow chart) mapping the ways a particular action can be done. Improvement is a reduction in the # of edges in the graph.
And so on... find some general UI tips, figure out some metrics like the above, and you can approximate usability improvement.
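As a trivial illustration, each of the heuristics above reduces to a countable quantity that can be compared before and after a change; the scenario numbers below are invented purely for illustration:

```python
# Each scenario is described by its step count, the set of input devices it
# needs, and the number of edges in its flow chart (alternative paths included).
checkout_before = {"steps": 7, "devices": {"mouse", "keyboard"}, "flow_edges": 12}
checkout_after = {"steps": 5, "devices": {"mouse"}, "flow_edges": 9}

def usability_delta(before, after):
    """Positive numbers mean the change removed steps/devices/paths."""
    return {
        "steps_removed": before["steps"] - after["steps"],
        "devices_removed": len(before["devices"]) - len(after["devices"]),
        "flow_edges_removed": before["flow_edges"] - after["flow_edges"],
    }

print(usability_delta(checkout_before, checkout_after))
# {'steps_removed': 2, 'devices_removed': 1, 'flow_edges_removed': 3}
```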
Once you have these design-level approximations of usability improvement and have gathered longer-term data, you can see whether the design-level improvements have any predictive power for the end-user reaction (e.g.: over the last 10 projects, we've seen scenarios get 1% quicker on average for each action removed, with a range of 0.25% and a standard deviation of 0.32%).
The first way can be fully subjective or partly quantified: user complaints and positive feedback. The problem with this is that you may have strong biases when filtering that feedback, so you had better make it as quantitative as possible. Having a ticketing system to file every report from users and gathering statistics about each version of the interface might be useful. Just get your statistics right.
The second way is to measure the difference via a questionnaire about the interface given to end users. Answers to each question should come from a set of discrete values, and again you can gather statistics for each version of the interface.
The latter may be much harder to set up (designing a questionnaire, possibly a controlled environment for it, and guidelines for interpreting the results is a craft in itself), but the former makes it unpleasantly easy to mess up the measurements. For example, you have to consider that the number of tickets you get for each version depends on how long it has been in use, and that all time ranges are not equal (e.g. a whole class of critical issues may never be discovered before the third or fourth week of usage, or users might tend not to file tickets in the first days of use even if they find issues, etc.).
Torial stole my answer. But if there is a measure of how long it takes to do a certain task, and the time is reduced while the task is still completed, then that's a good thing.
Also, if there is a way to record the number of cancels, then that would work too.
