I have a set A of passwords created under policy A and a set B of passwords created under policy B. The maximum password length under both policies is 15, and each set contains nearly 70 passwords.
I found that 50% of the passwords in set A have length 8, and no password in set A has length 11 or 13.
In set B, password lengths vary from 8 to 15, and the proportion of passwords at each length from 8 to 15 does not seem to vary much.
Clearly policy B is the winner, as the password length is relatively unpredictable.
How do I capture this intuition more formally? Is there a statistical measure for this? I have the password length distribution for set A as well as set B.
Please help.
Say we get the current columns and rows of a terminal with node.js:
console.log('rows:', process.stdout.rows);
console.log('columns:', process.stdout.columns);
Is there a way to calculate the number of bytes that can fit in the terminal window? I would guess that it's rows * columns, but I really have no idea.
My guess is that rows * columns is the maximum number of bytes that can fit, but in reality it's probably less; it wouldn't be exact.
The maximum number depends on the nominal size of the window (rows times columns) as well as on how the character cells are encoded. A Node application assumes everything is encoded as UTF-8, which means each cell could take up to 4 bytes (see this answer for an example).
Besides that, you have to allow for a newline at the end of each row (unless you're relying upon line-wrapping the whole time). A newline is a single byte.
So...
(1 + columns) * rows * 4
as a first approximation.
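For example, a standard 80-column by 24-row terminal gives (1 + 80) * 24 * 4 = 7776 bytes by that estimate.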
If you take combining characters into account, that could increase the estimate, but (see this answer) the limit on that is not well defined. In practice, combining characters are rarely used with European scripts but do appear in some Asian scripts (your mileage may vary).
I have to run a Monte Carlo simulation where, for some products, certain exchanges are related to each other, in the sense that my process can take as input any of the products in different (bounded) proportions but with a fixed sum.
Example:
My product a takes as input a total of 10 kg of x, y, and z altogether; x has a uniform distribution from 0 to 4 kg, y from 1 to 6, and z from 3 to 8, and their sum must equal 10. So on every iteration I need to draw a random number for each of my three exchanges within its bounds while making sure that their sum is always 10.
I have seen that in stats_array it is possible to set the bounds of the distributions and thus create values in a specified interval, but this would not ensure that the sum of my random vector equals the fixed sum of 10.
I am wondering if there is already a (relatively) straightforward way to implement this in bw2.
Otherwise, the only way I see to make this feasible is to create all the uncertainty parameters with ParameterVectorLCA, tweak the values in the array for those products that must meet the aforementioned requirements (e.g., with something like this or this), and then use this array of modified parameters to re-run my MC.
We are working on this in https://github.com/PascalLesage/brightway2-presamples, but it isn't ready yet. I don't know of any way to do this currently without hacking something together by subclassing the MonteCarloLCA.
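Until then, here is a minimal generic sketch (plain NumPy, nothing brightway2-specific) of one way to handle the example above: draw x and y uniformly within their bounds, set z to the remainder, and reject draws where z falls outside its own bounds.

import numpy as np

def sample_fixed_sum(total=10.0, x_bounds=(0.0, 4.0), y_bounds=(1.0, 6.0),
                     z_bounds=(3.0, 8.0), rng=None):
    # Draw (x, y, z) within their bounds so that x + y + z == total,
    # using simple rejection sampling.
    rng = rng or np.random.default_rng()
    while True:
        x = rng.uniform(*x_bounds)   # uniform draw for exchange x
        y = rng.uniform(*y_bounds)   # uniform draw for exchange y
        z = total - x - y            # z is fixed by the sum constraint
        if z_bounds[0] <= z <= z_bounds[1]:
            return x, y, z

# one draw per Monte Carlo iteration
x, y, z = sample_fixed_sum()
print(x, y, z, x + y + z)  # the sum is always 10

Note that with this scheme the marginal distribution of z is not uniform on its interval; if all three marginals need to stay uniform in addition to the fixed sum, a more involved sampling scheme would be required.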
I would like a list of client names grouped so that together they have a combined amount of 1000. So, say, if Jim and Tod's combined amount of money is <= 1000 and Jim, Tod, and Jill's is >= 1000, then list Jim and Tod in a cell; then, in the next cell, if Jill, Joy, and Pat's is <= 1000 and Jill, Joy, Pat, and Tam's is >= 1000, then list Jill, Joy, and Pat, and so forth until all of the clients are in a list.
Is this possible? I am learning and am not sure where to start, so I would greatly appreciate it if someone could point me in the right direction to solve this problem.
Assuming your criterion for a group is that the money sums to less than or equal to 1000, this is straightforward. Simply accumulate the Money amount down the list of names and start a new group (and reset the accumulator) whenever the cumulative amount exceeds 1000.
This gives you the group number for each name (see column D in the picture below). A separate problem is then to list the names for each group number. In the picture, I have allowed for a maximum of 5 names per group, but if real data indicates this is insufficient, allowing more is straightforward.
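If it helps to see the same accumulation logic outside the spreadsheet, here is a rough Python sketch of one reading of that greedy rule (the names are from the question; the amounts are made up purely for illustration):

# Greedy grouping: walk down the list, accumulating Money, and start a
# new group whenever adding the next client would push the total over 1000.
clients = [("Jim", 400), ("Tod", 550), ("Jill", 300),
           ("Joy", 350), ("Pat", 200), ("Tam", 600)]  # made-up example amounts

groups = []
current, total = [], 0
for name, money in clients:
    if current and total + money > 1000:   # this client would break the limit
        groups.append(current)             # close the current group
        current, total = [], 0
    current.append(name)
    total += money
if current:
    groups.append(current)                 # don't forget the last group

for i, group in enumerate(groups, start=1):
    print(f"Group {i}: {', '.join(group)}")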
The set of groups obtained using this approach depends on the ordering of the rows of input data: change this ordering and the result is a different set of groups.
Perhaps a more interesting and challenging problem is to define a set of groups that meets not only the <=1000 criterion but also other criteria, such as minimising the overall number of groups and equalising, as far as possible, the total money allocated to each group. But that is a very different problem!
I have a data set containing three columns: the first column is the trial number, the second column contains the experimental values, and the third column contains the corresponding standard deviations.
With each experiment there is an increment in my experimental values. To get the incremental values, I take my first value as the reference and subtract this reference value from each subsequent value, and I use the results to create a fourth column of incremental values.
My problem begins right here: how do I create a new set of incremental standard deviations for the incremental experimental values I obtained? My apologies if the problem is not well defined, but hopefully someone will be able to help me out. Many thanks!
Below is my data set:
Trial   Mean     SD      Incr Mean   Incr SD
1       45.311   4.668    0
2       56.682   2.234   11.371
3       62.197   2.266   16.886
4       70.550   4.751   25.239
5       80.528   4.412   35.217
6       87.453   4.542   42.142
7       89.979   2.185   44.668
8       96.859   3.476   51.548
To be clear, for other readers: your incremental mean is actually the difference between each later trial and trial 1.
Variances add directly when you subtract (or add) independent normal distributions. So you first want to convert each standard deviation to a variance by squaring it, then add the variances, and then take the square root to turn the result back into a standard deviation; in other words, the incremental SD for trial i is sqrt(SD_1^2 + SD_i^2). Note that when using this kind of Pythagorean combination you are assuming that trial 1 is independent of the other trials, so, for example, you cannot have the same sample appear in both trials.
It also makes sense logically that your so-called "incremental SD" will always be greater than either individual SD, since the uncertainty of both distributions contributes to the uncertainty of the difference.
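As a minimal sketch (assuming NumPy, and independence between trial 1 and each later trial), the whole incremental SD column can be computed at once from the SD column in your table:

import numpy as np

sd = np.array([4.668, 2.234, 2.266, 4.751, 4.412, 4.542, 2.185, 3.476])  # SD column from the table

# Var(trial_i - trial_1) = Var_1 + Var_i under independence,
# so the incremental SD is sqrt(SD_1^2 + SD_i^2).
incr_sd = np.sqrt(sd[0] ** 2 + sd[1:] ** 2)
print(np.round(incr_sd, 3))  # incremental SDs for trials 2 through 8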
I am not sure whether this is the right place to ask this question, as it is more of a logic question, but there is no harm in asking.
Suppose I have a huge list of data (customers), and each record has a data_id.
Now I want to split the data in some ratio, let's say a 10:90 split.
Now, I could state a condition such as (for example):
the sum of the digits is even -> go to bin 1
the sum of the digits is odd -> go to bin 2
or
the sum of the last three digits is x -> go to bin 1
the sum of the last three digits is not x -> go to bin 2
But such a condition might result in an uneven split: sometimes it will find more data than needed (which is fine), but sometimes it might not find enough data.
Is there a way (probabilistically speaking) to guarantee that the sample size is always at least x%?
Thanks
You want to partition your data by a feature that is uniformly distributed. Hash functions are designed to have this property, so if you compute a hash of your customer ID and then partition by the first n bits to get 2^n bins, each bin should have approximately the same number of items. (You can then select, say, 90% of your bins to get 90% of the data.) Hope this helps.
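As a minimal sketch of that idea in Python (the data_id field is from the question; MD5 and a 100-bin modulo are just illustrative choices instead of the first-n-bits approach, since 100 bins make a 90/10 split easy to express):

import hashlib

def bin_for(data_id, num_bins=100):
    # Map an ID to one of num_bins bins via a stable hash,
    # so the bins fill up approximately evenly.
    digest = hashlib.md5(str(data_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_bins

def in_large_split(data_id):
    # Bins 0-89 -> the 90% partition, bins 90-99 -> the 10% partition.
    return bin_for(data_id) < 90

customers = ["C0001", "C0002", "C0003", "C0004"]  # hypothetical IDs
big = [c for c in customers if in_large_split(c)]
small = [c for c in customers if not in_large_split(c)]

Because the hash is deterministic, the same customer always lands in the same bin, so the split is reproducible across runs.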