Minimum cost to group same characters in a string - string

I got stuck on a problem. The overall problem statement is big. I have solved the other pieces of it
but got stuck on one piece.
We are given a string containing some dashes ('-') and some copies of one character, say 'A', together with a cost C for shifting a character to an adjacent position. We need to find the minimum cost of moving the 'A' characters so that they are all grouped together.
Example1: A-A--A---A and cost = 10
Minimum cost to group all 'A's would be: 80
Example2: AAAA------A and cost = 10
Minimum cost to group all 'A's would be: 60

Hint: for the cost to be minimum possible, one of the median As (2nd or 3rd of 4 in your first example, 3rd of 5 in your second example) can be left in place. Using this, you can compute the cost in O(n), where n is either the length of the string or the number of As, whichever is your input format.

I don't think this problem needs dynamic-programming.
You only need to move all the A's towards the median A, because that minimizes the total distance moved.
Just make sure not to move the median A. If the median A is moved one step to the right, each of the A's to its left has to move one more step and each of the A's to its right has to move one step fewer. At best these changes cancel out, but you have already added one unneeded step for the median itself.
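The median idea can be sketched in Python as follows. This is only a minimal illustration, assuming that one shift moves one A by one cell and that the grouped A's end up in consecutive cells; the function name `min_group_cost` is made up for this example. Subtracting each A's rank from its position turns "consecutive cells" into "the same cell", after which the median minimizes the total distance.

```python
def min_group_cost(s: str, cost: int, ch: str = "A") -> int:
    """Minimum cost to make all `ch` characters contiguous (hypothetical helper)."""
    # Positions of the characters to be grouped, in increasing order.
    positions = [i for i, c in enumerate(s) if c == ch]
    if not positions:
        return 0
    # The k-th character must end up k cells to the right of the first one,
    # so subtract k: all adjusted values must now meet at a single point.
    adjusted = [p - k for k, p in enumerate(positions)]
    # `adjusted` is non-decreasing, so its middle element is a median.
    median = adjusted[len(adjusted) // 2]
    moves = sum(abs(a - median) for a in adjusted)
    return moves * cost


print(min_group_cost("A-A--A---A", 10))   # first example from the question
print(min_group_cost("AAAA------A", 10))  # second example from the question
```

The two calls reproduce the costs 80 and 60 quoted in the question, and the whole computation is a single pass over the string plus a pass over the A's.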

Related

Calculating number of hours spent per product, with diminishing effort

I want to calculate the number of work hours it takes to produce X. The first X takes 20 hours, but each subsequent X takes 20% less time than the one before. However, it will always take a minimum of 2 hours.
Any help is appreciated.
In Excel, this is really easy: 20% less means you are calculating 80% of the value, which means multiplying the value by 0.8.
As the value can't go below 2, you can simply take the maximum of the calculated value and 2, using the formula:
=MAX(2,0.8*A1)
Have fun!
In order to calculate the running sum, you can use the simple formula =SUM(A$1:A2), filled down to the end.
An individual term of a geometric series is given by
$$a_n = a r^{n-1}$$
The sum of the first n terms of a geometric series is given by
$$S_n = \frac{a (1 - r^n)}{1 - r}$$
where in your case a=20 and r=0.8
You can show by taking logs or by trial and error that in your particular case you have
0.8^10 = 0.107374
so when n=11 the time has diminished to 20 × 0.107374 ≈ 2.15 hours, just over 2 hours. After that each item takes 2 hours. So you have
=a*(1-r_^MIN(C2,11))/(1-r_)+MAX(0,C2-11)*2
for the total.
If you just want the time per item, it's
=IF(C2<=11,a*r_^(C2-1),2)
where a and r_ are named ranges for a and r, and the values of N are in column C.
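As a cross-check of the two formulas above, here is a short Python sketch. The function names and default arguments are made up for illustration; the cutoff of 11 is taken from the answer and only holds for a=20, r=0.8 and a floor of 2 hours.

```python
def hours_for_item(n, first=20.0, ratio=0.8, floor=2.0):
    """Hours for the n-th item (n starts at 1), never below the floor."""
    return max(floor, first * ratio ** (n - 1))


def total_hours(n, first=20.0, ratio=0.8, floor=2.0):
    """Total hours for the first n items, summed item by item."""
    return sum(hours_for_item(k, first, ratio, floor) for k in range(1, n + 1))


def total_hours_closed_form(n, first=20.0, ratio=0.8, floor=2.0, cutoff=11):
    """Same total via the geometric-series sum; items after `cutoff` cost `floor`."""
    m = min(n, cutoff)
    return first * (1 - ratio ** m) / (1 - ratio) + max(0, n - cutoff) * floor


for n in (1, 5, 11, 12, 20):
    print(n, round(total_hours(n), 3), round(total_hours_closed_form(n), 3))
```

The loop prints both totals side by side so they can be compared for a few values of n.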

How to sum total hours in a row while skipping certain values?

I study wildlife and currently, I am doing an analysis regarding how long my focal species goes off of the mountain (its main habitat) and into human settlements.
Here is a picture with the data: data
Anyways, as you can see there are three coloured columns. Yellow is data, green is time, and blue is whether the animal is on or off the mountain (with red being when the animal is off).
As you can see, this one particular animal went off on several occasions. In this case, he went off the mountain three times but stayed off at various lengths. As I have thousands of data points, I essentially would like to determine how long each "off the mountain" event lasted. That is, since I consider every time the animal went off the mountain to be a separate event, I would like to determine how long the animal was off the mountain for each excursion, separately. In this case, the animal went off three times and I would like to total those three events individually.
So, as stated, an event would be every single occasion that the animal left the mountain, stayed there (for however long), and eventually made its way back up.
Any help would be greatly appreciated.
The simplest way would be just to count how many consecutive "off" periods there are in a particular run following an "on" period, then multiply by 3 hours 20 minutes, which you could do like this (starting in, say, K2):
=IF(AND(G1="On",G2="Off"), MATCH("On",G3:G$100,0)*TIME(3,20,0)*24,0)
You could take it further by looking at the individual times of the fixes as well to get an upper and lower limit (e.g. for the first excursion it could be between 3 hours 20 minutes and 10 hours 40 minutes roughly).
Upper limit
=IF(AND(G1="On",G2="Off"), (INDEX(J3:J$100,MATCH("On",G3:G$100,0))-J1)*24,0)
Lower limit
=IFERROR(IF(AND(G1="On",G2="Off"), (INDEX(J3:J$100,MATCH("On",G3:G$100,0)-1)-J2)*24,0),0)
where my column J contains a datetime value formed by adding the date and time in columns A and B together.
This raises an issue about what happens when the animal is still off-mountain at the end of its data (the formula currently gives #N/A because MATCH is unable to find a cell containing "On"). You would need to decide how to treat this case if it ever occurs in practice.
Note when there is only one off-mountain measurement the lower limit is zero because in theory the animal could have left immediately before the measurement and returned immediately afterwards.
EDIT
To address the above issue where the animal is still off-mountain at the end of its data (and looking at the sample data it looks as if a different animal's data is immediately following the first animal's data) you would need this
=IF(AND(G1="On",G2="Off"), IFERROR(MATCH(1,(G3:G$100="On")*(E3:E$100=E2),0),MATCH(TRUE,E3:E$100<>E2,0))*TIME(3,20,0)*24,0)
which would have to be entered as an array formula using Ctrl+Shift+Enter
You could argue that you might need to do some averaging for an incomplete off-mountain excursion like this which would make it even more complicated, but this is an Excel answer and can't go too far into the rights or wrongs of the analysis.
I guess a good starting-point would be knowing how you gather these statistics in the first place.
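Outside Excel, the same run-counting idea can be sketched in Python. This is an illustration under assumptions, not the method used in the answer's formulas: the records are taken to be (animal id, "On"/"Off") pairs in time order, fixes are assumed to be 3 hours 20 minutes apart, and the names `excursion_hours` and `FIX_INTERVAL_HOURS` are invented here. Unlike the first Excel formula, this version also reports a run that starts or ends an animal's data, so those cases still need a decision.

```python
from itertools import groupby

# Assumed spacing between consecutive fixes: 3 hours 20 minutes.
FIX_INTERVAL_HOURS = 3 + 20 / 60


def excursion_hours(records, interval=FIX_INTERVAL_HOURS):
    """Return (animal, hours) for each run of consecutive "Off" fixes.

    `records` is a time-ordered list of (animal_id, status) tuples.
    Grouping on the whole tuple means a run never spans two animals.
    """
    excursions = []
    for (animal, status), run in groupby(records):
        if status == "Off":
            excursions.append((animal, len(list(run)) * interval))
    return excursions


records = [
    ("a", "On"), ("a", "Off"), ("a", "Off"), ("a", "On"),
    ("a", "Off"), ("b", "Off"), ("b", "On"),
]
for animal, hours in excursion_hours(records):
    print(animal, round(hours, 2))
```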

How to obtain Incremental standard deviations from a set of standard deviations?

I have a data set containing three columns, first column represents number of trials, second column represents experimental values, and the third column represents corresponding standard deviation.
With each experiment there is an increment in my experimental values. To get the incremental values, I hold my first value as the reference value and subtract this reference value from each subsequent value and use them to create fourth column of these incremental values.
My problem begins right here. How do I create a new set of incremental standard deviations for the incremental experimental values I got? My apologies if the problem is not well defined, but hopefully someone will be able to help me out. Many thanks!
Below is my data set,
Trial   Mean     SD      Incr Mean   Incr SD
1       45.311   4.668    0
2       56.682   2.234   11.371
3       62.197   2.266   16.886
4       70.550   4.751   25.239
5       80.528   4.412   35.217
6       87.453   4.542   42.142
7       89.979   2.185   44.668
8       96.859   3.476   51.548
To be clear, for other readers, your incremental mean is actually the difference between trial 1 and the other trials.
Variances add when you subtract (or add) independent random variables. So you first convert each standard deviation to a variance by squaring it, then add the two variances, and then take the square root to turn the result back into a standard deviation. Note that when using this kind of Pythagorean combination, you are assuming that trial 1 is independent of the other trials, so, for example, the same sample cannot appear in both trials.
Logically, it makes sense that your so-called "incremental SD" will always be greater than either individual SD, since the uncertainty of both distributions contributes to the uncertainty of the difference.
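Applied to the SD column from the question, the combination can be sketched in Python like this. The function name `incremental_sds` is made up, and the calculation relies on the same independence assumption as above. Trial 1 is left out because its increment is zero by construction.

```python
import math

# SD column from the question, trials 1 to 8.
sds = [4.668, 2.234, 2.266, 4.751, 4.412, 4.542, 2.185, 3.476]


def incremental_sds(sds):
    """SD of (trial k - trial 1) for k = 2..n, assuming independent trials."""
    reference_variance = sds[0] ** 2
    # Add the variances, then take the square root to get back to an SD.
    return [math.sqrt(reference_variance + sd ** 2) for sd in sds[1:]]


for trial, sd in enumerate(incremental_sds(sds), start=2):
    print(trial, round(sd, 3))
```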

How can I implement 'balanced' error spreading functionality in Excel?

I have a requirement in Excel to spread small monetary rounding errors (i.e. pennies) fairly across the members of my club.
The error arises when I deduct money from members; e.g. £30 divided between 21 members is £1.428571... requiring £1.43 to be deducted from each member, totalling £30.03, in order to hit the £30 target.
The approach that I want to take, continuing the above example, is to deduct £1.42 from each member, totalling £29.82, and then deduct the remaining £0.18 using an error spreading technique to randomly take an extra penny from 18 of the 21 members.
This immediately made me think of Reservoir Sampling, and I used the information here: Random selection,
to construct the test Excel spreadsheet here: https://www.dropbox.com/s/snbkldt6e8qkcco/ErrorSpreading.xls, on Dropbox, for you guys to play with...
The problem I have is that each row of this spreadsheet calculates the error distribution independently of every other row, and this causes some members to contribute more than their fair share of extra pennies.
What I am looking for is a modification to the Reservoir Sampling technique, or another balanced / 2-dimensional error spreading methodology that I'm not aware of, that will minimise the overall error between members across many 'error spreading' rows.
I think this is one of those challenging problems that has a huge number of other uses, so I'm hoping you geniuses have some good ideas!
Thanks for any insight you can share :)
Will
I found a solution. Not very elegant, though.
You have to use two matrices. In the first you put completely random numbers, generated with =RAND(), and in the second you pick the n largest values in each row.
Say that F30 holds the first
=RAND()
cell.
(I have experimented with your sheet.)
Just copy a column of n (8 in your sheet) into column A.
In cell F52 you put:
=IF(RANK(F30,$F30:$Z30)<=$A52, 1, 0)
Up to this point, if you drag the formulas across and down, you have the same situation as in your sheet (only less elegant and efficient).
But starting from the second row of random numbers you can compensate for the pennies already disbursed.
In cell F31 you put:
=RAND()-SUM(F$52:F52)*0.5
(Pay attention to the $: each random number should get a correction based on the pennies already spent.)
If the $ are right, you should be OK dragging the formulas across and down. You could also parametrize the 0.5 and experiment with other values. With 0.5 I get an error factor (the equivalent of your cell AB24) between 1 and 2.
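The same idea (a random score reduced by a penalty for every penny already paid, then taking the n highest scores in each row) can be sketched in Python to experiment with the penalty factor. The function name, parameters and seed below are hypothetical and not part of the spreadsheet.

```python
import random


def spread_pennies(extra_per_row, members, penalty=0.5, seed=None):
    """For each row, pick which members pay one extra penny.

    `extra_per_row` lists how many extra pennies each row must collect.
    A member's random score is lowered by `penalty` for every penny
    already paid, so earlier payers are less likely to be picked again.
    """
    rng = random.Random(seed)
    paid = [0] * members  # extra pennies paid so far, per member
    rows = []
    for n in extra_per_row:
        scores = [rng.random() - penalty * paid[m] for m in range(members)]
        # The n members with the highest scores pay in this row.
        chosen = sorted(range(members), key=lambda m: scores[m], reverse=True)[:n]
        row = [0] * members
        for m in chosen:
            row[m] = 1
            paid[m] += 1
        rows.append(row)
    return rows, paid


rows, paid = spread_pennies([18] * 10, members=21, penalty=0.5, seed=1)
print(paid, max(paid) - min(paid))
```

With a penalty of 1 or more, a member who has paid fewer pennies practically always outranks one who has paid more, so the selection becomes a round-robin with random tie-breaking. Smaller penalties keep more randomness at the price of a wider spread.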

probability logic statistics

I am not sure whether this is the right place to ask this question,
as it is more of a logic question, but there is no harm in asking.
Suppose I have a huge list of data (customers),
and they all have a data_id.
Now I want to split the data in a ratio of, let's say, 10:90.
I could state a condition such as (for example):
the sum of digits is even: go to bin 1
the sum of digits is odd: go to bin 2
or: the sum of the last three digits is x: go to bin 1
the sum of the last three digits is not x: go to bin 2
But this might result in uneven data collection: sometimes it might find more data than needed (which is fine), but sometimes it might not find enough data.
Is there a way (probabilistically speaking)
to guarantee that the sample size is always greater than x%?
Thanks
You want to partition your data by a feature that is uniformly distributed. Hash functions are designed to have this property ... so if you compute a hash of your customer ID, and then partition by the first n bits to get 2^n bins, each bin should have approximately the same number of items. (You can then select, say, 90% of your bins to get 90% of the data.) Hope this helps.
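A minimal Python sketch of hash-based partitioning is shown below. It is a variation on the answer, not the same scheme: instead of taking the first n bits to get 2^n bins, it reduces the hash modulo 100 so that a 10:90 split falls on a bucket boundary. The function names and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib


def bucket(customer_id: str, buckets: int = 100) -> int:
    """Map an ID to a stable bucket number in [0, buckets)."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the modulo bias is negligible.
    return int.from_bytes(digest[:8], "big") % buckets


def assign_bin(customer_id: str, small_share: int = 10) -> int:
    """Bin 1 gets roughly `small_share` percent of IDs, bin 2 the rest."""
    return 1 if bucket(customer_id) < small_share else 2


ids = [str(i) for i in range(10000)]
share = sum(assign_bin(i) == 1 for i in ids) / len(ids)
print(round(share, 3))
```

Because the assignment depends only on the ID, the same customer always lands in the same bin. The split is approximately 10:90, not exactly, so a hard guarantee on the minimum sample size would need an additional check.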
