I have the following table (for example, the real table is 100 rows):
It has a group name column, the number of students in the group, and a score.
I would like to build a fourth column, cluster, which divides the groups into 10 deciles of approximately equal size while preserving the score order. So if I have a total of 80 students in all groups together, then I'll have 10 clusters of about 8 students each, more or less. The top cluster will consist of the groups with the highest scores.
I hope that makes sense.
My problem is more of an algorithmic one. I prefer a solution in Excel/VBA rather than R, just because I need a more dynamic solution.
I tried to do it manually by sorting the groups by score and then summing the number of students until I get a number close to a decile of the total number of students, but maybe there is an algorithm that is more precise and less frustrating than that.
Thanks
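One way to make the manual approach described above systematic is sketched below in Python, only to show the logic (it is not an Excel/VBA solution). It sorts the groups by score, keeps a running total of students, and assigns each group to the decile in which the midpoint of its students falls. The function name `assign_deciles` and the midpoint rule are assumptions made for illustration; other tie-breaking rules are equally defensible.

```python
def assign_deciles(groups, n_clusters=10):
    """groups: list of (name, n_students, score) tuples.

    Returns {name: cluster}, where cluster 1 holds the highest scores.
    The midpoint rule used here is an illustrative choice, not the only one.
    """
    # Highest score first, so cluster 1 is the top decile.
    ordered = sorted(groups, key=lambda g: g[2], reverse=True)
    total = sum(n for _, n, _ in ordered)
    if total == 0:
        return {}

    clusters = {}
    running = 0  # students counted before the current group
    for name, n, _score in ordered:
        midpoint = running + n / 2
        # Position of the group's midpoint on a 0..n_clusters scale.
        cluster = min(int(midpoint * n_clusters / total) + 1, n_clusters)
        clusters[name] = cluster
        running += n
    return clusters
```

Because whole groups are never split, the clusters only come out approximately equal in size, and some cluster numbers may be skipped when one group is very large.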
Related
I spent hours looking for a solution and I feel like I got close, but figured asking would be the best way.
Let's say I have a table with 2 columns: column A is an item, and column B is the price of the item. This table has 12 entries. What I would like to do is generate additional tables of 6 entries that do not exceed a certain total price. See below for an example. The number I want these tables not to exceed is 50,000.
For example, the first entry could be an apple with a value of 9,000. The apple is in column A, and the value is in column B.
Can someone help with a way to generate all combinations of 6 items from column A that do not exceed a combined price of 50,000 in column B?
With 12 items you have 2^12-1 or 4095 possible non-empty combinations of products, of which C(12,6) = 924 contain exactly 6 items. These can map into the 12 bits of a 12-bit binary number. It is not difficult to write a macro to calculate the total cost of each combination and then filter the result to display results less than or equal to 50,000.
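As a rough sketch of that enumerate-and-filter idea, here it is in Python rather than as a macro. It uses `itertools.combinations` to walk the 6-item subsets directly instead of testing all 12-bit masks. The function name `affordable_combinations` and the dictionary layout (item name mapped to price) are assumptions for illustration.

```python
from itertools import combinations


def affordable_combinations(prices, size=6, limit=50_000):
    """Return (combo, total) for every `size`-item combination within `limit`.

    prices: dict mapping item name (column A) to price (column B).
    """
    results = []
    # sorted() only makes the output order reproducible.
    for combo in combinations(sorted(prices), size):
        total = sum(prices[item] for item in combo)
        if total <= limit:
            results.append((combo, total))
    return results
```

With 12 items this checks only 924 combinations, so brute force is perfectly adequate at this size.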
EDIT#1:
Please see:
Best possible combination sum of predefined numbers that smaller or equal NN
Listing all possible combination without repetition,VBA
I would like a list of client names where together they have a combined amount of 1000. So, say, if Jim and Tod's combined amount of money is <= 1000 and Jim, Tod, and Jill's is >= 1000, then list Jim and Tod in a cell. Then, in the next cell, if Jill, Joy, and Pat are <= 1000 and Jill, Joy, Pat, and Tam are >= 1000, then list Jill, Joy, and Pat, and so forth until all of the clients are in a list.
Is this possible? I am learning and am not sure where to start, so I would greatly appreciate it if someone could point me in the right direction to solve this problem.
Assuming your criterion for a group is that the money sums to less than or equal to 1000, then this is straightforward. Simply accumulate the Money amount down the list of names and start a new group (and reset the accumulator) whenever adding the next amount would push the cumulative amount above 1000.
This gives you the group number for each name (see column D in picture below). A separate problem is then to list the names for each group number. In the picture, I have allowed for a maximum of 5 names per group but if real data indicates this is insufficient then allowing more is straightforward.
The set of groups obtained using this approach is dependent on the ordering of the rows of input data - change this ordering and the result is a different set of groups.
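The accumulate-and-reset approach described above can be sketched as follows. This is Python used only to show the logic, not a worksheet formula; the function name `assign_groups` is hypothetical, and a single amount larger than the limit is assumed to get a group of its own.

```python
def assign_groups(rows, limit=1000):
    """rows: list of (name, money) in sheet order.

    Returns a list of (name, group_number), numbering groups from 1.
    """
    result = []
    group = 1
    running = 0   # money accumulated in the current group
    members = 0   # names already placed in the current group
    for name, money in rows:
        # Start a new group if this row would push the total past the limit.
        if members > 0 and running + money > limit:
            group += 1
            running = 0
            members = 0
        running += money
        members += 1
        result.append((name, group))
    return result
```

Reordering `rows` changes the output, which is the order dependence noted above.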
Perhaps a more interesting and challenging problem is to define a set of groups which meet not only the <=1000 criterion but also other criteria such as: minimise number of groups overall and equalise, as far as possible, the total money allocated to each group. But that is a very different problem!
Write a formula in cell E2 to calculate the average of the two highest cumulative numbers among three students, after converting the obtained numbers to a scale of 100.
To get the maximum you can use MAX, and to get the second-largest number you can use LARGE. Add the two numbers and divide the sum by two:
=(MAX(B3:D5)+LARGE(B3:D5,2))/2
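For comparison, the same calculation can be sketched in Python, assuming the grades have been flattened into one list. The function name `average_of_top_two` is hypothetical. Like the formula, it treats a repeated maximum as both the first and second largest value.

```python
import heapq


def average_of_top_two(values):
    """Average of the two largest values, duplicates counted separately."""
    top_two = heapq.nlargest(2, values)
    if len(top_two) < 2:
        raise ValueError("need at least two values")
    return sum(top_two) / 2
```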
I cannot fully follow your question. Do you want the top 2 grades, even if they are from the same student? Then @Ralph's answer does it. It does not, however, take into account the 30% values you have in row 1.
If you want the top 2 student grades (i.e. each student can only take 1 place), do this:
Solution
I'd like to use Excel to generate a randomized lab partner list, without using VB (due to security settings on the PCs).
Parameters are as follows:
Number of students: 10-30, one worksheet per total number desired
Number of partners: Three for first two labs, and two for the other four-five.
Number of lab stations: 10
Repeats: Ideally none, but it is permissible for a student to have a repeat partner from one of the first two labs.
Excel version: 2007
To clarify, each student will have two labs where they share a lab station with up to two other students, giving a maximum lab size of 30 students. After that, they will be strictly limited to two students per station, giving a maximum of 20 students. Each student will have four of these limited labs, with there being a total of five such labs presented, to allow for either odd-numbered classes, or a class size between 21-30.
Each student is simply numbered from 1-30, so a cell could, for instance, state "5, 24" as the two students for that lab station.
True RNG is not important, and in fact, only needs to be performed once to make these matrices.
I think this is a bit tricky without using VBA, but here is one approach that is OK for small groups. I have tried it using a group of just nine so that the screen shot should be readable.
The method is basic Fisher-Yates:
A Start with a group of students size n represented by a list of numbers 1 to n.
B Generate a random number r in range 1 to n
C Pick the rth element from the list
D Remove the rth element from the list
E Reduce n by 1
F Repeat from B until n=1.
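Steps A to F above can be sketched in Python as follows, purely to illustrate the pick-and-remove logic before it is translated into worksheet formulas. The function name `pick_and_remove_shuffle` and the optional `rng` parameter are assumptions added so the result can be reproduced in tests.

```python
import random


def pick_and_remove_shuffle(n, rng=None):
    """Return the numbers 1..n in random order using pick-and-remove."""
    rng = rng or random.Random()
    remaining = list(range(1, n + 1))            # step A
    order = []
    while len(remaining) > 1:                    # step F
        r = rng.randint(1, len(remaining))       # step B
        # Steps C, D and E: take the r-th element out, shrinking the list.
        order.append(remaining.pop(r - 1))
    order.extend(remaining)                      # the last student left over
    return order
```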
In Excel:-
Fill A2:A10 and D2:L2 with numbers 1-9
Put the following in B2 and pull down:-
=RANDBETWEEN(1,10-A2)
Put this in C2 and pull down:-
=OFFSET(D2,0,B2-1)
Put this in D3 and pull down and across:-
=IF(D2>=$C2,E2,D2)
The IDs will be in column C, so the first three would be in group 1, the next three in group 2, etc.
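Reading the groups off the shuffled column amounts to cutting it into consecutive runs. A minimal Python sketch, with a hypothetical `split_into_stations` helper, would be:

```python
def split_into_stations(order, size=3):
    """Cut a shuffled list of student IDs into consecutive groups of `size`."""
    return [order[i:i + size] for i in range(0, len(order), size)]
```

If the class size is not a multiple of `size`, the last station simply ends up with fewer students.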
By the way, your question is a special case of generating non-repeating random numbers - see
Generating unique random numbers without VBA
The array formula described here does it in one step - modified slightly for this problem it would look like
=SMALL(IF(COUNTIF(C$1:C1,ROW(INDIRECT("1:9")))=0,ROW(INDIRECT("1:9"))),RANDBETWEEN(1,(9-ROWS(C$2:C2)+1)))
I am not sure whether this is the right place to ask this question,
as it is more of a logic question, but hey, no harm in asking.
Suppose I have a huge list of data (customers)
and they all have a data_id
Now I want to split the data in a ratio of, let's say, 10:90.
I could state a condition like this (example):
the sum of digits is even...go to bin 1
the sum of digits is odd.. go to bin 2
or sum of last three digits are x then go to bin 1
sum of last three digits is not x then go to bin 2
But this might result in an uneven split. Sometimes it might find more data than needed (which is fine), but sometimes it might not find enough.
Is there a way (probabilistically speaking)
to guarantee that the sample size is always greater than x%?
Thanks
You want to partition your data by a feature that is uniformly distributed. Hash functions are designed to have this property ... so if you compute a hash of your customer ID, and then partition by the first n bits to get 2^n bins, each bin should have approximately the same number of items. (You can then select, say, 90% of your bins to get 90% of the data.) Hope this helps.
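A minimal sketch of that idea in Python, assuming SHA-256 as the hash and the first 8 bits as the bin index (both are illustrative choices, as are the function names `bin_index` and `in_small_split`):

```python
import hashlib


def bin_index(customer_id, n_bits=8):
    """Map an ID to one of 2**n_bits bins using the leading bits of its hash."""
    if not 1 <= n_bits <= 32:
        raise ValueError("n_bits must be between 1 and 32")
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    value = int.from_bytes(digest[:4], "big")  # first 32 bits of the hash
    return value >> (32 - n_bits)              # keep only the top n_bits


def in_small_split(customer_id, n_bits=8, share=0.10):
    """True if the ID falls in the bins reserved for the smaller split."""
    n_bins = 1 << n_bits
    small_bins = round(n_bins * share)  # e.g. 26 of 256 bins for about 10%
    return bin_index(customer_id, n_bits) < small_bins
```

The split is deterministic for a given ID, and the achieved ratio is only approximately 10:90. It gets closer to the target as the number of customers grows, but it is not a hard guarantee on sample size.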