Loop to select rows until I reach a target number - python-3.x

I have two dataframes, which I join to see the active people. There are people who stop being active and I use one of the dataframes to fill the other one.
mass pnr freq
1 [40666303, 68229102, 35784905, 47603805] 4
54 [17182402] 1
234 [07694901, 35070201, 36765601] 3
The other table looks the same. I just need to select enough people to reach my target of 7500-7600 people (40666303 is one person, and 'freq' is the number of people in the list). The 'mass' doesn't matter; I just need the process to stop when the sum of 'freq' is between 7500 and 7600. Right now I need 400 people, but next month I might need only 20; it differs every month. Basically, my code removes the non-active people, and when it removes them I need to replace them with active ones. For the first run of the process I used this code to select the initial 7500 people:
target = 7500
freq_sum = sum(mass_grouped3['freq'])
new_mass_not_in_whitelist1['records_to_select'] = [math.ceil(int((el * target ) / freq_sum )) for el in new_mass_not_in_whitelist1['freq']]
But now, with this code, I am not getting the number of people I need to fill the gap of 400 people. Also, it would be good not to select only the first rows, but maybe every other row, or rows chosen by some random condition. What can I change to make it work the way I explained?
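One way to approach this, sketched under assumptions: shuffle the candidate rows, then walk through them while keeping a running total of 'freq', skipping any row that would push the total past the upper bound. The function name `select_until_target` and the sample rows are hypothetical; with a real DataFrame the shuffle could be done with `df.sample(frac=1)` before iterating.

```python
import random

def select_until_target(rows, target_min, target_max, seed=None):
    """Pick rows in random order until the running 'freq' total reaches
    target_min; rows that would overshoot target_max are skipped."""
    rng = random.Random(seed)
    shuffled = rows[:]            # copy so the caller's list is untouched
    rng.shuffle(shuffled)

    selected, total = [], 0
    for row in shuffled:
        if total >= target_min:   # target window reached, stop
            break
        if total + row["freq"] <= target_max:
            selected.append(row)
            total += row["freq"]
    return selected, total

# Hypothetical candidate rows shaped like the table in the question.
candidates = [{"mass": m, "freq": (m % 4) + 1} for m in range(1, 501)]
picked, total = select_until_target(candidates, 400, 410, seed=0)
print(len(picked), total)
```

If the candidates run out before the lower bound is reached, the function simply returns what it managed to collect, so the caller should check `total` against the target.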

Related

[Excel][Math] Formula to generate numbers between two limits

I am trying to fill a column in Excel with values that are generated randomly but that sum to a given final value.
Example:
Starting value = 320
Final value = 350
Need to generate 2 more values which have random difference between them but total to 350 at the end, as in: 2nd val = 12, 3rd val = 18.
The script/formula should generate different values when run next (for another table etc.). It may generate, for the same starting and final values, 15 and 15 or 8 and 22 for the 2nd and 3rd values respectively etc.
Basically what the formula should do is: Find the difference between the starting and final values then randomly add a number to create the 2nd entry. Now the third entry should follow the same pattern but the value generated should end up totaling to the final.
The example is only for 2 values but I'm working on tables ranging from 15-30+ values.
I don't know if Excel can do it, or if there's a mathematical formula that will work here.
Thanks in advance for all the help!
The trick is to generate a list of uniform random numbers between 0 and 1 and then scale those numbers so that they add up to the total you're looking for.
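The scaling trick can be illustrated outside Excel with a short Python sketch. The helper name `random_parts` is hypothetical, and rounding to whole numbers is an assumption based on the question's example (12 and 18); any rounding drift is absorbed by the last value.

```python
import random

def random_parts(total, count, seed=None):
    """Split `total` into `count` random whole numbers that sum to `total`."""
    rng = random.Random(seed)
    # Uniform random weights between 0 and 1.
    weights = [rng.random() for _ in range(count)]
    # Scale the weights so that they add up to the requested total.
    scale = total / sum(weights)
    parts = [round(w * scale) for w in weights]
    # Rounding can leave the sum slightly off; fix it on the last part.
    parts[-1] += total - sum(parts)
    return parts

# Starting value 320, final value 350: two increments that total 30.
increments = random_parts(350 - 320, 2, seed=1)
print(increments, sum(increments))
```

Running it again with a different seed (or no seed) gives a different split of the same total, which is the behaviour the question asks for.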

Counting if part of string is within interval

I am currently trying to check if a number in a comma-separated string is within a number interval. What I am trying to do is to check if an area code (from the comma-separated string) is within the interval of an area.
The data:
AREAS
Area interval | Name | Number of locations
1000-1499 | Area 1 | ?
1500-1799 | Area 2 | ?
1800-1999 | Area 3 | ?
GEOLOCATIONS
Name | Areas List
Location A | 1200, 1400
Location B | 1020, 1720
Location C | 1700, 1920
Location D | 1940, 1950, 1730
The result I want here is the number of unique locations in the "Areas list" within the area interval. So Location D should only count ONCE in the 1800-1999 "area", and the Location A the same in the 1000-1499 location. But location B should count as one in both 1000-1499 and one in 1500-1799 (because a number from each interval is in the comma-separated string in "Areas list"):
Area interval | Name | Number of locations
1000-1499 | Area 1 | 2
1500-1799 | Area 2 | 3
1800-1999 | Area 3 | 2
How is this possible?
I have tried COUNTIFS, but it doesn't seem to do the job.
Here is one option using FILTERXML():
Formula in C2:
=SUM(FILTERXML("<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>","//t[count(.//*[.>="&SUBSTITUTE(A2,"-","][.<=")&"])>0]"))
Where:
"<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>" - This is the part where we construct a valid piece of XML. The idea is to use three levels of nodes. Each t-node carries a literal 1 so that, once we return the t-nodes with XPath, we can sum the result. The outer x-node is there to make sure Excel handles the inner nodes correctly. If you are curious to see how this XML looks in the end, it's best to step through using the 'Evaluate Formula' function on the Formulas tab;
//t[count(.//*[.>="&SUBSTITUTE(A2,"-","][.<=")&"])>0]")) - Basically means that we collect all t-nodes where the count of child s-nodes that are >= the leftmost number and <= the rightmost number is larger than zero. For A2 the XPath would look like //t[count(.//*[.>=1000][.<=1499])>0] after substitution. In short: //t selects t-nodes, and count(.//*[.>=1000][.<=1499])>0 keeps only those where the number of child nodes that fulfill both requirements is larger than zero;
Since all t-nodes equal the number 1, the SUM() of these t-nodes equals the number of unique locations that have at least one area from their Areas List inside the interval;
It is important to note that FILTERXML() will result in an error if no t-nodes are found. In that case we need to wrap the FILTERXML() in an IFERROR(...., 0) so that the SUM() still works correctly.
Or, wrap the above in BYROW():
Formula in C2:
=BYROW(A2:A4,LAMBDA(a,SUM(FILTERXML("<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>","//t[count(.//*[.>="&SUBSTITUTE(a,"-","][.<=")&"])>0]"))))
Using MMULT and TEXTSPLIT:
=LET(rng,TEXTSPLIT(D2,"-"),
tarr,IFERROR(--TRIM(TEXTSPLIT(TEXTJOIN(";",,$B$2:$B$5),",",";")),0),
SUM(--(MMULT((tarr>=--TAKE(rng,,1))*(tarr<=--TAKE(rng,,-1)),SEQUENCE(COLUMNS(tarr),,1,0))>0)))
I am in very distinguished company but will add my version anyway as byrow probably is a slightly different approach
=LET(range,B$2:B$5,
lowerLimit,--@TEXTSPLIT(E2,"-"),
upperLimit,--INDEX(TEXTSPLIT(E2,"-"),2),
counts,BYROW(range,LAMBDA(r,SUM((--TEXTSPLIT(r,",")>=lowerLimit)*(--TEXTSPLIT(r,",")<=upperLimit)))),
SUM(--(counts>0))
)
Here is the ugly way to do it, with A LOT of helper columns. But it's not so complicated 🙂
F4= =TRANSPOSE(FILTERXML("<m><r>"&SUBSTITUTE(B4;",";"</r><r>")&"</r></m>";"//r"))
F11= =TRANSPOSE(FILTERXML("<m><r>"&SUBSTITUTE(A11;"-";"</r><r>")&"</r></m>";"//r"))
F16= =SUM(F18:F21)
F18= =IF(SUM(($F4:$O4>=$F$11)*($F4:$O4<=$G$11))>0;1;"")
G18= =IF(SUM(($F4:$O4>=$F$12)*($F4:$O4<=$G$12))>0;1;"")
H18= =IF(SUM(($F4:$O4>=$F$13)*($F4:$O4<=$G$13))>0;1;"")
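For comparison, the logic that all of these formulas implement (split each Areas List, test every code against the interval, and count a location at most once per interval) can be written as a short Python sketch. The function name `count_locations` is hypothetical; the data is the sample from the question.

```python
def count_locations(intervals, locations):
    """For each 'low-high' interval, count the locations that have at
    least one area code inside it (each location counts only once)."""
    counts = {}
    for interval in intervals:
        low, high = (int(part) for part in interval.split("-"))
        counts[interval] = sum(
            # any() makes a location count once, however many codes match.
            any(low <= int(code) <= high for code in areas.split(","))
            for areas in locations.values()
        )
    return counts

locations = {
    "Location A": "1200, 1400",
    "Location B": "1020, 1720",
    "Location C": "1700, 1920",
    "Location D": "1940, 1950, 1730",
}
result = count_locations(["1000-1499", "1500-1799", "1800-1999"], locations)
print(result)  # {'1000-1499': 2, '1500-1799': 3, '1800-1999': 2}
```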

Loop through a combination of numbers

I am trying to think of a way to loop through a number of combinations making sure that I go through each available combination without repeat. Let me explain. I have a set of numbers, for example
20,000
25,000
27,000
29,000
and I would like to alter this set of numbers via a loop and copy the new numbers into a different sheet so that my formulas on that sheet can calculate whatever I need them to calculate. For example, the first couple of iterations might look something like this:
1st
20,000 x 1.001
25,000 x 1
27,000 x 1
29,000 x 1
2nd
20,002 x 1.001
25,000 x 1.001
27,000 x 1
29,000 x 1
The first row of numbers should never exceed the second. So 20,000 should only go as high as 25,000.
I was able to set up a matrix and then loop through a random set of combinations using =rand(). However, this does not ensure that I hit every combination, and it also repeats combinations.
Can anyone explain the math behind this and also how I would use a loop to accomplish my goal?
Thank you!
Try starting with smaller numbers.
See if this works for you.
Sub looper()
    Dim i As Long
    Dim slot As Variant, otherSlot As Variant

    'First Array
    Dim myArray(9) As Double
    For i = 1 To 10
        myArray(i - 1) = i
    Next i

    'Second Array
    Dim myOtherArray(9) As Double
    For i = 1 To 10
        myOtherArray(i - 1) = i
    Next i

    'Loop through each one
    For Each slot In myArray
        For Each otherSlot In myOtherArray
            Debug.Print slot & " * " & otherSlot & " = " & slot * otherSlot
        Next otherSlot
    Next slot
End Sub
GD user1813558,
Your question contains too little detail and is too broadly scoped for anyone to provide an accurate answer.
Are your numbers arbitrary (i.e. the ones you provided are 'just' samples) or will they be fixed as per your indicated numbers?
Will there always be only 4 numbers?
Is the distribution of your start numbers (i.e. their difference value) always as per your indication: 0, +5000, +2000, +2000?
Will the results of all 'loops' (or iterations) need to be copied to a different sheet? (i.e. looping from 20.000 to 25.000 by increments of 1.001 would require about 223 iterations, and subsequently sheets, before the result starts exceeding 25.000.)
Does a new sheet need to be created for each iteration result, do the sheets already exist, or will the result be copied to the same sheet for every iteration?
In short, please provide a more precise question.
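Since the question leaves these points open, the following is only a sketch under one possible reading: each number is multiplied by the factor a whole number of times, every tuple of step counts is visited exactly once, and a combination is kept only if the scaled numbers stay in ascending order. The names `max_steps` and `combinations` are hypothetical, and the demo uses a factor of 1.1 to keep the output small.

```python
import itertools
import math

def max_steps(start, ceiling, factor):
    """Largest k for which start * factor**k does not exceed ceiling."""
    return math.floor(math.log(ceiling / start) / math.log(factor))

def combinations(values, factor, steps_per_value):
    """Yield every tuple of step counts exactly once, keeping only the
    combinations whose scaled values remain in ascending order."""
    ranges = [range(steps + 1) for steps in steps_per_value]
    for ks in itertools.product(*ranges):
        scaled = [v * factor ** k for v, k in zip(values, ks)]
        if all(a <= b for a, b in zip(scaled, scaled[1:])):
            yield ks, scaled

# The estimate from the answer: steps of 1.001 from 20,000 up to 25,000.
print(max_steps(20000, 25000, 1.001))

# Small demo with a coarse factor so the full list stays readable.
for ks, scaled in combinations([20000, 25000], 1.1, [2, 1]):
    print(ks, [round(x) for x in scaled])
```

Because `itertools.product` walks the step counts like an odometer, no combination is repeated or missed, which is what the random approach in the question could not guarantee. With four numbers and a step of 1.001 the full product is very large, so in practice the step size or the ranges would need to be coarser.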

Formula / Macro needed for converting rows to columns based on condition

So here's my dilemma, I'm converting a csv file from one ecommerce store to another, and I need to re-arrange the data. Transpose would work, however I need information to break to a new line after a specified condition changes.
i.e.
product id imagepath rank
99999999 /image1.jpg 1
99999999 /image2.jpg 2
99999999 /image3.jpg 3
88888888 /image4.jpg 1
and I need it to be output such as:
product id imagepath imagepath2 imagepath3
99999999 /image1.jpg /image2.jpg /image3.jpg
88888888 /image4.jpg
So essentially, every time the product id changes, I need the information to break to a new line. The spreadsheet is about 1200 rows or so like this. Some product ids have 7 images, some have one. I'm sure a FOR loop is needed, or something to that effect, but I'm not well versed enough in macros to write it myself, and I can't seem to find a specific example on here... HELP!
What have you tried yet? Asking to do it for you? Could you at least provide a code or any progress you made?
<Maybe create a new table for your output>
Set ProductValues = ActiveDocument.Fields("product id").GetPossibleValues
For i = 0 To ProductValues.Count - 1
    ActiveDocument.Fields("product id").Select ProductValues.Item(i).Text
    Set imagepath = ActiveDocument.Fields("imagepath").GetPossibleValues
    For j = 0 To imagepath.Count - 1
        ActiveDocument.Fields("imagepath").Select imagepath.Item(j).Text
        Set rank = ActiveDocument.Fields("rank").GetPossibleValues
        For k = 0 To rank.Count - 1
            ActiveDocument.Fields("rank").Select rank.Item(k).Text
            <if the current product_id exists on the export table>
            <copy current row information in new table>
            <if has already a picture, increase the table columns by 1 and add new picture url>
            <or if rank give the amount of picture, you can loop through all rank position at beginning in order to find the maximum rank and initialize your table with [][your rank]>
            <For export table you do mytable[0][0]=product id>
            <check if product id is different from mytable[i-position] and different from product id>
            <to do the picture part mytable[i-position][rank]=picture url>
        Next
    Next
Next
This should let you go through all your single elements one by one. Your task now is to finish the rest and create the new table.
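As an alternative to a macro, the same regrouping can be sketched in Python: collect the image paths per product id, order them by rank, and pad shorter rows so every row has the same width. The function name `rows_to_columns` is hypothetical, the input is assumed to be non-empty and already parsed into (product id, imagepath, rank) tuples, and the data is the sample from the question.

```python
from collections import defaultdict

def rows_to_columns(rows):
    """Turn (product id, imagepath, rank) rows into one row per product
    with the image paths spread across columns in rank order."""
    grouped = defaultdict(list)
    for product_id, imagepath, rank in rows:
        grouped[product_id].append((int(rank), imagepath))

    # The widest product decides how many imagepath columns are needed.
    width = max(len(images) for images in grouped.values())
    header = ["product id", "imagepath"]
    header += [f"imagepath{n}" for n in range(2, width + 1)]

    table = [header]
    for product_id, images in grouped.items():
        paths = [path for _, path in sorted(images)]
        # Pad with empty cells so every row has the same length.
        table.append([product_id] + paths + [""] * (width - len(paths)))
    return table

rows = [
    ("99999999", "/image1.jpg", "1"),
    ("99999999", "/image2.jpg", "2"),
    ("99999999", "/image3.jpg", "3"),
    ("88888888", "/image4.jpg", "1"),
]
table = rows_to_columns(rows)
for line in table:
    print(line)
```

The resulting list of rows can be written back out with the standard `csv` module.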

Excel 2003: Better workaround for 10 if statements

I have a very rough workaround for 10 statements using a combination of 2 cells, as follows
Cell 1 (O2)
=IF(C2="TW2-OUT",VLOOKUP($D2,Players,8,FALSE)+VLOOKUP($D2,Players,9,FALSE),IF(C2="TW2-IN",IF($D2="","",VLOOKUP($D2,Players,10,FALSE)),IF(C2="Playing",IF($D2="","",VLOOKUP($D2,Players,8,FALSE)+VLOOKUP($D2,Players,9,FALSE)+VLOOKUP($D2,Players,10,FALSE)),IF(C2="IN1OUT2",VLOOKUP($D2,Players,9,FALSE)+VLOOKUP($D2,Players,10,FALSE),IF(C2="TW1-OUT",IF($D2="","",VLOOKUP($D2,Players,8,FALSE)),IF(C2="TW1-IN",IF($D2="","",VLOOKUP($D2,Players,9,FALSE)+VLOOKUP($D2,Players,10,FALSE)),IF(C2="TW3-OUT",VLOOKUP($D2,Players,8,FALSE)+VLOOKUP($D2,Players,9,FALSE)+VLOOKUP($D2,Players,10,FALSE),0)))))))+P2
Cell 2 (P2)
=IF(C2="TW3-IN",IF($D2="","",VLOOKUP($D2,Players,11,FALSE)),IF(C2="IN2OUT3",VLOOKUP($D2,Players,10,FALSE),IF(C2="IN1OUT3",VLOOKUP($D2,Players,9,FALSE)+VLOOKUP($D2,Players,10,FALSE),0)))
Is there a better way of doing this? I have read via a Google search about using a table approach with an array to achieve the same effect. However, in my case the status of a player determines the score of a player, and this complicates things for me. Here are the 10 possible statuses (IF statements), broken down as follows with how scores are calculated:
TransferStatuses Cols
Playing 8+9+10+11
TW1-IN 9+10
TW1-OUT 8
TW2-IN 10+11
TW2-OUT 8+9
TW3-IN 11
TW3-OUT 8+9+10
IN1OUT2 9
IN1OUT3 9+10
IN2OUT3 10
8 = ColK (Transfer Window 0)
9 = ColL(Transfer Window 1)
10 = ColM (Transfer Window 2)
11 = ColN(Transfer Window 3)
The 'score' array will be along the lines as follows:
=VLOOKUP(C2,$S$2:$T$11,2,FALSE)
The problem is that I don't know how to put it all together to make it work, i.e. I have to extend my formula to 300 cells but I don't know how to implement it so that the array calculates the scores correctly for each player?
Can someone help?
If I understand you correctly I would approach it like this:
Set up a matrix of binary values that specify, for each status, which columns should be added up. Use OFFSET and MATCH to look up the status for each data row and return the array/range of binary values, and SUMPRODUCT to sum it all up.
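The binary-matrix idea can be illustrated with a small Python sketch. The flags below are taken from the status table in the question (columns 8 to 11, i.e. Transfer Windows 0 to 3); the names `STATUS_FLAGS` and `score` are hypothetical, the sample scores are made up, and returning 0 for an unknown status is an assumption that mirrors the fallback in the original nested IF.

```python
# One row per status: 1 means "include this column", 0 means "skip it".
# Column order: 8, 9, 10, 11 (Transfer Window 0, 1, 2, 3).
STATUS_FLAGS = {
    "Playing": (1, 1, 1, 1),
    "TW1-IN":  (0, 1, 1, 0),
    "TW1-OUT": (1, 0, 0, 0),
    "TW2-IN":  (0, 0, 1, 1),
    "TW2-OUT": (1, 1, 0, 0),
    "TW3-IN":  (0, 0, 0, 1),
    "TW3-OUT": (1, 1, 1, 0),
    "IN1OUT2": (0, 1, 0, 0),
    "IN1OUT3": (0, 1, 1, 0),
    "IN2OUT3": (0, 0, 1, 0),
}

def score(status, window_scores):
    """Equivalent of SUMPRODUCT(flags, scores) for one player row."""
    flags = STATUS_FLAGS.get(status)
    if flags is None:          # unknown status: assumed to score 0
        return 0
    return sum(flag * value for flag, value in zip(flags, window_scores))

# Made-up scores for Transfer Windows 0 to 3.
print(score("TW2-OUT", (5, 7, 11, 13)))  # 5 + 7 = 12
```

Adding an eleventh status then only means adding one more row of flags, rather than another nested IF.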
