Excel split given number into sum of other numbers - excel

I'm trying to write formulae that will split a given number into the sum of 4 other numbers.
The other numbers are 100,150,170 and 200 so the formula would be
x = a*100+b*150+c*170+d*200 where x is the given number and a,b,c,d are integers.
My spreadsheet is set up as where col B are x values, and C,D,E,F are a,b,c,d respectively (see below).
B | C | D | E | F |
100 1 0 0 0
150 0 1 0 0
200 0 0 0 1
250 1 1 0 0
370 0 0 1 1
400 0 0 0 2
I need formulae for columns C,D,E,F (which are a,b,c,d in the formula)
Your help is greatly appreciated.

UPDATE:
Based on the research below, for input numbers greater than 730 and/or for all actually divisible input numbers use the following formulas:
100s: =CHOOSE(MOD(ROUNDUP([#number]/10;0); 20)+1;
0;1;1;0;1;1;0;1;0;0;1;0;0;1;0;0;1;0;1;1)
150s: =CHOOSE(MOD(ROUNDUP([#number]/10;0); 10)+1;
0;0;1;1;0;1;1;0;0;1)
170s: =CHOOSE(MOD(ROUNDUP([#number]/10;0); 5)+1;
0;3;1;4;2)
200s: =CEILING(([#number]-930)/200;1) +
CHOOSE(MOD(ROUNDUP([#number]/10;0); 20)+1;
4;1;2;0;2;3;1;3;1;2;4;2;3;0;2;3;0;3;0;1)
MOD(x; 20) will return numbers 0 - 19, CHOOSE(x;a;b;...) will return n-th argument based on the first argument (1=>second argument, ...)
see more info about CHOOSE
use , instead of ; based on your Windows language&region settings
let's start with the assumption that you want to preferably use 200s over 170s over 150s over 100s - i.e. 300=200+100 instead of 300=2*150 and follow the logical conclusions:
the result set can only contain at most 1 100, at most 1 150, at most 4 170s and unlimited number of 200s (i started with 9 170s because 1700=8x200+100, but in reality there were at most 4)
there are only 20 possible subsets of (100s, 150s, 170s) - 2*2*5 options
930 is the largest input number without any 200s in the result set
based on observation of the data points, the subset repeats periodically for
number = 740*k + 10*l, k>1, l>0 - i'm not an expert on reverse-guessing on periodic functions from data, but here is my work in progress (charted data points are from the table at the bottom of this answer)
the functions are probably more complicated, if i manage to get them right, i'll update the answer
anyway for numbers smaller than 740, more tweaking of the formulas or a lookup table are needed (e.g. there is no way to get 730, so the result should be the same as for 740)
Here is my solution based on lookup formulas:
Following is the python script i used to generate the data points, formulas from the picture and the 60-row table itself in csv format (sorted as needed by the match function):
headers = ("100s", "150s", "170s", "200s")
table = {}
for c200 in range(30, -1, -1):
for c170 in range(9, -1, -1):
for c150 in range(1, -1, -1):
for c100 in range(1, -1, -1):
nr = 200*c200 + 170*c170 + 150*c150 + 100*c100
if nr not in table and nr <= 6000:
table[nr] = (c100, c150, c170, c200)
print("number\t" + "\t".join(headers))
for r in sorted(table):
c100, c150, c170, c200 = table[r]
print("{:6}\t{:2}\t{:2}\t{:2}\t{:2}".format(r, c100, c150, c170, c200))
__________
=IF(E$1<740; 0; INT((E$1-740)/200))
=E$1 - E$2*200
=MATCH(E$3; table[number]; -1)
=INDEX(table[number]; E$4)
=INDEX(table[100s]; E$4)
=INDEX(table[150s]; E$4)
=INDEX(table[170s]; E$4)
=INDEX(table[200s]; E$4) + E$2
__________
number,100s,150s,170s,200s
940,0,0,2,3
930,1,1,4,0
920,0,1,1,3
910,0,0,3,2
900,1,0,0,4
890,0,1,2,2
880,0,0,4,1
870,1,0,1,3
860,0,1,3,1
850,1,1,0,3
840,1,0,2,2
830,0,1,4,0
820,1,1,1,2
810,1,0,3,1
800,0,0,0,4
790,1,1,2,1
780,1,0,4,0
770,0,0,1,3
760,1,1,3,0
750,0,1,0,3
740,0,0,2,2
720,0,1,1,2
710,0,0,3,1
700,1,0,0,3
690,0,1,2,1
680,0,0,4,0
670,1,0,1,2
660,0,1,3,0
650,1,1,0,2
640,1,0,2,1
620,1,1,1,1
610,1,0,3,0
600,0,0,0,3
590,1,1,2,0
570,0,0,1,2
550,0,1,0,2
540,0,0,2,1
520,0,1,1,1
510,0,0,3,0
500,1,0,0,2
490,0,1,2,0
470,1,0,1,1
450,1,1,0,1
440,1,0,2,0
420,1,1,1,0
400,0,0,0,2
370,0,0,1,1
350,0,1,0,1
340,0,0,2,0
320,0,1,1,0
300,1,0,0,1
270,1,0,1,0
250,1,1,0,0
200,0,0,0,1
170,0,0,1,0
150,0,1,0,0
100,1,0,0,0
0,0,0,0,0

Assuming that you want as many of the highest values as possible (so 500 would be 2*200 + 100) try this approach assuming the number to split in B2 down:
Insert a header row with the 4 numbers, e.g. 100, 150, 170 and 200 in the range C1:F1
Now in F2 use this formula:
=INT(B2/F$1)
and in C2 copied across to E2
=INT(($B2-SUMPRODUCT(D$1:$G$1,D2:$G2))/C$1)
Now you can copy the formulas in C2:F2 down all columns
That should give the results from your table

Related

How to find if a number in a row matches a number in a column, and then automatically offset the number in the column to another value

How to find if a number in a row matches a number in a column, and then automatically offset the number in the column to another value. For example:
Row 0 1 2 3 4 5
Answer x x x x x x
Column
0 100
1 340
2 500
3 266
4 455
5 800
So if "0" in the Row array matches "0" in the Column array, then show 340 and so on. I can to this with nested IF statements but is there an easier way if you have 100s of columns. Thanks

Selecting rows where a numeric column value change sign through openpyxl

I'm learning Python and openpyxl for data analysis on a large xlsx workbook. I have a for loop that can iterate down an entire column. Here's some example data:
ROW: VALUE:
1 1
2 2
3 3
4 4
5 -4
6 -1
7 -6
8 2
9 3
10 -3
I want to print out the row in which the value changes from positive to negative, and vice versa. So in the above example, row number 5, 8, and 10 would print in the console. How can I use an if statement within a for loop to iterate through a column on openpyxl?
So far I can print all of the cells in a column:
import openpyxl
wb = openpyxl.load_workbook('ngt_log.xlsx')
sheet = wb.get_sheet_by_name('sheet1')
for i in range(1, 10508, 1): # 10508 is the length of the column
print(i, sheet.cell(row=i, column=6).value)
My idea was to just add an if statement inside of the for loop:
for i in range(1, 10508, 1): # 10508 is the length of the column
if(( i > 0 and (i+1) < 0) or (i < 0 and (i+1) > 0)):
print((i+1), sheet.cell(row=i, column=6).value)
But that doesn't work. Am I formulating the if statement correctly?
It looks to me as though your statement is contradicting itself
for i in range(1, 10508, 1): # 10508 is the length of the column
if(( i greater than 0 and (i+1) less than 0) or (i less than 0 and (i+1) greater than
0)):
print((i+1), sheet.cell(row=i, column=6).value)
I wrote the > and < symbols in plain English but if i is greater than 0 then i + 1 is never less than 0 and vise versa so they will never work as both cannot be true
You need to get the sheet.cell values first, and then do the comparisons:
end_range = 10508
for i in range(1, end_range):
current, next = sheet.cell(row=i, column=6).value, sheet.cell(row=i+1, column=6).value
if current > 0 and next < 0 or current < 0 and next > 0:
print(i+1, next)
I am pretty sure there's a sign() function in the math library, but kinda overkill. You may also want to figure out what you want to do if the values are 0.
You can use a flag to check for positive and negative.
ws = wb['sheet1'] # why people persist in using long deprecated syntax is beyond me
flag = None
for row in ws.iter_rows(max_row=10508, min_col=6, max_col=6):
cell = row[0]
sign = cell.value > 0 and "positive" or "negative"
if flag is not None and sign != flag:
print(cell.row)
flag = sign
You can write the rules to select the rows where the sign has changed and put them in a generator expression without using extra memory, like this:
pos = lambda x: x>=0
keep = lambda s, c, i, v: pos(s[c][x].value)!=pos(v.value)
gen = (x+1 for x, y in enumerate(sheet['f']) if x>0 and keep(sheet, 'f', x-1, y))
Then, when you need to know the rows where the sign has changed, you just iterate on gen as below:
for row in gen:
# here you use row

selecting different columns each row

I have a dataframe which has 500K rows and 7 columns for days and include start and end day.
I search a value(like equal 0) in range(startDay, endDay)
Such as, for id_1, startDay=1, and endDay=7, so, I should seek a value D1 to D7 columns.
For id_2, startDay=4, and endDay=7, so, I should seek a value D4 to D7 columns.
However, I couldn't seek different column range successfully.
Above-mentioned,
if startDay > endDay, I should see "-999"
else, I need to find first zero (consider the day range) and such as for id_3's, first zero in D2 column(day 2). And starDay of id_3 is 1. And I want to see, 2-1=1 (D2 - StartDay)
if I cannot find 0, I want to see "8"
Here is my data;
data = {
'D1':[0,1,1,0,1,1,0,0,0,1],
'D2':[2,0,0,1,2,2,1,2,0,4],
'D3':[0,0,1,0,1,1,1,0,1,0],
'D4':[3,3,3,1,3,2,3,0,3,3],
'D5':[0,0,3,3,4,0,4,2,3,1],
'D6':[2,1,1,0,3,2,1,2,2,1],
'D7':[2,3,0,0,3,1,3,2,1,3],
'startDay':[1,4,1,1,3,3,2,2,5,2],
'endDay':[7,7,6,7,7,7,2,1,7,6]
}
data_idx = ['id_1','id_2','id_3','id_4','id_5',
'id_6','id_7','id_8','id_9','id_10']
df = pd.DataFrame(data, index=data_idx)
What I want to see;
df_need = pd.DataFrame([0,1,1,0,8,2,8,-999,8,1], index=data_idx)
You can create boolean array to check in each row which 'Dx' column(s) are above 'startDay' and below 'endDay' and the value is equal to 0. For the first two conditions, you can use np.ufunc.outer with the ufunc being np.less_equal and np.greater_equal such as:
import numpy as np
arr_bool = ( np.less_equal.outer(df.startDay, range(1,8)) # which columns Dx is above startDay
& np.greater_equal.outer(df.endDay, range(1,8)) # which columns Dx is under endDay
& (df.filter(regex='D[0-9]').values == 0)) #which value of the columns Dx are 0
Then you can use np.argmax to find the first True per row. By adding 1 and removing 'startDay', you get the values you are looking for. Then you need to look for the other conditions with np.select to replace values by -999 if df.startDay >= df.endDay or 8 if no True in the row of arr_bool such as:
df_need = pd.DataFrame( (np.argmax(arr_bool , axis=1) + 1 - df.startDay).values,
index=data_idx, columns=['need'])
df_need.need= np.select( condlist = [df.startDay >= df.endDay, ~arr_bool.any(axis=1)],
choicelist = [ -999, 8],
default = df_need.need)
print (df_need)
need
id_1 0
id_2 1
id_3 1
id_4 0
id_5 8
id_6 2
id_7 -999
id_8 -999
id_9 8
id_10 1
One note: to get -999 for id_7, I used the condition df.startDay >= df.endDay in np.select and not df.startDay > df.endDay like in your question, but you can cahnge to strict comparison, you get 8 instead of -999 in this case.

how to get a kind of "maximum" in a matrix, efficiently

I have the following problem: I have a matrix opened with pandas module, where each cell has a number between -1 and 1. What I wanted to find is the maximum "posible" value in a row that is also not the maximum value in another row.
If for example 2 rows has their maximum value at the same column, I compare both values and take the bigger one, then for the row that has its maximum value smaller that the other row, I took the second maximum value (and do the same analysis again and again).
To explain myself better consider my code
import pandas as pd
matrix = pd.read_csv("matrix.csv")
# this matrix has an id (or name) for each column
# ... and the firt column has the id of each row
results = pd.DataFrame(np.empty((len(matrix),3),dtype=pd.Timestamp),columns=['id1','id2','max_pos'])
l = len(matrix.col[[0]]) # number of columns
while next = 1:
next = 0
for i in range(0, len(matrix)):
max_column = str(0)
for j in range(1, l): # 1 because the first column is an id
if matrix[max_column][i] < matrix[str(j)][i]:
max_column = str(j)
results['id1'][i] = str(i) # I coul put here also matrix['0'][i]
results['id2'][i] = max_column
results['max_pos'][i] = matrix[max_column][i]
for i in range(0, len(results)): #now I will check if two or more rows have the same max column
for ii in range(0, len(results)):
# if two id1 has their max in the same column, I keep it with the biggest
# ... max value and chage the other to "-1" to iterate again
if (results['id2'][i] == results['id2'][ii]) and (results['max_pos'][i] < results['max_pos'][ii]):
matrix[results['id2'][i]][i] = -1
next = 1
Putting an example:
#consider
pd.DataFrame({'a':[1, 2, 5, 0], 'b':[4, 5, 1, 0], 'c':[3, 3, 4, 2], 'd':[1, 0, 0, 1]})
a b c d
0 1 4 3 1
1 2 5 3 0
2 5 1 4 0
3 0 0 2 1
#at the first iterarion I will have the following result
0 b 4 # this means that the row 0 has its maximum at column 'b' and its value is 4
1 b 5
2 a 5
3 c 2
#the problem is that column b is the maximum of row 0 and 1, but I know that the maximum of row 1 is bigger than row 0, so I take the second maximum of row 0, then:
0 c 3
1 b 5
2 a 5
3 c 2
#now I solved the problem for row 0 and 1, but I have that the column c is the maximum of row 0 and 3, so I compare them and take the second maximum in row 3
0 c 3
1 b 5
2 a 5
3 d 1
#now I'm done. In the case that two rows have the same column as maximum and also the same number, nothing happens and I keep with that values.
#what if the matrix would be
pd.DataFrame({'a':[1, 2, 5, 0], 'b':[5, 5, 1, 0], 'c':[3, 3, 4, 2], 'd':[1, 0, 0, 1]})
a b c d
0 1 5 3 1
1 2 5 3 0
2 5 1 4 0
3 0 0 2 1
#then, at the first itetarion the result will be:
0 b 5
1 b 5
2 a 5
3 c 2
#then, given that the max value of row 0 and 1 is at the same column, I should compare the maximum values
# ... but in this case the values are the same (both are 5), this would be the end of iterating
# ... because I can't choose between row 0 and 1 and the other rows have their maximum at different columns...
This code works perfect to me if I have a matrix of 100x100 for example. But, if the matrix size goes to 50,000x50,000 the code takes to much time in finish it. I now that my code could be the most inneficient way to do it, but I don't know how to deal with this.
I have been reading about threads in python that could help but it doesn't help if I put 50,000 threads because my computer doesn't use more CPU. I also tried to use some functions as .max() but I'm not able to get column of the max an compare it with the other max ...
If anyone could help me of give me a piece of advice to make this more efficient I would be very grateful.
Going to need more information on this. What are you trying to accomplish here?
This will help you get some of the way, but in order to fully achieve what you're doing I need more context.
We'll import numpy, random, and Counter from collections:
import numpy as np
import random
from collections import Counter
We'll create a random 50k x 50k matrix of numbers between -10M and +10M
mat = np.random.randint(-10000000,10000000,(50000,50000))
Now to get the maximums for each row we can just do the following list comprehension:
maximums = [max(mat[x,:]) for x in range(len(mat))]
Now we want to find out which ones are not maximums in any other rows. We can use Counter on our maximums list to find out how many of each there are. Counter returns a counter object that is like a dictionary with the maximum as the key, and the # of times it appears as the value.
We then do dictionary comprehension where the value is == to 1. That will give us the maximums that only show up once. we use the .keys() function to grab the numbers themselves, and then turn it into a list.
c = Counter(maximums)
{9999117: 15,
9998584: 2,
9998352: 2,
9999226: 22,
9999697: 59,
9999534: 32,
9998775: 8,
9999288: 18,
9998956: 9,
9998119: 1,
...}
k = list( {x: c[x] for x in c if c[x] == 1}.keys() )
[9998253,
9998139,
9998091,
9997788,
9998166,
9998552,
9997711,
9998230,
9998000,
...]
Lastly we can do the following list comprehension to iterate through the original maximums list to get the indicies of where these rows are.
indices = [i for i, x in enumerate(maximums) if x in k]
Depending on what else you're looking to do we can go from here.
Its not the speediest program but finding the maximums, the counter, and the indicies takes 182 seconds on a 50,000 by 50,000 matrix that is already loaded.

How to generate all arrangements of two values in 5 columns using Excel?

I have two values: 1 and 0. And I have 5 columns. I need to generate all possible arrangements in Excel.
For example, I have 2 columns and 2 values: 0 , 1. There are only 4 possible arrangements (with repetitions):
1 | 0
0 | 1
0 | 0
1 | 1
I need to generate all posible arrangements of 1 and 0 for 5 columns. Number of possible arrangements with repetition is defined by formula: n^k.
So, for 5 columns and 2 values it is 2^5 = 32 arrangements.
In Excel:
and so on
Is it possible to automate it without typing ones and zeros manually?
You basically want to count from 0 to 31 in binary and then split the binary result out over the columns. You can do it like this:
Column A - just the number i.e. 0, 1, 2, 3, 4
Column B - =DEC2BIN(A2,5)
Columns C to G - =MID($B2,C$1,1) and then drag down and across
For example - for the formula to get the binary digit in the correct column:

Resources