Count Values for Each Number in a cell in a Column - excel

I have an excel sheet like the following, and would like to go down each row and add 1 to each of the numbers listed under the L3 column. Eventually, I would like to output something like this:
L3s Count Attr Ids
4770 10 [370, 380, ...]
6420 8 [481, 490...]
21253 20 [580....290]
... ... ...
The count is derived by going through all of the rows, and adding 1 to each L3 number whenever it is encountered. Attr IDs are the ids that contributed to the count. Is there any simple way to accomplish this in excel without having to vba/python?
Thanks in advance!

If you have windows Excel O365, you can use the following formulas:
(Note that I made the original data into a Table)
Sorted Unique list of the L3s:
=SORT(UNIQUE(FILTERXML("<t><s>" &SUBSTITUTE(SUBSTITUTE(TEXTJOIN("</s><s>",TRUE,Table1[L3s])," ",""),",","</s><s>")&"</s></t>","//s")))
Count of the L3s
=COUNT(FILTERXML("<t><s>" &SUBSTITUTE(SUBSTITUTE(TEXTJOIN("</s><s>",TRUE,Table1[L3s])," ",""),",","</s><s>")&"</s></t>","//s[.=" & F8 &"]"))
Associate Attr IDs
="[" &TEXTJOIN(",",TRUE,FILTER(Table1[attr],ISNUMBER(FIND(","&F8&",",SUBSTITUTE(","&Table1[L3s]& ","," ","")))))&"]"

Related

[Excel][Math] Formula to generate numbers between two limits

I am trying to fill a column in Excel with values that are generated randomly but the sum of all the values total to the end.
Example:
Starting value = 320
Final value = 350
Need to generate 2 more values which have random difference between them but total to 350 at the end, as in: 2nd val = 12, 3rd val = 18.
The script/formula should generate different values when run next (for another table etc.). It may generate, for the same starting and final values, 15 and 15 or 8 and 22 for the 2nd and 3rd values respectively etc.
Basically what the formula should do is: Find the difference between the starting and final values then randomly add a number to create the 2nd entry. Now the third entry should follow the same pattern but the value generated should end up totaling to the final.
The example is only for 2 values but I'm working on tables ranging from 15-30+ values.
I don't know if Excel can do it, or if there's a mathematical formula that will work here.
Thanks in advance for all the help!
You need something like this:
That gives:
The trick is to generate a list of uniform random numbers between 0 and 1 and then scale those numbers up the total that you're looking for.

Create a text cell value based on row entries and corresponding columns

I understand this is a tough way of wording the problem I have. Please try and help me.
I want to create a Column called Orders which contains cells based on corresponding item values.
So if I have columns: FlatNo, Truffle, Pineapple, Mango, Chocochips; I want to create a column called Orders which has value:
FlatNo - A51
Mango - 1
Chocochips - 1
(if no values in the Pineapple & Truffle Columns, none show up in Orders columns)
See image
How do I do that ? Thank you in advance
You can use IF and &. & simply puts the different desired things altogether.
Hope the following formula will get you the result for column orders. I have put the number of each item ordered inside parentheses before the item.
="Flat No. "&A2&IF(ISBLANK(B2),"","-("&B2&")"&$B$1)&IF(ISBLANK(C2),"","-("&C2&")"&$C$1)&IF(ISBLANK(D2),"","-("&D2&")"&$D$1)&IF(ISBLANK(E2),"","-("&E2&")"&$E$1)
For instance the third order is shown like this: Flat No. E-23-(1)Truffle -1 Pc Rs 60-(3)Mango -1 Pc Rs 60

comparison of two excel sheet and write output

Am writing a python code using xlrd and xlwt to compare two excel sheet and writing output in third sheet.
For example
Sheet 1
nativeEMSName
HR_MEWT_XX5906_TR_I_HR10001
HR_LOHN_5811X_T01_C_X_HO55001
HR_PHKL_XX6541_TR_I_HR10001
HR_RWRI_XX3608_TR_I_HR10001
HR_KTHL_XX6382_AR_I_HR50001
ABC
HR_KURU_XX3714_TR_I_HR10001
HR_RWRI_XX1142_TR_I_HR10001
HR_SAHU_SAHUW_B01_C_X_EX10001
HR_KTHL_XX3622_TR_I_HR10001
Sheet2
nativeEMSName id
HR_KURU_XX3714_TR_I_HR10001 66
HR_PHKL_XX6541_TR_I_HR10001 999
HR_MEWT_XX5906_TR_I_HR10001 2
HR_KTHL_XX6382_AR_I_HR50001 7777
HR_KTHL_XX3622_TR_I_HR10001 4
HR_SAHU_SAHUW_B01_C_X_EX10001 3
HR_LOHN_5811X_T01_C_X_HO55001 111
HR_RWRI_XX1142_TR_I_HR10001 55
HR_RWRI_XX3608_TR_I_HR10001 888
am finding sheet1's nativeEMSName in sheet2 and write nativeEMSName and respective ID in sheeet3.
Below code is using for same
conls=0
colnd=0
for rowsr in range(sheet1.nrows):
test=sheet1.cell(rowsr,colns).value
for rowdr in range(sheet2.nrows):
test1=sheet2.cell(rowdr,colnd).value
if test==test1:
ID = sheet2.cell(rowdr, colnd +1).value
sheet3.write(rowsr,colns,ID)
sheet3.write(rowsr,colnd+1,test1)
wb.save('test.xls')
break
But the challenge is when noumber of row is like 30k in both sheet then code taking too much time to execute. I want to reduce execution time.
Any help will appreciate to optimize this code or use another way to get output in shortest time.
Your look-up is O(mxn) where m is number of rows in sheet1, n is number of rows in sheet2.
I assume there are no duplicates. I think this will work better. It is also easy to find bugs because you have data in map
1. Read sheet1 into a list
sheet1_list = [HR_MEWT_XX5906_TR_I_HR10001,
HR_LOHN_5811X_T01_C_X_HO55001,
HR_PHKL_XX6541_TR_I_HR10001,
HR_RWRI_XX3608_TR_I_HR10001, and_so_on.. ]
2. Read sheet2 (look-up sheet) into a map
sheet2_map = {
'HR_KURU_XX3714_TR_I_HR10001' : '66',
'HR_PHKL_XX6541_TR_I_HR10001' : '999',
'HR_MEWT_XX5906_TR_I_HR10001' : '2,
'HR_KTHL_XX6382_AR_I_HR50001' : '7777',
<So on..>
}
3. Loop list and find id,
Here, individual look up is O(1), and total time is O(n), n is number of entries in sheet1, hence reduces time.
for key in sheet1_list:
print(key, sheet2_map[key]) #check by print
sheet3_map[key] = sheet2_map[key] # inserts key:id, like {'HR_MEWT_XX5906_TR_I_HR10001': '2' }
4. convert_map_to_excel( sheet3_map) ,
there are libraries but, its easy, use xlwt to write xls. https://xlwt.readthedocs.io/en/latest/

Find values occurring in multiple columns in excel

I have sets of gene probes that are upregulated when put under different chemical stresses. Each column contains all of the upregulated gene probes. I have 12 columns, how do I get a list of gene probes that appear in all 12 columns?
I've been able to find similarities between two columns using the formula
=IF(ISERROR(MATCH(A2,$C$2:$C$21473,0)),"",A2)
but cant work out how to adapt it to include 12 columns
G.Ac G.As G.At G.Ac.At G.As.Ac G.As.At G.Cd G.Cu G.Ni
G.Cd.Cu G.Cd.Ni G.Ni.Cu
GENE:JGI_V11_3346220103 GENE:JGI_V11_2653050203 GENE:JGI_V11_3299790103
GENE:JGI_V11_359040103 GENE:JGI_V11_2228010103 GENE:JGI_V11_2662750203
GENE:JGI_V11_1926920303 GENE:JGI_V11_3134270303 GENE:JGI_V11_3119540303
GENE:JGI_V11_3134270203 GENE:JGI_V11_1926920303 GENE:JGI_V11_3134270303
GENE:JGI_V11_3164760203 GENE:JGI_V11_565470303 GENE:JGI_V11_2296170203
GENE:JGI_V11_2045300203 GENE:JGI_V11_2421620203 GENE:JGI_V11_2228010303
GENE:JGI_V11_2196580303 GENE:JGI_V11_3134270203 GENE:JGI_V11_3119540203
GENE:JGI_V11_1926920103 GENE:JGI_V11_1926920103 GENE:JGI_V11_1014720202
GENE:JGI_V11_478830203 GENE:JGI_V11_3168730303 GENE:JGI_V11_3311070202
GENE:JGI_V11_3216620102 GENE:JGI_V11_2653050303 GENE:JGI_V11_3300140202
GENE:JGI_V11_2653050303 GENE:JGI_V11_1159220202 GENE:JGI_V11_2024180303
GENE:JGI_V11_1926920303 GENE:JGI_V11_2196580303 GENE:JGI_V11_1159220202
GENE:JGI_V11_3164760303 GENE:JGI_V11_2228010203 GENE:JGI_V11_2341670203
GENE:JGI_V11_1938910303 GENE:JGI_V11_3026230203 GENE:JGI_V11_2449230203
GENE:JGI_V11_3134270303 GENE:JGI_V11_2235750203 GENE:JGI_V11_1981410203
GENE:JGI_V11_3251310202 GENE:JGI_V11_977750103 GENE:JGI_V11_954070203
GENE:JGI_V11_2267320203 GENE:JGI_V11_2268000303 GENE:JGI_V11_2226270101
GENE:JGI_V11_3003640303 GENE:JGI_V11_223520203 GENE:JGI_V11_2662750103
GENE:JGI_V11_2228010103 GENE:JGI_V11_3251310202 GENE:JGI_V11_3198630203
GENE:JGI_V11_3134270303 GENE:JGI_V11_1926920203 GENE:JGI_V11_287750103
GENE:JGI_V11_465160203 GENE:JGI_V11_2268000203 GENE:JGI_V11_2473230303
GENE:JGI_V11_3192220102 GENE:JGI_V11_3026230303 GENE:JGI_V11_3039310303
GENE:JGI_V11_1926920103 GENE:JGI_V11_1159220102 GENE:JGI_V11_3052790202
GENE:JGI_V11_3075830303 GENE:JGI_V11_2196580203 GENE:JGI_V11_3134280203
GENE:JGI_V11_3142970303 GENE:JGI_V11_503720303 GENE:JGI_V11_2236410103
GENE:JGI_V11_3042230103 GENE:JGI_V11_2228010203 GENE:JGI_V11_3028210101
GENE:JGI_V11_2105710303 GENE:JGI_V11_1926920303 GENE:JGI_V11_2131620103
GENE:JGI_V11_1002840203 GENE:JGI_V11_2088480203 GENE:JGI_V11_3196120102
Heres the first 8 rows of the 12 columns. There are 21473 rows in total.
Thanks
You could use an array formula like this to count how many columns a particular gene probe occurs in
=SUM(--(MMULT(TRANSPOSE(ROW(A$2:L$10000)^0),N(A$2:L$10000=A2))>0))
This is a standard way of getting column totals for a 2D array - in this case an array of true/false values corresponding to instances of an array element being equal/unequal to A2.
It is rather a brute force approach - it needs ~120K multiplications for each row. If you copy the formula down for ~10K rows, there is a delay of ~100 seconds on my computer while Excel works out the results.
Must be entered as an array formula using CtrlShiftEnter
In this dummy data C is the only value that occurs in all 12 columns.

Return a row number that matches multiple criteria in vbs excel

I need to be able to search my whole table for a row that matches multiple criteria. We use a program that outputs data in the form of a .csv file. It has rows that separate sets of data, each of these headers don't have any columns that are unique in of them self but if i searched the table for multiple values i should be able to pinpoint each header row. I know i can use Application.WorksheetFunction.Match to return a row on a single criteria but i need to search on two three or four criteria.
In pseudo-code it would be something like this:
Return row number were column A = bill & column B = Woods & column C = some other data
We need to work with arrays:
There are 2 kinds of arrays:
numeric {1,0,1,1,1,0,0,1}
boolean {TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE}
to convert between them we can use:
MATCH function
MATCH(1,{1,0,1,1,1,0,0,1},0) -> will result {TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE}
simple multiplication
{TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE}*{TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE} -> will result {1,0,1,1,1,0,0,1}
you can can check an array in the match function, entering it like in the picture below, be warned that MATCH function WILL TREAT AN ARRAY AS AN "OR" FUNCTION (one match will result in true
ie:
MATCH(1,{1,0,1,1,1,0,0,1},0)=TRUE
, YOU MUST CTR+SHIFT+ENTER !!! FOR IT TO GIVE AN ARRAY BACK!!!
in the example below i show that i want to sum the hours of all the employees except the admin per case
we have 2 options, the long simple way, the complicated fast way:
long simple way
D2=SUMPRODUCT(C2:C9,(A2=A2:A9)*("admin"<>B2:B9)) <<- SUMPRODUCT makes a multiplication
basically A1={2,3,11,3,2,4,5,6}*{0,1,1,0,0,0,0,0} (IT MUST BE A NUMERIC ARRAY TO THE RIGHT IN SUMPRODUCT!!!)
ie: A1=2*0+3*1+11*1+3*0+2*0+4*0+5*0+6*0
this causes a problem because if you drag the cell to autocomplete the rest of the cells, it will edit the lower and higher values of
ie: D9=SUMPRODUCT(C9:C16,(A9=A9:A16)*("admin"<>B9:B16)), which is out of bounds
same as the above if you have a table and want to view the results in a diferent order
the fast complicated way
D3=SUMPRODUCT(INDIRECT("c2:c9"),(A3=INDIRECT("a2:a9"))*("admin"<>INDIRECT("b2:b9")))
it's the same, except that INDIRECT was used on the cells that we want not be modified when autocompleting or table reorderings
be warned that INDIRECT sometimes give VOLATILE ERROR,i recommend not using it on a single cell or using it only once in an array
f* c* i cant post pictures :(
table is:
case emplyee hours totalHoursPerCaseWithoutAdmin
1 admin 2 14
1 him 3 14
1 her 11 14
2 him 3 5
2 her 2 5
3 you 4 10
3 admin 5 10
3 her 6 10
and for the functions to check the arrays, open the insert function button (it looks like and fx) then doubleclick MATCH and then if you enter inside the Lookup_array a value like
A2=A2:A9 for our example it will give {TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE} that is because only the first 3 lines are from case=1
Something like this?
Assuming that you data in in A1:C20
I am looking for "Bill" in A, "Woods" in B and "some other data" in C
Change as applicable
=IF(INDEX(A1:A20,MATCH("Bill",A1:A20,0),1)="Bill",IF(INDEX(B1:B20,MATCH("Woods",B1:B20,0),1)="Woods",IF(INDEX(C1:C20,MATCH("some other data",C1:C20,0),1)="some other data",MATCH("Bill",A1:A20,0),"Not Found")))
SNAPSHOT
I would use this array* formula (for three criteria):
=MATCH(1,((Range1=Criterion1)*(Range2=Criterion2)*(Range3=Criterion3)),0)
*commit with Ctrl+Shift+Enter

Resources