Excel - Counting the number of occurrences of each bit string in a large range

I have 13-bit strings in a column, and I want to count how many cells contain each unique combination of bits. The column has 209,066 cells in total, and I am stuck on how to do this.
Since there are 2^13 = 8192 possible combinations, it is quite tedious to find the unique combinations one by one and then write, in the adjacent cell, how many times each value occurs.
13 Bit String Occurrence
1001111101011
0011111010110
0111110101101
1111101011011
1111010110110
1110101101100
1101011011000
1010110110000
0101101100001
1011011000010
0110110000101
1101100001011
1011000010111
0110000101110
1100001011101
1000010111011
0000101110111
0001011101110
0010111011100
0101110111001
1011101110011
0111011100111
1110111001110
1101110011101
1011100111010
0111001110101
1110011101011
1100111010110
1001110101100
0011101011001
0111010110011
1110101100110
1101011001100
1010110011000
0101100110000
1011001100001
0110011000011
1100110000110
1001100001100
0011000011001
0110000110011
1100001100110
.......
[continued up to cell 209066]

Highlight the column with the bits.
Data --> Sort & Filter --> Advanced
Enable "Copy to another location" and "Unique records only".
List range should be where all the bits are (already highlighted).
Leave Criteria range blank.
Copy to should be a blank cell, in the same worksheet, with nothing in the rows below it.
Assume the bits are in column A and the unique values in column D. Put this formula in E1 and fill down alongside the unique values:
=COUNTIF($A:$A,"="&D1)
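Outside Excel, the same unique-value/count table is a plain frequency count. Here is a minimal Python sketch. It assumes the column has been exported as a list of strings, and the function name and three-row sample are hypothetical:

```python
from collections import Counter


def occurrence_table(bit_strings):
    """Return (bit_string, count) pairs in first-seen order, like
    Advanced Filter "Unique records only" followed by COUNTIF."""
    counts = Counter(bit_strings)  # Counter keeps first-insertion order (Python 3.7+)
    return list(counts.items())


# Hypothetical sample: a few rows from the question, with one value repeated.
sample = ["1001111101011", "0011111010110", "1001111101011"]
for bits, n in occurrence_table(sample):
    print(bits, n)
```

Counting in one pass like this avoids running a separate COUNTIF over all 209,066 rows for every unique value.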

Parse your data with Text to Columns (Fixed width, one character per field), say into columns A:M, working on a copy. Then:
=SUM(A2:M2)
say in N2, and:
=COUNTIF(N:N,N2)
say in O2, with N2 and O2 copied down to suit.
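Note that these two formulas group rows by how many 1 bits they contain, not by the exact bit pattern. The minimal Python sketch below shows what columns N and O end up holding. It assumes the strings are available as a Python list, and the function name and sample are illustrative only:

```python
from collections import Counter


def ones_and_frequency(bit_strings):
    """For each row, return (column N, column O):
    N = number of 1 bits (=SUM(A2:M2) after Text to Columns),
    O = how many rows share that N (=COUNTIF(N:N,N2))."""
    ones = [s.count("1") for s in bit_strings]  # column N
    freq = Counter(ones)                        # COUNTIF over column N, computed once
    return [(n, freq[n]) for n in ones]         # column O next to each row


print(ones_and_frequency(["1100000000000", "0000000000011", "1110000000000"]))
```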

After reading everything again, you are probably looking for something like this:
=SUM((COUNTIF(A2:A209066,A2:A209066)=1)*1)
This is an array formula and must be confirmed with ctrl+shift+enter.
This counts each value that is unique (appears exactly once) within the range.
But you may be looking for the count of distinct values, which is:
=SUM(IFERROR(1/COUNTIF(A2:A209066,A2:A209066),0))
This is an array formula and must be confirmed with ctrl+shift+enter.
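The difference between the two array formulas can be sketched in Python. The helper names are hypothetical. Blanks are represented as empty strings and skipped, on the assumption that in Excel COUNTIF gives a blank criterion a count of 0, so it never equals 1 and its 1/0 error is turned into 0 by IFERROR:

```python
from collections import Counter
from fractions import Fraction


def _non_blank(values):
    return [v for v in values if v != ""]


def count_exactly_once(values):
    """=SUM((COUNTIF(r,r)=1)*1): cells whose value appears only once."""
    vals = _non_blank(values)
    counts = Counter(vals)
    return sum(1 for v in vals if counts[v] == 1)


def count_distinct(values):
    """=SUM(IFERROR(1/COUNTIF(r,r),0)): a value with k copies adds k * (1/k) = 1.
    Fraction keeps the sum exact instead of relying on float rounding."""
    vals = _non_blank(values)
    counts = Counter(vals)
    return int(sum(Fraction(1, counts[v]) for v in vals))


data = ["a", "b", "a", "c", "c", "c", ""]
print(count_exactly_once(data), count_distinct(data))
```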
EDIT
If you just want the count shown next to the first occurrence of each value, then this may be what you need (in B2, copied down):
=IFERROR(IF(MATCH(A2,A:A,0)=ROW(),COUNTIF(A:A,A2),""),"")
(no array formula this time) ;)
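The non-array formula writes the count only on the row where a value first appears. Here is a minimal Python sketch of the same idea, using a hypothetical function name and values held in a list:

```python
from collections import Counter


def first_occurrence_counts(values):
    """Column B next to column A: the count on the first row of each value, "" elsewhere."""
    counts = Counter(values)
    seen = set()
    out = []
    for v in values:
        if v in seen:
            out.append("")          # MATCH(...) <> ROW(): not the first occurrence
        else:
            seen.add(v)
            out.append(counts[v])   # first occurrence: COUNTIF(A:A, v)
    return out


print(first_occurrence_counts(["x", "y", "x"]))
```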

Related

How to count unique/distinct visible values in a filtered column, counting blank cells as well?

The references I found do not cover treating blank cells the same as non-blank ones.
I found this array formula (ref: extendoffice.com):
=SUM(IF(FREQUENCY(IF(SUBTOTAL(3,OFFSET(D2,ROW(D2:D22)-ROW(D2),,1)), IF(D2:D22<>"",MATCH("~"&D2:D22,D2:D22&"",0))),ROW(D2:D22)-ROW(D2)+1),1))
Any guidance is appreciated.
Update
F22: result calculated for Table1[Column1]
G22: result calculated for Table1[Column2]
H22: result calculated for Table1[Column3]
I want the formula to return G22=4 and H22=1.
Note: my table has a filtered range, and I am calculating over the visible values only.
In the formula you quote, the SUBTOTAL part is used to consider only visible cells, but it also ignores blanks. So if you want to include blanks as another distinct value to be counted, that's a problem.
Do you have any column that you know will be fully populated (e.g. column A)? If so, you can base the SUBTOTAL part on that column and the distinct count on the actual column in question. For example, assuming A2:A22 will always be fully populated, try this version to count distinct values in D2:D22 (including blanks):
=SUM(IF(FREQUENCY(IF(SUBTOTAL(3,OFFSET($A2,ROW($A2:$A22)-ROW($A2),,1)),MATCH("~"&D2:D22,D2:D22&"",0)),ROW(D2:D22)-ROW(D2)+1),1))
confirmed with CTRL+SHIFT+ENTER
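The logic of the helper-column version can be sketched in Python. The sketch assumes column D is given as a list of strings with "" for blanks. A parallel list of booleans says which rows survive the filter, which is the role the SUBTOTAL(3,OFFSET(...)) on the fully populated column plays. Names and data are hypothetical, and unlike Excel's MATCH this compares text case-sensitively:

```python
def count_distinct_visible(values, visible):
    """values: column D as strings, "" for blank cells.
    visible: True for rows the filter leaves visible.
    Each visible row is mapped to the first position of its text in the whole
    column (the MATCH part, where blanks become ""), and the distinct positions
    are counted (the FREQUENCY/SUM part)."""
    first_pos = {}
    for i, v in enumerate(values):
        first_pos.setdefault(v, i)
    return len({first_pos[v] for v, shown in zip(values, visible) if shown})


print(count_distinct_visible(["a", "", "a", "b", ""], [True, True, False, False, True]))
```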

How to report (copy) cell values to another table in Excel

I'm entering data in column A of a table. Based on the data, I calculate sums starting from the yellow cell (this yellow cell holds the highest value in column A). So every 10 cells there is a sum, until the end of the data.
I'm looking for a way to automatically copy the first seven grey cells (the sums) to another table. The problem is that, depending on the data, the highest value is not always in the same place, so the sums are not in the same place either.
How can I do this?
Thank you for your help
My error:
And the message I get when I press Ctrl+Shift+Enter at the same time:
You might use this array formula in your report.
=INDEX($F:$F,SMALL(ROW($F$4:$F$117)+(100*(F$4:$F$117="")), ROW(F1)))&""
Bear in mind that, as an array formula, it must be confirmed with Control+Shift+Enter. Enter the formula in the row where you have Somme = 1, then copy it down to 6. Note that ROW(F1) is a counter. You have a similar counter (1 to 6) in F124:F130, so to make the formula easier to understand you could replace ROW(F1) with $F124 (if that is where the "1" is).
The formula retrieves the value of the 1st, 2nd, 3rd etc non-blank cell in the range F4:F117. If those cells contain a formula they will be considered "blank" if their result equals "".
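The SMALL trick works because blank rows get a large number added to their row numbers, so they sort after every non-blank row. That only holds if the added amount is larger than the span of rows. Here is a minimal Python sketch, assuming the block starts on row 4 as in F4:F117. The function name is hypothetical:

```python
def nth_non_blank(column, n, first_row=4):
    """column: cells of F4:F117 (or similar) as strings, "" for blank.
    Mirrors INDEX(F:F, SMALL(ROW(block) + offset*(block=""), n)) & ""."""
    offset = first_row + len(column)  # larger than any real row number in the block
    keys = sorted(first_row + i + (offset if v == "" else 0)
                  for i, v in enumerate(column))
    row = keys[n - 1]                 # SMALL(..., n)
    if row >= offset:                 # only blank rows are left
        return ""
    return column[row - first_row]    # INDEX(F:F, row)


cells = ["", "10", "", "", "25"]
print(nth_non_blank(cells, 1), nth_non_blank(cells, 2), repr(nth_non_blank(cells, 3)))
```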
BTW, if you don't always have 113 results to evaluate you might consider giving a name to the range E4:E117. For example, if you name that range as "Results" then =SUM(Results) would be the same as =SUM($E$4:$E$117), but as you insert or delete rows within the named range the formula doesn't need to be amended. Use of a named range would simplify understanding your existing formula. You could do the same with column F.
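Finally I found a solution to copy the values from column F to another table. Since the positions of the values depend on the row of the MAX in column E (every 10 cells), I made this formula. It is written for French Excel, where EQUIV is MATCH, GRANDE.VALEUR is LARGE, and the semicolon is the argument separator:
For the first: =INDEX(E4:F117;EQUIV(GRANDE.VALEUR($E$4:$E$117;1);$E$4:$E$117;0)+10;2)
For the second: =INDEX(E4:F117;EQUIV(GRANDE.VALEUR($E$4:$E$117;1);$E$4:$E$117;0)+20;2)
Etc.
In English-locale Excel, the first one reads =INDEX(E4:F117,MATCH(LARGE($E$4:$E$117,1),$E$4:$E$117,0)+10,2).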
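The lookup can also be sketched in Python to show the arithmetic. The sketch finds the position of the maximum in E, steps down 10, 20, ... rows, and reads column F. It assumes the sums really sit exactly every 10 rows after the maximum. The function name and defaults are hypothetical:

```python
def sums_after_max(col_e, col_f, count=7, step=10):
    """Mirrors INDEX(E:F, MATCH(LARGE(E,1), E, 0) + k*step, 2) for k = 1..count.
    Python positions are 0-based and Excel's are 1-based, but the +k*step
    offset is the same either way."""
    start = col_e.index(max(col_e))      # MATCH(LARGE(E,1), E, 0), 0-based
    out = []
    for k in range(1, count + 1):
        pos = start + k * step
        # Past the end of the block Excel would return #REF!; None stands in for it.
        out.append(col_f[pos] if pos < len(col_f) else None)
    return out


col_e = [0] * 30
col_e[1] = 5                             # hypothetical maximum on the 2nd row
print(sums_after_max(col_e, list(range(30)), count=3))
```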

Find which range a value falls in, in Excel

I have two different sheets in Excel, with 300,000 rows of data.
First sheet contains:
S2_Symbol Start_Pos End Position
STE 254857 267891
PRI 748578 758962
ILA 852741 963369
VIS 789456 796325
Second:
S1_Location
789460
852898
748678
My output should be like this:
S1_Location Symbol
789460 VIS
852898 ILA
748678 PRI
I need to find which S2 range each S1_Location falls in, and return the corresponding Symbol. I have used an INDEX formula in Excel, but I have to change the reference cell manually for each cell, which I can't do for 300,000 rows.
How can I do this in Excel, or should I use a script?
This solution assumes the following:
Start and End Positions for each S2 Symbol are unique (i.e. there is no intersection between the ranges allocated to each symbol)
Data in first sheet is located at A1:D17 (adjust ranges in formulas as needed)
Data in the second sheet is located at A1:B300010 (adjust ranges in formulas as needed)
The solution requires:
Adding a helper column in worksheet one: enter this formula in D2 and copy it down to the last record.
=ROWS($A$1:$A2)
Fig. 1
Then in second worksheet enter this formula at B2 and copy till last record.
=INDEX( Sheet1!$A$1:$A$17,
SUMIFS( Sheet1!$D$1:$D$17,
Sheet1!$B$1:$B$17, "<=" & $A2, Sheet1!$C$1:$C$17, ">=" & $A2 ) )
Fig. 2
It took less than about 14 seconds to copy the formulas down and calculate them in sheet 2.
As can be seen in figures 1 and 2, neither table needs to be sorted.
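The SUMIFS step works only because the ranges do not overlap. At most one row satisfies both conditions, so the sum of the helper column is that row's number, or 0 when nothing matches. Here is a Python sketch under the same assumption, using the sample data from the question. The function name is hypothetical:

```python
def symbol_by_sumifs(symbols, starts, ends, x):
    """Helper column D holds row numbers 2, 3, ... (=ROWS($A$1:$A2) copied down).
    SUMIFS adds the row numbers of every range with start <= x <= end; with
    non-overlapping ranges that is one row, or 0 when no range contains x."""
    row = sum(i + 2 for i, (s, e) in enumerate(zip(starts, ends)) if s <= x <= e)
    return symbols[row - 2] if row else None   # INDEX(A1:A17, row); None for "no match"


symbols = ["STE", "PRI", "ILA", "VIS"]
starts = [254857, 748578, 852741, 789456]
ends = [267891, 758962, 963369, 796325]
print([symbol_by_sumifs(symbols, starts, ends, x) for x in (789460, 852898, 748678)])
```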
Assuming both sheets start in A1, and First sheet ColumnB is sorted ascending, in Second sheet B2 please try:
=INDEX(First!A:A,MATCH(A2,First!B:B))
copied down to suit. It relies on inexact matching.
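MATCH with the default match_type of 1 behaves like a binary search for the last start that is less than or equal to the lookup value. Here is a minimal Python sketch using the standard bisect module. It assumes the starts are already sorted. Like the formula, it does not check End_Pos, so a value in a gap between ranges still returns the previous symbol:

```python
from bisect import bisect_right


def symbol_by_approx_match(symbols, starts, x):
    """MATCH(x, B:B) with match_type 1: position of the largest start <= x.
    starts must be sorted ascending; End_Pos is not checked."""
    pos = bisect_right(starts, x)                # how many starts are <= x
    return symbols[pos - 1] if pos else None     # #N/A in Excel when x < first start


symbols = ["STE", "PRI", "VIS", "ILA"]           # question data sorted by Start_Pos
starts = [254857, 748578, 789456, 852741]
print([symbol_by_approx_match(symbols, starts, x) for x in (789460, 852898, 748678)])
```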
Assuming we have a Sheet1 like this:
Note: Sheet1 is sorted by Start_Pos and End_Pos in ascending order.
and a Sheet2 like this:
Then the formula in Sheet2!B2 downwards could be:
=INDEX(Sheet1!A:A,IF(MATCH(A2,Sheet1!B:B)>IFERROR(MATCH(A2-(10^-10),Sheet1!C:C),0),MATCH(A2,Sheet1!B:B),NA()))
See MATCH: https://support.office.com/en-us/article/MATCH-function-e8dffd45-c762-47d6-bf89-533f4a37673a
The idea is this. MATCH without exact matching (i.e. with the match_type argument omitted) returns the position of the largest value that is less than or equal to the search value. So on the Start_Pos column it returns the row from which we can get the S2_Symbol. On the End_Pos column, however, it should return the row before that if the value lies inside one of the given ranges.
There is only one exception. If the value exactly equals a value in the End_Pos column, MATCH returns the same row as in the Start_Pos column. To handle this exception, we search the End_Pos column for a slightly smaller value. Thanks to Tom Sharpe for his comment.
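The two-MATCH test can also be written with bisect in Python. Here bisect_left on the end positions counts ends strictly below x, which is what the A2-10^-10 shift approximates, so no epsilon is needed. This is a sketch that assumes sorted, non-overlapping ranges, and the function name is hypothetical:

```python
from bisect import bisect_left, bisect_right


def symbol_in_range(symbols, starts, ends, x):
    """starts and ends sorted ascending, ranges non-overlapping.
    MATCH(x, Start_Pos)       -> number of starts <= x
    MATCH(x - tiny, End_Pos)  -> number of ends < x
    x lies inside a range exactly when the first count is larger."""
    n_started = bisect_right(starts, x)
    n_ended = bisect_left(ends, x)    # strict "<" without an epsilon
    if n_started > n_ended:
        return symbols[n_started - 1]
    return None                       # NA() in the worksheet formula


symbols = ["STE", "PRI", "VIS", "ILA"]
starts = [254857, 748578, 789456, 852741]
ends = [267891, 758962, 796325, 963369]
print([symbol_in_range(symbols, starts, ends, x) for x in (796325, 800000, 852741)])
```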
The formula in Sheet2!D2 downwards is:
{=INDEX(Sheet1!A:A,MIN(IF($A2>=Sheet1!$B$2:$B$300000,IF($A2<=Sheet1!$C$2:$C$300000,ROW(Sheet1!$A$2:$A$300000),2^20+1))))}
This is an array formula that follows the requirements exactly, but it performs very poorly when used in many cells. On the other hand, with this formula Sheet1 does not need to be sorted.
Benchmark test:
Have the following Sheet1:
Formulas:
A2:A300002: ="S"&(ROW(A1)-1)*10&"-"&(ROW(A1)-1)*10+7
B2:B300002: =(ROW(A1)-1)*10
C2:C300002: =B2+7
and the following Sheet2:
Formulas:
A2:A300002: =RANDBETWEEN(0,3000007)
B2:B300002: =INDEX(Sheet1!A:A,IF(MATCH(A2,Sheet1!B:B)>IFERROR(MATCH(A2-10^-9,Sheet1!C:C),0),MATCH(A2,Sheet1!B:B),NA()))
Note the -10^-9 instead of -10^-10 in the previous version. This is because we only have about 16 digits of precision. In the previous version, the values had at most a 6-digit integer part, leaving 10 digits for the decimal part. Now they have up to a 7-digit integer part, leaving only 9 decimal digits.
Calculation after pressing F9 in Sheet2 takes ca. 2 s. (Excel 2007, Windows 7, 4 core processor).
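The precision argument can be checked with Python floats, which use the same IEEE 754 double format that Excel stores numbers in. Excel's display is limited to 15 significant digits, so treat this as an approximation of what happens inside Excel. Near 3,000,007 the gap between neighbouring doubles is about 4.7e-10, so subtracting 1e-10 is lost entirely:

```python
# Python floats are IEEE 754 doubles, the format Excel uses to store numbers.
for value in (999_999, 3_000_007):     # a 6-digit and a 7-digit lookup value
    for eps in (1e-10, 1e-9):
        shifted = float(value) - eps
        # False means the subtraction was rounded away and the value is unchanged
        print(value, eps, shifted < value)
```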
I would have gone for something like this, which gives you the first match if there is one:
=INDEX(First!A:A,MATCH(1,(First!B:B<=A2)*(First!C:C>=A2),0))
assuming keys and start and end values are in a sheet called First and lookup values start in A2.
This is an array formula, which must be entered with Ctrl+Shift+Enter.
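The (condition1)*(condition2) trick turns two TRUE/FALSE arrays into a single 1/0 array, because multiplication acts as AND. MATCH(1, ..., 0) then returns the first 1. Here is a minimal Python sketch of that logic, with a hypothetical function name and the question's sample rows. It also shows why the approach is slow, since every lookup scans every row:

```python
def first_match(symbols, starts, ends, x):
    """=INDEX(A, MATCH(1, (B<=x)*(C>=x), 0)) on unsorted data."""
    flags = [int(s <= x) * int(e >= x) for s, e in zip(starts, ends)]  # 1 only if both hold
    try:
        return symbols[flags.index(1)]   # first 1, like MATCH(1, ..., 0)
    except ValueError:                   # no 1 at all: #N/A in Excel
        return None


symbols = ["STE", "PRI", "ILA", "VIS"]   # unsorted, as in the question
starts = [254857, 748578, 852741, 789456]
ends = [267891, 758962, 963369, 796325]
print(first_match(symbols, starts, ends, 789460))
```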
In response to the question from @pnuts about how long it will take, I set up a similar benchmark with 300,000 rows in each sheet. It reached 1% after 90 minutes, so it should take about 150 hours to reach 100%, or roughly one week. This is to be expected, as the number of computations required is (rows in sheet 1) x (rows in sheet 2):
300,000 x 300,000
but because the multiplication applies to complete columns, I believe it is more correctly
300,000 x 1,048,576
i.e. more than 300 billion.
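The estimate is simple arithmetic. It can be checked in a few lines of Python, using the 90-minutes-per-percent figure measured above:

```python
minutes_per_percent = 90
total_hours = minutes_per_percent * 100 / 60   # minutes for 100%, converted to hours
total_days = total_hours / 24
comparisons = 300_000 * 1_048_576              # lookup rows x rows in a full column
print(total_hours, total_days, f"{comparisons:,}")
```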
A practical version that gives good response times for smaller ranges is as follows.
I define three named ranges, Range1, Range2 and Range3:
=First!$A$1:INDEX(First!$A:$A,MATCH("ZZZ",First!$A:$A))
=First!$B$1:INDEX(First!$B:$B,MATCH(9.9E+307,First!$B:$B))
=First!$C$1:INDEX(First!$C:$C,MATCH(9.9E+307,First!$C:$C))
and the modified formula is
=INDEX(Range1,MATCH(1,(Range2<=A2)*(Range3>=A2),0))
I was thinking of deleting this answer, but would rather it stood as a counter-example.

Is there a two-value lookup function in MS Excel that can do the following?

I am going crazy over this. It seems so simple, yet I can't figure it out. I have two worksheets: the first is my data, and the second is like an answer key. For example, A1:B1 in Sheet1 matches the conditions in row 52 of Sheet2, so the value in column C should be "MGC". What formula will do this? It's really hard to explain without the data, so I have linked a sample spreadsheet. Thank you so much in advance.
sample spreadsheet here. https://docs.google.com/spreadsheets/d/1_AjuNfCdGfEM-XkqPa6W4hSIxQg4NM2Vg4c2C1pQ_vQ/edit?usp=sharing
Screenshot here (it won't let me post it because I don't have enough reputation).
In Sheet2, insert a column in front of Column A and put the formula in A2 =C2&D2.
Then in Sheet1, cell C2, use the formula =VLOOKUP(A2&B2,Sheet2!A:B,2,0).
The first formula makes a concatenated key to look up, and the second looks up that key.
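The helper column plus VLOOKUP amounts to a dictionary keyed on the concatenated criteria. Here is a minimal Python sketch. The function and argument names are hypothetical, and Sheet2 rows are assumed to be (result, criterion 1, criterion 2):

```python
def lookup_by_two_keys(answer_key, col_a, col_b, sep=""):
    """answer_key: Sheet2 rows as (result, criterion1, criterion2).
    Builds the helper key criterion1 & criterion2 once, then looks up A & B for
    each Sheet1 row, like the helper column plus VLOOKUP(..., 0).
    None stands in for #N/A."""
    table = {}
    for result, crit1, crit2 in answer_key:
        table.setdefault(crit1 + sep + crit2, result)   # exact VLOOKUP returns the first hit
    return [table.get(a + sep + b) for a, b in zip(col_a, col_b)]


key = [("MGC", "Unreported", "txed"), ("XYZ", "Reported", "abc")]
print(lookup_by_two_keys(key, ["Unreported", "Other"], ["txed", "x"]))
```

Plain concatenation can make two different pairs collide ("ab" & "c" equals "a" & "bc"). Passing a separator such as "|" to the sketch avoids that, and the same idea would apply to the helper formula.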
How about an INDEX(MATCH())? If I've understood correctly, you need to match across both columns A and B in Sheet1, checking for the relevant values in columns B and C on Sheet2, to return Sheet2 column A to Sheet1 column C.
Third version, try:
=INDEX(Sheet2!$A$1:$A$360,MATCH(Sheet1!A1&Sheet1!B1,Sheet2!$B$1:$B$360&Sheet2!$C$1:$C$360,0))
Basically, this uses concatenation (the & operator) to say that you are looking for "Criteria A" & "Criteria B" from Sheet1. That makes the single string "Criteria ACriteria B", which is supplied as the first argument of the MATCH function.
The second argument then builds the same kind of concatenated string for every row of your criteria in Sheet2.
The final argument of MATCH (0) specifies that you want an exact match.
MATCH then supplies the position it finds to the INDEX function, which returns the value in that row.
As noted here https://support.microsoft.com/en-us/kb/59482, this is an array formula, so it behaves differently and must be entered differently (with CTRL+SHIFT+ENTER): https://support.office.com/en-za/article/Guidelines-and-examples-of-array-formulas-7d94a64e-3ff3-4686-9372-ecfd5caa57c7
There are (at least) 2 ways you could do this without VBA.
USING A SORTED LIST
The first relies on the assumption that your data can be re-sorted, so that everything "Unreported" is at the top and everything "Reported" is grouped below it (or vice versa). Assuming that this is the case (and it appears to be sorted like this already), we will use the OFFSET function to create a new range that shows only the values aligned with either "Unreported" or "Reported".
OFFSET takes a given reference to a point on a sheet, moves down/up and left/right to reach the reference you want, and then returns a range of cells of a given height and width. Here, we want to start at the top left of Sheet2 and move down until we find the term "Unreported" or "Reported". Once that term is found, we move one column to the right (to pull column B from Sheet2), and then use a height of as many rows as there are "Unreported" or "Reported" cells. This looks as follows in C1 on Sheet1, copied down:
=OFFSET(Sheet2!$A$1,MATCH(A1,Sheet2!A:A,0)-1,1,COUNTIF(Sheet2!A:A,A1),1)
This says: first, start at cell A1 on Sheet2. Then find the term in A1 (either "Unreported" or "Reported") in Sheet2!A:A. We subtract 1 because OFFSET moves relative to A1, so if the term is first found in row 1 we need to move 0 rows. Since MATCH searches the whole of column A, this -1 is needed whether or not Sheet2 has headers. Then move 1 column to the right. Make the range as many rows tall as the number of times Sheet2 column A contains the term in Sheet1 A1, and 1 column wide. Together, this leaves you with a single range on Sheet2, showing column B for the whole block where column A matches the term in Sheet1 A1.
Now we need to take that OFFSET, and use it to find out when the term in Sheet1 B1 is matched in Sheet2 column B. This will work as follows:
=MATCH(B1,[FORMULA ABOVE],0)
This shows the number of rows down, starting at the special OFFSET array created above, that the term from B1 is matched in column B from sheet2. To use this information to pull the result from column C on sheet 2, we can use the INDEX function, like so:
=INDEX([FORMULA ABOVE],MATCH(B1,[FORMULA ABOVE],0))
Because this would be fairly convoluted to have in a single cell, we can simplify this by using VLOOKUP, which will only require the OFFSET function to be entered a single time. This will work as follows:
=VLOOKUP(B1,[FORMULA ABOVE],2,0)
This takes the OFFSET formula above, finds the matching term in B1, and moves to the 2nd column to get the value from column C in sheet2. Because we are going to use VLOOKUP, the offset formula above will need to be adjusted to provide 2 columns of data instead of 1. Together, this will look as follows:
FINAL FORMULA FOR SHEET1, C1 & COPIED DOWN
=VLOOKUP(B1,OFFSET(Sheet2!$A$1,MATCH(A1,Sheet2!A:A,0)-1,1,COUNTIF(Sheet2!A:A,A1),2),2,0)
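The OFFSET/VLOOKUP formula limits the search to the contiguous block of rows for one group. Here is a minimal Python sketch of the same steps. It assumes Sheet2's columns A:C are given as three lists, sorted so each group is contiguous. The names and data are hypothetical:

```python
def lookup_in_sorted_block(col_a, col_b, col_c, group, key):
    """Sheet2 columns A:C as lists, sorted so each group value is contiguous.
    Mirrors VLOOKUP(key, OFFSET(A1, MATCH(group, A, 0)-1, 1, COUNTIF(A, group), 2), 2, 0)."""
    if group not in col_a:
        return None                                  # MATCH fails: #N/A
    top = col_a.index(group)                         # MATCH(...) - 1, 0-based
    height = col_a.count(group)                      # COUNTIF(Sheet2!A:A, group)
    for b, c in zip(col_b[top:top + height], col_c[top:top + height]):
        if b == key:                                 # exact match in the block's 1st column
            return c                                 # value from its 2nd column
    return None


col_a = ["Unreported", "Unreported", "Reported", "Reported"]
col_b = ["txed", "kn", "txed", "zz"]
col_c = ["MGC", "P1", "R1", "R2"]
print(lookup_in_sorted_block(col_a, col_b, col_c, "Reported", "txed"))
```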
OPTION USING ARRAY FORMULAS
The above method will only work if your data is sorted so that the REPORTED and UNREPORTED rows are grouped together. If they cannot be sorted, you can use an ARRAY FORMULA. This essentially takes a formula that would normally apply to a single cell and runs it over an entire range of cells. It returns an array of results, which must be reduced to a single value. A basic array formula looks like this [assume for this example that A1 = 1, A2 = 2 ... A5 = 5]:
=IF(A1:A5>3,A1:A5,"")
Confirm this (and all array functions) by pressing CTRL + SHIFT + ENTER, instead of just ENTER. This looks at each cell from A1:A5, and if the value is bigger than 3, it gives the number from that cell - otherwise, it returns "". In this case, the result would be the array {"";"";"";4;5}. To get the single total of 9, wrap that in a SUM function:
=SUM(IF(A1:A5>3,A1:A5,""))
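The same evaluation can be mimicked with a Python list comprehension. This illustrates what the array formula computes cell by cell, not how Excel implements it:

```python
cells = [1, 2, 3, 4, 5]                          # A1:A5
step = [a if a > 3 else "" for a in cells]       # =IF(A1:A5>3, A1:A5, "")
total = sum(v for v in step if v != "")          # SUM skips the "" entries
print(step, total)
```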
In your case, we will want to use an array formula to see what row in Sheet2 matches A1 from Sheet1, and B1 from Sheet1. This will look like this:
=IF(Sheet2!$A$1:$A$100=A1,IF(Sheet2!$B$1:$B$100=B1,ROW(Sheet2!$B$1:$B$100),""),"")
This checks which rows in column A of Sheet2 match A1. For those that do, it then checks which rows in column B of Sheet2 match B1. For those, it returns the row number of the match. Everything else returns "". Assuming no duplicates, only one row number should be returned. To pull that number out of the array of results, wrap the whole thing in a MAX function. Now that you have the row number, you can use an INDEX function to pull the result from column C in that row, like this:
FINAL ARRAY FORMULA METHOD
=INDEX(Sheet2!$C$1:$C$100,MAX(IF(Sheet2!$A$1:$A$100=A1,IF(Sheet2!$B$1:$B$100=B1,ROW(Sheet2!$B$1:$B$100),""),"")))
Remember to confirm with CTRL + SHIFT + ENTER instead of just ENTER, when you type this formula. Note that I didn't refer to all of Sheet2!A:A, because array formulas run very slowly over large ranges.
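The MAX(IF(IF(...ROW...))) pattern can be sketched in Python. The sketch assumes Sheet2's columns A:C are given as lists and rows are numbered from 1 as in Excel. The function name is hypothetical. If several rows match, MAX picks the last one:

```python
def index_max_if(col_a, col_b, col_c, a1, b1):
    """=INDEX(C, MAX(IF(A=a1, IF(B=b1, ROW(B), "")))) with Excel-style row numbers."""
    rows = [r for r, (a, b) in enumerate(zip(col_a, col_b), start=1)
            if a == a1 and b == b1]              # rows passing both IF tests
    if not rows:
        return None      # MAX of nothing is 0 in Excel, and row 0 is not a valid match
    return col_c[max(rows) - 1]                  # INDEX is 1-based, lists are 0-based


col_a = ["Unreported", "Unreported", "Reported"]
col_b = ["txed", "kn", "txed"]
col_c = ["MGC", "P1", "R1"]
print(index_max_if(col_a, col_b, col_c, "Reported", "txed"))
```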
The following formula should work without making any changes to the datasheets.
=INDEX(Sheet2!$A$1:$A$360,MATCH(Sheet1!A1,IF(Sheet2!$C$1:$C$360=Sheet1!B1,Sheet2!$B$1:$B$360),0))
Remember to enter this formula as an array formula with CTRL+SHIFT+ENTER.
Documentation on how to use INDEX and MATCH against multiple criteria can be found on Microsoft Support.
It's not clear what you want to do with the duplicates that do not have corresponding matches. txed is listed as Unreported twice in Sheet1, and kntyctap is listed as Unreported three times, but there is only one corresponding match on Sheet2 for each of these.
Non-array Standard Formulas for multiple criteria matches
For Excel 2010 and above use this standard formula in Sheet1!C1:
=IFERROR(INDEX(Sheet2!$A$1:$A$999,AGGREGATE(15,6,ROW($1:$999)/((Sheet2!$B$1:$B$999=A1)*(Sheet2!$C$1:$C$999=B1)), COUNTIFS(A$1:A1, A1, B$1:B1, B1))), "")
For versions of Excel prior to 2010, use this standard formula in Sheet1!C1:
=IFERROR(INDEX(Sheet2!$A$1:$A$999, SMALL(INDEX(ROW($1:$999)+((Sheet2!$B$1:$B$999<>A1)+(Sheet2!$C$1:$C$999<>B1))*1E+99, , ), COUNTIFS(A$1:A1, A1, B$1:B1, B1))), "")
I've handled errors with the IFERROR function in the latter formula. Excel 2003 and earlier may have to use an IF(ISERROR(..., ...)) combination instead.
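The running COUNTIFS is what lets repeated pairs pick up the 2nd, 3rd, ... match instead of the first one every time. Here is a minimal Python sketch of that pairing. The names are hypothetical, and Sheet2 rows are assumed to be (result, criterion 1, criterion 2):

```python
from collections import Counter


def nth_match_per_row(pairs, answer_key):
    """pairs: Sheet1 rows as (A, B).  answer_key: Sheet2 rows as (result, crit1, crit2).
    The k-th time a pair appears in Sheet1 (the running COUNTIFS) it gets the k-th
    matching Sheet2 row (AGGREGATE(15, 6, ...) or SMALL); "" when there is none."""
    seen = Counter()
    out = []
    for a, b in pairs:
        seen[(a, b)] += 1
        k = seen[(a, b)]                                 # COUNTIFS(A$1:A1, A1, B$1:B1, B1)
        hits = [res for res, c1, c2 in answer_key if c1 == a and c2 == b]
        out.append(hits[k - 1] if k <= len(hits) else "")   # IFERROR(..., "")
    return out


key = [("MGC", "Unreported", "txed")]
print(nth_match_per_row([("Unreported", "txed"), ("Unreported", "txed")], key))
```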

Excel formulas: sum everything above a specific row

I want to SUM everything above a cell that contains the word "SUMTOTAL". So if I have 50 columns, I want it to go to the first row that has the text "SUMTOTAL" in it and then sum everything above that word. Is this possible?
Use a MATCH formula to find the row and subtract one from it, then use an INDIRECT formula to build the address string, and plop it into a SUM formula like this:
=SUM(INDIRECT("A1:A" & MATCH("SUMTOTAL",B:B,0)-1))
Assumption:
SUMTOTAL is in column B somewhere
The numbers you want to sum are in column A
Your data starts at row 1.
You are summing ONE column. To expand, simply change "A1:A" to "A1:X" if you want to sum columns A to X.
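The MATCH/INDIRECT formula can be sketched in Python to make the off-by-one explicit. It uses the same assumptions as above: marker in column B, numbers in column A, data starting at row 1. The function name is hypothetical:

```python
def sum_above_marker(col_a, col_b, marker="SUMTOTAL"):
    """=SUM(INDIRECT("A1:A" & MATCH(marker, B:B, 0) - 1)) with data from row 1."""
    stop = col_b.index(marker)   # 0-based position, which equals MATCH(...) - 1
    # col_a[:stop] is A1 down to the row above the marker; SUM ignores text
    return sum(v for v in col_a[:stop] if isinstance(v, (int, float)))


print(sum_above_marker([1, 2, "x", 3, 100], ["", "", "", "", "SUMTOTAL"]))
```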
I assume that all your data is located in A1:N20, and SUMTOTAL appears somewhere inside this area (you can easily change the desired data location). The following formula does the summation of all numbers directly above SUMTOTAL, i.e., in the same column.
=SUM(OFFSET($A$1,0,SUMPRODUCT(COLUMN($A$1:$N$20)*($A$1:$N$20="SUMTOTAL"))-1,SUMPRODUCT(ROW($A$1:$N$20)*($A$1:$N$20="SUMTOTAL"))-1))
If you want to sum all numbers above SUMTOTAL, no matter if in the same column or not, use
=SUM(OFFSET($A$1,0,0,SUMPRODUCT(ROW($A$1:$N$20)*($A$1:$N$20="SUMTOTAL"))-1,COLUMNS($A$1:$N$20)))
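The two OFFSET/SUMPRODUCT formulas can be sketched in Python. The sketch assumes the block is given as a list of rows and that "SUMTOTAL" occurs exactly once. SUMPRODUCT adds up the row and column numbers of every match, so a second occurrence would give a wrong position. The function name is hypothetical:

```python
def sum_above_in_grid(grid, marker="SUMTOTAL", same_column=True):
    """grid: rows of A1:N20 as lists. Sums the numbers in the rows above the marker,
    either in the marker's column only or across every column of the block."""
    r, c = next((i, j) for i, row in enumerate(grid)
                for j, v in enumerate(row) if v == marker)   # StopIteration if absent
    cols = [c] if same_column else range(len(grid[0]))
    return sum(grid[i][j] for i in range(r) for j in cols
               if isinstance(grid[i][j], (int, float)))


grid = [[1, 10], [2, 20], ["SUMTOTAL", 5]]
print(sum_above_in_grid(grid), sum_above_in_grid(grid, same_column=False))
```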
=SUM(INDIRECT(ADDRESS(1,COLUMN())&":"&ADDRESS(ROW()-1,COLUMN())))
