Formula to find n numbers occurring together in a cell - excel

I would like to ask whether it is possible to create a formula in Excel that extracts the first n digits in a certain cell, but only if these digits are next to each other (in a group).
For instance, with RegExp we can write \d{8}, which in the string:
abc1234_123456789012abc_87654321000_abc
finds every group of eight digits occurring together, no matter how many times they occur (here 12345678 and 87654321).
I would like to achieve a similar effect (even just the first occurrence, not every one) without using VBA (RegExp), and to have the ability to easily change the number of digits taken into account, i.e. from another cell, not by expanding the formula with additional functions.
Thank you in advance.

With Microsoft 365, you could try:
For all overlapping values in C1:
=LET(A,MID(A1,SEQUENCE(LEN(A1)),B1),B,IF(ISNUMBER(--A),A,""),FILTER(B,B<>"",""))
For all non-overlapping sequences in D1:
=LET(A,B1,B,MID(A1,SEQUENCE(LEN(A1)),1),C,FILTERXML(SUBSTITUTE(TRIM(CONCAT("<t><s>'",IF(ISNUMBER(--B),B," "),"</s></t>"))," ","</s><s>'"),"//s"),D,FILTERXML("<t><s>"&TEXTJOIN("</s><s>",,MID(C,SEQUENCE(1,LEN(A1),2,A),A))&"</s></t>","//s[string-length()="&A&"]"),TEXT(D,REPT(0,A)))
The second option got quite long since I needed to find a way to prevent false positives when checking whether preceding tokens were numeric, etc. But now you only need to change the value in B1 to the number of digits you'd like to find in the non-overlapping values.
To simply get the first occurrence of any 8 digits (or whatever count is in B1), try:
=IFERROR(MID(A1,MATCH(1,INDEX((ISNUMBER(--MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),B$1)))*(LEN(MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),B$1))=B$1),),0),B$1),"Not Found")
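For comparison outside Excel, the three behaviours above (all overlapping windows, non-overlapping groups, first occurrence) can be sketched in Python. This is only an illustration of the logic, not a translation of the formulas; the function names are made up for the example, and n plays the role of cell B1.

```python
import re

def overlapping_groups(text, n):
    """Every window of n consecutive digits, overlaps included."""
    # A lookahead matches at each position without consuming characters.
    return re.findall(rf"(?=([0-9]{{{n}}}))", text)

def non_overlapping_groups(text, n):
    """Groups of n digits, scanning left to right like \\d{n}."""
    return re.findall(rf"[0-9]{{{n}}}", text)

def first_group(text, n):
    """First group of n digits, or "Not Found" as in the last formula."""
    match = re.search(rf"[0-9]{{{n}}}", text)
    return match.group(0) if match else "Not Found"

sample = "abc1234_123456789012abc_87654321000_abc"
print(non_overlapping_groups(sample, 8))
print(first_group(sample, 8))
```

Changing n is the only adjustment needed for a different group length, which is the same idea as pointing the formulas at B1.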

Related

Divide text in the column into two columns (text and numbers) - Google sheet or Excel

I have a document in google sheets and the column consists of the name and version, like NLog.Config.4.3.0, NLog.Config.4.4.9 and so on.
See the image below for other examples.
I need to divide this into two columns, name and version, but I'm not familiar enough with regular expressions to extract this info.
I can use Excel and then import it into the Google doc; it doesn't matter to me how it's done.
You can try something like this:
Suppose you have your string in A1, then in B1 you can enter this:
=LEFT(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))-2)
and in C1 this:
=RIGHT(A1,LEN(A1)-MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))+1)
Here MIN(SEARCH(...)) is the position of the first digit, so the -2 keeps everything before the dot that precedes it. You may need some adjustments for edge cases: a cell that starts with a digit makes LEFT return an error, which you can wrap in an IFERROR like this:
=IFERROR(LEFT(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))-2),A1)
Note: A1&"0123456789" is a 'trick' to keep SEARCH from returning an error, since it looks for every digit in the array; we just need the position of the first one, thus the MIN(). A cell with no digits at all therefore produces no error: the position lands just past the end of the text, so the B1 formula would drop the last character. Compare the position with LEN(A1) if such cells can occur.
Supposing that your raw data were in A2:A, place this in B2:
=ArrayFormula(IFERROR(REGEXEXTRACT(A2:A,"(\D+)\.(.+)"),A2:A))
The regular expression reads "Extract any number of non-digits up to but not including a period as group one, and everything remaining into group two." (In other words, "As soon as you run into a digit after a period, start group two.")
The IFERROR clause means, "If this pattern can't be found, just return the original cell data."
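To see how the two capture groups behave, here is a small Python sketch using the same pattern. Python's re module is not the engine Google Sheets uses, so treat it as an approximation; split_name_version is a hypothetical helper name.

```python
import re

# Same pattern as in the REGEXEXTRACT formula above.
PATTERN = re.compile(r"(\D+)\.(.+)")

def split_name_version(value):
    """Return (name, version), or the original value alone if nothing matches."""
    match = PATTERN.search(value)
    if match is None:
        # Mirrors the IFERROR fallback: keep the original cell data.
        return (value,)
    return match.groups()

print(split_name_version("NLog.Config.4.3.0"))
print(split_name_version("NLog.Config.4.4.9"))
```

Group one is greedy, so it first runs up to the first digit and then backs off one character to let the period match.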
Assuming your content is in column A (Google Sheets), try this arrayformula in any cell other than column A:
=arrayformula(iferror(split(REGEXREPLACE($A:$A,"(\.)(\d+.+$)",char(6655)&"$2"),char(6655)),))
There are two regex groups denoted in ():
(\.) and (\d+.+$).
The first group looks for a dot (.), which is escaped as \. The second group looks for a digit (\d), one or more occurrences (+), followed by one or more (+) of any character (.), up to the end of the string ($).
The replacement is char(6655) (wouldn't usually be found in your dataset), and the contents of group two $2.
Then the split function divides the text into two columns by the char(6655) character.
iferror returns nothing if nothing is split.
The arrayformula works down the sheet.
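The replace-then-split idea can be sketched in Python as well. This is an illustration under the assumption that Python's re behaves like the Sheets engine for this pattern; split_on_version is a hypothetical name, and chr(6655) is the same code point as char(6655) in the formula.

```python
import re

SEPARATOR = chr(6655)  # unlikely to appear in real data

def split_on_version(value):
    """Swap the dot before the version for a marker, then split on the marker."""
    marked = re.sub(r"(\.)(\d+.+$)", SEPARATOR + r"\2", value)
    return marked.split(SEPARATOR)

print(split_on_version("NLog.Config.4.3.0"))
```

Using a rare character as the marker avoids splitting on the other dots inside the name or the version.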

How to get the unique value from a cell which repeats multiple times in excel?

Hi all, I have data for around 50000 candidates, and one of the columns contains the subject, like below.
I want a formula in column B to get the unique value from column A.
If all the comma-separated values are the same, I need the single value; otherwise, the whole cell contents. (There may be multiple comma-separated entries.)
I tried a FIND formula, but it is not working.
Thanks in advance.
This will work for any number of comma-separated entries:
=IF(REPT(LEFT(A2,FIND(",",A2&",")),1+LEN(A2)-LEN(SUBSTITUTE(A2,",","")))=A2&",",LEFT(A2,FIND(",",A2)-1),A2)
Regards
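The rule the formula implements (collapse the cell only when every comma-separated entry is identical) can be sketched in Python for checking results. collapse_repeats is a hypothetical name, and like the formula it compares entries exactly, without trimming spaces.

```python
def collapse_repeats(cell):
    """Return the single value if all comma-separated entries match, else the cell."""
    parts = cell.split(",")
    if len(parts) > 1 and len(set(parts)) == 1:
        return parts[0]
    return cell

print(collapse_repeats("Maths,Maths,Maths"))
print(collapse_repeats("Maths,Physics,Maths"))
```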
Bit of a mess but this will work:
=IF(ISNUMBER(SEARCH(MID($A2,1,SEARCH(",",$A2,1)-1),$A2,LEN(MID($A2,1,SEARCH(",",$A2,1)-1)))),IF(ISNUMBER(SEARCH(MID($A2,1,SEARCH(",",$A2,1)-1),$A2,LEN(MID($A2,1,SEARCH(",",$A2,1)-1))*2)),MID($A2,1,SEARCH(",",$A2,1)-1),$A2),$A2)
If SEARCH finds the first word again past its own length, it checks again past double its length. If that's a match too, it returns just that word; otherwise it returns the entire cell.
I will keep trying to find a more elegant solution, but for now this'll do, provided you are always searching on an index of 3. If you aren't, I can still do it but would like to use helper columns (or have a crack at VBA, which is easily up to the task; I want to see effort first though, as I won't code for free as a rule of thumb).

Count values after specific pattern in column occurs

My problem explained in Excel
Cells A1:A22 contain binary numbers (0, 1). As you can see, I highlighted in GREEN the numbers that match the pattern I want to find.
Cells C5:C22 contain the formula shown in the formula box, which CONCATENATEs four consecutive numbers (A1:A4, A2:A5, etc.) in the data set and checks whether they match my pattern.
If these four numbers match my pattern, I want Excel to count all the NEXT numbers that come right after the pattern.
The biggest problem is that I can't do it this way, because my data set has approximately 30.000 binary records and my RAM can't handle that many CONCATENATE formulas to count all the NEXT values after my pattern occurs.
I would like help finding another way, without HELPER columns: an Excel formula that, in steps:
1. Searches for the pattern in the data set.
2. If the data matches my desired pattern, takes the AVERAGE of all values right after the pattern occurs. So in the example above, my AVERAGE in cells C5:C22 = 0,66
I hope I explained this in enough detail for you to understand my problem. I need a formula to do all the math; I can't use helper columns like in the example above.
Thanks in advance.
Concatenate once then Substitute Pattern to find
You can use the CONCAT function in one cell to get all the 1's and 0's into one cell.
Then SUBSTITUTE the Pattern to find with an empty string "".
Then count how many Patterns were "Substituted" (Matched).
Like so:
Formulas in cells
D5-Concat all: =CONCAT(A1:A22)
E5-Len all: =LEN(D5)
F5-Substitute: =SUBSTITUTE(D5;Pattern;"")
G5-Len after: =LEN(F5)
H5-Matches: =(E5-G5)/PatternLen
To get it into one cell:
=(LEN(CONCAT(A1:A22))-LEN(SUBSTITUTE(CONCAT(A1:A22);Pattern;"")))/PatternLen
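The length-difference trick can be checked with a short Python sketch. count_pattern and the sample data are made up for illustration. Note that it counts non-overlapping matches only, and that it returns the number of matches, not the average of the values that follow them.

```python
def count_pattern(bits, pattern):
    """Count non-overlapping matches by comparing lengths before and after removal."""
    joined = "".join(str(b) for b in bits)   # like CONCAT(A1:A22)
    removed = joined.replace(pattern, "")    # like SUBSTITUTE(...; Pattern; "")
    return (len(joined) - len(removed)) // len(pattern)

bits = [1, 0, 1, 1, 0, 1, 0, 1, 1]
print(count_pattern(bits, "1011"))
```

Each removed match shortens the text by exactly the pattern length, which is why dividing the length difference by PatternLen gives the count.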

Using VLOOKUP or INDEX/MATCH to run a Lottery

I'm using Excel to run a sort of lottery.
The spreadsheet columns are set up thus:
COL1:Person Name; COL2: Chosen Number A; COL3: Chosen Number B; COL4: Chosen Number C
There is then a set of data, generated using RAND() and ROUND, that gives 3 winning numbers, each between 0 and 10.
What I'm trying to do is identify a winner, by using VLOOKUP or INDEX/MATCH, or some combination, or other function, to identify the winning person, so that there is a single cell that returns the name of the winner.
The added complexity is that, when looking up each of the numbers individually by column, an individual selection of, say, 1,4,8 isn't a winning selection against a random selection of, say, 4,8,1.
Ideas?
You can concatenate the numbers into an additional column so that it contains a string like "1,4,8," and then perform a VLOOKUP for the winning numbers concatenated in the same way.
By the way, this solution will show only the first person, but isn't it possible that several persons guessed the same numbers and won?
If you want to generate a 3-digit number, by far the easiest thing to do is to use the formula
=RANDBETWEEN(0,999)
You can select the cells and then enter (via the format dialog accessible by right-clicking) the custom format 000 if you want it to display as 3 digits, so that e.g. 7 displays as 007. This will allow you to directly use VLOOKUP on a single value. @kipar asks an excellent question about potential multiple winners.
I implemented the abovementioned solution and it was quite easy. After your 4 columns, you add one column with
=TEXT(B1;0)&TEXT(C1;0)&TEXT(D1;0)
which combines the numbers into one string. Then you put your winning number in a cell of your choice; mine was M28 and the value was 123. After your first five columns, you use the following formula:
=IFERROR(INDEX(A$1:A$4;SMALL(IF(E$1:E$4=TEXT($M$28;0);ROW(E$1:E$4));ROW());1);"")
The IFERROR is used to return a blank when there are no further winners. INDEX is used to get the winners out of your first column; that's why there's a 1 at the end. SMALL is used to find the occurrences of the winner in order (the first one in row 1, the second in row 2, and so on). You also have to enter it as an array formula, so press Ctrl+Shift+Enter instead of just Enter when the formula is completed. I hope this answer is satisfying.
PS. For extra information on the use of this function go here: http://chandoo.org/wp/2014/12/09/multiple-occurrences-lookup-and-extraction/
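The concatenated-key lookup, including the multiple-winners case, can be sketched in Python. find_winners and the sample names are invented for the example. The key is joined with commas, as in the "1,4,8," suggestion above, because 10 has two digits and a key without a separator could be ambiguous.

```python
def find_winners(entries, winning):
    """Return every name whose picks match the winning numbers in the same order."""
    key = ",".join(str(n) for n in winning)
    return [name for name, picks in entries
            if ",".join(str(n) for n in picks) == key]

entries = [
    ("Ann", (1, 4, 8)),
    ("Bob", (4, 8, 1)),
    ("Cid", (4, 8, 1)),
]
print(find_winners(entries, (4, 8, 1)))
```

Order matters here, so 1,4,8 does not win against 4,8,1, and the list can hold more than one name.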

Count max value (text+number) occurrences with filtering a specific part of text in excel

I'm having an excel column range (including blank cells) something like:
00EGB00-GE001
00EGB00-GE001
00EGB00-GE001
00EGB00-GE001
00EGB00-GE002
00EGB00-GE002
00EGB00-GE002
00EGB00-GE002
00EGD20-GD101
What I need is to Count total number of similar values and I'm stuck with the logic for counting total unique "similar" values... example "GE" & "GD" separately.
How to count total number of unique "GE" values in the list?
I thought =COUNTIF(B:B,"*GE*") should work but it does not. It gives total count of "GE" but I need to find unique count. Example GE001 & GE002 should be considered as 2 values in total.
Kindly help
EDIT AGAIN: Given further clarification below, and assuming that the data always has the same number of digits, one way to do it is by putting this in Column B:
=RIGHT(A1,5)
Then, if you have Excel 2007 or up, Copy and Paste Values and use Remove Duplicates to leave you with the unique values. Then remove the items with GD, either manually or using a formula.
In this case, the output is:
GE001
GE002
In this case, you can easily see that it's 2. If you have lots of values, you can use COUNTA. Is that what you want?
YET ANOTHER EDIT BASED ON LAST COMMENT: this is probably getting closer:
=SUMPRODUCT(--(MID(A1:A9,9,2)="GE"),1/COUNTIF(A1:A9,A1:A9))
Where the "GE" is hard-coded in the formula above you could also substitute a cell reference where you can alter the value.
Or, if you don't know where the text you want will be exactly because the number of characters change, this will work (but you'd need to be careful with what you were searching on because it might repeat somewhere else in the string):
=SUMPRODUCT(--(ISERR(SEARCH("GE",A1:A9))<>TRUE),1/COUNTIF(A1:A9,A1:A9))
Again, you can replace the "GE" with a cell reference.
As discovered below, though -- blank cells will cause this to fail. There IS almost definitely a way to cater for them (maybe using a FREQUENCY based Array Formula), but if you can live with cleaning out the blank cells then that would be one way of doing it.
LAST EDIT: this will account for blank cells. It is an Array Formula, and CAN be used on whole columns, but that will be quite slow as it takes up a fair bit of calculation effort:
{=SUMPRODUCT(--(MID(A1:A9,9,2)="GE"),IF(ISBLANK(A1:A9),1,1/COUNTIF(A1:A9,A1:A9)))}
As it's an Array Formula, use Ctrl + Shift + Enter to input it.
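As a cross-check for the result, the same count (distinct values containing a marker, blanks ignored) can be sketched in Python. count_unique_containing is a hypothetical name; like the SEARCH variant it matches the marker anywhere in the value and ignores case, so the caveat about the text repeating elsewhere in the string applies here too.

```python
def count_unique_containing(values, marker):
    """Count distinct non-blank values that contain marker (case-insensitive)."""
    needle = marker.lower()
    return len({v for v in values if v and needle in v.lower()})

column = [
    "00EGB00-GE001", "00EGB00-GE001", "", "00EGB00-GE001", "00EGB00-GE001",
    "00EGB00-GE002", "00EGB00-GE002", None, "00EGB00-GE002", "00EGB00-GE002",
    "00EGD20-GD101",
]
print(count_unique_containing(column, "GE"))
```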

Resources