Numeric String extraction - excel

I am trying to extract a substring from a string. The strings are currently in an Excel column, one per row, and look like this:
ABC 54 SOMETHING 11165 POP 1234567890
SOMETHING ABC/W 05/1234500022385
SomethingW1234500006840Abc05 d 13/1/15
What I want is to extract any 5 or 13-digit number from each row string.
I have come up with this algorithm for the job:
1) Enter line
2) Scan string
3) If numeric/integer found, check length from start to end of numeric string
4) If length = 5 or if length = 13, output only numeric string to next column
5) Enter new line...
6) Continue 1 - 5 Till the data set is exhausted
Is there a function in Excel that can do this?
P.S: I am open to learn any language/tool that can get the job done.

It might be easier than you are making it. If I were you, I'd update that question to give unambiguous pairs of inputs and desired outputs. And I would take a good hard look at the accepted answer to this possibly similar question as it looks like it could be useful. Undoubtedly, someone will come up with a more beautiful regex for you, but here is an idea that might work..
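One possible way to express the scan described in the question is a regular expression. This is only a sketch in Python (not the code the answer above had in mind); the name `extract_numbers` is made up for illustration, and it assumes that "5 or 13-digit number" means a run of exactly 5 or 13 digits that is not part of a longer digit run.

```python
import re

# Match a run of exactly 5 or exactly 13 digits. The lookarounds reject
# runs that belong to a longer digit sequence (e.g. a 10-digit number).
PATTERN = re.compile(r"(?<![0-9])(?:[0-9]{13}|[0-9]{5})(?![0-9])")

def extract_numbers(row):
    """Return every standalone 5- or 13-digit number found in the row."""
    return PATTERN.findall(row)

rows = [
    "ABC 54 SOMETHING 11165 POP 1234567890",
    "SOMETHING ABC/W 05/1234500022385",
    "SomethingW1234500006840Abc05 d 13/1/15",
]
for row in rows:
    print(extract_numbers(row))
```

For the three sample rows this picks out 11165, 1234500022385 and 1234500006840 respectively, and skips the 10-digit number in the first row.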


Trying to increment a 4 character alphanumeric code in Excel

I'm trying to create a CSV file of one of my customer's serial numbers. We print them as barcodes for them to use, and normally I'd use our barcode software to generate the numbers. However, we're using a different method of printing, and it requires a CSV/Excel file of all the numbers to be printed. The barcode is as follows:
MC100VGVA.
The last digit is a check digit created from the rest of the string.
Now, my problem comes with the "VGVA" bit. Column A is the prefix (MC), Column B is the number (100), Column C is the incrementing 4 characters (VGVA), and Column D is the check digit.
I need for the VGVA bit to increment alphanumerically. So, when it gets to VGVZ, I need it to go to VGW0. Then when it gets to VGZZ, it needs to go to VH00 and so on until they reach ZZZZ, in which the next digit would increase Column B to 101, and Column C would become 0000.
I've attempted to use the CHAR formula, as well as CONCATENATE, and MID. But, because I'm not well versed in these formulas, my attempts at editing them to work with 4 digits have been failing me.
I'm not opposed to using VBA if needed, but it's not something I've ever worked with, so you'll have to forgive any ignorance on my part.
Please let me know if you need more information. Thanks!
It looks like you are trying to count in a new base, one with 27 digits (0 and all letters from 'A' to 'Z'). So I'd advise you to write a conversion to and from that 27-digit system.
Let me first explain what I mean using octal numbering (8 digits, from 0 to 7). Take these numbers as examples:
a=0011
b=1237
c=1277
The meaning of those numbers is:
a equals 0*8^3 + 0*8^2 + 1*8^1 + 1*8^0 = 9, so:
a+1 equals 10, and converting this to octal numbering yields:
0012
b equals 1*8^3+2*8^2+3*8^1+7*8^0 = 671, so:
b+1 equals 672, and converting this to octal numbering yields:
1240
c equals 1*8^3 + 2*8^2 + 7*8^1 + 7*8^0 = 703, so:
c+1 equals 704, and converting this to octal numbering yields:
1300
I propose to do exactly the same for your 27-digit system, with following example:
VGZZ equals 22*27^3 + 7*27^2 + 26*27^1 + 26 = 438857
VGZZ+1 equals 438858, and converting this to 27-digit numbering yields:
VH00
You can do this using a VBA function that you will need to write yourself. Converting from the string to an ordinary number is straightforward; for the other direction, use =MOD(...,27^3) and similar functions to work out each digit.
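The convert, add one, convert back idea can be sketched as follows. This is an illustration in Python rather than the VBA function mentioned above, and the names `to_int`, `to_code` and `increment` are hypothetical. `BASE27` follows the digit set described in this answer; `BASE36` is an assumption (0-9 followed by A-Z) in case the codes are meant to include the digits 1-9 as well.

```python
BASE27 = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ"            # 0 plus A-Z, as described above
BASE36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # assumption: 0-9, then A-Z

def to_int(code, alphabet):
    """Convert a code such as 'VGZZ' to an ordinary integer."""
    value = 0
    for ch in code:
        value = value * len(alphabet) + alphabet.index(ch)
    return value

def to_code(value, alphabet, width):
    """Convert an integer back to a fixed-width code."""
    base = len(alphabet)
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))

def increment(code, alphabet):
    """Add 1 to the code; the largest code wraps around to all zeros."""
    limit = len(alphabet) ** len(code)
    return to_code((to_int(code, alphabet) + 1) % limit, alphabet, len(code))

print(to_int("VGZZ", BASE27))      # 438857
print(increment("VGZZ", BASE27))   # VH00
print(increment("VGVZ", BASE36))   # VGW0
```

When the result wraps to "0000", the caller would bump the number in Column B, as the question describes.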
I believe I've found a non-VBA answer to this question, thanks to someone on another forum.
Here's what they suggested and it seems to be working perfectly:
B2
=B1+(C2="0000")
C2
=RIGHT(BASE(DECIMAL(C1,36)+1,36,4),4)
and maybe try this at D1
=MID("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%",MOD(SUMPRODUCT(SEARCH(MID((A1&B1&C1),ROW($1:$99),1),
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%") )-99,43)+1,1)

Convert number to string in Excel

I'm trying to do some transformation with numbers in Excel. First, I have this table:
As you can see, I have Random Digits, which are generated using RANDBETWEEN. Now I want the Type column to be generated automatically. For example, if Random Digits is:
From 1 - 35 = Good
36 - 80 = Fair
81 - 100 = Poor
I already tried the IF function, but with it I'm only able to generate 2 values and not 3.
Thank you for answers.
INDEX and MATCH are a good way to avoid nesting lots of IF statements (generally to be avoided!):
=INDEX({"Good","Fair","Poor"},MATCH(B2,{0,36,81},1))
If you really wanted to use an IF statement, it would look like this:
=IF(B2<36,"Good",IF(B2<81,"Fair","Poor"))
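For comparison, the approximate-match lookup that MATCH(...,1) performs can be sketched as a search over sorted lower bounds. This is a Python illustration, not part of the original answer; `classify`, `THRESHOLDS` and `LABELS` are names made up here, and the range check assumes the 1 to 100 limits stated in the question.

```python
from bisect import bisect_right

THRESHOLDS = [1, 36, 81]             # lower bound of each band, ascending
LABELS = ["Good", "Fair", "Poor"]

def classify(value):
    """Return the label of the band that contains value (1 to 100)."""
    if not 1 <= value <= 100:
        raise ValueError("value must be between 1 and 100")
    # bisect_right counts how many lower bounds are <= value
    return LABELS[bisect_right(THRESHOLDS, value) - 1]

for sample in (1, 35, 36, 80, 81, 100):
    print(sample, classify(sample))
```

Adding a fourth band only means adding one entry to each list, which is the same advantage the INDEX/MATCH version has over nested IFs.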
Nest the IFs: where you get the true value, just output what you need, but if it's false, write another IF statement...
Use one IF inside another IF like this:
=if('From 1 - 35';'thing to do if is true';if('36 - 80';'thing to do if is true';'thing to do when is 81 - 100'))
The excel formula you are looking for is
=IF(B1>100,"error",IF(B1>=81,"Poor",IF(B1>=36,"Fair",IF(B1>=1,"Good","error"))))
This will display the word "error" if your value is >100 or <1. Other answers have failed to address the cases where the number is >100 or <1, as the question specifically bounds the set of responses to be between 1 and 100.
The formula works as a nested if statement. In pseudo code the formula is equivalent to:
if (B1>100)
    then "error"
else if (B1>=81)
    then "Poor"
else if (B1>=36)
    then "Fair"
else if (B1>=1)
    then "Good"
else
    "error"

How do I sum data based on a PART of the headers name?

Say I have columns
/670 - White | /650 - black | /680 - Red | /800 - Whitest
These have data in their rows. Basically, I want to SUM their values together if their headers contain my desired string.
For modularity's sake, I wanted to merely specify to sum /670, /650, and /680 without having to mention the rest of the header text.
So, something like =SUMIF(a1:c1; "/NUM & /NUM & /NUM"; a2:c2)
That doesn't work, and honestly I don't know what i should be looking for.
Additional stuff:
I'm trying to think of the answer myself, is it possible to mention the header text as condition for ifs? Like: if A2="/650 - Black" then proceed to sum the next header. Is this possible?
Preferably it would not involve VBA; a draggable formula would be ideal!
At this point, I may as well request a version which handles the complete header name rather than just a part of it as I believe it to be difficult for formula code alone.
Thanks for having a look!
Let me know if I need to elaborate.
EDIT: In regards to data samples, any positive number will do actually, damn shame stack overflow doesn't support table markdown. Anyway, for example then..:
+---+--------------+--------------+--------------+------------+--------------+
|   | A            | B            | C            | D          | E            |
+---+--------------+--------------+--------------+------------+--------------+
| 1 | /650 - Black | /670 - White | /800 - White | /680 - Red | /650 - Black |
+---+--------------+--------------+--------------+------------+--------------+
| 2 | 250          | 400          | 100          | 300        | 125          |
+---+--------------+--------------+--------------+------------+--------------+
I should have clarified:
The number range for these headers would go from /100 - /9999 and no more than that.
EDIT:
Progress so far:
https://docs.google.com/spreadsheets/d/1GiJKFcPWzG5bDsNt93eG7WS_M5uuVk9cvkt2VGSbpxY/edit?usp=sharing
Formula:
=SUMPRODUCT((A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($H$1)=4,$H$1&"",$H$1&" ")))+(A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($I$1)=4,$I$1&"",$I$1&" ")))+(A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" "))))
Apparently, each MID function is returning false with each F9 calculation.
EDIT EDIT:
Okay! I found my issue: it's the / being read, when you ALSO mentioned that it wasn't required. Man, I should stop skimming!
Final Edit:
=SUMPRODUCT((RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match5)=4,Match5&"",Match5&" ")))+(RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match6)=4,Match6&"",Match6&" ")))+(RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match7)=4,Match7&"",Match7&" "))))
The idea is that Header and RETURNSUM will become match criteria like the matches written above, that way it would be easier to punch new criterion into the search table. As of the moment, it doesn't support multiple rows/dragging.
I have knocked up a couple of formulas that will achieve what you are looking for. For ease I have made the search input require the number only as pressing / does not automatically type into the formula bar. I apologise for the length of the answer, I got a little carried away with the explanation.
I have set this up for 3 criteria located in J1, K1 and L1.
Here is the output I achieved:
Formula 1 - SUMPRODUCT():
=SUMPRODUCT((A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" ")))+(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($K$1)=4,$K$1&"",$K$1&" ")))+(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($L$1)=4,$L$1&"",$L$1&" "))))
SUMPRODUCT(array1,[array2]) behaves as an array formula without needing to be entered as one. Array formulas break down ranges and calculate them cell by cell (in this example we are using single rows, so the formula will assess columns separately).
(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" ")))
Essentially I have broken the Sumproduct() formula into 3 identical parts - 1 for each search condition. (A4:G4*: Now, as the formula behaves like an array, we will multiply each individual cell by either 1 or 0 and add the results together.
1 is produced when the next part of the formula is true and 0 for when it is false (default numeric values for TRUE/FALSE).
(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" "))
MID(text,start_num,num_chars) is being used here to assess the 4 digits after the "/" and see whether they match with the number in the 3 cells that we are searching from (in this case the first one: J1). Again, as SUMPRODUCT() works very much like an array formula, each cell in the range will be assessed individually.
I have then used the IF(logical_test,[value_if_true],[value_if_false]) to check the length of the number that I am searching. As we are searching for a 4 digit text string, if the number is 4 digits then add nothing ("") to force it to a text string and if it is not (as it will have to be 3 digits) add 1 space to the end (" ") again forcing it to become a text string.
The formula will then perform the calculation like so:
The MID() formula produces the array: {"650 ","670 ","800 ","680 ","977 ","9999","143 "}. This combined with the first search produces {TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE} which when multiplied by A4:G4
(remember 0 for false and 1 for true) produces this array: {250,0,0,0,0,0,0} essentially pulling the desired result ready to be summed together.
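The same multiply-by-TRUE/FALSE logic can be sketched outside Excel. The Python below is only an illustration under the assumption that every header has the form "/code - name"; `header_code` and `sum_by_codes` are hypothetical names, and the data is the sample table from the question.

```python
def header_code(header):
    """Return the code between the leading "/" and the " - " separator."""
    return header.lstrip("/").split(" - ", 1)[0].strip()

def sum_by_codes(headers, values, wanted):
    """Sum the values whose header code exactly equals one of the wanted codes."""
    wanted = {str(code) for code in wanted}
    return sum(v for h, v in zip(headers, values) if header_code(h) in wanted)

headers = ["/650 - Black", "/670 - White", "/800 - White", "/680 - Red", "/650 - Black"]
values = [250, 400, 100, 300, 125]

print(sum_by_codes(headers, values, [670, 650, 680]))  # 1075
```

Because the whole code is compared rather than a substring, a search for 999 does not pick up a /9999 header.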
Formula 2: =SUM(IF(Array)): [This formula does not work for 3 digit numbers as they will exist within the 4 digit numbers! I have included it for educational purposes only]
=SUM(IF(ISNUMBER(SEARCH($J$1,$A$1:$G$1)),A8:G8),IF(ISNUMBER(SEARCH($K$1,$A$1:$G$1)),A8:G8),IF(ISNUMBER(SEARCH($L$1,$A$1:$G$1)),A8:G8))
The formula will need to be entered as an array (once copy and pasted while still in the formula bar hit CTRL+SHIFT+ENTER)
This formula works in a similar way, SUM() will add together the array values produced where IF(ISNUMBER(SEARCH() columns match the result column.
SEARCH() will return a number when it finds the exact characters in a cell; the number represents its position, counted in characters. By using ISNUMBER() I am avoiding having to do the whole MID() and IF(LEN()=4,""," ") I used in the previous formula, as TRUE/FALSE will be produced when a match is found regardless of its position or cell formatting.
As previously mentioned, this poses a problem as 999 can be found within 9999 etc.
The resulting array for the first part is: {250,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE} (if you would like to see the array you can highlight that part of the formula and calculate with F9 but be sure to highlight the exact brackets for that part of the formula).
I hope I have explained this well, feel free to ask any questions about stuff that you don't understand. It is good to see people keen to learn and not just fishing for a fast answer. I would be more than happy to help and explain in more depth.
I start this solution with the names in an array, you can read the header names into an array with not too much difficulty.
Sub test()
    Dim myArray(1 To 4) As String
    Dim ArrayValue As Variant
    Dim endposition As Long
    Dim stringvalue As String
    Dim NumberValue As Long
    Dim Total As Long

    myArray(1) = "/670 - White"
    myArray(2) = "/650 - black"
    myArray(3) = "/680 - Red"
    myArray(4) = "/800 - Whitest"
    For Each ArrayValue In myArray
        'Find the position of the " - " separator that follows the number
        endposition = InStr(1, ArrayValue, " - ", vbTextCompare)
        'Grab the number between the leading "/" and the separator
        stringvalue = Mid(ArrayValue, 2, endposition - 2)
        'Convert to number
        NumberValue = CLng(stringvalue)
        'Add to total
        Total = Total + NumberValue
    Next ArrayValue
    'Print total
    Debug.Print Total
End Sub
This will print the answer to the debug window.

can someone suggest an idea on printing blanks in an xls file?

Still fairly new to MATLAB; I picked up this data analysis code from someone and had to add new functions.
For one function I'm calculating the average of every 3 entries in one column and printing the result in another column, so it would be something like this:
1 -1
3 -1
5 =(1+3+5)/3
7 -1
1 -1
1 =(7+1+1)/3
4 -1
What I wish to do is print a blank in the cells that have -1. My first thought was to assign string values to my results instead of ints. This didn't work, I think because there is a line of code somewhere that converts everything to ints.
Another possible solution is to reopen the file and loop through all cells, replacing any -1's with blank strings, though I'm not sure how to do this, and it's inefficient.
As a last resort, I guess I can always tell the user of this xls sheet to use the find/replace function in Excel before processing it.
edit: partial code of the save part:
data = [data.time, data.avg_time'];
data2 = num2cell(data);
data3 = {'t', 'avg t'};
data = [data3; data2];
xlswrite([filename, '.xls'], data);
I misunderstood your question (i thought of replacing NaN's with -1, thanks Amro).
You can use this:
A(A(:,2)==-1,2)=NaN
where A is the matrix you created first.
Hope it helps you :)
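An alternative to replacing the -1 values afterwards is to never write them in the first place. The sketch below is in Python rather than MATLAB and only illustrates the averaging layout from the question; `block_averages` is a made-up name, and `None` stands in for a blank cell.

```python
def block_averages(values, size=3):
    """Average each complete block of `size` values.

    The average is placed on the last row of its block; every other
    position holds None, which stands for a blank cell.
    """
    result = [None] * len(values)
    for end in range(size, len(values) + 1, size):
        block = values[end - size:end]
        result[end - 1] = sum(block) / size
    return result

print(block_averages([1, 3, 5, 7, 1, 1, 4]))
# [None, None, 3.0, None, None, 3.0, None]
```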

Excel 2007 - Generate unique ID based on text?

I have a sheet with a list of names in Column B and an ID column in A. I was wondering if there is some kind of formula that can take the value in column B of that row and generate some kind of ID based on the text? Each name is also unique and is never repeated in any way.
It would be best if I didn't have to use VBA really. But if I have to, so be it.
Solution Without VBA.
Logic based on First 8 characters + number of character in a cell.
= CODE(cell) which returns Code number for first letter
= CODE(MID(cell,2,1)) returns Code number for second letter
= IFERROR(CODE(MID(cell,9,1)),0) returns 0 if the 9th character does not exist
= LEN(cell) number of characters in a cell
Concatenating the first 8 codes + adding the length of the text on the end
If 8 character is not enough, then replicate additional codes for next characters in a string.
Final function:
=CODE(B2)&IFERROR(CODE(MID(B2,2,1)),0)&IFERROR(CODE(MID(B2,3,1)),0)&IFERROR(CODE(MID(B2,4,1)),0)&IFERROR(CODE(MID(B2,5,1)),0)&IFERROR(CODE(MID(B2,6,1)),0)&IFERROR(CODE(MID(B2,7,1)),0)&IFERROR(CODE(MID(B2,8,1)),0)&LEN(B2)
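To see what kind of ID this produces, the same logic can be sketched in Python. This is an illustration only: `code_id` is a hypothetical name, and `ord` is assumed to agree with Excel's CODE, which holds for plain ASCII names.

```python
def code_id(name, width=8):
    """Concatenate the character codes of the first `width` characters
    (0 for a missing character), followed by the length of the name."""
    parts = []
    for i in range(width):
        parts.append(str(ord(name[i])) if i < len(name) else "0")
    parts.append(str(len(name)))
    return "".join(parts)

print(code_id("Ab"))  # 65980000002
```

Two names that share their first 8 characters and their length get the same ID, which is why the answer suggests adding codes for further characters when 8 is not enough.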
Sorry, I didn't find a formula-only solution, even if this thread might help (trying to calculate the points in a scrabble game), because I didn't find a way to be sure the generated hash would be unique.
Still, here is my solution, based on a UDF (User-Defined Function):
Put the code in a module:
Public Function genId(ByVal sName As String) As Long
    'Creates a hash by summing the ASCII value of each character of the
    'given string, weighted by its position (uniqueness is not guaranteed)
    Dim i As Long
    For i = 1 To Len(sName)
        genId = Asc(Mid(sName, i, 1)) * i + genId
    Next i
End Function
And call it in your worksheet like a formula:
=genId(A1)
[EDIT] Added the * i to take into account the order. It works on my unit tests
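A position-weighted sum can still give two different names the same value, so it is worth checking real data for collisions before relying on it. The Python sketch below mirrors the UDF's arithmetic (`gen_id` is a hypothetical port, and `ord` is assumed to match VBA's Asc for ASCII input) and shows one such pair.

```python
def gen_id(name):
    """Sum of each character code multiplied by its 1-based position."""
    return sum(ord(ch) * i for i, ch in enumerate(name, start=1))

print(gen_id("ab"))  # 97*1 + 98*2 = 293
print(gen_id("ca"))  # 99*1 + 97*2 = 293
```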
May be OTT for your needs, but you can use a call to CoCreateGuid to get a real GUID
Private Declare Function CoCreateGuid Lib "ole32" (ID As Any) As Long
Function GUID() As String
    Dim ID(0 To 15) As Byte
    Dim i As Long
    If CoCreateGuid(ID(0)) = 0 Then
        For i = 0 To 15
            'Pad each byte to two hex digits
            GUID = GUID & Right$("0" & Hex$(ID(i)), 2)
        Next
    Else
        GUID = "Error while creating GUID!"
    End If
End Function
Test using
Sub testGUID()
MsgBox GUID
End Sub
How to best implement depends on your needs. One way would be to write a macro to get a GUID populate a column where names exist. (note, using it as a udf as is is no good, since it will return a new GUID when recalculated)
EDIT
See this answer for creating a SHA1 hash of a string
Do you just want an incrementing numeric id column to sit next to your values? If so, and if your values will always be unique, you can very easily do this with formulae.
If your values were in column B, starting in B2 underneath your headers for example, in A2 you would type the formula "=IF(B2="","",1+MAX(A$1:A1))". You can copy and paste that down as far as your data extends, and it will increment a numeric identifier for each row in column B which isn't blank.
If you need to do anything more complicated, like identify and re-identify repeating values, or make identifiers 'freeze' once they're populated, let me know. Currently, when you clear or add values to your list the identifiers will toggle themselves up and down, so you need to be careful if your data changes.
Unique identifier based on the number of specific characters in text. I used an identifier based on vowels and numbers.
=LEN($J$14)-LEN(SUBSTITUTE($J$14;"a";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"e";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"i";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"j";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"o";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"u";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"y";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"1";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"2";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"3";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"4";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"5";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"6";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"7";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"8";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"9";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"0";""))
You say you are confident that there are no duplicate values in your words. To push it further, are you confident that the first 8 characters in any word would be unique?
If so, you can use the below formula. It works by individually taking each character's ASCII code - 40 [assuming normal characters, this puts digits between 8 & 17, and (lowercased) letters between 57 & 82], and multiplying that character's code by 10 ^ [that character's digit placement in the word]. Basically it takes that character code [-40], and concatenates each code onto the next.
EDIT Note that this code no longer requires that at least 8 characters exist in your word to prevent an error, as the actual word to be coded has 8 "0"s prepended to it; RIGHT(...,8) then takes the last 8 characters of the padded word.
=TEXT(SUM((CODE(MID(LOWER(RIGHT(REPT("0",8)&A3,8)),{1,2,3,4,5,6,7,8},1))-40)*10^{0,2,4,6,8,10,12,14}),"#")
Note that as this uses the ASCII values of the characters, the ID # could be used to identify the name directly - this does not really create anonymity, it just turns 8 unique characters into a unique number. It is obfuscated with the -40, but not really 'safe' in that sense. The -40 is just to get normal letters and numbers in the 2 digit range, so that multiplying by 10^0,2,4 etc. will create a 2 digit unique add-on to the created code.
EDIT FOR ALTERNATIVE METHOD
I had previously attempted to do this so that it would look at each letter of the alphabet, count the number of times it appears in the word, and then multiply that by 10^[that letter's position in the alphabet]. The problem with doing this (see comment below for formula) is that it required a number of 10^26-1, which is beyond Excel's floating point precision. However, I have a modified version of that method:
By limiting the number of allowed characters in the alphabet, we can get the max total size possible to 10^15-1, which Excel can properly calculate. The formula looks like this:
=RIGHT(REPT("0",15)&TEXT(SUM(LEN(A3)*10^{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}-LEN(SUBSTITUTE(A3,MID(Alphabet,{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15},1),""))*10^{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}),"#"),15)
[The RIGHT(REPT("0",15)&... portion of the formula is meant to keep all codes the same number of characters]
Note that here, Alphabet is a named string which holds the characters: "abcdehilmnorstu". For example, using the above formula, the word "asdf" counts the instances of a, s, and d, but not 'f' which isn't in my contracted alphabet. The code of "asdf" would be:
001000000001001
This only works with the following assumptions:
The letters not listed (nor numbers / special characters) are not required to make each name unique. For example, asdf & asd would have the same code in the above method.
And,
The order of the letters is not required to make each name unique. For example, asd & dsa would have the same code in the above method.
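The letter-counting scheme and both of its limitations can be checked with a short sketch. This is a Python illustration of the formula's arithmetic, not a replacement for it; `letter_count_code` is a made-up name, and it assumes no letter occurs 10 or more times in a word (otherwise counts would carry into the next digit).

```python
ALPHABET = "abcdehilmnorstu"  # the contracted alphabet from the answer

def letter_count_code(word, alphabet=ALPHABET):
    """Count each alphabet letter in word and place the count in that
    letter's decimal position, padded to the alphabet's length."""
    total = sum(word.count(ch) * 10 ** i for i, ch in enumerate(alphabet))
    width = len(alphabet)
    return str(total).zfill(width)[-width:]

print(letter_count_code("asdf"))  # 001000000001001
print(letter_count_code("asd"))   # 001000000001001 ('f' is not counted)
print(letter_count_code("dsa"))   # 001000000001001 (order is ignored)
```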
