I have the list with like 100,000 site link strings
Each link is unique, but it has consistent ?Item=
Then, it's either nothing or it continues after & symbol.
My question is: How do I pull out the item numbers?
I know replace function can offer similar functionality, but it works with Fixed sizes, in my case string can be different in size.
Link example:
www.site.com?sadfsf?sdfsdf&adfasfd?Item=JGFGGG55555
or
www.site.com?sadfsf?sdfsdf&adfasfd?Item=JGFGGG55555&sdafsdfsdfsdf
In both cases I need to get JGFGGG55555 only
If this always is the last portion of the string, you can use the following:
=MID(A1, FIND("?Item=", A1) + 6, 99)
This assumes:
no item numbers will be over 99 digits.
no additional fields follow the item number.
Edit:
With the update to your question, it is apparent you have some strings with additional data after the ?Item= field. Without using VBA there is not a simple means of using MID and FIND to extract this.
However you could create a column which acts as a placeholder.
For example, create a column using:
=MID(A1,FIND("?Item=",A1)+6,99)
This gets you the following value: JGFGGG55555&sdafsdfsdfsdf
Next, create a column using:
=IF(ISERROR(FIND("&",B2)),B2,LEFT(B2,FIND("&",B2)-1))
This produces: JGFGGG55555 by searching the first value for a & and using the portion before it. If it is not found, the first value is simply repeated.
This formula should work for both the examples given:
=MID(A1,FIND("=",A1),IFERROR(LEN(A1)-FIND("&",A1,FIND("=",A1))-1,LEN(A1)+1-FIND("=",A1)))
Related
I'm building a monitoring system that takes a log (where people register their work in a set format) and returns a counter, which I can use for analysis. The monitor and log are two separate workbooks. The log has entries like this: INITALS;DATE;HOUR:RESULT|
Each cell can contain multiple entries.
My first attempt was to do a simple countif and look for a string (note that I use ; instead of , in formulas since I work on a Dutch excel):
=COUNTIF('LOCATION'!Table[LOG];"*NB;??/??/????;??:??:#A*|*")
This worked fine, but the formula only counted the number of cells where this string was present, not the actual number of occurences. I then tried this solution.
=SUM(LEN('LOCATION'!Tabel13[LOG])-LEN(SUBSTITUTE('LOCATION'!Tabel13[LOG];"NB";"")))
This indeed counted the number of times "NB" was present in the LOG. However, when I tried to use the original search string, this solution stopped working:
=SUM(LEN('LOCATION'!Tabel13[LOG])-LEN(SUBSTITUTE('LOCATION'!Tabel13[LOG];"*NB;??/??/????;??:??:#A*|*";"")))
It seems to me that SUM does not recognize symbols like ? or * which are necessary to define the correct search string. Where did I go wrong? Or can this be solved in another way? I can still look into VBA, but the workbooks are slow as hell already.
"?" and "*" are wildcards. Some functions support these (like COUNTIFS()) where others don't. Like you found out, SUBSTITUTE() does not.
Here is one way to count, assuming ms365:
Formula in C1:
=REDUCE(0,A1:A2,LAMBDA(a,b,a+LET(X,SEQUENCE(LEN(b)),SUM(--(IFERROR(SEARCH("NB;??/??/????;??:??:#A*|*",b,X),0)=X)))))
Note: I removed the asterisk in front of "NB" just to make searching for a position valid in comparison to what i called variable "X".
I have the column as you can see in this image.
The Distributor Address is in the format: "Street Address, Postcode, State".
I need to retrieve only "Postcode" and "State" and combine them.
The new column should be like this ➡"NSW2007","VIC3182"...
How could I retrieve the specific letters and combine them?
You can use FILTERXML if you have the newest version of Excel. See Excel - Extract substring(s) from string using FILTERXML for an excellent overview.
In this case, something like the below should work:
EDIT: overlooked the specific format you wanted
=FILTERXML("<t><s>"&SUBSTITUTE(A1,",","</s><s>")&"</s></t>","//s[3]")&
FILTERXML("<t><s>"&SUBSTITUTE(A1,",","</s><s>")&"</s></t>","//s[2]")
Well, here is a different way:
=RIGHT(A2,3)&","&TRIM(MID(A2,FIND(",",A2,1)+2,LEN(A2)-FIND(",",A2,FIND(",",A2,1)+1)))
This does rely on the state being the last 3 characters and the postcode after the first comma...
There are several ways of solving that. I will show a solution using a few basic functions: RIGHT, MID and LEN.
Assuming your data is on A1:
=RIGHT(A1,3)&MID(A1, LEN(A1)-8, 4)
RIGHT returns the 3 last characters from the cell A1.
MID returns the middle of the cell given a starting position and a length.
LEN gives you the starting position of the zip code, which is always the length of the string - 8 in your table.
Is it essential that your data is well organized as in the image to work well. One alternative is finding the ", " as below.
=RIGHT(A1,3)&MID(A1, FIND(", ",A1)+2,4)
I have a string in a cell which is a list of words like this:
a,b,c,d,e,f,g,h,f,c,a,h
I'd like to find and remove all the words which are on this list:
c,f,h
I can use multiple SUBSTITUTE but since my list is long, nested SUBSTITUTE is not convenient.
Here is the example sheet: https://docs.google.com/spreadsheets/d/11ZT3CMJ9vfUcOIvhXYLHUHHQm5tm1Tb1N9hi3LeI0WM/edit#gid=0
There is also a solution here but I'd like to use an existing list instead of manually inserting to the function.
You can use an existing list instead of manually inserting to the function.
Make a list of the words you want replaced in a column eg. A3:A9 and use the following formula to create your expression.
=JOIN("|",A3:A9)
Then use the working solution from your mentioned answer (or whichever formula meets your needs) to get the final result.
=REGEXREPLACE(A1,"\b("&B3&")\b",)
This is a second version based on suggestions from #marikamitsos
, I have added a string ", .. $" to the last partern position because there is no comma at the end of the text in a1.
=RegexReplace(RegexReplace(A1&",","\b(" & SUBSTITUTE(A3&",",",",",|") & ")(\b|$)",""),",$","")
In addition to previous answer, you can also try
=textjoin(",", 1, ArrayFormula(regexreplace(split(A2, ","), textjoin("|",1,A5:A8),)))
Although this question has been asked and answered, (Stack Overflow is where I learned how to implement SP), an issue has come up which I can't figure out.
I'm using SP to sum shipments within a pivot table using a product number (with wild-cards), and a specific date. For instance, part numbers can be "AX10235-HP", "AX11135-HP", "AX10235-HP2", "AX10235-HPSPARE" or TP10101-IBM. (There are a large variety of numbers.)
So in this case I want to sum the qty shipments of "AX???35-HP". I wish to sum just the first 2 parts in my short list. However, the command used causes all the parts to sum except the *-IBM number; as if there was a wild-card at the end of the number. In other words "AX???35-HP" is the same as "AX???35-HP*". I've tried wrapping the value in quotes but it takes uses the quotes literally so fails.
This is the function
SUMPRODUCT((S_PART_DATA)*(ISNUMBER(SEARCH($A6,S_PART_RANGE))*(S_PART_DATES=T$4)))
S_PART_DATA array of Shipments,
S_PART_RANGE array of list of part numbers,
S_PART_DATES array of Dates shipments were made
It works the way you describe because SEARCH function finds $A6 within other text, hence it may not be an exact match - better to use SUMIFS function like this:
=SUMIFS(S_PART_DATA,S_PART_RANGE,$A6,S_PART_DATES,T$4)
Assuming all named ranges are the same size and A6 contains the value AX???35-HP
If that doesn't work try this version
=SUMPRODUCT(S_PART_DATA*ISNUMBER(SEARCH("^"&$A6&"^","^"&S_PART_RANGE&"^"))*(S_PART_DATES=T$4))
concatenating the ^ values means you will [probably] only get exact matches
I have 2 column data in Excel like this:
Can somebody help me write a formula in column C that will take the first name or the last name from column A and match it with column B and paste the values in column C. The following picture will give you the exact idea what I am trying to do. Thanks
Since your data is not "regular", you can try this formula which uses wild card to look for just the last name.
=INDEX($B$1:$B$4,MATCH("*" &MID(A1,FIND(" ",A1)+1,99)&"*",$B$1:$B$4,0))
It would be simpler if the first part followed some rule, but some have the first initial of the first name at the beginning; and others at the end.
Edit: (explanation added)
The FIND returns the character number of the first space
Add 1 to that to get the character number of the next word
Use MID to extract that word
Use that word in MATCH (with wild-cards before and after), to find it in the array of email addresses. This will return it's position in the array (row number)
Use that row number as an argument to the INDEX function to return the actual email address.
If you want to first examine the email address, you will need to determine which of the letters comprise the last name. Since this is not regular according to your example, you will need to check both options.
You will not be able to look for the first name from the email, as it is not present.
If you can guarantee that the first part will be follow the rule of one or the other, eg: either
FirstInitialLastName or
LastNameFirstInitial
Then you can try this:
=IFERROR(INDEX($B$1:$B$4,MATCH(MID(A1,FIND(" ",A1)+1,99)& LEFT(A1,1) &"*",$B$1:$B$4,0)),
INDEX($B$1:$B$4,MATCH( LEFT(A1,1)&MID(A1,FIND(" ",A1)+1,99) &"*",$B$1:$B$4,0)))
This seems to do what you want.
=IFERROR(VLOOKUP(LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))&LOWER(MID(A1,1,1))&"*",$B$1:$B$4,1,FALSE),VLOOKUP(LOWER(MID(A1,1,1))&LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))&"*",$B$1:$B$4,1,FALSE))
Its pretty crazy long and would likely be easier to digest and debug broken up into columns instead of one huge formula.
It basically makes FLast and FirstL out of the name field by splitting on the space.
LastF:
=LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))&LOWER(MID(A1,1,1))
And FirstL:
=LOWER(MID(A1,1,1))&LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))
We then make 2 vlookups for these by using wildcards:
LastF:
=VLOOKUP([lastfirst equation from above]&"*",$B$1:$B$4,1,FALSE)
And FirstL:
=VLOOKUP([firstlast equation from above]&"*",$B$1:$B$4,1,FALSE)
And then wrap those in IfError so they both get tried:
=IfError([firstLast vlookup],[lastfirst vlookup])
The rub is that's going to be hell to edit if you ever need to, which is why I suggest doing each piece in another column then referencing the previous one.
You also need to be aware that this answer will get tripped up by essentially the same name - e.g. Sam Smith and Sasha Smith would both match whatever the first entry for ssmith was. Any solution here will likely have the same pitfall.