I have a spreadsheet with hundreds of URLs. Each URL has a search-engine friendly name in it, followed by a numeric ID, with a "/1" at the end. here's an example:
http://www.somesite.com/directory/varying-file-names-Guide/431/1
in this case, the 431 is the numeric ID. I'm trying to extract this numeric ID into its own column, but my lack of Excel knowledge is definitely holding me back. I've found a few examples, but my attempts to customize these example to match my needs results in errors. In all cases, the value I need to extract will always be between "-Guides/" and "/1" in the URL.
any help would be greatly appreciated.
thanks!
Assuming your URL in A1 try this formula in B1 to extract that number
=REPLACE(LEFT(A1,LEN(A1)-2),1,SEARCH("Guide/",A1)+5,"")
assumes "Guide" not "Guides" - if it's "Guides" then the 5 needs to be a 6.....,i.e.
=REPLACE(LEFT(A1,LEN(A1)-2),1,SEARCH("Guides/",A1)+6,"")
...or a different approach that will extract any number that ends at the 3rd charcater from the end
=LOOKUP(10^5,RIGHT(LEFT(RIGHT(A1,7),5),{1,2,3,4,5})+0)
extracts up to 5 digits
This formula should work regardless of the word preceding the numeric ID. Assuming your URL is in cell A1,
In cell B1:
=SUBSTITUTE(LEFT(A1, LEN(A1) - 2), "/","||", LEN(LEFT(A1, LEN(A1) - 2)) - LEN(SUBSTITUTE(LEFT(A1, LEN(A1) - 2), "/","")))
In cell C1:
=RIGHT(B1,LEN(B1)-FIND("||",B1)-1)
Thanks to everyone who answered. A co-worker of mine also came up with a different approach that also worked.
=MID(A6,FIND("uide/",A6,1)+5,LEN(A6)-FIND("tion/",A6,1)-6)
I thought I'd include this for anyone else facing the same issue. Interesting to see the various ways people approach the problem.
Related
I'm tasked with going through a long column of instagram profile URLs and creating a new adjacent column comprised of just their usernames. I could theoretically go through the list individually, copy-pasting the part between ".com/" and the last "/" and then hyperlinking each of them, but I feel like there might be a faster way.
I've experimented with formulas trying to extract only the username but to no avail. I also realized formula cells can't be hyperlinked, so I would also need a solution for that. Here what I was trying so far:
Here the input and expected output:
URL
User Name
http://instgram.com/stack_overflow/
stack_overflow
http://instgram.com/stackoverflowing/
stackoverflowing
http://instgram.com/stackoverflowthestack/
stackoverflowthestack
http://instgram.com/stackoverflowingstacks/
stackoverflowingstacks
The end result should look like User Name Column and be adjustable for any length of usernames (slashes and spaces among other special characters cannot be used in instagram usernames).
Also, I'm unsure why my google docs takes semi-columns instead of commas as I'm used to with Excel, but it is what it is.
Figuring this out would save me loads of time in the long-run and I would be very appreciative.
If you can use TEXTAFTER (Office Insider Beta only, Windows: 2203 (Build 15104), Mac: 16.60 (220304)). You don't have to deal with LEFT/RIGHT/FIND functions. On B2 cell you enter and expand down the following formula:
=SUBSTITUTE(TEXTAFTER(A1,"/",-2),"/","")
Here is the output:
You can replace SUBSTITUTE as follow via TEXTBEFORE to remove the last /:
=TEXTBEFORE(TEXTAFTER(A1,"/",-2),"/")
Explanation
The main idea is that:
TEXTAFTER(text,delimiter,[instance_num], [match_mode],
[match_end], [if_not_found])
allows to search backward, using the third input argument: instance_num (with a negative number). We are interested in the penultimate occurrence of /, therefore this input argument would be -2.
Alternative Solution
If such functions are not available for your excel version the following works using MID(text, start_num, num_chars) function:
=LET(url, A1, length, LEN(url), count, length-LEN(SUBSTITUTE(url,"/",""))-1,
startPos, FIND(" ",SUBSTITUTE(url,"/"," ",count))+1, numChars, length-startPos,
MID(url,startPos, numChars))
Note: We use LET to avoid repeating the same calculation and also to make it easier to understand. Without LET it's a bigger formula but it works too:
=MID(A1, FIND(" ",SUBSTITUTE(A1,"/"," ",
LEN(A1)-LEN(SUBSTITUTE(A1,"/",""))-1))+1, LEN(A1) -
(FIND(" ",SUBSTITUTE(A1,"/"," ",LEN(A1)-LEN(SUBSTITUTE(A1,"/",""))-1))+1))
we are looking for the penultimate / so count name achieves that:
count, length-LEN(SUBSTITUTE(url,"/",""))-1
The following formula:
startPos, FIND(" ",SUBSTITUTE(url,"/"," ",count))+1
is a way to replace just the penultimate / by space ( ). URLs don't have spaces so it is a good replacement. Then finding the position of such space plus 1 will give us the starting position required by MID. Having the starting position and the length of the URL we can calculate the number of characters (numChars), so we can finally invoke MID with the required input arguments.
Note: The approach you tried in your question, relies on specific pattern of the URL (ending with .com) the above approaches don't have such constraint, so you can use the following URL: https://www.redcross.org/ and it works.
Since I've exhausted about every resource I could find in regards to this question, I figured it was finally time to ask this community.
I have a very large (15k+ row) dataset that I'm looking to generate a report on giving the top 25 largest values based on one of the columns, HOWEVER, there is additional criteria that needs to be considered other than just the values in one column. I have done this already with less criteria, but adding more is giving me trouble.
My (working) formula for Top N with some criteria:
{=LARGE(IF('IMPORTED DATA'!$X$4:$X$1048576 = IF('Data Cleanup'!$AX$3 = 1, "Gaming Designed", "Not Gaming Designed"), 'IMPORTED DATA'!$BH$4:$BH$1048576), ROW(A2) - ROW(A$1))}
The issue comes when I have another criteria I need to add that uses wildcard characters to distinguish the 'correct' criteria. Here is what I've come up with so far, but this just results in the COUNTIF portion always resulting in true, so not actually applying the added criteria:
{=LARGE(IF(COUNTIF('IMPORTED DATA'!$P$6:$P$1048576, IF('Data Cleanup'!$AX$3 = 1, "?????", "????")) * ('IMPORTED DATA'!$X$6:$X$1048576 = IF('Data Cleanup'!$AX$3 = 1, "Gaming Designed", "Not Gaming Designed")) * ('IMPORTED DATA'!$E$6:$E$1048576 <> "All Other (Suppressed)"), 'IMPORTED DATA'!$BH$6:$BH$1048576), ROW(A2) - ROW(A$1))}
I tried to work-around IF statements not accepting wildcard characters using the COUNTIF method but to no avail.
I understand that this is a bit of a rough question, but I'll do my best to respond to as many questions as I can to help clarify.
A couple more bits of information that may be helpful:
This is entirely based in Excel 2019, I know that FILTER would be an easy solution, but I don't have access to that in this version of excel.
The reason for using wildcards is because it was the easiest way to distinguish between the two categories to sort: above or below 100Hz. Anything under 100Hz will be 4 characters, while anything above will be 5.
I also need other data from the same row as the results, so any methods must be also applicable to MATCH criteria so that I can look up the rest of the data with the same search parameters.
its very hard to understand without seeing the data.
What i understood is that if you make a helper column in the dataset as per the criteria you want that would solve your problem.
at least thats how i am also using.
You need to create a ranking column in the data sheet.
Ranking Formula = =COUNTIFS($M$3:$M$233,">="&M3,$K$3:$K$233,K3)
with thise formula you can add as many as criteria as you want.
Index Formula = =INDEX($K$3:$K$233,MATCH(1,($K$3:$K$233=$B$1)*($N$3:$N$233=A3),0))
you need to change the columns names you want.
no need row() functions just try always to use simple sequence will work
Good luck
Ended up solving this in a very simple method thanks to Scott Craner's comment.
Since wildcards don't work in if statements, using LEN did the trick. Final formula ended up being:
{=LARGE(IF(LEN('IMPORTED DATA'!$P$6:$P$30000)=IF('Data Cleanup'!$AX$3 =1,5,4) * ('IMPORTED DATA'!$X$6:$X$30000 = IF('Data Cleanup'!$AX$3 = 1, "Gaming Designed", "Not Gaming Designed")) * ('IMPORTED DATA'!$E$6:$E$30000 <> "All Other (Suppressed)"), 'IMPORTED DATA'!$BH$6:$BH$30000), ROW(A2) - ROW(A$1))}
Thank you to everyone for your help!
I posted this Qn at
https://www.mrexcel.com/forum/excel-questions/1029120-sumporduct-matched-criteria-extracted-text.html?posted=1#post4939092
but nobody was able to answer it.
G3=SUMPRODUCT(($B$1:$E$1=mid(G1,3,4))*($B$2:$E$2=G2)*($B$3:$E$6))
Basically using mid(G1,3,4) is causing the error. How do I solve it?
Thank you
Its not entirely clear what you are trying to achieve, but if your MID expression is supposed to extract and compare the year you have 2 problems:
1) its not extracting the year correctly
2) you are comparing a text value with a numeric value
Try using VALUE(MID(G1,1,4)))
I have a large spreadsheet with column data like:
ABC:1:I.0
ABC:1:I.1
ABC:1:I.2
ABC:1:I.3
ABC:2:I.0
ABC:2:I.1
ABC:2:I.2
ABC:2:I.3
ABC:3:I.0
ABC:3:I.2
ABC:3:I.3
ABC:4:I.0
ABC:4:I.1
ABC:4:I.2
ABC:4:I.3
ABC:5:I.0
ABC:5:I.1
ABC:5:I.2
ABC:5:I.3
ETC.
I need to replace the above with the following:
ABC:I.Data[1].0
ABC:I.Data[1].1
ABC:I.Data[1].2
ABC:I.Data[1].3
ABC:I.Data[2].0
ABC:I.Data[2].1
ABC:I.Data[2].2
ABC:I.Data[2].3
ABC:I.Data[3].0
ABC:I.Data[3].2
ABC:I.Data[3].3
ABC:I.Data[4].0
ABC:I.Data[4].1
ABC:I.Data[4].2
ABC:I.Data[4].3
ABC:I.Data[5].0
ABC:I.Data[5].1
ABC:I.Data[5].2
ABC:I.Data[5].3
ETC.
Here is a sample of the data, most of the data follows a similar format with the exception of the naming "ABC", which can vary in size, so it might be "ABCD" and also with the exception of the letter "I", it can be "O" as well. Also, some might be missing some values such as ABC:3:I.1 which is missing from the data. I am not too familiar with excel formulas or VBA code. Does anyone know how to do this? I have no preference on which method it has to be done in as I don't mind learning some VBA code if someone provides me with a VBA solution.
I was thinking of using some sort of loop along with some conditional statements.
Thanks!
Please try:
=LEFT(F11,FIND(":",F11))&MID(F11,FIND(":",F11,6)+1,1)&".Data["&MID(F11,FIND(":",F11,2)+1,1)&"]."&RIGHT(F11,1)
copied down to suit, assuming placed in Row11 and your data is in ColumnF starting in Row11.
Curiosities:
When this A was first posted it attempted to address only the tabulated example input and output. I temporarily deleted that version while addressing that what was in the table as ABC might at times be ABCD and that what was I might at times be O.
OP has posted an answer that I edited to make no visible change but which shows as the deletion of two characters. A copy of the OP’s formula exhibited a syntax error prior to my edit.
OP suggested an edit to my answer but this was rejected by the review process. As it happens, I think the edit suggestion was incorrect.
I have edited my answer again to include these ‘curiosities’ and to match the cell reference used by the OP in his answer.
=LEFT(A1,SEARCH(":",A1)) & MID(A1, SEARCH(".",A1)-1, 2) &
"Data[" & MID(A1,SEARCH(":",A1)+1,1) & "]" & RIGHT(A1,2)
With the help of pnuts I was able to come up with my own solution:
=LEFT(F11,LEN(F11)-5)&MID(F11,LEN(F11)-2,2)&"Data["&MID(F11,LEN(F11)-4,1)&"]"&RIGHT(F11,2)
My solution works based on the fact that the length of the last six values in the string ABC:1:I:0 will always be the same in size for all the data I have, hence you see LEN(F11)-some number in my code. The only part of the string that changes in size is the first part, in this case ABC which can also be ABCDEF, etc.
If you'd like to use formulas rather than VBA, an easy option is to split the data into 4 columns, using the Text To Columns option - first split using the colon as a delimiter, then using a full-stop / period as a delimiter.
Once you have 4 columns of data (one for each block), you can use the Concatenate function to join them and add in the extra characters: =CONCATENATE(A1,":",C1,".","Data[",B1,"].",D1)
This should still work if you have extra / alternative characters (eg ABCD instead of ABC), as long as you have the same delimiters, but obviously you'd need to test to make sure.
this is my first post so i am sorry if this is confusing at all. I am trying to use a vLookup to run a comparative analysis between two reports. I am using a part number as a reference and trying to return the cost associated with the part from one of the two reports. So, the first issue that I encountered was due to the fact that some of the part numbers had letters in them and some didn't, so to be consistent I used the following code to clean up the part numbers:
IFERROR(VALUE(F11&C11), F11&C11)
where F11 and C11 are two components of the part number that needed to be concatenated to generate the full number. Now, the vLookup will not return anything except for #N/A for a few of the part numbers that are actually in the sheet. All of the part numbers are formatted the same for the 892 part numbers that I am searching for but get a returned value on 571 of the 892 part numbers but of the remaining 321 part numbers that did not have a return, about a third actually exist in my sheet. Lastly and for example, part number 110874402 exists in both sheets but gets a #N/A from the vLookup. When I copy the value from one sheet and search it in the other sheet using Ctrl + F, I get the following:
(I have an image to show but apparently can't post it without a reputation of 10 or more...oops)
The highlighted cell shows that the value exists but Excel can't find it. Does anyone have any ideas why this is or what I could be doing differently? I've been having this issue for a few months now on separate projects and haven't found any resolution.
Thanks in advance,
try =VLOOKUP("*"&TRIM(F569)&"*", BOBJ!$D$3:$P$2237, 7, FALSE) - I have a feeling spaces may have crept around the part numbers, which means that the exact match will not work.
The TRIM takes the spaces from the cell you are looking at, and the "*"'s will allow a wildcard search - note that this also means that CAT would also match CAT1, but if it produces results where there were none before, it gives you something to check for.