splitting underscores in Excel - excel

I'm fairly new to Excel and need some assistance. I have a Column that has a list of files that look like:
12345_v1.0_TEST_Name [12345]_01.01.2022.html
45321_v55.9_Some Name Here [64398]_07.15.2018.html
56871_v14.2_Test[64398]_10.30.2019.html
Each file name can be different depending on what output is provided to me.
Note: There are other random files in the same format, however where it says Test_Name there could be an underscore and sometimes no underscore. Would like that to be ignored in the formula or vba. Files also can change but will be in the same format.
I need some help with a formula or vba that splits the underscores and outputs the data into their own cells:
Column C 12345
Column D v1.0
Column E TEST_Name [12345]
Column F 01.01.2022
Column G .html

Since there can be different file extensions however the format remains same, hence the above formula which i provided has been amended with some few tweaks so that it works for any file extensions,
FORMULA IN CELL C1
=IF(LEN($B1)-LEN(SUBSTITUTE($B1,"_",""))+1>4,
TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE($B1,"."&TRIM(RIGHT(SUBSTITUTE(
SUBSTITUTE($B1,"."," ",LEN($B1)-LEN(SUBSTITUTE($B1,".","")))," ",REPT(" ",200)),100)),"_"&"."&
TRIM(RIGHT(SUBSTITUTE(SUBSTITUTE($B1,"."," ",LEN($B1)-LEN(SUBSTITUTE($B1,".","")))," ",
REPT(" ",200)),100))),"_"," ",3),"_",REPT(" ",100)),COLUMN(A1)*99-98,100)),
TRIM(MID(SUBSTITUTE(SUBSTITUTE($B1,"."&TRIM(RIGHT(SUBSTITUTE(SUBSTITUTE(
$B1,"."," ",LEN($B1)-LEN(SUBSTITUTE($B1,".","")))," ",REPT(" ",200)),100)),"_"&"."&
TRIM(RIGHT(SUBSTITUTE(SUBSTITUTE($B1,"."," ",LEN($B1)-LEN(SUBSTITUTE($B1,".","")))," ",
REPT(" ",200)),100))),"_",REPT(" ",100)),COLUMN(A1)*99-98,100)))
FILL DOWN & FILL ACROSS!!!

There are other random files in the same format.....Files also can change but will be in the same format.
So, assuming the files indeed will be in the same format, we can brake this query down into the following requirements:
Change the 1st and 2nd occurence and the very last of the underscore into anything to split on;
Change the dot before the file-extension into anything to split on under the assumption we don't know if this would be '.html' or any other extension.
Since you have Microsoft365 we can use dynamic arrays and some basic functions to retrieve what you want:
=LET(X,SEARCH("_??.??.????.",A1),Y,"</s><s>",TRANSPOSE(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(REPLACE(A1,X,12,Y&MID(A1,X+1,10)&Y),"_",Y,2),"_",Y,1)&"</s></t>","//s")))
To break this down a little bit:
SEARCH("_??.??.????.",A1) - This part will make sure that we find the position of the very last underscore upto the dot before the file extension assuming you don't have any other date in your filenames in this specific format;
SUBSTITUTE() - We can use this formula to specifically change the 1st and 2nd instances of the underscore to anything we can split on;
FILTERXML() - You may notice we used valid xml start/end-tags to split our data using this function.
TRANSPOSE() - This last function will now spill the returned array over the columns instead of rows.
Without LET():
=TRANSPOSE(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(REPLACE(A1,SEARCH("_??.??.????.",A1),12,"</s><s>"&MID(A1,SEARCH("_??.??.????.",A1)+1,10)&"</s><s>"),"_","</s><s>",2),"_","</s><s>",1)&"</s></t>","//s"))

Is this what you are trying to achieve, although there might be more eloquent way to use a formula, and solve this, however you may try using this as well,
FORMULA USED IN CELL C1
=IF(LEN($B2)-LEN(SUBSTITUTE($B2,"_",""))+1>4,TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE($B2,".html","_.html"),"_"," ",3),"_",REPT(" ",100)),COLUMN(A2)*99-98,100)),TRIM(MID(SUBSTITUTE(SUBSTITUTE($B2,".html","_.html"),"_",REPT(" ",100)),COLUMN(A2)*99-98,100)))
Fill Down & Fill Across !

Related

Unable to convert text to Numbers

I've been given an excel to import on Database, it was exported from an Access DB. in the excel there's a column type_class, in one excel it's good(sheet1), but on another excel which I moved to sheet2 to make VLOOKUP function, I can't tell whether it's a text or a number column from the first sight. the upper-left green-thing is not showing on all cells. but, using ISTEXT function result in text. below is the original column without any changes or formatting, as well as ISTEXT result.
when I use the column in a VLOOKUB function to transfer the Name to the first sheet, only (1010, 1101, 1102,....), hence the cells with the green-mark on the upper-left corner.
I can easly format the key in sheet1 using text-to-columns, cell formatting, and any other way.
but I cannot change the column in sheet2, I tried:
Text-to-Columns
Cell Formatting
VALUE(text), CLEAN(text), TRIM(text), TRIM(CLEAN(text)), CLEAN(SUBSTITUTE())
Multiply by 1
but only the cell with the green-mark changes to a number, the rest stays the same. I browsed the internet but didn't get a solution either.
Edit:
I uploaded what is need to test the case on the drive. you can find it here
Help Appreciated
For your digit strings that you can't convert to text, from the comments it seems there are extra characters in that string not removable by TRIM or CLEAN.
Determine what those character are
Assume a "non-convertible" digit string is in A1
Enter the following formula
B1: =MID($A$1,ROWS($1:1),1) and fill down
C1: = UNICODE(B1) and fill down
From this you can determine the character to use in a SUBSTITUTE function.
For example:
From the above we see that the character code that we need to get rid of is 160.
So we use:
=SUBSTITUTE(A1,CHAR(160),"")
or, to convert it in one step to a number:
=--SUBSTITUTE(A1,CHAR(160),"")
Note If the character code is >255, use UNICHAR instead of CHAR in the SUBSTITUTE function.
Without an example, I use value() to convert what excel takes as text like so:
=value(left(“10kg”,2))
Or the following also works:
=left(“10kg”,2)*1
Note those double quotes should be the straight ones - sorry smartphone is not always smart...
And if leading or trailing spaces are an issue, then trim() is one solution.

Need to remove non-matching text from a delimited Excel string

I have an Excel database query to get all the RBAC user roles that are assigned to each user, and the database returns a string delimited by & (ampersands) between each user role, e.g.:
&Admin&Supervisor&ViewReports&WriteReports&
My query filters records that only have a matching string, let's say it's Reports. However it still returns the full list of user roles for a matching user, and in this case some users have >10 roles assigned and it makes the table look really messy and not suitable for printing.
I could manually clean up each row, but there are quite a lot of them, and since this is going to be run regularly I'm wondering if there may be a good Excel formula or VBS method to split delimited sections of a string and only keep ones that match a string criteria.
I'm aware of "Text to Columns" and its ability to make use of delimiters, but it just spat out a ton of columns and made things worse. I've done several searches about cleaning up delimited strings in Excel but couldn't find any results that were similar to my situation: need to split a delimited string and do something RegEx-esque to only keep parts that match a pattern.
Ideally I'd like to keep the cleaned up results in a single cell, so the above example &Admin&Supervisor&ViewReports&WriteReports& would look like:
ViewReports WriteReports
or
ViewReports,WriteReports
or similar, in a single cell. Not too picky about formatting really, just need the non-relevant parts of the string gone.
you could use a combination of trim, mid and substitute to find your values so using your example above:
if oyu have a blank excel sheet and ad your example to cell A3 then place 1,2,3 and 4 in cells B2, C2, D2, E2 then use copy this formula into cell B3:
=TRIM(MID(SUBSTITUTE($A3,"&",REPT(" ",LEN($A3))),(B$2-1)*LEN($A3)+1,LEN($A3)))
this should give you the value "Admin".
After that just pull formulas to the right and you will get all 4 values in your example
Please let me know if you need more explanation.
For more info on this equation please see webpage:
https://exceljet.net/formula/split-text-with-delimiter
This formula will work in Excel/Office 365. It won't work in earlier versions due to the TEXTJOIN function which appeared in 2016.
Assumes the data is simple strings as above (i.e. not an XML document that might contain duplicates of the created nodes. If that were the case, there is another method of splitting the string we could use).
Split the string on the ampersand with FILTERXML
Use a variation of the INDEX function to return an array of the matching sections
Concatenate those sections with TEXTJOIN
=TEXTJOIN(" ",TRUE,INDEX(FILTERXML("<t><s>" & SUBSTITUTE(A1,"&","</s><s>")&"</s></t>","//s"),N(IF(1,{3,5}))))
The …N(IF(1,{3,5}))… portion is how to return an array of values from the INDEX functions. In this case, 3 and 5 refer to the value before the third and fifth ampersand. Note that 1 would return an error, since there is nothing before the first ampersand.
You can return whichever elements you want. You just need to know (or calculate using the MATCH function), the proper index number(s).
Note that, with TEXTJOIN you can specify whatever delimiter you want. I specified a space, but you could specify comma or anything.

Make SumIf ignore words?

=SUMIF(E3:E,"YES",C3:C)
The above formula works in adding the numbers in C if the corresponding E cell is "YES", however my cells in C have "# MINS" in them, is there a way to make SumIf ignore words and only add the number?
SCREENSHOT OF SPREADSHEET: https://cdn.discordapp.com/attachments/358381825246101505/488443165364322327/Screenshot_1.png
If you’re using Google Spreadsheets, you have the possibility to format the numbers as you want.
In the cells, store the numbers only so that SUMIF will work, then create a custom number format: in the toolbar - Format - Number - More Formats - Custom number format - type in # “MINS”.
=SUMPRODUCT(LEFT(C3:C5,LEN(C3:C5)-LEN(" mins"))*(D3:D5="yes"))
This is an array like calculation. As such full column references may bog your computer down with excess calculations.
Get rid of the MINS. You can use Find & Replace or Text to Columns, etc.
Create a custom number format of 0 \M\I\N\S.
Use your original formula.
excel
=SUMIF(E:E,"YES",C:C)
google-spreadsheet
=SUMIF(E3:E,"YES",C3:C)
Shareable link

Excel formula to extract file name from paths without file extension

I have this excel formula which I'm hoping can be amended to extract the file name without the file extension. I know this can be done over two formulas but it would make the process I am working with a whole lot easier. I'm also trying to avoid VBA as I don't have any knowledge on it.
=IFERROR(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1,"\",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))+1,LEN(A1)),"")
Thanks!
You already have logic for finding the position of the last "\" character in string. You can use the same logic for finding the position of the last "." character in string. Then the difference of those two numbers minus one will be the length of the file name without the file extension. Use this length for the num_chars argument ot the MID function. For example...
=IFERROR(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1,"\",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))+1,FIND(CHAR(1),SUBSTITUTE(A1,".",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,".",""))))-FIND(CHAR(1),SUBSTITUTE(A1,"\",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))-1),"")
Note that this formula assumes that all file names have an extension. If some files have no extension then you would need to add extra logic to the formula.
It looks like you expect A1 to contain the filename, but you can do this all at once if you're willing to replace every instance of "A1" with Cell("filename",A1). Either way, it looks like you are overcomplicating the situation by looking to count the number of file paths, when really all you need is the "[" & "]" which encloses the name of the file.
If A1 is any random cell with nothing special in it, the formula looks like this:
=MID(CELL("filename",A1),FIND("[",CELL("filename",A1))+1,FIND("]",CELL("filename",A1))-FIND("[",CELL("filename",A1))-1)
If A1 holds CELL("filename",B1), then the formula looks like this:
=MID(A1,FIND("[",A1)+1,FIND("]",A1)-FIND("[",A1)-1)
CELL("filename",B1) looks at B1, and pulls the property "filename". In this case, it doesn't matter whether you use B1, Z10, etc. - it just can't be a circular reference. You can also use CELL to pull cell-specific info, such as cell address etc.
Then the MID function simply looks at A1, starts 1 after the "[", and picks up all the characters up to the "]".

Using SUMIFS with multiple AND OR conditions

I would like to create a succinct Excel formula that SUMS a column based on a set of AND conditions, plus a set of OR conditions.
My Excel table contains the following data and I used defined names for the columns.
Quote_Value (Worksheet!$A:$A) holds an accounting value.
Days_To_Close (Worksheet!$B:$B) contains a formula that results in a number.
Salesman (Worksheet!$C:$C) contains text and is a name.
Quote_Month (Worksheet!$D:$D) contains a formula (=TEXT(Worksheet!$E:$E,"mmm-yy"))to convert a date/time number from another column into a text based month reference.
I want to SUM Quote_Value if Salesman equals JBloggs and Days_To_Close is equal to or less than 90 and Quote_Month is equal to one of the following (Oct-13, Nov-13, or Dec-13).
At the moment, I've got this to work but it includes a lot of repetition, which I don't think I need.
=SUM(SUMIFS(Quote_Value,Salesman,"=JBloggs",Days_To_Close,"<=90",Quote_Month,"=Oct-13")+SUMIFS(Quote_Value,Salesman,"=JBloggs",Days_To_Close,"<=90",Quote_Month,"=Nov-13")+SUMIFS(Quote_Value,Salesman,"=JBloggs",Days_To_Close,"<=90",Quote_Month,"=Dec-13"))
What I'd like to do is something more like the following but I can't work out the correct syntax:
=SUMIFS(Quote_Value,Salesman,"=JBloggs",Days_To_Close,"<=90",Quote_Month,OR(Quote_Month="Oct-13",Quote_Month="Nov-13",Quote_Month="Dec-13"))
That formula doesn't error, it just returns a 0 value. Yet if I manually examine the data, that's not correct. I even tried using TRIM(Quote_Month) to make sure that spaces hadn't crept into the data but the fact that my extended SUM formula works indicates that the data is OK and that it's a syntax issue. Can anybody steer me in the right direction?
You can use SUMIFS like this
=SUM(SUMIFS(Quote_Value,Salesman,"JBloggs",Days_To_Close,"<=90",Quote_Month,{"Oct-13","Nov-13","Dec-13"}))
The SUMIFS function will return an "array" of 3 values (one total each for "Oct-13", "Nov-13" and "Dec-13"), so you need SUM to sum that array and give you the final result.
Be careful with this syntax, you can only have at most two criteria within the formula with "OR" conditions...and if there are two then in one you must separate the criteria with commas, in the other with semi-colons.
If you need more you might use SUMPRODUCT with MATCH, e.g. in your case
=SUMPRODUCT(Quote_Value,(Salesman="JBloggs")*(Days_To_Close<=90)*ISNUMBER(MATCH(Quote_Month,{"Oct-13","Nov-13","Dec-13"},0)))
In that version you can add any number of "OR" criteria using ISNUMBER/MATCH
You can use DSUM, which will be more flexible. Like if you want to change the name of Salesman or the Quote Month, you need not change the formula, but only some criteria cells. Please see the link below for details...Even the criteria can be formula to copied from other sheets
http://office.microsoft.com/en-us/excel-help/dsum-function-HP010342460.aspx?CTT=1
You might consider referencing the actual date/time in the source column for Quote_Month, then you could transform your OR into a couple of ANDs, something like (assuing the date's in something I've chosen to call Quote_Date)
=SUMIFS(Quote_Value,"<=90",Quote_Date,">="&DATE(2013,11,1),Quote_Date,"<="&DATE(2013,12,31),Salesman,"=JBloggs",Days_To_Close)
(I moved the interesting conditions to the front).
This approach works here because that "OR" condition is actually specifying a date range - it might not work in other cases.
Quote_Month (Worksheet!$D:$D) contains a formula (=TEXT(Worksheet!$E:$E,"mmm-yy"))to convert a date/time number from another column into a text based month reference.
You can use OR by adding + in Sumproduct. See this
=SUMPRODUCT((Quote_Value)*(Salesman="JBloggs")*(Days_To_Close<=90)*((Quote_Month="Cond1")+(Quote_Month="Cond2")+(Quote_Month="Cond3")))
ScreenShot
Speed
SUMPRODUCT is faster than SUM arrays, i.e. having {} arrays in the SUM function. SUMIFS is 30% faster than SUMPRODUCT.
{SUM(SUMIFS({}))} vs SUMPRODUCT(SUMIFS({})) both works fine, but SUMPRODUCT feels a bit easier to write without the CTRL-SHIFT-ENTER to create the {}.
Preference
I personally prefer writing SUMPRODUCT(--(ISNUMBER(MATCH(...)))) over SUMPRODUCT(SUMIFS({})) for multiple criteria.
However, if you have a drop-down menu where you want to select specific characteristics or all, SUMPRODUCT(SUMIFS()), is the only way to go. (as for selecting "all", the value should enter in "<>" + "Whatever word you want as long as it's not part of the specific characteristics".
In order to get the formula to work place the cursor inside the formula and press ctr+shift+enter and then it will work!
With the following, it is easy to link the Cell address...
=SUM(SUMIFS(FAGLL03!$I$4:$I$1048576,FAGLL03!$A$4:$A$1048576,">="&INDIRECT("A"&ROW()),FAGLL03!$A$4:$A$1048576,"<="&INDIRECT("B"&ROW()),FAGLL03!$Q$4:$Q$1048576,E$2))
Can use address / substitute / Column functions as required to use Cell addresses in full DYNAMIC.

Resources