Replacing Delimiter in MCONCAT formula - excel

I'm using the MCONCAT formula (with success & help from others) to create a single string of multiple attachment names to associate them with a single record # (I am converting data from a legacy system to another by way of flat files and a data loader).
An example: | Contract 1 | filename.pdf, filename2.doc |
However, when the first load was run, records that had a comma in the file name errored out because the data loader treats the comma as the break between files. After some research, we decided to use '#' as the delimiter between multiple files in a cell. Now I am stuck trying to substitute the comma delimiters in my MCONCAT formula with '#' and have had no success so far.
Here is the code as I am using it now:
=SUBSTITUTE(MCONCAT(IF($A$2:$A$11133=$D2,", "&$B$2:$B$11133,"")),", ","",1)
Is this possible to do? If so, how, and maybe (if it's not asking too much) a short explanation so I can fully understand?
An example of the hopeful solution: | Contract 1 | filename.pdf # filename2.doc |

Depending on the complexity of the filenames that contain commas, you might be able to do what you want simply by using the Find & Select / Replace feature of Excel.
Please use a copy of your workbook if you try anything suggested.
If your separator is always [list item][comma][space][list item] and none of your [list item(s)] contain [comma][space], then selecting the column containing the list and using a "find what" term of ", " (note the space!) and a "replace with" term of "# " should fix your problem. Using [hash][space] instead of [space][hash][space] is probably better.
A VBA solution is possible but it would probably be more effort than it's worth. You might need to write lots of rules telling it how to split and join stuff, and end up with it still not being perfect.
While doing it manually might not be a fun idea, you could use something like "text to columns" to split your list then look over the results and fix the errors then re-join using your new delimiter.
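The split-and-re-join idea can also be sketched outside Excel. The following is a minimal Python illustration, not the MCONCAT formula itself: it assumes the flat file has already been read into (record, filename) pairs, and the function name `join_attachments` and the sample rows are hypothetical. Because the names are joined with " # " from the start, a comma inside a file name is never mistaken for a separator.

```python
from collections import defaultdict


def join_attachments(rows, delimiter=" # "):
    """Group file names by record and join each group with the delimiter."""
    grouped = defaultdict(list)
    for record, filename in rows:
        grouped[record].append(filename)
    return {record: delimiter.join(names) for record, names in grouped.items()}


# Hypothetical sample data; the second file name contains a comma on purpose.
rows = [
    ("Contract 1", "filename.pdf"),
    ("Contract 1", "filename2, final.doc"),
    ("Contract 2", "scan.pdf"),
]
result = join_attachments(rows)
print(result["Contract 1"])  # filename.pdf # filename2, final.doc
```

The same reasoning suggests building the list with the new delimiter in the first place rather than replacing commas afterwards, since a replace cannot tell a separator comma from a comma inside a name.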

Related

Find value within cell from a series in Excel

I have a list of addresses, such as this:
Lake Havasu,  Lake Havasu City,  Arizona.
St. Johns River,  Palatka,  Florida.
Tennessee River,  Knoxville,  Tennessee.
I would like to extract the State from these addresses and then have a column showing the abbreviated State name (AZ, FL, TN etc.).
I have a table that has the States with their abbreviation and once I extract the State, doing a simple INDEX MATCH to get the abbreviation is easy. I don't want to use text-to-columns because this file will constantly have values added to it and it would be much easier to just have a formula that does the extraction for me.
The ways I've tried to approach this that have failed so far are:
(1) Some kind of SEARCH() function that looks at the full State list and tries to find a value that exists in the cell
(2) A MID or RIGHT approach to only capture the last section but I can't work out how to have FIND only look for the second ", "
(3) A version of INDEX MATCH but that fails because I can't find a good way to search or find the values as per approach (1)
Any help would be appreciated!
Please try this formula, where A2 is the original text.
=FILTERXML("<data><a>" & SUBSTITUTE(A2,", ","</a><a>") & "</a></data>","data/a[3]")
An alternative would be to look for the 2nd comma as shown below. Note that the "50" in the formula is an arbitrary length required by the MID() function. It just shouldn't be smaller than the number of characters you need to return.
CHAR(160) is a character that shouldn't occur naturally in your text, although it might if the text comes from a UNIX database. You can replace it with any other character that fits that description.
=TRIM(MID(A2, FIND(CHAR(160),SUBSTITUTE(A2,",",CHAR(160),2)) + 1,50))
The following variation of the above would remove the final period. It will fail if there is anything following the period, such as an unwanted blank. That could be accommodated within the formula as well but it would be easier to treat the original data, if that is an option for you.
=TRIM(MID(LEFT(A2, LEN(A2)-1), FIND(CHAR(160),SUBSTITUTE(A2,",",CHAR(160),2)) + 1,50))
To find the abbreviation I would recommend using VLOOKUP rather than INDEX/MATCH.
Use this (screenshot refers):
=MID(MID(B3,1,LEN(B3)-1),SEARCH(",",B3,SEARCH(",",B3,1)+1)+3,LEN(B3))
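For readers who want to check the "text after the second comma" logic outside Excel, here is a minimal Python sketch of the same idea. It is not a translation of any one formula above. The names `extract_state` and `state_abbreviation` are hypothetical, and the lookup table holds only the three states from the question.

```python
# Hypothetical lookup table; a real one would list every state.
STATE_ABBREVIATIONS = {"Arizona": "AZ", "Florida": "FL", "Tennessee": "TN"}


def extract_state(address):
    """Return the text after the second comma, trimmed, without a final period."""
    parts = address.split(",", 2)  # split at the first two commas only
    if len(parts) < 3:
        return None  # fewer than two commas: nothing to extract
    return parts[2].strip().rstrip(".").strip()


def state_abbreviation(address):
    """Look up the abbreviation for the extracted state, or None if unknown."""
    return STATE_ABBREVIATIONS.get(extract_state(address))


print(state_abbreviation("Lake Havasu,  Lake Havasu City,  Arizona."))  # AZ
```

Stripping whitespace before and after removing the period covers the trailing-blank case that the formula variation above warns about.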

Looking to remove a space in one of the results while performing multiple mid searches on a column

I am trying to embed a SUBSTITUTE in my function, but I am not sure where to incorporate it. I am trying to extract just the text "Scrumactiviteiten" but in the source data sometimes a space will be in there. A sample:
Column A
1 Team xxxx 2018-17 Scrumactiviteiten 123 and then something
2 Team xxxx 2018-17 Scrum activiteiten 123 and then something
Column B (My formula)
1 Scrumactiviteiten
2 Scrum activiteiten
The function I used to extract it (ignore the "Balans" search please):
=IFERROR(IFERROR(IFERROR(MID(A1;SEARCH("Scrum activiteiten";A1;1);18);
MID(A1;SEARCH("Scrumactiviteiten";A1;1);17));MID(A1;SEARCH("Balans";A1;1);10));" ")
This works fine, but to remove the space I tried to embed a SUBSTITUTE where I use the mid search result as the old text and provide "Scrumactiviteiten" as the new text:
=IFERROR(IFERROR(IFERROR(SUBSTITUTE(A24;((MID(A24;SEARCH("Scrum activiteiten";A24;1);18)));"Scrumactiviteiten");MID(A24;SEARCH("Scrumactiviteiten";A24;1);17));MID(A24;SEARCH("Balans";A24;1);10));" ")
The result however is a copy of the full string. I also tried putting the SUBSTITUTE before the search, but that did not work either. I am pretty new to Excel formulas and I think I messed up the order or just plain don't understand how to embed a SUBSTITUTE in the formula I created. Some explanation of what I'm doing wrong would be much appreciated! Thank you in advance,
Mark
The problem is that you are not providing the correct arguments to the function. Try this formula:
=IFERROR(IFERROR(IFERROR(SUBSTITUTE(((MID(A24;SEARCH("Scrum activiteiten";A24;1);18)));" ";"");MID(A24;SEARCH("Scrumactiviteiten";A24;1);17));MID(A24;SEARCH("Balans";A24;1);10));" ")
To use SUBSTITUTE you first provide the string in which you want to replace something, the next two arguments are the string you want replaced and the string you want to replace it with. So for example =SUBSTITUTE("Scrum activiteiten";" ";"") returns Scrumactiviteiten as the space " " is replaced with an empty string "".
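The order of operations (find the label first, then remove the space from what was found) can be illustrated with a short Python sketch. It simplifies the formula: each label is returned at its own length, whereas the original takes 10 characters for "Balans". The function name `extract_label` is hypothetical.

```python
import re

# Labels are tried in the same order as the nested IFERROR calls.
LABELS = ("Scrum activiteiten", "Scrumactiviteiten", "Balans")


def extract_label(text):
    """Return the first label found with spaces removed, or " " if none match."""
    for label in LABELS:
        # Excel's SEARCH is case-insensitive, so the match is too.
        match = re.search(re.escape(label), text, re.IGNORECASE)
        if match:
            # Same role as SUBSTITUTE(found;" ";""): act on the extracted text.
            return match.group(0).replace(" ", "")
    return " "


print(extract_label("Team xxxx 2018-17 Scrum activiteiten 123 and then something"))
# Scrumactiviteiten
```

The replacement is applied to the extracted piece, not to the whole cell, which is the difference between the working formula and the one in the question.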

How to search for items with multiple "-" in excel or VBA?

I have a list of item numbers (100K) like this:
Some of the items (thousands of them) have a format like SAG571A-244-4. These need to be filtered so I can delete them and keep only the items that have ONE hyphen per SKU. How can I isolate the items that have two instances of "-" in their SKU? I'm open to solutions within Excel or using VBA as well.
Native text filters don't seem to be capable of this. I'm stumped.
As per John Coleman's comment, "*-*-*" can be used to isolate strings that have at least two dashes in them.
I would add that if you're entering them as a custom text filter, you should lose the double quotes (so just *-*-*) as otherwise the field seems to interpret the quotes literally.
Seems to work for me.
If you just want an Excel formula to verify this and give you the number of hyphens (0, 1, or 2+), here is one:
=IF(ISERROR(SEARCH("-",A1)),"0",IF(ISERROR(SEARCH("-",A1,IFERROR(SEARCH("-",A1)+1,LEN(A1)))),"1","2+"))
Replace A1 with the first cell of your relevant column, then fill down. This is kind of a terrible way to do this performance-wise, but you avoid using VBA and possibly xlsm files.
The code first checks to see if there is one hyphen, then if there is it checks to see if there is another hyphen after the position the first one was found. Looking for multiple hyphens in this manner is cumbersome and I don't recommend it.
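If the list can be exported, counting hyphens directly is much simpler than nesting searches. This minimal Python sketch shows the same 0 / 1 / 2+ classification and the "keep only one hyphen" filter. The function names and the second sample SKU are hypothetical.

```python
def hyphen_bucket(sku):
    """Classify a SKU as "0", "1" or "2+" by the number of hyphens it contains."""
    count = sku.count("-")
    return str(count) if count < 2 else "2+"


def keep_single_hyphen(skus):
    """Keep only the SKUs that contain exactly one hyphen."""
    return [sku for sku in skus if sku.count("-") == 1]


# "SAG571A-244-4" is from the question; "SAG571A-244" is a made-up companion.
skus = ["SAG571A-244-4", "SAG571A-244"]
print([hyphen_bucket(sku) for sku in skus])  # ['2+', '1']
print(keep_single_hyphen(skus))              # ['SAG571A-244']
```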

Replacing a section of the data in a cell for thousands of excel data

I have a large spreadsheet with column data like:
ABC:1:I.0
ABC:1:I.1
ABC:1:I.2
ABC:1:I.3
ABC:2:I.0
ABC:2:I.1
ABC:2:I.2
ABC:2:I.3
ABC:3:I.0
ABC:3:I.2
ABC:3:I.3
ABC:4:I.0
ABC:4:I.1
ABC:4:I.2
ABC:4:I.3
ABC:5:I.0
ABC:5:I.1
ABC:5:I.2
ABC:5:I.3
ETC.
I need to replace the above with the following:
ABC:I.Data[1].0
ABC:I.Data[1].1
ABC:I.Data[1].2
ABC:I.Data[1].3
ABC:I.Data[2].0
ABC:I.Data[2].1
ABC:I.Data[2].2
ABC:I.Data[2].3
ABC:I.Data[3].0
ABC:I.Data[3].2
ABC:I.Data[3].3
ABC:I.Data[4].0
ABC:I.Data[4].1
ABC:I.Data[4].2
ABC:I.Data[4].3
ABC:I.Data[5].0
ABC:I.Data[5].1
ABC:I.Data[5].2
ABC:I.Data[5].3
ETC.
Here is a sample of the data. Most of the data follows a similar format, with two exceptions: the name "ABC" can vary in length (it might be "ABCD"), and the letter "I" can also be "O". Some values may also be missing; for example, ABC:3:I.1 is missing from the data above. I am not too familiar with Excel formulas or VBA code. Does anyone know how to do this? I have no preference as to the method, and I don't mind learning some VBA if someone provides a VBA solution.
I was thinking of using some sort of loop along with some conditional statements.
Thanks!
Please try:
=LEFT(F11,FIND(":",F11))&MID(F11,FIND(":",F11,6)+1,1)&".Data["&MID(F11,FIND(":",F11,2)+1,1)&"]."&RIGHT(F11,1)
copied down to suit, assuming placed in Row11 and your data is in ColumnF starting in Row11.
Curiosities:
When this answer was first posted, it attempted to address only the tabulated example input and output. I temporarily deleted that version while accounting for the fact that what appears in the table as ABC might at times be ABCD, and that what appears as I might at times be O.
OP has posted an answer that I edited to make no visible change but which shows as the deletion of two characters. A copy of the OP’s formula exhibited a syntax error prior to my edit.
OP suggested an edit to my answer but this was rejected by the review process. As it happens, I think the edit suggestion was incorrect.
I have edited my answer again to include these ‘curiosities’ and to match the cell reference used by the OP in his answer.
=LEFT(A1,SEARCH(":",A1)) & MID(A1, SEARCH(".",A1)-1, 2) &
"Data[" & MID(A1,SEARCH(":",A1)+1,1) & "]" & RIGHT(A1,2)
With the help of pnuts I was able to come up with my own solution:
=LEFT(F11,LEN(F11)-5)&MID(F11,LEN(F11)-2,2)&"Data["&MID(F11,LEN(F11)-4,1)&"]"&RIGHT(F11,2)
My solution relies on the fact that the last six characters of the string (":1:I.0" in ABC:1:I.0) always have the same length in all the data I have, hence the LEN(F11)-some number in my code. The only part of the string that changes in size is the first part, in this case ABC, which can also be ABCDEF, etc.
If you'd like to use formulas rather than VBA, an easy option is to split the data into 4 columns, using the Text To Columns option - first split using the colon as a delimiter, then using a full-stop / period as a delimiter.
Once you have 4 columns of data (one for each block), you can use the Concatenate function to join them and add in the extra characters: =CONCATENATE(A1,":",C1,".","Data[",B1,"].",D1)
This should still work if you have extra / alternative characters (eg ABCD instead of ABC), as long as you have the same delimiters, but obviously you'd need to test to make sure.
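All of the formulas above depend on fixed positions or fixed lengths. A pattern-based version avoids that, at the cost of leaving Excel. The following is a minimal Python sketch that assumes every value has the shape name:number:letters.number. The pattern and the function name `reformat_tag` are hypothetical, and anything that does not match is returned unchanged.

```python
import re

# name : index : kind . bit, for example ABC:1:I.0
TAG_PATTERN = re.compile(r"^([^:]+):(\d+):([A-Za-z]+)\.(\d+)$")


def reformat_tag(tag):
    """Rewrite name:index:kind.bit as name:kind.Data[index].bit."""
    match = TAG_PATTERN.match(tag.strip())
    if match is None:
        return tag  # leave anything unexpected untouched
    name, index, kind, bit = match.groups()
    return f"{name}:{kind}.Data[{index}].{bit}"


for tag in ["ABC:1:I.0", "ABCDEF:12:O.3"]:
    print(reformat_tag(tag))
# ABC:I.Data[1].0
# ABCDEF:O.Data[12].3
```

Because the pieces are found by delimiter, not by position, longer names and multi-digit indexes are handled without changing the code.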

Sorting on a numerical value using csvfix for linux - turns numbers to strings

I'm using csvfix to sort a CSV file based on an integer (counter) value in the second column. However it seems that csvfix puts double quotes around all fields in the file, turning them to strings, before it performs the sort. The result is that the rows are sorted by the string value, such that "1000" comes before "2".
There is a command-line option -smq that is supposed to apply "smart quoting" but that's not helping me. If I use the command csvfix echo -smq file.csv, the output has no quotes around numerical fields, but when I pipe that into csvfix sort -f 2 file.csv, the file is written without quotes but still sorted in "string order". It makes no difference whether I include the -smq flag in the sort command or not.
Additionally I would like csvfix to ignore the first row of string headers. Csvfix issue tracking claims this is already implemented but I can only find the -ifn flag that seems to cut the header row out entirely.
These seem like pretty basic pieces of functionality for this tool, so I'm probably missing something very simple. Hoping someone on here has used csvfix and figured this out.
According to the online documentation for csvfix, sort has an N option for numeric sorts:
csvfix sort -f 2:N file.csv
Having said this, CSV isn't a particularly good format for text manipulation. If possible, you're much better off choosing DSV (delimiter separated values) such as Tab or Pipe separated, so that you can simply pipe the output to sort, which has ample capability to sort by field, using whatever collation method you need.
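If neither csvfix nor sort is convenient, the same result (numeric order on the second column, header row kept in place) can be sketched with Python's standard csv module. This is an alternative technique, not a csvfix feature. The function name `sort_csv_numeric` and the sample data are hypothetical, and the sketch assumes every value in the sort column is an integer.

```python
import csv
import io


def sort_csv_numeric(text, column=1):
    """Sort CSV text by an integer column, leaving the header row first."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # int() makes "2" sort before "1000", unlike a string comparison.
    body.sort(key=lambda row: int(row[column]))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows([header] + body)
    return out.getvalue()


sample = "name,count\na,1000\nb,2\nc,30\n"
print(sort_csv_numeric(sample), end="")
# name,count
# b,2
# c,30
# a,1000
```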

Resources