Excel Formula - Match substrings of List to List

I have two lists in an Excel spreadsheet.
The first list has strings such as:
1234 blue 6 abc
xyz blue/white 1234
abc yellow 123
The other list contains substrings of the first list
yellow
blue/white
blue
Result
1234 blue 6 abc | blue
xyz blue/white 1234 | blue/white
abc yellow 123 | yellow
Now I need some kind of match formula to assign the correct value from the second list to each string in the first. The problem is that there is no specific pattern that determines where the color substring is positioned. The other problem is that the values are not totally unique. As my example above shows, the lookup needs to happen in order (checking for "blue/white" before checking for "blue").
I played around with formulas like MATCH and FIND, also using the wildcard *, but couldn't get any result.
A similar question asked here on SO covers the opposite case: How to find if substring exists in a list of strings (and return full value in list if so)
Any help is appreciated. A formula would be cool, but using VBA is also okay.

=INDEX(D$7:D$9, AGGREGATE(15, 7, ROW($1:$3)/ISNUMBER(SEARCH(D$7:D$9, A2)), 1))

Here is a solution with VBA
List 1 (strings) is in column A
List 2 (substrings) is in column C
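The code basically consists of two nested loops checking whether the substring is inside the string. The inner loop stops at the first match, so "blue/white" must be listed before "blue" in column C.
Sub MatchSubstrings()
    Dim row_1 As Long, row_2 As Long, color As String
    With ActiveSheet
        row_1 = 1
        Do While .Cells(row_1, "A").Value <> ""
            row_2 = 1
            Do While .Cells(row_2, "C").Value <> ""
                color = .Cells(row_2, "C").Value
                If InStr(1, .Cells(row_1, "A").Value, color, vbBinaryCompare) > 0 Then
                    .Cells(row_1, "B").Value = color
                    Exit Do 'Stop at the first match so a later, shorter substring cannot overwrite it
                End If
                row_2 = row_2 + 1
            Loop
            row_1 = row_1 + 1
        Loop
    End With
End Sub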
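For readers who want to check the lookup logic outside Excel, here is a minimal Python sketch of the same idea: walk the substring list in priority order and return the first hit. The function name `first_match` and the sample lists are only illustrative (the data is copied from the question), and sorting by length is an assumption that suits this example, not a general rule.

```python
def first_match(text, substrings):
    """Return the first entry of `substrings` that occurs in `text`, or None."""
    for sub in substrings:
        if sub in text:
            return sub
    return None


# Sample data from the question; list order encodes the lookup priority.
colors = ["yellow", "blue/white", "blue"]
strings = ["1234 blue 6 abc", "xyz blue/white 1234", "abc yellow 123"]

results = [(s, first_match(s, colors)) for s in strings]

# If the list is not already ordered, putting longer entries first is one way
# to make sure "blue/white" is tried before "blue" (an assumption for this data).
by_length = sorted(colors, key=len, reverse=True)
```

The same ordering requirement applies to the formula and the VBA loop: whichever entry is tested first wins.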

Related

Create a user form with inputs for one entry with multiple parameters

I am looking to create a user form that would allow to give one entry multiple parameters, resulting in a table that looks like this:
fruit | colour(s)
apple | red
 | green
banana | yellow
 | brown
 | green
Also note that each entry should be able to have a range of parameters, anywhere from 1 to likely 10 at most. Ideally, it would also merge the empty cells with the entry, but this is not necessary. Is there a way to do this in Excel without VBA? If not, where would I start with VBA?
OK, so I think the user inputs a line of information and you want to build the above table.
So, for example, the input could be:
A1: Apples
B1: red, green
Then another entry could be:
A1: banana
B1: yellow, brown, green
The code below can handle this, with the titles Fruit and Colour(s) in A5 and B5 respectively:
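Sub tableCreate()
    Dim fruit As String, colours As String
    Dim coloursArr() As String
    Dim inputRow As Long, x As Long
    fruit = Range("A1").Value
    colours = Range("B1").Value
    coloursArr = Split(colours, ",")
    'First free row below the existing table
    inputRow = Range("A" & Rows.Count).End(xlUp).Row + 1
    For x = 0 To UBound(coloursArr)
        Range("A" & inputRow).Value = fruit
        Range("B" & inputRow).Value = Trim(coloursArr(x)) 'Drop the space after each comma
        inputRow = inputRow + 1
    Next x
    'Clear the input cells for the next entry
    Range("A1").Clear
    Range("B1").Clear
End Sub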
Hopefully this gets you started. There are some efficiency savings to be had on this if it'll be used a lot but I just wanted to show a simple solution
this link will help with putting a button on the sheet to enter the info.
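The splitting step is easy to try outside Excel. Below is a small Python sketch, assuming the colours arrive as one comma-separated string; `expand_entry` is a hypothetical helper name, and the two entries are the examples from above.

```python
def expand_entry(fruit, colours):
    """Turn one (fruit, "a, b, c") entry into one (fruit, colour) row per colour."""
    # strip() removes the space that follows each comma; empty pieces are skipped.
    return [(fruit, c.strip()) for c in colours.split(",") if c.strip()]


rows = expand_entry("apple", "red, green") + expand_entry("banana", "yellow, brown, green")
for fruit, colour in rows:
    print(fruit, "|", colour)
```

Each row repeats the fruit name, which matches what the macro writes; leaving the repeated cells blank or merging them would be a separate formatting step.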

cut important parts of string in a column

I have a column called Dateiname which contains a string. My goal is to extract only the word Gruen, Gelb or Orange from each value and store it in a new column, so that each row shows which of the three it contains.
I tried this code:
result['Y'] = result.Dateiname.str[-10:-4]
Because these words differ in length, I get a leading 4_ or 1_ or just _, depending on whether the word is Gruen or Gelb. Is there a way to extract Gruen, Gelb or Orange from the column Dateiname and save it into the column Y?
the goal would be this:
Use str.extract:
result['Y'] = result.Dateiname.str[-10:-4].str.extract('(Gruen|Gelb|Orange)')
Another solution is to split by _ or . and take the second value from the end by indexing:
result.Dateiname.str.split(r'_|\.').str[-2]
Or, if you want to check the whole string:
result['Y'] = result.Dateiname.str.extract('(Gruen|Gelb|Orange)')
If your data always has the format required_word followed by .csv, then use str.extract with a regex.
For example:
result = pd.DataFrame({'Dateiname':['asdfjaskld_3242_34.fsdf_450_Violet.csv',
'asdfjaskld_3242_34.fsdf_450_Green.csv',
'asdfjaskld_3242_34.fsdf_450_Indigo.csv',
'asdfjaskld_3242_34.fsdf_450_Red.csv']})
result['Y'] = result.Dateiname.str.extract(r'([a-zA-Z]+)\.csv')
print(result)
Dateiname Y
0 asdfjaskld_3242_34.fsdf_450_Violet.csv Violet
1 asdfjaskld_3242_34.fsdf_450_Green.csv Green
2 asdfjaskld_3242_34.fsdf_450_Indigo.csv Indigo
3 asdfjaskld_3242_34.fsdf_450_Red.csv Red
You can use:
result['Y'] = result['Dateiname'].str.split('_').str[-1].str[:-4]
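To see what the two approaches do on a single value, here is a plain-Python sketch using only the standard `re` module rather than pandas. The helper names and the sample file names in the comments are made up for illustration; only the three colour words come from the question.

```python
import re

# Alternation over the three words from the question.
COLOUR_RE = re.compile(r"(Gruen|Gelb|Orange)")


def extract_colour(filename):
    """Return the first of Gruen/Gelb/Orange found in `filename`, else None."""
    match = COLOUR_RE.search(filename)
    return match.group(1) if match else None


def last_token(filename):
    """Return the text between the last underscore and the file extension."""
    return filename.rsplit("_", 1)[-1].rsplit(".", 1)[0]


# Hypothetical file names, e.g. "messung_3242_450_Gruen.csv" or "probe_17_Gelb.csv".
```

The regex version only ever returns one of the listed words, while the split version returns whatever sits in the last position, so the choice depends on how regular the file names are.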

Count or Sum (?) items in a single cell that match criteria from a list in Excel

I have a single cell that is the output of a survey, that contains items selected from a list of 20 possible items.
ie.
Original possible selections:
Ape, Blue, Cat, Red, Dog, Yellow, Pig, Purple, Zebra
User is asked to "select all of the animals," from the list of possible selections. The output places all of the items they've identified into a single cell, separated by commas.
A new row is created for each survey entry.
ie.
| User 1 | "Ape, Cat, Pig, Purple" |
| User 2 | "Cat, Red, Dog, Pig, Zebra" |
| User 3 | "Ape, Cat, Dog, Pig, Zebra" |
etc...
I have a table with all of the animals and colors, with defined ranges.
ie. animals = A1:A5, and colours = B1:B4
I need to "score" the cell for each user in a new cell, where the score is the number of correctly identified items, each counting as 1 point.
ie.
| User 1 | "Ape, Cat, Pig, Purple" | 3 |
| User 2 | "Cat, Red, Dog, Pig, Zebra" | 4 |
| User 3 | "Ape, Cat, Dog, Pig, Zebra" | 5 |
What would the formula need to be for that score cell for each row?
I found a previous thread, that seems to point in the right direction,
Excel: Searching for multiple terms in a cell
But this only checks for the existence of any of the items in a cell from a list and returns a true or false
Thanks for anyone's help!
COUNTIF with SUMPRODUCT:
=SUMPRODUCT(COUNTIF(D2,"*" & $A$1:$A$5 & "*"))
This has the limitation that no animal name may be a sub-string of another, like Ant and Ant-Eater.
If sub-strings are a problem then use this:
=SUMPRODUCT(--(ISNUMBER(SEARCH(", " & $A$1:$A$5 & ", ",", " & D2 & ", "))))
This will make a complete match between the commas.
The formula shown is entered in D3 (an array formula, so use Ctrl+Shift+Enter) and filled down to D5
A3:A6 is a named range "animals"
Note this is only reliable if none of your terms are sub-strings of another term.
If you would rather not use the formulas above, which are efficient and the better choice, a simpler but longer way is as follows:
Select the animals --> Data --> Text to Columns, and split them into columns using a comma as the separator.
Once this is done, do a COUNTIF on each column and add up the results. You will need 20 COUNTIFs though, so it is far from ideal.
I.e.
=countifs([column which it could be in],[no.1 animal])+
countifs([column which it could be in],[no.2 animal])+...
countifs([column which it could be in],[no.20 animal])
It is easy to see how this works, but if you receive more answers you will have to split them out again.
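The scoring rule itself (split the cell on commas, count the items that are in the animal list) can be sketched in a few lines of Python. The set `ANIMALS` and the function name `score` are illustrative; the animals are the five from the question's example.

```python
ANIMALS = {"Ape", "Cat", "Dog", "Pig", "Zebra"}


def score(cell, valid=ANIMALS):
    """Count the comma-separated items in `cell` that appear in `valid`."""
    items = [item.strip() for item in cell.split(",")]
    # Whole items are compared, so "Ant" would not be counted inside "Ant-Eater".
    return sum(1 for item in items if item in valid)


answers = {
    "User 1": "Ape, Cat, Pig, Purple",
    "User 2": "Cat, Red, Dog, Pig, Zebra",
    "User 3": "Ape, Cat, Dog, Pig, Zebra",
}
scores = {user: score(cell) for user, cell in answers.items()}
```

Comparing whole items is the same idea as wrapping both the term and the cell in ", " in the second formula.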

Excel non-uniform data extraction

I've had a really hard time tracking down a solution for this, though I'm sure it's out there. I'm just not sure of the exact wording to find what I'm looking for.
I have a huge data set where some of the data is missing information, so it is not uniform. I want to extract just the name into one column and the e-mail into the next column.
The best way I can narrow this down is that there is a blank row between entries, with the name always being in the first cell of an entry.
Example:
John Doe
John Doe's Company
(555) 555-5555
John.doe#johndoe.com
John Doe
(555) 555-5555
John Doe
Jane Doe's Company
John.doe#johndoe.com
With the results wanted being (in two excel columns):
John Doe | john.doe#johndoe.com
John Doe |
John Doe | john.doe#johndoe.com
Any suggestions on the best way to do this would be appreciated. To complicate things, if there is no e-mail I would want to ignore that set completely, but I could just check that manually.
VBA coding:
1. Indicate in Row1 the initial row where the data begins.
2. Place a flag in this case the word "end" to indicate the end of the information.
3. Create a second sheet
Sub ToList()
    Dim Row1 As Long, Row2 As Long
    Dim field As String, fieldprev As String
    Dim nameFound As Boolean
    Row1 = 1 'First row of the source data
    Row2 = 1 'First row of the output list
    Do
        nameFound = False
        Do
            field = Trim(Sheets(1).Cells(Row1, 1))
            If field <> "" And LCase(field) <> "end" And Not nameFound Then
                Sheets(2).Cells(Row2, 1) = field 'First non-empty line of a block is the name
                nameFound = True
            End If
            Row1 = Row1 + 1
        Loop Until field = "" Or LCase(field) = "end"
        'The last line of the block sits two rows above the current position
        fieldprev = Sheets(1).Cells(Row1 - 2, 1)
        If InStr(fieldprev, "#") > 0 Then
            Sheets(2).Cells(Row2, 2) = fieldprev
        End If
        Row2 = Row2 + 1
    Loop Until LCase(field) = "end"
End Sub
Extracting the e-mail address shouldn't be too difficult, as you just need to search for a string containing the # character. A series of search() and mid() functions can be used to separate out the individual words. Search for each instance of a space and use that value in a mid() function. Then search for # in the results and you should find the e-mail address. Extracting the name will be more difficult if the original data is very messy.
However I second the comment above about using an external script, especially for a large dataset. Excel isn't really designed for the sort of thing you are describing here.
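As an illustration of what such an external script could look like, here is a minimal Python sketch. It assumes the entries are separated by blank lines, that the first line of each entry is the name, and that the e-mail is the line containing "#" (the character used in the post's sample data). The function name `extract_contacts` is hypothetical.

```python
def extract_contacts(lines):
    """Group lines into blank-line-separated blocks and return (name, email) pairs."""
    contacts, block = [], []
    for line in list(lines) + [""]:  # the trailing "" flushes the last block
        line = line.strip()
        if line:
            block.append(line)
        elif block:
            # First line is the name; the e-mail is the first later line with "#".
            email = next((f for f in block[1:] if "#" in f), "")
            contacts.append((block[0], email))
            block = []
    return contacts


sample = [
    "John Doe", "John Doe's Company", "(555) 555-5555", "John.doe#johndoe.com", "",
    "John Doe", "(555) 555-5555", "",
    "John Doe", "Jane Doe's Company", "John.doe#johndoe.com",
]
contacts = extract_contacts(sample)
with_email = [c for c in contacts if c[1]]  # drops entries that have no e-mail
```

Unlike the macro, this looks for the e-mail anywhere in the block rather than only on its last line.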

VBA HasNextCell function

I have tried searching for a solution but can't find one.
I have a list of products, and each product has many parts. Is there a HasNext function in VBA to see if there are more parts for a product? For instance, for chicken burger, I want to pick out all the parts, put them in an array and display it in another sheet.
I can't hard-code the array, because the client will add more products in the future. There might be 15, 20, 23 parts etc. Is there a HasNext function to get the value in the next column and add it into the array?
Product | Part 1 | Part 2 | Part 3
Chicken Burger | Veggie | Bun | Patty
You can use Range.End property to detect how "long" the title row is:
Dim col As Long, i As Long
col = Range("A1").End(xlToRight).Column 'Last filled column of the title row
For i = 1 To col
    If Cells(1, i).Value <> "" Then
        '...
    End If
Next i
P.S. I wonder why MSDN refers to it as "property" instead of "method"...
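The "keep reading until the next cell is empty" idea can also be sketched in Python, treating a worksheet row as a plain list. This is only an illustration of the logic, not VBA; `parts_of` is a hypothetical name and the row is the example from the question.

```python
from itertools import takewhile


def parts_of(row):
    """Return the cells after the product name, up to the first empty cell."""
    return list(takewhile(lambda cell: cell not in ("", None), row[1:]))


row = ["Chicken Burger", "Veggie", "Bun", "Patty", "", ""]
parts = parts_of(row)
```

Because the list is built from whatever is present, adding a fourth or twentieth part to a row needs no code change.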
