excel filter data with many headers - excel

I have a very long Excel spreadsheet that I need to filter in an unusual way.
I have many columns, one of which contains numbers and blank cells. The blank cells split the column into sections: each run of blanks marks the end of one section and the start of the next.
I need to keep only the numbers that are greater than 999999999 and less than 2000000000, along with only the blank cells adjacent to them, and filter all other columns the same way this column is filtered.
--- Example Table:
Name | ID................. | other data
Bob.. |......................|~-~-~---~-``~
Taxes | 1000077008 | ~~ -`~ `~ ~--
Alice |......................| ~~--~-~ ~_~
Carel |......................|~~ ~ ~--_ ~~
Beans | 2000007804 | ~ ~_~ `~ ~~ `
Coffee| 1000078363 | ~ ~-`--`-` `_~-
--- Example Filtered Table:
Name | ID................. | other data
Bob.. |......................|~-~-~---~-``~
Taxes | 1000077008 | ~~ -`~ `~ ~--
Carel.|......................|~~ ~ ~--_ ~~
Coffee| 1000078363 | ~ ~-`--`-` `_~-

The spaces in front of the numbers show that the cells are formatted as Text. Change the format in Excel to General or Number, then use a custom filter (for example, greater than 999999999 AND less than 2000000000) to achieve what you want.
If you cannot change the format, use the formula =INT(TRIM(A1)) in a helper column to convert the text to numbers, then sort or filter on the converted values.
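For readers who would rather script the filtering, here is a minimal Python sketch of one reading of the example tables in the question: each blank ID cell starts a new section, a number is kept when it lies between 1000000000 and 1999999999, and a blank row is kept only when it is the last blank before a kept number. That reading, the function name `filter_sections`, and the `(name, id_text)` row layout are assumptions made for illustration; the answer above does not state them.

```python
def filter_sections(rows, low=1_000_000_000, high=1_999_999_999):
    """Keep in-range IDs plus the blank row that heads their section.

    rows: list of (name, id_text) tuples; a blank id_text marks a header row.
    Assumption: every blank row starts a new section, so of several
    consecutive blanks only the last one can survive.
    """
    kept = []
    pending_header = None
    for name, id_text in rows:
        text = id_text.strip()  # IDs stored as text may carry stray spaces
        if not text:
            pending_header = (name, id_text)
            continue
        if low <= int(text) <= high:
            if pending_header is not None:
                kept.append(pending_header)
                pending_header = None
            kept.append((name, id_text))
    return kept


example_rows = [
    ("Bob", ""),
    ("Taxes", " 1000077008"),
    ("Alice", ""),
    ("Carel", ""),
    ("Beans", "2000007804"),
    ("Coffee", "1000078363"),
]

for name, id_text in filter_sections(example_rows):
    print(name, id_text.strip())
```

On the example data this keeps Bob, Taxes, Carel and Coffee, which matches the filtered table in the question.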

Related

Excel - Formula to search for multiple terms and show the matching term

I need to search each cell in Excel for multiple terms and return, for each cell, which of the terms matched.
I know there is a formula combination to search for multiple terms, but it does not return the matched term. The example below only returns "0" or "1".
=IF(ISNUMBER(SEARCH({"TermA","TermB","TermC"},A1)),"1","0")
| | A | B |
| 1 | This is TermA | TermA |
| 2 | Some TermB Text | TermB |
| 3 | And TermA Text | TermA |
| 4 | another TermC | TermC |
Background: I have to normalize the values, so I am looking for a formula that can identify the values and list the match. The search terms should later live on another sheet so that the list can be easily extended.
Thank you for any hints or approaches that point me in the right direction.
To return all matching terms:
=INDEX(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[.='TermA' or .='TermB' or .='TermC']"),COLUMN(A1))
Wrap it in IFERROR() to handle the case where no match is found at all.
If you have Excel O365 and refer to a range, things get a lot easier:
Formula in E1:
=TRANSPOSE(FILTER(C$1:C$3,ISNUMBER(FIND(C$1:C$3,A1))))
=INDEX(FILTER(C:C,C:C<>""),MATCH(1, COUNTIF(A1, "*"&FILTER(C:C,C:C<>"")&"*"), 0))
This works in Office 365. In earlier versions, replace FILTER(C:C,C:C<>"") with C$1:C$4 for your example, or with whatever range holds your search values. A table reference is also possible.
The formula checks whether the text in A1 contains any value from your list, anywhere in the text, and returns the first value in the list that matches.
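The lookup logic can be sketched outside Excel as follows. This is an illustrative Python equivalent, not a translation of any particular formula: `first_match` mirrors the "first term in the list that is contained in the text" behaviour and ignores case (as COUNTIF with wildcards does), while `all_matches` returns every contained term. Both function names and the sample data layout are assumptions.

```python
def first_match(text, terms):
    """Return the first term (in list order) contained in text, else ''."""
    lowered = text.lower()
    for term in terms:
        if term and term.lower() in lowered:
            return term
    return ""


def all_matches(text, terms):
    """Return every term contained in text, keeping list order."""
    lowered = text.lower()
    return [term for term in terms if term and term.lower() in lowered]


terms = ["TermA", "TermB", "TermC"]
cells = ["This is TermA", "Some TermB Text", "And TermA Text", "another TermC"]

for cell in cells:
    print(cell, "->", first_match(cell, terms))
```

Keeping the terms in a plain list corresponds to keeping them in a separate range or sheet: extending the list needs no change to the matching code.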

Use pandas to conditionally format substrings in Excel

I have an Excel table like so:
+------------+-----------------------+
| String1 | String2 |
+------------+-----------------------+
| Example 1 | This is example 1 |
| Example 2 | The second Example, 2 |
+------------+-----------------------+
I'm trying to compare the two strings, and format them conditionally. Ideally, I'd be able to create a third column, with the string difference in bold (or whatever formatting I want, applied) like so:
+--------------+---------------------------------+-----------------------------------+
| String1 |    String2      |   Formatted String  |
+--------------+---------------------------------+-----------------------------------+
| Example 1 | This is Example 1   | This is Example 1   |
| Example 2 | The second Example, 2 | The second Example, 2 |
+--------------+---------------------------------+-----------------------------------+
I know that using XlsxWriter I can apply conditional formatting to a df as I'm writing to excel, but it seems I can only do that to an entire cell. Is there any way to apply my formatting to some contents of each cell?
Alternatively, could I insert HTML tags into my df to produce say, "<b>This is</b> Example 1" and then render those tags in excel?
For anyone encountering this problem: XlsxWriter can output rich strings. You have to define all your formats explicitly, but it works.
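A possible way to prepare the fragments for a rich string is sketched below. It uses only the standard library (`difflib`) to split String2 into parts that match String1 and parts that do not, then interleaves a format object before each non-matching part, which is the argument order XlsxWriter's `write_rich_string()` expects. The helper names `diff_segments` and `rich_string_args` are hypothetical, and the character-level diff is just one reasonable definition of "string difference".

```python
from difflib import SequenceMatcher


def diff_segments(base, text):
    """Split text into (is_new, fragment) pairs relative to base."""
    segments = []
    matcher = SequenceMatcher(None, base, text)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        fragment = text[j1:j2]
        if not fragment:
            continue  # 'delete' opcodes contribute nothing from text
        segments.append((tag != "equal", fragment))
    return segments


def rich_string_args(segments, highlight):
    """Interleave the highlight format before each new fragment."""
    args = []
    for is_new, fragment in segments:
        if is_new:
            args.append(highlight)
        args.append(fragment)
    return args


segments = diff_segments("Example 1", "This is Example 1")
print(segments)

# With XlsxWriter installed, the result could be used roughly like this:
#   bold = workbook.add_format({"bold": True})
#   worksheet.write_rich_string(row, col, *rich_string_args(segments, bold))
```

When the two strings are identical there is only one segment; in that case a plain `worksheet.write()` is probably the safer call, since rich strings are meant for cells with more than one fragment.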

Extracting part of a cell

Let's say that in one row I have data in two cells, and I want to extract the data after the second "_" character:
| | A | B |
|---|:----------:|:---------------------:|
| 1 | 75875_QUWR | LALAHF_FHJ_75378_WZ44 | <- Input
| 2 | 75875_QUWR | 75378_WZ44 | <- Expected output
I tried using the =RIGHT() function, but then it also removes text from the first cell, and so on. How can I write this function? Maybe I could compare the result with the original cell and, if the result is empty because the function deleted everything, copy the original value instead? No idea.
Try:
=MID("_"&A1,FIND("#",SUBSTITUTE("_"&A1,"_","#",LEN("_"&A1)-LEN(SUBSTITUTE("_"&A1,"_",""))-1))+1,100)
Regardless of how many times "_" appears in your string, this returns the last two "words" of the string.
Use the following formula.
=TRIM(MID(A1,SEARCH("#",SUBSTITUTE(A1,"_","#",2))+1,100))
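The two formulas behave differently, which a short Python sketch can make concrete. `last_two_parts` corresponds to the first formula (always the last two "_"-separated parts), and `after_second_sep` corresponds to the second (everything after the second "_"). Both function names are made up for illustration, and returning an empty string when there is no second "_" is a choice made here; the Excel formula returns an error in that case.

```python
def last_two_parts(value, sep="_"):
    """Return the last two sep-separated parts, joined by sep."""
    return sep.join(value.split(sep)[-2:])


def after_second_sep(value, sep="_"):
    """Return everything after the second sep, or '' if there is none."""
    parts = value.split(sep, 2)
    return parts[2] if len(parts) == 3 else ""


for cell in ["75875_QUWR", "LALAHF_FHJ_75378_WZ44"]:
    print(cell, "->", last_two_parts(cell), "|", after_second_sep(cell))
```

For the sample row, only the "last two parts" variant reproduces the expected output in both cells, because the first cell contains a single "_".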

Exact frequency of a specific word in a single cell (excluding suffix and prefix)

I earlier worked out a good solution for this with the help of the community. It works really well, but I found that it only handles words with a suffix correctly (it doesn't ignore words with a prefix).
Formula:
=IF(B1<>"";(LEN(A1)-LEN(SUBSTITUTE(A1;B1&" ";"")))/(LEN(B1)+1)+IF(RIGHT(A1;LEN(B1))=B1;1;0);"")
A contains sentences, multiple words (without punctuation)
B contains the word I want to count the exact frequency of.
C is where the formula is placed and where I get the result.
Sample table:
| A | B | C |
|:-------------------------:|:----:|:--------:|
| boots | shoe | 0 |
| shoe | shoe | 1 |
| shoes | shoe | 0 |
| ladyshoe dogshoe catshoe | shoe | 3 |
In column C I get the correct output in rows 1, 2 and 3, but not in row 4. C4 should return 0, not 3.
The problem is that the formula makes no match for shoexxxxxxxxxxx (correct) but does make a match for xxxxxxxxxxxshoe (wrong).
I only want the formula to count exact matches for shoe; any other word should not be counted.
You want this formula:
=IF(B1<>"",(LEN(A1)-LEN(SUBSTITUTE(A1," "&B1&" ","")))/(LEN(B1)+2),"")+IF(A1=B1,1,0)+IF(LEFT(A1,LEN(B1)+1)=B1&" ",1,0)+IF(RIGHT(A1,LEN(B1)+1)=" "&B1,1,0)
I'll denote a space by * to make the following clearer.
There are four cases to consider:
string; the word has no spaces on either side (and is therefore the only word in cell A1).
string*; the word appears at the start of a list of words.
*string; the word appears at the end of a list of words.
*string*; the word is in the middle of a list of words.
First we count the number of occurrences of *string* by replacing every "*string*" with "", subtracting the length of the new string from the length of the old one, and dividing by len(string)+2 (which is the length of *string*).
Then we add one more to our count if A1 is exactly string, with no spaces on either side.
Then we add one more if A1 starts with string*, and one more if A1 ends with *string.
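The goal the formula works toward (count only whole, space-delimited words) can be expressed more directly in a language with string splitting. The following Python sketch is an illustration of that goal, not a translation of the formula: the function name `count_exact_word` is hypothetical, and it assumes, as the question does, that words are separated by single spaces and carry no punctuation. Like SUBSTITUTE, the comparison is case-sensitive.

```python
def count_exact_word(sentence, word):
    """Count space-delimited words in sentence that equal word exactly."""
    if not word:
        return 0  # an empty search word never counts as a match
    return sentence.split(" ").count(word)


samples = [
    ("boots", "shoe"),
    ("shoe", "shoe"),
    ("shoes", "shoe"),
    ("ladyshoe dogshoe catshoe", "shoe"),
]

for sentence, word in samples:
    print(repr(sentence), word, count_exact_word(sentence, word))
```

Splitting on spaces covers all four cases at once: a word at the start, at the end, in the middle, or alone in the cell is always one whole element of the split list.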

Looking up multiple values across multiple columns in Excel

I have a table where each person has several "Job" columns. I need to find all the employees who have a specific value ("Actor") in one of their "Job" columns.
I thought about doing an HLOOKUP on multiple columns, but lookup functions only return the first match (and I'm not sure I can use them on multiple columns). I also tried pivot tables, but all I got was aggregation, not the exact matches. How can I solve this?
For example, from the sample data below, when looking for "Actor", I would like to get both "John, Doe" and "Todd, Dude"
Sample data:
Id | First Name | Last Name | email | Job1 | Job2 | Job3 | Job4
-----------------------------------------------------------------------------------
1 | John | Doe | jd#i.com | Actor | Photographer | Producer |
2 | Todd | Dude | sd#i.com | Lights | Actor | |
3 | Janis | Joplin | jj#i.com | Singer | | |
Assuming the table as you give it is in A1:H4 (with headers in row 1), and that you put e.g. "Actor" in J1, this array formula** in J2:
=IFERROR(INDEX($B$2:$B$4&" "&$C$2:$C$4,SMALL(IF($E$2:$H$4=J$1,ROW($E$2:$H$4)-MIN(ROW($E$2:$H$4))+1),ROWS($1:1))),"")
Copy down until you start to get blanks. The formula may also be copied across to give results for other professions listed in K1, L1, etc.
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
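For comparison, the same lookup (every person with a given value in any of their Job columns) is sketched below in Python. The row layout as dictionaries, the key names, and the function name `people_with_job` are assumptions made for this illustration; the data is the sample table from the question.

```python
def people_with_job(rows, job):
    """Return 'First Last' for every row that lists job in any Job column."""
    return [
        f"{row['first']} {row['last']}"
        for row in rows
        if job in row["jobs"]
    ]


rows = [
    {"first": "John", "last": "Doe", "jobs": ["Actor", "Photographer", "Producer"]},
    {"first": "Todd", "last": "Dude", "jobs": ["Lights", "Actor"]},
    {"first": "Janis", "last": "Joplin", "jobs": ["Singer"]},
]

print(people_with_job(rows, "Actor"))
```

Each person is listed once however many Job columns match, and the search value is a plain parameter, much like the value in J1 in the formula.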
I have no idea how to do it in one formula. All lookup and match functions return only the first reference. You could add a helper column, jobActor, that is TRUE if any of the jobs is Actor, and then create a pivot table filtered on that column, with the person names as row labels.
An advanced filter might also be a way to do it.
