Use pandas to conditionally format substrings in Excel - excel

I have an Excel table like so:
+------------+-----------------------+
| String1 | String2 |
+------------+-----------------------+
| Example 1 | This is example 1 |
| Example 2 | The second Example, 2 |
+------------+-----------------------+
I'm trying to compare the two strings, and format them conditionally. Ideally, I'd be able to create a third column, with the string difference in bold (or whatever formatting I want, applied) like so:
+--------------+---------------------------------+-----------------------------------+
| String1 |    String2      |   Formatted String  |
+--------------+---------------------------------+-----------------------------------+
| Example 1 | This is Example 1   | This is Example 1   |
| Example 2 | The second Example, 2 | The second Example, 2 |
+--------------+---------------------------------+-----------------------------------+
I know that using XlsxWriter I can apply conditional formatting to a df as I'm writing to excel, but it seems I can only do that to an entire cell. Is there any way to apply my formatting to some contents of each cell?
Alternatively, could I insert HTML tags into my df to produce say, "<b>This is</b> Example 1" and then render those tags in excel?

For anyone encountering this problem: XlsxWriter can output rich strings. You have to define all your formats explicitly, but it works.

Related

Excel - Forumla to seacrh for mutliple terms and show the matching term

I have the challenge that I need to search in Excel for multiple terms and to get the result back for each cell which of the different terms has matched.
I know there is a formula combination to search for multiple terms but this will not give me the matched term back. The exampel below gives only a "0" or "1" back.
=IF(ISNUMBER(SEARCH({"TermA","TermB","TermC"},A1)),"1","0")
| | A | B |
| 1 | This is TermA | TermA |
| 2 | Some TermB Text | TermB |
| 3 | And TermA Text | TermA |
| 4 | another TermC | TermC |
Background I have to do some normalization of the values and look therefore for some forumla which can identify the values and list the match. The values which are used to search for should be later on another page so it can be easily extended.
Thank you for some hints and approaches which will put me into the right direction.
To return all matching terms:
=INDEX(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[.='TermA' or .='TermB' or .='TermC']"),COLUMN(A1))
Wrap in an IFERROR() if no match is found at all.
If one has ExcelO365 and you refer to a range, things got a lot easier:
Formula in E1:
=TRANSPOSE(FILTER(C$1:C$3,ISNUMBER(FIND(C$1:C$3,A1))))
=INDEX(FILTER(C:C,C:C<>""),MATCH(1, COUNTIF(A1, "*"&FILTER(C:C,C:C<>"")&"*"), 0))
For use in office 365 version. If previous version replace FILTER(C:C,C:C<>"") with C$1:C$4 for your example or whatever your range of search values may be. Table reference is also possible.
The formula searches for the first match in your list of values if the text including your term contains a matching value anywhere in that text. It returns the first match.

Substracting part of cell

So lets say that in one row i have in 2 cells some data and I want to extract the data after the second "_" character:
| | A | B |
|---|:----------:|:---------------------:|
| 1 | 75875_QUWR | LALAHF_FHJ_75378_WZ44 | <- Input
| 2 | 75875_QUWR | 75378_WZ44 | <- Expected output
I tried using =RIGHT() function but than i will remove text from this first cell and so on, how can i write this function? Maybe I would compare this old cell and than to do if the second row is empty because maybe function deleted it to copy the one from first? No idea
Try:
=MID("_"&A1,FIND("#",SUBSTITUTE("_"&A1,"_","#",LEN("_"&A1)-LEN(SUBSTITUTE("_"&A1,"_",""))-1))+1,100)
Regardless of the times a "_" is present in your string, it will end up with the last two "words" in your string. Source
Use following formula.
=TRIM(MID(A1,SEARCH("#",SUBSTITUTE(A1,"_","#",2))+1,100))

excel filter data with many headers

I have a really long excel spreadsheet which I need to sort in a really unusual way:
I have many columns, one of which is full of numbers and blank spaces. The column is cut into many parts and is separated by blank spaces. The blank spaces act as the beginning and the end of two areas.
What I need to do is to leave only the numbers that are bigger than 999999999 and smaller than 2000000000 while keeping only the blank spaces adjacent to them. (and filtering all other columns the same way this one column is filtered)
--- Example Table:
Name | ID................. | other data
Bob.. |......................|~-~-~---~-``~
Taxes | 1000077008 | ~~ -`~ `~ ~--
Alice |......................| ~~--~-~ ~_~
Carel |......................|~~ ~ ~--_ ~~
Beans | 2000007804 | ~ ~_~ `~ ~~ `
Coffee| 1000078363 | ~ ~-`--`-` `_~-
--- Example Filtered Table:
Name | ID................. | other data
Bob.. |......................|~-~-~---~-``~
Taxes | 1000077008 | ~~ -`~ `~ ~--
Carel.|......................|~~ ~ ~--_ ~~
Coffee| 1000078363 | ~ ~-`--`-` `_~-
The spaces in front of the numbers show that the format is Text. Change the format in Excel to General or to Numeric and use a custom filter to achieve what you want.
This is an example of a custom filter:
If you cannot change the format, then use the =INT(TRIM(A1)) formula in Excel and sort them.

Specific concatenation of text cells by formula

I'm having difficulty producing a CONCATENATE formula that combines text cells in the way that I want. There are five fields that I want to concatenate: Title, Forename, RegnalNumber, Surname, and Alias, in that order. I'm no regex expert, so excuse the poor formatting, but this is a rough way of expressing what I'm trying to achieve:
(title)? (forename) (regnalnumber)? (surname)?, (alias).
The only field that can't be null is the forename field, although it might have the value "?", in which case it shouldn't output anything in the concatenation, i.e. it should be treated as blank. Hopefully the following test cases should demonstrate the output I'm trying to achieve: the output on the right is what it should look like:
| Title | Forename | RN | Surname | Alias | CONCATENATE |
+--------+----------+----+-----------+--------------+---------------------------------------+
| Ser | Jaime | | Lannister | Kingslayer | Ser Jaime Lannister, Kingslayer |
| | Pate | | | | Pate |
| Lord | ? | | Vance | | Lord Vance |
| King | Aerys | II | Targaryen | The Mad King | King Aerys II Targaryen, The Mad King |
| Lord | Jon | | Arryn | | Lord Jon Arryn |
| | Garth | | | Of Oldtown | Garth, Of Oldtown |
I've experimented for ages trying to make this concatenation work, but haven't been able to get it right. This is the current formula, with cell references replaced by the field name for comprehensibility:
=CONCATENATE(IF(Title<>"",Title&" ",""),IF(AND(Forename<>"",Forename<>"?"),Forename,""),IF(RN<>""," "&RN,""), IF(OR(AND(Forename<>"", Forename<>"?"), Surname<>"", RN<>""), " ",""), IF(Surname<>"",Surname,""),IF(AND(Alias<>"",OR(Alias<>"",AND(Forename<>"", Forename<>"?"),Surname<>"")),", "&Alias, Alias))
There is one case where it doesn't work: if the Surname and RN are null but the the Forename and Alias are non-null. For example, if the Forename is Garth, and the Alias is Of Oldtown, the concatenation outputs: Garth , Of Oldtown. It's the same if the title is non-null. It shouldn't have a space before the comma.
Can you help me to fix this formula so it works as expected? If you can find a way to simplify it, even better! I know I'm probably overcomplicating this a great deal. I'm using LibreOffice Calc 4.3.1.2, not Excel.
The best way imho to solve situations like this is to divide the problem over multiple simple columns, rather than 1 huge complex formula. Remember you can always hide the columns that you don't want to see.
So create a column for Title that says =if(a2="","",a2&" ").
That can be extended for all the other columns, except:
for Forename, where you want to include the "?" as follows: =if(b2="?","",b2&" ")
for Alias, where you want to include the leading ",": =if(e2="","",", "&e2)
Lastly just concatenate each of your working columns with something like: =f2&g2&h2&i2&j2.
This breaks the problem down into very simple components, and makes it easy to debug. If you want to add extra functionality at a later stage, it is easy to swap out one of your formulae for something else.
I know this is only a bit of fun, but can I suggest a more algorithmic approach?
The algorithm is:-
If a field is empty or ?, do nothing
Else
If concatenation so far is empty, add field to concatenation
Else
Add a space followed by the field to concatenation
which leads to this formula in G2 :-
=IF(OR(A2="",A2="?"),F2,IF(F2="",A2,F2&" "&A2)
(need to put single apostrophes in column F to make it work)
which when copied across looks like this:-

Looking multiple values on multiple columns in excel

I have a table where each person has several "Job" columns. I need to find all the employees who has a specific value ("Actor") in one of their "Job" columns.
I thought about doing HLookup on multiple columns, but Lookup functions only returns the first match (and I'm not sure I can use it on multiple columns). I also tried Pivot Tables, but all I got is aggregation, not the exact matches. How can I solve it?
For example, from the sample data below, when looking for "Actor", I would like to get both "John, Doe" and "Todd, Dude"
Sample data:
Id | First Name | Last Name | email | Job1 | Job2 | Job3 | Job4
-----------------------------------------------------------------------------------
1 | John | Doe | jd#i.com | Actor | Photographer | Producer |
2 | Todd | Dude | sd#i.com | Lights | Actor | |
3 | Janis | Joplin | jj#i.com | Singer | | |
Assuming the table as you give it is in A1:H4 (with headers in row 1), and that you put e.g. "Actor" in J1, this array formula** in J2:
=IFERROR(INDEX($B$2:$B$4&" "&$C$2:$C$4,SMALL(IF($E$2:$H$4=J$1,ROW($E$2:$H$4)-MIN(ROW($E$2:$H$4))+1),ROWS($1:1))),"")
Copy down until you start to get blanks. The formula may also be copied across to give results for other professions listed in K1, L1, etc.
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
I have no idea, how to it in one formula. All lookups and match returns only first reference. You could add column jobActor --- true if one of the jobs is Actor and then create pivot --- filter on jobAll, row names is person names.
Maybe advanced filter could be the way how to do it.

Resources