Replace multiple substrings with blanks in Pandas


I have a situation where I want to replace part of a string with blanks. For example, my column looks something like this:
user_comment
it was a good day but nothing in particular happened
nothing specific happening today
no comments. all ok
not much happening really, it will be fine
and the desired outcome I want is:
user_comment_clean
it was a good day but happened
happening today
all ok
it will be fine
Essentially, I would like to remove parts of strings such as "nothing in particular", "nothing specific", "no comment" and "not much happening really", as shown above.
I am using the following code to achieve this:
import re

def remove_no_comments(text):
    text = re.sub(r"^nothing in particular", ' ', text)
    text = re.sub(r"^nothing specific", ' ', text)
    text = re.sub(r"^no comment", ' ', text)
    text = re.sub(r"^not much happening really", ' ', text)
    text = text.lower()
    return text

df['user_comments_clean'] = df['user_comments_clean'].astype(str).apply(remove_no_comments)
But when I use this, my other user inputs become NaN, and I am not sure what I am doing wrong. Is there a way to resolve this?

You could use str.replace() along with a regex alternation:
terms = ["nothing in particular", "nothing specific", "no comment", "not much happening really"]
regex = r'^(?:' + r'|'.join(terms) + r')\b\s*'
df["user_comment_clean"] = df["user_comment"].str.replace(regex, '', regex=True)
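Note that the `^` anchor restricts the match to the start of each comment, so a phrase in the middle of a sentence (as in the first sample row) is left alone, and the trailing `\b` prevents "no comment" from matching "no comments". A minimal sketch of an unanchored variant follows, using only Python's `re` module. The function name `remove_phrases`, the `\w*\W*` tail that swallows a plural ending and trailing punctuation, and the case-insensitive flag are assumptions for illustration, not part of the answer above.

```python
import re

# Phrases from the question.
TERMS = [
    "nothing in particular",
    "nothing specific",
    "no comment",
    "not much happening really",
]

def build_pattern(terms):
    # Escape each phrase so it is matched literally; try longer phrases first.
    alternation = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)
    )
    # \w* absorbs a trailing plural such as the "s" in "comments";
    # \W* absorbs the punctuation and whitespace that follow the phrase.
    return re.compile(r"\b(?:" + alternation + r")\w*\W*", flags=re.IGNORECASE)

PATTERN = build_pattern(TERMS)

def remove_phrases(text):
    # Drop every occurrence of a phrase, wherever it appears in the text.
    return PATTERN.sub("", str(text)).strip()

samples = [
    "it was a good day but nothing in particular happened",
    "nothing specific happening today",
    "no comments. all ok",
    "not much happening really, it will be fine",
]
cleaned = [remove_phrases(s) for s in samples]
```

Assuming the column is named `user_comment` as in the question, this could be applied with `df["user_comment"].astype(str).apply(remove_phrases)`.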

Related

Changing multiple ROW values within a single column, followed by a filtering process

My goal here is to filter for two service codes (there will be hundreds in a single column). In this case I need "4" and "CO4"; note that this is a capital letter O, not the number zero.
Issues/Goals:
1. "4" and "CO4" sometimes have a trailing space, as in "CO4 "; this varies, as some may not have the space. Humans... am I right? lol
2. Filtering on an additional column called 'Is Void' for False values with the above two service codes. This is where I believe my issue is.
2a) This is because I lose a lot of data, about 1700 rows, with the code I will show in a bit.
Sample Data base:
My code (everything is already imported and the database is already open):
dfRTR = dfRTR[["Zone Name", "End Time", "Is Void", "Ticket Notes", "Service Code", "Unit Count", "Disposal Monitor Name", "Addr No","Addr St", "Ticket Number"]] #Columns I will use
dfRTR.replace("CO4 ","CO4") #Takes out (space) in CO4
dfRTR.replace("4 ", "4") #Takes out (space) in 4
filt = dfRTR[(dfRTR['Is Void'] == False) & (dfRTR["Service Code"].isin(["CO4 ", "4"]))] #my problem child.
If I take this line out I have all my data, but with it I get only about 700-800 rows, when there are supposed to be around 1500-2000 rows in the column "Is Void".
I have only been coding for about two months; not knowing how to replace two values at once in the same column is another topic. I am trying to automate my whole audit, which can take from 4 hours to 2-3 days depending on the project. Any advice is greatly appreciated.
EDIT:
So, if I manually format my Excel sheet as all text and then run this code:
dfRTR['Service Code'] = dfRTR['Service Code'].str.replace(r'\W', "")
filt = dfRTR[(dfRTR['Is Void'] != True) & dfRTR["Service Code"].isin(["CO4","4"])]
filt.to_excel('VecTest.xlsx')
Then I get back all the data I need, filtered. The only downside is that my dates will have text formatting. I will try to convert just one column, in this case 'Service Code', to text and then run it again.
Edit part 2: Converting the above file to CSV makes the filtering process much easier. The problem is converting it back to Excel. Excel formulas have an issue with a simple formula like =A1=B1: if both cells A1 and B1 have a value of 1, they will not match. CSV strips away all the extra formatting, but the Excel VBA code I use to format it makes the cells show this warning, even though the data is being pulled from CSV format.
The VBA code makes the values appear with this Excel warning:
"The number in this cell is formatted as a text or preceded with an apostrophe"
Conclusion:
I would need to make all my checks in Python before using CSV formatting.

I figured it out. My solution is a function but it does not need to be:
def getRTRVecCols(dfRTR):
    # this is the majority of special chars that may be in the column's cells
    spec_chars = ["!", '"', "#", "%", "&", "'", "(", ")",
                  "*", "+", ",", "-", ".", "/", ":", ";", "<",
                  "=", ">", "?", "@", "[", "\\", "]", "^", "_",
                  "`", "{", "|", "}", "~", "–"]
    # this for loop removes all those special chars from a specific column
    for char in spec_chars:
        dfRTR["Service Code"] = dfRTR["Service Code"].str.replace(char, ' ')
    # the above may leave extra white space; this is how to remove it
    dfRTR["Service Code"] = dfRTR["Service Code"].str.split().str.join(" ")
    # multiple values to filter for in one column (specific to my case)
    codes = ["CO4", "4"]
    # filter the data set on multiple conditions: two values in the same column plus a condition on a different column
    dfRTR = dfRTR[(dfRTR["Is Void"] != True) & (dfRTR["Service Code"].isin(codes))]
    # save to an Excel file
    dfRTR.to_excel('RTR Vec Cols.xlsx')

# my function call
getRTRVecCols(dfRTR)
Now I do get this warning, which I am still too new to Python to understand; it is next on my list to fix. But it works perfectly for now.
FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.
dfRTR["Service Code"] = dfRTR["Service Code"].str.replace(char, ' ')
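The warning concerns how the pattern passed to `str.replace` is interpreted. Several entries in `spec_chars` (for example `.`, `*`, `+` and `(`) are regular-expression metacharacters, so the result depends on whether they are read as patterns or as literal text. According to the warning text, single characters are currently treated as literals, which is presumably why the loop behaves as intended for now. The sketch below illustrates the difference with Python's built-in `re` module and plain `str.replace` rather than pandas; the sample value `"CO4."` is made up for illustration.

```python
import re

code = "CO4."  # hypothetical service code with a stray dot

# Read as a regular expression, "." matches every character.
as_regex = re.sub(".", " ", code)

# Read as literal text, only the dot itself is replaced.
as_literal = code.replace(".", " ")

# Escaping the character makes the regex version behave literally too.
as_escaped_regex = re.sub(re.escape("."), " ", code)

print(repr(as_regex), repr(as_literal), repr(as_escaped_regex))
```

In pandas, passing `regex=False` explicitly to `Series.str.replace` requests the literal interpretation, which is likely what the loop above intends and should also silence the warning.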

Userform textbox fills in if Vlookup finds the information related to the numbers inserted

I have a userform that people fill in with data. If the data already exists, then when they enter the information in the DocumentTitleBox, other textboxes should fill in automatically.
My code works with letters, but not with numbers.
For example, when I enter "aaa", it returns the VLookup values. But if I enter "123", it does nothing, even though there is a matching entry for it.
I cannot figure out why. This is part of my code:
Private Sub DocumentTitleBox_Change()
    On Error Resume Next
    Result = WorksheetFunction.VLookup(DocumentTitleBox.Value, Worksheets("example").Range("D:E"), 2, False)
    FIND = WorksheetFunction.VLookup(DocumentTitleBox.Value, Worksheets("example").Range("D:E"), 1, False)
    On Error GoTo 0
    If FIND = DocumentTitleBox.Value Then
        NameBox.Value = Result
    End If
Thank you in advance!

I always use this kind of thing. Could be cleaned up and stuff but I like the flexibility and I change stuff all the time so this works for me.
Private Sub DocumentTitleBox_Change()
    If IsNumeric(DocumentTitleBox.Value) Then
        ReturnRow = Application.IfError(Application.Match(DocumentTitleBox.Value + 0, Worksheets("example").Columns(4), 0), "Not Found")
        Find = Application.IfError(Application.Index(Worksheets("example").Columns(5), ReturnRow), "Not Present")
    Else
        ReturnRow = Application.IfError(Application.Match(DocumentTitleBox.Value, Worksheets("example").Columns(4), 0), "Not Found")
        Find = Application.IfError(Application.Index(Worksheets("example").Columns(5), ReturnRow), "Not Present")
    End If
    If Not Find Like "Not Present" Then
        NameBox.Value = Find
    Else
        NameBox.Value = ""
    End If
End Sub
PS: I don't know how to avoid the Match function's odd behaviour with strings/numbers, so I just go with the +0 and IsNumeric trick. One thing to note is case sensitivity; adjust that as needed. Right now it is not case-sensitive.

If DocumentTitleBox is a text box, try using DocumentTitleBox.Text instead of DocumentTitleBox.Value.

How to color words of cells in xlsx

I have a list of sentences which I want to write to an xlsx file.
I have a second list with words. I want all words from the second list that also appear in the first list to be colored red.
My code so far does not color the words. Thanks in advance for your help.
import xlsxwriter
workbook = xlsxwriter.Workbook('strings.xlsx')
worksheet = workbook.add_worksheet()
red = workbook.add_format({'color': 'red'})
sentance = [
'HI guys i need',
'some help with',
'this.',
'Some stuff i ',
'allready tried',
'Thank you',
'For your help',
]
list_word=['you','help','tried','some more stoff','and more stuff']
worksheet.set_column('A:A', 40)
for row_num, sentance in enumerate(sentance):
    format_pairs = []
    for word in list_word:
        find_word = word
        for word in sentance:
            if word == find_word:
                format_pairs.extend((red, word))
            else:
                format_pairs.append(word)
    worksheet.write_rich_string(row_num, 0, *format_pairs)
workbook.close()
EDIT:
I saw the other post; the problem is that I have a list of words, not just one, whose color I want to change. I edited the code a little.
The word list and the sentences are both variables, so they always change; it makes no sense for me to write code for just one or two words.

Please check the help document https://xlsxwriter.readthedocs.io/working_with_colors.html
It provides information on how to set colors for cells
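The linked page covers colors in general. For coloring individual words inside one cell, XlsxWriter's `write_rich_string()` (already used in the question) takes an alternating sequence of formats and strings. Below is a minimal sketch of building that sequence for a list of target words, under the assumption that matching should be whole-word and case-insensitive. The helper name `rich_fragments` is hypothetical, and a plain string stands in for the format object so the snippet runs without XlsxWriter installed.

```python
import re

def rich_fragments(sentence, words, fmt):
    """Return a list of strings in which each target word is preceded by fmt."""
    if not words:
        return [sentence]
    # One capturing group, so re.split keeps the matched words in the result.
    alternation = "|".join(
        re.escape(w) for w in sorted(words, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternation + r")\b", flags=re.IGNORECASE)
    fragments = []
    for index, piece in enumerate(pattern.split(sentence)):
        if not piece:
            continue  # skip empty strings produced at the edges of the split
        if index % 2 == 1:
            fragments.extend((fmt, piece))  # odd positions are matched words
        else:
            fragments.append(piece)         # even positions are the text between
    return fragments

list_word = ["you", "help", "tried", "some more stoff", "and more stuff"]
fragments = rich_fragments("For your help", list_word, "RED")
```

With a real workbook, `fmt` would be the `red` format from the question and the result would be passed as `worksheet.write_rich_string(row_num, 0, *fragments)`. As far as I know, `write_rich_string()` expects more than two fragments, so sentences without any match are probably best written with a plain `worksheet.write()`.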

Limit text to allowed characters only - (not by enumerating the wrong characters) | VBA

I would like to limit certain textboxes to accept only [A-Za-z]
I hope, a counterpart to Like exists.
With Like I would have to make a long list of not allowed characters to be able to filter.
Not MyString like [?;!°%/=....]
I can think of a solution in the form of:
For Counter = 1 To Len(MyString)
    If Mid(MyString, Counter, 1) Like "*[a-z]*" = False Then
        MsgBox "String contains bad characters"
        Exit Sub
    End If
Next
... but is there a more sophisticated 1liner solution ?
Until then, I have created a function to make it "Oneliner":
Function isPureString(myText As String) As Boolean
    Dim i As Integer
    isPureString = True
    For i = 1 To Len(myText)
        If Mid(myText, i, 1) Like "*[a-zA-Z_íéáűúőöüóÓÜÖÚŐŰÁÉÍ]*" = False Then
            isPureString = False
        End If
    Next
End Function
If I add one more parameter, it is also possible to define the allowed characters when calling the function.

OK, it seems my question was a bit of a duplicate, even though that did not show up in my search results.
So credit goes to @QHarr for posting the link.
The solution I can forge from that idea for my "one-liner" is:
If myText Like WorksheetFunction.Rept("[a-zA-Z]", Len(myText))=false then 'do something.
Using .Rept is inspiringly clever and elegant, in my opinion.
What it does: it repeats the search criterion once for each character instead of looping through the characters.
EDIT:
Among an overabundance of nice and elegant solutions, the most recent leader is:
If not myText Like "*[!A-Za-z]*" then '... do something
Statistics update:
I have tested the performance of the last 3 solutions:
I pasted a # into the text string below at the beginning, at the end, or nowhere.
The criteria were: "*[a-zA-Z \S.,]*"
For 100000 repetitions
text = "This will be a very Long text, with one unwanted in the middle, to be able to test the difference in performance of the approaches."
1.) Using the [!...] -> 30ms with error, 80ms if no error
2.) Using .Rept -> around 1800ms for all cases
3.) Using characterLoop+Mid -> around 3000ms if no error / 40-80ms if early error
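For comparison, the same negated-character-class idea can be sketched in Python (not VBA): instead of checking every character against the allowed set, search once for any character outside it. The function name `is_pure_string` mirrors the VBA function above, and the allowed set `[A-Za-z]` is taken from the question; everything else is an illustrative assumption.

```python
import re

# Matches any single character that is NOT an ASCII letter.
NOT_ALLOWED = re.compile(r"[^A-Za-z]")

def is_pure_string(text):
    # The text is "pure" when no disallowed character can be found in it.
    return NOT_ALLOWED.search(text) is None

print(is_pure_string("Hello"), is_pure_string("Hel#lo"))
```

As with the VBA `Like` version, an empty string counts as pure here, because it contains no disallowed character.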

Excel - VBA : Simplify this If statement comparing cell to words

Easy question here.
I am currently using this in my program :
If LCase(inp_rng.Offset(1, 0).Value) = "street" Or LCase(inp_rng.Offset(1, 0)) = "ave." Then
score = score - 50
End If
It is obviously not clean, but I can't find a way to put it in a single statement. What is the programmatic way of writing something like this:
If LCase(inp_rng.Offset(1,0).Value) = ("street", "ave.", "road", "...", etc.) Then
'do something
End If
Thanks in advance!

You can use Select Case statement instead:
i = LCase(inp_rng.Offset(1,0).Value)
Select Case i
    Case "street", "ave.", "road"
        'do something
    Case Else
        'do something
End Select
Alternatively you can populate all possible answers in an array and search the array for a match.

You may use Filter() array function
http://msdn.microsoft.com/en-us/library/office/aa164525(v=office.10).aspx
