Referencing Column Titles That Contain Spaces in Python 3 (Pandas) - python-3.x

Simple question here: I am trying to reference two columns and then divide them, but the column titles contain spaces.
Title 1: First Word Sum
Title 2: First Word Clicks
When I try to do something like this, it doesn't work:
cvr = (final.First Word Sum / final.First Word Clicks)
How do I rectify this? Thank you

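Attribute-style access (final.column_name) only works when the column name is a valid Python identifier, so it cannot be used for names that contain spaces. Bracket indexing works for any column name: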
cvr = final['First Word Sum'] / final['First Word Clicks']
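A minimal sketch of the bracket-indexing approach, assuming a small made-up DataFrame (the values below are invented for illustration and are not from the question). It also shows one optional alternative: renaming the columns so that attribute access becomes possible.

```python
import pandas as pd

# Hypothetical data; only the column titles come from the question.
final = pd.DataFrame({
    "First Word Sum": [10, 4, 9],
    "First Word Clicks": [5, 2, 3],
})

# Bracket indexing works for any column label, including ones with spaces.
cvr = final["First Word Sum"] / final["First Word Clicks"]

# Optional: replace spaces with underscores so attribute access works too.
renamed = final.rename(columns=lambda name: name.replace(" ", "_"))
cvr_attr = renamed.First_Word_Sum / renamed.First_Word_Clicks
```

Both expressions should give the same element-wise ratio; the rename is only a convenience and is not required.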

Related

How do I drop complete rows (including all values in it) that contain a certain value in my Pandas dataframe?

I'm trying to write a Python script that finds unique values (names) and reports how often each one occurs, using the Pandas library. There are around 90 unique names in total, which I've anonymised in the head of the dataframe pasted below.
,1,2,3,4,5
0,monday09-01-2022,tuesday10-01-2022,wednesday11-01-2022,thursday12-01-2022,friday13-01-2022
1,Anonymous 1,Anonymous 1,Anonymous 1,Anonymous 1,
2,Anonymous 2,Anonymous 4,Anonymous 5,Anonymous 5,Anonymous 5
3,Anonymous 3,Anonymous 3,,Anonymous 6,Anonymous 3
4,,,,,
I'm trying to drop any row (the full row) whose value in column "1" matches the regular expression "^monday.*", i.e. the word "monday" followed by any number of other characters. I want to drop every cell/value within that row.
To achieve this goal, I've tried using the line of code below (and many other approaches I found on SO).
df = df[df[1].str.contains("^monday.*", case = True, regex=True) == False]
To clarify, I'm trying to search the values of column "1" for the pattern "^monday.*" and then deselect the matching rows, including all values in those rows. I've successfully removed "monday09-01-2022", "tuesday10-01-2022", etc., but I'm also losing random names that are not in the matching rows.
Any help would be very much appreciated! Thank you!
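One possible cause, offered as an assumption rather than a confirmed diagnosis: when column 1 contains missing values, str.contains returns NaN for them, and NaN == False evaluates to False, so those rows are dropped as well. A minimal sketch using na=False and the ~ operator is shown below; the small DataFrame is a cut-down, hypothetical version of the data in the question, with integer column labels assumed.

```python
import pandas as pd

# Hypothetical two-column subset of the data shown in the question.
df = pd.DataFrame({
    1: ["monday09-01-2022", "Anonymous 1", "Anonymous 2", "Anonymous 3", None],
    2: ["tuesday10-01-2022", "Anonymous 1", "Anonymous 4", "Anonymous 3", None],
})

# na=False makes missing values count as "no match" instead of NaN.
is_date_row = df[1].str.contains(r"^monday", case=True, regex=True, na=False)

# ~ inverts the boolean mask, keeping every row that did not match.
cleaned = df[~is_date_row]

# Frequency of each name across all remaining cells (missing values ignored).
counts = pd.Series(cleaned.to_numpy().ravel()).value_counts()
```

With this mask only the row starting with "monday" is removed; the empty row is kept and can be dropped separately with dropna(how="all") if that is wanted.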

Extract last word in string in R - error faced

First, I wish to extract the first word and the last word of the Description column (this column contains at least 3 words) into two newly created columns, firstword and lastword. However, the word() function does not seem to be applied to all the rows: many rows end up with an empty lastword even though they clearly have a last word (as you can see from the Description column). This is what the first two lines of code do.
Second, I am trying to use the third line of code to replace lastword with firstword when lastword is empty. However, it isn't working.
Is there a way to rectify this?
c1$lastword = word(c1$Description,start=-1) #extract last word
c1$firstword = word(c1$Description,start=1) #extract first word
c1$lastword=ifelse(c1$lastword == " ", c1$firstword, c1$lastword)
I realise that there is white space at the beginning of some of the rows of the Description variable, which isn't shown when viewed in R.
Removing the whitespace using stri_trim() solved the issue.
c1$Description = stri_trim(c1$Description, "left") #remove whitespace

Openpyxl to check for keywords, then modify next to cells to contain those keywords and total found

I'm using python 3.x and openpyxl to parse an excel .xlsx file.
For each row, I check a column (C) to see if any of those keywords match.
If so, I add them to a separate list variable and also determine how many keywords were matched.
I then want to add the actual keywords into the next cell, and the total of keywords into the cell after. This is where I am having trouble, actually writing the results.
Contents of the keywords.txt and results.xlsx files: here
import openpyxl
# Here I read a keywords.txt file and input them into a keywords variable
# I throw away the first line to prevent a mismatch due to the unicode BOM
with open("keywords.txt") as f:
    f.readline()
    keywords = [line.rstrip("\n") for line in f]
# Load the workbook
wb = openpyxl.load_workbook("results.xlsx")
ws = wb.get_sheet_by_name("Sheet")
# Iterate through every row, only looking in column C for the keyword match.
for row in ws.iter_rows("C{}:E{}".format(ws.min_row, ws.max_row)):
    # if there's a match, add to the keywords_found list
    keywords_found = [key for key in keywords if key in row[0].value]
    # if any keywords found, enter the keywords in column D
    # and how many keywords into column E
    if len(keywords_found):
        row[1].value = keywords_found
        row[2].value = len(keywords_found)
Now, I understand where I'm going wrong, in that ws.iter_rows(..) returns a tuple, which can't be modified. I figure I could use two for loops, one for each row and another for the columns in each row, but this test is a small example of a real-world scenario, where the number of rows is in the tens of thousands.
I'm not quite sure which is the best way to go about this. Thank you in advance for any help that you can provide.
Use the ws['C'] and then the offset() method of the relevant cell.
Thanks Charlie for the offset() tip. I modified the code slightly and now it works a treat.
for row in ws.iter_rows("C{}:C{}"...)
    for cell in row:
        ....
        if len(keywords_found):
            cell.offset(0,1).value = str(keywords_found)
            cell.offset(0,2).value = str(len(keywords_found))
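For reference, here is a self-contained sketch of the same offset() idea, assuming a recent openpyxl release where iter_rows takes min_col/max_col keyword arguments instead of a range string. The keywords and cell contents are made up, and the workbook is built in memory rather than loaded from results.xlsx.

```python
from openpyxl import Workbook

# Hypothetical keywords; in the question they are read from keywords.txt.
keywords = ["alpha", "beta"]

# Build a small in-memory sheet instead of loading results.xlsx.
wb = Workbook()
ws = wb.active
ws["C1"] = "alpha and beta"
ws["C2"] = "nothing here"
ws["C3"] = None

# Restrict iteration to column C; each row is a one-element tuple.
for (cell,) in ws.iter_rows(min_col=3, max_col=3,
                            min_row=ws.min_row, max_row=ws.max_row):
    text = str(cell.value or "")  # treat empty cells as empty strings
    keywords_found = [key for key in keywords if key in text]
    if keywords_found:
        # A list cannot be stored in a cell, so join it into one string.
        cell.offset(row=0, column=1).value = ", ".join(keywords_found)
        cell.offset(row=0, column=2).value = len(keywords_found)
```

Joining the keywords into a string is a design choice for readability in the sheet; str(keywords_found), as in the code above, also works. Calling wb.save(...) afterwards would persist the changes.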

Find string (from table) in cell in matlab

I want to find the location of one string (which I take it from a table) inside of a cell:
A is my table, and B is the cell.
I have tested :
strncmp(A(1,8),B(:,1),1)
but it couldn't find the location.
I have tested many commands like:
ismember, strmatch, find(strcmp), find(strcmpi), find(ismember), strfind, etc., but they all give me errors, mostly because of the type of my data!
So please suggest a solution.
You want strfind:
>> strfind('0123abcdefgcde', 'cde')
ans =
7 12
If A is a table and B a cell array, you need to index this way:
strfind(B{1}, A.VarName{1});
For example:
>> A = cell2table({'cde'},'VariableNames',{'VarName'}); %// create A as table
>> B = {'0123abcdefgcde'}; %// create B as cell array of strings
>> strfind(B{1}, A.VarName{1})
ans =
7 12
Luis Mendo's answer is absolutely correct, but I want to add some general information.
Your problem is that all the functions you tried (strfind, ...) only work on plain strings, not on cell arrays. The way you index A and B in your code snippet, the results are still cell arrays (of dimension (1,1)). You need to use curly brackets {} to "get rid of" the cell array and get the string it contains. Luis Mendo shows how to do this.
Modified solution from a Mathworks forum, for the case of a single-column table with ragged strings
find(strcmp('mystring',mytable{:,:}))
will give you the row number.

In Excel, I want to count the number of cells that do not contain a specific character

In Excel, I want to count the number of cells that do not contain a specific character (in this case, a period, ".").
I tried something like COUNTIF(A1:A10,"<>.*") but this gives the wrong result and I can't seem to figure it out.
Say I have these data in column A:
D
N
P
.
.
A
N
.
P
.
And the count would be 6
For your example:
=COUNTIF(A1:A10,"<>.")
returns 6. It would be a different story, however, if you also wanted to exclude, say, "P." from the count.
Your data may not be quite what you think it is, though, because including the * should make no difference for your example.
Or you could subtract the cells that are just a period from the total number of text cells, leaving the non-period cells:
=COUNTIF(A1:A10,"*")-COUNTIF(A1:A10,"=.")
gives 6.
If your data includes periods alongside other characters in the same cell and you want a similar count, then this:
=COUNTA(A1:A10)-COUNTIF(A1:A10,"*.*")
will return 5
