I'm trying to add a suffix to the values in the % Paid row of the dataframe, but so far I have only managed to add a suffix to the column names.
Is there a way to add a suffix to the values of one specific row?
Any suggestions are highly appreciated.
d={
("Payments","Jan","NOS"):[],
("Payments","Feb","NOS"):[],
("Payments","Mar","NOS"):[],
}
d = pd.DataFrame(d)
d.loc["Total",("Payments","Jan","NOS")] = 9991
d.loc["Total",("Payments","Feb","NOS")] = 3638
d.loc["Total",("Payments","Mar","NOS")] = 5433
d.loc["Paid",("Payments","Jan","NOS")] = 139
d.loc["Paid",("Payments","Feb","NOS")] = 123
d.loc["Paid",("Payments","Mar","NOS")] = 20
d.loc["% Paid",("Payments","Jan","NOS")] = round((d.loc["Paid",("Payments","Jan","NOS")] / d.loc["Total",("Payments","Jan","NOS")])*100)
d.loc["% Paid",("Payments","Feb","NOS")] = round((d.loc["Paid",("Payments","Feb","NOS")] / d.loc["Total",("Payments","Feb","NOS")])*100)
d.loc["% Paid",("Payments","Mar","NOS")] = round((d.loc["Paid",("Payments","Mar","NOS")] / d.loc["Total",("Payments","Mar","NOS")])*100)
without suffix
I tried it this way and it works, but I'm looking for a way to add the suffix to the entire row at once:
d.loc["% Paid",("Payments","Jan","NOS")] = str(round((d.loc["Paid",("Payments","Jan","NOS")] / d.loc["Total",("Payments","Jan","NOS")])*100)) + '%'
d.loc["% Paid",("Payments","Feb","NOS")] = str(round((d.loc["Paid",("Payments","Feb","NOS")] / d.loc["Total",("Payments","Feb","NOS")])*100)) + '%'
d.loc["% Paid",("Payments","Mar","NOS")] = str(round((d.loc["Paid",("Payments","Mar","NOS")] / d.loc["Total",("Payments","Mar","NOS")])*100)) + '%'
with suffix
Select the row by its index label, round it, convert the values to integers and then to strings, and finally append ' %':
d.loc["% Paid"] = d.loc["% Paid"].round().astype(int).astype(str).add(' %')
print (d)
       Payments
            Jan     Feb     Mar
            NOS     NOS     NOS
Total    9991.0  3638.0  5433.0
Paid      139.0   123.0    20.0
% Paid      1 %     3 %     0 %
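If it helps, the percentage row can also be computed for all columns in one step instead of one assignment per month. This is only a minimal sketch: it rebuilds the frame from the figures in the question, and casting to object before adding the string row is my own assumption to keep the dtype change explicit.

```python
import pandas as pd

# Same figures as in the question, built in one go
cols = pd.MultiIndex.from_tuples([
    ("Payments", "Jan", "NOS"),
    ("Payments", "Feb", "NOS"),
    ("Payments", "Mar", "NOS"),
])
d = pd.DataFrame([[9991, 3638, 5433], [139, 123, 20]],
                 index=["Total", "Paid"], columns=cols)

# Row-wise arithmetic works on every column at once
pct = (d.loc["Paid"] / d.loc["Total"] * 100).round().astype(int)

# The new row holds strings, so make the frame object-typed first (assumption)
d = d.astype(object)
d.loc["% Paid"] = (pct.astype(str) + " %").tolist()
print(d)
```

Keep in mind that the % Paid row now holds strings, so any further arithmetic should be done before adding the suffix.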
Goal:
Add a column ('Team_url') to my NFL teams dataframe (df_teams) with each team's website URL.
Problem:
If I print the URL, it works just fine. If I try to store it in df_teams['Team_url'], only the last result of the iteritems loop is stored.
Data:
df_teams['Team_web']
0 Arizona-Cardinals
1 Chicago-Bears
2 Green-Bay-Packers
3 New-York-Giants
4 Detroit-Lions
.
.
.
31 Houston-Texans
Code:
for i, j in df_teams['Team_web'].iteritems():
    url_1 = "https://www.nfl.com/teams/{0}/roster".format(j)
    df_teams['Team_url'] = url_1
Print:
print(url_1):
https://www.nfl.com/teams/Arizona-Cardinals/roster
https://www.nfl.com/teams/Chicago-Bears/roster
.
.
.
https://www.nfl.com/teams/Houston-Texans/roster
print(df_teams['Team_url'])
0 https://www.nfl.com/teams/Houston-Texans/roster
1 https://www.nfl.com/teams/Houston-Texans/roster
2 https://www.nfl.com/teams/Houston-Texans/roster
Questions:
How can I store what is printed for the url_1 in the dataframe column?
df_teams['Team_url'] = 'https://www.nfl.com/teams/' + df_teams['Team_web'] + '/roster'
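To spell out why the loop keeps only the last URL: assigning a single string to a column broadcasts it to every row, so each pass through the loop overwrites the whole column. Here is a small sketch with three of the teams from the question; the second column name (Team_url_fmt) is just something I made up for the comparison. Also note that Series.iteritems was removed in pandas 2.0 in favour of Series.items.

```python
import pandas as pd

df_teams = pd.DataFrame(
    {"Team_web": ["Arizona-Cardinals", "Chicago-Bears", "Houston-Texans"]}
)

# String concatenation is applied element-wise, so no loop is needed
df_teams["Team_url"] = (
    "https://www.nfl.com/teams/" + df_teams["Team_web"] + "/roster"
)

# Same result with a format string, applied to each value via map
template = "https://www.nfl.com/teams/{}/roster"
df_teams["Team_url_fmt"] = df_teams["Team_web"].map(template.format)

print(df_teams["Team_url"].tolist())
```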
I want to run a one-sided Fisher's exact test on every row of a 3000+ row table formatted like the example below:
gene   sample_alt  sample_ref  population_alt  population_ref
One    4           556         770             37000
Two    5           555         771             36999
Three  6           554         772             36998
I would ideally like to make another column of the table equivalent to
[(4+556)!(4+770)!(770+37000)!(556+37000)!]/[4!(556!)770!(37000!)(4+556+770+37000)!]
for the first row of data, and so on and so forth for each row of the table.
I know how to do a Fisher test in R for simple 2x2 tables, but I don't know how to apply the fisher.test() function to each row of a large table. I also can't use an Excel formula, because the factorials get so big that they exceed Excel's digit limit and result in a #NUM error. What's the simplest way to do this? Thanks in advance!
Start with a tab-delimited text file on the desktop (table.txt) in the same format as shown in the question:
if(!require(psych)){install.packages("psych")}

multiFisher = function(file="Desktop/table.txt", saveit=TRUE,
                       outfile="Desktop/table.csv", progress=TRUE,
                       verbose=FALSE, digits=3, ... )
{
  require(psych)
  # Read the table, skipping the header row and naming the columns
  Data = read.table(file, skip=1, header=FALSE,
                    col.names=c("Gene", "MD", "WTD", "MC", "WTC"), ...)
  if(verbose){print(str(Data))}
  Data$Fisher.p = NA
  Data$phi = NA
  Data$OR1 = format(0.123, nsmall=3)
  Data$OR2 = NA
  if(progress){cat("\n")}
  # One 2x2 Fisher test per row
  for(i in 1:length(Data$Gene)){
    Matrix = matrix(c(Data$WTC[i],Data$MC[i],Data$WTD[i],Data$MD[i]), nrow=2)
    Fisher = fisher.test(Matrix, alternative = 'greater')
    Data$Fisher.p[i] = signif(Fisher$p.value, digits=digits)
    Data$phi[i] = phi(Matrix, digits=digits)
    OR1 = (Data$WTC[i]*Data$MD[i])/(Data$MC[i]*Data$WTD[i])
    OR2 = 1 / OR1
    Data$OR1[i] = format(signif(OR1, digits=digits), nsmall=3)
    Data$OR2[i] = signif(OR2, digits=digits)
    if(progress) {cat(".")}
  }
  if(progress){cat("\n"); cat("\n")}
  if(saveit){write.csv(Data, outfile)}
  return(Data)
}

multiFisher()
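As a side note on the #NUM problem: the factorial formula from the question can be evaluated without overflow by summing log-factorials and exponentiating only at the end. Below is a rough Python sketch using only the standard library. The function names are mine, the cell layout [[sample_alt, sample_ref], [population_alt, population_ref]] is an assumption, and so is the direction of the one-sided test (a top-left count at least as large as the observed one).

```python
from math import exp, lgamma


def log_factorial(n):
    # lgamma(n + 1) equals log(n!) and stays small even for n = 37000
    return lgamma(n + 1)


def table_probability(a, b, c, d):
    """Probability of one 2x2 table [[a, b], [c, d]] with fixed margins."""
    log_p = (log_factorial(a + b) + log_factorial(c + d)
             + log_factorial(a + c) + log_factorial(b + d)
             - log_factorial(a) - log_factorial(b)
             - log_factorial(c) - log_factorial(d)
             - log_factorial(a + b + c + d))
    return exp(log_p)


def fisher_greater(a, b, c, d):
    """One-sided p-value: sum over tables whose top-left count is >= a."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = 0.0
    for k in range(a, min(row1, col1) + 1):
        total += table_probability(k, row1 - k, col1 - k, n - row1 - col1 + k)
    return min(total, 1.0)  # guard against rounding slightly above 1


rows = [
    ("One", 4, 556, 770, 37000),
    ("Two", 5, 555, 771, 36999),
    ("Three", 6, 554, 772, 36998),
]
results = [(gene, table_probability(*cells), fisher_greater(*cells))
           for gene, *cells in rows]
for gene, point_p, one_sided_p in results:
    print(gene, point_p, one_sided_p)
```

The first number per gene corresponds to the single-table formula in the question; the second sums it over the more extreme tables, which is what a one-sided exact test reports.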
Given a dataframe as follows:
firstname lastname email_address \
0 Doug Watson douglas.watson#dignityhealth.org
1 Nick Holekamp nick.holekamp#rankenjordan.org
2 Rob Schreiner rob.schriener#wellstar.org
3 Austin Phillips austin.phillips#precmed.com
4 Elise Geiger egeiger#puracap.com
5 Paul Urick purick#diplomatpharmacy.com
6 Michael Obringer michael.obringer#lashgroup.com
7 Craig Heneghan cheneghan#west-ward.com
8 Kathy Hirst kathleen.hirst#sunovion.com
9 Stefan Bluemmers stefan.bluemmers#grunenthal.com
companyname
0 Dignity Health
1 Ranken Jordan Pediatric Bridge Hospital
2 WellStar Health System
3 Precision Medical Products, Inc.
4 puracap.com
5 Diplomat Specialty Pharmacy
6 Lash Group
7 West-Ward Pharmaceuticals
8 Sunovion Pharmaceuticals
9 Grünenthal Group
How could I create possible email addresses using common email patterns such as firstlast#example.com, first.last#example.com, f.last#example.com, lastF#example.com, first_last#example.com, firstL#example.com, etc.?
df['email1'] = df.firstname.str.lower() + '.' + df.lastname.str.lower() + '#' + df.companyname.str.replace(r'\s+', '', regex=True).str.lower() + '.com'
print(df['email1'])
Out:
0 doug.watson#dignityhealth.com
1 nick.holekamp#rankenjordanpediatricbridgehospi... --->problematic
2 rob.schreiner#wellstarhealthsystem.com
3 austin.phillips#precisionmedicalproducts,inc..com --->problematic
4 elise.geiger#puracap.com.com --->problematic
...
9995 terry.hanley#kempersportsmanagement.com
9996 christine.marks#geocomp.com
9997 darryl.rickner#doe.com
9998 lalit.sharma#lovelylifestyle.com
9999 parul.dutt#infibeam.com
Some of them look quite problematic. Could anyone help me solve this issue? Thanks a lot.
EDITED:
print(df) after applying #Sajith Herath's solution:
Out:
firstname lastname companyname \
0 Nick Holekamp Ranken ...
email
0 nick. ...
You can write a helper that builds username permutations with different separators, and set a maximum length above which the company name is shortened to its capital letters to form the domain, as follows:
import random
import re

import pandas as pd

data = {"firstname": ["Nick"], "lastname": ["Holekamp"],
        "companyname": ["Ranken Jordan Pediatric Bridge Hospital"]}
df = pd.DataFrame(data=data)
max_char = 5
emails = []

def simplify_domain(text):
    # long names are reduced to their capital letters, e.g. "rjpbh"
    if len(text) > max_char:
        text = ''.join([c for c in text if c.isupper()])
        return text.lower()
    # short names only lose their whitespace (str.replace does not take a regex)
    return re.sub(r"\s+", "", text).lower()

def username_permutations(first_name, last_name):
    # define separators
    separators = [".", "_", "-"]
    # lower case
    combinations = list(map(lambda x: f"{first_name.lower()}{x}{last_name.lower()}", separators))
    # append a random number to tail
    n = random.randint(1, 100)
    combinations.extend(list(map(lambda x: f"{x}{n}", combinations)))
    return combinations

for index, row in df.iterrows():
    usernames = username_permutations(row["firstname"], row["lastname"])
    domain = simplify_domain(row["companyname"])
    email_permutations = list(map(lambda x: f"{x}#{domain}.com", usernames))
    emails.append(','.join(email_permutations))
df["email"] = emails
The final result will look like this (the numeric suffix is random): nick.holekamp#rjpbh.com,nick_holekamp#rjpbh.com,nick-holekamp#rjpbh.com,nick.holekamp66#rjpbh.com,nick_holekamp66#rjpbh.com,nick-holekamp66#rjpbh.com
You can extend the simplify_domain function to clean the company name further, for example by removing "Inc" or a trailing ".com".
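For the specific patterns listed in the question, something along these lines might be a starting point. It is only a sketch: clean_domain and email_patterns are hypothetical helper names, the list of company suffixes to strip is my own guess, and the '#' separator simply mirrors the addresses shown in the post. Non-ASCII letters are dropped by the cleanup, which may or may not be what you want.

```python
import re


def clean_domain(company):
    """Turn a company name into a bare domain label (heuristic, assumption)."""
    name = company.lower()
    name = re.sub(r"\.com$", "", name)                      # "puracap.com" -> "puracap"
    name = re.sub(r"\b(inc|llc|ltd|corp)\b\.?", "", name)   # guessed suffix list
    return re.sub(r"[^a-z0-9]", "", name)                   # drop spaces and punctuation


def email_patterns(first, last, domain, at="#"):
    """Build the patterns from the question for one person."""
    f, l = first.lower(), last.lower()
    users = [
        f + l,            # firstlast
        f"{f}.{l}",       # first.last
        f"{f[0]}.{l}",    # f.last
        f"{l}{f[0]}",     # lastF
        f"{f}_{l}",       # first_last
        f"{f}{l[0]}",     # firstL
    ]
    return [f"{user}{at}{domain}.com" for user in users]


domain = clean_domain("Precision Medical Products, Inc.")
candidates = email_patterns("Austin", "Phillips", domain)
print(candidates)
```

With a dataframe you could call these two functions per row (for example inside the iterrows loop above) and store the joined list in a new column.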
I am quite new to Python and I am struggling with printing my lists in columns. Everything is printed in a single column, but I want it printed under 4 different titles. I know I am missing something but can't seem to figure it out. Any advice would be really appreciated!
def createMyList():
    myAgegroup = ['20 - 39','40 - 59','60 - 79']
    mygroupTitle = ['Age','Underweight','Healthy','Overweight',]
    myStatistics = [['Less than 21%','21 - 33','Greater than 33%',],['Less than 23%','23 - 35','Greater than 35%',],['Less than 25%','25 - 38','Greater than 38%',]]
    printmyLists(myAgegroup,mygroupTitle,myStatistics)
    return

def printmyLists(myAgegroup,mygroupTitle,myStatistics):
    print(': Age : Underweight : Healthy : Overweight :')
    for count in range(0, len(myAgegroup)):
        print(myAgegroup[count])
    for count in range(0, len(mygroupTitle)):
        print(mygroupTitle[count])
    for count in range(0, len(myStatistics)):
        print(myStatistics[0][count])
    return

createMyList()
To print data in neat columns, it helps to know the Format Specification Mini-Language (doc). To group related data together, look at the zip() built-in function (doc).
Example:
def createMyList():
    myAgegroup = ['20 - 39','40 - 59','60 - 79']
    mygroupTitle = ['Age', 'Underweight','Healthy','Overweight',]
    myStatistics = [['Less than 21%','21 - 33','Greater than 33%',],['Less than 23%','23 - 35','Greater than 35%',],['Less than 25%','25 - 38','Greater than 38%',]]
    printmyLists(myAgegroup,mygroupTitle,myStatistics)

def printmyLists(myAgegroup,mygroupTitle,myStatistics):
    # print the header:
    for title in mygroupTitle:
        print('{:^20}'.format(title), end='')
    print()
    # print the columns:
    for age, stats in zip(myAgegroup, myStatistics):
        print('{:^20}'.format(age), end='')
        for stat in stats:
            print('{:^20}'.format(stat), end='')
        print()

createMyList()
Prints:
        Age             Underweight           Healthy            Overweight
      20 - 39          Less than 21%          21 - 33         Greater than 33%
      40 - 59          Less than 23%          21 - 35         Greater than 35%
      60 - 79          Less than 25%          25 - 38         Greater than 38%
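If you would rather not hard-code the width of 20, the column widths can be derived from the data itself. A small sketch follows; format_table is a hypothetical helper of mine, and it left-aligns the cells with str.ljust instead of centering them.

```python
def format_table(header, rows, gap=2):
    """Return the table as a list of lines with columns padded to fit."""
    table = [header] + rows
    # widest cell in each column decides that column's width
    widths = [max(len(str(row[i])) for row in table) for i in range(len(header))]
    lines = []
    for row in table:
        cells = [str(cell).ljust(width) for cell, width in zip(row, widths)]
        lines.append((" " * gap).join(cells).rstrip())
    return lines


ages = ['20 - 39', '40 - 59', '60 - 79']
header = ['Age', 'Underweight', 'Healthy', 'Overweight']
stats = [['Less than 21%', '21 - 33', 'Greater than 33%'],
         ['Less than 23%', '23 - 35', 'Greater than 35%'],
         ['Less than 25%', '25 - 38', 'Greater than 38%']]

# zip pairs each age group with its statistics to form one row
rows = [[age] + stat for age, stat in zip(ages, stats)]
lines = format_table(header, rows)
print("\n".join(lines))
```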
I am trying to build a roster: I want to loop over the same list, print the primary and secondary members, and change the primary and secondary every week when the weekday is Monday.
team = [('abc', 123),('def', 343),('ghi', 345),('jkl', 453)]
Week 1:-
primary :- ('abc', 123)
secondary :- ('def', 343)
Week 2:-
primary:- ('ghi', 345)
secondary:- ('jkl', 453)
Week 3:-
primary:- ('jkl', 453)
secondary:-('abc', 123)
And so on.
team = [('abc', 123),('def', 343),('ghi', 345),('jkl', 453)]
count = 0
if week_day == 'Wed':
    if True:
        count += 1
        print('count', count)
        print('pri', team[count][0])
        print('sec_name', team[count + 1])
In Python 3 you can use generators. With itertools.cycle you can iterate over a list indefinitely:
import itertools as it
team = [('abc', 123), ('def', 343), ('ghi', 345), ('jkl', 453)]
pri_gen = it.cycle(team)
sec_gen = it.cycle(team)
next(sec_gen)  # remove the first abc and start with def
for pri, sec in zip(pri_gen, sec_gen):
    print(pri, sec)
    # wait until next monday
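Since the loop above never ends, here is a sketch of two ways to use the same idea in practice: itertools.islice to take only the first few weeks, and a date-based lookup so a script that runs once a day does not have to remember where it stopped. The helper names and the start date are made up for illustration, and the pairing shifts by one member per week as in the cycle code above, which may not match the exact rotation you listed.

```python
import datetime as dt
import itertools as it

team = [('abc', 123), ('def', 343), ('ghi', 345), ('jkl', 453)]


def weekly_pairs(members):
    """Endless (primary, secondary) pairs, shifting by one member per week."""
    primaries = it.cycle(members)
    secondaries = it.cycle(members)
    next(secondaries)  # secondary starts one position ahead of primary
    return zip(primaries, secondaries)


def pair_for_date(members, start, today):
    """Pair for `today`, counting whole weeks since `start` (a Monday)."""
    weeks = (today - start).days // 7
    i = weeks % len(members)
    return members[i], members[(i + 1) % len(members)]


# islice stops the otherwise infinite generator after five weeks
first_five = list(it.islice(weekly_pairs(team), 5))
for week, (pri, sec) in enumerate(first_five, start=1):
    print(f"Week {week}: primary={pri} secondary={sec}")

# hypothetical rotation start: Monday 2024-01-01
start = dt.date(2024, 1, 1)
print(pair_for_date(team, start, dt.date(2024, 1, 10)))
```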