Goal:
Add a column ('Team_url') to my nfl teams dataframe (df_teams) with each teams website-url.
Problem:
If I print the url, it works just fine. If I try to store it to df_teams['Team_url'], it only stores the last result of the iteritems.
Data:
df_teams['Team_web']
0 Arizona-Cardinals
1 Chicago-Bears
2 Green-Bay-Packers
3 New-York-Giants
4 Detroit-Lions
.
.
.
31 Houston-Texans
Code:
for i, j in df_teams['Team_web'].iteritems():
url_1 = "https://www.nfl.com/teams/{0}/roster".format(j)
df_teams['Team_url'] = url_1
Print:
print(url_1):
https://www.nfl. com/teams/Arizona-Cardinals/roster
https://www.nfl. com/teams/Chicago-Bears/roster
.
.
.
https://www.nfl.com/teams/Houston-Texans/roster
print(df_teams['Team_url'])
0 https://www.nfl.com/teams/Houston-Texans/roster
1 https://www.nfl.com/teams/Houston-Texans/roster
2 https://www.nfl.com/teams/Houston-Texans/roster
Questions:
How can I store what is printed for the url_1 in the dataframe column?
df['Team_url'] = 'https://www.nfl.com/teams/' + df['Team_web'] + '/roster'
Related
Im trying to add suffix to % Paid row in the dataframe, but im stuck with only adding suffix to the column names.
is there a way i can add suffix to a specific row values,
Any suggestions are highly appreciated.
d={
("Payments","Jan","NOS"):[],
("Payments","Feb","NOS"):[],
("Payments","Mar","NOS"):[],
}
d = pd.DataFrame(d)
d.loc["Total",("Payments","Jan","NOS")] = 9991
d.loc["Total",("Payments","Feb","NOS")] = 3638
d.loc["Total",("Payments","Mar","NOS")] = 5433
d.loc["Paid",("Payments","Jan","NOS")] = 139
d.loc["Paid",("Payments","Feb","NOS")] = 123
d.loc["Paid",("Payments","Mar","NOS")] = 20
d.loc["% Paid",("Payments","Jan","NOS")] = round((d.loc["Paid",("Payments","Jan","NOS")] / d.loc["Total",("Payments","Jan","NOS")])*100)
d.loc["% Paid",("Payments","Feb","NOS")] = round((d.loc["Paid",("Payments","Feb","NOS")] / d.loc["Total",("Payments","Feb","NOS")])*100)
d.loc["% Paid",("Payments","Mar","NOS")] = round((d.loc["Paid",("Payments","Mar","NOS")] / d.loc["Total",("Payments","Mar","NOS")])*100)
without suffix
I tried this way, it works but.. im looking for adding suffix for an entire row..
d.loc["% Paid",("Payments","Jan","NOS")] = str(round((d.loc["Paid",("Payments","Jan","NOS")] / d.loc["Total",("Payments","Jan","NOS")])*100)) + '%'
d.loc["% Paid",("Payments","Feb","NOS")] = str(round((d.loc["Paid",("Payments","Feb","NOS")] / d.loc["Total",("Payments","Feb","NOS")])*100)) + '%
d.loc["% Paid",("Payments","Mar","NOS")] = str(round((d.loc["Paid",("Payments","Mar","NOS")] / d.loc["Total",("Payments","Mar","NOS")])*100)) + '%'
with suffix
Select row separately by first index value, round and convert to integers, last to strings and add %:
d.loc["% Paid"] = d.loc["% Paid"].round().astype(int).astype(str).add(' %')
print (d)
Payments
Jan Feb Mar
NOS NOS NOS
Total 9991.0 3638.0 5433.0
Paid 139.0 123.0 20.0
% Paid 1 % 3 % 0 %
Station_ID
ID809086
t
ID809088
ID809089
.
.
ID809098
t
ID809100
.
.
.
Basically I want to replace all 't' with previous ID +1, such as t = ID809087 and ID809099 in above scenario. Thanks in advance
Basically I want to replace all 't' with previous ID +1, such as t = ID809087 and ID809099 in above scenario. Thanks in advance
Try:
tmp = 'ID' + df['Station_ID'].str.extract(r'^ID(\d+)')[0].astype(float).interpolate().astype(int).astype(str)
df['Station_ID'] = np.where(df['Station_ID'].eq('t'), tmp, df['Station_ID'])
print(df)
Prints:
Station_ID
0 ID809086
1 ID809087
2 ID809088
3 ID809089
4 ID809098
5 ID809099
6 ID809100
I would like to get the S&P 500 ['Adj Close'] column and replace the column with the corresponding stock symbol, however, I am not able to replace the dataframe columns because it gives me an error: KeyError: '5'
What I would like to achieve is to loop through all the available stocks from the list and replace the Adj Close with the stock symbol.
This is what I did:
First I have scraped the stock symbols from Wikipedia and added them to a list.
data = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
symbols = data[0] # get first column
symbols.head()
stock = symbols['Symbol'].to_list()
print(stock[0:5])
this gives me a list of stock symbols as below:
['MMM', 'ABT', 'ABBV', 'ABMD', 'ACN']
then I scraped Yahoo finance to get the daily financial data as below
stock_url = 'https://query1.finance.yahoo.com/v7/finance/download/{}?'
params = {
'range' : '1y',
'interval' : '1d',
'events' : 'history'
}
response = requests.get(stock_url.format(stock[0]), params=params)
file = StringIO(response.text)
reader = csv.reader(file)
data = list(reader)
df = pd.DataFrame(data)
stock_data = df['5']
Fix for key error
You are calling the the url using the list 'stock' and it gives a 404 response when I tried.
Call the URL with individual stock like below,
requests.get(stock_url.format(stock[0]), params=params)
Also do below, The column 5 is stored as integer instead of character. That is the reason you got 'key error'
stock_data = df[5]
I tried for stock 'MMM' - stock[0] and it prints below:
0 1 2 3 4 5 \
0 Date Open High Low Close Adj Close
1 2019-12-11 168.380005 168.839996 167.330002 168.740005 162.682480
2 2019-12-12 166.729996 170.850006 166.330002 168.559998 162.508926
3 2019-12-13 169.619995 171.119995 168.080002 168.789993 162.730667
4 2019-12-16 168.940002 170.830002 168.190002 170.750000 164.620316
.. ... ... ... ... ... ...
249 2020-12-04 172.130005 173.160004 171.539993 172.460007 172.460007
250 2020-12-07 171.720001 172.500000 169.179993 170.149994 170.149994
251 2020-12-08 169.740005 172.830002 169.699997 172.460007 172.460007
252 2020-12-09 172.669998 175.639999 171.929993 175.289993 175.289993
253 2020-12-10 174.869995 175.399994 172.690002 173.490005 173.490005
[254 rows x 7 columns]
Loop through stocks and replace Adj Close (Edited as per requirements from comments)
Code for looping through stocks and replacing Adj close with Stock symbol.
stock_url = 'https://query1.finance.yahoo.com/v7/finance/download/{}?'
params = {
'range' : '1y',
'interval' : '1d',
'events' : 'history'
}
df = pd.DataFrame()
for i in stock:
response = requests.get(stock_url.format(i), params=params)
file = io.StringIO(response.text)
reader = csv.reader(file)
data = list(reader)
df1 = pd.DataFrame(data)
df1.loc[df1[5] == 'Adj Close',5] = i
df = df.append(df1)
Tried the code for first 3 stocks and here it is:
Given a dataframe as follows:
firstname lastname email_address \
0 Doug Watson douglas.watson#dignityhealth.org
1 Nick Holekamp nick.holekamp#rankenjordan.org
2 Rob Schreiner rob.schriener#wellstar.org
3 Austin Phillips austin.phillips#precmed.com
4 Elise Geiger egeiger#puracap.com
5 Paul Urick purick#diplomatpharmacy.com
6 Michael Obringer michael.obringer#lashgroup.com
7 Craig Heneghan cheneghan#west-ward.com
8 Kathy Hirst kathleen.hirst#sunovion.com
9 Stefan Bluemmers stefan.bluemmers#grunenthal.com
companyname
0 Dignity Health
1 Ranken Jordan Pediatric Bridge Hospital
2 WellStar Health System
3 Precision Medical Products, Inc.
4 puracap.com
5 Diplomat Specialty Pharmacy
6 Lash Group
7 West-Ward Pharmaceuticals
8 Sunovion Pharmaceuticals
9 GrĂ¼nenthal Group
How could I create possible email addresses using common email patterns as such: firstlast#example.com, first.last#example.com, f.last#example.com, lastF#example.com, first_last#example.com, firstL#example.com, etc.
df['email1'] = df.firstname.str.lower() + '.' + df.lastname.str.lower() + '#' + df.companyname.str.replace('\s+', '').str.lower() + '.com'
print(df['email1'])
Out:
0 doug.watson#dignityhealth.com
1 nick.holekamp#rankenjordanpediatricbridgehospi... --->problematic
2 rob.schreiner#wellstarhealthsystem.com
3 austin.phillips#precisionmedicalproducts,inc..com --->problematic
4 elise.geiger#puracap.com.com --->problematic
...
9995 terry.hanley#kempersportsmanagement.com
9996 christine.marks#geocomp.com
9997 darryl.rickner#doe.com
9998 lalit.sharma#lovelylifestyle.com
9999 parul.dutt#infibeam.com
Some of them seems quite problematic, anyone could help to solve this issue? Thanks a lot.
EDITED:
print(df) after applying #Sajith Herath's solution:
Out:
firstname lastname companyname \
0 Nick Holekamp Ranken ...
email
0 nick. ...
You can use a method to create permutations of username with different separators and define a max length that simplify the domain using company name as follows
import pandas as pd
import random
data = {"firstname":["Nick"],"lastname":["Holekamp"],"companyname":["Ranken \
Jordan Pediatric Bridge Hospital"]}
df = pd.DataFrame(data=data)
max_char = 5
emails = []
def simplify_domain(text):
if len(text)>max_char:
text = ''.join([c for c in text if c.isupper()])
return text.lower()
return text.replace("\s+","").lower()
def username_permutations(first_name,last_name):
# define separators
separators = [".", "_", "-"]
#lower case
combinations = list(map(lambda x:f"{first_name.lower()}{x} \
{last_name.lower()}",separators))
#append a random number to tail
n = random.randint(1, 100)
combinations.extend(list(map(lambda x:f"{x}{n}",combinations)))
return combinations
for index,row in df.iterrows():
usernames = username_permutations(row["firstname"],row["lastname"])
email_permutations = list(map(lambda x: f" \
{x}#{simplify_domain(row['companyname'])}.com",usernames))
emails.append(','.join(email_permutations))
df["email"] = emails
Final result will be nick.holekamp#rjpbh.com,nick_holekamp#rjpbh.com,nick-holekamp#rjpbh.com,nick.holekamp66#rjpbh.com,nick_holekamp66#rjpbh.com,nick-holekamp66#rjpbh.com
you can modify simplify_domain method to validate given string such as removing inc or .com values
I am trying for a roaster method, where I have want to loop over
the same list and print primary,secondary and change the primary and secondary every week,if weekday is monday.
team = [('abc', 123),('def', 343),('ghi', 345),('jkl', 453)]
Week 1:-
primary :- ('abc', 123)
secondary :- ('def', 343)
Week 2:-
primary:- ('ghi', 345)
secondary:- ('jkl', 453)
Week 3:-
primary:- ('jkl', 453)
secondary:-('abc', 123)
And so on.
team = [('abc', 123),('def', 343),('ghi', 345),('jkl', 453)]
count = 0
if week_day == 'Wed':
if True:
count += 1
print('count', count)
print('pri', team[count][0])
print('sec_name', team[count + 1])
In Python 3 you can use generators/. With itertools.cycle you can iterate over a list indefinitely:
import itertools as it
team = [('abc', 123), ('def', 343), ('ghi', 345), ('jkl', 453)]
pri_gen = it.cycle(team)
sec_gen = it.cycle(team)
next(sec_gen) # remove the first abc and start with def
for pri, sec in zip(pri_gen, sec_gen):
print(pri, sec)
# wait until next monday