I have a dataframe as below
0 1 2 3 4 5
0 0.428519 0.000000 0.0 0.541096 0.250099 0.345604
1 0.056650 0.000000 0.0 0.000000 0.000000 0.000000
2 0.000000 0.000000 0.0 0.000000 0.000000 0.000000
3 0.849066 0.559117 0.0 0.374447 0.424247 0.586254
4 0.317644 0.000000 0.0 0.271171 0.586686 0.424560
I would like to modify it as below
0 0 0.428519
0 1 0.000000
0 2 0.0
0 3 0.541096
0 4 0.250099
0 5 0.345604
1 0 0.056650
1 1 0.000000
........
Use stack with reset_index:
df1 = df.stack().reset_index()
df1.columns = ['col1','col2','col3']
print (df1)
col1 col2 col3
0 0 0 0.428519
1 0 1 0.000000
2 0 2 0.000000
3 0 3 0.541096
4 0 4 0.250099
5 0 5 0.345604
6 1 0 0.056650
7 1 1 0.000000
8 1 2 0.000000
9 1 3 0.000000
10 1 4 0.000000
11 1 5 0.000000
12 2 0 0.000000
13 2 1 0.000000
14 2 2 0.000000
15 2 3 0.000000
16 2 4 0.000000
17 2 5 0.000000
18 3 0 0.849066
19 3 1 0.559117
20 3 2 0.000000
21 3 3 0.374447
22 3 4 0.424247
23 3 5 0.586254
24 4 0 0.317644
25 4 1 0.000000
26 4 2 0.000000
27 4 3 0.271171
28 4 4 0.586686
29 4 5 0.424560
Numpy solution with numpy.tile and numpy.repeat, flattening is by numpy.ravel:
df2 = pd.DataFrame({
"col1": np.repeat(df.index, len(df.columns)),
"col2": np.tile(df.columns, len(df.index)),
"col3": df.values.ravel()})
print (df2)
col1 col2 col3
0 0 0 0.428519
1 0 1 0.000000
2 0 2 0.000000
3 0 3 0.541096
4 0 4 0.250099
5 0 5 0.345604
6 1 0 0.056650
7 1 1 0.000000
8 1 2 0.000000
9 1 3 0.000000
10 1 4 0.000000
11 1 5 0.000000
12 2 0 0.000000
13 2 1 0.000000
14 2 2 0.000000
15 2 3 0.000000
16 2 4 0.000000
17 2 5 0.000000
18 3 0 0.849066
19 3 1 0.559117
20 3 2 0.000000
21 3 3 0.374447
22 3 4 0.424247
23 3 5 0.586254
24 4 0 0.317644
25 4 1 0.000000
26 4 2 0.000000
27 4 3 0.271171
28 4 4 0.586686
29 4 5 0.424560
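As a quick sanity check, the two approaches above can be compared on a small frame. This is only a minimal sketch: the 2x3 frame and the names small, stacked, and tiled are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x3 frame standing in for the original data.
small = pd.DataFrame([[0.1, 0.0, 0.3],
                      [0.0, 0.5, 0.0]])

# Approach 1: stack the columns into the index, then flatten the index.
stacked = small.stack().reset_index()
stacked.columns = ["col1", "col2", "col3"]

# Approach 2: build the row labels, column labels, and values with NumPy.
tiled = pd.DataFrame({
    "col1": np.repeat(small.index, len(small.columns)),
    "col2": np.tile(small.columns, len(small.index)),
    "col3": small.to_numpy().ravel(),
})

print(stacked)
print(tiled)
```

One difference worth keeping in mind: in many pandas versions stack drops NaN entries by default, whereas the NumPy approach keeps every cell, so the two results can differ in length when the frame has missing values.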
So far I have:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import pandas as pd
driver = webdriver.Firefox()
url = r"https://rotogrinders.com/game-stats/nba-player?site=fanduel&range=yesterday"
driver.get(url)
cookies = driver.find_element_by_xpath(r'//*[@id="bc-close-cookie"]').click()
select = driver.find_element_by_xpath(r'/html/body/div[1]/div/section/div/section/div[2]/div[2]/a[2]').click()
I need to scrape the table data to a .csv file.
Any suggestions?
The click fails because of the subscription pop-up.
When this error occurs, click on "no thank you" to dismiss the pop-up before clicking on the "all" button.
Selenium is overkill here because the data is within the <script> tags. You could get it through a simple request using requests, then either a) use BeautifulSoup to pull out the <script> tag and parse it, or b) pull it out directly with a regex. I chose the latter.
It comes in a nice JSON format, so turning it into a table is relatively easy with pandas. You'll also use pandas to write it to CSV.
import requests
import pandas as pd
import re
import json
url = 'https://rotogrinders.com/game-stats/nba-player?site=fanduel&range=yesterday'
response = requests.get(url)
p = re.compile(r"var data = (\[.*\])")  # raw string so \[ and \] reach the regex engine unchanged
result = p.search(response.text)
jsonStr = result.group(1)
jsonData = json.loads(jsonStr)
df = pd.DataFrame(jsonData)
df.to_csv('nba.csv', index=False)
Output:
print(df.to_string())
id name fpts gp fgm fga ftm fta 3pm 3pa 2pm 2pa reb ast stl blk to pts oreb pfoul tfoul ffoul min fouls dd td usg pace fg% ft% 3p% 2p% team pos player
0 947 Lance Stephenson 6.70 1 1 1 2 2 0 0 1 1 1 3 0 0 3 4 0 1 0 0 17.80 0 0 0 13.19 5 1 1 0 1 IND SCW Lance Stephenson
1 1079 Stephen Curry 58.00 1 12 27 9 9 6 16 6 11 5 8 1 0 2 39 0 3 0 0 43.95 0 0 0 32.40 33.50 0.44 1 0.38 0.55 GSW CG Stephen Curry
2 1087 Chris Paul 51.50 1 8 14 2 2 2 6 6 8 5 11 2 1 0 20 0 3 0 0 36.68 0 1 0 20.19 15 0.57 1 0.33 0.75 PHO DIS Chris Paul
3 1277 Taj Gibson 12.50 1 1 3 0 0 0 0 1 3 5 1 0 1 0 2 1 4 0 0 17.95 0 0 0 7.42 2 0.33 0 0 0.33 NYK PB Taj Gibson
4 1334 Andre Iguodala 24.00 1 1 2 2 2 0 0 1 2 5 4 0 4 4 4 2 2 0 0 31.32 0 0 0 10.47 5 0.50 1 0 0.50 GSW 3DW Andre Iguodala
5 1485 JaVale McGee 10.80 1 3 5 2 2 0 0 3 5 4 0 0 0 2 8 1 3 0 0 17.82 0 0 0 17.69 7 0.60 1 0 0.60 PHO PB JaVale McGee
6 13301 Jonas Valanciunas 39.00 1 8 11 1 2 1 4 7 7 10 2 1 2 3 18 1 5 0 0 33.03 0 1 0 18.82 14 0.73 0.50 0.25 1 NOP PB Jonas Valanciunas
7 13312 Bismack Biyombo 19.30 1 2 3 5 7 0 0 2 3 4 1 2 0 2 9 2 3 0 0 27.90 0 0 0 12.06 6.50 0.67 0.71 0 0.67 PHO PB Bismack Biyombo
8 13315 Klay Thompson 13.90 1 6 17 0 0 0 7 6 10 2 1 0 0 2 12 1 1 0 0 23.10 0 0 0 33.47 18 0.35 0 0 0.60 GSW 3DW Klay Thompson
9 13335 Kemba Walker 11.40 1 1 5 3 4 0 3 1 2 2 2 1 0 2 5 0 2 1 0 21.17 0 0 0 17.80 9 0.20 0.75 0 0.50 NYK CG Kemba Walker
10 13353 Alec Burks 19.60 1 3 9 5 5 2 6 1 3 3 2 1 0 3 13 1 3 0 0 22.58 0 0 0 26.32 13.50 0.33 1 0.33 0.33 NYK CG Alec Burks
11 13913 Evan Fournier 10.10 1 2 8 1 2 1 6 1 2 3 1 0 0 1 6 1 1 0 0 25.15 0 0 0 16.24 9 0.25 0.50 0.17 0.50 NYK SHW Evan Fournier
12 13942 Jae Crowder 21.60 1 4 9 2 2 3 6 1 3 3 0 2 0 1 13 0 3 0 0 32.65 0 0 0 13.33 11 0.44 1 0.50 0.33 PHO 3DW Jae Crowder
13 13955 Jeremy Lamb 27.40 1 2 5 8 10 2 3 0 2 2 2 2 1 1 14 1 2 0 0 18.88 0 0 0 23.43 10 0.40 0.80 0.67 0 IND SCW Jeremy Lamb
14 14077 Garrett Temple 0.20 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 10.88 0 0 0 3.68 1 0 0 0 0 NOP SHW Garrett Temple
15 14559 Justin Holiday 23.50 1 6 13 0 0 4 9 2 4 5 3 0 0 3 16 1 2 0 0 34.22 0 0 0 19.86 15 0.46 0 0.44 0.50 IND SHW Justin Holiday
16 16879 Tim Hardaway Jr. 18.50 1 4 10 0 0 3 7 1 3 5 1 0 0 0 11 0 3 0 0 28.28 0 0 0 14.61 10 0.40 0 0.43 0.33 DAL SHW Tim Hardaway Jr.
17 16943 Reggie Bullock 11.70 1 1 3 0 0 1 3 0 0 6 1 0 0 0 3 1 1 0 0 20.18 0 0 0 6.60 2 0.33 0 0.33 0 DAL 3DW Reggie Bullock
18 18566 Dwight Powell 8.60 1 2 3 2 3 0 0 2 3 3 0 0 0 1 6 2 1 0 0 14.35 0 0 0 14.83 3.50 0.67 0.67 0 0.67 DAL VB Dwight Powell
19 18620 Andrew Wiggins 22.80 1 5 15 0 1 1 6 4 9 4 2 1 1 2 11 2 5 0 0 38.03 0 0 0 19.04 15.50 0.33 0 0.17 0.44 GSW SCW Andrew Wiggins
20 18632 Julius Randle 24.40 1 1 9 2 4 0 2 1 7 7 6 1 1 3 4 2 1 1 0 29.48 0 0 0 21.36 12 0.11 0.50 0 0.14 NYK VB Julius Randle
21 18899 Kristaps Porzingis 37.70 1 7 15 2 2 2 4 5 11 11 1 0 2 1 18 2 5 0 0 32.28 0 1 0 21.33 15 0.47 1 0.50 0.45 DAL VB Kristaps Porzingis
22 18941 Cameron Payne 18.60 1 4 10 3 3 1 4 3 6 3 0 1 0 0 12 0 1 1 0 18.93 0 0 0 23.92 11.50 0.40 1 0.25 0.50 PHO DIS Cameron Payne
23 18945 Kevon Looney 36.50 1 5 6 3 4 0 0 5 6 15 3 2 0 5 13 6 3 0 1 28.17 0 1 0 19.52 7 0.83 0.75 0 0.83 GSW PB Kevon Looney
24 18949 Devin Booker 41.00 1 11 25 5 5 1 8 10 17 5 6 0 0 2 28 0 2 1 0 38.30 0 0 0 32.56 29.50 0.44 1 0.13 0.59 PHO SCW Devin Booker
25 31814 Nemanja Bjelica 22.20 1 4 5 0 0 0 1 4 4 6 2 1 1 2 8 1 1 0 0 14.13 0 0 0 21.68 6 0.80 0 0 1 GSW VF Nemanja Bjelica
26 35227 Brandon Ingram 30.00 1 4 10 6 8 1 1 3 9 5 6 1 0 3 15 0 2 0 0 26.88 0 0 0 27.53 17 0.40 0.75 1 0.33 NOP SCW Brandon Ingram
27 35995 Damion Lee 14.00 1 3 5 2 2 1 2 2 3 0 0 2 0 1 9 0 3 0 0 14 0 0 0 19.66 7 0.60 1 0.50 0.67 GSW CG Damion Lee
28 36032 Dorian Finney-Smith 14.20 1 2 6 0 0 1 5 1 1 6 2 0 0 1 5 1 4 0 0 32 0 0 0 9.57 6 0.33 0 0.20 1 DAL 3DW Dorian Finney-Smith
29 36041 Gary Payton II 18.00 1 3 5 0 1 0 2 3 3 5 0 2 0 0 6 1 1 1 0 17.40 0 0 0 12.51 4.50 0.60 0 0 1 GSW CG Gary Payton II
30 37796 Frank Ntilikina 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1.70 0 0 0 0.00 0 0 0 0 0 DAL DIS Frank Ntilikina
31 37845 Maxi Kleber 22.00 1 3 9 2 2 1 7 2 2 5 4 0 1 2 9 1 2 0 0 27.13 0 0 0 19.46 11 0.33 1 0.14 1 DAL VB Maxi Kleber
32 37861 Torrey Craig 32.90 1 5 10 0 0 2 5 3 5 7 1 3 1 1 12 0 3 0 0 34.22 0 0 0 13.24 11 0.50 0 0.40 0.60 IND 3DW Torrey Craig
33 37889 Josh Hart 34.40 1 5 10 5 9 2 4 3 6 7 4 1 0 0 17 1 4 0 0 35.28 0 0 0 17.32 13.50 0.50 0.56 0.50 0.50 NOP SHW Josh Hart
34 408821 Mitchell Robinson 37.00 1 6 7 5 10 0 0 6 7 15 0 0 1 1 17 7 3 0 0 30.05 0 1 0 16.51 6 0.86 0.50 0 0.86 NYK PB Mitchell Robinson
35 408841 Landry Shamet 4.20 1 0 2 0 0 0 2 0 0 1 0 1 0 0 0 0 3 0 0 10.07 0 0 0 7.94 2 0 0 0 0 PHO SHW Landry Shamet
36 408842 Mikal Bridges 42.60 1 5 11 2 2 0 4 5 7 8 6 4 0 0 12 2 1 0 0 37.18 0 0 0 14.91 10 0.45 1 0 0.71 PHO 3DW Mikal Bridges
37 408968 Devonte' Graham 19.10 1 5 16 1 1 4 10 1 6 3 1 0 0 1 15 0 1 0 0 28.03 0 0 0 25.36 17.50 0.31 1 0.40 0.17 NOP CG Devonte' Graham
38 408971 Luka Doncic 44.60 1 9 23 8 11 2 9 7 14 8 8 1 0 8 28 1 1 0 0 38.13 0 0 0 40.37 35.50 0.39 0.73 0.22 0.50 DAL CG Luka Doncic
39 409062 Jalen Brunson 20.50 1 8 13 2 2 1 2 7 11 5 1 0 0 6 19 2 3 0 0 34.75 0 0 0 23.26 18 0.62 1 0.50 0.64 DAL DIS Jalen Brunson
40 570141 Gary Clark 3.50 1 1 5 0 0 1 4 0 1 0 1 0 0 1 3 0 0 0 0 7.95 0 0 0 31.85 6 0.20 0 0.25 0 NOP VF Gary Clark
41 1115150 RJ Barrett 30.20 1 6 13 4 7 1 5 5 8 6 2 0 1 0 17 1 1 0 0 27.98 0 0 0 23.93 15.50 0.46 0.57 0.20 0.63 NYK SCW RJ Barrett
42 1115261 Goga Bitadze 32.30 1 5 12 2 5 1 4 4 8 9 5 0 1 2 13 4 2 2 0 31.35 0 0 0 22.78 12.50 0.42 0.40 0.25 0.50 IND VB Goga Bitadze
43 1115443 Jaxson Hayes 8.90 1 3 4 0 0 0 0 3 4 2 1 0 0 1 6 0 5 0 0 16.47 0 0 0 12.94 5 0.75 0 0 0.75 NOP PB Jaxson Hayes
44 1115445 Nickeil Alexander-Walker 13.60 1 1 4 2 2 0 0 1 4 3 4 0 0 0 4 0 3 0 0 24.42 0 0 0 10.16 5 0.25 1 0 0.25 NOP CG Nickeil Alexander-Walker
45 1115568 Jordan Poole 13.40 1 1 7 3 4 0 5 1 2 2 4 0 0 0 5 0 2 0 0 24.67 0 0 0 16.34 9 0.14 0.75 0 0.50 GSW CG Jordan Poole
46 1115625 Cam Johnson 19.90 1 2 7 2 2 1 6 1 1 7 1 1 0 0 7 0 2 0 0 20.47 0 0 0 16.04 8 0.29 1 0.17 1 PHO SHW Cam Johnson
47 1311751 Oshae Brissett 12.70 1 0 7 2 4 0 3 0 4 6 1 1 0 1 2 0 1 0 0 21.98 0 0 0 18.36 10 0 0.50 0 0 IND SHW Oshae Brissett
48 1333886 Juan Toscano-Anderson 15.30 1 2 5 0 0 1 3 1 2 4 1 2 0 2 5 1 4 0 0 14.87 0 0 0 19.72 6 0.40 0 0.33 0.50 GSW SHW Juan Toscano-Anderson
49 2439136 Obi Toppin 5.40 1 0 1 0 0 0 1 0 0 2 0 0 1 0 0 0 1 0 0 18.52 0 0 0 2.16 1 0 0 0 0 NYK VB Obi Toppin
50 2439366 Josh Green 4.70 1 1 3 0 0 0 0 1 3 1 1 0 0 0 2 0 1 0 0 11.18 0 0 0 11.91 3 0.33 0 0 0.33 DAL 3DW Josh Green
51 2439514 Immanuel Quickley 31.30 1 4 13 4 4 2 9 2 4 4 5 3 0 4 14 0 1 0 0 23.28 0 0 0 35.07 19 0.31 1 0.22 0.50 NYK SHW Immanuel Quickley
52 3005116 Quentin Grimes 16.20 1 5 9 0 0 3 6 2 3 1 2 0 0 1 13 1 4 0 0 23.83 0 0 0 17.89 9 0.56 0 0.50 0.67 NYK 3DW Quentin Grimes
53 3005227 Isaiah Jackson 33.90 1 5 12 5 8 0 1 5 11 7 1 3 0 0 15 3 5 0 0 18.63 0 0 0 34.03 13 0.42 0.63 0 0.45 IND VF Isaiah Jackson
54 3005228 Chris Duarte 48.90 1 10 16 5 5 2 3 8 13 7 3 3 0 0 27 2 3 0 0 38.53 0 0 0 19.92 16.50 0.63 1 0.67 0.62 IND SCW Chris Duarte
55 3005417 Herb Jones 28.80 1 5 7 0 0 1 3 4 4 4 4 3 0 2 11 1 3 0 0 37.08 0 0 0 11.13 8 0.71 0 0.33 1 NOP 3DW Herb Jones
56 3007111 Jonathan Kuminga 14.50 1 0 0 5 6 0 0 0 0 5 3 0 0 1 5 1 3 1 0 15.37 0 0 0 12.05 3 0 0.83 0 0 GSW VF Jonathan Kuminga
57 3014026 Keifer Sykes 17.30 1 4 12 0 0 2 5 2 7 4 3 0 0 2 10 2 3 0 0 31.02 0 0 0 19.33 12 0.33 0 0.40 0.29 IND DIS Keifer Sykes
58 3015595 Duane Washington 8.50 1 3 7 0 0 2 2 1 5 0 1 0 0 1 8 0 1 0 0 18.37 0 0 0 18.14 8 0.43 0 1 0.20 IND SHW Duane Washington
59 3015773 Jose Alvarado 31.00 1 6 9 0 0 1 2 5 7 0 4 4 0 0 13 0 3 0 0 19.97 0 0 0 20.67 9 0.67 0 0.50 0.71 NOP DIS Jose Alvarado
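The regex step above can be tried offline before hitting the site. The following is a minimal sketch that uses only the standard library; sample_html is a hypothetical fragment written for illustration, and the real page's markup may differ or change over time.

```python
import json
import re

# Hypothetical page fragment (an assumption, not copied from the real site).
# The array is kept on one line because "." does not match newlines by default.
sample_html = """
<script>
var data = [{"id": 947, "name": "Lance Stephenson", "fpts": 6.7}, {"id": 1079, "name": "Stephen Curry", "fpts": 58.0}];
</script>
"""

pattern = re.compile(r"var data = (\[.*\])")
match = pattern.search(sample_html)

# search() returns None when the pattern is absent, so check before using it.
if match is None:
    raise ValueError("could not find 'var data = [...]' in the page")

records = json.loads(match.group(1))  # list of dicts, one per player
print(records[0]["name"])
```

Checking for None gives a clearer error than an AttributeError on result.group(1) if the page layout changes. The resulting list of dicts can be passed to pd.DataFrame exactly as in the answer's code.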
My objective is to find the best code in the dataframe. For each Id No and Pdt_No, I have to find the best code using the prob value. I tried the rank method, but it gives an overall rank rather than one specific to each Id No.
Input
Id No Pdt_No code prob
1 pdt1 HHL 0.000000
1 pdt3 HHL 50.000000
1 pdt2 HHL 0.000000
1 pdt5 HHL 50.000000
1 pdt8 HHL 100.000000
1 pdt1 HHL 50.000000
1 pdt2 HHL 100.000000
3 pdt1 HHM 0.000000
3 pdt1 HHM 0.000000
3 pdt1 HHM 25.000000
3 pdt4 HHM 33.333333
3 pdt3 HHM 33.333333
3 pdt2 HHM 0.000000
3 pdt2 HHM 50.000000
4 pdt5 ERS 0.000000
4 pdt2 ERS 0.000000
4 pdt2 MKL 100.000000
4 pdt2 MKL 50.000000
4 pdt5 MKL 5.000000
5 pdt1 MKM 0.000000
5 pdt1 MKM 100.000000
5 pdt1 MKM 33.333333
5 pdt1 LPM 63.333333
5 pdt2 LPM 0.000000
5 pdt2 LPM 0.000000
5 pdt2 LPM 33.333333
5 pdt2 LPM 100.000000
What I have tried is:
df['rank']=df.groupby(['Id No','Pdt_No'])['prob'].rank(ascending=False)
Output
Id No Pdt_No code prob rank
1 pdt1 HHL 0.000000 2
1 pdt3 HHL 50.000000 1
1 pdt2 HHL 0.000000 2
1 pdt5 HHL 50.000000 1
1 pdt8 HHL 100.000000 1
1 pdt1 HHL 50.000000 1
1 pdt2 HHL 100.000000 1
3 pdt1 HHM 0.000000 2
3 pdt1 HHM 0.000000 2
3 pdt1 HHM 25.000000 1
3 pdt4 HHM 33.333333 1
3 pdt3 HHM 33.333333 1
3 pdt2 HHM 0.000000 2
3 pdt2 HHM 50.000000 1
4 pdt5 ERS 0.000000 2
4 pdt2 ERS 0.000000 3
4 pdt2 MKL 100.000000 1
4 pdt2 MKL 50.000000 2
4 pdt5 MKL 5.000000 1
5 pdt1 MKM 0.000000 4
5 pdt1 MKM 100.000000 1
5 pdt1 MKM 33.333333 3
5 pdt1 LPM 63.333333 2
5 pdt2 LPM 0.000000 3
5 pdt2 LPM 0.000000 3
5 pdt2 LPM 33.333333 2
5 pdt2 LPM 100.000000 1
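If "best code" is read as the code with the highest prob within each (Id No, Pdt_No) pair, one possible approach is to pick the row at the group-wise maximum instead of ranking. That reading is an assumption, and the sketch below uses only the Id No 4 rows from the input above to keep it short.

```python
import pandas as pd

# Rows for Id No 4, copied from the input table above.
df = pd.DataFrame({
    "Id No": [4, 4, 4, 4, 4],
    "Pdt_No": ["pdt5", "pdt2", "pdt2", "pdt2", "pdt5"],
    "code": ["ERS", "ERS", "MKL", "MKL", "MKL"],
    "prob": [0.0, 0.0, 100.0, 50.0, 5.0],
})

# Index label of the highest prob within each (Id No, Pdt_No) group.
best_idx = df.groupby(["Id No", "Pdt_No"])["prob"].idxmax()

# Select those rows; each group contributes exactly one row.
best = df.loc[best_idx].reset_index(drop=True)
print(best)
```

When several rows in a group tie for the maximum, idxmax keeps the first one it encounters, so a different tie-breaking rule would need an explicit sort beforehand.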
I would like to concatenate two data frames, and I am able to do so.
However, the date format changes automatically during the concatenation, which is unintended and needs to be resolved. I have a column called EVENT_DATE in 'YYYY-MM-DD' format, but it is being changed.
Here I load pipe-delimited sample data into a data frame:
>>>df1 = pd.read_csv('detail_trend_analysis_data.csv',delimiter='|', parse_dates=[0])
>>>df1.head()
EVENT_DATE EVENT_HOUR PRODUCT ... BONUS_VOLUME BONUS_COST RECORD_COUNT
0 2019-11-08 0 1 ... 0.0 220152.426342 287516
1 2019-11-08 0 1 ... 0.0 0.000000 3104
2 2019-11-08 0 1 ... 0.0 226544.777596 254965
3 2019-11-08 0 1 ... 0.0 0.000000 2449
4 2019-11-08 0 1 ... 0.0 0.000000 35085
[5 rows x 18 columns]
Doing the same thing for a second data frame:
>>>df2 = pd.read_csv('detail_trend_analysis_data.csv',delimiter='|', parse_dates=[0])
Changing the date
>>>df2['EVENT_DATE']='2019-11-09'
>>>df2.head()
EVENT_DATE EVENT_HOUR PRODUCT ... BONUS_VOLUME BONUS_COST RECORD_COUNT
0 2019-11-09 0 1 ... 0.0 220152.426342 287516
1 2019-11-09 0 1 ... 0.0 0.000000 3104
2 2019-11-09 0 1 ... 0.0 226544.777596 254965
3 2019-11-09 0 1 ... 0.0 0.000000 2449
4 2019-11-09 0 1 ... 0.0 0.000000 35085
[5 rows x 18 columns]
Concatenating
>>>frames=[df1,df2]
>>>df=pd.concat(frames)
>>>df.head()
EVENT_DATE EVENT_HOUR ... BONUS_COST RECORD_COUNT
0 2019-11-08 00:00:00 0 ... 220152.426342 287516
1 2019-11-08 00:00:00 0 ... 0.000000 3104
2 2019-11-08 00:00:00 0 ... 226544.777596 254965
3 2019-11-08 00:00:00 0 ... 0.000000 2449
4 2019-11-08 00:00:00 0 ... 0.000000 35085
[5 rows x 18 columns]
But at the end the date changes to 'YYYY-MM-DD HH24:MI:SS' format, which I don't want.
How can I resolve this?
What if you set df['EVENT_DATE'] = df['EVENT_DATE'].dt.date on both dataframes?
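A likely cause is that df2's EVENT_DATE became a plain string column after the assignment, so the concatenated column holds a mix of timestamps and strings. The sketch below illustrates this under that assumption; the two-row frames are made up and stand in for the CSV data, and the names mixed, bad, and out are hypothetical.

```python
import pandas as pd

# Stand-in for the parsed CSV: EVENT_DATE has a datetime dtype.
df1 = pd.DataFrame({
    "EVENT_DATE": pd.to_datetime(["2019-11-08", "2019-11-08"]),
    "EVENT_HOUR": [0, 1],
})

# Assigning a string turns the column into plain Python strings ...
mixed = df1.copy()
mixed["EVENT_DATE"] = "2019-11-09"
# ... so the concatenated column falls back to the generic object dtype.
bad = pd.concat([df1, mixed], ignore_index=True)
print(bad["EVENT_DATE"].dtype)

# Assigning a Timestamp keeps the column datetime-typed on both sides.
df2 = df1.copy()
df2["EVENT_DATE"] = pd.Timestamp("2019-11-09")
out = pd.concat([df1, df2], ignore_index=True)
print(out["EVENT_DATE"].dtype)

# Optional: plain datetime.date objects, as suggested above.
# The .dt accessor only works on a datetime-typed column.
out["EVENT_DATE_ONLY"] = out["EVENT_DATE"].dt.date
```

Note that .dt.date would raise an AttributeError on a column of strings, so the string column has to be converted (for example with pd.to_datetime) before applying it.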
I have a dataframe like this
import pandas as pd
raw_data = {'ID':['101','101','101','101','101','102','102','103'],
'Week':['W01','W02','W03','W07','W08','W01','W02','W01'],
'Orders':[15,15,10,15,15,5,10,10]}
df2 = pd.DataFrame(raw_data, columns = ['ID','Week','Orders'])
I want row-by-row percentages within groups.
How can I achieve this?
Using pct_change
df2.groupby('ID').Orders.pct_change().add(1).fillna(0)
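I find it weird that in my pandas version pct_change does not work on a groupby object, so we need to compute it per group instead:
# l was not defined in the original; assumed here to be the per-group pct_change lists, in row order
l = [g.pct_change().tolist() for _, g in df2.groupby('ID').Orders]
df2['New']=sum(l,[])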
df2.New=(df2.New+1).fillna(0)
df2
Out[606]:
ID Week Orders New
0 101 W01 15 0.000000
1 101 W02 15 1.000000
2 101 W03 10 0.666667
3 101 W07 15 1.500000
4 101 W08 15 1.000000
5 102 W01 5 0.000000
6 102 W02 10 2.000000
7 103 W01 10 0.000000
Carry out a window operation shifting the value by 1 position
df2['prev']=df2.groupby(by='ID').Orders.shift(1).fillna(0)
Calculate % change individually using apply()
df2['pct'] = df2.apply(lambda x : ((x['Orders'] - x['prev']) / x['prev']) if x['prev'] != 0 else 0,axis=1)
I am not sure if there is any default pd.pct_change() within a window.
ID Week Orders prev pct
0 101 W01 15 0.0 0.000000
1 101 W02 15 15.0 0.000000
2 101 W03 10 15.0 -0.333333
3 101 W07 15 10.0 0.500000
4 101 W08 15 15.0 0.000000
5 102 W01 5 0.0 0.000000
6 102 W02 10 5.0 1.000000
7 103 W01 10 0.0 0.000000
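The shift-then-apply steps above can also be written without a row-wise apply. This is a minimal vectorized sketch of the same calculation on the question's data; it is one possible variant, not the answerer's own code.

```python
import pandas as pd

df2 = pd.DataFrame({
    "ID": ["101", "101", "101", "101", "101", "102", "102", "103"],
    "Week": ["W01", "W02", "W03", "W07", "W08", "W01", "W02", "W01"],
    "Orders": [15, 15, 10, 15, 15, 5, 10, 10],
})

# Previous order count within each ID; NaN for the first row of a group.
prev = df2.groupby("ID")["Orders"].shift(1)

# Relative change versus the previous row; first rows become NaN, then 0.
df2["pct"] = ((df2["Orders"] - prev) / prev).fillna(0)
print(df2)
```

Unlike the apply version, this sketch does not special-case a previous value of exactly 0: a division by zero would produce inf rather than 0, so an extra replacement step would be needed if zero order counts can occur.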
I have this data set and I would like to make all boxplots of the 9 input variables to appear on the same plot, despite that they are in different scales. Could you please tell me if there is an easy way to accomplish this?
I am a novice SAS user so I would appreciate some advice. Thank you.
data raw;
input ID$ Family DistRd Cotton Maize Sorg Millet Bull Cattle Goats;
datalines;
FARM1 12 80 1.5 1 3 0.25 2 0 1
FARM2 54 8 6 4 0 1 6 32 5
FARM3 11 13 0.5 1 0 0 0 0 0
FARM4 21 13 2 2.5 1 0 1 0 5
FARM5 61 30 3 5 0 0 4 21 0
FARM6 20 70 0 2 3 0 2 0 3
FARM7 29 35 1.5 2 0 0 0 0 0
FARM8 29 35 2 3 2 0 0 0 0
FARM9 57 9 5 5 0 0 4 5 2
FARM10 23 33 2 2 1 0 2 1 7
FARM11 13 9 0.5 2 2 0 0 0 0
FARM12 15 9 2 2 2 0 0 0 0
FARM13 27 3 1.5 0 2 1 0 0 1
FARM14 28 5 2 0.5 2 2 2 0 5
FARM15 52 5 7 1 7 0 4 11 3
FARM16 12 10 2 2.5 3 0 0 0 0
FARM17 25 30 1 1 4 0 2 0 5
FARM18 5 3 1 0 1 0.5 0 0 3
FARM19 45 30 4.5 1 1 0 6 13 20
FARM20 6 7 1 1 1 1 2 0 5
FARM21 17 8 1.5 0.5 1.5 0.25 0 0 2
FARM22 22 6 3 2 3 1 3 0 2
FARM23 43 40 7 3 3 0.5 6 2 3
FARM24 66 36 0 0.5 5 5 0 0 0
FARM25 15 3 1 0 1.5 0.5 1 0 1
FARM26 26 5 2 1.5 2 2 1 0 0
FARM27 31 5 1.5 1 3 2 2 0 0
FARM28 37 2 3 2 3 5 3 0 5
FARM29 81 2 8 4 4 12 7 8 13
FARM30 14 10 0 0.5 3 1 0 0 0
FARM31 20 7 2 1 4 3 2 0 5
FARM32 26 7 2 1 2 2 2 0 2
FARM33 12 10 0.5 1 3 1 0 0 0
FARM34 18 35 4 3 3 3 4 0 0
FARM35 11 29 1 0.5 3 2 2 0 2
FARM36 50 29 5 3 5 4 4 8 4
FARM37 7 9 0 1 1 0 0 0 0
FARM38 26 9 2 1 3 0 0 0 0
FARM39 19 33 1 1.5 0 4 2 0 0
FARM40 43 33 3 3 4 7 4 3 0
FARM41 18 12 3 0 1 1 2 1 1
FARM42 64 20 3 5 2 2 4 0 6
FARM43 61 25 9 7 3 8 4 17 0
FARM44 18 3 0.5 0.5 2 2 0 0 4
FARM45 11 2 0.5 0 1.5 1.5 1 1 0
FARM46 30 3 4 2 4 0 4 2 0
FARM47 16 1.5 2 0.5 2 2 2 2 0
FARM48 46 1 0.75 1 3 2 0 0 2
FARM49 18 2 1.5 0.5 2 2 2 0 2
FARM50 81 3 12 1.5 10 8 11 14 15
FARM51 15 0 1.5 1.5 2.5 0 1 0 0
FARM52 26 11 3.5 2 4 0 2 2 2
FARM53 10 11 0 0 1.5 0 0 0 0
FARM54 40 12 5 3 6 1 8 17 10
FARM55 82 4 11 7 5 0.5 8 5 0
FARM56 40 5.5 6 4 2.5 1 3 0 2
FARM57 29 8 3 2 4 2 0 0 2
FARM58 23 5 5 4 3 1 1 0 0
FARM59 53 4 0 3 0 3 6 0 0
FARM60 57 3.5 9 8 0 0 10 23 0
FARM61 23 4 2 2 0.5 4 2 0 0
FARM62 9 31 2 2 0 2 1 0 0
FARM63 22 35 3 2 3 0 5 6 1
FARM64 25 35 3 1 2.5 0 4 8 10
FARM65 20 0 1.5 1 3 0 1 6 0
FARM66 27 41 1.1 0.25 1.5 1.5 0 3 1
FARM67 30 19 2 2 4 1 2 0 5
FARM68 77 18 8 4 6 4 6 8 6
FARM69 13 100 0.5 0.5 0 1 0 0 4
FARM70 24 100 2 3 0 0.5 3 14 10
FARM71 29 90 2 1.5 1.5 1.5 2 0 2
FARM72 57 90 10 7 0 1.5 7 8 7
;
run;
You need to transpose the values and use the group= option on the VBOX statement.
Steps
1 Sort by ID
2 Transpose the data
3 Adjust the labels for display
4 Plot with PROC SGPLOT
proc sort data=raw;
by id;
run;
proc transpose data=raw out=raw_t;
by id;
run;
data raw_t;
set raw_t;
label _name_ = "Variable";
label col1 = "Value";
run;
ods html;
title "My Box Plot";
proc sgplot data=raw_t;
vbox col1 / group=_name_ ;
run;
ods html close;
Produces: