Pandas Modify Dataframe - python-3.x

I have a dataframe as below
0 1 2 3 4 5
0 0.428519 0.000000 0.0 0.541096 0.250099 0.345604
1 0.056650 0.000000 0.0 0.000000 0.000000 0.000000
2 0.000000 0.000000 0.0 0.000000 0.000000 0.000000
3 0.849066 0.559117 0.0 0.374447 0.424247 0.586254
4 0.317644 0.000000 0.0 0.271171 0.586686 0.424560
I would like to modify it as below
0 0 0.428519
0 1 0.000000
0 2 0.0
0 3 0.541096
0 4 0.250099
0 5 0.345604
1 0 0.056650
1 1 0.000000
........

Use stack with reset_index:
df1 = df.stack().reset_index()
df1.columns = ['col1','col2','col3']
print (df1)
col1 col2 col3
0 0 0 0.428519
1 0 1 0.000000
2 0 2 0.000000
3 0 3 0.541096
4 0 4 0.250099
5 0 5 0.345604
6 1 0 0.056650
7 1 1 0.000000
8 1 2 0.000000
9 1 3 0.000000
10 1 4 0.000000
11 1 5 0.000000
12 2 0 0.000000
13 2 1 0.000000
14 2 2 0.000000
15 2 3 0.000000
16 2 4 0.000000
17 2 5 0.000000
18 3 0 0.849066
19 3 1 0.559117
20 3 2 0.000000
21 3 3 0.374447
22 3 4 0.424247
23 3 5 0.586254
24 4 0 0.317644
25 4 1 0.000000
26 4 2 0.000000
27 4 3 0.271171
28 4 4 0.586686
29 4 5 0.424560
NumPy solution: numpy.repeat builds the row labels, numpy.tile builds the column labels, and numpy.ravel flattens the values:
import numpy as np
import pandas as pd
df2 = pd.DataFrame({
"col1": np.repeat(df.index, len(df.columns)),
"col2": np.tile(df.columns, len(df.index)),
"col3": df.values.ravel()})
print (df2)
col1 col2 col3
0 0 0 0.428519
1 0 1 0.000000
2 0 2 0.000000
3 0 3 0.541096
4 0 4 0.250099
5 0 5 0.345604
6 1 0 0.056650
7 1 1 0.000000
8 1 2 0.000000
9 1 3 0.000000
10 1 4 0.000000
11 1 5 0.000000
12 2 0 0.000000
13 2 1 0.000000
14 2 2 0.000000
15 2 3 0.000000
16 2 4 0.000000
17 2 5 0.000000
18 3 0 0.849066
19 3 1 0.559117
20 3 2 0.000000
21 3 3 0.374447
22 3 4 0.424247
23 3 5 0.586254
24 4 0 0.317644
25 4 1 0.000000
26 4 2 0.000000
27 4 3 0.271171
28 4 4 0.586686
29 4 5 0.424560
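
Both answers give the same long table because numpy.ravel flattens row by row, which is the order stack uses. A minimal sketch on a made-up 2x3 frame (the values and the names long_stack and long_numpy are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Made-up 2x3 frame standing in for the one in the question.
df = pd.DataFrame([[0.4, 0.0, 0.5],
                   [0.1, 0.0, 0.0]])

# Approach 1: stack moves the column labels into an inner index level.
long_stack = df.stack().reset_index()
long_stack.columns = ["col1", "col2", "col3"]

# Approach 2: build the label columns by hand and flatten the values row by row.
long_numpy = pd.DataFrame({
    "col1": np.repeat(df.index, len(df.columns)),
    "col2": np.tile(df.columns, len(df.index)),
    "col3": df.to_numpy().ravel(),
})

# Raises AssertionError if the two results differ in values or labels.
pd.testing.assert_frame_equal(long_stack, long_numpy, check_dtype=False)
```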

Related

Trying to webscrape table data from rotogrinders. Any help would be greatly appreciated

So far I have:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import pandas as pd
driver = webdriver.Firefox()
url = r"https://rotogrinders.com/game-stats/nba-player?site=fanduel&range=yesterday"
driver.get(url)
cookies = driver.find_element_by_xpath(r'//*[@id="bc-close-cookie"]').click()
select = driver.find_element_by_xpath(r'/html/body/div[1]/div/section/div/section/div[2]/div[2]/a[2]').click()
I need to scrape the table data to a .csv file.
Any suggestions?
The click fails because of the subscription pop-up.
When this error occurs, click "No thank you" before clicking the "All" button.
Selenium is overkill here because the data is within the <script> tags. You could get it through a simple request using requests, then either a) use BeautifulSoup to pull out the <script> tag, then parse it, or b) pull it straight away using regex. I chose the latter.
The data comes in a clean JSON format, so turning it into a table is relatively easy with pandas. You can also use pandas to write it to CSV.
import requests
import pandas as pd
import re
import json
url = 'https://rotogrinders.com/game-stats/nba-player?site=fanduel&range=yesterday'
response = requests.get(url)
p = re.compile(r"var data = (\[.*\])")
result = p.search(response.text)
jsonStr = result.group(1)
jsonData = json.loads(jsonStr)
df = pd.DataFrame(jsonData)
df.to_csv('nba.csv', index=False)
Output:
print(df.to_string())
id name fpts gp fgm fga ftm fta 3pm 3pa 2pm 2pa reb ast stl blk to pts oreb pfoul tfoul ffoul min fouls dd td usg pace fg% ft% 3p% 2p% team pos player
0 947 Lance Stephenson 6.70 1 1 1 2 2 0 0 1 1 1 3 0 0 3 4 0 1 0 0 17.80 0 0 0 13.19 5 1 1 0 1 IND SCW Lance Stephenson
1 1079 Stephen Curry 58.00 1 12 27 9 9 6 16 6 11 5 8 1 0 2 39 0 3 0 0 43.95 0 0 0 32.40 33.50 0.44 1 0.38 0.55 GSW CG Stephen Curry
2 1087 Chris Paul 51.50 1 8 14 2 2 2 6 6 8 5 11 2 1 0 20 0 3 0 0 36.68 0 1 0 20.19 15 0.57 1 0.33 0.75 PHO DIS Chris Paul
3 1277 Taj Gibson 12.50 1 1 3 0 0 0 0 1 3 5 1 0 1 0 2 1 4 0 0 17.95 0 0 0 7.42 2 0.33 0 0 0.33 NYK PB Taj Gibson
4 1334 Andre Iguodala 24.00 1 1 2 2 2 0 0 1 2 5 4 0 4 4 4 2 2 0 0 31.32 0 0 0 10.47 5 0.50 1 0 0.50 GSW 3DW Andre Iguodala
5 1485 JaVale McGee 10.80 1 3 5 2 2 0 0 3 5 4 0 0 0 2 8 1 3 0 0 17.82 0 0 0 17.69 7 0.60 1 0 0.60 PHO PB JaVale McGee
6 13301 Jonas Valanciunas 39.00 1 8 11 1 2 1 4 7 7 10 2 1 2 3 18 1 5 0 0 33.03 0 1 0 18.82 14 0.73 0.50 0.25 1 NOP PB Jonas Valanciunas
7 13312 Bismack Biyombo 19.30 1 2 3 5 7 0 0 2 3 4 1 2 0 2 9 2 3 0 0 27.90 0 0 0 12.06 6.50 0.67 0.71 0 0.67 PHO PB Bismack Biyombo
8 13315 Klay Thompson 13.90 1 6 17 0 0 0 7 6 10 2 1 0 0 2 12 1 1 0 0 23.10 0 0 0 33.47 18 0.35 0 0 0.60 GSW 3DW Klay Thompson
9 13335 Kemba Walker 11.40 1 1 5 3 4 0 3 1 2 2 2 1 0 2 5 0 2 1 0 21.17 0 0 0 17.80 9 0.20 0.75 0 0.50 NYK CG Kemba Walker
10 13353 Alec Burks 19.60 1 3 9 5 5 2 6 1 3 3 2 1 0 3 13 1 3 0 0 22.58 0 0 0 26.32 13.50 0.33 1 0.33 0.33 NYK CG Alec Burks
11 13913 Evan Fournier 10.10 1 2 8 1 2 1 6 1 2 3 1 0 0 1 6 1 1 0 0 25.15 0 0 0 16.24 9 0.25 0.50 0.17 0.50 NYK SHW Evan Fournier
12 13942 Jae Crowder 21.60 1 4 9 2 2 3 6 1 3 3 0 2 0 1 13 0 3 0 0 32.65 0 0 0 13.33 11 0.44 1 0.50 0.33 PHO 3DW Jae Crowder
13 13955 Jeremy Lamb 27.40 1 2 5 8 10 2 3 0 2 2 2 2 1 1 14 1 2 0 0 18.88 0 0 0 23.43 10 0.40 0.80 0.67 0 IND SCW Jeremy Lamb
14 14077 Garrett Temple 0.20 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 10.88 0 0 0 3.68 1 0 0 0 0 NOP SHW Garrett Temple
15 14559 Justin Holiday 23.50 1 6 13 0 0 4 9 2 4 5 3 0 0 3 16 1 2 0 0 34.22 0 0 0 19.86 15 0.46 0 0.44 0.50 IND SHW Justin Holiday
16 16879 Tim Hardaway Jr. 18.50 1 4 10 0 0 3 7 1 3 5 1 0 0 0 11 0 3 0 0 28.28 0 0 0 14.61 10 0.40 0 0.43 0.33 DAL SHW Tim Hardaway Jr.
17 16943 Reggie Bullock 11.70 1 1 3 0 0 1 3 0 0 6 1 0 0 0 3 1 1 0 0 20.18 0 0 0 6.60 2 0.33 0 0.33 0 DAL 3DW Reggie Bullock
18 18566 Dwight Powell 8.60 1 2 3 2 3 0 0 2 3 3 0 0 0 1 6 2 1 0 0 14.35 0 0 0 14.83 3.50 0.67 0.67 0 0.67 DAL VB Dwight Powell
19 18620 Andrew Wiggins 22.80 1 5 15 0 1 1 6 4 9 4 2 1 1 2 11 2 5 0 0 38.03 0 0 0 19.04 15.50 0.33 0 0.17 0.44 GSW SCW Andrew Wiggins
20 18632 Julius Randle 24.40 1 1 9 2 4 0 2 1 7 7 6 1 1 3 4 2 1 1 0 29.48 0 0 0 21.36 12 0.11 0.50 0 0.14 NYK VB Julius Randle
21 18899 Kristaps Porzingis 37.70 1 7 15 2 2 2 4 5 11 11 1 0 2 1 18 2 5 0 0 32.28 0 1 0 21.33 15 0.47 1 0.50 0.45 DAL VB Kristaps Porzingis
22 18941 Cameron Payne 18.60 1 4 10 3 3 1 4 3 6 3 0 1 0 0 12 0 1 1 0 18.93 0 0 0 23.92 11.50 0.40 1 0.25 0.50 PHO DIS Cameron Payne
23 18945 Kevon Looney 36.50 1 5 6 3 4 0 0 5 6 15 3 2 0 5 13 6 3 0 1 28.17 0 1 0 19.52 7 0.83 0.75 0 0.83 GSW PB Kevon Looney
24 18949 Devin Booker 41.00 1 11 25 5 5 1 8 10 17 5 6 0 0 2 28 0 2 1 0 38.30 0 0 0 32.56 29.50 0.44 1 0.13 0.59 PHO SCW Devin Booker
25 31814 Nemanja Bjelica 22.20 1 4 5 0 0 0 1 4 4 6 2 1 1 2 8 1 1 0 0 14.13 0 0 0 21.68 6 0.80 0 0 1 GSW VF Nemanja Bjelica
26 35227 Brandon Ingram 30.00 1 4 10 6 8 1 1 3 9 5 6 1 0 3 15 0 2 0 0 26.88 0 0 0 27.53 17 0.40 0.75 1 0.33 NOP SCW Brandon Ingram
27 35995 Damion Lee 14.00 1 3 5 2 2 1 2 2 3 0 0 2 0 1 9 0 3 0 0 14 0 0 0 19.66 7 0.60 1 0.50 0.67 GSW CG Damion Lee
28 36032 Dorian Finney-Smith 14.20 1 2 6 0 0 1 5 1 1 6 2 0 0 1 5 1 4 0 0 32 0 0 0 9.57 6 0.33 0 0.20 1 DAL 3DW Dorian Finney-Smith
29 36041 Gary Payton II 18.00 1 3 5 0 1 0 2 3 3 5 0 2 0 0 6 1 1 1 0 17.40 0 0 0 12.51 4.50 0.60 0 0 1 GSW CG Gary Payton II
30 37796 Frank Ntilikina 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1.70 0 0 0 0.00 0 0 0 0 0 DAL DIS Frank Ntilikina
31 37845 Maxi Kleber 22.00 1 3 9 2 2 1 7 2 2 5 4 0 1 2 9 1 2 0 0 27.13 0 0 0 19.46 11 0.33 1 0.14 1 DAL VB Maxi Kleber
32 37861 Torrey Craig 32.90 1 5 10 0 0 2 5 3 5 7 1 3 1 1 12 0 3 0 0 34.22 0 0 0 13.24 11 0.50 0 0.40 0.60 IND 3DW Torrey Craig
33 37889 Josh Hart 34.40 1 5 10 5 9 2 4 3 6 7 4 1 0 0 17 1 4 0 0 35.28 0 0 0 17.32 13.50 0.50 0.56 0.50 0.50 NOP SHW Josh Hart
34 408821 Mitchell Robinson 37.00 1 6 7 5 10 0 0 6 7 15 0 0 1 1 17 7 3 0 0 30.05 0 1 0 16.51 6 0.86 0.50 0 0.86 NYK PB Mitchell Robinson
35 408841 Landry Shamet 4.20 1 0 2 0 0 0 2 0 0 1 0 1 0 0 0 0 3 0 0 10.07 0 0 0 7.94 2 0 0 0 0 PHO SHW Landry Shamet
36 408842 Mikal Bridges 42.60 1 5 11 2 2 0 4 5 7 8 6 4 0 0 12 2 1 0 0 37.18 0 0 0 14.91 10 0.45 1 0 0.71 PHO 3DW Mikal Bridges
37 408968 Devonte' Graham 19.10 1 5 16 1 1 4 10 1 6 3 1 0 0 1 15 0 1 0 0 28.03 0 0 0 25.36 17.50 0.31 1 0.40 0.17 NOP CG Devonte' Graham
38 408971 Luka Doncic 44.60 1 9 23 8 11 2 9 7 14 8 8 1 0 8 28 1 1 0 0 38.13 0 0 0 40.37 35.50 0.39 0.73 0.22 0.50 DAL CG Luka Doncic
39 409062 Jalen Brunson 20.50 1 8 13 2 2 1 2 7 11 5 1 0 0 6 19 2 3 0 0 34.75 0 0 0 23.26 18 0.62 1 0.50 0.64 DAL DIS Jalen Brunson
40 570141 Gary Clark 3.50 1 1 5 0 0 1 4 0 1 0 1 0 0 1 3 0 0 0 0 7.95 0 0 0 31.85 6 0.20 0 0.25 0 NOP VF Gary Clark
41 1115150 RJ Barrett 30.20 1 6 13 4 7 1 5 5 8 6 2 0 1 0 17 1 1 0 0 27.98 0 0 0 23.93 15.50 0.46 0.57 0.20 0.63 NYK SCW RJ Barrett
42 1115261 Goga Bitadze 32.30 1 5 12 2 5 1 4 4 8 9 5 0 1 2 13 4 2 2 0 31.35 0 0 0 22.78 12.50 0.42 0.40 0.25 0.50 IND VB Goga Bitadze
43 1115443 Jaxson Hayes 8.90 1 3 4 0 0 0 0 3 4 2 1 0 0 1 6 0 5 0 0 16.47 0 0 0 12.94 5 0.75 0 0 0.75 NOP PB Jaxson Hayes
44 1115445 Nickeil Alexander-Walker 13.60 1 1 4 2 2 0 0 1 4 3 4 0 0 0 4 0 3 0 0 24.42 0 0 0 10.16 5 0.25 1 0 0.25 NOP CG Nickeil Alexander-Walker
45 1115568 Jordan Poole 13.40 1 1 7 3 4 0 5 1 2 2 4 0 0 0 5 0 2 0 0 24.67 0 0 0 16.34 9 0.14 0.75 0 0.50 GSW CG Jordan Poole
46 1115625 Cam Johnson 19.90 1 2 7 2 2 1 6 1 1 7 1 1 0 0 7 0 2 0 0 20.47 0 0 0 16.04 8 0.29 1 0.17 1 PHO SHW Cam Johnson
47 1311751 Oshae Brissett 12.70 1 0 7 2 4 0 3 0 4 6 1 1 0 1 2 0 1 0 0 21.98 0 0 0 18.36 10 0 0.50 0 0 IND SHW Oshae Brissett
48 1333886 Juan Toscano-Anderson 15.30 1 2 5 0 0 1 3 1 2 4 1 2 0 2 5 1 4 0 0 14.87 0 0 0 19.72 6 0.40 0 0.33 0.50 GSW SHW Juan Toscano-Anderson
49 2439136 Obi Toppin 5.40 1 0 1 0 0 0 1 0 0 2 0 0 1 0 0 0 1 0 0 18.52 0 0 0 2.16 1 0 0 0 0 NYK VB Obi Toppin
50 2439366 Josh Green 4.70 1 1 3 0 0 0 0 1 3 1 1 0 0 0 2 0 1 0 0 11.18 0 0 0 11.91 3 0.33 0 0 0.33 DAL 3DW Josh Green
51 2439514 Immanuel Quickley 31.30 1 4 13 4 4 2 9 2 4 4 5 3 0 4 14 0 1 0 0 23.28 0 0 0 35.07 19 0.31 1 0.22 0.50 NYK SHW Immanuel Quickley
52 3005116 Quentin Grimes 16.20 1 5 9 0 0 3 6 2 3 1 2 0 0 1 13 1 4 0 0 23.83 0 0 0 17.89 9 0.56 0 0.50 0.67 NYK 3DW Quentin Grimes
53 3005227 Isaiah Jackson 33.90 1 5 12 5 8 0 1 5 11 7 1 3 0 0 15 3 5 0 0 18.63 0 0 0 34.03 13 0.42 0.63 0 0.45 IND VF Isaiah Jackson
54 3005228 Chris Duarte 48.90 1 10 16 5 5 2 3 8 13 7 3 3 0 0 27 2 3 0 0 38.53 0 0 0 19.92 16.50 0.63 1 0.67 0.62 IND SCW Chris Duarte
55 3005417 Herb Jones 28.80 1 5 7 0 0 1 3 4 4 4 4 3 0 2 11 1 3 0 0 37.08 0 0 0 11.13 8 0.71 0 0.33 1 NOP 3DW Herb Jones
56 3007111 Jonathan Kuminga 14.50 1 0 0 5 6 0 0 0 0 5 3 0 0 1 5 1 3 1 0 15.37 0 0 0 12.05 3 0 0.83 0 0 GSW VF Jonathan Kuminga
57 3014026 Keifer Sykes 17.30 1 4 12 0 0 2 5 2 7 4 3 0 0 2 10 2 3 0 0 31.02 0 0 0 19.33 12 0.33 0 0.40 0.29 IND DIS Keifer Sykes
58 3015595 Duane Washington 8.50 1 3 7 0 0 2 2 1 5 0 1 0 0 1 8 0 1 0 0 18.37 0 0 0 18.14 8 0.43 0 1 0.20 IND SHW Duane Washington
59 3015773 Jose Alvarado 31.00 1 6 9 0 0 1 2 5 7 0 4 4 0 0 13 0 3 0 0 19.97 0 0 0 20.67 9 0.67 0 0.50 0.71 NOP DIS Jose Alvarado
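
The regex step from the answer above can be tried offline before hitting the site. This is only a sketch: the html string is a made-up fragment, and the real page's markup may differ or change. It also guards against the pattern not matching, which the original code does not.

```python
import json
import re

import pandas as pd

# Hypothetical page fragment; the real page embeds a much larger array.
html = ('<script>var data = [{"name": "Player A", "fpts": 6.7}, '
        '{"name": "Player B", "fpts": 58.0}];</script>')

# Capture everything from the opening "[" to the last "]" on the same line.
pattern = re.compile(r"var data = (\[.*\])")
match = pattern.search(html)
if match is None:
    raise ValueError("'var data = [...]' not found; the page layout may have changed")

records = json.loads(match.group(1))  # list of dicts, one per player
df = pd.DataFrame(records)
print(df)
```

Checking for a failed match gives a clear error message instead of an AttributeError on result.group(1).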

Find the Rank value based on the several columns in Pandas dataframe

My objective is to find the best code in the dataframe. For each Id No and Pdt_No, I have to find the best code using the prob value. I tried the rank method, but it gives an overall rank rather than one specific to each Id No.
Input
Id No Pdt_No code prob
1 pdt1 HHL 0.000000
1 pdt3 HHL 50.000000
1 pdt2 HHL 0.000000
1 pdt5 HHL 50.000000
1 pdt8 HHL 100.000000
1 pdt1 HHL 50.000000
1 pdt2 HHL 100.000000
3 pdt1 HHM 0.000000
3 pdt1 HHM 0.000000
3 pdt1 HHM 25.000000
3 pdt4 HHM 33.333333
3 pdt3 HHM 33.333333
3 pdt2 HHM 0.000000
3 pdt2 HHM 50.000000
4 pdt5 ERS 0.000000
4 pdt2 ERS 0.000000
4 pdt2 MKL 100.000000
4 pdt2 MKL 50.000000
4 pdt5 MKL 5.000000
5 pdt1 MKM 0.000000
5 pdt1 MKM 100.000000
5 pdt1 MKM 33.333333
5 pdt1 LPM 63.333333
5 pdt2 LPM 0.000000
5 pdt2 LPM 0.000000
5 pdt2 LPM 33.333333
5 pdt2 LPM 100.000000
what I have tried is
df['rank']=df.groupby(['Id No','Pdt_No'])['prob'].rank(ascending=False)
output
Id No Pdt_No code prob rank
1 pdt1 HHL 0.000000 2
1 pdt3 HHL 50.000000 1
1 pdt2 HHL 0.000000 2
1 pdt5 HHL 50.000000 1
1 pdt8 HHL 100.000000 1
1 pdt1 HHL 50.000000 1
1 pdt2 HHL 100.000000 1
3 pdt1 HHM 0.000000 2
3 pdt1 HHM 0.000000 2
3 pdt1 HHM 25.000000 1
3 pdt4 HHM 33.333333 1
3 pdt3 HHM 33.333333 1
3 pdt2 HHM 0.000000 2
3 pdt2 HHM 50.000000 1
4 pdt5 ERS 0.000000 2
4 pdt2 ERS 0.000000 3
4 pdt2 MKL 100.000000 1
4 pdt2 MKL 50.000000 2
4 pdt5 MKL 5.000000 1
5 pdt1 MKM 0.000000 4
5 pdt1 MKM 100.000000 1
5 pdt1 MKM 33.333333 3
5 pdt1 LPM 63.333333 2
5 pdt2 LPM 0.000000 3
5 pdt2 LPM 0.000000 3
5 pdt2 LPM 33.333333 2
5 pdt2 LPM 100.000000 1
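
One possible approach, assuming the goal is a rank within each (Id No, Pdt_No) group where tied prob values share a rank: pass method='dense' to rank, since the default method='average' gives tied rows fractional ranks. The sketch below uses a few rows copied from the input above. The name best and the use of idxmax to pick the top row per group are illustrative assumptions, not part of the question.

```python
import pandas as pd

# A few rows taken from the question's input.
df = pd.DataFrame({
    "Id No":  [1, 1, 1, 1, 4, 4, 4],
    "Pdt_No": ["pdt1", "pdt2", "pdt1", "pdt2", "pdt2", "pdt2", "pdt2"],
    "code":   ["HHL", "HHL", "HHL", "HHL", "ERS", "MKL", "MKL"],
    "prob":   [0.0, 0.0, 50.0, 100.0, 0.0, 100.0, 50.0],
})

groups = df.groupby(["Id No", "Pdt_No"])["prob"]

# Dense rank inside each group: highest prob gets 1, ties share a rank.
df["rank"] = groups.rank(method="dense", ascending=False).astype(int)

# Row with the highest prob in each group (first one wins on ties).
best = df.loc[groups.idxmax()]
print(df)
print(best)
```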

Date Format Changes while concatenating frames in Pandas Python

I would like to concatenate two frames, and I can do so.
However, the date format changes automatically when I do, which is unintended and needs to be resolved. I have a column called EVENT_DATE in 'YYYY-MM-DD' format, but it is being changed.
Here I load sample pipe-delimited data into a data frame:
>>>df1 = pd.read_csv('detail_trend_analysis_data.csv',delimiter='|', parse_dates=[0])
>>>df1.head()
EVENT_DATE EVENT_HOUR PRODUCT ... BONUS_VOLUME BONUS_COST RECORD_COUNT
0 2019-11-08 0 1 ... 0.0 220152.426342 287516
1 2019-11-08 0 1 ... 0.0 0.000000 3104
2 2019-11-08 0 1 ... 0.0 226544.777596 254965
3 2019-11-08 0 1 ... 0.0 0.000000 2449
4 2019-11-08 0 1 ... 0.0 0.000000 35085
[5 rows x 18 columns]
Doing Same thing
>>>df2 = pd.read_csv('detail_trend_analysis_data.csv',delimiter='|', parse_dates=[0])
Changing the date
>>>df2['EVENT_DATE']='2019-11-09'
>>>df2.head()
EVENT_DATE EVENT_HOUR PRODUCT ... BONUS_VOLUME BONUS_COST RECORD_COUNT
0 2019-11-09 0 1 ... 0.0 220152.426342 287516
1 2019-11-09 0 1 ... 0.0 0.000000 3104
2 2019-11-09 0 1 ... 0.0 226544.777596 254965
3 2019-11-09 0 1 ... 0.0 0.000000 2449
4 2019-11-09 0 1 ... 0.0 0.000000 35085
[5 rows x 18 columns]
Concatenating
>>>frames=[df1,df2]
>>>df=pd.concat(frames)
>>>df.head()
EVENT_DATE EVENT_HOUR ... BONUS_COST RECORD_COUNT
0 2019-11-08 00:00:00 0 ... 220152.426342 287516
1 2019-11-08 00:00:00 0 ... 0.000000 3104
2 2019-11-08 00:00:00 0 ... 226544.777596 254965
3 2019-11-08 00:00:00 0 ... 0.000000 2449
4 2019-11-08 00:00:00 0 ... 0.000000 35085
[5 rows x 18 columns]
But in the end the format changes to 'YYYY-MM-DD HH24:MI:SS', which I don't want.
How to resolve this?
What if you set df['EVENT_DATE'] = df['EVENT_DATE'].dt.date on both dataframes?
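
A likely cause is that df2['EVENT_DATE']='2019-11-09' stores a plain string, so df2's column is no longer datetime-typed. Concatenating it with df1's datetime column then gives a mixed object column, where timestamps print with their time part. A minimal sketch that keeps both columns datetime-typed instead, using a made-up two-row frame in place of the file from the question (EVENT_DATE_STR is a hypothetical column name):

```python
import pandas as pd

# Made-up stand-in for the frame read from the file in the question.
df1 = pd.DataFrame({
    "EVENT_DATE": pd.to_datetime(["2019-11-08", "2019-11-08"]),
    "RECORD_COUNT": [287516, 3104],
})

df2 = df1.copy()
# Assign a Timestamp, not the string '2019-11-09', so the dtype stays datetime.
df2["EVENT_DATE"] = pd.Timestamp("2019-11-09")

combined = pd.concat([df1, df2], ignore_index=True)
print(combined.dtypes)
print(combined)

# If plain 'YYYY-MM-DD' strings are needed (for example, for export):
combined["EVENT_DATE_STR"] = combined["EVENT_DATE"].dt.strftime("%Y-%m-%d")
```

Note that .dt.date, as suggested above, only works on a datetime-typed column, so it would fail on df2 after the string assignment.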

pandas group by row wise conditions

I have a dataframe like this
import pandas as pd
raw_data = {'ID':['101','101','101','101','101','102','102','103'],
'Week':['W01','W02','W03','W07','W08','W01','W02','W01'],
'Orders':[15,15,10,15,15,5,10,10]}
df2 = pd.DataFrame(raw_data, columns = ['ID','Week','Orders'])
I want row-by-row percentages within each group.
How can I achieve this?
Using pct_change:
df2.groupby('ID').Orders.pct_change().add(1).fillna(0)
I find it weird that in my pandas version pct_change does not work on a groupby object, so we need to do it this way:
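# Assumed definition of l (not shown in the original answer): one list of pct_change values per ID
l = [g.pct_change().tolist() for _, g in df2.groupby('ID').Orders]
df2['New']=sum(l,[])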
df2.New=(df2.New+1).fillna(0)
df2
Out[606]:
ID Week Orders New
0 101 W01 15 0.000000
1 101 W02 15 1.000000
2 101 W03 10 0.666667
3 101 W07 15 1.500000
4 101 W08 15 1.000000
5 102 W01 5 0.000000
6 102 W02 10 2.000000
7 103 W01 10 0.000000
Carry out a window operation shifting the value by 1 position
df2['prev']=df2.groupby(by='ID').Orders.shift(1).fillna(0)
Calculate % change individually using apply()
df2['pct'] = df2.apply(lambda x : ((x['Orders'] - x['prev']) / x['prev']) if x['prev'] != 0 else 0,axis=1)
I am not sure if there is any default pd.pct_change() within a window.
ID Week Orders prev pct
0 101 W01 15 0.0 0.000000
1 101 W02 15 15.0 0.000000
2 101 W03 10 15.0 -0.333333
3 101 W07 15 10.0 0.500000
4 101 W08 15 15.0 0.000000
5 102 W01 5 0.0 0.000000
6 102 W02 10 5.0 1.000000
7 103 W01 10 0.0 0.000000
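
On pandas versions where pct_change is available on a groupby object, the pct column above can be reproduced without shift and apply. This is a sketch under that assumption. The first row of each ID has no previous value, so pct_change returns NaN there and fillna(0) turns it into 0.

```python
import pandas as pd

raw_data = {
    "ID": ["101", "101", "101", "101", "101", "102", "102", "103"],
    "Week": ["W01", "W02", "W03", "W07", "W08", "W01", "W02", "W01"],
    "Orders": [15, 15, 10, 15, 15, 5, 10, 10],
}
df2 = pd.DataFrame(raw_data, columns=["ID", "Week", "Orders"])

# Change relative to the previous row within the same ID.
df2["pct"] = df2.groupby("ID")["Orders"].pct_change().fillna(0)
print(df2)
```

Unlike the apply version, this does not special-case a previous value of 0: a zero followed by a nonzero value would yield inf rather than 0.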

Multiple boxplots in SAS

I have this data set and I would like to make all boxplots of the 9 input variables to appear on the same plot, despite that they are in different scales. Could you please tell me if there is an easy way to accomplish this?
I am a novice SAS user so I would appreciate some advice. Thank you.
data raw;
input ID$ Family DistRd Cotton Maize Sorg Millet Bull Cattle Goats;
datalines;
FARM1 12 80 1.5 1 3 0.25 2 0 1
FARM2 54 8 6 4 0 1 6 32 5
FARM3 11 13 0.5 1 0 0 0 0 0
FARM4 21 13 2 2.5 1 0 1 0 5
FARM5 61 30 3 5 0 0 4 21 0
FARM6 20 70 0 2 3 0 2 0 3
FARM7 29 35 1.5 2 0 0 0 0 0
FARM8 29 35 2 3 2 0 0 0 0
FARM9 57 9 5 5 0 0 4 5 2
FARM10 23 33 2 2 1 0 2 1 7
FARM11 13 9 0.5 2 2 0 0 0 0
FARM12 15 9 2 2 2 0 0 0 0
FARM13 27 3 1.5 0 2 1 0 0 1
FARM14 28 5 2 0.5 2 2 2 0 5
FARM15 52 5 7 1 7 0 4 11 3
FARM16 12 10 2 2.5 3 0 0 0 0
FARM17 25 30 1 1 4 0 2 0 5
FARM18 5 3 1 0 1 0.5 0 0 3
FARM19 45 30 4.5 1 1 0 6 13 20
FARM20 6 7 1 1 1 1 2 0 5
FARM21 17 8 1.5 0.5 1.5 0.25 0 0 2
FARM22 22 6 3 2 3 1 3 0 2
FARM23 43 40 7 3 3 0.5 6 2 3
FARM24 66 36 0 0.5 5 5 0 0 0
FARM25 15 3 1 0 1.5 0.5 1 0 1
FARM26 26 5 2 1.5 2 2 1 0 0
FARM27 31 5 1.5 1 3 2 2 0 0
FARM28 37 2 3 2 3 5 3 0 5
FARM29 81 2 8 4 4 12 7 8 13
FARM30 14 10 0 0.5 3 1 0 0 0
FARM31 20 7 2 1 4 3 2 0 5
FARM32 26 7 2 1 2 2 2 0 2
FARM33 12 10 0.5 1 3 1 0 0 0
FARM34 18 35 4 3 3 3 4 0 0
FARM35 11 29 1 0.5 3 2 2 0 2
FARM36 50 29 5 3 5 4 4 8 4
FARM37 7 9 0 1 1 0 0 0 0
FARM38 26 9 2 1 3 0 0 0 0
FARM39 19 33 1 1.5 0 4 2 0 0
FARM40 43 33 3 3 4 7 4 3 0
FARM41 18 12 3 0 1 1 2 1 1
FARM42 64 20 3 5 2 2 4 0 6
FARM43 61 25 9 7 3 8 4 17 0
FARM44 18 3 0.5 0.5 2 2 0 0 4
FARM45 11 2 0.5 0 1.5 1.5 1 1 0
FARM46 30 3 4 2 4 0 4 2 0
FARM47 16 1.5 2 0.5 2 2 2 2 0
FARM48 46 1 0.75 1 3 2 0 0 2
FARM49 18 2 1.5 0.5 2 2 2 0 2
FARM50 81 3 12 1.5 10 8 11 14 15
FARM51 15 0 1.5 1.5 2.5 0 1 0 0
FARM52 26 11 3.5 2 4 0 2 2 2
FARM53 10 11 0 0 1.5 0 0 0 0
FARM54 40 12 5 3 6 1 8 17 10
FARM55 82 4 11 7 5 0.5 8 5 0
FARM56 40 5.5 6 4 2.5 1 3 0 2
FARM57 29 8 3 2 4 2 0 0 2
FARM58 23 5 5 4 3 1 1 0 0
FARM59 53 4 0 3 0 3 6 0 0
FARM60 57 3.5 9 8 0 0 10 23 0
FARM61 23 4 2 2 0.5 4 2 0 0
FARM62 9 31 2 2 0 2 1 0 0
FARM63 22 35 3 2 3 0 5 6 1
FARM64 25 35 3 1 2.5 0 4 8 10
FARM65 20 0 1.5 1 3 0 1 6 0
FARM66 27 41 1.1 0.25 1.5 1.5 0 3 1
FARM67 30 19 2 2 4 1 2 0 5
FARM68 77 18 8 4 6 4 6 8 6
FARM69 13 100 0.5 0.5 0 1 0 0 4
FARM70 24 100 2 3 0 0.5 3 14 10
FARM71 29 90 2 1.5 1.5 1.5 2 0 2
FARM72 57 90 10 7 0 1.5 7 8 7
;
run;
You need to transpose the values and use the group= option of the VBOX statement.
Steps:
1. Sort by ID
2. Transpose the data
3. Adjust the labels for display
4. Plot with PROC SGPLOT
proc sort data=raw;
by id;
run;
proc transpose data=raw out=raw_t;
by id;
run;
data raw_t;
set raw_t;
label _name_ = "Variable";
label col1 = "Value";
run;
ods html;
title "My Box Plot";
proc sgplot data=raw_t;
vbox col1 / group=_name_ ;
run;
ods html close;