How to set a benchmark based on data - statistics

I have a huge dataset of people's daily productivity over a period of time, let's say from Jan 2022 to Aug 2022. Suppose you have to identify a standard productivity level that all of them should achieve, one that is nominal and achievable based on the data. What test would you use, and how would you identify the standard benchmark?

Related

Daily anomalies using climate data operator (CDO) in Cygwin (Windows 10)

I'm a complete beginner with Cygwin and CDO, both of which have been installed on Windows 10. I'm working with 3 variables from ERA5-Land hourly data: 2m temperature, total precipitation and runoff. Some facts about these variables:
All three variables are in netCDF format.
2m temperature: contains hourly values; its units are Kelvin.
total precipitation and runoff: contain hourly values; their units are depth in metres.
I want to obtain daily anomalies for 2017 relative to a 30-year period (1981-2010). This post gave me a general idea of what to do, but I'm not sure how to replicate it. Intuitively, I think these would be the steps:
Convert units according to each var (e.g. K to C for 2m temperature, metres to mm for total precipitation)
Convert data from hourly to daily values
Obtain mean values for 2017 data and 1981-2010 data
Subtract: 30-year mean values minus 2017 mean value
Download the file containing 2017 anomalies
Not sure about the order of procedures.
What would the code look like in the Cygwin terminal?
Before you start, I would strongly recommend abandoning Cygwin and installing the Linux subsystem under Windows (i.e. not a parallel boot). A quick search will show that it is very easy to install Ubuntu directly within Windows itself. That way you can open a Linux terminal and easily install anything you want with sudo apt install, e.g.
sudo apt install cdo
Once you have done that, to answer some of your questions:
Convert units according to each var (e.g. K to C for 2m temperature, metres to mm for total precipitation)
e.g. to convert temperature:
cdo subc,273.15 in.nc out.nc
Similarly for rain, using mulc. [Recall that this doesn't change the "units" metadata; you need to update that separately, e.g. with nco.]
Convert data from hourly to daily values
For instantaneous fields like temperature:
cdo daymean in.nc daymean.nc
For flux fields (like rain):
cdo daymean -shifttime,-1hour in.nc raindaymean.nc
Obtain mean values for 2017 data and 1981-2010 data.
cdo selyear,2017 -yearmean in.nc year2017_anom.nc
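Subtract: 30-year mean values minus 2017 mean value
Usually you want to do this the other way round, no? That is, 2017 minus the long-term mean, so you can see whether it is warmer or cooler. (cdo sub needs an output file as its last argument; anom.nc below is just a placeholder name.)
cdo sub year2017_anom.nc -timmean alldata_daymean.nc anom.nc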
Download the file containing 2017 anomalies
I don't understand this question: haven't you already downloaded the hourly data from the CDS platform? It only makes sense if you are using the CDS toolbox, which doesn't seem to be the case. Anyway, if the downloading step is not clear, you can take a look at my video on this topic on my YouTube channel here: https://www.youtube.com/watch?v=AXG97K6NYD8&t=469s
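To make the arithmetic behind these steps concrete, here is a minimal Python sketch on plain lists. It is not CDO and does not read netCDF files; the function names, the toy temperatures and the long-term mean of 11.0 are all assumptions made for illustration. It converts Kelvin to Celsius, averages hourly values into daily means, and computes the anomaly as target minus long-term mean.

```python
from statistics import mean

def kelvin_to_celsius(values_k):
    """Same arithmetic as `cdo subc,273.15`."""
    return [v - 273.15 for v in values_k]

def daily_means(hourly, hours_per_day=24):
    """Average consecutive blocks of hourly values into one value per day."""
    return [mean(hourly[i:i + hours_per_day])
            for i in range(0, len(hourly), hours_per_day)]

def anomaly(target, climatology_mean):
    """Target minus long-term mean, so positive means warmer than normal."""
    return [t - climatology_mean for t in target]

# Toy data (hypothetical): two days of hourly temperatures in Kelvin.
hourly_k = [283.15] * 24 + [285.15] * 24
daily_c = daily_means(kelvin_to_celsius(hourly_k))

# Hypothetical long-term mean in Celsius, standing in for 1981-2010.
anoms = anomaly(daily_c, climatology_mean=11.0)
print([round(a, 2) for a in anoms])
```

The order matters only in that the unit conversion and the daily averaging must be applied identically to both the 2017 data and the reference period before subtracting.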

How to extract a particular element/text inside an HTML using Python 3.x

This is my code:
import requests
from bs4 import BeautifulSoup
r=requests.get('https://www.morningstar.com/stocks/xtse/enb/quote')
c=r.content
soup=BeautifulSoup(c,"html.parser")
print(soup.prettify())
for item in soup.find("byId: {}".text):
    print(item.text)
When I run that, the very bottom of the printed HTML shows:
window.__NUXT__=(function(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,A,B,C,D,E,F,G,H,I,J,K,L,M,N){J.id=1014028;J.title="Enbridge Expects a Rebound in 2021, Increases Dividend by 3%";J.deck=y;J.locale=K;J.publishedDate=L;J.updatedDate=L;J.paywalled=b;J.authors=[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}];J.authorDisclosure=[];J.body=[{type:w,contentObject:[{type:x,content:"Wide-moat Enbridge announced 2021 full-year adjusted EBITDA guidance of CAD 13.9 billion-CAD 14.3 billion at its annual investor day on Dec. 8, which is above our previous forecasts. The midpoint of the guidance also implies a 6% increase from 2020 expected EBITDA. Enbridge believes that the increased performance will be driven by a recovery of Mainline volumes and the associated downstream pipelines; customer growth in the gas utilities business; rate increases on its gas pipelines; and the impact of new projects, including the Line 3 replacement. At this point, the Mainline’s heavy oil capacity is full, and demand for light capacity continues to increase. 
Accordingly, Enbridge expects first-quarter 2021 volumes to average 2.7 million barrels of oil per day, which is also above our previous expectations and compares favorably with 2.56 mmbbl\u002Fd in third-quarter 2020.",gated:a}],gated:a}];M.name=H;M.performanceId=A;M.secId=A;M.ticker=N;M.exchange=I;M.type=F;return {layout:"default",data:[{marketPrice:{value:43.55,filtered:a,date:{value:v,filtered:a}},premiumDiscount:{value:"-92.6",filtered:b,date:{value:v,filtered:a},text:{value:"Discount",filtered:a},type:{value:D,filtered:a}},threeStarRatingPrice:{value:"86.9",filtered:b},headquarterAddress1:{value:"425-1st Street SW",filtered:a},headquarterAddress2:{value:"Suite 200, Fifth Avenue Place",filtered:a},industry:{value:"Oil & Gas Midstream",filtered:a},stockStarRating:{value:"6",filtered:b,date:{value:v,filtered:a},text:{value:"Undervalued",filtered:a},type:{value:D,filtered:a}},fiscalYearEndDate:{value:"2020-12-31",filtered:a},headquarterState:{value:"AB",filtered:a},reportDate:{value:"2020-09-30",filtered:a},headquarterPostalCode:{value:"T2P 3L8",filtered:a},stewardshipRating:{value:"Bkwvsknl",filtered:b,date:{value:v,filtered:a}},companyProfile:{value:"Enbridge is an energy generation, distribution, and transportation company in the U.S. and Canada. Its pipeline network consists of the Canadian Mainline system, regional oil sands pipelines, and natural gas pipelines. The company also owns and operates a regulated natural gas utility and Canada’s largest natural gas distribution company. 
Additionally, Enbridge generates renewable and alternative energy with 2,000 megawatts of capacity.",filtered:a},fairValue:{value:"23.3",filtered:b,date:{value:"2020-12-08",filtered:a},type:{value:D,filtered:a}},fax:{value:"+1 403 231-5929",filtered:a},fiveStarRatingPrice:{value:"58.3",filtered:b},sector:{value:"Energy",filtered:a},economicMoat:{value:"Nsfq",filtered:b,date:{value:v,filtered:a}},ticker:N,website:{value:"https:\u002F\u002Fwww.enbridge.com",filtered:a},headquarterCountry:{value:"Canada",filtered:a},contactEmail:{value:"investor.relations#enbridge.com",filtered:a},fourStarRatingPrice:{value:"77.0",filtered:b},economicMoatTrend:{value:"Zcbqwh",filtered:b,date:{value:v,filtered:a}},headquarterCity:{value:"Calgary",filtered:a},phone:{value:"+1 403 231-3900",filtered:a},universe:{value:"EQ",filtered:a},exchange:I,totalEmployees:{value:11300,filtered:a},twoStarRatingPrice:{value:"54.91",filtered:b},fairValueUncertainty:{value:"Nnczrm",filtered:b},name:H,performanceId:A,secId:A,type:F,articles:[{title:"Epic Oil Crash Sets Up Brutal Downturn for Energy Sector",link:"\u002Farticles\u002F980350\u002Fepic-oil-crash-sets-up-brutal-downturn-for-energy-sector",caption:"But recovery is inevitable, and stocks look very cheap--just watch out for bankruptcy risk.",author:"Preston Caldwell",label:C,isVideo:a},{title:"Enbridge's Sell-Off Looks Exaggerated",link:"\u002Farticles\u002F978902\u002Fenbridges-sell-off-looks-exaggerated",caption:"The market is underestimating long-term cash flows once oil prices normalize.",author:c,label:z,isVideo:a},{title:"Executive Orders More Symbolic Than Material for Pipelines",link:"\u002Farticles\u002F923945\u002Fexecutive-orders-more-symbolic-than-material-for-pipelines",caption:"We're not changing our outlook for our midstream coverage.",author:"Stephen Ellis",label:C,isVideo:a},{title:"New Permit Moves Keystone XL Forward",link:"\u002Farticles\u002F922171\u002Fnew-permit-moves-keystone-xl-forward",caption:"Our fair value estimates 
for TransCanada and Enbridge are unchanged.",author:c,label:z,isVideo:a},{title:"Enbridge Hikes Dividend, Remains Deeply Undervalued",link:"\u002Farticles\u002F904922\u002Fenbridge-hikes-dividend-remains-deeply-undervalued",caption:"It has a wide moat and an attractive yield, and now's the time to invest.",author:c,label:z,isVideo:a},{title:"Concerns About Enbridge's Dividend Are Overblown",link:"\u002Farticles\u002F870105\u002Fconcerns-about-enbridges-dividend-are-overblown",caption:"The wide-moat company is on course to boost its dividend and offers hefty upside.",author:c,label:z,isVideo:a},{title:"Enbridge's Growth Portfolio Is Underappreciated",link:"\u002Farticles\u002F841990\u002Fenbridges-growth-portfolio-is-underappreciated",caption:"And the company continues to reward investors with annual dividend growth.",author:c,label:z,isVideo:a},{title:"Enbridge's Economic Moat Widens",link:"\u002Farticles\u002F565094\u002Fenbridges-economic-moat-widens",caption:"Shifting economics and supply dynamics provide growth opportunities.",author:"David McColl",label:C,isVideo:a}],analysis:{id:1014019,title:"Enbridge Increases Its Dividend by 3%",locale:K,publishedDate:B,updatedDate:B,paywalled:b,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],authorDisclosure:[],body:[],pillars:{investmentThesis:{title:"Business Strategy and Outlook",publishedDate:E,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Enbridge is an energy distribution and transportation company in the United States and Canada. It operates crude and natural gas pipelines, including the Canadian Mainline system. 
It also owns and operates Canada's largest natural gas distribution company.",gated:a}],gated:a}]},moat:{title:"Economic Moat",publishedDate:E,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Midstream companies process, transport, and store natural gas, natural gas liquids, crude oil, and refined products. There are multiple ways for midstream companies to build moats, but efficient scale is the dominant source. Hydrocarbons are produced and consumed in different places and in different forms from how they come out of the ground. Midstream firms transport and process hydrocarbons. Once a transport route is established, there's usually little need to build a competing route. Doing so would drive returns for both routes below the cost of capital. Thus, pipelines are generally moaty because they efficiently serve markets of limited size.",gated:a}],gated:a}]},managementAndStewardship:{title:"Stewardship",publishedDate:B,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"President and CEO Al Monaco has been with Enbridge since 1995, serving in his current role since 2012. During his tenure at Enbridge, Monaco has experience in all business segments, international business development, corporate planning, and finance. 
His experience in various business segments, corporate development, growth projects, and finance positions him to successfully lead the proposed pipeline and gas distribution growth projects.",gated:a}],gated:a}]},enterpriseRisk:{title:"Risk and Uncertainty",publishedDate:E,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Enbridge’s profitability is not directly tied to commodity prices, as pipeline transportation costs are not tied to the price of natural gas and crude oil. However, the cyclical supply and demand nature of commodities and related pricing can have an indirect impact on the business as shippers may choose to accelerate or delay certain projects. This can affect the timing for the demand of transportation services and\u002For new gas pipeline infrastructure.",gated:a}],gated:a}]},valuation:{title:"Fair Value and Profit Drivers",publishedDate:B,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Our fair value estimate of $43 (CAD 57) per share is based on a discounted cash flow model. We believe that Enbridge’s broad network of midstream assets and geographic diversification will serve it well in the low oil and gas price environment, and crude and natural gas pipeline expansions in growing regions will fuel EBITDA growth. 
Our cash flow forecasts incorporate the addition of the Line 3 replacement pipeline, but we adjusted our Canadian fair value downward to a reflect a risk-weighted probability of 80% that the pipeline is built.",gated:a}],gated:a}]},notes:J},notes:J},listedCurrency:{value:"CAD",filtered:a}},{}],error:y,state:{history:{currentRoute:"\u002Fstocks\u002Fxtse\u002Fenb\u002Fquote",previousRoute:y,returnRoute:y},ids:{byTicker:{"ST::XTSE::ENB":M},byId:{"0P0000681O":M}},markets:{movers:{gainers:[],losers:[],actives:[]},quotes:{},trailingReturns:{},intradayTimeSeries:{},lastRefreshed:y},player:{nowPlaying:y},siteAlert:{message:G,type:G},user:{userType:"visitor",isAdvisor:a,contentType:"e7FDDltrTy+tA2HnLovvGL0LFMwT+KkEptGju5wXVTU="}},serverRendered:b,serverDate:new Date(1607916811524)}}(false,true,"Joe Gemino","https:\u002F\u002Fim.mstar.com\u002FContent\u002FCMSImages\u002F78x78\u002F2008-jgemino-78x78.jpg","Joe Gemino, CPA, is a senior equity analyst for Morningstar.","Joe Gemino, CPA","Senior Equity Analyst","2008","MST50","SGDLX","FB","TRRNX","DODIX","DODGX","MORN","MOAT","AAPL","V","T","DIS","DODBX","2020-12-11","p","text",null,"Stock Strategist","0P0000681O","2020-12-08T18:13:00Z","Stock Strategist Industry Reports","Qual","2020-04-16T14:59:00Z","ST","","Enbridge Inc","XTSE",{},"en-US","2020-12-08T18:28:00Z",{},"ENB"));
My question: how do I extract the information inside "byId:" so that print(item.text) gives me only "0P0000681O"?
If you only need to get a value, use a string find.
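from simplified_scrapy import req

# Download the page source as a string
html = req.get('https://www.morningstar.com/stocks/xtse/enb/quote')
# Cut everything up to and including the marker that precedes the ID
marker = 'byId:{"'
start = html.find(marker)
html = html[start + len(marker):]
# The ID ends at the closing quote before the colon
end = html.find('":')
print(html[:end])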
Result:
0P0000681O
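The same idea can be sketched with only the standard library, which also makes it easy to handle the case where the marker is missing. The function names below are hypothetical, and the sample string is a fragment copied from the page source shown in the question rather than a live download. A regular-expression variant is included for comparison.

```python
import re

def extract_by_id(html):
    """Return the first key inside `byId:{...}`, or None if it is absent."""
    marker = 'byId:{"'
    start = html.find(marker)
    if start == -1:
        return None
    start += len(marker)
    end = html.find('"', start)
    return html[start:end] if end != -1 else None

def extract_by_id_regex(html):
    """Same extraction using a regular expression."""
    match = re.search(r'byId:\{"([^"]+)"', html)
    return match.group(1) if match else None

# Fragment copied from the page source shown in the question.
sample = 'ids:{byTicker:{"ST::XTSE::ENB":M},byId:{"0P0000681O":M}},markets:{'
print(extract_by_id(sample))
print(extract_by_id_regex(sample))
```

Both approaches depend on the exact text `byId:{"` appearing in the page, so they will break if the site changes how it serializes this data.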

Performing T-Test on Time Series

My boss asked me to perform a T-Test to test the significance for a certain metric we use called conversion rate.
I have collected 18 months' worth of data for this metric, from April 1, 2017 to September 30, 2018.
He initially told me to collect 12 - 14 months of the data and run a t-test to look for significance of the metric. (Higher conversion rate means better!)
I'm not really sure how to go about it. Do I split the data up into 9-month samples, i.e. Sample 1: April 2017 - December 2017, Sample 2: January 2018 - September 2018, and run a two-sample t-test? Or would it make sense to compare all of the data against a mean like 0?
Is there a better approach to this? The bottom line is he wants to see that the conversion rate has significantly increased over time.
Thanks,
- Keith
My advice is to dump the t-test and look only at the magnitude of the change in the conversion rate. After all, the conversion rate is what's important to your business. By the way, looking at the magnitude of something practically relevant is called "effect size analysis"; a web search for that should turn up a lot of resources. To get started, just make a plot of the available data -- is conversion rate going up or going down or what?
Further questions should be directed to stats.stackexchange.com instead of SO. Good luck and have fun.
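As a rough illustration of looking at the magnitude of a change, here is a minimal Python sketch that compares two 9-month periods. The monthly conversion rates are made-up numbers, not real data, and the function name is hypothetical. Cohen's d is used as one common effect-size measure; the answer above does not name a specific one, so treat that choice as an assumption.

```python
from statistics import mean, stdev

def effect_size(before, after):
    """Return (absolute change, relative change, Cohen's d) for two samples."""
    diff = mean(after) - mean(before)
    rel = diff / mean(before)
    n1, n2 = len(before), len(after)
    # Pooled standard deviation of the two samples.
    pooled_var = ((n1 - 1) * stdev(before) ** 2
                  + (n2 - 1) * stdev(after) ** 2) / (n1 + n2 - 2)
    return diff, rel, diff / pooled_var ** 0.5

# Hypothetical monthly conversion rates (as fractions), for illustration only.
first_half = [0.020, 0.022, 0.021, 0.023, 0.022, 0.021, 0.024, 0.023, 0.022]
second_half = [0.024, 0.025, 0.026, 0.025, 0.027, 0.026, 0.028, 0.027, 0.026]

diff, rel, d = effect_size(first_half, second_half)
print(f"absolute change: {diff:.4f}, relative change: {rel:.1%}, Cohen's d: {d:.2f}")
```

Monthly values from a time series are usually correlated with each other, so numbers like these are best read as a description of the change rather than as a formal test.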

Why is it not a good idea to use different population sizes for test and holdout?

We are trying to run an A/B test with changes applied to 10% of the population. My PM told me we should also set aside a separate control group of 10%, so we can compare the 10% test and the 10% control later. My question is: why is it better to compare the 10% test against a 10% control instead of against the remaining 90% of the population, which is essentially everyone who doesn't see the changes?

Mobile Data Analysis in Excel

I collected mobile data consumption figures using DATA USAGE in Android, spread over the days of the week (Monday to Sunday). I want to analyse two apps, Facebook and Messenger, to check whether there is a significant difference in data usage depending on the day of the week. Should I be using a t-test or some other method? What's the best method that can be used in Excel to analyse this?
P.s. Help will be much appreciated. Thanks
If you believe your data is normally distributed then statistically speaking it sounds like you're going to want to use the t-test method. You don't know the population's standard deviation so that would be my choice. However, this data should be taken over at least 30 weeks if you want the data for each weekday to be somewhat accurate.
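For readers who want to see what a two-sample t-test computes, here is a minimal Python sketch comparing usage on two weekdays. The megabyte figures are hypothetical, the function name is made up, and Welch's unequal-variance form is an assumption on my part since the answer above only says "t-test". It returns the t statistic and the approximate degrees of freedom; looking up a p-value is left out.

```python
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom."""
    # Squared standard error contributed by each sample.
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(sample_a) - 1)
                           + vb ** 2 / (len(sample_b) - 1))
    return t, df

# Hypothetical MB used by one app on six Mondays and six Saturdays.
mondays = [120, 135, 128, 140, 122, 131]
saturdays = [180, 175, 190, 168, 185, 178]

t, df = welch_t(mondays, saturdays)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A negative t here simply means the first sample has the lower mean; the sketch compares only two days at a time, not all seven at once.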

Resources