How to calculate the correlation coefficient between an equity asset and a crypto asset, which have differing numbers of data points? - statistics

My understanding is that an equal number of data points in the two series is a requirement for calculating a correlation coefficient. As such, I can calculate this for two crypto assets over a defined period, since they will have the same number of data points. (The same holds true for two equities.)
But this won't hold when comparing a traditional equity with a crypto asset, since traditional assets don't trade on weekends or holidays. How is this handled then?
Should I clone the Friday closing prices for the equity and simply set them to the same value for the Saturday and Sunday that follows?
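One common approach is to align the two series on the dates where both assets actually traded and correlate the returns over that intersection; forward-filling the Friday close over the weekend, as you describe, is the other option. Below is a minimal sketch in Python/pandas, assuming both price series are indexed by date (the names equity_close and crypto_close are placeholders, not from the question):

import pandas as pd

def correlation_common_days(equity_close: pd.Series, crypto_close: pd.Series) -> float:
    """Correlate daily returns only on the dates present in both series."""
    df = pd.concat({"equity": equity_close, "crypto": crypto_close}, axis=1)
    df = df.dropna()                       # keep dates where both assets traded
    returns = df.pct_change().dropna()     # correlate returns, not price levels
    return returns["equity"].corr(returns["crypto"])

def correlation_forward_filled(equity_close: pd.Series, crypto_close: pd.Series) -> float:
    """Alternative: carry the last equity close through weekends and holidays."""
    df = pd.concat({"equity": equity_close, "crypto": crypto_close}, axis=1)
    df["equity"] = df["equity"].ffill()    # clone Friday's close over non-trading days
    returns = df.dropna().pct_change().dropna()
    return returns["equity"].corr(returns["crypto"])

Forward-filling produces artificial 0% equity returns on days the market was closed while the crypto keeps moving, which tends to drag the measured correlation toward zero, so restricting the calculation to common trading days is usually the cleaner choice.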

Related

Planning sampling date using several criteria in Excel

I am trying to create a sampling plan for testing of raw materials. To provide some context, the criteria I am planning to factor in are the priority level of the material (e.g. P1 is the highest while P3 is the lowest), the delivery date of the material (the actual date of arrival), and the sampling capacity of each day (currently 30 SUs). If the sampling capacity of a day is exceeded, the date will be decided by back-calculating from the production need-by date using the lead times.
The main issue with the previous plan was that the dates were only planned around the priority level. This meant that regardless of the delivery date, whenever a new P1 item came in, it would push all the other lower priority items back, and create a backlog of untested items.
Using arbitrary values: if the material is P1, the date will be 12 days after the delivery date; P2, 13 days; and P3, 14 days. After that, check whether the sampling capacity is exceeded for the day. I tried to use VLOOKUP, but it only returns the first match rather than checking every row. I do not have XLOOKUP because I'm using Excel 2016.
I want to write code (either in VBA or an Excel formula) that factors in all these criteria to plan a date automatically.
The formula for sampling date based on just the priority level is =IF(C337="1", B337+12,IF(C337="2", B337+13,IF(C337="3",B337+14,""))), where C is the priority and B is the delivery date. This is what I currently have:
See below screenshot:
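If a prototype outside Excel helps while working out the logic, here is a rough Python sketch of the scheduling rules described above. The priority offsets (12/13/14 days) and the 30-SU daily cap come from the question; the overflow rule (push to the next day with free capacity) is a simplification, since the back-calculation from the production need-by date is not modelled, and all field names are illustrative:

from collections import defaultdict
from datetime import date, timedelta

PRIORITY_OFFSET = {"P1": 12, "P2": 13, "P3": 14}   # days after the delivery date
DAILY_CAPACITY = 30                                 # sampling units per day

def plan_sampling_dates(materials):
    """materials: list of dicts with 'priority', 'delivered' (date) and 'su' (int)."""
    booked = defaultdict(int)               # SUs already scheduled per day
    plan = []
    for m in sorted(materials, key=lambda m: (m["priority"], m["delivered"])):
        day = m["delivered"] + timedelta(days=PRIORITY_OFFSET[m["priority"]])
        while booked[day] + m["su"] > DAILY_CAPACITY:
            day += timedelta(days=1)        # simplistic overflow rule, not the need-by back-calculation
        booked[day] += m["su"]
        plan.append({**m, "sampling_date": day})
    return plan

example = [
    {"priority": "P1", "delivered": date(2024, 5, 1), "su": 20},
    {"priority": "P2", "delivered": date(2024, 5, 1), "su": 15},
]
print(plan_sampling_dates(example))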

Value At Risk Calculation with consistently falling prices

My question is twofold:
When calculating value at risk, let's say the price of an asset or portfolio falls 1% every month for 12 months. If I try to calculate the standard deviation of the returns, it will come out as 0, and no meaningful VaR calculation is possible. Is there another way to measure VaR that takes into account consistent declines like this?
Can you recommend a source that explains simply how to calculate portfolio VaR in Python?
Thank you.
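On the first point, one alternative is historical-simulation VaR, which takes a quantile of the observed return distribution rather than relying on the standard deviation; with a constant -1% monthly return it simply reports a 1% loss at every confidence level. A minimal sketch in Python with NumPy (the return series below is illustrative):

import numpy as np

def historical_var(returns, confidence=0.95):
    """VaR as the loss at the (1 - confidence) quantile of historical returns."""
    return -np.percentile(returns, 100 * (1 - confidence))

returns = np.full(12, -0.01)       # twelve months of a constant -1% return
print(historical_var(returns))     # 0.01, i.e. a 1% loss
print(returns.std(ddof=1))         # 0.0 -- the sigma-based calculation degenerates here

The last line shows why the parametric, standard-deviation-based VaR is the variant that breaks down in this scenario.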

Facing difficulty in Excel when creating a logic for moving average of different variables

In my dataset I have to create a variable "Moving Avg. Amt paid per sq. ft.", and the logic I need is to average the last five values from the most recent transactions, i.e. the most recent sales by date. This average should only return a value for rows that match the same building and the same area.
This is what my data looks like
Area ID has three categories. Building number has five categories. Date is sorted in ascending order. My moving-average variable should average the last 5 sales by date, but only for the same building in the same area. For example, there are buildings 1 and 2 in area 102: the variable should use the past five sales of building 1 in area 102 when the row is for building 1, and the past five sales of building 2 in area 102 when the row is for building 2.
My approach (which is flawed at the moment) was to calculate the average of the amount paid per square foot with respect to area and building, based on the dates, using the formula
=AVERAGEIFS($N$2:$N$6547,$D$2:$D$6547,D14,$C$2:$C$6547,C14,$B$2:$B$6547,B14)
but I cannot make this formula calculate a moving average whenever it meets the criteria. I also tried offsetting the starting point by 5, but the logic is not right, so it doesn't work and returns #VALUE! in the cells. The formula I used for the offset is
=AVERAGEIFS((OFFSET(N13,5,,5)),$D$2:$D$6547,D13,$C$2:$C$6547,C13,$B$2:$B$6547,B13)
(These formulae are used in column Q of my data)
I need support from the community, as I am badly stuck making this data useful and am out of ideas to make this work.
Edit 1: I am not sure how I can attach my Excel file here so you can review the dataset. I have uploaded it to a third-party site (link below) so you can view the file in detail.
https://file.io/hlciAHJOHzWA
The expected result is what the instruction describes:
"Create a variable called "mov. avg amt. paid per sq ft". For each row, this variable should calculate average amt paid per sq ft for the most recent past five sales (by date) for the same building in the same area."
My approach to building a formula that calculates this moving average by date for the same building in the same area doesn't seem to work, so there must be some flaw in it.
In Office 365 you could use:
=LET(f,FILTER($N$1:N13,($B$1:B13=B14)*($C$1:C13=C14),""),
c,COUNTA(f),
s,SEQUENCE(5,,c-4),
IFERROR(IF(c<5,SUM(f)/c,SUM(INDEX(f,s))/5),""))
If there are fewer than 5 matches prior to the current sale, it calculates the average over however many matches exist. If there are 5 or more matches, it calculates the average of the last 5 prior to the current sale.
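For comparison, the same conditional moving average can be expressed compactly outside Excel. A Python/pandas sketch, where the column names Area, Building, Date and AmtPerSqFt (and the file name) are assumptions based on the question's description:

import pandas as pd

df = pd.read_excel("sales.xlsx")          # hypothetical file name
df = df.sort_values("Date")

# Average of up to the five most recent *prior* sales for the same
# Building within the same Area; shift(1) excludes the current sale.
df["MovAvgAmtPerSqFt"] = (
    df.groupby(["Area", "Building"])["AmtPerSqFt"]
      .transform(lambda s: s.shift(1).rolling(window=5, min_periods=1).mean())
)

Here min_periods=1 mirrors the behaviour above: when fewer than five prior matches exist, the average is taken over however many there are.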

Tests to Compare Sales Mix Percent between Periods

Background
I wish to compare menu sales mix ratios for two periods.
A menu is defined as a collection of products. (i.e., a hamburger, a club sandwich, etc.)
A sales mix ratio is defined as a product's sales volume in units (i.e., 20 hamburgers) relative to the total number of menu units sold (i.e., 100 menu items were sold). In the hamburger example, the sales mix ratio for hamburgers is 20% (20 burgers / 100 menu items). This represents the share of total menu unit sales.
A period is defined as a time range used for comparative purposes (i.e., lunch versus dinner, Mondays versus Fridays, etc.).
I am not interested in overall changes in the volume (I don't care whether I sold 20 hamburgers in one period and 25 in another). I am only interested in changes in the distribution of the ratios (20% of my units sold were hamburgers in one period and 25% were hamburgers in another period).
Because the sales mix represents a share of the whole, the mean for each period will be the same, the mean difference between the periods will always be 0%, and the total for each data set will always be 100%.
Objective:
Test whether the sales distribution (sales mix percentage of each menu item relative to other menu items) changed significantly from one period to another.
Null Hypothesis: the purchase patterns and preferences of customers in period A are the same as those for customers in period B.
Example of potential data input:
[Menu Item] [Period A] [Period B]
Hamburger 25% 28%
Cheeseburger 25% 20%
Salad 20% 25%
Club Sandwich 30% 27%
Question:
Do common methods exist to test whether the distribution of share-of-total is significantly different between two sets of data?
A paired t-test would have worked if I were measuring a change in the number of actual units sold, but not (I believe) for a change in the share of total units.
I've been searching online and a few text books for a while with no luck. I may be looking for the wrong terminology.
Any direction, be it search terms or (preferably) the names of appropriate tests, is appreciated.
Thanks,
Andrew
EDIT: I am considering a Pearson correlation test as a possible solution - setting aside that each row of data is an independent menu item, the math shouldn't care. A perfect match (identical sales mix) would receive a coefficient of 1, and the greater the change, the lower the coefficient. One potential issue is that, unlike a regular correlation test, the changes may be amplified because any change to one number automatically impacts the others. Is this a viable solution? If so, is there a way to temper the amplification issue?
Consider using a chi-squared goodness-of-fit test as a simple solution to this problem:
H0: the proportions of menu items for month B are the same as for month A
Ha: at least one of the proportions of menu items for month B is different from month A
There is a nice tutorial here.
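Note that the chi-squared test works on unit counts rather than on the percentages themselves, so you would feed it the underlying sales counts for each period. A short Python/scipy sketch of the goodness-of-fit comparison described above, using hypothetical counts that reproduce the example percentages:

from scipy.stats import chisquare

period_a = [250, 250, 200, 300]   # hamburger, cheeseburger, salad, club sandwich (hypothetical period A counts)
period_b = [280, 200, 250, 270]   # hypothetical period B counts

share_a = [x / sum(period_a) for x in period_a]
expected_b = [p * sum(period_b) for p in share_a]   # expected period B counts if the mix were unchanged

stat, p_value = chisquare(f_obs=period_b, f_exp=expected_b)
print(stat, p_value)   # a small p-value suggests the sales mix changed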

Likelihood of a Distribution of Values Occurring Randomly

I have a data matrix depicting the number of telephone calls from one telephone to another; all calls are unidirectional. The data is not a sample - it is the full population. Rows are days of the week and columns are one-hour blocks of a 24-hour clock. Values in the cells represent the number of telephone calls from telephone A to telephone B during that specific hour.
I would like to have a repeatable measure that enables me to tell my audience that the likelihood of this distribution occurring randomly is <x.
I'd like the formula for Excel 2007 or, as a last resort, VBA code.
I've searched and found answers that tell me how to statistically determine the significance of differences between two different data sets but not how to measure for just one data set against a random outcome.
Thanks in advance.
If the total number of calls in a given hour is T, and the total calling population is P, then the number of calls from A to B should be about T/P if "random". To test whether this is really the case you'd use the chi-squared test. I'm afraid I don't have time to give you the full answer, but the test statistic is testvalue = sum((observed_i - T/P)^2 / (T/P)), where you check the testvalue against the chi-squared table, and you can read off the probability too. Excel can calculate these values. Refer to the Chi-Squared Test for more details.
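To make the computation concrete, here is a short sketch in Python/scipy; the hourly counts are hypothetical, and a uniform baseline across the hour blocks stands in for the T/P expectation described above (substitute whatever expected values your null model gives). In Excel 2007 the corresponding p-value can be obtained directly with CHITEST(actual_range, expected_range).

from scipy.stats import chisquare

observed = [3, 0, 1, 7, 12, 9, 2, 0]                          # hypothetical A-to-B calls per hour block
expected = [sum(observed) / len(observed)] * len(observed)    # uniform "random" baseline

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat, p_value)   # a small p-value means the pattern is unlikely to be random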
