Calculating the sum of individual squared deviations in SAS - statistics

I'm trying to calculate the individual squared deviations so that I can use them in further calculations.
I have the following dataset:
data have;
input testid $ level $ values;
datalines;
HITT1D LC1 0.45
HITT1D LC1 0.49
HITT1D LC1 0.47
HITT1D LC2 0.43
HITT1D LC2 0.39
HITT1D LC2 0.42
HITT1D LC3 0.66
HITT1D LC3 0.63
HITT1D LC3 0.64
HBEF5D LC1 0.45
HBEF5D LC1 0.49
HBEF5D LC1 0.47
HBEF5D LC2 0.43
HBEF5D LC2 0.39
HBEF5D LC2 0.42
HBEF5D LC3 0.66
HBEF5D LC3 0.63
HBEF5D LC3 0.64
;
run;
I need to calculate:
the sum of all individual squared deviations for each combination of testid and level, and
the sum of all individual squared deviations for each testid.
I used the following to calculate the average for a given testid and level:
proc means data=have ;
var values;
class testid level;
output out=class_stats mean = /autoname ;
run;
to subtract from the observations in order to replicate the following example:
in which "LOT" is equivalent to the level in the data and "subject" is equivalent to testid.
I have looked into the examples given here:
Calculate mean and std of a variable, in a datastep in SAS
but I cannot replicate them because those data have only one class variable.
Unfortunately, I cannot find an easy way of doing this, as I am an R user and new to SAS. Does anyone have a clever idea for doing this simply in SAS?
Thanks.

It sounds like you are asking for the CSS (corrected sum of squares) statistic.
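For reference, CSS is the sum of squared deviations from each group's own mean, which is exactly the quantity described in the question. Worked for level LC1 (mean 0.47):

```latex
\mathrm{CSS} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2

\mathrm{CSS}_{\mathrm{LC1}} = (0.45 - 0.47)^2 + (0.49 - 0.47)^2 + (0.47 - 0.47)^2 = 0.0004 + 0.0004 + 0 = 0.0008
```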
Example:
proc means data=have css ;
class testid level ;
types testid testid*level ;
var values;
run;
Results:
The MEANS Procedure

Analysis Variable : values

testid    N Obs    Corrected SS
-------------------------------
HBEF5D        9       0.0882889
HITT1D        9       0.0882889
-------------------------------

Analysis Variable : values

testid    level    N Obs    Corrected SS
-------------------------------------------
HBEF5D    LC1          3     0.000800000
          LC2          3     0.000866667
          LC3          3     0.000466667
HITT1D    LC1          3     0.000800000
          LC2          3     0.000866667
          LC3          3     0.000466667
-------------------------------------------
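Since the asker comes from R, here is the same two-step idea (compute the group mean, then sum the squared deviations from it) as a small Python sketch using only the standard library. It is not SAS code; it only cross-checks the CSS values reported by PROC MEANS above. The function name css_by is made up for this illustration.

```python
from collections import defaultdict

# Data from the question as (testid, level, value) tuples
rows = [
    ("HITT1D", "LC1", 0.45), ("HITT1D", "LC1", 0.49), ("HITT1D", "LC1", 0.47),
    ("HITT1D", "LC2", 0.43), ("HITT1D", "LC2", 0.39), ("HITT1D", "LC2", 0.42),
    ("HITT1D", "LC3", 0.66), ("HITT1D", "LC3", 0.63), ("HITT1D", "LC3", 0.64),
    ("HBEF5D", "LC1", 0.45), ("HBEF5D", "LC1", 0.49), ("HBEF5D", "LC1", 0.47),
    ("HBEF5D", "LC2", 0.43), ("HBEF5D", "LC2", 0.39), ("HBEF5D", "LC2", 0.42),
    ("HBEF5D", "LC3", 0.66), ("HBEF5D", "LC3", 0.63), ("HBEF5D", "LC3", 0.64),
]


def css_by(rows, key):
    """Sum of squared deviations from each group's own mean (SAS: CSS)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    result = {}
    for group, values in groups.items():
        mean = sum(values) / len(values)  # step 1: group mean
        # step 2: subtract the mean from each observation, square, and sum
        result[group] = sum((x - mean) ** 2 for x in values)
    return result


css_level = css_by(rows, key=lambda r: (r[0], r[1]))  # like TYPES testid*level
css_testid = css_by(rows, key=lambda r: r[0])          # like TYPES testid

for group, css in sorted(css_level.items()):
    print(*group, f"{css:.9f}")
for group, css in sorted(css_testid.items()):
    print(group, f"{css:.7f}")
```

Note that the per-testid value (about 0.0883) is much larger than the sum of the three per-level values (about 0.0021): it measures deviations from the overall testid mean, so it also includes the differences between the level means.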

Related

Count unique values in a MS Excel column based on values of other column

I am trying to find the number of unique Customers, O (Orders), Q (Quotations) and D (Drafts) that our team dealt with on a particular day in this sample dataset. Note that "Quote/Order #" values repeat in the dataset, so I need the number of unique Q/O/D numbers on a given day.
I have worked out all the values except the fields highlighted in light orange in my expected output table. Can someone help me with the MS Excel formulas for these four values?
Below is the dataset. Note that a date can have empty values, but those rows are always at the bottom of the table:
Date     | Job # | Job Type | Quote/Ordr # | Parts | Customer | man-hr
--------------------------------------------------------------------
4-Apr-22 | 1     | O        | 307585       | 1     | FRU      | 0.35
4-Apr-22 | 2     | D        | 307267       | 28    | ATM      | 4.00
4-Apr-22 | 2     | D        | 307267       | 25    | ATM      | 3.75
4-Apr-22 | 2     | D        | 307267       | 6     | ATM      | 0.17
4-Apr-22 | 3     | D        | 307438       | 3     | ELCTRC   | 0.45
4-Apr-22 | 4     | D        | 307515       | 7     | ATM      | 0.60
4-Apr-22 | 4     | D        | 307515       | 5     | ATM      | 0.55
4-Apr-22 | 4     | D        | 307515       | 4     | ATM      | 0.35
4-Apr-22 | 5     | O        | 307587       | 4     | PULSE    | 0.30
4-Apr-22 | 6     | O        | 307588       | 3     | PULSE    | 0.40
5-Apr-22 | 1     | O        | 307623       | 1     | WST      | 0.45
5-Apr-22 | 2     | O        | 307629       | 4     | CG       | 0.50
5-Apr-22 | 3     | O        | 307630       | 10    | SUPER    | 1.50
5-Apr-22 | 4     | O        | 307631       | 3     | SUPER    | 0.60
5-Apr-22 | 5     | O        | 307640       | 7     | CAM      | 0.40
5-Apr-22 | 6     | Q        | 307527       | 6     | WG       | 0.55
5-Apr-22 | 6     | Q        | 307527       | 3     | WG       | 0.30
5-Apr-22 |       |          |              |       |          |
To figure out the unique "Number of Jobs" on Apr 4, I used the Excel formula:
=MAXIFS($K$3:$K$20,$J$3:$J$20,R3)
where R3 = '4-Apr-22'.
To figure out the unique "Number of D (Draft) Jobs", I used the Excel formula:
=SUMIFS($P$3:$P$20,$J$3:$J$20,R3,$L$3:$L$20,"D")
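The counting logic the question asks for (distinct customers and distinct Quote/Order numbers per job type and date) can be sketched in Python with sets, which ignore the repeated "Quote/Order #" rows automatically. This is only an illustration of the logic, not an Excel formula; the function name daily_unique_counts is hypothetical, and the empty 5-Apr-22 row is left out because blank rows would simply be filtered first.

```python
from collections import defaultdict

# Rows from the question: (date, job_no, job_type, quote_order_no, customer).
# Parts and man-hr are omitted because they are not needed for the counts.
rows = [
    ("4-Apr-22", 1, "O", 307585, "FRU"),
    ("4-Apr-22", 2, "D", 307267, "ATM"),
    ("4-Apr-22", 2, "D", 307267, "ATM"),
    ("4-Apr-22", 2, "D", 307267, "ATM"),
    ("4-Apr-22", 3, "D", 307438, "ELCTRC"),
    ("4-Apr-22", 4, "D", 307515, "ATM"),
    ("4-Apr-22", 4, "D", 307515, "ATM"),
    ("4-Apr-22", 4, "D", 307515, "ATM"),
    ("4-Apr-22", 5, "O", 307587, "PULSE"),
    ("4-Apr-22", 6, "O", 307588, "PULSE"),
    ("5-Apr-22", 1, "O", 307623, "WST"),
    ("5-Apr-22", 2, "O", 307629, "CG"),
    ("5-Apr-22", 3, "O", 307630, "SUPER"),
    ("5-Apr-22", 4, "O", 307631, "SUPER"),
    ("5-Apr-22", 5, "O", 307640, "CAM"),
    ("5-Apr-22", 6, "Q", 307527, "WG"),
    ("5-Apr-22", 6, "Q", 307527, "WG"),
]


def daily_unique_counts(rows):
    """Per date: number of jobs plus distinct customers and distinct Q/O/D numbers."""
    jobs = defaultdict(int)
    customers = defaultdict(set)
    numbers = defaultdict(lambda: {"Q": set(), "O": set(), "D": set()})
    for date, job_no, job_type, number, customer in rows:
        jobs[date] = max(jobs[date], job_no)  # same idea as the MAXIFS formula
        customers[date].add(customer)
        numbers[date][job_type].add(number)   # a set keeps each number only once
    return {
        date: {"jobs": jobs[date], "customers": len(customers[date]),
               **{t: len(s) for t, s in numbers[date].items()}}
        for date in jobs
    }


for date, counts in daily_unique_counts(rows).items():
    print(date, counts)
```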

Create unique list from 2 columns and sum values per row based on that unique list from 2 value columns

Despite scouring numerous posts, I am still struggling to find a solution for a report I am trying to move from MS Excel to Power BI.
Problem
Create a table in the report section of Power BI that has a unique list of currencies (drawn from 2 columns) and their corresponding FX exposure, defined per currency leg across 2 columns. Below I have shown the source data and the workings I use in Excel, which I am trying to replicate.
Source data (from database table)
a            | b          | c          | d            | e            | f              | g
Instrument   | Currency 1 | Currency 2 | FX nominal 1 | FX nominal 2 | FXNom1 - Gross | FXNom2 - Gross
----------------------------------------------------------------------------------------------------
FWD EUR/USD  | EUR        | USD        | -7.965264529 | 7.90296523   | 7.97           | 7.90
FWD USD/JPY  | USD        | JPY        | 1.030513307  | -1.070305687 | 1.03           | 1.07
Instrument 1 | USD        |            | 1.75862819   |              | 1.76           | 0.00
Instrument 2 | USD        | TRY        | 0            | 3.45E-04     | 0.00           | 0.00
Instrument 3 | JPY        |            | 1.121782037  |              | 1.12           | 0.00
Instrument 4 | EUR        |            | 6.2505079    |              | 6.25           | 0.00
FWD EUR/CNH  | EUR        | CNH        | 0.007591392  | 3.00E-09     | 0.01           | 0.00
Instrument 5 | RUB        |            | 6.209882675  |              | 6.21           | 0.00
F2 = ABS(FX nominal 1)
G2 = ABS(FX nominal 2)
Report output in Excel
a   | b     | c     | d     | e
FX  | Long  | Short | Net   | Gross
-----------------------------------
0   | 0.00  | 0.00  | 0.00  | 0.00
RUB | 6.21  | 0.00  | 6.21  | 6.21
EUR | 6.26  | -7.97 | -1.71 | 14.22
JPY | 1.12  | -1.07 | 0.05  | 2.19
USD | 10.69 | 0.00  | 10.69 | 10.69
CNH | 0.00  | 0.00  | 0.00  | 0.00
TRY | 0.00  | 0.00  | 0.00  | 0.00
Below are the Excel formulas I use to recreate this output.
A2: =IFERROR(LOOKUP(2, 1/(COUNTIF(Report!$A$1:A1,Data!$B$2:$B$553)=0), Data!$B$2:$B$553), LOOKUP(2, 1/(COUNTIF(Report!$A$1:A1, Data!$C$2:$C$553)=0), Data!$C$2:$C$553))
B2: =((SUMIFS(Data!$D$2:$D$553, Data!$B$2:$B$553, Report!$A2, Data!$D$2:$D$553, ">0"))+(SUMIFS(Data!$E$2:$E$553, Data!$C$2:$C$553, Report!$A2, Data!$E$2:$E$553, ">0")))
C2: =((SUMIFS(Data!$D$2:$D$553, Data!$B$2:$B$553, Report!$A2, Data!$D$2:$D$553, "<0"))+(SUMIFS(Data!$E$2:$E$553, Data!$C$2:$C$553, Report!$A2, Data!$E$2:$E$553, "<0")))
D2: =(SUMIF(Data!$B$1:$B$553,Report!$A2,Data!$D$1:$D$553)+SUMIF(Data!$C$1:$C$553,Report!$A2,Data!$E$1:$E$553))
E2: =(SUMIF(Data!$B$1:$B$554,Report!$A2,Data!$F$1:$F$554)+SUMIF(Data!$C$1:$C$554,Report!$A2,Data!$G$1:$G$554))
I believe I've found a hack using the DISTINCT/UNION/SELECTCOLUMNS functions, but when I try to graph the output it is very small (as if it were picking up other data behind the scenes). Note that I usually filter on date to get the output I need (the date is mapped through relationships to other data tables).
FX =
DISTINCT (
    UNION (
        SELECTCOLUMNS ( DATA, "Date", [DATE], "Currency", [CURRENCY1], "FXNom", [FXNOMINAL1] ),
        SELECTCOLUMNS ( DATA, "Date", [DATE], "Currency", [CURRENCY2], "FXNom", [FXNOMINAL2] )
    )
)
If anyone has any ideas I would be very grateful as I still feel my workaround is more of a lucky hack.
Thanks!
The approach you're using looks nearly ideal. From a dimensional-model perspective, you want one column for values and one column for currency labels, so selecting those pairs as separate tables and appending them with UNION is the right way to go. In general, I think it's better to do as much of the transformation as you can in Power Query; using DAX this way can lead to some limitations.
But if we're going with DAX, I think you want to get rid of DISTINCT: it collapses identical positions (same date, currency and nominal) into a single row, so you would lose data.
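To see why deduplication is risky, here is a tiny Python sketch with two hypothetical, identical EUR positions (made-up values, not from the question). Turning the stacked rows into a set mimics what DISTINCT does to a table, and the EUR total drops by half.

```python
# Hypothetical stacked legs: two separate positions that happen to be identical
legs = [
    ("FakeDate", "EUR", 6.25),
    ("FakeDate", "EUR", 6.25),
    ("FakeDate", "USD", 1.76),
]


def eur_total(rows):
    """Sum the nominal of all EUR legs."""
    return sum(nominal for _, currency, nominal in rows if currency == "EUR")


kept = eur_total(legs)               # UNION only: every leg is counted
deduplicated = eur_total(set(legs))  # like DISTINCT: identical rows collapse
print(kept, deduplicated)
```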
FX =
UNION (
SELECTCOLUMNS ( FX_Raw, "Date", "FakeDate", "Currency", [CURRENCY 1], "FXNom", [FX nominal 1] ),
SELECTCOLUMNS ( FX_Raw, "Date", "FakeDate", "Currency", [CURRENCY 2], "FXNom", [FX nominal 2] )
)
And then a few measures:
Long =
CALCULATE(sum(FX[FXNom]), FX[FXNom] >= 0)
Short =
CALCULATE(sum(FX[FXNom]), FX[FXNom] < 0)
Gross =
SUMX( FX, if(FX[FXNom] > 0, FX[FXNom], 0-FX[FXNom]))
Net =
SUM(FX[FXNom])
Seems to produce the desired result:
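The same unpivot-then-aggregate idea can be cross-checked outside Power BI. This Python sketch stacks both currency legs (the role UNION plays in the DAX above) and then applies the Long/Short/Net/Gross rules from the measures. It uses the source rows from the question and simply skips missing second legs, so the blank "0" row of the Excel report does not appear.

```python
from collections import defaultdict

# Source rows from the question: (instrument, currency1, currency2, nominal1, nominal2).
# None marks a missing second leg.
data = [
    ("FWD EUR/USD", "EUR", "USD", -7.965264529, 7.90296523),
    ("FWD USD/JPY", "USD", "JPY", 1.030513307, -1.070305687),
    ("Instrument 1", "USD", None, 1.75862819, None),
    ("Instrument 2", "USD", "TRY", 0.0, 3.45e-04),
    ("Instrument 3", "JPY", None, 1.121782037, None),
    ("Instrument 4", "EUR", None, 6.2505079, None),
    ("FWD EUR/CNH", "EUR", "CNH", 0.007591392, 3.00e-09),
    ("Instrument 5", "RUB", None, 6.209882675, None),
]

# Step 1: one (currency, nominal) row per leg, like UNION of the two SELECTCOLUMNS
legs = [(c1, n1) for _, c1, _, n1, _ in data]
legs += [(c2, n2) for _, _, c2, _, n2 in data if c2 is not None]

# Step 2: aggregate per currency, mirroring the Long/Short/Gross/Net measures
report = defaultdict(lambda: {"Long": 0.0, "Short": 0.0, "Net": 0.0, "Gross": 0.0})
for currency, nominal in legs:
    row = report[currency]
    if nominal >= 0:
        row["Long"] += nominal
    else:
        row["Short"] += nominal
    row["Net"] += nominal
    row["Gross"] += abs(nominal)

for currency, row in report.items():
    print(currency, {k: round(v, 2) for k, v in row.items()})
```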

Add sections of a table column as new Columns - Power Query

I have the following unpivoted table, which contains stat-tested % values, their stat letters, and the stat-letter position indicators on separate rows.
----------------------------------------
CODE | ATTR | TEXT | VALUE
----------------------------------------
1 mean I love it 0.45
2 mean I love it 0.67
3 mean I love it 0.49
4 mean I love it 0.21
5 mean I love it 0.66
1 mean I love it abd
2 mean I love it e
3 mean I love it cd
4 mean I love it a
5 mean I love it ab
1 mean I love it 1
2 mean I love it 1
3 mean I love it 1
4 mean I love it 1
5 mean I love it 1
1 wt-mean I hate it 0.22
2 wt-mean I hate it 0.56
3 wt-mean I hate it 0.13
4 wt-mean I hate it 0.89
5 wt-mean I hate it 0.50
1 wt-mean I hate it ab
2 wt-mean I hate it ae
3 wt-mean I hate it c
4 wt-mean I hate it b
5 wt-mean I hate it de
1 wt-mean I hate it 1
2 wt-mean I hate it 1
3 wt-mean I hate it 1
4 wt-mean I hate it 1
5 wt-mean I hate it 1
I want to group on the CODE column and add the Stat-tested Letters and position indicators as separate columns like below:
----------------------------------------------------------------
CODE | ATTR | TEXT | VALUE | LETTERS | POSITION
----------------------------------------------------------------
1 mean I love it 0.45 abd 1
2 mean I love it 0.67 e 1
3 mean I love it 0.49 cd 1
4 mean I love it 0.21 a 1
5 mean I love it 0.66 ab 1
1 wt-mean I hate it 0.22 ab 1
2 wt-mean I hate it 0.56 ae 1
3 wt-mean I hate it 0.13 c 1
4 wt-mean I hate it 0.89 b 1
5 wt-mean I hate it 0.50 de 1
The problem I encounter when grouping is that the VALUE column has mixed data types (text and number). How can I split these into individual columns as shown above?
You can add new custom columns for this and use try to check whether the value is a number.
My source is the table Tabelle1.
let
    // Quelle (= Source): the Excel table named Tabelle1
    Quelle = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
    Change_Type = Table.TransformColumnTypes(Quelle,{{"CODE", Int64.Type}, {"ATTR", type text}, {"TEXT", type text}, {"VALUE", type any}}),
    // Numbers below 1 are the stat-tested % values; values that fail Number.From give false
    Custom_Number = Table.AddColumn(Change_Type, "Number", each if (try Number.From([VALUE]) < 1 otherwise false) then [VALUE] else null),
    // Values that cannot be converted to a number are the stat letters
    Custom_Letters = Table.AddColumn(Custom_Number, "Letters", each if (try Number.From([VALUE]) >= 1 otherwise null) = null then [VALUE] else null),
    // Whatever is left (numbers >= 1) is the position indicator
    #"Hinzugefügte benutzerdefinierte Spalte" = Table.AddColumn(Custom_Letters, "POSITION", each if [Number] = null and [Letters]= null then [VALUE] else null),
    // One row per CODE/ATTR/TEXT; List.Max collapses each new column to its non-null value
    Grouped_Rows = Table.Group(#"Hinzugefügte benutzerdefinierte Spalte", {"CODE", "ATTR", "TEXT"}, {{"VALUE", each List.Max([Number]), type nullable number}, {"LETTERS", each List.Max([Letters]), type nullable text}, {"POSITION", each List.Max([POSITION]), type nullable number}})
in
    Grouped_Rows
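The classification rule in this answer (text means letters, a number below 1 means the % value, a number of 1 or more means the position) can be sketched in Python to show the grouping step on its own. It rebuilds the question's rows compactly; the helper name classify is hypothetical. Like the M query, it assumes no % value is ever exactly 1, which would be misread as a position.

```python
codes = [1, 2, 3, 4, 5]
rows = []
for attr, text, pct, letters in [
    ("mean", "I love it", [0.45, 0.67, 0.49, 0.21, 0.66], ["abd", "e", "cd", "a", "ab"]),
    ("wt-mean", "I hate it", [0.22, 0.56, 0.13, 0.89, 0.50], ["ab", "ae", "c", "b", "de"]),
]:
    # Unpivoted layout from the question: a block of %s, then letters, then positions
    rows += [(c, attr, text, v) for c, v in zip(codes, pct)]
    rows += [(c, attr, text, v) for c, v in zip(codes, letters)]
    rows += [(c, attr, text, 1) for c in codes]


def classify(value):
    """Text -> LETTERS, number < 1 -> VALUE, number >= 1 -> POSITION."""
    if isinstance(value, str):
        return "LETTERS"
    return "VALUE" if value < 1 else "POSITION"


# Group on CODE/ATTR/TEXT; each group holds at most one value per target column,
# so a plain assignment does the job of List.Max in the M query
grouped = {}
for code, attr, text, value in rows:
    record = grouped.setdefault((code, attr, text),
                                {"VALUE": None, "LETTERS": None, "POSITION": None})
    record[classify(value)] = value

for (code, attr, text), rec in grouped.items():
    print(code, attr, text, rec["VALUE"], rec["LETTERS"], rec["POSITION"])
```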

pandas custom sorting multilevel index

I have the following example dataset, and I'd like to sort the index levels by a custom order that is not contained in the dataframe. Searching SO hasn't helped me solve this so far. Example:
import pandas as pd
data = {'s':[1,1,1,1],
'am':['cap', 'cap', 'sea', 'sea'],
'cat':['i', 'o', 'i', 'o'],
'col1':[.55, .44, .33, .22],
'col2':[.77, .66, .55, .44]}
df = pd.DataFrame(data=data)
df.set_index(['s', 'am', 'cat'], inplace=True)
Out[1]:
col1 col2
s am cat
1 cap i 0.55 0.77
o 0.44 0.66
sea i 0.33 0.55
o 0.22 0.44
What I would like is the following:
Out[2]:
col1 col2
s am cat
1 sea i 0.33 0.55
o 0.22 0.44
cap i 0.55 0.77
o 0.44 0.66
I might also want to sort 'cat' in the order ['o', 'i'].
Use sort_values and sort_index
df.sort_values(df.columns.tolist()).sort_index(level=1, ascending=False,
sort_remaining=False)
col1 col2
s am cat
1 sea i 0.33 0.55
o 0.22 0.44
cap i 0.55 0.77
o 0.44 0.66
Convert the index to categorical to get the custom order.
data = {'s':[1,1,1,1],
'am':['cap', 'cap', 'sea', 'sea'],
'cat':['i', 'j', 'k', 'l'],
'col1':[.55, .44, .33, .22],
'col2':[.77, .66, .55, .44]}
df = pd.DataFrame(data=data)
df.set_index(['s', 'am', 'cat'], inplace=True)
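# Build an ordered categorical from the unique values of the 'cat' level
idx = pd.Categorical(df.index.levels[2],
                     categories=['j', 'i', 'k', 'l'],
                     ordered=True)
# set_levels returns a new MultiIndex (its inplace argument was removed in pandas 2.0)
df.index = df.index.set_levels(idx, level='cat')
df.reset_index().sort_values('cat').set_index(['s', 'am', 'cat'])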
col1 col2
s am cat
1 cap j 0.44 0.66
i 0.55 0.77
sea k 0.33 0.55
l 0.22 0.44
As of pandas 1.1, there is another option: the key parameter of sort_values.
SORT_VALS = {"am": ["sea", "cap"]}

def sorter(column):
    if column.name not in SORT_VALS:
        return column
    mapper = {val: order for order, val in enumerate(SORT_VALS[column.name])}
    return column.map(mapper)

new_df = df.sort_values(by=["s", "am", "cat"], key=sorter)
# col1 col2
# s am cat
# 1 sea i 0.33 0.55
# o 0.22 0.44
# cap i 0.55 0.77
# o 0.44 0.66
You can also use pd.Categorical in the sorter and return a categorical Series for the custom-sorted columns; this may have different performance implications depending on your scenario. Note, however, that at the time of writing a pandas bug could prevent multi-column sorts that use Categorical sorting.
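The key-based sorter extends naturally to the second part of the question ('cat' in the order ['o', 'i']): just add that level to the mapping. A minimal sketch, assuming pandas 1.1 or later and the question's original data; levels not listed in SORT_VALS (here 's') keep their natural order.

```python
import pandas as pd

data = {'s': [1, 1, 1, 1],
        'am': ['cap', 'cap', 'sea', 'sea'],
        'cat': ['i', 'o', 'i', 'o'],
        'col1': [.55, .44, .33, .22],
        'col2': [.77, .66, .55, .44]}
df = pd.DataFrame(data=data).set_index(['s', 'am', 'cat'])

# Custom orders for two index levels
SORT_VALS = {"am": ["sea", "cap"], "cat": ["o", "i"]}


def sorter(column):
    """Replace listed values by their position in SORT_VALS; leave others unchanged."""
    if column.name not in SORT_VALS:
        return column
    mapper = {val: order for order, val in enumerate(SORT_VALS[column.name])}
    return column.map(mapper)


new_df = df.sort_values(by=["s", "am", "cat"], key=sorter)
print(new_df)
```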

How to count number of peaks in graph ? -graph analysis-

I have a curve that contains several peaks, and I want to know how to count them.
Sample Data:
0.10 76792
0.15 35578
0.20 44675
0.25 52723
0.30 27099
0.35 113931
0.40 111043
0.45 34312
0.50 101947
0.55 100824
0.60 20546
0.65 114430
0.70 113764
0.75 15713
0.80 83133
0.85 79754
0.90 17420
0.95 121094
1.00 117346
1.05 22841
1.10 95095
1.15 94999
1.20 18986
1.25 111226
1.30 106640
1.35 34781
1.40 66356
1.45 68706
1.50 21247
1.55 117604
1.60 114268
1.65 26292
1.70 88486
1.75 89841
1.80 49863
1.85 111938
The first column holds the x values and the second column the y values.
I want to write a macro or formula that tells me how many peaks there are in this graph.
Note: this graph is actually plotted and exported from MATLAB, so if there is a way to do this from MATLAB, that would also be great!
If your data is in A1:B36, then this formula
=SUMPRODUCT(--(B2:B35>B1:B34),--(B2:B35>B3:B36))
returns 11 peaks
It checks whether B2 is higher than both B1 and B3 and, if so, counts it as a peak; then it checks whether B3 is higher than both B2 and B4, and so on.
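The same strict local-maximum rule can be written in a few lines of Python, which also makes it easy to see which points are counted. This sketch uses the y values from the question; peak_indices is a hypothetical helper name. Like the SUMPRODUCT formula, it ignores the two endpoints and does not count flat-topped peaks where two neighbouring values are equal. (For noisier data, scipy.signal.find_peaks offers options such as prominence, but that goes beyond the formula above.)

```python
# y values from the question (x runs from 0.10 to 1.85 in steps of 0.05)
y = [
    76792, 35578, 44675, 52723, 27099, 113931, 111043, 34312, 101947,
    100824, 20546, 114430, 113764, 15713, 83133, 79754, 17420, 121094,
    117346, 22841, 95095, 94999, 18986, 111226, 106640, 34781, 66356,
    68706, 21247, 117604, 114268, 26292, 88486, 89841, 49863, 111938,
]


def peak_indices(values):
    """0-based indices of interior points strictly higher than both neighbours."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    ]


peaks = peak_indices(y)
print(len(peaks))
```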
[Updated: VBA request added]
Sub GetMax()
    ' Label every point in the first series of the first chart on the
    ' active sheet that is higher than both of its neighbours
    Dim chr As ChartObject
    Dim chrSeries As Series
    Dim vals As Variant
    Dim lngrow As Long

    ' Exit quietly if there is no chart or series
    On Error Resume Next
    Set chr = ActiveSheet.ChartObjects(1)
    Set chrSeries = chr.Chart.SeriesCollection(1)
    On Error GoTo 0
    If chrSeries Is Nothing Then Exit Sub

    ' Read the series values once into a Variant array
    vals = chrSeries.Values

    For lngrow = 2 To UBound(vals) - 1
        If vals(lngrow) > vals(lngrow - 1) And vals(lngrow) > vals(lngrow + 1) Then
            ' Peak found: show a centred data label with a border
            chrSeries.Points(lngrow).ApplyDataLabels
            With chrSeries.Points(lngrow).DataLabel
                .Position = xlLabelPositionCenter
                .Border.Color = 1
            End With
        End If
    Next lngrow
End Sub
