Calculations within a spotfire column - calculated-columns

My firm decided on Spotfire and then appointed me Spotfire guru (because I was there), and now I am figuring it out. We work with really large data sets (the one I'm asking about has a couple of million rows). The column in question holds Z data, and the difference between each pair of sequential cells shows how far the machine it represents has moved. E.g.:
Name Time Stamp X Y Lat Long Z Delta Z
Name 28.3.2018 10:59 0,02438 0,02888 60,49 26,96 0,037794693
Name 28.3.2018 10:59 0,02671 0,03768 60,49 26,96 0,046186649 0,00839
Name 28.3.2018 10:59 0,02409 0,0294 60,49 26,96 0,038009053 0,00818
Name 28.3.2018 11:00 0,02676 0,03768 60,49 26,96 0,046215582 0,00821
Name 28.3.2018 11:00 0,02393 0,02941 60,49 26,96 0,037915604 0,00830
Name 28.3.2018 11:00 0,02669 0,03761 60,49 26,96 0,046117981 0,00820
Name 28.3.2018 11:00 0,02341 0,02966 60,49 26,96 0,037785496 0,00833
Name 28.3.2018 11:00 0,02673 0,03758 60,49 26,96 0,046116692 0,00833
Name 28.3.2018 11:00 0,02329 0,0297 60,49 26,96 0,037742736 0,00837
Name 28.3.2018 11:00 0,02205 0,0306 60,49 26,96 0,037716873 0,00003
So, above is a dump of a few rows from the output, which I have taken from Excel. I run a Python script on the source data (which is JSON) and it outputs the above except for the last two columns, which I need to calculate. I can use Spotfire to make the Z column (that's simple Pythagoras, as X and Y are along and up from the reference point), but what I need is the change in Z (Delta Z) through the day. In Excel it's easy: the formula is "=ABS(G3-G2)", and when pasted down the whole column it becomes "=ABS(G4-G3)", "=ABS(G5-G4)" and so on. I can't do it in Excel as the file is too big.
The formula doesn't take the very first Z as a fixed anchor point; it compares each row with the one before it. The data then lets me see how far the machine has moved in a certain period.
It is this that I can't solve in Spotfire. All help appreciated.

UPDATE: thanks for including the timestamp column and clarifications. I still needed to use the [Row] column I created because I wanted to make sure things go in the correct order, and the timestamp isn't granular enough to ensure that. if you have a timestamp with seconds or milliseconds in your actual dataset, I suggest to use that over a [Row].
that said, I don't see too big a difference from the original results, and I think my answer below still almost completely works. the biggest difference is that the blank row for [Delta Z] is at the top of the data set instead of the bottom. I've accounted for this by changing the expression to:
Abs([Z] - First([Z]) OVER Previous([Row]))
here's the resulting table. [Delta Z] is the results column you posted above and [DZ_1] is my new column:
DeltaZ Z Row DZ_1
0.037794693 1
0.00839 0.046186649 2 0.008391956
0.00818 0.038009053 3 0.008177596
0.00821 0.046215582 4 0.008206529
0.0083 0.037915604 5 0.008299978
0.0082 0.046117981 6 0.008202377
0.00833 0.037785496 7 0.008332485
0.00833 0.046116692 8 0.008331196
0.00837 0.037742736 9 0.008373956
0.00003 0.037716873 10 2.5863000000001E-05
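since the question mentions a Python preprocessing script, here is a minimal sketch of the same row-over-row calculation outside Spotfire, which can be handy for cross-checking a few rows. the function name delta_z is hypothetical, and the sketch assumes the Z values are already in the correct row order:

```python
from typing import List, Optional


def delta_z(z_values: List[float]) -> List[Optional[float]]:
    """Absolute change in Z from the previous row; None for the first row."""
    if not z_values:
        return []
    deltas: List[Optional[float]] = [None]  # the first row has no previous row
    for prev, curr in zip(z_values, z_values[1:]):
        deltas.append(abs(curr - prev))
    return deltas


# First five Z values copied from the table above, in row order.
z = [0.037794693, 0.046186649, 0.038009053, 0.046215582, 0.037915604]

for row, d in enumerate(delta_z(z), start=1):
    print(row, "" if d is None else round(d, 9))
```

this mirrors the Previous([Row]) version of the expression: the blank ends up on the first row.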
as an aside, you can adjust the number of decimals shown to whatever you like by going to Edit » Column Properties, selecting the column in question, choosing the Formatting tab, and finally setting the Decimals dropdown as desired.
first, welcome to StackOverflow. in the future, please be prepared to provide a Minimal, Complete, and Verifiable Example. in terms of Spotfire, that means a sample dataset (in text) that I can copy-paste into Spotfire, including a column showing your expected results. you can create this in Excel or Notepad. please understand that I'm taking time out of my day to help you with your problem, and I ask that you make it as simple as possible for me to do so.
second, welcome to Spotfire! I learned the same way as you. I strongly recommend asking your employer to pay for the TIBCO Spotfire online courses as they will provide a great base of understanding for using the tool.
with that out of the way, I've made the following assumptions about your dataset since you've not fully answered my questions about your dataset. if my assumptions are incorrect, please answer my questions about your dataset.
1. there is no column indicating some kind of order, such as a timestamp or row number
2. you do not expect any result in the final row of the dataset
to satisfy your requirements, first I needed to create a column that removes assumption #1 above. I've called this column [Row] and its expression is simply:
RowId()
this will output the literal row number for that row (as opposed to the BaseRowId() function, which shows the visual row number, after any marking and filters are applied).
I created this because in order to compare rows against one another, Spotfire requires some kind of indicator as to which row comes before the next one.
then I created a second column, [Delta Z] with the following expression:
Abs([Z] - First([Z]) OVER Next([Row]))
in other words, "for each row, take the current value of [Z] for that row, subtract from it the first value of [Z] found over the following rows (i.e., the next row's value), and take the absolute value of the result."
this produces the following:
Z Row Delta Z
0.24157 1 0.03424
0.27581 2 0.03195
0.24386 3 0.000149999999999983
0.24371 4
you can hide [Row] in any table visualization through the Properties dialog for that visualization, but you cannot delete it completely.

Related

Excel: How to find six different combinations of words in string?

I have been working for several days on this and have researched everything looking for this answer. I'd appreciate any help you can give.
In Excel I am searching a string of text in column A:
Bought 1 HD Sep 3 2021 325.0 Call # 2.75
I am detecting the first word (in this case "Bought") and detecting the last word before "#" symbol (in this case "Call").
I am then detecting the price following the "#" symbol (in this case "2.75"). This number will go into column B (header "Open") or column C (header "Close") depending on the combination of words found:
Sold/Put=Close
Sold/Call=Open
Bought/Put=Open
Bought/Call=Close
Sold (by itself)=Open
Sold (by itself)=Close.
Bought 1 HD Sep 3 2021 325.0 Call # 2.75
The combination found in the above string is: "Bought Call". Therefore the number at the end ("2.75"), goes into "Open" column.
Here's another example:
Sold 4 AI Sep 17 2021 50.0 Put # 1.5
The combination found in the above string is: "Sold Put". Therefore the number at the end ("1.5") goes into "Close" column.
I am currently using this formula to determine if the string contains "Sold" and "Call" and get the desired number and it does work:
=IF(AND(
ISNUMBER(SEARCH({"Sold","Call"},A10))),
TRIM(MID(A10,SEARCH("#",A10)+LEN("#"),255))," ")
But, I don't know how to search for all the other possible combinations.
The point behind this is to be able to paste the transaction from the broker and have most of the entry process automated. I'm sure many will benefit from this as I've not found anything like this.
I'd appreciate any help and if possible, an explanation of the formula so I can better learn.
Thanks!
I think you have the right idea, but would just extend the IF statement.
Something like the below might work for you:
=IF(ISNUMBER(SEARCH("Call", $A1)),
IF(ISNUMBER(SEARCH({"Bought","Sold"}, $A1)),
NUMBERVALUE(RIGHT($A1, LEN($A1)-SEARCH("#", $A1))),""),
IF(ISNUMBER(SEARCH({"!!!","!!!","Bought","Sold"}, $A1)),
NUMBERVALUE(RIGHT($A1, LEN($A1)-SEARCH("#", $A1))),""))
Just enter in column B and drag down; columns B through E should fill as needed.
Note that the search for "!!!" is just random characters, it can be anything that you don't think has a good chance of appearing in the string.
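To make the lookup logic easier to follow outside of a nested formula, here is a rough Python sketch of the same idea: take the first word, the last word before the "#", and the number after it, then look the word pair up in a rule table. The names RULES, parse_trade and target_column are hypothetical, and the table only holds the four two-word rules from the question (the single-word rules are left out because the question lists them ambiguously):

```python
from typing import Optional, Tuple

# Two-word rules as listed in the question (assumption: adjust to your own rules).
RULES = {
    ("Sold", "Put"): "Close",
    ("Sold", "Call"): "Open",
    ("Bought", "Put"): "Open",
    ("Bought", "Call"): "Close",
}


def parse_trade(text: str, marker: str = "#") -> Tuple[str, str, float]:
    """Return (first word, last word before the marker, price after the marker)."""
    before, _, after = text.partition(marker)
    words = before.split()
    return words[0], words[-1], float(after.strip())


def target_column(text: str) -> Optional[str]:
    """Name of the column the price should go into, or None if no rule matches."""
    action, kind, _ = parse_trade(text)
    return RULES.get((action, kind))


line = "Sold 4 AI Sep 17 2021 50.0 Put # 1.5"
print(parse_trade(line))    # ('Sold', 'Put', 1.5)
print(target_column(line))  # Close
```

A string without the marker raises a ValueError here, which is roughly the counterpart of returning "" in the formula.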
Here/screenshots refer:
(requires Office 365 compatible version Excel)
Main lookup
=LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($J$7:$J$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))
EDIT:
Other Excel versions:
=IFERROR(INDEX($J$7:$J$12,MATCH(1,IF($I$7:$I$12="",MATCH("*"&$H$7:$H$12&"*",B4,0)*ISNUMBER(MATCH("*"&$I$7:$I$12&"*",B4,0)),MATCH("*"&$H$7:$H$12&"*",B4,0)*MATCH("*"&$I$7:$I$12&"*",B4,0)),0)),)
(all that falls away is the 'Let' formula, replacing fn_1 and fn_2 with respective functions in index formula within the let making first equation somewhat longer, but otherwise identical)
Example applications
Have provided 2 examples of how one might customize to insert numeric in one of the columns (the key part to this question is really how to do lookup in first instance, from thereon it's a matter of finetuning/taking appropriate action)...
Assuming calls/buys are "long" position and strike price go in first col (here, D), and puts/sales are "short" position with strike price going in 2nd col (here, E):
Long - insert strike price col D
=IF(LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))=1,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
EDIT
Other Excel versions:
=IF(IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",MATCH("*"&$H$7:$H$12&"*",B4,0)*ISNUMBER(MATCH("*"&$I$7:$I$12&"*",B4,0)),MATCH("*"&$H$7:$H$12&"*",B4,0)*MATCH("*"&$I$7:$I$12&"*",B4,0)),0)),)=1,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
Short - insert strike price col E
=IF(LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))=2,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
EDIT
Other Excel versions:
Follow same routine in previous Edits (remove Let, replace fn_1 & fn_2 with respective formulae...)
Note the similarity in all 3 equations above: the 2nd and 3rd contain the 1st (effectively they just wrap a big old 'if' statement around the 1st, use the lookup_2 col (here, col K), and use mid/search to extract the rate after the hashtag).
Assumes you don't have other hashtags in the sentence..
Customize as required.

Identify overlapping configurations in Excel

I'm setting up a configuration excel sheet to be imported into a database.
It has four columns
Equipment Fleet Start Date End Date Highlight Me
=============================================
A X 1-Jan-20 5-Jan-20 X
A Y 6-Jan-20
B C 1-Jan-20 3-Jan-20
B D 4-Jan-20 10-Jan-20
A Z 3-Jan-20 X
A Z 5-Jan-20 X
I need to identify and highlight overlapping configs
I'd like lines 1, 5 and 6 to be highlighted.
They are all a configuration for the same Equipment, but their configuration dates overlap
Fleet is that attribute we are configuring for the date range but has no bearing on the validation
Constraints:
I'd like to use tables (not named ranges) for this. My table is called tblFleetConfig
Yes I could do this in VBA but I don't want to deal with trusted workbooks etc. etc.
So far I have pasted this into a column on the right
=
(tblFleetConfig[@[Start Date]] >= tblFleetConfig[Start Date])
*
(tblFleetConfig[@[Start Date]] <= tblFleetConfig[End Date])
*
(tblFleetConfig[@Equipment]=tblFleetConfig[Equipment])
The result I'm getting is a 1 for the first line and 0 for every other line.
Clearly I don't understand this syntax and I'm interested in learning.
You asked a complicated one, those blanks throw a wrench into it.
Your formula is the right syntax, you need to wrap it in a SUMPRODUCT() though.
=SUMPRODUCT((tblFleetConfig[End Date]<=tblFleetConfig[@[End Date]])*(tblFleetConfig[Start Date]>=tblFleetConfig[@[Start Date]])*(tblFleetConfig[Equipment]=tblFleetConfig[@Equipment]))
This is your formula's approach wrapped in a SUMPRODUCT().
This will return a 1 if there is a single occurrence and a number greater than 1 if multiple.
Let me know if it works.
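For comparison, here is a small Python sketch of the general pairwise overlap test, which may help when checking what the worksheet formula should flag. It uses the standard interval check (two ranges overlap when each starts on or before the other ends). The rows are hypothetical and all have end dates, so blanks are not handled; the function name overlap_counts is also made up for illustration:

```python
from datetime import date
from typing import List, Tuple

Row = Tuple[str, date, date]  # (equipment, start date, end date)


def overlap_counts(rows: List[Row]) -> List[int]:
    """For each row, count the OTHER rows with the same equipment and overlapping dates."""
    counts = []
    for i, (eq_a, start_a, end_a) in enumerate(rows):
        n = 0
        for j, (eq_b, start_b, end_b) in enumerate(rows):
            same_equipment = i != j and eq_a == eq_b
            # Two closed ranges overlap when each one starts on or before the other ends.
            if same_equipment and start_a <= end_b and start_b <= end_a:
                n += 1
        counts.append(n)
    return counts


rows = [
    ("A", date(2020, 1, 1), date(2020, 1, 5)),
    ("A", date(2020, 1, 6), date(2020, 1, 9)),
    ("B", date(2020, 1, 1), date(2020, 1, 3)),
    ("A", date(2020, 1, 3), date(2020, 1, 4)),
]
print(overlap_counts(rows))  # [1, 0, 0, 1]
```

Unlike the SUMPRODUCT() above, this sketch skips the row itself, so any count greater than 0 marks a row to highlight.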

Excel CUBEVALUE & CUBESET count records greater than a number

I am writing a series of queries to my workbook's data model to retrieve the number of documents by Category_Name which are more than a certain number of days old (e.g. >=650).
Currently this formula (entered in cell C3) returns the correct number for a single Days Old value (=3).
=CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]",
"[EDD_Report_10-01-18].[Days Old].[34]")
How do I return the number of documents for Days Old values >=650?
The worksheet looks like:
A B C
1 Date PL Count of Docs
2 10/1/2018 ALD 3
3 ...
UPDATE: I tried the suggestion in @ama's answer below, but the expression in step B did not work.
However, I created a subset of the Days Old values using
=CUBESET("ThisWorkbookDataModel",
"{[EDD_Report_10-01-18].[Days Old].[all].[650]:[EDD_Report_10-01-18].[Days Old].[All].[3647]}")
The cell containing this cubeset is referenced as the third Member_expression of the original CUBEVALUE formula. The limitation is now that the values for the beginning and end must be members of the Days Old set.
This is limiting, in that, I was hoping for a more general test for >=650 and there is no way to guarantee that specific values of Days Old will be in the query.
First time I hear about CUBE, so you got me curious and I did some digging. Definitely not an expert, but here is what I found:
MDX language should allow you to provide value ranges in the form of {[Table].[Field].[All].[LowerBound]:[Table].[Field].[All].[UpperBound]}.
A. Get the total number of entries:
D3 =CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]",
"[EDD_Report_10-01-18].[Days Old].[All]")
B. Get the number of entries less than 650:
E3 =CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]",
"{[EDD_Report_10-01-18].[Days Old].[All].[0]:[EDD_Report_10-01-18].[Days Old].[All].[649]}")
Note I found something about using .[All].[650].lag(1)} but I think for it to work properly your data might need to be sorted?
C. Subtract
C3 =D3-E3
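The arithmetic behind steps A to C can be sanity-checked with a tiny Python sketch. The Days Old values below are hypothetical, and count_at_or_above is just an illustrative name; the point is only that "total minus the count below the threshold" equals "the count at or above the threshold":

```python
from typing import List


def count_at_or_above(values: List[int], threshold: int) -> int:
    """Total count minus the count below the threshold (steps A, B and C)."""
    total = len(values)                                # step A
    below = sum(1 for v in values if v < threshold)    # step B
    return total - below                               # step C


days_old = [3, 34, 649, 650, 651, 1200, 3647]  # hypothetical sample values
print(count_at_or_above(days_old, 650))  # 4
```

The subtraction does not need 650 itself to be present in the data, only the lower range has to resolve.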
Alternatively, go for the quick and dirty:
=CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]",
"{[EDD_Report_10-01-18].[Days Old].[All].[650]:[EDD_Report_10-01-18].[Days Old].[All].[99999]}")
Hope this helps and do let me know, I am still curious!

Excel - averaging an n amount of rows based on condition in prior column

I have this table in excel:
Date value
1/2/1970 100.00
1/5/1970 99.99
1/6/1970 100.37
1/7/1970 100.74
1/8/1970 101.26
1/9/1970 100.74
1/12/1970 100.79
1/13/1970 101.27
1/14/1970 101.95
1/15/1970 101.97
1/16/1970 101.76
1/19/1970 102.21
1/20/1970 102.70
1/21/1970 102.00
1/22/1970 101.46
1/23/1970 101.49
1/26/1970 100.97
1/27/1970 101.45
1/28/1970 101.70
1/29/1970 102.08
1/30/1970 102.19
2/2/1970 102.02
2/3/1970 101.85
These are values that I have daily, and I need to construct a sheet that takes a monthly index of the daily values, example below:
date index
1/31/1970 some_index
2/28/1970 some_index
3/31/1970 some_index
4/30/1970 some_index
I could only get this far when it came to getting the index of 30 days:
=AVERAGE(INDEX(B:B,1+30*(ROW()-ROW($C$1))):INDEX(B:B,30*(ROW()-ROW($C$1)+1)))
I'm just not sure how to structure this in the most efficient, yet correct, way possible. Not all months have the same number of days, so I was hoping to take all of the next n rows starting from the first date of each month, for example; sometimes certain days are also missing. I can't think of a catch-all approach.
With 1/31/1970 in C1 try this,
=averageifs(daily!b:b, daily!a:a, "<="&c1, daily!a:a, ">="&eomonth(c1, -1)+1)
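The same month-bucket idea can be sketched in Python, which may help to verify a few months by hand. This is only an illustration under the assumption that "index" means the plain average of whatever daily values exist in that month; the function name monthly_averages is hypothetical, and the sample rows are the last four from the question:

```python
import calendar
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def monthly_averages(daily: List[Tuple[date, float]]) -> Dict[date, float]:
    """Average the daily values per calendar month, keyed by the month-end date."""
    buckets: Dict[date, List[float]] = defaultdict(list)
    for day, value in daily:
        last_day = calendar.monthrange(day.year, day.month)[1]
        buckets[date(day.year, day.month, last_day)].append(value)
    return {end: sum(vals) / len(vals) for end, vals in sorted(buckets.items())}


daily = [
    (date(1970, 1, 29), 102.08),
    (date(1970, 1, 30), 102.19),
    (date(1970, 2, 2), 102.02),
    (date(1970, 2, 3), 101.85),
]
for month_end, avg in monthly_averages(daily).items():
    print(month_end, round(avg, 3))
```

Missing days and months of different lengths need no special handling, because rows are grouped by their own date rather than by a fixed count of 30.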
A PivotTable might be more convenient.

Any simple way to do VLOOKUP combine "linear interpolation" in excel?

I'm making an Excel sheet for calculating the z-score for infant weight-for-age (inputs: "Baby Month Age" and "Baby weight"). To do that, I first need to get the LMS parameters for a specific month from the table below.
http://www.who.int/childgrowth/standards/tab_wfa_boys_p_0_5.txt
(For integer month numbers, this can be done with VLOOKUP without issue.) For non-integer month numbers, I need some kind of "linear interpolation" approach to get approximate LMS data.
The problem is that neither the TREND method nor the VLOOKUP method works for me. TREND does not work because the raw data, such as the L parameter, is not linear; if I use TREND, the returned values for the first several months are far from the existing data. VLOOKUP just finds the closest month's data.
I had to use multiple MATCH and INDEX calls to do the "linear interpolation" myself. However, I wonder whether there is an existing function for that?
My current formula for L parameters is below:
=MOD([Month Age],1)*(INDEX('WHO BOY AGE WEIGHT'!A:D,MATCH([Month Age],'WHO BOY AGE WEIGHT'!A:A)+1,2)-INDEX('WHO BOY AGE WEIGHT'!A:D,MATCH([Month Age],'WHO BOY AGE WEIGHT'!A:A),2))+INDEX('WHO BOY AGE WEIGHT'!A:D,MATCH([Month Age],'WHO BOY AGE WEIGHT'!A:A),2)
If we assume that months always increment by 1 (no gaps in the month data), you can use something like this formula to interpolate between the two values surrounding the given non-integer value:
=(1-MOD(2.3, 1))*VLOOKUP(2.3,A:S,2)+MOD(2.3, 1)*VLOOKUP(2.3+1,A:S, 2)
Which interpolates L(2.3) from data of L(2) = .197 and L(3) = .1738, resulting in .19004.
You can replace 2.3 by any cell reference. You can also change the lookup column 2 for L into 3 for M, 4 for S etc.
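As a cross-check of the arithmetic, here is a minimal Python sketch of the same linear interpolation. The function name interpolate is hypothetical, and the table holds only the two L values quoted above (L(2) = .197 and L(3) = .1738), not the full WHO table:

```python
from bisect import bisect_right
from typing import List


def interpolate(xs: List[float], ys: List[float], x: float) -> float:
    """Linear interpolation of y at x; xs must be sorted and hold at least two points."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the table range")
    # Index of the table row at or just below x (clamped so that i + 1 exists).
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])  # fraction of the way to the next row
    return (1 - t) * ys[i] + t * ys[i + 1]


months = [2, 3]
l_values = [0.197, 0.1738]
print(round(interpolate(months, l_values, 2.3), 5))  # 0.19004
```

Unlike the MOD-based formula, this version divides by the actual gap between rows, so it does not rely on months incrementing by exactly 1.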
As for whether there is some direct "interpolate" function in Excel: not that I know of, although Excel does have a good arsenal of functions for statistical estimation.

Resources