Identify when values in a column change in Spotfire - calculated-columns

I am trying to create a calculated column that flags/counts the changes in values across rows in another column, in Spotfire. Below is an example of the data types I'm looking at and the desired results.
My hope is that for each Location, ordered by Time, I can identify when the value of "colors" changes and keep a running count, so that each cluster of identical values between changes gets the same label (Cluster Desire 1) within its Location. It would be best if the running count of clusters restarted at each Location, but this is not crucial. Any help would be more than appreciated!

I thought of a way to do it, relying on one intermediate column (I used two just to make it a bit clearer).
First, the concatenation of the [Color] values of all rows up to and including the current one within its Location, called [concatString]:
Concatenate(Concatenate([Color]) over (Intersect([Location],AllPrevious([Time]))),', ')
Spotfire defaults to comma followed by space as a separator: I could not find a way of changing that in this kind of expression.
Then, within each [concatString], I remove repeated values. The complication is that the last value is not followed by comma+space, and I did not manage to make my regular expression account for that. My workaround was to append a final comma+space to [concatString], hence the outer Concatenate(..).
The formula for the column without repetitions, [consolidatString], is:
RXReplace([concatString],"(\\w+\,\\s)\\1+","$1","g")
What we now have is a distinct value for each run of rows we want to group. We can then simply rank [consolidatString] to get the desired column:
DenseRank([consolidatString],[Location])
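For checking the expected labels outside Spotfire, here is a minimal Python sketch of the same goal. It is not the concatenate/regex/rank approach above but a plain loop that increments a counter whenever the color changes and restarts it at each location. The function name `label_runs` and the sample rows are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

def label_runs(rows):
    """rows: iterable of (location, time, color) tuples.

    Returns (location, time, color, cluster) tuples sorted by location
    and time, where cluster counts the runs of equal colors and
    restarts at 1 for every location.
    """
    labeled = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for location, group in groupby(ordered, key=itemgetter(0)):
        cluster = 0
        previous = object()  # sentinel that never equals a real color
        for _, time, color in group:
            if color != previous:  # value changed: start a new cluster
                cluster += 1
                previous = color
            labeled.append((location, time, color, cluster))
    return labeled

# Hypothetical sample data, not taken from the question.
rows = [
    ("A", 1, "red"), ("A", 2, "red"), ("A", 3, "blue"), ("A", 4, "red"),
    ("B", 1, "green"), ("B", 2, "green"), ("B", 3, "red"),
]
for row in label_runs(rows):
    print(row)
```

Note that a color reappearing after a change (the last "red" in location A) gets a new cluster number, which matches the "label each run between changes" requirement.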

Related

Taking means of irregular amounts data

I'm not able to take the means for a large dataset because the number of attributes per person is irregular.
I have posted a simplified case for the problem. It explains the problem very well.
An idea that I came up with: make a filter to condition on a single attribute. However, I still don't see a way to do this efficiently (other than doing it all by hand).
see excel file:
All help is much appreciated.
I'm basically looking for a function/method to achieve taking means of all different attributes conditioned on each person for a large dataset without doing it by hand.
You can use AVERAGEIFS() inside an IF:
=IF(OR(A2<>A1,B2<>B1),AVERAGEIFS(C:C,A:A,A2,B:B,B2),"")
The first part of the IF tests whether the row starts a new group, that is, whether either the person or the attribute has changed from the previous row. If so, AVERAGEIFS() returns the average of that group; otherwise the formula returns a blank.
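To make the logic of that formula concrete, here is a rough Python sketch of the same calculation. It assumes rows are (person, attribute, value) tuples already sorted so that each group is contiguous, as the formula does; the function name `group_means` and the sample data are invented for illustration.

```python
from collections import defaultdict

def group_means(rows):
    """rows: list of (person, attribute, value) tuples.

    Returns one output per input row: the group mean on the first row
    of each (person, attribute) run, and "" on the other rows.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for person, attribute, value in rows:
        entry = totals[(person, attribute)]
        entry[0] += value
        entry[1] += 1

    out = []
    previous_key = None
    for person, attribute, _ in rows:
        key = (person, attribute)
        if key != previous_key:      # like OR(A2<>A1, B2<>B1)
            total, count = totals[key]
            out.append(total / count)  # like AVERAGEIFS over the group
        else:
            out.append("")
        previous_key = key
    return out

# Hypothetical sample data.
rows = [
    ("Ann", "height", 2.0), ("Ann", "height", 4.0), ("Ann", "weight", 10.0),
    ("Bob", "height", 3.0), ("Bob", "height", 5.0), ("Bob", "height", 7.0),
]
print(group_means(rows))
```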
What you want to do can be accomplished very simply with a pivot table.
Select one of the cells inside the range of data you want to process (see this video for general use of a pivot table: https://www.youtube.com/watch?v=iCiayB6GrpQ ).
Go to the Insert tab and insert a pivot table.
Once you have it, check people, attribute, and values. Then drag people and attribute into Rows, drag value into the Values window, open the drop-down list, and change it from Sum of value to Average, and you should be done. https://i.stack.imgur.com/nYEzw.png

Spotfire DenseRank by category, do I use OVER?

I'm trying to rank some data in spotfire, and I'm having a bit of trouble writing a formula to calculate it. Here's a breakdown of what I am working with.
Group: the test group
SNP: what SNP I am looking at
Count: how many counts I get for the specific SNP
What I'd like to do is rank the average # of counts that are present for each SNP, within the group. Thus, I could then see, within a group, which SNP ranks #1, #2, etc.
Thanks!
TL;DR Disclaimer: You can do this, though if you change your cross table frequently, it may become a giant hassle. Make sure to double-check that the logic is what you'd expect after any modification. Proceed with caution.
The basis of the Custom Expression you seem to be looking for is as follows:
Max(DenseRank(Count() OVER (Intersect([Group],[SNP])),"desc",[Group]))
This gives the total count of rows instead of the average; I was uncertain if "Count" was supposed to be a column or not. If you really do want to turn it into an average, make sure to adjust accordingly.
If all you have is the Group and the SNP nested on the left, you're done and good to go.
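As a cross-check of what the expression is meant to produce, here is a small Python sketch that dense-ranks SNPs within each group, either by row count (as the expression above does) or by the average of a numeric count column. It assumes [Count] is a numeric column; the function name `dense_rank_by_group`, the `use_average` flag, and the sample rows are hypothetical.

```python
from collections import defaultdict

def dense_rank_by_group(rows, use_average=False):
    """rows: list of (group, snp, count) tuples.

    Returns {(group, snp): rank}, ranking in descending order within
    each group. Ties share a rank and no ranks are skipped.
    """
    stats = defaultdict(lambda: [0, 0.0])  # key -> [row count, sum of counts]
    for group, snp, count in rows:
        stats[(group, snp)][0] += 1
        stats[(group, snp)][1] += count

    by_group = defaultdict(dict)
    for (group, snp), (n, total) in stats.items():
        by_group[group][snp] = total / n if use_average else n

    ranks = {}
    for group, snp_scores in by_group.items():
        distinct_desc = sorted(set(snp_scores.values()), reverse=True)
        for snp, score in snp_scores.items():
            ranks[(group, snp)] = distinct_desc.index(score) + 1
    return ranks

# Hypothetical sample data.
rows = [
    ("G1", "rs1", 5), ("G1", "rs1", 7), ("G1", "rs2", 20), ("G1", "rs3", 6),
    ("G2", "rs1", 1), ("G2", "rs2", 9), ("G2", "rs2", 3),
]
print(dense_rank_by_group(rows))                    # rank by number of rows
print(dense_rank_by_group(rows, use_average=True))  # rank by average count
```

The two calls can disagree (here rs1 leads G1 by row count but rs2 leads by average), which is why it matters whether "Count" is a column or a row count.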
First issue: when you filter it down, it gives you the dense rank of only those rows in the filtered set. In some cases this is good, and what you're looking for; in others, it isn't. If you want it to hold fast to its value, regardless of filtering, you can use the same logic, but put it in a Calculated Column instead of in the Custom Expression. Then, in your CrossTable Aggregation, take the max of the Calculated Column value.
Calculated Column:
DenseRank(Count() OVER (Intersect([Group],[SNP])),"desc",[Group])
Second Issue: You want to pivot by something other than Group and SNP. Perhaps, for example, by date? If you throw the Date across the top, it's going to show the same numbers for every month -- the overall numbers. This is not particularly helpful.
To a certain extent, Spotfire's Custom Expressions can handle this modification. If you only ever switch between single columns across the top, you could use the following:
Max(DenseRank(Count() OVER (Intersect([${Axis.Columns.ShortDisplayName}],[Group],[SNP])),"desc",[Group],[${Axis.Columns.ShortDisplayName}]))
That would automatically pull in the column from the top, and show you the ranking for each individual process date.
However, if you start nesting, using hierarchies, renaming your columns, or having multiple aggregations and throwing (Column Names) across the top, you're going to have to pay a great deal of attention to your custom expression. You'll need to do some form of string replacement around Axis.Columns, or use the expression instead of the short display names, and get rid of nests, etc.
Any layer of complexity will require this sort of analysis, so if your end-users have access to modify the pivot table... honestly, I probably wouldn't give them this column.
Third Issue: I don't know if this is an issue, exactly, but you said "Average Counts" -- average per day? Per month? When averaging, you will need to decide if, for example, a month is the total number of days in the month or the number of days that particular payor had data. However you decide to aggregate it, make sure you're doing it on the right level.
For the record, I liked the premise of this question; it's something I'd thought would be useful before, but never took the time to try to implement, since sorting a column or limiting a table to only show the top 10 values is much simpler.

Fix the structure of the SSRS Matrix

I have been working on a small project. I am trying to display all the results in the same row without NULL values. I've already written a small expression to remove the NULL values: "=IIF(IsNothing(Fields!RegisterNo.Value),True,False)". However, the rows seem to move one level down, as shown in the picture ResultMatrix1. I want the results to be on the same level. Can you please tell me whether this is possible and how I can achieve it? Is it something to do with the groupings or something else?
Design Groupings
By default, when you create a table, there is a Row Group called "Details" that is not actually grouped by anything. This causes it to produce one row for each row from the dataset. Since you are trying to group these, you need to make sure that the innermost group is grouped by your Staff Ref No.
In the lower-right cell, you may need to change the expression to use a Max function. This will simply avoid arbitrarily showing blanks when they happen to be sorted before a real value within that group.
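The effect of grouping on Staff Ref No and taking a Max per cell can be illustrated outside SSRS with a short Python sketch. It merges rows that share a key and keeps the largest non-empty value for each field, so blanks never win over a real value. The function name `collapse_rows` and the column names "Mon" and "Tue" are invented for the example; only StaffRefNo mirrors the field discussed here.

```python
def collapse_rows(rows, key):
    """Merge rows sharing `key`, keeping the max non-None value per field."""
    merged = {}
    for row in rows:
        target = merged.setdefault(row[key], {key: row[key]})
        for field, value in row.items():
            if field == key:
                continue
            current = target.get(field)
            if value is None:
                target.setdefault(field, None)  # keep the column, stay blank
            elif current is None:
                target[field] = value
            else:
                target[field] = max(current, value)
    return list(merged.values())

# Hypothetical dataset: one detail row per register entry.
rows = [
    {"StaffRefNo": 1, "Mon": "R100", "Tue": None},
    {"StaffRefNo": 1, "Mon": None, "Tue": "R200"},
    {"StaffRefNo": 2, "Mon": None, "Tue": "R300"},
]
for row in collapse_rows(rows, "StaffRefNo"):
    print(row)
```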

Is there a way to replace cells of a particular value with multiple cells?

Let's say I need to replace any cell that has a value of "outgoing" with multiple cells such as (0), (1), (0), (0), (2), in Excel. Is there a way to actually make this happen? I am doing this for a research project. Every item in my data needs to be coded on five different scales. About 30 items make up almost half of the data. It would be enormously helpful to be able to simply replace the high-frequency items with the five values at once.
I am not sure I completely understand the result you are looking for but here goes:
How about using the Find and Replace functionality to replace all instances of "outgoing" with "(0),(1),(0),(0),(2)", and then using the Text to Columns functionality to split the single column containing "(0),(1),(0),(0),(2)" into five separate columns, so that each value ends up in its own cell?
You would need to split based on a delimiter (probably ",") and you should do all your replacing before you start splitting. Obviously you should test on some sample data first - Find and Replace is not your friend if you are not certain about your data set.
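The two steps (replace first, then split on the delimiter) can be sketched in Python as follows. This only illustrates the idea, not Excel's actual behavior; the function name `replace_and_split` and the sample cells are hypothetical.

```python
def replace_and_split(cells, replacements, delimiter=","):
    """Replace whole-cell values, then split every cell on `delimiter`.

    cells: list of strings (one column of data).
    replacements: mapping from a cell value to its delimited code string.
    """
    rows = []
    for cell in cells:
        text = replacements.get(cell, cell)  # Find and Replace step
        rows.append(text.split(delimiter))   # Text to Columns step
    return rows

# Hypothetical sample column.
cells = ["outgoing", "shy", "outgoing"]
replacements = {"outgoing": "(0),(1),(0),(0),(2)"}
for row in replace_and_split(cells, replacements):
    print(row)
```

Any cell that already contains the delimiter would be split as well, which is one reason to test on sample data first.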

Find when two different values match in a table

I have a table with a location in one column and a code in another. I want to choose a location and a code and, when both are on the same row, pull out that row's data.
How would I go about this?
The location and code correspond to a value which will then be plotted on a bar graph, as I want to compare all the different combinations of location and code.
I have about 1000 values, so some combinations are repeated many times.
If there are many repeated values, this might help you with that.
When that's solved, I think INDEX and MATCH would be sufficient.
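As a rough picture of what a two-criteria lookup does, here is a minimal Python sketch. It assumes each row is a (location, code, value) tuple; the function names `find_first` and `find_all` and the sample rows are made up. `find_first` returns only the first matching row's value, while `find_all` collects every match, which matters when combinations repeat.

```python
def find_all(rows, location, code):
    """Return the values of all rows where both location and code match."""
    return [value for loc, c, value in rows if loc == location and c == code]

def find_first(rows, location, code):
    """Return the value of the first matching row, or None if there is none."""
    matches = find_all(rows, location, code)
    return matches[0] if matches else None

# Hypothetical sample table.
rows = [
    ("Leeds", "X1", 10), ("York", "X1", 7),
    ("Leeds", "X2", 4), ("Leeds", "X1", 12),
]
print(find_first(rows, "Leeds", "X1"))
print(find_all(rows, "Leeds", "X1"))
```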
