In Tableau, how do I use two parts of a pivoted column for x and y values on a graph?

I'm trying to plot some data (standard curves for analytical chemistry) where the x axis is the mass of a compound I added to a solution, and the y axis is the signal recorded from an instrument (peak height on a mass spectrometer). I'd like Tableau to color code the data by compound (compound A, compound B, compound C, etc.), so that I'd wind up with a graph that looks something like this:
The original structure of my data was like this:
SampleID | Mass A | Mass B | ... | Signal A | Signal B | ...
standard 0 | 0| 0| ... | 0| 0| ...
standard 5 | 2.535| 2.555| ... | 0.494| 1.240| ...
standard 25| 12.675| 12.775| ... | 2.426| 7.235| ...
I know how to make graphs one compound at a time with these original data, but for the purposes of other analyses I'm doing with these data, and because I want multiple compounds on the same graph, I've pivoted them so that the structure is now like this:
SampleID | Compound | Parameter | Value
standard 0 | A | Mass | 0
standard 0 | A | Signal | 0
standard 5 | A | Mass | 2.535
etc.
How do I make a graph where the mass is on the x axis, the signal is on the y axis, and the points are colored by compound? I don't see a good way to do it when my data are in this format. I've tried making one calculated variable where the value = NULL if the parameter is not equal to "Mass" and another where the value = NULL if the parameter is not equal to "Signal", and then putting those pills on the columns and rows, but that's not working. Is there a way to do this in Tableau with data structured in this pivoted form?
Alternatively, is there a way to spread my pivoted data so that the new structure is like this:
SampleID | Compound | Mass | Signal
standard 0 | A | 0| 0
standard 5 | A | 2.535| 0.494
standard 25| A | 12.675| 2.426
standard 0 | B | 0| 0
etc.
and would that work better?
(For R users, that last bit would be the equivalent of the tidyr package gather and spread functions.)

To make the second structure appear like the third, add a calculated field called Mass defined as if Parameter = "Mass" then Value end. Do the same for Signal.
You can then hide the fields Parameter and Value if you like, and work with Mass and Signal instead.
Put AVG(Mass) on the Columns Shelf and AVG(Signal) on the Rows shelf -- AVG, not ATTR. Then finally, put [Sample Id] on detail.

If I had to deal with this, I'd prefer to pre-process the data so that it has the format "SampleID | Compound | Mass | Signal", which would make the Tableau chart straightforward.
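As a rough illustration of that pre-processing step outside Tableau, here is a minimal Python sketch that spreads the long "SampleID | Compound | Parameter | Value" rows into one row per sample and compound. The function name spread and the in-memory list of tuples are assumptions made for this example; the values are taken from the question's tables.

```python
from collections import defaultdict

# Long-format rows: (SampleID, Compound, Parameter, Value), values from the question.
long_rows = [
    ("standard 5", "A", "Mass", 2.535),
    ("standard 5", "A", "Signal", 0.494),
    ("standard 5", "B", "Mass", 2.555),
    ("standard 5", "B", "Signal", 1.24),
    ("standard 25", "A", "Mass", 12.675),
    ("standard 25", "A", "Signal", 2.426),
]


def spread(rows):
    """Collect Mass and Signal for each (SampleID, Compound) pair into one wide row."""
    wide = defaultdict(dict)
    for sample_id, compound, parameter, value in rows:
        wide[(sample_id, compound)][parameter] = value
    # A missing parameter becomes None, much like a NULL in Tableau.
    return [
        (sample_id, compound, params.get("Mass"), params.get("Signal"))
        for (sample_id, compound), params in wide.items()
    ]


wide_rows = spread(long_rows)
for row in wide_rows:
    print(row)
```

The resulting rows can be written to a CSV file and connected to Tableau in place of the pivoted source.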
I think there's a way to achieve the same with the data structure you have, but it's trickier. So, if I understand correctly, you have the data in this form:
SampleId Compound Parameter Value
standard 5 A Mass 2.535
standard 5 A Signal 0.494
standard 5 B Mass 2.555
standard 5 B Signal 1.24
standard 25 A Mass 12.675
standard 25 A Signal 2.426
standard 25 B Mass 12.775
standard 25 B Signal 7.235
1) You can create calculated fields for Mass and Signal using level of detail expressions, that exclude the Parameter granularity:
Mass
{exclude [Parameter] : min(if [Parameter] = 'Mass' then [Value] else NULL end)}
Signal
{exclude [Parameter] : min(if [Parameter] = 'Signal' then [Value] else NULL end)}
That will "collapse" nulls in case Parameter is not included in the view.
2) Using the Scatter Plot visualization, you can pull Mass to columns and Signal to rows, add Compound to Color pane and SampleId to Detail pane. The plot will look like this:

Related

Compute Correlation Dataframe for each Vector Row by Index Python

I have a dataframe with 500 columns indexed by date, with four years of data.
| Date | A | AAL | AAP | AAPL | ABC ......
| 1/2/2004 | 18.442521 |25.954398 |1.38449 |11.528444......
| 1/5/2004 | 18.922795 |25.718507 |1.442394 |11.919131...
| 1/6/2004 | 19.518334 |26.177538 |1.437189 |11.870028....
.
.
. etc...
I would like to calculate the Pearson correlation matrix for each day, so each row. I want to save the matrices by date, in the most space efficient manner readable by R. (Right now my goal is separate sheets, by index date, in Excel. I am open to suggestions.)
I have tried several ways, but this seemed the most promising, because I could not apply the corr() to a df.groupby.
However this method returned empty dataframes, and now I am stuck!
I am looking for a method that doesn't involve iteration.
def do_Corr(df_group):
    """Apply the function to each group in the data and return one result."""
    X = df_group.corr()
    return X

df.groupby([df.index.year, df.index.month, df.index.day]).apply(do_Corr).dropna()
You probably want df.T.corr(). .T transposes the dataframe, so rows become columns, and you can then apply the .corr() method.
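A minimal sketch of that suggestion, using the first few values from the question (the variable names prices and date_corr are assumptions for this example). Note that the result is a single date-by-date correlation matrix: a group holding one row has only one observation per column, so .corr() on it yields NaN everywhere, which is likely why the groupby attempt looked empty after dropna().

```python
import numpy as np
import pandas as pd

# Small excerpt of the question's data: one row per date, one column per ticker.
dates = pd.to_datetime(["2004-01-02", "2004-01-05", "2004-01-06"])
prices = pd.DataFrame(
    {
        "A": [18.442521, 18.922795, 19.518334],
        "AAL": [25.954398, 25.718507, 26.177538],
        "AAP": [1.38449, 1.442394, 1.437189],
        "AAPL": [11.528444, 11.919131, 11.870028],
    },
    index=dates,
)

# Transpose so each date becomes a column, then correlate the columns.
# Each entry compares two dates across all tickers (Pearson by default).
date_corr = prices.T.corr()
print(date_corr)
```

If the file has to be read from R, date_corr.to_csv(...) is one simple option for saving it.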

Dynamic Data Validation lists based on VLookup

I'm trying to add a custom 'discount' list to my spreadsheet.
I've got a table that contains all the data, and has costs for the standard 'used' value, then also the values at a 5% discount and a 10% discount.
Example:
+---------+-------------------+------+------------+-------------+
| Code | Role | Used | Used - 5% | Used - 10% |
+=========+===================+======+============+=============+
| Test001 | Employee | 5.67 | | |
+---------+-------------------+------+------------+-------------+
| Test002 | Junior Technician | 9.80 | 9.31 | 8.38 |
+---------+-------------------+------+------------+-------------+
| Test003 | Project Manager | 15 | | |
+---------+-------------------+------+------------+-------------+
| Test004 | Engineer | 20 | 19 | 17.10 |
+---------+-------------------+------+------------+-------------+
I've then got a Data validation list which returns all of the 'Roles' to select from. On the back of this, the Cost cell is populated.
Example:
+----------+----------+----------+-------+
| Role | VLOOKUP | Discount | Cost |
+==========+==========+==========+=======+
| Employee | | | 5.67 |
+----------+----------+----------+-------+
| Engineer | 5%,10% | 10% | 15.10 |
+----------+----------+----------+-------+
What I want to do is have a list to be populated with 5%, 10% if there is that option. I'd like to achieve this without vba (I could easily achieve this with vba but trying to keep it all in the worksheet)
My VLOOKUP Column is populated using:
=CONCATENATE(IF(VLOOKUP(A2,INDIRECT("Test[[Role]:[Used - 10%]]"), 3, FALSE) <> "", "5%", ""),
IF(VLOOKUP(A2,INDIRECT("Test[[Role]:[Used - 10%]]"), 4, FALSE) <> "", ",10%", ""))
The issue comes when trying to do the data validation. It accepts the formula (tried using the above to no avail in the data validation) but populates the drop down list with just the one value of 5%,10% instead of interpreting it as a csv.
I'm currently using this to attempt to populate the Discount Drop Down
=OFFSET(INDIRECT(ADDRESS(ROW(), COLUMN())),0, -1)
It is possible, assuming your version of Excel has access to the dynamic array functions FILTER and UNIQUE. Let's go through a couple of things; here is a Google Doc where this is demonstrated. I also included an online Excel file*.
1) It isn't necessary to calculate the cost in the setup table (A:E). You can just use a character to mark availability (in some versions it was difficult to make FILTER work with comparisons like <>"", while ="x" worked fine).
2) You can get an array of available discounts by using FILTER, INDEX and MATCH. See Col P. Use INDEX/MATCH to return a single row of the array containing the discounts (in this case D:E), then use that row to filter the top row (D1:E1), which holds the friendly discount names, and return the result as an array.
3) It isn't necessary to concatenate the discount list the way you're doing. You can use TEXTJOIN, FILTER, INDEX and MATCH. See Col I. Just wrap the calculation that generates the array of discount names (step 2) in TEXTJOIN to get a string.
4) The validation is accomplished by referencing the output of step 2. I don't think the data validation dialog can handle the full formula, so I pointed it to Cols O:Q. Col O is included in the validation so that you get an empty spot at the top of the list, but Google Docs seems to strip it out.
5) You can just calculate the discounted cost from the selected option. See Col K. I included the original cost in Col L so you can see it.
* You will need a Microsoft account to view the online Excel file.
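To make the lookup-then-filter logic of the answer easier to follow, here is a small Python sketch of the same idea. It is not an Excel formula, and the names DISCOUNT_NAMES, AVAILABILITY, available_discounts and discount_label are assumptions for this illustration; the "x" marks follow the answer's suggestion of marking availability with a character.

```python
# Header row of the discount columns (the "friendly" names).
DISCOUNT_NAMES = ["5%", "10%"]

# Setup table: an "x" marks that the discount is available for the role.
AVAILABILITY = {
    "Employee": ["", ""],
    "Junior Technician": ["x", "x"],
    "Project Manager": ["", ""],
    "Engineer": ["x", "x"],
}


def available_discounts(role):
    """Pick the role's row (the INDEX/MATCH part), then keep marked headers (FILTER)."""
    marks = AVAILABILITY[role]
    return [name for name, mark in zip(DISCOUNT_NAMES, marks) if mark == "x"]


def discount_label(role):
    """Join the available names into one string (the TEXTJOIN part)."""
    return ",".join(available_discounts(role))


print(available_discounts("Engineer"))
print(discount_label("Engineer"))
```

The list returned by available_discounts corresponds to the range the validation points at, while discount_label corresponds to the display string in the VLOOKUP column.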

EXCEL find the last relative maximum in a array (formula, not VBA)

I have a range containing values such as:
169.7978
168.633
168.5479
168.7819
167.7407
165.4146
165.1232
I don't need the maximum value of the range (i.e., the first cell in this example), but the last relative maximum, which in this case is the fourth cell. Is there a way to get this value without having to write a VBA macro? The formula must be general enough to work with any number of maxima.
It may be a bit limited, but you may start somewhere as below.
Stated array in the OP is:
+----------+---+
| y | x |
+----------+---+
| 169.7978 | 1 |
| 168.633 | 2 |
| 168.5479 | 3 |
| 168.7819 | 4 |
| 167.7407 | 5 |
| 165.4146 | 6 |
| 165.1232 | 7 |
+----------+---+
Given this, you can find relative minima and maxima by comparing each value with its directly adjacent rows, using the following helper columns.
Add a Global_Rank helper column (taken here to be column D, presumably with rank 1 for the largest y) and check whether each row's rank compares the same way against both adjacent rows, using the following formulas (assuming your data is sorted by the x index; formulas start in Row 2 and are filled down).
RelativeMax:
=IF(AND(D2<=D1,D2<=D3),"RelativeMax","")
RelativeMin:
=IF(AND(D2>=D1,D2>=D3),"RelativeMin","")
Modify as needed. Hope this helps.
Edit:
Although...
If you're going to assume the data is ordered properly, you could also compare the y values directly (taken here to be in column B) with =IF(AND(B2>=B1,B2>=B3),"RelativeMax",IF(AND(B2<=B1,B2<=B3),"RelativeMin","")) and skip all the malarkey. This should work with multiple maxima/minima. Please report back with results from your dataset!
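For reference, the neighbour comparison behind these formulas can be sketched in Python as follows. This is only an illustration of the logic, not an Excel solution; the function name last_relative_max is an assumption, and endpoints are deliberately skipped because they have only one neighbour.

```python
def last_relative_max(values):
    """Return (index, value) of the last interior point that is >= both neighbours.

    Indices are 0-based. Returns None if no interior relative maximum exists.
    """
    # Walk backwards over interior points so the first hit is the last maximum.
    for i in range(len(values) - 2, 0, -1):
        if values[i] >= values[i - 1] and values[i] >= values[i + 1]:
            return i, values[i]
    return None


y = [169.7978, 168.633, 168.5479, 168.7819, 167.7407, 165.4146, 165.1232]
print(last_relative_max(y))  # (3, 168.7819): the fourth cell
```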

Convert Excel Raster into Shapefile

I have an excel table in which each cell represents (NOT CONTAINS) a coordinate-pair and a value. For example Sheet 1:Cell A1 contains an X-coordinate which increases 25m downwards; Sheet 2: Cell A1 contains a Y-Coordinate which increases along; and Sheet 3: Cell A1 contains a value. Thus in effect, this is a Raster file made up of 3 Excel data sheets with a resolution of 25m with Sheet 1 representing the X-Axis, Sheet 2 the Y-Axis and sheet 3 a value within the cell.
Table structure excerpt - 200 Columns / 2000 Rows
Table "XCoord"
3544399.00 | 3544399.25 | 3544399.50 | 3544399.75 | 3544340.00 | ...etc
3544231.00 | 3544231.25 | 3544231.50 | 3544231.75 | 3544232.00 | ...etc
3544135.00 | 3544135.25 | 3544135.50 | 3544135.75 | 3544136.00 | ...etc
Table "YCoord"
584449.00 | 584449.25 | 584449.50 | 584449.75 | 584449.00 | ...etc
584431.00 | 5844431.25 | 584431.50 | 584431.75 | 584431.00 | ...etc
584429.00 | 584429.25 | 584429.50 | 584429.75 | 584429.00 | ...etc
Table "Concentration"
0.0023 | 0.0025 | 0.0020 | 0.0027 | 0.0066 | ...etc
0.0011 | 0.0034 | 0.0056 | 0.0078 | 0.0033 | ...etc
0.0016 | 0.0026 | 0.0046 | 0.0003 | 0.0005 | ...etc
So you see, for the cells - the xcoord the ycoord and the concentration can be determined.
This is a raster built with 3 tables. My problem is how to map this into a GIS application. The values in the Table "Concentration" are derivatives calculated out of other tables which include the plume-dispersion parameters. So in effect, this worksheet is a very ingenious way of calculating plumes without using expensive plume modelling software. I am using ArcGIS Advanced (Info), Safe FME, and Excel.
I have to convert this into a raster image or point feature class. Does anyone know how I could translate this data out of excel?
Thanks for any tips,
RB
I'm not sure what you've tried so far and what other technology's available to you, but there are many examples on the internet of using Excel + VBA to generate shapefiles, or using other programming languages like Python.
You may want to take a look at the Geographic Information Systems Stack Exchange site - there are some good examples there of how to interface between Excel and common GIS tools, for example, "How can I convert an Excel file with X and Y columns to a shapefile". The basic info is there, you'll just need to adjust your specific code to work for having data across sheets instead of down rows.
If you wanted to convert your sheets data to a single sheet with x, y, data columns, you could write a VBA script to loop over each cell in the first sheet and extract the data from the same address on the other sheets.
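The same cell-by-cell loop can be sketched in Python instead of VBA. This is a minimal illustration that assumes the three sheets have already been read into equally sized lists of rows (for example from CSV exports); the function names grids_to_points and points_to_csv are hypothetical, and the sample numbers are the first cells of the question's excerpt.

```python
import csv
import io


def grids_to_points(x_grid, y_grid, value_grid):
    """Pair up cells at the same address on all three grids as (x, y, value) rows."""
    points = []
    for x_row, y_row, v_row in zip(x_grid, y_grid, value_grid):
        for x, y, value in zip(x_row, y_row, v_row):
            points.append((x, y, value))
    return points


def points_to_csv(points):
    """Render the points as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["x", "y", "concentration"])
    writer.writerows(points)
    return buffer.getvalue()


# Top-left 2x2 corner of each table from the question.
x_grid = [[3544399.00, 3544399.25], [3544231.00, 3544231.25]]
y_grid = [[584449.00, 584449.25], [584431.00, 584431.25]]
conc_grid = [[0.0023, 0.0025], [0.0011, 0.0034]]

points = grids_to_points(x_grid, y_grid, conc_grid)
print(points_to_csv(points))
```

A table with x, y and value columns like this is the shape that the X/Y import examples mentioned above expect.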
To get a more specific answer, you'll need to post a more specific question regarding what method you're trying to do, and what part you need help on.

Count number of rows where multiple criteria are met

I'm trying to generate a table that shows a count of how many items are in any given status on any given day. My result table has a set of Dates down column A and column headers are various statuses. A sample of my data table with headers looks like this:
Product | Notice | Assigned | Complete | In Office | In Accounting
1 | 5/5/13 | 5/7/13 | 5/9/13 | 5/10/13 | 5/11/13
2 | 5/5/13 | 5/6/13 | 5/8/13 | 5/9/13 | 5/10/13
3 | 5/6/13 | 5/9/13 | 5/10/13 | 5/10/13 | 5/10/13
4 | 5/4/13 | 5/5/13 | 5/7/13 | 5/8/13 | 5/9/13
5 | 5/7/13 | 5/8/13 | 5/10/13 | 5/11/13 | 5/11/13
If my output table were to contain a set of dates in the first column with the statuses as headers, I need a count of how many rows were at the given status and had not yet transitioned to the next status so that in the Notice column, I'd have a count of rows where the Notice Date was <= X AND where the Assigned, Complete, In Office, In Accounting are all greater than X.
I've used a Sum(if(frequency(if statement to get me REALLY close but I feel like I need to have an AND statement within the second IF like this =SUM(IF(FREQUENCY(IF(AND
Here's what I have that won't work:
=SUM(IF(FREQUENCY(IF(AND(Table1[Assigned]<=A279,Table1[[Complete]:[In Accounting]]<=A279),ROW(Table1[[Complete]:[In Accounting]])),ROW(Table1[[Complete]:[In Accounting]]))>0,1))
If I take the "AND" portion out, this works fine except I need it to ONLY count rows where the given status actually has a date so if an "Assigned" date is empty, I don't want that row to be counted for the Assigned column.
Here's an example of what I'd expect to see in the results. I've listed the count in the each column as well as the corresponding product numbers in parenthesis. The corresponding product numbers are for reference only and won't actually be in the result table.
Date | Notice | Assigned | Complete
5/6 | 2 (1,3) | 2 (2,4) | 0
5/7 | 2 (3,5) | 2 (1,2) | 1 (4)
5/8 | 1 (3) | 2 (1,5) | 1 (2)
OK, assuming you have the original data in A1:F6 then with 2nd table headers in B9:D9 and row labels in A10:A12 then you can use this "array formula" in B10
=SUM((B$2:B$6<=$A10)*(MMULT((C$2:$F$6>$A10)+(C$2:$F$6=""),TRANSPOSE(COLUMN(C$2:$F$6)^0))=COLUMNS(C$2:$F$6)))
confirmed with CTRL+SHIFT+ENTER and copied down and across (see screenshot below)
As you can see the results are as per your requirement. If you replace dates with blanks it will still work
MMULT is a way to get a single value from each row even when you are looking at multiple columns.
I used cell references because I think that's easier, especially when copying the formula across and having a reducing range, but you can use structured references if you want.
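The condition the array formula evaluates can be spelled out in Python as a cross-check. This is a sketch of the counting rule only, using the sample data from the question; the names ROWS, STATUSES and count_in_status are assumptions, and a missing date is represented as None.

```python
from datetime import date

STATUSES = ["Notice", "Assigned", "Complete", "In Office", "In Accounting"]


def d(month, day):
    """Shorthand for a date in 2013, the year used in the question."""
    return date(2013, month, day)


# Product number -> status dates, in the same order as STATUSES.
ROWS = {
    1: [d(5, 5), d(5, 7), d(5, 9), d(5, 10), d(5, 11)],
    2: [d(5, 5), d(5, 6), d(5, 8), d(5, 9), d(5, 10)],
    3: [d(5, 6), d(5, 9), d(5, 10), d(5, 10), d(5, 10)],
    4: [d(5, 4), d(5, 5), d(5, 7), d(5, 8), d(5, 9)],
    5: [d(5, 7), d(5, 8), d(5, 10), d(5, 11), d(5, 11)],
}


def count_in_status(rows, status_index, as_of):
    """Count rows that reached the status by as_of and no later status yet."""
    count = 0
    for dates in rows.values():
        reached = dates[status_index]
        if reached is None or reached > as_of:
            continue  # status not reached (or no date recorded)
        later = dates[status_index + 1:]
        # Every later status must be blank or still in the future.
        if all(x is None or x > as_of for x in later):
            count += 1
    return count


for day in (6, 7, 8):
    print(day, [count_in_status(ROWS, i, d(5, day)) for i in range(3)])
```

In this sketch a row with a blank date for the status itself is skipped, which matches the requirement in the question that empty "Assigned" dates are not counted.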
Have you tried using COUNTIFS to count based on multiple criteria? It is fairly well documented here: http://office.microsoft.com/en-us/excel-help/countifs-function-HA010047494.aspx (2007+ only)
Basically, you use it like
=COUNTIFS(first_range_to_check, value_you_want_in_first_range, ...)
where the ... represents as many range/criteria pairs as you want (up to 127 pairs in total). Note that the conditions are combined with AND, so if you have two pairs, both the first pair AND the second pair must be true for a row to be counted.

Resources