Count how many times a particular value appears with respect to a value in another column - Spotfire

I want to count how many times a particular value appears with respect to the value in another column. (Apologies, I am struggling to put it into words properly; maybe that's why I couldn't google it.)
I am using Spotfire, and the actual data set is quite big.
Based on my dummy data, I want five more columns (a, b, c, d, e) that give me counts like the table 'what I want'.
I'd appreciate it if someone could help.
Thanks,
AP

What you're looking for is called a pivot table. It doesn't look quite like what you've got in your example, and because you haven't provided much information about your end goal, I'm assuming it's just a quick example you put together. If that's not the case, please clarify your question with your end goal and I'll update my answer.
To create a pivot table in Spotfire:
Click the Insert menu at the top of the screen.
Choose Transformation...
In the Insert Transformation dialog that appears, choose your data table from the top dropdown and Pivot from the bottom one, then click Add...
Configure the pivot as I've done in the screenshot below.
Click OK and confirm the Insert Transformation dialog.
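Spotfire's pivot is configured through the dialog above, not through code. As a language-neutral illustration of what a count pivot computes (one row per value of the identity column, one count column per category a to e), here is a small Python sketch. The dummy rows and the column roles ("id" and "category") are hypothetical, since the original tables aren't shown.

```python
from collections import Counter, defaultdict

# Hypothetical dummy data: each row is (id, category).
rows = [
    ("x1", "a"), ("x1", "b"), ("x1", "a"),
    ("x2", "c"), ("x2", "e"),
    ("x3", "d"), ("x3", "a"),
]

def pivot_counts(rows, categories):
    """Count how often each category appears per id, like a Count() pivot."""
    counts = defaultdict(Counter)
    for key, cat in rows:
        counts[key][cat] += 1
    # One output row per id, one column per category; Counter gives 0 when absent.
    return {key: [c[cat] for cat in categories] for key, c in counts.items()}

table = pivot_counts(rows, ["a", "b", "c", "d", "e"])
for key, values in table.items():
    print(key, values)
```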

Related

How to get specific cell contents of an Excel sheet at another sheet dynamically?

I have an Excel file with thousands of rows, as follows. The first column contains names and the second column contains the group of each name. I want to list all names belonging to group "A" on another sheet dynamically, because the name and group lists may change. In other words, what command or function should I use to list all names belonging to group "A"?
There are three ways to do this; the options are below. One thing you did not include in your question is what the results should look like.
Formulas like =FILTER(A:B,B:B="A") (FILTER requires Excel 365 or Excel 2021).
Pivot tables: to use this, convert the data to a table, then create a pivot table. This requires a refresh when new data is added.
Power Query: to use this, convert the data to a table, then go to Data > From Table/Range. This requires a refresh when data is added, but you can change the refresh behaviour under Connection Properties in the Data > Refresh All dropdown.
If you want all names in group "A" listed within one cell, I would use option 3 with a Group By step, as discussed here.
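To make options 1 and 3 concrete, here is a small Python sketch of the same logic: filtering names by group (what =FILTER does) and joining each group's names into one string (the single-cell result the Group By approach aims at). The names and groups are made-up sample data.

```python
# Hypothetical sheet contents: (name, group) pairs standing in for columns A:B.
rows = [
    ("Alice", "A"), ("Bob", "B"), ("Carol", "A"), ("Dan", "C"), ("Eve", "A"),
]

def filter_group(rows, group):
    """Return the names whose group matches, like =FILTER(A:A, B:B="A")."""
    return [name for name, g in rows if g == group]

def names_by_group(rows, sep=", "):
    """Join all names of each group into one string, one entry per group."""
    grouped = {}
    for name, g in rows:
        grouped.setdefault(g, []).append(name)
    return {g: sep.join(names) for g, names in grouped.items()}

print(filter_group(rows, "A"))    # ['Alice', 'Carol', 'Eve']
print(names_by_group(rows)["A"])  # Alice, Carol, Eve
```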
If the answer works for you, the expectation is that you checkmark it and upvote it. If it does not work, add a comment below describing the problem you experience. You will need to adapt the answer to your situation.

Two tables with no unique values that I need to match and return value based on date range and zipcode fields

I have two sets of data; the first (Wind Claims) contains a StartDate, EndDate, and Zip Code field. The second (PLRB Wind) contains a Date, Zip Code, and Wind Speed field.
My goal is to get the Wind Speed from the PLRB Wind tab to the Wind Claims tab if the Date from the PLRB Wind tab is between the StartDate and EndDate on the Wind Claims tab AND the Zip Code from the PLRB Wind tab matches the Zip Code on the Wind Claims tab. The point is to identify the wind speed where damage was reported.
I have tried a couple of formulas. With this one I actually got results, but only 1,227 out of 16,822. I wouldn't expect a 100% match, but definitely much more than what I am getting. I think the reason is that this formula looks for the exact date rather than the date range:
=XLOOKUP(Z2&N2,'PLRB Wind'!$I$2:$I$78525&'PLRB Wind'!$D$2:$D$78525,'PLRB Wind'!$M$2:$M$78525,"")
I also tried an Index Match (this is just the Match piece of the formula)
=MATCH(1,IF('PLRB Wind'!D2>=$B$2:$B$16823,IF('PLRB Wind'!D2<='Wind Claims'!$C$2:$C$16823,IF('PLRB Wind'!I2='Wind Claims'!$Z$2:$Z$16823,1))),0)
Thank you in advance for looking at this. I appreciate any help you might be able to provide!
I'd use Power Query for this. Do you know what Power Query is? I was upset when I found out about it, because of all the useful ways I could have been using it before.
You might feel differently, though. Create a new copy of your workbook for this just in case you hate it. :-)
In the "Data" ribbon of Excel, in the Get & Transform section, there's a "From Table" button. Highlight your PLRB table (including the column titles) and click that "From Table" button to create a new query from it. It will create the table and the query.
A power query editor window will pop up, presenting your query as two steps, listed in the middle of the right sidebar. The first step is to get the information from your worksheet. The second step changes the data types. Click the icon to the left of each date column's title to change the type from datetime to date because why not. On the right sidebar, change the query name to PLRB.
Now click "Close & Load" on the home ribbon. It will create a new tab with the results of your table. Leave it for now. You can delete that tab later and it won't delete the query.
So, back in your worksheet, highlight the column-title row and the data rows for the first three columns of the Wind Claims table. Create another query from the table and call it WindClaimsInput. Again, change the datetime columns to date columns.
OKAY, so now you have two queries. They both read from your workbook, but they could just as well read from another workbook, a text file, etc. If you like this solution, your final form might be a workbook that doesn't actually contain any source data, just queries that get the raw data from elsewhere and a tab that presents the third query we're about to make.
Now for the fun part.
While still in the Power Query Editor editing your WindClaimsInput query, near the left edge of the "Home" ribbon there's a button named "Manage". Click it, then click "Reference" to create a third query that starts from the old one. Remember, queries are only instructions. We aren't copying data until we run the queries.
Now add a custom column (Add Column > Custom Column). It opens a dialog asking for the column name and formula. Name it "PLRB" and use this formula, adjusting the column names to match your tables:
Table.SelectRows(PLRB, (r) => (r[Date] >= [CATFromDt] and r[Date] <= [CATThruDt] and r[ZipCode] = [ClaimZip]))
Table.SelectRows is a Power Query function that takes two arguments:
The table (or query that returns a table), and,
A function to run on each record (aka row) of the table and return true/false. In this case, we created a function that takes one argument (r) and returns true or false.
So the above formula says "Give me a table of all rows in PLRB for the given ClaimZip zip code that also have a Date between CATFromDt and CATThruDt." Since it's a custom column formula, it runs once per row in Wind Claims.
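The custom column can be mimicked in Python to see what it does: for each claim row, keep the PLRB rows whose Date falls between CATFromDt and CATThruDt (inclusive on both ends, matching >= and <=) and whose ZipCode equals ClaimZip. This is a sketch with made-up rows; the WindSpeed field name is an assumption.

```python
from datetime import date

# Made-up PLRB rows; "WindSpeed" is an assumed field name.
plrb = [
    {"Date": date(2020, 6, 1), "ZipCode": "75001", "WindSpeed": 60},
    {"Date": date(2020, 6, 3), "ZipCode": "75001", "WindSpeed": 75},
    {"Date": date(2020, 6, 3), "ZipCode": "75002", "WindSpeed": 80},
    {"Date": date(2020, 7, 9), "ZipCode": "75001", "WindSpeed": 90},
]
# Made-up claim rows using the column names from the formula above.
claims = [
    {"CATFromDt": date(2020, 6, 1), "CATThruDt": date(2020, 6, 5), "ClaimZip": "75001"},
    {"CATFromDt": date(2020, 8, 1), "CATThruDt": date(2020, 8, 2), "ClaimZip": "75001"},
]

def matching_rows(claim, table):
    """Mirror Table.SelectRows(PLRB, (r) => ...): rows in the date range with the same zip."""
    return [
        r for r in table
        if claim["CATFromDt"] <= r["Date"] <= claim["CATThruDt"]
        and r["ZipCode"] == claim["ClaimZip"]
    ]

# The custom column: one nested "table" (a list of rows) per claim row.
for claim in claims:
    claim["PLRB"] = matching_rows(claim, plrb)

print([len(c["PLRB"]) for c in claims])  # [2, 0]
```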
Now you have a table where the last column is another table! Specifically, the rows from PLRB that are relevant for that Wind claims row. You can single-click on any of those cells in that last column to see the subtable.
To the right of the last column's title is a little "expand" icon. Click it and choose to aggregate by max wind speed. (The right edge of the "wind speed" choice lets you change it to maximum, average, or whatever you like.) Uncheck "Use original column name as prefix" and click OK. Don't worry, you can delete this new step and try again if I didn't describe it well.
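The aggregation step then reduces each nested table to one number. A minimal Python sketch, assuming a WindSpeed field and made-up rows; claims with no matching PLRB row get None here, which is worth checking for in the real output too.

```python
from datetime import date

# Made-up PLRB rows; "WindSpeed" is an assumed field name.
plrb = [
    {"Date": date(2020, 6, 1), "ZipCode": "75001", "WindSpeed": 60},
    {"Date": date(2020, 6, 3), "ZipCode": "75001", "WindSpeed": 75},
    {"Date": date(2020, 6, 3), "ZipCode": "75002", "WindSpeed": 80},
]
# Made-up claim rows: the second one has no PLRB match.
claims = [
    {"CATFromDt": date(2020, 6, 1), "CATThruDt": date(2020, 6, 5), "ClaimZip": "75001"},
    {"CATFromDt": date(2020, 8, 1), "CATThruDt": date(2020, 8, 2), "ClaimZip": "75001"},
]

def max_wind(claim, table):
    """Max WindSpeed over rows in the claim's date range and zip code, or None if none match."""
    speeds = [
        r["WindSpeed"]
        for r in table
        if claim["CATFromDt"] <= r["Date"] <= claim["CATThruDt"]
        and r["ZipCode"] == claim["ClaimZip"]
    ]
    return max(speeds, default=None)

results = [max_wind(c, plrb) for c in claims]
print(results)  # [75, None]
```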
Hit "Close and Load" to see it in your workbook. If it looks right, great! Otherwise, feel free to go back and edit some more.
And now you're done! Unlike formulas, it doesn't refresh automatically, but when you want to refresh your output based on your input tables, you can refresh that query or click "Refresh All" in the "Data" ribbon.
In the Data ribbon of Excel, in the "Get & Transform" section, there's a "Show Queries" button that toggles a sidebar displaying the queries you've made. You probably only want to keep loading your third query, so you can change the "Load to..." setting of the other two queries to "Connection Only".
Sorry, I can't do screenshots right now.

Is there a way to match values in Excel when an individual cell has multiple values?

Above is a picture of my Excel sheet. I have 2 columns of data that have multiple data points in them (separated by commas). This is how my data is spit out after running an online psychology experiment. I'm hesitant to split text to columns because some lines only have 3 values and other lines have 20+. Essentially, I need to match values in one column to values in the second column. For example, the first value in column G needs to match with the first value in column H. The second value needs to match with the second value, etc. I don't need to match up every value in both columns, however. I only need a (defined) subset of values.
I'm not sure if this is possible to do in Excel (or any Excel add-on) without separating the values into separate columns, but any help is appreciated!
I've seen this before in survey data: the output uses "packed data", where each cell contains many values. You will need Excel 2010+ for Windows (or Excel 365) for this solution. Otherwise, there is a solution that is also Mac-compatible and does not involve VBA, but it takes time to construct. This approach should take you about 10 minutes: a lot of steps, but it is just clicking.
Let's say that these are your data in two columns in a table.
Click anywhere inside the table. Open the Data tab and click From Table/Range.
This will convert your data into an Excel Table and ask you if your table has headers - yes it does. Click OK.
This will open the Power Query (PQ) editor (congratulations, you are now a step closer to being a data scientist, so take a selfie with this screen in the background and share it on social media).
You will see in the Applied Steps on the right-hand side that PQ has helpfully detected the data type in a step called Changed Type. You need to undo that, because it will likely think that your comma-separated numbers are just one giant number. So click the X on the left side of that step.
On the left side, expand the Queries pane. Right-click your table and select Duplicate.
NB: This is not the most efficient way to do this, but I think this is something you only want to do once, and you probably don't want to go hacking through the Advanced Editor.
So now you have two tables:
Rename Table1 (2) to Output in the Name box on the right-hand side, just for clarity.
Right-click the Response RT column in Output and remove it. Click Table1 and do the same to the Response column. Now Table1 has only the Response RT column and Output has only the Response column. Next we will parse these into rows of cleaned data.
Parse Table1
First, in Table1, click on the Response RT column and in the Home tab you will see Split Column. 1) Click on that and choose By Delimiter.
2) It will default to Comma, but you need to click on Advanced options and choose the Rows radio button.
Click OK and it should turn your data into rows of separated numbers and change the type (this time helpfully) to decimal.
Now you need to add an index. 3) Go to the Add Column tab and click on Add Index, starting from 1.
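In Python terms, the split-into-rows and index steps (plus the Trim applied to the text column below) look roughly like this sketch; the sample cells are hypothetical. Note the assumption it shares with the Power Query approach: the index is numbered across the whole column, so the pairing only lines up if each row's Response and Response RT cells hold the same number of values.

```python
def split_column_to_rows(cells, convert):
    """Split each packed cell on commas into separate rows, then number all
    resulting rows from 1, like Split Column > By Delimiter (Rows) followed by
    Add Index Column (from 1). `convert` plays the role of the type change."""
    values = [convert(part) for cell in cells for part in cell.split(",")]
    # enumerate(..., start=1) stands in for the index column.
    return list(enumerate(values, start=1))

# Hypothetical packed cells, one per participant row.
rt_rows = split_column_to_rows(["512.3,634.1", "498.7"], float)
# str.strip mirrors Transform > Trim on the text column.
response_rows = split_column_to_rows(["yes, no", " yes"], str.strip)
print(rt_rows)        # [(1, 512.3), (2, 634.1), (3, 498.7)]
print(response_rows)  # [(1, 'yes'), (2, 'no'), (3, 'yes')]
```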
Parse Output Table
Now go back to Output and repeat steps 1), 2) and 3) for it as well. Then you need one extra step to clean up your text column: right-click the Response column and choose Transform > Trim.
That will get rid of those spurious spaces.
Merge Them Back Together
While you still have the Output table selected, go to the Home tab and choose Merge Queries.
It will bring up this window:
Choose Table1 from the bottom dropdown. Click on Index on both tables and click OK. You will get something like this:
Click on the button on the top right of the Table1 column and then unselect Index and Use original column name as prefix.
Click OK. Right click the Index column and Remove it. You now have your answer, but you still need to bring it back to Excel.
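The merge on Index is essentially a join on position. A minimal Python sketch, assuming both lists came out of the split/trim/index steps; it uses a left outer join (the default join kind in Merge Queries) and then keeps a hypothetical subset of positions, since only some values are needed.

```python
def merge_on_index(left, right):
    """Join two (index, value) lists on index, keeping every left row
    (a left outer join), then drop the index."""
    right_by_index = dict(right)
    # .get() yields None when the right side has no row with that index.
    return [(value, right_by_index.get(idx)) for idx, value in left]

# Hypothetical results of the earlier steps for each query.
output = [(1, "yes"), (2, "no"), (3, "yes")]   # Response column
table1 = [(1, 512.3), (2, 634.1), (3, 498.7)]  # Response RT column
pairs = merge_on_index(output, table1)
print(pairs)  # [('yes', 512.3), ('no', 634.1), ('yes', 498.7)]

# Keep only a defined subset, e.g. the 1st and 3rd values (hypothetical choice).
wanted = {1, 3}
subset = [p for i, p in enumerate(pairs, start=1) if i in wanted]
```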
Putting it back in Excel
Click Close & Load To... on the left-hand side of the Home tab. To keep things simple, just click OK.
It will put both Output and Table1 as worksheets into your workbook (this is why I said it is not the most efficient approach; you can always delete the Table1 worksheet. Excel will complain when you do, but you can ignore it). Output is your answer.
Congratulations, you just did an ETL (extract, transform, load) operation in data analytics. Take another selfie with the answer and share it on social media.

Excel: find and order matches by column

I'm currently working with a huge epidemiological dataset spread over several Excel files. The files contain pathology and clinical reports for almost 30k patients. Each patient can have several pathology and clinical reports, and each patient is assigned a unique ID.
I want to combine all the files into one, so that the ID for patient X001 contains all the information from all the files. I cannot just copy/paste, because the number of rows (IDs) in the files varies.
Here is an example of what I want to accomplish.
I want to combine two lists as follows.
As you can see, List1 and List2 vary in number of rows. Also, there are IDs in List1 that are not found in List2 and vice versa.
I want to merge them so that they align and match; see the image below. Can someone provide some code for this? I cannot do this manually, since I have 100k rows in List1 and 30k rows in List2; that would take several weeks, with a risk of errors.
You can merge the tables using Excel's built-in Power Query, which can be found under the Data tab.
Note: Screenshots are taken from Excel 2016
The first step is to create the queries:
Within the Get & Transform section of the Data tab, click New Query -> From File -> From Workbook and select the workbook that has the table you want to merge
Select the appropriate sheets in which your tables are found, and confirm that they are displaying properly
If you notice that the table is not correct, you can make changes to it via the Edit button below.
For example, if you notice that your Column headers are being treated as a normal value, you can click Use First Row as Headers under the Power Query Editor Home -> Transform
I would also recommend changing the name of the query so it makes more sense down the line
Once you are happy with the way the query is looking, click on the Close and Load Dropdown menu under the Power Query Editor Home and select Close and Load To...
Select Only Create Connection to add it into your Workbook Queries without duplicating the table.
Repeat the above steps for each table you want to merge.
Once you have all of your tables linked via Queries, you can now move on to merging them:
Under the same section of New Query select Combine Queries -> Merge
Select the two queries you are looking to merge in each of the respective boxes
Confirm that they are correct via the preview window (don't worry if not all rows show)
Rule of thumb would also be to select your largest query first, and the smaller second
Next, select the column(s) to merge on; in your example that is the ID. This is done simply by clicking the column within the preview
Finally change the Join Kind to Full Outer and click OK
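For intuition about what Full Outer does with IDs missing from one list or repeated within a list, here is a Python sketch over made-up two-column lists. Unlike the Power Query result, which keeps both ID columns, it folds the ID into one field and sorts by it; both are choices of this sketch.

```python
from collections import defaultdict

# Made-up (ID, value) rows; real files would carry more columns per row.
list1 = [("X001", "path A"), ("X002", "path B"), ("X004", "path D")]
list2 = [("X001", "clin 1"), ("X003", "clin 3"), ("X004", "clin 4")]

def full_outer_join(left, right):
    """Return (ID, left value, right value) for every ID found in either list.

    A side with no row for an ID contributes None; an ID repeated on both
    sides yields every left/right combination, as a relational join does.
    """
    left_by_id, right_by_id = defaultdict(list), defaultdict(list)
    for key, value in left:
        left_by_id[key].append(value)
    for key, value in right:
        right_by_id[key].append(value)
    result = []
    for key in sorted(set(left_by_id) | set(right_by_id)):
        # .get() on a defaultdict does not create entries, so missing IDs give None.
        for lv in left_by_id.get(key) or [None]:
            for rv in right_by_id.get(key) or [None]:
                result.append((key, lv, rv))
    return result

merged = full_outer_join(list1, list2)
for row in merged:
    print(row)
```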
From here you should be back in the Power Query Editor
The final steps are modifying this merged query to your desired output
You should notice a new column next to your first original table, with the name of the second query at the top. Next to the name is a button that lets you expand this column.
Select the appropriate columns you would like to merge into the other table and click OK
If at any point you make a mistake, you can retrace your changes under Applied Steps within the Query Settings Pane
Once you are happy with the way your newly merged query looks, go ahead and click on Close and Load
You should now have access to your new merged query, which will update based on changes made to the original connected files when you refresh it
If you want to make any additional changes going forward from this point just click anywhere inside of the table and you should see both the Table Tools and Query Tools tabs appear at the top

Drill down when a single cell is clicked on a cross table

I currently have a cross table that contains quite a few aggregations (using custom expressions) broken down by names on the vertical axis. Upon clicking a cell, I would like to show the details behind that data point. However, clicking in the cross table automatically selects the entire row, which shows all of the details behind that row; that is not what I want. Is there any way to set up a cross table so that you can click a single cell (as you can in Excel)? The only solution I can think of is to build multiple cross tables, each with a single calculation, so that clicking one shows the detail data behind that single value.
Thanks so much for the help and possible solutions!
You can write your custom expressions with an if statement, e.g. sum(if([name]='a',1,null)). Write one for each name on your vertical axis, then remove the vertical breakdown, leaving only the expressions, each of which already isolates one category. This way you can select a single cell value. This method does require a lot more expressions in your cross table.
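Because the approach needs one expression per name, generating the expression text can save typing. This Python sketch (sample names are hypothetical, and it assumes the names contain no single quotes) builds expressions in the pattern shown above and computes what each one would count on sample data.

```python
# Hypothetical data: the [name] column of the cross table's source rows.
names_column = ["a", "b", "a", "c", "a"]

def build_expressions(names, column="name"):
    """Build one custom expression per category, following the pattern
    sum(if([name]='a',1,null)). Assumes names contain no single quotes."""
    return [f"sum(if([{column}]='{n}',1,null))" for n in names]

def conditional_count(values, name):
    """What each expression evaluates to: 1 per matching row, null (skipped) otherwise."""
    return sum(1 for v in values if v == name)

categories = sorted(set(names_column))
for expr, cat in zip(build_expressions(categories), categories):
    print(expr, "->", conditional_count(names_column, cat))
```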
