Can I get relational data into an Excel Pivot Table?

I have a sheet (let's go with wines as an example) that lists every bottle of wine in my cellar, when I bought it, how much I paid etc.
There's a column that describes the wine in comma-separated tags such as "Fruity, White".
I've created a pivot table from that data, with the description as a filter column. However, I can't filter it by "White". Instead, I have to find every description that contains "White", such as "Dry, White", "White, Crisp", etc.
Being from an RDBMS background, my natural inclination is to put the tags in their own table keyed against the wine row so there's zero-or-more tag rows per wine row.
But how on earth can I use that to filter the wine rows?

Yes, you can do it within Excel, and the description fields can remain as "Dry, White" etc., since you do not need to split the comma-separated values.
Let's say the table source comprises a text column for Description, a number column for Value and a number column for Year Bought.
Your pivot is set up with the following:
Fields: Description, Value and Year Bought.
Column labels: Year Bought
Row Labels: Description
Sum of values: Sum of Value
There is a drop-down filter on the row labels. Click on it and there should be an option to select Label Filters; select this and then select Contains. You can enter, say, "White", which will select all your descriptions that contain White, e.g. "Dry, White" and "White, Crisp". The filter accepts ? to represent a single character and * to represent any series of characters.
There are similar label filters for "Begins With" and "Ends With", as well as their negations.
I tried this in Excel 2007, and it should also work in 2003. I think in Excel 2003 you could even combine the filters, e.g. contains "White" and does not contain "Dry", but in 2007 I could not find a way of doing this.
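To make the matching rule concrete, here is a rough Python model of the Contains label filter described above. It is a sketch under stated assumptions, not Excel's actual implementation: it assumes matching is case-insensitive, treats ? as exactly one character and * as any run of characters, and ignores Excel's ~ escape character.

```python
import re


def excel_like_contains(label: str, pattern: str) -> bool:
    """Rough model of a PivotTable 'Contains' label filter (assumptions:
    case-insensitive, '?' = one character, '*' = any run, no '~' escape)."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # any series of characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
    regex = re.compile("".join(parts), re.IGNORECASE | re.DOTALL)
    # 'Contains' means the pattern may match anywhere inside the label
    return regex.search(label) is not None


descriptions = ["Dry, White", "White, Crisp", "Fruity, Red", "Fruity, White"]
print([d for d in descriptions if excel_like_contains(d, "White")])
```

The "does not contain" variant is simply the negation, `not excel_like_contains(label, pattern)`.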

Forgive me if I'm stating the obvious, but the reason you're having problems here is that the description column is not in 1NF, and the Excel pivot interface isn't flexible enough to allow pattern-based searching.
The simplest option will be to normalise the CSV into a series of columns, each of which represents a single attribute - one column for wine colour, one for sweetness, one for country of origin and so on - and apply the filter across multiple columns. However, if (as your comment on the question suggests) wine is a metaphor for your real problem, you may not have the luxury of revisiting the design of the source data.
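A minimal Python sketch of that normalisation, assuming every tag belongs to exactly one attribute and each wine has at most one tag per attribute. The TAG_ATTRIBUTE mapping, the attribute names and the sample wines are hypothetical; in practice you would build the mapping from your own tags.

```python
# Hypothetical mapping from each known tag to the attribute column it belongs to.
TAG_ATTRIBUTE = {
    "White": "colour",
    "Red": "colour",
    "Dry": "sweetness",
    "Sweet": "sweetness",
    "Fruity": "style",
    "Crisp": "style",
}


def normalise_description(description: str) -> dict:
    """Turn 'Dry, White' into {'sweetness': 'Dry', 'colour': 'White'}."""
    columns = {}
    for raw in description.split(","):
        tag = raw.strip()
        if not tag:
            continue
        attribute = TAG_ATTRIBUTE[tag]  # KeyError for a tag not in the mapping
        if attribute in columns:
            # A second tag for the same attribute breaks the one-value-per-column rule
            raise ValueError(f"more than one {attribute} tag in {description!r}")
        columns[attribute] = tag
    return columns


wines = [
    {"name": "Wine A", "description": "Fruity, White"},
    {"name": "Wine B", "description": "Dry, Red"},
]
for wine in wines:
    # Add colour/sweetness/style columns next to the original description
    wine.update(normalise_description(wine["description"]))
```

With colour in its own column, the pivot's filter on that column can simply select White.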
Another possibility might be to use a macro (or a database query; I'm not clear from your question whether you have implemented the tag system already) to pre-filter the input data on the pivot table's source sheet based on the tag values you want to search for, then refresh the pivot table based on that data.
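The pre-filtering logic could look like the sketch below, written in Python purely to show the idea (a real macro would be VBA, and writing the result back to the source sheet is not shown). The function names, row layout and sample values are assumptions. It compares whole tags rather than substrings, so searching for "Dry" will not also match a hypothetical "Off-Dry" tag.

```python
def tags_of(description: str) -> set:
    """Split a description on commas and trim: 'Dry, White' -> {'dry', 'white'}."""
    return {t.strip().lower() for t in description.split(",") if t.strip()}


def prefilter(rows, wanted_tags):
    """Keep only the rows whose tags include every wanted tag (case-insensitive)."""
    wanted = {t.lower() for t in wanted_tags}
    return [row for row in rows if wanted <= tags_of(row["description"])]


rows = [
    {"description": "Dry, White", "value": 12.0},
    {"description": "Fruity, Red", "value": 9.5},
    {"description": "Off-Dry, White", "value": 15.0},
]
print(prefilter(rows, ["White", "Dry"]))
```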
A third possibility is the VBA used in this question, which looks like it will custom-filter the pivot table's visible rows.

=IF(ISERR(FIND("WHITE",UPPER(B5))),0,1)
Create an extra column and add the formula above. There are two tricks to this. The first is to search for WHITE in the upper-cased description (UPPER), to get around the fact that Excel's FIND is case-sensitive. The second is that FIND returns a #VALUE! error if the string does not exist, so ISERR lets you trap that and return, in this example, 0 if it doesn't exist or 1 if it does. You could substitute "White" and a blank for 1 and 0.
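For comparison, the same helper-column idea as a short Python sketch (the white_flag name, row layout and values are illustrative). Python's str.find returns -1 instead of producing an error, which plays the role of the #VALUE! that ISERR traps. Summing Value per flag is roughly what a pivot using the helper column as a filter or row field would show.

```python
from collections import defaultdict


def white_flag(description: str) -> int:
    """Python version of =IF(ISERR(FIND("WHITE",UPPER(B5))),0,1)."""
    # Upper-case first so the case-sensitive search ignores case;
    # str.find returns -1 where Excel's FIND would return #VALUE!.
    return 0 if description.upper().find("WHITE") == -1 else 1


rows = [
    {"description": "Dry, White", "value": 12.0},
    {"description": "Fruity, Red", "value": 9.5},
    {"description": "White, Crisp", "value": 15.0},
]
totals = defaultdict(float)
for row in rows:
    totals[white_flag(row["description"])] += row["value"]  # sum of Value per flag
print(dict(totals))
```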

You could write a script that loops through the data and adds a new line for each comma-separated item in the description column. This would allow the pivot table to filter better.
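A sketch of such a script in Python, assuming each row is a dict with a "description" key; the explode_tags name and the new "tag" field are hypothetical. One caveat: a wine with two tags now appears on two rows, so summing Value across all tags would double-count it, whereas filtering on a single tag is fine.

```python
def explode_tags(rows):
    """Return one output row per comma-separated tag in each row's description."""
    out = []
    for row in rows:
        for raw in row["description"].split(","):
            tag = raw.strip()
            if tag:                   # skip empty pieces such as a trailing comma
                new_row = dict(row)   # copy the other columns unchanged
                new_row["tag"] = tag  # hypothetical new column holding one tag
                out.append(new_row)
    return out


rows = [
    {"description": "Dry, White", "value": 12.0},
    {"description": "Fruity, Red", "value": 9.5},
    {"description": "White, Crisp", "value": 15.0},
]
for r in explode_tags(rows):
    print(r["tag"], r["value"])
```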

Related

Aligning vertically a series of tables with text

Hi, I need the text in a specific format in a spreadsheet so that I can upload it to a translation tool.
I have already used the text split function to separate the text in a cell at the bullet points, moving each bullet point to a separate cell.
Then I used the TRANSPOSE function to separate each set of data. For context, you are looking at fashion products.
The name of the product is on the first row, followed by a list of features (e.g. "Bracciale" means bracelet, and it is followed by the list of materials).
Now, for the last step, I need these sets to be vertical, not horizontal, i.e. stacked one below the other.
I would like to set up an automatic system so that every time we receive a list with hundreds of these products, we do not need to copy and paste them one below the other.
With pivot tables, maybe? Keep in mind that if it is too complex, it might be hard to train the translators to do it each time. Please let me know your suggestions. Thank you!
I am not a programmer. I tried pivot tables, but the data was in the wrong order, and I am not sure how to get the data out of the pivot table as values only, without the sub-menus.
My suggestion would be to use the 'Unpivot Columns' feature in the Power Query Editor - it would be really simple.
Steps:
Select the whole range
Go to Data // Get & Transform Data // From Table/Range
Uncheck 'My table has headers' (unless your data does have headers, but it doesn't look like it?)
Press OK. This will open the Power Query Editor, which will have given the columns generic names (Column1, Column2, etc.), but ignore that.
Go to Add Column // Index column
Select all columns EXCEPT the new index column by Shift+clicking on those headers
Go to Transform // Unpivot Columns
Assuming the order is important, click in the Attribute column and Sort Ascending
Click in the Index column and Sort Ascending
Remove the Attribute and Index columns if you want (right click header)
Go to File // Close & Load
You will get a new table, dynamically linked to the first (i.e. it can be updated/refreshed), in the unpivoted format.
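The same transformation, written as a Python sketch to show what the steps do to the data (the grid layout, names and placeholder cells are illustrative). One assumption to flag: it sorts by column position rather than by the Attribute text, because a plain text sort would put Column10 before Column2 once there are ten or more columns.

```python
def unpivot_stack(grid):
    """Stack the columns of `grid` one below the other, skipping empty (None) cells."""
    records = []
    for index, row in enumerate(grid):         # like Add Column // Index column
        for col_pos, value in enumerate(row):  # like Unpivot Columns...
            if value is not None:              # ...which drops null cells
                records.append((col_pos, index, value))
    records.sort(key=lambda r: (r[0], r[1]))   # by column position, then index
    return [value for _, _, value in records]  # drop the Attribute and Index parts


grid = [
    ["Bracciale", "Product 2"],
    ["Material 1", "Material A"],
    ["Material 2", None],
]
print(unpivot_stack(grid))
```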
Let me know if you need more details / screenshot?
Based on this trick, maybe the following is helpful:
Formula in A5:
=DROP(REDUCE(0,A1:A3,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,,HSTACK(CHAR(10),"^"),1)))),1)
TEXTSPLIT() will use a combination of newline chars and the circumflex to split the input directly into a vertical array;
Iteration in REDUCE() will allow for stacked results;
DROP() the initial value from results.
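A Python sketch of the same logic, for readers who want to see what each function contributes; stack_split is a hypothetical name and the cells are placeholder text. As with TEXTSPLIT's ignore_empty argument set to 1, empty pieces are dropped.

```python
import re


def stack_split(cells):
    """Split each cell on newlines or '^' and stack all pieces vertically."""
    stacked = []                                # REDUCE accumulator (no seed to DROP here)
    for cell in cells:                          # REDUCE walks A1:A3 one cell at a time
        pieces = re.split(r"[\n^]", cell)       # row delimiters: CHAR(10) and "^"
        stacked.extend(p for p in pieces if p)  # ignore_empty = 1: drop empty pieces
    return stacked                              # VSTACK result, one piece per row


cells = ["Bracciale^Material 1\nMaterial 2", "Product 2\n\nMaterial A"]
print(stack_split(cells))
```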

How to Perform Row-by-Row List Operations in Power Query

I'm trying to compare CSVs from one column with CSVs in another column in the same row in Power Query. I need to ensure that all the CSVs in one column are in the other.
I tried using List.ContainsAll, but it seems like the syntax I'm using is not working. The solution shared here is very close to what I need, but it's comparing all values in a column, not the cell's values.
Here is my sample code, but I think this picture explains the parent-child columns better. This picture shows another scenario where the function also needs to work.
Table.AddColumn(#"Replaced Value", "Contractual and Technical Types Match?", each List.ContainsAll({[Technical Turbine Type]},{[Contractual Turbine Type]}))
You say your column contains a list, but your image is not showing a list. Your image is showing text with commas separating the items. A Power Query list is a value such as {"A", "B", "C"}, which the editor displays as List in the cell.
Assuming you really have columns of comma-separated text, this ensures that everything in the Contractual Turbine Type column is also in the Technical Turbine Type column:
Add custom column with formula
= List.ContainsAll(
List.Transform(Text.Split([Technical Turbine Type],","), each Text.Trim(_)),
List.Transform(Text.Split([Contractual Turbine Type],","), each Text.Trim(_))
)
You could just use this if you are not worried about spaces after the commas:
= List.ContainsAll(
Text.Split([Technical Turbine Type],","),
Text.Split([Contractual Turbine Type],",")
)
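For reference, here is the same row-by-row check as a Python sketch (contains_all is a hypothetical name; the turbine-type strings are placeholders). It mirrors List.ContainsAll(list, values), which is true when every item of the second list appears in the first, and the trim flag shows why the Text.Trim version matters.

```python
def contains_all(technical: str, contractual: str, trim: bool = True) -> bool:
    """True if every comma-separated item in `contractual` is also in `technical`."""
    def split(text):
        parts = text.split(",")  # like Text.Split(text, ",")
        return [p.strip() for p in parts] if trim else parts  # like Text.Trim(_)

    technical_items = split(technical)
    # like List.ContainsAll(technical_items, contractual_items)
    return all(item in technical_items for item in split(contractual))


print(contains_all("T1, T2, T3", "T2,T1"))          # trimmed comparison
print(contains_all("T1, T2", "T1,T2", trim=False))  # " T2" != "T2" without trimming
```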

Extracting text in excel

I have some text that I receive daily and need to separate. I have hundreds of lines similar to the extract below:
COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40
I need to extract individual snippets from this text, each in a separate cell: the date, month, company, size and price. In this case, the result would be:
FEB50-40
APR
COMPANY A
100
0.40
The issue I'm struggling with is uniformity. For example, one line might have FEB50-FEB40, another FEB5-FEB40, or FEB50-FEB4. Another example giving me difficulty is that some rows might have 'COMPANY A' and others 'COMPANYA' (one word instead of two).
Any ideas? I've been trying combinations of the below but I'm not able to have uniform results.
=TRIM(MID(SUBSTITUTE($D7," ",REPT(" ",LEN($D7))), (5)*LEN($D7)+1,LEN($D7)))
=MID($D7,20,21-10)
=TRIM(RIGHT(SUBSTITUTE($D6,"$",REPT("$",2)),4))
Sometimes I get 'FEB40-50(' or 'FEB40-FEB5' when it should be 'FEB40-FEB50'.
Thanks to anyone who is able to help.
You might get to the limits of formulas with this scenario, but with Power Query you can still work.
As I see it, you want to apply the following logic to extract text from this string:
COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40
text after the first : and before the first (
text between the brackets
text after the word OFFERS and before AT
text after AT
These can be easily translated into several "Split" scenarios inside Power Query.
Split by custom delimiter ": " (that's colon and space) for each occurrence
Remove first column
Split new first column by ( - that's space and bracket - for leftmost
Replace ) with nothing in second column
Split third column by delimiter OFFERS
split new fourth column by delimiter AT
The screenshot shows the input data and the result in the Power Query editor after renaming the columns and before loading the query into the worksheet.
Once you have loaded the query, you can add / remove data in the input table and simply refresh the query to get your results. No formulas, just clicking ribbon commands.
You can take this further by removing the "KB" from the column, converting it to a number and dividing it by 100. Your business logic will drive what you want to do. Just take it one step at a time.
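The split steps translate almost one-to-one into a Python sketch, shown here only to illustrate the logic (split_commodity_line is a hypothetical name). Because it splits on delimiters rather than fixed positions, variations such as FEB5-FEB40 or COMPANYA do not shift the results. It assumes every line contains the two ": " separators, the " (" bracket, OFFERS and AT; stripping the $ from the price is an extra step not in the answer.

```python
def split_commodity_line(line: str) -> dict:
    """Split one line into date, month, company, size and price."""
    parts = line.split(": ")                      # colon + space, every occurrence
    spread_part, offer_part = parts[1], parts[2]  # parts[0] is the removed first column
    date, _, month = spread_part.partition(" (")  # space + bracket, leftmost
    month = month.replace(")", "")                # replace ) with nothing
    company, _, rest = offer_part.partition(" OFFERS ")
    size, _, price = rest.partition(" AT ")
    return {
        "date": date.strip(),
        "month": month.strip(),
        "company": company.strip(),
        "size": size.strip(),
        "price": price.strip().lstrip("$"),       # extra step: drop the dollar sign
    }


line = "COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40"
print(split_commodity_line(line))
```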

Column to rows and highlight difference between values in the same group

I have a huge table with data structured like this:
And I would like to display them in Spotfire Analyst 7.11 as follows:
Basically I need to display the columns that contain "ANTE" below the others in order to make a comparison. Values that have variations for the same ID must be highlighted.
I also have the fields "START_DATE_ANTE" and "END_DATE_ANTE" which have been omitted in the example image.
Amusingly, if you were limited to just what the title asks, this would be a very simple answer.
If you wanted this in a table where the rows are displayed as usual and the cells are highlighted, you can do this by going to Properties, adding a new grouping where you select VAL_1 and VAL_1_ANTE, and adding a rule of type "Boolean expression", where the value is:
[VAL_1] - [VAL_1_ANTE] <> 0
This will highlight the affected cells, which you can place next to each other. You can even throw in a calculated column showing the difference between the two columns, and slap it on right next to it. This gives you the further option to filter down to only showing rows with discrepancies, or sorting by these values.
However, if you actually need it to display the POSTs on different lines from the ANTEs, as formatted above, things get a little tricky.
My personal preference would be to pivot (split/union/etc.) the data before pulling it into Spotfire, with an indicator flag for "is this different", yes/no. However, I know a lot of Spotfire users either aren't using a database or don't have the leeway to write the SQL themselves.
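A sketch of that pre-pivot, written in Python for illustration (in SQL it would typically be a UNION ALL of two SELECTs). The VERSION and *_CHANGED column names are hypothetical; ID and the VAL_n/VAL_n_ANTE pairs follow the question. Each input row becomes a POST row and an ANTE row, with a per-column flag that Spotfire could use for highlighting.

```python
def stack_post_and_ante(rows, value_cols):
    """Turn each row into a POST row and an ANTE row with per-column change flags."""
    out = []
    for row in rows:
        post = {"ID": row["ID"], "VERSION": "POST"}
        ante = {"ID": row["ID"], "VERSION": "ANTE"}
        for col in value_cols:
            before = row[col + "_ANTE"]
            post[col], ante[col] = row[col], before
            changed = row[col] != before  # flag to drive the highlighting
            post[col + "_CHANGED"] = ante[col + "_CHANGED"] = changed
        out.extend([post, ante])          # ANTE row directly below its POST row
    return out


rows = [{"ID": 1, "VAL_1": 10, "VAL_1_ANTE": 10, "VAL_2": 5, "VAL_2_ANTE": 7}]
for r in stack_post_and_ante(rows, ["VAL_1", "VAL_2"]):
    print(r)
```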
In fact, if you try to do it in Spotfire using custom expressions alone, it becomes so tricky, I'm not sure how to answer it right off. I'm inclined to think you should be able to do it in a cross table, using Subsets, but I haven't figured out a way to identify which subset you're in while inside the custom expressions.
Other options include generating a table using IronPython, if you're up to that.

Filtering a PivotTable by Boolean

The source data for my Excel PivotTable looks like the following (this is a simplification):
id name score
1 john 15
2 james 2
3 pat 14
4 jake 12
...
I have a PivotTable that uses this as a data source. Now, what I want to do is have the PivotTable only consider entries whose id is less than 100. This is theoretically achievable by having a Report Filter on id and de-selecting every number of 100 or more. But that's rather absurd.
How can I filter out data using a Boolean constraint? I've tried various methods, none of which worked. It seems like calculated fields are the key, but it doesn't seem possible to create a filter on calculated fields.
I'm using Excel 2011 for Mac, if that makes a difference. I'm a programmer, but I've never programmed in Excel, so if that's the solution, I'd request baby steps. :) Thank you!
AFAIK, in Excel 2011 you cannot apply a conditional (label or value) filter to a report filter field. You have to manually check/uncheck the values that you want or don't want.
The alternative that I can think of is to insert a column before your data, give it a header such as "Less Than 100", and enter the formula
=IF(B2<100,TRUE,FALSE)
and copy it down using AutoFill.
Now create a pivot, put the field "Less Than 100" in the report filter and simply select TRUE.
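The helper-column approach boils down to the following Python sketch, using the sample rows from the question plus one hypothetical row with an id above the cut-off.

```python
rows = [
    {"id": 1, "name": "john", "score": 15},
    {"id": 2, "name": "james", "score": 2},
    {"id": 3, "name": "pat", "score": 14},
    {"id": 4, "name": "jake", "score": 12},
    {"id": 150, "name": "example", "score": 99},  # hypothetical row above the cut-off
]

# Helper column, like =IF(B2<100,TRUE,FALSE) filled down with AutoFill
for row in rows:
    row["Less Than 100"] = row["id"] < 100

# Report filter set to TRUE, then Sum of score
included = [row for row in rows if row["Less Than 100"]]
total_score = sum(row["score"] for row in included)
print(len(included), total_score)
```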
If you don't want to go down that path, move the id field from Report Filter to Row Labels, where you can use a label filter.
A Report Filter is exactly what I would use, but rather than manually de-selecting the values as you suggest, I would apply a Label Filter of less than the cut-off point, which in your example is 100.
I haven't used Excel on Mac, but on Windows on the PivotTable Field List, to the right of the id field click the little black arrow, and select Label Filters -> Less Than and then enter 100 in the dialogue that pops up.
Given the inherent value of PivotTables is the ability to apply filters exactly for this sort of scenario I don't think I'd do anything more complicated.
