How do I extract particular data from a large table which hasn't been normalized? - excel

The original table in Excel looks like this. It has around 30,000 records. I want to copy a particular value for each record, identified by its unique FPS ID (which is a primary key). However, with unclean data like this I can't work out how to approach the problem. I want the data pulled from this table to be arranged like this.
Basically, what I want here is the figure highlighted in grey.
There are two of those. I only want the one without the address, but that one doesn't have the primary key.
Approach 1: Select and filter them to a new sheet and manually adjust for redundant data
Approach 2: Select and filter them to a new sheet, but this time strip the contents of the cell down to just the primary key enclosed in parentheses.
How should I go about this? I am very weak at VLOOKUP, VBA etc.

I don't think VLOOKUP will be helpful here. I think you should do it this way:
First of all look at the sequence. Rice totals are located in "B8", "B14", "B20"... +6
FPS names in cells "A9", "A15", "A21"... +6
FPS IDs in cells "A4", "A10", "A16"... +6
You can use cell references for the rice totals by typing =B8, then under it =B14, and so on, but that will take a lot of time. You can use this trick instead: type B8 (as plain text), and under it B14. Select them both and drag down until the end of the data. Then select them all and press Ctrl + H. In the first field type B, in the second field type =B, and press "Replace All". This turns each text entry into a formula.
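Outside Excel, the same fixed-stride idea can be sketched in Python. This is only an illustration under assumptions: the function name `extract_records` and the dictionary keys are hypothetical, the columns are passed in as plain lists (index 0 is row 1), and it assumes that the n-th ID, n-th name and n-th rice total in the sequences above belong to the same record, which the original post does not confirm.

```python
def extract_records(col_a, col_b, first_id_row=4, first_total_row=8,
                    first_name_row=9, stride=6):
    """Collect FPS ID, FPS name and rice total from two worksheet columns.

    col_a and col_b are lists of cell values where index 0 is row 1.
    The default row numbers mirror the A4/B8/A9 pattern described above.
    """
    records = []
    offset = 0
    last_row = min(len(col_a), len(col_b))
    # Stop once the furthest cell of the next block would fall past the data.
    while max(first_id_row, first_total_row, first_name_row) + offset <= last_row:
        records.append({
            "fps_id": col_a[first_id_row + offset - 1],      # A4, A10, A16, ...
            "fps_name": col_a[first_name_row + offset - 1],  # A9, A15, A21, ...
            "rice_total": col_b[first_total_row + offset - 1],  # B8, B14, B20, ...
        })
        offset += stride
    return records
```

If the blocks are not all exactly six rows tall, a fixed stride will drift, so it is worth spot-checking a few records near the end of the data.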

1: Learn about the difference between a spreadsheet and an RDBMS.
2: Design an RDBMS schema that meets your needs. (You can use SQL Server Express)
3: Copy the data that you need into the RDBMS.
4: Write SQL Queries to extract the data that you need.
5: Put a big reminder notice on your office wall:
"I must not use spreadsheets for database work".


Aligning vertically a series of tables with text

Hi, I need the text to be in a specific format in a spreadsheet so that I can upload it to a translation tool.
I have already used the text split function to separate the text in a cell with bullet points, moving each bullet point to a separate cell.
Then I used the transpose function to separate each set of data. For context, you are looking at fashion products.
The name of the product is on the first row, followed by a list of features (e.g. "Bracciale" means bracelet and it is followed by the list of materials)
Now for the last step, I need these sets to be vertical, not horizontal.
I would like to set up an automatic system so that every time we receive a list with hundreds of these products we do not need to copy-paste them one below the other.
With pivot tables maybe? Keep in mind that if it is too complex it might be hard to train the translators to do it each time. Please let me know your suggestions. Thank you!
I am not a programmer. I tried pivot tables but the data was in the wrong order and I am not sure how to get the data out from the pivot table with values only without the sub-menus.
My suggestion would be to use the 'Unpivot Columns' feature in the Power Query Editor - it would be really simple.
Steps:
1: Select the whole range
2: Go to Data // Get & Transform Data // From Table/Range
3: Uncheck 'My Table has headers' (unless it does - but doesn't look like it?)
4: Press OK. This will open Power Query Editor and will have actually given you column names Col1/2/3 etc, but ignore that.
5: Go to Add Column // Index column
6: Select all columns EXCEPT the new index column by Shift+clicking on those headers
7: Go to Transform // Unpivot Columns
8: Assuming the order is important, click in the Attribute column and Sort Ascending
9: Click in the Index column and Sort Ascending
10: Remove the Attribute and Index columns if you want (right click header)
11: Go to File // Close & Load
You will get a new table in the unpivoted format, dynamically linked to the first (i.e. it can be updated/refreshed).
Let me know if you need more details / screenshot?
Based on this trick, maybe the following is helpful:
Formula in A5:
=DROP(REDUCE(0,A1:A3,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,,HSTACK(CHAR(10),"^"),1)))),1)
TEXTSPLIT() will use a combination of newline chars and the circumflex to split the input directly into a vertical array;
Iteration in REDUCE() will allow for stacked results;
DROP() the initial value from results.
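For readers who want to see the logic of that formula spelled out, here is a rough Python sketch of the same split-and-stack idea. The function name `split_and_stack` is hypothetical and the input is assumed to be a plain list of cell strings; it is not a replacement for the Excel formula, only an illustration of what it does.

```python
import re


def split_and_stack(cells, delimiters=("\n", "^")):
    """Split each cell on any of the delimiters and stack the pieces vertically.

    Empty pieces are skipped, mirroring the ignore_empty argument (the
    trailing 1) passed to TEXTSPLIT in the formula above.
    """
    # Escape the delimiters because "^" is special in regular expressions.
    pattern = "|".join(re.escape(d) for d in delimiters)
    stacked = []
    for cell in cells:
        stacked.extend(piece for piece in re.split(pattern, cell) if piece != "")
    return stacked
```

Called on a list such as `["Bracciale\nOro^Argento", "Anello^^Platino"]`, it yields one flat list with each product name followed by its features.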

EXCEL: How to automatically add serial number in Excel Table using formula that is immune to filtering / sorting?

I want to create an Excel Table where the first column is the "SL" (serial number) column that starts from 1 and then increases by 1 for each subsequent entry. I want the serial number to automatically increase as I add more rows to the table.
I have tried all manner of "=ROWS" functions, all manner of "=COUNTA" functions, and every other function used in the tutorials I found on the web. None of them are immune to sorting or filtering. That is, if I sort the "Name" column from A to Z, the serial number that was assigned to each row entry changes because of how these formulae are written. For example:
This is the Original List. As you can see, Dragon Fruit's serial number is 1. I have used the "=COUNTA(B$2:[#[NAME]])" function in this example.
As you can see, when I sorted the "Name" column from A to Z, Dragon Fruit's serial number went from 1 to 2, Acai went from 4 to 1, Guava went from 9 to 3, and so on. But I want the serial numbers to be static and locked to their corresponding "Name".
Is this possible to do in Excel without manually typing the numbers in the SL column?
Good question and a tricky situation to deal with. I'm not sure if the question is better suited for SuperUser though.
The trick here is to somehow use absolute cell-references instead of relative ones. As you have now experienced, Excel filters won't work well with relative references. However, manually adding absolute references is not what we want to do.
To mimic the absolute cell-reference behaviour you can precede the row reference with a sheet reference, which should counter the normal formula behaviour and turn them into actual absolute cell references:
Formula in A2 (which will auto-fill the 'SL' column):
=ROW(Sheet1!A1)
Data when filtered A-Z on 'Name':
Data when tabbed a new row:
You could use PowerQuery in Excel to add an index in front.
Remove the ID from your source.
Make your source a table
Import into PowerQuery and add an index
Load the output to another sheet. In this sheet you can filter and sort and everything you want.

Taking means of irregular amounts data

I'm not able to take the means for a large dataset because the number of attributes is irregular.
I have posted a simplified case that illustrates the problem.
An idea that I came up with: make a filter to condition on a single attribute. However, I still don't see a way to do this efficiently (other than doing it all by hand).
see excel file:
All help is much appreciated.
I'm basically looking for a function/method to achieve taking means of all different attributes conditioned on each person for a large dataset without doing it by hand.
You can use AVERAGEIFS() inside an IF:
=IF(OR(A2<>A1,B2<>B1),AVERAGEIFS(C:C,A:A,A2,B:B,B2),"")
The first part of the IF tests whether the row starts a new group, i.e. whether the person or the attribute has changed. If so, it uses AVERAGEIFS() to return the average for that group; otherwise it returns a blank.
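As a cross-check on what AVERAGEIFS() computes here, the grouping can be sketched in Python. The sketch assumes the same layout the formula implies (column A is the person, column B the attribute, column C the value); the function name `group_means` is hypothetical.

```python
from collections import defaultdict


def group_means(rows):
    """Return the mean value for every (person, attribute) pair.

    rows is an iterable of (person, attribute, value) tuples, one per
    worksheet row. Unlike the helper-column formula, the rows do not
    need to be sorted or grouped beforehand.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for person, attribute, value in rows:
        key = (person, attribute)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Each key in the result corresponds to one of the rows where the Excel formula shows a number instead of a blank.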
What you want to do can be accomplished very simply with a pivot table.
Simply select one of the cells inside the range of data you want to process (see this video for general use of a pivot table: https://www.youtube.com/watch?v=iCiayB6GrpQ ).
Go to the Insert tab and insert a pivot table.
Once you have it, simply check people, attribute, and values. Then drag people and attribute into Rows, drag value into the Values window, open the drop-down list and change it from Sum of value to Average, and you should be done. https://i.stack.imgur.com/nYEzw.png

Sort AlphaNumeric with structured references in Excel

I have an Excel sheet with data that looks like this.
Data
x=1.1
x=11.2
x=10.3
x=1.4
x=2.5;2.6
x=2.1
x=4.7
x=6.8
x=6.2;6.3
x=1.10
What I want to do is sort the list so that it looks like this.
DataSort
x=1.1
x=1.4
x=1.10
x=2.1
x=2.5;2.6
x=4.7
x=6.2;6.3
x=6.8
x=10.3
x=11.2
I tried to do that with this formula:
=LEFT(Tabelle1[[#this row];[Data]];2) & TEXT(SUBSTITUTE(Tabelle1[[#this row];[Data]];LEFT(Tabelle1[[#this row];[Data]];2);"");"#0.0#")
But that did not work.
Can someone point me in the right direction?
Copy the data into a new column, say B.
If you are using Excel 2007 or higher, select your data in B and go to the Data tab --> Text to Columns.
Here you can choose delimited and it will separate your data into 2 columns.
Then apply the sorting based on this column
In an unused column to the right, use this formula starting in row 2,
=LEFT(A2, 2)&TEXT(--MID(A2, 3, FIND(".", A2&".")-3), "000;#")&TEXT(--MID(A2, FIND(".", A2&".")+1, 99), "000;#")
Fill down as necessary then sort conventionally using the helper column as the primary sort key.
        
If you put a 0 prior to the numbers, that would take care of it.
X=01.1
Or, in a more convoluted way, split the column as recommended above, sort the way you want and then reassemble. I would also create a column with the original 1-n sequence, in case I need to sort in a particular way and then come back to the original order.
The Solution to the Problem is:
=LEFT(IF(ISNUMBER(SEARCH(";";A1));LEFT(A1;FIND(";";A1;1)-1);A1);2)&TEXT(--MID(IF(ISNUMBER(SEARCH(";";A1));LEFT(A1;FIND(";";A1;1)-1);A1);3;FIND(".";IF(ISNUMBER(SEARCH(";";A1));LEFT(A1;FIND(";";A1;1)-1);A1)&".")-3);"000;#") &TEXT(--MID(IF(ISNUMBER(SEARCH(";";A1));LEFT(A1;FIND(";";A1;1)-1);A1);FIND(".";IF(ISNUMBER(SEARCH(";";A1));LEFT(A1;FIND(";";A1;1)-1);A1)&".")+1;99);"000;#")
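The idea behind the helper-column formulas in this thread (compare the part before the dot and the part after it as numbers, and only look at the first value when a cell contains a semicolon) can be sketched in Python as follows. The function name `sort_key` is hypothetical, and the sketch assumes every entry has the form `x=<int>.<int>` optionally followed by `;` and further values.

```python
def sort_key(label):
    """Build a numeric sort key from entries such as 'x=2.5;2.6'.

    Only the first value before any ';' is used, and the digits before
    and after the '.' are compared as integers, so 1.10 sorts after 1.4.
    """
    value = label.split("=", 1)[1]      # drop the 'x=' prefix
    first = value.split(";", 1)[0]      # keep only the first value
    major, _, minor = first.partition(".")
    return (int(major), int(minor) if minor else 0)


data = ["x=1.1", "x=11.2", "x=10.3", "x=1.4", "x=2.5;2.6",
        "x=2.1", "x=4.7", "x=6.8", "x=6.2;6.3", "x=1.10"]
data_sorted = sorted(data, key=sort_key)
```

With the sample data from the question, `data_sorted` comes out in the order shown under DataSort.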

Can I get relational data into an Excel Pivot Table

I have a sheet (let's go with wines as an example) that lists every bottle of wine in my cellar, when I bought it, how much I paid etc.
There's a column that describes the wine in comma-separated tags such as "Fruity, White".
I've created a pivot table from that data, with the description as a filter column. However I can't filter it by "White". I have to find every description that contains "White" such as "Dry, White", "White, Crisp" etc.
Being from an RDBMS background, my natural inclination is to put the tags in their own table keyed against the wine row so there's zero-or-more tag rows per wine row.
Now, how on earth can I use that to filter the wine rows?
Yes you can do it within Excel and the description fields can remain as "Dry, White" etc as you do not need to split the comma separated values.
Let's say the table source comprises a text column for Description, a number column for Value and a number column for Year Bought.
Your pivot is set up with the following:
Fields: Description, Value and Year Bought.
Column labels: Year Bought
Row Labels: Description
Sum of values: Sum of Value
There is a drop down label filter on the row labels - click on this and there should be an option to select Label Filters. Select this and then select Contains. You can enter say "White" which will select all your descriptions that contain white e.g. "Dry, White", "White, Crisp". The filter includes ? to represent a single character and * to represent any series of characters.
There are similar label filters for "begins with" and "ends with", as well as their negations.
I tried this in Excel 2007 and it should also work in 2003. I think in Excel 2003 you could even combine the filters e.g. contains "White" and does not contain "Dry" but in 2007 I could not find a way of doing this.
Forgive me if I'm stating the obvious, but the reason you're having problems here is that the description column is not in 1NF, and the Excel pivot interface isn't flexible enough to allow pattern-based searching.
The simplest option will be to normalise the CSV into a series of columns, each of which represents a single attribute - one column for wine colour, one for sweetness, one for country of origin and so on - and apply the filter across multiple columns. However, if (as your comment on the question suggests) wine is a metaphor for your real problem, you may not have the luxury of revisiting the design of the source data.
Another possibility might be to use a macro (or a database query - I'm not clear from your question whether you have implemented the tag system already) to pre-filter the input data on the pivot table's source sheet based on the tag values you want to search for, then refresh the pivot table based on that data.
A third possibility is the VBA used in this question, which looks like it will custom-filter the pivot table's visible rows.
=IF(ISERR(FIND("WHITE",UPPER(B5))),0,1)
Create an extra column and add a formula. There are two tricks to this. One is to search for WHITE in the description column using UPPER, to get around the fact that Excel's FIND is case-sensitive. The other is that FIND returns a #VALUE! error if the string does not exist, so ISERR lets you trap that and return, in this example, 0 if it doesn't exist or 1 if it does. You could substitute "White" and blank for 1 and 0.
You could write a script that loops through the data and adds a new line for each comma-separated item in the description column. This would allow the pivot table to filter better.
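A minimal sketch of such a script, written in Python rather than VBA, might look like the following. The field names (`wine`, `description`, `tag`) and the function name `explode_tags` are assumptions made for illustration; the real sheet's columns will differ.

```python
def explode_tags(rows, tag_field="description"):
    """Return one output row per comma-separated tag in tag_field.

    rows is a list of dicts, one per wine. Each output row is a copy of
    the original with an extra 'tag' key holding a single tag. Rows with
    no tags at all produce no output rows, so handle those separately if
    they must stay visible in the pivot table.
    """
    exploded = []
    for row in rows:
        tags = [t.strip() for t in row[tag_field].split(",") if t.strip()]
        for tag in tags:
            new_row = dict(row)   # copy so the source row is left untouched
            new_row["tag"] = tag
            exploded.append(new_row)
    return exploded
```

Note that a wine with two tags appears twice in the output, so sums over the exploded rows will double-count unless the pivot is filtered down to a single tag.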
