What is the best way to filter a large list in Excel?

I have a table in Excel that I want to filter. It will have a maximum of 1 million rows and 80 columns. All the calculations etc. are done programmatically in arrays to cut down processing time. However, I also want to filter the results to display only certain rows based on one column's value, followed by the top 5% based on another filter value.
When I first built the sheet, it was limited to 65,000 results, so the size of the data set caused no problems. I just invoked the worksheet filter functions from code. Can I still do it that way with a larger data set, or is there a way to filter an array the way you do a data set on a sheet?
Thanks
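For the "filter an array" part of the question, here is a minimal sketch in Python (not VBA) of the two steps applied to in-memory rows: keep the rows whose key column equals a given value, then keep the top 5% of those by a second column. The function name, the column-index parameters and rounding up to at least one row are assumptions for illustration; Excel's own Top 10 percent filter may round differently.

```python
import math


def filter_then_top_percent(rows, key_col, key_value, score_col, pct=5.0):
    """Keep rows whose key_col equals key_value, then the top pct% of those by score_col.

    rows is a list of tuples/lists; the input list itself is not modified.
    """
    # Step 1: similar to an AutoFilter on a single column value
    matched = [r for r in rows if r[key_col] == key_value]
    if not matched:
        return []
    # Step 2: similar to a "top N percent" filter; rounding up (min. one row) is an assumption
    keep = max(1, math.ceil(len(matched) * pct / 100))
    matched.sort(key=lambda r: r[score_col], reverse=True)
    return matched[:keep]


rows = [("A", i) for i in range(100)] + [("B", 1000)]
top_a = filter_then_top_percent(rows, 0, "A", 1)
```

Because both steps work on a plain list, the row limit of the worksheet never comes into play; only the results need to be written back to a sheet.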

As everyone has already mentioned, Excel 2007 will take you to a million rows, but it's slower than the Excel 2003 that I presume you're using at the moment, so filtering with it wouldn't be advisable.
Along with MySQL, MS Access is also an option.

You really should put that data in an Access table and use Excel's Database Query to do the job. Since it can also filter retrieved data based on a cell value, it's a great combination.
Storing the data in a database brings you another interesting option (depending on what you want to do): to query your database using PowerPivot.
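To illustrate the "query filtered by a cell value" idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Access and Microsoft Query (a swapped-in technology, not what the answer describes). The table name, columns and sample rows are hypothetical; the ? parameter plays the role of the cell value, and the index is what lets the database avoid scanning the whole table.

```python
import sqlite3

# In-memory SQLite database standing in for the Access table (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (category TEXT, score REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [("A", 10.0), ("B", 20.0), ("A", 30.0), ("C", 5.0)],
)
# An index on the filter column lets the database look up matching rows directly
conn.execute("CREATE INDEX idx_results_category ON results (category)")


def rows_for(category):
    # The ? placeholder plays the role of the cell value the query is filtered on
    cur = conn.execute(
        "SELECT category, score FROM results WHERE category = ? ORDER BY score DESC",
        (category,),
    )
    return cur.fetchall()


print(rows_for("A"))  # [('A', 30.0), ('A', 10.0)]
```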

Although a relational database would be preferable in many ways, if you don't have any formulas then filtering your data (1 million rows by 80 columns) in Excel will be reasonably fast, assuming that you have enough RAM: under 1 or 2 seconds depending on what sort of filtering you want to do, which will probably be faster than an un-indexed database table. If you do have formulas, you will probably need to be in Manual calculation mode so that the filtering process doesn't trigger multiple recalculations.

Related

Classification and Grouping of Sorted Data

I have a dataset H3:J12 where components are classified by Type. I have summed the count for similar components and sorted by Count with the UNIQUE and SORT formulae, and the result is L3:M7.
In my actual case, there are several thousand such components, sorted as in L:L, and now I would like to add the Type column next to the Component with the sorted Count, as shown in P3:R12. Is it possible to extract them directly from L3:M7 or directly from H3:J12? I will not be able to do them manually.
Mechanical approaches include pivots, VB, etc. However, you could consider a more dynamic approach that doesn't require constant updating / refreshing (of code, pivots, etc.) whenever underlying data is appended/amended.
YES: you can retrieve directly from source data as follows:
=SORT(UNIQUE(C3:D20))
=SUMIFS(E3:E30,C3:C30,H3:H17,D3:D30,I3:I17)
Notes:
This could equally be AVERAGEIF(S), COUNTIF(S), PERCENTILE, etc. Besides the dynamic nature, another significant advantage over the alternative methods (VB, pivot tables, etc.) is the flexibility of measures: pivot tables are substantially more restricted than direct Excel calculation.
A disadvantage is that you cannot automatically chart the data with the pivot chart functionality that accompanies pivot tables.
See the opening line of the linked sheet for the conditional formatting sample used to recreate the 'look/feel' you might otherwise lose with this approach.
It seemed as though you were almost there! I'm not sure what relevance the other tables have (I ignored them in light of the question).
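For readers outside Excel, here is a hedged Python sketch of what the SORT(UNIQUE(...)) and SUMIFS pair computes: one row per distinct (component, type) pair with the summed count, followed by a re-sort by count as the question asks. The sample data and the function name are hypothetical.

```python
from collections import defaultdict


def sum_by_component_and_type(records):
    """records: iterable of (component, type, count) tuples.

    Returns [(component, type, total)] for each distinct (component, type) pair,
    sorted by the pair (like SORT(UNIQUE(...))), with totals as SUMIFS would give them.
    """
    totals = defaultdict(int)
    for component, kind, count in records:
        totals[(component, kind)] += count
    return [(c, k, totals[(c, k)]) for c, k in sorted(totals)]


data = [("Bolt", "Mechanical", 3), ("Relay", "Electrical", 2),
        ("Bolt", "Mechanical", 4), ("Fuse", "Electrical", 1)]
by_pair = sum_by_component_and_type(data)
# Re-sort by the summed count, highest first, keeping Type next to each Component
by_count = sorted(by_pair, key=lambda row: row[2], reverse=True)
```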

I am looking to delete rows in a table based on multiple filter criteria

I have been using Excel VBA for some time but am now looking to do something with the filters or similar.
I have a large table of data (in excess of 11,000 rows) and I need to select up to 5 different criteria in a filter and delete every row that does not match this filter (so, effectively, delete the invisible rows).
The filter needs to be set from an array of information in another ListObject.
The data in question is a list of staff members in departments, set up as a ListObject. I need to pull out only whichever departments are selected in my criteria on another worksheet. The departments are Management, Warehouse, Stores, Admin and Transport; I need to delete all records that are NOT Warehouse, Stores or Admin.
*I haven't really tried anything yet as I have been scouring the internet; I've had some thoughts around looping through the filter options.
*Sorry - I have tried different things such as the for each row in table loop but this timed at over 15 minutes! (Apologies to the person who commented as I should've advised on this)
Sorry I have no code - barring "for each row in table" loop which I need to avoid using as this is a very slow process with this many records
Had no actual results - been using VBA for years but this is the first time I've been asked for this type of thing and I am at a loss.
Please be kind as I am new to the forum and obviously just looking for some help
Having trawled through a massive number of posts, I am sure there is an easier way, but I have resolved this by converting the data back to a range, performing the loop from the last row to the first, and deleting the row if the value is not found!
I'm happy with this as it now takes less than 15 seconds to complete - not sure why the list object was slowing it but I'm happy enough with it as is
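The fix above deletes from the last row to the first so that removing a row never shifts the rows still to be checked. Here is a minimal Python sketch of that pattern, next to the single-pass alternative of building a new list of only the rows to keep. The usual fast approach in VBA is similar in spirit (read the range into an array, collect the kept rows, write them back in one assignment), but that translation is not shown. Function names and the sample staff list are hypothetical.

```python
ALLOWED = {"Warehouse", "Stores", "Admin"}


def keep_departments(rows, dept_index, allowed):
    """Single pass: return a new list with only the rows whose department is allowed."""
    return [row for row in rows if row[dept_index] in allowed]


def delete_unwanted_in_place(rows, dept_index, allowed):
    """Mirror of the accepted fix: walk from last to first and delete non-matching rows.

    Going backwards means a deletion never changes the index of a row not yet visited.
    """
    for i in range(len(rows) - 1, -1, -1):
        if rows[i][dept_index] not in allowed:
            del rows[i]
    return rows


staff = [("Ann", "Management"), ("Bob", "Warehouse"), ("Cy", "Admin"),
         ("Di", "Transport"), ("Ed", "Stores")]
```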

Excel pivot table with ranking

I'm in the process of creating a report for the company I work at, based on a rather complicated survey export file whose data needs to be extracted in meaningful ways.
The table headers are as follow https://docs.google.com/spreadsheets/d/1Et9Pg6k9CJA3HTO0aHcnSnOWVU05bmHYUsPS0wB2Nr8/edit?usp=sharing
It has respondents listing their top 3 most important options, with the rest left blank.
If anyone can help me figure out a way to potentially summarize this in a pivot table that would be great.
Your data is in a crosstab. Pivot tables don't like that kind of layout. You need to unpivot your data.
If you've got the PowerQuery add-in installed (or have Excel 2016 or Excel/Office 365 subscription) then you can use PowerQuery to do this. Google "PowerQuery" and "Unpivot" and you'll turn up a whole heap of videos.
Otherwise you can use VBA such as my Unpivot routine I've previously blogged about at http://dailydoseofexcel.com/archives/2013/11/21/unpivot-shootout/
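As a rough illustration of what "unpivot" means for this survey layout, here is a Python sketch that turns a crosstab into long format, one row per non-blank cell. It is neither Power Query nor the blogged VBA routine; the header names and sample rows are hypothetical.

```python
def unpivot(header, rows, id_cols=1):
    """Turn a crosstab into long format: one (id..., column, value) tuple per non-blank cell.

    header: list of column names; rows: list of lists.
    The first id_cols columns are kept as identifiers. Blank cells (None or "")
    are skipped, which suits a survey where unranked options are left empty.
    """
    long_rows = []
    for row in rows:
        ids = tuple(row[:id_cols])
        for name, value in zip(header[id_cols:], row[id_cols:]):
            if value not in (None, ""):
                long_rows.append(ids + (name, value))
    return long_rows


header = ["Respondent", "Price", "Quality", "Speed", "Support"]
rows = [["R1", 2, "", 1, 3], ["R2", "", 1, "", ""]]
long_form = unpivot(header, rows)
```

The long form (respondent, option, rank) is the shape a pivot table can summarise directly, with option and rank as row or column fields.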
As always it depends what questions you want to ask in your analysis. Here are two suggestions.
What are the commonest first/second/third choices?
This assumes that the ranking is important, i.e. the first choice is ranked significantly higher than the second choice, so you want to analyse them separately.
You could add three extra columns to your data using this formula to convert the first choice to a single variable with 11 categories
=IFERROR(MATCH(COLUMNS($A:A),$A3:$K3,0),"")
in L3 and likewise with the second and third choices in M3 and N3.
In the event that a respondent (row) has fewer than three choices, it will give a blank for the second and/or third choice.
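The MATCH formula looks for the rank number (1, 2 or 3, because COLUMNS($A:A) grows as the formula is filled right) among the option columns and returns its position. Here is a hedged Python equivalent for a single row, assuming ranks are stored as numbers and unranked options are blank strings; the function name is hypothetical.

```python
def choice_columns(row, max_rank=3):
    """For one respondent row (option columns holding 1/2/3 or blank),
    return the 1-based column position of the 1st, 2nd and 3rd choice.

    Mirrors =IFERROR(MATCH(k, $A3:$K3, 0), ""): first exact match, "" if absent.
    """
    result = []
    for rank in range(1, max_rank + 1):
        try:
            # list.index is 0-based, MATCH is 1-based
            result.append(row.index(rank) + 1)
        except ValueError:
            # The IFERROR branch: this rank was not given
            result.append("")
    return result


row = ["", 3, "", 1, "", "", "", "", "", "", ""]
choices = choice_columns(row)  # first choice in column 4, no second, third in column 2
```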
What are the commonest choices regardless of ranking?
This assumes that the ranking isn't so important - you just want to know which columns have been picked overall.
=INDEX($L$3:$N$10,INT((ROWS($1:1)-1)/3)+1,MOD(INT(ROWS($1:1)-1),3)+1)
In O3. This would have to be pulled down for 3N rows, where N is the number of rows in the original dataset.
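The INDEX formula walks the three choice columns row by row: for the n-th output row it picks source row INT((n-1)/3)+1 and column MOD(n-1,3)+1. Here is a Python sketch of the same arithmetic, followed by a count of the stacked values; the function name and the sample block are hypothetical.

```python
from collections import Counter


def stack_choices(table):
    """Flatten an N x 3 block (the first/second/third choice columns) into one list
    of 3N values, row by row, using the same arithmetic as the INDEX formula."""
    stacked = []
    for n in range(1, 3 * len(table) + 1):  # n plays the role of ROWS($1:1) when filled down
        r = (n - 1) // 3 + 1                 # INT((n-1)/3)+1 -> which source row
        c = (n - 1) % 3 + 1                  # MOD(n-1,3)+1   -> which of the three columns
        stacked.append(table[r - 1][c - 1])  # INDEX is 1-based, Python lists are 0-based
    return stacked


stacked = stack_choices([[4, "", 2], [6, 1, 9], [4, 2, ""]])
# Commonest choices regardless of rank, ignoring blanks
overall = Counter(v for v in stacked if v != "")
```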
Then it would be a simple case of setting up pivot tables or charts for the four new variables.

Access query returns a different result when read from Excel

I have a query in an ACCDB that works fine in Access.
I can successfully copy/paste its data to Excel.
However, from Excel, if I try to insert a Pivot Table using External Data Source, pointing to the very same query, then some numeric fields have weird formatting and some calculated numeric columns (formula in the query) have their value divided by 100 compared to the source.
I have never seen that behaviour. Any suggestions?
The whole MS-Office setup is in 2010.
What I have already done in the source query (without visible improvement):
used CCur() to make sure the figures are in a coherent data type
set the Format property of those culprit columns to "Standard"
The behaviour is exactly the same on other PCs in the same bank.
I could solve the problem which was due to 2 different bugs, probably in JetOLEDB.
Like is not handled properly by Excel
The query contained some formulae using Like:
iif(someField Like "XX*";0;anotherField).
Changing this to iif(Left(somefield;2) = "XX";0;anotherField) solved the calculation differences between Excel and Access.
Reference to another calculated column is handled differently
Say you have 2 query columns:
Rate: i.Rate *100 (i is a table alias)
Amount: Rate*Price
Access calculates Amount using the Rate calculated column, while Excel uses the Rate field from table i. Therefore I had to change the Amount expression to:
Rate: i.Rate *100
Amount: i.Rate *100*Price
since Excel seems to always use Rate from the table (i.Rate) rather than the calculated column.
Use the query in Access to first make a table in Access (a Make Table query), then import that table into Excel.

Access - Calculated field (running average)

I am trying to generate an Access database with information which is currently in endless sheets and tables in Excel.
I would like to know if there is any way to add a field to one table which is a calculation (average value) based on several other cells.
I need to calculate the running 6 months average value of another field which contains 1 value per month.
Hopefully the previous image shows what I mean.
What is the best approach to import this functionality into access?
You wouldn't normally store a calculated field in Access, you would run a query that provides you the calculation on the fly.
Without seeing your data structure it is impossible to tell you how to calculate the answer you need, but you would need your data correctly normalised in order to make this simple.
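Here is a minimal sketch of "calculate on the fly rather than store", using Python's sqlite3 as a stand-in for Access (an assumption; Access SQL also supports correlated subqueries, but this exact query has not been checked there). It assumes a normalised table with one row per month and a sequential month number, both hypothetical. The first five months average over fewer than six values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalised table: one row per month, month_no is a sequential month index
conn.execute("CREATE TABLE monthly (month_no INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [(m, v) for m, v in enumerate([10, 20, 30, 40, 50, 60, 70, 80], start=1)],
)

# Nothing is stored: the running average is recalculated every time the query runs.
RUNNING_AVG_SQL = """
SELECT m.month_no,
       m.value,
       (SELECT AVG(p.value)
          FROM monthly AS p
         WHERE p.month_no BETWEEN m.month_no - 5 AND m.month_no) AS avg_6m
  FROM monthly AS m
 ORDER BY m.month_no
"""
rows = conn.execute(RUNNING_AVG_SQL).fetchall()
```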
