How can this lookup (find the last relevant item) be improved?

How can this lookup (find the last relevant item) be improved? - excel

One of the reports that wastes a bunch of my time at work is the Roster. It's a multi-site, multi-contract listing of every employee currently assigned to a specific client. Currently, it has a little over 6,000 lines by 20-something columns, indexed against 3 different datasets. Not the largest mess in the world, but still a pain. And it's almost all in excel, because I somehow don't have a business case for Access.
But one part of this monster stands apart. One tab per site Site Totals, listing off every time any agent has gone through training. A second tab (again, one per site) Site Data displaying only the most recent training class, and the credentials they had during that class.
That second tab is driven by variations of this array formula - Last_Row is a named range on another tab, and column A is a pivot of the UID column on Site Totals. I've broken it apart for readability:
=IF(INDEX('Site Totals'!B:B,LARGE(($A2=INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row))*
(INDIRECT("'Site Totals'!B1:B"&Last_Row)<>"")*
ROW(INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row)),1))="Trainer",
"",
INDEX('Site Totals'!B:B,LARGE(($A2=INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row))*
(INDIRECT("'Site Totals'!B1:B"&Last_Row)<>"")*
ROW(INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row)),1)))
I know what this formula does, but I don't know how to improve it. This formula needs to be changed, because it currently is on the order of 500 Million calculations (I'm not allowed to delete historical data), and it takes me 3 hours to calculate the workbook ... if it doesn't crash Excel first.
I'm open to VBA and / or custom functions, but would prefer to have native Excel functions. I'm not able to install anything, so any solution must be native Excel, and Must be compatible to Excel 2007.

If your source is a pivot table, try is the GETPIVOTDATA function. You might be able to accomplish what you want without INDIRECT and INDEX.

What i have understood is that every person has/has not attended a training and you want to retrive the name of that training, in case he has not, you want a blank space in the cell. If this description is correct you can try this formua, press ctrl+shift+enter to execute.
=IFERROR(INDEX('Site Totals'!B$1:B$12,MATCH(A2&"Trainer",'Site Totals'!A$1:A$12&'Site Totals'!B$1:B$12)),"")
Here A2 contians the name of the person. I can be more precise with this formula if you can provide some sample data butI would recommend to not to use entire B & Columns in Site Total workssheete as this will definately slow down computing process, instead you can use B1:B8000 or smaller range, to speed up process. Hope that helps.

Related

How do one extract information from a dynamic table, automatically through excel functions?

I have been searching high and low for a way to solve my dilemma, in different ways, so I am trying to post both of the things I've been trying to do:
The challenge version 1:
I want to extract the entire row with information tied to the name which is the latest entry of that name in the table. So from the table below I would want to collect the entire row which contains the information: "A, Jack Black, 01.01.2029, 10:20". I simply want to copy the entire row to another sheet. But one important factor is that it has to happen automatically.
So i need functions which can check if: Is there another entry with the same name, higher up in the table? If so, DO NOT COPY THE ROW. If there ain't another entry with the exact same name higher up in the table, COPY THE ENTIRE ROW, to another table, within another sheet.
The challenge version 2:
What I really want to do is count the number of unique people(unique names) per. department, and summarize this in another table. Basically this means that "Jack Black" should be counted as 1 person, in department A.
So the result I want, is a table looking like this (the one beneath), where the number of people does not contain any duplicate people (names). OR it does not function with a dynamic table, which updates the information it contains on the fly. I can make this happen if I am copying from a static table, but as stated above, the table is dynamic and updates with new information every minute...
So far i've tried excel's built in filtering, but this does not work automatically. I've also tried using functions like in this guide: https://excel-bytes.com/how-to-extract-a-dynamic-list-from-a-data-range-based-on-a-criteria-without-filters-in-excel/. However every solution i find seems to need criteria for filtering out duplicates or does not function when copying information from a dynamic table.
Does anyone know how to reach my desired result, without implementing criteria for selecting the rows or counting rows as stated above? VBA code is not an option at the moment :(
In advance, THANK YOU, I've really tried solving this, but I feel like this just might break my head wide open soon if I can't solve it. HEEEEELP!
Sincerely
haakonlu

Excel design consideration - infrequent change and impact for VBA/formula/add-on

nothing stuck or broken, just I am inspired after a discussion with another Excel author.
His situation:
Read from an existing Excel monster file (column FG), and hard-coded the following
Range("FF:FG").Copy
Potential issue:
data in FF:FG will be pushed to GF:GG every couple of months because newer columns will be inserted in between. (It's a pivot-like design... sorry, but end-users need this appearance, but categories are increasing, summary need to be at right end side)
He has 2 other choices (if he don't want to maintain VBA code every few months):
A: Store "FF","FG" in a Cell (fixed location!), then read the location parameter using VBA
B: Read a second dedicated CSV file (copy/paste from the monster file, consumed by another user so available readily), it only has the 2 columns required..
To me, none is obviously better than the other, just a matter of preference.
Similar but simpler Scene of mine
I produce the monstrous file by lots of Vlookup from manual data sources (inherited the design... and I refactored the design using another automation tool but there is license consideration atm).
In a column there is a formula doing something like
=if(A1="SALES PERSON SICK","void result",(if(A1="MACHINE BROKEN",C2*0.8),"")..
say 5000 rows with this formula
To reduce hard-coding I moved
"SALES PERSON SICK","MACHINE BROKEN" to a reference sheet cell A1,A2, and changed formula to:
=if(A1=Ref!$A$1,"void result",(if(A1=Ref!$A$2,C2*0.8),"")
I feel it's a good practice.
Question: Is method A or B better? Considering column position will move every ~3-6 months, still worth choosing 1 from A/B?

data in FF:FG will be pushed to GF:GG every couple of months because newer columns will be inserted in between
Then you should use named ranges in what you call "monster file" (see Define and use names in formulas) and use them in your VBA.
Eg define a name for Columns FF:FG like CopySource (use a name that describes the data in that columns) and finally you can use that in your VBA code.
Range("CopySource").Copy
Whenever the range moves because new columns are inseted before, the named range moves too, so it still points to the same data.

How to optimize COUNTIFS with very large data

I would like to create a report that look like this picture below.
My data has around 500,000 cells (it will continue to grow larger)
Right now, I'm using countifs function from excel but it takes a very long time to calculate. (cannot turnoff automatic calculate)
The main value is collected as date and the range of date is about 3 years, so I have to put a lot of formula to cover all range of value.
result
The picture below is the datasource the top one cannot be changed. , while the bottom is the one I created by myself (can change). I use weeknum to change date to week number.
data
Are there any better formula or any ways to make this file faster? Every kinds of suggestions are welcome!
I was thinking about using Pivot Table, but I don't know how to make pivot table from this kind of datasource.
PS. VBA is the last option.
You can download example file here: https://www.mediafire.com/?t21s8ngn9mlme2d

I will post this answer with the disclaimer that it is entirely dependent on the size of the data set. That turning on and off the auto calculate is the best way, but your question doesn't let me do that, so keep reading.
Your question made me curious, so I gave it a try and timed it. I essentially set up two columns of over 100,000 rand numbers choosing from 1-1000 and then tried to do a countif on the two columns if they were equal. I made a macro that I can run that turns off the autocalculate, inserts the start time, calculates, and then inserts the finish time. I highlighted in yellow the time difference.
First I tried your way, two criteria, countifs:
Then I tried to combine (concatenate) the two columns to see if I could make it easier by only having one countif criteria and data set. It doesn't. see result below:
Finally, realizing what was going on. I decided to make the criteria only match the FIRST value in the number to look for. I was essentially reducing the number of characters to check per cell. This had a positive result. See below:
Therefore my suggestion is to limit the length of the words you are comparing in anyway possible. You are mostly looking at dates, so you might have to get creative, but this seems to be the best way possible without going to manual calculation.

I have worked with Excel sheets of a similar size. Especially if you are using the data on a regular basis, I would heartily recommend switching to a proper database SQL based, Access, or whatever fits your purpose. I does wonders for the speed and also you won't run into the size limits of Excel. :-)
You can import the data you have now fairly easy.
I am happy as a clam with my postgresql db.

Which is the most flexible way for end-user to create a report querying an SSAS Cube?

I know that you can query a cube by connecting from Excel to Analysis database and using formulas like cubevalue() or cubemember().
I also know that after converting the power pivot to formulas, you can access the attribute and the value related only by writing a text.
Example: for Branch Dimension, instead of writing
cubemember("connections";"[DimBranch].[Name].[All].[London])" )
you can write in the cell only "London". However, this won't work if you have a parent-child dimension and want to retrieve the amount for one of the intermediate levels.
Did anyone know about how can you avoid writing these formulas directly by the end-user ?

I´ve done this several times. I can describe the approach I´ve used in the past.
You need to define the input from the user, for example Fiscal Year, Fiscal Period,... and create a tab with a friendly layout for this solely purpose.
After that you create your pivot/pivots against the cube/s you want to query with the data extraction that you need (obiously these pivots will refer to the Fiscal Year and Period). Define the desired layout for these pivots (as you want them you be shown). Once You´ve done this, you need to convert into formulas these pivots, including parameters. You´ll see that values placed in the measure box contain a formula cubevalue which makes reference to the connection itself and several cubemembers. It makes reference to the measure as well.
The trickiest part comes now, when you need to modify your cubemember function to take value from the input, something like cubemember("connections";"[DimBranch].[Name].[All].["+MyReferenceCell+")" ), and you have your dimension picked up from a value entered by a user.
If you have as a matter of fact, serveral pivots referencing common dimensions, you need to make the reference this modified cubemember cell affected by user input, otherwise you have to modify all affected cubemembers in your pivots converted to formulas.
This has a really good advantage: depending on the excel´s version, all pivots are automatically recalulated if you change the value in the user´s input cell without hitting the refresh all button.
On the other hand, the disadvantage comes when your report in excel contains a list of values from a dimension which can grow. After converting pivots into formulas those new values for the dimension are lost. That´s why you can always place a control if you have the total value int he pivots regardless the dimension value.
I hope this helps.
Regards

Grouping rows by area codes

I have a table of customers to which my company ships products. The problem is that these customers need to be sorted by their area codes, so that the products can be sent to the appropriate shipping companies (we have two partner companies that ship to certain parts of the country). Each company sent us a list of area code numbers to which they can ship and I need to divide the Excel sheet into two sheets, each containing the customers with the area codes compatible with the respective company.
I tried to solve this problem with VLOOKUP function, but it only works on individual row basis, and I need a solution that will find all rows that contain a number from the specified group of area codes.
Another way would be IF function that would put a True or False (one IF function for each company) value in new column and then I could sort by that value, and copy the data into a new sheet. This approach would work, but the IF function would be extremely long and hard to control.
Can you suggest a way to solve this problem?
Edit to incorporate details provided via Comment:
Presently I have about 5,000 rows but in future it might be more though I doubt over 10,000 rows.

A VLOOKUP seems very promising, of the kind =VLOOKUP($B2,F:G,1,0) in C2 copied across and down as required, with a layout as below:
This does not group as you say you require (but do you really need to?) because it seems possible some locations will be served by both shippers. You might resolve this by flagging those rows where both are viable and then by sorting to split into three groups (Shipper1 only, Shipper2 only, both) before transferring the ranges as desired.
Edit in response to OPs comment
If you can be certain there is no overlap between Shippers, a single column with this formula, say in E2copied down, might be preferable:
=IF(ISERROR(MATCH(B2,F:F,0)>0),"Shipper2","Shipper1")
and would not routinely show #N/A. (This assumes no area is outside the range of both shippers.)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string