I have a very large Excel file with approximately 200 sheets with fields. Each sheet is a ranking of a subset of values, output from an R program. There are 2 versions of nearly every entry. The subset data is not in the original sheet, only the name of the sheet and the summary table I'm trying to build. I'd like to automatically determine which range (sheet) the lookup queries.
The manual answer is to sort, filter, create a lookup, consolidate the summary data, copy the formula, find and replace the range reference, fill, and repeat. Hopefully there is a solution other than copy-pasting and editing hundreds of times.
You may want to rethink the data architecture. If possible, follow the golden rule: keep the data in one sheet and the reporting on other sheets.
Find a way to put the data from all 200 sheets into just one sheet. You may have to introduce a few additional columns to distinguish the rows (for example, which sheet each row came from).
Then you can start building reports on all the data, using Pivot Tables, or more sophisticated tools like Power Pivot.
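To illustrate the single-sheet layout suggested above, here is a minimal sketch in Python rather than Excel. It assumes each sheet has already been read into a list of rows; the sheet names and the `item` and `rank` fields are hypothetical, since the question does not show the real columns. The only point is the extra `sheet` column that records where each row came from.

```python
from typing import Dict, List


def stack_sheets(sheets: Dict[str, List[dict]]) -> List[dict]:
    """Combine per-sheet rows into one long table, tagging each row with its sheet name."""
    combined = []
    for sheet_name, rows in sheets.items():
        for row in rows:
            # The added "sheet" column is what distinguishes rows after stacking.
            combined.append({"sheet": sheet_name, **row})
    return combined


# Hypothetical stand-ins for two of the ~200 ranking sheets.
sheets = {
    "subset_A_v1": [{"item": "x", "rank": 1}, {"item": "y", "rank": 2}],
    "subset_A_v2": [{"item": "y", "rank": 1}, {"item": "x", "rank": 2}],
}
long_table = stack_sheets(sheets)
```

Once the data is in this shape, the summary becomes a filter or pivot on the `sheet` column instead of a formula whose range reference has to be edited per sheet.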
With the next-to-nothing information you provide about your data set, it is hard to give more concrete advice.
I am building a financial model that uses multiple unrelated tables which nonetheless share the same rows across the worksheet, and I am having trouble structuring formulas the way I would with a single Table in Excel.
The very first table is a summary table of different projects and columns with different characteristics of each project, and the subsequent tables for the same rows to the right are structured with months/years as table headers and financial information for each period. I am now trying to build formulas that summarize information in these tables by using SUMIFS formulas that take characteristics of the projects as criteria. Typically, this would look like:
=SUMIFS(BacklogTable[des.21],ProjectsTable[Segment],"Healthcare")
If all this data were part of the same table, this would yield the result I am looking for. However, as these two tables are not connected in any way, the formula returns #VALUE!.
I have noticed that referencing the same row of another Table works just fine. From a cell in Table2, a reference to the same row of Table1 (Table1[@Segment]) returns the correct value from Table1.
Is there a way to modify the formula so that Excel can interpret criteria that stretch across an entire Table column and match them with the corresponding rows of a different Table?
Data Relationships in Excel does not seem like a solution. The problem can of course be resolved by adding the same information in the summary table to every other table, but for future projects it would be nice to know if using multiple tables can be a way to structure large models.
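What the formula is asking for is a row-by-row pairing of two columns of equal length. Here is a minimal sketch of that logic in Python, assuming both tables have exactly the same number of rows in the same order; the sample values are made up. In Excel, something along the lines of SUMPRODUCT((ProjectsTable[Segment]="Healthcare")*BacklogTable[des.21]) might express the same pairing, though that is untested against this workbook.

```python
def sum_where(values, criteria, wanted):
    """Sum values[i] for every row i where criteria[i] equals wanted."""
    if len(values) != len(criteria):
        # Row-aligned matching only makes sense when both columns have the same height.
        raise ValueError("columns must have the same number of rows")
    return sum(v for v, c in zip(values, criteria) if c == wanted)


# Made-up stand-ins for ProjectsTable[Segment] and BacklogTable[des.21].
segment = ["Healthcare", "Retail", "Healthcare"]
backlog_dec21 = [100.0, 250.0, 40.0]

total = sum_where(backlog_dec21, segment, "Healthcare")
```

The length check matters: if the two tables ever drift apart in row count, the pairing is no longer meaningful.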
While I am a longtime 'lurker' and have gleaned a great amount of Excel knowledge through this site, I've never asked a question before. This time I'm really stumped...
I have a table with multiple columns. The data I'm working with is obtained from multiple VLOOKUPs against another workbook that I do not own. That workbook has some issues with hidden characters and hidden rows. I'm trying to create a dashboard from this data and was successful using the AGGREGATE function (2,7) to COUNT and omit the data I don't want.
The next tier for this data needs to incorporate a lookup against another column (image attached). For this, I want to look up data in column B and subtract cells in column C if the data in column B matches a cell in column A.
I have tried multiple ways of doing this but I think that AGGREGATE might be the best approach since it can ignore hidden data and rows.
My latest attempt was to add an IF statement as follows, but that results in a SPILL condition:
=IF('Transport Calendar'!E2:E830='Transport Calcs'!M6,AGGREGATE(2,7,'Transport Calendar'!X2:X830)-AGGREGATE(2,7,'Transport Calendar'!Z2:Z830),"FALSE")
Any help would most certainly be appreciated!
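The IF above compares all 829 cells of E2:E830 with M6 at once, so it returns 829 results instead of one, which is why it spills. The intended logic seems to be a conditional count that skips hidden rows. Below is a minimal Python sketch of that logic, not an Excel formula. It assumes the goal is "count of numbers in X minus count of numbers in Z, over visible rows whose E value matches the key"; the `hidden` flag and the sample rows are hypothetical.

```python
from numbers import Number


def count_visible_numbers(rows, column, key_column, key):
    """Count numeric cells in `column` on visible rows whose `key_column` equals `key`."""
    return sum(
        1
        for row in rows
        if not row["hidden"]                      # skip hidden rows
        and row[key_column] == key                # keep only rows matching the key
        and isinstance(row[column], Number)       # count numbers only
        and not isinstance(row[column], bool)     # True/False are not counted as numbers
    )


# Hypothetical rows standing in for 'Transport Calendar'!E, X and Z.
rows = [
    {"E": "North", "X": 5, "Z": 1, "hidden": False},
    {"E": "North", "X": 7, "Z": None, "hidden": False},
    {"E": "North", "X": 9, "Z": 2, "hidden": True},
    {"E": "South", "X": 3, "Z": 4, "hidden": False},
]

result = (
    count_visible_numbers(rows, "X", "E", "North")
    - count_visible_numbers(rows, "Z", "E", "North")
)
```

The key point is that the match test is applied per row inside the count, rather than wrapped around the whole aggregate.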
So basically, I have been trying to make Excel scan one column and print out the rows for cells which contain specific letters.
For example I have the following data in my sheet.
What I need is for Excel to search for the values which contain T and I, and then print the matching rows. In other words, separate the data into two different tables so that I can do further analysis on each.
So far I have been trying to use the VLOOKUP() function, but the problem with VLOOKUP() is that Excel requires a full match and not just a letter in the cell. I tried both FALSE and TRUE. Then I tried =INDEX($B$4:$K$9;MATCH($A$17;$A$5:$A$9;0);COLUMN(A4)) instead, but that does not work either, since it also requires a full match. Another problem I didn't realise before: how can Excel recognise each cell, given that the number after the letter will be different every time, and how can I make Excel not repeat the same row twice?
I have used another approach where I copy the data into a separate sheet, filter out the Ts, and copy/paste the Is into another sheet, and vice versa. It is time consuming, so it would be much better if I could simply copy/paste my new data and have the split generated on its own.
Any suggestions or links would be really helpful.
UPDATE
I had a new idea on how to approach this problem. Is it somehow possible to have VBA code filter the data? Is there a way to specify in VBA that the data should be filtered by "Starts With" and have the results printed in another cell block?
Looks simple enough. The first step is to make sure you have headers over your data and that it is in proper table format, similar to my picture. Then select the data set and press CTRL+T. That should turn your data into a table object with stripes. Use the formula =LEFT(C2,1) in a helper column to extract the first letter (T or I in your example).
Select the table and press ALT+D+P, which will generate a pivot table based on the original data set.
Drag the column with the formula I suggested to the FILTERS area of the pivot table, the ID column to ROWS, and all others to VALUES. As new data is added, simply refresh and the pivot tables will update. Do not put the pivot tables on top of each other as I did; that is only for the picture so you can see them. If you have too many filters to apply, you can right-click the helper column in the pivot table fields area to produce a slicer, which is a button that helps you change the report quickly. If you have any other questions, do ask.
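The helper-column idea above boils down to "group rows by the first character of the ID". Here is a minimal Python sketch of that grouping, just to show the logic outside Excel. The `id` and `value` field names and the sample rows are hypothetical, since the original data is only shown in a picture.

```python
from collections import defaultdict


def split_by_prefix(rows, key="id"):
    """Group rows by the first character of row[key], like a =LEFT(cell,1) helper column."""
    groups = defaultdict(list)
    for row in rows:
        prefix = row[key][:1]      # first letter only; the number after it is ignored
        groups[prefix].append(row)
    return dict(groups)


# Hypothetical rows: a letter followed by a number that differs every time.
rows = [
    {"id": "T12", "value": 3.1},
    {"id": "I7", "value": 0.4},
    {"id": "T3", "value": 2.2},
]
tables = split_by_prefix(rows)
```

Because each row is visited exactly once and filed under one prefix, no row can appear twice, which was one of the worries in the question.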
I'm having some serious performance issues with an Excel workbook I created. I need to pull data from another worksheet in the book that has 7 columns of data and about 300 rows.
The amount of data should be no problem - I think the issue I'm having comes down to an index|match array that has multiple match conditions. I'm wondering if there's another approach I can take, because the workbook is becoming aggravating to work with.
Here's some made up data:
This data is aggregated in a separate program from a database, and I output it to an excel file.
Here's a sample of a made up segment of a report:
The formula for the rows "Active Accounts" and "Online Enabled Accounts" is:
{=IFERROR(INDEX($D:$G,MATCH($K$2&M$1,$C:$C&$A:$A,0),MATCH($L2,$D$1:$G$1,0)),0)}
and the formula for the rows "Both", "Online", and "Paper" are as follows:
{=IFERROR(INDEX($G:$G,MATCH($K2&M$1&$L6,$C:$C&$A:$A&$F:$F,0)),0)}
I have about 5 other "segments" that reflect similar but different data in this format across 13 months. With only 300 data records, this workbook is still painfully slow even when just applying formatting, so I'm hoping there's a better approach than using these array formulas with Index|Match.
Select any cell in your dataset and hit CTRL + A, then hit CTRL + T. This will create a table you can reference like a named range.
Write the index match formula as you normally would, but make sure to select only the data you're looking for. Do not select the entire column; that is what's weighing down your system. Simply choose the ranges that actually contain the data.
You'll notice that as you highlight your data, the formula will show something like Table1[Accounts]. This means it's accessing that named column's data range. This allows your formula to scale as the table grows (or shrinks) without calculating any farther than needed, which saves your computer a tremendous amount of computing power while it calculates.
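To show why the concatenated-key MATCH is expensive and what the alternative looks like, here is a minimal sketch in Python, not Excel. Instead of rebuilding and scanning a combined key column for every report cell, it builds one index over the actual records and looks each cell up directly. The field names (`month`, `branch`, `active`, `online`) and the sample records are hypothetical, since the original data was only shown as an image.

```python
def build_index(records, key_fields):
    """Map each combined key to the first record that has it (like MATCH(..., 0))."""
    index = {}
    for record in records:
        key = tuple(record[field] for field in key_fields)
        index.setdefault(key, record)   # keep the first match only
    return index


def lookup(index, key, field, default=0):
    """Return record[field] for the key, or the default when nothing matches (like IFERROR)."""
    record = index.get(key)
    return default if record is None else record[field]


# Hypothetical stand-ins for the ~300 exported rows.
records = [
    {"month": "2015-01", "branch": "East", "active": 120, "online": 80},
    {"month": "2015-01", "branch": "West", "active": 95, "online": 60},
    {"month": "2015-02", "branch": "East", "active": 130, "online": 90},
]

index = build_index(records, ("branch", "month"))
east_jan = lookup(index, ("East", "2015-01"), "active")
missing = lookup(index, ("North", "2015-01"), "active")
```

The work done is proportional to the 300 real records rather than to every row of several whole columns, which is the same idea as restricting the formula to the table's data range.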
I am finding that inserting rows in table structures or in normal cells, manually or otherwise, is very slow. It takes more than 10 minutes to insert 7 rows in a table (containing literal strings only) or in adjacent cells, in a sheet with no conditional formatting.
The workbook has 45 worksheets and 20 tables, with the bigger tables having XML files of about 10KB. There are 33MB worth of spreadsheet XMLs, with most being around 300KB, 5 being more than 1MB, and one being 15MB. It's fairly complex but not massive. All of the calculations flow nicely from left to right, top to bottom, and right sheet to left sheet, and I've mostly managed to avoid array formulas. All of the tables have regular structures, with each calculated column having only one formula. Most of the table columns are calculated, with only a couple of smaller ones containing literal data.
I do have a lot of conditional formatting on a couple of sheets, but I've been very careful to keep it rational and to stop it from fragmenting: I have about 45 rules for the whole sheet and these are generalised to cover all columns. The main processing for the formatting decisions is moved into the tables as helper columns and, as I said, is very regular in structure.
It seems that these types of edits are not multi-threaded, so only one processor is loaded and there is very light disk activity. I can't understand what Excel is doing all that time.
Of course I set calculation to manual...
I've seen comments attributing this type of thing to the increased row and column limits, but I don't understand why this should be a factor. If I look at the XML files of the spreadsheets, there is only code for rows and columns that are occupied with values or formulas. So why are the unoccupied cells in play?
This is having a massive effect on my productivity - although I'm learning a lot by reading in sites like this in my new-found spare time. I really need to figure out what the problem is so that I can avoid or work around this issue if possible.
Can anybody help me on that?
Just in case people are wondering about this, the answer is to use Power Query and Power View in Excel. I find that medium-sized datasets (500k lines) and complex structures and transformations all work without a hitch. I never use formulae in tables anymore. The other thing is that this naturally leads you to Power BI, which is great. That's my tip.
Long insertion times may be due to INDEX (or other functions) that reference a whole column, or a whole row.
I had a very similar problem with a not-too-complex worksheet: about 2500 rows, with 15 columns of data (results from a query) and about 10 columns of formulas to extract data from the query results. When I inserted a column, the first column might insert within 4 seconds or so, but the second insert would take over a minute. Yikes! I searched the internet and found this site: http://support.microsoft.com/kb/2755145.
My experience:
I was using a formula like =INDEX(11:11,1,MATCH(AC$5,$10:$10,0)), about 25000 times in my worksheet. You can see that each formula references an entire row twice. Apparently, when I added a column, since each row is affected, and therefore each of my formulas was affected, Excel would dutifully go to work trying to figure out what to do about that.
Based on what I learned from the Microsoft website, I changed the formula to =INDEX(QueryResults,ROW()-ROW(QueryHeaders),MATCH(AC$5,QueryHeaders,0)), where QueryResults and QueryHeaders are simple named ranges.
After I made this change throughout the sheet, inserting a column became almost instantaneous - less than a second.
This sounds like the problem described here: http://fastexcel.wordpress.com/2012/01/30/excel-2010-tableslistobject-slow-update-and-how-to-bypass/
If so you have to break one of the conditions to bypass it:
For this slowdown to occur each of the following conditions must be true:
A cell within the Table must be selected
The sheet containing the Table must be the Active Sheet
The cell being updated must be on the same sheet as the table, but does not have to be within the table
There must be a reasonable number of formulas in the workbook.
Maybe you could do the update indirectly via VBA with another sheet active. Or maybe moving all the formulas to a separate workbook would bypass it. Or convert your Tables back to normal ranges (and use dynamic range names if necessary).
Try removing the conditional formatting and then reapplying it with VBA after the main code has finished. That worked for me.