How to join 1.5 million rows of data based on 15 fields - excel

I have a fact table of sales data from the year 2019, which has about 1.5 million rows of data. I need to compare 2019 sales with 2018 sales. The 2018 sales fact table also has about 1.5 million rows of data.
The two fact tables share the same 15 columns, which include fields such as date, category, department, location, etc.
Date      Field 1  Field 2..  …field 15  Sales
01.01.18  ABC      XYZ        A12        100
01.02.18  ABCD     XXY        A13        200
01.03.18  ABB      XYY        A14        300
01.04.18  ACC      ZXX        A15        400

Date      Field 1  Field 2..  …field 15  Sales
01.01.19  ABC      XYZ        A12        110
01.02.19  ABCD     XXY        A13        210
01.03.19  ABB      XYY        A14        310
01.04.19  ACC      ZXX        A15        410
I need to have 2018 sales and 2019 sales in two columns that are next to each other.
I tried a left join (matching on the minimum number of fields needed for a correct mapping), but my PC ran out of memory. I also tried Power Pivot, but my PC ran out of memory again while loading the second fact table into the data model.
How can I have 2018 Sales and 2019 Sales, with the correct mapping, in columns next to each other?
Date '18  Date '19  Field 1  Field 2..  …field 15  Sales 2018  Sales 2019
01.01.18  01.01.19  ABC      XYZ        A12        100         110
01.02.18  01.02.19  ABCD     XXY        A13        200         210
01.03.18  01.03.19  ABB      XYY        A14        300         310
01.04.18  01.04.19  ACC      ZXX        A15        400         410

Assuming the CSV data is imported into Sheet2 (2018) and Sheet3 (2019) via Data > Get External Data > From Text, put this in cell A1 of another sheet (e.g. Sheet1):
=OFFSET(INDIRECT(CHOOSE(2-MOD(COLUMN(),2),"Sheet2","Sheet3")&"!A1",TRUE),ROW()-1,INT(COLUMN()/2+0.5)-1)
and drag right+downwards.
Idea: COLUMN() with MOD() drives the OFFSET() cell selection, and CHOOSE() does the sheet selection.
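The index arithmetic in the formula can be checked outside Excel. The following is a minimal Python sketch (the function name `source_cell` is made up for illustration) that mirrors how COLUMN(), MOD(), CHOOSE() and INT() pick the source sheet and the offsets passed to OFFSET():

```python
def source_cell(row, column):
    """Map a 1-based output cell to (sheet, row offset, column offset) from A1."""
    # CHOOSE(2 - MOD(COLUMN(), 2), "Sheet2", "Sheet3"):
    # odd output columns read Sheet2, even output columns read Sheet3.
    sheet = ("Sheet2", "Sheet3")[(2 - column % 2) - 1]
    # ROW() - 1: how many rows below A1 to read.
    row_offset = row - 1
    # INT(COLUMN() / 2 + 0.5) - 1: output columns 1-2 read source column A,
    # columns 3-4 read source column B, and so on.
    column_offset = int(column / 2 + 0.5) - 1
    return sheet, row_offset, column_offset


for column in range(1, 5):
    print(column, source_cell(1, column))
```

Output columns therefore alternate between the two sheets (Date from 2018, Date from 2019, Field 1 from 2018, Field 1 from 2019, and so on), and row N of the result always comes from row N of both sheets, so the formula relies on the two tables being in the same row order.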
Please share if it works/not. ( :
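Since the formula above pairs rows purely by position, it only gives the correct mapping when both sheets list the same combinations in the same order. If that does not hold, a keyed join is needed. Below is a rough sketch of one in Python rather than Excel, under several assumptions that are not in the question: the rows have already been read into dictionaries (for example with `csv.DictReader`), two rows correspond when their key fields and the day and month of Date agree, and duplicate keys in 2018 are summed. The function name `join_sales` and the choice of key fields are hypothetical:

```python
def join_sales(rows_2018, rows_2019, key_fields):
    """Attach 2018 sales to each 2019 row (a left join from the 2019 side)."""

    def key(row):
        # Dates look like "01.01.18"; dropping the year lets the two years line up.
        day_month = row["Date"].rsplit(".", 1)[0]
        return (day_month,) + tuple(row[field] for field in key_fields)

    # Index 2018 sales by key once, so each 2019 row costs one dictionary lookup.
    sales_2018 = {}
    for row in rows_2018:
        k = key(row)
        sales_2018[k] = sales_2018.get(k, 0) + row["Sales"]  # assumption: sum duplicates

    joined = []
    for row in rows_2019:
        out = {field: row[field] for field in key_fields}
        out["Date '19"] = row["Date"]
        out["Sales 2018"] = sales_2018.get(key(row))  # None when 2018 has no match
        out["Sales 2019"] = row["Sales"]
        joined.append(out)
    return joined


# First two rows of each sample table from the question, in different orders.
rows_2018 = [
    {"Date": "01.01.18", "Field 1": "ABC", "Field 2": "XYZ", "Field 15": "A12", "Sales": 100},
    {"Date": "01.02.18", "Field 1": "ABCD", "Field 2": "XXY", "Field 15": "A13", "Sales": 200},
]
rows_2019 = [
    {"Date": "01.02.19", "Field 1": "ABCD", "Field 2": "XXY", "Field 15": "A13", "Sales": 210},
    {"Date": "01.01.19", "Field 1": "ABC", "Field 2": "XYZ", "Field 15": "A12", "Sales": 110},
]
joined = join_sales(rows_2018, rows_2019, ["Field 1", "Field 2", "Field 15"])
for row in joined:
    print(row["Date '19"], row["Sales 2018"], row["Sales 2019"])
```

The 2018 table is reduced to one dictionary entry per key and each 2019 row is visited once, so the work grows roughly linearly with the number of rows.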

Related

Compare Data from Two Columns and Match Figures Against Each Other

I'm trying to compare figures from sales data where data set 1 (column A) comes in before data set 2 (column B). The data generally lines up chronologically, though not always, and never row by row when pasted side by side. This is because column A is just transaction totals and column B contains the transactions split into product totals. For example:
Bob buys a $2 widget, $3 ball and a $5 stick. The data entry as it appears to me would be Column A $10 and Column B $2, $3 & $5.
These split transactions don't occur often, however, and I need to isolate them, along with any figures that don't have matches, from the overall data set. Most of the data consists of one-to-one transactions. For example:
Fred buys $5 widget. Column A $5 Column B $5.
Highlighting the cells with matching 1-for-1 figures and leaving the odd ones with no fill would be optimal.
I have tried a few formulas and am getting a nearly 90% success rate, which is close, but so frustrating. Basically I just need a formula that will format the cells that have a unique 1-for-1 match in both columns and leave the ones that don't have a buddy unhighlighted. It also has to be done chronologically (so something in, say, column A row 112 can't match column B row 56).
So if anyone can help me out, that'd be amazing. My only other option is analysing 10,000+ lines manually. Save me, internet!
ps - sorry for the formatting, couldn't post lined up because it thinks I'm coding.
For column E (entered in E3 and filled down, looking up each column D value in column B):
=INDEX($B$2:$B$100,MATCH($D3,$B$2:$B$100,0))
   A          B      C          D        E
2  Date       WData  Date       DB Data
3  2/10/2018  1000   2/10/2018  1000     1000
4  2/10/2018  800    2/10/2018  450      #N/A
5  2/10/2018  900    2/10/2018  350      #N/A
6  2/10/2018  850    2/10/2018  900      900
7  2/10/2018  680    2/10/2018  850      850
8  2/10/2018  790    2/10/2018  680      680
9  2/10/2018  645    2/10/2018  790      790
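To see what the INDEX/MATCH lookup in column E does, here is a small Python sketch using the figures from the table above. The function name `index_match` is made up, and the sketch assumes column E looks up each DB Data value (column D) in WData (column B), which is what the results in the table suggest:

```python
def index_match(lookup_values, search_column):
    """For each lookup value, return its exact match in search_column, or "#N/A"."""
    results = []
    for value in lookup_values:
        # MATCH(value, search_column, 0) finds the first exact match;
        # INDEX then returns the cell at that position.
        if value in search_column:
            results.append(search_column[search_column.index(value)])
        else:
            results.append("#N/A")
    return results


w_data = [1000, 800, 900, 850, 680, 790, 645]   # column B
db_data = [1000, 450, 350, 900, 850, 680, 790]  # column D
print(index_match(db_data, w_data))
# → [1000, '#N/A', '#N/A', 900, 850, 680, 790]
```

Note that an exact-match lookup like this does not enforce the chronological constraint from the question, and repeated amounts would all match the same first cell, so it is not a strict one-to-one pairing.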

Excel Pivot Table add % column

I have a Pivot:
City        HC  TC
--------------------------------
London      50  100
Manchester  67  250
Leeds       20  20
All I need to do is, within the Pivot table, add another column that calculates the percentage based on the second and third columns.
The outcome would be:
City        HC  TC
--------------------------------
London      50  100  50%
Manchester  67  250  27%
Leeds       20  20   100%
Under Fields, Items & Sets click Calculated Field... and add your formula that's dependent on other columns in the Pivot table.
Example:
The database:
The Pivot table (with Field2 formula: = Revenue / Units):
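Applied to the pivot in the question, the calculated field would presumably be = HC / TC with the result formatted as a percentage (an assumption based on the question's column headers and expected output). A quick Python check of that arithmetic, with a made-up function name `with_percentage`:

```python
def with_percentage(rows):
    """Append HC / TC, formatted as a whole-number percentage, to each (city, hc, tc) row."""
    return [(city, hc, tc, f"{hc / tc:.0%}") for city, hc, tc in rows]


pivot = [("London", 50, 100), ("Manchester", 67, 250), ("Leeds", 20, 20)]
for row in with_percentage(pivot):
    print(*row)
# London 50 100 50%
# Manchester 67 250 27%
# Leeds 20 20 100%
```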

Conditional Formatting based on date and value in Excel

I am trying to return the color for a score based on the date for the score and the score itself. Scoring has used different cut-offs over time:
Table 1
Date1   Score  Color
Sep-16  24     [should be red]
Jul-16  6      [should be green]
Apr-14  12     [should be yellow]
...     ...    ...
Table 2
Date2   Red  Orange  Yellow  Green
Aug-16  20   15      9.5     0
Jul-16  20   15.5    9.5     0
Apr-16  20   15      9.5     0
Mar-15  19   14      7       0
Feb-15  20   13      8.5     0
Jan-15  19   14      7       0
Apr-14  19   14      7       0
I want a formula in the "Color" cell that looks up Table 2 and returns a column name. The row to use is the most recent Date2 that Date1 is later than; within that row, the column is the one whose threshold the Table 1 score meets or exceeds.
Thanks,
You need nested approximate lookups. This would be easier if your data was sorted the other way around. At least table 2 should have the columns in ascending order, instead of descending, so the match function can return the correct position of the number with an approximate match.
If you can arrange the columns in Table2 in the order Date2, Green, Yellow, Orange, Red, then the following formula will be possible.
=INDEX(Table3[[#Headers],[Green]:[Red]],MATCH([@Score],INDEX(Table3[Green],IFERROR(MATCH([@Date1],Table3[Date2],-1),1)):INDEX(Table3[Red],IFERROR(MATCH([@Date1],Table3[Date2],-1),1)),1))
This uses structured references, which accommodate rows being inserted into the tables without breaking the formulas.
Now you can use conditional formatting based on the cell values in column C.
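The nested lookup can be traced in Python. This is a sketch that mirrors the formula's behaviour rather than a definitive implementation: dates are written as hypothetical (year, month) tuples, the thresholds are reordered Green to Red as the answer suggests, and scores are assumed to be non-negative. MATCH with -1 on a descending list picks the smallest Date2 that is greater than or equal to Date1, and IFERROR falls back to the first row:

```python
from bisect import bisect_right

COLORS = ["Green", "Yellow", "Orange", "Red"]
# Table 2 from the question, newest first, thresholds in Green..Red order.
TABLE2 = [
    ((2016, 8), [0, 9.5, 15, 20]),
    ((2016, 7), [0, 9.5, 15.5, 20]),
    ((2016, 4), [0, 9.5, 15, 20]),
    ((2015, 3), [0, 7, 14, 19]),
    ((2015, 2), [0, 8.5, 13, 20]),
    ((2015, 1), [0, 7, 14, 19]),
    ((2014, 4), [0, 7, 14, 19]),
]


def color_for(date1, score):
    # MATCH(date1, Date2, -1): last row (in descending order) whose date is >= date1.
    candidates = [i for i, (date2, _) in enumerate(TABLE2) if date2 >= date1]
    row = candidates[-1] if candidates else 0  # IFERROR(..., 1): use the first row
    thresholds = TABLE2[row][1]
    # MATCH(score, thresholds, 1): position of the largest threshold <= score.
    position = bisect_right(thresholds, score) - 1
    if position < 0:
        raise ValueError("score is below every threshold")
    return COLORS[position]


print(color_for((2016, 9), 24))  # → Red
print(color_for((2016, 7), 6))   # → Green
print(color_for((2014, 4), 12))  # → Yellow
```

The three calls reproduce the colours expected in Table 1 of the question.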
Just for comparison, I have chosen to keep the lookup table (in Sheet2 rather than in an actual table) the same as in the question i.e. both tables are sorted from largest to smallest or most recent to least recent and the MATCHes both have -1 as the third argument:-

Excel lookup same index different row

I have a formatted table that contains information about areas, including sales for the month (and a heap of other columns containing other details). The table is the basis for graphs and pivot tables.
There is a row per month for each area, e.g.:
    A      B      C      D
1   Area   Month  Sales  TwoMonthAverage
2   North  1      400    Manually entered
3   West   1      500    Manually entered
4   South  1      200    Manually entered
5   North  2      200    <calculate??>
6   West   2      200    <calculate??>
7   South  2      200    <calculate??>
8   North  3      100    <calculate??>
9   West   3      900    <calculate??>
10  South  3      600    <calculate??>
Each month, I want to calculate the "2 month average" sales for an area (being the average of the current and previous months).
I need to know how to get the sales for the same area for the previous month. The table rows will not necessarily be in the same area or month order. Needs to work in Excel 2013 and 2010.
Thanks
Blair
You could perhaps use SUMIFS to get the sum of the past two months' sales:
=SUMIFS($C$2:C5, $A$2:A5, A5, $B$2:B5, ">="&B5-1)
Placed in D5, this sums the values in column C that:
Are in rows above, and including, the current row, with the same area in column A, and
Have a month greater than or equal to the current month minus 1.
You then only need to divide by 2 to get the 2 month average.
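As a check on the logic, here is a minimal Python sketch of the same calculation on the table from the question. The function name `two_month_average` is made up, and the list index stands in for the worksheet row (index 0 is row 2):

```python
rows = [
    ("North", 1, 400), ("West", 1, 500), ("South", 1, 200),
    ("North", 2, 200), ("West", 2, 200), ("South", 2, 200),
    ("North", 3, 100), ("West", 3, 900), ("South", 3, 600),
]


def two_month_average(rows, index):
    """Mirror the SUMIFS formula for the row at index, then divide by 2."""
    area, month, _ = rows[index]
    total = sum(
        sales
        for row_area, row_month, sales in rows[: index + 1]  # rows up to the current one
        if row_area == area and row_month >= month - 1
    )
    return total / 2


print(two_month_average(rows, 3))  # North, month 2 (row 5) → 300.0
print(two_month_average(rows, 7))  # West, month 3 (row 9) → 550.0
```

Because the ranges only reach up to the current row, this relies on the previous month's row appearing above the current one. If the rows can be in any order, as the question mentions, the ranges would probably need to span the whole column and the month condition would need an upper bound as well.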

How to vlookup on 2 columns and return total value of another?

I've got a table in Excel which is structured like so:
Month  Date      Time  ID       Name  Currency  Value
Jan    07/01/14  5     1234567  Ted   GBP       500
Jan    10/01/14  12    1234567  Ted   GBP       723
Feb    23/02/14  6     9877654  John  GBP       300
Feb    10/02/14  10    1234567  Ted   GBP       333
What I need to do is write a formula which returns the total of Value where ID and Month are equal to whatever the lookup values are. For example, using the above I would say:
Find the total of Value where Month equals Jan and ID equals 1234567.
The answer in this case would be 1223.
I've just tried
=SUMIFS(INPUT!H:H,INPUT!D:D='TRANS BY MID'!B2,INPUT!A:A='TRANS BY MID'!C1)
INPUT!H:H is my ID column
INPUT!D:D='TRANS BY MID'!B2 is the ID I want to use
INPUT!A:A is the Month column
'TRANS BY MID'!C1 is Jan
To provide a working solution I simplified your question into one sheet. The data appears like this:
You can link your other sheet to the values shown in column J.
The formula is now this:
=SUMIFS(G:G,D:D,J1,A:A,J2)
The result is shown in J7:
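The SUMIFS call above sums column G over the rows where column D equals J1 and column A equals J2. A small Python sketch of the same filter-and-sum on the sample table (the function name `sum_where` is made up; J1 and J2 are assumed to hold the ID and the month):

```python
rows = [
    {"Month": "Jan", "Date": "07/01/14", "ID": 1234567, "Name": "Ted", "Value": 500},
    {"Month": "Jan", "Date": "10/01/14", "ID": 1234567, "Name": "Ted", "Value": 723},
    {"Month": "Feb", "Date": "23/02/14", "ID": 9877654, "Name": "John", "Value": 300},
    {"Month": "Feb", "Date": "10/02/14", "ID": 1234567, "Name": "Ted", "Value": 333},
]


def sum_where(rows, sum_field, criteria):
    """Sum sum_field over the rows that match every field/value pair in criteria."""
    return sum(
        row[sum_field]
        for row in rows
        if all(row[field] == value for field, value in criteria.items())
    )


print(sum_where(rows, "Value", {"ID": 1234567, "Month": "Jan"}))  # → 1223
```

The result matches the 1223 expected in the question.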
