VBA Date cleanup / reformatting - excel

I'm relatively new to VBA coding and wanted to get some ideas on how I could do some data cleanup/reformatting. I have an Excel data export from a system that has very little business logic/validation.
As a result, I have a Date column with data integrity issues, examples of which are below. Dates are not consistently formatted the same way, some dates are combined with text strings, and in some cases there is only text in the date field.
Here are examples of the data I have in the Date column:
2/2/2018
8/3/2018
1996
1990-1991
02/29/95
1992-93
05/08/200
DECLINED
5/1418
8/14/2018
06/09/200
1/12/94, DECLINED CONTRACT 12/01/00
EXP CAT I
06/14/23018
1996
5-1-1207/07/92
8/3/2018
3-10-
1996
02/27/187
1-29-14
2/2/2018
1-4-11
3.8.99
2-17-12
10-6-16
I would like to convert the dates into the MM/DD/YYYY format. I realize that where I just have pure text (e.g. 'DECLINED') there is no way to extract a date; however, I'm hoping that for the other examples it may be possible to reformat the date as above.
Some of the dates are just plain no good (e.g. '5/1418': I can't determine how to translate this), but I'm hoping that at least for dates formatted as MM-DD-YYYY, MM.DD.YYYY and similar combinations there is a way to convert them, as well as where I just have a 1-digit month and day (e.g. 2/2/2018 should be 02/02/2018). If just a 4-digit year is provided, I want to convert it to '01/01/(year)'. Any ideas are appreciated.

This is one of those problems that you could spend a year trying to solve to get a 100% solution. The good news is you can get super lazy and have a nice 60% solution by using the VBA CDate() function, which makes a good guess for whatever you feed it. Toss Split() at it to peel off extra words and whatnot (anything that follows the date after a space or a comma) and you can get most of the actual dates covered here. The remaining records are either dates so badly formatted that you will have to write code for each edge case, or just garbage non-date stuff you can ignore.
Create a new module in VBA and pop this in:
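Public Function dateguesser(inDate As String) As Date
    ' Keep the text before the first space, then the text before the first comma,
    ' and let CDate() guess how to read what is left. Anything CDate() cannot
    ' convert raises an error, which shows up in the sheet as #VALUE!.
    dateguesser = CDate(Split(Split(inDate, " ")(0), ",")(0))
End Function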
Then in your sheet you can use this as a new function
=dateguesser(A1)
And copy down (to display the results as MM/DD/YYYY, give column B the custom number format mm/dd/yyyy). For your list, you get the following:
+----+-------------------------------------+-----------+
|    | A                                   | B         |
+----+-------------------------------------+-----------+
| 1  | 2/2/2018                            | 2/2/2018  |
| 2  | 8/3/2018                            | 8/3/2018  |
| 3  | 1996                                | 6/18/1905 |
| 4  | 1990-1991                           | #VALUE!   |
| 5  | 02/29/95                            | #VALUE!   |
| 6  | 1992-93                             | #VALUE!   |
| 7  | 05/08/200                           | ######### |
| 8  | DECLINED                            | #VALUE!   |
| 9  | 5/1418                              | ######### |
| 10 | 8/14/2018                           | 8/14/2018 |
| 11 | 06/09/200                           | ######### |
| 12 | 1/12/94, DECLINED CONTRACT 12/01/00 | 1/12/1994 |
| 13 | EXP CAT I                           | #VALUE!   |
| 14 | 06/14/23018                         | #VALUE!   |
| 15 | 1996                                | 6/18/1905 |
| 16 | 5-1-1207/07/92                      | #VALUE!   |
| 17 | 8/3/2018                            | 8/3/2018  |
| 18 | 3-10-                               | #VALUE!   |
| 19 | 1996                                | 6/18/1905 |
| 20 | 02/27/187                           | ######### |
| 21 | 1-29-14                             | 1/29/2014 |
| 22 | 2/2/2018                            | 2/2/2018  |
| 23 | 1-4-11                              | 1/4/2011  |
| 24 | 3.8.99                              | #VALUE!   |
| 25 | 2-17-12                             | 2/17/2012 |
| 26 | 10-6-16                             | 10/6/2016 |
+----+-------------------------------------+-----------+
Clearly this is just a starting point, but I think it's a good solid starting point. The remaining crap you can start writing edge cases for in your VBA, but the closer you get to 100% the longer it's going to take to get any further and before you know it you'll be a month into this and wondering what's happened to your life.
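For comparison, here is a stricter sketch of the rules the asker listed, modelled in Python rather than VBA so it can be run on its own. It is not a drop-in replacement for dateguesser: the function name, the regular expressions and the choice to read two-digit years 00-29 as 20xx and 30-99 as 19xx are assumptions. Anything it cannot read safely, such as 02/29/95 (1995 is not a leap year) or 05/08/200, comes back as None for manual review instead of being guessed.

```python
import re
from datetime import date
from typing import Optional

# Assumption: two-digit years below 30 are read as 20xx, the rest as 19xx.
TWO_DIGIT_PIVOT = 30

YEAR_ONLY = re.compile(r"^\d{4}$")
# Month, day and a 2- or 4-digit year separated by "/", "." or "-".
MDY = re.compile(r"^(\d{1,2})[/.-](\d{1,2})[/.-](\d{2}|\d{4})$")


def normalize_date(raw: str) -> Optional[str]:
    """Return raw as MM/DD/YYYY, or None if it cannot be read safely."""
    # Same idea as the Split() trick: keep only the text before the first
    # comma and the first space, dropping trailing words such as "DECLINED".
    token = raw.strip().split(",")[0].split(" ")[0]
    if YEAR_ONLY.match(token):
        return f"01/01/{token}"  # year only: default to January 1st
    m = MDY.match(token)
    if m is None:
        return None
    month, day, year_text = int(m.group(1)), int(m.group(2)), m.group(3)
    year = int(year_text)
    if len(year_text) == 2:
        year += 2000 if year < TWO_DIGIT_PIVOT else 1900
    try:
        parsed = date(year, month, day)  # rejects impossible dates
    except ValueError:
        return None
    return f"{parsed.month:02d}/{parsed.day:02d}/{parsed.year:04d}"
```

Returning None rather than a best guess keeps the questionable rows easy to filter out and fix by hand, which fits the "write code for the edge cases" advice above.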

Related

Counting Time within ranges in table

So I did look through a couple of existing answers, but they were all in some programming language (e.g., Ruby, PHP). I also managed to figure out a way to do what I want, but some of my formulas felt either hard-coded or verbose. So my question is: is there a cleaner way to write my formulas to achieve my goal?
What I am starting with is a simple line to calculate the cost of electricity usage. It looks like this:
+─────────────+────────+───────+───────+─────────+──────────+───────────+───────────+────────+──────────+───────────+───────────+──────────+──────────────+─────────────+
| | | | | | KW used | | | | Rate | | | Usage | Delivery | Total Cost |
| Date | Start | Stop | KW/H | Season | On-Peak | Mid-Peak | Off-Peak | Total | On-Peak | Mid-Peak | Off-Peak | Cost | Cost | |
+─────────────+────────+───────+───────+─────────+──────────+───────────+───────────+────────+──────────+───────────+───────────+──────────+──────────────+─────────────+
| 2022/12/17 | 05:00 | 22:00 | 1.44 | Winter | 0 | 0 | 24.48 | 24.48 | 0.17 | 0.113 | 0.082 | 2.00736 | 1.141296768 | 3.15 |
+─────────────+────────+───────+───────+─────────+──────────+───────────+───────────+────────+──────────+───────────+───────────+──────────+──────────────+─────────────+
The first 4 columns are user entry fields and correspond to columns B through E. The remainder of the columns F through P are all formulas.
My seasonal time bands are contained in the following table
+─────────+──────────────+─────────────+───────────+──────────+──────────+──────────+
| Season | Start | Stop | Time | Start | Stop | Rate |
| | | | Period | (HH:mm) | (HH:mm) | ($/Kwh) |
+─────────+──────────────+─────────────+───────────+──────────+──────────+──────────+
| Summer | May 01 | October 31 | Off-Peak | 00:00 | 07:00 | 0.082 |
| | | | Mid-Peak | 07:00 | 11:00 | 0.113 |
| | | | On-Peak | 11:00 | 17:00 | 0.17 |
| | | | Mid-Peak | 17:00 | 19:00 | 0.113 |
| | | | Off-Peak | 19:00 | 24:00 | 0.082 |
+─────────+──────────────+─────────────+───────────+──────────+──────────+──────────+
| Winter | November 01 | April 30 | Off-Peak | 00:00 | 07:00 | 0.082 |
| | | | On-Peak | 07:00 | 11:00 | 0.17 |
| | | | Mid-Peak | 11:00 | 17:00 | 0.113 |
| | | | On-Peak | 17:00 | 19:00 | 0.17 |
| | | | Off-Peak | 19:00 | 24:00 | 0.082 |
+─────────+──────────────+─────────────+───────────+──────────+──────────+──────────+
Note: weekends and holidays are Off-Peak
To aid with formula reading (or so I thought) I used the following named ranges:
Season_1 =$B$75 ("Summer")
Season_1_Start =DATE(YEAR(TODAY()),5,1) ("May 01")
Season_1_End =DATE(YEAR(TODAY()),10,31) ("Oct 31")
Season_1_TimeRates =$E$75:$H$79 ("Off-Peak" to 0.082)
Season_2 =$B$80 ("Winter")
Season_2_Start =DATE(YEAR(TODAY()),11,1) ("Nov 01")
Season_2_End =DATE(YEAR(TODAY()),4,30) ("Apr 30")
Season_2_TimeRates =$E$80:$H$84 ("Off-Peak" to 0.082)
TimeRates =IF($F3="Summer",Season_1_TimeRates,Season_2_TimeRates)
Step 1
Determining the season in F3
I used the following formula in F3.
=IF(AND($B3>=DATE(YEAR($B3),MONTH(Season_1_Start),DAY(Season_1_Start)),$B3<=DATE(YEAR($B3),MONTH(Season_1_End),DAY(Season_1_End))),Season_1,Season_2)
It seemed reasonable, and it substitutes the year of the actual entry when making the date comparison. I took the approach that if the date is not within the summer range, it must by default be winter. Part of me thinks I should also check the winter dates, and if both date checks fail, display a "check date" message or the like.
Step 2
Determine how many hours fall in each category according to the table and place the results in G through I. This is where things get a little ugly for me. After chasing my tail in circles for a while, I said screw it and plunged in with some hard coding, and got something that worked but looked rather ugly and lengthy to me.
I started out by figuring out that each time range check had 6 possible outcomes. I boiled those outcomes down to 2 possible results: 0 or Formula.
SB = Start Before
SI = Start Inside
SA = Start After
FB = Finish Before
FI = Finish Inside
FA = Finish After
So the result is either 0, because the time range in C and D lies outside the time range in the table, or some form of Finish - Start.
I couldn't figure out how to make it count the hours in the range using the period name (i.e., On-Peak), because some names appear more than once depending on the season in F. So, as a working step, I created a formula that counts the hours for each of the 5 time ranges instead of for each period name:
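The six start/finish cases collapse into a single clamp: the overlap of the usage window with a band is the earlier of the two stops minus the later of the two starts, floored at zero. Below is a minimal Python sketch, assuming times are expressed in hours (0 to 24) rather than Excel day fractions; the function name is hypothetical. In worksheet terms the same idea would presumably read MAX(0, MIN($D3, band stop) - MAX($C3, band start))*24.

```python
def overlap_hours(start: float, stop: float, band_start: float, band_stop: float) -> float:
    """Hours of the usage window [start, stop] that fall inside [band_start, band_stop].

    All six start/finish cases (SB/SI/SA x FB/FI/FA) reduce to this clamp:
    a window that misses the band gives a negative difference, which becomes 0.
    """
    return max(0.0, min(stop, band_stop) - max(start, band_start))


# The 05:00-22:00 window from the example row against each winter band.
winter_bands = [(0, 7), (7, 11), (11, 17), (17, 19), (19, 24)]
print([overlap_hours(5, 22, lo, hi) for lo, hi in winter_bands])
```

Because the clamp never goes below zero, the "outside the band" cases need no separate IF branch.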
To get the hours I used the following formulas for each row:
=IF(OR($D3<=INDEX(TimeRates,1,2),$C3>=INDEX(TimeRates,1,3)),0,IF($D3<=INDEX(TimeRates,1,3),$D3,INDEX(TimeRates,1,3))-IF($C3<=INDEX(TimeRates,1,2),INDEX(TimeRates,1,2),$C3))*24
=IF(OR($D3<=INDEX(TimeRates,2,2),$C3>=INDEX(TimeRates,2,3)),0,IF($D3<=INDEX(TimeRates,2,3),$D3,INDEX(TimeRates,2,3))-IF($C3<=INDEX(TimeRates,2,2),INDEX(TimeRates,2,2),$C3))*24
=IF(OR($D3<=INDEX(TimeRates,3,2),$C3>=INDEX(TimeRates,3,3)),0,IF($D3<=INDEX(TimeRates,3,3),$D3,INDEX(TimeRates,3,3))-IF($C3<=INDEX(TimeRates,3,2),INDEX(TimeRates,3,2),$C3))*24
=IF(OR($D3<=INDEX(TimeRates,4,2),$C3>=INDEX(TimeRates,4,3)),0,IF($D3<=INDEX(TimeRates,4,3),$D3,INDEX(TimeRates,4,3))-IF($C3<=INDEX(TimeRates,4,2),INDEX(TimeRates,4,2),$C3))*24
=IF(OR($D3<=INDEX(TimeRates,5,2),$C3>=INDEX(TimeRates,5,3)),0,IF($D3<=INDEX(TimeRates,5,3),$D3,INDEX(TimeRates,5,3))-IF($C3<=INDEX(TimeRates,5,2),INDEX(TimeRates,5,2),$C3))*24
I wound up hard coding the entries because ultimately these are going to wind up in the formula for G3 through I3. Otherwise I could have used the Range # column as index points.
So the next problem I had was adding together ranges with the same name. I could have used SUMIF if I were keeping the table, but in my head I could not, because A) I was not keeping the table, and B) this would eventually need to be in a row that could be copied down.
So I looked at what needed to be done for each season, and it was not bad... IF the formula was short.
So basically I need an IF statement that checks for summer, followed by the sum of the appropriate seasonal ranges for each period type. Seems simple enough, but it gave me the following formulas for G3 through I3:
=$E3*IF(OR(WEEKDAY($B3)={1,7}),0,IF($F3="Summer",IF(OR($D3<=INDEX(TimeRates,3,2),$C3>=INDEX(TimeRates,3,3)),0,IF($D3<=INDEX(TimeRates,3,3),$D3,INDEX(TimeRates,3,3))-IF($C3<=INDEX(TimeRates,3,2),INDEX(TimeRates,3,2),$C3)),IF(OR($D3<=INDEX(TimeRates,2,2),$C3>=INDEX(TimeRates,2,3)),0,IF($D3<=INDEX(TimeRates,2,3),$D3,INDEX(TimeRates,2,3))-IF($C3<=INDEX(TimeRates,2,2),INDEX(TimeRates,2,2),$C3))+IF(OR($D3<=INDEX(TimeRates,4,2),$C3>=INDEX(TimeRates,4,3)),0,IF($D3<=INDEX(TimeRates,4,3),$D3,INDEX(TimeRates,4,3))-IF($C3<=INDEX(TimeRates,4,2),INDEX(TimeRates,4,2),$C3))))*24
=$E3*IF(OR(WEEKDAY($B3)={1,7}),0,IF($F3="Summer",IF(OR($D3<=INDEX(TimeRates,2,2),$C3>=INDEX(TimeRates,2,3)),0,IF($D3<=INDEX(TimeRates,2,3),$D3,INDEX(TimeRates,2,3))-IF($C3<=INDEX(TimeRates,2,2),INDEX(TimeRates,2,2),$C3))+IF(OR($D3<=INDEX(TimeRates,4,2),$C3>=INDEX(TimeRates,4,3)),0,IF($D3<=INDEX(TimeRates,4,3),$D3,INDEX(TimeRates,4,3))-IF($C3<=INDEX(TimeRates,4,2),INDEX(TimeRates,4,2),$C3)),IF(OR($D3<=INDEX(TimeRates,3,2),$C3>=INDEX(TimeRates,3,3)),0,IF($D3<=INDEX(TimeRates,3,3),$D3,INDEX(TimeRates,3,3))-IF($C3<=INDEX(TimeRates,3,2),INDEX(TimeRates,3,2),$C3))))*24
=$E3*IF(OR(WEEKDAY($B3)={1,7}),$D3-$C3,IF(OR($D3<=INDEX(TimeRates,1,2),$C3>=INDEX(TimeRates,1,3)),0,IF($D3<=INDEX(TimeRates,1,3),$D3,INDEX(TimeRates,1,3))-IF($C3<=INDEX(TimeRates,1,2),INDEX(TimeRates,1,2),$C3))+IF(OR($D3<=INDEX(TimeRates,5,2),$C3>=INDEX(TimeRates,5,3)),0,IF($D3<=INDEX(TimeRates,5,3),$D3,INDEX(TimeRates,5,3))-IF($C3<=INDEX(TimeRates,5,2),INDEX(TimeRates,5,2),$C3)))*24
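The "add ranges with the same name" step is easier to see as a loop: clamp the window to each band, then add the result under that band's period name, which is what a SUMIF by name would do on the rate table. A minimal Python sketch under these assumptions: the band tables are transcribed from the question, times are in hours, Saturday and Sunday are entirely Off-Peak as in the question's formulas, holidays are not handled, and the function names are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

# Band tables transcribed from the question: (period, start hour, stop hour).
BANDS: Dict[str, List[Tuple[str, float, float]]] = {
    "Summer": [("Off-Peak", 0, 7), ("Mid-Peak", 7, 11), ("On-Peak", 11, 17),
               ("Mid-Peak", 17, 19), ("Off-Peak", 19, 24)],
    "Winter": [("Off-Peak", 0, 7), ("On-Peak", 7, 11), ("Mid-Peak", 11, 17),
               ("On-Peak", 17, 19), ("Off-Peak", 19, 24)],
}


def season_of(day: date) -> str:
    """Same rule as the question's season formula: May 1 to Oct 31 is Summer."""
    return "Summer" if date(day.year, 5, 1) <= day <= date(day.year, 10, 31) else "Winter"


def kwh_by_period(day: date, start_h: float, stop_h: float, kw: float) -> Dict[str, float]:
    """kWh per rate period for one row (hypothetical helper; holidays not handled)."""
    totals = {"On-Peak": 0.0, "Mid-Peak": 0.0, "Off-Peak": 0.0}
    if day.weekday() >= 5:  # Saturday (5) or Sunday (6): the whole window is Off-Peak
        totals["Off-Peak"] = kw * (stop_h - start_h)
        return totals
    for period, band_start, band_stop in BANDS[season_of(day)]:
        # Clamp the window to the band, then add it under the band's period name.
        hours = max(0.0, min(stop_h, band_stop) - max(start_h, band_start))
        totals[period] += kw * hours
    return totals
```

For the example row (Saturday 2022/12/17, 05:00 to 22:00, 1.44 kW) this gives 24.48 kWh Off-Peak and nothing in the other periods, matching the table above.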
So the question is: is there a way to use formulas that would tidy up what is currently in cells G3 to I3 (the last 3 formulas in this question) and that can still be copied downward?
UPDATE as requested
I am working on the table at the top; below it is my building-block area for working things out. You may recognize some of the shots above from it.
Using Excel 2013

How to create a column showing if a value is in the bottom 10 values?

I have a large data set that contains details about objects that are currently on an extension. The extensions are given a specific due date. Some of the extensions are past their due date.
I'm struggling to work out how to create a column in PowerPivot for O365 Excel that will return a yes/no value depending on if the object is one of the 5 most overdue extensions. So far nothing I've tried has worked at all.
Example with fake data:
+-----------+---------+--------------------+------------+
| ID | Urgency | Bus Days Remaining | Due Date |
+-----------+---------+--------------------+------------+
| 118017544 | Overdue | -487 | 1/04/2017 |
| 34960939 | Overdue | -97 | 30/09/2018 |
| 10695082 | Overdue | -364 | 20/09/2017 |
| 166236826 | Overdue | -86 | 15/10/2018 |
| 166236826 | Overdue | -86 | 15/10/2018 |
| 34944450 | Overdue | -437 | 9/06/2017 |
| 69427293 | Overdue | -446 | 29/05/2017 |
| 56280961 | Overdue | -437 | 9/06/2017 |
| 12535364 | Overdue | -176 | 11/06/2018 |
| 46296100 | Overdue | -163 | 28/06/2018 |
| 171666963 | Overdue | -122 | 24/08/2018 |
+-----------+---------+--------------------+------------+
The calculated column should put a "Yes" next to the 5 oldest rows in this data.
Factors that might be important:
Multiple extensions can share a due date but be separate extensions. This makes me think that the formula needs to be based on the "Bus Days Remaining" column value.
Excel has a Top 10 filter in PivotTables that shows only the top values. This isn't an option for me, because using that filter means you cannot drill into the PivotTable's data.
Any help you could provide would be great :)
Thanks in advance
Please try this formula.
=C2<=SMALL(C$2:C$12,5)
If the 5th and 6th smallest are equal the formula will return TRUE for more than 5 items.
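The answer's rule can be checked outside Excel. Here is a small Python model (the function name is hypothetical) that flags every value less than or equal to the 5th smallest, so it inherits the same tie behaviour the answer describes. It models the worksheet formula only; it is not a PowerPivot/DAX calculated column.

```python
from typing import List


def flag_most_overdue(days_remaining: List[int], n: int = 5) -> List[bool]:
    """Model of =C2<=SMALL(C$2:C$12,5) applied to every row.

    Assumes at least n values. Every value tied with the n-th smallest is
    flagged, so more than n rows can come back True.
    """
    threshold = sorted(days_remaining)[n - 1]  # the value SMALL(range, n) returns
    return [value <= threshold for value in days_remaining]


# "Bus Days Remaining" column from the question's sample data.
sample = [-487, -97, -364, -86, -86, -437, -446, -437, -176, -163, -122]
flags = flag_most_overdue(sample)
```

On the sample data the 5th smallest value is -364, so exactly five rows are flagged; a tie at the threshold (as in [1, 2, 3, 4, 5, 5]) flags six.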

Excel Sum product values and stock then multiply when multiple criteria

So I have this information:
+---------------+---------+-------+------------+
| Chocolate | Brand | Stock | Sale value |
+---------------+---------+-------+------------+
| Chokito | Nestlé | 1520 | $3,50 |
| Snickers | Mars | 3300 | $5,20 |
| Snickers 2 | Mars | 500 | $2,50 |
| Kit Kat | Nestlé | 2000 | $9,10 |
| Double Decker | Cadbury | 1000 | $2,50 |
| Idaho | Mars | 0 | $6,10 |
| Caramello | Cadbury | 350 | $7,50 |
| Cadbury Daily | Cadbury | 1000 | $3,10 |
| Almond Joy | Hershey | 500 | $1,50 |
| Twix | Nestlé | 999 | $4,50 |
| Zero Bar | Hershey | 488 | $5,50 |
+---------------+---------+-------+------------+
What I want is the total stock value for each brand. I get these values by inserting a column of stock * value and then doing a Pivot Table:
Cadbury $8.225,00
Hershey $3.434,00
Mars $18.410,00
Nestlé $28.015,50
But what I want is a formula in Excel that will get these same values.
I first tried using SUMIF but obviously it didn't work xD
I can't think of any other formula.
Thanks for your help
Try,
=SUMPRODUCT((C$2:C$12), (D$2:D$12), --(B$2:B$12=G4))
For a dynamic length of data,
=SUMPRODUCT((C$2:INDEX(C:C, MATCH(1E+99, C:C))), (D$2:INDEX(D:D, MATCH(1E+99, C:C))), --(B$2:INDEX(B:B, MATCH(1E+99, C:C))=G4))
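To see what the SUMPRODUCT formula computes, here is a Python model of it (the names are hypothetical): multiply Stock by Sale value row by row, multiply by 1 or 0 depending on whether the brand matches (the job of --(B$2:B$12=G4)), and add everything up. Decimal is used so the money amounts add up exactly.

```python
from decimal import Decimal
from typing import List, Tuple

# (chocolate, brand, stock, sale value), transcribed from the question's table.
ROWS: List[Tuple[str, str, int, Decimal]] = [
    ("Chokito", "Nestlé", 1520, Decimal("3.50")),
    ("Snickers", "Mars", 3300, Decimal("5.20")),
    ("Snickers 2", "Mars", 500, Decimal("2.50")),
    ("Kit Kat", "Nestlé", 2000, Decimal("9.10")),
    ("Double Decker", "Cadbury", 1000, Decimal("2.50")),
    ("Idaho", "Mars", 0, Decimal("6.10")),
    ("Caramello", "Cadbury", 350, Decimal("7.50")),
    ("Cadbury Daily", "Cadbury", 1000, Decimal("3.10")),
    ("Almond Joy", "Hershey", 500, Decimal("1.50")),
    ("Twix", "Nestlé", 999, Decimal("4.50")),
    ("Zero Bar", "Hershey", 488, Decimal("5.50")),
]


def brand_stock_value(rows: List[Tuple[str, str, int, Decimal]], brand: str) -> Decimal:
    """Model of =SUMPRODUCT(Stock, Sale value, --(Brand=brand))."""
    return sum(
        (stock * value * (1 if row_brand == brand else 0)  # the --(...) mask
         for _, row_brand, stock, value in rows),
        Decimal(0),
    )
```

The results match the PivotTable totals in the question, e.g. 8225.00 for Cadbury and 28015.50 for Nestlé.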
Alternative approach using SUMIF
Place the following in E2 and copy down:
=C2*D2
This gives the stock value (stock * sale value) of each individual chocolate.
In column G, generate a list of brands.
In H2, use the following formula and copy down as needed:
=SUMIF(B:B,G2,E:E)

DAX Cumulative Total With Date Filters

I am trying to calculate a running total where orders are only valid during a certain date range. Each order has a value, a start date and an end date. I want to calculate the cumulative sum of the order's values only during the dates between an order's start date and end date.
I've read over this article on cumulative totals and have an equation for the running total, but I can't figure out how to filter it so that it drops an order once the date table is past the order's end date. The current measure I have is Cumulative Value:=CALCULATE(SUM(Orders[Vaue]), FILTER(ALL('Date'), [Date] <= MAX([Date]))) and I want to add a filter that removes any order whose end date falls before the current date row, similar to this: Filter('Order', 'Orders'[Order_End_Date] < 'Date'[Date]). When I try to add this filter, though, I get an error, since 'Date'[Date] is not used in any aggregation.
Below is the data model that I am using and a link to the Excel File with the data model.
The sample Data:
+-----------+
| Date |
+-----------+
| 1/1/2015 |
| 1/2/2015 |
| 1/3/2015 |
| 1/4/2015 |
| 1/5/2015 |
| 1/6/2015 |
| 1/7/2015 |
| 1/8/2015 |
| 1/9/2015 |
| 1/10/2015 |
+-----------+
+----------+------+------------------+----------------+
| Order_Id | Vaue | Order_Start_Date | Order_End_Date |
+----------+------+------------------+----------------+
| 1 | 1 | 1/1/2015 | 1/3/2015 |
| 2 | 3 | 1/2/2015 | |
| 3 | 6 | 1/3/2015 | 1/7/2015 |
| 4 | 7 | 1/5/2015 | |
+----------+------+------------------+----------------+
Here is the output of my current measure and what the correct measure's output should be:
+-----------+-----------------+--------------------------+
| Date | Current Measure | Desired Measure's Output |
+-----------+-----------------+--------------------------+
| 1/1/2015 | 1 | 1 |
| 1/2/2015 | 4 | 4 |
| 1/3/2015 | 10 | 9 |
| 1/4/2015 | 10 | 9 |
| 1/5/2015 | 17 | 16 |
| 1/6/2015 | 17 | 16 |
| 1/7/2015 | 17 | 10 |
| 1/8/2015 | 17 | 10 |
| 1/9/2015 | 17 | 10 |
| 1/10/2015 | 17 | 10 |
+-----------+-----------------+--------------------------+
Cumulative Value2:=CALCULATE(
    SUM(Orders[Vaue])
    ,FILTER(
        VALUES(Orders[Order_Start_Date])
        ,Orders[Order_Start_Date] <= MAX('Date'[Date])
    )
    ,FILTER(
        VALUES(Orders[Order_End_Date])
        ,ISBLANK(Orders[Order_End_Date])
            || Orders[Order_End_Date] > MAX('Date'[Date])
    )
)
Model Diagram (note: I took out your date relationship; for the limited use case you've provided, it only makes things more complicated):
Note: I will refer to function arguments positionally, with the first argument represented by (1).
So, what we're doing is similar to what you were trying. We've got two FILTER()s, each an argument to our CALCULATE(). CALCULATE() combines its filter arguments (2) through (n) with a logical AND.
The first FILTER() does essentially what you were already doing, except we are filtering the distinct values of the [Order_Start_Date], comparing them against the current filter context of the pivot table.
The second FILTER() loops over the distinct values of [Order_End_Date], checking two conditions combined in a logical or. We must handle the case of a BLANK [Order_End_Date]. This BLANK is normally implicitly converted to 0 == 1899-12-30, which is less than any date we're considering. In the case of a BLANK, we get a true value from ISBLANK() and the row is returned as a part of FILTER()'s resultset. The other test is simply checking that [Order_End_Date] is greater than the current filter context date in the pivot.
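The logic the two FILTER()s express (an order counts on a given date if it started on or before that date and has either no end date or an end date after it) can be checked against the desired output with a few lines of Python. This is a model of the measure, not DAX; the names are hypothetical, and the end-date test is strict (end > day) because that is what reproduces the desired column, where order 1 no longer counts on its end date of 1/3/2015.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# (value, start date, end date) from the question's Orders table; None = blank end date.
ORDERS: List[Tuple[int, date, Optional[date]]] = [
    (1, date(2015, 1, 1), date(2015, 1, 3)),
    (3, date(2015, 1, 2), None),
    (6, date(2015, 1, 3), date(2015, 1, 7)),
    (7, date(2015, 1, 5), None),
]


def in_progress_total(orders: List[Tuple[int, date, Optional[date]]], day: date) -> int:
    """Sum of order values that are 'in progress' on the given day."""
    return sum(
        value
        for value, start, end in orders
        # started on or before the day, and open-ended or not yet ended
        if start <= day and (end is None or end > day)
    )


days = [date(2015, 1, 1) + timedelta(days=i) for i in range(10)]
totals = [in_progress_total(ORDERS, d) for d in days]
print(totals)
```

The printed list matches the "Desired Measure's Output" column for 1/1/2015 through 1/10/2015.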
What you are looking for is often called the "event in progress" problem. Here are some posts that will help you to solve your problem.
a solid summary of the problem
a special case
guess this will help on first sight
if you can't get enough - read the complete white paper
I hope this helps.
-Tom

Excel 2010 Calculating Production line quantities without long calculations

Program: Excel 2010
Requirements: Prefer no VBA (Macro free book)
I am creating a spreadsheet to calculate the items required for components (parts). I have a list of the products, and under each one the number of specific parts it needs. I have a calculation that tells me the total parts needed, but is there a better way?
=($C$32*C34)+($D$32*D34)+($E$32*E34)+($F$32*F34)+($G$32*G34)+($H$32*H34)+($I$32*I34)+($J$32*J34)+($K$32*K34)
| A | B | C | D | E | F |
| Making: | | 2 | 2 | 2 | |
|---------------|-------|------------|-------------|-----------------|---------|
| Item -> | Total | Small raft | Rowing boat | Sm sailing boat | Corbita |
| | | | | | |
| Planks | 20 | 4 | 6 | | |
| Logs | 8 | 4 | | | |
| Nails - Large | 16 | 8 | | | |
| Oars | | | | | |
In the above, you can see that ($C$32*C34) = 8 and ($D$32*D34) = 12, so 8 + 12 = 20 in B34 (Planks Total).
Is there an easier way of doing this, or will my equation just keep getting bigger?
Thanks in advance.
As chris neilsen mentioned in his comment, you can use the SUMPRODUCT function in Excel. The formula in your cell B34 (total planks) should look like this:
=SUMPRODUCT(C32:K32,C34:K34)
This has the effect of multiplying the corresponding components in the given ranges (C32 * C34, D32 * D34, etc.) and then returning the sum of those products/multiplications.
As you add more columns, change K to the last column you want to add up, in both ranges.
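The same multiply-then-add idea, modelled in Python with the Planks, Logs and Nails rows from the question. The helper name is hypothetical; blank cells are written as 0, which is how SUMPRODUCT treats empty cells, and mismatched lengths raise an error, much as SUMPRODUCT returns #VALUE! for ranges of different sizes.

```python
from typing import Sequence


def sumproduct(quantities: Sequence[float], per_item: Sequence[float]) -> float:
    """Multiply matching positions of two equal-length rows and add the products."""
    if len(quantities) != len(per_item):
        raise ValueError("both ranges must be the same size")
    return sum(q * p for q, p in zip(quantities, per_item))


# "Making:" row for Small raft, Rowing boat, Sm sailing boat, Corbita (blank -> 0).
making = [2, 2, 2, 0]
parts = {
    "Planks": [4, 6, 0, 0],
    "Logs": [4, 0, 0, 0],
    "Nails - Large": [8, 0, 0, 0],
}
totals = {name: sumproduct(making, row) for name, row in parts.items()}
```

Adding another product is just one more entry in each list, the same way extending K in both ranges works in the sheet.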
