I have a spreadsheet with a dataset of a number of transactions, each of which is composed of substeps, each of which has the time that it occurred. There can be a variable number and order of steps.
I'd like to find the duration of each transaction. If I can do this in Excel then great, as it's already in that format. If there isn't a straightforward way to do this in Excel, I'll load it into a database and do the analysis with SQL. If there is an Excel way round this it'll save a few hours of setup though :)
A simplified example of my data is as follows:
TransID, Substep, Time
1, step A, 15:00:00
1, step B, 15:01:00
1, step C, 15:02:00
2, step B, 15:03:00
2, step C, 15:04:00
2, step E, 15:05:00
2, step F, 15:06:00
3, step C, 15:07:00
3, step D, 15:08:00
etc.
I'd like to produce a result set as follows:
TransID, Duration
1, 00:02:00
2, 00:03:00
3, 00:01:00
etc.
My initial try was an extra column with a formula subtracting the start time from the end time, but because the number of steps varies and transactions don't share the same start and end steps, I'm having difficulty seeing how this formula would work.
I've also tried creating a pivot table based on this data with ID as the rows and Time as the data. I can change the field settings on the time data to return grouped values such as count or max, but am struggling to see how this can be set up to show max(time) - min(time) for each ID, hence why I'm thinking about heading to SQL. If anyone can point out anything obvious I'm missing though, I'd be very grateful.
As suggested by Hobbo, I've now used a pivot table with TransID as the rows and twice added Time as the data. After setting the field settings on the Time to Max on the first and Min on the second, a formula can be added just outside the pivot table to calculate the differences. One thing I'd been overlooking here is that the same value can be added to the data section more than once!
A follow-on problem was that the formula I add is of the form =GETPIVOTDATA("Max of Time",$A$4,"ID",1)-GETPIVOTDATA("Min of Time",$A$4,"ID",1), which doesn't then increment when copying and pasting. Solutions to this are either to use the pivot table toolbar to turn off GETPIVOTDATA formulae, or, rather than clicking on the pivot table when selecting cells in the formula, to type the cell references instead (e.g. =H4-G4).
In your formula =GETPIVOTDATA("Max of Time", $A$4, "ID", 1) - GETPIVOTDATA("Min of Time", $A$4, "ID", 1), the cell reference $A$4 is written with the $ symbol, which makes it absolute. When a formula containing absolute references is copied to another cell, those references are not updated automatically, so you get the same value in every row.
Perhaps modify the formula as follows and then copy it to the other cells:
=GETPIVOTDATA("Max of Time", A4, "ID", 1) - GETPIVOTDATA("Min of Time", A4, "ID", 1)
Note that the item value 1 is a typed constant, so it does not change when the formula is copied either; pointing that argument at the cell in the same row that holds the ID lets it follow each row.
Thanks.
You were on the right lines with pivot tables. Drag in TransID as a row field then drag in two copies of Time as data fields in the pivot table; right click on each and specify Min as the summarization function for one and Max for the other. To the right of the pivot table add a formula to calculate the difference.
Screenshot: http://img296.imageshack.us/img296/5866/pivottableey5.jpg
"Looks good, the only problem I have is that the formula I add is of the form =GETPIVOTDATA("Max of Time", $A$4, "ID", 1) - GETPIVOTDATA("Min of Time", $A$4, "ID", 1). When I copy that to the cells below, the 1 doesn't update to 2, 3 etc. so they all show the same time. – Kris Coverdale"
Use this button on the pivot table toolbar to switch GETPIVOTDATA formulae off.
Screenshot: http://img117.imageshack.us/img117/9937/pivottabletoolbarjn3.jpg
Maybe something as simple as a query like this.
SELECT TransID, DateDiff(mi, Min(Time), Max(Time)) AS Duration
FROM MyTable
GROUP BY TransID
In excel:
A B C
1 1, step A, 15:00:00
2 1, step B, 15:01:00
3 1, step C, 15:02:00
4 2, step B, 15:03:00
5 2, step C, 15:04:00
6 2, step E, 15:05:00
7 2, step F, 15:06:00
8 3, step C, 15:07:00
9 3, step D, 15:08:00
11 1, =MAX(IF($A$1:$A$9=$A11,$C$1:$C$9,""))-MIN(IF($A$1:$A$9=$A11,$C$1:$C$9,""))
12 2, =MAX(IF($A$1:$A$9=$A12,$C$1:$C$9,""))-MIN(IF($A$1:$A$9=$A12,$C$1:$C$9,""))
Note: the formulas are array formulas, so press Ctrl+Shift+Enter after editing them.
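The max-minus-min idea behind the pivot table, the SQL query and the array formulas can be sketched outside Excel as well. The following is only an illustrative Python sketch using the sample rows from the question; the function name `durations` is made up for this example, and it assumes all times fall on the same day.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (TransID, Substep, Time)
rows = [
    (1, "step A", "15:00:00"),
    (1, "step B", "15:01:00"),
    (1, "step C", "15:02:00"),
    (2, "step B", "15:03:00"),
    (2, "step C", "15:04:00"),
    (2, "step E", "15:05:00"),
    (2, "step F", "15:06:00"),
    (3, "step C", "15:07:00"),
    (3, "step D", "15:08:00"),
]


def durations(rows):
    """Return {TransID: latest time - earliest time} as timedelta values."""
    first, last = {}, {}
    for trans_id, _step, text in rows:
        t = datetime.strptime(text, "%H:%M:%S")
        # Track the earliest and latest timestamp seen for each transaction
        if trans_id not in first or t < first[trans_id]:
            first[trans_id] = t
        if trans_id not in last or t > last[trans_id]:
            last[trans_id] = t
    return {trans_id: last[trans_id] - first[trans_id] for trans_id in first}


result = durations(rows)
for trans_id, duration in sorted(result.items()):
    print(trans_id, duration)
```

Because only the minimum and maximum per TransID are used, the order and number of substeps do not matter, which is the property the question asks for.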
To add to Kibbee's post, in reference to the comment, you can use ADO with Excel:
'From: http://support.microsoft.com/kb/246335
Sub DurationsWithADO()
    Dim cn As Object, rs As Object
    Dim strFile As String, strCon As String, strSQL As String

    strFile = Workbooks(1).FullName
    strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strFile _
        & ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
    Set cn = CreateObject("ADODB.Connection")
    Set rs = CreateObject("ADODB.Recordset")
    cn.Open strCon
    strSQL = "SELECT TransID, DateDiff('n', Min([MyTime]),Max([MyTime])) AS Duration " _
        & "FROM [Sheet1$] GROUP BY TransID"
    rs.Open strSQL, cn
    'Write out to another sheet
    Worksheets(2).Cells(2, 1).CopyFromRecordset rs
    'Release the recordset and connection
    rs.Close
    cn.Close
End Sub
EDIT: I have corrected some errors in the original post and changed the name of the time column to MyTime. Time is a reserved word in SQL and causes difficulties in queries. This now works on a very simple test.
Sometimes it is possible to do something once in Excel far more easily than it is to do something repeatably.
Assuming you are just trying to get the answer once or twice, and then throw away the spreadsheet (as opposed to run it every night, or give it to someone else to run), here's how I would do it.
I assume your raw data is in columns A, B and C, with headings in row 1, and data starting in row 2.
Sort the table by TransId as your primary key, and Time as your secondary, both ascending. (The following won't work if this isn't done.)
Add a new column, D, titled Duration, with a formula like this in D2, filled down (Excel formulae don't support formatting or comments; I have added those to help explain, but they need to be stripped out):
=IF(A2=A3, // if this row's TransId is the same as the next one
"", // leave this field blank
C2- // else find the difference between this (last) timestamp and...
VLOOKUP( // look for the first value
A2, // matching this TransId
A:C, // within the entire table,
3, // return the value in the third column - i.e. timestamp
FALSE) // exact match, so the first matching row is used
)
Now the data you want is in column D, but not in the format you want.
Select Columns A-D and copy them. Use Paste Special to copy the values only into a new worksheet.
Delete column B and column C in the new worksheet, so all is left is TransID and Duration.
Sort by Duration, to bring all the rows with values next to each other.
Sort only the rows with values by TransId.
Voila, and there is your solution! Hope you don't need to repeat this!
p.s. This is untested
Related
I've been trying so hard to clean up this CSV data for a coworker.
I’m going to walk through what the data usually looks like and then walk through the steps I’ve done and then bring up what I’m currently struggling with… Bear with me as this is my first post (and I have no background in vba and everything is self-taught by Google).
So the data export is a csv which can be opened in excel broken out by several columns. The column in question is column G, which essentially has multiple data sets (1 – 219) for the same menu item (row).
For example:
A B C D E F G
Chicken Soup {1;$6.00;59;$9.00;88;$6.00}
Beef Soup {1;$8.00;59;$12.00;88;$8.00}
Duck Soup {1;$6.00;59;$6.00;88;$6.00}
Egg Soup {1;$8.00;59;$9.00;88;$8.00}
Water {1;$0.00}
French Onion Soup {1;$16.00;59;$15.00;88;$12.00}
Chili Soup {1;$17.00;84;$17.00}
So in column G, as you can tell, there are multiple prices. The format is:
{Column number ; $ Price ; Column number ; $ Price ; etc. }
Regex: .[0-9]{1,2},[$][0-9]{1,3}[.][0-9][0-9].|[0-9]{1,2},[$][0-9]{1,3}[.][0-9][0-9]
The first goal was to parse out the data in the column into the row, in a format that is true to the csv (so it can be combined and resubmitted).
For example: (imagine there is a semi colon between each data set, as there should be in the final result)
{1;$21.00}
{1;$16.00}
{1;$12.00 5;$12.00 8;$12.00}
{1;$18.00 6;$18.00 8;$18.00}
{1;$10.00 6;$7.00 9;$12.00 11;$10.00}
{1;$20.00 6;$20.00 8;$20.00}
{1;$5.49 3;$3.99 10;$4.99 12;$4.99}
{1;$18.99}
{1;$21.00}
{1;$21.00}
To accomplish this goal, I wrote a macro that:
Copies column G from “Sheet1” and inputs to new sheet “Sheet2” in A1
Replace all “;$” with “,$” to help separate each data set by itself instead of having it broken out column name then dollar sign in two different columns
Text to columns macro splitting on “;” (and inputs results starting B1 so I can keep A1 with all the data sets in one column in case I need it) – also if you know how to keep the semi colon here, that would be helpful so I don’t have to re-add it in the future
Replace All from b1 to end of data set "," to ";" <-- to bring it back to original formatting
Copies the data from B1 to the last cell with data (data is not in order: the 50th row could have 219 columns and the last row could have only 150) and pastes this data into column G of “rp items” (therefore overwriting the existing data and shifting the columns as far right as the last column used).
However, when I showed my coworker what I’d done, he wanted the leading number (column number) to correspond to the columns (since data starts in column G, this will be column 1, H will be 2, etc.). The result should look something like this, so he can filter each column for all the items that have that column number:
For example, this photo is how the outcome should look
So, now the goal is to create a macro that…
Loops through B1:B in sheet “STEP ONE” (column B starting at B1 then C1 then when blank in that row go to next row)
While (B1 (or next row) is blank, do nothing, end macro)
If B1 (or active cell) is not blank, read the cell value to extract column; copy the cell’s contents, paste in “STEP TWO” sheet in the same row as the active cell, but offset by the column number from cell value.
Toggle back to main sheet, goes to next cell within that row – if blank, go to next row and repeat until all data is done.
To give you some background, I have more than 25,000 lines of data (menu items) and the longest column (I believe is 219). So far, I’ve mostly been trying pieces of scripts I’ve found online but none of them are doing similar to what I need and I don’t know how to write enough code to just write the script out myself. I believe I’ll need to have to establish a variable: the column name (not sure if I can extract this using the regex code I found out) and then use that in the offset...
But my range needs to be dynamic and loop…
Please let me know if you can assist – I’ve been stuck on this for like a week!
Thank you all so much for reading – if I can provide extra detail please let me know.
For example you could do it this way:
Sub Tester()
    Dim arr, i As Long, c As Range, v, col, price
    For Each c In Range("G2:G4").Cells
        v = Replace(Replace(c.Value, "{", ""), "}", "") 'remove braces
        If Len(c.Value) > 0 Then 'anything to process?
            arr = Split(v, ";") 'split on ;
            For i = 0 To UBound(arr) - 1 Step 2 'loop 2 at a time
                col = CLng(Trim(arr(i))) 'column number
                price = Trim(arr(i + 1)) 'price
                c.Offset(0, col).Value = col & ";" & price
            Next i
        End If
    Next c
End Sub
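The splitting logic in the macro above (strip the braces, split on the semicolon, then walk the pieces two at a time) can also be sketched in Python to check it against sample cells before running it on 25,000 rows. This is only an illustration; the function name `place_prices` is hypothetical, and it returns a mapping of column offset to cell text rather than writing to a sheet.

```python
def place_prices(cell):
    """Map column number -> "col;price" for a cell such as "{1;$6.00;59;$9.00}"."""
    text = cell.strip().strip("{}")  # remove the surrounding braces
    if not text:
        return {}
    parts = [p.strip() for p in text.split(";")]
    placed = {}
    # Walk the pieces two at a time: column number first, then its price
    for i in range(0, len(parts) - 1, 2):
        col = int(parts[i])
        placed[col] = f"{col};{parts[i + 1]}"
    return placed


example = place_prices("{1;$6.00;59;$9.00;88;$6.00}")
for col, text in sorted(example.items()):
    print(col, text)
```

Each key is the offset from column G at which the text would be written, mirroring `c.Offset(0, col)` in the VBA.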
I want to get the total of the columns "Value" & "Quantity" which is in a dynamic table (in which number of rows will be different each time) at the end of the last row.
If you select a cell in your table the Table Design menu will appear on the Ribbon. In it is a checkbox for Total Row. Check it.
This will add a total row to your table. In my test it contained a total only for the last column on the right. However, you can use that same formula, modified for the appropriate column name, in other columns as well. The row will automatically stay at the bottom of the table when you add rows.
To add rows at the bottom manually, place the cursor in the last cell on the right (not in the totals row but above it) and hit Tab. You can also select any cell in the table and click Insert Table Row either above or below the clicked cell.
All of the above works for a true table. Your exhibit doesn't look like a true table. If it isn't, please consider one of two options. Either replace the table you already have with what is called a "Table" in Excel and enjoy the advantages that offers, or modify your question to clarify that it applies to a list of data which is not in a table.
You could try this:
Sub TotalValueColumn()
    Dim TotalValue As Double
    Dim i As Long, valueCol As Long
    valueCol = 1 'use the column number of Value
    i = 1 'first row to add up; use 2 if row 1 holds headers
    With Workbooks("yourworkbook.xlsm").Worksheets(1)
        'add up every cell until the first empty one
        Do Until IsEmpty(.Cells(i, valueCol))
            TotalValue = TotalValue + .Cells(i, valueCol).Value
            i = i + 1
        Loop
        i = i + 1 'leaves one blank row above the total; remove to write directly under the data
        .Cells(i, valueCol).Value = TotalValue
    End With
End Sub
Then do the same with the Quantity column.
I have a dashboard (image below) where I manually add entries. Then there is a log (image below) where all entries are recorded with the help of IF and Vlookup functions.
I need code so that every output cell in the log searches through all the entries in the dashboard and returns the answer. I think a loop around VLOOKUP will be needed.
[Edit]
Consider the Dashboard table as a discrete table where entries are posted manually.
Consider the log table as a continuous table where a record of every hour for each date is kept. The entries from the Dashboard table get posted to the log table. (New image attached.)
I have entered this function in output column in the log table:
=IF( AND(H3=$B$3,I3>= $C$3,I3<$D$3) ,$E$3,0) + IF(AND(H3=$B$4,I3>=
$C$4,I3<$D$4) ,$E$4,0) + IF (AND(H3=$B$5,I3>= $C$5,I3<$D$5), $E$5,0)
This works fine for me for plotting the entries, but the problem is that for every row in the dashboard I have to add a new IF-AND term to the formula above. So, for example, if I want the 4th row of the dashboard to be synced with the log, I'll have to add
+If(AND(H3=$B$6,I3>=$C$6,I3<$D$6),$E$6,0)
I want every row in the dashboard to add automatically somehow with a loop like:
i = variable
= If (AND(H3=$B$i,I3>= $C$i,I3<$D$i), $E$i,0)
Only one i will give a value greater than 0 while the rest will be zero, so the function should return the sum over all i rather than just the last iteration.
Maybe just fill the formula down manually? If you insist on a macro, something like:
Sub test()
    Range("K3").Value = "X"
    Range("K3:K10").FillDown
End Sub
Replace "X" with your formula, keeping the quotes.
Replace K10 with however far down you want to fill.
----edit----
Let me break down the formula below for you:
1. MATCH(H3,B:B,0) finds the row in column B whose value equals H3; for H3 it finds B3.
2. INDEX(B:E,MATCH(H3,B:B,0),2): with the B3 row found, INDEX returns C3 (the 2, 3 and 4 in the later parts are the column counted from B).
3. AND(I3>=...,J3>=...): now that we have both the start and end time, we compare them against I and J.
4. If step 3 is true, look up the output column; otherwise return 0.
=IF(AND(I3>=INDEX(B:E,MATCH(H3,B:B,0),2),J3>=INDEX(B:E,MATCH(H3,B:B,0),3)),INDEX(B:E,MATCH(H3,B:B,0),4),0)
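What the question asks for (one term per dashboard row, summed, without editing the formula for each new row) amounts to a conditional sum over all dashboard rows. The Python sketch below illustrates that logic only; the dashboard values are invented sample data, `log_output` is a hypothetical name, and the start/end comparison follows the question's own formula (start <= hour < end). In Excel itself a conditional-sum function such as SUMPRODUCT could play the same role.

```python
from datetime import date

# Hypothetical dashboard rows: (date, start_hour, end_hour, output)
dashboard = [
    (date(2020, 1, 1), 8, 12, 50),
    (date(2020, 1, 1), 14, 16, 30),
    (date(2020, 1, 2), 9, 10, 70),
]


def log_output(dashboard, day, hour):
    """Sum the output of every dashboard row that covers this day and hour."""
    return sum(
        output
        for row_day, start, end, output in dashboard
        if row_day == day and start <= hour < end
    )


# One call per log row (date in column H, hour in column I in the question)
print(log_output(dashboard, date(2020, 1, 1), 15))
```

Unlike a MATCH-based lookup, which stops at the first row with a matching date, this considers every dashboard row, so several entries on the same date are handled.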
I have an Excel file that was built from two files: I copied everything from the second file into the first. The task is the following: for each row (actually for each id, stored in column A, which came from excel 1) for which the following condition is true:
column E (which came from excel 1) = column B (which came from excel 2) AND column C (which came from excel 1) = column D (which came from excel 2),
I must insert data from column Z (which came from excel 2) and column Y (which came from excel 2) into column X (which came from excel 1).
If column X already has data in it, all newly inserted data must use | as a separator and be inserted in front of the existing value.
If no data exists in column Z or Y, then nothing is inserted into column X.
I have tried to use INDEX/MATCH without success; I think something more complicated is needed. How can I do this with an Excel formula?
How should the formula below be transformed in order to get the row that fulfils the conditions?
=X2&"|" & IF(AND(Sheet1!E2=Sheet2!B2,Sheet1!C2=Sheet2!D2),Sheet2!Z2 & "|" & Sheet2!Y2,"")
=X2&"|" & IF(AND(Sheet1!E2=Sheet2!B:B,Sheet1!C2=Sheet2!D:D),Sheet2!Z#lineWHICHMATCHESCRITERIA & "|" & Sheet2!Y#lineWHICHMATCHESCRITERIA,"")
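The matching rule described above (two key columns must agree, then Z and Y are placed in front of X with | as separator) can be written out as a small Python sketch to pin down the intended behaviour. It is an illustration under assumptions, not a formula: rows are modelled as dictionaries keyed by column letter, the function name `merge_into_x` is made up, and when several rows of the second file match, the first one is used.

```python
def merge_into_x(sheet1_rows, sheet2_rows):
    """Prepend Z and Y from the matching sheet 2 row to X in each sheet 1 row."""
    # Index sheet 2 by its key columns (B, D); the first row with a given key wins
    lookup = {}
    for row in sheet2_rows:
        lookup.setdefault((row["B"], row["D"]), row)

    for row in sheet1_rows:
        match = lookup.get((row["E"], row["C"]))
        if match is None:
            continue  # no row fulfils E = B and C = D
        new_parts = [v for v in (match.get("Z", ""), match.get("Y", "")) if v]
        if not new_parts:
            continue  # nothing in Z or Y, so X is left untouched
        existing = [row["X"]] if row.get("X") else []
        # New data goes in front of the existing value, separated by |
        row["X"] = "|".join(new_parts + existing)
    return sheet1_rows


sheet1 = [
    {"A": 1, "C": "red", "E": "k1", "X": "old"},
    {"A": 2, "C": "blue", "E": "k2", "X": ""},
    {"A": 3, "C": "green", "E": "k3", "X": "keep"},
]
sheet2 = [
    {"B": "k1", "D": "red", "Z": "z1", "Y": "y1"},
    {"B": "k2", "D": "blue", "Z": "", "Y": "y2"},
]
merged = merge_into_x(sheet1, sheet2)
print([row["X"] for row in merged])
```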
I have a client that has a simple yet complicated request for an Excel sheet setup, and I can't for the world think of where to start. I'm drawing a blank.
We have a data range. Example:
Quarter Data
2010Q1 1
2010Q2 3
2010Q3 4
2010Q4 1
I have a chart built on top of that. Change data, chart changes, protect worksheet to keep other idi... er... users from changing old data. Simple.
What I want to have happen: When I add the next Q1 below Q4, the chart "automagically" selects the most recent 4Q. So when I update the data to:
Quarter Data
2010Q1 1
2010Q2 3
2010Q3 4
2010Q4 1
2011Q1 7
The chart will show data for the last 4 quarters (2010Q2 thru 2011Q1). The goal being: keep "old" data on the same sheet, but have the charts update to most recent quarters.
I'm thinking: "fixed" data locations, reverse the data (new data at top), and just insert row each new quarter:
Quarter Data
2011Q2 9
2011Q1 7
2010Q4 1
2010Q3 4
2010Q2 3
2010Q1 1
But this will involve a lot of changes to the already existing excel sheets and I was hoping that there may be an easier/better "fix".
EDIT:
@Lance Roberts ~ Running with your suggestion:
- Little more detail... The data is setup such that the column information is in A, but data for multiple tables are in B+. Table 1 is B/C. Table2 is D/E. Etc.
- Data is also on a different sheet than the tables.
Going by: This Offset Description, what I've tried doing is adjusting similar to such:
NAME FORMULA OFFSET(range, rows, columns, height, width )
DATA0 =OFFSET('DATASHEET'!$A$2, COUNTA('DATASHEET'!$A:$A) - 8, 0, 8, 1)
DATA1 =OFFSET('DATASHEET'!$A$2, COUNTA('DATASHEET'!$A:$A) - 8, 1, 8, 1)
DATA2 =OFFSET('DATASHEET'!$A$2, COUNTA('DATASHEET'!$A:$A) - 8, 2, 8, 1)
Goal being to tie the length/location for B/C/etc. data to A. So if I add a column on A, stuff tied to Data1/2 adjusts accordingly (or 3/4/5/etc., which are different charts on different sheets).
I want data cells to be picked by the first row, and then an offset number to get data x columns over. Variations on the formula don't seem to be working.
1 issue I haven't solved yet: the data is not aligning properly:
"Data" is always, last column under 2nd to last Quarter. Last quarter is always empty. Data is shifting to the right (in this example, under 3Q10 - NOT under the correct column. 11 should be under 4Q10. 9.5 should be under 2Q10).
I know I'm getting something simple wrong...
Seems to be working. First thing I had to change was CountA - 9 (not CountA - 8). Next was the "column offset" (0, 1, 2, 3,...). Also split some stuff up to make it more compartmentalized (I do have to train someone else how to do this for her reporting needs).
Thanks Lance :)
If the chart is on the same sheet as the data:
Name the first cell of the data (A2) as a named range, say TESTRANGE.
Create a named range MYDATA with the following formula:
=OFFSET(TESTRANGE, COUNTA($A:$A) - 5, 0, 4, 2)
Now, go to the SERIES tab of the chart SOURCE DATA dialog, and change your VALUES statement to:
=Sheet1!MYDATA
Now every time you add a new row, the chart will update to match.
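The "- 5" in the OFFSET formula can look arbitrary, so here is a small Python sketch of the arithmetic. It assumes column A holds a header in A1 followed by contiguous data from A2, and it models only the first column of the range (the original uses a width of 2); `offset_window` is a hypothetical helper, not an Excel function.

```python
def offset_window(column_a, rows_back=5, height=4):
    """Mimic OFFSET(A2, COUNTA(A:A) - rows_back, 0, height, 1) on a Python list.

    column_a holds every non-empty cell of column A, header included.
    """
    counta = len(column_a)             # COUNTA counts the header plus the data rows
    start = 1 + (counta - rows_back)   # index 1 is A2 (index 0 is the header in A1)
    return column_a[start:start + height]


column_a = ["Quarter", "2010Q1", "2010Q2", "2010Q3", "2010Q4", "2011Q1"]
window = offset_window(column_a)
print(window)
```

With n data rows COUNTA returns n + 1, so moving COUNTA - 5 = n - 4 rows down from A2 lands on the first of the last four rows. The same reasoning is consistent with the asker's later finding that a window of 8 rows needs COUNTA - 9.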
I know this is an old question, but I wanted to share an alternative that may be easier.
Change your Quarter-Data range to an Excel Table. Select the range, and press Ctrl+T. In the Insert Table dialog, make sure the correct data range is selected and that My Table Has Headers is checked, then press OK. This converts the simple range into a special data structure with magical properties.
Then make a new range which links to the last four rows of this table, and create a chart based on this new range. This is illustrated below. The table is the specially formatted range in A1:B9 (you can choose a less in-your-face style), and the plotting range is D1:E5.
The formulas in cells D2 through D5 are below. Copy D2:D5 and paste into E2:E5 to complete the formulas in our plotting range.
D2: =INDEX(Table1[Quarter],ROWS(Table1[Quarter])-3)
D3: =INDEX(Table1[Quarter],ROWS(Table1[Quarter])-2)
D4: =INDEX(Table1[Quarter],ROWS(Table1[Quarter])-1)
D5: =INDEX(Table1[Quarter],ROWS(Table1[Quarter]))
Table1 is the name assigned to the Table, and Quarter is the name of the first column of the Table (and also the column header). You don't need to type all this, just select the column in the Table. As the Table expands or contracts, Table1[Quarter] keeps track of the changes.
Now add a new data point. The Table expands, and our little staging area in D1:E5 links to the new last four rows of the table.
And as we add years worth of data, the formulas and the chart keep up.