Excel Data Entry made faster with Macro? - excel

I am currently trying to write a macro in Excel to shorten the time this data entry will take. I have water sample data for 10 different manholes, each sampled quarterly since 1994. Because of this, I have a total of 75 dates that I need to input per manhole for this spreadsheet. Each of these dates has 97 rows of information that is input into this sheet, followed by a space before the next date. My ultimate question is: how can I take the dates shown in the first image and copy the first one 97 times, then the second 97 times, and so on, without doing it all by hand? An example is in the attached image.

In column A, put an integer sequence starting at 0 and counting up. In column B, do integer division on column A with QUOTIENT, dividing by the number of times you want each value repeated (97), for example =QUOTIENT(A1,97). Then in column C, use VLOOKUP with the value from column B as the key into your lookup table of dates; this "copies" each date 97 times. The key column of the lookup table therefore has to be numbered from 0 as well.
If you have the same measurements for each date, add another column that applies MOD to column A (for example =MOD(A1,97)), then use that as the key for a second VLOOKUP into your measurements lookup table.
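The index arithmetic behind QUOTIENT and MOD may be easier to see outside the spreadsheet. Here is a minimal Python sketch of the same idea; the function names and the sample dates are made up for illustration, and it ignores the blank spacer row mentioned in the question.

```python
def repeat_dates(dates, times=97):
    """Return every date repeated `times` times, keeping the original order."""
    total = len(dates) * times
    # Row i gets the date at index i // times, which is what QUOTIENT(A, 97) computes.
    return [dates[i // times] for i in range(total)]


def measurement_index(i, times=97):
    """Position of row i inside its block, which is what MOD(A, 97) computes."""
    return i % times


# Hypothetical sample dates, not taken from the question's data.
dates = ["1994-03-01", "1994-06-01", "1994-09-01"]
column = repeat_dates(dates)
```

Rows 0 to 96 receive the first date and row 97 starts the second one, which mirrors filling the VLOOKUP formula down the sheet.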

Related

Convert [Days, hh:mm:ss] to [hh:mm:ss]

Some data I export to Excel appears like this: '3 Days, 2:15:37' (only when the period exceeds 24 hours; otherwise it is [hh:mm:ss], which is fine).
How can I display it in Excel as [hh:mm:ss] even when it exceeds 24 hours?
Maybe you'll find a better method, but here's one that uses some helper columns. Since your column is in General format, Excel will recognize some of your cells as Time, except for those containing "Days". You have a mixture of formats, so it's a bit cumbersome.
You can probably consolidate a couple of steps here, but breaking them out makes the process easier to follow. Column A is your original data.
Column B: Formula to extract data to the right of "comma space":
=IFERROR(MID(A2,FIND(",",A2)+2,10),A2)
Column C: For cells containing the word "Day", take the value to the left of it and multiply by 24.
=IFERROR(LEFT(A2,FIND("Day",A2)-1)*24,0)
Column D: Extract the HOURS from your time format of column B.
=HOUR(B2)
Column E: Take the MM:SS from column B.
=TEXT(B2,"mm:ss")
Column F: Put it all together:
=CONCAT(C2+D2,":",E2)
Here's one way to combine the last two steps, which would reduce the number of helper columns by one.
=CONCAT(C2+D2,":",TEXT(B2,"mm:ss"))
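If it helps to check the logic outside Excel, the same conversion can be sketched in Python. This is only an illustration under the assumption that the text is either "h:mm:ss" or "N Day(s), h:mm:ss"; the function name to_hms is made up.

```python
import re


def to_hms(text):
    """Convert 'N Days, h:mm:ss' (or plain 'h:mm:ss') to total 'hours:mm:ss'."""
    pattern = r"\s*(?:(\d+)\s+Days?,\s*)?(\d+):(\d{2}):(\d{2})\s*"
    match = re.fullmatch(pattern, text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    days = int(match.group(1) or 0)            # no 'Days' prefix means 0 days
    hours = days * 24 + int(match.group(2))    # same as helper columns C + D
    # Minutes and seconds pass through unchanged, like TEXT(B2,"mm:ss").
    return f"{hours}:{match.group(3)}:{match.group(4)}"


print(to_hms("3 Days, 2:15:37"))  # prints 74:15:37
```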

Excel Formula or function that returns the Nth value from a dynamically generated grouping of cells

I am trying to assemble an INDEX/MATCH combination and am having trouble figuring out how to make it work. I have experience with a lot of the formula types in Excel, but unfortunately I am pretty ignorant when it comes to these functions.
I will explain what I am trying to do first, but I have attached 3 images at the end that will probably make things more clear.
In order to identify the specific values I want, I am having to use helper cells. These helper cells are denoted with the (helper) tag in the pictures. These cells go through and grab the adjusted closing price of the stock (column A) at the beginning (column C) and the end (Column D) of a dynamically calculated period.
I would like to consolidate these values in numerical order in columns F and G. The idea is that the first non-zero number in C/D belongs to the first predefined period and should go into columns F/G beside the number 1 (column E). This carries on through all of the periods (e.g. the 2nd non-zero number goes beside the number 2, the third non-zero number goes beside the number 3, etc.).
This is just an example of one stock. I need the function or formula to be dynamic enough to work on a wide variety of distributions. Sometimes there are up to 100 dynamically calculated periods within the stock analysis.
Below are the images that should provide more clarity
Image 1 is an example of what the data looks like
Image 2 is a crudely drawn example of how I would like the data to move
Image 3 is the desired result
Updated image for Scott Craner showing out of order results
Please let me know if I can clarify any confusion.
If you just need to return the first value of each period (column C) and the last value of each period (column D), you could use index match and lookup to do this without even using helper columns.
Try this in cell F2
=INDEX(A2:A50,MATCH(E2,B2:B50,0))
And this in cell G2
=LOOKUP(E2,B2:B50,A2:A50)
Depending on how much variance there is in your overall number of rows, you could use INDIRECT references in the formulas to update the ranges dynamically.
Example:
=INDEX(A2:INDIRECT("A"&COUNTA(A:A)),MATCH(E2,B2:INDIRECT("B"&COUNTA(A:A)),0))
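To make explicit what the two formulas above are expected to return, here is a small Python sketch of "first price in the period" and "last price in the period". It assumes the period numbers sit next to the prices as in columns B and A; the function name and sample numbers are hypothetical. Unlike LOOKUP, which relies on column B being sorted ascending, this sketch matches the period exactly.

```python
def first_and_last(prices, periods, wanted):
    """Return (first, last) price among rows whose period equals `wanted`.

    Returns None when the period does not occur at all.
    """
    rows = [price for price, period in zip(prices, periods) if period == wanted]
    if not rows:
        return None
    # rows[0] plays the role of INDEX/MATCH, rows[-1] the role of LOOKUP.
    return rows[0], rows[-1]


# Hypothetical sample data: column A (prices) and column B (period numbers).
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
periods = [1, 1, 2, 2, 2]
result = first_and_last(prices, periods, 2)
```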
You will need to record a macro. Start recording, then do the following:
+ Filter columns C/D to show only non-null values
+ Select the whole of columns C/D and copy them
+ Turn off the filter
+ Paste the copied C/D columns into F/G
+ Stop recording
Good luck
Put this formula in F2:
=INDEX(INDEX(C:C,MATCH($E2,$B:$B,0)):INDEX(C:C,MATCH($E2,$B:$B,0)+COUNTIF($B:$B,$E2)-1),MATCH(1E+99,INDEX(C:C,MATCH($E2,$B:$B,0)):INDEX(C:C,MATCH($E2,$B:$B,0)+COUNTIF($B:$B,$E2)-1)))
Copy it one column to the right and down the list.

TTest in Excel across multiple columns

I have 56 columns that are of different lengths. I want to run a series of t-tests between all of them. I know that the syntax in Excel is TTEST(array1, array2, tails, type). Is there a fast way to do this with labeling? I know that there are 1540 combinations, and I really do not want to type that many formulas out by hand. The first position in each of the 56 columns is a label for the data in that column.
Thank you very much
I think the solution to your problem lies in the OFFSET function.
To display all results, I wrote the numbers 1 to 56 (column number of the first data set) in row 1 and 1 to 56 (column number of the second data set) in column A; you then only need to calculate the t-test for the lower half of this grid.
This can be done with the formula below (here I always take an array of length 100 because, as far as I know, TTEST simply ignores empty cells):
=IF($A3<=B$1;"";TTEST(OFFSET(Sheet1!$A$1;0;$A3-1;100);OFFSET(Sheet1!$A$1;0;B$1-1;100);1;2))
I hope this helps.
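The "lower half of the grid" idea is just an enumeration of unordered column pairs. As a rough cross-check outside Excel, the Python sketch below lists every pair and computes the pooled-variance (equal variance, i.e. type 2) t statistic for each. It does not compute the p-value that TTEST returns, and the function names and input layout (a dict from label to list of numbers) are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (samples may differ in length)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def all_pairs(columns):
    """Map each unordered pair of labels to its t statistic."""
    return {
        (x, y): pooled_t(columns[x], columns[y])
        for x, y in combinations(columns, 2)
    }


# 56 columns give 56 * 55 / 2 = 1540 unordered pairs.
pair_count = len(list(combinations(range(56), 2)))
```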

How to convert 1min OHLC data into 5min OHLC data

I'm trying to convert 1-minute OHLC (Open/High/Low/Close) data into 5-minute OHLC data in Excel 2013. So far I know the principle: Open has to take the open value every 5 rows, and similarly for Close. Min/Max is also understandable. Unfortunately, Excel doesn't understand that I want to get Min/Max from rows 0-5, 5-10, etc. Instead, it goes 0-5, 1-6, 2-7, etc.
I was also trying to use AVERAGE somehow but it's pointless since its output doesn't correspond with reality at all. From some more research I think I will have to create a macro from functions AVERAGE, OFFSET, INDEX and MATCH and that's where my struggle begins. I have no idea how to construct that formula.
Here's a picture of how it looks after using Filter on the Count Column:
Add a column (assumed to be A) on the left with 0 as a label and =IF(MOD(B2,6)=0,1+A1,A1) in A2 copied down. Subtotal for each change in 0 and use Count on all the other columns. Change the first subtotal row formulae to be:
ColumnC: =C2
ColumnD: =SUBTOTAL(4,D2:D6)
ColumnE: =SUBTOTAL(5,E2:E6)
ColumnF: =F6
Filter for ColumnA contains c and copy formulae down.
Edit
The pandas library for Python would do this easily. Give it a try.
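The fixed, non-overlapping 5-row windows the question asks for can also be sketched in plain Python without any library. This is a minimal illustration, assuming the bars are (open, high, low, close) tuples in time order with no missing minutes; the function name aggregate_ohlc and the sample values are made up.

```python
def aggregate_ohlc(bars, size=5):
    """Combine consecutive groups of `size` bars into one bar each.

    Each bar is an (open, high, low, close) tuple. A shorter trailing
    group is aggregated as-is.
    """
    result = []
    for start in range(0, len(bars), size):   # 0, 5, 10, ... not 0, 1, 2, ...
        chunk = bars[start:start + size]
        result.append((
            chunk[0][0],                       # open of the first bar
            max(bar[1] for bar in chunk),      # highest high
            min(bar[2] for bar in chunk),      # lowest low
            chunk[-1][3],                      # close of the last bar
        ))
    return result


# Hypothetical 1-minute bars.
one_minute = [
    (1.0, 2.0, 0.5, 1.5), (1.5, 3.0, 1.0, 2.0), (2.0, 2.5, 1.8, 2.2),
    (2.2, 2.4, 0.2, 1.0), (1.0, 1.2, 0.9, 1.1), (1.1, 5.0, 1.0, 4.0),
    (4.0, 4.5, 3.0, 3.5), (3.5, 3.6, 3.1, 3.2), (3.2, 3.3, 2.0, 2.5),
    (2.5, 2.6, 2.4, 2.55),
]
five_minute = aggregate_ohlc(one_minute)
```

The step of `size` in the range is what keeps the windows from sliding one row at a time.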

Optimizing multiple-criteria IFs

I'm performing array calculations that are taking a long time to complete. I'd like to optimize my formulas some more. All of the formulas are of the same nature - they perform some high-level function (Average, Slope, Min, Max) across a column of values. However, not all cells in a column are included in the array. I use multiple IF criteria to choose which cells get included. All comparisons are made to the current row. Here's an example of the data:
     A        B           C         D      E
1    Company  Generation  Date      Value  ToCalculate
2    Abc      1           1/1/2010  5.6
3    ...      ...         ...       ...    ...
E would look something like this
{=Average(If(A2=A2:A1000, If(B2=B2:B1000, If(C2 > C2:C1000, D2:D1000))))}
So once E2 is calculated, I have to autofill down column E. Columns F, G, H, ... use the same approach but either select different values to operate on or perform a different function. My dataset is quite large, and with only a few of these the spreadsheet takes more than an hour to compute. Every so often I'll add a fourth criterion, all other criteria being the same.
Is there a more efficient approach? Some thoughts:
Can I use a single array per column instead of thousands per column?
Can I condense the first three criteria so that the output is row numbers? Perhaps then subsequent formulas won't have to search for multiple criteria but can just perform the function?
Or can I somehow build the criteria up? A new column returns all rows where the company is the same, another column returns all rows from the first column where the generation is the same, and so on.
For the Average you can do without arrays:
=AVERAGEIFS(D2:D$1000,A2:A$1000,A2,B2:B$1000,B2,C2:C$1000,"<="&C2)
As there is also a COUNTIFS and a SUMIFS, I think your slopes could be calculated the same way.
For the rest of the functions (max, min, etc), we should analyze case by case.
I did a slight performance test, and this is apparently better, but of course my datasets are just mocked.
HTH!
Note: Excel 2007 and up only!
Edit - Answering your comment.
Without knowing the dimensions of the problem it is difficult to give advice, but I'll risk one suggestion anyway:
You could write a VBA function that:
1) Generates a new sheet for each company-generation pair
2) Sorts the data in those sheets by date
3) Adds the formulas to those sheets (no conditionals needed in this context)
4) Recalculates and Gets the results from those formulas and populates the original sheet
5) Deletes the auxiliary sheets
To capture the rows and re-use try this approach:
Sort the data by Company & Generation.
Make a unique list of Companies & generations (use Advanced Filter, Unique Only, Copy)
For each Company-Generation pair in the list, build 2 columns of formulae. The first column gives the count of rows in the data for this pair (use COUNTIFS); the second column gives the first row in the data for this pair (= first row for previous pair + count of rows for previous pair). Then you can use a function like OFFSET to return only the rows of data for the Company-Generation pair and embed this inside the final function/array formula (AVERAGEIFS etc.). You could extend this sort-and-count approach to include dates if you wanted. The drawback is that if the list of companies and generations changes, you have to change the list of uniques and the associated formulas. There are examples of this approach on my website at http://www.decisionmodels.com/optspeedk.htm and http://www.decisionmodels.com/optspeedj.htm
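The reason sorting and grouping helps is that each row then only looks at its own Company-Generation block instead of scanning the whole column. The Python sketch below illustrates that idea for the average case only. It assumes rows are (company, generation, date, value) tuples and that each row wants the average of rows in the same group with a strictly earlier date, as in the array formula in the question. The function name and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def average_of_earlier(rows):
    """For each row, average `value` over same-group rows with an earlier date.

    Returns a list aligned with `rows`; entries are None when no earlier row exists.
    """
    groups = defaultdict(list)
    for index, (company, generation, day, value) in enumerate(rows):
        groups[(company, generation)].append((day, value, index))

    result = [None] * len(rows)
    for members in groups.values():
        members.sort(key=lambda m: m[0])          # sort each group by date once
        total, count, i = 0.0, 0, 0
        while i < len(members):
            j = i
            while j < len(members) and members[j][0] == members[i][0]:
                j += 1                             # rows sharing the same date
            for _, _, index in members[i:j]:
                if count:
                    result[index] = total / count  # running total of earlier rows
            for _, value, _ in members[i:j]:
                total += value
                count += 1
            i = j
    return result


# Hypothetical sample rows in the layout of the question's table.
rows = [
    ("Abc", 1, date(2010, 1, 1), 5.6),
    ("Abc", 1, date(2010, 2, 1), 6.4),
    ("Abc", 1, date(2010, 3, 1), 9.0),
    ("Xyz", 1, date(2010, 1, 15), 1.0),
]
averages = average_of_earlier(rows)
```

Each group is sorted once and then walked with a running total, so the work grows roughly with the number of rows rather than with its square.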

Resources