Split range of times into 5 mins intervals in Excel - excel

I'm new to programming and data analysis in general, and need some help with a large dataset file (43 GB). It is a list of high-frequency trades for a stock, containing two columns I'm interested in: time (in UTC format including fractional seconds, e.g. 2019-01-01T00:06:41.033529796Z) and price. I have managed to open the file using delimiter software and split it into 509 files, each of which fits in an Excel sheet.
I now need to compare the price change during 5-minute intervals based on the prices in this file.
My first problem is that Excel doesn't recognise this time format, so it cannot interpret the values as times.
Secondly, I need to understand how, perhaps using the =FLOOR formula, to split the list of trade times into 5-minute intervals and find the difference in the corresponding prices.
I have tried making Excel recognise the UTC format with no success. Any help would be appreciated!
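One way to sidestep both problems outside Excel is a short script. The sketch below is a minimal Python illustration, not a tested pipeline for the 43 GB file. It assumes the rows arrive in time order as (timestamp, price) pairs, truncates the nanosecond timestamps to microseconds (the finest resolution Python's datetime stores), floors each time to its 5-minute boundary, and reports last price minus first price per interval. The names parse_utc, floor_5min and price_change_per_bin are made up for this example.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse e.g. '2019-01-01T00:06:41.033529796Z', truncating to microseconds."""
    body = ts.rstrip("Z")
    if "." in body:
        base, frac = body.split(".", 1)
    else:
        base, frac = body, ""
    # Keep at most 6 fractional digits; pad short fractions with zeros.
    micro = int(frac[:6].ljust(6, "0")) if frac else 0
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return dt.replace(microsecond=micro, tzinfo=timezone.utc)


def floor_5min(dt: datetime) -> datetime:
    """Round a datetime down to the start of its 5-minute interval."""
    return dt.replace(minute=dt.minute - dt.minute % 5, second=0, microsecond=0)


def price_change_per_bin(rows):
    """rows: iterable of (timestamp_string, price), assumed sorted by time.

    Returns {interval_start: last_price - first_price}.
    """
    first, last = {}, {}
    for ts, price in rows:
        start = floor_5min(parse_utc(ts))
        first.setdefault(start, price)  # only the first trade in the bin is kept
        last[start] = price             # overwritten until the last trade
    return {start: last[start] - first[start] for start in first}
```

Because the function only needs one pass over the rows, it could be fed from a file reader line by line rather than from 509 separate sheets, provided the file really is in time order.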

Related

Interpolating a huge set of data

So I have a very large set of data (4 million rows+) with journey times between two location nodes for two separate years (2015 and 2024). These are stored in dat files in a format of:
Node A  Node B  Journey Time (s)
123     124     51.4
So I have one long file of over 4 million rows for each year. I need to interpolate journey times for a year between the two for which I have data. I've tried Power Query in Excel as well as Power BI Desktop but have had no reasonable solution beyond cutting the files into smaller < 1 million row pieces so that Excel can manage.
Any ideas?
What type of output are you looking for? PowerBI can easily handle this amount of data, but it depends what you expect your result to be. If you're looking for the average % change in node to node travel time between the two years, then PowerBI could be utilised as it is great at aggregating and comparing large datasets.
However, if you are wanting an output of every single node to node delta between those two years i.e. 4M row output, then PowerBI will calculate this, but then what do you do with it.... a 4M long table?
If you're looking to have an exported result >150K rows (PowerBI limit) or >1M rows (Excel limit), then I would use Python for that (as mentioned above)
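If Python is an option, the interpolation itself is short. The following is a minimal sketch that assumes straight-line (linear) interpolation between the two years is acceptable, which the question does not confirm, and that each year's file has already been loaded into a dict keyed by (Node A, Node B). The function name interpolate_journeys and its parameters are hypothetical.

```python
def interpolate_journeys(times_start, times_end, year, year_start=2015, year_end=2024):
    """Linearly interpolate journey times (seconds) for an intermediate year.

    times_start / times_end: dicts mapping (node_a, node_b) -> journey time.
    Node pairs missing from either year are skipped rather than guessed.
    """
    weight = (year - year_start) / (year_end - year_start)
    common_pairs = times_start.keys() & times_end.keys()
    return {
        pair: times_start[pair] + weight * (times_end[pair] - times_start[pair])
        for pair in common_pairs
    }
```

A dict of 4 million entries fits comfortably in memory on most machines, so there is no need to cut the files into pieces for this step.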

Combine (hour, minute, am/pm) columns for two times, then calculate minutes elapsed

I have Time In and Time Out that need to be input on a google sheet. It has to be tracked down to the minute so I have a column for hour, minute, and am/pm. My goal is to have the amount of minutes elapsed between the time in and time out in a row.
I have not found a way to combine all three columns into a time, especially with the am/pm column in the mix. Then subsequently do a formula to find minutes elapsed. I am not well versed in spreadsheet formulas so if there is an easier way of achieving my goal please let me know.
A screenshot is attached of the google sheet columns. Thank you to anyone that can help.
Screenshot of columns :
TIME(HOUR,MINUTES,SECONDS)
That is one formula that you can use to convert integers into a time in Excel. I do not know if it will work in Google Sheets. I will continue with the Excel solution on the assumption that the formulas are the same in Google Sheets or that there is an equivalent.
Assuming your data is laid out as per the picture below, you could use the following formulas to convert your time to an actual time that the spreadsheet can use. There are other solutions as well.
=TIME(A1+IF(AND(C1="PM",A1<12),12,0),B1,0)
That will convert your separated IN time into a spreadsheet time. Do the same thing for the OUT time as below:
=TIME(D1+IF(AND(F1="PM",D1<12),12,0),E1,0)+IF(TIME(D1+IF(AND(F1="PM",D1<12),12,0),E1,0)<TIME(A1+IF(AND(C1="PM",A1<12),12,0),B1,0),1,0)
The extra IF that adds 1 or 0 handles the case where the out time is earlier than the in time: it assumes the out time falls on the next day. Days are represented by integers, and time is represented by the decimal value.
Now that you have a method for determining both times, subtract the in time from the out time with the formula below in a single cell (the result is a fraction of a day, so multiply it by 1440 to express it in minutes):
=(TIME(D1+IF(AND(F1="PM",D1<12),12,0),E1,0)+IF(TIME(D1+IF(AND(F1="PM",D1<12),12,0),E1,0)<TIME(A1+IF(AND(C1="PM",A1<12),12,0),B1,0),1,0))-(TIME(A1+IF(AND(C1="PM",A1<12),12,0),B1,0))
ALTERNATE METHOD
Convert everything to minutes, take the difference.
The first time converted to minutes will be:
=(A1+IF(AND(C1="PM",A1<12),12,0))*60+B1
The Second time converted to minutes will be:
=(D1+IF(AND(F1="PM",D1<12),12,0))*60+E1+IF(((D1+IF(AND(F1="PM",D1<12),12,0))*60+E1)<((A1+IF(AND(C1="PM",A1<12),12,0))*60+B1),24*60,0)
Now you just need to take the difference between the minutes in a single cell:
=((D1+IF(AND(F1="PM",D1<12),12,0))*60+E1+IF(((D1+IF(AND(F1="PM",D1<12),12,0))*60+E1)<((A1+IF(AND(C1="PM",A1<12),12,0))*60+B1),24*60,0))-((A1+IF(AND(C1="PM",A1<12),12,0))*60+B1)
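For readers who want to check the arithmetic outside a spreadsheet, here is a minimal Python sketch of the "convert everything to minutes" method. The function names are invented for the example. One deliberate difference from the formulas above as written: the sketch maps hour 12 with AM to hour 0 (midnight) by taking the hour modulo 12, whereas the formulas would treat 12 AM as noon.

```python
def to_minutes(hour, minute, meridiem):
    """Minutes since midnight for a 12-hour clock reading such as (9, 15, 'AM')."""
    # hour % 12 turns 12 into 0, so 12 AM -> 0 and 12 PM -> 12 after adding 12.
    hour_24 = hour % 12 + (12 if meridiem.strip().upper() == "PM" else 0)
    return hour_24 * 60 + minute


def minutes_elapsed(in_hour, in_minute, in_meridiem, out_hour, out_minute, out_meridiem):
    """Minutes between time in and time out, assuming a shift shorter than 24 hours."""
    start = to_minutes(in_hour, in_minute, in_meridiem)
    end = to_minutes(out_hour, out_minute, out_meridiem)
    if end < start:
        end += 24 * 60  # out time is on the following day
    return end - start
```

As with the spreadsheet version, an out time earlier than the in time is assumed to belong to the next day.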

Converting Hourly Data to Daily Data for many different Excel files

I have been downloading and organizing hourly water quality data into Excel for many different states, and have organized the files by year. I have prepared the data so that there are no zeros and every day of the year (DOY) has 24 values, but the time series plots were too noisy, so I need the daily average values instead.
Each site's annual data differs in how many days are available, and some sites are missing whole months because nothing was recorded.
So my question is: how can I write code that gives me the average daily value linked to a specific DOY and that I can apply to many different Excel sheets? The data appears like this:
The files are named like CA1_2012 (California Site 1 hourly data from 2012).
I know there is a lot on this topic but I have been trying everything and I can't get a code that works!
You can get the sum of the second column grouped by the values in the first column in MATLAB using accumarray:
[m,~,n] = unique(data(:,1));
sumdata = [m, accumarray(n,data(:,2))];
For the mean I would suggest grpstats:
avgdata = grpstats(data, DOY, {'mean'});
or, as @gnovice suggested:
avgdata = accumarray(DOY, data, [], @mean);
You can also get what you want by using a Pivot Table in Excel: group the data by DOY and show the mean value for each group in the table. (No coding required.)
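The same grouping can be sketched in Python for comparison. This is only an illustration of the group-by-DOY-and-average idea; it assumes each file has already been read into (DOY, value) pairs, and the function name daily_means is made up.

```python
from collections import defaultdict
from statistics import mean


def daily_means(rows):
    """rows: iterable of (day_of_year, hourly_value) pairs.

    Returns {day_of_year: mean of that day's values}, in ascending DOY order.
    Days with no readings simply do not appear in the result.
    """
    by_day = defaultdict(list)
    for doy, value in rows:
        by_day[int(doy)].append(float(value))
    return {doy: mean(values) for doy, values in sorted(by_day.items())}
```

Because missing days are just absent from the result, the function can be applied unchanged to sites with gaps of whole months.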

Excel - Evaluate multiple cells in a row and create report or display showing lowest to highest

In an Excel 2003 spreadsheet, I have the top row of cells calculating the number of days and hours I have worked on something based on data I put in the cells below for each category. For example I enter the time spent on Programming, Spoken languages, house, piano, guitar...etc. The top cell in each category will keep track of and display how many days and hours I spent as I add the time spent for each category each day. I want to evaluate this top row and then list in a "report" (like a pop up box or another tab or something) in order from least amount of time to the most amount of time. This is so I can see at a glance which category is falling behind and what I need to work on. Can this be done in Excel? VBA? Or do I have to write a program from scratch in C# or Java? Thanks!
VH
Unbelievable... I've been scolded for trying to understand an answer and requested to mark this question answered. I don't see anything to do this and could not find anything that tells you how, so I'm just writing it here. MY QUESTION WAS ANSWERED... But thanks anyway...
Consider the following screenshot:
The chart data is built with formulas in columns H3:I3 and below. The formulas are
H3 =INDEX($B$3:$F$3,MATCH(SMALL($B$2:$F$2,ROW(A1)),$B$2:$F$2,0))
I3 =INDEX($B$2:$F$2,MATCH(SMALL($B$2:$F$2,ROW(A1)),$B$2:$F$2,0))
Copy down and build a horizontal bar chart from the data. If you want to change the order of the source data, use LARGE() instead of SMALL().
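To make the logic of those two formulas explicit: SMALL picks the n-th smallest total, MATCH finds where that total sits in the row, and INDEX returns the label (or value) at that position. A rough Python equivalent, with an invented function name, is a sort on the totals. Note one difference in behaviour: MATCH with exact matching returns the first matching position, so tied totals would repeat the same category in the sheet, while the sort below keeps each category.

```python
def rank_categories(names, hours, ascending=True):
    """Pair each category name with its total and sort by the total.

    ascending=True mirrors SMALL(); ascending=False mirrors LARGE().
    """
    return sorted(zip(names, hours), key=lambda pair: pair[1], reverse=not ascending)
```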
Alternative Approach
Instead of recording your data in a matrix, consider recording in a flat table with columns for date, category and time spent. That data can then easily be evaluated in many possible ways without using any formulas at all. The screenshot below shows a pivot table and chart where the data is sorted by time spent.
Edit after inspecting file:
Swap rows 2 and 3. Then you can choose one of the approaches outlined above.
Consider entering the study time as time values. It is not immediately clear whether your entry 2.23 means 2 hrs and 23 minutes, or 2 hrs plus 0.23 of an hour, which comes to about 2 hrs 14 minutes.
If you are using the first method, then all your sums involving decimals are off. For example, the total for column B is 7.73 as you sum it. Is that meant to be 7 hrs and 73 minutes? That would really be 8 hrs and 13 minutes, no? Or is it meant to be 7.73 hours, i.e. about 7 hrs 44 minutes? You can see how this is confusing. Use the colon to separate hrs and minutes and, hey, you can see human-readable time values and don't have to convert minute values into decimals.
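The two readings of an entry like 2.23 can be spelled out in a few lines of Python. This is only a sketch of the arithmetic in the paragraphs above, and both function names are hypothetical.

```python
def minutes_if_hours_dot_minutes(entry):
    """Read 2.23 as 2 hours 23 minutes; returns total minutes."""
    hours, minutes = divmod(round(entry * 100), 100)
    return hours * 60 + minutes


def minutes_if_decimal_hours(entry):
    """Read 2.23 as 2.23 hours; returns total minutes, rounded."""
    return round(entry * 60)
```

Summing entries as plain decimals only gives a correct total under the second reading, which is why entering real time values (with a colon) avoids the ambiguity.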

Attendance Calculations / Period Calendar

This is a multi-tiered project, so let me give a quick overview. I have attendance data: card/timestamp punches. I would like to have a pivot table with slicers in Excel. Ideally you'd be able to choose a department, last name, or associate number, and also a period of time. Ideally the period would come from a table with the company period/week, perhaps defaulting to last week's.
I can get at timecard data in two ways:
(1) generate a CSV that automatically performs the timecard math, to figure out how many hours someone worked and it is smart enough to understand 3rd shift workers. The format of that CSV is:
Last Name, First Name, Personnel Type, Associate Number, Facility, Department, TimeIn, TimeOut, Total Hours
The problem with this method is that I would have to manually append the information to the CSV tables, or come up with some AutoIt script.
(2) Get at the raw data via SQL/ODBC. This way the math is not done; it is just all of the associates' timestamps. I would have to work out the daily hours myself and figure out a 3rd shift formula too. It is not a set schedule: many people swing shifts and others get called in a lot.
Lastly, I would like to be able to filter the dates using our company fiscal calendar. I have a spreadsheet that goes from 2000 to 2093, with every day listed along with its corresponding year/period/week.
Example period info spreadsheet:
Date       Year  Period  Week  WeekTotal  PeriodTotal
12/3/2007  2008  1       1     2008.1.1   2008.1
12/4/2007  2008  1       1     2008.1.1   2008.1
I know there is a lot going on here, but what would be the best way to approach this project?
I have not been able to post any script, but the last time I tried this I used two options:
1. A PHP conversion where the time was stored as numbers (which makes calculations easier).
2. Tables where I deliberately stored the time in separate columns or fields for hours, minutes, and seconds. This eases the input, but I still have to calculate the output in PHP, especially for totals, averages, and differences.
Hope it helps a bit
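For option (2) in the question, working from raw timestamps, the hours calculation can be sketched as follows. This is a minimal Python illustration under a strong assumption that the question does not confirm: each associate's punches strictly alternate in, out, in, out. The function name is invented. Using full date-and-time values means a 3rd shift that crosses midnight needs no special formula; its hours are simply credited to the date the shift started.

```python
from collections import defaultdict
from datetime import datetime


def hours_by_shift_start(punches):
    """punches: iterable of (associate_number, datetime) pairs.

    Assumes punches alternate in/out per associate once sorted by time.
    Returns {(associate_number, shift_start_date): hours_worked}.
    """
    by_associate = defaultdict(list)
    for associate, stamp in punches:
        by_associate[associate].append(stamp)

    totals = defaultdict(float)
    for associate, stamps in by_associate.items():
        stamps.sort()
        # Pair 1st with 2nd, 3rd with 4th, ...; an unpaired final punch is ignored.
        for time_in, time_out in zip(stamps[0::2], stamps[1::2]):
            hours = (time_out - time_in).total_seconds() / 3600
            totals[(associate, time_in.date())] += hours
    return dict(totals)
```

The shift start date in each key could then be looked up in the fiscal calendar sheet to attach the year/period/week used for filtering.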
