I have been downloading hourly water quality data for many different states and organizing it into Excel by year. I have prepped the data to make sure there are no zeros and that every day of the year (DOY) present has 24 values, but the time series plots were too noisy, so I now need the daily average values instead.
Each site's annual data differs in how many days are available, and sometimes whole months are missing because nothing was recorded.
So my question is: how can I write code that gives me the average daily value linked to a specific DOY, and that I can apply to many different Excel sheets? The data appears like this:
The files are saved with names like CA1_2012 (California Site 1 hourly data from 2012).
I know there is a lot on this topic, but I have been trying everything and I can't get code that works!
In MATLAB, you can sum the second column grouped by the values in the first column using accumarray:
[m,~,n] = unique(data(:,1));
sumdata = [m, accumarray(n,data(:,2))];
For the mean, I would suggest grpstats:
avgdata = grpstats(data, DOY, {'mean'});
or, as @gnovice suggested:
avgdata = accumarray(DOY, data, [], @mean);
You can also get what you want with a Pivot Table in Excel: group the data by DOY and show the mean value for each group in the table (no coding required).
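For readers working outside MATLAB, the same group-by-DOY averaging can be sketched in plain Python. This is only an illustration under assumptions: the function name `daily_means` and the sample readings are made up, and loading the rows from each Excel sheet is left out.

```python
from collections import defaultdict

def daily_means(rows):
    """Average hourly values per day of year (DOY).

    rows: iterable of (doy, value) pairs, one pair per hourly reading.
    Returns a dict mapping each DOY that appears to its mean value;
    days with no readings are simply absent from the result.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doy, value in rows:
        totals[doy] += value
        counts[doy] += 1
    return {doy: totals[doy] / counts[doy] for doy in sorted(totals)}

# Made-up sample: two readings on DOY 1, three on DOY 2, none on DOY 3.
sample = [(1, 7.0), (1, 9.0), (2, 4.0), (2, 5.0), (2, 6.0)]
result = daily_means(sample)
print(result)
```

Because the result is keyed by DOY, missing days or whole missing months do not shift the remaining averages; they just do not appear.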
Related
I'm new to Excel, so I don't know if what I'm trying to do is even possible.
I want to build a graph with 3 columns of direct data and a last column that is an average of older data. Every week I add a new column with the newest data, and that should update the graph: the new column becomes the first one, the previous first becomes the second, and so on, until the previous 3rd one becomes part of the average column. I'm uploading some images to make things clearer.
Sorry about the paint images =/
In case it is not clear: the average should be the mean of all older columns, and my graph should always have 4 columns no matter how many older ones there are.
I am open to other implementation methods that could give me a similar solution.
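As one possible reading of the requirement above, here is a minimal Python sketch of the logic only, not an Excel solution. It assumes the weekly columns are kept newest first and all have the same length; `chart_columns` and the sample numbers are hypothetical.

```python
def chart_columns(weeks):
    """Pick the data for a 4-column chart.

    weeks: list of weekly columns, newest first; each column is a list
    of numbers, and all columns have the same length.
    Returns (newest, average): up to three newest columns, plus the
    element-wise mean of every older column (None if there are none).
    """
    newest = weeks[:3]
    older = weeks[3:]
    if not older:
        return newest, None
    average = [sum(vals) / len(vals) for vals in zip(*older)]
    return newest, average

# Made-up data: five weeks, two rows each, newest week first.
weeks = [[5, 6], [4, 4], [3, 2], [2, 0], [4, 2]]
newest, average = chart_columns(weeks)
print(newest, average)
```

Adding a new week then amounts to inserting it at the front of `weeks`; the previous third column falls into the averaged group automatically.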
I created a tracker of learning hours in order to conduct an experiment.
The issue in this ticket is a time series whose data is not summed correctly in a Pivot Table.
Some other tickets on Stack Overflow mention how to sum correctly. Here's what I tried:
blank spaces > the Pivot Table no longer takes them into account, yet the issue persists.
data type & format > the whole column is set to the Short Date data type.
labeling > I copy-paste the category for each learning entry, and I double-checked that there are no typos in them.
Here's how the file works. There are 3 sheets in my Excel file. The core of it is "Data", where I track the time spent on each learning / exercise. The columns marked in red are the ones used for the Pivot Table.
This information is consolidated in another sheet, called SuperLearner Review. I use this one to display overall learning hours by type and category. The numeric outputs here are looked up from Data (or calculated from it).
After several checks, I cannot find the issue. All I know is that the February data is not tracked correctly.
Originally I attributed it to either wrong data labeling or wrong format, but I can see manually what the real sum of learning done in February is:
In the data series those hours do not seem to show up. This is what the Pivot Table returns instead:
Whereas the consolidated information (coming from the same source) does not seem to have any calculation problems:
When I deleted the Pivot Table and built it again, the same error occurred, so I cannot get rid of this wrong result. It therefore does not look like a calculation problem; rather, Excel seems to miss some data entries at some point.
What would you recommend? Thanks for the help.
I was trying to plot some reports on Covid-19 cases around the globe using Excel and Power BI. It is definitely easier and fancier to do in Power BI, but I need an Excel file or calculation that makes sense, similar to the one in PBI. What I actually want is to calculate the daily increase in new cases (as a %) and also the death rate per day, or the total deaths by day, and so on.
I did some calculations here using Pivot Tables (% of column total, plus one calculated field for the death rate %), but I am not sure how to do the daily increase/decrease. Does anyone have an idea for additional calculations?
This is copied from PBI (calculations), and I would like to have something similar in Excel, but I am not sure whether I can calculate it properly (last 2 pictures).
The input data source is here:
https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide.xlsx
You need an extra column for the result you want (e.g. daily increase/decrease); then you can plot it either as a waterfall chart or by using techniques similar to those described here:
https://www.extendoffice.com/documents/excel/5945-excel-chart-display-percentage-change.html
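The extra column itself is just the difference between each day and the previous one, optionally divided by the previous day's value. A minimal Python sketch of that calculation, assuming a list of daily new-case counts in date order (the function name and the numbers are made up for illustration):

```python
def daily_change(cases):
    """Return (absolute change, percent change) for each day.

    cases: list of daily new-case counts in date order.
    The first day has no previous day, so it gets (None, None);
    percent change is None when the previous day's count is 0.
    """
    out = [(None, None)]
    for prev, cur in zip(cases, cases[1:]):
        diff = cur - prev
        pct = None if prev == 0 else 100.0 * diff / prev
        out.append((diff, pct))
    return out

# Made-up counts for four consecutive days.
changes = daily_change([0, 10, 15, 12])
print(changes)
```

The same logic carries over to a spreadsheet as a helper column that subtracts the cell above from the current cell, with a guard against dividing by zero for the percentage.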
I am averaging a number, grouped by week, inside Google Data Studio, and I am averaging the same numbers, grouped by week, inside BigQuery; however, the output is slightly different.
Overall Score
AVG(table.score) OVER (PARTITION BY Weeknum) as OverallScore
The datasource is a list of scores, along with a date. I am averaging this inside DS using the aggregate function within the metric, and using the Time dimension ISO Year Week.
The purpose of this is to have one set of numbers hard coded, whilst the other line is used to filter to different departments, keeping the original "overall" score present to be used as a benchmark.
If I export my table to Excel and average it filtered to week 3 (see below), it returns 19.59 as well, which means the AVG aggregate function in Data Studio matches Excel. I can also query the table using the statement below, which rules out an averaging difference inside BigQuery. However, when I place the overall score into the graph below, I get slightly different numbers for it.
SELECT avg(overallscore) FROM `dbo.table` where weeknum = '2018 3'
Output = 19.59
Does anyone have an idea what may be causing this?
When you open the report, you should be able to see the query it runs in your query history in BigQuery. Check that it uses the same formula, as it sometimes uses approximate aggregates.
I have some data with multiple columns; for simplicity, let's call them "Start", "End" and "Duration". Duration comes pre-calculated.
However, I would like to show my data as daily, or at least weekly, figures. A single record can span anything from several hours to several weeks or months. What would be the best way to split the data up in this way?
I currently have extra columns that calculate the part of the duration falling in the new month, but with days or weeks this obviously becomes quite cumbersome.
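One way to think about the split is to cut each record at every midnight it crosses, which yields one partial duration per calendar day. The sketch below is only an illustration in Python; it assumes naive timestamps without time zones or daylight-saving changes, and `split_by_day` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

def split_by_day(start, end):
    """Split the interval [start, end) into per-day durations.

    Returns a list of (date, timedelta) pairs, one for each calendar
    day the interval touches.
    """
    parts = []
    cur = start
    while cur < end:
        # Midnight at the start of the following day.
        next_midnight = datetime.combine(
            cur.date() + timedelta(days=1), datetime.min.time()
        )
        chunk_end = min(end, next_midnight)
        parts.append((cur.date(), chunk_end - cur))
        cur = chunk_end
    return parts

# Made-up record: starts at 22:00 on 1 Jan 2020, ends at 06:00 on 3 Jan 2020.
parts = split_by_day(datetime(2020, 1, 1, 22, 0), datetime(2020, 1, 3, 6, 0))
for day, duration in parts:
    print(day, duration)
```

The per-day pieces always add up to the original duration, and weekly figures can then be obtained by summing the daily pieces per week.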