Scripting Language that can count number of files and do simple plotting - excel

I am looking for a programming language that can do the following:
Count the number of files in a directory (the user will specify the file type and date range).
Windows-based
Can show a simple GUI that prompts the user for input values
Can do simple plotting of bar and line graphs
Can take input values from an Excel file.
Can do simple tests of means, such as a t-test or ANOVA.
The purpose is this: I need to automate the weekly samples-per-consumable ratio data. Every picture file corresponds to one sample tested, and the consumable data is logged in an Excel file. So the program I want to write will "read" the number of samples tested by counting the picture files, and "read" the consumables used from the Excel file.
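As a rough sketch of the counting step in Python (one possible choice of language), assuming the picture files sit directly in one folder and that a file's last-modified date is the date the sample was tested. If the date is encoded in the file name instead, it would have to be parsed from the name. `count_files` is a hypothetical helper, not an existing library function.

```python
from datetime import date, datetime
from pathlib import Path


def count_files(folder, extension, start: date, end: date) -> int:
    """Count files in `folder` with the given extension whose last-modified
    date falls within [start, end], both ends inclusive.

    Assumption: the modification time stands in for the test date.
    """
    ext = extension.lower().lstrip(".")  # accept "jpg" or ".jpg"
    count = 0
    for p in Path(folder).iterdir():
        # Skip subfolders and files with a different (case-insensitive) extension.
        if not p.is_file() or p.suffix.lower().lstrip(".") != ext:
            continue
        modified = datetime.fromtimestamp(p.stat().st_mtime).date()
        if start <= modified <= end:
            count += 1
    return count
```

For example, `count_files(r"C:\pictures", "jpg", date(2020, 1, 6), date(2020, 1, 12))` (hypothetical path) would give one week's sample count, which could then be divided by that week's consumable usage read from the Excel file, e.g. with openpyxl or `pandas.read_excel`.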
Input is as follows:
1. Date range for which you want to see the samples-tested-per-consumable ratio
2. Machine number and consumable type, to know which part of the Excel file to extract
3. A checkbox to choose whether or not to run a statistical test
Output:
1. Bar or line graph (samples tested per consumable ratio vs. time)
2. T-test results if we want to compare two time periods.
Many thanks in advance.
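For the statistical part, here is a minimal sketch in Python (again, one possible language, not something the question settles) of Welch's two-sample t statistic for comparing the weekly ratios of two periods, using only the standard library. The function name `welch_t` is hypothetical. It returns the t statistic and the Welch-Satterthwaite degrees of freedom but no p-value; `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with a p-value, and `scipy.stats.f_oneway` covers one-way ANOVA.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples.

    `a` and `b` are, e.g., the weekly ratios of the two periods being
    compared; each needs at least two values for the sample variance.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```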

Related

Restructuring Excel Spreadsheet Data

I have an Excel spreadsheet with membership data, which includes each member’s coverage start date and coverage end date (if applicable) among many other data fields. I’m trying to restructure the data so that each month that the member’s coverage was effective is represented by a single row. For example, if a member has a coverage start date of 1/1/2019 and a coverage end date of 12/31/2019, the member’s information would be displayed 12 times. If the member has no coverage end date listed, then the number of rows listed for that member will be based on the number of months between the coverage start date and a date defined by the user.
To further complicate the issue, there are instances in the data where a member appears more than once due to changes in various coverage options, so the total number of rows for a particular member will need to represent all months in which their coverage was in effect.
Any ideas/suggestions as to how to accomplish all of this?
The answer depends on your needs going forward. Changing the data to a per-month format and maintaining it that way is different from continuing to maintain the original data while also needing to convert it to this new format. Consider that carefully when you choose how to do this, or you might make things more difficult for yourself later.
In order to convert the data you currently have (which I assume is one line per member) to the data you want (one line per month per member), I would write a VBA script that loops through the entire first sheet and performs the steps to get it into the format I want. If you are not VBA savvy, you can do this with filters. The steps for manual filtering would be:
Find the earliest month you have (Let's say for this example that is January 2000), and filter your data for that month only.
Copy the filtered lines only to a new sheet. This is your January 2000 data.
At this point, you can either modify the lines you just copied so they indicate they are January 2000, or name the page January 2000, and keep each month on its own page.
Do the same for the next month. Continue until you have all of your data sorted by month.
At this point, you can modify and recombine the data, or leave it in separate sheets.
For a VBA solution, I would go line by line through the original data and copy each line to every location it belongs in, based on its start/end dates. If using separate sheets, I would pre-create all the sheets I need.
Good luck.
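The line-by-line approach described above can also be sketched in Python instead of VBA, assuming each spreadsheet row has been reduced to a hypothetical (member_id, start, end) tuple, with end set to None when no coverage end date is listed. Collecting member-months in a set is one way to handle members who appear on several rows: overlapping coverage is counted only once. This drops the other data fields, so if each coverage option needs its own rows, a list would be kept instead of a set.

```python
from datetime import date


def month_index(d: date) -> int:
    # Months since year 0, so consecutive months differ by exactly 1.
    return d.year * 12 + (d.month - 1)


def monthly_rows(records, default_end: date):
    """Expand (member_id, start, end) records into one (member_id, year, month)
    row per covered month. A missing end date falls back to the user-defined
    `default_end`. Overlapping records for the same member are merged.
    """
    covered = set()
    for member, start, end in records:
        last = end if end is not None else default_end
        for m in range(month_index(start), month_index(last) + 1):
            covered.add((member, m // 12, m % 12 + 1))
    return sorted(covered)
```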

Two different numbers after an average inside Google BigQuery and Data Studio

I am averaging a number, grouped by week, inside Google Data Studio, and I am averaging the same numbers grouped by week inside BigQuery; however, the output is slightly different.
Overall Score
AVG(table.score) OVER (PARTITION BY Weeknum) as OverallScore
The datasource is a list of scores, along with a date. I am averaging this inside DS using the aggregate function within the metric, and using the Time dimension ISO Year Week.
The purpose of this is to have one set of numbers hard coded, whilst the other line is used to filter to different departments, keeping the original "overall" score present to be used as a benchmark.
Exporting my table into Excel, I can average it filtered by week 3 (see below), and it returns 19.59 as well, meaning the AVG aggregate function inside Data Studio gives the same result as Excel. I can also query the table using the query below, which rules out an averaging difference inside BigQuery. However, when I place the overall score into the graph below, I get slightly different numbers for the overall score.
SELECT avg(overallscore) FROM `dbo.table` where weeknum = '2018 3'
Output = 19.59
Does anyone have an idea what may be causing this?
When you open the report, you should be able to see the query it runs in your query history in BigQuery. Check that it's using the same formula, as it sometimes uses approximate aggregates.

Excel - Average a group of cells where the size of the group isn't fixed

I have several groups of cells, each representing a batch of goods produced to a certain 'Cutting Pattern', which is just a number of circles cut out of a piece of wood of a certain size. I want to average each Cutting Pattern's total production. I don't know how to get Excel to stop at the end of a batch and to recognise two different batches of the same template.
This image shows the layout of the patterns that are produced and how their figures are laid out:
It shows one continuous run so this would all be one batch, if the template were to change to a different size then that would be a new batch, if the template were to then change back that would also be a new batch.
This image shows the data I am working with to help generate the report.
As you can see, each sheet has its own ID in the AU column, with the smaller patterns in the AT column.
How can I get an average batch size for each pattern without having to do it all manually?
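One possible approach, sketched in Python under the assumption that the sheet has been read into a hypothetical list of (pattern_id, quantity) rows in production order: a batch is a run of consecutive rows with the same pattern, which is exactly what `itertools.groupby` yields, so a pattern that comes back later starts a new batch. Whether "batch size" means the summed quantity (as here) or the number of rows depends on the sheet layout, which is not fully visible in the question.

```python
from itertools import groupby
from statistics import mean


def average_batch_size(rows):
    """rows: (pattern_id, quantity) pairs in production order.

    Consecutive rows with the same pattern_id form one batch; each batch's
    quantities are summed, and the batch totals are averaged per pattern.
    Returns {pattern_id: mean batch total}.
    """
    batches = {}
    for pattern, run in groupby(rows, key=lambda r: r[0]):
        batches.setdefault(pattern, []).append(sum(q for _, q in run))
    return {p: mean(totals) for p, totals in batches.items()}
```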

Converting Hourly Data to Daily Data for many different Excel files

I have been downloading hourly water quality data for many different states and organizing it in Excel by year. I have prepared the data to make sure there are no zeros and that every day of the year (DOY) has 24 values, but the time series plots were too noisy, so I need daily average values instead.
Each site's annual data differs in how many days are available, and sometimes whole months are missing because nothing was recorded.
So my question is: how can I write code that gives me the average daily value for each DOY and that I can apply to many different Excel sheets? The data appears like this:
And the files are saved with names like CA1_2012 (California Site 1 hourly data from 2012).
I know there is a lot on this topic, but I have tried everything and I can't get code that works!
You can get the sum of the second column, grouped by the values in the first column, in MATLAB using accumarray:
[m,~,n] = unique(data(:,1));
sumdata = [m, accumarray(n,data(:,2))];
For the mean, I would suggest grpstats:
avgdata = grpstats(data, DOY, {'mean'});
or, as @gnovice suggested:
avgdata = accumarray(DOY, data, [], @mean);
You can also get what you want by using Pivot Table in excel and group data by DOY and get the mean value for them in the table. (No coding required).
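For readers not using MATLAB, the same group-by-DOY mean can be sketched in Python, assuming each sheet has already been read into (doy, value) pairs, for example with openpyxl (with pandas, `df.groupby("DOY")["value"].mean()` does the same job; those column names are hypothetical). Days with no readings simply do not appear in the output, so missing months need no special handling. `site_and_year` is a hypothetical helper for the CA1_2012 naming scheme.

```python
from collections import defaultdict
from pathlib import Path


def daily_means(rows):
    """rows: (doy, value) pairs of hourly readings.

    Returns {doy: mean of that day's values}, sorted by DOY, containing only
    the days that actually have data.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for doy, value in rows:
        sums[doy] += value
        counts[doy] += 1
    return {doy: sums[doy] / counts[doy] for doy in sorted(sums)}


def site_and_year(filename):
    # Hypothetical helper: "CA1_2012.xlsx" -> ("CA1", 2012).
    site, year = Path(filename).stem.split("_")
    return site, int(year)
```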

Excel time series data plot

I am trying to plot some time series data, but in a way that has stumped me so far. The salient part here is that each data point is associated with an open date and a closed date. I would like a time series line graph that counts the number open on a given date.
Example: Open - Close
first record: 2/10/2013 - 3/1/2013
second record: 2/15/2013 - 3/5/2013
The graph I'm looking for would start at 0, rise to 1 on 2/10, rise again to 2 on 2/15, then drop to 1 on 3/1 and back to 0 on 3/5.
The actual dataset contains hundreds of records, so manual processing is out of the question. I'm sure there must be an easy way to do it, but I have not found it yet. Tried help and google search, but I'm not exactly sure what I'm looking for.
Use the CountIfs() function like so:
So, you specify the category labels, and then use the COUNTIFS() function to evaluate, for each category label, how many records are open at that time.
You can use the result of the Countifs function as the frequency for a histogram, time series, bar chart, etc.
Then, plot the data in columns E & F (or however your sheet happens to be arranged) to create the chart.
Edit
To include blank values in the count, modify the formula thusly:
=COUNTIFS($B$3:$B$7,"<="&E3,$C$3:$C$7,">="&E3)+COUNTIFS($B$3:$B$7,"<="&E3,$C$3:$C$7,"")
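The same count can be sketched in Python for anyone processing the records outside Excel, assuming a hypothetical list of (open_date, close_date) pairs with None for a blank close date. Like the COUNTIFS formula above, it counts a record as still open on its close date; to get the drop on the close date itself that the question describes (down to 1 on 3/1), change `c >= day` to `c > day`.

```python
from datetime import date, timedelta


def open_counts(records, first: date, last: date):
    """records: (open_date, close_date) pairs; close_date may be None for
    records that are still open. Returns [(day, number open on that day)]
    for every day from first to last inclusive. Both the open date and the
    close date count as open, mirroring the COUNTIFS formula.
    """
    result = []
    day = first
    while day <= last:
        n = sum(1 for o, c in records if o <= day and (c is None or c >= day))
        result.append((day, n))
        day += timedelta(days=1)
    return result
```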
