making big data set smaller in excel

I built a small test machine that accidentally produced a 'big' data set:
6 columns with roughly 550,000 rows.
The end result I am looking for is a graph with 6 lines: the horizontal axis runs over the 1 - 550,000 measurements and the vertical axis shows the values in the rows (capped at 200 or so). The data is a resistance measurement that should be between 0 and 30, or very large when the sample is broken; the software writes 'inf' in those cases.
My skills are limited to Excel, so here is what I have done so far:
I imported the data into Excel. The measurements are only meaningful between 0 and 30, and 'inf' is no good for a graph, so I did: if(cell>200){200}else{keep cell value}.
Making a graph from this is time-consuming, Excel struggles with it, and the result is not good.
So I would like to average every 60 measurements to reduce the number of rows to below 10,000, using =AVERAGE(H1:H60).
But I cannot get this to work.
Questions:
How do I reduce this data set and get a good graph?
Should I switch to other software that is better suited?
FYI: I have already changed the software of the testing device to average a batch of measurements next time, but I cannot repeat this test.
Download link of data set: comma-separated file, 17 MB

I think you are on the right track; my guess is that you want one average per 60 rows and are unsure how to do this.
Using MOD(number, divisor) inside an IF statement lets you calculate the average only once every x rows.
Assuming you have one header row above your data table, you are looking for something along these lines (entered in a helper column and filled down):
=IF(MOD(ROW(A61),60) = 1,AVERAGE(H2:H61),"")
Once you have this, you can filter the average column to non-blank values and use those to create your graph.
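For anyone who does switch tools, the same cap-then-average idea can be sketched in Python. This is only an illustration under assumptions: the function names, the block size of 60 and the cap of 200 are taken from the question or made up for the example, and the readings are hypothetical rather than taken from the linked file.

```python
def cap(value, limit=200.0):
    """Clamp a reading to `limit`; infinite readings end up at the limit too."""
    return min(value, limit)


def block_averages(values, block=60, limit=200.0):
    """Average consecutive blocks of `block` readings after capping each one.

    The last block may be shorter than `block`; it is averaged over the
    readings it actually contains.
    """
    averages = []
    for start in range(0, len(values), block):
        chunk = [cap(v, limit) for v in values[start:start + block]]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical readings; float("inf") is what Python produces for the text 'inf'.
readings = [10.0] * 59 + [float("inf")] + [20.0] * 60 + [30.0] * 5
result = block_averages(readings)
print(result)
```

With 550,000 rows and a block of 60 this leaves about 9,167 points per column, which is below the 10,000 target in the question.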

Related

Formula to determine if growth is increasing/decreasing, smooth, lumpy, etc

I have some results data in around 10 columns (sample in CSV below), and I would like to run a formula or formulas per row to determine whether the trend is:
reasonably steady/predictable, or lumpy (no straight lines expected)
generally trending upwards or downwards over the period
changing trend towards the end (most recent 3 months) or continuing its trend
As the sample below shows, some rows do not have all the data, but they still need a determination of the general trend and consistency of the results.
A graph would display these easily enough, but I have thousands of rows to compare, so that is not efficient or feasible.
I've tried a few formulas such as trend, growth, stddev, avedev, but I suspect I might have to use them in combination, which is currently beyond me. I feel like using the percentage difference between neighboring cells will help standardise the results to a degree, rather than the value of each cell. If the percentages are all positive then the trend is upward, but that's the best I've been able to come up with that I'm happy gives a clear answer.
I'm using Google Sheets, but I might be able to convert an Excel formula.
Any suggestions?
May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,Jan,Feb
11.65,11.79,11.96,12.26,12.71,12.6,12.71,13.6,14.1,14.7
0.57,0.65,0.33,0.89,1.03,0.74,1.35,0.81,2.13,2.15
1.85,1.88,1.84,1.92,2.07,2.24,2.56,2.74,2.85,2.92
,,,0.66,0.72,0.78,1.33,1.43,1.47,1.52
,,,,0.64,0.6,0.56,0.55,0.3,
,8.97,8.54,10.46,11.44,8.06,7.42,7.86,7.66,7.1
2.67,1.53,1.84,2.43,2.94,3.43,4.04,7.46,6.25,9.09
Row 1 - Smooth growth
Row 2 - Lumpy growth
Row 3 - Growth
Row 4 - Growth
Row 5 - Declining
Row 6 - Lumpy. Past 3m decline.
Row 7 - Growth. Past 3m lumpy
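The percentage-difference idea from the question can be sketched in Python as a starting point. This is not a full classifier: `parse_row`, `summarise`, `net_pct` and `share_up` are names invented for the illustration, blanks are simply skipped, and values are assumed to be non-zero so the division is safe. No thresholds for "smooth" versus "lumpy" are proposed here.

```python
def parse_row(line):
    """Turn one CSV line into floats, using None for blank cells."""
    return [float(cell) if cell.strip() else None for cell in line.split(",")]


def summarise(row):
    """Summarise a row by its neighbour-to-neighbour percentage changes.

    Returns None when fewer than two values are present. Assumes no value
    is zero, since each change is divided by the previous value.
    """
    values = [v for v in row if v is not None]
    changes = [(b - a) / a * 100.0 for a, b in zip(values, values[1:])]
    if not changes:
        return None
    return {
        # Overall change from the first to the last available month.
        "net_pct": (values[-1] - values[0]) / values[0] * 100.0,
        # Fraction of month-to-month steps that went up.
        "share_up": sum(c > 0 for c in changes) / len(changes),
    }


# Rows 1, 4 and 5 from the sample in the question.
row1 = parse_row("11.65,11.79,11.96,12.26,12.71,12.6,12.71,13.6,14.1,14.7")
row4 = parse_row(",,,0.66,0.72,0.78,1.33,1.43,1.47,1.52")
row5 = parse_row(",,,,0.64,0.6,0.56,0.55,0.3,")

for row in (row1, row4, row5):
    print(summarise(row))
```

Row 1 has one small dip (12.71 to 12.6), so "all percentages positive" alone would not label it as growth; combining the share of upward steps with the net change is one way to soften that test.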

Power BI: Custom Min/Max values into Visual (Line chart)

I am working on some reporting (engineering field) and I am stuck on something "simple": how do I add custom Min/Max values so that a given visual (like the one in the picture) shows the border values, min and max, that I have defined?
I receive a data set that does not include Min/Max, so I have to add them myself (the border values are defined by us). I tried adding a column, but when I go to "Analytics" and try to put Min/Max there, I get doubled lines and it is quite messy (see the picture below). Is there an easier and better way to calculate or add them? From an additional table, or something else?
Min would be 60 °C
Max would be 100 °C; it is just the temperature range and its limits.
In the picture below, I tried both Add Column in the data and a Measure directly (60 and 100 degrees) and got the result shown. The problem is that you can't edit these lines. I wanted them to come from the Analytics part, but I can't see them there because they don't "exist".

After a bit of playing around and some suggestions from PBI pros, here are two ways of getting the same result, with a minor difference.
1) Insert a measure/column into your data set (in my case Min = 60 and Max = 100), as simple as that. Then go to Format tab > Customize series, where you can adjust your line: width, type, style...
2) Without any calculation: on the Analytics tab, add two Constant Lines, enter the values you want, and that's it.
The only difference is the data labeling: with the 2nd option the limit value is shown at one side of the line, while with the 1st option the limit value is labeled along the whole line and you can't remove it. Which one to use depends on how you want to see your data.
There may be other ways, but this one was fine for me; I haven't fully explored PBI yet.

Excel - Evaluate multiple cells in a row and create report or display showing lowest to highest

In an Excel 2003 spreadsheet, I have the top row of cells calculating the number of days and hours I have worked on something based on data I put in the cells below for each category. For example I enter the time spent on Programming, Spoken languages, house, piano, guitar...etc. The top cell in each category will keep track of and display how many days and hours I spent as I add the time spent for each category each day. I want to evaluate this top row and then list in a "report" (like a pop up box or another tab or something) in order from least amount of time to the most amount of time. This is so I can see at a glance which category is falling behind and what I need to work on. Can this be done in Excel? VBA? Or do I have to write a program from scratch in C# or Java? Thanks!
VH
Unbelievable... I've been scolded for trying to understand an answer and requested to mark this question answered. I don't see anything to do this and could not find anything that tells you how, so I'm just writing it here. MY QUESTION WAS ANSWERED... But thanks anyway...

Consider the following screenshot:
The chart data is built with formulas in cells H3:I3 and below. The formulas are
H3 =INDEX($B$3:$F$3,MATCH(SMALL($B$2:$F$2,ROW(A1)),$B$2:$F$2,0))
I3 =INDEX($B$2:$F$2,MATCH(SMALL($B$2:$F$2,ROW(A1)),$B$2:$F$2,0))
Copy down and build a horizontal bar chart from the data. If you want to reverse the sort order, use LARGE() instead of SMALL().
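For readers asking whether a separate program is needed: the ranking the formulas produce is just a sort by value, sketched here in Python. The category names and hours are hypothetical stand-ins for the sheet's top row. One thing to watch in the formula version is that MATCH returns the first exact match, so two categories with identical totals would both show the first category's name; the sort below keeps tied categories separate.

```python
# Hypothetical category totals in hours (stand-ins for the totals row).
totals = {
    "Programming": 12.5,
    "Spoken languages": 3.0,
    "House": 7.25,
    "Piano": 3.0,
    "Guitar": 9.0,
}


def rank_ascending(totals):
    """Return (category, hours) pairs ordered from least to most time spent."""
    # sorted() is stable, so tied categories keep their original order.
    return sorted(totals.items(), key=lambda item: item[1])


report = rank_ascending(totals)
for category, hours in report:
    print(f"{category}: {hours}")
```

Passing `reverse=True` to `sorted` would give the most-to-least order, the counterpart of swapping SMALL() for LARGE().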
Alternative Approach
Instead of recording your data in a matrix, consider recording in a flat table with columns for date, category and time spent. That data can then easily be evaluated in many possible ways without using any formulas at all. The screenshot below shows a pivot table and chart where the data is sorted by time spent.
Edit after inspecting file:
Swap rows 2 and 3. Then you can choose one of the approaches outlined above.
Consider entering the study time as time values. It is not immediately clear whether your entry 2.23 means 2 hrs and 23 minutes, or 2 hrs plus 0.23 of an hour, which comes to about 2 hrs 14 minutes.
If you are using the first method, then all your sums involving decimals are off. For example, the total for column B is 7.73 as you sum it. Is that meant to be 7 hrs and 73 minutes? That would really be 8 hrs and 13 minutes, no? Or is it meant to be 7.73 hours, i.e. about 7 hrs and 44 minutes? You can see how this is confusing. Use a colon to separate hrs and minutes and, hey, you get human-readable time values and don't have to convert minute values into decimals.
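The two readings of an entry such as 2.23 or 7.73 can be made concrete with a small Python sketch. The function names are invented for the illustration, entries are passed as text so the digits after the point are not disturbed by floating-point parsing, and the decimal-hours reading is rounded to the nearest minute.

```python
def as_hours_dot_minutes(entry):
    """Read '2.23' as 2 hours and 23 minutes; return total minutes."""
    hours, _, minutes = entry.partition(".")
    return int(hours) * 60 + int(minutes or 0)


def as_decimal_hours(entry):
    """Read '2.23' as 2.23 hours; return total minutes, rounded."""
    return round(float(entry) * 60)


def fmt(total_minutes):
    """Format a number of minutes as h:mm."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}"


for entry in ("2.23", "7.73"):
    print(entry,
          "hours.minutes ->", fmt(as_hours_dot_minutes(entry)),
          "| decimal hours ->", fmt(as_decimal_hours(entry)))
```

The half-hour gap between the two readings of 7.73 is the reason for storing real time values instead.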

How to generate random numbers within a normal distribution using Excel

I want to use the RAND() function in Excel to generate a random number between 0 and 1.
However, I would like 80% of the values to fall between 0 and 0.2, 90% of the values to fall between 0 and 0.3, 95% of the values to fall between 0 and 0.5, etc.
This reminds me that I took an applied statistics course once upon a time, but not of what was actually in the course...
What is the best way to achieve this result using an Excel formula? Alternatively, what is this kind of statistical calculation called, or are there any other pointers that I can Google?
=================
Use case:
I have a single column of meter readings, which I would like to duplicate 7 times (one column per month). Each column has 55,000 rows. The meter readings need to vary from month to month, but when taken as a time series, each meter number should have 7 realistic readings.
The aim is to produce realistic data to turn into heat maps (i.e. to flag outlying meter readings).

I don't think there is a formula that fits your requirements exactly. I would use a very straightforward solution:
Generate 80% of data using =RANDBETWEEN(0,20)/100
Generate 10% of data using =RANDBETWEEN(20,30)/100
Generate 5% of data using =RANDBETWEEN(30,50)/100
and so on
You can easily change the precision of generated data by modifying the parameters, for example: =RANDBETWEEN(0,2000)/10000 will generate data with up to 4 digits after decimal point.
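The bucket approach above can be sketched in Python as a weighted choice of range followed by a uniform draw inside it. Two assumptions are worth flagging: the last bucket (5% between 0.5 and 1.0) is invented to stand in for "and so on", and the draw within a bucket is continuous rather than the two-decimal values RANDBETWEEN(...)/100 would give.

```python
import random

# (probability, low, high) per bucket. The first three rows follow the
# answer; the last row is an assumption standing in for "and so on".
BUCKETS = [
    (0.80, 0.0, 0.2),
    (0.10, 0.2, 0.3),
    (0.05, 0.3, 0.5),
    (0.05, 0.5, 1.0),
]


def sample(rng):
    """Pick a bucket by its probability, then draw uniformly inside it."""
    weights = [p for p, _, _ in BUCKETS]
    _, low, high = rng.choices(BUCKETS, weights=weights, k=1)[0]
    return rng.uniform(low, high)


rng = random.Random(42)  # fixed seed so the run is repeatable
values = [sample(rng) for _ in range(100_000)]

share_below_02 = sum(v <= 0.2 for v in values) / len(values)
share_below_03 = sum(v <= 0.3 for v in values) / len(values)
share_below_05 = sum(v <= 0.5 for v in values) / len(values)
print(share_below_02, share_below_03, share_below_05)
```

The three printed shares should land close to 0.80, 0.90 and 0.95, matching the cumulative targets in the question.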
UPDATE
For the use case, use a normal distribution, for example:
=NORMINV(RAND(), 20, 5)
where 20 is the mean and 5 is the standard deviation.
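The NORMINV(RAND(), mean, sd) pattern is inverse-CDF sampling: a uniform random number is pushed through the inverse of the normal cumulative distribution. A minimal Python sketch of the same technique follows, using the standard library's `statistics.NormalDist`; the function name and the seed are assumptions for the example, and the mean of 20 and standard deviation of 5 are the answer's illustrative values.

```python
import random
from statistics import NormalDist, stdev


def norminv_sample(rng, mean=20.0, sd=5.0):
    """Draw one value by passing a uniform number through the inverse normal CDF."""
    p = rng.random()
    while p == 0.0:  # inv_cdf requires 0 < p < 1; random() can return 0.0
        p = rng.random()
    return NormalDist(mean, sd).inv_cdf(p)


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [norminv_sample(rng) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)
sample_sd = stdev(draws)
print(sample_mean, sample_sd)
```

Note that a normal distribution is unbounded, so some draws can fall below zero when the mean is small relative to the standard deviation; for meter readings that may need clamping.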

Inserting dummy lines for no observation

I have an Excel file for which every line is an observation of a range of species. Observations are made by 315 different cameras (sample points), each of which was set to collect data for a range of 5-38 days (I have the number of survey days for each of the points recorded).
I need to get an average number of observations by effort:
(number of cameras set * number of days set)
I can get this average number easily, but in order to run an ANOVA on the average number of observations by effort for each species observed I need all of the values, including the days where no species was observed.
I tried PivotTables to get the number of observations for each species by camera and survey day. The problem is, on days when no species was observed, there is no entry for the day. I thought of fixing this by adding dummy lines with 0s for all of these days, but with 315 points this will take a lot of time and have a high chance of error.
Any ideas of a better way to do this?

If you right click the PivotTable, and then click "PivotTable options", there is a box that says "For empty cells show:" and you can select to put 0 in to the cells with no value recorded.
(I think this is what you're asking. I may be totally off base, though.)

In the field list, go to the pivot field under Columns (days, in your case), click the down arrow on the field > Field Settings > Layout & Print tab > tick "Show items with no data".
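If the zero rows are needed as actual data for the ANOVA rather than only as a pivot display, they can be generated instead of typed. The sketch below is one way to do it in Python under stated assumptions: the camera names, species and counts are hypothetical, `fill_zero_days` is a name made up for the example, and days are assumed to be numbered 1..n per camera.

```python
from collections import Counter


def fill_zero_days(observations, survey_days, species):
    """Build {(camera, day, species): count}, with 0 where nothing was seen.

    observations: iterable of (camera, day, species), one per sighting
    survey_days:  dict mapping camera -> number of days it was deployed
    species:      list of species to include for every camera-day
    """
    counts = Counter(observations)
    table = {}
    for camera, n_days in survey_days.items():
        for day in range(1, n_days + 1):
            for sp in species:
                table[(camera, day, sp)] = counts.get((camera, day, sp), 0)
    return table


# Hypothetical example: two cameras with different deployment lengths.
survey_days = {"cam01": 3, "cam02": 2}
observations = [
    ("cam01", 1, "fox"),
    ("cam01", 1, "fox"),
    ("cam02", 2, "deer"),
]
table = fill_zero_days(observations, survey_days, ["fox", "deer"])
for key in sorted(table):
    print(key, table[key])
```

Because each camera contributes only its own number of survey days, the total effort (camera-days) is preserved in the filled table.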
