Grouping Excel Values with certain criteria - excel

I have rainfall readings from a gauge. We want to count the number of storms, where a storm is defined as a rain pulse separated from the previous one by more than 1 hour.
Basically, in the data example, anything that is yellow is one storm, the orange is a new storm, then the next yellow is another storm.
Any idea how to do this in excel?
Thanks!

It appears that this will work.
=COUNTIF(B2:B13,">="&"1:00:00")
You may need to change the last part ("1:00:00") depending on the data type you have assigned to the "Time D..." column. Alternatively, change this column to display whole minutes (e.g. 15, 30, 90, 930, etc.), in which case the criterion becomes ">="&60. Note that ">=" also counts a gap of exactly one hour; use ">" if the gap must be strictly longer than 1 hour.
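For comparison, the same grouping rule can be sketched outside Excel. This is a minimal Python sketch, not the formula above: it assumes each reading has a full timestamp, treats the first reading as the start of a storm, and starts a new storm whenever the gap to the previous reading is strictly more than 1 hour. The function name `count_storms` and the sample readings are hypothetical.

```python
from datetime import datetime, timedelta


def count_storms(timestamps, gap=timedelta(hours=1)):
    """Count groups of readings separated by more than `gap`."""
    times = sorted(timestamps)
    if not times:
        return 0
    storms = 1  # the first reading always opens a storm
    for previous, current in zip(times, times[1:]):
        if current - previous > gap:
            storms += 1  # a long dry spell starts a new storm
    return storms


# Hypothetical readings: three pulses, then a 1.5 h gap, then a 2.75 h gap.
readings = [
    datetime(2020, 1, 1, h, m)
    for h, m in [(0, 0), (0, 15), (0, 30), (2, 0), (2, 15), (5, 0)]
]
print(count_storms(readings))
```

With a COUNTIF over the time differences, the count of long gaps is one less than this number unless the first row is also counted as a gap.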

Related

Which is the best Data Mining model to extrapolate known values to missing values in a table? (General question)

I am working on a little data mining project (I am still a Data Science student, not a professional). Maybe you can help me to choose a proper model for my task.
So, let's say we have a table with three columns and around 4000 rows:
YEAR  COLOR   NAME
1900  Green   David
1901  Yellow  Sarah
1902  Green   ???
1902  Red     Sarah
…     …       …
2020  Purple  John
Any value for any field can be repeated in the dataset (also Year values).
The first two columns have no missing values, but only around 20% of the Name values in the third column are known. The Name value depends somewhat on the first two columns (not a causal relation).
My goal is to extrapolate the available Name values to the whole table and get a range of occurrences for each Name value (for example in a boxplot).
I have imagined a process like the following, although I am not sure whether it makes sense statistically (any objections and suggestions are appreciated):
For every unknown NAME value, the algorithm randomly chooses one of the already known NAME values. The odds of a particular NAME value being chosen depend on the variables YEAR and COLOR. For instance, if 'David' values tend to be correlated with low Year values AND with 'Green' or 'Purple' values for Color, the algorithm gives 'David' a higher probability of being chosen if the input values for Year and Color are "1900, Purple".
When the above process ends, the number of occurrences of each name is counted.
The above process is applied 30 times and the results for each name are displayed in a boxplot.
However, I don't know which is the best model to implement an idea similar to this. I have drawn the process in a simple paint drawing:
Possible output for the task
What do you think would be a good approach to this task? I appreciate any help.
I think you have the process down; converting the data may be the first hurdle.
I would look at using from sklearn.preprocessing import OrdinalEncoder to convert the data from categorical to numeric.
You could then use a random number generator to produce a number within the range defined by the encoding, which would randomly select a name.
Loop through this 30 times with a for loop to achieve the result.
It also looks like you will need to provide the ranking values for year and colour before building out your code. From there you would just provide bands (for example, if year > 1985, etc.) within your for loop to specify the names.
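The process described in the question can be sketched as follows. This is only one possible reading of it, under loud assumptions: the weighting scheme (closeness in year, with a bonus for a matching colour), the parameters `year_scale` and `color_bonus`, all function names, and the sample rows are hypothetical and not taken from the question or the answer. It uses only the standard library rather than scikit-learn.

```python
import random
from collections import Counter


def name_weights(year, color, known, year_scale=20.0, color_bonus=3.0):
    """Score each known name by how close its rows are to (year, color)."""
    weights = Counter()
    for k_year, k_color, k_name in known:
        w = 1.0 / (1.0 + abs(year - k_year) / year_scale)  # nearer years weigh more
        if k_color == color:
            w *= color_bonus  # assumed bonus for a matching colour
        weights[k_name] += w
    return weights


def impute_once(rows, rng):
    """Fill every missing name by weighted random choice, then count names."""
    known = [r for r in rows if r[2] is not None]
    counts = Counter(name for _, _, name in known)
    for year, color, name in rows:
        if name is None:
            weights = name_weights(year, color, known)
            names = list(weights)
            pick = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
            counts[pick] += 1
    return counts


def simulate(rows, runs=30, seed=0):
    """Repeat the imputation `runs` times; one Counter of names per run."""
    rng = random.Random(seed)
    return [impute_once(rows, rng) for _ in range(runs)]


# Hypothetical rows modelled on the table in the question (None = unknown name).
rows = [
    (1900, "Green", "David"),
    (1901, "Yellow", "Sarah"),
    (1902, "Green", None),
    (1902, "Red", "Sarah"),
    (2020, "Purple", "John"),
]
results = simulate(rows)
```

The list of per-run counts for each name is what would feed a boxplot, one box per name.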

Excel - Plot time slots on a continuous time axis

Let's say we have documented the time slots in which a production line was running. Between the manufactured products are time slots in which the machine was idling.
I now want to plot the machine status over time, basically as a boolean value (running vs idling).
I get the machine log and need the chart on the right.
The machining duration will ultimately be logged including seconds and may vary for each product.
The first, and probably biggest, challenge for me is to find a smart way to extract the status from the time stamps. My current first step is to create a table row for each minute and use the IF statement in H4 to check whether article 1 was being manufactured.
IF(AND([@Time]>Machine_log[@Start],[@Time]<Machine_log[@Finish]);;)
However, since the final list will cover 24 hours or more and the number of articles quickly reaches 50 or more, I would love to avoid nested IFs here.
I'm thankful for any input and open to inspiration :)
Thank you all in advance!
PS: Does anyone know a better way than a scatter chart with two values per x-value to display the chart as vertical lines/right angles like this?
One option is to add only those points that are necessary to the Status extraction table (which I named "Status"). (I named the Machine log table "Log").
Note: it looks like you are using a semicolon list separator, so you'll need to change the commas in the formulas below to semicolons.
Formula for the Time column:
=IF(ROW()=ROW(Status),MIN(Log[Start])-1/144,IFERROR(INDEX(Log[[Start]:[Finish]],INT((ROW()-ROW(Status)-1)/4)+1,MOD(INT((ROW()-ROW(Status)-1)/2),2)+1),MAX(Log[Finish])+1/144))
Formula for the Production running? column (enter into H4 and fill down):
=IF(SUMPRODUCT(--(Log[[Start]:[Finish]]=[@Time])),IF([@Time]=G3,3-H3,H3),1)
These formulas will pad your plot with 10 minutes of off time on either side.
To answer your question about avoiding two points for each x-value: no, each point on the plot has to have a corresponding data pair.
UPDATE IN RESPONSE TO COMMENT: I failed to mention that the above solution assumes the time data in the Machine log table are in ascending order. This means that if your data span more than one day, they will need to contain a date component or you can get plots where the line crosses back to the beginning. For example, if you have 23:57:00 followed by 00:10:00 with no date component, Excel treats these as 11:57 pm on 1 January 1900 and 12:10 am on 1 January 1900. (To see this, change the format to "General", and you'll see the values that Excel uses to encode date-time aren't in ascending order.) The solution is to enter the dates as "8/16/2020 23:57:00" and "8/17/2020 00:10:00" in the formula bar. If you're copying over from another data source, the date needs to be copied with the time. If the dates and times are in separate columns, your Start and Finish columns would each be a date column plus a time column.
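The idea of adding only the necessary points (two at each start and two at each finish, so the line rises and falls at right angles) can be sketched in Python. This is an illustration, not a translation of the formulas above: it uses 0 for idle and 1 for running, the function name `status_points` and the sample log are hypothetical, and it assumes the log entries carry full date-times and do not overlap.

```python
from datetime import datetime, timedelta


def status_points(log, pad=timedelta(minutes=10)):
    """Turn (start, finish) intervals into (time, status) points for a step plot."""
    log = sorted(log)  # ascending order, as the answer requires
    points = [(log[0][0] - pad, 0)]  # idle padding before the first start
    for start, finish in log:
        # Two points per edge give the vertical rise and fall.
        points += [(start, 0), (start, 1), (finish, 1), (finish, 0)]
    points.append((max(f for _, f in log) + pad, 0))  # idle padding at the end
    return points


# Hypothetical log that crosses midnight; the date component keeps it ordered.
log = [
    (datetime(2020, 8, 16, 23, 40), datetime(2020, 8, 16, 23, 57)),
    (datetime(2020, 8, 17, 0, 10), datetime(2020, 8, 17, 0, 25)),
]
points = status_points(log)
```

Plotting these points as a scatter chart with straight connecting lines gives the same right-angled shape.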

Distribution of time values randomly in a table Excel - Modeling Power Grid

I am working on a model of charging load of electric vehicle. I am attaching a link to an excel workbook for your better understanding.
Column B contains random time values
Columns G to P represent houses, and each house can have 1 car. So each time value needs to be assigned to one column. When a car is plugged in, its load stays constant for 3 cells.
I want Excel to distribute these cars randomly, e.g. 4 cars to 4 houses, and leave the others blank.
What I can think of is to assign each time a random house, then use an IF formula with the AND function: one condition matches the random times against the time series, and the second matches the random houses against columns 1-10.
The problem I am facing is that the formula gives a value error and only works in the rows that have a randomly generated time in front of them (screenshot). I know there is a very small thing that I am missing. Please help me find it.
Regards
workbook
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(AND($F6>=INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),$F6<=INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,30,0)),11,""))
The two elements in the AND find the house number in column C and return the corresponding time in column B.
The first element compares the time in F to that time. The second element compares the time + 30 minutes to F (three cells). If it's between those two times, it gets an 11.
The ISNA makes sure that the house in question is on the list. You could also use an IFERROR, but I prefer the precision of ISNA.
Update
If you want the values to wrap around, you need to OR compare to the next day.
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(OR(AND(ROUND($F6,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,30,0),5)),AND(ROUND($F6+1,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6+1,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,30,0),5))),11,""))
That formula structure looks like
=If(isna(),"",if(or(and(today,today),and(tomorrow,tomorrow)),11,""))
This formula is already getting too big. If you triple it for your three wattages, it will be huge. You should consider writing a UDF in VBA. It won't be as quick to calculate, but it will probably be more maintainable.
If you want to stick with a formula, you could put the wattage in row 4 above the house number. Then in another table, list the wattages and minutes to charge. So in, say, B12:C14 you have
3.7 120
11 30
22 15
Now where you have 11 in your formula, you'd have G$4, and in the two places where you have TIME(0,30,0), you'd have TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0). I rearranged some things to make it more 'readable' (but it's still pretty tough), and here's the final formula:
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(OR(AND(ROUND($F6,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0),5)),AND(ROUND($F6+1,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6+1,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0),5))),G$4,""))
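The logic of the final formula may be easier to follow as a small function. This is a sketch in Python rather than the suggested VBA UDF, and it makes assumptions: times are Excel-style fractions of a day, the lookup table mirrors the wattage/minutes list above, and the names `CHARGE_MINUTES` and `load_at` are hypothetical.

```python
# Wattage -> minutes to charge, as in the lookup table from the answer.
CHARGE_MINUTES = {3.7: 120, 11: 30, 22: 15}


def load_at(slot, plug_in, power):
    """Return `power` if the time slot falls inside the charging window, else None.

    `slot` and `plug_in` are fractions of a day (0.5 = 12:00), like Excel times.
    """
    end = plug_in + CHARGE_MINUTES[power] / (24 * 60)
    # Check the slot today and the same slot tomorrow, so that a window
    # starting late in the evening wraps around midnight.
    for t in (slot, slot + 1):
        # Rounding to 5 places avoids floating-point mismatches, as in the formula.
        if round(plug_in, 5) <= round(t, 5) <= round(end, 5):
            return power
    return None


plug_in = (23 * 60 + 45) / (24 * 60)  # 23:45
print(load_at(0.0, plug_in, 11))      # the 00:00 slot is inside the wrapped window
print(load_at(0.25, plug_in, 11))     # the 06:00 slot is not
```

Keeping the wattage table in one dictionary plays the same role as the B12:C14 range: the charging duration is looked up rather than hard-coded.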

Spotfire: calculating difference between two rows in same column based on attributes in different columns

I am new to Spotfire and need help in getting the right expression for a calculated column.
My Data contains different subjects grouped in column ID. For every ID, Bodyweight was measured on different days. Days are given in column Day and stated as 1,2,3...
The last day is denoted by Last and Bodyweight measurements given in another column. Another column is present which is called Baseline. The Body weight measured is considered as baseline if the column contains a Y for that row.
I need to insert a calculated column, which will contain the difference between Body measurement measured on Day denoted Last and Body measurement marked by Y in column Baseline.
This should be done for every new ID. I am not able to figure this out. Could someone advise me on how to go about it?
Here is an example attached
So, the calculated column for Rita will give -4 (body weight at Last = 52 and body weight at baseline = 56, so 52 - 56 = -4).
the sample data you provided is a little weird, particularly the [Day] column. if it's within your control, I suggest to use actual dates rather than a number/string here.
barring that, I was able to get your desired results, but it required two calculated columns: the first one will consolidate the [Day] and [Baseline] columns into a single column, and the second one contains your desired info.
column 1, which I called Day (int):
CASE
WHEN [Day]="Last" THEN 1000000
WHEN [Baseline]="Y" THEN -1000000
WHEN [Day]!="Last" THEN Integer([Day])
END
I picked a random high and low max to establish a chronological order. this will put 1000000 in place of "Last" (if you have any programs that are longer than one million days, you'll need to increase this number). the same for the [Baseline] column, but that value will be -1000000, which is presumably the lowest value you will ever see in this column. both of these are assumptions and may not work for your implementation. finally, in all other cases, the day number will be used.
column 2, which I called Diff:
Last([Weight]) OVER (Intersect([Name],LastNode([Day (int)]))) -
First([Weight]) OVER (Intersect([Name],FirstNode([Day (int)])))
the first line uses what's called an OVER expression to retrieve the last value for [Weight], ordered by [Day (int)], per [Name]. the second line gives the reverse of that (the first value, i.e. the baseline), and so the difference is calculated as -4 (or whatever is appropriate).
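To check the expected numbers outside Spotfire, the same "Last minus baseline, per subject" calculation can be sketched in Python. The function name `last_minus_baseline` and the sample rows are hypothetical; only Rita's two weights (56 at baseline, 52 at Last) come from the question, and the sketch assumes each subject has exactly one baseline row and one Last row.

```python
def last_minus_baseline(rows):
    """Map each name to (weight on day "Last") - (weight flagged as baseline)."""
    baseline, last = {}, {}
    for name, day, is_baseline, weight in rows:
        if is_baseline == "Y":
            baseline[name] = weight
        if day == "Last":
            last[name] = weight
    # Only subjects that have both measurements get a difference.
    return {name: last[name] - baseline[name] for name in last if name in baseline}


# Hypothetical rows: (name, day, baseline flag, weight).
rows = [
    ("Rita", "1", "Y", 56),
    ("Rita", "2", "", 55),
    ("Rita", "Last", "", 52),
]
print(last_minus_baseline(rows))
```

Unlike the two calculated columns, this does not depend on sentinel day numbers such as 1000000, because it looks up the flagged rows directly.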

IF function on excel with some cell operations

I'm a newbie to Excel/spreadsheets and I want to do this:
I have a table with one column that contains intervals of "Steeping times", like 7-15 days (I can change the format).
And I have another column with "Recipe Created" (12/02/2017).
What I want is to check whether "Current Date/Local date - Recipe created" is within the interval, below it, or above it.
Example:
Steeping time = 7-15
Current Date 18/07/2017 (spanish format) - Recipe created 10/07/2017 = 8
8 is between Steeping time? Yes.
If it is, then paint a cell yellow; if it's less than 7, paint it red; and if it's more than 15, paint it green.
Maybe it's not an IF function, but I would do this with an if in JavaScript.
Thanks in advance.
In Google sheets at least, use Format, Conditional formatting. Set the default to be yellow. Add a rule that if the value is less than 7, red. Add a rule that if the value is greater than 15, green.
You can probably find lots of information on calculating date differences all over stack overflow. The date (only, no time) now is:
=date(year(now()),month(now()),day(now()))
To parse your date which was in A2 I did (and there may be easier or more robust ways)
=left(A2,2) for the day,
=mid(A2,4,2) for the month, and
=right(A2,4) for the year,
then made a date of those as above. Then I put the difference of the 2 dates in a cell and applied conditional formatting based on its contents.
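The whole check (parse the date, take the difference in days, compare it with the interval) can be sketched in Python to make the logic explicit. This is an illustration only, assuming dates are text in dd/mm/yyyy format and steeping times are text like "7-15"; the function name `steeping_colour` is hypothetical.

```python
from datetime import date, datetime


def steeping_colour(created, steeping, today):
    """Return the cell colour for a recipe, given its steeping interval in days."""
    low, high = (int(part) for part in steeping.split("-"))
    created_date = datetime.strptime(created, "%d/%m/%Y").date()
    elapsed = (today - created_date).days
    if elapsed < low:
        return "red"     # not steeped long enough yet
    if elapsed > high:
        return "green"   # past the end of the interval
    return "yellow"      # inside the interval (bounds included)


# The example from the question: 18/07/2017 - 10/07/2017 = 8 days, interval 7-15.
print(steeping_colour("10/07/2017", "7-15", date(2017, 7, 18)))
```

The three branches correspond to the default colour and the two conditional formatting rules described above.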
