I have a Node.js backend with MSSQL as the database. I do some data processing and store logs in a database table which shows which entity is at which stage of the process, e.g. (there are three stages each entity has to pass):
+----+-----------+-------+---------------------+
| id | entity_id | stage | timestamp           |
+----+-----------+-------+---------------------+
| 1  | 1         | 1     | 2019-01-01 12:12:01 |
| 2  | 1         | 2     | 2019-01-01 12:12:10 |
| 3  | 1         | 3     | 2019-01-01 12:12:15 |
| 4  | 2         | 1     | 2019-01-01 12:14:01 |
| 5  | 2         | 2     | 2019-01-01 12:14:10 | <--
| 6  | 3         | 1     | 2019-01-01 12:24:01 |
+----+-----------+-------+---------------------+
As you can see in the row with the arrow, entity no. 2 did not reach stage 3. After a certain amount of time (say 120 seconds), such an entity should be considered faulty and reported somehow.
What would be a good approach in Node.js to check the table for those timed-out entities? Some kind of cron job which checks all rows every x seconds? That sounds rather clumsy to me.
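For reference, the polling approach I have in mind would look roughly like this (sketched in Python with pyodbc for brevity; the table name progress_log and the connection string are placeholders, and the same query would work from Node's mssql package):

```python
import time
import pyodbc

# Entities whose latest stage is not the final one (3) and whose last
# log entry is older than the 120-second timeout.
QUERY = """
SELECT entity_id, MAX(stage) AS last_stage, MAX([timestamp]) AS last_seen
FROM progress_log
GROUP BY entity_id
HAVING MAX(stage) < 3
   AND MAX([timestamp]) < DATEADD(second, -120, SYSDATETIME())
"""

def report_stalled(conn):
    for entity_id, last_stage, last_seen in conn.execute(QUERY):
        print(f"entity {entity_id} stuck at stage {last_stage} since {last_seen}")

if __name__ == "__main__":
    conn = pyodbc.connect("DSN=mydb")  # placeholder connection string
    while True:
        report_stalled(conn)
        time.sleep(30)  # poll interval; this is the part that feels clumsy
```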
I am looking forward to your ideas!
I am looking for a start in the right direction, and I hope someone on this forum has run into this issue. I have an Excel table with 24K jobs on it and a technician assigned to every job. These technicians have 40 weeks to complete all the jobs assigned. I have a helper table with each technician's id and how many jobs per week they need to complete to finish all the work. I have sorted the jobs by geographic area for efficiency. I need a formula that will look at the technician id and, if they are receiving 3 jobs per week, number the first 3 with a 1, the next 3 with a 2, and so on. When it switches to another technician, it should reset the counter.
In the example below Tech 1 is assigned 3 jobs per week, and Tech 2 has 2 jobs per week.
| JobID | Tech | Grouping |
|-------|--------|----------|
| BK025 | Tech 1 | 1 |
| CD044 | Tech 1 | 1 |
| DE024 | Tech 1 | 1 |
| DE031 | Tech 1 | 2 |
| DE035 | Tech 1 | 2 |
| FT083 | Tech 1 | 2 |
| IR004 | Tech 2 | 1 |
| IR006 | Tech 2 | 1 |
| IR052 | Tech 2 | 2 |
| IR061 | Tech 2 | 2 |
| IR062 | Tech 2 | 3 |
| IR072 | Tech 2 | 3 |
I have been searching SO and Google looking for an answer but may not be using the correct keywords. I have found this formula =ROUNDUP((ROW()-offset)/repeat,0) -- found on exceljet -- which will work, but in order to get it to work properly I would have to filter to each tech individually.
Assuming your helper table has the tech IDs in E1:E3 and their jobs-per-week next to them in F1:F3, you could use an approach like the following:
=ROUNDUP(COUNTIF(B$2:B2,B2)/VLOOKUP(B2,$E$1:$F$3,2,0),0)
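COUNTIF(B$2:B2,B2) gives a running count of how many rows of the current tech have appeared so far. Dividing by that tech's jobs-per-week (looked up from the helper table) and rounding up yields the week number, and because the count restarts for each tech, the grouping resets automatically when the technician changes. For example, on Tech 1's fourth row the running count is 4, and ROUNDUP(4/3,0) = 2.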
I've created Pivot Tables before using VBA, but my professor recently gave us a bonus task that, although not required, is driving me nuts:
Use a VBA Macro to write Region, District, and Store Name to your first report to create a new report
1) My first report looks like this:
Location | Sum of ActNetSales | Sum of PlanNetSales
---------|--------------------|--------------------
1        | $76,170            | $65,172
100      | $163,691           | $140,057
101      | $34,724            | $29,710
104      | $70,501            | $60,322
106      | $113,826           | $97,391
2) Below is the data source for the above report.
Division | Year | Week | Location | SchedDept | PlanNetSales | ActNetSales | AreaCategory
---------|------|------|----------|-----------|--------------|-------------|-------------
5        | 2018 | 10   | 520      | 541       | 1943.2       | 2271.115    | Non-Comm
5        | 2018 | 10   | 520      | 608       | 4378.4       | 5117.255    | Non-Comm
5        | 2018 | 10   | 520      | 1059      | 1044.8       | 1221.11     | Comm
5        | 2018 | 10   | 520      | 1126      | 6308         | 7372.475    | Non-Comm
3) My professor wants me to add the following information to the above table: Region, District and Store Name. However, these 3 fields come from a different data source than the above report. Below is the data source for the 3 fields I've listed.
Division | Location | LocationName | Region | RegionName | District | DistrictName
---------|----------|--------------|--------|------------|----------|-------------
5        | 1        | Location 1   | 3      | Region 3   | 18       | District 18
5        | 4        | Location 4   | 5      | Region 5   | 32       | District 32
5        | 5        | Location 5   | 3      | Region 3   | 19       | District 19
5        | 6        | Location 6   | 5      | Region 5   | 28       | District 28
I've already built what he's asking for by joining the 2 tables (I created a unique key by concatenating the foreign keys, Location and Division, and used a basic INDEX/MATCH), then creating a Pivot Table from that, but I want to try my best to solve the bonus with VBA! Unfortunately, I don't have Power Query, so I had to do it this way. I've tried searching for the above and can't find any good resources. Is there anything you can suggest, or can you point me in the right direction? Thank you!
Is it cheating to modify your table under (2) to add the columns Region, District, and Store Name using VLOOKUP on the third table? The second table would then hold the raw data plus extra columns of constructed data, effectively joining it to the third table using the Excel VLOOKUP trick rather than an actual SQL table join.
Then you can just use the expanded, joined table as your one Pivot Table source.
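For example (the exact ranges here are assumptions): add a helper key column A on the lookup sheet containing =B2&"-"&C2 (Division and Location concatenated), so Region lands in column E of that sheet; then the Region column you add to table (2) could be filled with something like

=VLOOKUP($A2&"-"&$D2,Lookup!$A:$H,5,FALSE)

with District and Store Name done the same way using different column indexes.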
Cheating is legal in love, war, and IT solutions.
(My question is language-agnostic, I think, but I'm using PySpark if it matters.)
Situation
I currently have two Spark DataFrames:
One with per-minute data (1440 rows per person and day) containing each person's heart rate:
| Person | date | time | heartrate |
|--------+------------+-------+-----------|
| 1 | 2018-01-01 | 00:00 | 70 |
| 1 | 2018-01-01 | 00:01 | 72 |
| ... | ... | ... | ... |
| 4 | 2018-10-03 | 11:32 | 123 |
| ... | ... | ... | ... |
And another DataFrame with daily metadata (1 row per person and day), including the results of a clustering of days, i.e. which cluster day X of person Y fell into:
| Person | date | cluster | max_heartrate |
|--------+------------+---------+----------------|
| 1 | 2018-01-01 | 1 | 180 |
| 1 | 2018-01-02 | 4 | 166 |
| ... | ... | ... | ... |
| 4 | 2018-10-03 | 1 | 147 |
| ... | ... | ... | ... |
(Note that clustering is done separately per person, so cluster 1 for person 1 has nothing to do with person 2's cluster 1.)
Goal
I now want to compute, say, the mean heart rate per cluster and per person, that is, each person gets different means. If I have three clusters, I am looking for this DF:
| Person | cluster | mean_heartrate |
|--------+---------+----------------|
| 1 | 1 | 123 |
| 1 | 2 | 89 |
| 1 | 3 | 81 |
| 2 | 1 | 80 |
| ... | ... | ... |
How do I best do this? Conceptually, I want to group these two DataFrames per person and send two DF chunks into an apply function. In there (i.e. per person), I'd group and aggregate the daily DF per day, then join the daily DF's cluster IDs, then compute the per-cluster mean values.
But grouping/applying multiple DFs doesn't work, right?
Ideas
I have two ideas and am not sure which, if any, make sense:
Join the daily DF to the per-minute DF before grouping, which would result in highly redundant data (i.e. the cluster ID replicated for each minute). In my "real" application, I will probably have per-person data too (e.g. height/weight), which would then be a completely constant column, i.e. even more memory wasted. Maybe that's the only/best/accepted way to do it?
Before applying, transform the DF into a DF that can hold complex structures, e.g. like:
| Person | dataframe | key | column | value |
|--------+------------+------------------+-----------+-------|
| 1 | heartrates | 2018-01-01 00:00 | heartrate | 70 |
| 1 | heartrates | 2018-01-01 00:01 | heartrate | 72 |
| ... | ... | ... | ... | ... |
| 1 | clusters | 2018-01-01 | cluster | 1 |
| ... | ... | ... | ... | ... |
or maybe even
| Person | JSON |
|--------+--------|
| 1 | { ...} |
| 2 | { ...} |
| ... | ... |
What's the best practice here?
But grouping/applying multiple DFs doesn't work, right?
No, AFAIK this does not work, neither in PySpark nor in pandas.
Join the daily DF to the per-minute DF before grouping...
This is the way to go in my opinion. You don't need to merge all the redundant columns, only those required for your groupby operation. There is no way to avoid redundancy in the groupby columns, as they are needed for the groupby operation.
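A minimal sketch of that approach, using the DataFrame and column names from the question (the tiny inline rows are just stand-in data):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Stand-ins for the two DataFrames from the question.
minute_df = spark.createDataFrame(
    [(1, "2018-01-01", "00:00", 70), (1, "2018-01-01", "00:01", 72),
     (1, "2018-01-02", "00:00", 65)],
    ["Person", "date", "time", "heartrate"],
)
daily_df = spark.createDataFrame(
    [(1, "2018-01-01", 1, 180), (1, "2018-01-02", 4, 166)],
    ["Person", "date", "cluster", "max_heartrate"],
)

# Join in only the column needed for the groupby (cluster), then aggregate;
# per-person constants can stay in their own DataFrame until needed.
result = (
    minute_df
    .join(daily_df.select("Person", "date", "cluster"), ["Person", "date"])
    .groupBy("Person", "cluster")
    .agg(F.avg("heartrate").alias("mean_heartrate"))
)
result.show()
```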
In pandas, it is possible to provide an extra groupby column as a pandas Series, but it is required to have the exact same length as the DataFrame being grouped. However, in order to create that groupby column, you would need a merge anyway.
Before applying, transform the DF into a DF that can hold complex structures
Performance- and memory-wise, I would not go with this solution unless you have multiple groupby operations that would benefit from more complex data structures. In fact, you would need to put in some effort just to create the data structure in the first place.
Apologies for the bad title; I'm struggling to describe exactly what I'm trying to do (which has also made it difficult to search for an existing answer).
I have a series of data with the columns "Asset", "Time", and a "State" that is a calculated column that changes based on several other values. The data is sourced from a constant but non-regular stream (though in the table below I have created a sample of regular timestamps).
The table below shows the source data ("Asset", "Time" and "State"), as well as an intended calculated column "Event" that flags each time an event starts. I do not simply want to count the number of times an "Asset" has a Bad "State", but to identify each time an "Asset" changes from a "State" of Good to a "State" of Bad (note that in the real data there is a large number of different states, but the pattern I'm trying to identify is consistent).
+-------+----------+-------+-------+
| Asset | Time | State | Event |
+-------+----------+-------+-------+
| 1 | 12:00:00 | Good | 0 |
| 2 | 12:00:00 | Good | 0 |
| 1 | 12:00:01 | Good | 0 |
| 2 | 12:00:01 | Good | 0 |
| 1 | 12:00:02 | Bad | 1 |
| 2 | 12:00:02 | Good | 0 |
| 2 | 12:00:03 | Good | 0 |
| 2 | 12:00:03 | Good | 0 |
| 1 | 12:00:04 | Bad | 0 |
| 1 | 12:00:04 | Good | 0 |
| 2 | 12:00:05 | Good | 0 |
| 2 | 12:00:05 | Bad | 1 |
| 2 | 12:00:06 | Bad | 0 |
| 1 | 12:00:06 | Good | 0 |
| 2 | 12:00:07 | Bad | 0 |
| 2 | 12:00:07 | Good | 0 |
| 2 | 12:00:08 | Good | 0 |
| 1 | 12:00:08 | Bad | 1 |
| 2 | 12:00:09 | Good | 0 |
| 1 | 12:00:09 | Bad | 0 |
| 2 | 12:00:10 | Good | 0 |
| 1 | 12:00:10 | Good | 0 |
+-------+----------+-------+-------+
I intend to create a chart for this data to show how many times an event occurs for a particular asset per day. I figured the easiest way to do that is to have a column that records a 1 when an event starts, and then sum that column over the dates to create the visualisation.
My idea was a calculated column that picks up when the "State" is Bad, then checks whether the previous "State" for that "Asset" was Good, and sets a 1 if so (see the sketch after the next paragraph). So far I have been unable to find a way to reference the value of the previous row for a particular "Asset".
Note that there are roughly 100 (or more) individual "Asset" values, so creating an individual calculated column to track each one would not be feasible. I am also using Spotfire 7.1.
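To make the intended logic concrete, here it is sketched in pandas (purely illustrative; what I need is the equivalent as a Spotfire 7.1 calculated column):

```python
import pandas as pd

# A few rows of the sample data from the table above.
df = pd.DataFrame({
    "Asset": [1, 2, 1, 2, 1, 2],
    "Time":  ["12:00:00", "12:00:00", "12:00:01",
              "12:00:01", "12:00:02", "12:00:02"],
    "State": ["Good", "Good", "Good", "Good", "Bad", "Good"],
})

# Look up each row's previous State within the same Asset, in time order.
df = df.sort_values(["Asset", "Time"])
prev_state = df.groupby("Asset")["State"].shift()

# Event = 1 exactly when an Asset transitions from Good to Bad.
df["Event"] = ((df["State"] == "Bad") & (prev_state == "Good")).astype(int)
print(df.sort_index())
```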
Thanks in advance, and sorry again if the way I've written this is confusing.
I have a question in Excel and need your help!
I could do this if it were a static problem, but I need my end result to adjust to my input in the table, because sometimes the year finishes sooner or later, the years differ, or the number of activities varies.
Initial table (it always goes in chronological order and each cell holds a 1 or nothing; I put 0s there because I wanted blanks and didn't know how to do that. Also, the number of activities may vary):
+----------------+------+------+------+------+------+------+
| year | 2016 | 2016 | 2016 | 2017 | 2017 | 2017 |
+----------------+------+------+------+------+------+------+
| month calendar | 10 | 11 | 12 | 1 | 2 | 3 |
| month project | 1 | 2 | 3 | 4 | 5 | 6 |
| Activity 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| Activity 2 | 0 | 0 | 1 | 1 | 1 | 1 |
| Activity 3 | 0 | 0 | 0 | 1 | 1 | 0 |
| Activity 4 | 0 | 1 | 1 | 0 | 0 | 0 |
| Activity 5 | 1 | 1 | 1 | 1 | 1 | 1 |
+----------------+------+------+------+------+------+------+
What I want in another sheet:
+---------------+------------+------------+------------+------------+
| Activity year | 2016 | | | |
| | | | | |
| Activity 1 | Activity 2 | Activity 3 | Activity 4 | Activity 5 |
| 25,0% | 12,5% | 0,0% | 25,0% | 37,5% |
| | | | | |
| Activity year | 2017 | | | |
| | | | | |
| Activity 1 | Activity 2 | Activity 3 | Activity 4 | Activity 5 |
| 0,0% | 37,5% | 25,0% | 0,0% | 37,5% |
+---------------+------------+------------+------------+------------+
Now imagine that in the "another sheet" I have nothing and I want the result to adjust to the initial table. How can I do this?
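For reference, with a static layout I can get each percentage with something like this (assuming the years sit in B1:G1 and the activity grid in B4:G8 of the initial sheet; 2016 would really be a reference to the "Activity year" cell). This is the part that needs to become dynamic:

=SUMPRODUCT(($B$1:$G$1=2016)*B4:G4)/SUMPRODUCT(($B$1:$G$1=2016)*$B$4:$G$8)

The numerator counts Activity 1's months in 2016 and the denominator counts all activity-months in 2016, which gives the 25,0% shown above.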
Sorry for bad editing but I can't do better.
Any help is welcome, thank you. I'll answer any questions you might have.
I'm going to add something that is related to this and that I also need: a formula to get the number of different activities per year that happen at least once, for my calculations later. In this case, the result would be 4 for 2016 and 3 for 2017.
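Again, I can hard-code that for a fixed layout (same B1:G1 / B4:G8 assumptions as above; in older Excel it must be entered as an array formula with Ctrl+Shift+Enter), but it needs to adapt too:

=SUMPRODUCT(--(MMULT(($B$4:$G$8)*($B$1:$G$1=2016),ROW($B$1:$B$6)^0)>0))

MMULT produces each activity's total for the chosen year, and SUMPRODUCT then counts how many of those totals are above zero (4 for 2016, 3 for 2017).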