Dealing with a daily time window across timezones in Node.js - node.js

Currently, I'm working on a project that requires a window of time to be selected that is used as a valid window to trigger an event within. This window is selected by the user as a start time (24 hour time), end time (24 hour time), and a timezone. My goal is to then be able to convert these times into UTC based on the offset from the provided timezone and save into MySQL.
The main problem is I have set up the entire flow to deal with time-only data types from the mobile app all the way back to the MySQL database. I have been trying to figure out a solution that won't require changing all those data types to include date and time which would require changes in many parts of the project.
Can I make this calculation without dealing with the date? I don't believe I can as timezone offsets range from -12:00 to +14:00 which would push some windows to the next or previous days when turned into UTC.
Is the correct approach to add in the date component and then continue to update it as time progresses? I also want to ensure daylight saving time doesn't create errors.
Ultimately I would like the best approach to take, so if I have to change a lot now I'd rather do that than deal with a headache later. Any thoughts would be greatly appreciated!
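To illustrate why the date matters, here is a minimal sketch that converts a wall-clock time in an IANA zone to UTC for one specific date, using only the built-in Intl API. The helper names offsetMinutes and wallTimeToUtc are hypothetical, and the sketch does not handle local times that are skipped or repeated at a DST transition.

```javascript
// Offset of an IANA zone from UTC, in minutes, at a given instant.
function offsetMinutes(timeZone, instant) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hourCycle: 'h23',
    year: 'numeric', month: 'numeric', day: 'numeric',
    hour: 'numeric', minute: 'numeric', second: 'numeric',
  }).formatToParts(instant);
  const get = (type) => Number(parts.find((p) => p.type === type).value);
  // Read the zone's wall clock as if it were UTC; the difference is the offset.
  const wallAsUtc = Date.UTC(get('year'), get('month') - 1, get('day'),
    get('hour') % 24, get('minute'), get('second'));
  return Math.round((wallAsUtc - instant.getTime()) / 60000);
}

// Convert 'HH:mm' on the local date 'YYYY-MM-DD' in timeZone to a UTC instant.
function wallTimeToUtc(dateStr, timeStr, timeZone) {
  const [y, m, d] = dateStr.split('-').map(Number);
  const [hh, mm] = timeStr.split(':').map(Number);
  const wallAsUtc = Date.UTC(y, m - 1, d, hh, mm);
  // First guess with the offset at the naive instant, then refine once.
  let utc = wallAsUtc - offsetMinutes(timeZone, new Date(wallAsUtc)) * 60000;
  utc = wallAsUtc - offsetMinutes(timeZone, new Date(utc)) * 60000;
  return new Date(utc);
}

// The same 22:00 window edge lands on different UTC times (and the next UTC day).
const winter = wallTimeToUtc('2024-01-15', '22:00', 'America/New_York');
const summer = wallTimeToUtc('2024-07-15', '22:00', 'America/New_York');
console.log(winter.toISOString()); // 2024-01-16T03:00:00.000Z
console.log(summer.toISOString()); // 2024-07-16T02:00:00.000Z
```

One possible design that follows from this is to keep the stored start time, end time and zone name as they are, and to compute the UTC instants only when a concrete day is being evaluated.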

Related

dealing with UTC dates and the future

I just discovered that storing dates in UTC is not strictly correct when we also deal with dates in the future, because time zone rules change more often than we tend to think. Fortunately, there is the IANA tzdb, which is updated periodically, but, confusingly, Postgres seems to use the specific version of the database it had at build time.
So, my question is: if time zones keep changing through daylight saving and political or geographical adjustments, and our database does not have the latest tzdb, how can we keep the dates in the system accurate? Additionally, would libraries like date-fns-tz fail to account for new time zone changes?
Ideally I would think a library would make a network call to a central server that would maintain the latest changes, but, it doesn't seem to be the case. How are the latest date/timezone changes usually dealt with?
The IANA time zone database collects the global knowledge about what time zone was in effect at what time in every part of the world. That information is naturally incomplete, specifically when it comes to the future. A (IANA) time zone is not an offset from UTC, but a rule that says when which offset from UTC is active. EST is not a time zone in that sense, it is an abbreviation for a certain UTC offset. If you live in New York, you will sometimes have EST, sometimes EDT, depending on the rules for the time zone America/New_York. Of course you should update the time zone database, but not because the timestamps change (they are immutable), but because the way that the timestamps are displayed in a certain time zone can change.
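As a small illustration of "a time zone is a rule, not an offset", the following sketch asks the built-in Intl API which abbreviation applies in America/New_York at two instants. The helper name zoneAbbreviation is hypothetical, and the result depends on the tzdb/ICU data bundled with the JavaScript runtime.

```javascript
// Abbreviation in effect for an IANA zone at a given instant (en-US naming).
function zoneAbbreviation(timeZone, instant) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    timeZoneName: 'short',
  }).formatToParts(instant);
  return parts.find((p) => p.type === 'timeZoneName').value;
}

// Same zone, two different offsets depending on the date.
const january = zoneAbbreviation('America/New_York', new Date('2023-01-15T12:00:00Z'));
const july = zoneAbbreviation('America/New_York', new Date('2023-07-15T12:00:00Z'));
console.log(january, july); // EST EDT
```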
What is stored in the database is always a UTC timestamp, so the timestamp itself is immutable. What changes is the representation. So if you predict that the world will end next July 15 at noon Austrian time, and the Austrian government abolishes daylight saving time, your prediction will be an hour off (unless you expect the cataclysm to follow Austrian legislation). If you are worried about that, make your predictions in UTC or at least add the UTC offset to the timestamp.
If you store the timestamp with time zone in the database, and you query it today with timezone set to Europe/Vienna, you will get a certain result. If you update the time zone database, and the new legislation is reflected in the update, then the same query will return a different result tomorrow. However, it will still be the same timestamp, only the UTC offset in use will be different:
SELECT TIMESTAMP WITH TIME ZONE '2023-07-15 12:00:00+02'
= TIMESTAMP WITH TIME ZONE '2023-07-15 11:00:00+01';
?column?
══════════
t
(1 row)
To clarify @Laurenz's statement in the comments with an example, let's take the extreme case of Samoa, which switched from the GMT-11 time zone to GMT+13, skipping an entire day.
Ignoring what a time zone actually is (there are differing opinions in the comments), for the purpose of the calculations below let's just treat it as an offset from UTC. Note that I use my own symbolic notation, but hopefully it is understandable ;-)
So how did Samoa skip a day on Dec 29, 2011? Based on what I found, when the clock struck midnight they effectively skipped Friday. But the Unix timestamp remains equivalent/unchanged:
GMT-11
(-)GMT+13
__________
= 24hrs
Let, WST=GMT-11
2011-12-29 T 24:00:00 - 11 (clock strikes midnight)
= 2011-12-30 T 00:00:00 - 11 (WST)
= 2011-12-30 T 11:00:00 (UTC)
now the switch occurs, WST=GMT+13
2011-12-31 T 00:00:00 + 13 (WST)
= 2011-12-31 T-13:00:00 (UTC)
= 2011-12-30 T 11:00:00 (UTC)
So, as far as I can see, storing future dates does not really affect the value of the date itself. What it does affect is the way the dates are displayed: if the time zone info was not updated, people would still see the day after the 29th in Samoa as Friday the 30th, GMT-11, whereas with updated information it would be Saturday the 31st, GMT+13. So, all is well.
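The Samoa case can also be checked against the tzdb that ships with a JavaScript runtime. This is a sketch using the built-in Intl API and the zone Pacific/Apia; the helper name localDate is hypothetical. It only prints local dates, because the exact local hour depends on the rule recorded in the tzdb version in use.

```javascript
// Local calendar date (YYYY-MM-DD) in an IANA zone for a given instant.
function localDate(timeZone, instant) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric', month: '2-digit', day: '2-digit',
  }).formatToParts(instant);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')}`;
}

// Two instants only two hours apart on the UTC timeline...
const before = localDate('Pacific/Apia', new Date('2011-12-30T09:00:00Z'));
const after = localDate('Pacific/Apia', new Date('2011-12-30T11:00:00Z'));

// ...fall on local dates two days apart: Dec 30 never existed locally.
console.log(before, after); // 2011-12-29 2011-12-31
```

The stored instants are untouched by the switch; only their local rendering jumps.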
More details are in the comment section of @Laurenz's answer.
Also, as @Adrian mentions above, software that deals with time zones comes packaged with a version of tzdb if it supports the conversion at all. That seems to be the case in Postgres as well, though it seems you can configure it to use the system's version. In such cases, you have to update the software or the system's tzdb itself.
I understand that you want to store a future point in time, like "10:00am on July 5th 2078 in the time zone of Australia/Sydney", regardless of what offset that time zone has compared to UTC when you retrieve the point in time again. And when the time comes, the point in time might not even exist, because it is being skipped for the introduction of daylight saving time (or it might exist more than once).
Speaking XML Schema, the information you want to store consists of
a dateTime without timezoneOffset, in the given example 2078-07-05T10:00:00 (no trailing Z)
plus a time zone, given as a string from the IANA database, in the given example Australia/Sydney.
I don't know how this is best stored in a PostgreSQL database, whether as two separate strings, or in a special data type. The PostgreSQL documentation says:
All timezone-aware dates and times are stored internally in UTC. They are converted to local time in the zone specified by the TimeZone configuration parameter before being displayed to the client.
That sounds to me as if the UTC value was fixed, and the local time value in a given time zone might change if daylight saving time is introduced or abolished in that time zone. (Am I correct here?) You want it the other way round: The local time remains the same and the UTC value might change after DST introduction/abolition.
For example, assume that polling stations for the next general election open at 2025-09-21T08:00:00+02:00 in my time zone. But if my country abolishes DST before then, they will open instead on 2025-09-21T08:00:00+01:00 without an explicit rescheduling. In other words: The UTC time changes, but the local time does not.
Or consider a flight whose local departure time and time zone are stored, which has a duration of 10 hours and arrives in another time zone. Its local arrival time then changes when the offset of the departure time zone changes, for example, because daylight saving time is introduced or abolished in that country on day X, but the offset of the arrival time zone does not change. An app that computes the local arrival time must then show a changed arrival time when it is executed on day X or later, although the stored data (the local departure time, departure time zone, arrival time zone and flight duration) have not changed. The required change can happen automatically if the app uses a library that is based on the IANA time zone database and receives an upgrade that includes the DST introduction/abolition before day X arrives.
For an example of such a library, see https://day.js.org/docs/en/timezone/parsing-in-zone.
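The flight scenario can be sketched without any library by using the built-in Intl API (an assumption made here for self-containment; the linked day.js plugin would serve the same purpose). Only the local departure time, the two zone names and the duration are stored; the UTC instant and the local arrival time are derived at read time from whatever tzdb the runtime carries. All function and field names below are hypothetical.

```javascript
// Wall-clock fields in an IANA zone for a given instant.
function wallClock(timeZone, instant) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hourCycle: 'h23',
    year: 'numeric', month: 'numeric', day: 'numeric',
    hour: 'numeric', minute: 'numeric', second: 'numeric',
  }).formatToParts(instant);
  const get = (type) => Number(parts.find((p) => p.type === type).value);
  return { year: get('year'), month: get('month'), day: get('day'),
    hour: get('hour') % 24, minute: get('minute'), second: get('second') };
}

// Resolve 'YYYY-MM-DDTHH:mm' (no offset) in timeZone to a UTC instant.
function toUtc(local, timeZone) {
  const [y, mo, d, h, mi] = local.split(/[-T:]/).map(Number);
  const wallAsUtc = Date.UTC(y, mo - 1, d, h, mi);
  const offsetAt = (ms) => {
    const p = wallClock(timeZone, new Date(ms));
    return Date.UTC(p.year, p.month - 1, p.day, p.hour, p.minute, p.second) - ms;
  };
  let utc = wallAsUtc - offsetAt(wallAsUtc); // first guess
  utc = wallAsUtc - offsetAt(utc);           // refine once near transitions
  return new Date(utc);
}

// Derive the local arrival time from the stored, offset-free data.
function localArrival(flight) {
  const departureUtc = toUtc(flight.departureLocal, flight.departureZone);
  const arrivalUtc = new Date(departureUtc.getTime() + flight.durationHours * 3600000);
  const p = wallClock(flight.arrivalZone, arrivalUtc);
  const pad = (n) => String(n).padStart(2, '0');
  return `${p.year}-${pad(p.month)}-${pad(p.day)}T${pad(p.hour)}:${pad(p.minute)}`;
}

const flight = {
  departureLocal: '2025-07-01T10:00',
  departureZone: 'Europe/Vienna',
  arrivalZone: 'America/New_York',
  durationHours: 10,
};
const arrival = localArrival(flight);
console.log(arrival); // 2025-07-01T14:00
```

If a tzdb update changed the Vienna offset for that date, the same stored record would yield a different arrival time without any data migration.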

Handling timezones between Client and Server

I'm developing a client and server application using luxon and postgres.
Considering the current time as I'm writing this post (2022-11-15T22:55:27.374-03):
On the local server/db, dates are saved in UTC-3, like the above (2022-11-15T22:55:27.374-03);
On the hosted server/db, dates are saved in UTC+0 (2022-11-16T01:55:27+00);
My problem is that I need to query data between the beginning and end of a day. Locally this works, but on the server it doesn't, because there the day starts at 03:00 and finishes at 02:59 the next day.
I tried hardcoding a conditional offset based on the environment, but I don't believe this is the best solution...
Is there a proper way to handle this timezone difference?
You need to define what beginning and end of day mean. Across all time zones, any particular calendar date is in effect somewhere for about 48 hours in total. You'll have to decide whether your queries mean "today" in the server's timezone or in the local client's timezone. If your queries are the server's "today", you can pass any request as plain timestamps and let the receiver decide what bounds to use. The flip side is to request the actual date string you want, e.g. 2020-01-01, and let the server calculate the bounds in its timezone.
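Whichever side decides, the bounds of a "day" can be computed for an explicit zone instead of relying on the machine's own timezone. The following is a minimal sketch with the built-in Intl API; the function names and the choice of America/Sao_Paulo (a UTC-3 zone matching the question) are assumptions, and days whose local midnight is skipped by a DST change are not handled.

```javascript
// Offset (ms) of an IANA zone from UTC at the instant `ms`.
function offsetMs(timeZone, ms) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hourCycle: 'h23',
    year: 'numeric', month: 'numeric', day: 'numeric',
    hour: 'numeric', minute: 'numeric', second: 'numeric',
  }).formatToParts(new Date(ms));
  const get = (type) => Number(parts.find((p) => p.type === type).value);
  return Date.UTC(get('year'), get('month') - 1, get('day'),
    get('hour') % 24, get('minute'), get('second')) - ms;
}

// UTC instant of local midnight on the given date in timeZone.
function localMidnightToUtc(year, month, day, timeZone) {
  const wallAsUtc = Date.UTC(year, month - 1, day); // Date.UTC normalizes day overflow
  let utc = wallAsUtc - offsetMs(timeZone, wallAsUtc);
  utc = wallAsUtc - offsetMs(timeZone, utc);
  return new Date(utc);
}

// Half-open interval [start, end) covering one local calendar day.
function dayBoundsUtc(dateStr, timeZone) {
  const [y, m, d] = dateStr.split('-').map(Number);
  return {
    start: localMidnightToUtc(y, m, d, timeZone),
    end: localMidnightToUtc(y, m, d + 1, timeZone),
  };
}

const bounds = dayBoundsUtc('2022-11-15', 'America/Sao_Paulo');
console.log(bounds.start.toISOString()); // 2022-11-15T03:00:00.000Z
console.log(bounds.end.toISOString());   // 2022-11-16T03:00:00.000Z
```

The two instants can then be passed as query parameters, for example with a condition of the form "column >= start AND column < end", so the result is the same on the local and the hosted server.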

Stream Analytics: How can I start and stop a TUMBLINGWINDOW aggregation job in order to reduce costs while still getting the same aggregation results?

Context
I have created a streaming job using the Azure portal which aggregates data using a day-wise TUMBLINGWINDOW. I have attached a code snippet below, modified from the docs, which shows similar logic.
SELECT
    DATEADD(day, -1, System.Timestamp()) AS WindowStart,
    System.Timestamp() AS WindowEnd,
    TollId,
    COUNT(*)
FROM Input TIMESTAMP BY EntryTime
GROUP BY TumblingWindow(day, 1), TollId
Now that the job has been running and I can see it producing output, I want to reduce the costs, ideally by setting up some sort of time scheduling so that the job can still produce the same output without being on all the time.
The only real constraint being that the aggregated output at the end of each TUMBLINGWINDOW has to remain the same as if it were running all the time (no impact of stop-starting on output).
This then brings me to my question.
Update: 2021-02-28
Before going into the question, another thing that drove me was that through the Azure portal you can manually start and stop a job. When you start/restart a job you can set a custom start time for the job/query. With this level of control, say I start a job (or have a job running) and then decide to stop it for the majority of the day and turn it on at, say, 11:30pm each day with a custom start time of midnight of the current day. It would then be on for approx 30min before it outputs the results (yet still, to my understanding, produce the same aggregation results compared to if it had been on the whole day up until that point). This job could then be paused again at 00:30am the next day, after which it stays paused for the majority of the day (1380min total until 11:30pm again), upon which the same logic is applied.
This way it remains off the majority of the day yet still can produce the same output for each day wise window (correct me if I am wrong in my thinking). The only issue with this to me seems to be the fact someone would have to manually perform this. Thus I was driven to the docs looking for a way to automate this.
Question
How can I start and stop a job in an automated fashion such that the required output would still remain intact but so that the job doesn't have to remain on all the time (like it currently is)?
Does the documentation linked above suffice given the context above, if so what are some possible arrangements for the N minutes (on) and M minutes (off) time variables for this to work?
Is this possible given the scenario that I want to aggregate on a one day TUMBLINGWINDOW window (whereby I want each window to start and end at midnight of each day, as per its default behaviour.)?
Eg
Window start: 2022-02-20 00:00:00 Window end: 2022-02-21 00:00:00 (aggregation performed),
Window start: 2022-02-21 00:00:00 Window end: 2022-02-22 00:00:00 (aggregation performed),
Window start: 2022-02-22 00:00:00 Window end: 2022-02-23 00:00:00 (aggregation performed),
....so on
Thoughts
I found this documentation from Microsoft regarding auto-pausing jobs using a few methods.
However, I came across a paragraph (quoted below) which made me doubt whether it is reasonable in my particular use case (a 1-day TUMBLINGWINDOW as described in my question section).
Note
There are downsides to auto-pausing a job. The main ones being the loss of the low latency /real time capabilities, and the potential risks from allowing the input event backlog to grow unsupervised while a job is paused. Auto-pausing should not be considered for most production scenarios running at scale.
Could this method
There are 3 ways to lower costs:
downscale your job, you will have higher latency but for a lower cost, up to a point where your job crashes because it runs out of memory over time and/or can't catch up with its backlog. Here you need to keep an eye on your metrics to make sure you can react before it's too late
going further, you can regroup multiple queries into a single job. This job most likely won't be aligned in partitions, so it won't be able to scale linearly (adding SUs is not guaranteed to give you better performance). Same comment as above, plus you need to remember that when you need to scale back up, you probably will have to break down that job into multiple jobs to again be able to scale in a linear fashion
finally you can auto-pause a job, one way to implement that being explained in the doc you linked. I wrote that doc, and what I meant by that comment is that here again you are taking the risk of overloading the job if it can't run long enough to process the backlog of events. This is a risky proposition for most production scenarios
But if you know what you are doing, and are monitoring closely the appropriate metrics (as explained in the doc), this is definitely something you should explore.
Finally, all of these approaches, including the auto-pause one, will deal with tumbling windows transparently for you.
Update: 2022-03-03 following comments here
Update: 2022-03-04 following comments there
There are 3 time dimensions here:
When the job is running or not: the wall clock
When the time window is expected to output results: Tumbling(day,1) -> 00:00AM every day, this is absolute (on the day, on the hour, on the minute...) and independent of the job start time below
What output you want produced from the job, via the job start time
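To make the second dimension concrete, here is a minimal sketch of how a daily tumbling window can be derived from an event timestamp alone, with no reference to when the job was started. It is an illustration in plain JavaScript, not ASA code; the function name is hypothetical, and it assumes windows aligned to UTC midnight that exclude their start and include their end.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Window (start, end] that an event belongs to, aligned to absolute boundaries.
function tumblingWindowFor(eventTime, sizeMs = DAY_MS) {
  const t = eventTime.getTime();
  const end = Math.ceil(t / sizeMs) * sizeMs; // next boundary at or after the event
  return { start: new Date(end - sizeMs), end: new Date(end) };
}

// An event at noon on Mar 1 belongs to the window that outputs at 00:00 on Mar 2,
// whether the job was running at noon or was restarted later with an earlier start time.
const w = tumblingWindowFor(new Date('2022-03-01T12:00:00Z'));
console.log(w.start.toISOString(), w.end.toISOString());
// 2022-03-01T00:00:00.000Z 2022-03-02T00:00:00.000Z
```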
Let's say you have the job running 24/7 for multiple months, and decide to stop it at noon (12:00PM) on the 1st day of March.
It already has generated an output for the last day of February, at 00:00AM Mar1.
You won't see a difference in output until the following day, 00:00AM Mar2, when you expect to see the daily window of Mar1, but it's not output because the job is stopped.
Let's start the job at 01:00AM Mar2 wall clock time. If you want the missing time window, you should pick a start time of either 'when last stopped' (noon the day before) or a custom time any time up to 23:59 on Mar1. What you are driving is the output window you want. Here you are telling ASA you want all the windows from that point onward.
ASA will then reload all the data it needs to generate that window (make sure the event hub has enough retention for that, we don't cache data between restarts in the job): Azure Stream Analytics will automatically look back at the data in the input source. For instance, if you start a job “Now” and if your query uses a 5-minutes Tumbling Window, Azure Stream Analytics will seek data from 5 minutes ago in the input. The first possible output event would have a timestamp equal to or greater than the current time, and ASA guarantees that all input events that may logically contribute to the output has been accounted for.

Send multiple notifications to a user on specific time

So I have a database of users who have a reminderTime field, which currently is just a string like 07:00, in UTC.
In the future I'll have multiple strings inside reminderTime, each corresponding to a time at which the user should receive a notification.
So imagine you logged into an app, set multiple reminders like 07:00, 15:00, 23:30 and sent them to the server. The server will save those in a database, run a task and send a notification at 07:00, then at 15:00 and so on. Later the user may decide that they no longer want to receive notifications at 15:00, or change it to 15:30, and we should adapt to that.
And each user has a timezone, but I guess since reminderTime is already in UTC I can just create a task without looking at timezone.
Currently I have a reminderTime as number and after client sends me a 07:00 I convert it to seconds, but as I understand I can change that and stick to string.
All my tasks are running with the Bull queue library and Redis. As I understand it, the most scalable approach is to take reminderTime, create notifications for each day at the given time, and just run the task. The only question is whether I should save them to my database or add a task to a queue in Bull. The same applies to multiple times.
I don't understand how I should change already created tasks inside Bull so that the time will be different and so on.
Maybe I could just create something like 1000 records in my database, each with the time at which the user should receive a notification. Then I create a repeatable job which runs every 5 minutes or so, takes all of the notifications which should be sent in the next couple of hours, adds them to a Bull queue and marks them as sent.
So basically you get the idea, maybe it could be done a little bit better.
Unless you really have a lot of users, you could simply create a schedule-like table in your DB, which is simply a list of user_id | notify_at records. Then run a periodic task every 1-5 minutes which compares against the current time and selects all the records where notify_at is less than the current time.
Add a notified flag if you want to send notifications more than once a day, so that ones that were already sent are ignored. There is no need to create thousands of records for every day; you can just reset that flag once a day, e.g. at 00:00.
It's OK that your users won't receive their notifications all at exactly the same time; there could be small delays.
The solution you suggested is pretty much fine :)
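The schedule-table approach described above could be sketched roughly as follows. An in-memory array stands in for the table, and the field and function names (notifyAt, notified, collectDue, resetFlags) are hypothetical; a real version would run the same selection as a database query from a repeatable job.

```javascript
// Hypothetical stand-in for the schedule table (user_id | notify_at | notified).
const schedule = [
  { userId: 1, notifyAt: '07:00', notified: false },
  { userId: 1, notifyAt: '15:00', notified: false },
  { userId: 2, notifyAt: '23:30', notified: false },
];

// 'HH:mm' (UTC) to minutes since midnight.
const toMinutes = (hhmm) => {
  const [h, m] = hhmm.split(':').map(Number);
  return h * 60 + m;
};

// One run of the periodic task: pick unsent reminders whose UTC time has passed today.
function collectDue(rows, now) {
  const nowMinutes = now.getUTCHours() * 60 + now.getUTCMinutes();
  const due = rows.filter((r) => !r.notified && toMinutes(r.notifyAt) <= nowMinutes);
  due.forEach((r) => { r.notified = true; }); // mark as sent so the next run skips them
  return due;
}

// Daily reset, e.g. scheduled at 00:00 UTC.
function resetFlags(rows) {
  rows.forEach((r) => { r.notified = false; });
}

const firstRun = collectDue(schedule, new Date('2024-01-01T15:02:00Z'));
const secondRun = collectDue(schedule, new Date('2024-01-01T15:07:00Z'));
console.log(firstRun.map((r) => r.notifyAt), secondRun.length); // [ '07:00', '15:00' ] 0
```

Changing a reminder from 15:00 to 15:30 is then just an update of one row, with no queued job to find and reschedule.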

Tumbling window with dynamic duration

When passing a reference data field as the duration in TumblingWindow, I get a compile-time error saying that the window duration requires a positive float constant.
Can anyone please guide?
group by TumblingWindow(minute, referencetable.EntryTime)
At the moment we don't support variable time windows, so you need to set the time explicitly and not load it from the reference data. Sorry for the inconvenience.
A workaround, in the case you have only few different time durations, would be to have different steps/subqueries for the different times and use a where clause to create or not an output for that step.
Let me know if you have further questions.
JS (from the Azure Stream Analytics team)
