LUIS inconsistent datetimeV2 parsing (US and UK formats) - azure

As far as I'm aware, LUIS only comes in the en-US culture for English (there's no en-UK). Therefore, I'd expect the datetimeV2 entities to come back as YYYY-DD-MM. However, sometimes LUIS sends back datetimeV2 entities as YYYY-MM-DD, and it's impossible to tell programmatically when this happens.
Example:
Utterance "take time off 01/03/2019 to 04/03/2019" resolves as the US YYYY-DD-MM format:
[ { timex: '(2019-01-03,2019-04-03,P90D)',
    type: 'daterange',
    start: '2019-01-03',
    end: '2019-04-03' } ]
HOWEVER, utterance "take time off 1st march 2019 to 4th march 2019" or "take time off march 1st 2019 to march 4th 2019" resolves as the UK YYYY-MM-DD format:
[ { timex: '(2019-03-01,2019-03-04,P3D)',
    type: 'daterange',
    start: '2019-03-01',
    end: '2019-03-04' } ]
In addition, if the date is written as DD/MM/YYYY with a day value > 12 (so it can't be a month), the format is switched to YYYY-MM-DD once again. E.g. "take time off 01/03/2019 to 18/03/2019" resolves the first date as YYYY-DD-MM and the second date as YYYY-MM-DD:
[ { timex: '(2019-01-03,2019-03-18,P74D)',
    type: 'daterange',
    start: '2019-01-03',
    end: '2019-03-18' } ]
This makes it very hard to parse dates if the formats keep changing. How can I ensure every date range is formatted as YYYY-DD-MM? Or even YYYY-MM-DD - I don't care, as long as it's consistent or at least tells me what format it has used.

There are a few points to address in your question.
The 1st one is about your first two examples: there is a mistake in your assessment here:
Utterance "take time off 01/03/2019 to 04/03/2019" resolves as the US
YYYY-DD-MM format:
[ { timex: '(2019-01-03,2019-04-03,P90D)',
    type: 'daterange',
    start: '2019-01-03',
    end: '2019-04-03' } ]
The resolution here is not in a US (YYYY-DD-MM) format: the output is always in the ISO YYYY-MM-DD format. You can tell from the duration in the timex, P90D: there are 90 days (about 3 months) between the two dates, so they were read as 3 January and 3 April, i.e. the input was interpreted as US MM/DD/YYYY. Had they been read as 1 March and 4 March, the duration would have been P3D.
For your last item, the reason is different. It can be explained by looking at how the recognition works: as you can see, LUIS uses Microsoft.Recognizers.Text to do this entity extraction from text:
Microsoft.Recognizers.Text powers pre-built entities in both LUIS:
Language Understanding Intelligent Service and Microsoft Bot
Framework; and is also available as standalone packages (for the base
classes and the different entity recognizers).
This whole solution is open source (https://github.com/Microsoft/Recognizers-Text), so we can analyse it.
The available cultures in .Net version are listed here: https://github.com/Microsoft/Recognizers-Text/blob/master/.NET/Microsoft.Recognizers.Text/Culture.cs
public const string English = "en-us";
public const string EnglishOthers = "en-*";
public const string Chinese = "zh-cn";
public const string Spanish = "es-es";
public const string Portuguese = "pt-br";
public const string French = "fr-fr";
public const string German = "de-de";
public const string Italian = "it-it";
public const string Japanese = "ja-jp";
public const string Dutch = "nl-nl";
public const string Korean = "ko-kr";
I made a quick demo to see the output for your data, using the Culture possibilities provided by the Recognizers (as I don't know which English variant LUIS uses):
Recognizing 'take time off 01/03/2019 to 18/03/2019'
**English**
01/03/2019 to 18/03/2019
{
  "values": [
    {
      "timex": "(2019-01-03,2019-03-18,P74D)",
      "type": "daterange",
      "start": "2019-01-03",
      "end": "2019-03-18"
    }
  ]
}
**English Others**
01/03/2019 to 18/03/2019
{
  "values": [
    {
      "timex": "(2019-03-01,2019-03-18,P17D)",
      "type": "daterange",
      "start": "2019-03-01",
      "end": "2019-03-18"
    }
  ]
}
As you can see, my 1st result matches yours, so I guess LUIS is based on the English culture - that is, en-us from the list above.
Based on this, you can see in the implementation that for the US culture it tries to match MM/DD/YYYY first, with DD/MM/YYYY as a fallback. So the 1st date in your sentence uses the first pattern (recognized as 3rd of January), whereas the 2nd date cannot (18 is not a valid month) and uses the fallback (recognized as 18th of March). The output notation itself is always ISO YYYY-MM-DD.
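You can check this reading yourself in plain JavaScript: if the dates inside a daterange timex are interpreted as ISO YYYY-MM-DD, the day difference between them matches the embedded PnD duration, which would not hold under a YYYY-DD-MM reading. This is just a sanity-check sketch; the helper name is mine:

```javascript
// Sanity check: does a daterange timex like '(2019-01-03,2019-04-03,P90D)'
// make sense when its dates are read as ISO YYYY-MM-DD?
function timexIsIso(timex) {
  const m = timex.match(/^\((\d{4}-\d{2}-\d{2}),(\d{4}-\d{2}-\d{2}),P(\d+)D\)$/);
  if (!m) return false;
  // ISO strings parse as UTC midnight, so the difference is an exact
  // multiple of 86400000 ms (one day).
  const days = (Date.parse(m[2]) - Date.parse(m[1])) / 86400000;
  return days === Number(m[3]);
}
```

For '(2019-01-03,2019-04-03,P90D)' this holds because 3 January to 3 April is exactly 90 days; under a YYYY-DD-MM reading (1 March to 4 March) the duration would have had to be P3D.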

Related

Noda time representation for close/open that is an entire day (24 hour period)

I am parsing some interestingly formatted data from https://raw.githubusercontent.com/QuantConnect/Lean/master/Data/market-hours/market-hours-database.json
It contains a snippet (removing some days) as below:
"Future-cme-[*]": {
  "dataTimeZone": "UTC",
  "exchangeTimeZone": "America/Chicago",
  "monday": [
    {
      "start": "00:00:00",
      "end": "1.00:00:00",
      "state": "market"
    }
  ],
  "friday": [
    {
      "start": "00:00:00",
      "end": "16:00:00",
      "state": "market"
    }
  ]
}
I am using a JsonConverter<LocalTime> to convert the above, and I can parse the friday properties start and end without any issues into a LocalTime.
However, the value which represents an entire day, i.e. 1.00:00:00, throws an error as it is not an ISO format - this opens up questions about my (incorrect!) use of the structures.
Currently I have the format that uses the LocalTime as below:
public class MarketHoursSegment
{
    public LocalTime Start { get; init; }
    public LocalTime End { get; init; }
    public MarketHoursState State { get; init; }
}
And the formatter:
public class LocalTimeConverter : JsonConverter<LocalTime>
{
    public override LocalTime Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        return LocalTimePattern
            .GeneralIso
            .Parse(reader.GetString())
            .GetValueOrThrow();
    }

    public override void Write(Utf8JsonWriter writer, LocalTime value, JsonSerializerOptions options)
    {
        writer.WriteStringValue(value.ToString());
    }
}
Is there a preferred way to deal with a LocalTime that represents a 24-hour span?
Would I detect 1.00:00:00 in the reader.GetString() of the converter, and if so set it to 00:00:00 (midnight), then check whether Start == End to know it is an entire 24-hour period?
Or would it be more correct to have Start as a LocalTime and a Duration for the hours (i.e. 24 hours), with End => Start + Duration?
Is there a preferred way to deal with a LocalTime that represents a 24 hours span?
It's worth taking a step back and separating different concepts very carefully and being precise. A LocalTime doesn't represent a 24 hour span - it's just a time of day. Two LocalTime values could effectively represent a 24 hour span without reference to a specific date, yes.
If you can possibly change your JSON to use 00:00:00, and then treat a "start==end" situation as being the full day, that's what I'd do. That does mean, however, that you can never represent an empty period.
Now, in terms of whether you should use a start and duration... that really depends on what you're trying to model. Are you trying to model a start time and an end time, or a start time and a duration? So far you've referred to the whole day as "a 24 hour span" but that's not always the case, if you're dealing with time zones that have UTC offset transitions (e.g. due to daylight saving time).
Transitions already cause potential issues with local intervals like this - if you're working on a date where the local time "falls back" from 2am to 1am, and you've got a local time period of (say) 00:30 to 01:30, then logically that will be "true" for an hour and a half of the day:
00:00-00:30: False
00:30-01:30 (first time): True
01:30-02:00 (first time): False
01:00-01:30 (second time): True
01:30-02:00 (second time): False
02:00-00:00 (next day): False
We don't really know what you're doing with the periods, but that's the sort of thing you need to be considering... likewise if you represent something as "00:00 for 24 hours" how does that work on a day which is only 23 hours long, or one that is 25 hours long? It will very much depend on exactly what you do with the data.
I would adopt a process of:
Work out detailed requirements, including what you want to happen on days with UTC offset transitions in the specific time zone (and think up tests at this stage)
Extract the logical values from those requirements in terms of Noda Time types (with the limitation that no, we unfortunately don't support 24:00:00 as a LocalTime)
Represent those types in your JSON as closely as possible
Make your code follow your requirements documentation as closely as possible, in terms of how it handles the data

How can I ensure NodaTime objects always get 'stringified' to ISO formats?

When we use the NodaTime objects, it's a bit too easy to get the wrong format.
For example, we use string interpolation to construct URIs, but we really want the yyyy-MM-dd format. The same goes for logging; we don't really want any other format.
LocalDate date = new LocalDate(2020, 8, 10);
string toString = $"{date}"; // "den 10 augusti 2020"
logger.LogInformation("Date: {Date}", date); // "Date: Monday, 10 August 2020"
The documentation for ToString (which is used for the 2nd line above) states:
"The value of the current instance in the default format pattern ("D"), using
the current thread's culture to obtain a format provider."
If I change the current culture to InvariantCulture, I now get both of the above lines to show "Monday, 10 August 2020", which is better because they are consistent but neither is the yyyy-MM-dd format.
System.Threading.Thread.CurrentThread.CurrentCulture = System.Globalization.CultureInfo.InvariantCulture;
Ideally though, I would want to only customize how NodaTime objects are "stringified" to avoid any other undesired side effects of changing culture. Is there any help to get here or am I stuck?
edit:
I made a console application to have a minimal reproducible example
Console.WriteLine(new LocalDate(2020,8,13));
Console.WriteLine(ZonedDateTime.FromDateTimeOffset(DateTimeOffset.Now));
Console.WriteLine(DateTime.Now);
Console.WriteLine(DateTimeOffset.Now);
and with that I got the following output:
den 13 augusti 2020
2020-08-13T08:39:16 UTC+02 (+02)
2020-08-13 08:39:16
2020-08-13 08:39:16 +02:00
I would have liked LocalDate to have a default output of 2020-08-13, which is more useful in logs as well as in string interpolation, for example: var uri = $"api/orders?date={localDate}"
The simplest way of achieving this is to use a CultureInfo that defaults to ISO-8601 formatting. It's reasonably easy to create that, starting with the invariant culture:
using NodaTime;
using System;
using System.Globalization;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        var isoCulture = (CultureInfo) CultureInfo.InvariantCulture.Clone();
        var format = isoCulture.DateTimeFormat;
        format.ShortDatePattern = "yyyy-MM-dd";
        format.ShortTimePattern = "HH:mm:ss";
        format.LongTimePattern = "HH:mm:ss.FFFFFFF";
        format.FullDateTimePattern = "yyyy-MM-dd'T'HH:mm:ss.FFFFFFF";
        format.LongDatePattern = format.ShortDatePattern;
        Thread.CurrentThread.CurrentCulture = isoCulture;

        Console.WriteLine(new LocalDate(2020, 8, 13));
        Console.WriteLine(ZonedDateTime.FromDateTimeOffset(DateTimeOffset.Now));
        Console.WriteLine(DateTime.Now);
        Console.WriteLine(DateTimeOffset.Now);
    }
}
Output:
2020-08-13
2020-08-13T09:52:18 UTC+01 (+01)
2020-08-13 09:52:18.7351962
2020-08-13 09:52:18.7356716 +01:00
I believe .NET just uses "date pattern" {space} "time pattern" when formatting a DateTime, so I don't think there's a way of getting a T in there. But hey, the LocalDate output is what you wanted :)

Dialogflow sys.date-time parameter format question

I'm making an app on Dialogflow and need to extract date-time info from user. So I specified a required parameter called "date-time" with #sys.date-time entity in my intent. However, when I tried to extract this parameter in my fulfillment code, I found that this parameter structure is not the same every time when I extract it. For example, when I type 12:30am into the chatbot, the returned API json response contained this:
"parameters": {
  "date-time": "2019-11-27T00:30:00-08:00",
  "log": "5"
},
So I can directly read date-time parameter value by parameters['date-time']
However, if I type "yesterday at 2pm" into chatbot, the returned parameter structure is this:
"parameters": {
  "date-time": {
    "date_time": "2019-11-25T14:00:00-08:00"
  },
  "log": "log"
},
See that the "date-time" parameter is wrapped inside an extra "date-time" object. This is really annoying because now I need to consider both cases in my fulfillment code. Does anyone know why this happens? Is this a bug on my side? Thanks!
You may have found the answer to this by now, but going through Google's documentation I found you have to consider a variety of cases when using the #sys.date-time entity. So there's nothing wrong on your end.
An extra "date_time" key is used when a date and a time were both specified, whereas if it's a period of time there are "startDate" and "endDate" keys to look out for inside the original "date-time" object as well.
From looking at the examples in that document I've outlined some of the cases below.
specific time (e.g. 12:30am) or specific date (e.g. December 12) = single date_time value
time period (date or time e.g. April or morning) = "startDate" and "endDate" entry in a date_time object
specific date + specific time (e.g. yesterday at 2pm) = "date_time" entry in a date_time object
date + time period (e.g. yesterday afternoon) = "startDateTime" and "endDateTime" in a date_time object
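A small normalizer in the fulfillment code can flatten all of the cases above into one shape. This is only a sketch based on those cases; the helper name and the { start, end } result shape are mine:

```javascript
// Hypothetical helper: flatten the #sys.date-time parameter variants
// into a single { start, end } pair of ISO strings.
function normalizeDateTime(param) {
  if (typeof param === 'string') {
    // specific date or specific time: a bare ISO string
    return { start: param, end: param };
  }
  if (param.date_time) {
    // specific date + specific time, wrapped in an object
    return { start: param.date_time, end: param.date_time };
  }
  if (param.startDate && param.endDate) {
    // date or time period
    return { start: param.startDate, end: param.endDate };
  }
  if (param.startDateTime && param.endDateTime) {
    // date + time period
    return { start: param.startDateTime, end: param.endDateTime };
  }
  throw new Error('Unrecognised date-time parameter shape');
}
```

With this, the fulfillment code only ever deals with one shape, whichever variant the API returned.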
Hope that helps!

Mongodb/mongoose query for completely overlapping dates (that could span multiple documents)

I'm having some issues designing a query that deals with overlapping dates.
So here's the scenario, I can't reveal too much about the actual project but here is a similar example. Lets say I have a FleaMarket. It has a bunch of data about itself such as name, location, etc.
So a FleaMarket would have many Stalls, which are available to be booked for a portion of the year (as short as 2 days, as long as all year, that sort of thing). So the FleaMarket needs to specify when in the year it will be open. Most scenarios would either be open all year or all summer/fall, but it could possibly be broken down further (because seasons determine pricing). Each FleaMarket would define its Seasons, each of which includes a startDate and endDate (including year).
Here's an ERD to model this example:
When a user attempts to book a Stall, they have already selected a FleaMarket (although ideally it would be nice to search based on availability in the future). It's really easy to tell if a Stall is already booked for the requested dates:
bookings = await Booking.find({
  startDate: { $lt: <requested end date> },
  endDate: { $gt: <requested start date> },
  fleaMarketId: <flea market id>,
}).select('stallId');
bookedIds = bookings.map(b => b.stallId);

stalls = await Stall.find({
  fleaMarketId: <flea market id>,
  _id: { $nin: bookedIds }
});
The issue I'm having is determining if a Stall is available for the specified Season. The problem comes that 2 seasons could be sequential, so you could make a booking that spans 2 seasons.
I originally tried a query like so:
seasons = await Season.find({
  fleaMarketId: <flea market id>,
  startDate: { $lt: <requested end date> },
  endDate: { $gt: <requested start date> }
});
And then programmatically checked whether any returned seasons were sequential, and plucked the available stalls that existed across all of them. But unfortunately I just realized this won't work if the requested range only partially overlaps with a season (e.g. requested Jan 1 2020 - Jan 10 2020, but the season is defined as Jan 2 2020 - May 1 2020).
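In other words, what I need is a full-coverage check that walks the seasons in order and fails on any gap. Roughly like this (plain JavaScript, day granularity, inclusive date ranges; the function name is mine):

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// seasons: [{ startDate: Date, endDate: Date }] with inclusive day bounds.
// Returns true only if every day of [reqStart, reqEnd] falls inside some
// season, allowing back-to-back (sequential) seasons to chain together.
function coversRange(seasons, reqStart, reqEnd) {
  const sorted = [...seasons].sort((a, b) => a.startDate - b.startDate);
  let covered = reqStart.getTime() - DAY_MS; // last covered day so far
  for (const s of sorted) {
    if (s.startDate.getTime() > covered + DAY_MS) return false; // gap found
    covered = Math.max(covered, s.endDate.getTime());
    if (covered >= reqEnd.getTime()) return true;
  }
  return false;
}
```

With this, the Jan 1 2020 - Jan 10 2020 request correctly fails against a season starting Jan 2 2020, while two back-to-back seasons chain together into one covered range.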
Is there a way I can check for completely overlapping dates that could possibly span multiple documents? I was thinking about calculating and storing the current and future available season dates (stored as total ranges), denormalized on the Stall.
At this point I'm almost thinking I need to restructure the schema quite a bit. Any recommendations? I know this seems very relational, but pretty much everywhere else in the application doesn't really do much with the relationships. It's just this search that is quite problematic.
Update:
I just had the thought of maybe creating some sort of Calendar Document that can store a centralized list of availability for a FleaMarket, that would do a rolling update to only store future and present data, and slowly wiping away historical data, or maybe archiving it in a different format. Perhaps this will solve my issue, I will be discussing it with my team soon.
So as I said in an update in my post, I came up with the idea to create a rolling calendar.
For anyone who is interested, here's what I got:
I created an Availability collection, that contains documents like the following:
{
  marketId: ObjectId('5dd705c0eeeaf900450e7009'),
  stallId: ObjectId('5dde9fc3bf30e500280f80ce'),
  availableDates: [
    {
      date: '2020-01-01T00:00:00.000Z',
      price: 30.0,
      seasonId: '5dd708e7534f3700a9cad0e7',
    },
    {
      date: '2020-01-02T00:00:00.000Z',
      price: 30.0,
      seasonId: '5dd708e7534f3700a9cad0e7',
    },
    {
      date: '2020-01-03T00:00:00.000Z',
      price: 35.0,
      seasonId: '5dd708e7534f3700a9cad0e8',
    }
  ],
  bookedDuring: [
    '2020-01-01T00:00:00.000Z',
    '2020-01-02T00:00:00.000Z'
  ]
}
Then handling updates to this collection:
Seasons
When creating, $push new dates onto each stall (and delete dates from the past)
When updating, remove the old dates, and add on the new ones (or calculate difference, either works depending on the integrity of this data)
When deleting, remove dates
Stalls
When creating, insert records for associated seasons
When deleting, delete records from availability collection
Bookings
When creating, add dates to bookedDuring
When updating, add or remove dates from bookedDuring
Then, to find available stalls for a market, you can query { marketId: <market id>, 'availableDates.date': { $all: [/* each desired day */] }, bookedDuring: { $nin: [/* same dates */] } } and then pluck out the stallId.
And to find markets that have availability at all, run the same query without the marketId and select the distinct marketIds.
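Given one of these Availability documents, the per-stall check itself is a simple set test. A plain-JS sketch, useful for unit testing the logic outside Mongo (the function name is mine):

```javascript
// Is this stall open, and not already booked, on every desired day?
// 'availability' is one document from the Availability collection above;
// 'desiredDates' is an array of the same ISO date strings.
function stallIsAvailable(availability, desiredDates) {
  const open = new Set(availability.availableDates.map(d => d.date));
  const booked = new Set(availability.bookedDuring);
  return desiredDates.every(d => open.has(d) && !booked.has(d));
}
```

This mirrors what the $all / $nin query does server-side: every desired day must appear in availableDates and none may appear in bookedDuring.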

Set GraphQL date format

I have a mongoDB database in which one field is an ISO date.
When I query the collection using a GraphQL (node) query I receive my objects back all right, but the date format I see in GraphiQL is this weird format:
"created": "Sun Nov 26 2017 00:55:35 GMT+0100 (CET)"
If I write the field out in my resolver it shows:
2017-11-25T23:55:35.116Z
How do I change the date format so it will show ISO dates in GraphiQL?
The field is just declared as a String in my data type.
EDIT
My simple type is defined as:
type MyString {
  _id: String
  myString: String
  created: String
}
When I insert a value into the database, created is set automatically by MongoDB.
When I run the query it returns an array of objects. In my resolver (for checking) I do the following:
getStrings: async (_, args) => {
  let myStrings = await MyString.find({});
  for (var i = 0; i < myStrings.length; i++) {
    console.log(myStrings[i]["created"]);
  }
  return myStrings;
}
all objects' created dates in the returned array have the form:
2017-11-25T23:55:35.116Z
but when I see it in GraphiQL it shows as:
"created": "Sun Nov 26 2017 00:55:35 GMT+0100 (CET)"
My question is: why does it change format?
Since my model defines this as a String, it should not be manipulated but should just retain the format. But it doesn't. It puzzles me.
Kim
In your resolver, just return a formatted string using toISOString()
const date1 = new Date('2017-11-25T23:45:35.116Z').toISOString();
console.log({date1});
// => { date1: '2017-11-25T23:45:35.116Z' }
const date2 = new Date('Sun Nov 26 2017 00:55:35 GMT+0100 (CET)').toISOString();
console.log({date2})
// => { date2: '2017-11-25T23:55:35.000Z' }
UPDATED to answer the added question, "Why does [the date string] change format"?
Mongo does not store the date as a string. It stores the date as a BSON Date: a 64-bit integer giving the number of milliseconds that have elapsed since the Unix epoch (in ISO 8601: 1970-01-01T00:00:00Z), not counting leap seconds. Since your data model requests a string, it'll coerce the value using toString()
const date1 = new Date('2017-11-25T23:45:35.116Z').toString();
console.log({date1})
// => { date1: 'Sat Nov 25 2017 15:45:35 GMT-0800 (PST)' }
That should clarify the behavior for you, but what you probably really want to do is change your model so that created is properly typed as a Date. You can do this a couple ways.
Create a custom scalar
GraphQLScalarType
Creating custom scalar types
Or use an existing package that already does the above for you
graphql-date
graphql-iso-date
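Whichever of those routes you pick, the heart of a date scalar is a serialize step that always emits ISO 8601. As a standalone sketch (the function name is mine; the real packages wire something like this into a GraphQLScalarType):

```javascript
// serialize step of a hypothetical ISODate scalar: accept a Date or any
// date-like string and always emit an ISO 8601 string.
function serializeDate(value) {
  const date = value instanceof Date ? value : new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new TypeError('Cannot serialize invalid date: ' + value);
  }
  return date.toISOString();
}
```

With created typed as such a scalar, GraphiQL would always show the ISO form regardless of how the resolver's value was formatted.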
For graphql-java you can do it step by step:
Set the type of your field to Object
Insert the line scalar Object into your .graphql file
Add the dependency graphql-java-extended-scalars to your pom.xml file
Add .scalar(ExtendedScalars.Object) in your buildRuntimeWiring function
Give it a try - it worked for me!
