Why does ICU have distinctions for "stand alone" values for dates?

ICU has different formatting symbols for "stand alone" values. For example:
q Stand Alone quarter
L Stand Alone month in year
c Stand Alone local day of week
The documentation states:
"Stand Alone" values refer to those designed to stand on their own, as opposed to being with other formatted values. "2nd quarter" would use the stand alone format (QQQQ), whereas "2nd quarter 2007" would use the regular format (qqqq yyyy).
However, this doesn't explain why there is a distinction. I presume that this matters for some languages, but what are some examples?
(More confusingly, the documentation contradicts itself since it uses both q and Q for the stand alone version.)
I also presume that stand alone versions aren't needed for other fields (such as year, hour, minute, seconds) because those are numeric. If that's the case, however, why do the stand alone values for weekday, month, and quarter support numeric forms?

The distinction matters in Polish, for example. A full date with the pattern dd MMMM yyyy, "22 września 2022", uses a different form of the month name than a month on its own, as in a calendar header with the pattern LLLL yyyy: "wrzesień 2022".
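To make the Polish example concrete, here is a minimal sketch in Python. It does not call ICU; the table and the function names (POLISH_MONTHS, full_date, calendar_header) are hypothetical and hand-written for illustration, and only the September forms come from the example above. A real application would read both forms from ICU/CLDR locale data.

```python
# Hand-written illustration only: real code should take these names from
# ICU/CLDR locale data instead of hard-coding them.
POLISH_MONTHS = {
    # month number -> the two grammatical forms of the month name
    9: {"format": "września", "stand_alone": "wrzesień"},
}


def full_date(day, month, year):
    """Mimic the pattern 'dd MMMM yyyy': the month follows a day number,
    so the 'format' (genitive) form is used."""
    return f"{day:02d} {POLISH_MONTHS[month]['format']} {year}"


def calendar_header(month, year):
    """Mimic the pattern 'LLLL yyyy': the month stands on its own,
    so the 'stand_alone' (nominative) form is used."""
    return f"{POLISH_MONTHS[month]['stand_alone']} {year}"


print(full_date(22, 9, 2022))
print(calendar_header(9, 2022))
```

The point of the two lookups is that the choice between the forms depends on the surrounding pattern, not on the month itself, which is why the pattern letters M and L are separate.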

I ended up filing ICU-21225 to correct the contradiction in the documentation and to ask for clarification. One of the comments directed me to https://www.unicode.org/reports/tr35/tr35-dates.html#months_days_quarters_eras, which states:
The context is either format (the default), the form used within a complete date format string (such as "Saturday, November 12"), or stand-alone, the form for date elements used independently, such as in calendar headers. The most important distinction between format and stand-alone forms is a grammatical distinction, for languages that require it. For example, many languages require that a month name without an associated day number (i.e. an independent form) be in the basic nominative form, while a month name with an associated day number (as in a complete date format) should be in a different grammatical form: genitive, partitive, etc.
I'm still curious about specific examples (which languages?), though.

Related

How to convert BC dates in NodaTime Instant to a regular date

Instant.FromUnixTimeSeconds(-100100000000).ToDateTimeUtc()
Once the date gets too ancient, for example a BC date, this doesn't work anymore.
Is there an easy way to convert NodaTime Instant values to years, months, and days that works for the entire range of supported Instant values (that is, 27000 BCE to 31000 CE)?
I don't mind what data type, I am just looking to easily extract the regular time periods from the Instant values.
It's been a year or more since I've used NodaTime, but [this page in the user guide][1] says:
Additionally, all calendars are restricted to four digit formats, even in year-of-era representations, which avoids ever having to parse 5-digit years. This leads to a Gregorian calendar from 9999 BCE to 9999 CE inclusive, or -9998 to 9999 in "absolute" years.
Your question could be read to mean you didn't think BC dates worked at all. When you get more than a few thousand years from the present, strange things start to happen. For example, the changing rotation rate of the Earth means there are different kinds of days: those counted from sunrises versus the uniform kind of time used in radioactive decay or in calculating planetary positions. It might be helpful if you mentioned your application.
[1]: https://nodatime.org/3.1.x/userguide/range
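If all that is needed is year, month, and day for an arbitrary Unix timestamp, the arithmetic can also be done by hand. The following is a minimal sketch in Python, not NodaTime code: it uses the well-known "civil from days" algorithm for the proleptic Gregorian calendar, and the function names (civil_from_unix_seconds, format_year) are hypothetical. It assumes astronomical year numbering, where year 0 is 1 BCE and year -1 is 2 BCE.

```python
def civil_from_unix_seconds(seconds):
    """Return (year, month, day) in the proleptic Gregorian calendar (UTC).

    The year uses astronomical numbering: 0 is 1 BCE, -1 is 2 BCE, and so on.
    Python integers are unbounded, so any timestamp is accepted.
    """
    days = seconds // 86400               # whole days since 1970-01-01 (floored)
    z = days + 719468                     # shift the epoch to 0000-03-01
    era = z // 146097                     # 400-year cycles (floor division)
    doe = z - era * 146097                # day of era, 0..146096
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    year = yoe + era * 400
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)  # day of year, March-based
    mp = (5 * doy + 2) // 153             # March-based month index, 0..11
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    if month <= 2:                        # January and February belong to the next year
        year += 1
    return year, month, day


def format_year(year):
    """Render an astronomical year number with a BCE/CE label."""
    return f"{1 - year} BCE" if year <= 0 else f"{year} CE"


year, month, day = civil_from_unix_seconds(-100100000000)
print(format_year(year), month, day)
```

This sidesteps the limit of DateTime-style types that cannot represent years before 1 CE, at the cost of ignoring everything except the calendar date.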

Convert Hebrew day-of-the-month to letter(s) in Excel

If I display a date in cell A1 using the cell format [$-8040D] d to show the day of the Jewish month, I get a number (from 1 to 30) instead of a Hebrew letter, which is how it is normally displayed.
So I want to use
=CHOOSE(A1,"א","ב","ג","ד","ה","ו","ז","ח","ט","י","יא","יב","יג","יד","טו","טז","יז","יח","יט","כ","כא","כב","כג","כד","כה","כו","כז","כח","כט","ל")
But even though I see a number 1-30 displayed in A1, what's really there is a date serial code (something like "44181").
I have tried N(), and VALUE().
What's the correct way to do it?
Thanks!
Excel stores dates as serial numbers based on the Gregorian calendar, with 1 = 1 Jan 1900 (and 1900 erroneously treated as a leap year for compatibility reasons).
So first you need to convert the date to the Jewish date (I'm assuming the Jewish lunar calendar), extract the day of the month (with the TEXT function), and then convert that value to its Hebrew letter equivalent.
For example:
=CHOOSE(TEXT(A1,"[$-he-IL,8]dd"),"א","ב","ג","ד","ה","ו","ז","ח","ט","י","יא","יב","יג","יד","טו","טז","יז","יח","יט","כ","כא","כב","כג","כד","כה","כו","כז","כח","כט","ל")
or:
=CHOOSE(TEXT(A1,"[$-8040D]d"),"א","ב","ג","ד","ה","ו","ז","ח","ט","י","יא","יב","יג","יד","טו","טז","יז","יח","יט","כ","כא","כב","כג","כד","כה","כו","כז","כח","כט","ל")
The "correct" way to do it is to realize there is no problem here, no problem at all.
Yes, Excel uses a serial number dating system. However, it completely knows how to interpret it too. So you will have a serial number as the "real" content of A1, but can extract from it the day of the month value. You can do this with:
=TEXT(A1,"d")
This gives you the text "6" if the date is December 6. The fact that it returns text can cause trouble, but only when Excel could reasonably read the result as either text or a number and has to guess which. For the formula above on its own, it is reasonable for Excel to treat the result as text, since the function is, after all, TEXT().
In this case, however, the result is used as the index argument of CHOOSE(), where it can only be useful as a number. So Excel treats it as one, and there is no need to add anything to force the conversion.
So you can just replace the A1 portion of your formula with a TEXT() call like the one above (using your [$-8040D]d format code instead of "d" to get the day of the Jewish month). Excel will then use it properly and select the correct entry from the list.
And that's all you need.
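The 30-item list in the CHOOSE formulas follows a regular pattern: a tens letter followed by a units letter, with 15 and 16 written as טו and טז by convention. As a cross-check outside Excel, here is a minimal sketch in Python that builds the same letters; the function name hebrew_day_letters is hypothetical.

```python
def hebrew_day_letters(day):
    """Return the Hebrew-letter form of a day of the month (1..30)."""
    if not 1 <= day <= 30:
        raise ValueError("day must be between 1 and 30")
    # 15 and 16 are conventionally written as 9+6 and 9+7.
    if day == 15:
        return "טו"
    if day == 16:
        return "טז"
    tens = ["", "י", "כ", "ל"][day // 10]
    units = ["", "א", "ב", "ג", "ד", "ה", "ו", "ז", "ח", "ט"][day % 10]
    return tens + units


# Print the whole table, one day per line, to compare with the CHOOSE list.
for day in range(1, 31):
    print(day, hebrew_day_letters(day))
```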

Change the Excel date input format?

I've been struggling with Excel (2016) date formats. I know how to change display formats for dates and cells but the problem I have is the input format for dates. If I input a date as "DD.MM" or "DD.MM.YYYY" it does recognize it as a date but if I input the date as "DD.MM." (with the second dot after the month), Excel does not recognize it as a date anymore. The column in question is formatted as short date.
Is there anything that can be done or is this by-design? If so, it seems really strange as at least in my country it's the official way to write the date containing that second dot after the month number when there is no year included in the date.
I've been searching and Googling for solution but couldn't find anything on this really. I appreciate all comments and help regarding this question!
SUMMARY/TL;DR:
Excel version is 2016, country is Finland and language is Finnish
Excel accepts/recognizes these as dates: 12.5 or 30.8
Excel does NOT accept/recognize these as dates: 12.5. or 30.8.
The column in question is formatted as short date
The dot after the month seems to be screwing things up
Why is this happening? Can anything be done?
Kind regards,
Tenttu
Yes, it is by design. (Funnily enough, my Excel doesn't accept dots at all, only dashes (-), so I couldn't even test whether "15.8" works.)
There's a slight chance that Excel's language settings (the defaults for time (24 hours or AM/PM), dates (MM/DD or DD/MM), decimals (comma or dot), etc.) don't allow the dot after the month. Here's an example of a user who has that dot and wants to get rid of it. So your system language is a good candidate for why this doesn't work for you.
However, I realize that the example linked above doesn't feature a date with a dot at the end, which suggests that this is rather by design. For example, if I add a dot to a valid date or time, the result is a #VALUE! error. That comes from how Excel converts text to a date (remember, dates are actually just large numbers): the trailing dot makes that conversion fail. We might think it's easy to just drop the dot, but that step has to be programmed explicitly, and I'm leaning towards no such step being done during text-to-date conversion (certainly not on my system, as I get #VALUE!).
One workaround is to strip the trailing dot from the date to make it valid. So you can import sheets containing dates with trailing dots, strip the dots, and you'll be good to go!
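As an illustration of the strip-the-dot workaround applied before the data reaches Excel, here is a minimal sketch in Python. The function name parse_day_month and the default_year parameter are hypothetical; the sketch assumes input of the form "DD.MM", "DD.MM." or "DD.MM.YYYY" and that the caller supplies the year when it is missing.

```python
from datetime import date


def parse_day_month(text, default_year):
    """Parse 'DD.MM', 'DD.MM.' or 'DD.MM.YYYY' into a date.

    A single trailing dot is tolerated by stripping it before splitting.
    """
    cleaned = text.strip().rstrip(".")
    parts = cleaned.split(".")
    if len(parts) == 2:
        day, month = parts
        year = default_year
    elif len(parts) == 3:
        day, month, year = parts
    else:
        raise ValueError(f"unrecognized date: {text!r}")
    return date(int(year), int(month), int(day))


print(parse_day_month("12.5.", 2016))
print(parse_day_month("30.8", 2016))
```

The cleaned values can then be written back as real dates (or as "DD.MM" text) so that Excel recognizes them on import.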

Named Entity Extraction of dates

I am completely new to NER, entity extraction, and programming in general. I am trying to figure out a way to extract the due dates and start dates of certain documents. Is there a way to do this? A place where I can start? I have been looking around, but I keep running into the same problem: I can extract dates, but not whether a date is a due date or a posting date. If a document has only one date, is it the posting date or the due date? Stuff like that. Any help would be appreciated.
Example:
"Essay on Medieval Asia was due on September 3rd."
"Your last assignment that was given on April 6th was supposed to be submitted in 10 days."
"The bid is due no later than a month from the date it was posted(today)."
The number of ways to express dates in free text is huge. There are a few solutions:
You can come up with a set of regular expressions and try to parse the dates yourself.
Another option is to train a supervised sequence classifier such as a CRF, if you have documents with the dates annotated.
A third option, which can give quick results, is to use this framework from Facebook research, https://github.com/facebookincubator/duckling. It will identify expressions that are dates or time expressions, and it will even normalise them into a single unique date.
Yet another option is ctparse, which is based on Duckling but is a pure Python package for parsing time expressions from natural language in German and English.
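To give an idea of the regular-expression route, here is a minimal sketch in Python. It is a crude heuristic, not a real NER system: it only matches dates written as "Month Day", and it labels each date by the nearest cue word that precedes it. The cue list, the labels "due" and "start", and the function name label_dates are all assumptions made for illustration.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Matches e.g. "September 3rd" or "April 6".
DATE_RE = re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}}(?:st|nd|rd|th)?\b")

# Hypothetical cue words mapped to a coarse label.
CUES = {"due": "due", "deadline": "due", "given": "start", "posted": "start"}


def label_dates(sentence):
    """Return (date_text, label) pairs, labelled by the closest preceding cue."""
    results = []
    for match in DATE_RE.finditer(sentence):
        before = sentence[:match.start()].lower()
        best_label, best_pos = "unknown", -1
        for word, label in CUES.items():
            pos = before.rfind(word)      # plain substring search, deliberately simple
            if pos > best_pos:
                best_label, best_pos = label, pos
        results.append((match.group(0), best_label))
    return results


print(label_dates("Essay on Medieval Asia was due on September 3rd."))
print(label_dates("Your last assignment that was given on April 6th "
                  "was supposed to be submitted in 10 days."))
```

Relative expressions such as "in 10 days" or "a month from the date it was posted" are not handled here; those are the cases where a tool like Duckling is likely to pay off.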

How should I store old dates in SharePoint?

I need to store dates in SharePoint that need to go back around 5000 BC. Ideally, I would like to be able to do date addition/subtraction, like this:
oldDate = '5000 BC';
newDate = '1995 AD';
DateDiff(oldDate, newDate, 'Years'); // equals 6995
How should I proceed? Build an old_date class based on strings? Just use regular dates, but add an AD or BC that makes the date negative?
This is a seriously non-trivial problem, and it really depends on what exactly you want to do with those dates. For example, we've only used the current (Gregorian) calendar since 1582. Before that it was the Julian calendar, and before that an old Roman calendar. To make matters worse, this info is really only valid for Western Europe (and culturally related areas). So if you are hoping to have something that will give you proper accepted dates for historical events with a little simple math, you are in for a big disappointment.
If you just want to carry the Gregorian calendar backwards, I suppose that's doable. However, there still is error, and on that scale it matters. From Wikipedia:
On timescales of thousands of years, the Gregorian calendar falls behind the seasons because the slowing down of the Earth's rotation makes each day slightly longer over time (see tidal acceleration and leap second) while the year maintains a more uniform duration.
If you are interested only in years and not in days, you could build a custom field with a custom editor and store the year as an integer.
Values less than zero mean BC, and values greater than or equal to zero mean AD.
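With years stored as signed integers, the year arithmetic is short, but there is one trap: in the traditional BC/AD count, 1 BC is followed directly by AD 1, so there is no year 0. The sketch below, in Python, assumes negative values mean BC and positive values mean AD, and it rejects 0 for that reason (a deviation from the convention described above). The function names are hypothetical.

```python
def to_astronomical(year):
    """Map a signed BC/AD year (negative = BC, positive = AD) onto a
    continuous scale where 1 BC is 0, 2 BC is -1, and so on."""
    if year == 0:
        raise ValueError("there is no year 0 in the BC/AD count")
    return year + 1 if year < 0 else year


def years_between(start, end):
    """Number of years from `start` to `end`, both signed BC/AD years."""
    return to_astronomical(end) - to_astronomical(start)


print(years_between(-5000, 1995))  # 5000 BC to AD 1995
print(years_between(-1, 1))        # 1 BC to AD 1
```

Under this assumption the span from 5000 BC to AD 1995 comes out as 6994 years rather than 6995, because plain subtraction of the signed values counts the nonexistent year 0.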
I ended up storing dates as a text field in ISO 8601 format:
YYYY-MM-DDThh:mm:ss.sTZD
You don't have to store the entire string. For instance, if you wanted to store 5000 BC, you would enter -5000-01-01. Date addition and subtraction don't come easily this way, but it was much easier to get the data in there in the format I wanted.
