NodaTime TimeZone Data files naming - nodatime

It appears that the time zone database files used by Noda Time are named by year, with releases within the same year distinguished by an incrementing letter. For example, "tzdb2019a.nzd" is current as I write this, the next release
will be "tzdb2019b.nzd", and previous versions were presumably
"tzdb2018a.nzd", "tzdb2018b.nzd", "tzdb2018c.nzd", etc.
However, I have not been able to find this naming convention formally documented anywhere, and assumptions make me nervous.
I expect the time zone data to change more often than my application
is updated, so the application periodically checks for the latest data file at
https://nodatime.org/tzdb/latest.txt, and downloads a new file if the one in
use is different. Eventually there will be several files locally available.
I want to be confident that I can sort these by name and reliably
identify the most recent among those that have already been
downloaded.

That's what I anticipate, certainly. We use the versioning from the IANA time zone page, just with a tzdb prefix and a .nzd suffix. So far, that's been enough, and it has maintained the sort order.
It's possible that we might want to provide other files at some point, e.g. if there are no IANA changes for a long time (as if!) but the CLDR Windows mapping files change significantly. I don't have any concrete plans for what I'd do in that case, but I can imagine something like tzdb2019-2.nzd etc.
It's hard to suggest specific mitigations against this without knowing the exact reason for providing other files, but you could potentially only download files if they match a regex of tzdb\d{4}[a-z]+\.nzd.
I'd certainly communicate on the Noda Time discussion group before doing anything like this, so if you subscribe there you should get advance warning.
Another nasty possibility is that we might need more than 26 releases in a single calendar year... IANA says that would go 2020a...2020z, then 2020za...2020zz etc. The above regex handles that situation, and it stays sortable in the normal way.
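For what it's worth, a minimal sketch of that "filter by the regex, then sort ordinally" idea in C#; the cache directory path is just a placeholder:

using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Pick the newest locally downloaded tzdb file by name, ignoring anything
// that doesn't match the expected naming convention.
var pattern = new Regex(@"^tzdb\d{4}[a-z]+\.nzd$");
string latest = Directory.GetFiles(@"C:\tzdb-cache")     // placeholder path
    .Select(Path.GetFileName)
    .Where(name => pattern.IsMatch(name))
    .OrderBy(name => name, StringComparer.Ordinal)
    .LastOrDefault();

An ordinal sort keeps 2020z before 2020za, so the hypothetical 26+ releases case above still sorts correctly.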
Another option I could provide is an XML or JSON format for "all releases" - so just as there's https://nodatime.org/tzdb/index.txt that lists the files, I could provide https://nodatime.org/tzdb/index.json that lists the files and their release dates. If you kept hold of that file along with the data, you'd always have more information. Let me know if that's of interest to you and I'll look into implementing it.


VariesAcrossGroups lost when ReInsert_ing doc.ParameterBindings?

Our plugin maintains some instance parameter values across many elements, including those in groups.
Occasionally the end users will introduce data that activates an unused Category,
so we have to update the document's parameter bindings to include those categories. However, when we call
doc.ParameterBindings.ReInsert()
our existing parameter values inside groups are lost, apparently because our VariesAcrossGroups flag is toggled back to false.
How is this intended to work in Revit? Are we supposed to use this differently, so as not to trigger this problem?
ReInsert() expects a base Definition argument, and would usually get an ExternalDefinition supplied.
To learn more, I instead tried to scan through the definition keys of the existing bindings and match those.
This way, I got the document's InternalDefinition, and tried calling ReInsert with that instead
(my hope was that, since the existing InternalDefinition DID have VariesAcrossGroups=true, this would help). Alas, ReInsert doesn't seem to care.
The problem, as you might guess, is that after VariesAcrossGroups=False, a lot of my instance parameters have collapsed into each other, so they all hold identical values. Given that they are IDs, this is less than ideal.
My current (intended) solution is to grab a backup of all existing parameter values BEFORE I update the bindings; then, after the binding update and after setting VariesAcrossGroups back to true, inspect all values and re-assign any parameter values that were broken. But as you may surmise, this is less than ideal: it will be horribly slow for the users of our plugin, and frankly it seems like something the Revit API should take care of, not the plugin developer.
Are we using this the wrong way?
One approach I have considered is to bind every possible category I can think of, up front and once only. But I'm not sure that is possible. Categories themselves are also difficult to work with, as you can only create them indirectly, by using your project document as a factory (i.e. you cannot create a category yourself; you can only indirectly ask the Document to - maybe! - create a category for you, that you request). Because of this, I don't think you can bind for all categories up front - some categories only become available in the document AFTER you have included a given family/type in your project.
To sum it up: First, I
doc.ParameterBindings.ReInsert()
my binding with the updated categories. Then, I call
InternalDefinition.SetAllowVaryBetweenGroups()
(after having determined that IDEF.VariesAcrossGroups has reverted back to false).
I am interested to hear the best way to do this, without destroying the client's existing data.
Thank you very much in advance.
(I'm not sure I will accept my own answer).
My answer is just that you can survive and circumvent this problem
by scanning the entire Revit database for your existing parameter values before you update the document bindings.
Afterwards, you reset VariesAcrossGroups back to its lost value.
Then, you iterate through your collected parameters, verify which ones have lost their original value, and reset them back to their intended value.
One trick that speeds this up a bit is to check Element.GroupId <> -1, i.e. only consider elements that are group members.
You only need to track elements which are group members, as it's precisely those that are affected by this Revit bug.
A further tip is that you should not only watch out for parameter values that have lost their original value; you must also watch out for parameter values that have accidentally GOTTEN a value, but which should be left unset.
I just use FilteredElementCollector with WhereElementIsNotElementType().
Performance-wise, it is of course horrible to do all this,
but given how Revit behaves, I see no other solution if you have to ship to your clients.
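For what it's worth, here is a rough, untested sketch of that backup/rebind/restore sequence, assuming a shared parameter identified by a GUID; the helper class name, the parameter group, and the string-typed value are placeholders for this example, not something prescribed by the Revit API:

using System;
using System.Collections.Generic;
using Autodesk.Revit.DB;

public static class RebindHelper
{
    public static void RebindPreservingGroupValues(
        Document doc, Guid paramGuid, ExternalDefinition extDef, InstanceBinding newBinding)
    {
        // 1. Back up current values, but only for group members (GroupId != InvalidElementId),
        //    since those are the ones affected.
        var backup = new Dictionary<ElementId, string>();
        var collector = new FilteredElementCollector(doc).WhereElementIsNotElementType();
        foreach (Element e in collector)
        {
            if (e.GroupId == ElementId.InvalidElementId) continue;
            Parameter p = e.get_Parameter(paramGuid);
            if (p != null && p.HasValue) backup[e.Id] = p.AsString();
        }

        using (var tx = new Transaction(doc, "Rebind parameter"))
        {
            tx.Start();

            // 2. Re-insert the binding with the updated category set.
            doc.ParameterBindings.ReInsert(extDef, newBinding, BuiltInParameterGroup.PG_DATA);

            // 3. Restore VariesAcrossGroups, which the ReInsert has reset to false.
            InternalDefinition internalDef =
                SharedParameterElement.Lookup(doc, paramGuid)?.GetDefinition();
            if (internalDef != null && !internalDef.VariesAcrossGroups)
                internalDef.SetAllowVaryBetweenGroups(doc, true);

            // 4. Re-apply any backed-up value that has changed. A real implementation
            //    would also clear values that were accidentally gained, per the note above.
            foreach (var pair in backup)
            {
                Parameter p = doc.GetElement(pair.Key)?.get_Parameter(paramGuid);
                if (p != null && p.AsString() != pair.Value) p.Set(pair.Value);
            }

            tx.Commit();
        }
    }
}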

Measuring distance of values / strings

Martin Fowler was discussing event sourcing:
https://martinfowler.com/eaaDev/EventSourcing.html
i.e. storing data as a set of events.
An example would be an account. You create an account with balance 0.
Then you deposit $10. You withdraw $5. You deposit another $100. Now the balance is $105, but you don't store $105. What you do store is
+10
-5
+100
as a series of events in the database.
Now if I want, I can say "undo the last 2 steps": I just remove the last 2 changes from the database, and the account balance is back at 10.
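As a tiny sketch of the account example in C#, just to make the replay-and-undo idea concrete:

using System;
using System.Collections.Generic;
using System.Linq;

// The stored history is the list of events, not the balance.
var events = new List<decimal> { +10m, -5m, +100m };
Console.WriteLine(events.Sum());          // 105, derived by replaying the events

// "Undo the last 2 steps" by removing the two newest events.
events.RemoveRange(events.Count - 2, 2);
Console.WriteLine(events.Sum());          // 10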
Now: how can you do that with strings?
Say the account name is first the empty string. Then it becomes
"dirk dietmeier", then "hans hansenmann", then "foo bar". How can you capture this data as a set of changes, while keeping it reversible, i.e. each event needs to be able to reverse itself? You could just say "delete everything and then put foo bar", but is there no better solution?
Is there an SVN- or git-like algorithm? Some encoding (hex, binary)?
Now if I want, I can say "undo the last 2 steps": I just remove
the last 2 changes from the database, and the account balance is back at 10.
Not if you want to preserve the history. In production event-sourced applications, I would issue a compensating event, e.g. a new event Y that undoes what event X did. The git analogue to this would be git revert.
Now: how can you do that with strings?
It depends on your application.
If you are tracking changes to code, it makes sense to do some research on how to express differences between two files, such that you can revert at a later time. In this sense, your event would be similar to a git commit. I suggest you look at the Linux diff command (http://linuxcommand.org/man_pages/diff1.html) and at its source code, or at how you could implement it yourself.
If your event is something like CustomerFirstNameChanged, doing a diff makes very little sense. You would always want to revert to a previous state such as John or Rick.
Number 2 would also make sense with an event such as ArticleRedrafted, where you can go back to a previous version. Content editors don't see revisions as we see git commits when we use git revert... They see them as points in time that can be returned to.
There are many ways you could represent the change from one string to another string as a reversible operation. It really would depend on requirements.
I would not expect the deltification in a source control system to necessarily meet the general needs you've outlined above. For example, git's purpose in deltifying files for storage (which it only does when generating a pack) is to save space. Older versions of an object may be stored as deltas from newer versions, but there is never a need to reconstruct the newer version via delta from the older version - it's stored in its entirety specifically so that it can be accessed quickly (without need to combine deltas). So storing enough information for the delta to be used as a two-way transformation would be wasteful from git's point of view.
Instead of looking at the pack deltas, you could think about how diff represents a change. This is more line-oriented than you seem to want. If you have an edited file, you might represent the edits as
d37
- Line 37 used to contain this text, but it was removed
a42
+ We added this text on line 42
c99
- Line 99 used to have this text
+ Line 99 now has this text
You could do a character-oriented version of this. Say you define operators for "add characters at offset", "remove characters at offset", and "change characters at offset". You either have to be careful with escaping delimiters, or use explicit lengths instead of simply delimiting strings. And you also should consider that each edit might change the offset of the subsequent edits. Technically pre-image offsets have all the information you need, but reversal of the patch is more intuitive if you also store the post-image offsets.
So given
well, here is the original string
you might have a reversible delta like
0/0d6:well, 14/8c12:the original1:a33/16a11: with edits
to yield
here is a string with edits
Of course, calculating the "most correct" patch may not be easy.
0/0c33:well, here is the original string27:here is a string with edits
would be equally valid but degenerates to the line-oriented approach.
But that's just one example; like I said, you could define any number of ways, depending on requirements.
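For concreteness, here is a minimal sketch in C# of one such representation (not the exact encoding above, just one possible shape for it): each operation records both the removed and the inserted text at an offset, so reversing is the same list replayed backwards with the two swapped.

using System;
using System.Collections.Generic;
using System.Linq;

// Offset refers to the string as it looks at the moment this edit is applied.
public record Edit(int Offset, string Removed, string Inserted);

public static class ReversibleDelta
{
    public static string Apply(string text, IEnumerable<Edit> edits)
    {
        foreach (var e in edits)
            text = text.Remove(e.Offset, e.Removed.Length)
                       .Insert(e.Offset, e.Inserted);
        return text;
    }

    // Undo: swap removed/inserted and replay the edits in reverse order.
    public static IEnumerable<Edit> Reverse(IEnumerable<Edit> edits) =>
        edits.Reverse().Select(e => new Edit(e.Offset, e.Inserted, e.Removed));
}

// Mirroring the example above:
//   var forward = new[] {
//       new Edit(0, "well, ", ""),
//       new Edit(8, "the original", "a"),
//       new Edit(16, "", " with edits"),
//   };
//   Apply("well, here is the original string", forward) yields "here is a string with edits",
//   and applying ReversibleDelta.Reverse(forward) to that result restores the original.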

Localized XPages application multiplying properties files

Yet another weird story from Domino Designer 9.0.1:
The application in question is set to support German and English; German is set to be both the source and the default language.
Over the course of the past few weeks we have observed that there are some Custom Controls and XPages whose properties files are multiplying; within something like 12 hours we often see hundreds of duplicated files (currently we have 120 duplicates; earlier this week we had a case with > 1000 duplicated properties files!). In Package Explorer they turn up like this:
As you can see, there is something like a docUnid appended to the properties file name. Apart from a different time stamp, they are all identical internally. In some cases both language versions are duplicated; in this particular case only the German (= source) version shows the phenomenon.
Another strange fact: this particular custom control hasn't changed for quite a while, and it only contains a single control with a static text attribute, alongside a
Anyone having an idea what could be causing this, and what possible solutions I could try?
Tech facts and some more observations:
Domino Designer 9.0.1 FP6, ExtLib 17; we are working in a team where each of us is coding in their own local replica, then replicating into the "hub" replica. I can't prove it, but I assume that there is a connection between one of us replicating updates and the creation of new property duplicates.
EDIT: some more observations: I think I was able to pin it down to the replication between two specific machines; I just ran a sequence of 5 or 6 manually triggered replications between both instances, each time without making any changes to the design code on either side. Nevertheless, every replication reported exactly 1 update and 1 addition, and each time a new properties file was added.
So meanwhile I deleted the custom control in question and rebuilt it from scratch under a slightly different name (just to be on the safe side). For now the application seems to be behaving, but I'm fairly sure that this will return sooner or later.
Repeat after me: source control and replication do not mix.
More details:
The properties files get stored as attachments in a design note; that's usually the note with the form. Unless you switch on multi-lingual support, in which case each properties file gets its own note. When different people work on the database, these note elements get recreated on build, each time effectively getting a new UNID.
So the right flow for what you're trying to do: pick your best version of the NSF. Nuke the other replicas. Bind it to version control. Let your fellow developers create an NSF from that repository. Design sync should only happen via that repository.
While you're at it: add Bavarian as a language, so your Munich customers can use the app too.

Oracle Sort Order - What may cause it to change

Disclaimer: I know that it is bad to not use an 'ORDER BY' in SQL when sorted data is required.
I am currently supporting a Pro*C program which is having a weird problem.
One of the possible causes of the weird problem may be that the original developers (from a long time ago) did not use ORDER BY in their SQL even though the program logic depends on it!
The program has been working fine all these years and started showing problems only recently.
We are trying to pin the weird problem on the ORDER BY mistake (there are other candidate causes, like a recent port from Solaris to Linux).
What shadowy things on the database end should we look at that may have changed the old sort order? Things like data files etc?
Anybody have any experience with Pro*C on Solaris magically sorting the result-set?
Thanks!
Since you know that the program cares about the order in which results are returned and you know that the query that is submitted is missing an ORDER BY clause, is there a reason that you don't just fix the problem rather than looking to try to figure out whether the actual order of results may have changed? If you fix the known ORDER BY problem and the "weird problem" you have disappears, that would provide some pretty good evidence that the "weird problem" is, in fact, caused by the missing ORDER BY.
Unfortunately, there are lots of things that might have caused the order of results to change, many of which may be impossible to track down. The most obvious cause would be a change in the execution plan. That, in turn, may have been caused because statistics changed, or because statistics didn't change enough, or because of a patch, or because of an initialization parameter change, or because of a client configuration change, among other things.
If you are licensed to use the AWR (Automatic Workload Repository), you might be able to find evidence that the plan has changed by looking to see if there are multiple PLAN_HASH_VALUE values for the SQL_ID in DBA_HIST_SQLSTAT over different days. If there are, you'd still have to figure out whether the different plans actually caused the results to be returned in a different order.
Beyond a query plan change, though, there are dozens of other possible causes. The physical order of data on disk may have changed because someone reorganized the table, or because someone moved data files around on the disk, or because the SAN automatically rebalanced something by moving data around. Some data may now be cached that was not cached in the past (or vice versa). An Oracle patch may have been applied.
I suggest that you replace your physical table with a view and enforce the required order in that view.
Example:
-- rename the physical table, then expose the old name as an ordered view
ALTER TABLE TABLE_NOT_SORTED RENAME TO PHYS_TABLE_NOT_SORTED;
CREATE VIEW TABLE_NOT_SORTED AS
SELECT *
FROM PHYS_TABLE_NOT_SORTED
ORDER BY DESIRED_COLUMNS;
In response to the comment:
According to this question and Ask Tom's answer, since Oracle does not guarantee a default sort order when you do not use ORDER BY, it is free to change. They are absolutely right, of course: if you need sorting, use ORDER BY.
Other than that, we cannot say anything about your code or the default ordering.

Integrating with 500+ applications

Our customers use 500+ applications and we would like to integrate these applications with ours. What is the best way to do that? These applications are time registration applications, and what most of them have in common is that they can export to CSV or similar; some of them are actually home-brewed Excel sheets where time is registered.
The best idea so far is to create our own Excel sheet, which can be used to integrate with all these applications. The integrations could be in the form of cells containing something like ='[c:\export.csv]rawdata'!$A$3, where export.csv is the CSV file exported from the time registration application. Can you see a better way to integrate with all these applications? It should be mentioned that almost all our customers have Microsoft Office.
Edit: Answers to the excellent questions from Pontus Gagge:
How similar are the data in the different applications?
I assume that since they are time registration applications, they will have some similarities, but some will register the total time worked for a whole month, while others will specify it for each day. If Excel is chosen, I believe that many of the differences could be ironed out using basic formulas.
What quality is the data?
The quality of the data can vary, so basic validation must be undertaken. A good approach is also to make it transparent to the customers how our application understands their input, so that they remain responsible for it.
How large amounts of data are you talking about?
There will be information about the time worked for up to 50 employees.
Is the integration one-way only?
Yes
With what frequency should information be transferred?
Once per month (when they need to pay salaries).
How often do the applications themselves change, and how often does your product change?
If their application is a home-brewed Excel sheet, then I assume it will change about once a year (due, for example, to someone's mistake). If it is a proper standard time registration application, then I do not believe it is updated more often than every five years or so, as it is a very stable concept.
Should the integration be fully automatic or can your end users trigger a data transfer?
They can certainly trigger the data transfer. The users are often dedicated to the process, so they can be trained to do it, which means that they could make up to, say, 30 mouse clicks to run the integration each month.
Will the customers have somebody to monitor the integrations?
As we have many customers, many of them should be able to undertake the integration themselves. We will, though, be able to assist them over the telephone. We cannot undertake the integration ourselves, because we would then be responsible for any errors due to user mistakes, etc.
Does the phrase 'integration spaghetti' mean anything to you...?
I am looking for ideas from the best chefs to cook a nice large portion of that.
You need to come up with a common data format, and a way to translate the individual data formats to the common format. There's really no way around this - any solution you come up with will have to do this in one way or the other. It's the essential complexity of what you're doing.
The bigger issue is actually variances within the source data, in terms of how things like dates are stored, missing columns, etc. Doing a generic conversion for CSV to move columns around is comparatively easy.
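As a hedged illustration of that "common format plus per-source translators" idea (the TimeEntry record, the ITimeSheetTranslator interface, and the semicolon-separated layout are all invented for this sketch, not part of any existing product):

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// The common format every source gets translated into.
public record TimeEntry(string EmployeeId, DateTime Date, decimal Hours);

// One translator per source application (or per known export layout).
public interface ITimeSheetTranslator
{
    bool CanHandle(string fileName);
    IEnumerable<TimeEntry> Translate(string fileName);
}

// Example translator for a hypothetical "semicolon-separated, one day per row" CSV export.
public class SimpleCsvTranslator : ITimeSheetTranslator
{
    public bool CanHandle(string fileName) =>
        Path.GetExtension(fileName).Equals(".csv", StringComparison.OrdinalIgnoreCase);

    public IEnumerable<TimeEntry> Translate(string fileName) =>
        File.ReadLines(fileName)
            .Skip(1)                                    // skip the header row
            .Select(line => line.Split(';'))
            .Select(f => new TimeEntry(
                f[0],
                DateTime.ParseExact(f[1], "yyyy-MM-dd", CultureInfo.InvariantCulture),
                decimal.Parse(f[2], CultureInfo.InvariantCulture)));
}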
I would also look at CSV and then use an OLEDB connection against the CSV file for importing.
If you try to make something that can interface to any data structure in the universe (and 500 is plenty close enough), it is guaranteed to be a maintenance nightmare. Instead I would approach this from multiple angles:
Devise an interface into which a human can enter this data already in the proper format. With 500+ clients, I'd make this a small, raw but functional browser-based site that users can use to enter this information manually. This is the fall-back: at the end of the day, a human can re-key the information into the site and solve the import issue. Ideally, everyone would use this instead of their own format. Data entry people are cheap.
Similar to above, but expanded, I would develop a standard application or standardize on an off-the-shelf application that can be used to replace their existing format. This might take more time than #1. The goal would be to only do one-time imports of these varying data schemas into the application and be done with them for good.
The nice thing about spreadsheets is that you can do anything anywhere. The bad thing about spreadsheets is that you can do anything anywhere. With CSV or a spreadsheet there is simply no way to enforce data integrity and thus consistency (which is the primary goal) on the data. If the source data is already in a database, then that is obviously simpler.
I would be inclined to use a database format into which each of these files needs to be converted, rather than a spreadsheet (e.g. something like Jet (MDB)). If you have non-Windows users then that will make it harder and you might have to use a spreadsheet. The problem is that it is too easy for the user to change their source structure, break their upload and come crying to you. If a given end user has a resident expert, they can find a way of importing the data into that database format. If you are that expert, then I would, on a case-by-case basis, write something that imports into that database format. XML would be the other choice, but that will likely take more coding than an import/export into a database format.
Standardization of the apps (even having all the sources in a database format instead of a spreadsheet would help) and control over the data schema is the ultimate goal, rather than permitting a gazillion formats. There really is no nice answer other than standardization. Otherwise, you will have to write a converter for every Tom, Dick and Harry format, and again whenever someone changes the source format.
With a multitude of data sources, mapping each one correctly to an intermediate format is not trivial. Regular expressions are good with a finite set of known data formats. A multipass approach can help when data is ambiguous without context (e.g. month and day fields when you have several days of data), and it can also help defeat data entry errors. But since this data is connected to salaries, the transfer needs to be reliable.
An import configuring trick
Get the customer to create a set of training data in their application. It should have a "predefined unique date", and each subsequent data field should contain a number corresponding to the target data field in your application. On importing, your application needs to recognise the predefined date, determine the unique translation required, display/save this "mapping key", and stop the import. E.g. if you expect "Duration hours" in field two, get the user to enter 2 in the relevant field, which might be called "Attendance hours".
On subsequent runs, and with the mapping definition key, import becomes a fairly easy process of translation.
Note on terms
"predefined date" - must be historical, say founding date of your company?, might need to be in PC clock settable range.
"mapping key" - could be string of hex digits and nybble based so tractable to workout
The entered code can be extended to signify required conversions ie customer's application has durations in days and your application expects it in hours.
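A minimal sketch of how that training row could be turned into a mapping key on import (the target field list and the column handling are invented for the example):

using System;
using System.Collections.Generic;

public static class ImportMapping
{
    // Assumed target fields in our application, purely illustrative.
    static readonly string[] TargetFields = { "EmployeeId", "Date", "DurationHours" };

    // Given the training row (each source column holds the 1-based number of the
    // target field it maps to), build a source-column -> target-field dictionary.
    public static Dictionary<int, string> BuildMapping(string[] trainingRow)
    {
        var mapping = new Dictionary<int, string>();
        for (int col = 0; col < trainingRow.Length; col++)
        {
            if (int.TryParse(trainingRow[col], out int target) &&
                target >= 1 && target <= TargetFields.Length)
            {
                mapping[col] = TargetFields[target - 1];
            }
        }
        return mapping; // persist this "mapping key" and reuse it on later imports
    }
}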
Interfacing with Windows programs (in order of increasing fragility):
Ye olde saving as a CSV file
Print to an operating system printer that is set up as a text file/PDF, then scavenge the data out of that
Extract data via the application's interface control, typically ActiveX for several Windows programs, e.g. like Matlab's Spreadsheet Link
Read the native file format (xls), e.g. like Matlab's xlsread
Add an additional intermediate spreadsheet sheet that has extended cell references, e.g. ='[filename]rawdata'!$A$3
Have a look at Teiid by JBoss: http://jboss.org/teiid
Also consider using SOA - e.g., if you're on Java, try JBoss SOA platform: http://www.jboss.com/resources/soa/?intcmp=1004
Use a simple XML format. A non-technical person can easily understand a simple XML format (and could even identify basic problems with XML documents that are not well-formed).
Maybe use a DTD (or, even better, an XML schema) to do very basic validation, and then supplement this with an XSL stylesheet to do more validation with better error reporting. (An XSL stylesheet simply converts from XML to something else, so it can generate readable error messages.)
The advantage of this approach is that web browsers such as Internet Explorer can apply the XSL stylesheets. A customer need only spend at most a day enhancing their applications or writing excel macros to generate the XML data in the format that you specify.
Recent versions of Excel have support for converting spreadsheet data to XML, and can even validate against schemas.
Once the data passes the XSL validation checks, you have validated XML data.
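A small sketch of the schema-validation step in C# (the namespace, schema file, and input file names are placeholders):

using System;
using System.Xml;
using System.Xml.Schema;

// Validate an incoming XML file against the published schema before importing it.
var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
settings.Schemas.Add("urn:example:timesheet", "timesheet.xsd");   // placeholder names
settings.ValidationEventHandler += (sender, e) =>
    Console.WriteLine($"{e.Severity}: {e.Message}");

using var reader = XmlReader.Create("customer-export.xml", settings);
while (reader.Read()) { }   // reading the whole document triggers validation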
If you have heaps of data and heaps of money, you could look at existing data management and cleansing tools:
http://www-01.ibm.com/software/data/infosphere/datastage
http://www-01.ibm.com/software/data/infosphere/qualitystage
But even then, you'll likely need to follow kyoryu's suggestion, assuming you have 500+ data formats. The problem isn't on your side: you need them to standardize their output formats if you have no control over their apps. CSV is likely the easiest. You could even send them an Excel template to help them along.
