How to parse multiple formats with Noda Time? - nodatime

I like the Noda Time handling of formatting and parsing values using the various *Pattern types in the NodaTime.Text namespace. However, user input is often not as regular as a single format. For example, our app uses the time format "h:mm tt" but we would like to be able to parse user input in any of the following formats:
h:mm tt
h:mmtt (no space)
h:mm t
h:mmt (no space)
h tt
hh:mm
and so on...
Is there a way to use Noda Time to parse input that may be in any of a number of formats?

(Sorry for taking so long to respond to this.)
Annoyingly, it looks like we haven't exposed this.
It's present in Noda Time in the CompositePattern class. Unfortunately that's currently internal. I've raised issue 147 to fix this at some point (probably in the 1.1 time frame).
For the moment, it's probably easiest to just use the same code from CompositePattern - you needn't create your own IPattern<T> implementation for this, as it's only really the Parse method which is useful. (Indeed, you could even write it as an extension method on IEnumerable<IPattern<T>>, although I'm not sure offhand whether we've given enough visibility to create the same kind of failure result.)
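Until CompositePattern is public, the "try each pattern in turn, first success wins" idea it implements can be sketched in any language. Below is a Python analogue (not Noda Time itself) using strptime patterns that roughly correspond to the formats in the question. Note that the single-letter AM/PM designator ("h:mm t") has no strptime equivalent and would need preprocessing, so it is omitted here.

```python
from datetime import datetime

# strptime analogues of the question's formats. Each pattern is tried in
# order and the first successful parse wins -- the same idea as Noda
# Time's CompositePattern, just sketched in Python.
PATTERNS = [
    "%I:%M %p",   # h:mm tt -> "5:07 PM"
    "%I:%M%p",    # h:mmtt  -> "5:07PM" (no space)
    "%I %p",      # h tt    -> "5 PM"
    "%H:%M",      # hh:mm   -> "17:07"
]

def parse_time(text):
    """Try each pattern in order; return the first successful parse."""
    for pattern in PATTERNS:
        try:
            return datetime.strptime(text.strip(), pattern).time()
        except ValueError:
            continue
    raise ValueError(f"Unparseable time: {text!r}")
```

The order matters: put the most specific or most common formats first, since an ambiguous input is claimed by the first pattern that accepts it.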

Related

Getting started at tradingview and referencing balances

I may be new to TradingView, but their Pine Script programming language seems to be the best I've ever seen for automated trading. They seem to really want me to succeed, but I cannot find where the documentation explains how to access account balances. I am trying to write a script where I do not reinvest the extra I make, so I need to be able to reference the available amount. I have not quite finished the manual yet, but I do not see which variable or function allows me to do that, or at least not where I would expect it.
Have a look at strategy.equity. There are quite a few built-in variables for strategy values. You can inspect them from the refman by searching on "strategy".
You can also calculate your own metrics using a technique like this one if you don't find what you need in the built-ins.
And welcome to Pine! This is the best place to start your journey:
https://www.tradingview.com/?solution=43000561836

NodaTime TimeZone Data files naming

It appears that the time zone database files used by Noda Time are named by year, with releases within the same year incrementing by a letter - i.e., "tzdb2019a.nzd" is current as I write this, the next release will be "tzdb2019b.nzd", and some of the previous versions may have been "tzdb2018a.nzd", "tzdb2018b.nzd", "tzdb2018c.nzd", etc.
However, I have not been able to find this naming convention formally documented anywhere, and assumptions make me nervous.
I expect the time zone data to change more often than my application is updated, so the application periodically checks for the latest data file at https://nodatime.org/tzdb/latest.txt and downloads a new file if the one in use is different. Eventually there will be several files locally available. I want to know that I can sort these by name and be assured that I can identify the most recent from among those that have already been downloaded.
That's what I anticipate, certainly. We use the versioning from the IANA time zone page, just with a tzdb prefix and a .nzd suffix. So far, that's been enough, and it has maintained the sort order.
It's possible that we might want to provide other files at some point, e.g. if there are no IANA changes for a long time (as if!) but the CLDR Windows mapping files change significantly. I don't have any concrete plans for what I'd do in that case, but I can imagine something like tzdb2019-2.nzd etc.
It's hard to suggest specific mitigations against this without knowing the exact reason for providing other files, but you could potentially only download files if they match the regex tzdb\d{4}[a-z]+\.nzd.
I'd certainly communicate on the Noda Time discussion group before doing anything like this, so if you subscribe there you should get advance warning.
Another nasty possibility is that we might need more than 26 releases in a single calendar year... IANA says that would go 2020a...2020z, then 2020za...2020zz, etc. The above regex handles that situation, and the names stay sortable in the normal way.
Another option I could provide is an XML or JSON format for "all releases" - so just like there's https://nodatime.org/tzdb/index.txt that just lists the files, I could provide https://nodatime.org/tzdb/index.json that lists the files and release dates. If you kept hold of that file along with the data, you'd always have more information. Let me know if that's of interest to you and I'll look into implementing it.
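Under the naming convention described above (which, as the question notes, is not formally documented, so treat the pattern as an assumption), the filter-then-sort step on the client side is a few lines. This Python sketch keeps only filenames matching the suggested regex and picks the lexicographic maximum, which matches release order even for the hypothetical 2020z/2020za case:

```python
import re

# Assumed convention from the answer: "tzdb" + IANA version + ".nzd".
# [a-z]+ (rather than a single letter) also covers IANA's hypothetical
# 27th-release-in-a-year scheme (2020a...2020z, 2020za, ...).
TZDB_NAME = re.compile(r"^tzdb\d{4}[a-z]+\.nzd$")

def latest_tzdb(filenames):
    """Return the most recent tzdb data file among those downloaded,
    relying on the names sorting in release order; None if none match."""
    candidates = [f for f in filenames if TZDB_NAME.match(f)]
    return max(candidates) if candidates else None
```

The lexicographic comparison works here because "." sorts before any lowercase letter, so "tzdb2020z.nzd" still comes before "tzdb2020za.nzd".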

Standard way to extract sequence number from core data objectID?

A core data objectID looks like this:
x-coredata://6CBBA433-7B21-4638-BBBD-67C771B38E97/User/p2
How can I retrieve in an easy, pretty way the last sequence number?
You can't, at least not if you want reliable, stable code. Getting the end of that string is basic string processing; it's the last component of a URL, and there's even a method on NSURL that will just give it to you.
But the format of these ID strings is undocumented, and therefore something that can potentially change any old time Apple thinks it would be a good idea. You'd be relying on something which is in no way guaranteed to actually work.
If you want a sequence number, add a sequence number as one of the attributes of the object. That's the right way to do it with Core Data. Not only is it more reliable, it's something you can change when or if you think it should change instead of being something Apple will break for you when they think it should change.
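For completeness, the fragile string processing the answer warns against really is trivial, which is exactly why it is tempting. This Python sketch shows it; the URI layout, including the "p" prefix on the number, is undocumented and could change in any release:

```python
def object_id_sequence(object_id_uri):
    """Fragile: parse the undocumented objectID URI format.
    The last path component (e.g. 'p2') carries the number."""
    last = object_id_uri.rsplit("/", 1)[-1]
    return int(last.lstrip("p"))
```

A sequence number stored as a model attribute survives format changes; this parse does not.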

Integrating with 500+ applications

Our customers use 500+ applications, and we would like to integrate these applications with ours. What is the best way to do that? These applications are time registration applications, and common to most of them is that they can export to CSV or similar; some of them are actually home-brewed Excel sheets where time is registered.
The best idea so far is to create our own Excel sheet, which can be used to integrate with all these applications. The integrations could be in the form of cells containing something like ='[c:\export.csv]rawdata'!$A$3, where export.csv is the CSV file exported from the time registration application. Can you see a better way to integrate with all these applications? It should be mentioned that almost all our customers have Microsoft Office.
Edit: Answers to the excellent questions from Pontus Gagge:
How similar are the data in the different applications?
I assume that since they are time registration applications, they will have some similarities, but I assume that some will register how long one has worked in total for a whole month, while others will specify it for each day. If Excel is chosen, I believe that many of the differences could be ironed out using basic formulas.
What quality is the data?
The quality of the data can vary, so basic validation must be undertaken. A good approach is also to make it transparent to the customers how our application understands their input, so that they are responsible for it.
How large amounts of data are you talking about?
There will be information about the time worked for up to 50 employees.
Is the integration one-way only?
Yes
With what frequency should information be transferred?
Once per month (when they need to pay salaries).
How often do the applications themselves change, and how often does your product change?
If their application is a home-brewed Excel sheet, then I assume it will change about once a year (due, for example, to a mistake someone made). If it is a proper, standard time registration application, then I do not believe they are updated more often than every fifth year or so, as it is a very stable concept.
Should the integration be fully automatic or can your end users trigger a data transfer?
They can certainly trigger data transfer. The users are often dedicated to the process, so they can be trained to do it, which means that they could make up to, say, 30 mouse clicks in order to integrate each month.
Will the customers have somebody to monitor the integrations?
As we have many customers, most of them should be able to undertake the integration themselves. We will, though, be able to assist them over the telephone. We cannot, however, undertake the integration ourselves, because we would then be responsible for any errors due to user mistakes, etc.
Does the phrase 'integration spaghetti' mean anything to you...?
I am looking for ideas from the best chefs to cook a nice large portion of that.
You need to come up with a common data format, and a way to translate the individual data formats to the common format. There's really no way around this - any solution you come up with will have to do this in one way or the other. It's the essential complexity of what you're doing.
The bigger issue is actually variances within the source data, in terms of how things like dates are stored, missing columns, etc. Doing a generic conversion for CSV to move columns around is comparatively easy.
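That per-source translation step can be sketched minimally. The column names below are hypothetical; in practice each of the 500+ sources would need its own mapping table like SOURCE_TO_COMMON, and this sketch deliberately ignores the harder problems (date format variance, missing columns) the paragraph above mentions:

```python
import csv
import io

# Hypothetical common schema and one hypothetical source's column names.
COMMON_FIELDS = ["employee", "date", "hours"]
SOURCE_TO_COMMON = {"Medarbejder": "employee", "Dato": "date", "Timer": "hours"}

def to_common_format(source_csv_text):
    """Read one source's CSV export and re-emit it with the common
    column names, dropping any columns the mapping doesn't know."""
    reader = csv.DictReader(io.StringIO(source_csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COMMON_FIELDS)
    writer.writeheader()
    for row in reader:
        writer.writerow({SOURCE_TO_COMMON[k]: v for k, v in row.items()
                         if k in SOURCE_TO_COMMON})
    return out.getvalue()
```

The design point is that the mapping table is data, not code, so adding source number 501 means adding a dictionary, not a converter.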
I would also look at CSV and then use an OLEDB connection against the CSV file for importing.
If you try to make something that can interface to any data structure in the universe (and 500 is plenty close enough), it is guaranteed to be a maintenance nightmare. Instead I would approach this from multiple angles:
Devise an interface into which a human can enter this data already in the proper format. With 500+ clients, I'd make this a small, raw but functional browser-based site that users can use to enter this information manually. This is the fall-back: at the end of the day, a human can re-key the information into the site and solve the import issue. Ideally, everyone would use this instead of their own format. Data entry people are cheap.
Similar to above, but expanded, I would develop a standard application or standardize on an off-the-shelf application that can be used to replace their existing format. This might take more time than #1. The goal would be to only do one-time imports of these varying data schemas into the application and be done with them for good.
The nice thing about spreadsheets is that you can do anything anywhere. The bad thing about spreadsheets is that you can do anything anywhere. With CSV or a spreadsheet there is simply no way to enforce data integrity and thus consistency (which is the primary goal) on the data. If the source data is already in a database, then that is obviously simpler.
I would be inclined to use a database format into which each of these files needs to be converted, rather than a spreadsheet (e.g. use something like Jet (MDB)). If you have non-Windows users, that will make it harder and you might have to use a spreadsheet. The problem is that it is too easy for the user to change their source structure, break their upload and come crying to you. If a given end user has a resident expert, they can find a way of importing the data into that database format. If you are that expert, then I would, on a case-by-case basis, write something that would import into that database format. XML would be the other choice, but that will likely take more coding than an import/export into a database format.
Standardization of the apps (even having all the sources in a database format instead of a spreadsheet would help) and control over the data schema is the ultimate goal rather than permitting a gazillion formats. There really is no nice answer other than standardization. Otherwise, you are having to write a converter for every Tom-Dick-and-Harry format and again when someone changes the source format.
With a multitude of data sources, mapping each one correctly to an intermediate format is not trivial. Regular expressions are good with a finite set of known data formats. Multiple passes can help when data is ambiguous without context (e.g. month and day fields when you have several days of data), and can also help defeat data entry errors. But since this data is connected to salaries, the transfer needs to be reliable.
An import configuring trick
Get the customer to make a set of training data in their application. It should have a "predefined unique date", and each subsequent data field should contain a number corresponding to the target data field in your application. On importing, your application needs to recognise the predefined date, determine the unique translation required, effect the displaying/saving of this "mapping key", and stop the import. E.g. if you expect "Duration hours" in field two, then get the user to enter 2 in the relevant field, which might be "Attendance hours".
On subsequent runs, and with the mapping definition key, import becomes a fairly easy process of translation.
Note on terms
"predefined date" - must be historical, say founding date of your company?, might need to be in PC clock settable range.
"mapping key" - could be string of hex digits and nybble based so tractable to workout
The entered code can be extended to signify required conversions, e.g. the customer's application has durations in days while your application expects hours.
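The training-row trick above can be sketched concisely. The sentinel date and the target field numbering here are assumptions for illustration; the customer types the numbers into a row of their own application's export:

```python
# Assumed for illustration: the "predefined unique date" sentinel and a
# numbering of the target fields in our application.
PREDEFINED_DATE = "1900-01-01"
TARGET_FIELDS = {1: "employee", 2: "duration_hours", 3: "project"}

def derive_mapping(training_row):
    """From a training row that starts with the sentinel date, map each
    source column index to the target field the customer numbered it as."""
    if training_row[0] != PREDEFINED_DATE:
        raise ValueError("not a training row")
    mapping = {}
    for col, value in enumerate(training_row[1:], start=1):
        if value.strip():  # blank cells mean "ignore this column"
            mapping[col] = TARGET_FIELDS[int(value)]
    return mapping
```

Once the mapping key is saved, subsequent imports are a straightforward column-by-column translation.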
Interfacing with Windows programs (in order of increasing fragility)
Ye Olde saving as CSV file
Print to an operating system printer that is set up as a text file/PDF, then scavenge the data out of that
Extract data via the application's interface control, typically ActiveX for several Windows programs, e.g. Matlab's Spreadsheet Link
Read the native xls file format, e.g. with Matlab's xlsread
Add an additional intermediate spreadsheet sheet that has extended cell references, e.g. ='[filename]rawdata'!$A$3
Have a look at Teiid by JBoss: http://jboss.org/teiid
Also consider using SOA - e.g., if you're on Java, try JBoss SOA platform: http://www.jboss.com/resources/soa/?intcmp=1004
Use a simple XML format. A non-technical person can easily understand a simple XML format (and could even identify basic problems with XML documents that are not well-formed).
Maybe use a DTD (or, even better, an XML schema) to do very basic validation, and then supplement this with an XSL stylesheet to do more validation with better error reporting. (An XSL stylesheet simply converts from XML to something else, and so can generate readable error messages.)
The advantage of this approach is that web browsers such as Internet Explorer can apply the XSL stylesheets. A customer need only spend at most a day enhancing their applications or writing Excel macros to generate the XML data in the format that you specify.
Recent versions of Excel have support for converting spreadsheet data to XML, and can even validate against schemas.
Once the data passes the XSL validation checks, you have validated XML data.
If you have heaps of data and heaps of money, you could look at existing data management and cleansing tools:
http://www-01.ibm.com/software/data/infosphere/datastage
http://www-01.ibm.com/software/data/infosphere/qualitystage
But even then, you'll likely need to follow kyoryu's suggestion, assuming you have 500+ data formats. The problem isn't on your side: you need them to standardize their output formats if you have no control over their apps. CSV is likely the easiest. You could even send them an Excel template to help them along.

What text format can I use to present data originally in an Excel spreadsheet?

I have an Excel spreadsheet that has many people's estimates of another person's height and weight. In addition, some people have left comments on both estimate cells like "This estimate takes into account such and such".
I want to take the data from the spreadsheet (I've already figured out how to parse it), and represent it in a plain text file such that I can easily parse it back into a structured format (using Perl, ideally).
Originally I thought to use YAML:
Tom:
  Height:
    Estimate: 5
    Comment: Not that confident
  Weight:
    Estimate: 7
    Comment: Very confident
Natalia: ...
But now I'm thinking this is a bit difficult to read, and I was wondering if there were some textual tabular representation that would be easier to read and still parsable.
Something like:
PERSON HEIGHT WEIGHT
-----------------------------
Tom 5 7
___START_HEIGHT_COMMENT___
We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed [...]
Wait, what's this project about again?
___END_HEIGHT_COMMENT___
___START_WEIGHT_COMMENT___
We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed [...]
Wait, what's this project about again?
___END_WEIGHT_COMMENT___
Natalia 2 4
John 3 3
Is there a better way to do this?
CSV (Comma Separated Values).
You can even save it directly into this format from Excel, and read it directly into Excel from this format. Yet it is also human readable, and easily machine parseable.
Normally if I want to capture data from a spreadsheet in textual form I use CSV (which Excel can read and write). It's easy to generate and parse as well as being compatible with many other tools but it doesn't rank high on the "human readable" chart. It can be read but it's awkward for anything but simple files with equal field widths.
XML is an option, but YAML is easier to read. Being human-readable is one of the design goals of YAML. The YAML::Tiny module is a nice and lightweight module for typical cases.
It looks like what you have in mind is a plain text table, or possibly a tabular format with fixed-width columns. There are some modules on CPAN that might be useful: Text::Table, Text::SimpleTable, others... These modules can generate a representation that's easy to read, but parsing it will be harder. (They're intended for data presentation, not storage and retrieval.) You'd probably have to build your own parser.
Adding to Robert's answer, you can simply put the comments in additional columns (commas will be escaped by the CSV output filter of Excel etc). More on CSV format: www.csvreader.com/csv_format.php
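The comments-in-extra-columns suggestion works because CSV quoting preserves embedded commas. This Python sketch (using the standard csv module rather than Excel's output filter, but the quoting behaviour is the same idea) writes the question's data with comment columns and reads it back intact:

```python
import csv
import io

# Comments go in their own columns; a comment containing a comma is
# quoted automatically by the CSV writer, so round-tripping is safe.
rows = [
    ["Person", "Height", "HeightComment", "Weight", "WeightComment"],
    ["Tom", "5", "Not that confident", "7", "Very, very confident"],
]
out = io.StringIO()
csv.writer(out).writerows(rows)
text = out.getvalue()

# Reading it back recovers the comma-bearing comment unchanged.
parsed = list(csv.reader(io.StringIO(text)))
assert parsed[1][4] == "Very, very confident"
```

Long multi-line comments are the weak spot for human readability, as the question's example shows, but the format stays machine-parseable.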
No reason you can't use XML, though I'd imagine it's overkill in this particular case.
There's also Config::General for simple data, and its family of related classes.
