Date-period understands “before 2000” as [“2000-01-01/2000-12-31”]

I am building an api.ai agent, but I'm struggling to get unbounded date-period parameters understood correctly.
For example:
before 2000 is interpreted as ["2000-01-01/2000-12-31"]
after 1999 is interpreted as ["1999-01-01/1999-12-31"]
after January 2007 is interpreted as ["2007-01-01/2007-01-31"]
This makes me think only the date (2000/January 2007) is used for calculating the date-range, ignoring the adverb (before/after).
Is there a way to understand before 2000 as ["0000-01-01/2000-01-01"] (or at least ["1970-01-01/2000-12-31"] if Epoch-based)?

You are right: the before and after aren't matched.
You can add a custom entity, such as a temporal preposition, to deal with this:
before: prior to, previous to, earlier than, preparatory to, in preparation for, preliminary to, in anticipation of, in expectation of; in advance of, ahead of, leading up to
after: following, subsequent to, succeeding, at the close/end of, in the wake of, later than
You can then create a composite entity (see the docs) that combines the new preposition entity with the sys.date entity, so the two are bound together logically.
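For illustration only (the entity names here are made up): define a temporal-preposition entity with the synonyms above, then a composite entity such as date-with-preposition whose entries reference both, along the lines of

@temporal-preposition:prep @sys.date:date

Your fulfillment/webhook code then receives both parts and can build the open-ended range itself (e.g. prep = "before", date = "2000-01-01" becomes everything up to that date).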

VariesAcrossGroups lost when ReInsert-ing doc.ParameterBindings?

Our plugin maintains some instance parameter values across many elements, including those in groups.
Occasionally the end users will introduce data that activates an unused Category,
so we have to update the document's parameter bindings to include those categories. However, when we call
doc.ParameterBindings.ReInsert()
our existing parameter values inside groups are lost, because the VariesAcrossGroups flag is toggled back to false.
How did Revit intend this to work - are we supposed to use this in a different way, to not trigger this problem?
ReInsert() expects a base Definition argument, and would usually be supplied an ExternalDefinition.
To experiment, I instead tried scanning the definition keys of the existing bindings and matching those.
That way I obtained the document's InternalDefinition, and tried calling ReInsert with that instead
(my hope was that, since the existing InternalDefinition DID have VariesAcrossGroups=true, this would help). Alas, ReInsert doesn't seem to care.
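For clarity, the scan looks roughly like this ("MyParameter" stands in for our shared parameter's name):

// Walk the existing bindings and pick out the InternalDefinition for our parameter.
InternalDefinition internalDef = null;
DefinitionBindingMapIterator it = doc.ParameterBindings.ForwardIterator();
it.Reset();
while (it.MoveNext())
{
    var def = it.Key as InternalDefinition;
    if (def != null && def.Name == "MyParameter")
    {
        internalDef = def;   // at this point it still reports VariesAcrossGroups = true
        break;
    }
}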
The problem, as you might guess, is that after VariesAcrossGroups=False, a lot of my instance parameters have collapsed into each other, so they all hold identical values. Given that they are IDs, this is less than ideal.
My current (intended) solution is to grab a backup of all existing parameter values BEFORE I update the bindings, then, after the binding update and after setting VariesAcrossGroups back to true, inspect all values and re-assign any parameter values that have been broken. But as you may surmise, this is less than ideal - it will make our plugin horribly slow for the users, and frankly it seems like something the Revit API should take care of, not the plugin developer.
Are we using this the wrong way?
One approach I have considered is to bind every possible category I can think of, up front and once only, but I'm not sure that is possible. Categories in themselves are also difficult to work with, as you can only create them indirectly, using your project Document as a factory (i.e. you cannot create a category yourself; you can only indirectly ask the Document to - maybe! - create the category you request). Because of this, I don't think you can bind all categories up front - some categories only become available in the document AFTER you have included a given family/type in your project.
To sum it up: First, I
doc.ParameterBindings.ReInsert()
my binding, with the updated categories. Then, I call
InternalDefinition.SetAllowVaryBetweenGroups()
(after having determined IDEF.VariesAcrossGroups has reverted back to false.)
I am interested to hear the best way to do this, without destroying the client's existing data.
Thank you very much in advance.
(I'm not sure I will accept my own answer).
My answer is just that you can survive/circumvent this problem
by scanning the entire Revit database for your existing parameter values before you update the document bindings.
Afterwards, you reset VariesAcrossGroups back to its lost value.
Then you iterate through your collected parameters, check which ones have lost their original value, and reset them back to their intended value.
One trick that speeds this up a bit is to check Element.GroupId <> -1, i.e. restrict yourself to elements that are group members.
You only need to track group members, as it is precisely those that are affected by this Revit bug.
A further tip: you should not only watch out for parameter values that have lost their original value; you must also watch out for parameter values that have accidentally GAINED a value but should have been left unset.
I just use FilteredElementCollector with WhereElementIsNotElementType().
Performance-wise, it is of course horrible to do all this,
but given how Revit behaves, I see no other solution if you have to ship to your clients.
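Putting it together, a rough C# sketch of the workaround (extDefinition, newBinding and internalDef are placeholders for your own shared-parameter definition, the rebinding with the updated CategorySet, and the matching InternalDefinition; it also assumes a text parameter - adjust AsString/Set for other storage types):

// using Autodesk.Revit.DB; using System.Collections.Generic;
var backup = new Dictionary<ElementId, string>();
foreach (Element e in new FilteredElementCollector(doc).WhereElementIsNotElementType())
{
    if (e.GroupId == ElementId.InvalidElementId) continue;    // only group members are affected
    Parameter p = e.get_Parameter(extDefinition);
    if (p != null) backup[e.Id] = p.AsString();                // remember the original value (may be null)
}
using (var tx = new Transaction(doc, "Rebind and repair"))
{
    tx.Start();
    doc.ParameterBindings.ReInsert(extDefinition, newBinding); // rebind with the updated categories
    internalDef.SetAllowVaryBetweenGroups(doc, true);          // the flag reverts to false after ReInsert
    foreach (var pair in backup)
    {
        Parameter p = doc.GetElement(pair.Key)?.get_Parameter(extDefinition);
        if (p != null && p.AsString() != pair.Value)           // value lost, or gained where it should stay unset
            p.Set(pair.Value ?? string.Empty);
    }
    tx.Commit();
}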

data and structure cleansing of excel sheets

I have over 6,000 Excel sheets. While all the sheets describe the same thing, they are independently formatted. They all have between 9 and 13 columns, but they are out of order, the column names are independently misspelled, and they may or may not have a second or third column header.
I am currently trying, in Python, to read cells in a left-down-right-up motion to locate the same data, but there are simply too many differences in structure names, column ordering, and data definitions to lock them down one at a time. Is there a tool I can use to read these documents and conform them to a single format via a rapid mapping function?
Thanks much.
Wow, it's the Ultimate Data Horror Story.
I want to ask how you ever let it get this way... but I actually don't want to know; I'm already going to have nightmares about this.
It's like that Hoarding show on TV, but with data.
No, I'm afraid that if you can't even identify a pattern then there's no magic function that will be able to either.
But that doesn't mean it's a lost cause. It's just going to need some human interaction, and there are ways to minimize the pain.
What you need is a custom interface that will load the documents one by one, and will walk a human through clicking each relevant column or area, and then automatically load the next document.
There would also need to be buttons for sorting things like obvious garbage sheets (blanks?), "unknowns" (that get put in a folder for advanced research later), and other "unpredictables" that may come up during the process.
Also, perhaps once you get into it, you'll notice a pattern you're not thinking of, like maybe *"the person who handled the files from 2002 to 2004 set them up this way"*, or, "when Budget is misspelled, it's always either Bugdet or Budteg".
In this scope, little patterns like that can make a big difference.
Depending on your coding skills, you may or may not need outside assistance with this. I assume this is not data that can just get thrown out, or you wouldn't be asking...
If each document took an average of 20 seconds to process, that would be about 33 hours in total. An hour a day and it's done in a month. Or someone full-time, and it's done in a week.
Do you have a budget you can throw at this? Data archaeology is an actual thing! Hell, I'll do it for you for the right price... (wouldn't break the bank, depending on how urgent it is, of course!)
Either way, this ain't going to be fun for "someone"...

Modeling one-to-many relations using Domain Driven Design

This question is more of a general question about how to model simple one-to-many relations using collections: should a change in a list item be reflected in the version of the aggregate containing it?
The domain is about meeting scheduling (like in Outlook).
I have a Meeting entity, which can have multiple Participants.
A participant can accept/decline meeting requests.
Rescheduling a meeting nullifies all of the participants' confirmations.
I thought of two ways to model this.
Option 1
The Meeting aggregate will contain a list of Participants where each Participant has a ParticipantId and a Status (accepted/denied).
The problem here is that every Accept or Deny command, for a specific participant, increments the Meeting's version, which means two participants will enter a race condition if trying to Accept the meeting request based on the same original version.
Although this could be solved by re-reading the Meeting's document and retrying the Accept command, it's quite annoying considering how often this could happen.
Another approach is to ignore the meeting's version when executing the Accept command, but this introduces a new problem: what happens if, after sending the meeting requests, the meeting has been rescheduled? In this case we can't afford to ignore the Meeting's version, because this time the version DOES represent a real version that should be considered.
BTW, is it at all a good practice to ignore the version in some of the commands and not in others?
Option 2
Extract a Participation aggregate out of Meeting.
Participation will have MeetingId, ParticipantId, and Status.
It will also have its own version.
This way, when participant X Accepts the meeting request, only the relevant Participation will be modified, and the rest will be left intact.
And, when rescheduling the meeting, a "Meeting Rescheduled" event will be published and an event handler will respond to it by resetting all of the Participations' statuses to "NotAccepted" regardless of their current version.
On the one hand this sounds logical in the sense that a meeting's version shouldn't be incremented just because someone accepted/denied its request.
On the other hand, modeling Participation as a standalone aggregate doesn't sound quite right to me, because it has no meaning outside the context of the meeting.
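To make Option 2 concrete, here's a rough sketch (all names are invented for illustration):

public enum ParticipationStatus { NotAccepted, Accepted, Declined }

public class Participation
{
    public Guid MeetingId { get; private set; }
    public Guid ParticipantId { get; private set; }
    public ParticipationStatus Status { get; private set; }
    public int Version { get; private set; }

    public void Accept()  { Status = ParticipationStatus.Accepted; Version++; }
    public void Decline() { Status = ParticipationStatus.Declined; Version++; }

    // Invoked by the handler of the "Meeting Rescheduled" event, regardless of the current version.
    public void ResetAfterReschedule() { Status = ParticipationStatus.NotAccepted; Version++; }
}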
Anyway, would love to get feedback on this and see the various approaches to this problem.
Although this could be solved by re-reading the Meeting's document and retrying the Accept command, it's quite annoying considering how often this could happen.
This looks like a modeling error. You should keep in mind that the meeting aggregate is not the book of record for the participants' availability - the real world is. So the message shouldn't be AcceptInvitation, but instead InvitationAccepted. There shouldn't be a conflict about this, because the domain model doesn't get to veto events outside of its authority boundary.
You might, depending on your implementation, end up with a concurrent modification exception in your plumbing, but that's something that you should be handling automatically (ie: expected version any, or a retry).
Another approach is to ignore the meeting's version when executing the Accept command, but this introduces a new problem: what happens if, after sending the meeting requests, the meeting has been rescheduled?
The solution here is to model more carefully. Yes, sometimes you will get a message that accepts or declines an invitation that has expired.
Put another way: race conditions don't exist.
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
What happens to Alice, who replied instantly to the invitation, when the meeting is rescheduled? Why wouldn't the same thing happen to Bob, when his reply arrives just after the meeting is rescheduled?
Participation as a standalone aggregate doesn't sound quite right to me, because it has no meaning outside the context of the meeting.
I find that heuristic isn't particularly effective. It's much more important to understand whether entities can change state independently, or if their changes need to be coordinated.
Actually, the Meeting aggregate is used to track the participants' availability. That's what its purpose is. Unless I didn't fully understand you...
It's a bit subtle, and I didn't spell it out very well.
Suppose the model says that I'm available, but an emergency in the real world calls me away. What happens? Am I blocked from going to the hospital because the model says I have to go to a meeting? Can somebody cancel my emergency by changing the invitation I've submitted?
Furthermore, if I'm away on an emergency, are you available for a meeting that is scheduled for the same time as the meeting you and I were going to have?
In this space, the real world is the authority for whether or not somebody is available. The model is just looking at a cached copy of a message describing whether or not somebody was available in the past.
The cached information being used by the model is not guaranteed to be complete. See Greg Young on warehouse systems and exception reports.
which makes me think that perhaps the Meeting aggregate should have two version fields: one will be a strong version which, when incremented, represents a breaking change, and another soft version for non-breaking changes. Does this make any sense?
Not really. Version is not, as far as I know, a term taken from the ubiquitous language of scheduling meetings. It's meta data, if it exists at all, and the business rules in your model should not depend upon meta data.
I agree, but a Meeting ID (or any ID for that matter) is also not part of the ubiquitous language, yet I might pass it back and forth between my domain world and external worlds.

Separate entity or modifier when updating a model for specific users

I'm quite new to DDD, so apologies if this is well covered elsewhere, but I have struggled to find an answer.
In our domain we have a representation of a 'Normal Shift', e.g. Afternoon: 15.00 - 18.00. It is possible, however, for shifts to be modified either on specific days of the week (Mon - Sun) or for specific 'Locations'. The end result should be the same shift (Afternoon, for example) but with a new time on a specific day / at a specific location.
How should we model these updated shifts? So far we've come up with:
1. A common 'Shift' object which can be applied as a Normal Shift and also associated to a day or location.
2. A model to denote the change, e.g. a 'Shift Adjustment'.
3. Unique models for each shift, with some kind of relationship so they can be applied, e.g. a 'Location Shift'.
We keep toggling between options 1 and 3. Option 1 seems like a more natural language fit, yet option 3 feels more complete in that the business logic isn't hiding real facets of the model.
Any help would be greatly appreciated!
First, I think this initially has nothing to do with DDD. Rather it is about how you represent these concepts in your domain model.
You don't explicitly state the behavior you require, so I'm guessing you may want to be able to send a message to an appropriate object in your model to get a shift, e.g. FindShift(name, [day, location]). FindShift would return either a default shift or a custom one if it exists; providing the day and location might be optional.
One simple way to represent this is as follows: a Shift has an optional day and location. This solution requires a business rule (a constraint) preventing applicable duplicates.
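A minimal sketch of that representation (all type and member names are mine, purely illustrative):

// Requires System, System.Collections.Generic and System.Linq.
public class Shift
{
    public string Name { get; }                  // e.g. "Afternoon"
    public TimeSpan Start { get; }
    public TimeSpan End { get; }
    public DayOfWeek? Day { get; }               // null = applies to any day
    public string Location { get; }              // null = applies to any location

    public Shift(string name, TimeSpan start, TimeSpan end,
                 DayOfWeek? day = null, string location = null)
    { Name = name; Start = start; End = end; Day = day; Location = location; }
}

public static class ShiftCatalog
{
    // The most specific matching shift wins; the plain default (no day, no location) is the fallback.
    public static Shift FindShift(IEnumerable<Shift> shifts, string name,
                                  DayOfWeek? day = null, string location = null)
    {
        return shifts
            .Where(s => s.Name == name
                     && (s.Day == null || s.Day == day)
                     && (s.Location == null || s.Location == location))
            .OrderByDescending(s => (s.Day != null ? 2 : 0) + (s.Location != null ? 1 : 0))
            .FirstOrDefault();
    }
}

The uniqueness constraint (no two shifts with the same name, day and location) would live wherever the shifts are stored.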
There are other ways to model this, especially in regard to making "default" shifts explicit, but it's hard to do so when you give only a structural description rather than the behavior required.

Dynamics CRM 2011 Import Data Duplication Rules

I have a requirement in which I need to import data from Excel (CSV) into Dynamics CRM regularly.
Instead of using some simple Data Duplication Rules, I need to implement a point system to determine whether a record is considered a duplicate or not.
Let me give an example. These are the particular rules for the import:
First Name, exact match, 10 pts
Last Name, exact match, 15 pts
Email, exact match, 20 pts
Mobile Phone, exact match, 5 pts
And then the Threshold value => 19 pts
Now, if a record has First Name and Last Name matching an old record in the entity, the score will be 25 pts, which is higher than the threshold (19 pts), so the record is considered a duplicate.
If, for example, the record only has the same First Name and Mobile Phone, the score will be 15 pts, which is lower than the threshold, and the record is thus considered a non-duplicate.
What is the best approach to achieve this requirement? Is it possible to utilize the default Import Data functionality in MS CRM? Is there any 3rd-party add-on that meets my requirement above?
Thank you for all the help.
Updated
Hi Konrad, thank you for your suggestions, let me elaborate here:
Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.
Nice one, but I don't think it is really workable in my case; the data will be coming regularly from the client in moderate numbers (hundreds to thousands), and typically the client won't check the data for duplication.
Workflow. Run a process removing any instance calculated as a duplicate.
A workflow is a good idea; however, since it is processed asynchronously, my concern is that in some cases the user may already have made updates/changes to the inserted data before the workflow finishes, thereby creating data inconsistency or, at the very least, a confusing user experience.
Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel it's creation (or mark for removal).
I like this approach. So I just import as usual (for example, to the contact entity), but I already have a plugin in place that gets triggered every time a record is created; the plugin will check whether the record is duplicate-ish or not and take the necessary action.
I haven't fiddled a lot with duplicate detection, but looking at your criteria you might be able to make rules that match them - pretty much three rules to cover your cases: full name match, last name plus mobile phone match, and email match.
If you want to do the points system, I haven't seen any out-of-the-box components that solve this; however, CRM Extensions have a product called Import Manager that might have that kind of duplicate detection. They claim to have customized duplicate checking. Might be worth asking them about this.
Otherwise it's custom coding that will solve this problem.
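If you go down the custom-coding route, the scoring itself is straightforward. A hedged sketch of what a pre-create plugin could evaluate (the attribute names are the standard contact fields, the weights are the ones from the question; none of this is out-of-the-box CRM functionality):

// Using Microsoft.Xrm.Sdk. "incoming" is the Target entity from the plugin context;
// "existing" is a candidate record retrieved by a query on e.g. last name or email.
static int DuplicateScore(Entity incoming, Entity existing)
{
    int score = 0;
    if (Same(incoming, existing, "firstname"))     score += 10;
    if (Same(incoming, existing, "lastname"))      score += 15;
    if (Same(incoming, existing, "emailaddress1")) score += 20;
    if (Same(incoming, existing, "mobilephone"))   score += 5;
    return score;
}
static bool Same(Entity a, Entity b, string attribute)
{
    var x = a.GetAttributeValue<string>(attribute);
    var y = b.GetAttributeValue<string>(attribute);
    return !string.IsNullOrEmpty(x) && string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
}
// In a pre-operation Create step: if any candidate scores >= 19, throw an
// InvalidPluginExecutionException to cancel the create (or flag the record for removal).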
I can think of the following approaches to the task (depending on the number of records, how repetitive the import is, the automation requirements, etc.); they may all be good in some way. Would you care to elaborate on the current conditions?
Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.
Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel it's creation (or mark for removal).
Workflow. Run a process removing any instance calculated as a duplicate.
You also need to consider the implications of such elimination of data. There's a mathematical issue: the result depends on the order of processing. Suppose that the uniqueness radius (i.e. the threshold in this 1D case) is 3, and consider the following set of numbers (listed twice, just in different orders).
1 3 5 7 -> 1 _ 5 _
3 1 5 7 -> _ 3 _ 7
Are you sure that's the intended result? Under some circumstances, you can even end up with result sets of different sizes, depending only on the order. I'm a bit curious why and how this setup came up.
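A tiny sketch of that order dependence (greedy elimination with threshold 3; purely illustrative):

// Requires System.Linq. Keep a value only if it is at least `threshold` away from everything kept so far.
static List<int> Deduplicate(IEnumerable<int> values, int threshold)
{
    var kept = new List<int>();
    foreach (var v in values)
        if (kept.All(k => Math.Abs(k - v) >= threshold))
            kept.Add(v);
    return kept;
}
// Deduplicate(new[] { 1, 3, 5, 7 }, 3) -> { 1, 5 }
// Deduplicate(new[] { 3, 1, 5, 7 }, 3) -> { 3, 7 }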
Personally, I'd go with the plugin, if the above is OK by you. If you need to make sure that some of the unique-ish elements never get omitted, you'd probably be best off applying a test algorithm to a backup of the data. However, that may defeat its purpose.
In fact, it sounds so interesting that I might create the solution for you (just to show it can be done) and blog about it. What's the deadline?
