Clarify different behaviour of neo4j and neo4j-driver

I have a rudimentary database with just a couple of nodes and relationships. When I run a match (n) return n command in the local web client provided with neo4j, it returns all the nodes and relationships that are in the database.
However, when I run exactly the same command in a node.js project using the neo4j-driver module, it returns only the three nodes and neither of the two relationships.
After a little bit of fiddling with it, I noticed that to retrieve the relationships too, I must issue something like match (n)-[r]-(m) return *. My first question is why is there such a difference? Is the local web client just trying to do a bit more to help the user?
Furthermore, I find the returned records object a little confusing. Running this match (n)-[r]-(m) return * command returns 4 items in the result.records object, which form two almost identical pairs. In a simplified view this is what it returns:
item 0: [Jack node, Jill node, Jack -> Jill relationship]
item 1: [Jill node, Jack node, Jack -> Jill relationship]
item 2: [George node, Jill node, George -> Jill relationship]
item 3: [Jill node, George node, George -> Jill relationship]
So items 0 and 1 of the result.records object differ only by the order of their elements. Same for items 2 and 3.
Question two is what am I supposed to do with this if I want to display the graph on a web page? Look for unique IDs of the nodes and relationships in all the different combinations returned?
Question three: maybe there's a better way to achieve what I'm trying to do?

The Neo4j web browser is indeed just trying to be helpful: the visualization will connect nodes if they have relationships between them (there's an option to turn this behaviour off, by the way). However, the result set will not contain those relationships if you didn't ask for them (as it shouldn't). Look at the other response tabs (table, text, code) in the browser to see the actual result set.
This query may help you:
match p=(n)-[r]-(m) return p
But yes, you are correct, you will have to unpack the result in your application in order to do your own interpretation. It is a case of the "you get what you asked for" problem that quite a few Neo4j users face. It is due to the fact that Cypher can return quite a few different things (tabular results, nodes, nodes and relationships, paths, subgraphs, ...) and the driver has to provide for all of them.
Have a look at the code tab in the browser to get a feel of what your application will have to work with (what you actually get depends on your application language of choice). It's not very difficult but it does require a bit of getting used to.
Hope this helps.
Regards, Tom
P.S. Duplicates in the results are to be expected with such generic queries. Neo4j does pattern matching, and your pattern has no direction on the relationship, nor does it have labels or relationship types. That's going to return quite a few matches: for example, it matches (jill)-[:nominated]-(jack), but it also matches (jack)-[:nominated]-(jill). Both match the pattern. Using DISTINCT may help a bit, but you really should be more explicit in the pattern.
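To draw the graph from the match (n)-[r]-(m) return * result, one option is to collect the nodes and relationships into maps keyed by their IDs, so each one appears once no matter how many records mention it. The sketch below uses plain Python dictionaries as stand-ins for the driver's records; the field names (id, start, end) and the ID values are assumptions for illustration, not the driver's actual API.

```python
# Hypothetical stand-ins for the records in the question: each record
# holds two nodes and one relationship.
jack = {"id": 1, "name": "Jack"}
jill = {"id": 2, "name": "Jill"}
george = {"id": 3, "name": "George"}
jack_jill = {"id": 10, "start": 1, "end": 2}
george_jill = {"id": 11, "start": 3, "end": 2}

records = [
    {"n": jack, "m": jill, "r": jack_jill},
    {"n": jill, "m": jack, "r": jack_jill},
    {"n": george, "m": jill, "r": george_jill},
    {"n": jill, "m": george, "r": george_jill},
]


def collect_graph(records):
    """Return (nodes, relationships) dicts keyed by ID, merging duplicates."""
    nodes, rels = {}, {}
    for record in records:
        for key in ("n", "m"):
            node = record[key]
            nodes[node["id"]] = node
        rel = record["r"]
        rels[rel["id"]] = rel
    return nodes, rels


nodes, rels = collect_graph(records)
print(len(nodes), "nodes,", len(rels), "relationships")
```

The same idea carries over to any driver: whatever the record shape is, keying by the node and relationship identifiers is what removes the mirrored rows.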

How to handle too long Gherkin scenario lines?

I have some scenarios with too many parameters, and most of the parameters cause variations of the scenarios. Therefore, I need to include parameter details in the scenario name to give insight into the scenario. However, this makes the Scenario lines too long.
For Example:
Scenario: Create list for Today's unique stuff of 'X' item with multiple string attribute values and 'distinct count' aggregation
Given I create a 'Create List' request and name as 'New List'
When I add 'X' item to 'Create List' request
And I add item attribute to current list query on list preview request
| attribute | operator | values |
| id | EXMATCH | id1,id2 |
And I add list aggregation to current list query on 'Create List' request
| aggField | aggType |
| stuff | DISTINCT_COUNT |
And I send request to 'Create List' request date as 'TODAY'
Then 'success' parameter in response should be true
And received list name should be equal to created list name
And received list queries in 'Create List' response should be equal to created list queries
Another Scenario:
Scenario: Create list for Today's unique stuff of 'X' item with multiple integer attribute values and 'sum' aggregation
Or:
Scenario: Create list for Today's unique stuff of 'X' item with multiple integer attribute values, 'sum' aggregation and <some other parameter related conditions which causes too long scenario name>
This can go on and on, depending on the number of different parameters that affect the scenario.
I have a feeling that there must be best practices for writing clearer and shorter scenario names. Are there any?
How should I handle these long scenario names? Or can I find an easier/shorter way to express the content of a scenario?

Cucumber allows you to use natural language (as opposed to a programming language) to write your scenarios. You can use all the tools of natural language to simplify your scenarios.
The two most powerful tools to simplify are
abstraction
naming
These tools work hand in hand. With abstraction you take something with a lot of details and abstract it into something simpler that removes the details. You use naming to give this new thing a name. If your name is good, you can now talk about your complex thing using your new simple term and no longer have to talk about the details.
To make your scenarios simpler you need to abstract, remove the details, and give things good names. The way to do this is to read your scenario and differentiate between WHAT you are doing and HOW you are doing it. Then have your scenarios focus only on saying WHAT they are doing and not saying anything about HOW they are doing it.
One additional tool when thinking about WHAT something is doing is to also think about WHY someone is doing the thing. So let's have a look at your scenario and formulate a few questions.
We do this all the time in computing. Every time we write a method/function we are abstracting and naming. We do this even more often in real life. When you order a coffee you don't say
"I'd like a double espresso in a warmed cup with 3oz of milk wet
foamed at 60C poured with a swan pattern"
You say
"I'd like a flat white"
And of course a double espresso is just another abstraction for a set of instructions that talks about water temperature, number of grammes of coffee, grind settings (extra fine), pressure of water etc. etc.
By using abstraction and naming we can talk eloquently about coffee with all its complexity without mentioning any of the details.
So what is the 'flat white' or 'double espresso' for your scenario?
Your scenario seems to be about creating some sort of a list.
WHAT sort of list is this?
WHY are you creating it?
WHAT will people use this list for?
WHY will people find this list useful?
Once you have asked and answered these questions you can start thinking about how to simplify. When you answer these questions, use the answers to
name your feature
describe your feature
write a preamble for your feature (the bit between Feature and the first Scenario)
write your Scenario titles
You shouldn't start writing a scenario until you have all of this done, and have a Feature that tells you WHAT your scenarios are going to be about and WHY it's important for you to do these things.
You also talk about the parameters you are adding causing a variation in the scenarios. So for each parameter you are adding you should be asking
WHAT sort of variation does this parameter cause?
WHY is this variation important? Is it important enough to have its own scenario?
Again think about sets of parameters creating named things like a mocha, cortado or latte.
When you have answered these questions you can remove the parameters from your scenarios. For each set of parameters that creates a variation, you can remove the parameters by abstracting and giving a name to the variation.
If you apply this approach and answer these questions, then you transform your scenarios into something much simpler.

Shaping API endpoints and response

Hello Stackoverflow,
I've been writing APIs for quite a while now, and I've come to work on one of the bigger ones. I started wondering how to shape this API, as I've often seen on bigger platforms that one big entity (for example a product page in a shop) is loaded in separate parts (we can see that the item body has loaded, but comments are still being fetched, etc.).
Usually I attached comments as a relation in the SQL query, so my frontend queried a single API endpoint like:
http://api.example.com/items/:id
And it returned all necessary data like seller info, photos etc.
Logically, seller info and photos are small pieces of data (an item can have only 1 seller and no more than 10 photos, for example), but the comments might be a much larger collection, each with its own relationship (the comment author).
Does it make sense to split one endpoint into 2 independent endpoints like:
http://api.example.com/items/:id
http://api.example.com/items/:id/comments
What are the downsides of this approach? Is it common practice? Or maybe I have misunderstood some concept?
One downside might be that 2 requests are performed. On the other hand, the first endpoint should return data faster (as it's lighter than fetching n comments), so the page can be displayed sooner, with a spinner shown for the comments section. This way I'll be able to paginate comments too.
Are there any improvements that could be made to this separation of endpoints? Or maybe I'm totally wrong and it should be done in a totally different way?

I think it is a good approach if:
The number of comments on one item can be large, because with this approach you can paginate them more easily.
You are going to need to access the comments of one item without needing the rest of the item information.
I think either of these conditions justifies this decision, and yes, it is a common approach.
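As a rough illustration of the pagination point, a separate comments endpoint could return one page at a time together with the total count. The sketch below is framework-free Python; the function name, the page and per_page parameters, and the response shape are all assumptions rather than a prescribed API.

```python
def paginate_comments(comments, page=1, per_page=20):
    """Return one page of comments plus the metadata a client needs to fetch more."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    total = len(comments)
    return {
        "items": comments[start:start + per_page],
        "page": page,
        "per_page": per_page,
        "total": total,
        "has_next": start + per_page < total,
    }


# Hypothetical data: 45 comments belonging to one item.
all_comments = [{"id": i, "text": f"comment {i}"} for i in range(1, 46)]
first = paginate_comments(all_comments, page=1, per_page=20)
last = paginate_comments(all_comments, page=3, per_page=20)
```

In a real service the slicing would normally be pushed down into the database query rather than done in memory, so that only the requested page is ever loaded.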

Union find in python3

I know how to implement union find in general, but I was thinking of whether there would be a way to utilize the set structure in python to achieve the same result.
For example, we can union sets pretty easily. But I'm not sure how to determine if two elements are in the same set using just sets.
So, I am wondering if there is a data structure in python that would support such operation, other than the usual implementation?

You could always solve this problem by visualizing it as a tree whose nodes connect to each other via the root, and then climbing the tree when you want to know whether two nodes are connected. If the two nodes you are comparing have the same root (they are in the same tree), then they are connected.
To connect two nodes, just go to the root of each tree they are in, and make one root the parent of the other.
This video will give you a great intuition about it:
https://www.youtube.com/watch?v=YIFWCpquoS8&list=PLUX6FBiUa2g4YWs6HkkCpXL6ru02i7y3Q&index=1
The connection between the tree nodes can be made via pointers in a language that supports them, but if yours does not expose them directly (as in Python), then you can create your own pointers by storing positions and links in an array.
The array is laid out so that its positions represent your nodes, and the value at each position represents that node's link towards its root. In the beginning, each position holds its own node number, because the nodes initially have no parent; as you connect nodes, the roots change, and the array has to reflect this. In effect, the value stored there is the identifier of the node's parent, and following these values leads to the root.
But try approaching the problem visually first instead of thinking of arrays and too many mathematical artifacts. Dealing with it visually makes the solution seem almost trivial, and can be a good guide while writing code.
I say this because I watched the video by Robert Sedgewick that I just posted, with a graphical simulation of the solution, and implemented it myself without paying too much attention to the code in his book. The intuition the video gave me is much more valuable than any mathematics.
It will help you to encapsulate the nodes into a class, with the following methods:
climbTreeFromNodeUpToRoot
setNewParentToThisNodeAndUpdateHeights
The first method, as the name says, starts from a node and goes up the tree until it finds the root, which is then returned.
If you compare two nodes with this method (actually, the roots returned by it), you can easily tell whether they are connected by just comparing their roots.
When you want to connect them, you go up the trees of both nodes, and ask one root to take the other one as its parent.
The trees can grow very big in height (sorry, I don't use the official nomenclature, but this is the one that makes sense to me), so this simple approach will get very slow when you have to climb the tree at a later time.
To prevent trees from becoming too high, don't just set one root as the parent of another without any criterion; attach the smaller tree (in terms of height, not number of elements) to the taller one.
For this, you need to know the height of each tree, and you can store this information on its root (via an extra array in your case, or an extra field on each node in other languages). This information should be updated every time another tree connects to it.
It is not possible for a tree to know that it just got a new tree attached to it, so it's important that every tree attaching to a second one tells the second to update its height.
This information can be sent to the root of the second tree, and later used to judge (as written before) which tree is the smaller. Remember, attaching a small tree to a big one instead of the opposite will save you incredible amounts of time.
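The description above can be sketched in Python roughly as follows. This is a minimal illustration, not the code from the book or video: the class and method names are my own (shorter than the two method names suggested earlier), and it assumes the nodes are numbered 0 to size - 1 so they can serve as array positions.

```python
class UnionFind:
    """Disjoint sets stored as trees in two arrays: parent links and heights."""

    def __init__(self, size):
        # Every node starts as its own root, so parent[i] == i.
        self.parent = list(range(size))
        # Height of the tree rooted at each node; only meaningful for roots.
        self.height = [0] * size

    def find(self, node):
        """Climb from node up to the root of its tree and return the root."""
        while self.parent[node] != node:
            node = self.parent[node]
        return node

    def connected(self, a, b):
        """Two nodes are connected when they share the same root."""
        return self.find(a) == self.find(b)

    def union(self, a, b):
        """Join the trees of a and b, attaching the shorter under the taller."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        if self.height[root_a] < self.height[root_b]:
            root_a, root_b = root_b, root_a
        # The height grows only when both trees were equally tall.
        if self.height[root_a] == self.height[root_b]:
            self.height[root_a] += 1
        self.parent[root_b] = root_a


uf = UnionFind(6)
uf.union(0, 1)
uf.union(2, 3)
uf.union(1, 3)
```

After these three unions, nodes 0 to 3 share one root while 4 and 5 are still on their own, which uf.connected can confirm.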

Do you want something like this?
myset = ...
all(elt in myset for elt in (a,b))
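If the goal is to stay with Python's built-in sets, one possible sketch is to keep a dictionary that maps every element to the set object it currently belongs to. Two elements are then in the same set exactly when they map to the same object. The function names here are hypothetical, and this is a swapped-in technique (merging the smaller set into the larger), not the tree-based union find.

```python
def make_sets(elements):
    """Map every element to its own singleton set."""
    return {element: {element} for element in elements}


def same_set(set_of, a, b):
    """Two elements are connected when they map to the very same set object."""
    return set_of[a] is set_of[b]


def union(set_of, a, b):
    """Merge the sets containing a and b, moving the smaller into the larger."""
    small, large = set_of[a], set_of[b]
    if small is large:
        return
    if len(small) > len(large):
        small, large = large, small
    large |= small  # in-place update of the larger set
    for element in small:
        set_of[element] = large  # repoint the moved elements


set_of = make_sets("abcde")
union(set_of, "a", "b")
union(set_of, "c", "d")
union(set_of, "b", "c")
```

The membership check is a constant-time dictionary lookup; the cost is in union, which has to repoint every element of the smaller set.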

Dynamics CRM 2011 Import Data Duplication Rules

I have a requirement in which I need to import data from excel (CSV) to Dynamics CRM regularly.
Instead of using simple Data Duplication Rules, I need to implement a point system to determine whether a record is considered a duplicate or not.
Let me give an example. For example these are the particular rules for Import:
First Name, exact match, 10 pts
Last Name, exact match, 15 pts
Email, exact match, 20 pts
Mobile Phone, exact match, 5 pts
And then the Threshold value => 19 pts
Now, if a record has a First Name and Last Name that match an old record in the entity, the points will be 25 pts, which is higher than the threshold (19 pts); therefore the record is considered a Duplicate.
If, for example, the record only has the same First Name and Mobile Phone, the points will be 15 pts, which is lower than the threshold, and the record is thus considered a Non-Duplicate.
What is the best approach to achieve this requirement? Is it possible to utilize the default functionality of Import Data in the MS CRM? Is there any 3rd party Add-on that answer my requirement above?
Thank you for all the help.
Updated
Hi Konrad, thank you for your suggestions, let me elaborate here:
Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.
Nice one, but I don't think it is really workable in my case: the data will be coming regularly from the client in moderate numbers (hundreds to thousands of records). Typically the client won't check the data for duplicates.
Workflow. Run a process removing any instance calculated as a duplicate.
Workflow is a good idea. However, since it is processed asynchronously, my concern is that in some cases the user may already have made updates/changes to the inserted data before the workflow finishes, creating data inconsistency or at the very least a confusing user experience.
Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel its creation (or mark for removal).
I like this approach. So I just import as usual (for example, to the contact entity), but I already have a plugin in place that is triggered every time a record is created; the plugin will check whether the record is duplicate-ish or not and take the necessary action.

I haven't been fiddling a lot with duplicate detection, but looking at your criteria you might be able to make rules that match them: pretty much three rules to cover your cases, namely a full name match, a last name and mobile phone match, and an email match.
If you want to do the points system, I haven't seen any out-of-the-box components that solve this. However, CRM Extensions have a product called Import Manager that might have that kind of duplicate detection. They claim to have customized duplicate checking. It might be worth asking them about this.
Otherwise it's custom coding that will solve this problem.
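To give an idea of how small the scoring logic itself is, here is a minimal sketch in plain Python (not CRM SDK code). The field names are hypothetical, the points come from the example rules in the question, and it assumes that a score at or above the threshold counts as a duplicate.

```python
# Points per field, taken from the example rules in the question.
FIELD_POINTS = {
    "first_name": 10,
    "last_name": 15,
    "email": 20,
    "mobile_phone": 5,
}
THRESHOLD = 19  # assumption: a score at or above this counts as a duplicate


def duplicate_score(new_record, existing_record):
    """Sum the points of every field whose values match exactly."""
    return sum(
        points
        for field, points in FIELD_POINTS.items()
        if new_record.get(field) is not None
        and new_record.get(field) == existing_record.get(field)
    )


def is_duplicate(new_record, existing_records):
    """True when any existing record reaches the threshold."""
    return any(
        duplicate_score(new_record, existing) >= THRESHOLD
        for existing in existing_records
    )


# Hypothetical sample data.
existing = [{"first_name": "Jack", "last_name": "Smith",
             "email": "jack@example.com", "mobile_phone": "555-0100"}]
same_name = {"first_name": "Jack", "last_name": "Smith",
             "email": "js@example.com", "mobile_phone": "555-0199"}
same_first_and_phone = {"first_name": "Jack", "last_name": "Jones",
                        "email": "jj@example.com", "mobile_phone": "555-0100"}
```

Here same_name scores 25 against the existing record and would be flagged, while same_first_and_phone scores 15 and would not. Inside CRM the same comparison would have to run against records fetched from the entity, which is where the real cost lies.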

I can think of the following approaches to the task. Depending on the number of records, how often the import is repeated, the automation requirements etc., any of them may be suitable. Would you care to elaborate on the current conditions?
Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.
Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel its creation (or mark for removal).
Workflow. Run a process removing any instance calculated as a duplicate.
You also need to consider the implication of such elimination of data. There's a mathematical issue. Suppose that the uniqueness' radius (i.e. the threshold in this 1D case) is 3. Consider the following set of numbers (it's listed twice, just in different order).
1 3 5 7 -> 1 _ 5 _
3 1 5 7 -> _ 3 _ 7
Are you sure that's the intended result? Under some circumstances, you can even end up with sets of records of different sizes (depending only on the order). I'm a bit curious about why and how the setup came up.
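The order dependence in the two rows above can be reproduced with a few lines of Python. This is only a sketch of one plausible elimination rule (keep a value if it is at least the radius away from everything kept so far); the function name is made up for the illustration.

```python
def filter_unique(values, radius):
    """Keep a value only if it is at least `radius` away from every value kept so far."""
    kept = []
    for value in values:
        if all(abs(value - other) >= radius for other in kept):
            kept.append(value)
    return kept


print(filter_unique([1, 3, 5, 7], radius=3))  # [1, 5]
print(filter_unique([3, 1, 5, 7], radius=3))  # [3, 7]
```

The same rule also shows the size effect: the input 1 3 5 keeps 1 and 5, while 3 1 5 keeps only 3.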
Personally, I'd go with the plugin, if the above is OK by you. If you need to make sure that some of the unique-ish elements never get omitted, you'd probably be best off applying a test algorithm to a backup of the data. However, that may defeat its purpose.
In fact, it sounds so interesting that I might create the solution for you (just to show it can be done) and blog about it. What's the deadline?

Freebase: Format search result to list all properties of object of unknown type(s)

I'm trying to write a MQL query to format a search result in freebase (the "output" parameter in the search API). I essentially want to find the (simple) values of all the properties of a given search result (without knowing anything about the types of the result a priori). By "simple", I mean only the default properties if the values are complex objects.
E.g., if I search for "Yo La Tengo" and this takes me to the result for "/en/yo_la_tengo", I want to be able to get the group's members (I just need names, not instruments or dates started), albums (again, just names), films contributed to (again, just names), etc.
Is there a simple way to do this with a search output query, given that I know nothing about the types? I imagine there's some sort of reflection magic I can use, and I've tried mucking about with "/type/reflect", but I'm not getting anywhere. I'm brand-new to MQL (though I have extensive SQL experience), so this is a little daunting. Any ideas?
Edit: So to clarify, I think the problem I'm seeing is due to mediator types like "performance" (an actor in a film) or "marriage". E.g., with a query about Yo La Tengo, I can see most (all?) information that I'm interested in, but with a similar query about [The Muppet Movie]( freebase.com/api/service/search?limit=1&mql_output=%5B%7B%22%2Ftype%2Freflect%2Fany_reverse%22%3A%5B%7B%7D%5D%2C%22%2Ftype%2Freflect%2Fany_master%22%3A%5B%7B%7D%5D%2C%22%2Ftype%2Freflect%2Fany_value%22%3A%5B%7B%7D%5D%7D%5D&query=The%20Muppet%20Movie -- sorry, SO thinks I'm a spammer so I can't make this a link), I don't see any reference to Frank Oz at all (probably because his performance is referenced instead). Is there a generic way for me to "follow" mediator types to get all their properties? E.g., is there a single output MQL that would allow me to get the actor in a performance (when linked from a film search result) and give the spouse in a marriage (when linked from a person)?

Querying not only every property, but then following those properties another ply deep in the graph for all search results is going to be an incredibly expensive operation. What is the use case for this? Do you really have a UI where the user can see and effectively absorb all this information? To answer the question directly though, it's not possible to unpack mediator types automatically using mql_output on the search API.
I'd suggest combining a basic set of information on the search query with a deeper set of information on a topic that the user has expressed interest (e.g. by hovering over). This UI experience would be similar to that of Freebase Suggest.
In the years since the question was originally asked there have been some additional useful things added such as the "notable" pseudo-property which lets you see what the topic is notable for.
Of course everyone also needs to be moving to the new API, so the queries would be:
https://www.googleapis.com/freebase/v1/search?query=%22the%20muppet%20movie%22&limit=1&indent=true
https://www.googleapis.com/freebase/v1/topic/en/the_muppet_movie

AFAIK there is no way to do this in outright MQL, but you can:
Get all the properties of an object or type of object, then
Programmatically construct another MQL query to get those objects you want to know more about.
Look at this example:
[{
"type|=": [
"/film/actor",
"/tv/tv_actor",
"/celebrities/celebrity"
],
"*": [{}]
}]
It grabs all the properties of all objects that have the type actor, tv_actor, or celebrity. When you run it, you'll see all the possible "follow" points you can explore.
This is not exactly what you want, but it should get you closer.
