How to find repeating sub-pattern in path traversing graph neo4j - node.js

To start things off: I'm using a Node server to hit the REST API. Is it possible to access the Java CostEvaluator interface from Cypher or the REST API? I'm unwilling to write this in Java (honestly, lol).
I've reviewed the docs pretty extensively, and I can't even find a place where Cypher's allShortestPaths is documented. It'd be wonderful to know what arguments it can take, so if you know where I can read about those, please let me know in the comments.
Moving on:
The graph is formed of patterns such as this: sub_path = (start:STEPNODE)<-[cost:COST_REL {cost}]-(axis_node:AXISNODE)-[:TRANSIT_TO]->(end:STEPNODE). There are only 14 (:STEPNODE), which helps constrain things a bit... but there may be many thousands of (:AXISNODE)s. Each (:AXISNODE) has a unique relationship with any (:STEPNODE) pair.
sub_path is a repeating pattern; in pseudo-notation: (node1:STEPNODE)-{sub_path_1_to_2 {cost}}-(node2:STEPNODE)-{sub_path_2_to_3 {cost}}-(node3:STEPNODE)-{sub_path_3_to_4 {cost}}-(node4:STEPNODE), but (node1:STEPNODE)-{sub_path_1_to_4 {cost}}-(node4:STEPNODE) may also exist.
Each sub path has an important {cost} which is measured on the [COST_REL] relationship. I'd like to find the least expensive path from (begin:STEPNODE)-[sub_path*1..5]->(end:STEPNODE) where the total cost is the sum of all {cost}. I see this as being a matter of finding the sum of sub_path.cost, but I haven't yet found a strategy that supports passing a sub_path or cost argument to allShortestPaths() or any similar function with Cypher. On the Algo Endpoint of the API, there's an argument for dijkstra with a cost_property but this endpoint doesn't seem to support passing a sub_path argument.
I'm fine with a sub-optimal solution, but I would really like to avoid making more than a few API calls to find a cheap route through the graph.
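For reference, the closest I've come is brute-forcing it in plain Cypher: summing the costs with reduce() over a variable-length match and ordering by the total. A sketch only (the name property, hop limit, and STEPNODE names are made up), fired from Node at the legacy /db/data/cypher endpoint; *..10 allows up to 5 sub_paths of 2 relationships each:

    // Sketch: cheapest route by enumerating candidate paths in Cypher.
    // Assumes the legacy /db/data/cypher endpoint and a hypothetical
    // `name` property identifying the STEPNODEs.
    var http = require('http');

    var query = [
      // Undirected var-length match, since COST_REL points into the step
      // node while TRANSIT_TO points out of the axis node.
      'MATCH p = (begin:STEPNODE {name: {beginName}})' +
        '-[:COST_REL|:TRANSIT_TO*..10]-(end:STEPNODE {name: {endName}})',
      // TRANSIT_TO carries no cost, so coalesce it to 0.
      'WITH p, reduce(total = 0, r IN relationships(p) | total + coalesce(r.cost, 0)) AS totalCost',
      'RETURN p, totalCost ORDER BY totalCost ASC LIMIT 1'
    ].join('\n');

    var body = JSON.stringify({
      query: query,
      params: { beginName: 'A', endName: 'B' }
    });

    var req = http.request({
      host: 'localhost',
      port: 7474,
      path: '/db/data/cypher',
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'Accept': 'application/json' }
    }, function (res) {
      var chunks = [];
      res.on('data', function (c) { chunks.push(c); });
      res.on('end', function () { console.log(chunks.join('')); });
    });

    req.write(body);
    req.end();

This enumerates every candidate path before sorting, so I expect it to fall over with thousands of (:AXISNODE)s, which is exactly why I'm hoping for a way at the CostEvaluator or dijkstra's cost_property instead.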

REST API-server dilemma

I plan to build a server that will serve a REST API, on Node.js with Meteor.
What are the differences between these two methods of writing an API:
1. http://meteorpedia.com/read/REST_API (example: someSrver.com/post/:_id)
2. someSrver.com/post?id=_id
Thanks
I think there's no real difference, as long as you are handling requests correctly.
I think the second style suits better if you have to pass the server a lot of parameters, for example a service that provides high-res images, where you can specify tile size, coordinates and other things like that.
If you use the API as an interface to a database, the first style is usually used.
I've developed a REST interface using this library: https://atmospherejs.com/nimble/restivus . It's very easy to use and it uses the first style.
So, you should read up on REST principles and API design if you want to really understand the why's.
But generally, the rule of thumb is that a URL should represent a resource. The paths generally represent a given "thing" and the querystring represents some kind of filter on that "thing".
So if you "post" object is logically its own entity (e.g.a blog post), then it'd have its own unique url such that a GET to www.example.com/posts/:id would return the one specific blog entry you're talking about.
GET /posts would map to a list of all posts, for example, and GET /posts?tagged=cheetahs would get you a list of all posts filtered to return just those with the tag of 'cheetahs' assigned to them.
This is all rules of thumb and standards. The implementation really doesn't matter and most servers don't care; but there is value in following the standards as they tend to be more maintainable, elegant, and help you not have to make a million design decisions. If you ever want others to integrate with you, it makes it easier for them to know what to expect as well.
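To make that concrete, here's roughly what those two routes look like in, say, Express (the find* helpers are hypothetical stand-ins for whatever data access you have):

    // GET /posts/:id -> one specific post (the path identifies the resource)
    // GET /posts     -> the collection, optionally filtered by the querystring
    var express = require('express');
    var app = express();

    app.get('/posts/:id', function (req, res) {
      // req.params.id comes from the path segment
      res.json(findPostById(req.params.id)); // findPostById is a stand-in
    });

    app.get('/posts', function (req, res) {
      // req.query.tagged comes from ?tagged=cheetahs
      var posts = req.query.tagged ? findPostsByTag(req.query.tagged)
                                   : findAllPosts(); // both stand-ins
      res.json(posts);
    });

    app.listen(3000);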
According to the URI standard, the query is for non-hierarchical filters, while the path is for hierarchical ones.
I would use the query string if we are talking about a filtered collection, so the result will be a representation of a collection, for example a JSON array []. On the other hand, if we are talking about a single item, then I would use the path, and the JSON would be an object/item {}. But this is only my own style; you can use whichever you prefer. (The URI structure has only routing purposes if you use REST with HATEOAS. I assume you don't.)
In REST the URL/URI is the address of an item or a collection of items. So, to get all addresses for customer 2 you could do this:
/api/customer/2/addresses
If, however, you wanted just those addresses with a postcode, you could go:
/api/customer/2/addresses?withPostcode=1
In this case, the first URL/URI represents a thing/things whereas the second has a modifier, restriction or filter applied to it.
Therefore someSrver.com/post/:_id means get me the post which is known by that ID (though ideally it would be someSrver.com/posts/:_id - note the plural). Whereas the second one (someSrver.com/post?id=_id) implies that everything to the left of the question mark has already narrowed down your thing/things and they now need filtering by an ID property (in this case) on the thing.
It's a subtle distinction in many cases, but I'd sum it up as the first applying a selector/location and the second applying a selector/location with a filter.
Although I haven't implemented a REST API server in Node yet, I want to share a few important points for when you design your server:
Try to use flat paths for the controllers; nested paths cause confusion.
Avoid the less widely supported HTTP methods such as PUT, HEAD and PATCH; not all firewalls like them.
Use applicative error codes with HTTP 200 OK, and reserve the HTTP protocol error codes for actual HTTP errors (sketched below).
See more: http://restafar.com/create-new-rest-server/
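To illustrate the applicative-error-code point: the idea is an envelope like the following (the field names here are just an example), returned with HTTP status 200 even on an application-level failure:

    {
        "status": "error",
        "errorCode": "POST_NOT_FOUND",
        "message": "No post with id 42",
        "data": null
    }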

Simple Project in ServiceStack

I'm really struggling with the examples and documentation on ServiceStack. I want to do something really simple but none of the examples given seem to map exactly on what I need. I'm also thrown by the new API section on the website and whether that renders the rest of the (basic) documentation obsolete.
I'm just trying to wrap a number of database entities in a service that exposes CRUD REST and SOAP endpoints (need to retain some SOAP support for use by legacy clients/applications).
Let's call these entities
x: id, description
y: id, name
(they are not related in any way - think I can cope with related ones once I get my head around the very basics)
So I've built a solution:
MyAPI
    Global.asax
    Web.config
MyAPI.Logic
    DB access code?
MyAPI.ServiceInterface
    MyAPIService.cs?
MyAPI.ServiceModel
    Operations
        x.cs
        x.Response
        y.cs
        y.Response
    Types
        (Don't think I need this, but I like to over-engineer my early projects to make future changes easier.)
Hopefully this seems sensible
Given the very basic format of entity x, what is the best way to structure x.cs and MyAPIService.cs (I assume entity y would just be treated the same) to achieve basic CRUD operations for both REST and SOAP?
Small point but can I implement two GETs - one that passes in an id (and returns a specific x) and one that doesn't receive an id (and returns a list of all x's)?
I've looked at every link on Stack Overflow and servicestack.net already, so please no pointers to those. I think I'm just missing the point of the existing documentation!
Many Thanks in Advance
Andy

what algorithm does freebase use to match by name?

I'm trying to build a local version of the Freebase search API using their quad dumps. I'm wondering what algorithm they use to match names. As an example, if you go to freebase.com and type in "Hiking" you get:
"Apo Hiking Society"
"Hiking"
"Hiking Georgia"
"Hiking Virginia's national forests"
"Hiking trail"
Wow, a lot of guesses! I hope I don't muddy the waters too much by not guessing too.
The auto-complete box is basically powered by Freebase Suggest which is powered, in turn, by the Freebase Search service. Strings which are indexed by the search service for matching include: 1) the name, 2) all aliases in the given language, 3) link anchor text from the associated Wikipedia articles and 4) identifiers (called keys by Freebase), which includes things like Wikipedia article titles (and redirects).
How the various things are weighted/boosted hasn't been disclosed, but you can get a feel for things by playing with it for a while. As you can see from the API, there's also the ability to do filtering/weighting by types and other criteria, and this can come into play depending on the context. For example, if you're adding a record label to an album, topics which are typed as record labels will get a boost relative to things which aren't (but you can still get to things of other types, to allow for the use case where your target topic hasn't had the appropriate type applied yet).
So that gives you a little insight into how their service works, but why not build a search service that does what you need since you're starting from scratch anyway?
BTW, pre-Google, the Metaweb search implementation was built on top of Lucene, so you could definitely do worse than using that as your starting point. You can read some of the details in the mailing list archive.
Probably they use an inverted index over selected fields, such as the English name, aliases and the Wikipedia snippet displayed. In your application you can achieve that using something like Lucene.
For the algorithm side, I find the following paper a good overview
Zobel and Moffat (2006): "Inverted Files for Text Search Engines".
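If you just want the flavor of it without pulling in Lucene, a toy inverted index is only a few lines (a plain JavaScript sketch; a real engine adds tokenization, stemming, ranking and so on):

    // Toy inverted index: term -> set of document ids.
    function buildIndex(docs) {
      var index = {};
      docs.forEach(function (doc) {
        doc.text.toLowerCase().split(/\W+/).forEach(function (term) {
          if (!term) return;
          (index[term] = index[term] || {})[doc.id] = true;
        });
      });
      return index;
    }

    function search(index, term) {
      return Object.keys(index[term.toLowerCase()] || {});
    }

    var index = buildIndex([
      { id: 1, text: 'Apo Hiking Society' },
      { id: 2, text: 'Hiking trail' },
      { id: 3, text: 'Hiking Georgia' }
    ]);
    console.log(search(index, 'hiking')); // -> ['1', '2', '3']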
Most likely it's a trie with lexicographical order.
There are a number of string-matching algorithms available: Boyer-Moore, Smith-Waterman-Gotoh, Knuth-Morris-Pratt, etc. You might also want to read up on edit-distance algorithms such as Levenshtein. You will need to play around to see which best suits your purpose.
An implementation of such algorithms is the SimMetrics library by the University of Sheffield.
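For the edit-distance side, Levenshtein is the classic dynamic-programming table; a sketch:

    // Levenshtein distance: minimum number of single-character
    // insertions, deletions and substitutions to turn a into b.
    function levenshtein(a, b) {
      var d = [];
      for (var i = 0; i <= a.length; i++) { d[i] = [i]; }
      for (var j = 0; j <= b.length; j++) { d[0][j] = j; }
      for (i = 1; i <= a.length; i++) {
        for (j = 1; j <= b.length; j++) {
          var cost = a[i - 1] === b[j - 1] ? 0 : 1;
          d[i][j] = Math.min(d[i - 1][j] + 1,         // deletion
                             d[i][j - 1] + 1,         // insertion
                             d[i - 1][j - 1] + cost); // substitution
        }
      }
      return d[a.length][b.length];
    }

    console.log(levenshtein('hiking', 'hikign')); // -> 2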

How to estimate search application's efficiency?

I hope it belongs here.
Can anyone please tell me whether there is any method for comparing different search applications working in the same domain with the same dataset?
The problem is they are quite different: one is a web application which looks up a database where items are grouped in categories, and the other is a rich client which searches by keywords.
Are there any standard test guides for that purpose?
There are testing methods. You may use e.g. precision/recall or the F-beta measure to compute a score that estimates the "efficiency". However, you need to build a reference set yourself, which means you will measure not the efficiency in the domain as such, but rather the efficiency relative to your own judgments.
All the more reason to make sure your reference set is representative of the data you have.
In most cases, common-sense judgments will be good enough to build such a set.
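Concretely, once you have a hand-built reference set of relevant items per query, the arithmetic is simple (a sketch; F-beta reduces to the usual F1 at beta = 1):

    // precision = relevant results returned / all results returned
    // recall    = relevant results returned / all relevant items
    function evaluate(returned, relevant, beta) {
      var relevantSet = {};
      relevant.forEach(function (id) { relevantSet[id] = true; });
      var hits = returned.filter(function (id) { return relevantSet[id]; }).length;
      var precision = returned.length ? hits / returned.length : 0;
      var recall = relevant.length ? hits / relevant.length : 0;
      var b2 = beta * beta;
      var fBeta = (precision + recall)
        ? (1 + b2) * precision * recall / (b2 * precision + recall)
        : 0;
      return { precision: precision, recall: recall, fBeta: fBeta };
    }

    // The engine returned 4 docs; 3 of them are in our 5-item reference set.
    console.log(evaluate(['d1', 'd2', 'd3', 'd9'], ['d1', 'd2', 'd3', 'd4', 'd5'], 1));
    // -> { precision: 0.75, recall: 0.6, fBeta: 0.666... }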
If you want to measure performance in terms of speed, you need to formulate a set of representative queries and run them against your search engine at a given rate. That's doable with any common load-testing tool.

Generic graphing and charting solutions

I'm looking for a generic charting solution (ideally not a hosted one) that provides the following features:
Charting a tuple of values where the values are:
1) A service identifier (e.g. CPU usage)
2) A client identifier within that service (e.g. server IP)
3) A value
4) A timestamp with millisecond/second resolution.
Optional:
I'd like to extend the concept of a client identifier further: taking the above example, I'd like to store statistics for each core separately, so another identifier would be Core 1/Core 2..
Now, to make sure I'm clearly stating my problem: I don't want a utility that collects these statistics. I'd like something that stores them, but even this is not mandatory; I can always store them in MySQL or the like.
What I'm looking for is something that takes values such as these and charts them nicely, in a multitude of ways (timelines, motion, and the usual ones [pie, bar..]). Essentially, a nice visualization package that allows me to make use of all this data. I'd be collecting data from multiple services and multiple applications, and the datapoints will be of varying resolution. Some of the data will include multiple layers of nesting, some none. (For example, CPU would go down to server IP and CPU#, whereas memory would only be server IP, but would include a different identifier, i.e. free/used/cached, as the "secondary" identifier. Something like average request latency might not have a secondary identifier at all, as in the case of ping.) What I'm trying to get across is that having multiple layers of identifiers would be great. To add one final example of where multiple identifiers would be great: adding an extra identifier on top of ip/cpu#, namely, process name. I think the advantages of that are obvious.
For some applications, we might collect data at a very narrow scope, focusing on every aspect, in other cases, it might be a more general statistic. When stuff goes wrong, both come in useful, the first to quickly say "something just went wrong", and the second to say "why?".
Further, it would be nice if the charting application threw out "bad" values: if for some reason our monitoring program started reporting 300% CPU used on a single core for 10 seconds, it'd be nice if the charts didn't reflect it in the long run. Some sort of smoothing, maybe? This could obviously be done at the data layer, though, so it's not a requirement at all.
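(To sketch the kind of thing I mean, at the data layer even something as crude as dropping samples that deviate wildly from a sliding-window median would do; the window size and threshold here are arbitrary:)

    // Crude outlier rejection: drop samples far from the median of the
    // samples in a trailing window.
    function dropOutliers(samples, windowSize, threshold) {
      return samples.filter(function (value, i) {
        var window = samples.slice(Math.max(0, i - windowSize), i + 1)
                            .sort(function (a, b) { return a - b; });
        var median = window[Math.floor(window.length / 2)];
        return Math.abs(value - median) <= threshold * Math.max(median, 1);
      });
    }

    // 40..43% CPU with a bogus 300% spike; the spike is dropped.
    console.log(dropOutliers([40, 42, 41, 300, 43, 39], 4, 1));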
Finally, being able to compare two points in time, or two different client identifiers of the same service, etc., without too much effort would be great.
I'm not partial to any specific language, although I'd prefer something in (one of the following) PHP, Python, C/C++, C#, as these are languages I'm familiar with. It doesn't have to be open source, it doesn't have to be a library, I'm open to using whatever fits my purpose the best.
More of a P.S. than a requirement: I'd like pretty charts that are easy for non-technical people to understand and act upon (and that they enjoy looking at!).
I'm open to clarifying, and, in advance, thanks for your time!
I am pretty sure that Protovis meets all your requirements, but it has a bit of a learning curve. You are meant to learn by example, and there are plenty of examples to work from. It makes some pretty nice graphs by default. Every value can be a function, so you can do things like filter out your "bad" values.
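For instance, the canonical Protovis bar-chart starter looks roughly like this (from memory, so treat it as a sketch); note that the height is a function of the datum, which is also where you could clamp or drop bad values:

    // A minimal Protovis bar chart; nearly every visual property can be
    // a function of the datum.
    var data = [1, 1.2, 1.7, 1.5, 0.7];

    var vis = new pv.Panel()
        .width(150)
        .height(150);

    vis.add(pv.Bar)
        .data(data)
        .width(20)
        .left(function () { return this.index * 25; })
        .bottom(0)
        .height(function (d) { return Math.min(d, 2) * 60; }); // clamp bad values

    vis.render();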
