I'm trying to do a self join, merge in a field from the parent, and get the results as separate documents.
data:
[
{"_key":"1","name":"a","mf":"xyz"},
{"_key":"2","name":"b","parent":"1"},
{"_key":"3","name":"c","parent":"1"},
{"_key":"4","name":"d","mf":"xyzw"},
{"_key":"5","name":"e","parent":"4"}
]
query:
for i in data
let o=i.parent>0 ? (for d in data filter i._key==d.parent return merge(d,{mf:i.mf})) : i
return o
expected result:
[
{"_key":"1","name":"a","mf":"xyz"},
{"_key":"2","name":"b","parent":"1","mf":"xyz"},
{"_key":"3","name":"c","parent":"1","mf":"xyz"},
{"_key":"4","name":"d","mf":"xyzw"},
{"_key":"5","name":"e","parent":"4","mf":"xyzw"}
]
Is this possible to do in ArangoDB?
The ArangoDB documentation has examples of how to join collections (self joins are covered further down the page).
In your particular case the query could look something like this:
for i in data
return i.parent == null ? i : MERGE(i, {
mf: (for j in data filter j._key == i.parent return j.mf)[0]
})
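To make the logic of that query easier to follow outside the database, here is a minimal sketch of the same self join in plain Python. It is only an illustration: the list of dicts stands in for the collection, and `inherit_mf` is a hypothetical helper name, not anything ArangoDB provides.

```python
# Sample documents copied from the question; plain dicts stand in for the collection.
data = [
    {"_key": "1", "name": "a", "mf": "xyz"},
    {"_key": "2", "name": "b", "parent": "1"},
    {"_key": "3", "name": "c", "parent": "1"},
    {"_key": "4", "name": "d", "mf": "xyzw"},
    {"_key": "5", "name": "e", "parent": "4"},
]


def inherit_mf(docs):
    """Return one document per input; children gain their parent's mf (hypothetical helper)."""
    by_key = {doc["_key"]: doc for doc in docs}  # plays the role of the inner FOR ... FILTER
    result = []
    for doc in docs:
        parent_key = doc.get("parent")
        if parent_key is None:
            result.append(doc)  # top-level documents pass through unchanged
            continue
        parent = by_key.get(parent_key)
        # A missing parent yields None, much like (...)[0] on an empty subquery result.
        mf = parent.get("mf") if parent else None
        result.append({**doc, "mf": mf})
    return result


merged = inherit_mf(data)
```

Each input document produces exactly one output document, which is the shape asked for in the question.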
I have a single JavaRDD, records.
I would like to create three JavaRDDs from records, depending on a condition:
JavaRDD<MyClass> records1 = records.filter(record -> "A".equals(record.getName()));
JavaRDD<MyClass> records2 = records.filter(record -> "B".equals(record.getName()));
JavaRDD<MyClass> records3 = records.filter(record -> "C".equals(record.getName()));
The problem is that, although the code above works, records may hold millions of entries and I don't want to scan them all three times.
So I want to do it in a single pass over the records.
I need something like this:
records.forEach(record -> {
    if ("A".equals(record.getName())) {
        records1(record);   // pseudocode: route the record to records1
    } else if ("B".equals(record.getName())) {
        records2(record);   // pseudocode: route the record to records2
    } else if ("C".equals(record.getName())) {
        records3(record);   // pseudocode: route the record to records3
    }
});
How can I achieve this in Spark using JavaRDD?
One idea is to use mapToPair and create a new Tuple2 in each branch of your if. The key of each Tuple2 then tells you which group the object belongs to, and the value is the record itself.
Your code would look something like this:
JavaPairRDD<String, MyClass> keyedRecords = records.mapToPair(record -> {
    // Tag each record with the group it belongs to.
    String key = "";
    if ("A".equals(record.getName())) {
        key = "A";
    } else if ("B".equals(record.getName())) {
        key = "B";
    } else if ("C".equals(record.getName())) {
        key = "C";
    }
    return new Tuple2<>(key, record);
});
The resulting pair RDD can then be divided by the keys you assigned in mapToPair.
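As a rough illustration of the idea (tag each record with a key in one pass, then split on the key), here is a sketch in plain Python. It does not use Spark at all: `Record`, `key_record` and `split_by_key` are hypothetical names, and an in-memory list stands in for the RDD.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for MyClass; only the name matters here."""
    name: str
    payload: int


def key_record(record):
    # Same role as the function passed to mapToPair: produce a (key, value) pair.
    key = record.name if record.name in ("A", "B", "C") else ""
    return (key, record)


def split_by_key(records):
    """Visit every record once and collect it under its key."""
    buckets = defaultdict(list)
    for key, record in map(key_record, records):
        buckets[key].append(record)
    return dict(buckets)


records = [Record("A", 1), Record("B", 2), Record("A", 3), Record("C", 4), Record("X", 5)]
buckets = split_by_key(records)
```

Records whose name is none of "A", "B" or "C" land under the empty key, mirroring the default `key = ""` in the answer.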
I wrote this query and as my understanding of the business rules has improved I have modified it.
In this most recent iteration I was testing to see if indeed I had some redundancy that could be removed. Let me first give you the query then the error.
public List<ExternalForums> GetAllExternalForums(int extforumBoardId)
{
List<ExternalForums> xtrnlfrm = new List<ExternalForums>();
var query = _forumExternalBoardsRepository.Table
.Where(id => id.Id == extforumBoardId)
.Select(ExtForum => ExtForum.ExternalForums);
foreach (ExternalForums item in query)
{
xtrnlfrm.Add(new ExternalForums { Id = item.Id , ForumName = item.ForumName, ForumUrl = item.ForumUrl });
}
return xtrnlfrm;
}
In case it isn't obvious, the Select is meant to return a list of ExternalForums. I then loop through that list and add each item to another List of ExternalForums objects. This is the redundancy I was expecting to remove.
The compiler had no complaints, so I ran through it once to verify everything was fine before refactoring, and hit a strange error as the loop began.
Unable to cast object of System.Collections.Generic.HashSet
NamSpcA.NamSpcB.ExternalForums to type NamSpcA.NamSpcB.ExternalForums.
Huh? They are the same object types.
So am I doing something wrong in the way I am projecting my select?
TIA
var query = _forumExternalBoardsRepository.Table
.Where(id => id.Id == extforumBoardId)
.Select(ExtForum => ExtForum.ExternalForums);
This query returns an IEnumerable<T>, where T is the type of the ExtForum.ExternalForums property. I would expect that property to be another collection, this time of ExternalForums, and the error message agrees: you have an IEnumerable<HashSet<ExternalForums>>.
If you need that collection of collections flattened into one big collection of ExternalForums, use SelectMany instead:
var query = _forumExternalBoardsRepository.Table
.Where(id => id.Id == extforumBoardId)
.SelectMany(ExtForum => ExtForum.ExternalForums);
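The difference between the two projections can be sketched in plain Python. This is only an analogy, not LINQ: the `boards` data and the `select` / `select_many` helpers are made up for illustration, with `itertools.chain.from_iterable` playing the part of the flattening step.

```python
from itertools import chain

# Hypothetical boards, each owning a set of forum names.
boards = [
    {"id": 1, "external_forums": {"forum-a", "forum-b"}},
    {"id": 2, "external_forums": {"forum-c"}},
]


def select(boards, board_id):
    """Like Select: one element per board, and each element is itself a collection."""
    return [b["external_forums"] for b in boards if b["id"] == board_id]


def select_many(boards, board_id):
    """Like SelectMany: the inner collections are flattened into one sequence."""
    return list(chain.from_iterable(select(boards, board_id)))


nested = select(boards, 1)     # a list containing one set
flat = select_many(boards, 1)  # a list of forum names
```

Iterating over `nested` hands you whole sets, which is the situation behind the cast error; iterating over `flat` hands you individual forums.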
I was following this CouchDB guide, http://guide.couchdb.org/draft/cookbook.html#unique, in order to return a distinct list from a view.
My map function looks like:
function(doc) {
if(doc.PartnerName !=null) {
emit(doc.PartnerName, null);
}
}
And, I have a reduce function:
function(keys, values) {
return true;
}
When I run this by hitting:
/dbName/_design/Partners/_view/my-view-name
I get this back:
{"rows":[
{"key":null,"value":true}
]}
If I add ?reduce=false to the end, I get back sort of the desired result:
{
"total_rows":11,"offset":0,
"rows":[
{"id":"a","key":"PARTNER_ONE","value":null},
{"id":"b","key":"PARTNER_ONE","value":null},
{"id":"c","key":"PARTNER_ONE","value":null},
{"id":"d","key":"PARTNER_ONE","value":null},
{"id":"e","key":"PARTNER_ONE","value":null},
{"id":"f","key":"PARTNER_ONE","value":null},
{"id":"g","key":"PARTNER_TWO","value":null},
{"id":"h","key":"PARTNER_TWO","value":null},
{"id":"i","key":"PARTNER_TWO","value":null},
{"id":"j","key":"PARTNER_THREE","value":null},
{"id":"k","key":"PARTNER_FOUR","value":null}
]}
However, I'm ideally trying to get a distinct list, so in the above example it would be PARTNER_ONE, PARTNER_TWO, PARTNER_THREE, PARTNER_FOUR.
I think you are missing the group=true parameter. Try to query
/dbName/_design/Partners/_view/my-view-name?group=true
and see if that gives you the correct result.
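To see why grouping matters, here is a small simulation in plain Python of what the reduce step does with and without group=true. It is a sketch under simplifying assumptions: rows are sorted with Python's string ordering rather than CouchDB's collation, and `query_view` is a hypothetical helper, not a CouchDB API.

```python
from itertools import groupby

# (key, value) rows as emitted by the map function in the question.
rows = (
    [("PARTNER_ONE", None)] * 6
    + [("PARTNER_TWO", None)] * 3
    + [("PARTNER_THREE", None), ("PARTNER_FOUR", None)]
)


def reduce_fn(keys, values):
    # Same as the reduce in the question: the result carries no information.
    return True


def query_view(rows, group=False):
    """Simulate querying the view with reduce enabled."""
    ordered = sorted(rows, key=lambda row: row[0])
    if not group:
        # Without grouping, every row is reduced into a single result with a null key.
        keys = [k for k, _ in ordered]
        values = [v for _, v in ordered]
        return [{"key": None, "value": reduce_fn(keys, values)}]
    result = []
    for key, grp in groupby(ordered, key=lambda row: row[0]):
        grp = list(grp)
        result.append({"key": key, "value": reduce_fn([k for k, _ in grp], [v for _, v in grp])})
    return result


ungrouped = query_view(rows)
grouped = query_view(rows, group=True)
```

With grouping, the reduce runs once per distinct key, so the keys of the result rows form the distinct list.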
Given the following object structure:
{
key1: "...",
key2: "...",
data: "..."
}
Is there any way to get this object from CouchDB by querying both key1 and key2 without setting up two different views (one for each key), like:
select * from ... where key1=123 or key2=123
Kind regards,
Artjom
edit:
Here is a better description of the problem:
The object described above is a serialized game state. A game has exactly one creator (key1) and one opponent (key2). For a given user I would like to get all games in which he is involved (either as creator or as opponent).
Emit both keys (or only one if equal):
function(doc) {
if (doc.hasOwnProperty('key1')) {
emit(doc.key1, 1);
}
if (doc.hasOwnProperty('key2') && doc.key1 !== doc.key2) {
emit(doc.key2, 1);
}
}
Query with (properly url-encoded):
?include_docs=true&key=123
or with multiple values:
?include_docs=true&keys=[123,567,...]
UPDATE: updated to query multiple values with a single query.
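A quick way to convince yourself this behaves like "key1=123 or key2=123" is to simulate the view in plain Python. This is only a sketch: the `games` documents and the helper names are invented for illustration, and the index stores the document id next to each key in place of include_docs.

```python
# Hypothetical game documents; key1 is the creator, key2 the opponent.
games = [
    {"_id": "g1", "key1": 123, "key2": 567},
    {"_id": "g2", "key1": 567, "key2": 123},
    {"_id": "g3", "key1": 123, "key2": 123},
    {"_id": "g4", "key1": 999, "key2": 888},
]


def map_doc(doc):
    """Mirror of the map function: emit key1, and key2 only if it differs."""
    if "key1" in doc:
        yield (doc["key1"], doc["_id"])
    if "key2" in doc and doc.get("key1") != doc["key2"]:
        yield (doc["key2"], doc["_id"])


def build_index(docs):
    # A view is essentially a list of emitted rows sorted by key.
    return sorted(row for doc in docs for row in map_doc(doc))


def lookup(index, keys):
    """Equivalent of ?key=... or ?keys=[...]: return ids of rows whose key matches."""
    wanted = set(keys)
    return [doc_id for key, doc_id in index if key in wanted]


index = build_index(games)
involved = lookup(index, [123])
```

Because key2 is skipped when it equals key1, a document such as g3 shows up only once for a given user.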
You could create a CouchDB view which produces output such as:
["key1", 111],
["key1", 123],
["key2", 111],
["key2", 123],
etc.
It is very simple to write such a map function in JavaScript:
function(doc) {
emit(["key1", doc["key1"]], null);
emit(["key2", doc["key2"]], null);
}
When querying, you can query using multiple keys:
{"keys": [["key1", 123], ["key2", 123]]}
You can send that JSON as the body of a POST to the view, or preferably use a client library for your programming language. The result of this query is every row in the view that matches either key, so a document that matches on both key1 and key2 will return two rows.
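The two-rows behaviour is easy to overlook, so here is a small plain-Python simulation of the composite-key view together with the JSON body you would POST. It is a sketch only: nothing is sent over the network, and `map_doc`, `query` and the sample documents are assumptions made for illustration.

```python
import json

# Hypothetical documents; d3 matches the value 123 on both fields.
docs = [
    {"_id": "d1", "key1": 123, "key2": 111},
    {"_id": "d2", "key1": 111, "key2": 123},
    {"_id": "d3", "key1": 123, "key2": 123},
]


def map_doc(doc):
    """Mirror of the map function: one row per field, keyed by [field name, value]."""
    yield (["key1", doc.get("key1")], doc["_id"])
    yield (["key2", doc.get("key2")], doc["_id"])


def query(docs, keys):
    """Return every (key, id) row whose key is one of the requested keys."""
    rows = [row for doc in docs for row in map_doc(doc)]
    return [(key, doc_id) for key, doc_id in rows if key in keys]


wanted = [["key1", 123], ["key2", 123]]
body = json.dumps({"keys": wanted})  # the payload that would be POSTed to the view
rows = query(docs, wanted)
unique_ids = sorted({doc_id for _, doc_id in rows})
```

If you want each document only once, de-duplicate by id on the client, as `unique_ids` does here.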
I was also struggling with a similar question: how to express
"select * from ... where key1=123 or key2=123".
The following view would allow you to lookup customer documents by the LastName or FirstName fields:
function(doc) {
if (doc.Type == "customer") {
emit(doc.LastName, {FirstName: doc.FirstName, Address: doc.Address});
emit(doc.FirstName, {LastName: doc.LastName, Address: doc.Address});
}
}
I am using this for a web service that queries all my docs and returns every doc that matches both the existence of a node and the query. In this example I am using the node 'detail' for the search. If you would like to search a different node, you need to specify.
This is my first Stack Overflow post, so I hope I can help someone out :)
***Python Code
# Python 2 (uses httplib)
import tornado.httpserver
import tornado.ioloop
import tornado.options
import tornado.web
import httplib, json
from tornado.options import define, options

define("port", default=8000, help="run on the given port", type=int)


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        db_host = 'YOUR_COUCHDB_SERVER'
        db_port = 5984
        db_name = 'YOUR_COUCHDB_DATABASE'
        node = self.get_argument('node', None)
        query = self.get_argument('query', None)
        # Report each missing argument; only hit CouchDB when both are present.
        if not node:
            self.write('You have not supplied an object node.<br>')
        if not query:
            self.write('You have not supplied a query string.<br>')
        if node and query:
            # Prefix search: every key from query up to query followed by a high code point.
            uri = ''.join(['/', db_name, '/', '_design/keysearch/_view/' + node + '/?startkey="' + query + '"&endkey="' + query + '\u9999"'])
            connection = httplib.HTTPConnection(db_host, db_port)
            headers = {"Accept": "application/json"}
            connection.request("GET", uri, None, headers)
            response = connection.getresponse()
            self.write(json.dumps(json.loads(response.read()), sort_keys=True, indent=4))


class Application(tornado.web.Application):
    def __init__(self):
        handlers = [
            (r"/", MainHandler)
        ]
        settings = dict(
            debug=True
        )
        tornado.web.Application.__init__(self, handlers, **settings)


def main():
    tornado.options.parse_command_line()
    http_server = tornado.httpserver.HTTPServer(Application())
    http_server.listen(options.port)
    tornado.ioloop.IOLoop.instance().start()


if __name__ == '__main__':
    main()
***CouchDB Design View
{
"_id": "_design/keysearch",
"language": "javascript",
"views": {
"detail": {
"map": "function(doc) { var docs = doc['detail'].match(/[A-Za-z0-9]+/g); if(docs) { for(var each in docs) { emit(docs[each],doc); } } }"
}
}
}
Let's say I have blog entries like these in my CouchDB database:
{"name":"Mary", "postdate":"20110412", "subject":"this", "message":"blah"}
{"name":"Joe", "postdate":"20110411", "subject":"that", "message":"yadda"}
{"name":"Mary", "postdate":"20110411", "subject":"and this", "message":"blah-blah"}
{"name":"Joe", "postdate":"20110410", "subject":"And other thing", "message":"yada-yada"}
{"name":"Jane", "postdate":"20110409", "subject":"Serious stuff", "message":"Not really"}
It's pretty easy to get a list of all posts. But how do I get a list with the latest post from each user?
Like this:
{"name":"Mary", "postdate":"20110412", "subject":"this", "message":"blah"}
{"name":"Joe", "postdate":"20110411", "subject":"that", "message":"yadda"}
{"name":"Jane", "postdate":"20110409", "subject":"Serious stuff", "message":"Not really"}
Try with this map function:
function(doc) {
if (doc.postdate && doc.name) {
emit([doc.name, doc.postdate], 1);
}
}
and the following reduce function:
function(keys, values, rereduce) {
  var max = 0,
      ks = rereduce ? values : keys;
  for (var i = 1, l = ks.length; i < l; ++i) {
    if (ks[max][0][1] < ks[i][0][1]) max = i;
  }
  return ks[max];
}
and query it with group_level=1. That gives you the _id of the latest post for each user; you can then retrieve them all with a single query, either with the keys parameter or using a POST.
I am not sure if this is the best approach, but it seems to work.
UPDATE: fixed the reduce function to handle rereduce correctly.
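To check the reduce logic, including the rereduce case, here is a plain-Python simulation of the view queried with group_level=1. It is a sketch under assumptions: the `_id` values are invented, grouping uses Python's list ordering rather than CouchDB's collation, and `latest_per_user` is a hypothetical helper.

```python
from itertools import groupby

# Posts from the question, with made-up _id values.
posts = [
    {"_id": "p1", "name": "Mary", "postdate": "20110412"},
    {"_id": "p2", "name": "Joe", "postdate": "20110411"},
    {"_id": "p3", "name": "Mary", "postdate": "20110411"},
    {"_id": "p4", "name": "Joe", "postdate": "20110410"},
    {"_id": "p5", "name": "Jane", "postdate": "20110409"},
]


def map_doc(doc):
    """Mirror of the map function: key is [name, postdate]."""
    if doc.get("postdate") and doc.get("name"):
        yield [[doc["name"], doc["postdate"]], doc["_id"]]  # [key, id], as reduce sees it


def reduce_latest(keys, values, rereduce):
    """Keep the [key, id] entry with the greatest postdate."""
    ks = values if rereduce else keys
    best = ks[0]
    for candidate in ks[1:]:
        if best[0][1] < candidate[0][1]:
            best = candidate
    return best


def latest_per_user(docs):
    """Simulate group_level=1: group rows by the first key element, then reduce."""
    rows = sorted(row for doc in docs for row in map_doc(doc))
    latest = {}
    for name, grp in groupby(rows, key=lambda row: row[0][0]):
        grp = list(grp)
        latest[name] = reduce_latest(grp, [1] * len(grp), False)[1]
    return latest


latest = latest_per_user(posts)
```

The returned ids are what you would then pass in the keys parameter to fetch the full posts.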
You'll want to emit the postdate as the key, because view rows are sorted by key. For example, this is what your map function will look like:
function(doc) {
if(doc.postdate) {
emit(doc.postdate, doc);
}
}
That will give you all the docs sorted ascending by postdate. If you want them descending, query with ?descending=true.
Cheers.