arangodb-river-elasticsearch -- _mapping mismatch


When creating rivers for composed objects, the resulting _mapping contains the complete nested object definition rather than a String field. The data import then fails because the object references are not "dereferenced".
E.g.
collection1: {name: "test", items: [collection2/123, collection2/124] }
collection2: {somefield: "test"}
The resulting _mapping after creating the river for those collections within a single index is:
collection1: {name: String, items: { properties: { somefield: String } } }.
Importing data fails with the following error:
org.elasticsearch.index.mapper.MapperParsingException: object mapping [items] trying to serialize a value with no field associated with it, current value [collection1/123]
How can I either tell the ArangoDB river to dereference the nested objects, or set the mapping so that it works with references?

Rivers are now deprecated. I created a mixin for Elasticsearch that updates the index whenever I save, update, or delete objects (through my custom ODM).
Simply write a wrapper around your data access layer whose high-level functions also update the ES index.
For example:
# Model base class: every write also goes to Elasticsearch
class Base(ArangoBase, es.Base):
    def save(self):
        ret = ArangoBase.save(self)
        es.Base.save_es(self)
        return ret
    def update(self):
        ret = ArangoBase.update(self)
        es.Base.save_es(self)
        return ret
    def delete(self):
        ret = ArangoBase.delete(self)
        es.Base.delete_es(self)
        return ret
# es.py: the Elasticsearch mixin used above
from elasticsearch import Elasticsearch
class Base(object):
    _es = None  # set to an Elasticsearch client instance before use
    _es_index = 'chopchop'
    _es_type = None
    def save_es(self):
        self._es.index(index=self._es_index, doc_type=self._es_type, body=self._doc(), id=self.id)
    def delete_es(self):
        self._es.delete(index=self._es_index, doc_type=self._es_type, id=self.id)
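The mixin calls self._doc(), which the answer does not show. As a rough sketch only, one way to build the indexed body is to replace every embedded document with its reference string, so that a field such as items stays a plain string field in the mapping. The function name to_es_doc and the assumption that embedded documents carry an "_id" key are hypothetical and not taken from the answer.

```python
def to_es_doc(doc):
    """Return a copy of doc suitable for indexing (hypothetical helper).

    Assumptions: embedded documents are dicts that carry an "_id" key, and
    top-level keys starting with "_" are system attributes that are not indexed.
    """
    def flatten(value):
        # An embedded document is reduced to its reference string.
        if isinstance(value, dict) and "_id" in value:
            return value["_id"]
        if isinstance(value, list):
            return [flatten(v) for v in value]
        if isinstance(value, dict):
            return {k: flatten(v) for k, v in value.items()}
        return value

    return {k: flatten(v) for k, v in doc.items() if not k.startswith("_")}


doc = {
    "_id": "collection1/1",
    "name": "test",
    "items": [{"_id": "collection2/123", "somefield": "test"}, "collection2/124"],
}
print(to_es_doc(doc))
```

With a body like this, items only ever contains strings, so it would not conflict with a string mapping for that field.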

Related

Transforming list to map in Groovy

I'm trying to transform a list of strings:
def description = """user:some.mail#gmail.com;groups:Team-1, Team-2
user:some.othermail#gmail.com;groups:Team-2, Team-3
user:another.mail#gmail.com;groups:Team-1, Team-3
some other text"""
description = description.split('\\r\\n|\\n|\\r').findAll { it.startsWith('user') }
into a map that looks like this:
[some.mail#gmail.com: "Team-1, Team-2", some.othermail#gmail.com: "Team-2, Team-3", another.mail#gmail.com: "Team-1, Team-3"]
so that I can later iterate over it, getting each e-mail address and its corresponding teams.
Sadly, with the code below I could only partly achieve this, and only for one item of the list. I'm stuck on putting it into a loop to get the full result.
def userData = [:]
userData = description[0].split(';').inject([:]) { map, token ->
    token.split(':').with {
        map[it[0].trim()] = it[1].trim()
    }
    map
}
Can you give me a hint on how to get a map with all the items from the list?

You can use the collectEntries method on a list:
def description = """user:some.mail#gmail.com;groups:Team-1, Team-2
user:some.othermail#gmail.com;groups:Team-2, Team-3
user:another.mail#gmail.com;groups:Team-1, Team-3
some other text"""
description = description.split('\\r\\n|\\n|\\r').findAll { it.startsWith('user') }
def map = description.collectEntries {
    // split "user:some.mail#gmail.com;groups:Team-1, Team-2"
    def split = it.split(';')
    // remove "user:" prefix
    def email = split[0].split(':')[1]
    // remove "groups:" prefix
    def groups = split[1].split(':')[1]
    // create a map entry
    [(email): groups]
}
Then running map.forEach {k, v -> println "key: '${k}', value: '${v}'"} prints the following (the standard map toString output may be a little chaotic in this case):
key: 'some.mail#gmail.com', value: 'Team-1, Team-2'
key: 'some.othermail#gmail.com', value: 'Team-2, Team-3'
key: 'another.mail#gmail.com', value: 'Team-1, Team-3'

How can I store data coming from Binance Websocket?

I am currently trying to store some data streaming from a Binance Miniticker Websocket, but I can't figure out a way to do so.
I would like to append the data to an existing dictionary, so that I can access it as historical data.
def miniticker_socket(msg):
    ''' define how to process incoming WebSocket messages '''
    if msg[0]['e'] != 'error':
        for item in msg:
            miniticker["{0}".format(item['s'])] = {'low': [],'high': [], 'open': [], 'close':[],'timestamp':[], 'symbol':[] }
            miniticker[item['s']]['close'].append(msg[msg.index(item)]['c'])
            print(miniticker[item['s']])
    else:
        print('there has been an issue')
bsm = BinanceSocketManager(client)
#ticker_key = bsm.start_symbol_ticker_socket(crypto, ticker_socket)
miniticker_key = bsm.start_miniticker_socket(miniticker_socket)
bsm.start()
The issue with the code above is that the data does not get appended: every time the WebSocket invokes the callback, the function resets the symbol's entry to an empty dictionary. I can't define that entry outside the callback because its name comes from the item['s'] element inside the socket message.
I also tried returning the whole data and calling the callback function from another function, but this raises another error saying "msg is not defined."
I would appreciate your feedback on this!
Thanks

You can also check whether the key already exists in the miniticker dictionary:
key = "{0}".format(item['s'])
if key not in miniticker:
    miniticker[key] = {...}
That way you do not reset it to an empty dictionary on every callback.
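The same idea can be sketched with collections.defaultdict, which creates the per-symbol entry on first access so the callback never overwrites existing history. This is a minimal illustration under stated assumptions, not python-binance code: the short keys 'l', 'h', 'o' and 'E' are assumed from Binance's mini-ticker payload (the question itself only uses 's' and 'c'), and the messages below are made up.

```python
from collections import defaultdict

# Module-level store: symbol -> field name -> list of values, oldest first.
miniticker = defaultdict(lambda: defaultdict(list))

# Assumed mapping from short payload keys to readable field names.
FIELDS = {"l": "low", "h": "high", "o": "open", "c": "close", "E": "timestamp"}


def miniticker_socket(msg):
    """Append every item of one WebSocket message to the stored history."""
    for item in msg:
        history = miniticker[item["s"]]  # created on first use, reused afterwards
        for short, name in FIELDS.items():
            history[name].append(item[short])


# Two made-up messages standing in for successive WebSocket callbacks.
miniticker_socket([{"s": "BTCUSDT", "E": 1, "c": "10", "o": "9", "h": "11", "l": "8"}])
miniticker_socket([{"s": "BTCUSDT", "E": 2, "c": "12", "o": "9", "h": "13", "l": "8"}])
print(miniticker["BTCUSDT"]["close"])
```

The error check from the question is left out here for brevity; it would go at the top of the callback as before.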

I think what you want is a global dictionary that stores the dictionaries coming in from the ticker. You need something unique for the keys of the global dictionary.
For example, you could use an ISO-formatted datetime string:
timestamp_key = datetime.datetime.now().isoformat()
global_dict[timestamp_key] = {item['s']: {'low': [], 'high': [], 'open': [], 'close': [], 'timestamp': [], 'symbol': []}}
global_dict[timestamp_key][item['s']]['close'].append(item['c'])
The global dict would end up looking something like this:
{
    "2020-03-25T17:14:19.382748": {
        "your_data_key1": { "more": "data" }
    },
    "2021-03-25T17:15:19.249148": {
        "your_data_key1": { "more": "data_from_another_update" }
    }
}

Convert WebService Response into JSON Array and JSONObject using Groovy

I am testing a RESTful web service using SoapUI. We use Groovy for that.
I am using JsonSlurper to parse the response as an Object.
Our response is similar to this:
{
    "language":[
        {
            "result":"PASS",
            "name":"ENGLISH",
            "fromAndToDate":null
        },
        {
            "result":"FAIL",
            "name":"MATHS",
            "fromAndToDate": {
                "from":"02/09/2016",
                "end":"02/09/2016"
            }
        },
        {
            "result":"PASS",
            "name":"PHYSICS",
            "fromAndToDate":null
        }
    ]
}
After this, I am stuck on how to:
Get the array (the value of "language" is an array).
Get a value from an array element by key: I need the value of the result key, but only for the element where name is 'MATHS'.
I could do this in Java, but I am only just learning Groovy and cannot work it out. Every element in the array uses the same key names.

You can just parse it into a map, then use standard Groovy functions:
def response = '''{
    "language":[
        {"result":"PASS","name":"ENGLISH","fromAndToDate":null},
        {"result":"FAIL","name":"MATHS","fromAndToDate":{"from":"02/09/2016","end":"02/09/2016"}},
        {"result":"PASS","name":"PHYSICS","fromAndToDate":null}
    ]
}'''
import groovy.json.*
// Parse the Json string
def parsed = new JsonSlurper().parseText(response)
// Get the value of "language" (the list of results)
def listOfCourses = parsed.language
// For this list of results, find the one where name equals 'MATHS'
def maths = listOfCourses.find { it.name == 'MATHS' }
// maths.result then holds the value of its "result" key ('FAIL' here)

How to query for node's subtree in gremlin?

I have a graph that is actually a tree: vertices are nodes, and edges are labeled "subnode" and directed from child to parent.
I need a Gremlin query that returns a recursive structure like this:
node_info = [properties: node.map(),
             subnodes: [...list of node_info items...]]
This Groovy function describes more precisely what I need to get:
def get_node_hierarchy(node_id) {
    def get_hierarchy(node) {
        def hierarchy_list = []
        for (subnode in node.in('subnode')) {
            sub_hierarchy = get_hierarchy(subnode)
            hierarchy_list.add(sub_hierarchy)
        }
        [properties: node.map(), subnodes: hierarchy_list]
    }
    node = g.V('node_id', node_id).next()
    get_hierarchy(node)
}
result = get_node_hierarchy(1)
Is it possible to implement this using a single Gremlin query?

Writing dynamic query results into file

I am trying to write a generic program in Groovy that reads the SQL and other parameters from a config file and writes the query results into a file.
Here is the program:
def config = new ConfigSlurper().parse(new File("config.properties").toURL())
Sql sql = Sql.newInstance(config.db.url, config.db.login, config.db.password, config.db.driver);
def fileToWrite = new File(config.copy.location)
def writer = fileToWrite.newWriter()
writer.write(config.file.headers)
sql.eachRow(config.sql){ res->
    writer.write(config.file.rows)
}
in the config the sql is something like this:
sql="select * from mydb"
and
file.rows="${res.column1}|${res.column2}|${res.column3}\n"
When I run it, I get
[:]|[:]|[:]
[:]|[:]|[:]
[:]|[:]|[:]
in the file. If I replace
writer.write(config.file.rows)
with
writer.write("${res.column1}|${res.column2}|${res.column3}\n")
it outputs the actual results. What do I need to do differently to get the results?

You can accomplish this by using lazy evaluation of the GString combined with altering the delegate.
First, make the GString lazy by making its values the results of calling closures:
file.rows="${->res.column1}|${->res.column2}|${-> res.column3}"
Then, prior to evaluating it, alter the delegate of the closures:
config.file.rows.values.each {
    if (Closure.class.isAssignableFrom(it.getClass())) {
        it.resolveStrategy = Closure.DELEGATE_FIRST
        it.delegate = this
    }
}
The delegate must have the variable res in scope. Here is a full working example:
class Test {
    Map res
    void run() {
        String configText = '''file.rows="${->res.column1}|${->res.column2}|${-> res.column3}"
sql="select * from mydb"'''
        def slurper = new ConfigSlurper()
        def config = slurper.parse(configText)
        config.file.rows.values.each {
            if (Closure.class.isAssignableFrom(it.getClass())) {
                it.resolveStrategy = Closure.DELEGATE_FIRST
                it.delegate = this
            }
        }
        def results = [
            [column1: 1, column2: 2, column3: 3],
            [column1: 4, column2: 5, column3: 6],
        ]
        results.each {
            res = it
            println config.file.rows.toString()
        }
    }
}
new Test().run()

The good news is that the ConfigSlurper is more than capable of doing the GString variable substitution for you as intended. The bad news is that it does this substitution when it calls the parse() method, way up above, long before you have a res variable to substitute into the parser. The other bad news is that if the variables being substituted are not defined in the config file itself, then you have to supply them to the slurper in advance, via the binding property.
So, to get the effect you want, you have to parse the properties on each pass of eachRow. Does that mean you have to create a new ConfigSlurper and re-read the file once for every row? No. You will have to create a new ConfigObject for each pass, but you can reuse the ConfigSlurper and the file text, as follows:
def slurper = new ConfigSlurper();
def configText = new File("scripts/config.properties").text
def config = slurper.parse(configText)
Sql sql = Sql.newInstance(config.db.url, config.db.login, config.db.password, config.db.driver);
def fileToWrite = new File(config.copy.location)
def writer = fileToWrite.newWriter()
writer.write(config.file.headers)
sql.eachRow(config.sql){ result ->
    slurper.binding = [res:result]
    def reconfig = slurper.parse(configText)
    print(reconfig.file.rows)
}
Please notice that I changed the name of the closure parameter from res to result. I did this to emphasize that the slurper draws the name res from the binding map key, not from the closure parameter name.
If you want to reduce the time and effort wasted on "reparsing", you could move the file.rows property into its own separate file. I would still read that file's text once and reuse the text in the "per row" parsing.
