I'm processing through Telegram history (txt file) and I need to extract & process quite complex (nested) multiline pattern.
Here's the whole pattern
Free_Trade_Calls__AltSignals:IOC/ BTC (bittrex)
BUY : 0.00164
SELL :
TARGET 1 : 0.00180
TARGET 2 : 0.00205
TARGET 3 : 0.00240
STOP LOS : 0.000120
2018-04-19 15:46:57 Free_Trade_Calls__AltSignals:TARGET
basically I am looking for a pattern starting with
Free_Trade_Calls__AltSignals: ^%(
and ending with a timestamp.
Inside that pattern (telegram message)
- exchange - in brackets in the 1st line
- extract value after BUY
- SELL values in a array of 3 SELL[3] : target 1-3
- STOP loss value (it can be either STOP, STOP LOSS, STOP LOS)....
I've found this Logstash grok multiline message but I am very new to logstash firend advised it to me. I was trying to parse this text in NodeJS but it really is pain in the ass and mad about it.
Thanks Rob :)
Since you need to grab values from each line, you don't need to use multi-line modifier. You can skip empty line with %{SPACE} character.
For your given log, this pattern can be used,
Free_Trade_Calls__AltSignals:.*\(%{WORD:exchange}\)\s*BUY\s*:\s*%{NUMBER:BUY}\s*SELL :\s*TARGET 1\s*:\s*%{NUMBER:TARGET_1}\s*TARGET 2\s*:\s*%{NUMBER:TARGET_2}\s*TARGET 3\s*:\s*%{NUMBER:TARGET_3}\s*.*:\s*%{NUMBER:StopLoss}
please note that \s* equals to %{SPACE}
It will output,
{
"exchange": [
[
"bittrex"
]
],
"BUY": [
[
"0.00164"
]
],
"BASE10NUM": [
[
"0.00164",
"0.00180",
"0.00205",
"0.00240",
"0.000120"
]
],
"TARGET_1": [
[
"0.00180"
]
],
"TARGET_2": [
[
"0.00205"
]
],
"TARGET_3": [
[
"0.00240"
]
],
"StopLoss": [
[
"0.000120"
]
]
}
Background
I have a rocksdb collection that contains three fields: _id, author, subreddit.
Problem
I would like to create a Arango graph that creates a graph connecting these two existing columns. But the examples and the drivers seem to only accept collections as its edge definitions.
Issue
The ArangoDb documentation is lacking information on how I can create a graph using edges and nodes pulled from the same collection.
EDIT:
Solution
This was fixed with a code change at this Arangodb issues ticket.
Here's one way to do it using jq, a JSON-oriented command-line tool.
First, an outline of the steps:
1) Use arangoexport to export your author/subredit collection to a file, say, exported.json;
2) Run the jq script, nodes_and_edges.jq, shown below;
3) Use arangoimp to import the JSON produced in (2) into ArangoDB.
There are several ways the graph can be stored in ArangoDB, so ultimately you might wish to tweak nodes_and_edges.jq accordingly (e.g. to generate the nodes first, and then the edges).
INDEX
If your jq does not have INDEX defined, then use this:
def INDEX(stream; idx_expr):
reduce stream as $row ({};
.[$row|idx_expr|
if type != "string" then tojson
else .
end] |= $row);
def INDEX(idx_expr): INDEX(.[]; idx_expr);
nodes_and_edges.jq
# This module is for generating JSON suitable for importing into ArangoDB.
### Generic Functions
# nodes/2
# $name must be the name of the ArangoDB collection of nodes corresponding to $key.
# The scheme for generating key names can be altered by changing the first
# argument of assign_keys, e.g. to "" if no prefix is wanted.
def nodes($key; $name):
map( {($key): .[$key]} ) | assign_keys($name[0:1] + "_"; 1);
def assign_keys(prefix; start):
. as $in
| reduce range(0;length) as $i ([];
. + [$in[$i] + {"_key": "\(prefix)\(start+$i)"}]);
# nodes_and_edges facilitates the normalization of an implicit graph
# in an ArangoDB "document" collection of objects having $from and $to keys.
# The input should be an array of JSON objects, as produced
# by arangoexport for a single collection.
# If $nodesq is truthy, then the JSON for both the nodes and edges is emitted,
# otherwise only the JSON for the edges is emitted.
#
# The first four arguments should be strings.
#
# $from and $to should be the key names in . to be used for the from-to edges;
# $name1 and $name2 should be the names of the corresponding collections of nodes.
def nodes_and_edges($from; $to; $name1; $name2; $nodesq ):
def dict($s): INDEX(.[$s]) | map_values(._key);
def objects: to_entries[] | {($from): .key, "_key": .value};
(nodes($from; $name1) | dict($from)) as $fdict
| (nodes($to; $name2) | dict($to) ) as $tdict
| (if $nodesq then $fdict, $tdict | objects
else empty end),
(.[] | {_from: "\($name1)/\($fdict[.[$from]])",
_to: "\($name2)/\($tdict[.[$to]])"} ) ;
### Problem-Specific Functions
# If you wish to generate the collections separately,
# then these will come in handy:
def authors: nodes("author"; "authors");
def subredits: nodes("subredit"; "subredits");
def nodes_and_edges:
nodes_and_edges("author"; "subredit"; "authors"; "subredits"; true);
nodes_and_edges
Invocation
jq -cf extract_nodes_edges.jq exported.json
This invocation will produce a set of JSONL (JSON-Lines) for "authors", one for "subredits" and an edge collection.
Example
exported.json
[
{"_id":"test/115159","_key":"115159","_rev":"_V8JSdTS---","author": "A", "subredit": "S1"},
{"_id":"test/145120","_key":"145120","_rev":"_V8ONdZa---","author": "B", "subredit": "S2"},
{"_id":"test/114474","_key":"114474","_rev":"_V8JZJJS---","author": "C", "subredit": "S3"}
]
Output
{"author":"A","_key":"name_1"}
{"author":"B","_key":"name_2"}
{"author":"C","_key":"name_3"}
{"subredit":"S1","_key":"sid_1"}
{"subredit":"S2","_key":"sid_2"}
{"subredit":"S3","_key":"sid_3"}
{"_from":"authors/name_1","_to":"subredits/sid_1"}
{"_from":"authors/name_2","_to":"subredits/sid_2"}
{"_from":"authors/name_3","_to":"subredits/sid_3"}
Please note that the following queries take a while to complete on this huge dataset, however they should complete sucessfully after some hours.
We start the arangoimp to import our base dataset:
arangoimp --create-collection true --collection RawSubReddits --type jsonl ./RC_2017-01
We use arangosh to create the collections where our final data is going to live in:
db._create("authors")
db._createEdgeCollection("authorsToSubreddits")
We fill the authors collection by simply ignoring any subsequently occuring duplicate authors;
We will calculate the _key of the author by using the MD5 function,
so it obeys the restrictions for allowed chars in _key, and we can know it later on by calling MD5() again on the author field:
db._query(`
FOR item IN RawSubReddits
INSERT {
_key: MD5(item.author),
author: item.author
} INTO authors
OPTIONS { ignoreErrors: true }`);
After the we have filled the second vertex collection (we will keep the imported collection as the first vertex collection) we have to calculate the edges.
Since each author can have created several subreds, its most probably going to be several edges originating from each author. As previously mentioned,
we can use the MD5()-function again to reference the author previously created:
db._query(`
FOR onesubred IN RawSubReddits
INSERT {
_from: CONCAT('authors/', MD5(onesubred.author)),
_to: CONCAT('RawSubReddits/', onesubred._key)
} INTO authorsToSubreddits")
After the edge collection is filled (which may again take a while - we're talking about 40 million edges herer, right? - we create the graph description:
db._graphs.save({
"_key": "reddits",
"orphanCollections" : [ ],
"edgeDefinitions" : [
{
"collection": "authorsToSubreddits",
"from": ["authors"],
"to": ["RawSubReddits"]
}
]
})
We now can use the UI to browse the graphs, or use AQL queries to browse the graph. Lets pick the sort of random first author from that list:
db._query(`for author IN authors LIMIT 1 RETURN author`).toArray()
[
{
"_key" : "1cec812d4e44b95e5a11f3cbb15f7980",
"_id" : "authors/1cec812d4e44b95e5a11f3cbb15f7980",
"_rev" : "_W_Eu-----_",
"author" : "punchyourbuns"
}
]
We identified an author, and now run a graph query for him:
db._query(`FOR vertex, edge, path IN 0..1
OUTBOUND 'authors/1cec812d4e44b95e5a11f3cbb15f7980'
GRAPH 'reddits'
RETURN path`).toArray()
One of the resulting paths looks like that:
{
"edges" : [
{
"_key" : "128327199",
"_id" : "authorsToSubreddits/128327199",
"_from" : "authors/1cec812d4e44b95e5a11f3cbb15f7980",
"_to" : "RawSubReddits/38026350",
"_rev" : "_W_LOxgm--F"
}
],
"vertices" : [
{
"_key" : "1cec812d4e44b95e5a11f3cbb15f7980",
"_id" : "authors/1cec812d4e44b95e5a11f3cbb15f7980",
"_rev" : "_W_HAL-y--_",
"author" : "punchyourbuns"
},
{
"_key" : "38026350",
"_id" : "RawSubReddits/38026350",
"_rev" : "_W-JS0na--b",
"distinguished" : null,
"created_utc" : 1484537478,
"id" : "dchfe6e",
"edited" : false,
"parent_id" : "t1_dch51v3",
"body" : "I don't understand tension at all."
"Mine is set to auto."
"I'll replace the needle and rethread. Thanks!",
"stickied" : false,
"gilded" : 0,
"subreddit" : "sewing",
"author" : "punchyourbuns",
"score" : 3,
"link_id" : "t3_5o66d0",
"author_flair_text" : null,
"author_flair_css_class" : null,
"controversiality" : 0,
"retrieved_on" : 1486085797,
"subreddit_id" : "t5_2sczp"
}
]
}
For a graph you need an edge collection for the edges and vertex collections for the nodes. You can't create a graph using only one collection.
Maybe this topic in the documentations is helpful for you.
Here's an AQL solution, which however presupposes that all the referenced collections already exist, and that UPSERT is not necessary.
FOR v IN testcollection
LET a = v.author
LET s = v.subredit
FILTER a
FILTER s
LET fid = (INSERT {author: a} INTO authors RETURN NEW._id)[0]
LET tid = (INSERT {subredit: s} INTO subredits RETURN NEW._id)[0]
INSERT {_from: fid, _to: tid} INTO author_of
RETURN [fid, tid]
I have a *.csv file that store two columns of float data.
I am using this function to import it but it generates the data not separated with comma.
data=np.genfromtxt("data.csv", delimiter=',', dtype=float)
output:
[[ 403.14915 150.560364 ]
[ 403.7822265 135.13165 ]
[ 404.5017 163.4669 ]
[ 434.02465 168.023224 ]
[ 373.7655 177.904114 ]
[ 450.608429 208.4187315]
[ 454.39475 239.9666595]
[ 453.8055 248.4082 ]
[ 457.5625305 247.70315 ]
[ 451.729431 258.19335 ]
[ 366.74405 225.169922 ]
[ 377.0055235 258.110077 ]
[ 380.3581 261.760071 ]
[ 383.98615 262.33805 ]
[ 388.2516785 272.715332 ]
[ 408.378174 200.9713135]]
How to format it to get a numpy array like
[[ 403.14915, 150.560364 ]
[ 403.7822265, 135.13165 ],....]
?
NumPy doesn't display commas when you print arrays. If you really want to see them, you can use
print(repr(data))
The repr function forces a str representation not ment for "nice" printing, but for the literal representation you would use yourself to type the data in your code.
Does arangodb provide a utility to list clusters for a given edge definition?
E.g. Given the graph:
Tyrion ----sibling---> Cercei ---sibling---> Jamie
Bran ---sibling--> Arya ---sibling--> Jon
I'd want something like the following:
my_graph._getClusters({edge: "sibling"}) -> [ [Tyrion, Cercei, Jamie], [Bran, Arya, Jon] ]
Provided you have a graph named siblings, then the following query will find all paths in the graph that are connected by edges with type sibling and that have a (path) length of 3. This should match the example data you provided:
LET options = {
followEdges: [
{ type: 'sibling' }
]
}
FOR i IN GRAPH_TRAVERSAL('sibling', { }, "outbound", options)
FILTER LENGTH(i) == 3
RETURN i[*].vertex._key
Omitting or adjusting the FILTER will also find longer or shorter paths in the graph.
Instead of creating objects by writing:
obj: object [
name: "Fork"
id: 1020
]
I'd like to write something like...
obj: something-or-another [name id] ["Fork" 1020]
...and get the same result. An ideal solution would also permit:
obj: something-or-another [name: id:] ["Fork" 1020]
Easy enough to write a something-or-another, but does this fit something already "in the box"?
I don't believe there's a baked-in way to do this. Not difficult though:
func [words values][
set words: context append map-each word words [to set-word! word] none values
words
]
I suppose I could break this down a little:
func [
"Creates an Object From a Block of Words, Assigns Values"
words [block!] "Words used to create object"
values "Value(s) to assign"
][
words: map-each word words [to set-word! word] ; The requisite set-words
append words none ; Initial value for all words
words: context words ; Create our object
set words values ; Assigns our value(s) to the object
words ; returns the object
]
You might employ a different method to interleave blocks, such as:
func [words [block!] values [block!]][
collect [
repeat index max length? words length? values [
keep words/:index
keep values/:index
]
]
]
Here's something that requires at least Rebol 3:
func [
"Create an object based on some words and values."
words [any-word! block!] "Word or block of words"
values [any-type!] "Value or block of values"
/local object
][
object: make object! either block? words [length? words] [1]
set bind/copy/new :words object :values
object
]
If you want to also allow setting unset values, try this:
func [
"Create an object based on some words and values."
words [any-word! block!] "Word or block of words"
values [any-type!] "Value or block of values"
/any "Allows setting words to any value, including unset"
/local object
][
object: make object! either block? words [length? words] [1]
apply :set [bind/copy/new :words object :values any]
object
]
Both of these create objects with self, so if you want to create an object without self you have to do some fancier tricks. See the selfless proposal for details.
I wrote a similar function (Rebol2) just a few days ago:
build-object: func [
"Builds an object from a block"
names [block!] "Field names"
/values val [block!] "Initial values"
/local o name value
] [
o: copy []
names: compose names
o: either values [
parse names [
some [
set name [word! | set-word!]
(append o to-set-word name)
| skip
]
]
set/pad reflect o: context append o none 'words val
o
] [
if any [
parse names [
some [
set name [word! | set-word!]
(append o reduce [to-set-word name none])
]
]
parse names [
(clear o) some [
set name [word! | set-word!] set value any-type!
(append o reduce [to-set-word name :value])
]
]
] [context o]
]
o
]
To build your object you could write any of: (create a function as f: does ["x"])
build-object [name "Fork" id 1020]
build-object [name: "Fork" id: 1020]
build-object/values [name id] ["Fork" 1020]
build-object/values [name: id:] ["Fork" 1020]
build-object [name f]
build-object [name (f)] ;block is composed
build-object [name 5 id f]
build-object [name 5 id 'f]
You can also make objects with fields set to none if you leave the values out, e.g. with
build-object [name id]