I am new to Neo4j and graph databases, but I have a background in relational databases.
My question is a request for advice on how to replicate a SQL Server query efficiently in Neo4j.
I am starting a new project which I think is suited to Neo4j due to a number of friend-of-a-friend type relationships I will have to store. I am using Neo4j 1.8.1 and C# to write my application.
One part of my project has a section which has a structure comparable to Twitter, and this is where I need help.
I will use a Twitter analogy to explain my problem:
I have a list of text blobs (tweets) and each blob can be in 0, 1 or many categories (hash tags). Unlike Twitter, I also have users which are linked to 0, 1 or many categories.
I picture the graph looking something like this:
T = text blob node, C = category node, U = user node
T-------C-------U
 \_____/ \_____
 /     \       \
T-------C-------U
 \_____
       \
T-------C-------U
  _____/ \_____
 /             \
T               U
When the application is running, I estimate there will be about 10,000,000 records (I will probably archive anything beyond this), about 100 categories and about 1000 users.
Currently I have a simple SQL Server database to test this:
 __________       ______________       ___________      ______________      ________
|Text      |     |TextCategory  |     |Category   |    |UserCategory  |    |User    |
|----------|     |--------------|     |-----------|    |--------------|    |--------|
|TextId    |-----|TextId        |-----|CategoryId |----|UserId        |----|UserId  |
|Text      |     |CategoryId    |     |Name       |    |CategoryId    |    |Name    |
|DateAdded |     |DateAdded     |     |-----------|    |--------------|    |--------|
|----------|     |--------------|
By copying the DateAdded field from the Text table to the TextCategory table and adding indexes to the two linking tables, I can run the following query to return all text items belonging to a category that a user subscribes to, ordered by date:
SELECT t.*
FROM Text t
INNER JOIN TextCategory tc ON tc.TextId = t.TextId
WHERE tc.CategoryId IN
(
    SELECT CategoryId
    FROM UserCategory
    WHERE UserId = @UserId
)
ORDER BY tc.DateAdded
In reality I would page the results but for simplicity, I have left this out.
How can I replicate this query in my Neo4j database efficiently? Is a subquery like this possible in Cypher?
If I used something like this:
u-[:SUBSCRIBES_TO]->c<-[:BELONGS_TO]-t
(My Cypher skills are still quite infantile)
I could scan through the text nodes but I would not be able to use an index on the user. I would end up checking each text node to see if it was linked to the user.
If I scanned through all relationships linked to a user, I would not be able to take advantage of the date-ordering index on the text nodes to page results, avoiding a scan of all nodes to find the 10 earliest, for example.
As I have mentioned I come from an RDBMS background and I am still thinking about this in a relational database way, so if I am wrong in my theories please let me know.
I think this translates quite directly to Neo4j.
You could put the user nodes in an index and then query like you already stated:
start u=node:users(<USERID>)
match u-[:SUBSCRIBES_TO]->c<-[:BELONGS_TO]-t
return t
order by t.DateAdded
skip <SKIPPED> limit <PAGESIZE>
Unless I missed something, you answered it already.
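A note on the index itself: Neo4j 1.8 only has legacy (Lucene) indexes, not the schema indexes of later versions. One low-setup option is node auto-indexing; a minimal sketch, assuming your user nodes carry a userId property (the property name is an assumption, pick your own), in conf/neo4j.properties:

# enable automatic node indexing and list the properties to index
node_auto_indexing=true
node_keys_indexable=userId

With auto-indexing enabled the index is called node_auto_index, so the lookup becomes start u=node:node_auto_index(userId = '<USERID>'). If you instead populate a manual index named users (via the REST API or your C# driver), the start clause above works as written.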
I have a table that has many other columns such as username, hostname, etc. One of the columns also stores a certain Query.
UserQueryTable

+----------+----------+--------------+
| Username | Hostname | CustomQuery  |
+==========+==========+==============+
| Sam      | xyz      | some_query_1 |
| David    | abc      | some_query_2 |
| Rock     | mno      | some_query_3 |
| Well     | stu      | some_query_4 |
+----------+----------+--------------+
When I run a KQL query such as:
UserQueryTable | where Username == "Sam"
I get:
+----------+----------+--------------+
| Username | Hostname | CustomQuery  |
+==========+==========+==============+
| Sam      | xyz      | some_query_1 |
+----------+----------+--------------+
Note the "some_query_1" value under CustomQuery? That is an actual KQL query that is also part of the table result. I want to find a way to retrieve "some_query_1" and EXECUTE it right after my KQL query UserQueryTable | where Username == "Sam".
That CustomQuery query will give me additional info about my alert and I need to get that Query string from the table and execute it.
The CustomQuery in the table looks something like this
let alertedEvent = datatable(compressedRec: string)
    [' -----redacted----7ziphQG4Di05dfsdfdsgdgS6uThq4H5fclBccCH6wW8M//sdfgty==']
    | extend raw = todynamic(zlib_decompress_from_base64_string(compressedRec))
    | evaluate bag_unpack(raw)
    | project-away compressedRec;
alertedEvent
So basically the first query returns a result where one of the returned columns itself contains queries, and I want to be able to run those returned queries.
(The Query_ column in the answer below is the same thing as CustomQuery.)
I tried using user-defined functions but have not been able to come up with something that works. Please help!
AFAIK, for security reasons, you can't do that.
To accomplish what you want, you would need to write a client app that gets the results of the first query and then runs the second ones.
For reference, you can't even reference table names "dynamically" in KQL.
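For instance, here is a minimal sketch of such a two-step client using the Python azure-kusto-data package (the cluster URL, database name and authentication method are assumptions, not something from the question):

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Connect to the cluster (placeholder URL; pick the auth method you actually use).
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://<cluster>.kusto.windows.net")
client = KustoClient(kcsb)

# Step 1: fetch the row that holds the stored query.
outer = client.execute("MyDatabase", 'UserQueryTable | where Username == "Sam"')
rows = list(outer.primary_results[0])
custom_query = rows[0]["CustomQuery"]

# Step 2: execute the query string we just read from the table.
inner = client.execute("MyDatabase", custom_query)
for row in inner.primary_results[0]:
    print(row)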
If I understand your question correctly, you have a query that returns a list of queries, and you would like to get, as a result, the subset of those queries that were actually run.
In that case, you can:
Use the .show queries command, which returns the list of queries that were executed on your cluster (read more here). Notice that .show queries returns the list of queries you ran, or - if you have database admin permissions - the list of queries anyone ran on the database.
Enable diagnostic settings on your cluster and send Query logs (read more here). This would send all queries executed on your cluster to a Log Analytics workspace of your choosing.
You can then use either of these options and join with your table to figure out which queries were actually executed. For instance, using the first option:
.show queries
| join (
    datatable(Query_: string)
    [
        "Table | where somecol contains 1",
        "Table | where somecol contains 2"
    ]
) on $left.Text == $right.Query_
There is a better solution than running the query: you can extract the compressed text with a regex and decompress it in the same query.
Table
| extend Compressed = extract(@"\['([^;]+)']", 1, <CompressedTextQuery>)
| extend raw = todynamic(zlib_decompress_from_base64_string(Compressed))
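Applied to the table from the question, that approach might collapse everything into one query; a sketch, assuming the payload in CustomQuery always has the ['...'] shape shown above:

UserQueryTable
| where Username == "Sam"
| extend Compressed = extract(@"\['([^;]+)']", 1, CustomQuery)
| extend raw = todynamic(zlib_decompress_from_base64_string(Compressed))
| evaluate bag_unpack(raw)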
I would like to have a query that returns something like this for a single VM: which log types / solutions that VM has used, and how much data each one ingested.
I don't know if this is even possible, or whether anything similar can be done. Any tips?
With this query I'm able to list the total usage for all VMs reporting to the Log Analytics workspace, but I would like more details about a single VM:
find where TimeGenerated > ago(30d) project _BilledSize, _IsBillable, Computer
| where _IsBillable == true
| extend computerName = tolower(tostring(split(Computer, '.')[0]))
| summarize BillableDataBytes = sum(_BilledSize) by computerName
| sort by BillableDataBytes nulls last
You should be able to accomplish this by querying the standard columns _BilledSize, Type, _IsBillable and Computer.
Below is the sample query for your reference:
union withsource=tt *
| where TimeGenerated between (ago(7d) .. now())
| where _IsBillable == true
| where isnotempty(Computer)
| where Computer == "MM-VM-RHEL-7"
| summarize BillableDataBytes = sum(_BilledSize) by Computer, _IsBillable, Type
| render piechart
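If you also want the trend over time rather than a single total, a small variation of the same query (VM name again assumed) breaks the volume down per day:

union withsource=tt *
| where TimeGenerated between (ago(7d) .. now())
| where _IsBillable == true
| where Computer == "MM-VM-RHEL-7"
| summarize BillableDataBytes = sum(_BilledSize) by Type, bin(TimeGenerated, 1d)
| render timechart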
Related references:
Log data usage - Understanding ingested data volume
Standard columns in logs
I'm building a FastAPI app and I have a complicated query that I'm trying to avoid running as multiple individual queries whose results I concatenate.
I have the following tables that all have foreign keys:
CHANGE_LOG: change_id | original (FK ROSTER.shift_id) | new (FK ROSTER.shift_id) | change_type (FK CONFIG_CHANGE_TYPES)
ROSTER: shift_id | shift_type (FK CONFIG_SHIFT_TYPES) | shift_start | shift_end | user_id (FK USERS)
CONFIG_CHANGE_TYPES: change_type_id | change_type_name
CONFIG_SHIFT_TYPES: shift_type_id | shift_type_name
USERS: user_id | user_name
FK= Foreign Key
I need to return the following information:
user_name, change_type_name, and the shift_start, shift_end and shift_type_name for the shifts whose shift_id matches the original or new in the CHANGE_LOG row.
The catch is that the CHANGE_LOG table might have both original and new, only an original but no new, or only a new but no original. But as the user can select a few options from drop-down boxes before submitting the request, I also need to be able to include a filter to single out:
just one user, or all users
any change_type, or a group of change_types
The issue is that I can't find a way to reliably get the user_name for each row without inspecting it afterwards, because I don't know whether new or original exist or are set to null.
Is there a way in SQLAlchemy to have an optional filter in the query, where I can say: if original exists, use that to get the user_id, but if not, then use new to get the user_id?
Also, if I have a query that only matches rows with both original and new shifts, it will never find those with only one of them, as the criteria will never match.
I've also read this and similar ones, and while they'll resolve the issue of conditionally setting some of the filters, it doesn't get around the issue of part nulls returning nothing at all, rather than half the data.
This one seems to solve that problem, but I have no idea how to implement it.
I know it's complicated, so let me know if I've done a poor job of explaining the question.
Sorted. The solution was to use the outerjoin option.
I'm sure the syntax could be more elegant than my solution if I properly engaged with adding relationships when defining each class, but what I ended up with is explicit, and I think that makes it easier to read... at least for me.
Since I'm using a few tables more than once in the same query for different information, it was important to alias those, otherwise I ended up with a conflict (which 'user_id' did you want? it's not clear). For those playing at home, here's my general solution:
from sqlalchemy import or_
from sqlalchemy.orm import aliased
import pandas as pd

# The same tables appear twice (original vs new shift), so alias them.
new = aliased(ROSTER)
original = aliased(ROSTER)
o_name = aliased(CONFIG_SHIFT_TYPES)
n_name = aliased(CONFIG_SHIFT_TYPES)

pd.read_sql(
    db.query(
        CHANGE_LOG.change_id,
        CONFIG_CHANGE_TYPES.change_type_name,
        o_name.shift_type_name.label('original_type'),
        n_name.shift_type_name.label('new_type'),
        USERS.user_name
    )
    # Outer joins keep the row even when original or new is null.
    .outerjoin(original, original.shift_id == CHANGE_LOG.original)
    .outerjoin(new, new.shift_id == CHANGE_LOG.new)
    .outerjoin(CONFIG_CHANGE_TYPES, CONFIG_CHANGE_TYPES.change_type_id == CHANGE_LOG.change_type)
    .outerjoin(o_name, o_name.shift_type_id == original.shift_type)
    .outerjoin(n_name, n_name.shift_type_id == new.shift_type)
    # Whichever of original/new is present supplies the user.
    .outerjoin(USERS, or_(USERS.user_id == original.user_id, USERS.user_id == new.user_id))
    .statement, engine)
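For the optional drop-down filters mentioned in the question, the same query object can be extended conditionally before taking .statement; a sketch reusing the aliases above (user_id and change_types are assumed to come from the request):

query = (
    db.query(CHANGE_LOG.change_id, CONFIG_CHANGE_TYPES.change_type_name, USERS.user_name)
    .outerjoin(original, original.shift_id == CHANGE_LOG.original)
    .outerjoin(new, new.shift_id == CHANGE_LOG.new)
    .outerjoin(CONFIG_CHANGE_TYPES, CONFIG_CHANGE_TYPES.change_type_id == CHANGE_LOG.change_type)
    .outerjoin(USERS, or_(USERS.user_id == original.user_id, USERS.user_id == new.user_id))
)
if user_id is not None:  # a single user was picked from the drop-down
    query = query.filter(USERS.user_id == user_id)
if change_types:  # a list of change_type_ids; empty means "all"
    query = query.filter(CONFIG_CHANGE_TYPES.change_type_id.in_(change_types))
df = pd.read_sql(query.statement, engine)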
I am using MonetDB to store and analyze email campaign data. I have one table with about 6 million rows; the table has around 30 columns.
When I select some of the data, I realise that some data were not correctly inserted/updated.
When I fire "select contactId, email, templateId from statistics.marketing_sent where contactid = '974c47e2'", I expect the following result.
+-----------+---------------------+------------+
| contactid | email | templateid |
+===========+=====================+============+
| 974c47e2 | tom#frerickslaw.com | 34 |
+-----------+---------------------+------------+
But I receive the following result, and found that the email is wrong.
+-----------+-----------------+------------+
| contactid | email | templateid |
+===========+=================+============+
| 974c47e2 | frank#fsfco.com | 34 |
+-----------+-----------------+------------+
I double-checked my Node.js program that inserts and updates the data. I did not find any issue.
After that, I created a new empty table and started inserting/updating into that table. The new table has ~500k rows and all were correct. But I need all of that data in the main table.
So I fired "insert into statistics.marketing_sent select * from statistics.marketing_sent_2", and again found that the data were incorrect.
Has anyone else faced this kind of issue?
This is weird. Are you absolutely sure that "frank#fsfco.com" is the email associated with the contactid "974c47e2"? Can you please double check the original data that you have inserted into MonetDB (i.e. not selected from "statistics.marketing_sent_2", because how do you know the data in this table are correct?)
Having said that, your problem looks like wrong string pointers were involved in processing your query. I have seen earlier this year that, in some extremely specific cases, the wrong value was used as the OFFSET pointing to a string in the string HEAP. That problem was fixed several releases ago.
With which MonetDB version were your data inserted? Is this a reproducible problem with the latest release of MonetDB (i.e. Jul2017-SP1)? If the problem persists, can you please submit a bug report (https://www.monetdb.org/bugzilla/) with the table schema, sample data and script/queries to reproduce your problem? Thanks
Jennie
First of all, I'm sorry if I'm re-posting, but I couldn't find anything related.
I'm working on an application that handles very sensitive data. I want to filter this data by the user's role.
I've done this (in another job) using Doctrine Filters, but I can't find any information about how to do this using Sequelize (over PostgreSQL).
Eg:
sensitive_information:

+---------+----------------+
| user_id | sensible_value |
+---------+----------------+
| 1       | something      |
| 2       | something_else |
+---------+----------------+
I need this:
SELECT *
FROM sensitive_information
WHERE user_id = 1; /** I need this to be added automatically in all
queries to sensitive_information */
So, user 1 never will see information of another user. That's the goal.
Is it possible? I'm open to suggestions.
Thanks in advance.
This can be addressed by using Sequelize scopes.
Since scopes can be functions:
Scopes are defined in the model definition and can be finder objects, or functions returning finder objects
it's easy to define a scope that filters information according to the user that is logged in.
I haven't implemented this exact scope yet, but I have a good idea of how to address it if anyone needs help with it.
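For anyone who lands here, a minimal sketch of what that scope could look like (model and column names are taken from the question; how you obtain the current user's id is left to your auth layer):

// Define the model with a function scope that injects the user filter.
const SensitiveInformation = sequelize.define('sensitive_information', {
  user_id: DataTypes.INTEGER,
  sensible_value: DataTypes.STRING,
}, {
  scopes: {
    // Function scopes receive arguments when invoked through Model.scope().
    forUser(userId) {
      return { where: { user_id: userId } };
    },
  },
});

// Every query issued through the scope gets the WHERE clause automatically:
// SELECT ... FROM sensitive_information WHERE user_id = 1;
const rows = await SensitiveInformation
  .scope({ method: ['forUser', currentUser.id] })
  .findAll();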