I am trying to save data in memory so that I can retrieve it quickly in the filter part of my pipeline. When I receive new documents, I want to retrieve related earlier documents in order to compute some new metrics.
Can anyone tell me if this is possible and, if so, how I could achieve it?
Thank you very much.
Joe
The closest you can get to this would be to use the elasticsearch filter to query an ES cluster for earlier documents, or the unofficial memcached filter, which is probably better suited to the task given memcached's feature set.
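For example, here is a rough sketch of what the elasticsearch filter approach could look like (the query, field names, and index layout are made up for illustration, and depending on your Logstash version the fields option may take an array instead of a hash):

filter {
  elasticsearch {
    hosts  => ["localhost:9200"]
    # Find the earlier document that shares this event's transaction_id.
    query  => "type:transaction AND transaction_id:%{transaction_id}"
    # Copy a field from the document found onto the current event,
    # so later filters can compute metrics against it.
    fields => { "amount" => "previous_amount" }
  }
}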
I'm not aware of any official or unofficial redis or hazelcast filters, but those would also be options, since both are caching technologies.
You should also have a look at the existing metrics filter, which might help depending on your use case. Speaking of which, you should describe that use case in a bit more detail if you want more precise help.
There are questions like this on here, but no answers.
I need to implement a feature where the two types of nodes in my Neo4j 2.0 database (labelled :Hashtag and :Statement) can be searched by users of my Node.js app.
That means users enter what they are looking for into a search field, click search, and get the results. An even better scenario would be a responsive search that suggests possible matches on the fly.
How would you implement that?
I have some ideas, but I'm unsure which one to go for:
Each time the user searches, run this kind of Cypher query (querying the database this often is probably not very efficient, I guess, and it won't work for responsive result suggestions):
MATCH (h:Hashtag {name: "user_query"}), (s:Statement {name: "user_query"}) RETURN h, s;
Install something like Elasticsearch and let it handle the search (this is what the guys from Linkurio.us have done)
In the first option the .name property of those labeled nodes is, of course, indexed.
The second option seems more robust, but I would really like to avoid installing extra software and taking on that kind of dependency.
Maybe you know of a better solution?
Thank you!
I don't understand why the first option would not be responsive.
After all, Neo4j's indexing uses Lucene by default, the same as Elasticsearch.
And with an index (or unique constraint) the lookup should be instant.
Did you actually test the performance? (Make sure to use parameters for the actual value)
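For example, with schema indexes in place and the user input passed as a parameter (the {query} parameter name is my own invention), the lookup boils down to something like:

CREATE INDEX ON :Hashtag(name);
CREATE INDEX ON :Statement(name);

MATCH (h:Hashtag) WHERE h.name = {query} RETURN h AS node
UNION
MATCH (s:Statement) WHERE s.name = {query} RETURN s AS node;

Passing the value as a parameter instead of concatenating it into the query string also lets Neo4j cache the execution plan across searches.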
I am using node-neo4j to communicate with my Neo4j instance. Following github.com/aseemk/node-neo4j-template was a real help in getting started. Still learning my way around, I am looking to solve a few issues, and I'd appreciate any heads-up you can give me.
Implement site-wide search.
We have users indexed by their email IDs, and we want to index stories/posts by tags or keywords. How do we search across all nodes? Do we maintain indexes for all nodes of the various types? What would be a good approach? Should I go with Google to enable this feature? How do I index the same node with multiple tags/keywords?
Specify custom IDs for nodes
We are fine with integer IDs for nodes, but since these IDs can be reused, we would like to identify nodes by unique IDs. Is there a way to make Neo4j use UUIDs? Adding a uuid attribute would do, but we want to avoid having to maintain two IDs.
Traversing nodes
How do we traverse nodes using node-neo4j? Cypher looks like the answer, though I have yet to get used to it. Does node-neo4j support it out of the box?
Transactions
I may sound silly, but can I do transactional operations with node-neo4j?
That's a lot of questions; I feel most of my doubts will clear up once I get more used to querying the DB, but any input from you will give me a head start.
You probably should have broken this up into separate questions. I can answer a couple of them but not all.
Yes, node-neo4j can run Cypher out of the box via the query method: https://github.com/thingdom/node-neo4j/blob/develop/lib/GraphDatabase._coffee#L179. For help with Cypher, you should watch this intro video: http://vimeopro.com/neo4j/webinars/video/48603403
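For example, here is a minimal sketch of the query method with node-neo4j 1.x against a local server (the Statement label and name property are placeholders, not something node-neo4j prescribes):

var neo4j = require('neo4j');
var db = new neo4j.GraphDatabase('http://localhost:7474');

// Parameterized Cypher: node-neo4j sends the params map along with the query.
var query = 'MATCH (s:Statement) WHERE s.name = {name} RETURN s';
db.query(query, { name: 'some keyword' }, function (err, results) {
    if (err) throw err;
    results.forEach(function (row) {
        console.log(row['s'].data); // the matched node's properties
    });
});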
For your uuid, you probably should add a separate attribute to the nodes, and have an index on it--just ignore the regular ids except during transient queries where it's more convenient. As far as I know there's no way to override the incrementing ID--that sure would be nice, though.
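Here is a rough sketch of that approach, assuming the node-uuid module and a made-up Post label (the schema index at the end is what keeps lookups by uuid fast):

var neo4j = require('neo4j');
var uuid = require('node-uuid');
var db = new neo4j.GraphDatabase('http://localhost:7474');

// Generate the app-level ID up front and store it as an ordinary property.
var props = { uuid: uuid.v4(), title: 'hello world' };
db.query('CREATE (p:Post {props}) RETURN p', { props: props }, function (err, results) {
    if (err) throw err;
    console.log(results[0]['p'].data.uuid); // stable ID, safe to expose in URLs
});

// Run once, e.g. in the Neo4j shell, so uuid lookups use an index:
// CREATE INDEX ON :Post(uuid)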
Hope that helps.
I am developing an Azure-based website and I want to provide search capabilities using Lucene. (Structured JSON objects would be indexed and stored in Lucene; other content, such as Word documents, would be indexed in Lucene but stored in blob storage.) I want the search to be secure, such that one user can never see a document belonging to another user. I want to allow ad-hoc searches as typed by the user. Lastly, I want to query programmatically to return predefined sets of data, such as "all notes for user X". I think I understand how to add properties to each document to achieve these three objectives. (I am listing them here so that anyone kind enough to answer will have a better idea of what I am trying to do.)
My questions revolve around performance and security.
Can I improve document security by having a separate index for each user, or is including the user's ID as a parameter in each search sufficient?
Can I improve indexing speed and total throughput of the system by having a separate index for each user? My thinking is that having separate indexes would allow me to scale the system by having multiple index writers (perhaps even on different server instances) working at the same time, each on their own index.
Any insight would be greatly appreciated.
Regards,
Nate
Of course, one index.
You can do even better than what you suggested by using ManifoldCF (an Apache project that knows how to work with Solr) to manage document security.
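If you do keep a single index, the usual way to get the per-user restriction from your question is to AND an owner clause onto every query the user types. A minimal sketch with Lucene 3.x-era APIs (the ownerId field name is my assumption; you would add it to every document at index time as a non-analyzed field):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public final class SecureSearch {
    // Wraps whatever the user typed in a mandatory owner clause, so a
    // document belonging to another user can never match.
    public static Query restrictToUser(Query userQuery, String userId) {
        BooleanQuery secured = new BooleanQuery();
        secured.add(userQuery, BooleanClause.Occur.MUST);
        secured.add(new TermQuery(new Term("ownerId", userId)), BooleanClause.Occur.MUST);
        return secured;
    }
}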
And one off-topic, uninformed suggestion: I'd rather use CloudBees or Heroku (or Amazon) than Azure.
Until you start using several machines for indexing, I think it's more convenient to use a single index. The Lucene community has done a lot of work to make the indexing process as efficient as it can be, so unless you intentionally want to implement distributed indexing, I don't recommend splitting indexes.
However, there are several reasons why you might want to split indexes:
If your machine has several I/O devices that can be used in parallel. In this case, if you are I/O bound, splitting indexes is a good idea.
Splitting document fields between indexes (this is what ParallelReader is intended for; see the sketch after this list). This is a more exotic form of splitting, but it may be a good idea if searches are performed over different groups of fields. Suppose we have two query types: the first uses the fields name and type, and the second uses the fields price and discount. If those fields are updated at different rates (name updates are presumably far rarer than price updates), updating only part of the index requires fewer I/O resources, which gives the system more overall throughput.
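A sketch of how the ParallelReader variant is wired up, again with Lucene 3.x-era APIs (the two index paths and their field split are made up):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.ParallelReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public class SplitFieldSearch {
    public static void main(String[] args) throws Exception {
        // Two indexes holding different fields of the same documents;
        // doc IDs must line up exactly for ParallelReader to work.
        ParallelReader reader = new ParallelReader();
        reader.add(IndexReader.open(FSDirectory.open(new File("index-static"))));   // name, type
        reader.add(IndexReader.open(FSDirectory.open(new File("index-volatile")))); // price, discount
        IndexSearcher searcher = new IndexSearcher(reader);
        // ... run queries against searcher as if it were a single index ...
    }
}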
Let us say I am maintaining an index over a large number of documents, and I want to update the index for newly arriving data so that it is as close to real time as possible. What kind of indexing tool should I look at? I have looked at Sphinx and Lucene, and from previous posts they are both recommended for real-time indexing.
The delta indexing mechanism used in Sphinx looks like a pretty neat idea.
Some questions I have are:
1) How quickly can a document become searchable once it arrives?
2) How efficient is the index merge process (merging the delta index into the main index)?
I understand these are very general questions and I wanted to get an idea if using Sphinx would be the right way to go about this problem.
Sphinx has real-time indexes, which allow you to add/update/delete documents on the fly.
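With an RT index there is no delta/main merge at all: rows become searchable as soon as they are inserted over SphinxQL (Sphinx's MySQL-protocol interface). A rough sketch, with a made-up index name and fields:

# In sphinx.conf:
index rt_posts
{
    type         = rt
    path         = /var/lib/sphinx/rt_posts
    rt_field     = title
    rt_field     = content
    rt_attr_uint = user_id
}

-- Then over SphinxQL (e.g. mysql -h127.0.0.1 -P9306):
INSERT INTO rt_posts (id, title, content, user_id) VALUES (1, 'hello', 'searchable right away', 42);
SELECT id FROM rt_posts WHERE MATCH('hello');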
You can look at Apache Solr (with its near-real-time, NRT, search) and Elasticsearch for real-time implementations built on Lucene, and at some published benchmarks.
I'm working on a "real-time" website using Node.js. Currently I'm using Redis because I need high-performance read access; the write accesses are not really significant for my use case.
In addition, Redis does not have a query language for search, so I build my indexes manually and use unions/intersections/... to find values.
I think it would be easier to use MongoDB, with its built-in query system and an ORM-like layer (Mongoose, for example). The problem is that I'm not sure MongoDB is the best choice for my use case.
What is your advice on which NoSQL DB I need? Redis? CouchDB? MongoDB? Cassandra? etc.
To repeat: I want really good performance for read accesses and searches (the write accesses are not significant), with the simplest possible setup (an ORM-like layer? a built-in query system? etc.).
Thanks.
I believe that Redis would be the better solution, for the following reasons.
You require fast read access, and Redis provides one of the fastest solutions, if not the fastest, since the keys are held in memory.
Although MongoDB is easier to query in the general case, your problem domain is narrow; once you decide how you want to query the data, you can put the right data structures and indexes in place, as sketched below.
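For example, a rough sketch of a manual tag index with the node_redis client (the tag:* and post:* key scheme is my own invention, not a Redis convention):

var redis = require('redis');
var client = redis.createClient();

// Manual "index": one set per tag, holding the IDs of matching posts.
client.sadd('tag:nodejs', 'post:1', 'post:2');
client.sadd('tag:redis', 'post:2', 'post:3');

// "Query": posts tagged both nodejs AND redis = set intersection.
client.sinter('tag:nodejs', 'tag:redis', function (err, ids) {
    if (err) throw err;
    console.log(ids); // [ 'post:2' ]
    client.quit();
});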
I would say that Redis is a good fit for your DB, and you should look at something like Solr or Elasticsearch to provide the searching.
CouchDB will do better in a write-heavy environment (I don't use it myself, though).
MongoDB will do better in a read-heavy environment.
For search and indexing:
MongoDB would require a separate index for each of your search criteria to perform well (at least that is what I remember).
Proper indexing is important in MongoDB. And there are no joins!
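For example, in the mongo shell (the collection and field names are made up), you would create one index per query shape you intend to serve:

// One index per query pattern:
db.posts.ensureIndex({ tags: 1 });                 // find posts by tag
db.posts.ensureIndex({ author: 1, created: -1 });  // an author's latest posts

db.posts.find({ tags: 'redis' });
db.posts.find({ author: 'joe' }).sort({ created: -1 });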
Here are some links you might go through:
http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
http://www.snailinaturtleneck.com/blog/2009/06/29/couchdb-vs-mongodb-benchmark/
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Hope these help you find the right DB.
Good luck!