Why is there no Bolt for storing crawl results in StormCrawler when using an RDBMS? - stormcrawler

I want to use StormCrawler with an RDBMS engine such as Oracle, MySQL, or Postgres. However, the storm-crawler-sql module only provides a SqlSpout and a StatusUpdaterBolt. We could not find any class for indexing crawl results into the SQL database. Is there a technical reason behind this?

What's wrong with the IndexerBolt?
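
For context, "indexing crawl results into SQL" boils down to an upsert keyed by URL. Here is a minimal sketch in Python using the standard-library sqlite3 module. It is not StormCrawler code; the table name `crawl_index` and its columns are hypothetical and only illustrate the idea.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open a database and create the (hypothetical) crawl_index table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawl_index ("
        "url TEXT PRIMARY KEY, title TEXT, content TEXT)"
    )
    return conn


def index_document(conn, url, title, content):
    """Insert a crawled page, replacing any earlier row for the same URL."""
    conn.execute(
        "INSERT OR REPLACE INTO crawl_index (url, title, content) "
        "VALUES (?, ?, ?)",
        (url, title, content),
    )
    conn.commit()
```

Re-indexing a URL overwrites the previous row, so a recrawl does not create duplicates.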

Related

How to migrate from Lucene to Solr

I am Japanese and I am using a translator so sorry if my English is strange.
I am working on a website for my job and I am looking for a way to migrate the search function from Lucene to Solr.
Is there any software or tool that would make this possible? (Is it distributed on the official websites?)
If not, what other means are available? Please let me know how to do this. If a similar question has already been answered, please provide a link to it.
You cannot migrate Lucene to Solr as Solr is built on top of Lucene.
Lucene is the core search engine library, Solr is the search server.
Apache Lucene is the core search library, while Apache Solr builds on Lucene and adds features that Lucene does not provide out of the box.
There is no direct option or tool available for such a migration.
There are many options available for indexing data into Solr.
It all depends on where your data is. For example, if your data is in a database, one way is to use Solr's DataImportHandler to index it into Solr.
Please refer to the documentation for more options.

Nodejs/Vuejs implementing Elasticsearch

I am new to Elasticsearch and confused about how to actually start implementing it. I have developed office management software in which tasks, and other information related to each task, are stored daily for specific clients. I have written the APIs in Node.js and the front end in Vue.js, with MySQL as the database. I want to implement search functionality using Elasticsearch so that users can search tasks by any parameters they like.
Listed below are some of my questions:
Will Elasticsearch work as another database? If so, how do I keep the records updated in Elasticsearch as well?
Would it affect efficiency in any way?
Also, what are Kibana and Logstash in simple terms?
Is implementing Elasticsearch on the client side a good idea? If yes, how can I implement Elasticsearch and Kibana using Vue.js?
I am confused by all of the above. Can anyone kindly share their knowledge on these questions and also tell me which articles/docs/videos I should refer to for implementing Elasticsearch in the best possible way?
Elasticsearch
It is a data store: each JSON document (comparable to a single record/row) is stored in an index (comparable to a table).
Update the records in Elasticsearch from your backend only, even though packages are available to connect the frontend to Elasticsearch directly.
Efficiency: nothing is affected, apart from having a new component in your application stack.
Implementing Elasticsearch on the client side is not recommended. The same code and the same API can be used up to your MySQL DB connection; just add a function that saves/updates the data in Elasticsearch alongside the MySQL save call.
Example : MySQLConfig.SaveStudent(student)
ElasticsearchConfig.SaveStudent(student)
Up to here, no other code change is needed for save/update/delete/getByPrimaryID/GetByParamSearch.
For `getByPrimaryID/GetByParamSearch` searches, you have to create a separate API that queries either Elasticsearch or MySQL, but not both.
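
The dual-write idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `InMemoryStore` is a hypothetical in-memory stand-in for both the MySQL and the Elasticsearch client (no real driver is used), and `TaskService` with its method names is made up for the example.

```python
class InMemoryStore:
    """Hypothetical stand-in for a MySQL or Elasticsearch client."""

    def __init__(self):
        self.records = {}

    def save(self, record_id, record):
        self.records[record_id] = dict(record)

    def delete(self, record_id):
        self.records.pop(record_id, None)

    def get(self, record_id):
        return self.records.get(record_id)

    def search(self, **params):
        # Return every record whose fields match all given parameters.
        return [
            r for r in self.records.values()
            if all(r.get(k) == v for k, v in params.items())
        ]


class TaskService:
    """Writes go to both stores; each read goes to exactly one store."""

    def __init__(self, primary, search_index):
        self.primary = primary            # plays the role of MySQL
        self.search_index = search_index  # plays the role of Elasticsearch

    def save_task(self, task_id, task):
        self.primary.save(task_id, task)
        self.search_index.save(task_id, task)

    def delete_task(self, task_id):
        self.primary.delete(task_id)
        self.search_index.delete(task_id)

    def get_task(self, task_id):
        # Lookup by primary ID is served by the primary store only.
        return self.primary.get(task_id)

    def search_tasks(self, **params):
        # Parameter search is served by the search index only.
        return self.search_index.search(**params)
```

A real setup would also have to decide what happens when one of the two writes fails; that is left out of this sketch.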
Kibana
GUI for your Elasticsearch - Look at it like dbForge Studio, MySQL Workbench, phpMyAdmin
Other than GUI it has a lot of other functionalities like cluster monitoring, all the elastic stack monitoring, analytics, and so on.
Logstash
It ships data from many sources and saves it into an Elasticsearch index. It is not needed until you have use cases like:
application-prod.log to searchable index
Kafka topic to searchable index
MySQL Table to searchable index
Huge list of use-cases available to ship anything and make it a searchable index
To understand clearly how index, mapping, and document in Elasticsearch compare with database, table, schema, and record in MySQL, read from here

Where to use Neo4j

I'm actually trying to learn new things...
I used SQL for a long time, using MySQL and recently discovered document-oriented databases.
I came across graph-databases & Neo4j and want to try it through NodeJS but I really don't get the point.
Should I use Neo4j coupled with another DB? Like storing my data into MySQL & relationships in Neo4j?
Or may I use Neo4j to store data (like posts)?
Neo4j is often used as the primary database, see https://github.com/thingdom/node-neo4j for a node.js driver. Also, depending on your use case, you can use it with MySQL in different scenarios for complex queries that take a long time in MySQL like recommendations and other path queries, see http://docs.neo4j.org/chunked/snapshot/data-modeling-examples.html for some interesting starting examples.
/peter
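
To make "recommendations and other path queries" a bit more concrete, here is a small sketch in plain Python (no Neo4j or driver involved) of a friend-of-friend recommendation, i.e. a two-hop path query. The function name and the adjacency-dict representation are assumptions made for the example. In a relational schema each hop would typically be another self-join, which is where a graph model tends to be more natural.

```python
from collections import Counter


def recommend(graph, user):
    """Suggest people two hops away from `user`, most shared friends first.

    `graph` maps each person to the set of people they are connected to.
    """
    direct = graph.get(user, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and people they already know.
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    # Sort by number of shared friends (descending), then by name.
    return sorted(counts, key=lambda c: (-counts[c], c))
```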

searching and retrieving data using node.js?

I was wondering whether Node.js is a good fit for searching a massive amount of data. I know its main use is for asynchronous scenarios like chat, FTP, real time, etc. I was thinking of using Node.js with MongoDB to search 300,000 book records for the library at my university, and seeing how it compares to using PHP & MySQL. Any advice would be great, thanks.
Node.js would be a fine application interface for searching data .. but practically, so would PHP or many other languages :).
Your backend data storage solution (MySQL, MongoDB, ..) is a harder choice and really depends on how you want to index and search the data.
If your main goal is search you probably want to look into a search application based on something like Apache Lucene. These typically use a relational database backend, although some newer efforts like ElasticSearch do have growing community support for ingesting data from sources like MongoDB (ref: MongoDB River Plugin for ElasticSearch).
Since you mentioned book search and libraries, you might also want to look into ILS (Integrated Library System) applications, which may already solve that problem. There are several open source products such as Koha and Evergreen.
Look at MongooseJS
Absolute perfect fit in my opinion.

NoSQL Word Proximity

Are there any NoSQL databases that support word proximity searching similar to Lucene?
I have a client that would like the flexibility of NoSQL with the search power of Lucene or some other search tool. The average amount of data to be searched is 200GB.
Take a look at tjake's Solandra (formerly Lucandra). "Solandra is a real-time distributed search engine built on Apache Solr and Apache Cassandra."
Solandra "supports most out-of-the-box Solr functionality (search, faceting, highlights)"
If you can manage a .NET/Windows solution, also check out RavenDB, which has Lucene baked into it. If not, Schild's answer is a good one. You can also use Lucene separately with MongoDB, but your app would have to maintain the index itself...
Lucene is a NoSQL database.
Probably too late to be useful but check out MarkLogic. It's a document database with integrated full-text search (not bolt-on Lucene). You can see a quick demo via http://developer.marklogic.com/try/corona/index
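
For readers unsure what "word proximity searching" involves, here is a toy sketch of the usual underlying idea: a positional inverted index that records where each word occurs, so a query can ask whether two words appear within N positions of each other. This is plain Python written for illustration, not the implementation of Lucene or of any product mentioned above; the function names and the simple position-difference distance are assumptions, and Lucene's own slop semantics are not reproduced here.

```python
import re
from collections import defaultdict


def build_positional_index(docs):
    """Map each token to {doc_id: [positions where the token occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(re.findall(r"\w+", text.lower())):
            index[token][doc_id].append(pos)
    return index


def proximity_search(index, term_a, term_b, max_distance):
    """Return sorted doc ids where the two terms occur within max_distance tokens."""
    a_postings = index.get(term_a.lower(), {})
    b_postings = index.get(term_b.lower(), {})
    hits = []
    # Only documents containing both terms can match.
    for doc_id in a_postings.keys() & b_postings.keys():
        if any(
            abs(pa - pb) <= max_distance
            for pa in a_postings[doc_id]
            for pb in b_postings[doc_id]
        ):
            hits.append(doc_id)
    return sorted(hits)
```

At 200GB the index itself would need to be distributed and persisted, which is exactly the part the dedicated search engines handle.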

Resources