I am thinking of using Solr to index the log files generated by our applications and allow support staff to search the logs for troubleshooting. Has anybody done this kind of thing with Solr?
Rackspace uses Hadoop and Solr to index terabytes of log data:
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
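For a sense of what the indexing side looks like in code, here is a minimal SolrJ sketch of indexing a single log line. The core name `logs` and its field names are assumptions, and would need matching fields in your schema:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class LogLineIndexer {
    public static void main(String[] args) throws Exception {
        // Assumes a Solr core named "logs" whose schema defines these fields.
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/logs").build();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "app1-line-42");                 // unique per log line
        doc.addField("timestamp", "2012-01-01T10:00:00Z");
        doc.addField("level", "ERROR");
        doc.addField("message", "Connection to db1 timed out after 30s");

        solr.add(doc);
        solr.commit();   // makes the document visible to searches
        solr.close();
    }
}
```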
I need some help finding a way to manage my log information.
I have 20 Windows servers running applications on GlassFish which generate logs every day. To manage these logs, in case I need to find something specific across all my servers, I am trying to gather all this data on a single server (Windows or Linux) and filter it according to my needs.
Best regards, Egis
This is a broad question, but a common solution is the ELK stack:
Elasticsearch - stores the data
Logstash - processes the data; you install it on the servers that generate logs so it can ship them to the Elasticsearch server (see the sketch below)
Kibana - visualizes the data
An article explaining the stack:
https://www.guru99.com/elk-stack-tutorial.html
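To make the pipeline concrete, here is roughly the kind of document Logstash produces and ships for each log line, sketched with the Elasticsearch high-level REST Java client. The index name and field names here are assumptions, not a fixed schema:

```java
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class LogShipper {
    public static void main(String[] args) throws Exception {
        // Points at the central Elasticsearch server that Kibana also reads from.
        RestHighLevelClient es = new RestHighLevelClient(
                RestClient.builder(new HttpHost("logserver", 9200, "http")));

        // One parsed log line, in the shape Logstash would emit.
        Map<String, Object> event = Map.of(
                "@timestamp", "2019-05-01T10:15:30Z",
                "host", "winserver-07",
                "level", "SEVERE",
                "message", "GlassFish deployment failed: connection pool exhausted");

        es.index(new IndexRequest("logs-2019.05.01").source(event), RequestOptions.DEFAULT);
        es.close();
    }
}
```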
I am new to Elasticsearch and confused about how to actually start implementing it. I have developed office management software in which tasks, and other information related to those tasks, belonging to specific clients are stored on a daily basis. I have written the APIs in Node.js and the front end in Vue.js, and MySQL is used as the database. I want to implement search functionality using Elasticsearch so that users can search tasks by any parameters they like.
Listed below are some of my questions:
Will Elasticsearch work as another database? If so, how do I keep the records in Elasticsearch up to date as well?
Would it affect efficiency in any way?
Also, what are Kibana and Logstash, in simple terms?
Is implementing Elasticsearch on the client side a good idea? If yes, how can I use Elasticsearch and Kibana from Vue.js?
I am confused by all of the above. Can anyone share their knowledge on these questions and also suggest which articles/docs/videos I should refer to for implementing Elasticsearch in the best possible way?
Elasticsearch
It is a data store: each JSON document (the equivalent of a single record/row) is stored in an index (the equivalent of a table).
Update the records in Elasticsearch from your backend only, even though packages are available to connect the frontend to Elasticsearch directly.
Efficiency: nothing is affected, apart from the extra component in your stack.
Implementing Elasticsearch on the client side is not recommended. The same code and the same APIs can be used up to your MySQL DB connection; just add a function that saves/updates the data in Elasticsearch alongside each MySQL save call.
Example: `MySQLConfig.SaveStudent(student)` followed by `ElasticsearchConfig.SaveStudent(student)` (see the sketch below)
Up to this point, no code changes are needed for save/update/delete. For reads (`getByPrimaryID`/`getByParamSearch`), create each API against either Elasticsearch or MySQL, but not both.
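A minimal sketch of that dual-write, using the Elasticsearch high-level REST client. It is written in Java for illustration; in the question's Node.js backend the official @elastic/elasticsearch client plays the same role. The `tasks` index and its fields are assumptions:

```java
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class TaskSearchIndexer {
    private final RestHighLevelClient es = new RestHighLevelClient(
            RestClient.builder(new HttpHost("localhost", 9200, "http")));

    // Call this right after the MySQL save succeeds so both stores stay in sync.
    // Reusing the MySQL primary key as the document id makes updates idempotent:
    // saving the same task twice simply overwrites the same document.
    public void saveTask(long mysqlId, String title, String clientName) throws Exception {
        IndexRequest request = new IndexRequest("tasks")   // hypothetical index name
                .id(String.valueOf(mysqlId))
                .source(Map.of("title", title, "client", clientName));
        es.index(request, RequestOptions.DEFAULT);
    }
}
```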
Kibana
A GUI for your Elasticsearch - think of it like dbForge Studio, MySQL Workbench, or phpMyAdmin for MySQL.
Beyond the GUI it has a lot of other functionality, such as cluster monitoring, monitoring for the whole Elastic Stack, analytics, and so on.
Logstash
It ships data from many sources and saves it into an Elasticsearch index. You don't need it until you have use cases like:
application-prod.log to a searchable index
a Kafka topic to a searchable index
a MySQL table to a searchable index
There is a huge list of use cases for shipping almost anything and making it a searchable index.
To understand clearly how index, mapping, and document in Elasticsearch correspond to database, table, schema, and record in MySQL, see the Elasticsearch documentation's comparison of these concepts.
We have huge log files (hundreds of gigabytes) on multiple web servers that need to be searched in real time. These log files are written to multiple times per second by different apps. We have recently installed a Hadoop cluster on some servers for this purpose.
In order to implement search on these logs, I have thought of this design: a process runs on the web servers which builds an inverted index of the logs and caches it in memory (on the web servers themselves), then pushes it via Flume to HDFS, to be stored in Hive, when the cache is full (much like an LRU cache). This helps in two ways when something is searched for: the most recent logs are returned from the in-memory cache, which is fast, while older logs are returned from disk. Since users want to see the latest logs first, this technique works.
Can somebody verify whether this design will work and scale properly? Are there any better alternatives?
Thanks
You could store the inverted index in HBase to provide more real-time access to your older logs.
HBase would also likely be a viable alternative to your in-memory cache. You could do this if you wanted to unify the storage platform instead of having it split up. It will obviously be slower than memcached or redis.
A completely different approach could be using Lucene/Solr to index your logs. This has a lot of nice features out of the box for searching.
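For the in-memory layer the question describes, here is a minimal LRU-cache sketch built on `LinkedHashMap`. The postings-list value type and the Flume flush hook are placeholders for the question's design, not APIs of any of the systems mentioned:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Holds the most recent postings lists (term -> offsets of matching log lines).
// When the cache is full, the least recently used entry is evicted and, in the
// question's design, would be handed to Flume for storage in HDFS/Hive.
public class RecentLogIndexCache extends LinkedHashMap<String, List<Long>> {
    private final int maxEntries;

    public RecentLogIndexCache(int maxEntries) {
        super(16, 0.75f, true);   // accessOrder=true turns this into an LRU map
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, List<Long>> eldest) {
        boolean evict = size() > maxEntries;
        if (evict) {
            flushToHdfs(eldest.getKey(), eldest.getValue());   // placeholder hook
        }
        return evict;
    }

    private void flushToHdfs(String term, List<Long> postings) {
        // Placeholder: the real implementation would push the entry to Flume.
    }
}
```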
I need some sort of hosted search API for my website where I can submit content and search it with fuzzy matching, so that spelling mistakes and grammar won't affect the results.
I want to use Solr/Lucene or whatever technology is out there, without needing to install anything on my server, to reduce setup complexity.
What Solr/Lucene/other search hosting services are there?
I've read some other posts on Stack Overflow, but the options mentioned are either no longer in business or are WordPress extensions that require server installation (i.e. the processing is done on the server).
You might consider Websolr, of which I am a cofounder, which is exactly the sort of service that you describe.
The thing is, Solr is highly dependent on its data model. Or rather, how your users search will really affect the way you structure the data model in Solr. As far as I know there aren't any really good hosting services for Solr yet, because you almost always need to make extensive modifications to the Solr configuration (most notably the schema.xml).
However, with that said, Solr is really easy to get up and running. The example application is bundled with Jetty and runs more or less directly after download.
So unless you have immense scaling issues (read: 5-10+ million documents or a really high queries-per-second load), I'd recommend you actually install the application on your own server.
Amazon CloudSearch is the best alternative if you do not want to worry about hosting.
http://aws.amazon.com/cloudsearch/
http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/SvcIntro.html
gotosolr - http://gotosolr.com/en
Apache Solr indexes are distributed across two hosting companies.
Security is managed with HTTPS and basic HTTP authentication.
Real-time statistics.
Also ready for agencies, with multi-account and multi-subscription support.
Supports Drupal and WPSOLR (https://wordpress.org/plugins/wpsolr-search-engine/)
We are finding it very hard to monitor the logs spread over a cluster of four managed servers, so I am trying to build a simple log4j appender which uses the SolrJ API to store the logs in a Solr server. The idea is to leverage Solr's REST API to build a better GUI which could help us:
search the logs and display the previous and next 50 lines or so, and
tail the logs
Being awful at front ends, I am trying to cook up something with GWT (a prototype version). I am planning to host the project on Google Code under the ASL.
I would greatly appreciate your insights on:
Whether it makes sense to create a project like this?
Is using Solr for this overkill?
Any suggestions on a web framework/tool which would help me build a tab-based front end for tailing?
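For concreteness, the appender being described might look roughly like this: a sketch using log4j 1.x's `AppenderSkeleton` and a current SolrJ client, where the core name and field names are assumptions, and a real implementation would buffer events and commit in batches rather than on every call:

```java
import java.util.UUID;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SolrAppender extends AppenderSkeleton {
    private final SolrClient solr =
            new HttpSolrClient.Builder("http://solrhost:8983/solr/logs").build();

    @Override
    protected void append(LoggingEvent event) {
        try {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", UUID.randomUUID().toString());
            doc.addField("timestamp", event.getTimeStamp());
            doc.addField("level", event.getLevel().toString());
            doc.addField("logger", event.getLoggerName());
            doc.addField("message", event.getRenderedMessage());
            solr.add(doc);   // a production appender would batch these and commit asynchronously
        } catch (Exception e) {
            errorHandler.error("Failed to index log event in Solr", e, 0);
        }
    }

    @Override
    public void close() {
        try {
            solr.close();
        } catch (Exception ignored) {
        }
    }

    @Override
    public boolean requiresLayout() {
        return false;
    }
}
```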
You can use a combination of logstash (for shipping and filtering logs) + elasticsearch (for indexing and storage) + kibana (for a pretty GUI).
The Loggly folks have also built Logstash, which can be backed by quite a few things, including Lucene via Elasticsearch. It can also forward to Graylog.
Totally doable; many folks have rolled their own. A couple of useful links: there is an online service, www.loggly.com, that does this. They are actually based on Solr as the core storage engine! Obviously they have built a proprietary interface on top.
Another option is http://www.graylog2.org/. It is open source. Not backed by Solr, but still very cool!