Drupal using Solr externally on another machine?

Drupal using Solr externally on another machine? - search

I'd like to use the Drupal solr search module with the Apache Solr Search hosted on an external machine. I know that Acquia offer this as a service. But it's not an affordable option for me. I'd like to install Solr on an inexpensive VPS and have all my various Drupal sites which are on different hosts accessing the search functions. Am I barking up the wrong tree?

The issues that Mauricio brings up are valid, however I'm certain that it's possible to set up a solr server on a separate server without problems for 2 reasons.
1. we are currently using such a setup.
2. acquia.com offers solr as a service (which might be a good solution for you if you don't want to deal with running or setting up your own solr server).
We are currently using a seperate solr server, and because it is in the same local network as the webserver (both on Rackspace cloud) there are no latency issues.
Security is a concern and should not be taken lightly since solr is has very little built in.
The easiest type of security to setup is limiting access to the solr server to only the webserver. There are however more flexible security solutions, but they will probably take some more setting up.

Sure, you can do that. But keep these things in mind:
Security: if your Solr instance is not in the same local network as your Drupal sites, you'll have to carefully set up security in your Solr's Tomcat/Jetty. Having a publicly accessible Solr instance could be a major security problem.
Latency: another issue with a remote Solr server is latency. It's not cool if your carefully optimized website takes 2s to return the search results.
Bandwidth between Solr and the Drupal website could also be a problem but I guess latency is a bigger issue.

Related

Elasticsearch requirements to run on my server

So I am implementing search in my frontend and my backend . I found that most used search engines are algolia and elasticsearch. I decided to go with elasticsearch since it is open source and less costy but I am scared it would be a great overhead to my server if i decided not to host it on elastic cloud or aws or azure.
My Question: Will it be an overhead to my server or its load wont be much

Elasticsearch will run even with a 2GB ram, what you need to consider is what is the usage and how much data you planning to store and how you going to search them.
Better to have a try, setting up is very easy.

Create multiple front-ends hitting same data source

I want to create and host 4-5 websites using the same database. The only difference between the sites will be:
branding (colours and header)
data will be filtered per website (through sql query) and
Each site will be on a separate domain (but can be hosted on same server)
My 1st thought was to use API / Rest model and provision five front-ends in their own sub-domain. But as sites can be hosted on same server (I'm assuming one hosting account which enables multiple sub-domains), I think I can simply connect all sites with connection string to same database, avoiding complexities of using REST.
Is this possible and would i run into database conflicts doing this?
If later, I wanted to add a mobile app client, then will I need to build out a rest interface anyway?
Thanks

The right thing to do here depends a lot on your specific use case, expected load, preferred backend/edge technology, future plans, etc.
Site domains and servers -
The main point here is that you can host your domains/subdomains on the same or different servers. You simply need to update the DNS to point to the correct IP (update the subdomain's A record).
Note: If these sites are all public-facing, then I highly recommend using an edge/proxy server and even consider a load balancer, depending on expected number of visitors (Nginx, or Apache Web Server)
Decoupled architecture is almost always preferred -
I would definitely have an API/REST layer to abstract the database from the sites. This ensures that you establish a contract through which any clients can interact with the backend, including your mobile application. You also don't have to duplicate DB-specific code across the various clients. What if you decided to change your schema? Or even your database solution? Then all clients will be broken and your customers would be unhappy. As a guiding principle, think: if I change any one thing in my architecture, how many other things will need to change as a result? In terms of scalability, this architecture will also allow you to easily spin up more instances of whatever it is you need (databases, REST service, etc) should the need arise.
How do I build and deploy a REST API?Re: #2, to set up a simple custom REST service running on Node.js (and express), this is a good tutorial. The example also walks through setting up and integrating with an in-memory MongoDB database.
Database collisions?If you follow the above steps, this should be a moot point. Node.js/express and the databases expose ways to configure connection pools if the defaults do not suffice. Again, this will depend on your needs - how many concurrent users you expect.

Securing elasticsearch

I am completely new to elasticsearch but I like it very much. The only thing I can't find and can't get done is to secure elasticsearch for production systems. I read a lot about using nginx as a proxy in front of elasticsearch but I never used nginx and never worked with proxies.
Is this the typical way to secure elasticsearch in production systems?
If so, are there any tutorials or nice reads that could help me to implement this feature. I really would like to use elasticsearch in our production system instead of solr and tomcat.

There's an article about securing Elasticsearch which covers quite a few points to be aware of here: http://www.found.no/foundation/elasticsearch-security/ (Full disclosure: I wrote it and work for Found)
There's also some things here you should know: http://www.found.no/foundation/elasticsearch-in-production/
To summarize the summary:
At the moment, Elasticsearch does not consider security to be its job. Elasticsearch has no concept of a user. Essentially, anyone that can send arbitrary requests to your cluster is a “super user”.
Disable dynamic scripts. They are dangerous.
Understand the sometimes tricky configuration is required to limit access controls to indexes.
Consider the performance implications of multiple tenants, a weakness or a bad query in one can bring down an entire cluster!

Proxying ES traffic through nginx with, say, basic auth enabled is one way of handling this (but use HTTPS to protect the credentials). Even without basic auth in your proxy rules, you might, for instance, restrict access to various endpoints to specific users or from specific IP addresses.
What we do in one of our environments is to use Docker. Docker containers are only accessible to the world AND/OR other Docker containers if you explicitly define them as such. By default, they are blind.
In our docker-compose setup, we have the following containers defined:
nginx - Handles all web requests, serves up static files and proxies API queries to a container named 'middleware'
middleware - A Java server that handles and authenticates all API requests. It interacts with the following three containers, each of which is exposed only to middleware:
redis
mongodb
elasticsearch
The net effect of this arrangement is the access to elasticsearch can only be through the middleware piece, which ensures authentication, roles and permissions are correctly handled before any queries are sent through.
A full docker environment is more work to setup than a simple nginx proxy, but the end result is something that is more flexible, scalable and secure.

Here's a very important addition to the info presented in answers above. I would have added it as a comment, but don't yet have the reputation to do so.
While this thread is old(ish), people like me still end up here via Google.
Main point: this link is referenced in Alex Brasetvik's post:
https://www.elastic.co/blog/found-elasticsearch-security
He has since updated it with this passage:
Update April 7, 2015: Elastic has released Shield, a product which provides comprehensive security for Elasticsearch, including encrypted communications, role-based access control, AD/LDAP integration and Auditing. The following article was authored before Shield was available.
You can find a wealth of information about Shield here: here
A very key point to note is this requires version 1.5 or newer.

Ya I also have the same question but I found one plugin which is provide by elasticsearch team i.e shield it is limited version for production you need to buy a license and please find attached link for your perusal.
https://www.elastic.co/guide/en/shield/current/index.html

How to use mod_security as standalone?

I've seen the module named standalone in the package of Mod_Security; but I'm not sure how to use it after making and installing it!
Is there any good resources for the start up?

It does not appear to be possible; based on what the ModSecurity website says for its modes of operation:
Reverse proxies are effectively HTTP routers, designed
to stand between web servers and their clients. When you install a
dedicated Apache reverse proxy and add ModSecurity to it, you get a
"proper" network web application firewall, which you can use to
protect any number of web servers on the same network. Many security
practitioners prefer having a separate security layer. With it you get
complete isolation from the systems you are protecting. On the
performance front, a standalone ModSecurity will have resources
dedicated to it, which means that you will be able to do more (i.e.,
have more complex rules). The main disadvantage of this approach is
the new point of failure, which will need to be addressed with a
high-availability setup of two or more reverse proxies.
They are considering it separate by created a dedicated host that is used for proxying to internal hosts.
That works; but it's technically not standalone.
I also filed a bug, and it was confirmed by Felipe Zimmerle:
Standalone is a wrapper to Apache internals that allows ModSecurity to be executed. That wrapper still demand Apache pieces. It is true that you can extend your application using the Standalone version although, you will need some Apache pieces

As you have noted ModSecurity is an add on to an existing web server - originally as an Apache module (hence the name) but now also available for Nginx and IIS.
You can run it in embedded mode (i.e. as part of your main web server) or run it in reverse proxy mode (which is basically the same but you set up a separate web server and run it on that, and then direct all traffic through that).
To be perfectly honest I've never found much point in the reverse proxy method. I guess it does mean you could use it on non-supported web servers (i.e. if you are not using Apache, Nginx nor IIS), and it would reduce the load on your main web server, but other than that it seems like an extra step and infrastructure for no real gains. Some people might also prefer to do the ModSecurity checks in front of several web servers but I woudl argue if you have several web servers, then it is likely for performance and resiliency reasons so why not spread the ModSecurity to this level too rather than creating a single point of failure which might be a bottleneck in front of it. Only other reason would be to apply session level rules (e.g. if people are changing session ids), which might ultimately be spread between different web servers but I've never been convinced that those rules are that great anyway.
When I build ModSecurity I get a mod_security2.so library being built but no separate standalone file(s) so I presume you're just seeing this from hunting through the source (I do see a standalone)? I'd say just because there is a "standalone" folder in the source is not a guarantee that it can run as a completely separate, standalone piece.
I'd question why you want to run this as a standalone app even if you could? Web servers have a lot of functionality in them and depending on ModSecurity, which was written for web security, rather than web security and all the other things a web server does (e.g. be quick, understand HTTP protocol, gzip and ungzip...etc), needlessly stretches what ModSecurity would need to handle. So why not use a web server to take care of this and let ModSecurity do what it's good at?
If you are using ModSecurity then I guess you have web apps (presumably with a web server), so why not use it through that?
Finally is there any problem with installing this through Apache (or Nginx or IIS)? It's free software that's well supported and easy to set up.
I guess ultimately I don't understand the reason for your question. Is there a particular problem you are trying to solve, or is this more just curiosity?

Solr / Lucene / Search Hosting

I need some sort of hosted search API for my website where I can submit content and search content with fuzzy logic, where spelling mistakes and grammar won't affect results.
I want to use solr/lucene or whatever technology is out there, without needing to install stuff on my server to reduce setup complexity.
What solr/lucene/othersearch hosting services are there?
I'm read some other posts on stackoverflow, but they are either no longer in business or are wordpress extensions that require server installation (i.e. the processing is done on the server).

You might consider Websolr, of which I am a cofounder, which is exactly the sort of service that you describe.

The thing is, Solr is highly dependant on its datamodel. Or rather how your users search will really affect the way you structure the data model in Solr. As far as I know there aren’t any really good hosting services for Solr yet because you almost always need to do such extensive modifications to the Solr configuration (most notably the schema.xml).
However, with that said, Solr is really easy to get up and running. The example application is bundled with Jetty and runs more or less directly after download.
So unless you have immense scaling issues (read 5-10+ milj documents or a really high query per second load) I’d recommend you to actually install the application on your own server.

Amazon CloudSearch is the best alternate if you do not want to worry about hosting.
http://aws.amazon.com/cloudsearch/
http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/SvcIntro.html

gotosolr - http://gotosolr.com/en
Apache Solr indexes are distributed on 2 hosting companies.
Security is managed by Https and basic http authentication.
Real-time statistics.
Also ready for agencies with multi-accounts and
multi-subscriptions.
Supports Drupal and WPSOLR (https://wordpress.org/plugins/wpsolr-search-engine/)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string