How to dump all databases with ArangoDB

I have ArangoDB running locally with databases, collections, data and graphs from several different projects in it. I'd like to back up everything so I can rebuild my system. I know how to do a backup of a single database, but because I have a bunch I'm hoping to do this in one shot.
Essentially, I'm looking for the ArangoDB equivalent of
mysqldump -u root -p --all-databases > alldb.sql
Obviously the ArangoDB equivalent of
mysql -u root -p < alldb.sql
Would be good to know too.

As of release 3.3, arangodump does not support dumping all databases at once. It is per database.
To make it dump all databases, it can be invoked in a loop over all databases, e.g.
# loop over all databases
for db in `arangosh --javascript.execute-string "db._databases().forEach(function(db) { print(db); });"` # host, user and password go here...
do
arangodump --server.database "$db" # host, user and password go here...
done
This will work if there is one user that has access privileges for all databases.

While the previous script is almost correct, it won't work with multiple databases: arangodump will start complaining about the dump directory and ask you to add --overwrite true to the command. Adding that alone won't help either, since each iteration would overwrite the same directory and you'd only end up with the last database.
We use the following script, slightly modified from the one in stj's answer (it is part of our backup procedure), to get a dump of all the databases we have:
USER=...
PASSWORD=...
for db in $(arangosh --server.username "$USER" --server.password "$PASSWORD" --javascript.execute-string "db._databases().forEach(function(db) { print(db); });")
do
arangodump --output-directory ~/dump/"$db" --overwrite true --server.username "$USER" --server.password "$PASSWORD" --server.database "$db"
done

I came across this thread and saw that there are a bunch of custom solutions. Just wanted to mention that:
As of ArangoDB v3.5.0, arangodump (as well as arangorestore) supports the --all-databases true parameter.
arangodump got an option --all-databases to make it dump all available databases instead of just a single database specified via the option --server.database.
When set to true, this makes arangodump dump all available databases the current user has access to. The option --all-databases cannot be used in combination with the option --server.database.
When set to true, this makes arangodump create a subdirectory with the data of each dumped database. Databases will be dumped one after the other. However, inside each database, the collections of the database can be dumped in parallel using multiple threads. When dumping all databases, the consistency guarantees of arangodump are the same as when dumping multiple single databases individually, so the dump does not provide cross-database consistency of the data.
arangorestore got an option --all-databases to make it restore all databases from inside the subdirectories of the specified dump directory, instead of just the single database specified via the option --server.database.
Using the option for arangorestore only makes sense for dumps created with arangodump and the --all-databases option. As for arangodump, arangorestore cannot be invoked with both options --all-databases and --server.database at the same time. Additionally, the option --force-same-database cannot be used together with --all-databases.
If the to-be-restored databases do not exist on the target server, then restoring data into them will fail unless the option --create-database is also specified for arangorestore. Please note that in this case a database user must be used that has access to the _system database, in order to create the databases on restore.
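For reference, a minimal invocation might look like the following sketch (endpoint, credentials and the dump directory are placeholders; the user needs access to all databases, and to _system for --create-database):
# dump every database the current user has access to (ArangoDB >= 3.5)
arangodump --server.endpoint tcp://127.0.0.1:8529 \
  --server.username root --server.password "$PASSWORD" \
  --all-databases true --output-directory ~/dump
# restore all databases from that dump, creating any that are missing
arangorestore --server.endpoint tcp://127.0.0.1:8529 \
  --server.username root --server.password "$PASSWORD" \
  --all-databases true --create-database true --input-directory ~/dump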
Reference: https://www.arangodb.com/docs/stable/release-notes-new-features35.html#dump-and-restore-all-databases

Related

Setting File and Process Ownership to Non-existent Users

I'd like to know the security implications of changing the ownership of files to, and starting processes as, a UID that has no mapping to a current user on the system.
Let's say there's a file /foobar, and we've changed its UID:GID to 1010:1010 (where those IDs are not currently mapped to any existing user) and set its permissions to 0600. Other users will not be able to perform read or write operations on the file. If the file were instead in a directory owned by an existing user, that user could write to the file. And obviously if a user were created that received the UID 1010, they could also write to the file. If we say that no such user will be created, though, is this a good way to keep that file secure from other users?
The reason I'm asking is that I don't want to run my Docker containers as root, but I also don't want to get into the mess of managing user remapping with subuids. I thought the answer could be to run and own things in the container as the UID of the user who owns the files on the host. This seems to work just fine, and the non-existent container user (with the UID of the existing host user) is able to write to those files.
Though I feel there's some important security aspect I'm missing.
The important security aspect here is that, when you docker run the container, you can bind-mount any host directory and run as any user.
# What you expect
docker run \
-u $(id -u) \
-v $(pwd):/data \
--rm \
data-processing-image \
process /data/input.txt
# What's also possible
docker run \
-u root \
-v /:/data \
--rm \
data-processing-image \
cat /data/etc/shadow # to print out the host's encrypted password file
Some tools might print out error messages if the current user ID isn't in /etc/passwd or if the current group ID isn't in /etc/group, but it'd be a little unusual to run into these cases in Docker. (You wouldn't typically run a heavily customized interactive shell in a container, for instance.) Actual security enforcement is based on the numeric user and group IDs, and there aren't particular consequences if the database files don't include them.
One specific case I'll note is the Docker Hub postgres image, which allows running the database as an arbitrary user (not necessarily the in-container postgres user), mostly to support bind-mounted host data directories. The "Arbitrary --user Notes" section there notes
postgres doesn't care what UID it runs as (as long as the owner of /var/lib/postgresql/data matches), but initdb does care (and needs the user to exist in /etc/passwd)
And this statement can be generalized to say that, for enforcing filesystem permissions, the user doesn't need to be in /etc/passwd so long as all of the numeric uids line up, but specific processes may have other expectations.
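If you want to see the numeric-ID enforcement in action, here is a quick sketch (the file name, image and UIDs are arbitrary examples):
# on the host: hand the file to an unmapped UID/GID and lock it down
sudo chown 1010:1010 data.txt
sudo chmod 600 data.txt
# running the container as that same numeric UID works, even though no
# user named 1010 exists inside the image
docker run --rm -u 1010:1010 -v "$(pwd)":/data alpine \
  sh -c 'echo ok >> /data/data.txt && cat /data/data.txt'
# any other non-root UID is rejected by ordinary filesystem permissions
docker run --rm -u 1011:1011 -v "$(pwd)":/data alpine \
  cat /data/data.txt # Permission denied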

How to provide read backward compatibility after enabling role-based authentication in cassandra?

We are going to change the Cassandra setting from authenticator: AllowAllAuthenticator to authenticator: PasswordAuthenticator
to enable role-based authentication. There will be two roles:
admin which is a superuser
read-only which is only allowed to read.
I would like to provide backward compatibility for users of the cassandra cluster. More specifically,
many users use
shell script that uses cqlsh
python cassandra package
php cassandra package
to only read data from cassandra. Currently they don't specify any username or password. Therefore
I would like to make the read-only role some sort of a "default" role, i.e. if no username and password are provided,
then the role is automatically set to read-only so the users can read data and thus clients don't need to change their code.
Is there a way to do this? I'm currently having trouble in the following two parts:
the default user is cassandra if there is no role / user specified in cqlsh. I did not find a way to set default user / role.
and for the default user cassandra, I still have to set a password for it.
Any suggestions would be appreciated! Thanks in advance.
I come from an Oracle background, where I've done sqlplus "/ as sysdba" for years. I like it because the OS authenticates me. Now, there is something similar in Cassandra, but it isn't secure. Basically, in your home directory there is a hidden subdirectory called ".cassandra". In that directory there is a file called "cqlshrc" (so ~/.cassandra/cqlshrc); if it doesn't exist, create it. In that file you can add authentication information that will allow someone to log in by simply typing "cqlsh" without anything else (unless you're connecting remotely, where you also need "host" and "port"). The cqlshrc file has, among other things, an authentication section that looks like this:
[authentication]
username = <your_user_name>
password = <your_password>
So you could simply put your desired username and password in that file and you're essentially able to connect without supplying your username and password (You could also run "cqlsh -u your_user_name" and it will find your password in your cqlshrc file as well).
You can see a few obvious issues here:
1) The password is in clear text
2) If you change the password you need to change the password in the cqlshrc file
I do not recommend you use the "cassandra" user for ANYTHING. In fact, I'd drop it. The reason is that the cassandra user does everything with CL=quorum. We found this out when investigating huge I/O requests coming from OpsCenter and our backup tool (as you can see, we use DSE). They were all using the cassandra user and pounding on the node(s) that had the cassandra authentication information. It's baked into the code apparently to have CL=quorum - kinda dumb. Anyway, the above is one way to have users log in with a specific user without providing credentials, making it pretty easy to switch.
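As a rough sketch of that switch-over (role names and passwords below are just examples, not anything your cluster already has): create a replacement superuser first, then neuter or drop the default cassandra role, and set up the read-only role the question asks about.
# connect once with the default superuser to create a replacement
cqlsh -u cassandra -p cassandra -e "CREATE ROLE dba WITH SUPERUSER = true AND LOGIN = true AND PASSWORD = 'choose-a-strong-password';"
# reconnect as the new superuser before touching the cassandra role
cqlsh -u dba -p 'choose-a-strong-password' -e "
  ALTER ROLE cassandra WITH SUPERUSER = false AND PASSWORD = 'something-long-and-random';
  CREATE ROLE readonly WITH LOGIN = true AND PASSWORD = 'reader-password';
  GRANT SELECT ON KEYSPACE my_keyspace TO readonly;"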
Hope that helps
-Jim

Couchdb add user database configuration

I want to use CouchDB to create an offline-first app where users can add documents.
Only the user who created a document should be able to change it; otherwise it should only be readable. For this I wanted to use the "peruser" mechanism of CouchDB and replicate these documents into a main database where everyone can read.
Is it possible to automatically get the replication and other configurations (like design documents) configured when the database is created by the couch_peruser options?
I found a possible way myself:
add a validation function to the main database to deny writes (http://docs.couchdb.org/en/2.1.1/ddocs/ddocs.html#vdufun)
use _db_updates endpoint to monitor database creation (http://docs.couchdb.org/en/2.1.1/api/server/common.html#db-updates)
create a _replicator document to set up a continuous replication from the user db to the main db, as in the curl sketch below (http://docs.couchdb.org/en/2.1.1/replication/replicator.html)
One thing to look out for is that maintaining a lot of continuous replications requires a lot of system resources.
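A rough example of the third step (host, credentials and database names are placeholders): creating one _replicator document per user database with the standard HTTP API.
# continuously replicate one per-user database into the shared main database
curl -X PUT http://admin:secret@localhost:5984/_replicator/userdb-abc-to-main \
  -H 'Content-Type: application/json' \
  -d '{"source": "http://admin:secret@localhost:5984/userdb-abc",
       "target": "http://admin:secret@localhost:5984/main",
       "continuous": true}'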
Another way is to create authorships with design documents. With this approach we don't need to maintain replications to the main database, because every entry can be held in one database (the main database in my case).
http://guide.couchdb.org/draft/validation.html#authorship

CouchDB _replicator database requires a password for the local target?

I'm using the CouchDB _replicator database and am surprised to find that I have to put a full URL to localhost:5984 with username and password in the "target" field; just the database name by itself doesn't work. Does CouchDB just work this way or am I doing something wrong?
Part of CouchDB's real power is the consistency of its approach. Replication just uses standard REST/HTTP(S) requests to do its work. That's why it's so easy to replicate locally or across the world.
The only gotcha here is that CouchDB cheats slightly for (unsecured) local DBs by allowing you to provide just the DB name, not a full URL - although the actual replication calls prepend the rest of the URL to the DB name and go through the same process as any other request.
So, think of replication the same way you'd think of running curl from the command line of your local machine; that way, having to provide the auth credentials should feel more intuitive.
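To make that concrete, a small sketch (host, database names and credentials are made up): the replicator has to be able to perform requests like these itself, credentials included, which is why a bare database name is only enough for an unsecured local target.
# replication is just HTTP under the hood
curl http://user:pass@localhost:5984/target-db            # check that the target exists
curl http://user:pass@localhost:5984/source-db/_changes   # read changes from the source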

Multiple remote databases, single local database (fancy replication)

I have a PouchDB app that manages users.
Users have a local PouchDB instance that replicates with a single CouchDB database. Pretty simple.
This is where things get a bit complicated. I am introducing the concept of "groups" to my design. Groups will be different CouchDB databases but locally, they should be a part of the user database.
I was reading a bit about "fancy replication" on the PouchDB site and this seems to be the solution I am after.
Now, my question is, how do I do it? More specifically, How do I replicate from multiple remote databases into a single local one? Some code examples will be super.
From my diagram below, you will notice that I need to essentially add databases dynamically based on the groups the user is in. A critique of my design will also be appreciated.
Should the flow be something like this:
Retrieve all user docs from his/her DB into localUserDB
var groupDB = new PouchDB('remote-group-url');
groupDB.replicate.to(localUserDB);
(any performance issues with multiple pouchdb instances 0_0?)
Locally, when the user makes a change related to a specific group, we determine the corresponding database and replicate by doing something like:
localUserDB.replicate.to(groupDB) (Do I need filtered replication?)
Replicate from many remote databases to your local one:
remoteDB1.replicate.to(localDB);
remoteDB2.replicate.to(localDB);
remoteDB3.replicate.to(localDB);
// etc.
Then do a filtered replication from your local database to the remote database that is supposed to receive changes:
localDB.replicate.to(remoteDB1, {
  filter: function (doc) {
    return doc.shouldBeReplicated;
  }
});
Why filtered replication? Because your local database contains documents from many sources, and you don't want to replicate everything back to the one remote database.
Why a filter function? Since you are replicating from the local database, there's no performance gain from using design docs, views, etc. Just pass in a filter function; it's simpler. :)
Hope that helps!
Edit: okay, it sounds like the names of the groups that the user belongs to are actually included in the first database, which is what you mean by "iterate over." No, you probably shouldn't do this. :) You are trying to circumvent CouchDB's built-in authentication/privilege system.
Instead you should use CouchDB's built-in roles, apply those roles to the user, and then use a "database per role" scheme to ensure users only have access to their proper group DBs. Users can always query the _users API to see what roles they belong to. Simple!
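For example, a logged-in user can check their own roles with a plain HTTP request (the username, password and role names below are made up):
# the session info includes the roles assigned to the current user
curl http://alice:secret@localhost:5984/_session
# -> {"ok":true,"userCtx":{"name":"alice","roles":["group-a","group-b"]},...}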
For more details, read the pouchdb-authentication README.
