Trying to understand the full workflow of a git-crypt-based secret-keeping solution.
The tool itself works nicely on a dev machine, and even scaling to multiple developers seems to work fine.
However, it is not clear to me how this will work when deployed to multiple servers in a cloud, some of which are created on demand:
1. The challenge of unattended creation of a GPG key on the new server (someone needs to create the passphrase, or is it kept in source control, and then what is all this even worth?)
2. Once a GPG key is created, how does it get added to the keyring?
3. Say we decide to skip #1 and just share a key across servers; how is the passphrase supplied as part of the "git-crypt unlock" process?
I've really tried to search, and just couldn't find a good end-to-end workflow.
Like many Linux tools, git-crypt is an example of doing only one thing and doing it well. This philosophy dictates that any one utility doesn't try to provide a whole suite of tools or an ecosystem, just one function that can be chained with others however you like. In this case git-crypt doesn't bill itself as a deployment tool or have any particular integrations into a workflow. Its job is just to allow the git repository to store sensitive data that can be used in some checkouts but not others. The use cases can vary, as will how you chain it with other tools.
Based on the wording of your question I would also clarify that git-crypt is not a "secret keeping solution". In fact it doesn't keep your secrets at all; it just allows you to shuffle around where you do keep them. In this case it enables you to keep secret data in a repository alongside non-secret information, but it only does so at the expense of putting the secret-keeping burden on another tool. It exchanges one secret for another: your project's version-controlled secret component(s) for a GPG key. How you manage the secret is still up to you, but now the secret you need to handle is a GPG key.
Holding the secrets is still up to you. In the case of you and other developers that likely means having a GPG private key file kicking around in your home directory, hopefully protected by a passphrase that is entered into an agent before being dispensed to other programs like git-crypt that call for it.
In the case of being able to automatically deploy software to a server, something somewhere has to be trusted with real secrets. This is often the top-level tool like Ansible or Puppet, or perhaps a CI environment like Gitlab, Travis, or Circle. Usually you wouldn't trust anything but your top level deployment tool with knowing when to inject secrets in an environment and when not to (or in the case of development / staging / production environments, which secrets to inject).
I am not familiar with Circle, but I know that with Travis, under your project's Settings tab there is an Environment Variables section that you can use to pass private information into the virtual machine. There is some documentation for how to use this. GitLab's built-in CI system has something similar, and can pass different secrets to test vs. deploy environments, etc.
I would suggest the most likely workflow for your use case is to:
1. Create a special secret variable for use on your production machines that holds the passphrase for a GPG key used only for deployments. Whatever you use to create your machines should drop a copy of this key into the system and use this variable to unlock it and add it to an agent.
2. Have the deploy script for your project check out your git project code, then check for a GPG agent. If an agent is loaded it can try to decrypt the checkout (see the sketch after these steps). In the case of a developer's personal machine this will find their key; in the case of the auto-created machines it will find the deploy key. Either way you can manage access to the secrets in the deployment environment like one more developer on the project.
3. Whatever tool you use to create the machines becomes responsible for holding and injecting the secrets, probably in the form of a private key file and a passphrase in an environment variable that is used to load the key file into an agent.
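To make that concrete, here is a minimal deploy-time sketch. The file paths and the environment variable name are my own inventions, not anything standard, and for unattended machines I am assuming the simpler variant of git-crypt's workflow: a symmetric key exported with "git-crypt export-key" and then encrypted with "gpg --symmetric", rather than a full per-machine GPG keypair.

```python
#!/usr/bin/env python3
"""Deploy-time unlock sketch (all paths and variable names are hypothetical).

Assumes the provisioning tool has:
  * dropped an encrypted git-crypt key at KEY_FILE_ENC (a key produced by
    `git-crypt export-key`, then encrypted with `gpg --symmetric`)
  * exposed the matching passphrase as the env var GIT_CRYPT_PASSPHRASE
"""
import os
import subprocess
import sys
import tempfile

KEY_FILE_ENC = "/etc/deploy/git-crypt-key.gpg"   # hypothetical path
REPO_DIR = "/srv/app"                            # hypothetical checkout location


def unlock_repo() -> None:
    passphrase = os.environ.get("GIT_CRYPT_PASSPHRASE")
    if not passphrase:
        sys.exit("GIT_CRYPT_PASSPHRASE is not set; refusing to continue")

    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        key_path = tmp.name

    try:
        # Decrypt the exported git-crypt key non-interactively.
        # (For real use, prefer --passphrase-fd or a root-only passphrase file
        # so the secret does not show up in the process list.)
        subprocess.run(
            ["gpg", "--batch", "--yes", "--pinentry-mode", "loopback",
             "--passphrase", passphrase,
             "--output", key_path, "--decrypt", KEY_FILE_ENC],
            check=True,
        )
        # Unlock the working copy with the symmetric key.
        subprocess.run(["git-crypt", "unlock", key_path], cwd=REPO_DIR, check=True)
    finally:
        os.remove(key_path)  # don't leave the plaintext key lying around


if __name__ == "__main__":
    unlock_repo()
```

On a developer's personal machine none of this is needed; there, "git-crypt unlock" simply finds the personal GPG key through the agent.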
Related
The rule-of-thumb/best practice which I only occasionally see challenged is that you should never commit environment-specific config files (e.g. .env, .ini, etc) to version control. The closest accepted practice I can find is that you may commit a default config file, but with the expectation that you manually edit the config on initial deployment.
However, in a DevOps role I'm not just writing app logic, I'm also automating the deployment of my code and its multiple environments. As a result, there's a specific need to keep track of what my configs look like so I can (re-)deploy my app if/when an environment needs to be recreated. For the same reason as with traditional code, then, the most appealing solution is to store my config in the repo, but the question is: what's the best and most scalable way to do so?
Obviously I'm not talking about storing any secrets in my config file. I'm also not against a solution that doesn't involve the repo. But I think discounting the repo outright is a bit silly and is a case of adhering to practice out of tradition more than its practical value.
How have others tackled this issue?
EDIT: Ideally, I think, there would be some extension for git that would allow env-specific configs to be associated with their app's repo, but would be segregated (stored in a separate repo?) in such a way as to avoid downloading an env's config when forking/branching a project. That seems well outside the scope of what's available though.
There are two sets of approaches for this. One uses a secret store, such as Vault, to keep the configuration independent of your repository and inject it through the environment. This lives outside of the repository entirely, but can be configured per environment and ensures your data is securely encrypted.
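As a rough illustration of the Vault route, here is a sketch using the hvac Python client; it presumes a KV version 2 secrets engine, and the address, path and field names are assumptions for the example rather than anything from your setup.

```python
# Sketch: pull environment-specific configuration from Vault at startup.
# Assumes the hvac client (pip install hvac), a KV v2 secrets engine mounted
# at "secret", and a token injected by the deployment tool. Names are made up.
import os
import hvac

client = hvac.Client(
    url=os.environ["VAULT_ADDR"],      # e.g. https://vault.example.com:8200
    token=os.environ["VAULT_TOKEN"],   # injected by the deployment tool
)

# Read the config for this environment, e.g. path "myapp/production".
response = client.secrets.kv.v2.read_secret_version(
    path=f"myapp/{os.environ['APP_ENV']}",
)
config = response["data"]["data"]      # the actual key/value pairs

db_password = config["db_password"]    # fails loudly (KeyError) if missing
```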
The other, where you want to store some configuration in the repository, usually consists of keeping the file in a separate directory as a sort of template and then copying it into place and editing it. The location it is copied to in production is typically git-ignored. You may choose to use a script for editing it or edit it by hand; a sketch of the scripted variant follows.
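The file layout and placeholder names below are made up for the example; the idea is just that the template lives in version control while the rendered file does not.

```python
# render_config.py - minimal sketch of the "template in the repo" approach.
# config/templates/app.ini.template is version controlled; the rendered
# config/app.ini is git-ignored and produced at deploy time from environment
# variables injected by the deployment tool. All names here are hypothetical.
import os
from string import Template

TEMPLATE_PATH = "config/templates/app.ini.template"
OUTPUT_PATH = "config/app.ini"   # listed in .gitignore


def render() -> None:
    with open(TEMPLATE_PATH) as f:
        template = Template(f.read())

    # substitute() raises KeyError if a placeholder such as $DB_HOST or
    # $DB_PASSWORD is missing from the environment -- no silent defaults.
    rendered = template.substitute(os.environ)

    with open(OUTPUT_PATH, "w") as f:
        f.write(rendered)


if __name__ == "__main__":
    render()
```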
You can also store configuration in a separate, highly restricted repository, but that has all of the problems of checking secrets into a repository.
I have created a Python module which I would like to distribute via PyPI. It relies on a third party API which in turn requires a free API key.
Yesterday I asked this question on how to reference a YAML file (which would contain the API keys) using the true path of the module. However, that got me thinking of other ways:
1. Ask the users to save the API key as an environment variable and have the script check for the existence of said variable.
2. Ask the user to pass in the API key as a keyword argument when creating a new instance of the object, e.g.
thing = CreateThing(user_key = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', api_key = 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb')
I would like to see what the community thinks on this topic.
I have created a Python module which I would like to distribute via PyPI. It relies on a third party API which in turn requires a free API key.
Even though it is a free API key, you should never have it in your code, much less distribute your code to the public with it.
My advice is to never have any secrets in your code, not even the default secrets that many developers like to put in the calls that fetch values from environment variables, configuration files, databases or wherever else they retrieve them from.
When dealing with secrets you must always raise an exception when you fail to obtain one. Once more: don't use default values in your code, not even with the excuse that they will only be used during development and/or testing.
I recommend you read this article I wrote about leaking secrets in your code to understand the consequences of doing so, such as this one:
Hackers can, for example, use exposed cloud credentials to spin up servers for bitcoin mining, for launching DDoS attacks, etc., and you will be the one paying the bill in the end, as in the famous "My $2375 Amazon EC2 Mistake"...
While the article is in the context of leaking secrets in the code of a mobile app, most of it applies to any type of code we write and commit into repositories.
About your proposed solutions
1. Ask the users to save the API key as an environment variable and have the script check for the existence of said variable.
2. Ask the user to pass in the API key as a keyword argument when creating a new instance of the object, e.g.
thing = CreateThing(user_key = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', api_key = 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb')
Number 1 is a good one, and you should use the dot-env file approach here, maybe using a package like this one. But remember to raise an exception if the value does not exist; please never use defaults from your code.
Regarding solution 2, it is more explicit for the developer using your library, but you should also recommend the .env file to them and help them understand how to manage it properly. For example, secrets used in .env files should be retrieved from vault software.
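A minimal sketch of option 1, assuming the python-dotenv package and an environment variable name I have made up, MY_SERVICE_API_KEY:

```python
# config.py - minimal sketch of option 1 using python-dotenv.
# The variable name MY_SERVICE_API_KEY is made up; pick one for your library.
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads a git-ignored .env file, if present, into os.environ


class MissingApiKeyError(RuntimeError):
    pass


def get_api_key() -> str:
    # No default value: fail loudly if the key is not configured.
    api_key = os.environ.get("MY_SERVICE_API_KEY")
    if not api_key:
        raise MissingApiKeyError(
            "MY_SERVICE_API_KEY is not set; add it to your environment or .env file"
        )
    return api_key
```

A caller can then do thing = CreateThing(api_key=get_api_key()), which also covers option 2 without a value ever being hard-coded.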
A SECURITY CALL OF ATTENTION
Dot-env files must never be committed into source control, and when you commit a .env.example into your git repo, it must not contain any default values.
You may think: if I commit it accidentally to GitHub, I will just clean my commits, rewrite the history and do a force push. Well, think twice and see why that will not solve the problem you have created:
I have bad news for you... it seems that some services cache all GitHub commits, so hackers can check these services, or employ the same techniques themselves, to scan any commit sent to GitHub within a matter of seconds.
Source: the blog post I linked above.
And remember what I quoted earlier, "My $2375 Amazon EC2 Mistake", which was due to leaked credentials in an accidental GitHub commit.
From this answer: it is recommended to use the OS keyring. In my lib odsclient I ended up implementing a series of alternatives, from the most secure (keyring) to the less secure ones (OS environment variable, git-ignored text file, an API key passed as an argument, possibly obfuscated with getpass()).
You might wish to have a look for inspiration; see this part of the docs in particular.
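For reference, the keyring package behind that recommendation has a very small API; here is a sketch with made-up service and entry names:

```python
# Minimal sketch of the OS keyring approach using the `keyring` package
# (pip install keyring). The service and entry strings are made up.
import keyring

SERVICE = "my-lib"   # hypothetical identifier for your library
ENTRY = "api_key"


def store_api_key(api_key: str) -> None:
    # Typically done once, interactively, by the user.
    keyring.set_password(SERVICE, ENTRY, api_key)


def load_api_key() -> str:
    api_key = keyring.get_password(SERVICE, ENTRY)
    if api_key is None:
        raise RuntimeError("No API key in the OS keyring; run store_api_key() first")
    return api_key
```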
I load sensitive details (e.g., an AWS secret key) into my node project with dotenv & an .env file, which I include in my .gitignore, as I've read that's best practice.
If I want to add others to the project, or even clone the repo on another system, what's the most efficient/safe way to transmit these sensitive details? I assume email/google drive are out, but I'm not sure what's 'in.'
My repo is private. Does that mean the 'don't check in API keys' advice is less concrete? As I see it, anyone with permission to see the repo will likely need the relevant API keys, so it doesn't seem too unreasonable to simply check them in.
Security is always a trade-off with convenience. There are no real absolutes. Are you cool with anyone who accesses the repository and the (potential) git host having access to the keys? Are all developers' computers secure, and do they use disk encryption? What do the keys give access to? Everything on AWS? Do you trust people with access to the repo enough to stay secure and not accidentally share things further? The point is that you need to try to fully understand the ways in which the secrets may be leaked.
Personally I would rather keep them outside of git and distribute keys to developers via USB sticks or encrypted email, and only to the people who need them; only a subset of our developers actually need access.
Just make sure that you replace all your secrets if you do decide it's too risky. Don't just delete them from the repository. Once they are in there, they are in there forever.
Ubuntu 14.04
I'm not too sure about this. If I look at the contents of ~/.ssh/ I have a few files in there, and I'm just about to set up a key for use with BitBucket.
I'm not sure if I'm meant to have multiple keys for different purposes or if I should have one key that is used for lots of things to identify me.
Cheers
Anyway, the first thing you need to do is create a private/public SSH key pair. This can be done by executing the ssh-keygen command in the terminal.
To be brief: the public key (id_rsa.pub) is used by third-party servers and services like BitBucket to identify you, so you need to provide them with this information. For example, add the public key to your BitBucket account settings.
The same private/public key pair can be used by multiple servers and services to identify you at the same time, so usually you don't need to create multiple pairs.
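If you would rather script it than run the command by hand, a small sketch follows; the file name and comment are only examples, and ssh-keygen will still prompt you for a passphrase:

```python
# Minimal sketch: generate an Ed25519 key pair and print the public key so it
# can be pasted into the BitBucket account settings. The file name and comment
# are examples only; ssh-keygen will prompt for a passphrase interactively.
import subprocess
from pathlib import Path

key_path = Path.home() / ".ssh" / "id_ed25519_bitbucket"

subprocess.run(
    ["ssh-keygen", "-t", "ed25519", "-C", "workstation-key", "-f", str(key_path)],
    check=True,
)

print(key_path.with_suffix(".pub").read_text())
```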
I use one key per workstation. On each workstation, I generate a new public/private key pair, and then add that to the authorized keys file (or GitHub/Bitbucket account) of all of the machines I need to interact with via SSH.
That way, if my machine is lost, stolen, or I need to replace the hard drive, I can just de-authorize that one machine by deleting its public key from all of the services, while not needing to rotate my keys on all machines.
I have never found a good reason to create a separate key pair per service on a given workstation; that just increases the management overhead without much tangible benefit. You might do it if you were very privacy minded, and didn't want separate services to correlate your keys, but if you're that privacy minded you should already be accessing everything through Tor and probably have entirely separate accounts for each to avoid leaking any information at all.
Problem
I am setting up a set of e2e tests on an existing web-app. This requires automated login on a login-page (mail & password). So far, as I am still developing the tests, I have been putting the test account credentials in cleartext in my test scripts. I have been removing the credentials manually before each commit, but it will not hold for proper automated testing on a server somewhere, nor if all the developers should be able to run tests from the comfort of their own computers. Furthermore, the tests need to be able to run with several different sets of user credentials, and credential safety is critical. Since we need to test for access rights, it seems that we cannot avoid having at least one test account with access to confidential data.
Question
So my question is: What strategies do you know of, or use, for safely storing and using test credentials in testing environments on developer machines, separate servers, or both?
Prior research
I have spent a few days looking around the web (mostly StackOverflow, and many attempts at using my Google-fu) as well as asking colleagues, but without finding any known and used strategies for handling and storing credentials in tests. I reckon that many skilled programmers must already have solved this problem in numerous ways.
StackOverflow kindly suggested these somewhat similar questions, which offer some interesting strategies:
Safely storing credentials when I need to retrieve the password for use, where the accepted answer recommends encrypting the configuration file. It seems like a very interesting idea, but it is unclear to me how well this distributes across servers and individual developer computers, and how the logistics of this could be handled.
Storing credentials for automated use, where the asker responds to themself by stating that they simply put the credentials as cleartext in a file on their password-protected server. This might work for a single server, but I do think this is problematic if a number of local developer machines or separate test servers will be used for testing.
Case specifics
I think the question is of general interest regardless of the implementation details, but as they might be of interest they are provided here anyway.
I am using protractor for testing AngularJS apps, and am considering Grunt for further test automation. We plan on hooking the tests up on our Git server, and have it run tests at each commit to the master branch, so that we know it is never breaking. Or, not breaking during our tests, at least :)
I'm not sure what you mean when you say 'strategies for safely storing and using user credentials in testing environments'. You state that your tests need to be run with different sets of credentials. If your test is able to get to the credentials in clear text, so is any other application or user running under the same account.
Sure, you can encrypt the file storing the passwords, but you'd need to store the encryption key somewhere in the application or on the machine for the application to be able to decrypt it.
You could use asymmetric encryption to encrypt any credentials with the public key and only give access to the private key to the account running your tests. But still, anyone being able to log on under the account that runs your tests would be able to decrypt the credentials file and get to the passwords.
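To make the asymmetric idea concrete, here is a minimal sketch using the cryptography package; key storage and distribution are deliberately left out, and in practice the private key would live only with the account that runs the tests.

```python
# Minimal sketch of encrypting a credential with a public key so that only the
# holder of the private key (e.g. the account running the tests) can recover it.
# Uses the `cryptography` package (pip install cryptography).
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# One-time setup: generate a key pair; keep the private key with the test account.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Anyone with the public key can encrypt a credential for the tests...
ciphertext = public_key.encrypt(b"test-user-password", OAEP)

# ...but only the private key can decrypt it.
plaintext = private_key.decrypt(ciphertext, OAEP)
assert plaintext == b"test-user-password"
```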
The best option is to not use confidential data in testing. I work for a company doing medical software, and we have a test domain in which we set up our software with well-known accounts and use fake data to test it.
Or, if you want other developers to be able to run the tests under their own credentials, you could consider switching to Kerberos and avoid passwords altogether.
I agree with the above answer; you can create a key, store it somewhere and use it.
Otherwise you can go for encryption; I found a link which may be helpful for you.
http://docstore.mik.ua/orelly/java-ent/security/ch13_05.htm