The rule-of-thumb/best practice which I only occasionally see challenged is that you should never commit environment-specific config files (e.g. .env, .ini, etc) to version control. The closest accepted practice I can find is that you may commit a default config file, but with the expectation that you manually edit the config on initial deployment.
However, in a DevOps role I'm not just writing app logic, I'm also automating the deployment of my code and its multiple environments. As a result, there's a specific need to keep track of what my configs look like so I can (re-)deploy my app if/when an environment needs to be recreated. For the same reason as with traditional code, then, the most appealing solution is to store my config in the repo, but the question is what's the best and most scalable way to do so?
Obviously I'm not talking about storing any secrets in my config file. I'm also not against a solution that doesn't involve the repo. But I think discounting the repo outright is a bit silly and is a case of adhering to practice out of tradition more than its practical value.
How have others tackled this issue?
EDIT: Ideally, I think, there would be some extension for git that would allow env-specific configs to be associated with their app's repo, but would be segregated (stored in a separate repo?) in such a way as to avoid downloading an env's config when forking/branching a project. That seems well outside the scope of what's available though.
There are two sets of approaches for this. One is to use a secret store, such as Vault, to hold your configuration data independently of your repository and inject it through the environment. This lives outside of the repository entirely, but can be configured per environment and keeps your data securely encrypted.
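As a rough illustration of the secret-store approach, here is a minimal sketch using the hvac client for Vault; the URL, secret path, and field names are assumptions made up for the example, not part of any particular setup.

```python
import os

import hvac  # Vault API client

# The Vault address and token are injected through the environment,
# e.g. by the CI system or the machine's provisioning tool.
client = hvac.Client(
    url="https://vault.example.com:8200",  # placeholder address
    token=os.environ["VAULT_TOKEN"],
)

# Read a KV v2 secret; the path and field names are made up for illustration.
secret = client.secrets.kv.v2.read_secret_version(path="myapp/production")
db_password = secret["data"]["data"]["db_password"]
```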
The other, where you want to store some configuration in the repository, usually consists of keeping the file in a separate directory as a sort of template and then copying it into place and editing it. The location it is copied to in production is typically git-ignored. You may choose to edit it with a script or by hand.
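A minimal sketch of the template approach might look like the following; the file paths and variable names are invented for the example, and the rendered file is assumed to be git-ignored.

```python
import os
from string import Template

TEMPLATE_PATH = "config/app.ini.template"  # committed to the repo
TARGET_PATH = "app.ini"                    # git-ignored, used in production

with open(TEMPLATE_PATH) as f:
    template = Template(f.read())  # placeholders like $db_host, $db_password

# Environment-specific values come from the deploy tool's environment,
# not from the repository itself.
rendered = template.substitute(
    db_host=os.environ["DB_HOST"],
    db_password=os.environ["DB_PASSWORD"],
)

with open(TARGET_PATH, "w") as f:
    f.write(rendered)
```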
You can also store configuration in a separate, highly restricted repository, but that has all of the problems of checking secrets into a repository.
Related
If we have a private GitHub repo where we save secrets and load them from a .env file, this keeps sensitive information like usernames, passwords, API keys, access keys, etc. in a single source of truth, but is it bad practice in terms of security?
The data is technically exposed through the application, and if a malicious entity gets access to the source code, say because repository access is compromised, they will be able to see all the secrets.
The alternative is injecting the data at runtime (through a script, Docker container, etc.), which would eliminate this vulnerability, but is it necessary?
First, it's recommended to not commit .env to git (add it to .gitignore) but rather have a .env.example which lists all the relevant variables (without values or with dummy values) and has code comments explaining what each of them is doing.
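To make the .env.example contract enforceable, one option is to check at startup that every variable it lists is actually set. A rough sketch, assuming the python-dotenv package and the conventional .env / .env.example file names:

```python
import os

from dotenv import dotenv_values, load_dotenv

load_dotenv()  # loads the real, git-ignored .env into the process environment

# .env.example only names the variables; its values are dummies or empty.
expected = dotenv_values(".env.example")
missing = [name for name in expected if not os.environ.get(name)]
if missing:
    # Fail fast rather than running with an incomplete configuration.
    raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
```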
Second, the answer to your question is yes, you should never commit secrets to git, and even if you commit and then delete and commit again it still lives in git history which is also bad.
Today there are lots of malicious scripts scanning GitHub repos at all times looking for such data, and I've heard more than once about AWS accounts that got hacked due to such mistakes, so the bottom line is: keep your data safe!
And last, you want to keep these practices even if your github repo is private, because it can be made public by mistake and then it's a matter of seconds until your data gets exposed!
It's best practice to put sensitive environment variables into env.yml and reference them in serverless.yml. Of course, this also means not checking env.yml into a code repository.
So where's a safe place to store a backup of env.yml? We have a number of microservices, so we're accumulating several env.yml files for our projects. Even sharing them among devs and keeping them updated can become a bit of an issue - they really could benefit from version control but security trumps convenience so we keep them out of git.
I'd be interested to hear how others manage secrets config in general.
While the question was specifically about management of env.yml files, the bigger underlying question is how to manage sensitive environment variables. The link in the comment from Alex was all I needed. Since our solution is heavily AWS-oriented, the AWS Parameter Store was worth exploring; a short sketch of reading a parameter at runtime follows the links below.
Alex DeBrie's article
Yan Cui's article on referencing parameter store values at runtime
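As a minimal sketch of fetching a SecureString parameter at runtime with boto3; the parameter name and region are placeholders, and the parameter is assumed to already exist in the Parameter Store (e.g. created with `aws ssm put-parameter`).

```python
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")  # region is a placeholder


def get_secret(name: str) -> str:
    # WithDecryption=True asks SSM to decrypt SecureString values via KMS.
    response = ssm.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


# Hypothetical parameter name for illustration.
db_password = get_secret("/myapp/prod/db-password")
```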
I have created a Python module which I would like to distribute via PyPI. It relies on a third party API which in turn requires a free API key.
Yesterday I asked this question on how to reference a YAML file (which would contain the API keys) using the true path of the module. However that got me thinking of other ways;
Ask the users to save the API key as an environment variable and have the script check for the existence of said variable
Ask the user to pass in the API key as an **kwargs argument when creating a new instance of the object e.g.
thing = CreateThing(user_key = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', api_key = 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb')
I would like to see what the community thinks on this topic.
I have created a Python module which I would like to distribute via PyPI. It relies on a third party API which in turn requires a free API key.
Even though it is a free API key, you should never have it in your code, much less distribute your code to the public with it.
My advice is to never have any secrets in your code, not even the default secrets many developers like to put in their calls that read values from environment variables, configuration files, databases, or wherever else they retrieve them from.
When dealing with secrets you must always raise an exception when you fail to obtain one. Once more: don't use default values in your code, not even with the excuse that they will only be used during development and/or testing.
I recommend reading this article I wrote about leaking secrets in your code to understand the consequences of doing so, for example:
Hackers can, for example, use exposed cloud credentials to spin up servers for bitcoin mining, for launching DDOS attacks, etc and you will be the one paying the bill in the end as in the famous "My $2375 Amazon EC2 Mistake"...
While the article is in the context of leaking secrets in the code of a mobile app, most of it applies to any type of code we write and commit into repositories.
About your proposed solution
Ask the users to save the API key as an environment variable and have the script check for the existence of said variable
Ask the user to pass in the API key as an **kwargs argument when creating a new instance of the object e.g.
thing = CreateThing(user_key = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', api_key = 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb')
Number 1 is a good one, and you should use the dot env file approach here, maybe using a package like this one. But remember to raise an exception if the value does not exist; please never use defaults from your code.
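A minimal sketch of that fail-fast behaviour, assuming the python-dotenv package; the variable name THIRDPARTY_API_KEY is invented for the example:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads the git-ignored .env file into the process environment

api_key = os.environ.get("THIRDPARTY_API_KEY")
if not api_key:
    # Fail loudly; never fall back to a default baked into the code.
    raise RuntimeError("THIRDPARTY_API_KEY is not set; refusing to continue.")
```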
Regarding solution 2, it is more explicit for the developer using your library, but you should also recommend the .env file to them and help them understand how to manage it properly. For example, secrets used in .env files should be retrieved from vault software.
A SECURITY WARNING
Dot env files must never be committed to source control, and when committing the .env.example to your git repo, it must not contain any default values.
Oh, you may think, if I accidentally commit it to GitHub I will just clean my commits, rewrite the history, and do a force push. Well, think twice and see why that will not solve the problem you have created:
Well, I have bad news for you... it seems that some services cache all GitHub commits, so hackers can check those services or employ the same techniques to scan any commit sent to GitHub in a matter of seconds.
Source: the blog post I linked above.
And remember what I quoted earlier, "My $2375 Amazon EC2 Mistake", which was due to leaked credentials in an accidental GitHub commit.
From this answer: it is recommended to use the OS keyring. In my lib odsclient I ended up implementing a series of alternatives, from the most secure (keyring) to the less secure ones (OS environment variable, git-ignored text file, arg-passed apikey possibly obfuscated with getpass()).
You might wish to have a look for inspiration, see this part of the doc in particular.
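For inspiration only, here is a rough sketch of such a fallback chain, assuming the keyring package; the service and variable names are placeholders and this is not odsclient's actual API:

```python
import os
from getpass import getpass

import keyring  # talks to the OS keyring (Keychain, Secret Service, etc.)

SERVICE = "my-api"    # hypothetical keyring service name
ACCOUNT = "api-key"   # hypothetical keyring "username" slot


def get_api_key() -> str:
    # 1. OS keyring: the most secure of the options listed above.
    key = keyring.get_password(SERVICE, ACCOUNT)
    if key:
        return key
    # 2. Environment variable (possibly loaded from a git-ignored .env file).
    key = os.environ.get("MY_API_KEY")
    if key:
        return key
    # 3. Interactive prompt as a last resort; never a hard-coded default.
    return getpass("Enter API key: ")
```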
Trying to understanding the full workflow of a git-crypt based secret keeping solution.
The tool itself works pretty nicely when on a dev machine, even scaling to multiple developers seems to work fine.
However, it is not clear to me how this will work when deployed to multiple servers in a cloud, some of which are created on demand:
The challenge of unattended creation of a GPG key on the new server (someone needs to create the passphrase, or is it in source control, and then what is all this even worth?)
Once a GPG key is created, how is it added to the ring?
Say we decide to skip #1 and just share a key across servers; how is the passphrase supplied as part of the "git-crypt unlock" process?
I've really tried to search, and just couldn't find a good end-to-end workflow.
Like many Linux tools, git-crypt is an example of doing only one thing and doing it well. This philosophy dictates that any one utility doesn't try to provide a whole suite of tools or an ecosystem, just one function that can be chained with others however you like. In this case git-crypt doesn't bill itself as a deployment tool or have any particular integrations into a workflow. Its job is just to allow the git repository to store sensitive data that can be used in some checkouts but not others. The use cases can vary, as will how you chain it with other tools.
Based on the wording of your question I would also clarify that git-crypt is not a "secret keeping solution". In fact it doesn't keep your secrets at all, it just allows you to shuffle around where you do keep them. In this case it enables you to keep secret data in a repository alongside non-secret information, but it only does so at the expense of putting the secret-keeping burden on another tool. It exchanges one secret for another: your project's version-controlled secret component(s) for a GPG key. How you manage the secret is still up to you, but now the secret you need to handle is a GPG key.
Holding the secrets is still up to you. In the case of you and other developers that likely means having a GPG private key file kicking around in your home directory, hopefully protected by a passphrase that is entered into an agent before being dispensed to other programs like git-crypt that call for it.
In the case of being able to automatically deploy software to a server, something somewhere has to be trusted with real secrets. This is often the top-level tool like Ansible or Puppet, or perhaps a CI environment like Gitlab, Travis, or Circle. Usually you wouldn't trust anything but your top level deployment tool with knowing when to inject secrets in an environment and when not to (or in the case of development / staging / production environments, which secrets to inject).
I am not familiar with Circle, but I know with Travis there is an Environment Variables section under your project's Settings tab that you can use to pass private information into the virtual machine. There is some documentation for how to use this. GitLab's built-in CI system has something similar, and can pass different secrets to test vs. deploy environments, etc.
I would suggest the most likely use case for your workflow is to:
Create a special secret variable for use on your production machines that has the passphrase for a GPG key used only for deployments. Whatever you use to create your machines should drop a copy of this key into the system and use this variable to unlock it and add it to an agent.
The deploy script for your project would check out your git project code, then check for a GPG agent. If an agent is loaded it can try to decrypt the checkout (see the sketch after these steps).
In the case of a developer's personal machine this will find their key, in the case of the auto-created machines it will find the deploy key. Either way you can manage access to the secrets in the deployment environment like one more developer on the project.
Whatever tool you use to create the machines becomes responsible for holding and injecting the secrets, probably in the form of a private key file and a passphrase in an environment variable that is used to load the key file into an agent.
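As a loose sketch of the deploy-script step mentioned above, not a definitive recipe: the repository URL and paths are placeholders, and it assumes the provisioning tool has already dropped the deploy key onto the machine and made it available to the agent, as described in the steps.

```python
import shutil
import subprocess

REPO_URL = "git@example.com:team/app.git"  # placeholder repository

# Check out the project code.
subprocess.run(["git", "clone", REPO_URL, "app"], check=True)

# If a GPG agent is reachable (gpg-connect-agent starts one if needed) and it
# holds a usable deploy key, git-crypt can decrypt the checkout in place.
agent_ok = (
    shutil.which("gpg-connect-agent") is not None
    and subprocess.run(["gpg-connect-agent", "/bye"]).returncode == 0
)
if agent_ok:
    subprocess.run(["git-crypt", "unlock"], cwd="app", check=True)
else:
    print("No GPG agent available; encrypted files remain locked.")
```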
I load sensitive details (e.g. an AWS secret) into my node project with dotenv and an .env file, which I include in my .gitignore, as I've read that's best practice.
If I want to add others to the project, or even clone the repo on another system, what's the most efficient/safe way to transmit these sensitive details? I assume email/google drive are out, but I'm not sure what's 'in.'
My repo is private--does that mean the 'don't check-in API keys' advice is less concrete? As I see it, anyone with permission to see the repo will likely need relevant API keys, so it doesn't seem too unreasonable to simply check them in.
Security is always a trade-off with convenience. There are no real absolutes. Are you cool with anyone who accesses the repository and the (potential) git host having access to the keys? Are all developers' computers secure and using disk encryption? What do the keys give access to? Everything on AWS? Do you trust people with access to the repo enough to stay secure and not accidentally share things further? The point is that you need to try to fully understand in which ways the secrets may be leaked.
Personally, I'd rather keep them outside of git and distribute keys to developers via USB sticks or encrypted email, and only to the people who need them. Only a subset of our developers actually need access.
Just make sure that you replace all your secrets if you do decide it's too risky. Don't just delete them from the repository. Once they are in there, they are in there forever.