It's best practice to put sensitive environment variables into env.yml and reference them in serverless.yml. Of course, this also means not checking env.yml into a code repository.
So where's a safe place to store a backup of env.yml? We have a number of microservices, so we're accumulating several env.yml files for our projects. Even sharing them among devs and keeping them updated can become a bit of an issue - they really could benefit from version control but security trumps convenience so we keep them out of git.
I'd be interested to hear how others manage secrets config in general.
While the question was specifically about managing env.yml files, the bigger underlying question is how to manage sensitive environment variables. The link in the comment from Alex was all I needed. Our stack is so AWS-oriented that AWS Parameter Store turned out to be worth exploring.
Alex DeBrie's article
Yan Cui's article on referencing parameter store values at runtime
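The approach from those articles boils down to referencing Parameter Store values directly in serverless.yml instead of keeping an env.yml at all. A rough sketch (the parameter path is hypothetical, and the exact syntax varies by Serverless Framework version — older versions need a trailing `~true` to decrypt SecureString values):

```yaml
# serverless.yml — resolve the secret from SSM Parameter Store at deploy time
provider:
  environment:
    DB_PASSWORD: ${ssm:/my-app/prod/db-password}
```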
Related
The rule-of-thumb/best practice which I only occasionally see challenged is that you should never commit environment-specific config files (e.g. .env, .ini, etc) to version control. The closest accepted practice I can find is that you may commit a default config file, but with the expectation that you manually edit the config on initial deployment.
However, in a DevOps role I'm not just writing app logic, I'm also automating the deployment of my code and its multiple environments. As a result, there's a specific need to keep track of what my configs look like so I can (re-)deploy my app if/when an environment needs to be recreated. For the same reason as with traditional code, then, the most appealing solution is to store my config in the repo, but the question is: what's the best and most scalable way to do so?
Obviously I'm not talking about storing any secrets in my config file. I'm also not against a solution that doesn't involve the repo. But I think discounting the repo outright is a bit silly and is a case of adhering to practice out of tradition more than its practical value.
How have others tackled this issue?
EDIT: Ideally, I think, there would be some extension for git that would allow env-specific configs to be associated with their app's repo, but would be segregated (stored in a separate repo?) in such a way as to avoid downloading an env's config when forking/branching a project. That seems well outside the scope of what's available though.
There are two sets of approaches for this. One uses a secret store, such as Vault, to keep your configuration independent of your repository and inject it through the environment. This lives outside the repository entirely, but can be configured for different environments and ensures your data is securely encrypted.
The other, where you want to store some configuration in the repository, usually consists of storing the file in a separate directory as a sort of template, then copying it into place and editing it. The path where it is used in production is typically gitignored. You may choose to edit it with a script or by hand.
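The template approach can be scripted very simply — keep a `config.template` in the repo and render the real config at deploy time. A minimal sketch using the standard library (file and variable names are hypothetical):

```python
# Render a committed config template into the real, gitignored config file.
# Template.substitute raises KeyError for any missing placeholder, so a
# half-configured deploy fails loudly instead of silently.
from string import Template

TEMPLATE = "db_host = $DB_HOST\ndb_name = $DB_NAME\n"

def render_config(template_text, values):
    """Substitute $PLACEHOLDER names; raises KeyError if one is missing."""
    return Template(template_text).substitute(values)

rendered = render_config(TEMPLATE, {"DB_HOST": "localhost", "DB_NAME": "app"})
print(rendered)
```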
You can also store configuration in a separate, highly restricted repository, but that has all of the problems of checking secrets into a repository.
The official documentation lists the following practices for appsettings.json:
Never store passwords or other sensitive data in configuration provider code or in plain text configuration files.
Don't use production secrets in development or test environments.
Specify secrets outside of the project so that they can't be accidentally committed to a source code repository.
As far as I know the appsettings.json isn't served when you host the app on IIS and therefore can't be accessed from the web. We also host the source code ourselves (i.e. on our own servers). So as far as I can tell, the only real danger is when somebody manages to compromise the whole system and has actual access to the appsettings.json itself.
But are there other reasons for keeping sensitive data outside of appsettings.json? Are there other security aspects I'm overlooking?
I know there are several questions asking how to keep the appsettings.json secure, but not what the actual risks are.
There are many reasons, but the main one you've already mentioned:
it's usually much, much easier to get access to source code, than it is to get to well-guarded secrets (e.g. Azure Vault)
it's much easier to leak the secrets, possibly accidentally (via logs, or someone looking over your shoulder, or someone with access to the CI server)
you typically won't know you've leaked them, as there's little or no auditing compared to proper systems for keeping secrets
there's no way to limit the people that have access to specific secrets for specific environments
personally, I also dislike having production secrets anywhere near my development setup. If I run code as a developer, I want to be 100% sure I'll never accidentally run against a production environment ("oops, I tested that mass-delete feature... against production"). If the prod secrets simply aren't there, there's no mistake to make
and probably many more reasons...
Basically, limiting the surface area for mistakes and security leaks will limit the chance for a problem, even if there is currently no reasonable combination of factors where a mistake or leak would happen.
I have created a Python module which I would like to distribute via PyPI. It relies on a third party API which in turn requires a free API key.
Yesterday I asked this question on how to reference a YAML file (which would contain the API keys) using the true path of the module. However that got me thinking of other ways;
Ask the users to save the API key as an environment variable and have the script check for the existence of said variable
Ask the user to pass in the API key as a keyword argument (**kwargs) when creating a new instance of the object, e.g.
thing = CreateThing(user_key = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', api_key = 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb')
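A minimal sketch of option 1 — reading the key from an environment variable and failing loudly if it's missing (the variable and exception names here are hypothetical):

```python
# Option 1: look up the API key in the environment; raise rather than
# falling back to any default value.
import os

class MissingAPIKeyError(RuntimeError):
    pass

def get_api_key(var_name="MYLIB_API_KEY"):
    key = os.environ.get(var_name)
    if not key:
        raise MissingAPIKeyError(
            f"Set the {var_name} environment variable to your API key")
    return key
```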
I would like to see what the community thinks on this topic.
I have created a Python module which I would like to distribute via PyPI. It relies on a third party API which in turn requires a free API key.
Even though it's a free API key, you should never have it in your code, much less distribute your code to the public with it.
My advice is to never have any secrets in your code — not even the default secrets many developers like to put in their calls that get values from environment variables, configuration files, databases, or wherever else they retrieve them from.
When dealing with secrets you must always raise an exception when you fail to obtain one. Once more: don't use default values in your code, not even with the excuse that they will only be used during development and/or testing.
I recommend reading this article I wrote about leaking secrets in your code to understand the consequences of doing so. For example:
Hackers can use exposed cloud credentials to spin up servers for bitcoin mining, to launch DDoS attacks, etc., and you will be the one paying the bill in the end, as in the famous "My $2375 Amazon EC2 Mistake"...
While the article is in the context of leaking secrets in the code of a mobile app, most of it applies to any type of code we write and commit into repositories.
About your proposed solutions
Ask the users to save the API key as an environment variable and have the script check for the existence of said variable
Ask the user to pass in the API key as a keyword argument (**kwargs) when creating a new instance of the object, e.g.
thing = CreateThing(user_key = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', api_key = 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb')
Number 1 is a good one, and you should use the dot-env file approach here, maybe using a package like this one. But remember to raise an exception if the value does not exist; please never use defaults from your code.
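To make the .env idea concrete, here is a minimal hand-rolled sketch of what such a package does (in practice you'd use python-dotenv or similar rather than this). Note the `require` helper raises instead of returning a default, per the advice above:

```python
# Parse simple KEY=VALUE lines from a .env file's contents.
def load_dotenv_text(text):
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def require(values, key):
    """Fail loudly for a missing secret instead of using a default."""
    if not values.get(key):
        raise KeyError(f"required secret {key!r} not set in .env")
    return values[key]
```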
Regarding solution 2, it is more explicit for the developer using your library, but you should also recommend the .env file to them and help them understand how to manage it properly. For example, secrets used in .env files should be retrieved from vault software.
A SECURITY CALL OF ATTENTION
.env files must never be committed into source control, and when committing a .env.example into your git repo, it must not contain any default values.
Oh, you may think: if I commit it accidentally to GitHub I will just clean up my commits, rewrite the history, and do a force push. Well, think twice, and see why that will not solve the problem you have created:
Well I have bad news for you... it seems that some services cache all github commits, thus hackers can check these services or employ the same techniques to immediately scan any commit sent to github in a matter of seconds.
Source: the blog post I linked above.
And remember what I quoted earlier: the famous "My $2375 Amazon EC2 Mistake" was due to leaked credentials in an accidental GitHub commit.
From this answer: it is recommended to use the OS keyring. In my lib odsclient I ended up implementing a series of alternatives, from the most secure (keyring) to less secure ones (OS environment variable, git-ignored text file, an API key passed as an argument, possibly obfuscated with getpass()).
You might wish to have a look for inspiration, see this part of the doc in particular.
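A rough sketch of such a fallback chain — keyring first, then an environment variable, then a git-ignored file. All names here are hypothetical, not odsclient's actual API:

```python
# Try progressively less secure sources for the API key, raising if
# none of them yields a value.
import os

def get_api_key(service="my-service", env_var="MY_API_KEY",
                key_file=".api_key"):
    # 1. OS keyring, if the optional dependency is installed and usable
    try:
        import keyring
        key = keyring.get_password(service, "api_key")
        if key:
            return key
    except Exception:
        pass  # keyring missing or no backend available (e.g. headless CI)
    # 2. environment variable
    key = os.environ.get(env_var)
    if key:
        return key
    # 3. git-ignored text file
    if os.path.exists(key_file):
        with open(key_file) as f:
            return f.read().strip()
    raise RuntimeError("no API key found in keyring, environment, or file")
```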
While my understanding is that using environment variables for configuring applications in different deployment environments is best practice, I don't know of a good method for managing these environments and populating the variables in them.
Here are the approaches I'm considering:
Populating them in the Upstart script we use to run our app. We use Ansible to provision our servers, which currently copies over a static Upstart script; however, this could be templated with environment variables.
Same approach but with /etc/environment
Using something like envdir and once again using ansible to populate the files.
The other issue is where to store the values. I'm thinking Redis, but am open to suggestions. Ansible has a "Vault" that I'm yet to look at, which may be an option.
The values are things like API keys and database urls.
I'm really just wondering what approaches other people use. I'm open to all suggestions.
I think this question is going to solicit a lot of opinions, and probably a lot of conflicting opinions, but with that said here's some of my opinions:
/etc/environment is part of the OS and intended for configuration of interactive user shells. Don't use it for applications.
A templatized upstart config via ansible seems pretty reasonable to me. Just ensure the filesystem permissions are suitably locked-down to root read only if you intend to store sensitive data there.
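For illustration, a templated Upstart job might look something like this (the variable names are hypothetical; the template would be deployed with an Ansible `template` task setting `owner: root` and `mode: "0600"`):

```
# templates/myapp.conf.j2 — rendered by Ansible to /etc/init/myapp.conf
description "myapp"
start on runlevel [2345]

env DATABASE_URL={{ myapp_database_url }}
env API_KEY={{ myapp_api_key }}

exec /usr/local/bin/myapp
```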
You could also use a templatized application-specific config file such as /etc/myapp/config, which has worked pretty well for many programs for a few decades. The whole environment-variables-are-better-than-config-files position really comes more from a PaaS perspective (Heroku, I believe, popularized this approach by way of their 12-factor app site). So if your deployment is PaaS or PaaS-style, the environment is convenient. But if you are installing your app on your own servers via Ansible, IMHO a straight-up config file is simpler to troubleshoot, for the reasons I outline in my blog post, environment variables considered harmful.
If I want to develop a registry-like system for Linux, which Windows Registry design failures should I avoid?
Which features would be absolutely necessary?
What are the main concerns (security, ease-of-configuration, ...)?
I think the Windows Registry was not a bad idea; the implementation just didn't fulfill the promises. A common place for configurations — including, for example, the Apache config, database config, or mail server config — wouldn't be a bad idea and might improve maintainability, especially if it had options for (protected) remote access.
I once worked on a kernel based solution but stopped because others said that registries are useless (because the windows registry is)... what do you think?
I once worked on a kernel based solution but stopped because others said that registries are useless (because the windows registry is)... what do you think?
A kernel-based registry? Why? Why? A thousand times, why? Might as well ask for a kernel-based musical postcard or inetd, for all the point there is in putting it in there. If it doesn't need to be in the kernel, it shouldn't be in the kernel. There are many other ways to implement a privileged process that don't require deep hackery like that...
If i want to develop a registry-like System for Linux, which Windows Registry design failures should i avoid?
Make sure that applications can change many entries at once in an atomic fashion.
Make sure that there are simple command-line tools to manipulate it.
Make sure that no critical part of the system needs it, so that it's always possible to boot to a point where you can fix things.
Make sure that backup programs back it up correctly!
Don't let chunks of executable data be stored in your registry.
If you must have a single repository, at least use a proper database so you have tools to restore, backup, recover it etc and you can interact with it without having a new set of custom APIs
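The "change many entries at once in an atomic fashion" requirement above can be met even with a plain-file store using the classic write-temp-then-rename pattern. A sketch (assuming a JSON-backed store, which is my assumption, not a requirement):

```python
# Apply a batch of changes atomically: write the full updated store to a
# temp file in the same directory, then rename it over the original.
# rename/replace is atomic on POSIX filesystems, so readers see either
# the old state or the new state, never a partial write.
import json
import os
import tempfile

def update_entries(path, changes):
    try:
        with open(path) as f:
            data = json.load(f)
    except FileNotFoundError:
        data = {}
    data.update(changes)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
    os.replace(tmp, path)  # the atomic step
```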
The first one that comes to my mind: you need to avoid orphaned registry entries. At the moment, when you delete a program you also delete its configuration files, which live under some directory; once you have a registry system, you need to make sure that when a program is deleted, its configuration in the registry is deleted as well.
IMHO, the main problems with the windows registry are:
Binary format. This loses you the availability of a huge variety of very useful tools. In a binary format, tools like diff, search, version control etc. have to be specially implemented, rather than using the best of breed, which are capable of operating on the common substrate of text. Text also offers the advantage of trivially embedded documentation/comments (also greppable), and easy programmatic creation and parsing by external tools. It's also more flexible — sometimes configuration is better expressed with a full Turing-complete language than by trying to shoehorn it into a structure of keys and subkeys.
Monolithic. It's a big advantage to have everything for application X contained in one place. Move to a new computer and want to keep your settings for it? Just copy the file. While this is theoretically possible with the registry, so long as everything is under a single key, in practice it's a non-starter. Settings tend to be diffused in various places, and it is generally difficult to find where. This is usually given as a strength of the registry, but "everything in one place" generally devolves to "Everything put somewhere in one huge place".
Too broad. It's easy to think of it as just a place for user settings, but in fact the registry becomes a dumping ground for everything. 90% of what's there is not designed for users to read or modify; it is in fact a database of the serialised form of various structures used by programs that want to persist information. This includes things like the entire COM registration system, installed apps, etc. Now, this is stuff that needs to be stored, but the fact that it's mixed in with things like user-configurable settings and stuff you might want to read dramatically lowers its value.