NixOps: How to deploy to an existing NixOS VM? - linux

I have almost the same problem as in this question, but it was never answered:
nixops: how to use local ssh key when deploying on machine with existing nixos (targetEnv is none)?
I'm not using Terraform though. Just NixOS + NixOps. So far, I:
Created a new VM on Vultr
Did a standard NixOS install from the current iso (20.09 pre something), setting a root password
Enabled ssh with root password authentication and did a nixos-rebuild switch
Manually generated an ssh keypair on my laptop
sshed into the VM with the password and added the public key to /root/.ssh/authorized_keys
Now I can ssh into the VM manually with the new key, as expected:
ssh -i .secrets/vultrtest1_rsa root@XXX.XXX.XXX.XXX
Cool. Next, I copied the existing NixOS config files to my laptop and tried to wire them up to NixOps. I tried a minimal test1.nix, as well as adding the deployment."none" and/or users.users.root.openssh sections below.
vultrtest1
├── configuration.nix
└── hardware-configuration.nix
test1.nix
# test1.nix
{
  network.description = "vultr test 1";
  network.enableRollback = true;

  vultrtest1 = { config, pkgs, ... }: {
    deployment.targetHost = "XXX.XXX.XXX.XXX";
    imports = [ ./vultrtest1/configuration.nix ];

    # deployment.targetEnv = "none"; # existing nixos vm

    # same result with or without this section:
    deployment."none" = {
      sshPrivateKey = builtins.readFile ./secrets/vultrtest1_rsa;
      sshPublicKey = builtins.readFile ./secrets/vultrtest1_rsa.pub;
      sshPublicKeyDeployed = true;
    };

    # same result with or without this:
    users.users.root.openssh.authorizedKeys.keyFiles = [ ./secrets/vultrtest1_rsa.pub ];
  };
}
In all cases, when I try to create and deploy the network, NixOps generates another SSH key and then fails to log in with it:
$ nixops create test1.nix -d test1
created deployment ‘b4ac25fa-c842-11ea-9a84-00163e5e6c00’
b4ac25fa-c842-11ea-9a84-00163e5e6c00
$ nixops list
+--------------------------------------+-------+------------------------+------------+------+
| UUID                                 | Name  | Description            | # Machines | Type |
+--------------------------------------+-------+------------------------+------------+------+
| b4ac25fa-c842-11ea-9a84-00163e5e6c00 | test1 | Unnamed NixOps network | 0          |      |
+--------------------------------------+-------+------------------------+------------+------+
$ nixops deploy -d test1
vultrtest1> generating new SSH keypair... done
root@XXX.XXX.XXX.XXX: Permission denied (publickey,keyboard-interactive).
vultrtest1> could not connect to ‘root@XXX.XXX.XXX.XXX’, retrying in 1 seconds...
root@XXX.XXX.XXX.XXX: Permission denied (publickey,keyboard-interactive).
vultrtest1> could not connect to ‘root@XXX.XXX.XXX.XXX’, retrying in 2 seconds...
root@XXX.XXX.XXX.XXX: Permission denied (publickey,keyboard-interactive).
vultrtest1> could not connect to ‘root@XXX.XXX.XXX.XXX’, retrying in 4 seconds...
root@XXX.XXX.XXX.XXX: Permission denied (publickey,keyboard-interactive).
vultrtest1> could not connect to ‘root@XXX.XXX.XXX.XXX’, retrying in 8 seconds...
root@XXX.XXX.XXX.XXX: Permission denied (publickey,keyboard-interactive).
Traceback (most recent call last):
  File "/nix/store/kybdy5m979h4kvswq2gx3la3rpw5cq5k-nixops-1.7/bin/..nixops-wrapped-wrapped", line 991, in <module>
    args.op()
  File "/nix/store/kybdy5m979h4kvswq2gx3la3rpw5cq5k-nixops-1.7/bin/..nixops-wrapped-wrapped", line 412, in op_deploy
    max_concurrent_activate=args.max_concurrent_activate)
  File "/nix/store/kybdy5m979h4kvswq2gx3la3rpw5cq5k-nixops-1.7/lib/python2.7/site-packages/nixops/deployment.py", line 1063, in deploy
    self.run_with_notify('deploy', lambda: self._deploy(**kwargs))
  File "/nix/store/kybdy5m979h4kvswq2gx3la3rpw5cq5k-nixops-1.7/lib/python2.7/site-packages/nixops/deployment.py", line 1052, in run_with_notify
    f()
  File "/nix/store/kybdy5m979h4kvswq2gx3la3rpw5cq5k-nixops-1.7/lib/python2.7/site-packages/nixops/deployment.py", line 1063, in <lambda>
    self.run_with_notify('deploy', lambda: self._deploy(**kwargs))
  File "/nix/store/kybdy5m979h4kvswq2gx3la3rpw5cq5k-nixops-1.7/lib/python2.7/site-packages/nixops/deployment.py", line 996, in _deploy
    nixops.parallel.run_tasks(nr_workers=-1, tasks=self.active_resources.itervalues(), worker_fun=worker)
  File "/nix/store/kybdy5m979h4kvswq2gx3la3rpw5cq5k-nixops-1.7/lib/python2.7/site-packages/nixops/parallel.py", line 44, in thread_fun
    result_queue.put((worker_fun(t), None, t.name))
  File "/nix/store/kybdy5m979h4kvswq2gx3la3rpw5cq5k-nixops-1.7/lib/python2.7/site-packages/nixops/deployment.py", line 979, in worker
    os_release = r.run_command("cat /etc/os-release", capture_stdout=True)
  File "/nix/store/kybdy5m979h4kvswq2gx3la3rpw5cq5k-nixops-1.7/lib/python2.7/site-packages/nixops/backends/__init__.py", line 337, in run_command
    return self.ssh.run_command(command, self.get_ssh_flags(), **kwargs)
  File "/nix/store/kybdy5m979h4kvswq2gx3la3rpw5cq5k-nixops-1.7/lib/python2.7/site-packages/nixops/ssh_util.py", line 280, in run_command
    master = self.get_master(flags, timeout, user)
  File "/nix/store/kybdy5m979h4kvswq2gx3la3rpw5cq5k-nixops-1.7/lib/python2.7/site-packages/nixops/ssh_util.py", line 200, in get_master
    compress=self._compress)
  File "/nix/store/kybdy5m979h4kvswq2gx3la3rpw5cq5k-nixops-1.7/lib/python2.7/site-packages/nixops/ssh_util.py", line 57, in __init__
    "‘{0}’".format(target)
nixops.ssh_util.SSHConnectionFailed: unable to start SSH master connection to ‘root@XXX.XXX.XXX.XXX’
What am I missing? Perhaps I can manually add the key NixOps just generated?
Update: I used SQLiteBrowser to look inside the NixOps state database and pasted the generated public key into authorized_keys. Now I can ssh in with the newly generated key manually, but NixOps still fails to deploy.
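(It looks like the same information could probably be dumped without a SQLite browser; an unverified sketch with NixOps 1.7, using the deployment name from above:
nixops export -d test1 > test1-state.json   # dump deployment state, including any generated SSH keys, as JSON
nixops ssh -d test1 vultrtest1              # let NixOps itself open a shell with whatever key it stored
)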

Solved it temporarily, in a not-very-satisfying way:
browsed the database for the public + private key NixOps generated
manually added those to authorized_keys on the VM
also added the old key to the local ~/.ssh with an entry in ~/.ssh/config
No idea why NixOps uses the local ssh config, or how to prevent that. The entry that works looks like:
Host XXX.XXX.XXX.XXX
    HostName XXX.XXX.XXX.XXX
    Port 22
    User root
    IdentityFile ~/.ssh/vultrtest1_rsa
Will wait a couple days, then mark this as the solution unless anyone can explain how to tell NixOps to use the local key from .secrets instead of ~/.ssh.

Looking at the source at
https://github.com/NixOS/nixops/blob/master/nix/options.nix
there is a deployment.provisionSSHKey option, which says:
deployment.provisionSSHKey = mkOption {
  type = types.bool;
  default = true;
  description = ''
    This option specifies whether to let NixOps provision SSH deployment keys.
    NixOps will by default generate an SSH key, store the private key in its state file,
    and add the public key to the remote host.
    Setting this option to <literal>false</literal> will disable this behaviour
    and rely on you to manage your own SSH keys by yourself and to ensure
    that <command>ssh</command> has access to any keys it requires.
  '';
};
Maybe this can help? Once I get back to my NixOps machine, I'll give it a try.
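For reference, a minimal sketch of how that might look in the test1.nix above, assuming the key in ./secrets is the one already authorized on the VM (untested):
# test1.nix (sketch) - let plain ssh use the existing key instead of a NixOps-provisioned one
{
  network.description = "vultr test 1";
  network.enableRollback = true;

  vultrtest1 = { config, pkgs, ... }: {
    deployment.targetHost = "XXX.XXX.XXX.XXX";
    deployment.provisionSSHKey = false;  # don't generate a key or store one in the state file
    imports = [ ./vultrtest1/configuration.nix ];

    # keep the existing public key authorized for root on the VM
    users.users.root.openssh.authorizedKeys.keyFiles = [ ./secrets/vultrtest1_rsa.pub ];
  };
}
With provisionSSHKey = false, the option description says NixOps leaves key management to you, so the key would still need to be reachable by ssh, e.g. via ssh-agent or a ~/.ssh/config entry like the one in the answer above.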

Related

Ansible passing ssh username and Password using command line is not working

I need to pass username and password instead of using passwordless ssh keys.
I used the command below for that:
ansible-playbook -i hosts.ini main.yml --extra-vars "ansible_sudo_pass=$(ansible_password) ansible_user=$(ansible_user) ansible_ssh_pass=$(ansible_password)"
My inventory file hosts.ini:
[all]
10.1.5.4
[defaults]
host_key_checking = false
[all:vars]
ansible_connection=ssh
timeout=20
Below is the error:
TASK [add_repo : Add repository] ***********************************************
fatal: [10.1.5.4]: FAILED! => {"msg": "Using a SSH password instead of a key is not possible because Host Key checking is enabled and sshpass does not support this. Please add this host's fingerprint to your known_hosts file to manage this host."}
Note that I already tried:
Removing the following from the hosts.ini file:
[defaults]
host_key_checking = false
I also tried changing ansible_ssh_pass=$(ansible_password) to ansible_password=$(ansible_password).
Just posting the answer to my own question; it may help others if they are stuck with the same issue.
I added the entry below to /etc/ansible/ansible.cfg and it started working:
host_key_checking = False
Even though I had added the same setting in the hosts.ini file, it seems it did not take effect.
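For context, [defaults] is an ansible.cfg section rather than an inventory section, which would explain why putting it in hosts.ini had no effect. A sketch of the two places the setting is normally picked up from (paths and playbook as in the question):
# ansible.cfg next to the playbook (or /etc/ansible/ansible.cfg)
[defaults]
host_key_checking = False

# or just for a single run, via an environment variable
ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -i hosts.ini main.yml --extra-vars "..."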

SSH on console google cloud permission denied (publickey) with google-cloud-sdk file error

I'm new to cloud computing and I'm trying to use SSH to control my VM instance, but when I use this command (with debug)
gcloud compute ssh my-instance-name --verbosity=debug
it shows this error:
DEBUG: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
Traceback (most recent call last):
  File "/google/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 983, in Execute
    resources = calliope_command.Run(cli=self, args=args)
  File "/google/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 784, in Run
    resources = command_instance.Run(args)
  File "/google/google-cloud-sdk/lib/surface/compute/ssh.py", line 262, in Run
    return_code = cmd.Run(ssh_helper.env, force_connect=True)
  File "/google/google-cloud-sdk/lib/googlecloudsdk/command_lib/util/ssh/ssh.py", line 1256, in Run
    raise CommandError(args[0], return_code=status)
CommandError: [/usr/bin/ssh] exited with return code [255].
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
I tried to solve the problem with this link, but it did not work:
https://groups.google.com/forum/#!topic/gce-discussion/O-c10TM4ZLM
SSH error code 255 is a general error returned by GCP. You can try one of the following options.
1. Wait a few minutes and try again. It is possible that:
The instance has not finished starting up.
Metadata for SSH keys has not finished being propagated to the project or instance.
The Guest Environment has not yet read the SSH keys metadata.
2. Verify that SSH access to the instance is not blocked by a firewall.
gcloud compute firewall-rules list | grep "tcp:22"
If necessary, create a firewall rule to allow TCP 22 for a given VPC network, subnet, or instance tag.
gcloud compute firewall-rules create ssh-allow-incoming --priority=0 --allow=tcp:22 --network=[VPC-Network]
3. Make sure that the root volume is not out of disk space. Messages like the following will be visible in the console log when it is out of disk space:
...No space left on device...
...google-accounts: ERROR Exception calling the response handler. [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/']...
4. Make sure that the instance has not run out of memory
5. Verify that temporary SSH Keys metadata is set for either the project or instance.
Finally, you could follow any of their supported or third-party methods.
Assuming you have the correct IAM permissions, it is much easier, and preferred by GCP, to use OS Login to ssh into an instance rather than managing ssh keys.
In Cloud Shell, enter this:
gcloud compute --project PROJECTID project-info add-metadata --metadata enable-oslogin=TRUE
This enables OS Login on all instances in a project; instead of using ssh keys, GCP will check your IAM permissions and authenticate based on those.
If you are not the project owner, make sure you have the compute.osloginviewer or admin permissions in Cloud IAM.
Once enabled, try SSHing into the instance again using the command you posted.
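If you would rather enable OS Login on a single instance instead of the whole project, something like this should also work (a sketch; the instance name is a placeholder):
gcloud compute instances add-metadata my-instance-name --metadata enable-oslogin=TRUE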
This is not a concrete answer, but I think you should first set your project:
gcloud config set project PROJECT_ID
Then
gcloud compute ssh my-instance-name --verbosity=debug
This link would be useful:
https://cloud.google.com/sdk/gcloud/reference/compute/ssh

Paramiko cannot open an ssh connection even with load_system_host_keys + WarningPolicy

I am connecting to a remote server with the following code:
ssh = paramiko.SSHClient()
ssh.load_system_host_keys()
ssh.set_missing_host_key_policy(paramiko.WarningPolicy())
ssh.connect(
    hostname=settings.HOSTNAME,
    port=settings.PORT,
    username=settings.USERNAME,
)
When I'm on local server A, I can ssh onto the remote from the command line, suggesting it is in known_hosts. And the code works as expected.
On local server B, I can also ssh onto the remote from the command line. But when I try to use the above code I get:
/opt/mysite/virtualenv/lib/python3.5/site-packages/paramiko/client.py:763: UserWarning: Unknown ssh host key for [hostname]:22: b'12345'
  key.get_fingerprint())))
...
  File "/opt/mysite/virtualenv/lib/python3.5/site-packages/paramiko/client.py", line 416, in connect
    look_for_keys, gss_auth, gss_kex, gss_deleg_creds, t.gss_host,
  File "/opt/mysite/virtualenv/lib/python3.5/site-packages/paramiko/client.py", line 702, in _auth
    raise SSHException('No authentication methods available')
paramiko.ssh_exception.SSHException: No authentication methods available
Unlike "SSH - Python with paramiko issue" I am using both load_system_host_keys and WarningPolicy, so I should not need to programmatically add a password or key (and I don't need to on local server A).
Is there some system configuration step I've missed?
Try using fabric (which is built on invoke + paramiko) instead of paramiko, and set the following parameters:
con = fabric.Connection('username@hostname', connect_kwargs={'password': 'yourpassword', 'allow_agent': False})
If it keeps failing, check that your password is still valid and that you are not required to change it.
I tested with the wrong user on local server B. The user running the Python process did not have ssh permissions after all. (Command line ssh failed for that user.) Once I gave it permissions, the connection worked as expected.
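When chasing this kind of failure, it can also help to rule out key discovery entirely by pointing paramiko at a key explicitly; a minimal sketch, with hostname, username, and key path as placeholders:
import paramiko

ssh = paramiko.SSHClient()
ssh.load_system_host_keys()
ssh.set_missing_host_key_policy(paramiko.WarningPolicy())
# pass the key explicitly instead of relying on the agent or default key lookup
ssh.connect(
    hostname="remote.example.com",            # placeholder
    port=22,
    username="deploy",                        # placeholder
    key_filename="/home/deploy/.ssh/id_rsa",  # placeholder path
)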

Why does command on GCE work on console terminal but not as a Cron job

I can run this command on my instance using the web console:
gsutil rsync -d -r /my-path gs://my-bucket
But when I try it from my remote ssh terminal I get this error:
root@instance-2: gsutil rsync -d -r /my-path gs://my-bucket
Building synchronization state...
INFO 0923 12:48:48.572446 multistore_file.py] Error decoding credential, skipping
Traceback (most recent call last):
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/oauth2client/oauth2client/multistore_file.py", line 381, in _refresh_data_cache
    (key, credential) = self._decode_credential_from_json(cred_entry)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/oauth2client/oauth2client/multistore_file.py", line 400, in _decode_credential_from_json
    credential = Credentials.new_from_json(json.dumps(cred_entry['credential']))
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/oauth2client/oauth2client/client.py", line 292, in new_from_json
    return from_json(s)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/credentials_lib.py", line 356, in from_json
    data['token_expiry'], oauth2client.client.EXPIRY_FORMAT)
TypeError: must be string, not None
Caught non-retryable exception while listing gs://my-bucket/: Could not reach metadata service: Not Found
At source listing 10000...
At source listing 20000...
At source listing 30000...
At source listing 40000...
CommandException: Caught non-retryable exception - aborting rsync
I solved this by switching to the default GCE user that is created when the project is created. It seems root on the VM does not have the privileges to run gsutil commands.
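For example, the job could be installed in that default user's crontab instead of root's (a sketch; the username, schedule, and log path are placeholders):
# edit that user's crontab: sudo crontab -u <default-gce-user> -e
0 3 * * * gsutil rsync -d -r /my-path gs://my-bucket >> /tmp/gsutil-rsync.log 2>&1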

fabric keeps asking for password using SSH connection

I'm trying to connect to a Windows Azure instance using fabric, but even though I configure the ssh connection to execute commands, fabric keeps asking for a password.
This is my fabric file:
def azure1():
    env.hosts = ['host.cloudapp.net:60770']
    env.user = 'adminuser'
    env.key_filename = './azure.key'

def what_is_my_name():
    run('whoami')
I run it as:
fab -f fabfile.py azure1 what_is_my_name
or
fab -k -f fabfile.py -i azure.key -H adminuser@host.cloudapp.net:60770 -p password what_is_my_name
But nothing works; it keeps asking for the user password even though I enter it correctly.
Executing task 'what_is_my_name'
run: whoami
Login password for 'adminuser':
Login password for 'adminuser':
Login password for 'adminuser':
Login password for 'adminuser':
If I try to connect directly with ssh, it works perfectly.
ssh -i azure.key -p 60770 adminuser#host.cloudapp.net
I've tried the advice given in other questions (q1 q2 q3) but nothing works.
Any idea what I am doing wrong?
Thank you
Finally I found that the problem is due to the public-private key pair generation.
I followed the steps provided in the Windows Azure guide. There the keys are generated using openssl, so the process produces a public key stored in a pem file that you must upload to your instance during the creation process.
The problem is that the private key obtained this way is not correctly recognized by paramiko, so fabric won't work. If you try to open an ssh connection using paramiko from the Python interpreter:
>>> import paramiko, os
>>> paramiko.common.logging.basicConfig(level=paramiko.common.DEBUG)
>>> ssh = paramiko.SSHClient()
>>> ssh.load_host_keys('private_key_file.key') # private key file generated using openssl
>>> ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
>>> ssh.connect("web1.cloudapp.net",port=56317)
Gives me the error:
DEBUG:paramiko.transport:Trying SSH agent key a9d8dd41609191ebeedbe8df768ad8c9
DEBUG:paramiko.transport:userauth is OK
INFO:paramiko.transport:Authentication (publickey) failed.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".. /paramiko/client.py", line 337, in connect
    self._auth(username, password, pkey, key_filenames, allow_agent, look_for_keys)
  File ".. /paramiko/client.py", line 528, in _auth
    raise saved_exception
paramiko.PasswordRequiredException: Private key file is encrypted
When the key file isn't encrypted.
To solve this, I created the key pair using openssh and then converted the public key to pem format to upload it to Azure:
# Create key with openssh
ssh-keygen -t rsa -b 2048 -f private_key_file.key
# extract public key and store as x.509 pem format
openssl req -x509 -days 365 -new -key private_key_file.key -out public_key_file.pem
# upload public_key_file.pem file during instance creation
# check connection to instance
ssh -i private_key_file.key -p 63534 adminweb@host.cloudapp.net
This solved the problem.
To debug fabric's ssh connections, add these lines to your fabfile:
import paramiko, os
paramiko.common.logging.basicConfig(level=paramiko.common.DEBUG)
This will print all of paramiko's debug messages. Paramiko is the ssh library that fabric uses.
Note that since Fabric 1.4 you have to specifically enable using ssh config:
env.use_ssh_config = True
(Note: I'm absolutely certain that my fabfile used to work with Fabric > 1.5 without this option, but it doesn't now that I upgraded to 1.10.)
