VMAccessForLinux fails to provision on Azure RM VM - linux

I've tried absolutely everything I can think of to do a SSH reset of my user on my Linux VM (Hortonworks Sandbox to be precise).
The VMAccessForLinux extension will not install; it simply states that it fails to provision:
I've tried adding it as 1.*, 1.1, 1.2, and now 1.4, as per https://github.com/Azure/azure-content/blob/master/articles/virtual-machines/virtual-machines-troubleshoot-ssh-connections.md
I can't connect over SSH, and I can't run any of the Azure reset commands, either using the Azure CLI or Azure PS.
The VM is a Resource Manager (ARM) VM.
How can I resolve this?
In PowerShell I get errors like:
I'm beyond tearing my hair out.
And before anyone suggests that I use the portal, this is what I'm offered there (thanks, Azure):

I can't say if this is a universal fix, but I managed to resolve this issue by using the following in the Azure CLI:
$ azure vm reset-access -n {VMNAME} -g {GROUPNAME} \
-u {SSH_USER} -p {SSH_PASS} -E 1.4 -vv --json
It did NOT work for my original user on the box, though; I created ANOTHER user, and from there I did a password reset with sudo on the box (sketched below), and then I could SSH into the box as that user.
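For reference, the in-box reset was nothing more than a sudo password change run from the newly created user; a minimal sketch, with {SSH_USER} as a placeholder for the account being reset:
$ sudo passwd {SSH_USER}    # prompts twice for the new password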

Firstly, can you go through the instructions here if you've not already? The VM extension has changed recently, and that is the latest doc to go through: https://azure.microsoft.com/en-us/blog/using-vmaccess-extension-to-reset-login-credentials-for-linux-vm/.
EDIT #1
Glad to see you resolved it by creating a new user with reset-access.
If azure vm reset-access had failed, the next step would have been to download this tool, which lets you inspect the VHD without logging on to the VM: https://github.com/paulmey/inspect-azure-vhd - then inspect the waagent log at /var/log/waagent.log (you can see extension updates here) and the extension.log files under /var/log/azure/.
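Once you can read the disk, checking those logs is just a matter of tailing and listing the paths above; a rough sketch, assuming the standard log locations:
$ sudo tail -n 100 /var/log/waagent.log              # agent activity, including extension install/update attempts
$ sudo find /var/log/azure/ -type f -name "*.log"    # per-extension logs, e.g. the VMAccess extension.log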

Related

SMB Client on azure server not deleting file from azure storage

I have a Flask webapp running on an Ubuntu Azure server. I also have an Azure storage account, and to access the storage from the webapp I use SMB. This has worked so far for adding and updating files on the server, but when I tried to delete a file it didn't work. There was no error or anything; it just did nothing, and the file is still on the server. I tried the command locally and it worked fine. Is there something I'm doing wrong, and how could I fix this problem? Here's the command I've been using:
smbclient //name.file.core.windows.net/website -mSMB3 -e -Uname%password -c 'rm tempplugins/test2.ini'
This may not solve your exact problem, but I was attempting to perform operations on a file share on an Azure Storage Account from an Azure VM running CentOS, and I ran into several different problems. It took me a while to get the kinks worked out.
In my case, I had to use backslashes, but I had to double them so that they were escaped properly. Example:
smbclient \\\\storageaccount.file.core.windows.net\\sharename
Additionally, we weren't using an integrated Active Directory, so we had to use the storage account name as the username, and it had to be "prefixed" with "Azure", like "Azure\storageaccount". And don't forget that backslashes have to be doubled! Also, the password was the storage account key. Example:
smbclient \\\\storageaccount.file.core.windows.net\\sharename -U Azure\\storageaccount%key
I used the "-d" option to debug the command line options for smbclient. However, in my case, the "-d" option had to be on the end of the command or it interfered. If it hadn't been for the clues provided by "-d", I never would have gotten this to work. Example:
smbclient \\\\storageaccount.file.core.windows.net\\sharename -U Azure\\storageaccount%key -d
Here's a simple one-liner that lists a directory of a file share on an Azure Storage Account. Example:
smbclient \\\\storageaccount.file.core.windows.net\\sharename -U Azure\\storageaccount%key -c dir -d
I hope that this helps someone else, as I must have blown 2 to 3 hours getting this worked out.
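Putting this together with the command from the question, the delete would look roughly like the line below; this is a sketch only, with STORAGE_ACCOUNT_KEY standing in for the real storage account key:
smbclient \\\\name.file.core.windows.net\\website -mSMB3 -e -U Azure\\name%STORAGE_ACCOUNT_KEY -c 'rm tempplugins/test2.ini'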

"Incorrect padding" when trying to create managed Kubernetes cluster on Azure with AKS

I am working through the instructions outlined here to try and set up a Couchbase cluster on Azure Container Service (AKS). That tutorial is using terminal/Mac, and I'm using Powershell/Windows.
I'm getting an error before I even get to the Couchbase part. I successfully created a resource group (which I called "cb_ask_spike", and yes it does appear on the Portal) from the command line, but then I try to create an AKS cluster:
az aks create --resource-group cb_aks_spike --name cbakscluster
I also tried:
az aks create --resource-group cb_aks_spike --name cbakscluster --generate-ssh-keys
In both cases, I get an error:
az aks create: error: Incorrect padding
I don't know what this error message means, and I can't seem to find any reference to it in the documentation or anywhere. What am I doing wrong?
I'm using azure-cli v2.0.31.
I am fairly confident that I have figured out why I'm getting this error, and I've updated issue 6142 on azure-cli. At this time, I believe this is a bug; it's not fixed, but there is a workaround.
First, it's important to note that --generate-ssh-keys generates a new SSH key in ~/.ssh.
I had a hunch that since ~ for me is "C:\Users\Matthew Groves", the space in the path was causing the problem. Sure enough, I created a new account called "mgroves". ~ is now "C:\Users\mgroves", and voila, I don't get the "incorrect padding" error message anymore.
So the workaround is either to use a new account (a huge pain) or rename the folder (this is what I have done; it's also a huge pain, and I'm still finding little problems here and there all throughout my system because of it).
In addition to the now-approved answer, there is a solution that doesn't require you to change any directory or account name and is also easy to implement.
As correctly stated in the other answers, the Azure CLI cannot handle the location where the generated SSH keys will be stored if there is a space in the path, e.g. C:\Users\Admin Account\.ssh\.
When using the az aks create command you can either use --generate-ssh-keys to let the Azure CLI handle it, OR you can specify an already existing SSH key with --ssh-key-value.
I used Git Bash to generate a new SSH key pair in the C:\Users\Admin Account\.ssh\ directory:
ssh-keygen -f ~/.ssh/aks-ssh
Now create the Azure AKS cluster while pointing to this new SSH key with:
az aks create \
--resource-group YourResourceGroup \
--name YourClusterName \
--node-count 3 \
--kubernetes-version 1.16.8 \
--ssh-key-value ~\.ssh\aks-ssh.pub
And you are good to go!
Just verified today using the az CLI in PowerShell with version 2.0.31. You might need to run the az group create command first and then the az aks create command.
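For completeness, the two commands in sequence look something like the following, reusing the names from the question; the location value here is only an example:
az group create --name cb_aks_spike --location eastus
az aks create --resource-group cb_aks_spike --name cbakscluster --generate-ssh-keys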

ssh on edge-node for azure HDInsight

I tried deploying a HDInsight cluster with an edge node.
I used https://github.com/Azure/azure-quickstart-templates/blob/master/101-hdinsight-linux-with-edge-node/azuredeploy.json for deployment.
After deployment is complete I tried ssh using following command:
ssh sshuser@new-edgenode.myclustertest-ssh.azurehdinsight.net:22
[myclustertest is the name of the cluster].
It gives following error:
ssh: Could not resolve hostname new-edgenode.myclustertest-ssh.azurehdinsight.net:22: Name or service not known
Do I need to add something to the azuredeploy.json to enable ssh access?
Looking at https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-linux-use-ssh-unix, I thought that
<edgenodename>.<clustername>-ssh.azurehdinsight.net
is enabled by default for external access.
The problem was in the ssh command.
I used the ssh command supplied by the Azure portal, hoping that it would work seamlessly. I had to remove :22 from the command to make it work.
Modified command looks like this:
ssh sshuser@new-edgenode.myclustertest-ssh.azurehdinsight.net

HDP 2.5 Hortonworks ambari-admin-password-reset missing

I have downloaded the sandbox from Hortonworks (CentOS), then tried to follow the tutorial. It seems like the ambari-admin-password-reset command is not there and is missing. I also tried to log in with PuTTY; the console asked me to change the password, so I did.
Now it seems like the command is there, but I have different passwords for the same user: one for the console and one for PuTTY.
I have tried to find out why, for the same user 'root', there are two different passwords that I can log in with (one for the VirtualBox console and one for PuTTY). I see different commands on each box. More than that, when I share a folder I can only see it from the VirtualBox console but not from the PuTTY session, which is really frustrating.
How can I make sure that what I see from PuTTY is the same as what I see from the VirtualBox console?
I think it is somehow related to TTY, but I am not sure.
EDIT:
Running commands on the VirtualBox machine gives the following output:
grep "^passwd" /etc/nsswitch.conf
OUT: passwd: files sss
grep root /etc/passwd
OUT: root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
getent passwd root
OUT: root:x:0:0:root:/root:/bin/bash
EDIT:
I think this is all about Docker containers. It seems like the machine's port 2222 is the SSH port for the HDP 2.5 container and not for the host machine.
Now I get another problem. When running
docker exec sandbox ls
it gets stuck. Any help?
Thanks to any helpers.
So now I have had the time to analyze the sandbox VM and write it up for other users.
As you stated correctly in your edit of the question, it's the Docker container setup of the sandbox that causes the confusion, with two separate root users:
via ssh root@127.0.0.1 -p 2222 you get into the Docker container called "sandbox" (see the sketch after this list). This is CentOS release 6.8 (Final), containing all the HDP services, especially the Ambari service. The configuration enforces a password change at first login for the root user. Inside this container you can also execute ambari-admin-password-reset and set a password for the Ambari admin there.
via console access you reach the Docker host running CentOS 7.2; here you can log in with the default root password for the VM as found in the HDP docs.
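In other words, the Ambari reset happens inside the container, reached through the forwarded SSH port; a short sketch of that sequence, assuming the default port mapping described above:
$ ssh root@127.0.0.1 -p 2222
...                              # forced password change at first login
$ ambari-admin-password-reset    # run inside the container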
Coming to your sub-question about the hanging docker exec: it seems to be a bug in that specific Docker version. If you google it, you will find issues discussing this or similar problems with Docker.
So I thought that it would be a good idea to just update the host via yum update. However this turned out to be a difficult path.
yum tried to update the kernel, but complained that there was not enough space on the boot partition.
So I moved the boot partition to the root partition:
edit /etc/fstab and comment out the boot entry
umount /boot
mv /boot /boot.org
cp -a /boot.org /boot
grub2-mkconfig -o /boot/grub2/grub.cfg
grub2-install /dev/sda
reboot
After that I have found out that the docker configuration is broken and docker does not start anymore. In the logs it complained about
"Error starting daemon: error initializing graphdriver:
\"/var/lib/docker\" contains other graphdrivers: devicemapper; Please
cleanup or explicitly choose storage driver (-s )"
So I edited /etc/systemd/system/multi-user.target.wants/docker.service and changed the ExecStart setting to:
ExecStart=/usr/bin/dockerd --storage-driver=overlay
After a service docker start and a docker start sandbox, the container worked again; I could log in to the container, and after an ambari-server restart everything worked again.
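For reference, that restart sequence would look roughly like this; note that the systemctl daemon-reload step is my addition, since systemd generally needs it to pick up an edited unit file:
systemctl daemon-reload     # assumption: reload unit files after editing docker.service
service docker start
docker start sandbox
# then, once logged into the container again:
ambari-server restart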
And now, with the new Docker version 1.12.2, docker exec sandbox ls works again.
So to sum up: the docker exec command has a bug in that specific version of the sandbox, but you should think twice before upgrading your sandbox.
I ran into the same issue.
The HDP 2.5 sandbox runs all of its components in a docker container, but commands like docker exec -it sandbox /bin/bash or docker attach sandbox got stuck.
When I ran a simple ps aux, I found several /usr/bin/docker-proxy commands which looked like:
/usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 60000 -container-ip 172.17.0.2 -container-port 60000
They probably forward the HTTP ports of the various UIs of HDP components.
I could SSH into the container IP (here 172.17.0.2), using root/hadoop to authenticate. From there, I could use all the "missing" commands like ambari-admin-password-reset.
$ ssh root@172.17.0.2
... # change password
$ ambari-admin-password-reset
NB: I am new to docker, so there's probably a better way to deal with this.
I'd like to post the instructions for 3.0.1 here.
I followed the instructions of installing hortonworks version 3.0.1 here: https://youtu.be/5TJMudSNn9c
After running the Docker container, go to your browser and enter "localhost:4200"; that will take you to the in-browser terminal of the container that hosts Ambari. Enter "root" for the login and "hadoop" for the password, change the root password, and then enter "ambari-admin-password-reset" to reset the Ambari password.
In order to be able to use sandbox-hdp.hortonworks.com, you need to add the line "127.0.0.1 sandbox-hdp.hortonworks.com" at the end of the /private/etc/hosts file on your Mac.
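Appending that entry from a terminal can be done with something like the following; tee -a simply appends the line to the hosts file:
echo "127.0.0.1 sandbox-hdp.hortonworks.com" | sudo tee -a /private/etc/hosts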
If you are getting an incorrect password error, you can reset it from the recovery menu:
In the top-right corner click the power button, choose Restart from the power-off drop-down, and when the VM boots up press the Esc key to get into the recovery menu.
Select the advanced options entry and hit Enter.
Select recovery mode and hit Enter.
Select the root option and hit Enter to get a root shell.
Then run:
mount -rw -o remount /
ls /home
Change the password with:
passwd username
(replace username with yours)
As the last step, enter the new password twice, pressing Enter after each.
Hopefully you have changed the password (:

Git push/pull fails on GitLab in Google Compute Engine

I've installed GitLab on Google Compute Engine using "Click to Deploy" from the project interface. The deployment is successful after a few minutes. I can SSH into the instance, and muck around with it as expected.
I can also log in to GitLab using the web interface, and add SSH keys to my profile. So far, so good. However, when I attempt to push or pull to a new example repository, I receive this message:
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
I've removed my local SSH config so it doesn't interfere. Do I need to setup an SSH tunnel of some sort? What am I missing?
UPDATE: Wiping out my local ~/.ssh folder, and regenerating an SSH key (which I've added to my profile in GitLab) produces the following error:
Received disconnect from {GITLAB_IP_ADDRESS}: 2: Too many authentication failures for git
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
UPDATE 2: It seems GitLab may already have a solution: run sudo gitlab-ctl reconfigure. See here: https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/README.md#git-ssh-access-stops-working-on-selinux-enabled-systems
You need to create an SSH tunnel to communicate with GitLab.
1. Log into your development server as your user, and create a key.
ssh-keygen -t rsa
Follow the steps and create a passphrase (that you can remember), as you'll need it to pull and push code from/to GitLab.
2. Now that you've created your key, we can copy it:
cat ~/.ssh/id_rsa.pub
Copy the output of that command (including ssh-rsa), and add it to your GitLab profile. (http://my-gitlab-server.com/profile/keys/new).
3. Ensure you have the correct privilege to the project(s)
Ensure you have at least the Developer role. (Screengrab of roles: http://i.stack.imgur.com/DSSvl.jpg)
4. Now, copy the project link
Go into your project, and find the SSH link in the top right;
5. Now back to your development server
Navigate to the directory where you'd like to work, and run the following:
$ git init
$ git remote add origin <<project_url>>
$ git fetch
Where <<project_url>> is the link we copied in step 4.
You will be prompted for your password (this is your SSH key passphrase, not your server password) and asked to add the host to your known_hosts file. After that, the project will start to download and you can enjoy development.
I did these steps on a CentOS 6.4 machine with Digital Ocean. But they shouldn't differ from using Google CE.
Edit
Quote from Marty Penner's answer, as per this comment:
Solved it! Thanks to @sxleixer and @Alexander Wenzowski for figuring this out.
Apparently, SELinux was interfering with a non-standard location for the .ssh directory. I needed to run the following commands on the Compute Engine instance:
sudo yum -y install policycoreutils-python # Install the `semanage` tool
sudo semanage fcontext -a -t ssh_home_t "/var/opt/gitlab/.ssh/authorized_keys" # Allow the nonstandard ssh_home_t
See the full thread here:
Google Cloud Engine. Permission denied (publickey,gssapi-keyex,gssapi-with-mic)
Solved it! Thanks to @sxleixer and @Alexander Wenzowski for figuring this out.
Apparently, SELinux was interfering with a non-standard location for the .ssh directory. I needed to run the following commands on the Compute Engine instance:
sudo yum -y install policycoreutils-python # Install the `semanage` tool
sudo semanage fcontext -a -t ssh_home_t "/var/opt/gitlab/.ssh/authorized_keys" # Allow the nonstandard ssh_home_t
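One note from my side (not part of the original answer): semanage only records the new file-context rule, so you would typically also run restorecon afterwards to apply the context to the existing file:
sudo restorecon -v /var/opt/gitlab/.ssh/authorized_keys # apply the updated SELinux context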
See the full thread here:
Google Cloud Engine. Permission denied (publickey,gssapi-keyex,gssapi-with-mic)
UPDATE: It seems GitLab may already have a solution: run sudo gitlab-ctl reconfigure. See here: https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/README.md#git-ssh-access-stops-working-on-selinux-enabled-systems
In my situation the git user wasn't set up completely. If you get messages in your log files like "User git not allowed because account is locked" (under CentOS or Red Hat it's /var/log/secure), then you simply need to activate the user via "passwd -d git".
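To illustrate, checking the log and unlocking the account would look roughly like this, using the log path mentioned above:
sudo grep "not allowed because account is locked" /var/log/secure
sudo passwd -d git    # clears the password so the locked git account can be used again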
