Accessing Pig logs on HDInsight - Azure

How do I access pig log files on HDInsight?
When Pig fails, the output says:
Details at logfile: C:\apps\dist\hadoop-1.2.0.1.3.7.1-01293\logs\pig_1399635949926.log
I can't find anything like that in the blob store and I have no idea how to access that path.
Cheers.

It is because the log exists on the C:\ drive of the headnode. Simple steps to access it:
Go to manage.windowsazure.com and open your HDI cluster.
Enable remote access, choosing a user and a password (note that admin is reserved).
Then just log in to the machine over Remote Desktop.
Open Explorer and go to that path.
Either copy and paste the file to your local machine or open it in Notepad.
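If you also want the log to be reachable from the blob store, a hedged option (assuming the Hadoop command line is available on the headnode; the /example/piglogs/ target is just an illustrative destination) is to copy it into the cluster's default file system from that remote session:
hadoop fs -copyFromLocal C:\apps\dist\hadoop-1.2.0.1.3.7.1-01293\logs\pig_1399635949926.log /example/piglogs/
Anything copied under the default file system will then show up in the cluster's storage container.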

Unable to mount file on windows from Azure

I created a file share on Azure using the "File service" and then tried to mount it using "Connect". It gave me the username localhost\xyz.
Two questions:
Why does the username start with "localhost" and not with "Azure"?
Why am I unable to mount it? Windows Security does not give any error; it just keeps returning me to the credentials page.
P.S. TCP port 445 is working properly.
Here are a few workarounds that worked for us.
WAY-1
You can go directly to PowerShell on your machine and paste the script that is provided in your storage account.
WAY-2
You can click More options, choose to use a different account, and then use AZURE\<storage-account-name> as the username and a storage account key as the password.
WAY-3
You can map the file share directly by leaving "Connect using different credentials" unchecked.
OUTPUT:
With all of the above ways, the file shares were mounted successfully.
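For WAY-2, here is a minimal command-line sketch of the same mount (the storage account name mystorageacct and share name myshare are placeholders; substitute your own names and key):
net use Z: \\mystorageacct.file.core.windows.net\myshare /user:AZURE\mystorageacct <storage-account-key>
If this succeeds, the share appears as drive Z: in Explorer, which is the same result as the dialog-based approaches.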
REFERENCES:
Mount SMB Azure file share on Windows

How to download an installed dbfs jar file from databricks cluster to local machine?

I am new to Databricks and I wish to download an installed library of a databricks cluster to my local machine. Could you please help me with that?
To elaborate: I already have a running cluster on which libraries are installed. I need to download some of those libraries (which are DBFS jar files) to my local machine. I have been trying to use the dbfs cp command through the databricks-cli, but it is not working. It does not give any error, but it does not do anything either. I hope that clears things up a bit.
Note: When you install libraries via Jars, Maven, or PyPI, they are located under the folder path dbfs:/FileStore.
For interactive clusters, jars are located at dbfs:/FileStore/jars
For automated (job) clusters, jars are located at dbfs:/FileStore/job-jars
There are a couple of ways to download an installed DBFS jar file from a Databricks cluster to a local machine.
GUI Method: You can use DBFS Explorer
DBFS Explorer was created as a quick way to upload and download files to the Databricks filesystem (DBFS). This will work with both AWS and Azure instances of Databricks.
You will need to create a bearer token in the web interface in order to connect.
Step1: Download DBFS Explorer from https://datathirst.net/projects/dbfs-explorer and install it.
Step2: How to create a bearer token?
Click the user profile icon in the upper-right corner of your Databricks workspace.
Click User Settings.
Go to the Access Tokens tab.
Click the Generate New Token button.
Note: Copy the generated token and store it in a secure location.
Step3: Open DBFS Explorer, enter the host URL and bearer token, and continue.
Step4: Navigate to the DBFS folder FileStore => jars, select the jar you want to download, click download, and choose a destination folder on your local machine.
CLI Method: You can use Databricks CLI
Step1: Install the Databricks CLI and configure it with your Databricks credentials.
Step2: Use the dbfs cp command to copy files to and from DBFS.
Syntax: dbfs cp <SOURCE> <DESTINATION>
Example: dbfs cp "dbfs:/FileStore/azure.txt" "C:\Users\Name\Downloads\"
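For the jar case specifically, here is a slightly fuller hedged sketch (it assumes the CLI has been configured with a personal access token, and the jar file name is hypothetical - list the folder first to find the real name):
databricks configure --token
dbfs ls dbfs:/FileStore/jars
dbfs cp "dbfs:/FileStore/jars/maven_mylib_1_0.jar" "C:\Users\Name\Downloads\maven_mylib_1_0.jar"
If the destination file already exists, dbfs cp may skip it without reporting an error (which can look like the command "did nothing"); the --overwrite flag forces the copy.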

How to Create a Pig Latin Job in Azure HDInsight Cluster

I just created a free Azure account and created a Hadoop cluster on HDInsight. However, it does not in any way show how to launch a Pig client, create a Pig Latin job, and run it.
Unfortunately, you cannot use the Ambari UI to run Pig Latin jobs.
Note: To process data using Pig, you will need to open an SSH console that is connected to your cluster and then run Pig Latin in local mode or MapReduce mode:
If you are using a Windows client computer:
In the Microsoft Azure portal, on the HDInsight Cluster blade for your HDInsight cluster, click Secure Shell, and then in the Secure Shell blade, in the Hostname list, note the Host name for your cluster (which should be your_cluster_name-ssh.azurehdinsight.net).
Open PuTTY, and in the Session page, enter the host name into the Host Name box. Then under Connection type, select SSH and click Open. If a security warning that the host certificate cannot be verified is displayed, click Yes to continue.
When prompted, enter the SSH username and password you specified when provisioning the cluster (not the cluster login username).
If you are using a Mac OS X or Linux client computer:
In the Microsoft Azure portal, on the HDInsight Cluster blade for your HDInsight cluster, click Secure Shell, and then in the Secure Shell blade, in the Hostname list, select the hostname for your cluster. Then copy the ssh command that is displayed, which should resemble the following command – you will use this to connect to the head node.
ssh sshuser@your_cluster_name-ssh.azurehdinsight.net
Open a new terminal session, and paste the ssh command, specifying your SSH user name (not the cluster login username).
If you are prompted to connect even though the certificate can’t be verified, enter yes.
When prompted, enter the password for the SSH username.
Once you are connected to your cluster, you can execute Pig Latin statements:
Using the Grunt shell or the command line
In MapReduce mode or local mode
Either interactively or in batch
Reference: Pig Manual
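As a concrete, hedged sketch of standard Apache Pig usage once you are at the SSH prompt on the head node (the script name wordcount.pig is hypothetical):
pig
pig -x local
pig -x mapreduce wordcount.pig
The first command starts the Grunt shell in MapReduce mode, the second starts it in local mode, and the third runs a script in batch mode.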

MarkLogic - Forest data folder & Azure Blob

Technical Stack
MarkLogic 9.0
CentOS Linux
Azure Blob
Blobfuse
To make sure we do not have to worry about data disk size for the MarkLogic forest, we have mounted Azure Blob storage to a folder on the Linux machine.
There are a few things I noticed:
Need to create a folder in Linux
Create a folder and point it to the folder above
Then configure blobfuse, otherwise we get "permission denied" while creating the forest
Use the command below to give permission to all:
chmod 777 -R
Now when we started importing using MarkLogic Content Pump (MLCP), we got:
19/03/15 17:01:19 ERROR mapreduce.ContentWriter: SVC-FILSTAT: File status error: stat64 '/mnt/mycontainer/Forests/forest-01/000043e5': Permission denied
Referring to the screenshot (not included here): first we tried with mycontainer, but as soon as we map it to Azure Blob it does not show green the way azureblob does. We still need to map azureblob to the "azureblob" folder.
It seems I am missing something here. Does it have anything to do with Azure Blob security settings?
In my test, when you mount Azure Blob storage on Linux, for example Ubuntu 18.04 (which I'm using), if you want to allow other users to use the mount directory you can add the parameter -o allow_other when you execute the blobfuse command:
To allow access to all users, you can mount via the option -o allow_other.
Also, I think you should give other users permission through the command chown. For more details, see How to mount Blob storage as a file system with blobfuse.
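A minimal blobfuse mount sketch with that option (the temp path and config file locations are placeholder assumptions; the mount point matches the one in the error above):
sudo mkdir -p /mnt/blobfusetmp /mnt/mycontainer
sudo blobfuse /mnt/mycontainer --tmp-path=/mnt/blobfusetmp --config-file=/path/to/fuse_connection.cfg -o allow_other -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120
The -o allow_other switch is what lets a process running as a different user (such as MarkLogic) reach the mounted forest directory.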
First I would like to thank Charles for his efforts and extended help on this issue. Thanks, Charles :). I am sure this will help me sometime, somewhere.
I found a link on how to set up MarkLogic on Azure.
On page 27 there are steps for configuring MarkLogic for Azure Blob Storage.
In summary it is
Create Storage account in Azure
Create Blob container
Go to MarkLogic server (http://localhost:8001)
Go to Security -> Credentials
Provide Storage account and Azure storage key
While creating the MarkLogic forest, specify the container path in the data directory:
azure://mycontainer/mydirectory/myfile
And you are done. No blobfuse, no drive mount, just a configuration in MarkLogic.
Awesome!!
It's working like a dream :)

Azure Files preview - access shared folder in IIS and FileZilla

I'm interested in load balancing 2+ Windows VMs in Azure. My primary requirement, though, is that an 'uploads' folder would need to be consistent between each VM. Files in this folder are FTPed by our admin users, and they would then need to select these files in a C# MVC Web app. As you may connect through FTP to one VM, but a Web connection might be to another, the uploads have to be centralised.
It looked as if the new Azure Files, currently in Preview, would help, in that they let me set up a shared drive that each of the VMs could access. My thought was that FileZilla Server would allow FTPing up to this shared 'drive', and the Web app would access it to show the contents.
I've signed up to the Azure Files Preview, and set up the share, persistently mapping it to Drive Z for the sake of experimentation. I've also created a new user and made sure they too have persistent mapping to this same drive as Z.
But I can't seem to do anything with this outside of the Remote Desktop. FileZilla, despite having its Service set to log on using this new account, won't show the contents of this drive, or write anything to it. Likewise my Web App isn't able to access the file contents, despite switching Passthrough Authentication to this new account for the virtual folder.
Does anyone know any way of accessing this drive either through the network path or drive letter? Is this just not possible with Azure Files as they are? Are there any other solutions to sharing some blobs across VMs, but treating it as a local drive or network share?
[UPDATE]
This might help. Having set up the share, and used cmdkey and net use while in a cmd prompt run as a specially created user (as suggested in http://blogs.msdn.com/b/windowsazurestorage/archive/2014/05/27/persisting-connections-to-microsoft-azure-files.aspx), if I point a virtual folder in IIS to this share, using the specific account created, and Test Connection, I get:
Test: Authentication (green tick; "The specified user credentials are valid")
Test: Authorization (red cross; "The path does not exist or environment variables in the path could not be expanded to verify whether it exists.")
While still in a runas cmd prompt, I can access the share, so it's not a specific permissions issue. It just seems to be that IIS cannot use that user to access the share, for some reason. The limitation of Azure Files is that I cannot specifically grant any kinds of permissions on the folder within that share.
What worked for me is the following:
Create a new account
Set the IIS App Pool Identity to this specific user
Set the IIS App Pool "Load User Profile" property to true
Start a cmd prompt as this user (runas)
Do cmdkey and net use (with the /persistent:yes switch), as you described
Create an IIS virtual directory with the physical path set to the UNC share path (not the mapped drive)
A little PowerShell snippet for point 5:
$share = "your-storage-account.file.core.windows.net\yoursharename"
$usr = "your-storage-account"
$key = "your-storage-key"
# Store credentials for the network share - must be done for the user that will run the app pool
cmdkey /add:your-storage-account.file.core.windows.net /user:$usr /pass:$key
net use z: "\\$share" /user:$usr $key /persistent:yes
The answers here proved helpful.
Setup
1. Create a new user {appuser}
2. Open a command window as that user:
runas /user:{appuser} cmd.exe
3. In the new {appuser} cmd window use:
cmdkey /add:{storage-account}.file.core.windows.net /user:{storage-account} /pass:{account-key}
4. Set the IIS application pool to use {appuser}
4b. Set LoadUserProfile to true
Notice there is no need for net use. You don't need the mapped drive.
Code
Now here's the key piece. From your app you must write to the UNC path.
\\{storage-account}.file.core.windows.net\
ex.
File.WriteAllText("\\\\{storage-account}.file.core.windows.net\\share\\test.txt", "contents go here");
