Use Hadoop SDK with local HDInsight Server - azure

Is it possible to use the Hadoop SDK, especially LINQ to Hive, with a local installation of HDInsight Server? Note that I am not referring to the HDInsight Service hosted on Azure.
I tried to use LINQ to Hive from the Microsoft.Hadoop.Hive NuGet package, but was unable to get it working, because LINQ to Hive seems to require that results are stored in Azure Blob Storage rather than on my hosted instance.
var hiveConnection = new HiveConnection(new Uri("http://hadoop-poc.cloudapp.net:50111"), "hadoop", "hgfhdfgh", "hadoop", "hadooppartner", "StorageKey");
var metaData = hiveConnection.GetMetaData().Result;
var result = hiveConnection.ExecuteQuery(@"select * from customer limit 1");
Even with a storage key, I cannot get this to work, because the MapReduce job fails with:
AzureException: org.apache.hadoop.fs.azure.AzureException: Container a7e3aa39-75ba-4cc2-a8aa-301257018146 in account hadooppartner not found, and we can't create it using anonymous credentials.
I also added the credentials once more to the core-site.xml file, as follows:
<property>
  <name>fs.azure.account.key.hadooppartner.blob.core.windows.net</name>
  <value>Credentials</value>
</property>
However I would rather get rid of storing results on Azure Storage, if possible.
Thank you for your help!

You can use the HiveConnection constructor without the storage account options to connect to a local install. This works against a default install of the HDInsight developer preview on a local box:
var db = new HiveConnection(
    webHCatUri: new Uri("http://localhost:50111"),
    userName: (string) "hadoop", password: (string) null);
var result = db.ExecuteHiveQuery("select * from w3c");
Of course you can then use that connection for any LINQ queries as well.

It turned out that in the HiveConnection constructor you have to specify the full storage account name, i.e. hadooppartner.blob.core.windows.net.
I am still interested in using the .NET LINQ API without the need for a storage account. Furthermore, is it possible to use the .NET API with other Hadoop distributions?

Related

Error running Spark on Databricks: constructor public XXX is not whitelisted

I was using Azure Databricks and trying to run some example Python code from this page.
But I get this exception:
py4j.security.Py4JSecurityException: Constructor public org.apache.spark.ml.classification.LogisticRegression(java.lang.String) is not whitelisted.
This error shows up with some library methods when using a High Concurrency cluster with credential passthrough enabled. If that is your scenario, a workaround that may be an option is to use a different cluster mode.
py4j.security.Py4JSecurityException: ... is not whitelisted
This exception is thrown when you have accessed a method that Azure Databricks has not explicitly marked as safe for Azure Data Lake Storage credential passthrough clusters. In most cases, this means that the method could allow a user on an Azure Data Lake Storage credential passthrough cluster to access another user’s credentials.
Reference: https://docs.azuredatabricks.net/spark/latest/data-sources/azure/adls-passthrough.html

CosmosDB How to read replicated data

I'm using CosmosDB and replicating the data globally. (One Write region; multiple Read regions). Using the Portal's Data Explorer, I can see the data in the Write region. How can I query data in the Read regions? I'd like some assurance that it's actually working, and haven't been able to find any info or even an URL for the replicated DBs.
Note: I'm writing to the DB via the CosmosDB "Create or update document" Connector in a Logic App. Given that this is a codeless environment, I'd prefer to validate the replication without having to write code.
How can I query data in the Read regions?
If code is an option, we can access the account from every region where your application is deployed and configure the corresponding preferred-regions list for each region via one of the supported SDKs.
The following is demo code for a SQL API Cosmos DB account. For more information, please refer to this tutorial.
ConnectionPolicy usConnectionPolicy = new ConnectionPolicy
{
    ConnectionMode = ConnectionMode.Direct,
    ConnectionProtocol = Protocol.Tcp
};

usConnectionPolicy.PreferredLocations.Add(LocationNames.WestUS); // first preference
usConnectionPolicy.PreferredLocations.Add(LocationNames.NorthEurope); // second preference

DocumentClient usClient = new DocumentClient(
    new Uri("https://contosodb.documents.azure.com"),
    "<Fill your Cosmos DB account's AuthorizationKey>",
    usConnectionPolicy);
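For some assurance that reads are actually being served from a read region, one lightweight check (a small sketch against the same DocumentDB .NET SDK, using the usClient built above) is to inspect the endpoints the client has resolved:
// After the client has issued at least one request, it has resolved its effective endpoints:
// WriteEndpoint should point at the write region, while ReadEndpoint should point at the
// first reachable entry from the PreferredLocations list (WestUS in this example).
Console.WriteLine("Write endpoint: " + usClient.WriteEndpoint);
Console.WriteLine("Read endpoint:  " + usClient.ReadEndpoint);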
Update:
We can enable Automatic Failover from the Azure portal. Then we can drag and drop the read-region items to reorder the failover priorities.

Restoring Database from Azure Blob Storage failing from SSMS while using RESTORE FILELISTONLY

I am trying to restore a SQL Server 2016 database backup file which is in Azure Blob Storage from SSMS, using the below T-SQL command:
RESTORE FILELISTONLY
FROM URL = 'https://.blob.core.windows.net//.bak'
GO
It works fine with my normal Azure subscription. But when I use a CSP account, I get the below error:
Cannot open backup device 'https://.blob.core.windows.net//.bak'. Operating system error 86(The specified network password is not correct.).
Any help on fixing this issue is greatly appreciated.
Following the steps below, you should be able to get the file list.
First you need to create a credential, e.g.:
create credential [cmbackupprd-sqlbackup]
with
identity = '<storageaccountname>',
secret = 'long-and-lengthy-storageaccountkey'
Now you can use this credential to connect to your storage-account.
restore filelistonly
from URL = 'https://yourstorageaccount.blob.core.windows.net/path/to/backup.bak'
with credential = 'cmbackupprd-sqlbackup'
Note, I'm assuming the backup was made directly from SQL Server to Azure Blob Storage. Otherwise you might need to check the blob type.

How to do Azure Blob storage and Azure SQL Db atomic transaction

We have a Blob Storage container in Azure for uploading application-specific documents, and we have an Azure SQL DB where metadata for particular files is saved during the file upload process. This upload process needs to be consistent, so that we never have files in storage with no metadata record in the SQL DB, and vice versa.
We are uploading a list of files which we get from the front end as multi-part HttpContent. From the Web API controller we call the upload service, passing the HttpContent, file names and a folder path where the files will be uploaded. The Web API controller, service method and repository are all async.
var files = await this.uploadService.UploadFiles(httpContent, fileNames, pathName);
Here is the service method:
public async Task<List<FileUploadModel>> UploadFiles(HttpContent httpContent, List<string> fileNames, string folderPath)
{
    var blobUploadProvider = this.Container.Resolve<UploadProvider>(
        new DependencyOverride<UploadProviderModel>(new UploadProviderModel(fileNames, folderPath)));

    var list = await httpContent.ReadAsMultipartAsync(blobUploadProvider).ContinueWith(
        task =>
        {
            if (task.IsFaulted || task.IsCanceled)
            {
                throw task.Exception;
            }

            var provider = task.Result;
            return provider.Uploads.ToList();
        });

    return list;
}
The service method uses a customized upload provider derived from System.Net.Http.MultipartFileStreamProvider, which we resolve using a dependency resolver.
After this, we create the metadata models for each of those files and then save them in the DB using Entity Framework. The full process works fine in the ideal situation.
The problem is that if the upload succeeds but the DB operation somehow fails, we end up with files uploaded to Blob Storage that have no corresponding entries in the SQL DB, and thus data inconsistency.
Following are the different technologies used in the system:
Azure Api App
Azure Blob Storage
Web Api
.Net 4.6.1
Entity framework 6.1.3
Azure SQL Database (we are not using any VM)
I have tried using TransactionScope for consistency, but it does not seem to work across Blob Storage and the DB (it works for the DB only).
How do we solve this issue?
Is there any built in or supported feature for this?
What are the best practices in this case?
Is there any built in or supported feature for this?
As of today, no. Essentially the Blob service and SQL Database are two separate services, hence it is not possible to implement the "atomic transaction" functionality you're expecting.
How do we solve this issue?
I can think of two ways to solve this issue (I am sure there are others as well):
Implement your own transaction functionality: Basically, check for the database transaction failure and, if that happens, delete the blob manually (a sketch of this approach follows below).
Use some background process: Here you would continue to save the data in Blob Storage and then periodically find orphaned blobs through some background process and delete those blobs.
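A minimal sketch of the first option, using the classic WindowsAzure.Storage blob client (which fits the .NET 4.6.1 stack listed above); the helper name and the delegate you pass in are illustrative, not taken from the original code:
using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob;

public static class UploadCompensation
{
    // Runs the metadata save and, if it throws, deletes the blob that was just uploaded,
    // so Blob Storage and the SQL DB do not drift apart.
    public static async Task SaveWithCompensationAsync(
        CloudBlobContainer container, string blobName, Func<Task> saveMetadataAsync)
    {
        try
        {
            await saveMetadataAsync(); // e.g. an EF6 context saving the FileUploadModel rows
        }
        catch
        {
            // Compensating action: remove the orphaned blob, then surface the original failure.
            CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
            await blob.DeleteIfExistsAsync();
            throw;
        }
    }
}
You would call this from the controller or service right after the multipart upload completes, passing a delegate that performs the Entity Framework save.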

Azure Drive addressing using local emulated blob store

I am unable to get a simple tech demo working for Azure Drive using a locally hosted service running the storage/compute emulator. This is not my first Azure project, only my first use of the Azure Drive feature.
The code:
var localCache = RoleEnvironment.GetLocalResource("MyAzureDriveCache");
CloudDrive.InitializeCache(localCache.RootPath, localCache.MaximumSizeInMegabytes);
var creds = new StorageCredentialsAccountAndKey("devstoreaccount1", "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==");
drive = new CloudDrive(new Uri("http://127.0.0.1:10000/devstoreaccount1/drive"), creds);
drive.CreateIfNotExist(16);
drive.Mount(0, DriveMountOptions.None);
With local resource configuration:
<LocalStorage name="MyAzureDriveCache" cleanOnRoleRecycle="false" sizeInMB="220000" />
The exception:
Uri http://127.0.0.1:10000/devstoreaccount1/drive is Invalid
Information on how to address local storage can be found here: https://azure.microsoft.com/en-us/documentation/articles/storage-use-emulator/
I have used the storage emulator UI to create the C:\Users...\AppData\Local\dftmp\wadd\devstoreaccount1 folder which I would expect to act as the container in this case.
However, I am following those guidelines (as far as I can tell) and yet still I receive the exception. Is anyone able to identify what I am doing wrong in this case? I had hoped to be able to resolve this easily using a working sample where someone else is using CloudDrive with 127.0.0.1 or localhost but was unable to find such on Google.
I think you have skipped several required steps before mounting.
You have to initialize the local cache for the drive and define the URI of the page blob containing the Cloud Drive before mounting it.
Initializing the cache:
// Initialize the local cache for the Azure drive
LocalResource cache = RoleEnvironment.GetLocalResource("LocalDriveCache");
CloudDrive.InitializeCache(cache.RootPath + "cache", cache.MaximumSizeInMegabytes);
Defining the URI of the page blob, usually made in the configuration file:
// Retrieve URI for the page blob that contains the cloud drive from configuration settings
string imageStoreBlobUri = RoleEnvironment.GetConfigurationSettingValue("<Configuration name>");
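The remaining steps would then be to create and mount the drive from that URI. A rough continuation, assuming the classic 1.x-era Microsoft.WindowsAzure.CloudDrive API already used in the question, run against the development storage account from the emulator:
// Build the CloudDrive from the configured page-blob URI and mount it.
CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
CloudDrive drive = account.CreateCloudDrive(imageStoreBlobUri);

drive.CreateIfNotExist(16); // size in MB; no-op if the page blob already exists
string drivePath = drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);
Trace.WriteLine("Azure Drive mounted at " + drivePath);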
