Azure Data Lake Store: Create an empty file with C#

How do I create an empty file on Azure Data Lake with C#? In another thread, Create File From Azure Data Lake Store .NET SDK, it's mentioned to use FileSystemOperationsExtensions.Create, but how do I use this to create an empty file?

Below is a sample snippet that accomplishes this:
// to find out how to get the credentials...
// see: https://github.com/Azure-Samples/data-lake-analytics-dotnet-auth-options
var adlCreds = ...;
string adls = "youradlsaccountname";
var adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(adlCreds);
// passing a null content stream creates a zero-length file
adlsFileSystemClient.FileSystem.Create(adls, "/emptyfile.dat", null, null, null, null, null);
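For completeness, here is a minimal end-to-end sketch (a sketch only; the tenant, app, and account values are placeholders, and the credential flow follows the auth samples linked above):
// a sketch assuming service principal (client credential) authentication;
// the tenant id, app id, app secret, and account name are placeholders
var creds = ApplicationTokenProvider.LoginSilentAsync(
    "yourTenantId",
    new ClientCredential("yourAppId", "yourAppSecret")).GetAwaiter().GetResult();
var client = new DataLakeStoreFileSystemManagementClient(creds);
// a null content stream yields a zero-length file
client.FileSystem.Create("youradlsaccountname", "/emptyfile.dat", null, null, null, null, null);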

Related

How to read a file from Azure Data Lake Storage with a file URL?

Is there a way to read files from Azure Data Lake? I have the HTTP URL of the file and want to read it directly. How can I achieve this? I don't see a way to do it via the SDK.
Thanks for your help.
Regards
Did you check the docs?
public async Task ListFilesInDirectory(DataLakeFileSystemClient fileSystemClient)
{
    IAsyncEnumerator<PathItem> enumerator =
        fileSystemClient.GetPathsAsync("my-directory").GetAsyncEnumerator();

    await enumerator.MoveNextAsync();
    PathItem item = enumerator.Current;

    while (item != null)
    {
        Console.WriteLine(item.Name);
        if (!await enumerator.MoveNextAsync())
        {
            break;
        }
        item = enumerator.Current;
    }
}
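On C# 8 and later, the same listing can be written more compactly with await foreach, since GetPathsAsync returns an async-enumerable sequence (a small sketch, equivalent to the enumerator loop above):
public async Task ListFilesInDirectory(DataLakeFileSystemClient fileSystemClient)
{
    // GetPathsAsync returns an IAsyncEnumerable-compatible sequence of PathItem
    await foreach (PathItem item in fileSystemClient.GetPathsAsync("my-directory"))
    {
        Console.WriteLine(item.Name);
    }
}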
You can also use the ADLS Gen2 REST API.
For example, you can write code like below with SAS token authentication (or you can also use shared key authentication):
string sasToken = "?sv=2018-03-28&ss=b&srt=sco&sp=rwdl&st=2019-04-15T08%3A07%3A49Z&se=2019-04-16T08%3A07%3A49Z&sig=xxxx";
string url = "https://xxxx.dfs.core.windows.net/myfilesys1/app.JPG" + sasToken;
var req = (HttpWebRequest)WebRequest.CreateDefault(new Uri(url));
// you can specify the Method for your operation as per the REST API doc
req.Method = "HEAD";
var res = (HttpWebResponse)req.GetResponse();
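For instance, to read the file contents rather than just the headers, switch the method to GET and read the response stream (a rough sketch using the same placeholder URL and SAS token as above):
// read the file contents ("Path - Read" in the Gen2 REST API)
var readReq = (HttpWebRequest)WebRequest.CreateDefault(new Uri(url));
readReq.Method = "GET";
using (var readRes = (HttpWebResponse)readReq.GetResponse())
using (var reader = new StreamReader(readRes.GetResponseStream()))
{
    Console.WriteLine(reader.ReadToEnd());
}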
Since Blob APIs and Data Lake Storage Gen2 APIs can operate on the same data, you can directly use the Azure Blob Storage SDK to read a file from ADLS Gen2.
First, install this NuGet package: Microsoft.Azure.Storage.Blob, version 11.1.6.
Note that, in this case, you should use a URL of the form "https://xxx.blob.core.windows.net/mycontainer/myfolder/test.txt" instead of "https://xxx.dfs.core.windows.net/mycontainer/myfolder/test.txt".
Here is the sample code which is used to read a .txt file in ADLS Gen2:
var blob_url = "https://xxx.blob.core.windows.net/mycontainer/myfolder/test.txt";
//var blob_url = "https://xxx.dfs.core.windows.net/mycontainer/myfolder/test.txt";

// the storage account name and account key
var accountName = "xxxx";
var accountKey = "xxxx";

StorageCredentials credentials = new StorageCredentials(accountName, accountKey);
var blob = new CloudBlockBlob(new Uri(blob_url), credentials);
var mystream = blob.OpenRead();

using (StreamReader reader = new StreamReader(mystream))
{
    Console.WriteLine("Read file content: " + reader.ReadToEnd());
}

//you can also use another method like below
//string text = blob.DownloadText();
//Console.WriteLine($"the text is: {text}");

Delete unflushed file from Azure Data Lake Gen 2

To upload a file to ADLS Gen2 via the REST API, you first need to:
do a PUT request with the ?resource=file parameter (this creates the file on the data lake)
append data to the file with the ?action=append&position=<N> parameters
lastly, flush the data with ?action=flush&position=<FILE_SIZE>
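For reference, that sequence looks roughly like this (a sketch only, assuming .NET Core's HttpClient.PatchAsync and a valid SAS token; the URL and token are placeholders):
// create, append, flush - the three REST calls described above
string sasToken = "?sv=xxxx"; // placeholder SAS token
string fileUrl = "https://xxx.dfs.core.windows.net/myfilesys1/data.txt" + sasToken;
byte[] payload = Encoding.UTF8.GetBytes("hello");
using (var http = new HttpClient())
{
    // 1) create the file
    await http.PutAsync(fileUrl + "&resource=file", new ByteArrayContent(Array.Empty<byte>()));
    // 2) append the data at position 0
    await http.PatchAsync(fileUrl + "&action=append&position=0", new ByteArrayContent(payload));
    // 3) flush (commit) everything appended so far
    await http.PatchAsync(fileUrl + "&action=flush&position=" + payload.Length, new ByteArrayContent(Array.Empty<byte>()));
}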
My question is:
Is there a way to tell the server how long the data should live if it is not flushed (written)?
Since you need to create the file first in order to write data into it, there may be scenarios where the flush never happens, and you are stuck with an empty file in the data lake.
I could not find anything on the Microsoft documentation about this.
Any info would be appreciated.
Updated 02/19:
If you just call the append API but not the flush API, the uncommitted data will be kept by Azure for 7 days.
The uncommitted data is deleted automatically after those 7 days and cannot be deleted from your end.
Original:
The SDK for Azure Data Lake Storage Gen2 is ready, and you can use it to operate on ADLS Gen2 more easily than via the REST API.
If you're using .NET/C#, the SDK is Azure.Storage.Files.DataLake.
Here is the official doc for how to use this SDK to operate on ADLS Gen2, and the C# code below is used to delete a file / upload a file in ADLS Gen2:
static void Main(string[] args)
{
    string accountName = "xxx";
    string accountKey = "xxx";

    StorageSharedKeyCredential sharedKeyCredential =
        new StorageSharedKeyCredential(accountName, accountKey);

    string dfsUri = "https://" + accountName + ".dfs.core.windows.net";

    DataLakeServiceClient dataLakeServiceClient =
        new DataLakeServiceClient(new Uri(dfsUri), sharedKeyCredential);

    DataLakeFileSystemClient fileSystemClient = dataLakeServiceClient.GetFileSystemClient("w22");
    DataLakeDirectoryClient directoryClient = fileSystemClient.GetDirectoryClient("t2");

    // use this line of code to delete a file
    //directoryClient.DeleteFile("22.txt");

    // use the code below to upload a file
    //DataLakeFileClient fileClient = directoryClient.CreateFile("22.txt");
    //FileStream fileStream = File.OpenRead("d:\\foo2.txt");
    //long fileSize = fileStream.Length;
    //fileClient.Append(fileStream, offset: 0);
    //fileClient.Flush(position: fileSize);

    Console.WriteLine("**completed**");
    Console.ReadLine();
}
For Java, refer to this doc.
For Python, refer to this doc.

Azure Easy Tables - Max size of a row?

I want to ask whether Azure Easy Tables has a row size limit.
If yes, what is the limit?
Thanks for your answers!
The back end of Easy Tables in Azure is Azure SQL, and a table in Azure SQL can usually contain a maximum of 8,060 bytes per row. There are also LOB column types whose limit is 2 GB. For better performance, I suggest you stay under 8,060 bytes; if you have large data to store, you could store a link instead.
Can I save an image to Azure Blob Storage and the name/description to Azure Easy Tables?
Yes, you could save the image to Azure Blob Storage directly. You could try the following code:
Code in app.config (storage connection string):
<appSettings>
  <add key="StorageConnectionString" value="DefaultEndpointsProtocol=https;AccountName=×××;AccountKey=×××" />
</appSettings>
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
    CloudConfigurationManager.GetSetting("StorageConnectionString"));
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("container001");
CloudBlockBlob blockBlob = container.GetBlockBlobReference("apple.jpg");
using (var fileStream = System.IO.File.OpenRead(@"D:\apple.jpg"))
{
    blockBlob.UploadFromStream(fileStream);
}
In Easy Tables, if I convert the image to a base64 string to store it, I also get an error that the request entity is too large. So you could store the blob image link in Easy Tables instead. You can copy the link by clicking '...' > Blob properties > Url.
var client = new MobileServiceClient("https://[your mobile service name].azurewebsites.net");
IMobileServiceTable<SiteTable> todoTable = client.GetTable<SiteTable>();
SiteTable siteTable = new SiteTable(); //SiteTable is model class name
siteTable.MyImage = " blob image link";
await todoTable.InsertAsync(siteTable);

'AdlsClient' does not contain a definition for 'CreateDocumentAsync'

TARGET:
Create a console application which 1) reads JSON from Azure Data Lake Store and 2) stores the data to Cosmos DB as JSON.
PROBLEM:
I can read the file (1), but cannot store the data to Cosmos DB (2). See the error below.
ERROR:
Severity Code Description Project File Line Suppression State
Error CS1061 'AdlsClient' does not contain a definition for 'CreateDocumentAsync' and no extension method 'CreateDocumentAsync' accepting a first argument of type 'AdlsClient' could be found (are you missing a using directive or an assembly reference?) CircleCustomActivityCosmosDB C:\AzureCoding\CustomActivityCosmosDB\Program.cs
CODE:
private async Task CreateDocumentsAsync()
{
    string fileName = "/myjsonfile.json";

    // Obtain AAD token for ADLS
    var creds = new ClientCredential(applicationId, clientSecret);
    var clientCreds = ApplicationTokenProvider.LoginSilentAsync(tenantId, creds).GetAwaiter().GetResult();

    // Create ADLS client object
    AdlsClient client = AdlsClient.CreateClient(adlsAccountFQDN, clientCreds);

    string json = "";

    // Read file contents
    using (var readStream = new StreamReader(client.GetReadStream(fileName)))
    {
        string line;
        while ((line = readStream.ReadLine()) != null)
        {
            Console.WriteLine("Read file Line: " + line);
            json += line;
        }
    }

    // Read file to json
    JsonTextReader reader = new JsonTextReader(new StringReader(json));

    // Storing json to CosmosDB
    Uri collectionUri = UriFactory.CreateDocumentCollectionUri(databaseName, collectionName);

    // ERROR HAPPENS HERE - 'AdlsClient' does not contain a definition for 'CreateDocumentAsync' and no extension method 'CreateDocumentAsync'
    await client.CreateDocumentAsync(collectionUri, reader);
}
There is no CreateDocumentAsync method defined on the AdlsClient class; see the docs.
You need a Cosmos DB client to create the document there; I think you want to use DocumentClient.CreateDocumentAsync instead.
The AdlsClient class gives you methods to access the Azure Data Lake Store only, not the Cosmos DB datastore. For that, you need the DocumentClient class; see the docs.
So, summarized:
TARGET: Create a console application which 1) reads JSON from Azure Data Lake Store and 2) stores the data to Cosmos DB as JSON.
for 1) you need AdlsClient (to access ADLS)
for 2) you need DocumentClient (to access Cosmos DB)
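A minimal sketch of the Cosmos DB side (assuming the Microsoft.Azure.DocumentDB NuGet package; the endpoint and key are placeholders):
// create a Cosmos DB client - the AdlsClient remains responsible only for reading from ADLS
var cosmosClient = new DocumentClient(new Uri("https://yourcosmosaccount.documents.azure.com:443/"), "yourAuthKey");
// deserialize the JSON string read from ADLS into an object
object document = JsonConvert.DeserializeObject(json);
// create the document in the target collection
Uri collectionUri = UriFactory.CreateDocumentCollectionUri(databaseName, collectionName);
await cosmosClient.CreateDocumentAsync(collectionUri, document);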

Azure Storage Search Blobs by Metadata

I have CloudBlockBlobs that have metadata.
CloudBlockBlob blockBlob = container.GetBlockBlobReference("myblob.jpg");
using (var fileStream = System.IO.File.OpenRead(filePath))
{
    blockBlob.UploadFromStream(fileStream);
    blockBlob.Properties.ContentType = "image/jpg";
    blockBlob.Metadata.Add("Title", "Yellow Pear");
    blockBlob.SetProperties();
}
I see the Metadata is there:
Debug.WriteLine(blockBlob.Metadata["Title"]);
Later, if I query storage, I see the blobs but the metadata is missing:
(in the below I know blobItems[0] had metadata when uploaded, but now blobItems[0].Metadata.Count == 0)
var blobItems = container.ListBlobs(
    null, false, BlobListingDetails.Metadata);
I also noticed the Metadata is not available when I obtain the blob by itself:
CloudBlockBlob a = container.GetBlockBlobReference("myblob.jpg");
//Below throws an exception
var b = a.Metadata["Title"];
Thank you!
There are some issues with your code :(.
The blob doesn't actually have any metadata set. After setting the metadata, you're calling the blob.SetProperties() method, which only sets the blob's properties (ContentType in your example). To set the metadata, you would actually need to call the blob.SetMetadata() method.
Your upload code is currently making 2 calls to the storage service: 1) upload blob and 2) set properties. If you called SetMetadata, it would be 3 calls. IMHO, these can be combined into just 1 call to the storage service by doing something like below:
using (var fileStream = System.IO.File.OpenRead(filePath))
{
    blockBlob.Properties.ContentType = "image/jpg";
    blockBlob.Metadata.Add("Title", "Yellow Pear");
    blockBlob.UploadFromStream(fileStream);
}
This will not only upload the blob but also set its properties and metadata in a single call to the storage service.
Regarding:
I also noticed the Metadata is not available when I obtain the blob by itself:
CloudBlockBlob a = container.GetBlockBlobReference("myblob.jpg");
//Below throws an exception
var b = a.Metadata["Title"];
Basically, the code above just creates an instance of the blob on the client side; it doesn't actually fetch the blob's properties (and metadata). To fetch details about the blob, you need to call the FetchAttributes method on it. Something like:
CloudBlockBlob a = container.GetBlockBlobReference("myblob.jpg");
a.FetchAttributes();
If you retrieve the blob's metadata after that, you should be able to see it (provided the metadata was set properly).
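Once the metadata has been set correctly, the listing call from your question will return it as well. A small sketch of reading it back (using the same "Title" key as your example):
// BlobListingDetails.Metadata asks the service to include metadata in the listing response
foreach (IListBlobItem item in container.ListBlobs(null, false, BlobListingDetails.Metadata))
{
    if (item is CloudBlockBlob blob && blob.Metadata.TryGetValue("Title", out var title))
    {
        Console.WriteLine(blob.Name + ": " + title);
    }
}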
