How to target a local path with TrainCustomModelAsync in Form Recognizer - Azure

Can someone explain to me how TrainModelAsync can access a local path on Windows as the source files?
The documentation says:
The request must include a source parameter that is either an externally accessible Azure storage blob container Uri (preferably a Shared Access Signature Uri) or valid path to a data folder in a locally mounted drive. When local paths are specified, they must follow the Linux/Unix path format and be an absolute path rooted to the input mount configuration setting value e.g., if '' configuration setting value is '/input' then a valid source path would be '/input/contosodataset'. All data to be trained is expected to be under the source folder or sub folders under it. Models are trained using documents that are of the following content type - 'application/pdf', 'image/jpeg', 'image/png', 'image/tiff'. Other type of content is ignored.
What is the valid format if, for example, I have the training files in C:\input\ ?
What is the input mount configuration setting value?
Here is my code (it runs successfully if I set the source property to a blob storage URL):
var client = new HttpClient();
var uri = "https://MYRESOURCENAME.cognitiveservices.azure.com/formrecognizer/v2.0-preview/custom/models/";

// Request headers
client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", ENDPOINT_KEY);

var body = new
{
    // AbsolutePath yields a Windows-style path such as "C:/train/", which the
    // service rejects; it expects a blob container SAS URL or an input-mount path.
    source = new Uri("C:\\train\\").AbsolutePath,
    sourceFilter = new
    {
        prefix = "",
        includeSubFolders = false
    },
    useLabelFile = true
};

StringContent stringContent = new StringContent(JsonConvert.SerializeObject(body), Encoding.UTF8, "application/json");
var response = await client.PostAsync(uri, stringContent);

The local path option only applies when you run the Form Recognizer service as a container in your own Docker/Kubernetes environment. The hosted Form Recognizer service can only read training data from an Azure Blob Container URL.
That said, local containers are currently only available for the older v1.0-preview; see the sketch below. You can read more about the v1.0-preview container at https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/form-recognizer-container-howto
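For completeness: if you did run that container yourself and mounted C:\input from the Windows host as /input inside the container (e.g. via Docker's volume mapping), the request body would use the Linux-style path described in the documentation you quoted, not the Windows path. A rough sketch, with the path purely illustrative:

var body = new
{
    // Absolute path rooted at the input mount inside the container,
    // e.g. C:\input on the host mapped to /input in the container.
    source = "/input",
    sourceFilter = new
    {
        prefix = "",
        includeSubFolders = false
    },
    useLabelFile = true
};
// Against the hosted endpoint, source must instead be an Azure Blob
// container URL, preferably a Shared Access Signature (SAS) URL.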

Related

Write file and then upload to cloud storage - NodeJS

I'm trying to upload a file to my bucket after it's written, but I'm not sure how to do it.
I confirm that the code that writes the file is OK, as I tested it locally and it works normally.
bucket.upload doesn't seem to work, as the file is only saved locally.
bucket.file.save is also not working.
The file is saved at "./public/fileName.xlsx".
When I use:
storage.bucket("bucketName").file("bucketFileName").save("./public/fileName.xlsx")
there is indeed a file uploaded to the storage, but its content is the path string that I'm passing inside .save().
So to summarize, my question is: how do I write a file and then upload it to my bucket?
PS: the file is an Excel worksheet.
If you confirmed that the file is saved locally and just want to upload it to the bucket, you may refer to the sample code below. Note that file.save(data) writes the data you pass to it as the object's contents, which is why calling it with "./public/fileName.xlsx" uploads the path string itself; to upload an existing local file, bucket.upload(path) is the right call.
// Import the Google Cloud Storage client library.
const {Storage} = require('@google-cloud/storage');
const storage = new Storage();

// Change to your bucket name
const bucketName = 'bucket-name';

async function uploadFile(path, filename) {
  // Path where to save the file in Google Cloud Storage.
  const destFileName = `public/${filename}`;

  // Optional: set a generation-match precondition to avoid race conditions
  // and data corruption. The upload is aborted if the object's generation
  // number does not match the precondition. For a destination object that
  // does not yet exist, set ifGenerationMatch to 0; if it already exists,
  // use its current generation number instead.
  const generationMatchPrecondition = 0;

  const options = {
    destination: destFileName,
    preconditionOpts: {ifGenerationMatch: generationMatchPrecondition},
  };

  // The `path` here is the location of the local file that you want to upload.
  await storage.bucket(bucketName).upload(path, options);
  console.log(`${path} uploaded to ${bucketName}`);
}

uploadFile('./public/fileName.xlsx', 'fileName.xlsx').catch(console.error);
I added some comments to the sample code.
For more information, you may check this documentation.

Do you know what could be causing my Azure Function App to throw a 503 error after running for 30 seconds?

[FunctionName("FileShareDirRead02")]
public static async Task Run(
    [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
    ILogger log)
{
    // Get the contents of the POST and store them in local variables.
    string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
    dynamic data = JsonConvert.DeserializeObject(requestBody);

    // The following values are passed to the function through the HTTP POST,
    // or via parameters specified in the Data Factory pipeline.
    string storageAccount = data.storageAccount;            // Name of the storage account containing the file share to parse
    string fileshare = data.fileShare;                      // Name of the file share within the storage account
    string folderPath = data.folderPath;                    // No leading slash; treated as the ROOT of the file share. Parsing only goes down from here, never up.
    string keyVaultName = data.keyvaultName;                // Name of the key vault where the storage SAS token is stored
    int daysoldbeforeDeletion = data.daysoldbeforeDeletion; // Number of days old a file must be before it is deleted from the file share
    string nameofSASToken = data.nameofsasToken;            // Name of the SAS token created through PowerShell prior to execution of this function

    string storageAccountSAS = storageAccount + "-" + nameofSASToken; // Format of the storage account SAS secret name
    string kvUri = "https://" + keyVaultName + ".vault.azure.net/";   // URI to the key vault

    var client = new SecretClient(new Uri(kvUri), new DefaultAzureCredential()); // Instantiate a SecretClient using the key vault URI
    var storageKey1 = await client.GetSecretAsync(storageAccountSAS);            // Obtain the SAS token from the key vault
    string key = storageKey1.Value.Value; // Assign key the SIG value which is part of the SAS token
    key = key.Substring(1);               // Trim the leading question mark from the key value since it is not part of the key

    string connectionString = "FileEndpoint=https://" + storageAccount + ".file.core.windows.net/;SharedAccessSignature=" + key; // Connection string used when creating a ShareClient
    ShareClient share = new ShareClient(connectionString, fileshare); // Instantiate a ShareClient which will be used to manipulate the file share

    var folders = new List<Tuple<string, string>>(); // 2-tuple list that will collect the directory names and file names removed from the share
    ShareDirectoryClient directory = share.GetDirectoryClient(folderPath); // Get a reference to the directory supplied in the POST
    Queue<ShareDirectoryClient> remaining = new Queue<ShareDirectoryClient>(); // Track the remaining directories to walk, starting from the folder path provided in the POST
    remaining.Enqueue(directory);

    while (remaining.Count > 0) // Keep scanning until all folders and files have been evaluated
    {
        ShareDirectoryClient dir = remaining.Dequeue(); // Get all of the next directory's files and subdirectories
        if (dir.GetFilesAndDirectoriesAsync() != null)  // Make sure the folder path exists in the file share
        {
            // return new OkObjectResult("{\"childItems\":" + JsonConvert.SerializeObject(remaining.Count) + "}"); // Returns a list of all files which were removed from the file share
            await foreach (ShareFileItem item in dir.GetFilesAndDirectoriesAsync()) // For each directory and file
            {
                if (!(item.IsDirectory)) // Only execute the code below for files, not directories
                {
                    ShareFileClient fileClient = new ShareFileClient(connectionString, fileshare, dir.Path + "/" + item.Name); // Create the file client
                    if (fileClient.Exists())
                    {
                        ShareFileProperties properties = await fileClient.GetPropertiesAsync(); // Get the properties of the current file
                        DateTime convertedtime = properties.LastModified.DateTime; // Last modified date and time of the current file
                        DateTime date = DateTime.UtcNow;                           // Today's date and time
                        TimeSpan timeSpan = date.Subtract(convertedtime);          // Subtract the last modified date/time from today's date/time
                        int dayssincelastmodified = timeSpan.Days;                 // Number of days between the two dates
                        if (dayssincelastmodified > daysoldbeforeDeletion)
                        {
                            folders.Add(new Tuple<string, string>(item.Name, fileClient.Path)); // Add the directory name and file name to our 2-tuple list
                            fileClient.Delete(); // Delete the file from the share
                        }
                    }
                }
                if (item.IsDirectory) // Keep walking down directories
                {
                    remaining.Enqueue(dir.GetSubdirectoryClient(item.Name));
                }
            }
        }
    }
    return new OkObjectResult("{\"childItems\":" + JsonConvert.SerializeObject(folders) + "}"); // Returns a list of all files which were removed from the file share
}
I have written a function app using MS Visual Studio (C#) and published it to an Azure Function App. The app is very simple: it reads a directory and all subdirectories of a file share, looking for files that have not been modified in the last 90 days; if so, the files are deleted. The function works fine when reading a small set of directories and files, but when I run it on a directory with, say, 1000 or more files, the app crashes with a 503 error saying the service is not available and to check back later. I am using an App Service Plan (Standard). I thought maybe it was timing out, but this type of plan is not supposed to prevent an app from running, no matter how long it runs. To be sure, I put a line in my host.json file, "functionTimeout": "01:00:00", to make sure that was not the problem. I cannot find a single log entry that explains what is happening. Any ideas on how to debug this issue?
This problem is often caused by application-level issues, such as:
requests taking a long time
application using high memory/CPU
application crashing due to an exception.
It seems your function is taking too long to return an HTTP response. As mentioned in the documentation, 230 seconds is the maximum amount of time that an HTTP-triggered function can take to respond to a request, regardless of the function app timeout setting. Please refer to this.
Setting functionTimeout in host.json to one hour, as you did, controls how long the function itself may run, but it does not lift the 230-second limit on the HTTP response.
To debug this issue, refer to the azure-app-service-troubleshoot-http-502-http-503 MSFT documentation and follow the troubleshooting steps provided.
For longer processing times, use the Azure Durable Functions async HTTP pattern, as sketched below. Refer to this MS Doc.
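A minimal sketch of that async HTTP pattern for this cleanup job might look like the following (it requires the Microsoft.Azure.WebJobs.Extensions.DurableTask package; names such as FileCleanup_HttpStart, FileCleanupOrchestrator, and DeleteOldFiles are hypothetical, not part of the original code):

[FunctionName("FileCleanup_HttpStart")]
public static async Task<HttpResponseMessage> HttpStart(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
    [DurableClient] IDurableOrchestrationClient starter)
{
    // Start the long-running work and return 202 Accepted right away, together with
    // status-query URLs, so the 230-second HTTP response limit no longer applies.
    // (In a real implementation you would read the POST body and pass it along as
    // the orchestration input.)
    string instanceId = await starter.StartNewAsync("FileCleanupOrchestrator", null);
    return starter.CreateCheckStatusResponse(req, instanceId);
}

[FunctionName("FileCleanupOrchestrator")]
public static async Task<List<string>> RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    // Delegate the actual file-share walk to an activity function, which is not
    // bound by the HTTP response limit.
    return await context.CallActivityAsync<List<string>>("DeleteOldFiles", null);
}

[FunctionName("DeleteOldFiles")]
public static Task<List<string>> DeleteOldFiles([ActivityTrigger] object input, ILogger log)
{
    // Move the existing file-share scanning and deletion code from Run() here and
    // return the list of deleted file paths. Placeholder result for this sketch:
    return Task.FromResult(new List<string>());
}

The caller (for example a Data Factory pipeline) then polls the status URL returned by the starter until the orchestration completes.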

Cannot set the ContentType of a blob in Azure Data Lake Gen2 in csharp

I'm using the Azure.Storage.Files.DataLake NuGet package to write and append files on my Azure Storage account that is enabled for Data Lake Gen2 (including hierarchical namespace).
However, I need to set the content type of a new file I create, and I don't succeed in it, although I thought I had written things correctly. Here's my code:
public async Task<bool> WriteBytes(string fileName, byte[] recordContent, string contentType)
{
    var directory = await CreateDirectoryIfNotExists();
    var fileClient = (await directory.CreateFileAsync(fileName,
        new PathHttpHeaders
        {
            ContentType = contentType
        })).Value;

    long recordSize = recordContent.Length;
    var recordStream = new MemoryStream(recordContent);
    await fileClient.AppendAsync(recordStream, offset: 0);
    await fileClient.FlushAsync(position: recordSize);
    return true;
}
The result after executing the above code is that the default content type (application/octet-stream) is kept.
Thanks for any insights.
I am able to reproduce the behavior you're seeing. Basically, the issue is with the fileClient.FlushAsync() method. Before calling this method, I checked the content type of the file and it was set properly; after executing this method, the content type was changed to application/octet-stream (which is the default).
Looking at the documentation for this method here, you can also set the headers in this call. I tried doing the same and the content type changed to the desired one.
var headers = new PathHttpHeaders()
{
    ContentType = "text/plain"
};
await fileClient.FlushAsync(position: recordSize, httpHeaders: headers);
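Putting this together, a corrected version of the WriteBytes method from the question might look roughly like this (a sketch that keeps the original assumptions, e.g. that CreateDirectoryIfNotExists returns a DataLakeDirectoryClient):

public async Task<bool> WriteBytes(string fileName, byte[] recordContent, string contentType)
{
    var directory = await CreateDirectoryIfNotExists();
    var fileClient = (await directory.CreateFileAsync(fileName)).Value;

    long recordSize = recordContent.Length;
    using (var recordStream = new MemoryStream(recordContent))
    {
        await fileClient.AppendAsync(recordStream, offset: 0);
    }

    // FlushAsync would otherwise reset the headers, so pass the content type here.
    var headers = new PathHttpHeaders { ContentType = contentType };
    await fileClient.FlushAsync(position: recordSize, httpHeaders: headers);
    return true;
}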

Azure Blob Storage - Content Type Setting on Download and Upload

I am trying to upload a word document to blob storage using C#. The code snippet is as below:
var blobServiceClient = new BlobServiceClient(connectionString);
var containerClient = blobServiceClient.GetBlobContainerClient(container);
containerClient.CreateIfNotExists(PublicAccessType.None);
var blobClient = containerClient.GetBlobClient(documentName);
using (var memoryStream = new MemoryStream({Binary of the File}))
{
    var headers = new BlobHttpHeaders() { ContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document" };
    await blobClient.UploadAsync(memoryStream, httpHeaders: headers).ConfigureAwait(false);
}
The document gets uploaded to Blob Storage successfully, and I can see that the content type is set properly when looking through Azure Storage Explorer. But when I try to access this document (using the document URL) through a browser (Chrome), it downloads the file as an unknown file.
I tried the same thing by uploading a Word document through Azure Storage Explorer; that file gets downloaded as a Word document when downloading it through the browser.
Any idea what is going wrong?
The Content-Type header is stored as a property on the blob, so it is set in the headers of the download response; that is how the browser recognizes the file as a Word document when downloading it.
So when you upload a Word file, it is necessary to set the ContentType property for the blob via the BlobBaseClient.SetHttpHeaders(BlobHttpHeaders, BlobRequestConditions, CancellationToken) method or the BlobBaseClient.SetHttpHeadersAsync(BlobHttpHeaders, BlobRequestConditions, CancellationToken) method.
Then, if you serve the download from your own server app, write the Content-Type header of the download response yourself using the contentType value read back as below:
Response<BlobProperties> response = await blobClient.GetPropertiesAsync();
var contentType = response.Value.ContentType;
Alternatively, download via a blob URL with a SAS token; that response includes the blob's ContentType property as the Content-Type header by default.
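For example, if a blob was already uploaded without the right content type, a minimal sketch of fixing it afterwards with SetHttpHeadersAsync could look like this (the container and blob names are illustrative):

var blobClient = new BlobClient(connectionString, "documents", "report.docx");

// Content-Type is stored as a blob property and returned as the
// Content-Type header on subsequent downloads.
await blobClient.SetHttpHeadersAsync(new BlobHttpHeaders
{
    ContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
});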

Azure Web App Bot - Access local resources

My Web App Bot should return images based on the request. The images are located in a Resources folder in the .csproj and are included through the project's build configuration.
The source code that sends the image to the user:
var imgMessage = context.MakeMessage();
var attachment = new Attachment();
attachment.ContentUrl = $"{HttpContext.Current.Request.Url.Scheme}://{HttpContext.Current.Request.Url.Authority}/Resources/{InvocationName}/{InvocationNameExtension}.jpg";
attachment.ContentType = "image/jpg";
attachment.Name = "Image";
imgMessage.Attachments.Add(attachment);
await context.PostAsync(imgMessage);
While it works locally, it doesn't work after it has been published to the Azure cloud, where the path is something like: https://xxxx.azurewebsites.net/Resources/img/Cafeteria.jpg
The FTP upload did include the file:
2>Adding file (xxxx\bin\Resources\img\Cafeteria.jpg).
The file is on the server, but it can't be accessed. How am I supposed to include an image located in the .csproj? I don't want to refer to an external URL because I don't want an external dependency.
Changed the Build Action to: "Embedded Resource".
string resourceFile = ResourceManager.FindResource(InvocationName, InvocationNameExtension);
string resourceFileExtension = ResourceManager.GetResourceExtension(resourceFile);
var attachment = new Attachment();
attachment.ContentUrl = BuildImageUrl(resourceFile, resourceFileExtension);
attachment.ContentType = $"image/{resourceFileExtension}";
private string ConvertToBase64(string resourceFile) => Convert.ToBase64String(ResourceManager.GetBytes(resourceFile));
private string BuildImageUrl(string resourceFile, string resourceFileExtension) => "data:image/" + resourceFileExtension + ";base64," + ConvertToBase64(resourceFile);
With this approach I send the content of the image directly to the user as a base64 data URI. Works like a charm.
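For reference, a helper along the lines of the ResourceManager used above could be sketched as follows (this class is hypothetical and not the exact implementation from the answer; it simply reads an embedded resource's bytes from the assembly):

using System.IO;
using System.Linq;
using System.Reflection;

public static class EmbeddedResourceReader
{
    // Returns the raw bytes of an embedded resource whose manifest name
    // ends with the given file name, e.g. "Cafeteria.jpg".
    public static byte[] GetBytes(string fileName)
    {
        var assembly = Assembly.GetExecutingAssembly();
        string resourceName = assembly.GetManifestResourceNames()
            .First(name => name.EndsWith(fileName, System.StringComparison.OrdinalIgnoreCase));

        using (var stream = assembly.GetManifestResourceStream(resourceName))
        using (var memory = new MemoryStream())
        {
            stream.CopyTo(memory);
            return memory.ToArray();
        }
    }
}

The data URI is then built just as in the answer above, e.g. "data:image/jpg;base64," + Convert.ToBase64String(EmbeddedResourceReader.GetBytes("Cafeteria.jpg")).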
