Writing and Reading to Local Storage in Azure WebJobs

I need to use local storage in an Azure WebJob (continuous, if it matters). What is the recommended path for this? I want the storage to be as long-lasting as possible, so I do not want a Temp directory. I am well aware that local storage in Azure always needs to be backed by Blob storage or similar, which I will already be handling.
(To preempt a question on that last part: this is a large file that changes infrequently (maybe once per week) that I want to cache in local storage for much faster startup times. When it is not there, or is out of date (which I will handle checking), it will be downloaded from the source blob, and so forth.)
Related questions like Accessing Local Storage in Azure don't specifically apply to a WebJob. However, this question is vitally connected, but 1) the answer relies on using Server.MapPath, which is a System.Web-dependent solution I think, and 2) I don't find this answer to have any research or definitive basis (though it is probably a good guess for the best solution). It would be nice if the Azure team gave more direction on this important issue; we're talking about nothing less than usage of the local hard drive.
Here are some Environment variables worth considering, though I don't know which to use:
Environment.CurrentDirectory: D:\local\Temp\jobs\continuous\webjobname123\idididid.id0
[PUBLIC, D:\Users\Public]
[ALLUSERSPROFILE, D:\local\ProgramData]
[LOCALAPPDATA, D:\local\LocalAppData]
[ProgramData, D:\local\ProgramData]
[WEBJOBS_PATH, D:\local\Temp\jobs\continuous\webjobname123\idididid.id0]
[SystemDrive, D:]
[LOCAL_EXPANDED, C:\DWASFiles\Sites\#1appservicename123]
[WEBSITE_SITE_NAME, webjobname123]
[USERPROFILE, D:\local\UserProfile]
[USERNAME, RD00333D444333$]
[WEBSITE_OWNER_NAME, asdf1234-asdf-1234-asdf-1234asdf1234+eastuswebspace]
[APP_POOL_CONFIG, C:\DWASFiles\Sites\#1appservicename123\Config\applicationhost.config]
[WEBJOBS_NAME, webjobname123]
[APPSETTING_WEBSITE_SITE_NAME, webjobname123]
[WEBROOT_PATH, D:\home\site\wwwroot]
[TMP, D:\local\Temp]
[COMPUTERNAME, RD00333D444333]
[HOME_EXPANDED, C:\DWASFiles\Sites\#1appservicename123\VirtualDirectory0]
[APPDATA, D:\local\AppData]
[WEBSITE_INSTANCE_ID, asdf1234asdf134asdf1234asdf1234asdf1234asdf1234asdf12345asdf12342]
[HOMEPATH, \home]
[WEBJOBS_SHUTDOWN_FILE, D:\local\Temp\JobsShutdown\continuous\webjobname123\asdf1234.pfs]
[WEBJOBS_DATA_PATH, D:\home\data\jobs\continuous\webjobname123]
[HOME, D:\home]
[TEMP, D:\local\Temp]

Using the %HOME% environment variable as a base path works nicely for me. I use a subfolder to store job-specific data, but any other folder structure on top of this base path is also valid. For more details, take a look at https://github.com/projectkudu/kudu/wiki/Understanding-the-Azure-App-Service-file-system and https://github.com/projectkudu/kudu/wiki/File-structure-on-azure
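As a minimal sketch (assuming a Node.js WebJob; the folder and file names below are placeholders, not part of the original question), the cache path can be derived from %HOME%, which maps to the durable D:\home area, unlike the per-instance D:\local paths such as %TEMP% and %WEBJOBS_PATH%:

```js
// Minimal sketch for a Node.js WebJob; folder and file names are placeholders.
const fs = require('fs');
const path = require('path');

const home = process.env.HOME;                        // e.g. D:\home on App Service
const cacheDir = path.join(home, 'data', 'jobcache'); // persists, unlike %TEMP% under D:\local
const cachedFile = path.join(cacheDir, 'large-dataset.bin');

fs.mkdirSync(cacheDir, { recursive: true });

if (!fs.existsSync(cachedFile)) {
  // Not cached yet (or your own freshness check decided it is stale):
  // download from the source blob here, then write it to cachedFile.
}
```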

Related

Archive not found in the storage location: Google Function

I have a running Google Function which I use in my code, and it works fine.
But when I go to the Google Function to see the source code, it shows:
Archive not found in the storage location
Why can't I see my source code? What should I do?
Runtime: Node.js 10
There are two possible reasons:
You may have deleted the source bucket in Google Cloud Storage. Have you perhaps deleted a GCS bucket named like gcf-sources-xxxxxx? That is the source storage where your code is archived. If you are sure you have deleted the source bucket, there is no way to restore your source code.
Much more likely, though, is that you did not delete anything but instead renamed the bucket, for example by choosing a nearby city in the location settings. If the GCS bucket's region does not match your Cloud Function's region, the error is thrown. You should check both services' regions.
You can check the Cloud Function's region at details -> general information.
This error had appeared before, when I browsed the Google Storage location used by the Cloud Function without deleting anything there. It might have happened, though, that I changed the location/city of the bucket to Region MY_REGION (MY_CITY). In my case, the CF was likely already in the chosen region, so bullet point 2 of the other answer above probably does not cover the whole issue.
I guess a third point could be added to the list:
+ 3. If you choose a region for the first time, the bucket name gets a new suffix that was not there before, that is, it goes from gcf-sources-XXXXXXXXXXXX to gcf-sources-XXXXXXXXXXXX-MY_REGION. The CF is then no longer able to find its source code at the old bucket address. That would explain this first error.
First error put aside, the error in question has now appeared again, and this time I have not done anything apart from running into Google app engine deployment fails- Error while finding module specification for 'pip' (AttributeError: module '__main__' has no attribute '__file__'). I left it alone for two days, doing nothing, only to get the error in question afterwards. Thus, it seems you can sometimes just lose your deployed script out of nowhere; better keep a backup before each deployment.
Solution:
Create a new Cloud Function, or
edit the existing Cloud Function: choose Inline Editor as the source, create the default files for the Node.js 10 runtime manually, and fill them with your backup code.

Storing files on a datastore and getting back their location to store for later retrieval

I will need to store photos on a datastore and store their location in Mongo so I can retrieve them later for display. What is a good way to go about doing this? If I just google, I end up getting information on accessing the files in Node itself...
At this point I am not sure what datastore I will use; for now it will just be another server I have, running Ubuntu Server.
I made something like what you need. I had to store photos on S3 and store the path in Mongo. If you use Amazon too, they provide an SDK that contains all the functions you need for interacting with the cloud. You'll also need a way to work with asynchronous tasks when storing the data in the cloud; I used the async module to manage the functions. I hope I gave you some direction on where to go. I can't help you with code, since I don't know your business rules. Good luck!
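A minimal sketch of that pattern, assuming the AWS SDK v2 and the official MongoDB driver with async/await in place of the async module; the bucket name, database, collection, and connection string are placeholders:

```js
// Hedged sketch: upload a photo to S3 and store its location in MongoDB.
const fs = require('fs');
const AWS = require('aws-sdk');                 // AWS SDK v2
const { MongoClient } = require('mongodb');

const s3 = new AWS.S3();

async function savePhoto(localPath, key) {
  // 1. Push the file to S3.
  const result = await s3.upload({
    Bucket: 'my-photo-bucket',                  // placeholder bucket
    Key: key,
    Body: fs.createReadStream(localPath),
  }).promise();

  // 2. Store the returned location in Mongo for later retrieval.
  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    await client.db('app').collection('photos').insertOne({
      key,
      url: result.Location,                     // URL returned by S3
      uploadedAt: new Date(),
    });
  } finally {
    await client.close();
  }
}
```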

For a web app that allows simple image uploads, how should I store the images? Confused about file system vs. cdn

Every search result says something about storing the images in the file system but store the paths in the database, but I'm not sure exactly what "file system" means. Would that mean you have something like:
/public (assets)
    /js
    /css
    /img
/app (frontend)
/server (backend)
and you'd upload directly to that /public/img directory?
I remember trying something like that in the past with a Node.js app hosted on Heroku, and it wouldn't let me. I had to set up Amazon S3 and upload the images THERE, which leads to my confusion.
Is using something like Amazon S3 the usual practice or do people upload directly to the /img directory (assuming this is the "file system"?) and it just happened to be the case that Heroku doesn't allow this but other hosts do?
I'd characterize the pattern as "store the data in a blob storage service, store a pointer in your database". The uploaded file is the "blob" - once it has left the user's computer and filesystem, is it really a file anymore? :) On the server, a file system can store that "blob". S3 can store that blob. In the first case, you are storing a path. In the second case, you are storing the URL to the S3 object. A database could even store that blob (not at all recommended, though...)
In any case, the question to ask is: "what happens when I need two app servers to support my traffic?". Wherever that blob goes, both app servers need access to it.
In a data center under your control, there are many ways to share a filesystem across servers - network attached storage (NFS- or SMB-mounted volumes), or storage area networks (iSCSI, Fibre Channel). With more limited network/hardware configuration options in cloud-based Infrastructure/Platform-as-a-Service providers, the de facto standard is S3 because it is inexpensive, reliable, easy to use, and can completely offload serving the file from your servers.
For Heroku, though, you don't have much control over the file system. And, know that the file system for each of your dynos is "ephemeral" - it goes away when the dyno restarts. Which will happen when your app goes idle, or every 24 hours, whichever comes first. So that forces the choice a little.
Final point - S3 comes with the ancillary benefit of taking the burden of serving the blob off of your servers. You can also upload files directly to S3 from the browser, without routing them through your app (see https://devcenter.heroku.com/articles/s3-upload-node). The benefit in both cases is that those downloads/uploads would otherwise take up lots of your application's precious time for stuff that's pretty rote.
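As a rough sketch of that direct-from-browser approach (assuming the AWS SDK v2; the bucket name and expiry are placeholders), the server only hands out a short-lived presigned PUT URL and the browser sends the file straight to S3:

```js
// Hedged sketch: issue a presigned PUT URL so the browser uploads straight to S3.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

function getUploadUrl(key, contentType) {
  return s3.getSignedUrl('putObject', {
    Bucket: 'my-upload-bucket', // placeholder bucket
    Key: key,
    ContentType: contentType,
    Expires: 300,               // URL valid for 5 minutes
  });
}

// The browser then PUTs the file body to the returned URL,
// and the app only stores the resulting object key/URL in its database.
```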
Uploading directly to a host file system is generally not a best practice. This is one reason services like S3 are so popular.
If you're using the host file system and ever need more than one instance of a server, the file systems will grow out of sync. Imagine one user uploads 'foo.jpg' to server A (A/app/uploads) and another uploads 'bar.jpg' to server B (B/app/uploads). When either of these images is later requested, the request has a 50% chance of failing, depending on whether the load balancer routes the request to server A or server B.
There are several ancillary benefits to avoiding the host filesystem. For instance, you can set the filesystem serving your app to read-only for increased security. Files are a form of state, and stateless web servers allow you to do things like blow away one instance and deploy another instance to take over its work.
You might find this of help:
https://codeforgeek.com/2014/11/file-uploads-using-node-js/
I used multer in my Node.js server file to handle uploads from the front end. Basically I had an HTML form that would submit the image to the server file, where it would be handled by multer. This actually led to it being saved in the file system (to answer your question concretely: yes, this was to something like the /img directory right in your project file structure). My application is running on Heroku, and this feature works there as well. However, I would not recommend using the file system to store your images like this (I doubt you will have enough space for a large amount of images/files); using AWS storage or a DB would be better.
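For reference, a minimal sketch of that multer setup (the route, the field name 'photo', and the destination folder are placeholders; as noted above, on Heroku this storage is ephemeral):

```js
// Hedged sketch of the multer approach described above (Express + multer).
const express = require('express');
const multer = require('multer');

const app = express();
const upload = multer({ dest: 'public/img/' }); // files land in /public/img

app.post('/upload', upload.single('photo'), (req, res) => {
  // req.file.path is where multer wrote the file on the local file system
  res.json({ path: req.file.path });
});

app.listen(3000);
```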

Azure Storage copy an image from blob to blob

I am using Azure Storage Nodejs, and what I need to do is copy an image from one blob to another.
First I tried getBlobToFile to get the image to a temp location on disk and then createBlockBlobFromFile from that temp location. That method did the task, but for some reason the copy didn't complete in about 10% of cases.
Then I tried getBlobToText and put the result into createBlockBlobFromText; I also tried passing options to mark the blob as an image content type. That method failed completely: the image wouldn't even open after the copy.
Perhaps there is a way to copy a blob directly to another blob, but I didn't find that method.
What else can I do?
I'm not sure what your particular copy error is, but... with getBlobToLocalFile(), you're actually physically moving blob content from blob storage to your VM (or local machine), and then with createBlockBlobFromLocalFile() you're pushing the entire contents back to blob storage, which results in two physical network moves.
The Azure Storage service supports blob copy as a first-class operation. While it's available via a REST API call, it's also wrapped in the same SDK you're using, in the method BlobService.startCopyBlob() (source code here). This instructs the storage service to initiate an asynchronous copy operation entirely within the storage system (meaning no download + upload on your side). You'll be able to set source and destination, set timeouts, etc. (all parameters are fully documented in the source code).
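A small sketch of that server-side copy with the azure-storage Node SDK (container and blob names are placeholders; the connection string is assumed to come from the environment):

```js
// Hedged sketch of the server-side copy described above.
const azure = require('azure-storage');
const blobService = azure.createBlobService(); // uses AZURE_STORAGE_CONNECTION_STRING

// Build a URI for the source blob (append a SAS token if the container is private).
const sourceUri = blobService.getUrl('source-container', 'photo.jpg');

blobService.startCopyBlob(sourceUri, 'dest-container', 'photo-copy.jpg', (err, result) => {
  if (err) throw err;
  // The copy runs inside the storage service; the result reports the copy status.
  console.log('Copy started:', result.copy && result.copy.status);
});
```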
The link in the accepted answer is broken, although the method is correct: the method startCopyBlob is documented here
(Updated: Jan 3, 2020) https://learn.microsoft.com/en-us/javascript/api/azure-storage/BlobService?view=azure-node-latest#azure_storage_BlobService_createBlockBlobFromLocalFile

Azure Block Blob PUT fails when using HTTPS

I have written a Cygwin app that uploads Block Blobs (using the REST API PUT operation) to my Azure storage account, and it works well for blobs of various sizes when using HTTP. However, use of SSL (i.e. PUT over HTTPS) fails for blobs greater than 5.5 MB. Blobs smaller than 5.5 MB upload correctly. For anything larger, I find that the TCP session (as seen by Wireshark) reports a dwindling window size that goes to 0 once the aforementioned number of bytes has been transferred. The failure is repeatable and consistent. As a point of reference, PUT operations against my Google/AWS/HP cloud storage accounts work fine over HTTPS for various object sizes, which suggests the problem is not my client but something specific to the HTTPS implementation on the Microsoft Azure storage servers.
If I upload the 5.5 MB blob as two separate uploads of 4 MB and 1.5 MB followed by a Put Block List, the operation succeeds as long as the two uploads use separate HTTPS sessions. Notice the emphasis on separate. The same operation fails if I attempt to maintain a single HTTPS session across both uploads.
Any ideas on why I might be seeing this odd behavior with MS Azure? The same PUT operation over HTTPS works fine with AWS/Google/HP cloud storage servers.
Thank you for reporting this and we apologize for the inconvenience. We have managed to recreate the issue and have filed a bug. Unfortunately we cannot share a timeline for the fix at this time, but we will respond to this forum when the fix has been deployed. In the meantime, a plausible workaround (and a recommended best practice) is to break large uploads into smaller chunks (using the Put Block and Put Block List APIs), thus enabling the client to parallelize the upload.
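As an illustrative sketch of that chunked workaround (using the azure-storage Node SDK rather than the raw REST calls the question uses; the container, blob name, and chunk contents are placeholders):

```js
// Hedged sketch of the suggested workaround: Put Block + Put Block List.
const azure = require('azure-storage');
const blobService = azure.createBlobService();

const container = 'uploads';
const blobName = 'big-file.bin';
const chunks = ['...part1...', '...part2...'];   // pre-split file content (placeholders)

// Block IDs must be base64 strings of equal length within one blob.
const blockIds = chunks.map((_, i) =>
  Buffer.from(String(i).padStart(6, '0')).toString('base64'));

// Put Block: upload each chunk independently (these can run in parallel).
let pending = chunks.length;
chunks.forEach((chunk, i) => {
  blobService.createBlockFromText(blockIds[i], container, blobName, chunk, (err) => {
    if (err) throw err;
    if (--pending === 0) {
      // Put Block List: commit the blocks in order to form the final blob.
      blobService.commitBlocks(container, blobName, { LatestBlocks: blockIds }, (err2) => {
        if (err2) throw err2;
        console.log('Upload committed');
      });
    }
  });
});
```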
This bug has now been fixed and the operation should now complete as expected.