How to create a blob in node (server side) from a stream, a file or a base64 string? - node.js

I am trying to create a blob from a pdf I am creating from pdfmake so that I can send it to a remote api that only handles blobs.
This is how I get my PDF file:
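var fs = require('fs');
var docDefinition = { content: 'This is an sample PDF printed with pdfMake' };
// Assumes `printer` is a pdfmake PdfPrinter instance created earlier.
var pdfDoc = printer.createPdfKitDocument(docDefinition);
pdfDoc.pipe(fs.createWriteStream('./pdfs/test.pdf'));
pdfDoc.end();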
The above lines of code do produce a readable pdf.
Now how can I get a blob from there? I have tried many options (creating the blob from the stream with the blob-stream module, creating it from the file with fs, creating it from a base64 string with b64toBlob), but all of them at some point require the Blob constructor, for which I always get an error, even if I require the blob module:
TypeError: Blob is not a constructor
After some research I found that it seems that the Blob constructor is only supported client-side.
All the npm packages that I have found and which seem to deal with this issue seem to only work client-side: blob-stream, blob, blob-util, b64toBlob, etc.
So, how can I create a blob server-side on Node?
I don't understand why almost nobody else seems to need to create a blob server-side. The only thread I could find on the subject is this one.
According to that thread, apparently:
The Solution to this problem is to create a function which can convert between Array Buffers and Node Buffers. :)
Unfortunately this does not help me much as I clearly seem to lack some important knowledge here to be able to comprehend this.

Use the node-blob npm package:
const Blob = require('node-blob');
let myBlob = new Blob(["something"], { type: 'text/plain' });
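Recent Node.js versions also ship a Blob class of their own, exported from the built-in buffer module and available as a global in Node 18 and later, so a third-party package may not be needed. Below is a minimal sketch under that assumption. The helper names blobFromBase64 and blobFromStream are made up for illustration:

```javascript
const { Blob } = require('buffer'); // built-in; also a global in Node 18+

// Hypothetical helper: decode a base64 string and wrap the bytes in a Blob.
function blobFromBase64(base64, type) {
  return new Blob([Buffer.from(base64, 'base64')], { type });
}

// Hypothetical helper: collect a readable stream (for example the pdfmake
// document) into memory, then build one Blob from all the chunks.
async function blobFromStream(stream, type) {
  const chunks = [];
  for await (const chunk of stream) chunks.push(chunk);
  return new Blob(chunks, { type });
}

// 'JVBERi0xLjQ=' is the base64 encoding of the 8-byte string "%PDF-1.4".
const pdfBlob = blobFromBase64('JVBERi0xLjQ=', 'application/pdf');
console.log(pdfBlob.size, pdfBlob.type);
```

With pdfmake, the stream variant would be awaited on the document itself, after calling pdfDoc.end(), instead of piping it to a file. Whether the remote API accepts this Blob depends on the HTTP client used to send it.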

Related

Trying to use HttpClient.GetStreamAsync straight to the adls FileClient.UploadAsync

I have an Azure Function that will call an external API via HttpClient. The external API returns a JSON response. I want to save the response directly to an ADLS File.
My simplistic code is:
public async Task UploadFileBulk(Stream contentToUpload)
{
await this._theClient.FileClient.UploadAsync(contentToUpload);
}
The this._theClient is a simple wrapper class around the various Azure Data Lake classes such as DataLakeServiceClient, DataLakeFileSystemClient, DataLakeDirectoryClient, DataLakeFileClient.
I'm happy that this wrapper class works as I expect: I spin one up, set the service, filesystem and directory, and then a filename to create. I've used this wrapper class to create directories etc., so it works as I expect.
I am calling the above method as follows:
await dlw.UploadFileBulk(await this._httpClient.GetStreamAsync("<endpoint>"));
I see the file getting created in the Lake directory with the name I want. However, if I then download the file using Storage Explorer and try to open it in, say, VS Code, it's not in a recognisable format (I can "force" Code to open it, but it looks like a binary format to me).
If I sniff the traffic with fiddler I can see the content from the external API is JSON, content-type is application/json and the body shows in fiddler as JSON.
If I look at the calls to the ADLS endpoint I can see a PUT call followed by two PATCH calls.
The first PATCH call looks like it is the one sending the content, it has a content-header of application/octet-stream and the request body is the "binary looking content".
I am using HttpClient.GetStreamAsync as I don't want my Function to have to load the entire API payload into memory (some of the external API endpoints return very large files over 100mb). I am thinking I can "stream the response from the external API straight into ADLS".
Is there a way to change how the ADLS FileClient.UploadAsync(Stream stream) method works so I can tell it to upload the file as a JSON file with a content type of application/json?
EDIT:
So it turns out the external API was sending back zipped content. Once I added the following AutomaticDecompression code to my Function's startup, the files were uploaded to ADLS as expected.
public override void Configure(IFunctionsHostBuilder builder)
{
builder.Services.AddHttpClient("default", client =>
{
client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip, deflate");
}).ConfigurePrimaryHttpMessageHandler(() => new HttpClientHandler
{
AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
});
}
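The "binary looking" content is what a gzip-compressed JSON body looks like when it is stored without being decompressed first. The same idea can be sketched in Node.js with the built-in zlib module. The sample payload and the looksGzipped helper are made up for illustration:

```javascript
const zlib = require('zlib');

// Gzip streams start with the two magic bytes 0x1f 0x8b.
function looksGzipped(buffer) {
  return buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b;
}

// Stand-in for the compressed body the external API sends back.
const compressed = zlib.gzipSync(JSON.stringify({ id: 1, name: 'example' }));

// Decompress only when the payload is actually gzipped.
const body = looksGzipped(compressed) ? zlib.gunzipSync(compressed) : compressed;
const parsed = JSON.parse(body.toString('utf8'));
console.log(looksGzipped(compressed), parsed.name);
```

Checking the first two bytes of the downloaded file in this way is a quick test for whether a "corrupted" upload is really just compressed.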
@Gaurav Mantri has given me some pointers on whether the pattern of "streaming from an output to an input" is actually correct. I will research this further.
Regarding the issue, please refer to the following code
var uploadOptions = new DataLakeFileUploadOptions();
uploadOptions.HttpHeaders = new PathHttpHeaders();
uploadOptions.HttpHeaders.ContentType = "application/json";
await fileClient.UploadAsync(stream, uploadOptions);

Does Azure Blob Storage support partial content (206) by default?

I am using Azure Blob Storage to store all my images and videos. I have implemented the upload and fetch functionality, and it works quite well. I am facing one issue when loading videos: when I use the URL generated after uploading a video to Azure Blob Storage, the browser downloads all the content before rendering it to the user. So if the video is 100 MB, it downloads all 100 MB, and until then the user can't see the video.
I have done a lot of research and learned that, when rendering the video, I need to fetch partial content (status 206) rather than the whole video at once. After adding the request header "Range:bytes-500", I tried to hit the blob URL, but it still downloads the whole content. I then tried some open source video URLs with the same "Range" request header, and they successfully returned a 206 response status, which means they gave me partial content instead of the full video.
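For reference, a byte-range request header has the form Range: bytes=<start>-<end>, with inclusive offsets, and a 206 response carries a matching Content-Range header. Below is a minimal sketch of building and checking those values. The helper names are made up for illustration:

```javascript
// Build a Range header value for an inclusive byte range, e.g. bytes=0-499.
// Omitting `end` asks for everything from `start` to the end of the resource.
function rangeHeader(start, end) {
  return end === undefined ? `bytes=${start}-` : `bytes=${start}-${end}`;
}

// Parse a Content-Range response header such as "bytes 0-499/1234".
// Returns null when the value does not match that form.
function parseContentRange(value) {
  const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value);
  if (!match) return null;
  return {
    start: Number(match[1]),
    end: Number(match[2]),
    total: match[3] === '*' ? null : Number(match[3]),
  };
}

console.log(rangeHeader(0, 499));
console.log(parseContentRange('bytes 0-499/1234'));
```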
I read on some forums that Azure Storage supports partial content and that it needs to be enabled in the properties. But I have checked all the options under the Azure storage account and didn't find anything that enables this functionality.
Can anyone please help me resolve this, or tell me if there's anything in the Azure portal that I need to enable? I have been researching this for a week now. Any help would be really appreciated.
Thank you! Stay safe.
It seems that Accept-Ranges is not enabled. According to this blog, you need to set the default version of the service.
Below is a sample code to implement it.
var credentials = new StorageCredentials("account name", "account key");
var account = new CloudStorageAccount(credentials, true);
var client = account.CreateCloudBlobClient();
var properties = client.GetServiceProperties();
properties.DefaultServiceVersion = "2019-07-07";
client.SetServiceProperties(properties);
Below is a comparison of the response headers before and after setting the property (screenshots not included).
Assuming the video content is MPEG-4, the issue may be that the moov atom needs to be moved from the end of the file to the beginning. The browser won't render the video until it finds the moov atom, so make sure the atom is at the start of the file. This can be done with ffmpeg's "faststart" option (-movflags +faststart). Here's a good article with more detail: HERE
You just need to update your Azure Storage version. It will work automatically after the update.
Using Azure CLI
Just run:
az storage account blob-service-properties update --default-service-version 2021-08-06 -n yourStorageAccountName -g yourStorageResourceGroupName
List of available versions:
https://learn.microsoft.com/en-us/rest/api/storageservices/previous-azure-storage-service-versions
To see your current version, open a file and inspect the x-ms-version header
The following is the SDK code I used to download the contents:
var container = new BlobContainerClient("UseDevelopmentStorage=true", "sample-container");
await container.CreateIfNotExistsAsync();
BlobClient blobClient = container.GetBlobClient(fileName);
Stream stream = new MemoryStream();
var result = await blobClient.DownloadToAsync(stream, cancellationToken: ct);
This does download the whole file right away. The solutions in the other answers seem to reference a different SDK. For the SDK that I use, the solution is to use the OpenReadAsync method:
long kBytesToReadAtOnce = 300;
long bytesToReadAtOnce = kBytesToReadAtOnce * 1024;
//int mbBytesToReadAtOnce = 1;
var result = await blobClient.OpenReadAsync(0, bufferSize: (int)bytesToReadAtOnce, cancellationToken: ct);
By default it fetches 4 MB of data at a time, so you have to override the value with a smaller amount if you want your app to have a smaller memory footprint.
I think that internally the SDK sends the requests with the byte range already set. So all you have to do is enable the partial content support in Web API like this:
return new FileStreamResult(result, contentType)
{
EnableRangeProcessing = true,
};

How to Download a File (from URL) in Typescript

Update: This question used to ask about Google Cloud Storage, but I have since realized the issue is reproducible merely by trying to save the download to local disk. Thus, I am rephrasing the question to be entirely about file downloads in Typescript and to no longer mention Google Cloud Storage.
When attempting to download and save a file in Typescript with WebRequests (though I experienced the same issue with requests and request-promises), all the code seems to execute correctly, but the resultant file is corrupted and cannot be viewed. For example, if I download an image, the file is not viewable in any applications.
// Seems to work correctly
const download = await WebRequest.get(imageUrl);
// `Buffer.from()` also takes an `encoding` parameter, but it's unclear how to determine the encoding of a download
const imageBuffer = Buffer.from(download.content);
// I *think* this line is straightforward
const imageByteArray = new Uint8Array(imageBuffer);
// Saves a corrupted file
const file = fs.writeFileSync("/path/to/file.png", imageByteArray);
I suspect the issue lies within the Buffer.from call not correctly interpreting the downloaded content, but I'm not sure how to do it right. Any help would be greatly appreciated.
Thanks so much!
From what I saw in the examples for web-request, download.content is just a string. If you want to upload a string to Cloud Storage using the Node SDK, you can use File.save, passing that string directly.
Alternatively, you could use one of the solutions seen here.
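The corruption is consistent with binary data being decoded as text somewhere along the way: bytes that are not valid UTF-8 do not survive a round trip through a string. Below is a minimal sketch, assuming Node.js 18 or later (for the built-in fetch) and plain JavaScript rather than the WebRequest library. downloadToFile is a hypothetical helper:

```javascript
const fs = require('fs');

// Returns true when the bytes come back unchanged after being decoded to a
// UTF-8 string and re-encoded, i.e. when treating them as text is harmless.
function survivesUtf8RoundTrip(bytes) {
  return Buffer.from(bytes.toString('utf8'), 'utf8').equals(bytes);
}

// The first eight bytes of every PNG file; 0x89 is not valid UTF-8 on its own.
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(survivesUtf8RoundTrip(pngSignature));

// Hypothetical helper: fetch the raw bytes and write them to disk without
// any intermediate string conversion.
async function downloadToFile(url, path) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Download failed: ${response.status}`);
  const bytes = Buffer.from(await response.arrayBuffer());
  await fs.promises.writeFile(path, bytes);
}
```

The key point is that the response body is read as an ArrayBuffer and never passes through a string, so no encoding has to be guessed.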

Read content from Azure blob storage in node API

I am new to Azure and working on the storage account for one of my applications. Basically, I have JSON files stored in Azure Blob Storage.
I want to read the data from these files in a Node.js application and do some filtering on it, then expose the result through a secured REST endpoint that returns the data to the UI/client as an HTTP response.
I have gone through the docs on the different blob storage operations exposed by the Node SDK, which can be found at the link below:
https://github.com/Azure/azure-storage-node
But the question I have is how to read the JSON files. I see one method, getBlobToStream. Will it give me the JSON content in the callback, so that I can process the data further and send it as a response to the clients who requested it?
Could someone please explain how to do this in a better way, or is this the only option we have?
Thanks for the help.
To use getBlobToStream, you have to define a writable stream.
So I recommend using getBlobToText instead to avoid the trouble.
If no error occurs, this method passes the blob content to the callback as text. You can then parse it into a JavaScript object with JSON.parse. A simple example is below.
blobService.getBlobToText(container, blobname, function(error, text){
if(error){
console.error(error);
res.status(500).send('Fail to download blob');
} else {
var data = JSON.parse(text);
res.status(200).send('Filtered Data you want to send back');
}
});
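The filtering step left as a placeholder in the example could look like the following. This is only a sketch: the record shape (an array of objects with a status field) and the helper name filterBlobJson are assumptions and not part of the SDK:

```javascript
// Hypothetical helper: parse the blob text and keep only matching records.
function filterBlobJson(text, predicate) {
  const data = JSON.parse(text); // throws SyntaxError on malformed JSON
  if (!Array.isArray(data)) throw new TypeError('Expected a JSON array');
  return data.filter(predicate);
}

// Stand-in for the `text` argument passed to the getBlobToText callback.
const text = '[{"id":1,"status":"active"},{"id":2,"status":"archived"}]';
const active = filterBlobJson(text, (record) => record.status === 'active');
console.log(JSON.stringify(active));
```

Inside the callback, the filtered array would be sent back in place of the placeholder string, for example with res.status(200).json(active) if res is an Express response. Wrapping the call in try/catch lets malformed blob content be answered with an error status instead of crashing the handler.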

does azure blob storage use gzip across the wire

I want to know if there is a benefit to zipping files before sending them to Azure Blob Storage - strictly for transfer purposes. Put another way, will pre-zipping files make file transfers any faster when going to/from blob storage? Or does this automatically happen at the transport level by using gzip?
As of 12th August 2015 Azure blob storage (when mounted to the CDN) now supports automatic GZip compression.
Compression method - Supported compression methods are
gzip/deflate/bzip2, a supported method must be set in the
Accept-Encoding Request Header.
Improve performance by compressing files
UPDATE
I'm unsure of what and how I originally did this, but all I can think is that I was looking at the results incorrectly. Everything I can read about azure (from MSDN, to the code itself) is now telling me that Azure does not support gzip for transfer purposes. I do not know under what circumstances I was able to get the following results and am unable to reproduce them now. Needless to say, I'm very disappointed.
(THIS ANSWER IS INCORRECT, SEE THE UPDATE ABOVE) The answer is no, there is no benefit for transfer speed purposes to zip a file before sending to blob storage. By turning on Fiddler, you can see that the transport level automatically gzips content across the wire. Screenshots below confirm this:
Edit 1 - Quick Clarifications for Gaurav
The byte array that comes back in code has a length of 386803, but the network card only saw 23505 bytes go by, because it was gzipped by Azure in the response. I didn't have to do anything for that to happen.
Here is the code I'm using to initiate the request from Blob Storage
public Byte[] Read(string containerName, string filename)
{
CheckContainer(containerName);
Initialize();
// Retrieve reference to a previously created container.
CloudBlobContainer container = _blobClient.GetContainerReference(containerName);
// Retrieve reference to the blob with the given filename.
CloudBlockBlob blockBlob = container.GetBlockBlobReference(filename);
byte[] buffer;
// Download the blob contents into an in-memory stream and copy them to a byte array.
using (var stream = new MemoryStream())
{
blockBlob.DownloadToStream(stream);
stream.Seek(0, SeekOrigin.Begin);
buffer = new byte[stream.Length];
stream.Read(buffer, 0, (int)stream.Length);
}
return buffer;
}
