File operation on S3 file without download - node.js

I want to know the file type using the "file" command without downloading the file from S3 in Node.js.
So the aim is to only hit the file, run the "file" command on it, and retrieve its output. How can I achieve this without downloading?

Why not use the Node.js S3 SDK to get the file details? There is a method, headObject, that returns the object metadata without fetching the body. You can check the object's ContentType to know its file type; it is returned in the response JSON when you request an object's metadata.
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#headObject-property
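For illustration, here is the same HEAD-style metadata lookup sketched with boto3 in Python rather than the Node.js SDK documented at the link above; the bucket and key names are placeholders.

# Sketch only: fetch S3 object metadata with an HTTP HEAD, using boto3.
# Bucket and key below are placeholders.
import boto3

s3 = boto3.client("s3")

# head_object retrieves only the metadata; the object body is never downloaded.
resp = s3.head_object(Bucket="my-bucket", Key="path/to/file.pdf")

print(resp["ContentType"])    # e.g. "application/pdf"
print(resp["ContentLength"])  # size in bytes

Note that ContentType is whatever was set when the object was uploaded, so this reports the declared type rather than sniffing the content the way the "file" command does.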

Related

How do I sniff a file's MIME type in Node.js?

I am creating an API service using Node.js and Express, and to parse multipart requests I am using Multer. I have to receive a PDF file from a REST request; it comes in as multipart/form-data, and to verify the file type I am checking file.mimetype.
Now the issue is that if someone changes the file extension, say from .mp4 to .pdf, the mimetype also changes to PDF and the file is accepted as a PDF. So how do I sniff the MIME type from the content? Is there any way to do this without new packages? I really need to do it without using any packages, as it would take a really long time to get a new package approved.

pd.read_parquet produces: OSError: Passed non-file path

I'm trying to retrieve a DataFrame with pd.read_parquet and I get the following error:
OSError: Passed non-file path: my/path
I access the .parquet file in GCS with a path prefixed with gs://.
For some unknown reason, the OSError shows the path without the gs:// prefix.
I did suspect it was related to credentials; however, from my Mac I can use gsutil with no problem, and I can access and download the parquet file via the Google Cloud Platform. I just can't read the .parquet file directly from the gs:// path. Any ideas why not?
Arrow does not currently natively support loading data from GCS. There is some progress on implementing this; the tasks are tracked in JIRA.
As a workaround, you should be able to use the fsspec GCS filesystem implementation to access objects (I would try installing it and using 'gcs://' instead of 'gs://'), or open the file directly with the gcsfs API and pass it through, as sketched below.
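A minimal sketch of that workaround, assuming gcsfs is installed and default Google credentials are available; the bucket and object paths are placeholders.

# Sketch of the fsspec/gcsfs workaround; bucket and paths are placeholders.
import pandas as pd
import gcsfs

# Option 1: let pandas resolve the path through fsspec (requires gcsfs to be installed).
df = pd.read_parquet("gcs://my-bucket/my/path/data.parquet")

# Option 2: open the object with gcsfs explicitly and pass the file handle through,
# so Arrow never sees a gs:// path it cannot handle.
fs = gcsfs.GCSFileSystem()  # picks up default Google credentials
with fs.open("my-bucket/my/path/data.parquet", "rb") as f:
    df = pd.read_parquet(f)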

Can't create .zip file with Azure Logic Apps' SharePoint Create File action

I'm trying to create a .zip file by passing the returned body of an HTTP GET request to SharePoint's Create File.
Body is:
{
"$content-type": "application/zip",
"$content": "UEsDBBQACA...="
}
Shouldn't this just work? The docs only define the Create File field as "Content of the file." but that's not super informative...
I believe I've done this before with a file that was application/pdf and it worked. Unfortunately, I can't find that Logic App (I think it may have been an experiment I've since deleted).
I should note that the Create File action does create a valid .zip file, in that it's not corrupt, but the archive is empty. It's supposed to contain a single .csv file.
I tried decoding the Base64 content and it's definitely binary data.
Any idea where I'm going wrong?
I tested with Postman, and when I used the form-data way to POST the request, I found the .zip file couldn't be opened. Then I checked the Logic App run history and found that the problem is that using just triggerBody() as the file content fails.
This is because triggerBody() does not contain just the $content, so I changed the expression to triggerBody()['$multipart'][0]['body'], and then it works and the .zip file is complete.

Amazonka: How to generate an s3:// URI from Network.AWS.S3.Types.Object?

I've been using turtle to call "aws s3 ls" and I'm trying to figure out how to replace that with amazonka.
Absolute S3 URLs were central to how my program worked. I now know how to get objects and filter them, but I don't know how to convert an object to an S3 URL to integrate with my existing program.
I came across the getFile function and tried downloading a file from s3.
Perhaps I had something wrong, but it didn't seem like just the S3 bucket and object key were enough to download a given file. If I'm wrong about that, I need to double-check my configuration.

Setting Metadata in Google Cloud Storage (Export from BigQuery)

I am trying to update the metadata (programmatically, from Python) of several CSV/JSON files that are exported from BigQuery. The application that exports the data is the same as the one modifying the files (thus using the same server certificate). The export goes all well, that is, until I try to use the objects.patch() method to set the metadata I want. The problem is that I keep getting the following error:
apiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/storage/v1/b/<bucket>/<file>?alt=json returned "Forbidden">
Obviously, this has something to do with bucket or file permissions, but I can't manage to get around it. How come, if the same certificate is being used to write the files and update the file metadata, I'm unable to update it? The bucket is created with the same certificate.
If that's the exact URL you're using, it's a URL problem: you're missing the /o/ between the bucket name and the object name.
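For illustration, a sketch of an objects.patch() call where the discovery-based client builds that /o/ path (https://www.googleapis.com/storage/v1/b/<bucket>/o/<object>?alt=json) for you; the bucket, object name, and metadata values are placeholders, and credentials are assumed to be configured already (recent google-api-python-client versions pick up Application Default Credentials, older ones need an authorized http= argument to build()).

# Sketch: objects.patch via the discovery-based client, which constructs the
# .../storage/v1/b/<bucket>/o/<object> URL correctly. Names are placeholders.
from googleapiclient.discovery import build  # imported as "apiclient" in older releases

service = build("storage", "v1")

service.objects().patch(
    bucket="my-bucket",
    object="exports/data.csv",  # object name; the client URL-encodes it into the path
    body={
        "contentType": "text/csv",
        "metadata": {"exported-by": "bigquery"},
    },
).execute()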
