I currently have a solution where I am downloading zip files from a public HTTP endpoint. I have a solution to dynamically pass file names, but I'm wondering if there's a way to get the list of files from the HTTP endpoint instead. So instead of passing a specific name, I'd like to pull the list of file names, compare it to the existing files in Blob storage, and proceed with downloading only the new files. Is that feasible to achieve in ADF?
Below are the URL paths:
https://download.cms.gov/nppes/NPPES_Data_Dissemination_September_2022.zip
https://download.cms.gov/nppes/NPPES_Deactivated_NPI_Report_091322.zip
https://download.cms.gov/nppes/NPPES_Data_Dissemination_090522_091122_Weekly.zip
Attempted Solutions:
tried passing a wildcard in the URL: https://download.gov/data/NPPES_*
tried the Get Metadata activity, but it does not support HTTP.
tried the Web activity, but that's for calling APIs only.
Thanks!
I have assets from Wordpress being uploaded to a GCP Storage bucket. But when I list links to these assets on the website I'm working on, I would like the user to automatically download the file instead of viewing it in the browser when they click the link.
Is there an "easy" way to implement this behaviour?
The project is running with Wordpress as headless API, and Next.js frontend.
Thanks
You can change object metadata for your objects in Cloud Storage to force browsers to download files directly instead of previewing them. You can do this through the content-disposition property: setting it to attachment makes the browser download the content directly.
I quickly tested downloading public objects with and without this property and can confirm the behavior: downloads do happen directly. The documentation explains how to quickly change the metadata for existing objects in your bucket. While it is not directly mentioned, you can use wildcards to apply metadata changes to multiple objects at the same time. For example, this command will apply the content-disposition property to all objects in the bucket:
gsutil setmeta -h "content-disposition:attachment" gs://BUCKET_NAME/**
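If you'd rather set this from your upload pipeline than from the CLI, the Node.js client library exposes the same property. A minimal sketch, assuming @google-cloud/storage; the bucket and object names are placeholders:

import { Storage } from "@google-cloud/storage";

const storage = new Storage();

// Set content-disposition on an existing object so browsers download it
// instead of previewing it (same effect as the gsutil command above).
async function forceDownload(bucketName: string, objectName: string) {
  await storage.bucket(bucketName).file(objectName).setMetadata({
    contentDisposition: "attachment",
  });
}

await forceDownload("BUCKET_NAME", "uploads/example.pdf"); // placeholders

You can also set contentDisposition via the metadata option at upload time, so new objects coming in from Wordpress are created with the property from the start.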
I'm creating a user-generated content site using Express.js. How can I add the URLs of this generated content to the sitemap automatically?
These URLs also need to be removed from the sitemap when the user deletes their account or deletes the content.
I tried the sitemap builder npm packages created for Express.js, but none of them worked the way I wanted, or their intended use was not the same as mine.
I am unsure if I understood your question, so I assume the following:
Your users can generate new URLs that you want to publish in a sitemap.xml that is returned from a specific endpoint, right?
If so, I'd suggest using the sitemap.js package. However, this package still needs a list of URLs and the metadata you want to deliver.
You could just save the URLs and the metadata to a database table, the filesystem, or whatever data storage you use. Every time content is generated or deleted, you also update your URL list there.
Now, if someone accesses the sitemap endpoint, the URLs are read from storage and sitemap.js generates the XML. Goal achieved!
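A minimal sketch of that endpoint, assuming the sitemap npm package; loadUrls() is a hypothetical helper that reads the user-generated URLs from your data store:

import express from "express";
import { SitemapStream, streamToPromise } from "sitemap";

// Hypothetical: returns whatever your storage holds,
// e.g. [{ path: "/posts/123", lastmod: "2022-09-01" }]
declare function loadUrls(): Promise<{ path: string; lastmod?: string }[]>;

const app = express();

app.get("/sitemap.xml", async (_req, res) => {
  const urls = await loadUrls(); // read the current URL list from storage
  const stream = new SitemapStream({ hostname: "https://example.com" }); // placeholder hostname
  for (const u of urls) stream.write({ url: u.path, lastmod: u.lastmod });
  stream.end();
  const xml = await streamToPromise(stream); // collect the stream into a Buffer
  res.header("Content-Type", "application/xml").send(xml.toString());
});

Deletion needs no extra work at the sitemap level: since the XML is rebuilt from storage on every request, removing the row when a user deletes content removes it from the sitemap too.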
I'm trying to upload a folder structure to a private Azure Blob Storage container; the structure contains an HTML file, which references CSS and JavaScript files within that same folder structure.
When the HTML file is browsed to, I'm expecting it to retrieve the CSS and JavaScript files, just as it would if it were an HTML file that was part of a website.
I can make this work if the container is completely public, but I'm struggling to achieve the same results when the container is private and I supply a SAS token.
Suppose the container contains an HTML file called "main.html" and a CSS file called "css/mystyles.css" (main.html will have a link tag for the CSS file pointing to the relative URL "css/mystyles.css").
If I create a SAS token to the container (Let's just call it "mySAS" for simplicity), then navigate to the main.html file, appending the SAS token like so:
https://my-storage-account.blob.core.windows.net/container-name/main.html?mySAS
The main.html file will load correctly since the SAS token is appended to its URL; however, the browser's request for the CSS file will not include the token, so it will return a 404.
I think I already know the answer, but is it even technically possible to store and present an HTML file and all of its associated files without putting them in a public container?
I should note that modifying the paths specified in the html file is not an option, as they're files I don't control, so I don't know what they'll look like ahead of time.
There are a few (very messy and undesirable) hybrid solutions, such as placing it in a private container, making the container public on demand, and switching it back to private after a time.
Or a more extreme hybrid solution: store it in a private container (which is never exposed), then whenever it's requested, copy the contents to a short-lived public container (reducing the risk that someone notes down the public container from the first hybrid scenario and accesses it again later, when it's perhaps not intended to be available to them).
I'd really rather stick with a private container and SAS token solution if it's at all possible.
If your question is about reusing a SAS token: you can't, at least not one scoped to a single blob. The SAS signature is a hash that covers the resource it was created for, so a token generated for one blob is rejected on another; otherwise, nothing would stop someone from guessing URLs and appending one URL's SAS to another URL. (A container-scoped SAS is accepted on every blob in the container, but that doesn't help here either: the browser never carries the page's query string over to relative links like "css/mystyles.css", which is why the CSS request fails.)
If the goal is to have a static site, with no backend logic, then you need to have your content public (or you have to pre-generate all of your URLs to have SAS properly appended).
If you want dynamic access (e.g. you have no idea what content will be served up), you'd need some type of backend app server which serves content (e.g. returns a new HTML page with appropriate SAS-based links embedded in the various tags).
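As a sketch of that last option, assuming the @azure/storage-blob SDK and placeholder account credentials: the app server generates a short-lived, read-only SAS URL per blob and embeds those URLs in the HTML it returns.

import {
  BlobSASPermissions,
  StorageSharedKeyCredential,
  generateBlobSASQueryParameters,
} from "@azure/storage-blob";

const accountName = "my-storage-account"; // placeholder
const credential = new StorageSharedKeyCredential(accountName, process.env.AZURE_STORAGE_KEY!);

// Build a URL for one blob with a read-only SAS valid for 15 minutes.
function signedUrl(containerName: string, blobName: string): string {
  const sas = generateBlobSASQueryParameters(
    {
      containerName,
      blobName,
      permissions: BlobSASPermissions.parse("r"),
      expiresOn: new Date(Date.now() + 15 * 60 * 1000),
    },
    credential
  );
  return `https://${accountName}.blob.core.windows.net/${containerName}/${blobName}?${sas}`;
}

The server would then rewrite each href/src in main.html to signedUrl("container-name", "css/mystyles.css") and so on before returning the page.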
If I upload a file to Azure Blob Storage in a container where a file with the same name already exists, it overwrites that file. How do I avoid overwriting it? Below is the scenario...
Step 1 - upload a file "abc.jpg" to a container called, say, "filecontainer"
Step 2 - once it is uploaded, try uploading a different file with the same name to the same container
Output - it will overwrite the existing file with the latest upload
My requirement - I want to avoid this overwrite, as different people may upload files having the same name to my container.
Please help
P.S.
- I do not want to create different containers for different users
- I am using the REST API with Java
Windows Azure Blob Storage supports conditional headers, which you can use to prevent overwriting of blobs. You can read more about conditional headers here: http://msdn.microsoft.com/en-us/library/windowsazure/dd179371.aspx.
Since you want a blob to never be overwritten, you would need to specify the If-None-Match conditional header and set its value to *. If the blob already exists, this causes the upload operation to fail with a Precondition Failed (412) error.
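You mentioned you're on the REST API with Java, where this is literally the If-None-Match: * request header on the Put Blob call. For illustration only, a sketch of the same condition through the JavaScript SDK (@azure/storage-blob; an assumption, not your stack):

import { BlobServiceClient, RestError } from "@azure/storage-blob";

const blob = BlobServiceClient
  .fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING!) // placeholder
  .getContainerClient("filecontainer")
  .getBlockBlobClient("abc.jpg");

try {
  // ifNoneMatch: "*" -> only succeed if no blob with this name exists yet
  await blob.uploadData(Buffer.from("file bytes here"), {
    conditions: { ifNoneMatch: "*" },
  });
} catch (err) {
  if (err instanceof RestError && err.statusCode === 412) {
    // Precondition Failed: the blob already exists, so nothing was overwritten
  } else {
    throw err;
  }
}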
Another idea would be to check for the blob's existence just before uploading (by fetching its properties); however, I would not recommend this approach, as it may lead to concurrency issues (another client could create the blob between your check and your upload).
You have no control over the name your users upload their files with. You do, however, have control over the name you store those files with. The standard way is to generate a GUID and name each file accordingly. The chance of a conflict is practically zero.
A short sketch of the idea (TypeScript; containerClient, saveMapping, and the file variables are hypothetical):

const blobName = randomUUID(); // from node:crypto - generate a GUID to store the file under
await saveMapping(blobName, originalFileName); // hypothetical DB helper: remember the user's file name
await containerClient.getBlockBlobClient(blobName).uploadData(fileBuffer); // upload under the GUID
Hope that helps.
Let me put it this way:
step one - user X uploads the file "abc1.jpg" and you save it to a local folder XYZ
step two - user Y uploads another file with the same name "abc1.jpg", and now you save it again to the local folder XYZ
What do you do now?
With this I am illustrating that your question does not relate to Azure in any way!
Just do not rely on original file names when saving files, wherever you are saving them. Generate random names (GUIDs, for example) and "attach" the original name as metadata.
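On the Blob REST API, "attaching" the original name means sending it as an x-ms-meta-* request header on the upload. A sketch of the same thing via the JavaScript SDK (an assumption; fileBuffer and originalFileName are placeholders):

import { randomUUID } from "node:crypto";
import { BlobServiceClient } from "@azure/storage-blob";

const container = BlobServiceClient
  .fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING!) // placeholder
  .getContainerClient("filecontainer");

// Store under a random name; keep the user's original name as blob metadata.
// Metadata values must be ASCII, so encode the name to be safe.
await container.getBlockBlobClient(randomUUID()).uploadData(fileBuffer, {
  metadata: { originalname: encodeURIComponent(originalFileName) },
});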
I am currently trying to read through the [GitHub API](http://developer.github.com/v3/repos/contents/) to figure out how I can programmatically retrieve all versions of a specific file in a single repository.
I see that one can get the list of commits, and the current version of a single file easily. But, is there a way to list all the commits relevant for a specific file and then iterate through all the versions of that file?
To get the list of commits relevant for a specific file, use this API endpoint and specify the path parameter:
GET https://api.github.com/repos/:owner/:repo/commits?path=FILE_PATH
You'll get back an array of commit objects, each of which has a sha attribute.
Now that you have all the commit SHAs, you can fetch all the different versions of the file using this API endpoint, specifying the ref query parameter to select the commit. So, for each commit SHA, make a request to:
GET https://api.github.com/repos/:owner/:repo/contents/:FILE_PATH?ref=SHA
and read the content attribute. Notice that the content is Base64 encoded, but you can also request the raw version by setting the Accept header to application/vnd.github.v3.raw.
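Putting the two endpoints together, a sketch in TypeScript using the built-in fetch (Node 18+); OWNER, REPO, and FILE_PATH below are placeholders, and unauthenticated requests are rate-limited:

const owner = "OWNER";
const repo = "REPO";
const filePath = "path/to/file.txt";

// 1. List the commits that touched the file
const commitsRes = await fetch(
  `https://api.github.com/repos/${owner}/${repo}/commits?path=${encodeURIComponent(filePath)}`
);
const commits = (await commitsRes.json()) as { sha: string }[];

// 2. Fetch the file's content at each commit, raw instead of Base64 JSON
for (const { sha } of commits) {
  const fileRes = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/contents/${filePath}?ref=${sha}`,
    { headers: { Accept: "application/vnd.github.v3.raw" } }
  );
  console.log(`--- ${sha} ---`);
  console.log(await fileRes.text());
}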