Azure Data Factory from Blob storage to SFTP server

After hunting through the net I can find lots of examples of retrieving data from SFTP, but none for sending data from Blob storage to SFTP.
Basically I attempted to do this using a Logic App but Azure only supports files less than 50MB (which is really dumb).
All the Azure docs I have read reference pulling but not pushing.
https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-sftp-connector
etc etc..
Maybe someone with better googling skills can help me find the docs to help me out.
I'm using Data Factory V1.0, not 2.0. Cheers.

Always check this table to see if a data store is supported as a source or a sink in a data movement activity.
In this case, SFTP is supported as a source but not as a sink, which means it's possible to extract data from it but not store data on it.
Hope this helped!
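If you just need to get blobs onto the SFTP server in the meantime, one option is to script the copy yourself outside Data Factory. A minimal sketch, assuming Python with the azure-storage-blob and paramiko packages; every name and credential below is a placeholder:

```python
# Rough workaround sketch: copy one blob to an SFTP server without Data Factory.
# Assumes azure-storage-blob and paramiko are installed; all names/credentials are placeholders.
import io
import paramiko
from azure.storage.blob import BlobServiceClient

BLOB_CONN_STR = "<storage-connection-string>"
CONTAINER = "outbound"
BLOB_NAME = "report.csv"
SFTP_HOST = "sftp.example.com"
SFTP_USER = "user"
SFTP_PASSWORD = "password"
REMOTE_PATH = "/upload/report.csv"

# Download the blob into memory (stream to disk instead for very large files).
blob_client = BlobServiceClient.from_connection_string(BLOB_CONN_STR) \
    .get_blob_client(container=CONTAINER, blob=BLOB_NAME)
data = io.BytesIO(blob_client.download_blob().readall())

# Push the bytes to the SFTP server.
transport = paramiko.Transport((SFTP_HOST, 22))
transport.connect(username=SFTP_USER, password=SFTP_PASSWORD)
sftp = paramiko.SFTPClient.from_transport(transport)
try:
    sftp.putfo(data, REMOTE_PATH)
finally:
    sftp.close()
    transport.close()
```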

Related

How to zip my file in azure fileshare using FileShareClient?

I need to zip my file in a file share. I have gone through a few approaches, and they all suggest methods for blobs. Any link or advice that would help me proceed?
I can't use Azure Data Factory because of cost issues, and I have already gone through these links: link1 and link2. In those links they use the BlockBlobClient.DownloadTo method, which is not present in FileShareClient.
Azure Data Factory supports Azure Blob Storage (see the links below) and would normally be the better approach, but since you mention you can't use it, there is a tutorial that can help you with zipping files and storing them to Azure Storage.
Tutorial: https://josef.codes/azure-storage-zip-multiple-files-using-azure-functions/
Supported formats: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#supported-data-stores-and-formats
Using Data Factory: https://learn.microsoft.com/en-us/azure/data-factory/tutorial-copy-data-portal
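The same idea also works directly against a file share with the storage SDK: download the file, zip it in memory, and upload the archive back. A rough sketch in Python (azure-storage-file-share package; the share, paths, and connection string below are hypothetical):

```python
# Sketch: zip a single file that lives in an Azure file share, without Data Factory.
# Assumes the azure-storage-file-share package; names and connection string are placeholders.
import io
import zipfile
from azure.storage.fileshare import ShareFileClient

CONN_STR = "<storage-connection-string>"
SHARE = "myshare"

# Download the original file from the share.
src = ShareFileClient.from_connection_string(CONN_STR, share_name=SHARE, file_path="logs/report.csv")
content = src.download_file().readall()

# Zip it in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("report.csv", content)
zipped = buffer.getvalue()

# Upload the archive back to the same share.
dst = ShareFileClient.from_connection_string(CONN_STR, share_name=SHARE, file_path="logs/report.zip")
dst.upload_file(zipped)
```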

Best way to index data in Azure Blob Storage?

I plan on using Azure Blob storage to store images. I will have around 5000 categories for images that I plan on using folders to keep separated. For each of the image files, the file names won't differ a lot across the board and there is the potential to need to change metadata frequently.
My original plan was to use a SQL database to index all of these files and store my metadata there, but I'm second guessing that plan.
Is it feasible to index files in Azure Blob storage using a database, or should I just stick with using blob metadata?
Edit: I guess this question should really be "are there any downsides to indexing Azure Blob storage using a relational database?". I'm much more comfortable working with a DB than I am Azure storage, so my preference is to use a DB.
I'm second guessing whether or not to use a DB after looking at Azure storage more and discovering meta-tags and indexing. Hope this helps.
You can use Azure Search for this task as well: store the images in Azure Storage (blob) and use Azure Search for crawling, indexing, and searching. Using metadata you can enhance your search as well. This way you might not even need folders to separate the different categories.
Blob Index is a very feasible option, and it can save on pricing, time, and overhead compared with maintaining a separate SQL index.
https://azure.microsoft.com/en-gb/blog/manage-and-find-data-with-blob-index-for-azure-storage-now-in-preview/
If you are looking for more information on this preview feature, I would love to hear more and work closer on this issue. You can reach me at BlobIndexPreview#microsoft.com.
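To give a feel for what Blob Index looks like from code, here is a minimal sketch using the Python azure-storage-blob SDK (12.4+); the container, blob, tag names, and filter are hypothetical:

```python
# Sketch: tag blobs with index tags and find them by tag, instead of keeping a SQL index.
# Assumes azure-storage-blob >= 12.4; account, container, and tag values are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")

# Attach index tags to an existing image blob (tags can be updated later without re-uploading).
blob = service.get_blob_client(container="images", blob="cats/cat-001.jpg")
blob.set_blob_tags({"category": "cats", "status": "approved"})

# Query across the whole account by tag - no folder scan or external database needed.
for match in service.find_blobs_by_tags("\"category\" = 'cats' AND \"status\" = 'approved'"):
    print(match.container_name, match.name)
```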

Lost SAS URI - is it recoverable?

We hired a guy to push a bunch of PSTs to Azure (to be put into mailboxes) and he disappeared. We know he used a SAS URI and we know he did push the data up. We looked in Storage Explorer and don't see the data in any of our storage accounts. The guy deleted the original PSTs, so we can't just push the data back up.
As far as we know, he was using this guide: https://learn.microsoft.com/en-us/microsoft-365/compliance/use-network-upload-to-import-pst-files?view=o365-worldwide
Can we find the SAS URI he used somewhere in Azure?
Can we explore this data somehow?
Any help is appreciated, thanks so much.
Can we find the SAS URI he used somewhere in Azure?
Sadly no. Azure does not store the SAS URI anywhere.
Can we explore this data somehow?
You would need to know the storage account where the files were uploaded. Without that you will not be able to explore this data.
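If you want to rule out that the files landed in a storage account inside your own subscription (rather than clicking through Storage Explorer account by account), you could enumerate every account and look for .pst blobs. A rough sketch, assuming Python with azure-identity, azure-mgmt-storage, and azure-storage-blob and rights to list account keys; it may well find nothing if the SAS pointed at storage outside your subscription:

```python
# Sketch: scan every storage account in a subscription for .pst blobs.
# Assumes azure-identity, azure-mgmt-storage, and azure-storage-blob, plus rights to list account keys.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.storage.blob import BlobServiceClient

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder

credential = DefaultAzureCredential()
mgmt = StorageManagementClient(credential, SUBSCRIPTION_ID)

for account in mgmt.storage_accounts.list():
    # The resource group name is embedded in the ARM resource id.
    resource_group = account.id.split("/")[4]
    key = mgmt.storage_accounts.list_keys(resource_group, account.name).keys[0].value
    service = BlobServiceClient(f"https://{account.name}.blob.core.windows.net", credential=key)

    for container in service.list_containers():
        for blob in service.get_container_client(container.name).list_blobs():
            if blob.name.lower().endswith(".pst"):
                print(account.name, container.name, blob.name)
```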

How to transfer csv files from Google Cloud Storage to Azure Datalake Store

I'd like to have our daily csv log files transferred from GCS to Azure Datalake Store, but I can't really figure out what would be the easiest way for it.
Is there a built-in solution for that?
Can I do that with Data Factory?
I'd rather avoid running a VM scheduled to do this with the APIs. The idea comes from the GCS -> (Dataflow ->) BigQuery solution.
Thanks for any ideas!
Yes, you can move data from Google Cloud Storage to Azure Data Lake Store using Azure Data Factory by developing a custom copy activity. However, in this activity you will be using the APIs to transfer the data. See the details in this article.
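Outside Data Factory, the same transfer can also be scripted directly against the two services' SDKs. A minimal sketch, assuming Python with the google-cloud-storage and azure-datalake-store (Data Lake Store Gen1) packages; the bucket, store name, and credentials are placeholders:

```python
# Sketch: copy today's CSV logs from a GCS bucket into Azure Data Lake Store (Gen1).
# Assumes google-cloud-storage and azure-datalake-store; bucket, store, and credentials are placeholders.
from datetime import date
from google.cloud import storage
from azure.datalake.store import core, lib

# Google Cloud Storage side (uses GOOGLE_APPLICATION_CREDENTIALS for auth).
gcs = storage.Client()

# Azure Data Lake Store side (service principal auth).
token = lib.auth(tenant_id="<tenant-id>", client_id="<client-id>", client_secret="<client-secret>")
adls = core.AzureDLFileSystem(token, store_name="mydatalakestore")

prefix = f"logs/{date.today():%Y-%m-%d}/"
for blob in gcs.list_blobs("my-log-bucket", prefix=prefix):
    if not blob.name.endswith(".csv"):
        continue
    data = blob.download_as_bytes()
    with adls.open(f"/daily-logs/{blob.name}", "wb") as target:
        target.write(data)
```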

Azure blob storage and stream analytics

I read that Azure Blob storage is a good place to save data for statistics or similar purposes, and that you can then query the blobs and show the statistics on a website (dashboard).
But I don't know how to use Stream Analytics to show those statistics. Is there an SDK for querying blobs and generating JSON data? Or... I don't know.
And I have a few more questions about it:
How do I save data to a blob (should it be JSON data or something else)? I don't know which data format to use.
How do I use Stream Analytics to query the blobs and then get the data to show in a dashboard?
And maybe you know how to use this technology. Please help me. Thanks, and have a nice day.
@Taras - did you get a chance to toy with the Stream Analytics UI?
When you add a blob input you can either add an entire container, which means Stream Analytics will scan the entire container for new files, or you can specify a path prefix pattern, which will make Stream Analytics look only in that path.
You can also specify tokens such as {date}, {time} on the path prefix pattern to help guide Stream Analytics on the files to read.
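For example (a rough sketch, assuming Python with azure-storage-blob and placeholder names), a producer could write line-delimited JSON events under a date/time layout so that a blob input with path prefix pattern logs/{date}/{time}/ picks them up:

```python
# Sketch: write line-delimited JSON events to blob storage in a logs/{date}/{time}/ layout
# so that a Stream Analytics blob input with that path prefix pattern can read them.
# Assumes azure-storage-blob; the container and connection string are placeholders.
import json
from datetime import datetime, timezone
from azure.storage.blob import BlobServiceClient

# Example events - one JSON object per line is a format Stream Analytics can ingest.
events = [
    {"user": "taras", "action": "page_view", "ts": "2016-01-01T10:00:00Z"},
    {"user": "taras", "action": "click", "ts": "2016-01-01T10:00:05Z"},
]

now = datetime.now(timezone.utc)
blob_path = f"logs/{now:%Y/%m/%d}/{now:%H}/events-{now:%M%S}.json"

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
blob = service.get_blob_client(container="telemetry", blob=blob_path)
blob.upload_blob("\n".join(json.dumps(e) for e in events))
```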
Generally speaking - it is highly recommended to use Event Hub as input for the improved latency.
As for output - you can either use Power BI which would give you an interactive dashboard or you can output to some storage (blob, table, SQL, etc...) and build a dashboard on top of that.
You can also try to do one of the walkthroughs to get a feel for Stream Analytics: https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-twitter-sentiment-analysis-trends/
Thanks!
Ziv.
