I created a Snowpipe in Snowflake.
Now when I executed the COPY command, the data was loaded into the table. But I want to know which file was loaded (the file name).
Can anyone tell me whether there is a view or table for finding the files processed through Snowpipe?
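Snowflake records load history per file, whether the load came from Snowpipe or a manual COPY command. A sketch using the COPY_HISTORY table function (MY_TABLE is a placeholder for the pipe's target table; there is also a longer-retention SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY view):

```sql
-- One row per file loaded into MY_TABLE (placeholder) in the last 24 hours.
SELECT file_name, last_load_time, status, row_count
FROM TABLE(information_schema.copy_history(
    table_name => 'MY_TABLE',
    start_time => DATEADD(hour, -24, CURRENT_TIMESTAMP())));
```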
Related
I have created an ADF pipeline to copy around 18 files from an FTP location into an Azure Blob container. First, I use a Get Metadata activity to list all the files at the FTP location. Then a ForEach activity loops through the files, and inside it a Copy Data activity copies each file from the FTP location to the Blob location.
While running the pipeline, some of the files get copied; however, some of them fail with the error message below:
"ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The remote server returned an error: (550) File unavailable (e.g., file not found, no access).,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (550) File unavailable (e.g., file not found, no access).,Source=System,'"
I am not sure what is wrong here, because most files copy successfully while a few do not. I have tried multiple times, and there is still no guarantee that all files will be copied.
When I test the connection of the FTP linked service, it connects successfully. The FTP linked service is SSL-enabled and configured to get its password from Azure Key Vault.
Refer to the output below from when I ran the pipeline:
Any thoughts as to what is going wrong here? Is there a limit on the number of files that can be copied at one time?
Thank you in advance.
As Joel Cochran said, the issue may be a concurrency limit.
When Sequential is selected, the inner Copy activity runs single-threaded. When it is unchecked, the Copy activity runs multi-threaded and throughput improves greatly.
So our solution is:
Uncheck Sequential
Increase the batch count, i.e. the maximum number of parallel runs of the inner activities.
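In the pipeline JSON, those two settings correspond to the ForEach activity's `isSequential` and `batchCount` properties. A sketch (activity names are placeholders; 20 is an example value, and ADF caps `batchCount` at 50):

```json
{
  "name": "ForEachFile",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": false,
    "batchCount": 20,
    "items": {
      "value": "@activity('Get Metadata1').output.childItems",
      "type": "Expression"
    },
    "activities": [
      { "name": "Copy Data1", "type": "Copy" }
    ]
  }
}
```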
I am trying to process multiple Excel files in ADF and use them in a Copy Data activity to Blob Storage. Here is how my hierarchy is structured:
My source is an Excel sheet coming from an SFTP server (linked service).
File path: an unnamed folder with multiple .xlsx files. Inside those files, the sheet name varies between sheet1 and table1.
I am trying to use Get Metadata to list all those files and pass them into a Copy activity, but the Get Metadata activity never succeeds.
Attached below is an elaboration of the problem:
If you only want to copy all Excel files from SFTP to Blob Storage, there is no need to use a Get Metadata activity.
Please try this:
1. Create a Binary format dataset.
2. Choose Wildcard file path in the Copy activity.
3. Sink to your Blob Storage.
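The three steps above translate roughly to this Copy activity definition (the folder name is a placeholder; the store-settings type names follow ADF's Binary copy schema):

```json
{
  "source": {
    "type": "BinarySource",
    "storeSettings": {
      "type": "SftpReadSettings",
      "recursive": true,
      "wildcardFolderPath": "incoming",
      "wildcardFileName": "*.xlsx"
    }
  },
  "sink": {
    "type": "BinarySink",
    "storeSettings": {
      "type": "AzureBlobStorageWriteSettings"
    }
  }
}
```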
We have files partitioned in the data lake and are using an Azure Synapse serverless SQL pool to query them through external tables before visualising in Power BI.
Files are stored in the following partition format: {source}/{year}/{month}/(unknown)_{date}.parquet
We then have an external table that loads all files for that source.
For sources that grow incrementally each day this is working great, as we want all files to be included. However, we have some integrations where we want to return only the latest file (i.e. the latest file sent to us is the current state that we want to load into Power BI).
Is it possible in the external table statement to only return the latest file? Or do we have to add extra logic?
We could load all the files in, and then filter for the latest filename and save that in a new location. Alternatively we could try to create an external table that changes every day.
Is there a better way to approach this?
If you are using dedicated pools, then I would alter the location of your table to the latest files' folder.
Load each day into a new folder and then change the LOCATION of the external table to point at the current/latest day. You may need additional logic, such as a control table that tracks the latest successful load date.
Unfortunately I have not found a better way to do this myself.
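A sketch of the repointing step (table name, columns, data source, file format and the date path are all placeholders; because LOCATION cannot be modified in place on an external table, the usual pattern is drop and recreate):

```sql
-- Repoint the external table at the latest day's folder.
DROP EXTERNAL TABLE dbo.SourceLatest;

CREATE EXTERNAL TABLE dbo.SourceLatest (
    id        INT,
    loaded_at DATETIME2
)
WITH (
    LOCATION    = 'source/2024/05/06/',   -- swap in the latest day's path
    DATA_SOURCE = DatalakeSource,
    FILE_FORMAT = ParquetFileFormat
);
```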
I am trying to implement the following flow in an Azure Data Factory pipeline:
Copy files from an SFTP to a local folder.
Create a comma-separated file in the local folder with the list of files and their sizes.
The first step was easy enough, using a 'Copy Data' step with 'SFTP' as source and 'File System' as sink.
The files are being copied, but in the output of this step, I don't see any file information.
I also don't see an option to create a file using data from a previous step.
Maybe I'm using the wrong technology?
One of the reasons I'm using Azure Data Factory, is because of the integration runtime, which allows us to have a single fixed IP to connect to the external SFTP. (easier firewall configuration)
Is there a way to implement step 2?
Thanks for any insight!
There is no built-in feature to achieve this.
You need to combine ADF with another service; I suggest you first use an Azure Function to inspect the files and then do the copy.
The structure should be like this:
You can get the sizes of the files and save them to a CSV file.
Get the size of files (Python):
How to fetch sizes of all SFTP files in a directory through Paramiko
And use pandas to save the results as CSV (Python):
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
Writing a pandas DataFrame to CSV file
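Combining the two links above, a minimal sketch (host, credentials and the remote directory are placeholders; paramiko is imported lazily so the CSV helper can be used without an SFTP server):

```python
import pandas as pd


def list_sftp_file_sizes(host, username, password, remote_dir, port=22):
    """Return [(filename, size_in_bytes), ...] for every file in remote_dir."""
    import paramiko  # lazy import: only needed when actually talking to SFTP

    transport = paramiko.Transport((host, port))
    transport.connect(username=username, password=password)
    sftp = paramiko.SFTPClient.from_transport(transport)
    try:
        return [(a.filename, a.st_size) for a in sftp.listdir_attr(remote_dir)]
    finally:
        sftp.close()
        transport.close()


def write_file_list_csv(file_sizes, csv_path):
    """Save the (filename, size) pairs as a comma-separated file."""
    df = pd.DataFrame(file_sizes, columns=["filename", "size_bytes"])
    df.to_csv(csv_path, index=False)
```

The two pieces are split so the CSV-writing part can be reused (and tested) on its own; in the Azure Function body you would call one after the other.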
Simple HTTP trigger of an Azure Function (Python):
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook-trigger?tabs=python
(Put the processing logic in the body of the Azure Function. You can do almost anything there, in a language you are familiar with, apart from anything requiring a graphical interface or other unsupported features. In short, there is no feature in ADF itself that satisfies your requirement.)
I am working with Azure.
I built a job in SSIS that contains a ForEach loop over ADLS (Azure Data Lake Store), which produces the paths of some files. Inside the ForEach I have a Data Flow Task with an ADLS Source whose file path is set dynamically from the variable the ForEach produces.
When I run it, it always produces an error:
But when I hard-code the value the ForEach produces into the file path, it runs correctly (that covers just one source, though, not all the sources from the ForEach).
Does anyone have an idea?
The ADLS Source does not support files in the root folder.
If such a file is enumerated and passed to the ADLS Source, you get this error.