My problem is probably a silly one, but I cannot find a way to resolve it. I have a 15 GB file on an external SFTP server that I need to copy to my data lake. The catch is that the column delimiter is a comma and some fields contain nested lists. When I try to use the ADF copy activity, most of my data is lost, because the nested structures get cut off at the first occurrence of a comma. So maybe I could just ignore the delimiter? I tried setting a pipe as the delimiter to get the whole dataset into a single column, but that doesn't work either.
PowerShell? I have tried various scripts that used to work with smaller files, but I get an error every time.
I have even tried to upload the file manually via Azure Storage Explorer, but that also fails after some time. I am not really sure how to make this work at this point.
Thank you for any advice!
I am entirely new to Azure, so if this is easy please just tell me to RTFM, but I'm not used to the terminology yet so I'm struggling.
I've created a data factory and pipeline to copy data, using a simple query, from my source data. The target data is a .txt file in my blob storage container. This part is all working quite well.
Now, what I'm attempting to do is store each row returned from my query as an individual file in blob storage. This is where I'm getting stuck. It seems like something that should be pretty easy, but as I said, I'm new to Azure and so far I'm not sure where to look.
You can set Max rows per file to 1 in the sink settings and leave the file name empty in the sink dataset. If you need to, you can also specify a file name prefix in the File name prefix setting.
(Screenshots omitted: the sink dataset configuration, the sink settings in the Copy data activity, and the resulting one-row-per-file output.)
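For reference, in the generated pipeline JSON these options live in the delimited-text sink's format settings. The sketch below is written from memory and the prefix value is just an example, so verify the property names against the JSON that the ADF authoring UI actually produces for your pipeline:

```json
"sink": {
    "type": "DelimitedTextSink",
    "storeSettings": { "type": "AzureBlobStorageWriteSettings" },
    "formatSettings": {
        "type": "DelimitedTextWriteSettings",
        "fileExtension": ".txt",
        "maxRowsPerFile": 1,
        "fileNamePrefix": "row"
    }
}
```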
Let's say I have a blob container that contains files with the following names:
2cfe4d2c-2703-4b0f-bed0-6b239f238206
1a18c31f-bf28-4f64-a796-79237fabc66a
20a300dd-c032-405f-b67d-9c077623c26c
6f4041dd-92da-484a-966d-d5a168a9a2ec
(Let's say there are around 15,000 files.)
I want to delete around 1,200 of them. I have a file with all the names I want to delete; in my case it's JSON, but the format doesn't really matter, since I know exactly which files I want to delete.
What is the most efficient/safe way to delete these items?
I can think of a few ways, for example using az storage blob delete-batch or az storage blob delete. I am sure the former is more efficient, but I don't see how to use it here, because there is no real naming pattern, just a big list of GUIDs (names) that I want to delete.
I guess I would have to write some code that iterates over my list of files and then uses the CLI or an Azure storage library to delete them.
I'd prefer to use some built-in tooling, but I can't find a way to do this without writing code that talks to the API.
What would be the best way to do this?
The tool azcopy would be perfect for that. With the command azcopy remove you can pass the parameter --list-of-files={path} to point at a text file that filters on specific files. The file should be plain text, with one name per line.
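For example, assuming the JSON list has been converted to a plain text file and that you authenticate with a SAS token (the account, container, and SAS values below are placeholders), the call might look roughly like this:

```
# to-delete.txt -- one blob name per line, e.g.:
#   2cfe4d2c-2703-4b0f-bed0-6b239f238206
#   1a18c31f-bf28-4f64-a796-79237fabc66a

azcopy remove "https://<account>.blob.core.windows.net/<container>?<SAS>" --list-of-files=to-delete.txt
```

If your azcopy version supports the --dry-run flag, running with it first is a good sanity check before deleting 1,200 blobs for real.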
I have a set of CSV files stored in Azure Blob Storage. I am reading the files into a database table using the Copy Data task. The source is set to the folder where the files reside, so it picks up each file and loads it into the database. The issue is that I can't seem to map the file name so that it can be read into a column. I'm sure there are more complicated ways to do it, for instance first reading the metadata and then reading the files in a loop, but surely the file metadata should be available while traversing the files?
Thanks
This is not possible in a regular Copy activity. Mapping Data Flows has this capability; it's still in preview, but maybe it can help you out. If you check the documentation, you'll find an option to specify a column to store the file name.
I am trying to use Microsoft Flow to export a SharePoint list to Azure Data Lake.
I want it so that any time a particular online list is changed, its entire contents are loaded into a file in Data Lake. If the file already exists, I want to overwrite it. Can someone please explain how I can go about doing this? I have tried multiple approaches, but they are not getting the job done.
Thanks
I was able to get the items from the SharePoint list exported nearly perfectly. I will post the Flow here in case anyone needs it in the future.
What I do is that every 5 minutes I "create" a file in Azure Data Lake, which overwrites the file if it already exists. The content of the file cannot be blank, so I add a newline as its initial content. Then I use Get items to retrieve all the items in the SharePoint list. From there, in an Apply to each loop, I append the content of the current row of the SharePoint list to the Data Lake file, with the columns separated by | and a new line appended after the row's content. This works nearly perfectly, with the only caveat being the newline at the beginning of the file, which I eliminate using Power Query.
This is exactly what I needed. If anybody sees a way to make this better, please post so that we can get this to perfection.
I've set up two different tables with two connection strings. Now I want to move data from one to the other, but I'm not sure where to start. Has anyone else coded up a solution for this? If so, I would really appreciate some tips/advice on where to start. My Table Storage tables are small, around 500-2000 rows each. I would just like to make sure that no data gets lost.
Please note that it is Table Storage I am using, NOT SQL Azure.
Thanks if you have some ideas.
If you're dealing with only a few thousand rows then this is easy.
Download TableXplorer here: http://clumsyleaf.com/products/tablexplorer
There is an option to export all the data from a table to an XML or CSV file. Once you have your file, use the import option to load it into the other table.
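If you'd rather script the copy than go through an export/import file, a minimal sketch using the azure-data-tables Python package might look like the following. The connection strings and table names are placeholders, and for a few thousand rows a plain row-by-row copy is fine; try it against a throwaway table first.

```python
# Rough sketch: copy every entity from one Azure Table Storage table to another.
# Requires the azure-data-tables package (pip install azure-data-tables).
from azure.data.tables import TableServiceClient

SOURCE_CONN = "<source storage account connection string>"       # placeholder
DEST_CONN = "<destination storage account connection string>"    # placeholder

source = TableServiceClient.from_connection_string(SOURCE_CONN).get_table_client("SourceTable")
dest = TableServiceClient.from_connection_string(DEST_CONN).create_table_if_not_exists("DestinationTable")

copied = 0
for entity in source.list_entities():   # streams all rows; fine for tables of 500-2000 rows
    dest.upsert_entity(entity)          # upsert means re-running after a failure is safe
    copied += 1

print(f"Copied {copied} entities")
```

Because the destination uses upserts, the script can be re-run without creating duplicates if it gets interrupted partway through.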