I've successfully copied an Azure Storage Table from Azure to the local Emulator using AzCopy. However, when looking at the local table, there are two columns named "Timestamp" and "TIMESTAMP". The latter contains the original timestamp, while the former is the timestamp from when the row was inserted into the Emulator.
I can't figure out whether it's possible to keep the original timestamp with AzCopy or not. The "Timestamp" column I get is quite useless.
I assume you took the following two steps to copy an Azure Storage Table to the local Emulator with AzCopy:
Exporting the Azure Storage Table to local files or blobs;
Importing from the local files or blobs to local Emulator table.
Please correct me if my assumption is wrong.
About the column "TIMESTAMP": does your original Azure Storage Table contain this column? If not, it may be unexpected behavior, since AzCopy shouldn't introduce any additional columns ("TIMESTAMP" here) during exporting and importing. If this is the case, please share more information with us so that we can verify whether it's a bug in AzCopy.
Regarding your question of whether it's possible to keep the original timestamp with AzCopy: the answer is no. Timestamp is a system property maintained by the Azure Storage Table service, and users are not able to customize its value.
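If you need the original timestamps, one workaround is to copy the entities yourself and stash the service-maintained Timestamp into a regular custom property before writing them to the Emulator. Below is a minimal sketch using the Python azure-data-tables SDK; the table name and the "OriginalTimestamp" property are placeholders, and it assumes the original timestamp is exposed through each entity's metadata.

```python
# pip install azure-data-tables
from azure.core.exceptions import ResourceExistsError
from azure.data.tables import TableClient

SOURCE_CONN = "<azure-storage-connection-string>"   # placeholder
TARGET_CONN = "UseDevelopmentStorage=true"          # local Storage Emulator / Azurite
TABLE_NAME = "MyTable"                              # placeholder table name

source = TableClient.from_connection_string(SOURCE_CONN, table_name=TABLE_NAME)
target = TableClient.from_connection_string(TARGET_CONN, table_name=TABLE_NAME)

try:
    target.create_table()
except ResourceExistsError:
    pass  # table already exists in the Emulator

for entity in source.list_entities():
    # The service-maintained Timestamp is not a regular property; it is returned
    # in the entity metadata. Copy it into a custom property so it survives the move.
    entity["OriginalTimestamp"] = entity.metadata["timestamp"]
    target.upsert_entity(entity)
```

The copied rows will still get a fresh, service-maintained Timestamp in the Emulator, but the original value is preserved in OriginalTimestamp.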
I have a JSON file stored in Azure Blob Storage and I have loaded it into Azure SQL DB using Data Factory.
Now I would like to find a way to load only new records from the file into my database (the file is updated every week or so). Is there a way to do it?
Thanks!
You can use the upsert (slowly changing dimension type 1) that is already implemented in Azure Data Factory.
It will add new records and update existing records that have changed.
Here is a quick tutorial:
https://www.youtube.com/watch?v=MzHWZ5_KMYo
I would suggest using the Data Flow activity.
In the Data Flow activity, you have the Alter Row transformation, as shown in the image below.
In Alter Row you can use the "Upsert if" condition.
Specify the condition as 1 == 1, so every incoming row is marked for upsert; the sink will then insert new rows and update existing ones based on the key columns you configure.
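For what it's worth, here is a rough sketch of the equivalent upsert semantics done outside Data Factory, assuming pyodbc, a hypothetical dbo.Records table keyed on Id, and placeholder connection details; in ADF the same behaviour comes from the Alter Row upsert plus the key columns configured on the SQL sink.

```python
# pip install pyodbc
import pyodbc

# Placeholder connection string, table, and columns, for illustration only.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<server>;Database=<database>;UID=<user>;PWD=<password>"
)

def upsert_record(record_id: str, payload: str) -> None:
    """Insert the record if its Id is new, otherwise update the existing row."""
    merge_sql = """
    MERGE dbo.Records AS target
    USING (SELECT ? AS Id, ? AS Payload) AS source
    ON target.Id = source.Id
    WHEN MATCHED THEN
        UPDATE SET target.Payload = source.Payload
    WHEN NOT MATCHED THEN
        INSERT (Id, Payload) VALUES (source.Id, source.Payload);
    """
    cursor = conn.cursor()
    cursor.execute(merge_sql, record_id, payload)
    conn.commit()
```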
We have an existing Azure Data Factory pipeline that takes data from an Azure Table Storage table and copies it to an Azure SQL table, which is working without issue.
The problem arose when we added a new data element to the Table Storage table (which is possible since it is NoSQL). When I go into the source of the pipeline in ADF and refresh the Table Storage schema, the new data element is not available to map properly. Is there something I am missing to get this new data element (column) to show up? I know the data is there, since I can see the column in Azure Storage Explorer.
Glad to hear you found the answer yourself:
"I located the answer with additional research. See article: https://stackoverflow.com/questions/44123539/azure-data-factory-changing-azure-table-schema"
Posting it here as it can be beneficial to other community members.
We are in the process of migrating our manually managed production environment to Terraform, and as part of this we will be creating all the resources required for the environment anew. One such resource is a storage account.
We have a storage account with 1500+ tables, each containing millions of records, with a timestamp attached to each record. During the migration we are mostly interested in copying the records from the past 30 days.
I was wondering if there's a tool that could help us perform this copy operation effectively and with minimal time spent.
We looked into AzCopy, but it only allows a one-to-one copy, and copying billions of records might take us days; from what I've learnt online, AzCopy doesn't support queries to copy only rows newer than a certain timestamp.
It would be helpful to get some insights on different tools and techniques we could adopt to accomplish this.
As far as I know, there is no tool that can copy Table Storage starting from a specified timestamp. You would need to write your own logic to select records from the specified timestamp onward, but be aware that Timestamp is not an indexed property, so such a query results in a full table scan and performs poorly.
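A minimal sketch of that do-it-yourself approach, assuming the Python azure-data-tables SDK and a placeholder table name; the Timestamp filter is evaluated server-side, but, as noted, the service still scans the whole table.

```python
# pip install azure-data-tables
from datetime import datetime, timedelta, timezone

from azure.core.exceptions import ResourceExistsError
from azure.data.tables import TableClient

SOURCE_CONN = "<source-storage-connection-string>"   # placeholder
TARGET_CONN = "<target-storage-connection-string>"   # placeholder
TABLE_NAME = "MyTable"                               # placeholder; loop over your 1500+ tables

cutoff = datetime.now(timezone.utc) - timedelta(days=30)
odata_filter = f"Timestamp ge datetime'{cutoff.strftime('%Y-%m-%dT%H:%M:%SZ')}'"

source = TableClient.from_connection_string(SOURCE_CONN, table_name=TABLE_NAME)
target = TableClient.from_connection_string(TARGET_CONN, table_name=TABLE_NAME)

try:
    target.create_table()
except ResourceExistsError:
    pass  # table already exists in the target account

# Copy only entities whose Timestamp falls within the last 30 days.
for entity in source.query_entities(odata_filter):
    target.upsert_entity(entity)
```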
Alternatively, I suggest a tool named EastFive.Azure.Storage.Backup. It supports copying Azure Blob / Azure Table Storage to a new storage account. For Azure Table Storage, it supports copying an array of specified partition keys, but it does not support filtering on a specified timestamp.
If you're interested in it, you can follow the simple steps below:
1. Create a folder named "backup" on the D drive, then download all 4 projects mentioned in the Prerequisites into D:\backup.
2. Unzip all 4 projects and open them one by one in Visual Studio -> in the NuGet package manager in Visual Studio, update all the old packages -> build the projects one by one and make sure each of them builds successfully.
3. Open the backup.json in the EastFive.Azure.Storage.Backup project and fill in your sourceConnectionString and targetConnectionString.
If you don't want to copy blobs, just remove the blobs section.
The timeLocal field at the end specifies when to run the copy activity, in your local time.
4. You can install it as a service and start the service to run the copy activity.
I tested it on my side, and all of my Azure Table Storage tables were copied to the new storage account, as shown in the screenshot below:
I am storing a series of Excel files in an Azure File Storage share for my company. My manager wants to be able to see the created date for these files, as we will be running monthly downloads of the same reports. Is there a way to automate storing the created date as one of the properties in Azure, or perhaps adding a bit of custom metadata? Thanks in advance.
You can certainly store the created date as part of custom metadata for the file (a small sketch of setting such metadata follows the caveats below). However, there are certain things you would need to be aware of:
Metadata is editable: Anybody with access to the storage account can edit the metadata. They can change the created date metadata value or even delete that information.
Querying is painful: Azure File Storage doesn't provide querying capability, so if you want to query on the file's created date, it is going to be a painful process. First you would need to list all files in a share and then fetch the metadata for each file separately. Depending on the number of files and the level of nesting, it could be a complicated process.
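Here is a minimal sketch of writing that metadata with the Python azure-storage-file-share SDK; the connection string, share name, file path, and metadata key are all placeholders.

```python
# pip install azure-storage-file-share
from datetime import datetime, timezone

from azure.storage.fileshare import ShareFileClient

CONN_STR = "<storage-connection-string>"      # placeholder
file_client = ShareFileClient.from_connection_string(
    CONN_STR,
    share_name="reports",                     # placeholder share name
    file_path="2024/monthly-report.xlsx",     # placeholder file path
)

# Record the created date as custom metadata. Note that anyone with write
# access can later change or remove this value (see the caveats above).
file_client.set_file_metadata(
    metadata={"createddate": datetime.now(timezone.utc).isoformat()}
)
```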
There are some alternatives available to you:
Use Blob Storage
If you can use Blob Storage instead of File Storage, use that. Blob Storage has a system-defined property for the creation date, so you don't have to do anything special. However, like File Storage, Blob Storage also has an issue with querying, but it is comparatively less painful.
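For example, a minimal sketch of reading that system-defined creation time with the Python azure-storage-blob SDK (the container name is a placeholder):

```python
# pip install azure-storage-blob
from azure.storage.blob import ContainerClient

CONN_STR = "<storage-connection-string>"   # placeholder
container = ContainerClient.from_connection_string(CONN_STR, container_name="reports")

# Each blob carries a service-maintained creation time; no custom metadata needed.
for blob in container.list_blobs():
    print(blob.name, blob.creation_time)
```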
Use Table Storage/SQL Database For Reporting
For querying purposes, you can store the file's created date in either Azure Table Storage or a SQL Database. The downside of this approach is that, because it is a completely separate system, it would be your responsibility to keep the data in sync. For example, if a file is deleted, you will need to ensure that the entry for it in the database is also removed.
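A small sketch of keeping such an index in Table Storage with the Python azure-data-tables SDK; the table name, keys, and property names are placeholders.

```python
# pip install azure-data-tables
from datetime import datetime, timedelta, timezone

from azure.data.tables import TableServiceClient

CONN_STR = "<storage-connection-string>"   # placeholder
service = TableServiceClient.from_connection_string(CONN_STR)
table = service.create_table_if_not_exists(table_name="FileIndex")  # placeholder table

# One entity per uploaded file; the created date lives in a queryable property.
table.upsert_entity({
    "PartitionKey": "reports",                  # placeholder: e.g. the share name
    "RowKey": "2024__monthly-report.xlsx",      # placeholder: a key-safe file identifier
    "FilePath": "2024/monthly-report.xlsx",
    "CreatedDate": datetime.now(timezone.utc),
})

# Example query for reporting: all files created in the last 30 days.
cutoff = datetime.now(timezone.utc) - timedelta(days=30)
recent = table.query_entities(f"CreatedDate ge datetime'{cutoff.strftime('%Y-%m-%dT%H:%M:%SZ')}'")
for item in recent:
    print(item["FilePath"], item["CreatedDate"])
```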
After uploading a file into Azure Blob Storage, I can see that the original file timestamp is lost.
Somehow I need to preserve the original file timestamp, but Azure Blob Storage doesn't allow you to programmatically update "Last Modified Date".
Please share if anyone has come across this situation before.
Since Last Modified Date is a system-defined property, you can't really preserve it. Any time a blob is updated, this value will change. One thing you could do is keep the original date/time as a blob metadata entry when the blob is created, if you're interested in finding out when a blob was created. This, however, is not foolproof: if the blob is re-uploaded, this value could change or be removed.
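A minimal sketch of stashing the local file's original modified time as blob metadata at upload time, assuming the Python azure-storage-blob SDK and placeholder names:

```python
# pip install azure-storage-blob
import os
from datetime import datetime, timezone

from azure.storage.blob import BlobClient

CONN_STR = "<storage-connection-string>"      # placeholder
LOCAL_PATH = "report.xlsx"                    # placeholder local file

# Read the original timestamp from the local file before uploading.
original_mtime = datetime.fromtimestamp(os.path.getmtime(LOCAL_PATH), tz=timezone.utc)

blob = BlobClient.from_connection_string(
    CONN_STR,
    container_name="reports",                 # placeholder container
    blob_name=LOCAL_PATH,
)

with open(LOCAL_PATH, "rb") as data:
    # Last Modified itself can't be set, but the original value survives as metadata.
    blob.upload_blob(
        data,
        overwrite=True,
        metadata={"originallastmodified": original_mtime.isoformat()},
    )
```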
Another thing you could do is keep this information in a separate place (Table Storage for example).