Set expiry limit for a blob - Azure

I'm using Azure Storage as a cache: for a given input I do the work the first time and save the result in the cache for further use. When I need to solve the same problem with the same input again, I fetch the ready-made solution directly from storage. This is all implemented.
I'm trying to add an expiry limit for the files in my cache: each result should be stored for a maximum of 30 days and then deleted automatically.
The naive solution is to implement a background worker that runs once per day, iterates over all the files, and deletes them according to their creation time.
Is there a better solution?

We don't currently have automatic expiration for blob storage. As you say, you could use something like WebJobs to run a background task that deletes files. If you create a large number of files each day, a simpler approach is to create a new container each day and store that day's blobs in it; then each day you just delete the container that is 31 days old. You could do something similar with Tables, creating a new table each day and deleting the table that is 31 days old.
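For illustration, here is a minimal sketch of the container-per-day approach using the Python azure-storage-blob package (the connection-string environment variable and the cache-YYYYMMDD naming scheme are my own assumptions):

    # Sketch: one container per day; delete the container that is 31 days old.
    import os
    from datetime import datetime, timedelta, timezone
    from azure.core.exceptions import ResourceExistsError, ResourceNotFoundError
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"])  # assumed env var

    today = datetime.now(timezone.utc).date()
    current = f"cache-{today:%Y%m%d}"                      # hypothetical naming scheme
    expired = f"cache-{today - timedelta(days=31):%Y%m%d}"

    try:
        service.create_container(current)   # today's container
    except ResourceExistsError:
        pass

    try:
        service.delete_container(expired)   # drop the 31-day-old container in one call
    except ResourceNotFoundError:
        pass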

I landed here from Google, so I'm leaving this for future readers. Microsoft currently has Blob Storage lifecycle management in public preview.
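For reference, a lifecycle policy for the 30-day expiry described above might look roughly like this (the rule name and the cache/ prefix filter are just examples, not anything from the question):

    {
      "rules": [
        {
          "name": "expire-cache-blobs",
          "enabled": true,
          "type": "Lifecycle",
          "definition": {
            "filters": {
              "blobTypes": [ "blockBlob" ],
              "prefixMatch": [ "cache/" ]
            },
            "actions": {
              "baseBlob": {
                "delete": { "daysAfterModificationGreaterThan": 30 }
              }
            }
          }
        }
      ]
    }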

Related

Is Azure Geo-replication a good fit to reduce downtime during updates?

We currently have an Azure SQL database that supports a web application. It is read-only from a user perspective. The problem is that we have to drop the tables and reload them with new data each hour, which makes the application unavailable for 5 minutes every hour; that is unacceptable.
The documentation on Azure active geo-replication seems a little vague. I thought I might be able to use active geo-replication to fail over to a secondary database, take the primary offline to do the update, and then, when the update is complete, switch back to the original primary and let the secondary re-sync. However, it is not clear from what I have read whether that scenario is possible.
Could someone provide some help with this problem or direct me toward another possible solution?
Thanks
You can give Azure Data Factory a try, since it allows you to append data to a destination table or invoke a stored procedure with custom logic during a copy when SQL Server/Azure SQL is used as a sink.
Azure Data Factory also allows you to incrementally load data (the delta) after an initial full load by using a watermark column that holds the last-updated timestamp or an incrementing key. The delta-loading solution loads the changed data between an old watermark and a new watermark.
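The watermark pattern that ADF automates is simple enough to sketch directly. A rough outline in Python/pyodbc, where the table and column names are hypothetical:

    # Sketch of the watermark pattern (table/column names are hypothetical).
    import pyodbc

    conn = pyodbc.connect(SOURCE_CONNECTION_STRING)  # placeholder, defined elsewhere
    cur = conn.cursor()

    old_mark = cur.execute("SELECT WatermarkValue FROM watermark_table").fetchval()
    new_mark = cur.execute("SELECT MAX(LastModifyTime) FROM source_table").fetchval()

    # Copy only the rows that changed between the two watermarks.
    delta = cur.execute(
        "SELECT * FROM source_table WHERE LastModifyTime > ? AND LastModifyTime <= ?",
        old_mark, new_mark).fetchall()
    # ... write `delta` to the destination, then advance the watermark ...
    cur.execute("UPDATE watermark_table SET WatermarkValue = ?", new_mark)
    conn.commit()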
If the import takes 5 minutes and the data is read-only, I would create a new empty database every hour (using some C# code or a PowerShell script), run the data import into that new empty database, and once it is done, change the connection string in the running production application to point to the new database, then drop the old one. This way you won't have any downtime: while the import is running, the application connects to the old database, and when the import is done, it connects to the new one.
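A rough sketch of that swap in Python/pyodbc, where the import and connection-string steps are hypothetical helpers standing in for whatever your environment uses:

    # Sketch of the hourly swap (connection strings, names, and helpers are placeholders).
    from datetime import datetime, timezone
    import pyodbc

    # CREATE/DROP DATABASE must run outside a transaction, hence autocommit.
    master = pyodbc.connect(MASTER_CONNECTION_STRING, autocommit=True)

    old_db = current_database_name()   # hypothetical: look up the DB the app points at now
    new_db = f"appdb_{datetime.now(timezone.utc):%Y%m%d%H}"

    master.execute(f"CREATE DATABASE [{new_db}]")
    run_import(new_db)                 # hypothetical: load the fresh data
    switch_connection_string(new_db)   # hypothetical: repoint the production app
    master.execute(f"DROP DATABASE [{old_db}]")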

AWS RDS retention period is more than specified in instance settings

I have an AWS RDS instance running PostgreSQL. During instance creation I specified a maximum automated-backup retention period of 7 days, but I can see automated backups from the last 9 days in the snapshots section.
Does anyone have an idea what's going on here?
This is normal in RDS: you can see 1 or 2 more automated snapshots than your maximum retention period. You can refer to the RDS docs for this; it is also covered in the FAQs: https://aws.amazon.com/rds/faqs
This is quite normal. Please find below the relevant snippet from the RDS FAQ:
Q: Why do I have more automated DB snapshots than the number of days in the retention period for my DB instance?
It is normal to have 1 or 2 more automated DB snapshots than the number of days in your retention period. One extra automated snapshot is retained to ensure the ability to perform a point-in-time restore to any time during the retention period. For example, if your backup retention period is set to 1 day, you will require 2 automated snapshots to support restores to any point within the previous 24 hours. You may also see an additional automated snapshot because a new automated snapshot is always created before the oldest automated snapshot is deleted.
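If you want to verify what is actually being retained, you can list the automated snapshots yourself. A quick boto3 sketch (the instance identifier is a placeholder):

    # List automated snapshots for one instance, oldest first.
    import boto3

    rds = boto3.client("rds")
    snaps = rds.describe_db_snapshots(
        DBInstanceIdentifier="my-postgres-instance",  # placeholder
        SnapshotType="automated")["DBSnapshots"]
    for s in sorted(snaps, key=lambda s: s["SnapshotCreateTime"]):
        print(s["DBSnapshotIdentifier"], s["SnapshotCreateTime"])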

Are there utilities to help automate backing up Azure SQL databases to Azure Storage?

I know SQL Azure has automated backups that are retained for 30 days, but for archival purposes I also need to take and retain other backups: daily (last 60 days), weekly (first day of each week for the last 8 weeks), and monthly (first day of each month for the last 12 months). At the end of a period, the oldest backup in that window gets deleted (except for monthly): any daily backup older than 60 days gets deleted, etc. The monthly backups would be moved to cold storage, where they are kept for years.
I should note that my databases are only in the 2-4 GB range, so the cost savings of using Azure's cold storage may be so minimal that it's not even worth bothering moving the monthly backups to cold storage.
I was thinking blob storage is probably the way to go. Are there utilities, scripts, etc. that do this? I don't want to reinvent the wheel. I see Azure has a scheduling service, which would be nice to use because the free tier would more than suffice, but I don't want to overcomplicate things. If I need to run a cheap VM just for backups, I will.
There is Cherry Safe
https://www.cherrysafe.com/Home/Features#sqlAzureBackup
This tool is flexible and not very expensive.
This article may also help you:
http://fabriccontroller.net/backup-and-restore-your-sql-azure-database-using-powershell/
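Whichever tool takes the backups, the retention rules from the question are easy to script. A rough sketch of the keep/delete decision in Python, assuming backups are blobs named by date (the naming convention and container are my own assumptions):

    # Sketch: decide which dated backup blobs to keep (naming scheme is hypothetical).
    from datetime import date

    def keep(backup_date, today):
        age = (today - backup_date).days
        if backup_date.day == 1:                      # monthly: first of each month, ~12 months
            return age <= 365
        if backup_date.weekday() == 0 and age <= 56:  # weekly: Mondays, last 8 weeks
            return True                               # (window sits inside the daily one, per the question)
        return age <= 60                              # daily: last 60 days

    # Then delete anything the policy no longer keeps, e.g. with azure-storage-blob:
    # for blob in container_client.list_blobs():
    #     if not keep(parse_date(blob.name), date.today()):   # parse_date is hypothetical
    #         container_client.delete_blob(blob.name)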
We have a private preview of the long-term backup retention feature in Azure SQL. If you are interested in joining, please email sashan at microsoft.com for details.

Tasks that need to be performed on a certain date in Azure

I am developing an application using Azure Cloud Services and Web API. I would like to allow users who create a consultation session to change the price of that session, but give all users 30 days to leave the session before the new price applies to everyone currently signed up. My first thought was to use queue storage and set the visibility timeout to the 30-day limit, but this seems like it could grow the queue really fast over time, especially if messages must sit for 30 days before running; not to mention the ordering issues. I have looked at the task scheduler as well, but session price changes are not recurring; they happen at random. Is the queue idea a good approach, or is there a better and more efficient way to accomplish this?
What you are trying to do should be done with a relational database. You can use timestamps to record when prices for a session changed. I wouldn't use a queue at all for this; a queue is for passing messages in a distributed system. Your problem is just about tracking which prices changed on which sessions and when, and that data should be modeled in a database.
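For illustration, a minimal sketch of that model in Python/pyodbc (all table and column names are hypothetical); the price in effect is simply the newest row whose effective date has passed:

    # Sketch of the price-history model (table/column names are hypothetical).
    import pyodbc

    conn = pyodbc.connect(CONNECTION_STRING)  # placeholder, defined elsewhere
    conn.execute("""
        CREATE TABLE session_prices (
            session_id   INT          NOT NULL,
            price        DECIMAL(9,2) NOT NULL,
            effective_at DATETIME2    NOT NULL  -- set 30 days ahead when the price is changed
        )""")

    # The price currently in effect for session 42:
    price = conn.execute("""
        SELECT TOP 1 price FROM session_prices
        WHERE session_id = ? AND effective_at <= SYSUTCDATETIME()
        ORDER BY effective_at DESC""", 42).fetchval()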
I think this scenario is a better fit for Azure Scheduler. Programmatically create a job with a one-time recurrence, with the date set 30 days out so it runs once. When the scheduler triggers the job, have its action call back into one of your APIs/services to apply the price change and other required updates, and remove the job from the scheduler as part of that action to keep the jobs list clean. In any case, the premium plan of an Azure Scheduler job collection gives you an unlimited number of jobs.
Hope this is exactly what you were looking for...
I would consider using Azure WebJobs. A WebJob basically gives you the ability to run a .NET console application within the context of an Azure Web App. It can be run on demand, continuously, or on a recurring schedule. If your processing requirements are low enough, WebJobs can also run in the same process your Web App runs in, which saves you $$$ because they are free that way.
You could schedule the WebJob to run once or twice per day, examine the situation, and react as appropriate. Since it's really just a .NET console app, you have ultimate flexibility.
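Whatever language the job is written in, the sweep itself is small. A rough Python sketch with hypothetical table names, applying any price change whose 30-day notice period has elapsed:

    # Sketch of a once-a-day sweep (connection string and table names are hypothetical).
    import pyodbc

    conn = pyodbc.connect(CONNECTION_STRING)  # placeholder, defined elsewhere
    conn.execute("""
        UPDATE s SET s.price = c.new_price
        FROM sessions s
        JOIN pending_price_changes c ON c.session_id = s.session_id
        WHERE c.created_at <= DATEADD(day, -30, SYSUTCDATETIME())""")
    conn.execute("""
        DELETE FROM pending_price_changes
        WHERE created_at <= DATEADD(day, -30, SYSUTCDATETIME())""")
    conn.commit()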

Backup in SQL Azure very slow

I'm currently working on an SQL Database backup strategy in advance of porting our application to Azure. Currently we are using a SQL Server maintenance task to run a backup of our on-premise database once every 15 minutes with a 1 hour retention (thus retaining 4 local copies). We also run a 24 hour backup which gets pushed into Amazon S3.
Now in Azure, I've so far managed to institute a backup of the primary database (to another sql server instance) using the following T-SQL:
-- Run on the target server's master database (the new database name cannot be server-qualified):
CREATE DATABASE backupName AS COPY OF sourceserver.sourceName;
The source database is approximately 3 GB in size and is growing around 5-10% per month. The problem I'm having is that the copy process is painfully slow: I initiated a copy over 30 minutes ago and it's still running! This makes a 15-minute backup schedule seem untenable in Azure.
So I'm wondering if I can qualify a few things with other users:
Is it normal for a 3GB backup to take over 30 minutes (and counting) to replicate to another server instance?
Should I keep the backups on the same server as the source? I'm very nervous, as a few clicks in the Azure portal could wipe out a lot of critical data. I know this is a 'black swan' event, but I just wouldn't feel easy having everything run on a single server instance.
Is there a quicker way to back up an Azure SQL database? I've taken a look at Red Gate's tool, but it seems expensive for sub-daily incremental backups.
Any thoughts on this would be much appreciated!
I should add that I am happy to rethink my backup strategy entirely to make it more Azure-friendly. The key thing is mitigation against administrator error, e.g. dropping a load of important data with a clumsy statement (the shorter the backup interval, the better), plus a 24-hour backup pushed to a different storage medium, e.g. a blob container.
UPDATE ------
I cancelled the initial backup request after waiting 1 hour and re-initiated it. The second backup completed in 5 minutes. I've now gone back to Red Gate to take a look at their hosted backup solution.
How long a database copy takes to run depends not only on the size of the data but also on how many transactions are being run against it at the time, so this option may not be tenable in your situation. Now that you have a backup DB, you can test this for yourself by making a backup of your backup and seeing how long that takes.
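Incidentally, you can watch a copy's progress from the master database of the target server. A quick sketch (the connection string is a placeholder):

    # Check database-copy progress from the target server's master DB.
    import pyodbc

    master = pyodbc.connect(MASTER_CONNECTION_STRING)  # placeholder
    for row in master.execute(
            "SELECT database_id, percent_complete, modify_date FROM sys.dm_database_copies"):
        print(row)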
Your other option is to export a .bacpac file and store it in blob storage. There are libraries for this but I don't have the reference to hand. This will also be a much cheaper option. I'm pretty sure this is what Red Gate are doing under the covers of their service.
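A rough sketch of that route, using the sqlpackage command-line tool to produce the .bacpac and the Python azure-storage-blob package to upload it (server, credentials, and container names are placeholders):

    # Sketch: export a .bacpac with sqlpackage, then push it to blob storage.
    import os, subprocess
    from azure.storage.blob import BlobServiceClient

    subprocess.run([
        "sqlpackage", "/Action:Export",
        "/SourceServerName:myserver.database.windows.net",  # placeholder
        "/SourceDatabaseName:mydb",                         # placeholder
        "/SourceUser:admin", "/SourcePassword:***",         # placeholders
        "/TargetFile:backup.bacpac"], check=True)

    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"])  # assumed env var
    with open("backup.bacpac", "rb") as f:
        service.get_container_client("sql-backups").upload_blob("backup.bacpac", f)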
