Can't change RDS Postgres major version from the AWS console? - amazon-rds

I have an RDS Postgres database, currently sitting at version 14.3.
I want to schedule a major version upgrade to 14.5 to happen during the maintenance window.
I want to do this manually via the console, because last time I did a major version upgrade of Postgres by changing the CDK definition, the deploy command applied the DB version change immediately, resulting in a few minutes of downtime for the database (manifesting as connection errors in the application connecting to the database).
When I go into the AWS RDS console, do a "modify" action and select the "DB Engine Version" - it only shows one option, which is the current DB version: "14.3".
According to the RDS doco, 14.4, 14.5 and 14.6 are all valid upgrade targets: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.PostgreSQL.html#USER_UpgradeDBInstance.PostgreSQL.MajorVersion
Also, when I do aws rds --profile raido-prod describe-db-engine-versions --engine postgres --engine-version 14.3 it shows those versions in the ValidUpgradeTarget collection.
Using CDK version 2.63.0
Database CDK code:
// starting with 14.3 to test the manual upgrade process
const engineVersion14_3 = PostgresEngineVersion.VER_14_3;
const dbParameterGroup14_3 = new ParameterGroup(this, 'Postgres_14_3', {
  description: "RaidoDB postgres " + engineVersion14_3.postgresFullVersion,
  engine: DatabaseInstanceEngine.postgres({
    version: engineVersion14_3,
  }),
  parameters: {
    // none we need right now
  },
});

/* Note that even after this stack has been deployed, this param group
will not be created, I guess it will only be created when it's attached
to an instance? */
const engineVersion14_5 = PostgresEngineVersion.VER_14_5;
// CDK strips out underbars from name, hoping periods will remain
const dbParameterGroup14_5 = new ParameterGroup(this, 'Postgres.14.5.', {
  description: "RaidoDB postgres " + engineVersion14_5.postgresFullVersion,
  engine: DatabaseInstanceEngine.postgres({
    version: engineVersion14_5,
  }),
  parameters: {
    // none we need right now
  },
});
this.pgInstance = new DatabaseInstance(this, 'DbInstance', {
  databaseName: this.dbName,
  instanceIdentifier: this.dbName,
  credentials: Credentials.fromSecret(
    this.raidoDbSecret,
    this.userName,
  ),
  vpc: props.vpc,
  vpcSubnets: {
    subnetGroupName: props.subnetGroupName,
  },
  publiclyAccessible: false,
  subnetGroup: dbSubnetGroup,
  multiAz: false,
  availabilityZone: undefined,
  securityGroups: [props.securityGroup],
  /* Should we size a bigger instance for prod?
  Plan is to wait until it's needed - there will be some downtime for
  changing these. There's also the "auto" thing. */
  allocatedStorage: 20,
  instanceType: InstanceType.of(InstanceClass.T4G, InstanceSize.SMALL),
  engine: DatabaseInstanceEngine.postgres({
    version: engineVersion14_3,
  }),
  parameterGroup: dbParameterGroup14_3,
  /* Not sure what this does, changing it to true didn't allow me to change
  the version in the console. */
  allowMajorVersionUpgrade: true,
  /* 14.3.x -> 14.3.x+1 will happen automatically in maintenance window,
  with potential downtime. */
  autoMinorVersionUpgrade: true,
  // longer in prod
  backupRetention: Duration.days(30),
  /* This enables DB termination protection.
  When stack is destroyed, db will be detached from stack but not deleted. */
  removalPolicy: RemovalPolicy.RETAIN,
  // explain and document the threat model before changing this
  storageEncrypted: false,
  /* "Enhanced monitoring".
  I turned this on while trying to figure out how to change the DB version.
  I still don't think we should have it enabled until we know how/when we'll
  use it - because it costs money in CloudWatch Logging, Metric fees and
  performance (execution and logging of the metrics from the DB server). */
  monitoringInterval: Duration.minutes(1),
  monitoringRole: props.monitoringRole,
  /* Useful for identifying expensive queries and missing indexes.
  Retention default of 7 days is fine. */
  enablePerformanceInsights: true,
  // UTC
  preferredBackupWindow: '11:00-11:30',
  preferredMaintenanceWindow: 'Sun:12:00-Sun:13:00',
});
So, the question: What do I need to do in order to be able to schedule the DB version upgrade in the maintenance window?

I made a lot of changes during the day trying to diagnose the issue before I posted this question, thinking I must be doing something wrong.
When I came back to work the next day, the modify screen DB Engine Version field contained the upgrade options I was originally expecting.
Below is my documentation of the issue (unfortunately, our CDK repo is not public):
Carried out by STO, CDK version was 2.63.0.
This page documents my attempt to manually schedule the DB version upgrade
using the console for application during the maintenance window.
We need to figure out how to do this since the DB upgrade process results in a
few minutes of downtime, so we'd prefer to avoid doing it in-hours.
It's preferred that we figure out how to schedule the upgrade - if the choice
comes down to asking team members to work outside of hours or accepting
downtime - we will generally choose to have the downtime during business hours.
Note that the Postgres instance create time is:
Sat Jan 28 2023 10:34:32 GMT+1000
Action plan
1. In the RDS console, modify DB instance: raido
2. DB engine version: change 14.3 to 14.5
3. Save and select "schedule upgrade for maintenance window" (as opposed to "apply immediately")
Actions taken
2023-02-02
When I tried to just go in and change it manually in the AWS console, the only
option presented on the "modify" screen was 14.3 - there was nothing to
change the version to.
I tried creating a 14.5 param group in CDK, but it just ignored me - the
param group wasn't created, I'm guessing because a param group is only
actually created when it's attached to a DB.
Tried copying and creating a new param group to change the version, but there's
no "version" param in the param group.
Tried manually creating a DB of version 14.5, "sto-manual-14-5", but after the
DB was reported as successfully created (as 14.5), "14.3" was still the only
option in the "modify" screen for raido-db.
Tried creating a t3.micro in case t4g was the problem - no good.
Tried disabling minor version auto-upgrade - no good.
Note that 14.6 is the current most recent version. Both manually created 14.3
and 14.5 databases presented no other versions to upgrade to - so this problem
doesn't seem to be related to the CDK.
List available upgrades: aws rds --profile raido-prod describe-db-engine-versions --engine postgres --engine-version 14.3
Shows 14.4, 14.5 and 14.6 as valid target versions.
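For reference, the same check can also be made against the API from code; below is a minimal sketch using the AWS SDK for JavaScript v3 (@aws-sdk/client-rds). The SDK choice is my assumption - only the CLI was used here - and credentials/region come from the environment (e.g. the raido-prod profile).

import { RDSClient, DescribeDBEngineVersionsCommand } from "@aws-sdk/client-rds";

const client = new RDSClient({});

async function listValidUpgradeTargets(): Promise<void> {
  const result = await client.send(new DescribeDBEngineVersionsCommand({
    Engine: "postgres",
    EngineVersion: "14.3",
  }));
  // Each returned engine version carries a ValidUpgradeTarget list.
  for (const engine of result.DBEngineVersions ?? []) {
    for (const target of engine.ValidUpgradeTarget ?? []) {
      console.log(target.EngineVersion, target.IsMajorVersionUpgrade);
    }
  }
}

listValidUpgradeTargets().catch(console.error);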
This page also shows the versions that should be available as upgrade targets, as at
2023-02-02: https://docs.amazonaws.cn/en_us/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.PostgreSQL.html#USER_UpgradeDBInstance.PostgreSQL.MajorVersion
After all this, I noticed that we had the instance declared as
allowMajorVersionUpgrade: false, so I changed that to true and deployed,
but still can't select any other versions.
Also tried aws rds --profile raido-prod describe-pending-maintenance-actions
but it showed no pending actions.
I found this SO answer talking about RDS problems with T3 instances (note, I
have previously upgraded major versions of T3 RDS postgres instances):
https://stackoverflow.com/a/69295017/924597
On the manually created t3.micro instance, I tried upgrading the instance size
to a standard large instance. Didn't work.
Found this SO answer talking about the problem being related to having
outstanding "recommended actions" entries:
https://stackoverflow.com/a/75236812/924597
We did have an entry talking about "enhanced monitoring".
Tried enabling "enhanced monitoring" via the CDK, because it was listed as
a "recommended action" on the RDS console.
After deploying the CDK stack, the console showed the enhanced monitoring was
enabled, but the "recommended action" to enable it was still listed.
At this point, the console still showed 14.3 as the only option in the list on
the modify screen.
Posted to StackOverflow: Can't change RDS Postgres major version from the AWS console?
Posted to AWS repost: https://repost.aws/questions/QU4zuJeb9OShGfVISX6_Kx4w/
Stopped work for the day.
2023-02-03
In the morning, the RDS console no longer shows the "recommended action" to
enable enhanced monitoring.
The modify screen now shows "14.3, 14.4, 14.5 and 14.6" as options for the
DB Engine Version (as expected and originally desired).
Given the number of changes I tried above, I'm not sure what, if any of them
may have caused the console to start displaying the correct options.
It may have been a temporary issue with RDS, or AWS support may have seen my
question on the AWS repost forum and done something to the account.
Note that I did not raise a formal support request via the AWS console.
I wanted to try and confirm if the enhanced monitoring was the cause of the
issue, so I changed the CDK back (there is no "enhanced monitoring" flag, I
just commented out the code that set the monitoring role and interval).
After deploying the CDK stack, there was no change to the RDS instance -
enhanced monitoring was still enabled.
I did a manual modify via the RDS console to disable enhanced monitoring.
The change did apply and was visible in the console, but the "recommended
actions" list did not show any issues.
At this point I had to attend a bunch of meetings, lunch, etc.
When I came back after lunch, the "recommended actions" list now shows an
"enhanced monitoring" entry.
But the modify console page still shows the 14.3 - 14.6 DB engine options, so
I don't think "enhanced monitoring" was the cause of the problem.
I scheduled the major version upgrade (14.3 -> 14.5, because 14.6 is not yet
supported by the CDK) for the next maintenance window.
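For completeness: the same "apply in the maintenance window" behaviour can also be requested through the ModifyDBInstance API by not applying the change immediately. Below is a rough sketch using the AWS SDK for JavaScript v3 - the SDK choice and the "raido" instance identifier are assumptions on my part; the actual change described above was done via the console.

import { RDSClient, ModifyDBInstanceCommand } from "@aws-sdk/client-rds";

const client = new RDSClient({});

async function scheduleEngineUpgrade(): Promise<void> {
  await client.send(new ModifyDBInstanceCommand({
    DBInstanceIdentifier: "raido",      // assumed instance identifier
    EngineVersion: "14.5",
    AllowMajorVersionUpgrade: true,     // mirrors allowMajorVersionUpgrade in the CDK code
    // false means the change is queued for the next maintenance window
    // rather than being applied immediately.
    ApplyImmediately: false,
  }));
}

scheduleEngineUpgrade().catch(console.error);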
Analysis
My guess is that the issue was caused by having allowMajorVersionUpgrade set
to false. I think changing it to true is what caused the other
version options to eventually show up on the modify page. I think the
reason the options didn't show up on the modify page immediately after deploying
the CDK change comes down to eventual consistency - it just took a while for the
change to propagate through to the console.

Related

Could not get HttpClient cache - No ThreadContext available for thread id=1

I'm working on upgrading our service to use 3.63.0 (upgrading from 3.57.0) and I've noticed the following warning (with stack trace) shows up in the logs that wasn't there on the previous version:
2022-02-18 14:03:41.038 WARN 1088 --- [ main] c.s.c.s.c.c.AbstractHttpClientCache : Could not get HttpClient cache.
com.sap.cloud.sdk.cloudplatform.thread.exception.ThreadContextAccessException: No ThreadContext available for thread id=1.
at com.sap.cloud.sdk.cloudplatform.thread.ThreadLocalThreadContextFacade.lambda$tryGetCurrentContext$0(ThreadLocalThreadContextFacade.java:39) ~[cloudplatform-core-3.63.0.jar:na]
at io.vavr.Value.toTry(Value.java:1414) ~[vavr-0.10.4.jar:na]
at com.sap.cloud.sdk.cloudplatform.thread.ThreadLocalThreadContextFacade.tryGetCurrentContext(ThreadLocalThreadContextFacade.java:37) ~[cloudplatform-core-3.63.0.jar:na]
at io.vavr.control.Try.flatMapTry(Try.java:490) ~[vavr-0.10.4.jar:na]
at io.vavr.control.Try.flatMap(Try.java:472) ~[vavr-0.10.4.jar:na]
at com.sap.cloud.sdk.cloudplatform.thread.ThreadContextAccessor.tryGetCurrentContext(ThreadContextAccessor.java:84) ~[cloudplatform-core-3.63.0.jar:na]
at com.sap.cloud.sdk.cloudplatform.connectivity.RequestScopedHttpClientCache.getCache(RequestScopedHttpClientCache.java:28) ~[cloudplatform-connectivity-3.63.0.jar:na]
at com.sap.cloud.sdk.cloudplatform.connectivity.AbstractHttpClientCache.tryGetOrCreateHttpClient(AbstractHttpClientCache.java:78) ~[cloudplatform-connectivity-3.63.0.jar:na]
at com.sap.cloud.sdk.cloudplatform.connectivity.AbstractHttpClientCache.tryGetHttpClient(AbstractHttpClientCache.java:46) ~[cloudplatform-connectivity-3.63.0.jar:na]
at com.sap.cloud.sdk.cloudplatform.connectivity.HttpClientAccessor.tryGetHttpClient(HttpClientAccessor.java:153) ~[cloudplatform-connectivity-3.63.0.jar:na]
at com.sap.cloud.sdk.cloudplatform.connectivity.HttpClientAccessor.getHttpClient(HttpClientAccessor.java:131) ~[cloudplatform-connectivity-3.63.0.jar:na]
at com.octanner.mca.service.MarketingCloudApiContactService.uploadContacts(MarketingCloudApiContactService.java:138) ~[classes/:na]
...
This happens when the following calls are made...
Using the lower level API
HttpClient httpClient = HttpClientAccessor.getHttpClient(destination); // warning happens here
ODataRequestResultMultipartGeneric batchResult = requestBatch.execute(httpClient);
Using the higher level API
service
  .getAllContactOriginData()
  .withQueryParameter("$expand", "AdditionalIDs")
  .top(size)
  .filter(filter)
  .executeRequest(destination); // warning happens here
Even though this warning shows up in the logs, the service requests do continue to work as expected. It's just a little concerning to see this, and I'm wondering if maybe I have something misconfigured. I reviewed all of the Java docs and the troubleshooting page and didn't see anything out of the ordinary other than how I am fetching my destination, but even using the DestinationAccessor didn't seem to make a difference. Also, I'm not doing any asynchronous or multi-tenant processing.
Any help you or guidance you can give on this would be appreciated!
Cheers!
Such an issue is often the result of missing Spring Boot annotations - especially in synchronous executions.
Please refer to our documentation to learn more about the SAP Cloud SDK Spring Boot integration.
Edit Feb. 28th 2022
It is safe to ignore the logged warning if your application does not need any of the SAP Cloud SDK's multitenancy features.
Error Cause
The SAP Cloud SDK for Java recently (in version 3.63.0) introduced a change to the thread propagation behavior of the HttpClientCache.
With that change, we also adapted the logging in case the propagation didn't work as expected - this is often caused by not using the ThreadContextExecutor for wrapping asynchronous operations.
This is the reason for logs like the one described by the issue author.
Planned Mitigation
In the meantime, we realized that these WARN logs are causing confusion on the consumer side.
We are working on improving the situation by degrading the log level to INFO for the message and to DEBUG for the exception.

Core data and cloudkit sync wwdc 2019 not working for beta 3

I am trying to replicate the result of WWDC talk on syncing core data with cloud kit automatically.
I tried three approaches:
1. Making a new Master-Detail view app and following the steps in the WWDC 2019 talk - in this case no syncing happens.
2. Downloading the sample WWDC 2019 app - also in this case no syncing happens.
3. Making a small app with a small Core Data model and a CloudKit container - in this case syncing happens, but I have to restart the app. I suspected it had to do with history management, so I observed the NSPersistentStoreRemoteChange notification, but nothing is received.
Appreciate any help.
I also played around with Core Data and iCloud and it works perfectly. I would like to list some important points that may help you go further:
You have to run the app on a real device with an iCloud account. (We can now test iCloud sync on the Simulator, but it will not get notifications automatically; we have to trigger a sync manually by selecting Debug > Trigger iCloud Sync.)
Make sure you added the Push Notifications and iCloud capabilities to your app. Make sure that you don't have issues with the iCloud container (in this case, you will see red text in the iCloud section in Xcode).
In order to refresh the view automatically, you need to add this line into your Core Data Stack: container.viewContext.automaticallyMergesChangesFromParent = true.
Code:
public lazy var persistentContainer: NSPersistentCloudKitContainer = {
    /*
     The persistent container for the application. This implementation
     creates and returns a container, having loaded the store for the
     application to it. This property is optional since there are legitimate
     error conditions that could cause the creation of the store to fail.
    */
    let container = NSPersistentCloudKitContainer(name: self.modelName)
    container.viewContext.automaticallyMergesChangesFromParent = true
    container.loadPersistentStores(completionHandler: { (storeDescription, error) in
        if let error = error as NSError? {
            // Replace this implementation with code to handle the error appropriately.
            // fatalError() causes the application to generate a crash log and terminate. You should not use this function in a shipping application, although it may be useful during development.
            /*
             Typical reasons for an error here include:
             * The parent directory does not exist, cannot be created, or disallows writing.
             * The persistent store is not accessible, due to permissions or data protection when the device is locked.
             * The device is out of space.
             * The store could not be migrated to the current model version.
             Check the error message to determine what the actual problem was.
            */
            fatalError("Unresolved error \(error), \(error.userInfo)")
        }
    })
    return container
}()
When you add some data, normally you should see console logs beginning with CloudKit: CoreData+CloudKit: ..........
Sometimes the data is not synced immediately; in this case, I force close the app and build a new one, then the data gets synced.
There was one time the data got synced after a few hours :(
I found that the NSPersistentStoreRemoteChange notification is posted by the NSPersistentStoreCoordinator and not by the NSPersistentCloudKitContainer, so the following code solves the problem:
// Observe Core Data remote change notifications.
NotificationCenter.default.addObserver(
    self, selector: #selector(self.storeRemoteChange(_:)),
    name: .NSPersistentStoreRemoteChange, object: container.persistentStoreCoordinator)
Also ran into the issue with .NSPersistentStoreRemoteChange notification not being sent.
Code from Apples example:
// Observe Core Data remote change notifications.
NotificationCenter.default.addObserver(
    self, selector: #selector(type(of: self).storeRemoteChange(_:)),
    name: .NSPersistentStoreRemoteChange, object: container)
The solution for me was to not set the container as the object for the notification, but nil instead. It is not used anyway, and leaving it set prevents the notification from being received:
// Observe Core Data remote change notifications.
NotificationCenter.default.addObserver(
    self, selector: #selector(type(of: self).storeRemoteChange(_:)),
    name: .NSPersistentStoreRemoteChange, object: nil)
Update:
As per this answer: https://stackoverflow.com/a/60142380/3187762
The correct way would be to set container.persistentStoreCoordinator as object:
// Observe Core Data remote change notifications.
NotificationCenter.default.addObserver(
    self, selector: #selector(type(of: self).storeRemoteChange(_:)),
    name: .NSPersistentStoreRemoteChange, object: container.persistentStoreCoordinator)
I had the same problem; the reason was that iCloud Drive must be enabled on your devices. Check it in the Settings of every device.
I understand this answer comes late and is not actually specific to the WWDC 19 SynchronizingALocalStoreToTheCloud Apple sample project the OP refers to. But I had syncing issues (not upon launch, when it synced fine, but only while the app was active but idle, which seems to be case 3 of the original question) in a project that uses Core Data + CloudKit with NSPersistentCloudKitContainer, and I believe the same problems I had - and now apparently have solved - might affect other users reading this question in the future.
My app was built using Xcode's 11 Master-Detail template with Core Data + CloudKit from the start, so I had to do very little to have syncing work initially:
1. Enable Remote Notifications Background Mode in Signing & Capabilities for my target;
2. Add the iCloud capability for CloudKit;
3. Select the container iCloud.com.domain.AppName;
4. Add viewContext.automaticallyMergesChangesFromParent = true.
Basically, I followed Getting Started With NSPersistentCloudKitContainer by Andrew Bancroft and this was enough to have the MVP sync between devices (Catalina, iOS 13, iPadOS 13), not only upon launch but also when the app was running and active (thanks to step 4 above) and another device edited/added/deleted an object. Being the Xcode template, it did not have the additional customisations / advanced behaviours of WWDC 2019's sample project, but it actually accomplished the goal pretty well and I was satisfied, so I moved on to other parts of this app's development and stopped thinking about sync.
A few days ago, I noticed that the iOS/iPadOS app was now only syncing upon launch, and not while the app was active and idle on screen; on macOS the behaviour was slightly different, because a simple command-tab triggered sync when reactivating the app, but again, if the Mac app was frontmost, no syncing for changes coming from other devices.
I initially blamed a couple of modifications I did in the meantime:
In order to have the sqlite accessible in a Share Extension, I moved the container in an app group by subclassing NSPersistentCloudKitContainer;
I changed the capitalisation in the name of the app and, since I could not delete the CloudKit database, I created a new container named iCloud.com.domain.AppnameApp (CloudKit is case insensitive, apparently, and yes, I should really start to care less about such things).
While I was pretty sure that I saw syncing work as well as before after each one of these changes, having sync (apparently) suddenly break convinced me, for at least a few hours, that one of those modifications from the default path had caused the notifications to stop being received while the app was active, and that the merge would then only happen upon launch, as I was seeing, because the running app was not made aware of changes.
I should mention, because this could help others in my situation, that I was sure notifications were triggered upon Core Data context saves, because the CloudKit Dashboard was showing the notifications being sent.
So, I tried a few times clearing Derived Data (one never knows), deleting the apps on all devices and resetting the Development Environment in CloudKit's Dashboard (something I already did periodically during development), but I still had the issue of the notifications not being received.
Finally, I realised that resetting the CloudKit environment and deleting the apps was indeed useful (and I actually rebooted everything just to be safe ;) but I also needed to delete the app data from iCloud (from iCloud's Settings screen on the last iOS device where the app was still installed, after deleting from the others) if I really wanted a clean slate; otherwise, my somewhat messed up database would sync back to the newly installed app.
And indeed, a truly clean slate with a fresh Development Environment, newly installed apps and rebooted devices resumed the notifications being detected on the devices even when the apps are frontmost. So, if you feel your setup is correct and have already read enough times that viewContext.automaticallyMergesChangesFromParent = true is the only thing you need, but still can't see changes come from other devices, don't exclude that something could have been messed up beyond your control (don't get me wrong: I'm 100% sure that it must have been something that I did!) and try to have a fresh start... it might seem obscure, but what isn't with the syncing method we are choosing for our app?

Mongo doesn't save data into disk

In our project we often have a problem where mongo doesn't save its state to disk, and after rebooting the application we lose data. I could not determine when and why this happens - somehow and somewhen :). Does anybody know how to synchronize mongodb storage to disk with some API? We use the mongorito ODM. It would be a pleasure to hear any suggestions.
Some details.
Mongo version 3.2.
Application - it is an Electron application. Under the hood it uses mongo as storage - we use mongo on the client side and install it as a Windows service. The application starts, makes different transactions, reads/writes data from/to the mongo db - nothing strange. When we close this application and reopen it next time, we cannot find the last rows (documents) in some collections that were successfully (according to mongo's answers) saved. We have no errors.
Can anyone explain what the write concern is and how to set it up so it doesn't wait 60 seconds before flushing the data - maybe this is the reason?
Some code for the db connect/disconnect; app refers to the Electron application:
const {Database} = require('mongorito');

const db = new Database(__DBPATH__);
db.connect();
db.register(__MONGORITO_MODEL__);

app.on('window-all-closed', () => {
  db.disconnect();
});
I'd take a look at the write concern setting within your application and make sure it's set to the requirements of your business - https://docs.mongodb.com/manual/reference/write-concern/
Also, make sure you're running a replica set in your production environment 👍
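To make that concrete, journaled writes can also be requested from the client side. Here is a rough sketch using the official MongoDB Node.js driver (mongorito wraps this driver, but whether it passes these options through is an assumption on my part; the URI, database and collection names are just examples):

import { MongoClient } from "mongodb";

async function saveWithJournalAck(): Promise<void> {
  // w: 1     -> wait for the server to acknowledge the write
  // j: true  -> wait until the write has been committed to the on-disk journal,
  //             instead of relying on the periodic background flush.
  const client = new MongoClient("mongodb://localhost:27017", {
    writeConcern: { w: 1, j: true },
  });
  await client.connect();
  try {
    const docs = client.db("appdb").collection("documents");
    // The write concern can also be set per operation.
    await docs.insertOne({ createdAt: new Date() }, { writeConcern: { w: 1, j: true } });
  } finally {
    await client.close();
  }
}

saveWithJournalAck().catch(console.error);

With j: true, the acknowledgment only comes back after the write has hit the journal, which is the client-side counterpart of running the server with journaling enabled.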
Thanks to everybody, I've solved the problem. The reason was journaling. I turned on journaling for the mongodb service and the problem is gone.
mongod.exe --journal

Can't backup to S3 with OpsCenter 5.2.1

I upgraded OpsCenter from 5.1.3 to 5.2.0 (and then to 5.2.1). I had a scheduled backup to the local server and an S3 location configured before the upgrade, which worked fine with OpsCenter 5.1.3. I made no changes to the scheduled backup during or after the upgrade.
The day after the upgrade, the S3 backup failed. In opscenterd.log, I see these errors:
2015-09-28 17:00:00+0000 [local] INFO: Instructing agents to start backups at Mon, 28 Sep 2015 17:00:00 +0000
2015-09-28 17:00:00+0000 [local] INFO: Scheduled job 458459d6-d038-41b4-9094-7d450e4bac6f finished
2015-09-28 17:00:00+0000 [local] INFO: Snapshots started on all nodes
2015-09-28 17:00:08+0000 [] WARN: Marking request d960ad7b-2ccd-40a4-be7e-8351ac038c53 as failed: {'sstables': {u'solr_admin': {u'solr_resources': {'total_size': 155313, 'total_files': 12, 'done_files': 0, 'errors': [u'{:type :opsagent.backups.destinations/destination-not-found, :message "Destination missing: 62f5a26abce7463bad9deb7380979c4a"}', u'{:type :opsagent.backups.destinations/destination-not-found, :message "Destination missing: 62f5a26abce7463bad9deb7380979c4a"}', u'{:type :opsagent.backups.destinations/destination-not-found, :message "Destination missing: 62f5a26abce7463bad9deb7380979c4a"}', shortened for brevity.
The S3 location no longer appears in OpsCenter when I edit the scheduled backup job. When I try to re-add the S3 location, using the same bucket and credentials as before, I get the following error:
Location validation error: Call to /local/backups/destination_validate timed out.
Also, I don't know if this is related, but for completeness, I see some of these errors in the opscenterd.log as well:
WARN: No http agent exists for definition file update. This is likely due to SSL import failure.
I get this behavior with either DataStax Enterprise 4.5.1 or 4.7.3.
I have been having the exact same problem since updating to OpsCenter 5.2.x and just was able to get it working properly.
I removed all the settings suggested in the previous answer and then created new buckets in us-west-1, us-west-2 and us-standard. After this I was able to successfully add all of those as destinations quickly and easily.
It appears to me that the problem is that OpsCenter may be trying to list the objects in the bucket that you configure initially; in my case, the 2 existing buckets we were using had 11TB and 19GB of data in them respectively.
This could explain why increasing the timeout for some worked and not others.
Hope this helps.
Try adding the remote_backup_region property to the cluster configuration file under the [agents] heading in "cluster-name".conf. Valid values are: us-standard, us-west-1, us-west-2, eu-west-1, ap-northeast-1, ap-southeast-1
Does that help?
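For reference, based on the description above, the entry in the cluster configuration file would look something like this (the region value is just an example):

[agents]
remote_backup_region = us-west-2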
The problem was resolved by a combination of 2 things:
1. Delete the entire contents of the existing S3 bucket (or create a new bucket, as previously suggested by #kaveh-nowroozi).
2. Edit /etc/datastax-agent/datastax-agent-env.sh and increase the heap size to 512M, as suggested by a DataStax engineer. The default was set at 128M and I kept doubling it until backups became successful.

SQL Azure unexpected database deletion/recreation

I've been scratching my head on this for hours, but can't seem to figure out what's wrong.
Here's our project basic setup:
MVC 3.0 Project with ASP.NET Membership
Entity Framework 4.3, Code First approach
Local environment: local SQL Server with 2 MDF database files attached (aspnet.mdf + entities.mdf)
Server environment: Windows Azure + 2 SQL Azure databases (aspnet and entities)
Here's what we did:
Created local and remote databases, modified web.config to use SQLEXPRESS connection strings in debug mode and SQL Azure connection strings in release mode
Created a SampleData class extending DropCreateDatabaseAlways<Entities> with a Seed method to seed data.
Used System.Data.Entity.Database.SetInitializer(new Models.SampleData()); in Application_Start to seed data to our databases.
Ran app locally - tables were created and seeded, all OK.
Deployed, ran remote app - tables were created and seeded, all OK.
Added pre-processor directives to stop destroying the Entity database at each application start on our remote Azure environment:
#if DEBUG
System.Data.Entity.Database.SetInitializer(new Models.SampleData());
#else
System.Data.Entity.Database.SetInitializer<Entities>(null);
#endif
Here's where it got ugly
We enabled Migrations using NuGet, with AutomaticMigrationsEnabled = true;
Everything was running smooth and nice. We left it cooking for a couple days
Today, we noticed an unknown bug on the Azure environment:
we have several classes deriving from a superclass SuperClass
the corresponding Entity table stores all of these objects in the same SuperClass table, using a discriminator to know which column to feed from when loading the various classes
While the loading went just fine before today, it doesn't anymore. We get the following error message:
The 'Foo' property on 'SubClass1' could not be set to a 'null' value. You must set this property to a non-null value of type 'Int32'.
After a quick check, our SuperClass table has columns Foo and Foo1. Logical enough, since SuperClass has 2 subclasses SubClass1 and SubClass2, each with a Foo property. In our case, Foo is NULL but Foo1 has an int32 value. So the problem is not with the database - rather, it would seem that the link between our Model and Database has been lost. The discriminator logic was corrupted.
Trying to find indications on what could've gone wrong, we noticed several things:
Even though we never performed any migration on the SQL Azure Entity database, the database now has a _MigrationHistory table
The _MigrationHistory table has one record:
MigrationID: 201204102350574_InitialCreate
CreatedOn: 4/10/2012 11:50:57 PM
Model: <Binary data>
ProductVersion: 4.3.1
Looking at other tables, most of them were emptied when this migration happened. Only the tables that were initially seeded with SampleData remained untouched.
Checking in with the SQL Azure Management portal, our Entity database shows the following creation date: 4/10/2012 23:50:55.
Here is our understanding
For some reason, SQL Azure deleted and recreated our database
The _MigrationHistory table was created in the process, registering a starting point to test the model against for future migrations
Here are our Questions
Who / What triggered the database deletion / recreation?
How could EF re-seed our sample data since Application_Start has System.Data.Entity.Database.SetInitializer<Entities>(null);?
EDIT: Looking at what could've gone wrong, we noticed one thing we didn't respect in this SQL Azure tutorial: we didn't remove PersistSecurityInfo from our SQL Azure Entity database connection string after the database was created. Can't see why on Earth it could have caused the problem, but still worth mentioning...
Never mind, found the cause of our problem. In case anybody wonders: we hadn't made any Azure deployment since the addition of the pre-processor directives. MS must have restarted the machine our VM resided on, and the new VM recreated the database using seed data.
Lesson learned: always do frequent Azure deployments.
