Azure "/_partitionKey" - azure

I created multiple collections through code without realising the importance of having a partition key. I have since read that the only way to add a partition key and redistribute the data is by deleting the collection and recreating it.
I don't really want to have to do this as I already have quite a lot of data and want to avoid the downtime. When I look at the Scale & Settings blade in Azure for each of my collections, I see the partition key shown as "/_partitionKey".
Can someone explain this? I thought my partition key was null, but it looks like Microsoft has given me one called _partitionKey. Can I not just add _partitionKey to my documents and run a script to update them all to the key I want to use (e.g. country)?

This is a new feature which allows non-partitioned collections (now called containers in the latest SDKs) to start using partitions with zero downtime. The big caveat is that you need to be using the latest SDKs, which will be announced as GA very soon; in fact most are already published, and we're just waiting on documentation publishing and the like. The portal got the feature first since it is already using the latest SDK under the covers.
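Once the container is on the new model, back-filling the partition key value on existing documents could look roughly like the sketch below. This assumes the v3 .NET SDK (Microsoft.Azure.Cosmos); the CustomerDoc type and the country property are hypothetical, and PartitionKey.None is used to address documents written before the container had a partition key.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public class CustomerDoc
{
    public string id { get; set; }
    public string _partitionKey { get; set; }   // system-assigned partition key path "/_partitionKey"
    public string country { get; set; }         // value we want to partition by (hypothetical)
}

public static class PartitionKeyBackfill
{
    public static async Task BackfillAsync(Container container, string docId)
    {
        // Documents created before the container had a partition key are
        // addressed with PartitionKey.None.
        ItemResponse<CustomerDoc> response =
            await container.ReadItemAsync<CustomerDoc>(docId, PartitionKey.None);

        CustomerDoc doc = response.Resource;
        doc._partitionKey = doc.country;

        // Rewrite the document under its new logical partition key value.
        await container.ReplaceItemAsync(doc, docId, new PartitionKey(doc._partitionKey));
    }
}
```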

Related

Possible to have the partition key added server side in Azure Table Storage?

I noticed that Windows Azure Diagnostics uses a UTC ticks primary key as a method of making it easy to access entries by time ranges. I would like to implement a similar system for my table.
However, a major issue is that the systems doing the uploading will not necessarily have their clocks synced to the millisecond (not to mention differences in ping time), so setting the partition key locally and then uploading doesn't work well (I am running into all kinds of race conditions). Ideally I would like to guarantee that any time a table entry is made, its partition key is greater than or equal to any partition key already in the table (since that's how time works).
The only way I can think of to ensure this guarantee is to have the "timestamp" partition key set server side. Is there some way to make that happen, such as via a server-side script?
Note: I realize that a timestamp is already added when an entry is made, but tables are not indexed by this timestamp.
I would recommend using the Twitter Snowflake approach for that purpose. I had a very similar requirement and that approach worked perfectly for me.
I used Flake ID Generator, a .NET implementation based on Twitter's Snowflake.
The generator can be deployed independently to different Azure instances (or run as an independent service); I was generating IDs independently on each Azure service instance. The generated 64-bit IDs are directly sortable and always unique (even if they come from different instances at the same time). You also have access to its source code, so you can add customizations if needed.
I hope that will help.
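For illustration, here is a minimal snowflake-style generator sketch (not the Flake ID Generator's actual API): 41 bits of milliseconds since a custom epoch, 10 bits of worker id, and a 12-bit per-millisecond sequence. The class name and bit layout are illustrative only.

```csharp
using System;

public class SnowflakeIdGenerator
{
    // Custom epoch; any fixed past UTC instant works as long as it never changes.
    private static readonly DateTime Epoch = new DateTime(2014, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    private readonly long _workerId;        // unique per instance, 0..1023 (10 bits)
    private readonly object _lock = new object();
    private long _lastTimestamp = -1;
    private long _sequence;

    public SnowflakeIdGenerator(long workerId)
    {
        if (workerId < 0 || workerId > 1023)
            throw new ArgumentOutOfRangeException("workerId");
        _workerId = workerId;
    }

    public long NextId()
    {
        lock (_lock)
        {
            long timestamp = CurrentMilliseconds();

            if (timestamp == _lastTimestamp)
            {
                _sequence = (_sequence + 1) & 0xFFF;    // 12-bit sequence per millisecond
                if (_sequence == 0)
                {
                    // Sequence exhausted for this millisecond: spin until the next one.
                    while (timestamp <= _lastTimestamp)
                        timestamp = CurrentMilliseconds();
                }
            }
            else
            {
                // Note: a production implementation should also handle the clock
                // moving backwards; this sketch does not.
                _sequence = 0;
            }

            _lastTimestamp = timestamp;

            // 41 bits of timestamp | 10 bits of worker id | 12 bits of sequence.
            return (timestamp << 22) | (_workerId << 12) | _sequence;
        }
    }

    private static long CurrentMilliseconds()
    {
        return (long)(DateTime.UtcNow - Epoch).TotalMilliseconds;
    }
}
```

Because Table Storage partition keys are strings and are compared lexically, the generated id would typically be stored zero-padded to a fixed width (for example id.ToString("D19")) so that string ordering matches numeric ordering.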

Azure Web Site Migrations & Concurrency

I have two Azure Websites set up - one that serves the client application with no database, another with a database and WebApi solution that the client gets data from.
I'm about to add a new table to the database and populate it with data using a temporary Seed method that I only plan on running once. I'm not sure what the best way to go about it is though.
Right now I have the database initializer set to MigrateDatabaseToLatestVersion and I've tested this update locally several times. Everything seems good to go but the update / seed method takes about 6 minutes to run. I have some questions about concurrency while migrating:
What happens when someone performs CRUD operations against the database while business logic and tables are being updated in this 6-minute window? I mean - the time between when I hit "publish" from VS, and when the new bits are actually deployed. What if the seed method modifies every entry in another table, and a user adds some data mid-seed that doesn't get hit by this critical update? Should I lock the site while doing it just in case (far from ideal...)?
Any general guidance on this process would be fantastic.
Operations like creating a new table or adding new columns should have only minimal impact on performance and be transparent to the application, especially if the application applies the recommended pattern for dealing with transient faults (for instance by leveraging the Enterprise Library).
Mass updates or reindexing could cause contention and affect the application's performance or even cause errors. Depending on the case, transient fault handling could work around that as well.
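For reference, the transient-fault pattern mentioned above boils down to retrying the operation a few times with a back-off when a transient error is detected. The following is a hand-rolled sketch, not the Enterprise Library's actual API, and the IsTransient check is a placeholder:

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Execute an operation, retrying a few times with a simple linear back-off
    // when a transient error is detected.
    public static T Execute<T>(Func<T> operation, int maxAttempts = 3, int delayMs = 500)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex)
            {
                if (attempt >= maxAttempts || !IsTransient(ex))
                    throw;
                Thread.Sleep(delayMs * attempt);
            }
        }
    }

    // Placeholder: a real implementation would inspect SQL error numbers,
    // HTTP status codes, timeouts, and so on.
    private static bool IsTransient(Exception ex)
    {
        return ex is TimeoutException;
    }
}
```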
Concurrent modifications to data that is being upgraded could cause problems that would be more difficult to deal with. These are some possible approaches:
Maintenance window
The simplest and safest approach is to take the application offline, back up the database, upgrade the database, update the application, test, and bring the application back online.
Read-only mode
This approach avoids making the application completely unavailable, by keeping it online but disabling any feature that changes the database. The users can still query and view data while the application is updated.
Staged upgrade
This approach is based on carefully planned sequences of changes to the database structure and data and to the application code so that at any given stage the application version that is online is compatible with the current database structure.
For example, let's suppose we need to introduce a "date of last purchase" field to a customer record. This sequence could be used (a sketch of the schema change follows the steps):
Add the new field to the customer record in the database (without updating the application). Set the new field's default value to NULL.
Update the application so that for each new sale, the date of last purchase field is updated. For old sales the field is left unchanged, and the application at this point does not query or show the new field.
Execute a batch job on the database to update this field for all customers where it is still NULL. A delay could be introduced between updates so that the system is not overloaded.
Update the application to start querying and showing the new information.
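As an illustration of step 1, a minimal EF Code First migration (System.Data.Entity.Migrations) could add the column as nullable so that existing rows and the currently deployed application are unaffected; the table, column, and class names here are hypothetical:

```csharp
using System.Data.Entity.Migrations;

public partial class AddLastPurchaseDate : DbMigration
{
    public override void Up()
    {
        // Nullable, so existing rows keep working with the old application version.
        AddColumn("dbo.Customers", "LastPurchaseDate", c => c.DateTime(nullable: true));
    }

    public override void Down()
    {
        DropColumn("dbo.Customers", "LastPurchaseDate");
    }
}
```

Step 3 could then be a background job that repeatedly updates a small batch of rows (for example an UPDATE TOP (1000) ... WHERE LastPurchaseDate IS NULL statement), pausing between batches, until no rows remain.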
There are several variations of this approach, such as the concept of "expansion scripts" and "contraction scripts" described in Zero-Downtime Database Deployment. This could be used along with feature toggles to change the application's behavior dynamically as the upgrade stages are executed.
New columns could be added to records to indicate that they have been converted. The application logic could be adapted to deal with records in the old version and in the new version concurrently.
The Entity Framework may impose some additional limitations on these options, because it generates the SQL statements on behalf of the application, so you would have to take that into consideration when planning the stages.
Staging environment
Changing the production database structure and executing mass data changes is risky business, especially when it must be done in a specific sequence while data is being entered and changed by users. Your options to revert mistakes can be severely limited.
It would be necessary to do extensive testing and simulation in a separate staging environment before executing the upgrade procedures on the production environment.
I agree with the maintenance window idea from Fernando. But here is the approach I would take given your question.
Make sure your database is backed up before doing anything (I am assuming it's SQL Azure).
Put up a maintenance page on the client application.
Run the migration against your database via Visual Studio (I am assuming you are doing this through the console) or via a unit test.
Publish the website / Web API sites.
Verify your changes.
The main thing about working with the seed method via Entity Framework is that it's easy to get it wrong, and without a proper backup while running against prod you could get yourself in trouble real fast. I would probably run it through your test database/environment first (if you have one) to verify that what you want is actually happening.
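As a rough sketch of making the seed idempotent so that re-running it cannot duplicate rows, AddOrUpdate from System.Data.Entity.Migrations can be used; the context and entity names below are hypothetical:

```csharp
using System.Data.Entity;
using System.Data.Entity.Migrations;

public class Country
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class MyDbContext : DbContext
{
    public DbSet<Country> Countries { get; set; }
}

internal sealed class Configuration : DbMigrationsConfiguration<MyDbContext>
{
    public Configuration()
    {
        AutomaticMigrationsEnabled = false;
    }

    protected override void Seed(MyDbContext context)
    {
        // AddOrUpdate matches on the given key expression, so running Seed
        // again updates existing rows instead of inserting duplicates.
        context.Countries.AddOrUpdate(
            c => c.Name,
            new Country { Name = "United Kingdom" },
            new Country { Name = "France" });

        context.SaveChanges();
    }
}
```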

How to optimize transactions costs from testing existence of keys?

I'm designing an application using Azure Storage Blobs/Table/Queue, handling massive amount of data.
One important aspect of the application is that work will only be done if a given key doesn't exist, and determining the existence of a key is a frequent and intensive task.
I need to minimize, as much as possible, the billable transactions generated by these key-existence checks.
It could be either against blobs or tables.
I looked at the document Understanding Windows Azure Storage Billing – Bandwidth, Transactions, and Capacity. It seems that 404 errors are only not counted when they come from anonymous requests, i.e. authenticated requests appear to be billed even when they fail.
I was also thinking of using a TableBatchOperation to check 100 keys at once, maybe using a Replace or Merge, and determining from the results whether the keys actually existed (I haven't tried this; actually, I got the idea while writing).
Any good hacks are welcome.
You should use Windows Azure Caching:
Load all existing keys in the cache
Each time you add a record to Table Storage, also add it to cache
Once you've done that, have your application check cache first. If the item is not present there, check Table Storage just to be sure (to cover edge cases). But 99% of the time, if the item has already been processed the key will be available in the cache and you won't need to query Table Storage (this will drastically reduce transactions to Table Storage).
If using Windows Azure Caching is not an option there are alternatives, such as using MemoryCache, saving all keys in a file, etc.
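A rough sketch of the cache-first check, here using the in-process MemoryCache mentioned as an alternative and the classic Microsoft.WindowsAzure.Storage table client (class, table, and key names are hypothetical):

```csharp
using System;
using System.Runtime.Caching;
using Microsoft.WindowsAzure.Storage.Table;

public class KeyExistenceChecker
{
    private readonly CloudTable _table;
    private readonly MemoryCache _cache = MemoryCache.Default;

    public KeyExistenceChecker(CloudTable table)
    {
        _table = table;
    }

    public bool Exists(string partitionKey, string rowKey)
    {
        string cacheKey = partitionKey + "|" + rowKey;

        // 1. Cheap in-memory check first: no billable storage transaction.
        if (_cache.Contains(cacheKey))
            return true;

        // 2. Fall back to Table Storage to cover edge cases (cold cache,
        //    keys written by other instances, etc.).
        TableResult result = _table.Execute(
            TableOperation.Retrieve<DynamicTableEntity>(partitionKey, rowKey));

        bool exists = result.Result != null;
        if (exists)
            _cache.Add(cacheKey, true, DateTimeOffset.UtcNow.AddHours(1));

        return exists;
    }
}
```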

aspnet_regsql and deployment to Azure

I'm pretty new to Azure and trying to work on deploying an already existing MVC 3 website (I'm late to the project).
It has membership information (where the tables should be generated by aspnet_regsql) and it links those tables to application-specific tables. To get it into a working state I need to insert some form of "default data", as the code (unfortunately) makes some assumptions about what should be in the database.
No bother: I have an app that creates a default database and inserts the required data, and I can then import that into Azure. Except this doesn't work, as Azure demands clustered indexes, and aspnet_regsql creates some of the auth table keys as nonclustered, so I'm now left having to alter these tables as part of the process to make the primary keys clustered.
I was just wondering if aspnet_regsql has been superseded somehow, given that Azure demands clustered indexes? Am I missing a trick here, or is writing a script to modify the clustering of these indexes the sensible approach?
Found the solution elsewhere:
http://support.microsoft.com/kb/2006191/de
If you use the Universal Providers, you don't need the scripts.
Check out Hanselman's post. The Universal Providers will manage the database creation if you are working with SQL Server, Compact Edition, or Windows Azure Database.
There are a lot of references to updated scripts, including some on my own blog, that are no longer needed.

Update crashes after Core Data automatic lightweight migration

I recently submitted an upgrade of my app which included a lightweight Core Data migration (including new fields in existing tables and a couple of new tables). I followed every tip regarding this migration, including some I found on this site.
I thoroughly tested the update on three different devices and it all went fine.
However, this update is crashing on all my devices, and probably for all my customers as well. I can't explain why this is happening.
Could you please help me understand this debacle?
To truly test your app and migration, you need to run your original app to create a data store according to the original data model. Then you need to run your new app, opening the data store that was generated with the original app. This can be a real pain and is easier (at least initially) to do in the Simulator, because you have more control over the file system and can swap in a saved original data store. On a device you need to regenerate the original data store for each test.
If you are testing on your own development devices, then you have already migrated your data store. Is it possible that your test devices created their data stores with the new data model and never actually performed a migration?
I generally only use automatic migration during beta testing, for quick revisions; other than that I always use a mapping model, so that you have control.
The other issue is that if your model shifts far enough between releases, auto migration from v1 to v2 could be fine, and v2 to v3 could be OK, but v1 to v3 could be too drastic to be inferred. By making mapping models for them, you retain control of the migration.
