If sensitive data were to enter a BigQuery table, is it possible to permanently delete the automatic backups used by the time travel feature before the retention period (7 days by default, configurable down to a minimum of 2 days) elapses?
That would make it impossible to roll back and recover a snapshot of the table containing the sensitive data, allowing a complete and irreversible purge of the data from the project.
I haven't yet seen anything in the BigQuery docs to suggest this is possible, or how to handle a situation like this, but it seems like a big caveat to handling sensitive data in BigQuery.
If this is not possible, what other options are there to restrict access to the historical data in BigQuery? Is time travel a permission that can be withdrawn via a custom role?
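For reference, the retention window I'm referring to is the dataset-level max_time_travel_hours option; a sketch of shortening it to the 2-day minimum with the Python client would look roughly like this (project and dataset names are placeholders), though I'm not sure whether that also purges already-captured history, which is exactly my question:

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes application-default credentials

# Placeholder project/dataset names.
ddl = """
ALTER SCHEMA `my-project.sensitive_dataset`
SET OPTIONS (max_time_travel_hours = 48)  -- the 2-day minimum
"""
client.query(ddl).result()
```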
I am using the Azure blob metadata mechanism mentioned here to save some information in the blob store and later retrieve it.
My questions are mainly related to performance and maintenance concerns.
Is there any upper limit on the size of this metadata? What is the maximum number of keys I can store?
Does it expire after a certain date?
Is there any chance of losing data that is stored in the blob metadata?
If so, I would go ahead and write these to a database from the service I am writing. However, ideally I would like to use the blob metadata feature, which is very useful and well thought out.
Check out this documentation:
https://learn.microsoft.com/en-us/rest/api/storageservices/fileservices/Setting-and-Retrieving-Properties-and-Metadata-for-Blob-Resources?redirectedfrom=MSDN
The size of the metadata cannot exceed 8 KB altogether. This means keys, values, semicolons, everything. There is no explicit limitation for the number of keys themselves, but all of them (with the actual values and other characters) must fit into the 8 KB limit.
As for the expiration, I don't think so. At least the documentation doesn't mention it. I guess if expiration was an issue, it would be important enough to be mentioned in the documentation :)
As for losing the metadata: metadata is stored along with the blob, so if you lose the blob you lose the metadata (say the datacenter explodes and you didn't have the appropriate replication for your account). Other than that, I don't think it can just disappear. The documentation also states that partial updates are not possible, so the metadata is either updated fully or not at all; you can't lose half of your updates.
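If it helps, here is a minimal sketch of setting and reading metadata with the azure-storage-blob Python SDK rather than the raw REST calls the linked documentation describes; the connection string, container, and blob names are placeholders.

```python
from azure.storage.blob import BlobClient

# Placeholder connection string and names -- substitute your own.
blob = BlobClient.from_connection_string(
    conn_str="<your-storage-connection-string>",
    container_name="documents",
    blob_name="report.pdf",
)

# Setting metadata replaces the blob's metadata wholesale (no partial updates),
# and keys plus values together must stay within the 8 KB limit.
blob.set_blob_metadata(metadata={"department": "finance", "reviewed": "true"})

# The metadata comes back with the blob's properties.
props = blob.get_blob_properties()
print(props.metadata)  # {'department': 'finance', 'reviewed': 'true'}
```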
So I've come across this AzCopy tool, and multiple tutorials that say it's good for backing up my storage blobs and whatnot.
Isn't Azure Storage automatically backed up? Isn't that what locally redundant means?
I just want to make sure I'm not missing something and putting my application in jeopardy by not running some external backup.
Redundancy is different from backups. Redundancy means that all your changes are replicated to another location. In case of a failover, the secondary can theoretically take over as the primary and serve the (hopefully) latest state of your data. However, the fact that everything is replicated also means that your accidental deletes, file corruptions, etc. are replicated too. Backups are meant to protect against this: if you accidentally mess something up and issue some delete requests, you still have the backups, and you can usually go back to any point in time (provided you made a backup at that time, of course).
And of course, it's not a bad idea to avoid being fully dependent on Azure.
The most important thing about any backup policy is that before you create it, you decide what you are protecting against and what sort of data you are backing up.
If the data you are backing up is an offsite copy of working data, access to it is restricted to admin personnel, and they all know what the data is, then replication could well be all you need to protect against a hardware failure on Azure.
If, however, you are backing up customer data, or file data that Fred in accounts randomly deletes when he falls asleep at the keyboard, then you have a different threat model and should plan your backups accordingly.
Where you back it up is very much a matter of personal requirements and philosophy. I have known customers who keep backups on both Azure and AWS (even though their only compute workload was Azure). If your threat model includes protecting against MS going bust and selling all of their kit on eBay one morning, then it makes sense to back up elsewhere. Or you can decide that you trust Azure not to go bust and just split your data across multiple regions.
TL;DR
Understand what you are protecting your data from, and design your backup policy from that.
I'm new to Azure Mobile Services as well as mobile development.
From my experience in web development, retrieving data from the database is done part by part as the user requests more data, i.e. the website doesn't load all the data in one go.
I'm implementing this principle in the mobile app, wherein data is loaded (if already in the local DB) or downloaded (if not yet in the local DB) as the user scrolls down.
I'm using Azure Mobile Services Sync Table to handle the loading of data in the app. However, I won't be able to paginate the downloading of data. According to this post, the PullAsync method downloads all data that has changed or been added since its last sync and doesn't allow the use of take/skip methods, because PullAsync uses incremental sync.
This means there will be a large download of data on the first ever launch of the app, or if the app hasn't been online for a while, even if the user hasn't requested that data (i.e. scrolled to it).
Is this a good way of handling data in mobile apps? I like using SyncTable because it handles quite a lot of important data upload/download work, e.g. upload queuing and syncing data changes in both directions. I'm just concerned about downloading data that the user doesn't need yet.
Or maybe there's something I can do to limit the items PullAsync downloads (aside from deleted = false and UserId = the current user's UserId)?
Currently, I've limited the times PullAsync is called to the loading screen after the user logs in and to when the user pulls to refresh.
Mobile development is very different from web development. While loading lots of data to a stateless web page is a bad thing, loading the same data to a mobile app might actually be a good thing. It can help app performance and usability.
The main purpose of using something like the offline data storage is for occasionally disconnected scenarios. There are always architectural tradeoffs that have to be considered. "How much is too much" is one of those tradeoffs. How many roundtrips to the server is too much? How much data transfer is too much? Can you find the right balance of the data that you pass to the mobile device? Mobile applications that are "chatty" with the servers can become unusable when the carrier signal is lost.
In your question, you suggest "maybe there's something I can do to limit the items PullAsync downloads". To avoid the large download, it may make sense to design your application to let the user set criteria for what gets downloaded. If UserId doesn't make sense, maybe a Service Date or a number of days forward or back in the schedule. Finding the right "partition" of data to load to the device will be a key consideration for the usability of your app...both online and offline.
There is no one right answer for your solution. However, key considerations should be bandwidth, data plan limits, carrier coverage and user experience both connected and disconnected. Remember...your mobile app is "stateful" and you aren't limited to round-trips to the server for data. This means you have a bit of latitude to do things you wouldn't on a web page.
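Building on that suggestion, here is a rough illustration of the kind of user-plus-date-window filter you might construct and hand to your pull call. This is not the actual Azure Mobile Apps client API (which is .NET), the column names (UserId, ServiceDate, Deleted) are hypothetical, and the exact literal syntax depends on the OData version your backend expects.

```python
from datetime import datetime, timedelta, timezone

def build_pull_filter(user_id: str, days_back: int = 14, days_forward: int = 7) -> str:
    """Build an OData-style $filter that limits a sync pull to one user's
    records inside a rolling date window, instead of pulling everything."""
    now = datetime.now(timezone.utc)
    start = (now - timedelta(days=days_back)).strftime("%Y-%m-%dT%H:%M:%SZ")
    end = (now + timedelta(days=days_forward)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"UserId eq '{user_id}' and Deleted eq false "
        f"and ServiceDate ge {start} and ServiceDate le {end}"
    )

# Pass the result to whatever pull/sync call your client exposes.
print(build_pull_filter("user-123"))
```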
On some of our pages, we display some statistics like number of times that page has been viewed today, number of times it's been viewed the past week, etc. Additionally, we have an overall statistics page where we list the pages, in order, that have been viewed the most.
Today, we just insert these pageviews and event counts into our database as they happen. We also send them to Google Analytics via normal page tracking and their API. Ideally, instead of querying our database for these stats to display on our webpages, we would just query the Google Analytics API. Google Analytics does a FAR better job of figuring out who the real uniques are and avoids counting people who artificially inflate their pageview counts (we allow people to create pages on our site).
So the question is if it's possible to use Google Analytics' API for updating the statistics on our webpages? If I cache the results is it more feasible? Or just occasionally update our stats? I absolutely love Google Analytics for our site metrics, but maybe there's a better solution for this particular need?
So the question is if it's possible to use Google Analytics' API for updating the statistics on our webpages?
Yes, it is. But the authentication process and the XML returned may slow things down. You can speed it up by limiting the rows/columns returned. Also, authenticating for the way you want to display the data (if I understood you correctly) would require the client authentication method, where you send the username and password, so security is an issue.
I have done exactly what you described but had to put a loading graphic on the page for the stats.
If I cache the results is it more feasible? Or just occasionally update our stats?
Either one, but caching seems like it would work well, especially since GA data is not real-time anyway. You could make the API call and store (or process, then store) the returned XML for display later.
I haven't done this, but I think I might give it a go. It could even run as a scheduled job.
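A minimal sketch of that cache-then-refresh idea, with the actual Analytics call left as a placeholder for whichever API or client you end up using:

```python
import json
import time
from pathlib import Path

CACHE_FILE = Path("ga_stats_cache.json")
CACHE_TTL_SECONDS = 3600  # GA data isn't real-time, so an hour-old cache is usually fine.

def fetch_stats_from_analytics() -> dict:
    """Placeholder: swap in your own Analytics API call / report request here."""
    raise NotImplementedError

def get_page_stats() -> dict:
    """Serve cached stats while they are fresh; otherwise refresh from the API."""
    if CACHE_FILE.exists():
        cached = json.loads(CACHE_FILE.read_text())
        if time.time() - cached["fetched_at"] < CACHE_TTL_SECONDS:
            return cached["stats"]
    stats = fetch_stats_from_analytics()
    CACHE_FILE.write_text(json.dumps({"fetched_at": time.time(), "stats": stats}))
    return stats
```

The same refresh could also run on a schedule (cron or similar) so that page requests never pay for the API round trip.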
I absolutely love Google Analytics for our site metrics, but maybe there's a better solution for this particular need?
There are some third-party solutions (googling should turn them up), but cost and feasibility should be considered.
I already audit authorization success, failure and logout.
I've considered auditing (logging) every method call and keeping a version of every row and column that was ever modified, but both of those options would greatly increase the complexity of auditing. Auditing a random subset seems too random.
The legal specs (FISMA, C&A) just say something needs to be audited.
Are there any other non-domain specific auditing strategies I'm forgetting?
Considering that auditing is often about accountability, you may wish to log those actions that could contribute to any event where someone or something needs to be held accountable:
Alteration of client records
Alteration of configuration
Deletion of data
It is a good idea to keep some of these things versioned, so that you can roll back changes to vital data. Adding 'who altered it' in the first place is quite straightforward.
Unless someone has direct access to the database, application-level logging of any event that affects the database ... such as a transaction altering many tables ... may often be sufficient. So long as you can link an auditable logical action to a logical unit of accountability, regardless of which subsystem it affects, you should be able to trace accountability.
You should not be logging method calls and database alterations directly, but the business logic that led to those calls and changes, and who used that logic. Some small amount of backend code linking causality between calls/table alterations and some audit-message would be beneficial too (if you have resources).
Think of your application's audit elements as a tree of events. The root is what you log, for example 'Dave deleted customer record 2938'. Any children of the root can also be logged and tied back to the root, if it is important to record them as part of the audit event. For example, you can assert that the audit event 'Dave deleted ...' was tied to some billing information also going walkies as part of a constraint or something.
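A small sketch of that tree idea, with hypothetical field names: each event carries its own id, and child events point back to the root via parent_id.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
import uuid

@dataclass
class AuditEvent:
    """One node in the audit tree: the root records the business-level action;
    children record knock-on effects that were part of the same logical action."""
    actor: str
    action: str
    parent_id: Optional[str] = None
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Root: the auditable business action.
root = AuditEvent(actor="dave", action="deleted customer record 2938")

# Child: a side effect tied back to the root event.
billing = AuditEvent(
    actor="dave",
    action="cascade-deleted billing profile 771",
    parent_id=root.event_id,
)
```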
But I am not an expert.
I agree with a lot of what Aiden said, but I strongly believe that auditing should be at the database level. Too many databases are accessed with dynamic SQL, so permissions are at the table level (at least in SQL Server). That means a person committing fraud can enter, delete or change data directly in the database, bypassing all the application rules. In a well-designed system, only a couple of people (the DBA and a backup) would have the rights to change audit triggers in prod, and thus most people could get caught if they were changing data they were not authorized to change. Auditing through the app would never catch these people. Of course, there is almost no way to prevent the DBAs from committing fraud if they choose to do so, but someone must have admin rights to the database, so you must be extra careful in choosing such people.
We audit all changes to data, all inserts and all deletes on most tables in our database. This allows for easy backing out of a change as well as providing an audit trail. Depending on what your database stores, you may not need to do that. But I would audit every financial transaction, every personnel transaction, and every transaction having to do with orders, warehousing or anything else that might be subject to criminal activity.
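As a rough illustration of the trigger-based approach (the table, columns, audit table, and connection details below are all hypothetical), here is an AFTER UPDATE/DELETE trigger for SQL Server that copies the pre-change rows into an audit table along with who changed them and when, deployed from Python via pyodbc:

```python
import pyodbc

# T-SQL for a simple audit trigger; 'deleted' holds the pre-change rows,
# so this captures the old values plus who made the change and when.
AUDIT_TRIGGER_SQL = """
CREATE TRIGGER dbo.trg_Customers_Audit
ON dbo.Customers
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.Customers_Audit (CustomerId, Name, Email, ChangedBy, ChangedAt)
    SELECT d.CustomerId, d.Name, d.Email, SUSER_SNAME(), SYSUTCDATETIME()
    FROM deleted AS d;
END
"""

conn = pyodbc.connect("DSN=MyWarehouse")  # placeholder connection
cur = conn.cursor()
cur.execute(AUDIT_TRIGGER_SQL)
conn.commit()
```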
If you really want to know what absolutely must be audited, talk to the people who will be auditing you and ask what they will want to see.