Does SoftDeleteColumnDeletionDetectionPolicy still keep data in Azure Search?

This is mostly a GDPR question.
When data is soft-deleted using Azure Search's SoftDeleteColumnDeletionDetectionPolicy, is all of the original document data still kept, or is only enough info (document ID and the IsDeleted bit) kept to know that the document has been deleted?
Looking around at the Azure documentation, this isn't clear. It's clear that the policy is intended to be used for soft deletes on the data source side, but it's not clear whether Azure Search also treats this strictly as a soft delete, and therefore keeps all the data and just marks the document as deleted via the soft-delete bit.

When an Azure Search indexer processes a document marked as deleted, the document is removed from the search index (i.e., "hard" deleted).
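For context, the policy lives on the indexer's data source definition, not on the index itself. Below is a minimal sketch of such a definition sent via the REST API; the service name, admin key, connection string, resource names, and api-version are all placeholders and may differ in your setup.

```typescript
// Sketch: define an Azure Search data source whose deletion detection
// policy watches an IsDeleted column. Service name, admin key, connection
// string and api-version are placeholders.
const serviceName = "my-search-service";          // placeholder
const adminKey = process.env.SEARCH_ADMIN_KEY!;   // placeholder

const dataSource = {
  name: "sql-datasource",
  type: "azuresql",
  credentials: { connectionString: "<connection-string>" },
  container: { name: "Documents" },
  dataDeletionDetectionPolicy: {
    "@odata.type":
      "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
    softDeleteColumnName: "IsDeleted",
    softDeleteMarkerValue: "true",
  },
};

async function upsertDataSource(): Promise<void> {
  const url =
    `https://${serviceName}.search.windows.net/datasources/` +
    `${dataSource.name}?api-version=2020-06-30`;
  const response = await fetch(url, {
    method: "PUT", // create or update the data source definition
    headers: { "Content-Type": "application/json", "api-key": adminKey },
    body: JSON.stringify(dataSource),
  });
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
}
```

On its next run, an indexer attached to this data source sees rows where IsDeleted is "true" and deletes the corresponding documents from the index, rather than keeping them around in a soft-deleted state.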

Related

Deleting BigQuery automatic backups/snapshots

If sensitive data were to enter a BigQuery table, is it possible to permanently delete the automatic backups used by the time travel feature before the retention period (default of 7 days, but it can be as low as 2 days) elapses?
That would make it impossible to roll back and recover a snapshot of the table containing the sensitive data, allowing a complete and irreversible purge of the data from the project.
I haven't yet seen anything in Google's BigQuery documentation to suggest this is possible or how to handle a situation like this, but it seems like a big caveat to handling sensitive data in BigQuery.
If this is not possible, what other options are there to restrict access to the historical data in BigQuery? Is time travel a permission that can be withdrawn by a custom role?
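As an illustration of the retention-window knob the question mentions: the time travel window is configured per dataset and, assuming the dataset-level max_time_travel_hours option is available to you, it can be lowered to the 2-day minimum with a DDL statement. A hedged sketch using the Node.js client (the dataset name is a placeholder); note this only limits how far back time travel reaches going forward and is not, by itself, a purge of existing history:

```typescript
// Sketch: lower a dataset's time travel window to the 2-day minimum.
// Dataset name is a placeholder; this restricts future time travel and
// is not a purge of snapshots that already exist.
import { BigQuery } from "@google-cloud/bigquery";

const bigquery = new BigQuery();

async function shortenTimeTravelWindow(): Promise<void> {
  await bigquery.query({
    query:
      "ALTER SCHEMA my_dataset SET OPTIONS (max_time_travel_hours = 48)",
  });
}
```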

KTA Retention Policy Failures

I am getting the following error from Retention Policy Deletion:
"Folder can’t be locked since its hierarchy contains locked document"
I am not sure how these documents were locked or how I can unlock them again.
Also, will unlocking them cause any data issues?
Help would be greatly appreciated.
Retention Policies should only try to delete documents older than whatever you have configured (by date of last access). So if this document were locked by normal use in the system, then it should have a recent last access date and not be ready for removal by retention policies.
Thus this is probably a problem occurring in the product. You will need to open a technical support case so Kofax can diagnose the problem. In all likelihood, the support team can provide a SQL script to unlock the documents once they make sure there are no other problems. But you should not do it on your own since it is not supported and, yes, it is possible to cause data issues by modifying the database.

Maximum size of metadata in Azure Blob Store

I am using the Azure blob's metadata information mechanism mentioned here to save some information in the blob store, and later retrieve information from it.
My questions are mainly related to performance and maintenance concerns.
Is there any upper limit on the size of this metadata? What is the maximum number of keys I can store?
Does it expire after a certain date?
Is there any chance of losing data that is stored in the blob metadata?
If so, I would go ahead and write these to a database from the service I am writing. Ideally, however, I would like to use the blob metadata feature, which is very useful and well thought out.
Check out this documentation:
https://learn.microsoft.com/en-us/rest/api/storageservices/fileservices/Setting-and-Retrieving-Properties-and-Metadata-for-Blob-Resources?redirectedfrom=MSDN
The size of the metadata cannot exceed 8 KB altogether. This means keys, values, semicolons, everything. There is no explicit limit on the number of keys themselves, but all of them (with the actual values and other characters) must fit into the 8 KB limit.
As for expiration, I don't think so. At least the documentation doesn't mention it, and I'd guess that if expiration were an issue, it would be important enough to be mentioned there :)
As for losing the metadata: metadata is stored alongside the blob, so if you lose the blob you lose the metadata (say, the datacenter explodes and you didn't have the appropriate replication for your account). Other than that, I don't think it can just disappear. The documentation also states that partial updates are not possible, so the metadata is either updated fully or not at all; you can't lose half of your updates.
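As a small illustration of working within that limit, here is a hedged sketch using the @azure/storage-blob SDK; the connection string, container and blob names are placeholders, and the size guard is only a rough approximation of what the service actually counts.

```typescript
// Sketch: set and read blob metadata, with a rough guard against the 8 KB
// total limit. Connection string, container and blob names are placeholders.
import { BlobServiceClient } from "@azure/storage-blob";

const service = BlobServiceClient.fromConnectionString(
  process.env.AZURE_STORAGE_CONNECTION_STRING! // placeholder
);
const blob = service
  .getContainerClient("my-container")
  .getBlockBlobClient("my-blob.txt");

async function saveMetadata(metadata: Record<string, string>): Promise<void> {
  // Rough size check: keys and values must fit into the 8 KB budget together.
  const approxSize = Object.entries(metadata)
    .map(([key, value]) => key.length + value.length)
    .reduce((sum, n) => sum + n, 0);
  if (approxSize > 8 * 1024) {
    throw new Error("Metadata would exceed the 8 KB limit");
  }
  // setMetadata replaces the blob's metadata as a whole (no partial updates).
  await blob.setMetadata(metadata);
}

async function readMetadata(): Promise<Record<string, string> | undefined> {
  const props = await blob.getProperties();
  return props.metadata;
}
```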

Chrome Extension Database Storage

I am working on a page action extension and would like to store information that all users of the extension can access. The information will be key:value pairs, where the key is a web url and the value is an array of links.
I have to be able to update the database without redeploying the extension to the chrome store. What is it that I should look into using? The storage APIs seem oriented towards user data rather than data stored by the app and updated by the developer.
If you want something to be updated without deploying an updated version through the Chrome Web Store, you'll need to host the data yourself somewhere and have the extension query it.
Using chrome.storage.local as a cache for said data would be totally appropriate.
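A minimal sketch of that approach, assuming a Manifest V3 extension (where chrome.storage returns promises); the hosted URL and storage keys are placeholders:

```typescript
// Sketch: pull the developer-hosted url -> links data and cache it in
// chrome.storage.local. URL and storage keys are placeholders.
const DATA_URL = "https://example.com/extension-data.json"; // placeholder

async function refreshLinkData(): Promise<void> {
  const response = await fetch(DATA_URL);
  if (!response.ok) return; // keep the cached copy if the fetch fails
  const linksByUrl: Record<string, string[]> = await response.json();
  await chrome.storage.local.set({ linksByUrl, lastSync: Date.now() });
}

async function getLinksFor(pageUrl: string): Promise<string[]> {
  const { linksByUrl = {} } = await chrome.storage.local.get("linksByUrl");
  return linksByUrl[pageUrl] ?? [];
}
```

refreshLinkData() could be triggered periodically, for example from a chrome.alarms handler, so the cache stays reasonably fresh without a republish.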
The question is pretty broad, so I'll give you some ideas I've used before.
Since you say you don't want to republish when the DB changes, you need to host the data yourself. This doesn't mean you need to host an actual database, just a way for the extension to get the data.
Ideally, you are only adding new pairs. If so, an easy way is to store your pairs in a public Google spreadsheet. The extension then remembers the last row synced and uses the row feed to get data incrementally.
There are a few tricks to getting the spreadsheet sync right; take a look at my GitHub project "Plus for Trello" for a full implementation.
This is a good way to sync incrementally, though if the DB isn't huge you could just host a CSV file and fetch it periodically from the extension.
Now that you can get the data into the extension, decide how to store it. chrome.storage.local or IndexedDB should both be fine, though IndexedDB is usually better if you later need to query more complex things than just a hash table.
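If the lookups ever do get more involved than a plain hash table, a small sketch of the IndexedDB route for the url → links mapping could look like this (the database and object store names are made up for the example):

```typescript
// Sketch: store url -> array-of-links records in IndexedDB.
// Database and object store names are placeholders.
function openLinksDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open("links-db", 1);
    request.onupgradeneeded = () => {
      // keyPath "url" means each record is keyed by its page URL
      request.result.createObjectStore("links", { keyPath: "url" });
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

async function putLinks(url: string, links: string[]): Promise<void> {
  const db = await openLinksDb();
  await new Promise<void>((resolve, reject) => {
    const tx = db.transaction("links", "readwrite");
    tx.objectStore("links").put({ url, links });
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

async function getLinks(url: string): Promise<string[]> {
  const db = await openLinksDb();
  return new Promise((resolve, reject) => {
    const request = db
      .transaction("links", "readonly")
      .objectStore("links")
      .get(url);
    request.onsuccess = () => resolve(request.result?.links ?? []);
    request.onerror = () => reject(request.error);
  });
}
```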

Reverse Engineering data from Lucene/Solr Indexes

I am investigating whether it is feasible to deploy search servers to the cloud, and one of the questions I had revolves around data security. Currently all of our fields (except a few used for faceting) are indexed and not stored (except for the ID, which we use to retrieve the document after search has completed).
If for some reason the servers within the cloud were compromised, would it be possible for that person to reverse engineer our data from the indexes, even without the fields being stored?
Depends on the security level you need and the sensitivity of the document content...
With the configuration you describe it wouldn't be possible to rebuild the original as a "clone"... BUT it would be possible to recover enough information to gain a lot of knowledge about the content... depending on the context this could be damaging...
An important point:
If you use the cloud-based servers to build the index and they get compromised, then depending on your configuration there would be no need for "reversing" at all, at least for any document you index after the servers are compromised, because to build the index the document is sent over as-is (for example when using http://wiki.apache.org/solr/ExtractingRequestHandler)...
As Yahia says, it's possible to get some information. If you're really concerned about this, use an encrypted file system, as Amazon suggests.
