The Azure Cognitive Services Form Recognizer API accepts requests via a POST, and then makes the results available for 48 hours via a GET request to a resource:
POST "https://westus.api.cognitive.microsoft.com/formrecognizer/v2.1-preview.3/custom/models/{modelId}/analyze?includeTextDetails={boolean}
GET "https://westus.api.cognitive.microsoft.com/formrecognizer/v2.1-preview.3/custom/models/{modelId}/analyzeResults/{resultId}]
Presumably Azure stores these results somewhere, but I can't find any documentation regarding it. After 48 hours, is the data deleted or just made unavailable? Where does the data reside? Who owns the data? Does the account owner have access to the underlying storage account or database?
Check here in their official doc. It mentions that "The following diagram illustrates how your data is processed," but I cannot see one 🤔
Anyway, answering each question in turn...
is the data deleted or just made unavailable?
deleted
The input data and results are deleted within 48 hours and not used
for any other purpose.
To learn more about privacy and security commitments, see the
Microsoft Trust Center and cognitive services compliance and privacy.
Where does the data reside? Who owns the data?
Azure internal storage:
The incoming data is processed in the same region where the Cognitive
Services Azure resource was created. When you submit your documents to
a Form Recognizer operation, it starts the process of analyzing the
document to extract all text and identify structure and key values in
a document. Your data and results are then temporarily encrypted and
stored in Azure Storage.
When you create a Form Recognizer resource in the Azure portal, you
specify a region. From then on, your resource and all of its
operations stay associated with that particular Azure server region.
Does the account owner have access to the underlying storage account or database?
Analyze Form Result API
The "Get Analyze Results" operation is authenticated against the same
API key that was used to call the "Analyze" operation to ensure no
other customer can access your data.
Azure temporarily stores the results for customers to retrieve: Analyze and
Get Results are asynchronous calls. In other words, the service
doesn't know when the customers will call the Get Results operation to
fetch the extracted results. To facilitate checking the completion
status and returning the extracted results to the customer upon
completion, the extracted results are stored temporarily in Azure
Storage. This behavior allows customers to poll the asynchronous Get
Results operation for job completion status and fetch the results upon
completion.
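For completeness, here is a minimal sketch of that asynchronous pattern in TypeScript, assuming a Node 18+ runtime with a global fetch and the v2.1-preview.3 endpoints from the question; the endpoint, model ID, and key are placeholders you must fill in:

const endpoint = "https://westus.api.cognitive.microsoft.com";
const modelId = "<your-model-id>";
const apiKey = "<your-form-recognizer-key>";

async function analyzeDocument(pdf: Buffer): Promise<unknown> {
  // Kick off the analysis; the service replies 202 Accepted with an
  // Operation-Location header pointing at the temporary analyzeResults resource.
  const submit = await fetch(
    `${endpoint}/formrecognizer/v2.1-preview.3/custom/models/${modelId}/analyze`,
    {
      method: "POST",
      headers: {
        "Ocp-Apim-Subscription-Key": apiKey,
        "Content-Type": "application/pdf",
      },
      body: pdf,
    }
  );
  const resultUrl = submit.headers.get("operation-location");
  if (!resultUrl) throw new Error(`Analyze failed with status ${submit.status}`);

  // Poll Get Analyze Results (authenticated with the same key) until the
  // temporarily stored result reports completion.
  for (;;) {
    const poll = await fetch(resultUrl, {
      headers: { "Ocp-Apim-Subscription-Key": apiKey },
    });
    const body = await poll.json();
    if (body.status === "succeeded") return body.analyzeResult;
    if (body.status === "failed") throw new Error("Analysis failed");
    await new Promise((r) => setTimeout(r, 2000)); // wait before re-polling
  }
}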
There is already a similar ask from a user in the feedback forum; you can vote for it as well to get the product group's attention.
Related
Is there any way to purge/mask data in a Log Analytics workspace with regular expressions or similar, to be able to remove sensitive data that has been sent to the workspace?
Like social security numbers that are part of a URL?
As per this Microsoft document, Log Analytics is a flexible store which, while prescribing a schema to your data, allows you to override every field with custom values. You can mask data in the Log Analytics workspace, and here are a few strategies for handling personal data:
Where possible, stop the collection of, obfuscate, anonymize, or otherwise adjust the data being collected to exclude it from being considered "private". This is by far the preferred approach, saving you the need to create a very costly and impactful data handling strategy.
Where not possible, attempt to normalize the data to reduce the impact on the data platform and performance. For example, instead of logging an explicit user ID, create a lookup table that correlates the username and their details to an internal ID that can then be logged elsewhere. That way, should one of your users ask you to delete their personal information, deleting only the corresponding row in the lookup table may be sufficient.
Finally, if private data must be collected, build a process around the purge API path and the existing query API path to meet any obligations you may have around exporting and deleting any private data associated with a user; a sketch of the purge call follows below.
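For that last point, here is a hedged TypeScript sketch of calling the workspace purge API (a POST against the Azure Resource Manager endpoint). The subscription, resource group, workspace, table, and column values are all placeholders, and the bearer token must come from an identity holding the Data Purger role:

const armToken = "<aad-token-for-management.azure.com>";
const purgeUrl =
  "https://management.azure.com/subscriptions/<sub-id>/resourceGroups/<rg>" +
  "/providers/Microsoft.OperationalInsights/workspaces/<workspace>" +
  "/purge?api-version=2020-08-01";

async function purgeUserRows(): Promise<string | null> {
  const res = await fetch(purgeUrl, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${armToken}`,
      "Content-Type": "application/json",
    },
    // Purge rows attributed to one internal user ID; the table and column
    // here are hypothetical, for illustration only.
    body: JSON.stringify({
      table: "AppTraces",
      filters: [{ column: "UserId", operator: "==", value: "12345" }],
    }),
  });
  // The service replies 202 Accepted; the x-ms-status-location header is the
  // URL to poll for completion, since purges can take a long time.
  return res.headers.get("x-ms-status-location");
}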
Here is a KQL query for finding private data (IP addresses, in this example) in a Log Analytics workspace:
search *
| where * matches regex @'\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}\b' //RegEx originally provided on https://stackoverflow.com/questions/5284147/validating-ipv4-addresses-with-regexp
| summarize count() by $table
I am trying to figure out whether Azure Logic Apps can be used for file/document migration from Azure Blob Storage to a custom service where we have a REST API. Here is the shortlist of requirements I have right now:
Files/documents must be uploaded into Azure Storage weekly or daily, which means we need to migrate only new items. The number of files/documents per week is in the hundreds of thousands
The custom service REST API is secured, and any interaction with its endpoints should have a JWT passed in the headers
I did the following exercise according to tutorials:
Everything seems fine, but the following 2 requirements make me worry:
Getting only new files, and not re-migrating those that have already been moved
Getting a JWT to pass the security checks in the REST API
For the first point, I think I can introduce a DB instance (for example, Azure Table Storage) to track files that have already been moved, and for the second one my idea is to use an Azure Function instead of the HTTP action. But all of this looks quite complicated, and I believe there might be better and easier options.
Could you please advise what else I can use for my case?
For the first point, you can use the "When a blob is added or modified" trigger as the logic app's trigger. Then it will operate only on the new blob item.
For the second point, just provide some steps for your reference:
1. First, use an "HTTP" action to request the token (my original answer included a screenshot of this request).
2. Then use "Parse JSON" action to parse the response body from the "HTTP" action above.
3. After that, you can call your REST API with the access token from the "Parse JSON" action above. A rough sketch of the whole flow follows.
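To make those steps concrete, here is a rough TypeScript sketch of what the equivalent Azure Function (or the pair of HTTP actions) would do, assuming the custom service accepts tokens issued by Azure AD via the client-credentials flow; every ID, secret, and URL below is a placeholder, and your service may issue JWTs from an entirely different endpoint:

async function pushBlobToApi(blobContent: Buffer, blobName: string): Promise<void> {
  // Step 1: request the token (the "HTTP" action).
  const tokenRes = await fetch(
    "https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token",
    {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: new URLSearchParams({
        grant_type: "client_credentials",
        client_id: "<client-id>",
        client_secret: "<client-secret>",
        scope: "api://<custom-service-app-id>/.default",
      }),
    }
  );
  // Step 2: parse the response body (the "Parse JSON" action).
  const { access_token } = await tokenRes.json();

  // Step 3: call the secured REST API with the JWT in the headers.
  await fetch(`https://<custom-service>/api/documents/${encodeURIComponent(blobName)}`, {
    method: "PUT",
    headers: { Authorization: `Bearer ${access_token}` },
    body: blobContent,
  });
}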
I've been using Azure API Management recently, and I would like to know if there is a way to detect the sign-up process performed from the included developer portal.
Basically, I need to be able to get the user's unique ID so I can map it to data stored in a database.
Is configuring Delegation the only way to capture this event?
Try enabling the resource logs; they include the userId.
Reference - https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-use-azure-monitor#resource-logs
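If the resource logs are routed to a Log Analytics workspace, a query along these lines could surface the user IDs. This is only a sketch: it assumes the dedicated ApiManagementGatewayLogs table and its UserId column, so adjust the names to whatever your workspace actually contains:

import { DefaultAzureCredential } from "@azure/identity";
import { LogsQueryClient } from "@azure/monitor-query";

const client = new LogsQueryClient(new DefaultAzureCredential());

// List the developer-portal user IDs seen on gateway calls in the last day.
async function listActiveUserIds(workspaceId: string) {
  return client.queryWorkspace(
    workspaceId,
    `ApiManagementGatewayLogs
     | where isnotempty(UserId)
     | summarize Calls = count() by UserId`,
    { duration: "P1D" } // look back one day
  );
}

Note that gateway logs record API calls rather than the sign-up event itself, so delegation may still be the more direct hook for capturing sign-up.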
I believe I am misunderstanding how this API works...
I understand that Person Groups contain Persons, which contain Person Faces.
And to use Face - Verify, I compare the new image's faceId against a personGroupId and personId.
However, I don't seem to understand how/where the Person Groups are saved.
Are all Person Groups saved within one JSON and can be stored in blob storage or a DB?
Thanks in advance.
P.S. using Node.js
The PersonGroup Create API is a little different in that the caller specifies the ID. Regardless, this ID is stored in the Cognitive Services storage, along with all related information such as the people who comprise this group. The exact nature of the storage is intentionally opaque.
You can list the PersonGroups associated with your API key with the List API.
If you're looking for sample code in NodeJS, you can find some here.
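As a quick Node 18+/TypeScript illustration of both points (not official sample code), creating and listing PersonGroups over the v1.0 REST endpoints might look like this; the endpoint and key are placeholders:

const faceEndpoint = "https://<your-resource>.cognitiveservices.azure.com";
const faceKey = "<your-face-api-key>";

// Create a person group; the caller chooses the ID (lowercase letters,
// digits, and dashes). The group itself lives in Cognitive Services storage.
async function createPersonGroup(groupId: string, name: string): Promise<void> {
  await fetch(`${faceEndpoint}/face/v1.0/persongroups/${groupId}`, {
    method: "PUT",
    headers: {
      "Ocp-Apim-Subscription-Key": faceKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ name }),
  });
}

// List the person groups stored under this API key.
async function listPersonGroups(): Promise<unknown> {
  const res = await fetch(`${faceEndpoint}/face/v1.0/persongroups`, {
    headers: { "Ocp-Apim-Subscription-Key": faceKey },
  });
  return res.json();
}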
When a client calls GetStorageTokenAsync on the server, it gets a token that can read, write, or delete objects on the target container.
The activity done on this container is more or less hidden from my application unless I scan the logs.
Therefore I'm left to guess, or do some cumbersome programming, to determine what the files were, their contents, what changed, and so on.
I want to gather empirical evidence of what a given userID did with a certain known Shared Access Signature token, and aggregate this into either an administrative console like Application Insights, or some other tool that will allow a programmatic response to the user's actions.
Question
What is the best way to align the actions a user takes with a given Shared Access Signature (specifically in the context of the NuGet package Microsoft.Azure.Mobile.Server.Files)?
and aggregate this into either an administrative console like Application Insights, or some other tool that will allow a programmatic response to the user's actions.
There is no tool or service that supports this today; I'm afraid you will need to build it yourself. Storage Analytics logs are stored in the $logs container of your storage account, and all log entries are written to text files line by line.
What is the best way to align the actions a user takes, with a given Shared Access Signature
Based on the Storage Analytics log format, we can only tell whether a storage operation was authenticated with a Shared Access Signature or an account key from the authentication-type field (if the operation was authenticated by a Shared Access Signature, the value of authentication-type will be "sas"). We can't get the operations related to one particular Shared Access Signature.
The Storage Analytics log also contains the requester-ip-address, which can identify the client that sent the storage request. If the client's IP address is static, this will help you gather all the actions that a specific user took.
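As a starting point for such a tool, here is a hedged TypeScript sketch that scans the $logs container and keeps only SAS-authenticated entries. The field positions follow version 1.0 of the log format, and the naive semicolon split ignores quoted fields that may themselves contain semicolons, so verify both against your own logs:

import { BlobServiceClient } from "@azure/storage-blob";

const AUTH_TYPE_FIELD = 7;     // authentication-type ("sas" for SAS requests)
const REQUESTER_IP_FIELD = 15; // requester-ip-address (includes the port)

async function collectSasRequests(connectionString: string): Promise<string[]> {
  const logs = BlobServiceClient.fromConnectionString(connectionString)
    .getContainerClient("$logs");
  const hits: string[] = [];

  for await (const blob of logs.listBlobsFlat()) {
    const body = await logs.getBlobClient(blob.name).downloadToBuffer();
    for (const line of body.toString("utf8").split("\n")) {
      const fields = line.split(";");
      if (fields[AUTH_TYPE_FIELD] === "sas") {
        hits.push(`${fields[REQUESTER_IP_FIELD]} -> ${line}`);
      }
    }
  }
  return hits;
}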