AZURE blob storage v2 lifecycle management

AZURE blob storage v2 lifecycle management - azure

I am preparing for DP-200 exam and confused with lifecycle management.
The sample scenario is:
You manage a financial computation data analysis process. Microsoft Azure virtual machines (VMs) run the process in daily jobs, and store the results in virtual hard drives (VHDs.)
The VMs product results using data from the previous day and store the results in a snapshot of the VHD. When a new month begins, a process creates a new
VHD.
You must implement the following data retention requirements:
- Daily results must be kept for 90 days
- Data for the current year must be available for weekly reports
- Data from the previous 10 years must be stored for auditing purposes
- Data required for an audit must be produced within 10 days of a request.
You need to enforce the data retention requirements while minimizing cost.
How should you configure the lifecycle policy? To answer, drag the appropriate JSON segments to the correct locations. Each JSON segment may be used once, more than once, or not at all. You may need to drag the split bat between panes or scroll to view content.
<code>"BaseBlob": {
"TierToArchive": { "DaysAfterModificationGreaterThan": 365 },
"Delete": { "DaysAfterModificationGreaterThan": 3650 }
},
"Snapshot": {
"TierToCool": {"DaysAfterCreationGreaterThan": 90 }
}</code>
That's the solution marked as correct. I try to find the "snapshot" documentation regarding lifecycle, but cannot find it in MS docs.
Can someone explain me the purpose of a Snapshot here please? and what correct lifecycle should be.

--Daily results must be kept for 90 days
--Data for the current year must be available for weekly reports
--Data from the previous 10 years must be stored for auditing purposes
--Data required for an audit must be produced within 10 days of a request
baseblob
-- delete -10years
-- tiertocold -10days
-- tiertoarchive -1 years
snapshot
-- delete -90days
Reference link for the snapshot: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=template#rules

Related

Azure cost analysis for a particular subscription using Python SDK

So I'm trying to automate fetching the current cost and cost forecast (Like it is shown under cost analysis for a particular subscription) for a particular subscription using python SDK but I haven't been able to find a single API that does this yet.
I've tried using UsageAggregate and Rate card but I haven't really figured out a way to find the cost for the current month to date. If there is an API that I'm missing or if I need to calculate monthly costs myself, I'd appreciate any code snippets or help.

If you already have the usage and the ratecard data, then you must combine them.
Take the meterId of the usage data and get the related ratecard data.
The ratecard data contains the MeterRates and the IncludedQuantity which you must take.
There are probably multiple meter rates and the included quantity because there are probably different costs per usage (e.g. first 10 calls for free, 3 GB for free, ...).
The consumption starts/is reseted at the 14th of the month. That's the reason why you have to read the data from the whole billing period (begins with 14th of each month), because that's the only way how you get the correct consumption.
So, if you are using e.g. Azure Functions and you have a usage of 100.000 units per day and you want the costs from 20th - 30th, then the calculation works as follows:
read data from 14th - 30th. These are 17 days and therefore it used 1.700.000 units. The first 400.000 are for free = IncludedQuantity (so in this sample the first 4 days).
From the 400.001 unit on, you have to take the meter rate (0,0000134928 €) and calculate the costs. 1.300.000 * 0,0000134928 = ~17,54€.
Fortunately, the azure functions have only one rate. If the rate changes e.g. after 5.000.000 units, then you also have to take this into account. If you have the whole costs, then you can filter on your date which is 20.-30. and you will get the result.
Its calculation implemented in C# and published it as a NuGet package here. It also contains a sample console which you could use to export the data.

I know I am bit late to the party, but after struggling with the same problem, I managed to create the code for getting the cost of a resource group using
azure.mgmt.costmanagement
Link to cost management API
Code sample is in my answer here

How to copy managed database?

AFAIK there is no REST API providing this functionality directly. So, I am using restore for this (there are other ways but those don’t guarantee transactional consistency and are more complicated) via Create request.
Since it is not possible to turn off short time backup (retention has to be at least 1 day) it should be reliable. I am using current time for ‘properties.restorePointInTime’ property in request. This works fine for most databases. But one db returns me this error (from async operation request):
"error": {
"code": "BackupSetNotFound",
"message": "No backups were found to restore the database to the point in time 6/14/2021 8:20:00 PM (UTC). Please contact support to restore the database."
}
I know I am not out of range because if the restore time is before ‘earliestRestorePoint’ (this can be found in GET request on managed database) or in future I get ‘PitrPointInTimeInvalid’ error. Nevertheless, I found some information that I shouldn’t use current time but rather current time - 6 minutes at most. This is also true if done via Azure Portal (where it fails with the same error btw) which doesn’t allow to input time newer than current - 6 minutes. After few tries, I found out that current time - circa 40 minutes starts to work fine. But 40 minutes is a lot and I didn’t find any way to find out what time works before I try and wait for result of async operation.
My question is: Is there a way to find what is the latest time possible for restore?
Or is there a better way to do ‘copy’ of managed database which guarantees transactional consistency and is reasonably quick?
EDIT:
The issue I was describing was reported to MS. It was occuring when:
there is a custom time zone format e.g. UTC + 1 hour.
Backups are skipped for the source database at the desired point in time because the database is inactive (no active transactions).
This should be fixed as of now (25th of August 2021) and I were not able to reproduce it with current time - 10 minutes. Also I was told there should be new API which would allow to make copy without using PITR (no sooner than 1Q/22).

To answer your first question "Is there a way to find what is the latest time possible for restore?"
Yes. Via SQL. The only way to find this out is by using extended event (XEvent) sessions to monitor backup activity.
Process to start logging the backup_restore_progress_trace extended event and report on it is described here https://learn.microsoft.com/en-us/azure/azure-sql/managed-instance/backup-activity-monitor
Including the SQL here in case the link goes stale.
This is for storing in the ring buffer (max last 1000 records):
CREATE EVENT SESSION [Verbose backup trace] ON SERVER
ADD EVENT sqlserver.backup_restore_progress_trace(
WHERE (
[operation_type]=(0) AND (
[trace_message] like '%100 percent%' OR
[trace_message] like '%BACKUP DATABASE%' OR [trace_message] like '%BACKUP LOG%'))
)
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,
TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
ALTER EVENT SESSION [Verbose backup trace] ON SERVER
STATE = start;
Then to see output of all backup events:
WITH
a AS (SELECT xed = CAST(xet.target_data AS xml)
FROM sys.dm_xe_session_targets AS xet
JOIN sys.dm_xe_sessions AS xe
ON (xe.address = xet.event_session_address)
WHERE xe.name = 'Verbose backup trace'),
b AS(SELECT
d.n.value('(#timestamp)[1]', 'datetime2') AS [timestamp],
ISNULL(db.name, d.n.value('(data[#name="database_name"]/value)[1]', 'varchar(200)')) AS database_name,
d.n.value('(data[#name="trace_message"]/value)[1]', 'varchar(4000)') AS trace_message
FROM a
CROSS APPLY xed.nodes('/RingBufferTarget/event') d(n)
LEFT JOIN master.sys.databases db
ON db.physical_database_name = d.n.value('(data[#name="database_name"]/value)[1]', 'varchar(200)'))
SELECT * FROM b
NOTE: This tip came to me via Microsoft support when I had the same issue of point in time restores failing what seemed like randomly. They do not give any SLA for log backups. I found that on a busy database the log backups seemed to happen every 5-10 minutes but on a quiet database hourly. Recovery of a database this way can be slow depending on number of transaction logs and amount of activity to replay etc. (https://learn.microsoft.com/en-us/azure/azure-sql/database/recovery-using-backups)
To answer your second question: "Or is there a better way to do ‘copy’ of managed database which guarantees transactional consistency and is reasonably quick?"
I'd have to agree with Thomas - if you're after guaranteed transactional consistency and speed you need to look at creating a failover group https://learn.microsoft.com/en-us/azure/azure-sql/database/auto-failover-group-overview?tabs=azure-powershell#best-practices-for-sql-managed-instance and https://learn.microsoft.com/en-us/azure/azure-sql/managed-instance/failover-group-add-instance-tutorial?tabs=azure-portal
A failover group for a managed instance will have a primary server and failover server with the same user databases on each kept in synch.
But yes, whether this suits your needs depends on the question Thomas asked of what is the purpose of the copy.

SharePoint list update item takes more than 5 seconds

We have the Sharepoint 2016 hosted on prem with a minimum set of services running on the server. The resource utilization is very low and the user base is around 100. There are no workflows or any other resource consuming service running.
We use list to store and update information for certain users with the help of a form for the end user. Of recent, the time consumed for the update has increased to over 6 seconds for a list data update.
Example:
https://sitename_url/_api/web/lists/GetByTitle('WFListInfo')/items(15207)
This list has about 15 items, mostly numbers and single line text or number or DateTime.
The indexing is set to automatic.
As part of the review, we conducted a few checks and DB indexing on our cluster, however there is no improvement.
Looking forward to any help / suggestions. Thank you.

How to change retention duration for Azure Application Insights?

At the moment most of the data retained for 90 days by default. I was wondering if there way to change this setting to 30-40 days. I know that I can export them to keep the data longer but what I'm looking for is mainly keep the data for shorter duration for the upcoming regulations.

Update
The default retention for Application Insights resources is 90 days. Different retention periods can be selected for each Application Insights resource. The full set of available retention periods is 30, 60, 90, 120, 180, 270, 365, 550 or 730 days.
Note: If you need to keep data longer than 730 days, you can use Continuous Export to copy it to a storage account during data ingestion.
To change the retention, from your Application Insights resource, go to the Usage and Estimated Costs page and select the Data Retention option:
Reference

Sometimes the only answer is a no. In this case, you can't. From the docs:
Raw data points (that is, items that you can query in Analytics and inspect in Search) are kept for up to 90 days. If you need to keep data longer than that, you can use continuous export to copy it to a storage account.
Aggregated data (that is, counts, averages and other statistical data that you see in Metric Explorer) are retained at a grain of 1 minute for 90 days.
I remember that a long time ago the pricing tier dictated the maximum retention period but it is now fixed to 90 days for all plans.
You can try give your feedback / ask for this feature here.

It is now available as an option in the Azure portal. If not, you need to get in touch with Azure support to get it activated.

Azure : Resource usage API issue

I tried to pull the Azure resource usage data for billing metrics. I followed the steps as mentioned in the blog to get Usage data of resources.
https://msdn.microsoft.com/en-us/library/azure/mt219001.aspx
Even If I set "start and endtime" parameter in the URL, its not take effect. It returns entire output [ from resource created/added time ].
For example :
https://management.azure.com/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/providers/Microsoft.Commerce/UsageAggregates?api-version=2015-06-01-preview&reportedStartTime=2017-03-03T00%3a00%3a00%2b00%3a00&reportedEndTime=2017-03-04T00%3a00%3a00%2b00%3a00&aggregationGranularity=Hourly&showDetails=true"
As per the above URL, it should return the data between "2017-03-03 to 2017-03-04". But It shows the data from 2nd March [ 2017-03-02]. don't know why this return entire output and time filter section is not working.
Note : Endtime parameter value takes effect, mean it shows the output upto what mentioned in the endtime. But it doesn't consider the start time.
Anyone have a suggestion on this.

So there are a few things to consider:
There is usage date/time and then there is reported date/time.
Former tells you the date/time when the resources were used while the
latter tells you the date/time when this information was received by
the billing sub-system. There will be some delay in when the
resources used versus when they are reported. From this link:
Set {dateTimeOffset-value} for reportedStartTime and reportedEndTime
to valid dateTime values. Please note that this dateTimeOffset value
represents the timestamp at which the resource usage was recorded
within the Azure billing system. As Azure is a distributed system,
spanning across 19 datacenters around the world, there is bound to be
a delay between the resource usage time (when the resource was
actually consumed) and the resource usage reported time (when the
usage event reached the billing system) and callers need a predictable
way to get all usage events for a subscription for a given time
period.
The query only lets you search for reported date/time and there is no provision for usage date/time. However the data returned back to you contains usage date/time and not the reported date/time.
Long story short, because of the delay in propagating the usage information to the billing sub-system, the behavior you're seeing is correct. In my experience, it takes about 24 hours for all the usage information to show up in the billing sub-system.
The way we handle this scenario in our application is we fetch the data for a longer duration and then pick up only the data we're interested in seeing. So for example, if I need to see the data for 1st of March then we query the data for reported date/time from 1st March to say 4th March (i.e. today's date) and then discard any data where usage date is not 1st of March.
If we don't find any data (which is quite possible and is happening in your case as well), we simply tell the users that usage information is not yet available.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string