I know I can increase memory allocation for my flows using code. But can I do the same thing on Prefect Cloud somewhere?
To some extent, you can; it depends on what you are asking. What is currently the bottleneck for you?
For example, you can allocate more memory to the VM, Kubernetes job, or serverless container used as an infrastructure block. Here is an example for KubernetesJob:
from prefect.infrastructure import KubernetesJob

k8s_job = KubernetesJob(
    namespace="prefect",
    customizations=[
        # Set limits and requests in a single JSON-patch operation; two separate
        # "add" operations on the same path would overwrite each other.
        {
            "op": "add",
            "path": "/spec/template/spec/resources",
            "value": {
                "limits": {"memory": "8Gi", "cpu": "4000m"},
                "requests": {"memory": "2Gi", "cpu": "1000m"},
            },
        },
    ],
)
k8s_job.save("prod")
And here is one for ECSTask:
from prefect_aws.ecs import ECSTask
from prefect_aws.credentials import AwsCredentials

aws_credentials_block = AwsCredentials(
    aws_access_key_id="xxx",
    aws_secret_access_key="yyy",
)

ecs = ECSTask(
    aws_credentials=aws_credentials_block,
    cpu="256",
    memory="512",
)
ecs.save("prod")
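Once a block is saved, you attach it to a deployment so your flow runs pick up the extra resources. A minimal sketch, assuming Prefect 2.x and a hypothetical flow called my_flow (the block and deployment names below are illustrative, not taken from your setup):

from prefect import flow
from prefect.deployments import Deployment
from prefect.infrastructure import KubernetesJob


@flow
def my_flow():
    ...


# Load the saved infrastructure block and attach it to a deployment;
# the block and deployment names below are placeholders.
deployment = Deployment.build_from_flow(
    flow=my_flow,
    name="prod-deployment",
    infrastructure=KubernetesJob.load("prod"),
)
deployment.apply()

Every flow run created from that deployment then uses the memory and CPU settings stored on the block.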
I'm trying to create a batch pool via the az CLI as follows: az batch pool create --json-file foo.json.
The contents of foo.json are
{
    "id": "testpool2",
    "vmSize": "standard_d2s_v3",
    "virtualMachineConfiguration": {
        "imageReference": {
            "publisher": "microsoftwindowsserver",
            "offer": "windowsserver",
            "sku": "2019-datacenter-core-with-containers-smalldisk",
            "version": "latest"
        },
        "nodeAgentSKUId": "batch.node.windows amd64",
        "windowsConfiguration": {
            "enableAutomaticUpdates": false
        },
        "containerConfiguration": {
            "type": "dockerCompatible",
            "containerImageNames": [
                "mcr.microsoft.com/windows/servercore:10.0.17763.2928-amd64"
            ]
        },
        "nodePlacementConfiguration": {
            "policy": "Zonal"
        }
    },
    "resizeTimeout": "PT15M",
    "targetDedicatedNodes": 1,
    "targetLowPriorityNodes": 0,
    "enableAutoScale": false,
    "enableInterNodeCommunication": false,
    "networkConfiguration": {
        "subnetId": "/subscriptions/path/to/my/subnet",
        "dynamicVNetAssignmentScope": "none",
        "publicIPAddressConfiguration": {
            "provision": "BatchManaged"
        }
    },
    "taskSlotsPerNode": 1,
    "taskSchedulingPolicy": {
        "nodeFillType": "Pack"
    },
    "identity": {
        "type": "UserAssigned",
        "userAssignedIdentities": {
            "/subscriptions/path/to/my/user/assigned/identity": {}
        }
    }
}
This successfully creates the pool, but with a null identity property. Not surprisingly, any authentication relying on that user-assigned identity being in place fails.
Per the documentation, the --json-file parameter accepts a JSON file that conforms to the REST API body. However, the REST API body does not contain a suitable identity block.
I looked at the JSON that's POSTed to the REST API when creating the pool through the portal, and it looks very similar to what I have, except it's structured like this:
"properties": {
"id": "id value",
...etc...
},
"identity": {
"type": "UserAssigned",
...etc...
}
Making my JSON match that request body results in a JSON parsing error. The JSON I'm providing is syntactically correct; it just seems like the CLI expects the contents of the properties section only.
There's this existing question which has a terrible link-only answer to Microsoft Q&A, where the recommendation is to add an identity block that looks exactly like the one I'm providing. Please note that as far as I can tell this question is not a duplicate of that one -- they are receiving a different error, and they didn't explicitly state that they are using the Azure CLI, just that they're trying to use "JSON".
There doesn't seem to be any definitive documentation or examples of how to use the --json-file parameter with the Azure CLI to create a batch pool that uses a user-assigned identity. If it is possible, some guidance on how to accomplish it would be most welcome.
After searching in vain for an answer to the same question, I posted a slight variation of the question on the MS support page and they came up with a working solution for our case, which seems to be near-identical to what has been asked here.
Edit:
Adding the following to the JSON file made it work in our case.
{
    "type": "Microsoft.Batch/batchAccounts/pools",
    "name": "TestPool",
    "identity": {
        "type": "UserAssigned",
        "userAssignedIdentities": {
            "/subscriptions/<MySubscription>/resourceGroups/<MyResourceGroup>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<MyUserAssignedManagedIdentity>": {}
        }
    },
    "properties": { All the remaining properties defining the pool itself }
}
Answer from MS support
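In short, the CLI file needs the resource envelope (type, name, identity) wrapped around the flat pool properties rather than the plain REST body. If you want to script that wrapping step, a minimal sketch could look like the following (the file names, the reuse of the pool "id" as the resource "name", and the identity resource ID are all placeholders/assumptions, not part of the supported tooling):

import json

# Hypothetical input/output file names.
POOL_PROPERTIES_FILE = "pool-properties.json"  # the flat pool definition
OUTPUT_FILE = "foo.json"                       # file passed to `az batch pool create --json-file`
IDENTITY_ID = "/subscriptions/<MySubscription>/resourceGroups/<MyResourceGroup>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<MyUserAssignedManagedIdentity>"

with open(POOL_PROPERTIES_FILE) as f:
    pool_properties = json.load(f)

# Wrap the pool properties in the resource envelope shown above.
# Reusing the pool "id" as the resource "name" is an assumption here.
envelope = {
    "type": "Microsoft.Batch/batchAccounts/pools",
    "name": pool_properties.pop("id", "TestPool"),
    "identity": {
        "type": "UserAssigned",
        "userAssignedIdentities": {IDENTITY_ID: {}},
    },
    "properties": pool_properties,
}

with open(OUTPUT_FILE, "w") as f:
    json.dump(envelope, f, indent=2)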
Whenever a new release pipeline is run in Azure DevOps, the URL changes. Currently my ARM template has a hard-coded URL, which is annoying to keep updating manually.
"cors": {
"allowedOrigins": [
"[concat('https://',parameters('storage_account_name'),'.z10.web.core.windows.net')]"
}
The only thing that changes is the 10 in z10, so essentially I want something like
[concat('https://',parameters('storage_account_name'),'.z', '*', '.web.core.windows.net')]
I don't know if something like that is valid, but the idea is for the CORS policy to accept the URL regardless of the z number.
Basically speaking, this is not possible because of the CORS standard (see docs), which allows only exact origins, a wildcard, or null.
For instance, the ARM schema for Azure Storage follows the same pattern, allowing you to supply a list of exact origins or a wildcard (see ARM docs).
However, if you know your website name, you can retrieve the full host name in your ARM template and use it in your CORS configuration:
"[reference(resourceId('Microsoft.Web/sites', parameters('SiteName')), '2018-02-01').defaultHostName]"
The same works for a static website (which I guess is your case) if you know the storage account name:
"[reference(concat('Microsoft.Storage/storageAccounts/', variables('storageAccountName')), '2019-06-01', 'Full').properties.primaryEndpoints.web]"
Advanced reference output manipulation
Answering the comment: if you would like to replace some characters in the output of the reference function, the easiest way is to use the built-in replace function (see docs).
In case you need a more advanced scenario, here is my solution: a custom template function that strips https:// and the trailing /, so https://contoso.com/ is transformed to contoso.com:
"functions": [
{
"namespace": "lmc",
"members": {
"replaceUri": {
"parameters": [
{
"name": "uriString",
"type": "string"
}
],
"output": {
"type": "string",
"value": "[replace(replace(parameters('uriString'), 'https://',''), '/','')]"
}
}
}
}
],
# ...(some code)...
"resources": [
# ... (some resource)...:
"properties": {
"hostName": "[lmc.replaceUri(reference(variables('storageNameCdn')).primaryEndpoints.blob)]"
}
]
We have an index with a join datatype and the indexing speed is very slow.
At best we are indexing 100 documents/sec, but mostly around 50/sec; the rate varies depending on document size. We are indexing from multiple threads with .NET NEST, but both bulk and single inserts are pretty slow. We have tested various batch sizes but still do not get any speed to speak of. Even with only small documents containing "metadata" it is slow, and the speed drops radically as the document size increases. Document size in this solution can vary from small up to 6 MB.
What can we expect when indexing with the join datatype? How much of a penalty should we expect from using it? We did of course try to avoid it when designing the solution, but we did not find any way around it. Any tips or tricks?
We are using a 3-node cluster in Azure, each node with 32 GB of RAM and premium SSD disks. The Java heap size is set to 16 GB and swapping is disabled. Memory usage on the VMs is stable at about 60% of total, but CPU usage is very low (< 10%). We are running Elasticsearch 6.2.3.
A short version of the mapping:
"mappings": {
"log": {
"_routing": {
"required": true
},
"properties": {
"body": {
"type": "text"
},
"description": {
"type": "text"
},
"headStepJoinField": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"head": "step"
}
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"statusId": {
"type": "keyword"
},
"stepId": {
"type": "keyword"
}
}
}
}
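For context, the join field plus required _routing means every "step" (child) document has to be indexed with its parent "head" document's routing value, so parents and children land on the same shard. A minimal sketch of that pattern, using the Python elasticsearch client and made-up document values purely for illustration (our real code uses NEST):

from elasticsearch import Elasticsearch

# Hypothetical connection and index name.
es = Elasticsearch(["http://localhost:9200"])
index_name = "logs"

# Parent ("head") document, routed by its own id.
es.index(
    index=index_name,
    doc_type="log",  # single mapping type in 6.x
    id="head-1",
    routing="head-1",
    body={"description": "head document", "headStepJoinField": {"name": "head"}},
)

# Child ("step") document: must use the parent's routing value,
# which is why _routing is required in the mapping.
es.index(
    index=index_name,
    doc_type="log",
    id="step-1",
    routing="head-1",
    body={
        "body": "step payload",
        "stepId": "step-1",
        "headStepJoinField": {"name": "step", "parent": "head-1"},
    },
)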
I have a Copy job that should copy 100 GB of Excel files between two Azure Data Lakes.
"properties": {
"activities": [
{
"name": "Copy Data1",
"type": "Copy",
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"typeProperties": {
"source": {
"type": "AzureDataLakeStoreSource",
"recursive": true,
"maxConcurrentConnections": 256
},
"sink": {
"type": "AzureDataLakeStoreSink",
"maxConcurrentConnections": 256
},
"enableStaging": false,
"parallelCopies": 32,
"dataIntegrationUnits": 256
},
"inputs": [
{
"referenceName": "SourceLake",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "DestLake",
"type": "DatasetReference"
}
]
}
],
My throughput is about 4 MB/s. As I read here, it should be 56 MB/s. What should I do to reach that throughput?
You can use the copy activity performance tuning guidance to help you tune the performance of your Azure Data Factory service with the copy activity.
Summary:
Take these steps to tune the performance of your Azure Data Factory service with the copy activity.
Establish a baseline. During the development phase, test your pipeline by using the copy activity against a representative data sample. Collect execution details and performance characteristics following copy activity monitoring.
Diagnose and optimize performance. If the performance you observe doesn't meet your expectations, identify performance bottlenecks. Then, optimize performance to remove or reduce the effect of bottlenecks.
In some cases, when you run a copy activity in Azure Data Factory, you see a "Performance tuning tips" message on top of the copy activity monitoring page, as shown in the following example. The message tells you the bottleneck that was identified for the given copy run. It also guides you on what to change to boost copy throughput.
Your data is about 100 GB in size, but the test files used for file-based stores are multiple files of 10 GB in size, so the performance may be different.
Hope this helps.
I want to create a virtual machine that anyone can launch using the ARM REST API.
How do I do that? I cannot find instructions.
Apparently it is possible to create public virtual machine images here: https://vmdepot.msopentech.com/help/contribute/vhd.html/
There are a couple of ways you could do this, presuming you have a website or application at the front end and it is simply the backend communication you are looking for.
Prerequisites
The options here presume that you have an active Microsoft Azure account and are able to create a VM there via the portal. Once you can do that, you can use the REST API to create a machine instead.
Option 1
You can either use the REST API to directly create a VM by PUTting a request to this URI:
https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Compute/virtualMachines/{vm-name}?validating={true|false}&api-version={api-version}
You would need to attach a JSON document to that request that would define the machine you are creating.
{
    "id": "/subscriptions/{subscription-id}/resourceGroups/myresourcegroup1/providers/Microsoft.Compute/virtualMachines/myvm1",
    "name": "myvm1",
    "type": "Microsoft.Compute/virtualMachines",
    "location": "westus",
    "tags": {
        "department": "finance"
    },
    "properties": {
        "availabilitySet": {
            "id": "/subscriptions/{subscription-id}/resourceGroups/myresourcegroup1/providers/Microsoft.Compute/availabilitySets/myav1"
        },
        "hardwareProfile": {
            "vmSize": "Standard_A0"
        },
        "storageProfile": {
            "imageReference": {
                "publisher": "MicrosoftWindowsServerEssentials",
                "offer": "WindowsServerEssentials",
                "sku": "WindowsServerEssentials",
                "version": "latest"
            },
            "osDisk": {
                "name": "myosdisk1",
                "vhd": {
                    "uri": "http://mystorage1.blob.core.windows.net/vhds/myosdisk1.vhd"
                },
                "caching": "ReadWrite",
                "createOption": "FromImage"
            },
            "dataDisks": [
                {
                    "name": "mydatadisk1",
                    "diskSizeGB": "1",
                    "lun": 0,
                    "vhd": {
                        "uri": "http://mystorage1.blob.core.windows.net/vhds/mydatadisk1.vhd"
                    },
                    "createOption": "Empty"
                }
            ]
        },
        "osProfile": {
            "computerName": "myvm1",
            "adminUsername": "username",
            "adminPassword": "password",
            "customData": "",
            "windowsConfiguration": {
                "provisionVMAgent": true,
                "winRM": {
                    "listeners": [
                        {
                            "protocol": "https",
                            "certificateUrl": "url-to-certificate"
                        }
                    ]
                },
                "additionalUnattendContent": {
                    "pass": "oobesystem",
                    "component": "Microsoft-Windows-Shell-Setup",
                    "settingName": "FirstLogonCommands|AutoLogon",
                    "content": "<XML unattend content>"
                },
                "enableAutomaticUpdates": true
            },
            "secrets": [
                {
                    "sourceVault": {
                        "id": "/subscriptions/{subscription-id}/resourceGroups/myresourcegroup1/providers/Microsoft.KeyVault/vaults/myvault1"
                    },
                    "vaultCertificates": [
                        {
                            "certificateUrl": "https://myvault1.vault.azure.net/secrets/{secretName}/{secretVersion}",
                            "certificateStore": "{certificateStoreName}"
                        }
                    ]
                }
            ]
        },
        "networkProfile": {
            "networkInterfaces": [
                {
                    "id": "/subscriptions/{subscription-id}/resourceGroups/myresourceGroup1/providers/Microsoft.Network/networkInterfaces/mynic1"
                }
            ]
        }
    }
}
More details about the authentication and parameters can be found in the Azure Virtual Machine REST documentation: Create or update a virtual machine.
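As a rough illustration of issuing that call, here is a sketch in Python using the requests library; the bearer token, API version, file name, and resource names are placeholders you would replace with values from your own Azure AD authentication flow and setup:

import json
import requests

# Placeholder values; obtain the access token via your own Azure AD application / OAuth flow.
subscription_id = "{subscription-id}"
resource_group = "myresourcegroup1"
vm_name = "myvm1"
api_version = "2015-06-15"  # assumed API version; use the one appropriate for your environment
access_token = "<bearer token from Azure AD>"

url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}"
    f"/providers/Microsoft.Compute/virtualMachines/{vm_name}"
    f"?api-version={api_version}"
)

# The VM definition is the JSON document shown above, stored here in a hypothetical file.
with open("vm_definition.json") as f:
    vm_definition = json.load(f)

response = requests.put(
    url,
    headers={
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    },
    json=vm_definition,
)
print(response.status_code, response.text)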
Option 2
Alternatively, you can create an Azure Resource Manager template, such as 101-vm-simple-linux in Azure's GitHub template repository.
Once you have a template defined for the VM you want to deploy, you can PUT another request to this URI:
https://management.azure.com/subscriptions/{subscription-id}/resourcegroups/{resource-group-name}/providers/microsoft.resources/deployments/{deployment-name}?api-version={api-version}
Copy that template file to an Azure blob, along with another file specifying any parameters it needs, and send this JSON document with the PUT request:
{
    "properties": {
        "templateLink": {
            "uri": "http://mystorageaccount.blob.core.windows.net/templates/template.json",
            "contentVersion": "1.0.0.0"
        },
        "mode": "Incremental",
        "parametersLink": {
            "uri": "http://mystorageaccount.blob.core.windows.net/templates/parameters.json",
            "contentVersion": "1.0.0.0"
        }
    }
}
You can find the documentation for this at Create a template deployment.
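The request follows the same pattern as in Option 1, with only the URI and body changing. A short, hedged continuation of the earlier sketch (the deployment name, API version, and file name are again placeholders):

import json
import requests

subscription_id = "{subscription-id}"
resource_group = "myresourcegroup1"
deployment_name = "my-vm-deployment"   # placeholder deployment name
api_version = "2015-01-01"             # assumed deployments API version
access_token = "<bearer token from Azure AD>"

url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    f"/resourcegroups/{resource_group}"
    f"/providers/microsoft.resources/deployments/{deployment_name}"
    f"?api-version={api_version}"
)

# deployment_body.json would contain the templateLink/parametersLink document shown above.
with open("deployment_body.json") as f:
    body = json.load(f)

response = requests.put(
    url,
    headers={"Authorization": f"Bearer {access_token}"},
    json=body,
)
print(response.status_code, response.text)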
This is to elaborate on Michael B's answer: to discover what images are available, you can use VMDepot -- of course -- or you can query for all the marketplace images. Look at the publishers list first, and then from there you can decide which images you would like.
The URN value you discover will be the one you want to use in your REST call. Hope this helps...
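For example, the Compute REST API exposes list endpoints for publishers (and, beneath them, offers, SKUs, and versions) from which the image URN (publisher:offer:sku:version) can be assembled. A small sketch, with the same placeholder authentication as above and an assumed API version:

import requests

subscription_id = "{subscription-id}"
location = "westus"
api_version = "2015-06-15"  # assumed API version
access_token = "<bearer token from Azure AD>"
headers = {"Authorization": f"Bearer {access_token}"}

# List the image publishers available in a region; offers, SKUs, and versions
# can then be listed under each publisher to build the image URN.
publishers_url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    f"/providers/Microsoft.Compute/locations/{location}/publishers"
    f"?api-version={api_version}"
)

response = requests.get(publishers_url, headers=headers)
for publisher in response.json():
    print(publisher["name"])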