How to check EMR spot instance price history with boto - python-3.x

I'd like to create an EMR cluster programmatically using spot pricing to achieve some cost savings. To do this, I am trying to retrieve EMR spot instance pricing from AWS using boto3, but the only API I'm aware of in Boto3 is the ec2 client's describe_spot_price_history call - https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Client.describe_spot_price_history
The prices from EC2 are not indicative of the pricing for EMR as shown here - https://aws.amazon.com/emr/pricing/. The EC2 values are almost double EMR's.
Is there a way that I can see the spot price history for EMR similar to EC2? I have checked https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html and several other pages of documentation from AWS online about this and have found nothing.
Here's a code snippet that I use to check approximate pricing that I can use to bid on EMR instances.
import boto3
from datetime import datetime

ec2 = boto3.client('ec2', region_name='us-east-1')

max_bid_price = 0.140
min_bid_price = max_bid_price
az_choice = ''
response = ec2.describe_spot_price_history(
    Filters=[{
        'Name': 'availability-zone',
        'Values': ['us-east-1a', 'us-east-1c', 'us-east-1d']
    },
    {
        'Name': 'product-description',
        'Values': ['Linux/UNIX (Amazon VPC)']
    }],
    InstanceTypes=['r5.2xlarge'],
    EndTime=datetime.now(),
    StartTime=datetime.now()
)
# TODO: Add more Subnets in other AZ's if picking from our existing 3 is an issue
# 'us-east-1b', 'us-east-1e', 'us-east-1f'
for spot_price_history in response['SpotPriceHistory']:
    print(spot_price_history)
    if float(spot_price_history['SpotPrice']) <= min_bid_price:
        min_bid_price = float(spot_price_history['SpotPrice'])
        az_choice = spot_price_history['AvailabilityZone']
The above fails because EC2 spot prices are a bit higher than what Amazon charges per hour for EMR on-demand instances (e.g. EMR on-demand for a cluster of that size only costs $0.126/hour, while EC2 on-demand is $0.504/hour and EC2 spot instances go for about $0.20/hour).

There's no such thing as EMR spot pricing, as already mentioned in the comment. Spot pricing applies to the underlying EC2 instances; the EMR pricing page lists only EMR's per-instance surcharge, which is added on top of the EC2 cost. You can look at the AWS spot advisor page to find out which instance categories have a lower interruption rate, and choose based on that.
Since 2017, AWS has changed the algorithm for spot pricing, "where prices adjust more gradually, based on longer-term trends in supply and demand", so you probably don't need to look at the historical spot prices. More details about that can be found here.
Nowadays, you're most likely going to be fine using the last price (+ a delta) for that instance. This can be achieved with something like the following code snippet:
from collections import namedtuple
from datetime import datetime, timedelta

import boto3

def get_bid_price(instancetype, aws_region):
    instance_types = [instancetype]
    start = datetime.now() - timedelta(days=1)
    ec2_client = boto3.client('ec2', aws_region)
    price_dict = ec2_client.describe_spot_price_history(StartTime=start,
                                                        InstanceTypes=instance_types,
                                                        ProductDescriptions=['Linux/UNIX (Amazon VPC)'])
    if len(price_dict.get('SpotPriceHistory')) > 0:
        PriceHistory = namedtuple('PriceHistory', 'price timestamp')
        price_list = [PriceHistory(round(float(item.get('SpotPrice')), 3), item.get('Timestamp'))
                      for item in price_dict.get('SpotPriceHistory')]
        price_list.sort(key=lambda tup: tup.timestamp, reverse=True)
        # Add a small delta (here one cent) to the most recent spot price
        bid_price = round(float(price_list[0][0] + .01), 3)
        return bid_price
    else:
        raise ValueError('Invalid instance type: {} provided. '
                         'Please provide correct instance type.'.format(instancetype))
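For completeness, here's a minimal sketch of how the computed bid could be passed to EMR when launching a cluster with a spot instance group. The cluster name, release label, instance counts and IAM role names are assumptions for illustration; get_bid_price is the helper above.
import boto3

emr = boto3.client('emr', region_name='us-east-1')
bid = get_bid_price('r5.2xlarge', 'us-east-1')  # helper defined above

response = emr.run_job_flow(
    Name='spot-cluster-example',   # hypothetical cluster name
    ReleaseLabel='emr-6.2.0',      # assumed EMR release
    Instances={
        'InstanceGroups': [
            {
                'Name': 'master',
                'Market': 'ON_DEMAND',
                'InstanceRole': 'MASTER',
                'InstanceType': 'r5.2xlarge',
                'InstanceCount': 1,
            },
            {
                'Name': 'core',
                'Market': 'SPOT',
                'InstanceRole': 'CORE',
                'InstanceType': 'r5.2xlarge',
                'InstanceCount': 2,
                'BidPrice': str(bid),  # EMR expects the bid as a string, in USD per hour
            },
        ],
        'KeepJobFlowAliveWhenNoSteps': True,
    },
    JobFlowRole='EMR_EC2_DefaultRole',  # default EMR roles, assumed to already exist
    ServiceRole='EMR_DefaultRole',
)
print(response['JobFlowId'])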

Related

How to get list of running VMs from AzureML

I am a beginner with Python and with AzureML.
Currently, my task is to list all the running VMs (or Compute Instances) with their status and, if running, how long they have been running.
I managed to connect to AzureML and list Subscriptions, Resource Groups and Workspaces, but I'm stuck on how to list running VMs now.
Here's the code that I have currently:
from azure.mgmt.resource import ResourceManagementClient, SubscriptionClient
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget

column_width = 40
separator = '-' * 60

# get subscriptions list using credentials ('credentials' was obtained earlier when connecting to Azure)
subscription_client = SubscriptionClient(credentials)
sub_list = subscription_client.subscriptions.list()
print("Subscription ID".ljust(column_width) + "Display name")
print(separator)
for group in list(sub_list):
    print(f'{group.subscription_id:<{column_width}}{group.display_name}')
    subscription_id = group.subscription_id
    resource_client = ResourceManagementClient(credentials, subscription_id)
    group_list = resource_client.resource_groups.list()
    print(" Resource Groups:")
    for group in list(group_list):
        print(f" {group.name}{group.location}")
        print(" Workspaces:")
        my_ml_client = Workspace.list(subscription_id, credentials, group.name)
        for ws in list(my_ml_client):
            try:
                print(f" {ws}")
                if ws:
                    compute = ComputeTarget(workspace=ws, name=group.name)
                    print('Found existing compute: ' + group.name)
            except Exception:
                pass
Please note that this is more or less a learning exercise and it's not the final shape of the code; I will refactor once I get it to work.
Edit: I found an easy way to do this:
workspace = Workspace(
    subscription_id=subscription_id,
    resource_group=group.name,
    workspace_name=ws,
)
print(workspace.compute_targets)
Edit2: If anyone stumbles on this question and is just beginning to understand Python+Azure just like I do, all this information is from official documentation (which is just hard to follow as a beginner).
The result from 'workspace.compute_targets' will contain both Compute Instances and other compute targets (e.g. AmlCompute clusters).
If you need to retrieve only the VMs (like I do), you need to take an extra step and filter the result like this:
if type(compute_list[vm]) == ComputeInstance:
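For anyone who wants that filtering step spelled out, here's a minimal sketch. It assumes the ComputeInstance class from azureml.core.compute, that get_status() exposes a state attribute, and that subscription_id, group.name and ws are the variables from the snippets above.
from azureml.core import Workspace
from azureml.core.compute import ComputeInstance

workspace = Workspace(subscription_id=subscription_id,
                      resource_group=group.name,
                      workspace_name=ws)

for name, target in workspace.compute_targets.items():
    # Keep only Compute Instances (the VMs), skipping other compute target types
    if isinstance(target, ComputeInstance):
        status = target.get_status()  # assumption: reports the instance state
        print(f'{name}: {status.state}')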

Managing Auto Scaling Group via terraform

Let's say I have an auto-scaling group which I manage via Terraform, and I want that auto-scaling group to scale up and scale down based on our business hours.
The TF template for managing the ASG:
resource "aws_autoscaling_group" "foobar" {
availability_zones = ["us-west-2a"]
name = "terraform-test-foobar5"
max_size = 1
min_size = 1
health_check_grace_period = 300
health_check_type = "ELB"
force_delete = true
termination_policies = ["OldestInstance"]
}
resource "aws_autoscaling_schedule" "foobar" {
scheduled_action_name = "foobar"
min_size = 0
max_size = 1
desired_capacity = 0
start_time = "2016-12-11T18:00:00Z"
end_time = "2016-12-12T06:00:00Z"
autoscaling_group_name = aws_autoscaling_group.foobar.name
}
As you can see, I have to set a particular date and time for the action.
What I want is: scale down by 10% of my current capacity on Saturday night at 9 pm, and scale back up by 10% on Monday morning at 6 am.
How can I achieve this?
Any help is highly appreciated. Please let me know how to get through this.
The solution is not straightforward, but it is doable. The required steps are:
create a Lambda function that scales down the ASG (e.g. with Boto3 and Python; a sketch follows below)
assign an IAM role with the right permissions
create a cron trigger for "every Saturday 9 pm" with aws_cloudwatch_event_rule
create an aws_cloudwatch_event_target with the previously created cron trigger and Lambda function
repeat for scaling up
This module will probably fit your needs; you just have to code the Lambda and use the module to trigger it on a schedule.
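As a rough illustration of the first step only (a sketch under assumptions, not a definitive implementation: the ASG name is reused from the question and the 10% step comes from the question), the scale-down Lambda handler could look something like this:
import math
import boto3

ASG_NAME = 'terraform-test-foobar5'  # assumption: the ASG name from the question

def scale_down_handler(event, context):
    """Reduce the ASG's desired capacity by roughly 10%, never going below MinSize."""
    autoscaling = boto3.client('autoscaling')
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[ASG_NAME])['AutoScalingGroups'][0]
    current = group['DesiredCapacity']
    target = max(group['MinSize'], current - math.ceil(current * 0.10))
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=ASG_NAME,
        DesiredCapacity=target,
        HonorCooldown=False)
The scale-up handler is the mirror image (add 10%, capped at MaxSize), and the CloudWatch Events schedule expression would be something like cron(0 21 ? * SAT *) for Saturday 9 pm UTC.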

Cleaning up AMIs and EBS Snapshots via AWS Lambda

I have created the following Lambda function on my local machine so I can deploy it and run it through a CloudWatch Events cron expression on a daily basis to clean up the desired AMIs and their snapshots. It also takes care of abandoned EBS snapshots.
The criteria for deleting an AMI: find AMIs that don't have a DoNotDelete:true tag and, if they are more than 7 days old, mark them for deletion. The function exempts AMIs that are currently being used by an AWS Launch Configuration.
I am sure there are a few ways to optimize this Lambda function and code, and I would like to know how I can improve/optimize it further.
import boto3
from datetime import timedelta, datetime, timezone
import logging
import botocore

# Initialize logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def ami_cleanup(event, context):
    '''Clean AMIs and their associated SnapShots which are older than 7 days and without a "DoNotDelete=true" tag in an AWS Region.
    Exempt AMIs which are currently being used in an AWS Launch Config.'''
    ec2 = boto3.client('ec2')
    autoscaling = boto3.client('autoscaling')
    ami_response = ec2.describe_images(Owners=['self'])
    snapshot_response = ec2.describe_snapshots(OwnerIds=['self'])
    lc_response = autoscaling.describe_launch_configurations()
    amis = {}
    amidnd = []
    for i in ami_response['Images']:
        for tag in i.get('Tags', ''):
            if 'DoNotDelete' in tag.values():
                amidnd.append(i.get('ImageId'))
                break
    for ami in lc_response['LaunchConfigurations']:
        if ami['ImageId'] not in amidnd:
            amidnd.append(ami['ImageId'])
    for i in ami_response['Images']:
        if i.get('Tags') == None or i['ImageId'] not in amidnd:
            amis[i.get('ImageId')] = i.get('CreationDate')
    if not amis:
        logger.info('No AMIs and SnapShots found to be deregister')
    else:
        for ami, cdate in amis.items():
            if cdate < (datetime.now(timezone.utc) - timedelta(days=7)).isoformat():
                logger.info('De-registering...' + ami)
                ec2.deregister_image(ImageId=ami)
                for snapshot in snapshot_response['Snapshots']:
                    if ami in snapshot.get('Description', ''):
                        logger.info('Deleting ' + snapshot.get('SnapshotId') + " of " + ami)
                        ec2.delete_snapshot(SnapshotId=snapshot.get('SnapshotId'))
            else:
                logger.info('No AMIs and SnapShots found to be older than 7 days')
                break
        abandon_snap_clean(ami_response, snapshot_response)

def abandon_snap_clean(ami_response, snapshot_response):
    '''Clean abandoned EBS snapshots for which no AMI has been found.'''
    snapdndids = []
    for i in ami_response['Images']:
        for snap in i['BlockDeviceMappings']:
            if 'Ebs' in snap.keys():
                snapdndids.append(snap['Ebs']['SnapshotId'])
    for snapid in snapshot_response['Snapshots']:
        if snapid['SnapshotId'] not in snapdndids:
            try:
                logger.info('Deleting abandon snapshots ' + snapid['SnapshotId'])
                ec2.delete_snapshot(SnapshotId=snapid['SnapshotId'])
            except botocore.exceptions.ClientError as error:
                if error.response['Error']['Code'] == 'InvalidSnapshot.InUse':
                    logger.info('SnapShotId ' + snapid['SnapShotId'] + ' is already being used by an AMI')
                else:
                    raise error
        else:
            logger.info('No abandon EBS SnapShots found to clean up')
            break
    else:
        logger.info('No SnapShots found')
It does seem that you have a logic issue here: if you come across an image that isn't more than 7 days old, the loop breaks while there could still be other images that are older than 7 days. Switch the break to a continue:
if cdate < (datetime.now(timezone.utc) - timedelta(days=7)).isoformat():
    logger.info('De-registering...' + ami)
    ec2.deregister_image(ImageId=ami)
    for snapshot in snapshot_response['Snapshots']:
        if ami in snapshot.get('Description', ''):
            logger.info('Deleting ' + snapshot.get('SnapshotId') + " of " + ami)
            ec2.delete_snapshot(SnapshotId=snapshot.get('SnapshotId'))
else:
    logger.info('No AMIs and SnapShots found to be older than 7 days')
    continue

How to get latest Snapshot for a volume in AWS using API

I want only the latest snapshot for a specific volume.
response_v = boto3.client("ec2").describe_snapshots(Filters=[{"Name": "volume-id", "Values": ["vol-fffffffffff"]}])
How can it be done?
It looks like the describe_snapshots method returns the newest one first but you really shouldn't count on that.
I think you can safely rely on the StartTime field, looking for the greatest value for all snapshots returned.
Snapshots occur asynchronously; the point-in-time snapshot is created immediately
Because of that, the "largest" StartTime will be the latest snapshot.
I wrote this bit of code to print the snapshot_id with the latest snapshot start time. My python-fu is not the greatest but this works.
import boto3
import datetime
import pytz

utc = pytz.UTC
starttime = datetime.datetime(1, 1, 1, tzinfo=utc)
snap_id = ""
volume_id = "<put your volume id here or write something more elegant to pass it in>"
region = 'us-east-1'
session = boto3.Session(profile_name='default')
ec2 = session.client('ec2', region_name=region)
response = ec2.describe_snapshots(Filters=[{"Name": "volume-id", "Values": [volume_id]}])
# print(response['Snapshots'])
for snap in response['Snapshots']:
    if snap['StartTime'] > starttime:
        snap_id = snap['SnapshotId']
        starttime = snap['StartTime']
print(snap_id)
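The same idea can be written more compactly with Python's built-in max() keyed on StartTime. This is just a variant of the loop above, and it assumes response['Snapshots'] is non-empty:
latest = max(response['Snapshots'], key=lambda snap: snap['StartTime'])
print(latest['SnapshotId'], latest['StartTime'])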
References
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-snapshot.html

Easiest way to match security groups to their instances in Boto 3?

I have a Python list of security groups and I need to find which EC2/RDS instances or ELBs they are associated with. What's the easiest way to do this in Boto3?
Also, to the best of my understanding, one security group can be attached to several instances and one EC2 instance can have several security groups attached, so I need to identify those relationships to clean things up properly. What I have right now is a Python list of security group objects.
This is my current code:
import boto3
import json

# regions = ["us-east-1","ap-southeast-1","ap-southeast-2","ap-northeast-1","eu-central-1","eu-west-1"]
regions = ["us-east-1"]
uncompliant_security_groups = []
for region in regions:
    ec2 = boto3.resource('ec2', region_name=region)
    sgs = list(ec2.security_groups.all())
    for sg in sgs:
        for rule in sg.ip_permissions:
            # Check if list of IpRanges is not empty, source ip meets conditions
            if len(rule.get('IpRanges')) > 0 and rule.get('IpRanges')[0]['CidrIp'] == '0.0.0.0/0':
                if rule.get('FromPort') == None:
                    uncompliant_security_groups.append(sg)
                if rule.get('FromPort') != None and rule.get('FromPort') < 1024 and rule.get('FromPort') != 80 and rule.get('FromPort') != 443:
                    uncompliant_security_groups.append(sg)
print(uncompliant_security_groups)
print(len(uncompliant_security_groups))
for sec_group in uncompliant_security_groups:
    pass  # this is where I need to find what each security group is attached to
If you enable AWS Config Aggregator in the account (granted you have to pay for it):
import boto3

CONFIG_CLIENT = boto3.client('config')  # AWS Config client

account_id = '0123456789'
region = 'us-east-2'
sg_id = 'sg-0123456789'

relationship_data = CONFIG_CLIENT.get_aggregate_resource_config(
    ConfigurationAggregatorName='agg_name',
    ResourceIdentifier={
        'SourceAccountId': account_id,
        'SourceRegion': region,
        'ResourceId': sg_id,
        'ResourceType': 'AWS::EC2::SecurityGroup'
    }
)
relationship_data = relationship_data['ConfigurationItem']['relationships']
print(relationship_data)
Which should return some data like:
[
    {'resourceType': 'AWS::EC2::NetworkInterface', 'resourceId': 'eni-0123456789', 'relationshipName': 'Is associated with NetworkInterface'},
    {'resourceType': 'AWS::EC2::Instance', 'resourceId': 'i-0123456789', 'relationshipName': 'Is associated with Instance'},
    {'resourceType': 'AWS::EC2::VPC', 'resourceId': 'vpc-0123456789', 'relationshipName': 'Is contained in Vpc'}
]
NOTE: This appears to ONLY work with AWS Config aggregators! I have no idea why this is, or whether the data can be obtained from AWS Config by itself. However, my org uses AWS Config, so this makes this type of data available to me.
Boto3 config docs:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/config.html
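If AWS Config isn't available, a plain-EC2 alternative (a sketch, not taken from the answer above) is to list the network interfaces that reference a given security group; ENIs cover EC2, ELB and RDS attachments, although for non-EC2 resources you have to infer the owning service from the interface description:
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')
sg_id = 'sg-0123456789'  # hypothetical security group id

enis = ec2.describe_network_interfaces(
    Filters=[{'Name': 'group-id', 'Values': [sg_id]}])['NetworkInterfaces']

for eni in enis:
    attachment = eni.get('Attachment', {})
    # InstanceId is present for EC2 attachments; ELB/RDS interfaces are
    # recognizable from the interface Description instead.
    print(eni['NetworkInterfaceId'],
          attachment.get('InstanceId', '-'),
          eni.get('Description', ''))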
