I have a Python list of security groups and I need to find which EC2/RDS instances or ELBs they are associated with. What's the easiest way to do this in Boto3?
Also, to the best of my understanding, one security group can be attached to several instances and one EC2 instance can have several security groups attached, so I need a way to identify those relationships before cleaning things up. What I have right now is a Python list of security group objects.
This is my current code:
import boto3
import json

# regions = ["us-east-1","ap-southeast-1","ap-southeast-2","ap-northeast-1","eu-central-1","eu-west-1"]
regions = ["us-east-1"]
uncompliant_security_groups = []

for region in regions:
    ec2 = boto3.resource('ec2', region_name=region)
    sgs = list(ec2.security_groups.all())
    for sg in sgs:
        for rule in sg.ip_permissions:
            # Check if list of IpRanges is not empty, source ip meets conditions
            if len(rule.get('IpRanges')) > 0 and rule.get('IpRanges')[0]['CidrIp'] == '0.0.0.0/0':
                if rule.get('FromPort') == None:
                    uncompliant_security_groups.append(sg)
                if rule.get('FromPort') != None and rule.get('FromPort') < 1024 and rule.get('FromPort') != 80 and rule.get('FromPort') != 443:
                    uncompliant_security_groups.append(sg)

print(uncompliant_security_groups)
print(len(uncompliant_security_groups))

for sec_group in uncompliant_security_groups:
If you enable an AWS Config aggregator in the account (granted, you have to pay for it):
import boto3

CONFIG_CLIENT = boto3.client('config')

account_id = '0123456789'
region = 'us-east-2'
sg_id = 'sg-0123456789'

relationship_data = CONFIG_CLIENT.get_aggregate_resource_config(
    ConfigurationAggregatorName='agg_name',
    ResourceIdentifier={
        'SourceAccountId': account_id,
        'SourceRegion': region,
        'ResourceId': sg_id,
        'ResourceType': 'AWS::EC2::SecurityGroup'
    }
)
relationship_data = relationship_data['ConfigurationItem']['relationships']
print(relationship_data)
Which should return some data like:
[
    {'resourceType': 'AWS::EC2::NetworkInterface', 'resourceId': 'eni-0123456789', 'relationshipName': 'Is associated with NetworkInterface'},
    {'resourceType': 'AWS::EC2::Instance', 'resourceId': 'i-0123456789', 'relationshipName': 'Is associated with Instance'},
    {'resourceType': 'AWS::EC2::VPC', 'resourceId': 'vpc-0123456789', 'relationshipName': 'Is contained in Vpc'}
]
NOTE: This appears to ONLY work with AWS CONFIG AGGREGATORS! I have NO idea why this is, or whether the data can be obtained from AWS Config by itself. However, my org uses AWS Config, so this makes this type of data available to me.
Boto3 config docs:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/config.html
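If you don't have AWS Config available, a plain-EC2 alternative is to look up network interfaces by security group: every attachment of a group (EC2 instance, RDS instance, ELB, Lambda, and so on) shows up as an ENI referencing it. A minimal sketch, assuming the uncompliant_security_groups list from the question and a single region:

import boto3

ec2_client = boto3.client('ec2', region_name='us-east-1')

def resources_using_sg(sg_id):
    # Every attachment of a security group shows up on a network interface,
    # so filtering ENIs by group-id reveals what is using the group.
    # (For large accounts, handle pagination via NextToken or a paginator.)
    response = ec2_client.describe_network_interfaces(
        Filters=[{'Name': 'group-id', 'Values': [sg_id]}]
    )
    attachments = []
    for eni in response['NetworkInterfaces']:
        attachments.append({
            'eni': eni['NetworkInterfaceId'],
            # The description usually identifies the owner, e.g. "ELB my-load-balancer"
            # or "RDSNetworkInterface"; EC2 instances carry an Attachment.InstanceId.
            'description': eni.get('Description', ''),
            'instance_id': eni.get('Attachment', {}).get('InstanceId'),
        })
    return attachments

for sec_group in uncompliant_security_groups:
    print(sec_group.id, resources_using_sg(sec_group.id))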
I am a beginner with Python and with AzureML.
Currently, my task is to list all the running VMs (or Compute Instances) with their status and, if running, for how long they have been running.
I managed to connect to AzureML and list Subscriptions, Resource Groups and Workspaces, but I'm stuck on how to list the running VMs now.
Here's the code that I have currently:
from azure.mgmt.resource import ResourceManagementClient, SubscriptionClient
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget

# get subscriptions list using credentials
subscription_client = SubscriptionClient(credentials)
sub_list = subscription_client.subscriptions.list()

print("Subscription ID".ljust(column_width) + "Display name")
print(separator)

for group in list(sub_list):
    print(f'{group.subscription_id:<{column_width}}{group.display_name}')
    subscription_id = group.subscription_id
    resource_client = ResourceManagementClient(credentials, subscription_id)
    group_list = resource_client.resource_groups.list()
    print("  Resource Groups:")
    for group in list(group_list):
        print(f"    {group.name} {group.location}")
        print("    Workspaces:")
        my_ml_client = Workspace.list(subscription_id, credentials, group.name)
        for ws in list(my_ml_client):
            try:
                print(f"      {ws}")
                if ws:
                    compute = ComputeTarget(workspace=ws, name=group.name)
                    print('Found existing compute: ' + group.name)
            except Exception:
                pass
Please note that this is more or less a learning exercise and it's not the final shape of the code; I will refactor once I get it to work.
Edit: I found an easy way to do this:
workspace = Workspace(
    subscription_id=subscription_id,
    resource_group=group.name,
    workspace_name=ws,
)
print(workspace.compute_targets)
Edit2: If anyone stumbles on this question and is just beginning to understand Python+Azure just like I do, all this information is from official documentation (which is just hard to follow as a beginner).
The result from 'workspace.compute_targets' will contain both Compute Instances and AML Instances.
If you need to retrieve only the VMs (like I do), you need to take an extra step and filter the result like this:
if type(compute_list[vm]) == ComputeInstance:
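For completeness, here is a minimal sketch of putting the filtering together with a running-state check, reusing the subscription_id, group and ws values from the code above; the get_status() call and its state attribute are assumptions about the azureml.core.compute API, so double-check them against the docs:

from azureml.core import Workspace
from azureml.core.compute import ComputeInstance

workspace = Workspace(
    subscription_id=subscription_id,
    resource_group=group.name,
    workspace_name=ws,
)

# compute_targets is a dict of name -> ComputeTarget; keep only Compute Instances
compute_list = workspace.compute_targets
for name, target in compute_list.items():
    if isinstance(target, ComputeInstance):
        # Assumption: ComputeInstance.get_status() returns an object with a 'state' field
        status = target.get_status()
        print(name, status.state)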
I tried to create a function that does the following things.
(I don't know if it's worth mentioning, but this function is invoked by another function.)
connects to AWS resources using boto3
gets the number of messages available in an SQS queue
counts the number of EC2 instances
evaluates a set of conditions based on the SQS queue and the EC2 instances, and either does nothing or writes to an SNS topic
Basically, I want to publish a message to an SNS topic every time the SQS queue is high and the number of EC2 instances digesting those messages is low.
import os
import boto3
import logging
import types

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Create session and clients
sts_client = boto3.client('sts')
sqs_client = boto3.client('sqs')
ec2_client = boto3.client('ec2')
sns_client = boto3.client('sns')

# Call the assume_role method of the STSConnection object and pass the role ARN and a role session name.
assumed_role_object = sts_client.assume_role(
    RoleArn=os.environ['ROLE_ARN'],
    RoleSessionName="AssumeRoleFromCloudOperations"
)

# From the response that contains the assumed role, get the temporary credentials
credentials = assumed_role_object['Credentials']
assumed_role_session = boto3.Session(
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken']
)

# Check the queue size
def sqs():
    queue_size = sqs_client.get_queue_attributes(
        QueueUrl=os.environ['SQS_QUEUE_URL'],
        AttributeNames=['ApproximateNumberOfMessages']
    )
    messages = int(queue_size["Attributes"]["ApproximateNumberOfMessages"])
    return messages

# Count the number of active ec2 instances
def count_instances(ec2):
    total_instances = 0
    instances = ec2.instances.filter(Filters=[
        {
            'Instance State': 'instance-state-name',
            'Values': ['running'],
            'Name': 'tag:Name',
            'Values': ['NameOfInstance']
        },
    ])
    for _ in instances:
        total_instances += 1
    return total_instances
    print(f"Total number of active scan servers is: {total_instances}")

# Define the SNS Topic which will be integrated with OpsGenie
def sns():
    topicArn = os.environ['SNS_ARN']

# Evaluate the set of conditions
def evaluate_conditions(context, event):
    sqs()
    if messages > int(os.environ['AVG_QUEUE_SIZE']) and count_instances.total_instances > int(os.environ['AVG_NR_OF_EC2_SCAN_SERVERS']):
        print('False alert')
        logger.info()
    elif messages < int(os.environ['AVG_QUEUE_SIZE']) and count_instances.total_instances < int(os.environ['AVG_NR_OF_EC2_SCAN_SERVERS']):
        print('False alert')
        logger.info()
    elif messages < int(os.environ['AVG_QUEUE_SIZE']) and count_instances.total_instances > int(os.environ['AVG_NR_OF_EC2_SCAN_SERVERS']):
        print('False alert')
        logger.info()
    else:
        sns.publish(TopicArn=os.environ['SNS_ARN'],
                    Message='sameple message',
                    Subject='sample subject')
        print("Published to SNS Topic")
The handler is handler.evaluate_conditions.
My question is: how can I give this Lambda function some structure?
When I run the function I get a NameError:
{
    "errorMessage": "name 'messages' is not defined",
    "errorType": "NameError",
    "stackTrace": [
        "  File \"/var/task/mdc_alert/handler.py\", line 67, in evaluate_conditions\n    if messages > int(os.environ['AVG_QUEUE_SIZE']) and count_instances.total_instances > int(\n"
    ]
}
So it seems that I cannot use the messages variable in the evaluate_conditions() function.
How can I make the "messages" and "total_instances" variables usable in the evaluate_conditions() function?
I've written this function entirely based on Google searches, Stack Overflow and the boto3 docs, since I don't have any experience with programming.
Is this structure any good, or does it need a complete overhaul?
Do I need to change the order of the functions, or maybe create a class?
The immediate issue is that the messages variable is not defined. Your sqs function returns a value but since you're calling it in a void context you're not actually doing anything with that value. You can fix this by changing this line:
sqs()
to this one:
messages = sqs()
I also see some issues with the count_instances function. It expects an ec2 argument, but evaluate_conditions never actually calls it; count_instances.total_instances just looks up an attribute on the function object. You could either pass it the ec2_client variable or just use ec2_client directly from within the function.
I suggest renaming your functions to more accurately reflect their return values:
sqs -> sqs_msg_count
count_instances -> running_instance_count
Making these changes will allow you to refactor evaluate_conditions to shorten the if-then lines, making your code overall easier to read and follow. If you took all these suggestions into account, your code might look something like this:
# Check the queue size
def sqs_msg_count():
    messages = sqs_client.get_queue_attributes(
        QueueUrl=os.environ['SQS_QUEUE_URL'],
        AttributeNames=['ApproximateNumberOfMessages']
    )
    return int(messages["Attributes"]["ApproximateNumberOfMessages"])

# Count the number of active ec2 instances
def running_instance_count():
    # Use the EC2 client directly; the filters select running instances with the given Name tag
    response = ec2_client.describe_instances(Filters=[
        {'Name': 'instance-state-name', 'Values': ['running']},
        {'Name': 'tag:Name', 'Values': ['NameOfInstance']},
    ])
    return sum(len(reservation['Instances']) for reservation in response['Reservations'])

# Evaluate the set of conditions
def evaluate_conditions(context, event):
    sqs_count = sqs_msg_count()
    sqs_average = int(os.environ['AVG_QUEUE_SIZE'])
    ec2_count = running_instance_count()
    ec2_average = int(os.environ['AVG_NR_OF_EC2_SCAN_SERVERS'])
    if sqs_count > sqs_average and ec2_count > ec2_average:
        print('False alert')
        logger.info('False alert')
    elif sqs_count < sqs_average and ec2_count < ec2_average:
        print('False alert')
        logger.info('False alert')
    elif sqs_count < sqs_average and ec2_count > ec2_average:
        print('False alert')
        logger.info('False alert')
    else:
        sns_client.publish(
            TopicArn=os.environ['SNS_ARN'],
            Message='sample message',
            Subject='sample subject'
        )
        print("Published to SNS Topic")
I went through the official Google Cloud docs, but I don't see how to use them to list the resources of a specific organization by providing the organization ID.
organizations = CloudResourceManager.Organizations.Search()
projects = emptyList()
parentsToList = queueOf(organizations)
while (parent = parentsToList.pop()) {
    // NOTE: Don't forget to iterate over paginated results.
    // TODO: handle PERMISSION_DENIED appropriately.
    projects.addAll(CloudResourceManager.Projects.List(
        "parent.type:" + parent.type + " parent.id:" + parent.id))
    parentsToList.addAll(CloudResourceManager.Folders.List(parent))
}
You can use Cloud Asset Inventory for this. I wrote this code to perform an export (sink) into BigQuery.
import os
from google.cloud import asset_v1
from google.cloud.asset_v1.proto import asset_service_pb2

def asset_to_bq(request):
    client = asset_v1.AssetServiceClient()
    parent = 'organizations/{}'.format(os.getenv('ORGANIZATION_ID'))

    output_config = asset_service_pb2.OutputConfig()
    output_config.bigquery_destination.dataset = 'projects/{}/datasets/{}'.format(os.getenv('PROJECT_ID'),
                                                                                  os.getenv('DATASET'))
    output_config.bigquery_destination.table = 'asset_export'
    output_config.bigquery_destination.force = True

    response = client.export_assets(parent, output_config)
    # For waiting for the export to finish
    # response.result()
    # Do stuff after export
    return "done", 200

if __name__ == "__main__":
    asset_to_bq('')
Be careful if you use it: the export must go into an empty or non-existent table, or you must set force to true.
In my case, a few minutes after the Cloud Scheduler job that triggers my function and exports the data to BigQuery, I have a scheduled query in BigQuery that copies the data to another table to keep the history.
Note: It's also possible to configure an export to Cloud Storage if you prefer.
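For the Cloud Storage variant, only the destination on the OutputConfig changes; a minimal sketch reusing the client and parent from the snippet above (the bucket name is a placeholder):

output_config = asset_service_pb2.OutputConfig()
# Write the export as a file in a GCS bucket instead of a BigQuery table
output_config.gcs_destination.uri = 'gs://my-asset-export-bucket/asset_export.json'

response = client.export_assets(parent, output_config)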
I hope this is a starting point for you to achieve what you want to do.
I am able to list the projects, but I also want to list the folders and the resources under each folder (including folder.name and tags), and I also want to specify the organization ID so I get resource information for a specific organization.
import os
from google.cloud import resource_manager

def export_resource(organizations):
    client = resource_manager.Client()
    for project in client.list_projects():
        print("%s, %s" % (project.project_id, project.status))
I'd like to create an EMR cluster programmatically using spot pricing to achieve some cost savings. To do this, I am trying to retrieve EMR spot instance pricing from AWS using boto3, but the only API I'm aware of in Boto3 is the EC2 client's describe_spot_price_history call - https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Client.describe_spot_price_history
The prices from EC2 are not indicative of the pricing for EMR, as seen here - https://aws.amazon.com/emr/pricing/. The values are almost double those of EMR.
Is there a way that I can see the spot price history for EMR similar to EC2? I have checked https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html and several other pages of documentation from AWS online about this and have found nothing.
Here's a code snippet I use to check the approximate price I could bid for EMR instances.
from datetime import datetime
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

max_bid_price = 0.140
min_bid_price = max_bid_price
az_choice = ''

response = ec2.describe_spot_price_history(
    Filters=[{
        'Name': 'availability-zone',
        'Values': ['us-east-1a', 'us-east-1c', 'us-east-1d']
    },
    {
        'Name': 'product-description',
        'Values': ['Linux/UNIX (Amazon VPC)']
    }],
    InstanceTypes=['r5.2xlarge'],
    EndTime=datetime.now(),
    StartTime=datetime.now()
)

# TODO: Add more Subnets in other AZ's if picking from our existing 3 is an issue
# 'us-east-1b', 'us-east-1e', 'us-east-1f'
for spot_price_history in response['SpotPriceHistory']:
    print(spot_price_history)
    if float(spot_price_history['SpotPrice']) <= min_bid_price:
        min_bid_price = float(spot_price_history['SpotPrice'])
        az_choice = spot_price_history['AvailabilityZone']
The above doesn't help much, since the prices for EC2 spot instances come back higher than the normal hourly amount Amazon would charge for EMR on-demand instances (e.g. EMR on-demand for a cluster of that size only costs $0.126/hour, but EC2 on-demand is $0.504/hour and spot instances go for about $0.20/hour).
There's no such thing as EMR spot pricing, as already mentioned in the comment. Spot pricing applies to the underlying EC2 instances; the EMR fee is a separate per-instance-hour charge added on top. You can look at the AWS spot advisor page to find out which instance categories have a lower interruption rate, and choose based on that.
In 2017, AWS changed the algorithm for spot pricing, "where prices adjust more gradually, based on longer-term trends in supply and demand", so you probably don't need to look at the historical spot prices. More details about that can be found here.
Nowadays, you're most likely going to be fine using the last price (plus a small delta) for that instance type. This can be achieved with the following code snippet:
from collections import namedtuple
from datetime import datetime, timedelta
import boto3

def get_bid_price(instancetype, aws_region):
    instance_types = [instancetype]
    start = datetime.now() - timedelta(days=1)
    ec2_client = boto3.client('ec2', aws_region)
    price_dict = ec2_client.describe_spot_price_history(StartTime=start,
                                                        InstanceTypes=instance_types,
                                                        ProductDescriptions=['Linux/UNIX (Amazon VPC)']
                                                        )
    if len(price_dict.get('SpotPriceHistory')) > 0:
        PriceHistory = namedtuple('PriceHistory', 'price timestamp')
        price_list = [PriceHistory(round(float(item.get('SpotPrice')), 3), item.get('Timestamp'))
                      for item in price_dict.get('SpotPriceHistory')]
        price_list.sort(key=lambda tup: tup.timestamp, reverse=True)
        # Add a small delta (here one cent) to the most recent spot price
        bid_price = round(float(price_list[0][0] + .01), 3)
        return bid_price
    else:
        raise ValueError('Invalid instance type: {} provided. '
                         'Please provide correct instance type.'.format(instancetype))
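To actually use that bid when you create the cluster, pass it to run_job_flow with Market='SPOT' on the instance group; a rough sketch, with the cluster name, release label, roles and instance counts as placeholders:

import boto3

emr_client = boto3.client('emr', region_name='us-east-1')
bid_price = get_bid_price('r5.2xlarge', 'us-east-1')

response = emr_client.run_job_flow(
    Name='my-spot-cluster',                      # placeholder name
    ReleaseLabel='emr-6.3.0',                    # pick the release you actually need
    ServiceRole='EMR_DefaultRole',               # assumes the default EMR roles exist
    JobFlowRole='EMR_EC2_DefaultRole',
    Instances={
        'InstanceGroups': [
            {
                'Name': 'Master',
                'InstanceRole': 'MASTER',
                'InstanceType': 'r5.2xlarge',
                'InstanceCount': 1,
                'Market': 'ON_DEMAND',           # keep the master on-demand
            },
            {
                'Name': 'Core',
                'InstanceRole': 'CORE',
                'InstanceType': 'r5.2xlarge',
                'InstanceCount': 2,
                'Market': 'SPOT',
                'BidPrice': str(bid_price),      # if omitted, EMR caps the bid at the on-demand price
            },
        ],
        'KeepJobFlowAliveWhenNoSteps': True,
    },
)
print(response['JobFlowId'])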
def get_instance_id_from_pip(self, pip):
    subscription_id = "69ff3a41-a66a-4d31-8c7d-9a1ef44595c3"
    compute_client = ComputeManagementClient(self.credentials, subscription_id)
    network_client = NetworkManagementClient(self.credentials, subscription_id)

    print("Get all public IP")
    for public_ip in network_client.public_ip_addresses.list_all():
        if public_ip.ip_address == pip:
            print(public_ip)
            # Get id
            pip_id = public_ip.id.split('/')
            print("pip id : {}".format(pip_id))
            rg_from_pip = pip_id[4].lower()
            print("RG : {}".format(rg_from_pip))
            pip_name = pip_id[-1]
            print("pip name : {}".format(pip_name))
            for vm in compute_client.virtual_machines.list_all():
                vm_id = vm.id.split('/')
                # print("vm ref id: {}".format(vm_id))
                rg_from_vm = vm_id[4].lower()
                if rg_from_pip == rg_from_vm:
                    # this is the VM in the same rg as pip
                    for ni_reference in vm.network_profile.network_interfaces:
                        ni_reference = ni_reference.id.split('/')
                        ni_name = ni_reference[8]
                        print("ni reference: {}".format(ni_reference))
                        net_interface = network_client.network_interfaces.get(rg_from_pip, ni_name)
                        print("net interface ref {}".format(net_interface))
                        public_ip_reference = net_interface.ip_configurations[0].public_ip_address
                        if public_ip_reference:
                            public_ip_reference = public_ip_reference.id.split('/')
                            ip_group = public_ip_reference[4]
                            ip_name = public_ip_reference[8]
                            print("IP group {}, IP name {}".format(ip_group, ip_name))
                            if ip_name == pip_name:
                                print("Thank god. Finallly !!!!")
                                print("VM ID :-> {}".format(vm.id))
                                return vm.id
I have the above code to get the VM instance ID from a public IP, but it's not working. What is really surprising is that, for all instances, I am getting x.public_ip_address.ip_address as a None value.
I had multiple readthedocs references for the Python SDK of Azure, but somehow none of the links work anymore. Good job, Azure :)
Some of them:
https://azure-sdk-for-python.readthedocs.io/en/v1.0.3/resourcemanagementcomputenetwork.html
https://azure-storage.readthedocs.io/en/latest/ref/azure.storage.file.fileservice.html
Second edit:
I got the answer to this, and the above code will return the VM ID given the public IP address. Though, as you can see, it is not an absolutely optimized answer, so better answers are welcome. Thanks!
Docs have moved here:
https://learn.microsoft.com/python/azure/
We made some redirections, but unfortunately it's not possible to do a global redirection on RTD, so some pages are 404 :/
About your trouble, I would try to use the PublicIPAddresses operations group directly:
https://learn.microsoft.com/en-us/python/api/azure.mgmt.network.v2017_11_01.operations.publicipaddressesoperations?view=azure-python
You get this one on the Network client, as client.public_ip_addresses.
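Building on that, a sketch of how the public IP operations can avoid scanning every VM: walk from the public IP's ip_configuration back to its network interface and read the NIC's virtual_machine reference. The ip_configuration and virtual_machine attributes are assumptions about the azure-mgmt-network models, so verify them against your SDK version:

from azure.mgmt.network import NetworkManagementClient

def vm_id_from_public_ip(network_client, pip):
    # Walk public IP -> IP configuration -> network interface -> VM
    for public_ip in network_client.public_ip_addresses.list_all():
        if public_ip.ip_address == pip and public_ip.ip_configuration:
            # .../networkInterfaces/<nic-name>/ipConfigurations/<config-name>
            parts = public_ip.ip_configuration.id.split('/')
            nic_rg, nic_name = parts[4], parts[8]
            nic = network_client.network_interfaces.get(nic_rg, nic_name)
            if nic.virtual_machine:       # SubResource holding the VM's resource id
                return nic.virtual_machine.id
    return None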