AWS Lambda, boto3 - start instances, error while testing (not traceable) - python-3.x

I am trying to create a Lambda function to automatically start/stop/reboot some instances (with some additional tasks in the future).
I created the IAM role with a policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:StartInstances",
                "ec2:StopInstances",
                "ec2:RebootInstances"
            ],
            "Condition": {
                "StringEquals": {
                    "ec2:ResourceTag/critical": "true"
                }
            },
            "Resource": [
                "arn:aws:ec2:*:<12_digits>:instance/*"
            ],
            "Effect": "Allow"
        }
    ]
}
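The condition above only allows these actions on instances tagged critical=true, so a request that includes an untagged instance ID would presumably be denied. One way to keep the instance list consistent with the policy is to look the IDs up by tag instead of copying them from the console. This is a minimal sketch. `instance_ids_with_tag` is a hypothetical helper, and `ec2` is a boto3 EC2 client passed in by the caller. The role would also need `ec2:DescribeInstances`, which the policy above does not grant.

```python
def instance_ids_with_tag(ec2, key="critical", value="true"):
    """Return the IDs of all instances tagged key=value (hypothetical helper).

    `ec2` is a boto3 EC2 client; results are paginated with NextToken.
    """
    ids = []
    kwargs = {"Filters": [{"Name": f"tag:{key}", "Values": [value]}]}
    while True:
        page = ec2.describe_instances(**kwargs)
        for reservation in page.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                ids.append(instance["InstanceId"])
        token = page.get("NextToken")
        if not token:
            return ids
        # Ask for the next page with the same filter
        kwargs["NextToken"] = token
```

Usage would be along the lines of `instances = instance_ids_with_tag(boto3.client("ec2"))` at the top of the handler.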
The Lambda function has been granted access to the correct VPC, subnet, and security group.
I assigned the role to a new Lambda function (Python 3.9):
import boto3
from botocore.exceptions import ClientError

# instance IDs copied from my AWS Console
instances = ['i-xx1', 'i-xx2', 'i-xx3', 'i-xx4']
ec2 = boto3.client('ec2')

def lambda_handler(event, context):
    print(str(instances))
    try:
        print('The break occurs here \u2193')
        response = ec2.start_instances(InstanceIds=instances, DryRun=True)
    except ClientError as e:
        print(e)
        if 'DryRunOperation' not in str(e):
            print("You don't have permission to start instances.")
            raise
    # Initialise so the return below is defined even if the real call fails
    response = None
    try:
        response = ec2.start_instances(InstanceIds=instances, DryRun=False)
        print(response)
    except ClientError as e:
        print(e)
    return response
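The dry-run check above matches 'DryRunOperation' against the message text. A slightly sturdier sketch compares the error code in the exception's response dictionary instead. `start_with_dry_run` and `error_code` are hypothetical helper names. The client is passed in so the logic can be tested without AWS. The broad `except Exception` stands in for `botocore.exceptions.ClientError`, which is what you would catch in the real handler.

```python
def error_code(exc):
    """Return the AWS error code carried by a botocore ClientError-like exception."""
    return getattr(exc, "response", {}).get("Error", {}).get("Code", "")


def start_with_dry_run(ec2, instance_ids):
    """Check permissions with DryRun=True, then start the instances for real.

    `ec2` is expected to be a boto3 EC2 client.
    """
    try:
        ec2.start_instances(InstanceIds=instance_ids, DryRun=True)
    except Exception as exc:  # ClientError in the real handler
        if error_code(exc) != "DryRunOperation":
            # e.g. UnauthorizedOperation: the real call would fail as well
            raise
    return ec2.start_instances(InstanceIds=instance_ids, DryRun=False)
```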
I can't find the cause, because the test output gives no message about where the error is. I thought it might be a matter of execution time, so I set the time limit to 5 minutes to rule that out. For example:
Test Event Name
chc_lambda_test1
Response
{
"errorMessage": "2022-07-30T19:15:40.088Z e037d31d-5658-40b4-8677-1935efd3fdb7 Task timed out after 300.00 seconds"
}
Function Logs
START RequestId: e037d31d-5658-40b4-8677-1935efd3fdb7 Version: $LATEST
['i-xx', 'i-xx', 'i-xx', 'i-xx']
The break occurs here ↓
END RequestId: e037d31d-5658-40b4-8677-1935efd3fdb7
REPORT RequestId: e037d31d-5658-40b4-8677-1935efd3fdb7 Duration: 300004.15 ms Billed Duration: 300000 ms Memory Size: 128 MB Max Memory Used: 79 MB Init Duration: 419.46 ms
2022-07-30T19:15:40.088Z e037d31d-5658-40b4-8677-1935efd3fdb7 Task timed out after 300.00 seconds
Request ID
e037d31d-5658-40b4-8677-1935efd3fdb7
I also tried increasing the Lambda memory, but it didn't help (memory isn't the problem anyway, since Max Memory Used is 79 MB).

The main reason for the issue was that the subnet assigned to the Lambda function had no internet access. As Ervin Szilagyi suggested, I added an endpoint in the VPC (associated with the subnet and security group).
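The endpoint step can also be scripted. This is a minimal sketch, assuming an interface endpoint for the EC2 API (service name `com.amazonaws.<region>.ec2`) with private DNS enabled. Private DNS in turn assumes the VPC has DNS support and DNS hostnames turned on. The endpoint's security group also has to allow inbound HTTPS (port 443) from the Lambda function. The function name and its arguments are hypothetical, and `ec2` is a boto3 EC2 client passed in by the caller.

```python
def create_ec2_interface_endpoint(ec2, region, vpc_id, subnet_ids, security_group_ids):
    """Create an interface VPC endpoint for the EC2 API (hypothetical helper).

    Lets a Lambda function in subnets without internet access reach EC2.
    """
    response = ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId=vpc_id,
        ServiceName=f"com.amazonaws.{region}.ec2",
        SubnetIds=list(subnet_ids),
        SecurityGroupIds=list(security_group_ids),
        # With private DNS the default EC2 hostname resolves to the endpoint,
        # so boto3 clients need no endpoint_url override.
        PrivateDnsEnabled=True,
    )
    return response["VpcEndpoint"]["VpcEndpointId"]
```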
The next step was to provide authorization. Following the idea from "Unauthorized operation error occurs when using Boto3 to launch an EC2 instance with an IAM role", I added the IAM access key and secret key to the client invocation:
ec2 = boto3.client(
    'ec2',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
)
However, please be careful with security settings. I am a new user (working on private projects at the moment), so you shouldn't treat this solution as secure.

Related

AWS Lambda basic print function

Why can't I see the output of print() from my Python AWS Lambda function in the console? It shows that the function executed successfully, but I never see my printed text in the results.
I used this code, and it showed the following execution result:
target = "blue"
prediction = "red"
print(file_name, target, prediction, '+' if target == prediction else '-')
Execution result:
Response:
{
"statusCode": 200,
"body": "\"Hello from Lambda!\""
}
Request ID:
"xxxxxxx"
Function logs:
START RequestId: xxxxxx Version: $LATEST
END RequestId: xxxxxx
REPORT RequestId: xxxx Duration: 1.14 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 52 MB
If your AWS Lambda function uses Python, then any print() statement will be sent to the logs.
The logs are displayed when a function is run manually in the console. Logs are also sent to Amazon CloudWatch Logs for later reference.
Ensure that your Lambda function has been assigned the AWSLambdaBasicExecutionRole, which includes permission to write to CloudWatch Logs.
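Here is a minimal handler that writes to the function logs both with print() and with the standard logging module. It returns the comparison result so that the result also appears in the Response section. The event keys `target` and `prediction` are assumptions for illustration. The root-logger setup follows the usual pattern for the Python Lambda runtime, where that logger already has a handler attached.

```python
import logging

# In the Python Lambda runtime the root logger already has a handler that
# writes to the function logs; only the level needs to be set.
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    # Hypothetical event keys, defaulting to the values from the question
    target = event.get("target", "blue")
    prediction = event.get("prediction", "red")
    verdict = "+" if target == prediction else "-"
    print(target, prediction, verdict)  # stdout ends up in the function logs
    logger.info("target=%s prediction=%s verdict=%s", target, prediction, verdict)
    return {"statusCode": 200, "body": verdict}
```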

Azure-ML Deployment does NOT see AzureML Environment (wrong version number)

I've followed the documentation closely, as outlined here.
I've set up my Azure Machine Learning environment as follows:
from azureml.core import Workspace
# Connect to the workspace
ws = Workspace.from_config()
from azureml.core import Environment
from azureml.core import ContainerRegistry
myenv = Environment(name = "myenv")
myenv.inferencing_stack_version = "latest" # This will install the inference specific apt packages.
# Docker
myenv.docker.enabled = True
myenv.docker.base_image_registry.address = "myazureregistry.azurecr.io"
myenv.docker.base_image_registry.username = "myusername"
myenv.docker.base_image_registry.password = "mypassword"
myenv.docker.base_image = "4fb3..."
myenv.docker.arguments = None
# Environment variables (I need Python to look in these folders)
myenv.environment_variables = {"PYTHONPATH":"/root"}
# python
myenv.python.user_managed_dependencies = True
myenv.python.interpreter_path = "/opt/miniconda/envs/myenv/bin/python"
from azureml.core.conda_dependencies import CondaDependencies
conda_dep = CondaDependencies()
conda_dep.add_pip_package("azureml-defaults")
myenv.python.conda_dependencies=conda_dep
myenv.register(workspace=ws) # works!
I have a score.py file configured for inference (not relevant to the problem I'm having)...
I then set up the inference configuration:
from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
I set up my compute cluster:
from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException
# Choose a name for your cluster
aks_name = "theclustername"
# Check to see if the cluster already exists
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6_Promo")
    aks_target = ComputeTarget.create(workspace=ws, name=aks_name, provisioning_configuration=prov_config)
    aks_target.wait_for_completion(show_output=True)
from azureml.core.webservice import AksWebservice
# Example
gpu_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,
                                                    num_replicas=3,
                                                    cpu_cores=4,
                                                    memory_gb=10)
Everything succeeds. Then I try to deploy the model for inference:
from azureml.core.model import Model
model = Model(ws, name="thenameofmymodel")
# Name of the web service that is deployed
aks_service_name = 'tryingtodeply'
# Deploy the model
aks_service = Model.deploy(ws,
aks_service_name,
models=[model],
inference_config=inference_config,
deployment_config=gpu_aks_config,
deployment_target=aks_target,
overwrite=True)
aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)
And it fails saying that it can't find the environment. More specifically, my environment version is version 11, but it keeps trying to find an environment with a version number that is 1 higher (i.e., version 12) than the current environment:
Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: 0f03a025-3407-4dc1-9922-a53cc27267d4
More information can be found here:
Error:
{
"code": "BadRequest",
"statusCode": 400,
"message": "The request is invalid",
"details": [
{
"code": "EnvironmentDetailsFetchFailedUserError",
"message": "Failed to fetch details for Environment with Name: myenv Version: 12."
}
]
}
I have tried to manually edit the environment JSON to match the version that azureml is trying to fetch, but nothing works. Can anyone see anything wrong with this code?
Update
Changing the name of the environment (e.g., my_inference_env) and passing it to InferenceConfig seems to be on the right track. However, the error now changes to the following:
Running..........
Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: f0dfc13b-6fb6-494b-91a7-de42b9384692
More information can be found here: https://some_long_http_address_that_leads_to_nothing
Error:
{
"code": "DeploymentFailed",
"statusCode": 404,
"message": "Deployment not found"
}
Solution
The answer from Anders below is indeed correct regarding the use of azure ML environments. However, the last error I was getting was because I was setting the container image using the digest value (a sha) and NOT the image name and tag (e.g., imagename:tag). Note the line of code in the first block:
myenv.docker.base_image = "4fb3..."
I reference the digest value, but it should be changed to
myenv.docker.base_image = "imagename:tag"
Once I made that change, the deployment succeeded! :)
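Since the failure came down to passing a digest instead of an image name and tag, a small pre-flight check before deploying could catch it. This is a hypothetical helper. It assumes the rule described above: a name plus an explicit tag, optionally prefixed by a registry path. It is a heuristic, not Azure ML's own validation.

```python
import re

# A bare digest such as "4fb3..." (optionally prefixed with "sha256:")
_HEX_DIGEST = re.compile(r"^(sha256:)?[0-9a-f]{6,64}$")
# Docker-style tag: word character first, then up to 127 of [A-Za-z0-9_.-]
_TAG = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$")


def is_name_and_tag(image: str) -> bool:
    """Return True if `image` looks like [registry/]name:tag rather than a digest."""
    if _HEX_DIGEST.match(image):
        return False  # digest only, no repository name
    last_component = image.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return False  # no explicit tag
    name, tag = last_component.rsplit(":", 1)
    return bool(name) and bool(_TAG.match(tag))
```

For example, `assert is_name_and_tag(myenv.docker.base_image)` right before `Model.deploy` would have failed on the digest value.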
One concept that took me a while to grasp was the separation between registering an Azure ML Environment and using one. If you have already registered your environment, myenv, and none of its details have changed, there is no need to re-register it with myenv.register(). You can simply retrieve the already registered environment using Environment.get(), like so:
myenv = Environment.get(ws, name='myenv', version=11)
My recommendation would be to name your environment something new: like "model_scoring_env". Register it once, then pass it to the InferenceConfig.

CloudFormation stack deletion failing to remove VPC

I have created AWS infrastructure with a collection of EC2, Redshift, VPC, etc. resources via CloudFormation. Now I want to delete it in a particular reverse order. For example, all resources depend on the VPC, so the VPC should be deleted last. Every other stack is deleted, but the VPC stack is not deleted when I use Python boto3: it shows a subnet or network interface dependency error. However, when I delete it via the console, it is deleted successfully.
Has anyone faced this issue?
I have tried deleting everything attached to it, such as the load balancer, but the VPC is still not deleted.
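When a deletion works in the console but not from boto3, one thing worth ruling out is ordering. `delete_stack` only starts the deletion and returns immediately. If the VPC stack's deletion begins while the other stacks are still being torn down, its subnets may still have ENIs attached. This is a minimal sketch that deletes stacks strictly one after another. `delete_stacks_in_order` is a hypothetical helper, `cfn` is a boto3 CloudFormation client, and the stack names are placeholders.

```python
def delete_stacks_in_order(cfn, stack_names):
    """Delete CloudFormation stacks one at a time, in the order given.

    List dependent stacks first and the VPC stack last.
    """
    waiter = cfn.get_waiter("stack_delete_complete")
    for name in stack_names:
        cfn.delete_stack(StackName=name)
        # Blocks until deletion finishes; raises botocore's WaiterError if
        # the stack ends in DELETE_FAILED (e.g. an ENI still uses a subnet).
        waiter.wait(StackName=name)
```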
AWS CloudFormation creates a dependency graph between resources based upon DependsOn references in the template and references between resources.
It then tries to deploy resources in parallel, but takes dependencies into account.
For example, a Subnet might be defined as:
Subnet1:
  Type: AWS::EC2::Subnet
  Properties:
    CidrBlock: 10.0.0.0/24
    VpcId: !Ref ProdVPC
In this situation, there is an explicit reference to ProdVPC, so CloudFormation will only create Subnet1 after ProdVPC has been created.
When a CloudFormation stack is deleted, the reverse logic is applied. In this case, Subnet1 will be deleted before ProdVPC is deleted.
However, CloudFormation is not aware of resources created outside of the stack. This means that if a resource (e.g. an Amazon EC2 instance) is created inside the Subnet, stack deletion will fail, because the Subnet cannot be deleted while an EC2 instance is using it (or, more accurately, while an ENI is attached to it).
In such situations, you will need to manually delete the resources that are causing the "delete failure" and then try the delete command again.
A good way to find such resources is to look in the Network Interfaces section of the EC2 management console. Make sure that there are no interfaces connected to the VPC.
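The same check can be scripted with boto3. This is a minimal sketch. `network_interfaces_in_vpc` is a hypothetical helper, `ec2` is a boto3 EC2 client, and the VPC ID is yours to supply. Any interface it returns is a candidate for the dependency that blocks the subnet or VPC deletion.

```python
def network_interfaces_in_vpc(ec2, vpc_id):
    """Return (id, status, description) for every ENI in the given VPC."""
    found = []
    kwargs = {"Filters": [{"Name": "vpc-id", "Values": [vpc_id]}]}
    while True:
        page = ec2.describe_network_interfaces(**kwargs)
        for eni in page.get("NetworkInterfaces", []):
            found.append((eni["NetworkInterfaceId"], eni.get("Status"), eni.get("Description", "")))
        token = page.get("NextToken")
        if not token:
            return found
        # Fetch the next page with the same filter
        kwargs["NextToken"] = token
```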
Since you are having trouble deleting the VPC in stacks that contain Lambda functions which are themselves attached to the VPC, the most likely cause is the network interfaces that those functions create to connect to other resources in the VPC.
Technically, these network interfaces should be deleted automatically when the Lambda functions are removed from the stack, but in my experience I have seen orphaned ENIs that prevent the VPC from being deleted.
For this reason, I created a custom resource backed by a Lambda function that cleans up the ENIs after all Lambda functions in the VPC have been removed.
This is the CloudFormation part, where you set up the custom resource and pass in the VPC ID:
##############################################
# #
# Custom resource deleting net interfaces #
# #
##############################################
NetInterfacesCleanupFunction:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: src
    Handler: cleanup/network_interfaces.handler
    Role: !GetAtt BasicLambdaRole.Arn
    DeploymentPreference:
      Type: AllAtOnce
    Timeout: 900
PermissionForNewInterfacesCleanupLambda:
  Type: AWS::Lambda::Permission
  Properties:
    Action: lambda:invokeFunction
    FunctionName:
      Fn::GetAtt: [ NetInterfacesCleanupFunction, Arn ]
    Principal: lambda.amazonaws.com
InvokeLambdaFunctionToCleanupNetInterfaces:
  DependsOn: [PermissionForNewInterfacesCleanupLambda]
  Type: Custom::CleanupNetInterfacesLambda
  Properties:
    ServiceToken: !GetAtt NetInterfacesCleanupFunction.Arn
    StackName: !Ref AWS::StackName
    VPCID:
      Fn::ImportValue: !Sub '${MasterStack}-Articles-VPC-Ref'
    Tags:
      'owner': !Ref StackOwner
      'task': !Ref Task
And this is the corresponding Lambda function. It tries 3 times to detach and delete orphaned network interfaces, and fails if it can't. In that case, some Lambda function is still generating new network interfaces, and you need to debug that.
import boto3
from botocore.exceptions import ClientError
from time import sleep
# Adjust this import to wherever your custom resource helper code lives
from common import cfn_custom_resources as csr
import sys

MAX_RETRIES = 3
client = boto3.client('ec2')


def handler(event, context):
    if not csr.__is_valid_event(event, context):
        result = {'result': 'Invalid custom resource event'}
        csr.send(event, context, csr.FAILED, csr.validate_response_data(result))
        return
    elif event['RequestType'] == 'Create' or event['RequestType'] == 'Update':
        result = {'result': 'Don\'t trigger the rest of the code'}
        csr.send(event, context, csr.SUCCESS, csr.validate_response_data(result))
        return
    vpc_id = event['ResourceProperties']['VPCID']
    try:
        # Get all network interfaces in the given VPC that belong to Lambda functions
        interfaces = client.describe_network_interfaces(
            Filters=[
                {
                    'Name': 'description',
                    'Values': ['AWS Lambda VPC ENI*']
                },
                {
                    'Name': 'vpc-id',
                    'Values': [vpc_id]
                },
            ],
        )
        failed_detach = list()
        failed_delete = list()
        # Detach the network interfaces found above
        for interface in interfaces['NetworkInterfaces']:
            detach_interface(failed_detach, interface)
        # Try to detach a second time, then delete each interface
        for interface in interfaces['NetworkInterfaces']:
            detach_and_delete_interface(failed_detach, failed_delete, interface)
        # Report success only if no detach and no delete failed
        if not failed_detach and not failed_delete:
            result = {'result': 'Network interfaces detached and deleted successfully'}
            csr.send(event, context, csr.SUCCESS, csr.validate_response_data(result))
        else:
            result = {'result': 'Network interfaces couldn\'t be deleted completely'}
            csr.send(event, context, csr.FAILED, csr.validate_response_data(result))
        # print(response)
    except Exception:
        print("Unexpected error:", sys.exc_info())
        result = {'result': 'Some error with the process of detaching and deleting the network interfaces'}
        csr.send(event, context, csr.FAILED, csr.validate_response_data(result))


def detach_interface(failed_detach, interface):
    try:
        if interface['Status'] == 'in-use':
            detach_response = client.detach_network_interface(
                AttachmentId=interface['Attachment']['AttachmentId'],
                Force=True
            )
            # Sleep for 1 sec after every detachment
            sleep(1)
            print(f"Detach response for {interface['NetworkInterfaceId']}- {detach_response}")
            if 'HTTPStatusCode' not in detach_response['ResponseMetadata'] or \
                    detach_response['ResponseMetadata']['HTTPStatusCode'] != 200:
                failed_detach.append(detach_response)
    except ClientError as e:
        print(f"Exception details - {sys.exc_info()}")


def detach_and_delete_interface(failed_detach, failed_delete, interface, retries=0):
    detach_interface(failed_detach, interface)
    sleep(retries + 1)
    try:
        delete_response = client.delete_network_interface(
            NetworkInterfaceId=interface['NetworkInterfaceId'])
        print(f"Delete response for {interface['NetworkInterfaceId']}- {delete_response}")
        if 'HTTPStatusCode' not in delete_response['ResponseMetadata'] or \
                delete_response['ResponseMetadata']['HTTPStatusCode'] != 200:
            failed_delete.append(delete_response)
    except ClientError as e:
        print(f"Exception while deleting - {str(e)}")
        print()
        if retries <= MAX_RETRIES:
            if e.response['Error']['Code'] == 'InvalidNetworkInterface.InUse' or \
                    e.response['Error']['Code'] == 'InvalidParameterValue':
                retries = retries + 1
                print(f"Retry {retries} : Interface in use, deletion failed, retrying to detach and delete")
                detach_and_delete_interface(failed_detach, failed_delete, interface, retries)
            else:
                raise RuntimeError("Code not found in error")
        else:
            raise RuntimeError("Max Number of retries exhausted to remove the interface")
The full code of the Lambda function is at https://gist.github.com/revolutionisme/8ec785f8202f47da5517c295a28c7cb5
More information about configuring lambdas in a VPC - https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
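The handler above relies on a helper module (`common.cfn_custom_resources`) that isn't shown. Below is a minimal sketch of what its `send` would need to do, assuming it follows the documented custom resource response protocol: an HTTPS PUT of a JSON body to the pre-signed `ResponseURL` from the event. The function names are hypothetical and are not that module's API.

```python
import json
import urllib.request


def build_custom_resource_response(event, context, status, data=None, reason=None):
    """Serialize the JSON body CloudFormation expects from a custom resource."""
    body = {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or f"See CloudWatch log stream: {context.log_stream_name}",
        # Update/Delete events carry the existing PhysicalResourceId; reuse it.
        "PhysicalResourceId": event.get("PhysicalResourceId") or context.log_stream_name,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }
    return json.dumps(body).encode("utf-8")


def send_custom_resource_response(event, context, status, data=None, reason=None):
    """PUT the response body to the pre-signed URL from the event."""
    body = build_custom_resource_response(event, context, status, data, reason)
    request = urllib.request.Request(
        event["ResponseURL"],
        data=body,
        method="PUT",
        # Empty Content-Type, as AWS's own cfnresponse module also sends
        headers={"Content-Type": "", "Content-Length": str(len(body))},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```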

How do I get an authorized user for PostText API call for a Lex bot runtime

Sorry for the long post. I am trying to call a Lex bot through the PostText runtime API from my Lambda function. However, when I test this call, it returns that the userID is not authorized to use it. This is the error message I receive:
Response:
{
"errorMessage": "An error occurred (AccessDeniedException) when calling the PostText operation: User: arn:aws:sts::981709171824:assumed-role/lambda_basic_execution/OrchestratorAPIApp is not authorized to perform: lex:PostText on resource: arn:aws:lex:us-east-1:981709171824:bot:SupportBot_BookCab:SupportBot_BookCab",
"errorType": "ClientError",
"stackTrace": [
[
"/var/task/lambda_function.py",
18,
"lambda_handler",
"inputText= userInput"
],
[
"/var/runtime/botocore/client.py",
314,
"_api_call",
"return self._make_api_call(operation_name, kwargs)"
],
[
"/var/runtime/botocore/client.py",
612,
"_make_api_call",
"raise error_class(parsed_response, operation_name)"
]
]
}
Request ID:
"677f1820-6ed2-11e8-b891-33ab1951c65f"
Function Logs:
START RequestId: 677f1820-6ed2-11e8-b891-33ab1951c65f Version: $LATEST
An error occurred (AccessDeniedException) when calling the PostText operation: User: arn:aws:sts::981709171824:assumed-role/lambda_basic_execution/OrchestratorAPIApp is not authorized to perform: lex:PostText on resource: arn:aws:lex:us-east-1:981709171824:bot:SupportBot_BookCab:SupportBot_BookCab: ClientError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 18, in lambda_handler
inputText= userInput
File "/var/runtime/botocore/client.py", line 314, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/runtime/botocore/client.py", line 612, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when calling the PostText operation: User: arn:aws:sts::981709171824:assumed-role/lambda_basic_execution/OrchestratorAPIApp is not authorized to perform: lex:PostText on resource: arn:aws:lex:us-east-1:981709171824:bot:SupportBot_BookCab:SupportBot_BookCab
END RequestId: 677f1820-6ed2-11e8-b891-33ab1951c65f
REPORT RequestId: 677f1820-6ed2-11e8-b891-33ab1951c65f Duration: 325.25 ms Billed Duration: 400 ms Memory Size: 128 MB Max Memory Used: 31 MB
This is my code for calling the API:
import boto3

def lambda_handler(event, context):
    responderName = event["DestinationBot"]
    userId = event["RecipientID"]
    userInput = event["message"]["text"]
    client = boto3.client('lex-runtime')
    response = client.post_text(
        botName=responderName,
        botAlias=responderName,
        userId=userId,
        sessionAttributes={
        },
        requestAttributes={
        },
        inputText= userInput
    )
This is my sample test input:
{
"DestinationBot": "SupportBot_BookCab",
"RecipientID": "12345",
"message": {
"text": "book me a cab"
}
}
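The AccessDeniedException names an IAM principal (the assumed role lambda_basic_execution), an action (lex:PostText), and a resource ARN. That is the shape of an IAM permission error. Independently of the userId discussion below, one thing worth checking is whether that role has a statement allowing the action on the bot alias. This is a minimal sketch that builds such a policy and attaches it inline. The helper names and the policy name are hypothetical, the ARN format is copied from the error message, and `iam` is a boto3 IAM client.

```python
import json


def lex_post_text_policy(region, account_id, bot_name, bot_alias):
    """Build an IAM policy document allowing lex:PostText on one bot alias."""
    bot_arn = f"arn:aws:lex:{region}:{account_id}:bot:{bot_name}:{bot_alias}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["lex:PostText"], "Resource": [bot_arn]}
        ],
    }


def attach_inline_policy(iam, role_name, policy_name, policy):
    """Attach the policy inline to the Lambda execution role."""
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )
```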
The userID of PostText is how you persist the conversation between the user and Lex. It can be any value in the user's incoming request that identifies them consistently and uniquely, at least for that session.
From AWS PostText Docs:
userID
The ID of the client application user. Amazon Lex uses this to identify a user's conversation with your bot. At runtime, each request must contain the userID field.
...
Length Constraints: Minimum length of 2. Maximum length of 100.
Pattern: [0-9a-zA-Z._:-]+
So if a user is using Facebook messenger, they will have a Facebook ID that is passed with their messages and you can use that as their userID.
If a user is using Twilio-SMS, they will have a phone number passed with their messages and you can use that as their userID.
Your code is currently taking event["RecipientID"] and using that as a userID. But the RecipientID from an incoming message is yourself, the receiver of the incoming message.
Your error is telling you that
... User: arn:aws:sts::XXXXXXXXXX:assumed-role/lambda_basic_execution/OrchestratorAPIApp
So the userID = event["RecipientID"] = 'arn:aws:sts::XXXXXXXX:assumed-role/lambda_basic_execution/OrchestratorAPIApp'
You definitely don't want the Recipient ID to be used.
Instead you want the sender's ID to be the userID. Something like:
userId = event["SenderID"]
That might not be the exact code you use; it's just an example. You should be able to view the incoming request and locate something in it to use as a proper userID, as I explained above with Facebook and Twilio.
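A small check against the constraints quoted from the PostText docs (2 to 100 characters matching [0-9a-zA-Z._:-]+) can catch an unsuitable userId before the call. `is_valid_lex_user_id` is a hypothetical helper.

```python
import re

# Constraints from the PostText docs: length 2-100, pattern [0-9a-zA-Z._:-]+
_LEX_USER_ID = re.compile(r"[0-9a-zA-Z._:-]{2,100}")


def is_valid_lex_user_id(user_id: str) -> bool:
    """Return True if user_id satisfies the documented userId constraints."""
    return _LEX_USER_ID.fullmatch(user_id) is not None
```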

Python Step Functions API: get_activity_task seems to always timeout

I've got a lambda function like this:
import boto3
import os
import json

step_functions = boto3.client('stepfunctions')
workers_topic = boto3.resource('sns').Topic(os.environ.get("WORKERS_TOPIC_ARN"))

def test_push_to_workers_sns(event, context):
    activity_response = \
        step_functions.get_activity_task(
            activityArn=os.environ.get("ACKNOWLEDGE_ACTIVITY_ARN"),
            workerName='test_push_to_workers_sns'
        )
    # .get() avoids a KeyError when the poll returns without a task
    task_token, input_ = activity_response.get('taskToken'), activity_response.get('input')
    print(f"Task token is {task_token}")
    print(f"Input is {input_}")
    if not task_token:
        print("No activity found")
        return
    workers_topic.publish(Message="blah blah")
When I start an execution of my step function and it reaches the activity, running aws stepfunctions get-activity-task --activity-arn <ACKNOWLEDGE_ACTIVITY_ARN> in my terminal returns a taskToken and input that are both correct (I've checked this repeatedly). However, this Lambda function always seems to time out, regardless of whether the activity is running. (The Lambda function's timeout is set to 1 min 15 secs, and the activity state's timeout in the step function is 1 hour.)
I checked this case using the following CloudFormation template, and it works:
AWSTemplateFormatVersion: "2010-09-09"
Description: Stack creating AWS Step Functions state machine and lambda function calling GetActivityTask.
Resources:
  LambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: "index.handler"
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        ZipFile: |
          import boto3
          import os
          import json
          step_functions = boto3.client('stepfunctions')
          workers_topic = boto3.resource('sns').Topic(os.environ.get("WORKERS_TOPIC_ARN"))
          def handler(event, context):
              activity_response = step_functions.get_activity_task(
                  activityArn=os.environ.get("ACKNOWLEDGE_ACTIVITY_ARN"),
                  workerName='test_push_to_workers_sns'
              )
              if 'taskToken' not in activity_response:
                  return
              task_token, task_input = activity_response['taskToken'], activity_response['input']
              print(f"Task token is {task_token}")
              print(f"Input is {task_input}")
              workers_topic.publish(Message="blah blah")
              step_functions.send_task_success(
                  taskToken=task_token,
                  output=task_input
              )
      Runtime: "python3.6"
      Timeout: 25
      Environment:
        Variables:
          WORKERS_TOPIC_ARN: !Ref WorkersTopic
          ACKNOWLEDGE_ACTIVITY_ARN: !Ref AcknowledgeActivity
  StateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      RoleArn: !GetAtt StatesExecutionRole.Arn
      DefinitionString: !Sub
        - >
          {
            "Comment": "State Machine for GetActivityTask testing purposes.",
            "StartAt": "FirstState",
            "States": {
              "FirstState": {
                "Type": "Task",
                "Resource": "${ACKNOWLEDGE_ACTIVITY_ARN}",
                "End": true
              }
            }
          }
        - ACKNOWLEDGE_ACTIVITY_ARN: !Ref AcknowledgeActivity
  AcknowledgeActivity:
    Type: AWS::StepFunctions::Activity
    Properties:
      Name: !Sub ${AWS::AccountId}-AcknowledgeActivity
  WorkersTopic:
    Type: AWS::SNS::Topic
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: "/"
      Policies:
        - PolicyName: StepFunctionsAccess
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - states:GetActivityTask
                  - states:SendTaskFailure
                  - states:SendTaskSuccess
                Resource: arn:aws:states:*:*:*
        - PolicyName: SNSAccess
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - SNS:Publish
                Resource: arn:aws:sns:*:*:*
  StatesExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - !Sub states.${AWS::Region}.amazonaws.com
            Action: sts:AssumeRole
      Path: "/"
      Policies: []
I started an execution manually from the Step Functions console and invoked the Lambda function manually from the Lambda console.
Keep in mind that after you take a taskToken using GetActivityTask, the execution associated with it waits for a response (SendTaskSuccess or SendTaskFailure) until it reaches its timeout (the default is very long). So if you have already taken the token, that execution isn't available to GetActivityTask again. You can look up the status of an execution in the Step Functions console by looking at the events for that specific execution.
You should call SendTaskSuccess or SendTaskFailure from your code after getting the token from GetActivityTask (otherwise the execution will hang until it reaches its timeout or is stopped).
Aside from the original question: GetActivityTask is not designed to be called from Lambda. You can pass a Lambda function as the resource of a state (instead of an activity), and it will be called when the execution reaches that state (the event in the handler will contain the execution state). Activities should be used only for long-running jobs on dedicated machines (EC2, ECS). I should also point out that there are service limits on GetActivityTask calls (25 RPS with a bucket of size 1000), while Lambda-based states are limited essentially only by the state transition limit (400 per second with a bucket of size 800). You can read more about Step Functions limits here: https://docs.aws.amazon.com/step-functions/latest/dg/limits.html
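Following the advice above, here is a sketch of one poll-and-report step. It polls once and runs a hypothetical `work` callable on the decoded input. It always reports back with SendTaskSuccess or SendTaskFailure, so the execution does not hang. The client is passed in so the logic can be tested without AWS. Separately, the GetActivityTask API reference says the service may hold a poll open for up to 60 seconds. It recommends a client-side socket timeout of at least 65 seconds. boto3's default read timeout is 60 seconds, so creating the client with `botocore.config.Config(read_timeout=70)` may be worth trying when polls seem to hang; the Lambda timeout then has to be longer than that.

```python
import json


def process_one_activity_task(sfn, activity_arn, worker_name, work):
    """Poll once for an activity task, run `work` on its input, report the result.

    `sfn` is a boto3 Step Functions client; `work` is a hypothetical callable
    taking the decoded input dict and returning a JSON-serialisable result.
    Returns True if a task was processed, False if the poll came back empty.
    """
    response = sfn.get_activity_task(activityArn=activity_arn, workerName=worker_name)
    task_token = response.get("taskToken")
    if not task_token:
        # No task became available during the long poll
        return False
    try:
        result = work(json.loads(response.get("input") or "{}"))
    except Exception as exc:
        # Report the failure so the execution does not wait until its timeout
        sfn.send_task_failure(taskToken=task_token, error=type(exc).__name__, cause=str(exc))
        raise
    sfn.send_task_success(taskToken=task_token, output=json.dumps(result))
    return True
```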
