DynamoDB scan not returning desired output - python-3.x

I have a simple python script that is scanning a DynamoDB table. The table holds ARNs for all the accounts I own. There is one primary key "ARNs" of data type string. When I scan the table, I would like to only get the ARN string returned. I am having trouble finding anything in the boto3 documentation that can accomplish this. Below is my code, the returned output, and the desired output.
CODE:
import boto3
dynamo = boto3.client('dynamodb')
# Scans Dynamo for all account role ARNs
def get_arns():
    response = dynamo.scan(TableName='AllAccountARNs')
    print(response)

get_arns()
OUTPUT:
{'ARNs': {'S': 'arn:aws:iam::xxxxxxx:role/custom_role'}},
{'ARNs': {'S': 'arn:aws:iam::yyyyyyy:role/custom_role'}},
{'ARNs': {'S': 'arn:aws:iam::zzzzzzz:role/custom_role'}}
DESIRED OUTPUT:
arn:aws:iam::xxxxxxx:role/custom_role
arn:aws:iam::yyyyyyy:role/custom_role
arn:aws:iam::zzzzzzz:role/custom_role

Here's an example of how to do this with a boto3 DynamoDB Client:
import boto3
ddb = boto3.client('dynamodb')
rsp = ddb.scan(TableName='AllAccountARNs')
for item in rsp['Items']:
    print(item['ARNs']['S'])
Here's the same thing, but using a boto3 DynamoDB Table Resource:
import boto3
dynamodb = boto3.resource('dynamodb')
tbl = dynamodb.Table('AllAccountARNs')
rsp = tbl.scan()
for item in rsp['Items']:
    print(item['ARNs'])
Note that these examples do not handle large result sets. If LastEvaluatedKey is present in the response, you will need to paginate the result set. See the boto3 documentation.
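A minimal pagination sketch with the client API, following LastEvaluatedKey with ExclusiveStartKey (reusing the table and attribute names from the question), might look like this:

import boto3

ddb = boto3.client('dynamodb')

def get_all_arns(table_name='AllAccountARNs'):
    # Follow LastEvaluatedKey until the scan is exhausted.
    kwargs = {'TableName': table_name}
    while True:
        rsp = ddb.scan(**kwargs)
        for item in rsp['Items']:
            yield item['ARNs']['S']
        last_key = rsp.get('LastEvaluatedKey')
        if not last_key:
            break
        kwargs['ExclusiveStartKey'] = last_key

for arn in get_all_arns():
    print(arn)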
For more information on Client vs. Resource, see here.

Related

Python boto3 get item with specific non-partition-key attribute value

My AWS DynamoDB table has id as the partition key and no sort key. The following does not return the existing record from the table:
response = producttable.scan(FilterExpression=Attr('title').eq("My Product"))
response['ScannedCount'] is less than the total count of the table.
You need to paginate.
For example code look at https://github.com/aws-samples/aws-dynamodb-examples/blob/master/DynamoDB-SDK-Examples/python/WorkingWithScans/scan_paginate.py
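A minimal inline sketch of that pagination loop, keeping the question's FilterExpression (the table name 'products' is a placeholder assumption; only the producttable variable appears in the question):

import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource('dynamodb')
producttable = dynamodb.Table('products')  # placeholder table name

items = []
scan_kwargs = {'FilterExpression': Attr('title').eq("My Product")}
while True:
    response = producttable.scan(**scan_kwargs)
    items.extend(response.get('Items', []))
    last_key = response.get('LastEvaluatedKey')
    if not last_key:
        break
    # Resume the scan from where the previous page stopped.
    scan_kwargs['ExclusiveStartKey'] = last_key

print(f"Matched {len(items)} items across all pages")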

Not able to query AWS Glue/Athena views in Databricks Runtime ['java.lang.IllegalArgumentException: Can not create a Path from an empty string;']

Attempting to read a view that was created in AWS Athena (based on a Glue table pointing to a Parquet file in S3) using pyspark on a Databricks cluster throws the following error for no apparent reason:
java.lang.IllegalArgumentException: Can not create a Path from an empty string;
The first assumption was that access permissions are missing, but that wasn't the case.
While researching further, I found the following Databricks documentation about the reason for this issue: https://docs.databricks.com/data/metastores/aws-glue-metastore.html#accessing-tables-and-views-created-in-other-system
I was able to come up with a Python script to fix the problem. It turns out that this exception occurs because Athena and Presto store view metadata in a format that is different from what Databricks Runtime and Spark expect, so you need to re-create your views through Spark.
Python script example with execution example:
import boto3
import time

def execute_blocking_athena_query(query: str, athenaOutputPath, aws_region):
    athena = boto3.client("athena", region_name=aws_region)
    res = athena.start_query_execution(
        QueryString=query,
        ResultConfiguration={'OutputLocation': athenaOutputPath},
    )
    execution_id = res["QueryExecutionId"]
    # Poll until the Athena query finishes.
    while True:
        res = athena.get_query_execution(QueryExecutionId=execution_id)
        state = res["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return
        if state in ["FAILED", "CANCELLED"]:
            raise Exception(res["QueryExecution"]["Status"]["StateChangeReason"])
        time.sleep(1)

def create_cross_platform_view(db: str, table: str, query: str, spark_session, athenaOutputPath, aws_region):
    glue = boto3.client("glue", region_name=aws_region)
    # Drop any existing view, then create it through Athena to capture the Presto metadata.
    glue.delete_table(DatabaseName=db, Name=table)
    create_view_sql = f"create view {db}.{table} as {query}"
    execute_blocking_athena_query(create_view_sql, athenaOutputPath, aws_region)
    presto_schema = glue.get_table(DatabaseName=db, Name=table)["Table"]["ViewOriginalText"]
    glue.delete_table(DatabaseName=db, Name=table)

    # Re-create the view through Spark, then patch the Glue entry so both engines accept it.
    spark_session.sql(create_view_sql).show()
    spark_view = glue.get_table(DatabaseName=db, Name=table)["Table"]
    for key in [
        "DatabaseName",
        "CreateTime",
        "UpdateTime",
        "CreatedBy",
        "IsRegisteredWithLakeFormation",
        "CatalogId",
    ]:
        if key in spark_view:
            del spark_view[key]
    spark_view["ViewOriginalText"] = presto_schema
    spark_view["Parameters"]["presto_view"] = "true"
    spark_view = glue.update_table(DatabaseName=db, TableInput=spark_view)

create_cross_platform_view("<YOUR DB NAME>", "<YOUR VIEW NAME>", "<YOUR VIEW SQL QUERY>", <SPARK_SESSION_OBJECT>, "<S3 BUCKET FOR OUTPUT>", "<YOUR-ATHENA-SERVICE-AWS-REGION>")
Again, note that this script keeps your views compatible with Glue/Athena.
References:
https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/issues/29
https://docs.databricks.com/data/metastores/aws-glue-metastore.html#accessing-tables-and-views-created-in-other-system

Autocreate tables in Bigquery for multiple CSV files

I want to generate tables automatically in BigQuery whenever a file is uploaded to a storage bucket, using a Cloud Function in Python.
For example, if a sample1.csv file is uploaded to the bucket, then a sample1 table should be created in BigQuery.
How can I automate this with a Cloud Function in Python? I tried the code below, but it only generated one table and all the data got appended to it. How should I proceed?
def hello_gcs(event, context):
    from google.cloud import bigquery

    # Construct a BigQuery client object.
    client = bigquery.Client()

    # TODO(developer): Set table_id to the ID of the table to create.
    table_id = "test_project.test_dataset.test_Table"

    job_config = bigquery.LoadJobConfig(
        autodetect=True,
        skip_leading_rows=1,
        # The source format defaults to CSV, so the line below is optional.
        source_format=bigquery.SourceFormat.CSV,
    )
    uri = "gs://test_bucket/*.csv"

    load_job = client.load_table_from_uri(
        uri, table_id, job_config=job_config
    )  # Make an API request.
    load_job.result()  # Waits for the job to complete.

    destination_table = client.get_table(table_id)  # Make an API request.
    # 'event' holds the metadata of the GCS object that triggered the function.
    print(f"Processing file: {event['name']}.")
Sounds like you need to do three things (a sketch combining them follows this list):
Extract the name of the CSV file/object from the notification event you're receiving to fire your function.
Update the table_id in your example code to set the table name based on the filename you extracted in the first step.
Update the uri in your example code to only use the single file as the input. As written, your example attempts to load data from all matching CSV objects in GCS to the table.
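A minimal sketch putting those three steps together; the project and dataset IDs are carried over from the question's example, the bucket and file name come from the trigger event, and deriving the table name from the file name assumes the file name is a valid BigQuery table identifier:

import os
from google.cloud import bigquery

def hello_gcs(event, context):
    client = bigquery.Client()

    # 1. The GCS notification event carries the uploaded object's name and bucket.
    file_name = event['name']        # e.g. "sample1.csv"
    bucket_name = event['bucket']

    # 2. Derive the table name from the file name (e.g. "sample1").
    table_name = os.path.splitext(os.path.basename(file_name))[0]
    table_id = f"test_project.test_dataset.{table_name}"

    # 3. Load only the file that triggered the function, not every CSV in the bucket.
    uri = f"gs://{bucket_name}/{file_name}"

    job_config = bigquery.LoadJobConfig(
        autodetect=True,
        skip_leading_rows=1,
        source_format=bigquery.SourceFormat.CSV,
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # Wait for the load to finish.
    print(f"Loaded {file_name} into {table_id}.")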

Insert items in DynamoDB using lambdas without losing data

I am new to AWS and I am trying to load data into a DynamoDB using lambda functions and python. The problem I have is the following: When I try to load a record into a table, the items that have the same Partition Key as the element I'm trying to insert are removed from the table. This is the code that I'm using (I got it from the AWS documentation):
import boto3
from pprint import pprint

def put_car(car_id, car_type, message, dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb', region_name='eu-west-1')
    table = dynamodb.Table('Cars')
    response = table.put_item(
        Item={
            'car_type': car_type,
            'car_id': car_id,
            'message': message,
        }
    )
    return response

def lambda_handler(event, context):
    car_resp = put_car("1", "Cartype1", "Car 1")
    print("Put car succeeded:")
    pprint(car_resp)
A possible solution would be to read all the records first and load them all again, including the record I wanted to insert, but this solution seems quite inefficient and I think there may be some easier way to do it.
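For context, put_item always replaces an existing item that has the same primary key, so the "lost" records are being overwritten rather than deleted. A minimal sketch that refuses to overwrite instead, assuming car_type is the table's partition key as the code above suggests:

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb', region_name='eu-west-1')
table = dynamodb.Table('Cars')

def put_car_if_new(car_id, car_type, message):
    # put_item replaces any existing item with the same primary key; the
    # condition below rejects the write if an item with this key already exists.
    # Assumption: 'car_type' is the table's partition key.
    try:
        return table.put_item(
            Item={'car_type': car_type, 'car_id': car_id, 'message': message},
            ConditionExpression='attribute_not_exists(car_type)',
        )
    except ClientError as err:
        if err.response['Error']['Code'] == 'ConditionalCheckFailedException':
            print('An item with this key already exists; nothing was overwritten.')
            return None
        raise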

Python dict datatype error after reading a message from AWS SQS and putting it into AWS DynamoDB

My use case is to take a JSON message from the SQS body and insert the data into DynamoDB
using a Lambda function in Python.
The issue is that I am able to read and print the JSON message from the SQS queue to the CloudWatch log, but when I try to insert the same JSON into DynamoDB it gives the error below:
Invalid type for parameter Item, value: {'name': 2}, type: class 'str', valid types: class 'dict'
Below is the Lambda code I am using; the error occurred at line number 12, where I am trying to insert using put_item.
import json
import boto3

dynamodb = boto3.resource('dynamodb')
dynamoTable = dynamodb.Table('message')

def lambda_handler(event, context):
    for record in event['Records']:
        data1 = record["body"]
        jsondata1 = json.loads(data1)
        print(jsondata1)
        dynamoTable.put_item(Item=jsondata1)
However, the function is able to print the SQS JSON to the CloudWatch log.
After a lot of trial and error, I found that the solution is to split the string by commas and rebuild the JSON, which produces a dict instead of a string.
Below is the code for that solution:
import json
import boto3
import ast

dynamodb = boto3.resource('dynamodb')
dynamoTable = dynamodb.Table('message')

def lambda_handler(event, context):
    for record in event['Records']:
        data1 = record["body"]
        jsondata1 = json.loads(data1)
        mess1 = jsondata1["Message"]
        id = jsondata1["MessageId"]
        jsonmess = json.loads(mess1)
        # Strip the braces, then split the remaining "key:value" pairs on commas.
        s = jsonmess.replace("{", "")
        finalstring = s.replace("}", "")
        split = finalstring.split(",")
        dict = {'messageID': id}
        for x in split:
            keyvalue = x.split(":")
            print(keyvalue)
            dict[keyvalue[0]] = keyvalue[1]
        dynamoTable.put_item(Item=dict)
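As an aside, the unused ast import above hints at a simpler route: if the decoded Message is a single-quoted, Python-style string such as "{'name': 2}" (which is what breaks json.loads), ast.literal_eval can parse it into a dict directly. A hedged sketch, assuming the message body decodes the same way as in the code above:

import ast
import json
import boto3

dynamodb = boto3.resource('dynamodb')
dynamoTable = dynamodb.Table('message')

def lambda_handler(event, context):
    for record in event['Records']:
        jsondata1 = json.loads(record["body"])
        jsonmess = json.loads(jsondata1["Message"])  # same decoding steps as above
        # literal_eval turns the single-quoted string "{'name': 2}" into a real dict.
        item = ast.literal_eval(jsonmess)
        item['messageID'] = jsondata1["MessageId"]
        dynamoTable.put_item(Item=item)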
