Setting environment variable in Python for Google BigQuery on Mac - python-3.x

Before starting the Jupyter Notebook, I run the command below in Terminal to set the Google Application Credentials:
export GOOGLE_APPLICATION_CREDENTIALS="/Users/mac/Desktop/Bigquery-Key.json"
Then I set the following configuration in the Jupyter Notebook:
%load_ext google.cloud.bigquery
# Imports the Google Cloud Client Library
from google.cloud import bigquery
# Instantiates a Client for Bigquery Service
bigquery_client = bigquery.Client()
Now I want to write a Python script (.py file) that does both tasks instead of using the Terminal.
How can this be done? Kindly advise.
Thanks

You can change the environment within a Python script. The environment is stored in the dictionary os.environ:
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/Users/mac/Desktop/Bigquery-Key.json"

Related

How to directly read excel file from s3 with pandas in airflow dag?

I am trying to read an Excel file from S3 inside an Airflow DAG with Python, but it does not seem to work. It is very weird because it works when I read it from outside Airflow with pd.read_excel(s3_excel_path).
What I did :
Set AWS credential in airflow (this works well as I can list my s3 bucket)
Install pandas, s3fs in my Docker environment where I run Airflow
Try to read the file with pd.read_excel(s3_excel_path)
As I said, it works when I try it outside of Airflow. Moreover, I don't get any error; the DAG just keeps running indefinitely (at the step where it is supposed to read the file) and nothing happens, even if I wait 20 minutes.
(I would like to avoid downloading the file from S3, processing it and then uploading it back to S3, which is why I am trying to read it directly from S3.)
Note: it does not work with CSV either.
EDIT: Likewise, I can't save my dataframe directly to S3 with df.to_csv('s3_path') in the Airflow DAG, while I can do it in plain Python.
To read data files stored in S3 with pandas, you have two options: download them using boto3 (or the AWS CLI) and read the local files, which is the solution you are not looking for, or use the s3fs API supported by pandas:
import os
import pandas as pd
AWS_S3_BUCKET = os.getenv("AWS_S3_BUCKET")
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
AWS_SESSION_TOKEN = os.getenv("AWS_SESSION_TOKEN")
key = "path/to/excel/file"
books_df = pd.read_excel(
    f"s3://{AWS_S3_BUCKET}/{key}",
    storage_options={
        "key": AWS_ACCESS_KEY_ID,
        "secret": AWS_SECRET_ACCESS_KEY,
        "token": AWS_SESSION_TOKEN,
    },
)
To use this solution, you need to install s3fs and apache-airflow-providers-amazon:
pip install s3fs
pip install apache-airflow-providers-amazon
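The same storage_options pattern also covers the write path mentioned in the edit. A minimal sketch, assuming the bucket, key and credential variables from above; the output key name is hypothetical:
books_df.to_csv(
    f"s3://{AWS_S3_BUCKET}/output/books.csv",  # hypothetical output key
    index=False,
    storage_options={
        "key": AWS_ACCESS_KEY_ID,
        "secret": AWS_SECRET_ACCESS_KEY,
        "token": AWS_SESSION_TOKEN,
    },
)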

The exe file of python app creates the database but doesn't create tables in some PCs

I have implemented an application using Python, PostgreSQL, PyQt5 and SQLAlchemy, and made an executable for it with PyInstaller.
I tried installing this app on some laptops that already had PostgreSQL 13 installed, but there is a problem.
On some laptops everything runs successfully: the database is created along with its tables in PostgreSQL, we can check it through pgAdmin 4, and we can work with the application. On other laptops, however, the database is created but not its tables; the console stops, nothing appears, and when we check pgAdmin there is only the database name, not its tables.
P.S.: the systems are Windows 10 and Windows 7.
I have no idea what to check or what to do; I would appreciate any ideas.
The following code is base.py:
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy_utils import database_exists, create_database
from sqlalchemy import Column
engine = create_engine('postgresql://postgres:123@localhost:5432/db23')  # '@' separates the password from the host

if not database_exists(engine.url):
    create_database(engine.url)
Session = sessionmaker(bind=engine)
Base = declarative_base()
and the following function is called in the initializer function of the app:
def the_first_time_caller(self):
    session = Session()
    # 2 - generate database schema
    Base.metadata.create_all(engine)  # create tables in the database
    session.commit()
    session.close()
After updating Python and downgrading SQLAlchemy, it runs successfully now.
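For anyone hitting a similar silent stop at table creation, a small diagnostic sketch (an assumption, not part of the original app) is to turn on SQL echo and inspect the catalog after create_all:
from sqlalchemy import inspect
from base import Base, engine  # the engine and declarative Base defined above

# Echo every SQL statement so a hang or a missing CREATE TABLE is visible in the console
engine.echo = True

Base.metadata.create_all(engine)

# List the tables that actually exist in the connected database
print(inspect(engine).get_table_names())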

How to get execution ID for Google Cloud Functions triggered from HTTP?

I am trying to write logs to Cloud Logging from Python applications using the Cloud Logging API client library, with an "execution ID" that is the same as Google's default value.
logger setup:
from google.cloud import logging
from google.cloud.logging.resource import Resource
log_client = logging.Client()
# This is the resource type of the log
log_name = 'cloudfunctions.googleapis.com%2Fcloud-functions'
# Inside the resource, nest the required labels specific to the resource type
res = Resource(
    type="cloud_function",
    labels={
        "function_name": "my-function",
        "region": "asia-east2",
    },
)
logger = log_client.logger(log_name.format("my-project"))
write log:
logger.log_struct({"message": request.remote_addr}, resource=res, severity='INFO')
It's currently not possible to do this purely with the Cloud Functions Framework itself, but you can try to extract the executionId from the request by using the following:
request.headers.get('function-execution-id')
I found an issue in Cloud Functions Github tracking the implementation of a native way to get those values, you can follow this thread for updates, if you'd like.
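As an illustration only, the header value could be attached to the structured entry from the question; the labels key name here is an assumption, not something the library mandates:
def hello_http(request):
    # Execution ID assigned to this invocation, read from the request header
    execution_id = request.headers.get('function-execution-id')

    # 'logger' and 'res' are the objects set up in the question above
    logger.log_struct(
        {"message": request.remote_addr},
        resource=res,
        severity='INFO',
        labels={"execution_id": execution_id},  # hypothetical label name
    )
    return 'ok'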
I had the same issue using an older version of google-cloud-logging. I was able to get this working using the standard Python logging module. In a Cloud Function running Python 3.8 with google-cloud-logging==2.5.0, the executionId is correctly logged along with the severity in Stackdriver.
main.py:
# Imports the Cloud Logging client library
import google.cloud.logging
# Instantiates a client
client = google.cloud.logging.Client()
# Retrieves a Cloud Logging handler based on the environment
# you're running in and integrates the handler with the
# Python logging module. By default this captures all logs
# at INFO level and higher
client.get_default_handler()
client.setup_logging()
# Imports Python standard library logging
import logging
def hello_world(req):
    # Emits the data using the standard logging module
    logging.info('info')
    logging.warning('warn')
    logging.error('error')
    return ""
requirements.txt:
google-cloud-logging==2.5.0
Triggering this cloud function results in entries in Stackdriver with the executionId and severity set correctly (screenshot omitted).

Run python Flask API on AWS EC2 through boto3

I'm new to AWS, so I'm writing code to create an instance from an image, and I want that, at the same time the EC2 instance is created, it runs a Python script like this:
python /folder/folder2/api_flask.py
Here's my boto3 code to create the instance:
import boto3
client = boto3.session('ec2')
client.run_instances(ImageId='ami-id_number_of_img', MinCount=1, MaxCount=1, InstanceType='t2.nano')
Thanks for your help.
run_instances has an option called UserData which allows you to Run commands on your Linux instance at launch.
Thus, to run your code, you can try the following:
import boto3
client = boto3.client('ec2')  # not boto3.session('ec2')
client.run_instances(
    ImageId='ami-id_number_of_img',
    MinCount=1,
    MaxCount=1,
    InstanceType='t2.nano',
    UserData='#!/bin/bash\npython /folder/folder2/api_flask.py\n',
)
Since you mention you are new to AWS, consider using CloudFormation for provisioning AWS infrastructure. You'll still need to leverage UserData, as Marcin mentioned.
MyInstance:
  Type: AWS::EC2::Instance
  Properties:
    UserData:
      Fn::Base64: !Sub |
        #!/bin/bash
        python /folder/folder2/api_flask.py
    InstanceType: t2.nano
    ImageId: ami-id_number_of_img
Why CloudFormation? It's more readable, and it allows for in-place updates as well as tear-downs. You could then launch the stack via boto3 (disclaimer: not tested, but demonstrates the idea):
import boto3
client = boto3.client('cloudformation')
with open('mytemplate.yml', 'r') as f:
    response = client.create_stack(
        StackName='my-stack',
        TemplateBody=f.read())
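If the script should block until the stack (and therefore the instance and its UserData) has finished, boto3's CloudFormation waiter can follow the create call; a small sketch continuing the example above:
# Block until CloudFormation reports the stack as CREATE_COMPLETE
waiter = client.get_waiter('stack_create_complete')
waiter.wait(StackName='my-stack')
print('Stack created:', response['StackId'])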

get list of tables in database using boto3

I'm trying to get a list of the tables from a database in my AWS Data Catalog using boto3. I'm running the code below on AWS, in a SageMaker notebook. It runs forever (over 30 minutes) and doesn't return any results. The test_db only has 4 tables in it. My goal is to run similar code as part of an AWS Glue ETL job, in an edited Glue ETL job script. Does anyone see what the issue might be or suggest how to do this?
code:
import boto3
from pprint import pprint
glue = boto3.client('glue', region_name='us-east-2')
response = glue.get_tables(
    DatabaseName='test_db'
)
print(pprint(response['TableList']))
db = session.resource('dynamodb', region_name="us-east-2")
tables = list(db.tables.all())
print(tables)
Resource: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/dynamodb.html
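For reference, a minimal sketch of the Glue catalog listing with a boto3 paginator, assuming the notebook role has glue:GetTables permission and network access to the Glue endpoint; the region and database name are taken from the question:
import boto3

glue = boto3.client('glue', region_name='us-east-2')

# Page through the catalog instead of relying on a single response
paginator = glue.get_paginator('get_tables')
for page in paginator.paginate(DatabaseName='test_db'):
    for table in page['TableList']:
        print(table['Name'])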
