Streaming JSON into BigQuery using Python - python-3.x

We have code that reads some electricity meter data, which we want to push to BigQuery so that it can be visualized in Data Studio. We tried using a Cloud Function, but the code generates streaming data and the Cloud Function times out, so this may not be the right use case for Cloud Functions.
# Note: this snippet presumes an already-authenticated energy-monitor client
# (e.g. pyemvue's PyEmVue) bound to `vue`, with Scale and Unit imported from that library.
from datetime import datetime

def test():
    def print_recursive(usage_dict, info, depth=0):
        for gid, device in usage_dict.items():
            for channelnum, channel in device.channels.items():
                name = channel.name
                if name == 'Main':
                    name = info[gid].device_name
                d = datetime.now()
                t = d.strftime("%x") + ' ' + d.strftime("%X")
                print(d.strftime("%x"), d.strftime("%X"))
                res = {'Gid': gid,
                       'ChannelNumber': channelnum[0],
                       'Name': channel.name,
                       'Usage': channel.usage,
                       'unit': 'kwh',
                       'Timestamp': t
                       }
                global resp
                resp = res
                print(resp)
        return resp

    devices = vue.get_devices()
    deviceGids = []
    info = {}
    for device in devices:
        if not device.device_gid in deviceGids:
            deviceGids.append(device.device_gid)
            info[device.device_gid] = device
        else:
            info[device.device_gid].channels += device.channels
    device_usage_dict = vue.get_device_list_usage(deviceGids=deviceGids,
                                                  instant=datetime.utcnow(),
                                                  scale=Scale.SECOND.value,
                                                  unit=Unit.KWH.value)
    print_recursive(device_usage_dict, info)
This generates electricity consumption data in real time.
Can anyone suggest which GCP service would be ideal here? Based on my research it seems to be Pub/Sub => BigQuery. But my question is: can we programmatically ingest data into Pub/Sub? If yes, what are the prerequisites?
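For reference, publishing to Pub/Sub from Python is possible with the google-cloud-pubsub client; the main prerequisites are an existing topic and credentials with the Pub/Sub Publisher role on it. A minimal sketch (the project and topic names below are placeholders, not from the question):
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# placeholder project and topic; the topic must already exist
topic_path = publisher.topic_path("my-project", "meter-readings")

def publish_reading(reading: dict) -> None:
    # Pub/Sub payloads are bytes, so serialize the reading dict as JSON
    future = publisher.publish(topic_path, json.dumps(reading).encode("utf-8"))
    future.result()  # blocks until the message is acknowledged by Pub/Sub
From there, a BigQuery subscription on the topic or the Dataflow Pub/Sub-to-BigQuery template can move the messages into a table for Data Studio.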

Related

BLE dbus python: advertise ServiceUUIDs information

Background: I want to use BLE for server/client software. In this case, the server should be the peripheral with multiple connections. For that I start multiple services, up to 10, which I think is the limit for peripheral connections on my chip. The different services are the same in functionality but should each connect to only one client. I see no other possibility to differentiate the clients in the notification function.
Problem: For that I restart the advertising after each connection. In the advertisement, I set the "ServiceUUIDs" field to the next free service to connect to. This works after restarting my PC. In my Flutter app, I'm able to read this field perfectly in the advertisement. But when I restart the Python script, the UUID is no longer advertised and the "ServiceUUIDs" field is empty, although {'Type': 'peripheral', 'ServiceUUIDs': dbus.Array(['0000ac01-0000-1000-8000-00805f9b34fb'], signature=dbus.Signature('s')), 'LocalName': dbus.String('Hello'), 'Discoverable': dbus.Boolean(True)} is set in bluetooth_constants.ADVERTISING_MANAGER_INTERFACE. Restarting the Bluetooth module does not help either, so I must restart my PC every time I want to restart my server. After connecting to the server, the client can scan all services but does not know which one of the 10 services is free to connect to.
My Python code is the same as in the dbus tutorial:
class Advertisement(dbus.service.Object):
    PATH_BASE = '/org/bluez/ldsg/advertisement'

    def __init__(self, bus, index, advertising_type):
        self.path = self.PATH_BASE + str(index)
        self.bus = bus
        self.ad_type = advertising_type
        self.service_uuids = ['0000ac01-0000-1000-8000-00805f9b34fb']
        self.manufacturer_data = None
        self.solicit_uuids = None
        self.service_data = None
        self.local_name = 'Hello'
        self.include_tx_power = False
        self.data = None  # {0x26: dbus.Array([0x01, 0x01, 0x01], signature='y')}
        self.discoverable = True
        dbus.service.Object.__init__(self, bus, self.path)

    def get_properties(self):
        properties = dict()
        properties['Type'] = self.ad_type
        if self.service_uuids is not None:
            properties['ServiceUUIDs'] = dbus.Array(self.service_uuids,
                                                    signature='s')
        if self.solicit_uuids is not None:
            properties['SolicitUUIDs'] = dbus.Array(self.solicit_uuids,
                                                    signature='s')
        if self.manufacturer_data is not None:
            properties['ManufacturerData'] = dbus.Dictionary(
                self.manufacturer_data, signature='qv')
        if self.service_data is not None:
            properties['ServiceData'] = dbus.Dictionary(self.service_data,
                                                        signature='sv')
        if self.local_name is not None:
            properties['LocalName'] = dbus.String(self.local_name)
        if self.discoverable is not None and self.discoverable == True:
            properties['Discoverable'] = dbus.Boolean(self.discoverable)
        if self.include_tx_power:
            properties['Includes'] = dbus.Array(["tx-power"], signature='s')
        if self.data is not None:
            properties['Data'] = dbus.Dictionary(
                self.data, signature='yv')
        print(properties)
        return {bluetooth_constants.ADVERTISING_MANAGER_INTERFACE: properties}


def start_advertising():
    global adv
    global adv_mgr_interface
    # we're only registering one advertisement object so index (arg2) is hard coded as 0
    print("Registering advertisement", adv.get_path())
    adv_mgr_interface.RegisterAdvertisement(adv.get_path(), {},
                                            reply_handler=register_ad_cb,
                                            error_handler=register_ad_error_cb)


bus = dbus.SystemBus()
adv_mgr_interface = dbus.Interface(bus.get_object(bluetooth_constants.BLUEZ_SERVICE_NAME, adapter_path), bluetooth_constants.ADVERTISING_MANAGER_INTERFACE)
adv = Advertisement(bus, 0, 'peripheral')
start_advertising()
Maybe there is another way to send additional information in the advertisement? I also tried "Data" and "ServiceData", but then the advertisement throws an error. I haven't found a good tutorial for this. I'm only looking for a stable way to put the information about the next free service in the advertisement.
Thank You!
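For reference, BlueZ expects "ServiceData" to be keyed by a UUID string with a byte-array value, and "Data" to be keyed by an AD-type byte; a rough sketch of values shaped to match the 'sv' and 'yv' signatures already used in get_properties() (the UUID and payload bytes below are placeholders):
import dbus

# ServiceData: {UUID string: array of bytes}, matching the 'sv' signature above
service_data = {
    '0000ac01-0000-1000-8000-00805f9b34fb': dbus.Array([0x01], signature='y')
}

# Data: {AD type: array of bytes}, matching the 'yv' signature above
data = {
    0x26: dbus.Array([0x01, 0x01, 0x01], signature='y')
}
If the keys or values are not wrapped in matching dbus types (or an unsupported AD type is used), BlueZ usually rejects RegisterAdvertisement, which may be why "Data" and "ServiceData" threw an error here.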

Azure Python SDK Service Bus receive message

I am a bit confused about the Azure Python Service Bus SDK.
I have a Service Bus TOPIC and SUBSCRIPTION which listen for specific messages, and I have the code to receive those messages, which are then processed by AWS Comprehend.
Following the Microsoft documentation, the basic code to receive the message works and I am able to print it, but when I integrate the same logic with Comprehend it fails.
Here is the example, this is the bit of code from Microsoft documentation:
with servicebus_client:
    # get the Queue Receiver object for the queue
    receiver = servicebus_client.get_queue_receiver(queue_name=QUEUE_NAME, max_wait_time=5)
    with receiver:
        for msg in receiver:
            print("Received: " + str(msg))
            # complete the message so that the message is removed from the queue
            receiver.complete_message(msg)
and the output is this
{"ModuleId":"123458", "Text":"This is amazing."}
Receive is done.
My first thought was that the received message was a JSON object, so I started writing the code to read data from a JSON output as follows:
servicebus_client = ServiceBusClient.from_connection_string(conn_str=CONNECTION_STR)

with servicebus_client:
    receiver = servicebus_client.get_subscription_receiver(
        topic_name=TOPIC_NAME,
        subscription_name=SUBSCRIPTION_NAME
    )
    with receiver:
        received_msgs = receiver.receive_messages(max_message_count=10, max_wait_time=5)
        for msg in received_msgs:
            # print(str(msg))
            message = json.dumps(msg)
            text = message['Text']
            # passing the text to comprehend
            result_json = json.dumps(comprehend.detect_sentiment(Text=text, LanguageCode='en'), sort_keys=True, indent=4)
            result = json.loads(result_json)  # converting json to python dictionary
            # extracting the sentiment value
            sentiment = result["Sentiment"]
            # extracting the sentiment score
            if sentiment == "POSITIVE":
                value = round(result["SentimentScore"]["Positive"] * 100, 2)
            elif sentiment == "NEGATIVE":
                value = round(result["SentimentScore"]["Negative"] * 100, 2)
            elif sentiment == "NEUTRAL":
                value = round(result["SentimentScore"]["Neutral"] * 100, 2)
            elif sentiment == "MIXED":
                value = round(result["SentimentScore"]["Mixed"] * 100, 2)
            # store the text, sentiment and value in a dictionary and convert it to JSON
            output = {'Text': text, 'Sentiment': sentiment, 'Value': value}
            output_json = json.dumps(output)
            print('Text: ', text, '\nSentiment: ', sentiment, '\nValue: ', value)
            print('In JSON format\n', output_json)
            receiver.complete_message(msg)
print("Receive is done.")
But when I run this I get the following error:
TypeError: Object of type ServiceBusReceivedMessage is not JSON serializable
Has this ever happened to anybody who can help me understand what type of object is coming back from the Service Bus receive?
Thank you so much everyone
Has this ever happened to anybody who can help me understand what type of object is coming back from the Service Bus receive?
The type of the received message is ServiceBusReceivedMessage which is derived from ServiceBusMessage. The contents of the message can be fetched from its body property.
Can you please try something like:
message = json.dumps(msg.body)
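If the body is JSON text, it likely needs to be parsed rather than re-serialized before indexing into it; a minimal sketch under that assumption, reusing the receiver loop from the question (field names taken from the sample output above):
import json

for msg in received_msgs:
    payload = json.loads(str(msg))   # str(msg) returns the message body as text
    text = payload['Text']           # e.g. "This is amazing."
    module_id = payload['ModuleId']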

How to use google-cloud-os-config classes in python code?

In a Google Cloud Function (Python 3.7), I need to fetch the compliance state of all VMs in a given location in a project.
From the available Google documentation here I could see the REST API format:
https://cloud.google.com/compute/docs/os-configuration-management/view-compliance#view_compliance_state
On searching for the client library here, I found this:
class google.cloud.osconfig_v1alpha.types.ListInstanceOSPoliciesCompliancesRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Bases: proto.message.Message
A request message for listing OS policies compliance data for all Compute Engine VMs in the given location.

parent (str)
    Required. The parent resource name.
    Format: projects/{project}/locations/{location}
    For {project}, either Compute Engine project-number or project-id can be provided.
page_size (int)
    The maximum number of results to return.
page_token (str)
    A pagination token returned from a previous call to ListInstanceOSPoliciesCompliances that indicates where this listing should continue from.
filter (str)
    If provided, this field specifies the criteria that must be met by an InstanceOSPoliciesCompliance API resource to be included in the response.
And the response class as:
class google.cloud.osconfig_v1alpha.types.ListInstanceOSPoliciesCompliancesResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Bases: proto.message.Message
A response message for listing OS policies compliance data for all Compute Engine VMs in the given location.

instance_os_policies_compliances (Sequence[google.cloud.osconfig_v1alpha.types.InstanceOSPoliciesCompliance])
    List of instance OS policies compliance objects.
next_page_token (str)
    The pagination token to retrieve the next page of instance OS policies compliance objects.
property raw_page
But I am not sure how to use this information in the Python code.
I have written this, but I am not sure if it is correct:
from google.cloud.osconfig_v1alpha.services.os_config_zonal_service import client
from google.cloud.osconfig_v1alpha.types import ListInstanceOSPoliciesCompliancesRequest
import logging
logger = logging.getLogger(__name__)
import os

def handler():
    try:
        project_id = os.environ["PROJECT_ID"]
        location = os.environ["ZONE"]
        # list compliance state
        request = ListInstanceOSPoliciesCompliancesRequest(
            parent=f"projects/{project}/locations/{location}")
        response = client.instance_os_policies_compliance(request)
        return response
    except Exception as e:
        logger.error("Unable to get compliance - %s " % str(e))
I could not find any usage example for the client library methods anywhere.
Could someone please help me here?
EDIT:
This is what I am using now:
from googleapiclient.discovery import build

def list_policy_compliance():
    projectId = "my_project"
    zone = "my_zone"
    try:
        service = build('osconfig', 'v1alpha', cache_discovery=False)
        compliance_response = service.projects().locations(
        ).instanceOsPoliciesCompliances().list(
            parent='projects/%s/locations/%s' % (
                projectId, zone)).execute()
        return compliance_response
    except Exception as e:
        raise Exception()
Something like this should work:
from google.cloud import os_config_v1alpha as osc

def handler():
    client = osc.OsConfigZonalService()
    project_id = "my_project"
    location = "my_gcp_zone"
    parent = f"projects/{project_id}/locations/{location}"
    response = client.list_instance_os_policies_compliances(
        parent=parent
    )
    # response is an iterable yielding
    # InstanceOSPoliciesCompliance objects
    for result in response:
        # do something with result
        ...
You can also construct the request like this:
response = client.list_instance_os_policies_compliances(
    request={
        "parent": parent
    }
)
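The optional fields documented above (page_size, page_token, filter) can go into the same request dict; a small sketch using just page_size (the value 50 is arbitrary):
response = client.list_instance_os_policies_compliances(
    request={
        "parent": parent,
        "page_size": 50,  # optional: maximum number of results per page
    }
)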
Answering my own question here, this is what I used:
from googleapiclient.discovery import build

def list_policy_compliance():
    projectId = "my_project"
    zone = "my_zone"
    try:
        service = build('osconfig', 'v1alpha', cache_discovery=False)
        compliance_response = service.projects().locations(
        ).instanceOsPoliciesCompliances().list(
            parent='projects/%s/locations/%s' % (
                projectId, zone)).execute()
        return compliance_response
    except Exception as e:
        raise Exception()

Google Cloud Function (Python) does not deploy - Function failed on loading user code

I'm calling a simple Python function in Google Cloud but cannot get it to save. It shows this error:
"Function failed on loading user code. This is likely due to a bug in the user code. Error message: Error: please examine your function logs to see the error cause: https://cloud.google.com/functions/docs/monitoring/logging#viewing_logs. Additional troubleshooting documentation can be found at https://cloud.google.com/functions/docs/troubleshooting#logging. Please visit https://cloud.google.com/functions/docs/troubleshooting for in-depth troubleshooting documentation."
Logs don't seem to show much that would indicate an error in the code. I followed this guide: https://blog.thereportapi.com/automate-a-daily-etl-of-currency-rates-into-bigquery/
The only differences are the environment variables and the endpoint I'm using.
The code is below; it is just a GET request followed by a push of data into a table.
import requests
import json
import time
import os
from google.cloud import bigquery

# Set any default values for these variables if they are not found from Environment variables
PROJECT_ID = os.environ.get("PROJECT_ID", "xxxxxxxxxxxxxx")
EXCHANGERATESAPI_KEY = os.environ.get("EXCHANGERATESAPI_KEY", "xxxxxxxxxxxxxxx")
REGIONAL_ENDPOINT = os.environ.get("REGIONAL_ENDPOINT", "europe-west1")
DATASET_ID = os.environ.get("DATASET_ID", "currency_rates")
TABLE_NAME = os.environ.get("TABLE_NAME", "currency_rates")
BASE_CURRENCY = os.environ.get("BASE_CURRENCY", "SEK")
SYMBOLS = os.environ.get("SYMBOLS", "NOK,EUR,USD,GBP")

def hello_world(request):
    latest_response = get_latest_currency_rates()
    write_to_bq(latest_response)
    return "Success"

def get_latest_currency_rates():
    PARAMS = {'access_key': EXCHANGERATESAPI_KEY, 'symbols': SYMBOLS, 'base': BASE_CURRENCY}
    response = requests.get("https://api.exchangeratesapi.io/v1/latest", params=PARAMS)
    print(response.json())
    return response.json()

def write_to_bq(response):
    # Instantiates a client
    bigquery_client = bigquery.Client(project=PROJECT_ID)
    # Prepares a reference to the dataset
    dataset_ref = bigquery_client.dataset(DATASET_ID)
    table_ref = dataset_ref.table(TABLE_NAME)
    table = bigquery_client.get_table(table_ref)
    # get the current timestamp so we know how fresh the data is
    timestamp = time.time()
    jsondump = json.dumps(response)  # Returns a string
    # Ensure the Response is a String not JSON
    rows_to_insert = [{"timestamp": timestamp, "data": jsondump}]
    errors = bigquery_client.insert_rows(table, rows_to_insert)  # API request
    print(errors)
    assert errors == []
I tried just the part that does the GET request in an offline editor and I can confirm the response works fine. I suspect it might have something to do with permissions or the way the script tries to access the database.
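As a side note, "Function failed on loading user code" frequently means the function crashed while being imported rather than while running, for example because a dependency is missing from requirements.txt, so that file is worth checking alongside permissions; a sketch of what it would likely need for this code:
# requirements.txt (hypothetical; add version pins as needed)
requests
google-cloud-bigquery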

How to avoid header while exporting BigQuery table into Google Storage

I have developed the below code, which helps to export a BigQuery table into a Google Storage bucket. I want to merge the files into a single file without the header, so that the next processes can use the file without any issue.
def export_bq_table_to_gcs(self, table_name):
    client = bigquery.Client(project=project_name)
    print("Exporting table {}".format(table_name))
    dataset_ref = client.dataset(dataset_name,
                                 project=project_name)
    dataset = bigquery.Dataset(dataset_ref)
    table_ref = dataset.table(table_name)
    size_bytes = client.get_table(table_ref).num_bytes
    # For tables bigger than 1GB uses Google auto split, otherwise export is forced in a single file.
    if size_bytes > 10 ** 9:
        destination_uris = [
            'gs://{}/{}{}*.csv'.format(bucket_name,
                                       f'{table_name}_temp', uid)]
    else:
        destination_uris = [
            'gs://{}/{}{}.csv'.format(bucket_name,
                                      f'{table_name}_temp', uid)]
    extract_job = client.extract_table(table_ref, destination_uris)  # API request
    result = extract_job.result()  # Waits for job to complete.
    if result.state != 'DONE' or result.errors:
        raise Exception('Failed extract job {} for table {}'.format(result.job_id, table_name))
    else:
        print('BQ table(s) export completed successfully')
        storage_client = storage.Client(project=gs_project_name)
        bucket = storage_client.get_bucket(gs_bucket_name)
        blob_list = bucket.list_blobs(prefix=f'{table_name}_temp')
        print('Merging shard files into single file')
        bucket.blob(f'{table_name}.csv').compose(blob_list)
Can you please help me find a way to skip the header?
Thanks,
Raghunath.
We can avoid the header by using a job config to set the print_header parameter to False. Sample code:
job_config = bigquery.job.ExtractJobConfig(print_header=False)
extract_job = client.extract_table(table_ref, destination_uris,
                                   job_config=job_config)
Thanks
You can use skipLeadingRows (https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#externalDataConfiguration.googleSheetsOptions.skipLeadingRows)
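For completeness, skipLeadingRows applies when the exported CSV is read back into BigQuery (for example in a load job or an external table definition), not to the extract itself; a minimal load-side sketch under that assumption (bucket, dataset and table names are placeholders):
from google.cloud import bigquery

client = bigquery.Client()
load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the header row in the CSV
)
load_job = client.load_table_from_uri(
    "gs://my_bucket/my_table.csv",
    "my_project.my_dataset.my_table",
    job_config=load_config,
)
load_job.result()  # wait for the load to finish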
