Cloud Firestore - append data to a subcollection (Python 3)

Is it possible to append data to a subcollection in the Cloud Firestore database using the Python Firebase Admin SDK? If so, what am I missing?
I am trying to append data to a subcollection of a specific document located in Google's Cloud Firestore (not the Realtime Database). I have done a fair amount of research, with this being the most prevalent resource so far:
add data to firestore, which does not explicitly say it isn't possible.
I am able to create documents, read documents, etc. I just cannot seem to append to or access subcollections without getting the whole document.
The flow of my code is as follows:
import time
import json
import firebase_admin
from firebase_admin import credentials
from firebase_admin import db
from firebase_admin import firestore

# authentication
cred = credentials.Certificate('key.json')
app = firebase_admin.initialize_app(cred)
cloudDb = firestore.client()  # open connection; uses 'app' implicitly

doc = cloudDb.collection(collection).document(document)  # collection/document names set elsewhere
data = {}
data['date'] = firestore.SERVER_TIMESTAMP
data['payload'] = [0, 1, 2, 3, 4, 5]
doc.collection("data").set(data)  # *
#--testing (collection)
# |
#----docId (document)
# |
#------company (field)
#------accountinfo(field)
#------data (subcollection)
# |
#--------date (field)
#--------payload (array)
This, however, fails on the last line (the line with the asterisk) with the error:
('Cannot convert to a Firestore Value', <object object at 0xb6b07e60>, 'Invalid type', <class 'object'>)

OK, so I solved my own question, thanks to Doug getting me to write a script that could be run on another machine (i.e. a smaller script with less going on).
My issue was trying to set the collection as my data object, instead of creating a document within the collection and setting that, i.e.:
doc.collection("data").set(data)            # error
doc.collection("data").document().set(data) # success

Loading multiple files from Cloud Storage to BigQuery into different tables

I am new to GCP. I am able to get one file into GCS from my VM and then transfer it to BigQuery.
How do I transfer multiple files from GCS to BigQuery? I know a wildcard URI is the solution, but what other changes are also needed in the code below?
def hello_gcs(event, context):
    from google.cloud import bigquery

    # Construct a BigQuery client object.
    client = bigquery.Client()

    # TODO(developer): Set table_id to the ID of the table to create.
    table_id = "test_project.test_dataset.test_Table"

    job_config = bigquery.LoadJobConfig(
        autodetect=True,
        skip_leading_rows=1,
        # The source format defaults to CSV, so the line below is optional.
        source_format=bigquery.SourceFormat.CSV,
    )
    uri = "gs://test_bucket/*.csv"

    load_job = client.load_table_from_uri(
        uri, table_id, job_config=job_config
    )  # Make an API request.
    load_job.result()  # Waits for the job to complete.

    destination_table = client.get_table(table_id)  # Make an API request.
    print(f"Processing file: {file['name']}.")
Since there could be multiple uploads, I cannot define a specific table name or file name. Is it possible to do this task automatically?
This function is triggered by Pub/Sub whenever there is a new file in the GCS bucket.
Thanks
To transfer multiple files from GCS to BigQuery, you can simply loop through all the files. A sample of working code with comments is below.
I believe event and context (the function arguments) are handled by Cloud Functions by default, so there is no need to modify that part. Alternatively, you can simplify the code by leveraging event instead of a loop.
def hello_gcs(event, context):
    import re
    from google.cloud import storage
    from google.cloud import bigquery
    from google.cloud.exceptions import NotFound

    bq_client = bigquery.Client()
    bucket = storage.Client().bucket("bucket-name")

    for blob in bucket.list_blobs(prefix="folder-name/"):
        if ".csv" in blob.name:  # Check for CSV blobs, as list_blobs also returns the folder name
            job_config = bigquery.LoadJobConfig(
                autodetect=True,
                skip_leading_rows=1,
                source_format=bigquery.SourceFormat.CSV,
            )
            csv_filename = re.findall(r".*/(.*).csv", blob.name)  # Extract the file name for BQ's table ID
            bq_table_id = "project-name.dataset-name." + csv_filename[0]  # Determine the table name
            try:  # Check if the table already exists and skip uploading it.
                bq_client.get_table(bq_table_id)
                print("Table {} already exists. Not uploaded.".format(bq_table_id))
            except NotFound:  # If the table is not found, upload it.
                uri = "gs://bucket-name/" + blob.name
                print(uri)
                load_job = bq_client.load_table_from_uri(
                    uri, bq_table_id, job_config=job_config
                )  # Make an API request.
                load_job.result()  # Waits for the job to complete.
                destination_table = bq_client.get_table(bq_table_id)  # Make an API request.
                print("Table {} uploaded.".format(bq_table_id))
Correct me if I am wrong, but I understand that your cloud function is triggered by a finalize event (Google Cloud Storage Triggers) when a new file (or object) appears in a storage bucket. That means there is one event for each "new" object in the bucket, and thus at least one invocation of the cloud function per object.
The link above has an example of the data that comes in the event dictionary. There is plenty of information there, including details of the object (file) to be loaded.
You might like to have some configuration with a mapping between a file name pattern and a target BigQuery table for data loading, for example (a sketch follows below). Using that map you will be able to decide which table should be used for loading. Or you may have some other mechanism for choosing the target table.
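For illustration only, a minimal sketch of such a mapping; the patterns, project, dataset, and table names are hypothetical:
import re

# Hypothetical mapping from file-name patterns to target BigQuery tables.
TABLE_MAP = [
    (re.compile(r"^sales_\d{8}\.csv$"), "my-project.analytics.sales"),
    (re.compile(r"^users_.*\.csv$"), "my-project.analytics.users"),
]

def resolve_table(object_name):
    # Return the target table for a GCS object name, or None if nothing matches.
    for pattern, table_id in TABLE_MAP:
        if pattern.match(object_name):
            return table_id
    return None

# Inside the cloud function: table_id = resolve_table(event["name"])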
Some other things to think about:
- Exception handling: what are you going to do with the file if the data is not loaded (for any reason)? Who is to be informed, and how? What is to be done to (correct the source data or the target table and) repeat the loading, etc.?
- What happens if the loading takes more time than a cloud function timeout (maximum 540 seconds at the present moment)?
- What happens if there is more than one cloud function invocation from one finalize event, or from different events but from semantically the same source file (repeated data, duplications, etc.)?
You don't need to answer me; just think about such cases if you have not done so yet.
If your data source is GCS and your destination is BigQuery, you can use the BigQuery Data Transfer Service to ETL your data into BQ. Every transfer job targets a certain table, and you can select whether you want to append or overwrite data in that table (streaming mode).
You can schedule this job as well: daily, weekly, etc.
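As a rough sketch of what setting up such a transfer might look like with the Python client: the project, location, dataset, bucket, and table names are placeholders, and the parameter keys under params are assumptions about the GCS transfer options that should be checked against the Data Transfer Service documentation.
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

# Placeholder project and location.
parent = client.common_location_path("my-project", "us")

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="my_dataset",
    display_name="Daily GCS CSV load",
    data_source_id="google_cloud_storage",
    params={
        # Assumed GCS-transfer parameter keys; verify against the docs.
        "data_path_template": "gs://my-bucket/*.csv",
        "destination_table_name_template": "my_table",
        "file_format": "CSV",
        "skip_leading_rows": "1",
        "write_disposition": "APPEND",
    },
    schedule="every 24 hours",
)

created = client.create_transfer_config(
    parent=parent, transfer_config=transfer_config
)
print("Created transfer config:", created.name)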
To load multiple GCS files into multiple BQ tables in a single Cloud Function invocation, you'd need to list those files and then iterate over them, creating a load job for each file, just as you have done for one. But doing all that work inside a single function call somewhat defeats the purpose of using Cloud Functions.
If your requirements do not force you to do so, you can leverage the power of Cloud Functions and let a single function be triggered by each of those files as they are added to the bucket, since it is an event-driven function. Please refer to https://cloud.google.com/functions/docs/writing/background#cloud-storage-example. It will be triggered every time there is a specified activity, for which there will be event metadata.
So, in your application, rather than taking the entire bucket contents in the URI, we can take the name of the file which triggered the event and load only that file into a BigQuery table, as shown in the code sample below.
Here is how you can resolve the issue in your code. Try the following changes:
You can extract the details about the event, and about the file which triggered it, from the Cloud Function event dictionary. In your case, we can get the file name as event['name'] and update the uri variable.
Generate a new unique table_id (here, as an example, the table_id is the same as the file name). You can use other schemes to generate unique table IDs as required.
Refer to the code below:
def hello_gcs(event, context):
    from google.cloud import bigquery

    client = bigquery.Client()  # Construct a BigQuery client object.
    print(f"Processing file: {event['name']}.")  # Name of the file which triggers the function

    if ".csv" in event['name']:
        # BQ job config
        job_config = bigquery.LoadJobConfig(
            autodetect=True,
            skip_leading_rows=1,
            source_format=bigquery.SourceFormat.CSV,
        )
        file_name = event['name'].split('.')
        table_id = "<project_id>.<dataset_name>." + file_name[0]  # Generating a new ID for each table
        uri = "gs://<bucket_name>/" + event['name']

        load_job = client.load_table_from_uri(
            uri, table_id, job_config=job_config
        )  # Make an API request.
        load_job.result()  # Waits for the job to complete.

        destination_table = client.get_table(table_id)  # Make an API request.
        print("Table {} uploaded.".format(table_id))

Flask-SQLAlchemy: updating multiple fields in a row

I recently moved to Flask from Express.js.
I am creating a Flask app using flask, flask-sqlalchemy, and flask-wtf.
It is a form-heavy application. I expect to have about 30-50 forms, with each form having 20-100 fields.
Client-side forms use flask-wtf.
I am able to create models and CRUD functionality. The problem is that with each form I have to manually do:
IN CREATE
[...]
# after validation
someItem = SomeModel(someField1=form.someField1.data, ..., someFieldN=form.someFieldN.data)
db.session.add(someItem)
db.session.commit()
IN UPDATE
[...]
queryItem = SomeModel.query.filter_by(id=item_id).first()
queryItem.someField1 = form.someField1.data
[...]
queryItem.someFieldN = form.someFieldN.data
db.session.commit()
As is apparent, with lots of forms this gets very tedious. Is there a way around it, or can you suggest a library that will do this?
I have searched online for the last few hours. The closest I got was to create a dictionary and then pass it like:
someDict = {'someField1': form.someField1.data, ....}
SomeModel.query.filter_by(id=item.id).update(someDict)
As you can see, it is equally tedious.
I am hoping to find a way to pass the form data directly to SomeModel for creating as well as updating.
I previously used Express.js + Knex, and I was simply able to pass req.body, after validation, to Knex.
Thanks for your time.
Use populate_obj (note: model field names must match form field names).
Create record:
someItem = SomeModel()
form.populate_obj(someItem)
db.session.add(someItem)
db.session.commit()
Update record:
queryItem = SomeModel.query.filter_by(id=item_id).first()
form.populate_obj(queryItem)
db.session.commit()
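As a usage illustration, a rough sketch of a single view that handles both create and update with populate_obj; app, db, SomeModel, SomeForm, the route, and the template name are placeholders assumed to be defined elsewhere:
from flask import redirect, render_template, url_for

@app.route('/items/new', methods=['GET', 'POST'])
@app.route('/items/<int:item_id>/edit', methods=['GET', 'POST'])
def edit_item(item_id=None):
    # Load the existing row for updates, or start a fresh instance for creates.
    item = SomeModel.query.get_or_404(item_id) if item_id else SomeModel()
    form = SomeForm(obj=item)  # pre-populate the form from the model on GET

    if form.validate_on_submit():
        form.populate_obj(item)  # copy every matching form field onto the model
        if item_id is None:
            db.session.add(item)
        db.session.commit()
        return redirect(url_for('edit_item', item_id=item.id))

    return render_template('item_form.html', form=form)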

Function service_account.Credentials.from_service_account_info() not working

I'm writing an application based on GCP services and I need to access an external project. I stored the authentication file's information for the other project I need to access in my Firestore database. I read this documentation and tried to apply it, but my code does not work. As the documentation says, what I pass to the authentication method is a dictionary[str, str].
This is my code:
from googleapiclient import discovery
from google.oauth2 import service_account
from google.cloud import firestore

project_id = body['project_id']
user = body['user']
snap_id = body['snapshot_id']
debuggee_id = body['debuggee_id']

db = firestore.Client()
ref = db.collection(u'users').document(user).collection(u'projects').document(project_id)

if ref.get().exists:
    service_account_info = ref.get().to_dict()
else:
    return None, 411

credentials = service_account.Credentials.from_service_account_info(
    service_account_info,
    scopes=['https://www.googleapis.com/auth/cloud-platform'])

service = discovery.build('clouddebugger', 'v2', credentials=credentials)
body is just a dictionary containing all the information about the other project. What I can't understand is why this doesn't work, while using the method from_service_account_file does work.
The following code gives that method the same information as the previous code, but from a JSON file instead of a dictionary. Maybe the order of the elements is different, but I think that doesn't matter at all.
credentials = service_account.Credentials.from_service_account_file(
    [PATH_TO_PROJECT_KEY_FILE],
    scopes=['https://www.googleapis.com/auth/cloud-platform'])
Can you tell me what I'm doing wrong with the method from_service_account_info?
Problem solved. When I posted the question, I had manually inserted all the info about the other project through the GCP Firestore console. Then I wrote code to do it automatically, and it worked. Honestly, I don't know why it didn't work before; the information put into Firestore was the same, and the format as well.
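For reference, a minimal sketch of storing the service account JSON in Firestore programmatically and rebuilding credentials from it; the collection, document, and file names are placeholders, and the comment about escaped newlines is only a guess at what manual entry might break:
import json
from google.cloud import firestore
from google.oauth2 import service_account

db = firestore.Client()
ref = db.collection(u'users').document('some-user').collection(u'projects').document('other-project')

# Write the whole service account key file as a dict, unmodified.
with open('other-project-key.json') as f:
    ref.set(json.load(f))

# Read it back and build credentials from the dict.
info = ref.get().to_dict()
# Note: from_service_account_info needs the full key structure, including
# private_key, client_email, and token_uri. If a key is pasted in by hand,
# the "\n" sequences in private_key can end up escaped, which would break
# signing (an assumption, not confirmed as the cause here).
credentials = service_account.Credentials.from_service_account_info(
    info,
    scopes=['https://www.googleapis.com/auth/cloud-platform'])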

How can I update the expiration of a document in Couchbase using Python 3?

We have a lot of docs in Couchbase with expiration = 0, which means that those documents stay in Couchbase forever. I am aware that setting expiration this way isn't supported through N1QL INSERT/UPDATE/DELETE.
We have 500,000,000 such docs, and I would like to do this in parallel using chunks/bulks. How can I update the expiration field using Python 3?
I am trying this:
bucket.touch_multi(('000c4894abc23031eed1e8dda9e3b120', '000f311ea801638b5aba8c8405faea47'), ttl=10)
However I am getting an error like:
_NotFoundError_0xD (generated, catch NotFoundError): <Key=u'000c4894abc23031eed1e8dda9e3b120'
I just tried this:
from couchbase.cluster import Cluster
from couchbase.cluster import PasswordAuthenticator

cluster = Cluster('couchbase://localhost')
authenticator = PasswordAuthenticator('Administrator', 'password')
cluster.authenticate(authenticator)
cb = cluster.open_bucket('default')

keys = []
for i in range(10):
    keys.append("key_{}".format(i))

for key in keys:
    cb.upsert(key, {"some": "thing"})

print(cb.touch_multi(keys, ttl=5))
and I get no errors, just a dictionary of keys and OperationResults, and they do in fact expire soon thereafter. I'd guess some of your keys are simply not there.
However, maybe you'd really rather set a bucket expiry? That will make all the documents expire in that time, regardless of the expiry on the individual documents. In addition to the answer above that mentions that, check out this for more details.
You can use the Couchbase Python (or any other) SDK's Bucket.touch() method, described here: https://docs.couchbase.com/python-sdk/current/document-operations.html#modifying-expiraton
If you don't know the document keys, you can use a N1QL covered index to get the document keys asynchronously inside your Python SDK, and then use the bucket touch API above to set expiration from your Python SDK; a sketch of that loop follows below.
CREATE INDEX ix1 ON bucket(META().id) WHERE META().expiration = 0;
SELECT RAW META().id
FROM bucket WHERE META().expiration = 0 AND META().id LIKE "a%";
You can issue different SELECTs for different key ranges and run them in parallel.
As for the update operation, you need to write one yourself. As you get each key, call bucket.touch() (instead of an update), which only updates the document expiration without modifying the actual document. That saves a get/put of the whole document (https://docs.couchbase.com/python-sdk/current/core-operations.html#setting-document-expiration).
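A rough sketch of what that loop might look like with the SDK 2.x API used in the question; the bucket name, credentials, TTL, and chunk size are placeholders, and the N1QL query assumes the covered index above exists:
from couchbase.cluster import Cluster, PasswordAuthenticator
from couchbase.n1ql import N1QLQuery

cluster = Cluster('couchbase://localhost')
cluster.authenticate(PasswordAuthenticator('Administrator', 'password'))
cb = cluster.open_bucket('default')

CHUNK_SIZE = 1000  # arbitrary batch size
query = N1QLQuery('SELECT RAW META().id FROM `default` '
                  'WHERE META().expiration = 0 AND META().id LIKE "a%"')

batch = []
for key in cb.n1ql_query(query):
    batch.append(key)
    if len(batch) >= CHUNK_SIZE:
        cb.touch_multi(batch, ttl=10)  # set expiration without rewriting the docs
        batch = []
if batch:
    cb.touch_multi(batch, ttl=10)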

Web Service returns: sqlite3.OperationalError: no such table:

I'm trying to set up a simple web service with Python, Flask, and SQLite3. It doesn't work.
The DB connection without the web service works; the web service without the DB connection works. Together they don't.
If I run this, it works:
import sqlite3
conn = sqlite3.connect('scuola.db')
sql = "SELECT matricola,cognome,nome FROM studenti"
cur = conn.cursor()
cur.execute(sql)
risultato = cur.fetchall()
conn.close()
print(risultato)
(so the query is correct)
and if I run this, it works:
import flask

app = flask.Flask(__name__)

def funzione():
    return 'Applicazione Flask'

app.add_url_rule('/', 'funzione', funzione)
but if I run this...
from flask import Flask
import sqlite3

app = Flask(__name__)

@app.route('/', methods=['GET'])
def getStudenti():
    conn = sqlite3.connect('scuola.db')
    sql = "SELECT matricola,cognome,nome FROM studenti"
    cur = conn.cursor()
    cur.execute(sql)
    risultato = cur.fetchall()
    conn.close()
    return risultato
It returns Internal Server Error in the browser, and
sqlite3.OperationalError: no such table: studenti
on the DOS prompt.
Thank you for your help!
You haven't provided the internal server error output, but my first guess is that you're trying to return the raw list object returned from fetchall.
When returning from a view function, you need to send the results either by returning a template or by jsonifying the output, to make it a proper HTTP response that the browser can receive.
You need to add
from flask import jsonify
to your imports, then when returning:
return jsonify(risultato)
If you get errors like "something is not JSON serializable", it means you're trying to send an instance of a class or similar. You'll need to make sure you're returning only plain Python data structures (e.g. list/dict/str etc.).
For the command-line problem, you need to make sure you've run a CREATE TABLE command to first generate the table in the database, before you select from it. Also check that you're accessing the correct SQLite database file, i.e. the one with the table in it.
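Putting those two points together, a minimal sketch of the view; the absolute-path handling reflects an assumption about why the table appears missing, namely that sqlite3.connect with a relative path silently creates a new, empty database file when the app is run from a different working directory:
import os
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)

# Resolve the database path relative to this file, not the current working directory.
DB_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'scuola.db')

@app.route('/', methods=['GET'])
def getStudenti():
    conn = sqlite3.connect(DB_PATH)
    cur = conn.cursor()
    cur.execute("SELECT matricola,cognome,nome FROM studenti")
    risultato = cur.fetchall()
    conn.close()
    return jsonify(risultato)  # serialize the rows into a proper JSON response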
I'm not sure, but from the look of things I don't think you've configured the Flask app to use the DB you created. There should be some sort of app.config entry that integrates the DB.
