Cloud Schedule Cloud Function to read and write data to BigQuery fails - python-3.x

I am trying to schedule a read-and-write Cloud Function in GCP, but every execution triggered by Cloud Scheduler fails. My function (which, by the way, validates and deploys successfully in Cloud Functions) is given below:
def geopos_test(request):
    # Imports are kept inside the function, as in the original deployment
    # (duplicates removed; most of them are only needed for the later transformations).
    import os, json, sys, glob, pathlib, math, collections, datetime
    import numpy as np
    import pandas as pd
    import pandas_gbq
    import requests
    import seaborn as sns
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import matplotlib.colors as colors
    import matplotlib.ticker as ticker
    from matplotlib.colors import ListedColormap, LinearSegmentedColormap
    from matplotlib.lines import Line2D
    from mpl_toolkits.axes_grid1 import make_axes_locatable
    from scipy import stats
    from flatten_json import flatten
    from pandas.io.json import json_normalize
    from operator import attrgetter
    from datetime import date, timedelta
    from google.cloud import bigquery

    try:
        collectionsAbc = collections.abc
    except AttributeError:
        collectionsAbc = collections

    client = bigquery.Client()
    project = "<ProjectId>"
    dataset_id = "<DataSet>"
    dataset_ref = bigquery.DatasetReference(project, dataset_id)

    # Read the source table into a dataframe.
    table_ref = dataset_ref.table('sectional_accuracy')
    table = client.get_table(table_ref)
    sectional_accuracy = client.list_rows(table).to_dataframe()
    sectional_accuracy = sectional_accuracy.drop_duplicates()
    sectional_accuracy = sectional_accuracy.sort_values(['Store'])

    # Load the (eventually transformed) dataframe into the target table.
    job_config = bigquery.LoadJobConfig(
        schema=[
            bigquery.SchemaField("Store", bigquery.enums.SqlTypeNames.STRING),
            bigquery.SchemaField("storeid", bigquery.enums.SqlTypeNames.STRING),
            bigquery.SchemaField("storeIdstr", bigquery.enums.SqlTypeNames.STRING),
            bigquery.SchemaField("Date", bigquery.enums.SqlTypeNames.TIMESTAMP),
            bigquery.SchemaField("Sections", bigquery.enums.SqlTypeNames.STRING),
            bigquery.SchemaField("Percentage", bigquery.enums.SqlTypeNames.FLOAT),
            bigquery.SchemaField("devicePercentage", bigquery.enums.SqlTypeNames.FLOAT),
            bigquery.SchemaField("distance", bigquery.enums.SqlTypeNames.STRING),
        ],
    )
    ntable_id = '<ProjectId>.<DataSet>.test'
    job = client.load_table_from_dataframe(sectional_accuracy, ntable_id, job_config=job_config)
    job.result()  # wait for the load job to complete
    return 'OK'   # an HTTP-triggered function should return a response
This function only reads data from one table and writes it to a new one; the idea is to eventually apply a set of transformations between the read and the write.
The function runs as the App Engine default service account, of which I am the owner, and I have added (probably overkill) the Cloud Run Invoker, Cloud Functions Invoker and Cloud Scheduler Job Runner roles.
Now, for the Cloud Scheduler:
I have defined it as an HTTP target with the POST method and the function's URL, using an OIDC auth token with the same service account as the one used by the function. As for the HTTP headers, I only have User-Agent with the value Google-Cloud-Scheduler. Note that I have no other header, as I am uncertain what it should be.
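For reference, the configuration described above corresponds roughly to the following sketch with the google-cloud-scheduler Python client. The project, region, schedule, URL and service-account values are all placeholders, and the usual requirement is that the OIDC token's audience matches the URL being called:
# Sketch only: an HTTP-target job with an OIDC token, built with the
# google-cloud-scheduler client. Every concrete value below is a placeholder.
from google.cloud import scheduler_v1

client = scheduler_v1.CloudSchedulerClient()
parent = client.common_location_path("<ProjectId>", "<Region>")

job = scheduler_v1.Job(
    name=f"{parent}/jobs/geopos-test-job",
    schedule="0 6 * * *",            # example: every day at 06:00
    time_zone="Etc/UTC",
    http_target=scheduler_v1.HttpTarget(
        uri="https://<Region>-<ProjectId>.cloudfunctions.net/geopos_test",
        http_method=scheduler_v1.HttpMethod.POST,
        body=b"{}",                  # harmless empty JSON body
        headers={"Content-Type": "application/json"},
        oidc_token=scheduler_v1.OidcToken(
            service_account_email="<ProjectId>@appspot.gserviceaccount.com",
            # The audience generally has to match the function URL being invoked.
            audience="https://<Region>-<ProjectId>.cloudfunctions.net/geopos_test",
        ),
    ),
)

created = client.create_job(parent=parent, job=job)
print(created.name)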
Yet, it fails every single time with a PERMISSION DENIED message in the log.
What have I tried:
Changing geopos_test(request) to geopos_test(event, context)
Changing the HTTP header to (Content-Type, application/octet-stream) or (Content-Type, application/json)
Changing the service account
What I haven't tried is giving some value in the body, since I do not know what it should contain.
I am now out of ideas. Any help would be appreciated.
Update: Error message:
{
  httpRequest: {…}
  insertId: "********"
  jsonPayload: {
    @type: "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished"
    jobName: "******************"
    status: "PERMISSION_DENIED"
    targetType: "HTTP"
    url: "*************"
  }
  logName: "*************/logs/cloudscheduler.googleapis.com%2Fexecutions"
  receiveTimestamp: "2022-10-24T10:10:52.337822391Z"
  resource: {…}
  severity: "ERROR"
  timestamp: "2022-10-24T10:10:52.337822391Z"
}

Related

List a sharepoint folder using python office365 containing more than 5000 files

from office365.runtime.auth.client_credential import ClientCredential
from office365.runtime.client_request_exception import ClientRequestException
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
import io
import datetime
import pandas as pd

app_principal = {
    'client_id': 'xxxxxxxxxxxxxxxxxxxxxxx',
    'client_secret': 'xxxxxxxxxxxxxxxxxxxxxxx'
}
sp_site = 'https://<domain>.sharepoint.com/sites/TSM839/'
relative_url = "/sites/TSM839/Shared Documents"

# Authenticate with the app principal and open the client context.
client_credentials = ClientCredential(app_principal['client_id'], app_principal['client_secret'])
ctx = ClientContext(sp_site).with_credentials(client_credentials)

# Load the folder, then its files collection.
libraryRoot = ctx.web.get_folder_by_server_relative_path(relative_url)
ctx.load(libraryRoot)
ctx.execute_query()
files = libraryRoot.files
ctx.load(files)
ctx.execute_query()
print(len(files))
Use case: fetch the names of all the files in a SharePoint folder.
This program works perfectly when the number of files in the SharePoint folder is below 5,000; when there are more than 5,000 files, I get an exception.
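The 5,000 cut-off matches SharePoint Online's list view threshold, so requesting the whole collection in one call gets rejected. A minimal sketch of a paged retrieval, assuming a recent office365-rest-python-client release that exposes get_all() on collections (check the installed version); site URL, folder path and credentials are placeholders:
# Paged retrieval sketch: get_all() requests the collection page by page,
# which avoids hitting the 5,000-item list view threshold in a single call.
from office365.runtime.auth.client_credential import ClientCredential
from office365.sharepoint.client_context import ClientContext

sp_site = 'https://<domain>.sharepoint.com/sites/TSM839/'
relative_url = "/sites/TSM839/Shared Documents"
ctx = ClientContext(sp_site).with_credentials(
    ClientCredential('<client_id>', '<client_secret>'))

folder = ctx.web.get_folder_by_server_relative_path(relative_url)
files = folder.files.get_all().execute_query()   # paged under the hood
print(len(files))
for f in files:
    print(f.properties.get("Name"))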

iCal4j Principals not found

We are trying to connect to our instance of the CalendarStore, but we don't understand the exception that we get back when executing the code.
The error we're getting:
org.codehaus.groovy.runtime.InvokerInvocationException: net.fortuna.ical4j.connector.ObjectStoreException: net.fortuna.ical4j.connector.FailedOperationException: Principals not found
We also know that the error occurs on line 62. Please note that I have edited the strings in the variables so as not to expose them publicly.
import com.github.caldav4j.CalDAVCollection;
import com.github.caldav4j.CalDAVConstants;
import com.github.caldav4j.exceptions.CalDAV4JException;
import com.github.caldav4j.methods.CalDAV4JMethodFactory;
import com.github.caldav4j.methods.HttpGetMethod;
import com.github.caldav4j.model.request.CalendarQuery;
import com.github.caldav4j.util.GenerateQuery;
import net.fortuna.ical4j.connector.ObjectStoreException;
import net.fortuna.ical4j.connector.dav.CalDavCalendarCollection;
import net.fortuna.ical4j.connector.dav.CalDavCalendarStore;
import net.fortuna.ical4j.connector.dav.PathResolver;
import org.apache.commons.codec.binary.Base64;
import org.apache.http.*;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.conn.routing.HttpRoute;
import org.apache.http.conn.routing.HttpRoutePlanner;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.conn.DefaultRoutePlanner;
import org.apache.http.impl.conn.DefaultSchemePortResolver;
import org.apache.http.protocol.HttpContext;
import net.fortuna.ical4j.model.Calendar;
import net.fortuna.ical4j.model.Component;
import net.fortuna.ical4j.model.ComponentList;
import net.fortuna.ical4j.model.component.VEvent;
import net.fortuna.ical4j.model.Date;
import net.fortuna.ical4j.data.CalendarBuilder;
import net.fortuna.ical4j.data.ParserException;
import net.fortuna.ical4j.connector.CalendarStore;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.impl.client.HttpClients;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;
String USER = "User";
String PASS = "dont matter";
g_log.info("Start");
String uri = "our uri";
String prodId = "hm";
URL url = new URL("Our url");
PathResolver pathResolver = PathResolver.CHANDLER;
CalendarStore<CalDavCalendarCollection> calendarStore = new CalDavCalendarStore(prodId, url, pathResolver);
boolean testCon = calendarStore.connect(USER.toString(), PASS.toCharArray());
g_log.info("testCon: "+ testCon);

Not able to import name from root of the project

I have a server.py file written with Falcon, which looks like this:
try:
    import falcon, logging
    from os.path import dirname, realpath, join
    from wsgiref import simple_server
    from .config.config import server_config
    from .middlewares.SQLAlchemySessionManager import SQLAlchemySessionManager
    from .middlewares.GlobalInternalServerErrorManager import InternalServerErrorManager
    from .lib.dbConnector import Session
    from .routes import router
except ImportError as err:
    falcon = None
    raise err

serv_conf = server_config()

salescoachbot = falcon.API(middleware=[
    SQLAlchemySessionManager(Session),
    InternalServerErrorManager()
])
But when I try to import salescoachbot into other folders and files, like:
from ..server import salescoachbot
I get an error saying:
from ..server import salescoachbot
ImportError: cannot import name 'salescoachbot'
server.py is in the root of the project, and there is an __init__.py both in the root and alongside the file that is trying to import the name.
What am I doing wrong here?
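For context, a relative import such as from ..server import salescoachbot only resolves when server.py and the importing module sit inside the same package hierarchy and the code is run as a package (for example with python -m), not as a loose script. Here is a minimal sketch of a layout in which that import works; the directory and module names are illustrative assumptions, not the asker's actual tree:
# Assumed layout (names are illustrative):
#
#   salesproject/
#       __init__.py
#       server.py            # defines: salescoachbot = falcon.API(...)
#       handlers/
#           __init__.py
#           views.py         # wants to reuse salescoachbot
#
# Inside handlers/views.py, go up one package level with a relative import:
from ..server import salescoachbot
# or use an absolute import from the package root:
from salesproject.server import salescoachbot

# Run it as a package so the relative import can resolve, e.g.:
#   python -m salesproject.handlers.views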

Unable to read Kinesis stream from SparkStreaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.Milliseconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream.toPairDStreamFunctions
import com.amazonaws.auth.AWSCredentials
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.auth.SystemPropertiesCredentialsProvider
import com.amazonaws.services.kinesis.AmazonKinesisClient
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.streaming.kinesis.KinesisInputDStream
import org.apache.spark.streaming.kinesis.KinesisInitialPositions.Latest
import org.apache.spark.streaming.kinesis.KinesisInitialPositions.TrimHorizon
import java.util.Date
val tStream = KinesisInputDStream.builder
.streamingContext(ssc)
.streamName(streamName)
.endpointUrl(endpointUrl)
.regionName(regionName)
.initialPosition(new TrimHorizon())
.checkpointAppName(appName)
.checkpointInterval(kinesisCheckpointInterval)
.storageLevel(StorageLevel.MEMORY_AND_DISK_2)
.build()
tStream.foreachRDD(rdd => if (rdd.count() > 0) rdd.saveAsTextFile("/user/hdfs/test/") else println("No record to read"))
Here, even though I can see data coming into the stream, the Spark job above isn't getting any records. I am sure that I am connecting to the right stream and with the right credentials.
Please help me out.

import error while using db = SQLAlchemy(app)

I get an ImportError while importing the thermos.py module in my models.py module.
C:\Users\sys\Thermos\thermos\thermos.py
C:\Users\sys\Thermos\thermos\models.py
Here is the relevant part of the thermos.py module.
import os
from datetime import datetime
from flask import Flask, render_template, url_for, request, redirect, flash
from flask_sqlalchemy import SQLAlchemy
basedir = os.path.abspath(os.path.dirname(__file__))
app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///' + os.path.join(basedir, 'thermos.db')
db = SQLAlchemy(app)
And, here is the relevant part of the models.py module.
from datetime import datetime
from thermos import db
Here is the image of the error I receive in CMD:
Kindly let me know what needs to be done to fix this issue.
Python can't decide whether you're trying to import from the thermos folder or from the file named thermos.py.
You can rename
C:\Users\sys\Thermos\thermos\thermos.py
to
C:\Users\sys\Thermos\thermos\__init__.py
This makes the thermos directory a package whose db variable can be imported with
from . import db
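As a minimal sketch of the resulting layout (the paths are the asker's; the Bookmark model and its columns are purely illustrative assumptions):
# Assumed layout after the rename:
#
#   C:\Users\sys\Thermos\thermos\
#       __init__.py      # formerly thermos.py: creates app and db = SQLAlchemy(app)
#       models.py
#
# models.py then imports db from its enclosing package:
from datetime import datetime
from . import db          # works because the thermos directory is now a package

class Bookmark(db.Model):  # hypothetical model, for illustration only
    id = db.Column(db.Integer, primary_key=True)
    date = db.Column(db.DateTime, default=datetime.utcnow)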
