Apache Airflow did not recognize error in XML and reported success - python-3.x

Hello community, I have a question for you regarding a Python function. In this function I read a large XML file and convert it to JSON. While doing so, I want to check whether there is an error tag within the XML at the first or last position, and if so, raise an error.
Unfortunately, I have the problem that sometimes the function doesn't seem to recognize this tag, and Apache Airflow then reports a success.
To deal with this I built a second check that runs beforehand and validates the XML up front via BeautifulSoup.
But now I always get a failure reported.
Can you explain to me why the "old" check reports a success, but the soup check aborts?
Should I combine the two, or is there a more elegant solution in general?
def parse_large_xml(xml_file_name, json_file_name, extra_fields=None, clean_keys=True):
    """
    Converts SAP xml response to json.
    - processes the xml file iteratively (one tag at a time)
    - capable of dealing with large files
    Args:
        xml_file_name: input filename
        json_file_name: output filename
        extra_fields: extra fields to add to the json
        clean_keys: flag, if set remove prefixes from the response keys
    """
    ######### Extra check #####
    # This is the new extra check: scan the whole document with BeautifulSoup
    # and fail if any tag is named "error".
    with open(xml_file_name, 'r') as xml_file:
        data = xml_file.read()
    if "error" in set(tag.name for tag in BeautifulSoup(data, 'xml').find_all()):
        errString = f'error in response file "{xml_file_name}" (XML contains error tag)'
        logging.error(msg=errString, exc_info=True, stack_info=True)
        raise RuntimeError(errString)
    ##########################################################################
    # This is the old check: stream the file and react to tags as they close.
    if extra_fields is None:
        extra_fields = {}
    with open(json_file_name, 'w') as json_file:
        for event, elem in progressbar.progressbar(ET.iterparse(xml_file_name, events=('start', 'end'))):
            if 'content' in elem.tag and event == 'end':
                elem_dict = xmltodict.parse(tostring(elem))
                if clean_keys:
                    elem_dict = clean_response_keys(elem_dict)
                response_dict = {'raw_json_response': json.dumps(elem_dict)}
                response_dict = {**extra_fields, **response_dict}
                response_dict['hash'] = hashlib.sha256(
                    response_dict['raw_json_response'].encode()).hexdigest()
                response_dict['date'] = get_scheduled_date()
                json.dump(response_dict, json_file)
                json_file.write('\n')
                elem.clear()
            elif 'error' in elem.tag and event == 'end':
                errString = f'error in response file "{xml_file_name}":\n{tostring(elem)}'
                logging.error(msg=errString, exc_info=True, stack_info=True)
                raise RuntimeError(errString)
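For illustration, here is a hedged sketch of a more compact up-front check along the lines the question asks about (my illustration, not code from the post): the first 'start' event from ET.iterparse is the document root, so an error root like the example further down can be rejected before any conversion work, namespace prefix included.
import xml.etree.ElementTree as ET

def xml_root_is_error(xml_file_name):
    # The first 'start' event is the document root; rsplit strips a namespace
    # prefix such as "{http://schemas.microsoft.com/...}error" down to "error".
    for _event, elem in ET.iterparse(xml_file_name, events=('start',)):
        return elem.tag.rsplit('}', 1)[-1] == 'error'
    return False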
I'm using Apache Airflow 1.10.15 and Composer 1.16.10, as well as Python 3.
Here is an example error XML as it is returned but not recognized:
<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
    <code>DBSQL_CONNECTION_NO_METADATA</code>
    <message>Runtime Error: 'DBSQL_CONNECTION_NO_METADATA'. The OData request processing has been abnormal terminated. If "Runtime Error" is not initial, launch transaction ST22 for details and analysis. Otherwise, launch transaction SM21 for system log analysis.</message>
    <timestamp>20220210031242</timestamp>
</error>
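For what it's worth, a hedged sketch (the file name is hypothetical) of how ElementTree reports this document: the xmlns attribute is folded into the tag name, which matters when matching on tags.
import xml.etree.ElementTree as ET

for event, elem in ET.iterparse("error_response.xml", events=("start", "end")):
    # elem.tag arrives namespace-qualified, e.g.
    # "{http://schemas.microsoft.com/ado/2007/08/dataservices/metadata}error"
    print(event, elem.tag)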

Related

Pytube and the new # channel URLs

I have been working for a little while now on a Python program that can take channel or video URLs and convert them into a channel ID.
However, my code doesn't seem to work with links that look like "http://youtube.com/#username".
if re.search("/channel/", channelURL) or re.search("#", channelURL) or re.search("/user/", channelURL) or re.search("/c/", channelURL):
    # This code detects if the given URL is a channel. If the check comes back
    # as True then it grabs the data using pytube.
    c = Channel(channelURL)
    channel_name = c.channel_name
    channel_id = c.channel_id
    channel_id_link = "http://youtube.com/channel/" + channel_id
    print("Channel Name: " + channel_name)
    print("Channel ID: " + channel_id)
    print("Channel Link: " + channel_id_link)
You can see the full code here: https://github.com/flyinggoatman/YouTube-Link-Extractor/blob/master/QualityYouTube.py
What I expect: the code should be able to pull the channel_name, channel_id, and also the channel_id_link.
What happens?
The code runs, but when I enter a "#" YouTube channel URL it returns the following:
We have logged in as QualityYouTube Bot#2815
Using Discord channel: pending-channels
The bot has now fully booted up and may be used.
Please be advised this bot only supports one Discord server at a time. Future updates will allow for more than one server to be active at a time.
2022-12-30 02:20:50 ERROR discord.client Ignoring exception in on_message
Traceback (most recent call last):
  File "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 409, in _run_event
    await coro(*args, **kwargs)
  File "c:\Users[redacted]\test\QualityYouTube.py", line 101, in on_message
    c = Channel(channelURL)
  File "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\contrib\channel.py", line 24, in __init__
    self.channel_uri = extract.channel_name(url)
  File "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\extract.py", line 185, in channel_name
    raise RegexMatchError(
pytube.exceptions.RegexMatchError: channel_name: could not find match for patterns
I understand the code won't run as it currently stands. However, can I somehow take the URL and use the regex "#(.*)" to grab the username, and then use pytube to find a video made by that channel? I could then take the video URL and use that to get the information I need instead.
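As a hedged sketch of that idea (my illustration; the URL is the example from above):
import re

channelURL = "http://youtube.com/#username"
match = re.search(r"#(.*)", channelURL)
if match:
    print(match.group(1))  # -> "username"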
Note: I fixed the issue by adding new lines of code inside "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\extract.py"
https://github.com/pytube/pytube/pull/1444
I added the following lines of code
- :samp:`https://youtube.com/#{channel_name}/*`
r"(?:(#[%\w\d_-]+)(.*)?)"
to this file inside the GitHub repo for PyTube:
https://github.com/pytube/pytube/blob/master/pytube/extract.py
The changes look like this:
def channel_name(url: str) -> str:
    """Extract the ``channel_name`` or ``channel_id`` from a YouTube url.

    This function supports the following patterns:
    - :samp:`https://youtube.com/c/{channel_name}/*`
    - :samp:`https://youtube.com/channel/{channel_id}/*`
    - :samp:`https://youtube.com/u/{channel_name}/*`
    - :samp:`https://youtube.com/user/{channel_id}/*`
    - :samp:`https://youtube.com/#{channel_name}/*`

    :param str url:
        A YouTube url containing a channel name.
    :rtype: str
    :returns:
        YouTube channel name.
    """
    patterns = [
        r"(?:\/(c)\/([%\d\w_\-]+)(\/.*)?)",
        r"(?:\/(channel)\/([%\w\d_\-]+)(\/.*)?)",
        r"(?:\/(u)\/([%\d\w_\-]+)(\/.*)?)",
        r"(?:\/(user)\/([%\w\d_\-]+)(\/.*)?)",
        r"(?:(#[%\w\d_-]+)(.*)?)"
    ]
    for pattern in patterns:
        regex = re.compile(pattern)
        function_match = regex.search(url)
        if function_match:
            logger.debug("finished regex search, matched: %s", pattern)
            uri_style = function_match.group(1)
            uri_identifier = function_match.group(2)
            return f'/{uri_style}/{uri_identifier}'
    raise RegexMatchError(
        caller="channel_name", pattern="patterns"
    )
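Under the patch above, a call like the following (hedged usage sketch, hypothetical URL; assumes the patched pytube is installed) no longer raises RegexMatchError:
from pytube import extract

# The extra pattern makes "#" URLs resolve instead of raising
# pytube.exceptions.RegexMatchError.
print(extract.channel_name("http://youtube.com/#username"))  # -> "/#username/"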

vtkXMLParser: some bug issues and how to solve them

I'm currently learning from the examples in the official VTK documentation, but I'm having some issues when vtkXMLDataParser is called in the I/O example.
First, I'll give you the basic sample code:
# Imports as used in the official VTK 9 examples.
# noinspection PyUnresolvedReferences
import vtkmodules.vtkInteractionStyle
# noinspection PyUnresolvedReferences
import vtkmodules.vtkRenderingOpenGL2
from vtkmodules.vtkCommonColor import vtkNamedColors
from vtkmodules.vtkIOXML import vtkXMLPolyDataReader
from vtkmodules.vtkRenderingCore import (
    vtkActor,
    vtkPolyDataMapper,
    vtkRenderWindow,
    vtkRenderWindowInteractor,
    vtkRenderer
)


def main():
    colors = vtkNamedColors()
    # filename = get_program_parameters()
    filename = "Torso.vtp"

    # Read all the data from the file
    reader = vtkXMLPolyDataReader()
    reader.SetFileName(filename)
    reader.Update()

    # Visualize
    mapper = vtkPolyDataMapper()
    mapper.SetInputConnection(reader.GetOutputPort())

    actor = vtkActor()
    actor.SetMapper(mapper)
    actor.GetProperty().SetColor(colors.GetColor3d('NavajoWhite'))

    renderer = vtkRenderer()
    renderWindow = vtkRenderWindow()
    renderWindow.AddRenderer(renderer)
    renderWindowInteractor = vtkRenderWindowInteractor()
    renderWindowInteractor.SetRenderWindow(renderWindow)

    renderer.AddActor(actor)
    renderer.SetBackground(colors.GetColor3d('DarkOliveGreen'))
    renderer.GetActiveCamera().Pitch(90)
    renderer.GetActiveCamera().SetViewUp(0, 0, 1)
    renderer.ResetCamera()

    renderWindow.SetSize(600, 600)
    renderWindow.Render()
    renderWindow.SetWindowName('ReadPolyData')
    renderWindowInteractor.Start()


if __name__ == '__main__':
    main()
The example requires the user to change the filename parameter to their own file path. I put the test file (.vtp) in the root directory, and the code does find it correctly.
But after I run it, the output window does not show the corresponding picture, only the background, and an error window is generated alongside it. I don't quite understand the cause of the error or how to fix it.
The following are the error messages:
2023-01-20 00:01:06.185 ( 0.043s) [ ] vtkXMLParser.cxx:379 ERR| vtkXMLDataParser (000001C077E63030): Error parsing XML in stream at line 1, column 0, byte index 0: syntax error
2023-01-20 00:01:06.235 ( 0.093s) [ ] vtkXMLReader.cxx:521 ERR| vtkXMLPolyDataReader (000001C07A3DC8A0): Error parsing input file. ReadXMLInformation aborting.
2023-01-20 00:01:06.239 ( 0.097s) [ ] vtkExecutive.cxx:741 ERR| vtkCompositeDataPipeline (000001C077E82410): Algorithm vtkXMLPolyDataReader (000001C07A3DC8A0) returned failure for request: vtkInformation (000001C07A6B6930)
  Debug: Off
  Modified Time: 98
  Reference Count: 1
  Registered Events: (none)
  Request: REQUEST_INFORMATION
  FORWARD_DIRECTION: 0
  ALGORITHM_AFTER_FORWARD: 1
First, what is the cause of the error and how do I solve it?
Second, I have never touched the XML configuration of the VTK library; I have only just picked up the VTK library (version 9.2). How do I solve this problem with the XML file?
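One hedged first step (my suggestion, not from the VTK example): the parser fails at line 1, column 0, byte index 0, which suggests the file is not XML at all. XML-format .vtp files start with "<?xml" or "<VTKFile", while legacy VTK files start with "# vtk DataFile", so peeking at the first bytes narrows it down.
with open("Torso.vtp", "rb") as f:
    print(f.read(64))  # expect b'<?xml ...' or b'<VTKFile ...' for a valid .vtp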

Google Cloud Function (Python) does not deploy - Function failed on loading user code

I'm creating a simple Python function in Google Cloud but cannot get it to save. It shows this error:
"Function failed on loading user code. This is likely due to a bug in the user code. Error message: Error: please examine your function logs to see the error cause: https://cloud.google.com/functions/docs/monitoring/logging#viewing_logs. Additional troubleshooting documentation can be found at https://cloud.google.com/functions/docs/troubleshooting#logging. Please visit https://cloud.google.com/functions/docs/troubleshooting for in-depth troubleshooting documentation."
The logs don't seem to show much that would indicate an error in the code. I followed this guide: https://blog.thereportapi.com/automate-a-daily-etl-of-currency-rates-into-bigquery/
The only differences are the environment variables and the endpoint I'm using.
The code is below; it's just a GET request followed by a push of the data into a table.
import requests
import json
import time
import os
from google.cloud import bigquery

# Set any default values for these variables if they are not found from Environment variables
PROJECT_ID = os.environ.get("PROJECT_ID", "xxxxxxxxxxxxxx")
EXCHANGERATESAPI_KEY = os.environ.get("EXCHANGERATESAPI_KEY", "xxxxxxxxxxxxxxx")
REGIONAL_ENDPOINT = os.environ.get("REGIONAL_ENDPOINT", "europe-west1")
DATASET_ID = os.environ.get("DATASET_ID", "currency_rates")
TABLE_NAME = os.environ.get("TABLE_NAME", "currency_rates")
BASE_CURRENCY = os.environ.get("BASE_CURRENCY", "SEK")
SYMBOLS = os.environ.get("SYMBOLS", "NOK,EUR,USD,GBP")


def hello_world(request):
    latest_response = get_latest_currency_rates()
    write_to_bq(latest_response)
    return "Success"


def get_latest_currency_rates():
    PARAMS = {'access_key': EXCHANGERATESAPI_KEY, 'symbols': SYMBOLS, 'base': BASE_CURRENCY}
    response = requests.get("https://api.exchangeratesapi.io/v1/latest", params=PARAMS)
    print(response.json())
    return response.json()


def write_to_bq(response):
    # Instantiates a client
    bigquery_client = bigquery.Client(project=PROJECT_ID)

    # Prepares a reference to the dataset
    dataset_ref = bigquery_client.dataset(DATASET_ID)
    table_ref = dataset_ref.table(TABLE_NAME)
    table = bigquery_client.get_table(table_ref)

    # get the current timestamp so we know how fresh the data is
    timestamp = time.time()
    jsondump = json.dumps(response)  # Returns a string

    # Ensure the Response is a String not JSON
    rows_to_insert = [{"timestamp": timestamp, "data": jsondump}]
    errors = bigquery_client.insert_rows(table, rows_to_insert)  # API request
    print(errors)
    assert errors == []
I tried just the part that does the GET request in an offline editor and I can confirm the response works fine. I suspect it might have something to do with permissions or with the way the script tries to access the database.
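A hedged way to reproduce load-time failures locally (an assumption, not part of the deployed code): "Function failed on loading user code" typically means the module raised an exception at import time, for example because a dependency such as google-cloud-bigquery is missing from requirements.txt, so importing the module yourself can surface the same error.
# Assumes the function lives in main.py, the file name Cloud Functions expects.
import main  # the import alone reproduces load-time errors

print(main.hello_world(None))  # optional: hello_world never reads its request argument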

Is it possible to upload a YouTube video description file with the Google Python script?

I am using the Google Python script to upload videos.
#!/usr/bin/python

import http.client  # httplib
import httplib2
import os
import random
import sys
import time

from apiclient.discovery import build
from apiclient.errors import HttpError
from apiclient.http import MediaFileUpload
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client.tools import argparser, run_flow

# Explicitly tell the underlying HTTP transport library not to retry, since
# we are handling retry logic ourselves.
httplib2.RETRIES = 1

# Maximum number of times to retry before giving up.
MAX_RETRIES = 10

# Always retry when these exceptions are raised.
RETRIABLE_EXCEPTIONS = (httplib2.HttpLib2Error, IOError, http.client.NotConnected,
                        http.client.IncompleteRead, http.client.ImproperConnectionState,
                        http.client.CannotSendRequest, http.client.CannotSendHeader,
                        http.client.ResponseNotReady, http.client.BadStatusLine)

# Always retry when an apiclient.errors.HttpError with one of these status
# codes is raised.
RETRIABLE_STATUS_CODES = [500, 502, 503, 504]

# The CLIENT_SECRETS_FILE variable specifies the name of a file that contains
# the OAuth 2.0 information for this application, including its client_id and
# client_secret. You can acquire an OAuth 2.0 client ID and client secret from
# the Google Developers Console at
# https://console.developers.google.com/.
# Please ensure that you have enabled the YouTube Data API for your project.
# For more information about using OAuth2 to access the YouTube Data API, see:
# https://developers.google.com/youtube/v3/guides/authentication
# For more information about the client_secrets.json file format, see:
# https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
CLIENT_SECRETS_FILE = "client_secrets.json"

# This OAuth 2.0 access scope allows an application to upload files to the
# authenticated user's YouTube channel, but doesn't allow other types of access.
YOUTUBE_UPLOAD_SCOPE = "https://www.googleapis.com/auth/youtube.upload"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

# This variable defines a message to display if the CLIENT_SECRETS_FILE is
# missing.
MISSING_CLIENT_SECRETS_MESSAGE = """
WARNING: Please configure OAuth 2.0

To make this sample run you will need to populate the client_secrets.json file
found at:

   %s

with information from the Developers Console
https://console.developers.google.com/

For more information about the client_secrets.json file format, please visit:
https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
""" % os.path.abspath(os.path.join(os.path.dirname(__file__),
                                   CLIENT_SECRETS_FILE))

VALID_PRIVACY_STATUSES = ("public", "private", "unlisted")


def get_authenticated_service(args):
    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE,
                                   scope=YOUTUBE_UPLOAD_SCOPE,
                                   message=MISSING_CLIENT_SECRETS_MESSAGE)

    storage = Storage("%s-oauth2.json" % sys.argv[0])
    credentials = storage.get()

    if credentials is None or credentials.invalid:
        credentials = run_flow(flow, storage, args)

    return build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                 http=credentials.authorize(httplib2.Http()))


def initialize_upload(youtube, options):
    tags = None
    if options.keywords:
        tags = options.keywords.split(",")

    body = dict(
        snippet=dict(
            title=options.title,
            description=options.description,
            tags=tags,
            categoryId=options.category
        ),
        status=dict(
            privacyStatus=options.privacyStatus
        )
    )

    # Call the API's videos.insert method to create and upload the video.
    insert_request = youtube.videos().insert(
        part=",".join(body.keys()),
        body=body,
        # The chunksize parameter specifies the size of each chunk of data, in
        # bytes, that will be uploaded at a time. Set a higher value for
        # reliable connections as fewer chunks lead to faster uploads. Set a lower
        # value for better recovery on less reliable connections.
        #
        # Setting "chunksize" equal to -1 in the code below means that the entire
        # file will be uploaded in a single HTTP request. (If the upload fails,
        # it will still be retried where it left off.) This is usually a best
        # practice, but if you're using Python older than 2.6 or if you're
        # running on App Engine, you should set the chunksize to something like
        # 1024 * 1024 (1 megabyte).
        media_body=MediaFileUpload(options.file, chunksize=-1, resumable=True)
    )

    resumable_upload(insert_request)


# This method implements an exponential backoff strategy to resume a
# failed upload.
def resumable_upload(insert_request):
    response = None
    error = None
    retry = 0
    while response is None:
        try:
            print("Uploading file...")
            status, response = insert_request.next_chunk()
            if 'id' in response:
                print("Video id '%s' was successfully uploaded." % response['id'])
            else:
                exit("The upload failed with an unexpected response: %s" % response)
        except HttpError as e:
            if e.resp.status in RETRIABLE_STATUS_CODES:
                error = "A retriable HTTP error %d occurred:\n%s" % (e.resp.status,
                                                                     e.content)
            else:
                raise
        except RETRIABLE_EXCEPTIONS as e:
            error = "A retriable error occurred: %s" % e

        if error is not None:
            print(error)
            retry += 1
            if retry > MAX_RETRIES:
                exit("No longer attempting to retry.")

            max_sleep = 2 ** retry
            sleep_seconds = random.random() * max_sleep
            print("Sleeping %f seconds and then retrying..." % sleep_seconds)
            time.sleep(sleep_seconds)


if __name__ == '__main__':
    argparser.add_argument("--file", required=True, help="Video file to upload")
    argparser.add_argument("--title", help="Video title", default="Test Title")
    argparser.add_argument("--description", help="Video description",
                           default="Test Description")
    argparser.add_argument("--category", default="22",
                           help="Numeric video category. " +
                                "See https://developers.google.com/youtube/v3/docs/videoCategories/list")
    argparser.add_argument("--keywords", help="Video keywords, comma separated",
                           default="")
    argparser.add_argument("--privacyStatus", choices=VALID_PRIVACY_STATUSES,
                           default=VALID_PRIVACY_STATUSES[0], help="Video privacy status.")
    args = argparser.parse_args()

    if not os.path.exists(args.file):
        exit("Please specify a valid file using the --file= parameter.")

    youtube = get_authenticated_service(args)
    try:
        initialize_upload(youtube, args)
    except HttpError as e:
        print("An HTTP error %d occurred:\n%s" % (e.resp.status, e.content))
The problem is the --description parameter: it only allows one line of text, and I need to enter several lines with line breaks ('\n'). Is it possible to do this in another way?
It would be wonderful if this parameter (or another one) accepted the path of a text file for the description, like the --file parameter does.
Is there something I can do to solve this?
Or maybe there is somewhere I can contact Google developers to ask whether the initialize_upload(youtube, args) function could be reimplemented to work the way I describe?
Yes, it is possible!
We have to add a --description-file option; a sketch of the idea follows below.
Google, please write a complete manual for your API!
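A hedged sketch of the idea (my illustration, not the author's actual patch): accept a text file and read it into options.description before the request body is built, so the newlines survive intact.
argparser.add_argument("--description-file", default=None,
                       help="Path to a text file containing the video description")
args = argparser.parse_args()

if args.description_file and os.path.exists(args.description_file):
    with open(args.description_file, "r") as f:
        args.description = f.read()  # keeps line breaks, unlike a shell argument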

Allow user to download ZIP from Django view

My main task is to have the user press a Download button and download file "A.zip" from the query directory.
The reason I have an elif request.POST..... is that I have another condition checking whether the "Execute" button was pressed; that button runs a script. Both POST actions work, and dir_file is C:\Data\Folder.
I followed and read many tutorials and answers on how to download a file from Django, and I cannot figure out why my simple code does not download a file.
What am I missing? The code does not return any errors. Does anybody have documentation that can explain what I am doing wrong?
I am expecting an automatic download of the file, but it does not occur.
elif request.POST['action'] == 'Download':
    query = request.POST['q']
    dir_file = query + "A.zip"
    zip_file = open(dir_file, 'rb')
    response = HttpResponse(zip_file, content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=%s' % 'foo_zip'
    zip_file.close()
I found my answer.
After reading through a lot of documentation about this, it turned out I had left out the most important aspect of this feature, which is the URL.
Basically, the function download_zip is called by the POST and runs the script where the zip is downloaded.
Here is what I ended up doing:
elif request.POST['action'] == 'Download':
    return HttpResponseRedirect('/App/download')
Created a view:
def download_zip(request):
    zip_path = root + "A.zip"
    zip_file = open(zip_path, 'rb')
    response = HttpResponse(zip_file, content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=%s' % 'A.zip'
    response['Content-Length'] = os.path.getsize(zip_path)
    zip_file.close()
    return response
Finally in urls.py:
url(r'^download/$', views.download_zip, name='download_zip'),
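For reference, a hedged alternative sketch (assuming Django 2.0+, where FileResponse accepts as_attachment and filename): let FileResponse stream the file and manage the headers instead of reading and closing it by hand.
from django.http import FileResponse

def download_zip(request):
    # FileResponse streams the file, sets Content-Length, and closes the
    # handle itself; as_attachment adds the Content-Disposition header.
    # "root" is the same base path used in the view above.
    return FileResponse(open(root + "A.zip", 'rb'),
                        as_attachment=True, filename='A.zip')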
