Hello community, I have a question for you about a Python function. In this function I read a large XML file and convert it to JSON. While doing so, I want to check whether there is an error tag at the first or last position within the XML and, if so, raise an error.
Unfortunately, the function sometimes does not seem to recognize this tag, and Apache Airflow then reports a success.
To work around this, I built a second function that checks the XML up front with BeautifulSoup.
But now the task always reports a failure.
Can you explain why the "old" check reports a success while the BeautifulSoup check aborts?
Should I combine the two (I sketch below, after the example XML, what I mean by combining them), or is there a more elegant solution in general?
def parse_large_xml(xml_file_name, json_file_name, extra_fields=None, clean_keys=True):
    """
    Converts SAP XML response to JSON.
    - processes the XML file iteratively (one tag at a time)
    - capable of dealing with large files
    Args:
        xml_file_name: input filename
        json_file_name: output filename
        extra_fields: extra fields to add to the JSON
        clean_keys: flag, if set remove prefixes from the response keys
    """
    ######### Extra check #####
    # This is the new extra check
    with open(xml_file_name, 'r') as xml_file:
        data = xml_file.read()
    if "error" in set(tag.name for tag in BeautifulSoup(data, 'xml').find_all()):
        errString = f"error in response file \"{xml_file_name}\" (XML contains error tag)"
        logging.error(msg=errString, exc_info=True, stack_info=True)
        raise RuntimeError(errString)
    ########################################################################################
    # This is the old check
    if extra_fields is None:
        extra_fields = {}
    with open(json_file_name, 'w') as json_file:
        for event, elem in progressbar.progressbar(ET.iterparse(xml_file_name, events=('start', 'end'))):
            if 'content' in elem.tag and event == 'end':
                elem_dict = xmltodict.parse(tostring(elem))
                if clean_keys:
                    elem_dict = clean_response_keys(elem_dict)
                response_dict = {'raw_json_response': json.dumps(elem_dict)}
                response_dict = {**extra_fields, **response_dict}
                response_dict['hash'] = hashlib.sha256(
                    response_dict['raw_json_response'].encode()).hexdigest()
                response_dict['date'] = get_scheduled_date()
                json.dump(response_dict, json_file)
                json_file.write('\n')
                elem.clear()
            elif 'error' in elem.tag and event == 'end':
                errString = f"error in response file \"{xml_file_name}\":\n{tostring(elem)}"
                logging.error(msg=errString, exc_info=True, stack_info=True)
                raise RuntimeError(errString)
I'm using Apache Airflow 1.10.15 and Composer 1.16.10, as well as Python 3.
Here is an example of an error XML as it is returned but not recognized:
<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <code>
    DBSQL_CONNECTION_NO_METADATA
  </code>
  <message>
    Runtime Error: 'DBSQL_CONNECTION_NO_METADATA'. The OData request processing has been abnormal terminated. If "Runtime Error" is not initial, launch transaction ST22 for details and analysis. Otherwise, launch transaction SM21 for system log analysis.
  </message>
  <timestamp>
    20220210031242
  </timestamp>
</error>
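To make the combination question concrete, this is roughly what I imagine a combined check could look like: one streaming pre-check that only inspects the root tag before the main conversion runs. This is only a sketch I have not tried in the DAG yet, and the helper name is made up:

    # Rough sketch of "combining" the two checks: a streaming pre-check
    # that only looks at the document's root tag. Untested, illustrative only.
    import xml.etree.ElementTree as ET

    def root_tag_is_error(xml_file_name):
        # iterparse yields the root element first on its 'start' event,
        # so only the very first event has to be inspected.
        event, root = next(ET.iterparse(xml_file_name, events=('start',)))
        # The tag carries the namespace, e.g.
        # '{http://schemas.microsoft.com/ado/2007/08/dataservices/metadata}error'
        return root.tag == 'error' or root.tag.endswith('}error')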
I have been working for a little while now on a Python program that can take channel or video URLs and convert them into a channel ID.
However, my code doesn't seem to work with links that look like "http://youtube.com/#username".
if re.search("/channel/", channelURL) or re.search("#", channelURL) or re.search("/user/", channelURL) or re.search("/c/", channelURL):
    # This code detects if the given URL is a channel. If the check comes back as True then it grabs the data using pytube.
    c = Channel(channelURL)
    channel_name = c.channel_name
    channel_id = c.channel_id
    channel_id_link = "http://youtube.com/channel/" + channel_id
    print("Channel Name: " + channel_name)
    print("Channel ID: " + channel_id)
    print("Channel Link: " + channel_id_link)
You can see the full code here: https://github.com/flyinggoatman/YouTube-Link-Extractor/blob/master/QualityYouTube.py
What I expect: the code should be able to pull the channel_name, the channel_id, and the channel_id_link.
What happens?
The code runs, but when I enter a "#" YouTube channel URL it returns the following:
We have logged in as QualityYouTube Bot#2815
Using Discord channel: pending-channels
The bot has now fully booted up and may be used.
Please be advised this bot only supports one Discord server at a time. Future updates will allow for more than one server to be active at a time.
2022-12-30 02:20:50 ERROR discord.client Ignoring exception in on_message
Traceback (most recent call last):
  File "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 409, in _run_event
    await coro(*args, **kwargs)
  File "c:\Users[redacted]\test\QualityYouTube.py", line 101, in on_message
    c = Channel(channelURL)
  File "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\contrib\channel.py", line 24, in __init__
    self.channel_uri = extract.channel_name(url)
  File "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\extract.py", line 185, in channel_name
    raise RegexMatchError(
pytube.exceptions.RegexMatchError: channel_name: could not find match for patterns
I understand that the code won't run as it currently works. However, can I somehow take the URL and use the regex "#(.*)" to grab the username, and then use pytube to find a video made by that channel? I could then take that video URL and use it to get the information I need instead.
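Just to illustrate what I mean by grabbing the username with "#(.*)" (untested sketch):

    import re

    url = "http://youtube.com/#username"
    match = re.search("#(.*)", url)
    if match:
        username = match.group(1)  # -> "username"
        # the idea would then be to look up a video by this channel with pytube
        print(username)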
Note: I fixed the issue by adding new lines of code inside "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\extract.py".
https://github.com/pytube/pytube/pull/1444
I added the following lines:
- :samp:`https://youtube.com/#{channel_name}/*`
r"(?:(#[%\w\d_-]+)(.*)?)"
to this file in the PyTube GitHub repository:
https://github.com/pytube/pytube/blob/master/pytube/extract.py
The changes look like this...
def channel_name(url: str) -> str:
    """Extract the ``channel_name`` or ``channel_id`` from a YouTube url.
    This function supports the following patterns:
    - :samp:`https://youtube.com/c/{channel_name}/*`
    - :samp:`https://youtube.com/channel/{channel_id}/*`
    - :samp:`https://youtube.com/u/{channel_name}/*`
    - :samp:`https://youtube.com/user/{channel_id}/*`
    - :samp:`https://youtube.com/#{channel_name}/*`
    :param str url:
        A YouTube url containing a channel name.
    :rtype: str
    :returns:
        YouTube channel name.
    """
    patterns = [
        r"(?:\/(c)\/([%\d\w_\-]+)(\/.*)?)",
        r"(?:\/(channel)\/([%\w\d_\-]+)(\/.*)?)",
        r"(?:\/(u)\/([%\d\w_\-]+)(\/.*)?)",
        r"(?:\/(user)\/([%\w\d_\-]+)(\/.*)?)",
        r"(?:(#[%\w\d_-]+)(.*)?)"
    ]
    for pattern in patterns:
        regex = re.compile(pattern)
        function_match = regex.search(url)
        if function_match:
            logger.debug("finished regex search, matched: %s", pattern)
            uri_style = function_match.group(1)
            uri_identifier = function_match.group(2)
            return f'/{uri_style}/{uri_identifier}'
    raise RegexMatchError(
        caller="channel_name", pattern="patterns"
    )
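A quick way to sanity-check just the added pattern outside of pytube (illustrative snippet only, not part of the pull request):

    import re

    pattern = r"(?:(#[%\w\d_-]+)(.*)?)"
    match = re.search(pattern, "http://youtube.com/#username")
    if match:
        print(match.group(1))  # prints "#username"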
I'm currently learning from the examples in the official VTK documentation, but I'm running into a problem with vtkXMLDataParser in the I/O example.
First, here is the basic sample code:
# VTK 9 modular imports needed by this snippet (the official example also
# imports vtkInteractionStyle and vtkRenderingOpenGL2 for their side effects).
import vtkmodules.vtkInteractionStyle  # noqa
import vtkmodules.vtkRenderingOpenGL2  # noqa
from vtkmodules.vtkCommonColor import vtkNamedColors
from vtkmodules.vtkIOXML import vtkXMLPolyDataReader
from vtkmodules.vtkRenderingCore import (
    vtkActor,
    vtkPolyDataMapper,
    vtkRenderWindow,
    vtkRenderWindowInteractor,
    vtkRenderer,
)

def main():
    colors = vtkNamedColors()
    # filename = get_program_parameters()
    filename = "Torso.vtp"
    # Read all the data from the file
    reader = vtkXMLPolyDataReader()
    reader.SetFileName(filename)
    reader.Update()
    # Visualize
    mapper = vtkPolyDataMapper()
    mapper.SetInputConnection(reader.GetOutputPort())
    actor = vtkActor()
    actor.SetMapper(mapper)
    actor.GetProperty().SetColor(colors.GetColor3d('NavajoWhite'))
    renderer = vtkRenderer()
    renderWindow = vtkRenderWindow()
    renderWindow.AddRenderer(renderer)
    renderWindowInteractor = vtkRenderWindowInteractor()
    renderWindowInteractor.SetRenderWindow(renderWindow)
    renderer.AddActor(actor)
    renderer.SetBackground(colors.GetColor3d('DarkOliveGreen'))
    renderer.GetActiveCamera().Pitch(90)
    renderer.GetActiveCamera().SetViewUp(0, 0, 1)
    renderer.ResetCamera()
    renderWindow.SetSize(600, 600)
    renderWindow.Render()
    renderWindow.SetWindowName('ReadPolyData')
    renderWindowInteractor.Start()

if __name__ == '__main__':
    main()
The example requires the user to change the filename parameter to their own file path. I put the test file (.vtp) in the root directory, and the code does find the file correctly.
But when I run it, the output window does not show the corresponding picture, only the background. The output window appears together with an error window, and I don't quite understand the cause of the error or how to fix it.
The following are the error messages:
2023-01-20 00:01:06.185 ( 0.043s) [ ] vtkXMLParser.cxx:379 ERR| vtkXMLDataParser (000001C077E63030): Error parsing XML in stream at line 1, column 0, byte index 0: syntax error
2023-01-20 00:01:06.235 ( 0.093s) [ ] vtkXMLReader.cxx:521 ERR| vtkXMLPolyDataReader (000001C07A3DC8A0): Error parsing input file. ReadXMLInformation aborting.
2023-01-20 00:01:06.239 ( 0.097s) [ ] vtkExecutive.cxx:741 ERR| vtkCompositeDataPipeline (000001C077E82410): Algorithm vtkXMLPolyDataReader (000001C07A3DC8A0) returned failure for request: vtkInformation (000001C07A6B6930)
Debug: Off
Modified Time: 98
Reference Count: 1
Registered Events: (none)
Request: REQUEST_INFORMATION
FORWARD_DIRECTION: 0
ALGORITHM_AFTER_FORWARD: 1
First, what is the cause of the error, and how can I solve it?
Second, I have never touched the XML configuration of the VTK library; I have only just started using the VTK library (version 9.2). How can I solve this XML file problem?
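In case it helps to narrow things down: since the parser complains at line 1, column 0, one thing worth checking is what the .vtp file actually starts with. An XML-format .vtp file should normally begin with '<?xml' or '<VTKFile'. A tiny diagnostic sketch (nothing VTK-specific, path as in the example):

    # Print the first line of the file; if it does not look like XML,
    # that alone would explain the "syntax error at line 1, column 0".
    with open("Torso.vtp", "rb") as f:
        print(f.readline()[:80])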
I'm creating a simple Python function in Google Cloud Functions but cannot get it to save. It shows this error:
"Function failed on loading user code. This is likely due to a bug in the user code. Error message: Error: please examine your function logs to see the error cause: https://cloud.google.com/functions/docs/monitoring/logging#viewing_logs. Additional troubleshooting documentation can be found at https://cloud.google.com/functions/docs/troubleshooting#logging. Please visit https://cloud.google.com/functions/docs/troubleshooting for in-depth troubleshooting documentation."
The logs don't seem to show much that would indicate an error in the code. I followed this guide: https://blog.thereportapi.com/automate-a-daily-etl-of-currency-rates-into-bigquery/
The only differences are the environment variables and the endpoint I'm using.
The code is below; it is just a GET request followed by a push of the data into a table.
import requests
import json
import time
import os
from google.cloud import bigquery

# Set any default values for these variables if they are not found from Environment variables
PROJECT_ID = os.environ.get("PROJECT_ID", "xxxxxxxxxxxxxx")
EXCHANGERATESAPI_KEY = os.environ.get("EXCHANGERATESAPI_KEY", "xxxxxxxxxxxxxxx")
REGIONAL_ENDPOINT = os.environ.get("REGIONAL_ENDPOINT", "europe-west1")
DATASET_ID = os.environ.get("DATASET_ID", "currency_rates")
TABLE_NAME = os.environ.get("TABLE_NAME", "currency_rates")
BASE_CURRENCY = os.environ.get("BASE_CURRENCY", "SEK")
SYMBOLS = os.environ.get("SYMBOLS", "NOK,EUR,USD,GBP")

def hello_world(request):
    latest_response = get_latest_currency_rates()
    write_to_bq(latest_response)
    return "Success"

def get_latest_currency_rates():
    PARAMS = {'access_key': EXCHANGERATESAPI_KEY, 'symbols': SYMBOLS, 'base': BASE_CURRENCY}
    response = requests.get("https://api.exchangeratesapi.io/v1/latest", params=PARAMS)
    print(response.json())
    return response.json()

def write_to_bq(response):
    # Instantiates a client
    bigquery_client = bigquery.Client(project=PROJECT_ID)
    # Prepares a reference to the dataset
    dataset_ref = bigquery_client.dataset(DATASET_ID)
    table_ref = dataset_ref.table(TABLE_NAME)
    table = bigquery_client.get_table(table_ref)
    # get the current timestamp so we know how fresh the data is
    timestamp = time.time()
    jsondump = json.dumps(response)  # Returns a string
    # Ensure the Response is a String not JSON
    rows_to_insert = [{"timestamp": timestamp, "data": jsondump}]
    errors = bigquery_client.insert_rows(table, rows_to_insert)  # API request
    print(errors)
    assert errors == []
I tried just the part that does the GET request in an offline editor and can confirm that the response works fine. I suspect it might have something to do with permissions or the way the script tries to access the database.
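One way to separate a code-loading problem from a permissions problem might be to import the module locally, outside Cloud Functions: if the import itself fails (for example because of a missing dependency), that alone reproduces the "failed on loading user code" message and has nothing to do with BigQuery access. A minimal sketch, assuming the code above is saved as main.py:

    # Quick local check: importing the module runs all top-level code,
    # which is roughly what Cloud Functions does when it loads user code.
    import importlib

    module = importlib.import_module("main")  # assumes the function lives in main.py
    print("module loaded, entry point:", module.hello_world)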
I am using the Google Python script to upload videos.
#!/usr/bin/python

import http.client  # httplib
import httplib2
import os
import random
import sys
import time

from apiclient.discovery import build
from apiclient.errors import HttpError
from apiclient.http import MediaFileUpload
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client.tools import argparser, run_flow

# Explicitly tell the underlying HTTP transport library not to retry, since
# we are handling retry logic ourselves.
httplib2.RETRIES = 1

# Maximum number of times to retry before giving up.
MAX_RETRIES = 10

# Always retry when these exceptions are raised.
RETRIABLE_EXCEPTIONS = (httplib2.HttpLib2Error, IOError, http.client.NotConnected,
                        http.client.IncompleteRead, http.client.ImproperConnectionState,
                        http.client.CannotSendRequest, http.client.CannotSendHeader,
                        http.client.ResponseNotReady, http.client.BadStatusLine)

# Always retry when an apiclient.errors.HttpError with one of these status
# codes is raised.
RETRIABLE_STATUS_CODES = [500, 502, 503, 504]

# The CLIENT_SECRETS_FILE variable specifies the name of a file that contains
# the OAuth 2.0 information for this application, including its client_id and
# client_secret. You can acquire an OAuth 2.0 client ID and client secret from
# the Google Developers Console at
# https://console.developers.google.com/.
# Please ensure that you have enabled the YouTube Data API for your project.
# For more information about using OAuth2 to access the YouTube Data API, see:
# https://developers.google.com/youtube/v3/guides/authentication
# For more information about the client_secrets.json file format, see:
# https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
CLIENT_SECRETS_FILE = "client_secrets.json"

# This OAuth 2.0 access scope allows an application to upload files to the
# authenticated user's YouTube channel, but doesn't allow other types of access.
YOUTUBE_UPLOAD_SCOPE = "https://www.googleapis.com/auth/youtube.upload"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

# This variable defines a message to display if the CLIENT_SECRETS_FILE is
# missing.
MISSING_CLIENT_SECRETS_MESSAGE = """
WARNING: Please configure OAuth 2.0
To make this sample run you will need to populate the client_secrets.json file
found at:
   %s
with information from the Developers Console
https://console.developers.google.com/
For more information about the client_secrets.json file format, please visit:
https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
""" % os.path.abspath(os.path.join(os.path.dirname(__file__),
                                   CLIENT_SECRETS_FILE))

VALID_PRIVACY_STATUSES = ("public", "private", "unlisted")

def get_authenticated_service(args):
    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE,
                                   scope=YOUTUBE_UPLOAD_SCOPE,
                                   message=MISSING_CLIENT_SECRETS_MESSAGE)

    storage = Storage("%s-oauth2.json" % sys.argv[0])
    credentials = storage.get()

    if credentials is None or credentials.invalid:
        credentials = run_flow(flow, storage, args)

    return build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                 http=credentials.authorize(httplib2.Http()))
def initialize_upload(youtube, options):
    tags = None
    if options.keywords:
        tags = options.keywords.split(",")

    body = dict(
        snippet=dict(
            title=options.title,
            description=options.description,
            tags=tags,
            categoryId=options.category
        ),
        status=dict(
            privacyStatus=options.privacyStatus
        )
    )

    # Call the API's videos.insert method to create and upload the video.
    insert_request = youtube.videos().insert(
        part=",".join(body.keys()),
        body=body,
        # The chunksize parameter specifies the size of each chunk of data, in
        # bytes, that will be uploaded at a time. Set a higher value for
        # reliable connections as fewer chunks lead to faster uploads. Set a lower
        # value for better recovery on less reliable connections.
        #
        # Setting "chunksize" equal to -1 in the code below means that the entire
        # file will be uploaded in a single HTTP request. (If the upload fails,
        # it will still be retried where it left off.) This is usually a best
        # practice, but if you're using Python older than 2.6 or if you're
        # running on App Engine, you should set the chunksize to something like
        # 1024 * 1024 (1 megabyte).
        media_body=MediaFileUpload(options.file, chunksize=-1, resumable=True)
    )

    resumable_upload(insert_request)
# This method implements an exponential backoff strategy to resume a
# failed upload.
def resumable_upload(insert_request):
    response = None
    error = None
    retry = 0
    while response is None:
        try:
            print("Uploading file...")
            status, response = insert_request.next_chunk()
            if 'id' in response:
                print("Video id '%s' was successfully uploaded." % response['id'])
            else:
                exit("The upload failed with an unexpected response: %s" % response)
        except HttpError as e:
            if e.resp.status in RETRIABLE_STATUS_CODES:
                error = "A retriable HTTP error %d occurred:\n%s" % (e.resp.status,
                                                                     e.content)
            else:
                raise
        except RETRIABLE_EXCEPTIONS as e:
            error = "A retriable error occurred: %s" % e

        if error is not None:
            print(error)
            retry += 1
            if retry > MAX_RETRIES:
                exit("No longer attempting to retry.")

            max_sleep = 2 ** retry
            sleep_seconds = random.random() * max_sleep
            print("Sleeping %f seconds and then retrying..." % sleep_seconds)
            time.sleep(sleep_seconds)
if __name__ == '__main__':
    argparser.add_argument("--file", required=True, help="Video file to upload")
    argparser.add_argument("--title", help="Video title", default="Test Title")
    argparser.add_argument("--description", help="Video description",
                           default="Test Description")
    argparser.add_argument("--category", default="22",
                           help="Numeric video category. " +
                                "See https://developers.google.com/youtube/v3/docs/videoCategories/list")
    argparser.add_argument("--keywords", help="Video keywords, comma separated",
                           default="")
    argparser.add_argument("--privacyStatus", choices=VALID_PRIVACY_STATUSES,
                           default=VALID_PRIVACY_STATUSES[0], help="Video privacy status.")
    args = argparser.parse_args()

    if not os.path.exists(args.file):
        exit("Please specify a valid file using the --file= parameter.")

    youtube = get_authenticated_service(args)
    try:
        initialize_upload(youtube, args)
    except HttpError as e:
        print("An HTTP error %d occurred:\n%s" % (e.resp.status, e.content))
The problem is the --description parameter: it only allows a single line of text, and I need to include several lines with line breaks ('\n'). Is it possible to do this in another way?
It would be wonderful if this parameter (or another parameter) accepted a path to a text file for the description, the way the --file parameter does.
Is there something I can do to solve this?
Or is there somewhere I can contact the Google developers to ask whether the initialize_upload(youtube, args) function could be reimplemented to work the way I describe?
Yes, it is possible!
We have to add the --description-file option.
Google, please write a complete manual for your API!
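For anyone who needs the same thing, the change I mean looks roughly like this (sketch only; the option name --description-file is my own choice, and the snippet assumes the argparser block from the script above):

    # Add a --description-file option and, if it is given, read the whole file
    # into args.description before initialize_upload() builds the request body.
    argparser.add_argument("--description-file", dest="description_file",
                           help="Path to a text file with the video description")
    args = argparser.parse_args()

    if args.description_file:
        with open(args.description_file, "r") as f:
            args.description = f.read()  # may contain '\n' line breaks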
My main task is to have the user press a Download button and download the file "A.zip" from the query directory.
The reason I have an elif request.POST... is that I have another condition checking whether the "Execute" button was pressed. That Execute button runs a script. Both POST actions work, and dir_file is C:\Data\Folder.
I have followed and read many tutorials and answers on how to download a file from Django, and I cannot figure out why my simple code does not download a file.
What am I missing? The code does not return any errors. Does anybody have any documentation that can explain what I am doing wrong?
I expect an automatic download of the file, but it does not occur.
elif request.POST['action'] == 'Download':
    query = request.POST['q']
    dir_file = query + "A.zip"
    zip_file = open(dir_file, 'rb')
    response = HttpResponse(zip_file, content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=%s' % 'foo_zip'
    zip_file.close()
I found the answer.
After reading through a lot of documentation on this, I had left out the most important part of this feature, which is the URL.
Basically, the download_zip function is called by the POST and runs the view where the zip is downloaded.
Here is what I ended up doing:
elif request.POST['action'] == 'Download':
    return HttpResponseRedirect('/App/download')
Created a view:
def download_zip(request):
    zip_path = root + "A.zip"
    zip_file = open(zip_path, 'rb')
    response = HttpResponse(zip_file, content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=%s' % 'A.zip'
    response['Content-Length'] = os.path.getsize(zip_path)
    zip_file.close()
    return response
Finally, in urls.py:
url(r'^download/$', views.download_zip, name='download_zip'),
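As a side note, on newer Django versions (2.0+) the same view can be written with FileResponse, which sets Content-Length and the attachment header automatically. A minimal sketch using the same zip_path as above:

    from django.http import FileResponse

    def download_zip(request):
        zip_path = root + "A.zip"
        # FileResponse streams the file and closes it when the response is finished
        return FileResponse(open(zip_path, 'rb'),
                            as_attachment=True,
                            filename='A.zip')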