Pytube and the new @ channel URLs - python-3.x

I have been working on a Python program for a little while now that takes channel or video URLs and converts them into a channel ID.
However, my code doesn't seem to work with links that look like "http://youtube.com/@username".
if re.search("/channel/", channelURL) or re.search("@", channelURL) or re.search("/user/", channelURL) or re.search("/c/", channelURL):
    # Detect whether the given URL is a channel. If the check passes, grab the data using pytube.
    c = Channel(channelURL)
    channel_name = c.channel_name
    channel_id = c.channel_id
    channel_id_link = "http://youtube.com/channel/" + channel_id
    print("Channel Name: " + channel_name)
    print("Channel ID: " + channel_id)
    print("Channel Link: " + channel_id_link)
You can see the full code here: https://github.com/flyinggoatman/YouTube-Link-Extractor/blob/master/QualityYouTube.py
What I expect: the code should be able to pull the channel_name, channel_id and channel_id_link.
What happens?
The code runs, but when I enter an @ YouTube channel URL it returns the following:
We have logged in as QualityYouTube Bot#2815
Using Discord channel: pending-channels
The bot has now fully booted up and may be used.
Please be advised this bot only supports one Discord server at a time. Future updates will allow for more than one server to be active at a time.
2022-12-30 02:20:50 ERROR discord.client Ignoring exception in on_message
Traceback (most recent call last):
  File "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 409, in _run_event
    await coro(*args, **kwargs)
  File "c:\Users[redacted]\test\QualityYouTube.py", line 101, in on_message
    c = Channel(channelURL)
  File "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\contrib\channel.py", line 24, in __init__
    self.channel_uri = extract.channel_name(url)
  File "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\extract.py", line 185, in channel_name
    raise RegexMatchError(
pytube.exceptions.RegexMatchError: channel_name: could not find match for patterns
I understand that the code probably won't run as it currently stands. However, can I somehow take the URL and use the regex "@(.*)" to grab the username, and then use pytube to find a video made by that channel? I could then take the video URL and use that to get the information I need instead.

Note: I fixed the issue by adding new lines of code inside "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\extract.py"
https://github.com/pytube/pytube/pull/1444
I added the following lines of code
- :samp:`https://youtube.com/@{channel_name}/*`
r"(?:(@[%\w\d_-]+)(.*)?)"
to this file in the PyTube GitHub repository:
https://github.com/pytube/pytube/blob/master/pytube/extract.py
The changes look like this:
def channel_name(url: str) -> str:
    """Extract the ``channel_name`` or ``channel_id`` from a YouTube url.
    This function supports the following patterns:
    - :samp:`https://youtube.com/c/{channel_name}/*`
    - :samp:`https://youtube.com/channel/{channel_id}/*`
    - :samp:`https://youtube.com/u/{channel_name}/*`
    - :samp:`https://youtube.com/user/{channel_id}/*`
    - :samp:`https://youtube.com/@{channel_name}/*`
    :param str url:
        A YouTube url containing a channel name.
    :rtype: str
    :returns:
        YouTube channel name.
    """
    patterns = [
        r"(?:\/(c)\/([%\d\w_\-]+)(\/.*)?)",
        r"(?:\/(channel)\/([%\w\d_\-]+)(\/.*)?)",
        r"(?:\/(u)\/([%\d\w_\-]+)(\/.*)?)",
        r"(?:\/(user)\/([%\w\d_\-]+)(\/.*)?)",
        r"(?:(@[%\w\d_-]+)(.*)?)"
    ]
    for pattern in patterns:
        regex = re.compile(pattern)
        function_match = regex.search(url)
        if function_match:
            logger.debug("finished regex search, matched: %s", pattern)
            uri_style = function_match.group(1)
            uri_identifier = function_match.group(2)
            return f'/{uri_style}/{uri_identifier}'
    raise RegexMatchError(
        caller="channel_name", pattern="patterns"
    )
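As a standalone illustration of the idea from the question (grabbing the handle with a regex before handing anything to pytube), here is a minimal sketch. The function name extract_handle and the exact character class are assumptions made for this example and are not part of pytube; a handle is assumed to be "@" directly after a "/" followed by letters, digits, underscores, hyphens or periods.

```python
import re
from typing import Optional

# Hypothetical helper, not part of pytube: pulls "@handle" out of a channel URL.
# Assumption: a handle is "@" right after a "/" followed by letters, digits, "_", "-" or ".".
_HANDLE_RE = re.compile(r"/(@[\w.\-]+)")


def extract_handle(url: str) -> Optional[str]:
    """Return the "@handle" part of a YouTube channel URL, or None if absent."""
    match = _HANDLE_RE.search(url)
    return match.group(1) if match else None


print(extract_handle("https://youtube.com/@username/videos"))
print(extract_handle("https://youtube.com/channel/UC123"))
```

Anchoring the match on "/@" keeps everything after the handle (such as "/videos") out of the captured group, so the result can be checked or logged before it is passed on.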

Related

Apache Airflow did not recognize error in xml and report success

Hello Community, I have a question regarding a Python function. In this function I read a large XML file and convert it into JSON format. While doing so, I want to check whether there is an error tag in the XML at the first or last position and, if so, raise an error.
Unfortunately, the function sometimes doesn't seem to recognize this tag, and Apache Airflow then reports a success.
So I built a second check that runs in advance and inspects the XML with BeautifulSoup.
But now the task is always reported as failed.
Can you explain why the "old" check reports a success, but the BeautifulSoup check aborts?
Should I combine the two, or is there a more elegant solution in general?
def parse_large_xml(xml_file_name, json_file_name, extra_fields=None, clean_keys=True):
    """
    Converts SAP xml response to json.
    - processes the xml file iteratively (one tag at a time)
    - capable to deal with large files
    Args:
        xml_file_name: input filename
        json_file_name: output filename
        extra_fields: extra fields to add to the json
        clean_keys: flag, if set remove prefixes from the response keys
    """
    ######### Extra check #####
    # This is the new extra check
    with open(xml_file_name, 'r') as xMl_File:
        data = xMl_File.read()
    if "error" in set(tag.name for tag in BeautifulSoup(data, 'xml').find_all()):
        logging.info(tag.name)
        errorMsg= f"error in response file \"{xml_file_name} (XML contains error tag)"
        logging.error(msg=errString, exc_info=True, stack_info=True);
        raise RuntimeError(errString)
    ########################################################################################
    # This is the old Check
    if extra_fields is None:
        extra_fields = {}
    with open(json_file_name, 'w') as json_file:
        for event, elem in progressbar.progressbar(ET.iterparse(xml_file_name, events=('start', 'end'))):
            if 'content' in elem.tag and event == 'end':
                elem_dict = xmltodict.parse(tostring(elem))
                if clean_keys:
                    elem_dict = clean_response_keys(elem_dict)
                response_dict = {'raw_json_response': json.dumps(elem_dict)}
                response_dict = {**extra_fields, **response_dict}
                response_dict['hash'] = hashlib.sha256(
                    response_dict['raw_json_response'].encode()).hexdigest()
                response_dict['date'] = get_scheduled_date()
                json.dump(response_dict, json_file)
                json_file.write('\n')
                elem.clear()
            elif 'error' in elem.tag and event == 'end':
                errString = f"error in response file \"{xml_file_name}\":\n{tostring(elem)}"
                logging.error(msg=errString, exc_info=True, stack_info=True);
                raise RuntimeError(errString)
I´m using Apache Airflow 1.10.15 and composer: 1.16.10 as well as Python 3.
Here is an example error xml as it is returned but not recognized
<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
<code>
DBSQL_CONNECTION_NO_METADATA
</code>
<message>
Runtime Error: 'DBSQL_CONNECTION_NO_METADATA'. The OData request processing has been abnormal terminated. If "Runtime Error" is not initial, launch transaction ST22 for details and analysis. Otherwise, launch transaction SM21 for system log analysis.
</message>
<timestamp>
20220210031242
</timestamp>
</error>
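One way to check for the error element shown above, sketched under the assumption that the error always arrives as a namespaced element like this sample: with ElementTree, the tag of a namespaced element is reported as "{namespace}error", so comparing the local name after the closing brace is more precise than a substring test. The helper names local_name and find_error_element are hypothetical and not part of the original function, and the sample XML is shortened.

```python
import io
import xml.etree.ElementTree as ET

# Shortened version of the sample error response, for illustration only.
SAMPLE = b"""<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
<code>DBSQL_CONNECTION_NO_METADATA</code>
<message>Runtime Error</message>
</error>"""


def local_name(tag: str) -> str:
    """Strip a leading "{namespace}" from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def find_error_element(source):
    """Return the first element whose local name is "error", else None."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "error":
            return elem
    return None


error = find_error_element(io.BytesIO(SAMPLE))
# Compare against None explicitly: an Element without children is falsy.
if error is not None:
    print("error element found:", error.tag)
```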

I can't seem to make the google.cloud.texttospeech to work

I'm using Python 3.8 and I copy-pasted this code as a test.
from google.cloud import texttospeech
# Instantiates a client
client = texttospeech.TextToSpeechClient()
# Set the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(text="Hello, World!")
# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)
# Select the type of audio file you want returned
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)
# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)
# The response's audio_content is binary.
with open("output.mp3", "wb") as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
This is the code shown by Google, as can be seen here: GOOGLE LINK
Now my problem is that I get this error:
PS C:\Users\User\Desktop> & C:/Users/User/AppData/Local/Programs/Python/Python38/python.exe "c:/Users/User/Desktop/from google.cloud import texttospeech.py"
Traceback (most recent call last):
File "c:/Users/User/Desktop/from google.cloud import texttospeech.py", line 7, in <module>
synthesis_input = texttospeech.types.SynthesisInput(text="Hello, World!")
AttributeError: module 'google.cloud.texttospeech' has no attribute 'types'
PS C:\Users\User\Desktop>
I tried changing this to add the credentials inside the code, but the problem persists.
This is the line I changed:
client = texttospeech.TextToSpeechClient(credentials="VoiceAutomated-239f1c05600c.json")
I could solve this error by downgrading the library:
pip3 install "google-cloud-texttospeech<2.0.0"
I got the same error when running that script. I checked the source code and the interface has changed; basically, you need to delete all "enums" and "types". It will look similar to this:
# Instantiates a client
client = texttospeech.TextToSpeechClient()
# Set the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(text="Hello, World!")
# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.VoiceSelectionParams(
    language_code='en-US',
    ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL)
# Select the type of audio file you want returned
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3)
# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(input=synthesis_input, voice=voice, audio_config=audio_config)
# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
I debugged the code, and to get it to work I had to write enums and types where needed. Taking the Google text-to-speech documentation example and including some small adjustments:
"""Synthesizes speech from the input string of text or ssml.
Note: ssml must be well-formed according to:
https://www.w3.org/TR/speech-synthesis/
"""
from google.cloud import texttospeech
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "./config/credentials.json"
# Instantiates a client
client = texttospeech.TextToSpeechClient()
# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Hello, World!")
# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.enums.SsmlVoiceGender.NEUTRAL
)
# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3
)
# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input_=synthesis_input, voice=voice, audio_config=audio_config
)
# The response's audio_content is binary.
with open("./output_tts/output.mp3", "wb") as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
Hope this works for you.
It will work with Python 3.6, but it won't work with Python 3.7 with the latest update of google-cloud-texttospeech. If you want to use it with Python 3.7, try the code below.
from google.cloud import texttospeech
def foo():
    # your_google_creds_here and Text are placeholders to fill in
    client = texttospeech.TextToSpeechClient(credentials=your_google_creds_here)
    translated_text = Text
    synthesis_input = texttospeech.types.SynthesisInput(text=translated_text)
    pitch = 1
    speaking_rate = 1
    lang_code = 'en-us'  # your_lang_code_here
    gender = 'male'
    gender_data = {
        'NEUTRAL': texttospeech.enums.SsmlVoiceGender.NEUTRAL,
        'FEMALE': texttospeech.enums.SsmlVoiceGender.FEMALE,
        'MALE': texttospeech.enums.SsmlVoiceGender.MALE
    }
    voice = texttospeech.types.VoiceSelectionParams(language_code=lang_code, ssml_gender=gender_data[gender.upper()])
    audio_config = texttospeech.types.AudioConfig(
        audio_encoding=texttospeech.enums.AudioEncoding.MP3, speaking_rate=float(speaking_rate), pitch=float(pitch)
    )
    print('Voice config and Audio config : ', voice, audio_config)
    response = client.synthesize_speech(
        synthesis_input, voice, audio_config)
You need to migrate to version 2.0. Visit the site below for details on the changes you need to make, since you most likely followed a tutorial using an older version of texttospeech:
https://googleapis.dev/python/texttospeech/2.0.0/UPGRADING.html
I will also include an example using the beta version of 2.0.0.
import google.cloud.texttospeech_v1beta1 as ts
import time
nm = "en-US-Wavenet-I"
hz = 48000
def useTextToSpeech(speaking, lang, speed, stinger):
    client = ts.TextToSpeechClient()
    synthesis_input = ts.SynthesisInput(text=speaking)
    voice = ts.VoiceSelectionParams(
        language_code=lang,
        ssml_gender=ts.SsmlVoiceGender.MALE,
        name=nm,
    )
    audio_config = ts.AudioConfig(
        audio_encoding=ts.AudioEncoding.OGG_OPUS,
        speaking_rate=speed,
        pitch=1.2,
        sample_rate_hertz=hz,
        effects_profile_id=['headphone-class-device'],
    )
    response = client.synthesize_speech(
        request={
            "input": synthesis_input,
            "voice": voice,
            "audio_config": audio_config
        }
    )
    with open((stinger+'.opus'), 'wb') as out:
        out.write(response.audio_content)
        print('Audio content written to file as "'+stinger+'.opus"')
    from playsound import playsound
    import os
    #playsound(os.path.abspath((stinger+'.opus')))
output = str("Make sure when you follow tutorials they are using the most up to date version of the Api!")
useTextToSpeech(output, "en-US-Wavenet-I", 1.0, ("example"+str(1)))

Trouble sending a batch create entity request in dialogflow

I have defined the following function. Its purpose is to make a batch create entity request with the dialogflow client. I am using this method because sending many individual requests did not scale well.
The problem seems to be the line that defines EntityType. It seems that "entityTypes" is not valid, but that is what is in the dialogflow v2 documentation, which is the version I am currently using.
Any ideas on what the issue is?
def create_batch_entity_types(self):
    client = self.get_entity_client()
    print(DialogFlowClient.batch_list)
    EntityType = {
        "entityTypes": DialogFlowClient.batch_list
    }
    response = client.batch_update_entity_types(parent=AGENT_PATH, entity_type_batch_inline=EntityType)
    def callback(operation_future):
        # Handle result.
        result = operation_future.result()
        print(result)
    response.add_done_callback(callback)
After running the function I received this error
Traceback (most recent call last):
File "df_client.py", line 540, in <module>
create_entity_types_from_database()
File "df_client.py", line 426, in create_entity_types_from_database
df.create_batch_entity_types()
File "/Users/andrewflorial/Documents/PROJECTS/curlbot/dialogflow/dialogflow_accessor.py", line 99, in create_batch_entity_types
response = client.batch_update_entity_types(parent=AGENT_PATH, entity_type_batch_inline=EntityType)
File "/Users/andrewflorial/Documents/PROJECTS/curlbot/venv/lib/python3.7/site-packages/dialogflow_v2/gapic/entity_types_client.py", line 767, in batch_update_entity_types
update_mask=update_mask,
ValueError: Protocol message EntityTypeBatch has no "entityTypes" field.
The argument for entity_type_batch_inline must have the same form as EntityTypeBatch.
See what that type looks like: https://dialogflow-python-client-v2.readthedocs.io/en/latest/gapic/v2/types.html#dialogflow_v2.types.EntityTypeBatch
It has to have an entity_types field, not entityTypes.
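A minimal sketch of the corrected payload, assuming batch_list holds entity type dicts. The helper name build_entity_type_batch and the sample entity type are hypothetical and only for illustration; the one detail taken from the answer is the snake_case entity_types key. The Dialogflow call itself is left as a comment because it needs credentials and an agent.

```python
def build_entity_type_batch(batch_list):
    """Wrap a list of entity type dicts in the shape of EntityTypeBatch."""
    # The protobuf field is snake_case: "entity_types", not "entityTypes".
    return {"entity_types": list(batch_list)}


# Hypothetical sample data for illustration only.
sample_batch_list = [
    {
        "display_name": "color",
        "entities": [{"value": "red", "synonyms": ["red", "crimson"]}],
    }
]

entity_type_batch = build_entity_type_batch(sample_batch_list)
print(entity_type_batch)

# With a configured client, the request from the question would then become:
# response = client.batch_update_entity_types(
#     parent=AGENT_PATH, entity_type_batch_inline=entity_type_batch
# )
```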

twilio/python/flask: Timeouts for TWIMLET-implemented voice mail?

I'm using a python3 flask REST-ful application to control my twilio-based phone services. Everything is working very well, but I have a question for which I haven't been able to find an answer.
When I want to redirect a caller to voicemail, I call the following voice_mail function from my REST interface, and I manage the voicemail via a twimlet, as follows ...
def voice_mail():
    vmbase = 'http://twimlets.com/voicemail?Email=USER@DOMAIN.COM&Transcribe=False'
    vmurl = '... URL pointing to an mp3 with my voicemail greeting ...'
    return redirect(
        '{}&Message={}'.format(vmbase, vmurl),
        302
    )
This works fine, but there doesn't seem to be any way to control how long the caller's voicemail message can last. I'd like to put an upper limit on that duration.
Is there any way via this twimlet (or perhaps a different twimlet) to force the voicemail recording to be cut off after a configurable amount of time?
If not, is there a default maximum duration of this twimlet-based voicemail recording?
And if neither of these are available, can someone point me to a python-based way of using other twilio features to ...
1. Route the caller to voice mail.
2. Play a specified recording.
3. Capture the caller's message.
4. Cut off the caller's recording after a configurable number of seconds.
5. Email a notification of the call and the URL of the recorded message to a specified email address.
I know that the existing twimlet performs items 1, 2, 3, and 5, but I don't know how to implement item 4.
Thank you.
Given that the voicemail twimlet doesn't allow a recording time limit to be specified, I was able to solve this without a twimlet in the following manner. And the sending of email turns out to be simple via the flask_mail package.
The following code fragments show how I did it ...
import phonenumbers
from flask import Flask, request, Response, url_for, send_file
from flask_mail import Mail, Message
from twilio.twiml.voice_response import VoiceResponse
app = Flask(__name__)
mail_settings = {
    'MAIL_SERVER'   : 'mailserver.example.com',
    'MAIL_PORT'     : 587,
    'MAIL_USE_TLS'  : False,
    'MAIL_USE_SSL'  : False,
    'MAIL_USERNAME' : 'USERNAME',
    'MAIL_PASSWORD' : 'PASSWORD'
}
app.config.update(mail_settings)
email = Mail(app)
# ... etc. ...
def voice_mail():
    vmurl = '... URL pointing to an mp3 with my voicemail greeting ...'
    resp = VoiceResponse()
    resp.play(vmurl, loop=1)
    resp.record(
        timeout=5,
        action=url_for('vmdone'),
        method='GET',
        maxLength=30,  # maximum recording length
        playBeep=True
    )
    return Response(str(resp), 200, mimetype='application/xml')
@app.route('/vmdone', methods=['GET', 'POST'])
def vmdone():
    resp = VoiceResponse()
    rcvurl = request.args.get('RecordingUrl', None)
    rcvtime = request.args.get('RecordingDuration', None)
    rcvfrom = request.args.get('From', None)
    if not rcvurl or not rcvtime or not rcvfrom:
        resp.hangup()
        return Response(str(resp), 200, mimetype='application/xml')
    rcvurl = '{}.mp3'.format(rcvurl)
    rcvfrom = phonenumbers.format_number(
        phonenumbers.parse(rcvfrom, None),
        phonenumbers.PhoneNumberFormat.NATIONAL
    )
    msg = Message(
        'Voicemail',
        sender='sender@example.com',
        recipients=['recipient@example.com']
    )
    msg.html = '''
<html>
<body>
<p>Voicemail from {0}</p>
<p>Duration: {1} sec</p>
<p>Message: {2}</p>
</body>
</html>
'''.format(rcvfrom, rcvtime, rcvurl)
    email.send(msg)
    return Response(str(resp), 200, mimetype='application/xml')
Also, it would be nice if someone could add a parameter to the voicemail twimlet for specifying the maximum recording duration. It should be as simple as using that parameter's value for setting maxLength in the arguments to the record() verb.
If someone could point me to the twimlets source code, I'm willing to write that logic myself.

youtube-dl taking only first letter of URL?

StackOverflow!
I am having some problems with youtube-dl. I had this working recently, but I made some changes to it, and it now refuses to work. Here is my code:
import youtube_dl
import os
class InvalidURL(Exception):
    pass
class SongExists(Exception):
    pass
def download(url):
    try:
        options = {
            'format': 'bestaudio/best',
            'quiet': False,
            'extractaudio': True,  # only keep the audio
            'audioformat': "wav",  # convert to wav
            'outtmpl': '%(id)s.wav',  # name the file the ID of the video
            'noplaylist': True,  # only download single song, not playlist
        }
        with youtube_dl.YoutubeDL(options) as ydl:
            r = ydl.extract_info(url ,download=False)
            if os.path.isfile(str(r["id"])):
                raise SongExists('This song has already been requested.')
            print("Downloading", r["title"])
            print(str(url))
            ydl.download(url)
            print("Downloaded", r["title"])
            return r["title"], r["id"]
    except youtube_dl.utils.DownloadError:
        raise InvalidURL('This URL is invalid.')
if __name__ == "__main__":
    download("https://www.youtube.com/watch?v=nvHyII4Dq-A")
As far as I can see, it appears that my script is taking the first letter out of my URL. Does anyone know why? As a "Side-quest", does anyone know how to take a search instead of a URL?
Here is my output:
[youtube] nvHyII4Dq-A: Downloading webpage
[youtube] nvHyII4Dq-A: Downloading video info webpage
[youtube] nvHyII4Dq-A: Extracting video information
WARNING: unable to extract uploader nickname
[youtube] nvHyII4Dq-A: Downloading MPD manifest
Downloading MagnusTheMagnus - Area
https://www.youtube.com/watch?v=nvHyII4Dq-A
ERROR: 'h' is not a valid URL. Set --default-search "ytsearch" (or run youtube-dl "ytsearch:h" ) to search YouTube
Traceback (most recent call last):
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\YoutubeDL.py", line 776, in extract_info
ie_result = ie.extract(url)
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\extractor\common.py", line 433, in extract
ie_result = self._real_extract(url)
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\extractor\generic.py", line 1993, in _real_extract
% (url, url), expected=True)
youtube_dl.utils.ExtractorError: 'h' is not a valid URL. Set --default-search "ytsearch" (or run youtube-dl "ytsearch:h" ) to search YouTube
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/tyler/PycharmProjects/PythonSpeakerThing/downloader.py", line 26, in download
ydl.download(url)
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\YoutubeDL.py", line 1958, in download
url, force_generic_extractor=self.params.get('force_generic_extractor', False))
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\YoutubeDL.py", line 799, in extract_info
self.report_error(compat_str(e), e.format_traceback())
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\YoutubeDL.py", line 604, in report_error
self.trouble(error_message, tb)
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\YoutubeDL.py", line 574, in trouble
raise DownloadError(message, exc_info)
youtube_dl.utils.DownloadError: ERROR: 'h' is not a valid URL. Set --default-search "ytsearch" (or run youtube-dl "ytsearch:h" ) to search YouTube
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/tyler/PycharmProjects/PythonSpeakerThing/downloader.py", line 33, in <module>
download("https://www.youtube.com/watch?v=nvHyII4Dq-A")
File "C:/Users/tyler/PycharmProjects/PythonSpeakerThing/downloader.py", line 30, in download
raise InvalidURL('This URL is invalid.')
__main__.InvalidURL: This URL is invalid.
As documented, the ydl.download function takes a list of URLs. So instead of
ydl.download(url)
you want to call
ydl.download([url])
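To see why the error message mentions 'h', here is a small sketch that does not use youtube-dl at all. The function fake_download is a hypothetical stand-in that, like any function expecting a list of URLs, simply iterates over its argument. Iterating over a string yields its characters one by one, so the first "URL" it sees is the first letter.

```python
def fake_download(url_list):
    """Hypothetical stand-in for a function that expects a list of URLs."""
    return [url for url in url_list]


url = "https://www.youtube.com/watch?v=nvHyII4Dq-A"

# Passing a bare string: each character is treated as a separate "URL".
print(fake_download(url)[0])

# Passing a one-element list: the whole URL is treated as one item.
print(fake_download([url])[0])
```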
To run a search, first, look up the keyword by running youtube-dl --extractor-descriptions | grep search. For instance, the keyword for Soundcloud search is scsearch and the keyword for YouTube default search is ytsearch.
Then, simply pass the keyword and search terms, separated by a colon (:) as the URL.
For example, a URL of ytsearch:fluffy bunnies will find you the top video of fluffy bunnies on YouTube with the default search criteria.
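The search form described above can be sketched as follows. The helper build_search_query is hypothetical; it only builds the string, and the commented lines show where it would be passed to youtube-dl, which needs network access. The numbered variant (for example "ytsearch3:"), which asks for the first N results, is included on the assumption that your youtube-dl version supports it.

```python
def build_search_query(terms, keyword="ytsearch", max_results=None):
    """Build a youtube-dl search "URL" such as "ytsearch:fluffy bunnies"."""
    # An optional result count goes between the keyword and the colon.
    count = "" if max_results is None else str(int(max_results))
    return f"{keyword}{count}:{terms}"


query = build_search_query("fluffy bunnies")
print(query)

# With youtube_dl installed, the query is used like any other URL:
# import youtube_dl
# with youtube_dl.YoutubeDL({"noplaylist": True}) as ydl:
#     ydl.download([query])
```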
Here is a sample:
import youtube_dl
ydl_opts = {}  # default options; adjust as needed
sound_list = []
# bike sound
sound_list.append('https://www.youtube.com/watch?v=sRdRwHPjJPk')
# car sound
sound_list.append('https://www.youtube.com/watch?v=PPdNb-XQXR8')
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(sound_list)
