youtube-dl taking only first letter of URL? - python-3.x

StackOverflow!
I am having some problems with youtube-dl. I had this working recently, but I made some changes to it, and it now refuses to work. Here is my code:
import youtube_dl
import os

class InvalidURL(Exception):
    pass

class SongExists(Exception):
    pass

def download(url):
    try:
        options = {
            'format': 'bestaudio/best',
            'quiet': False,
            'extractaudio': True,  # only keep the audio
            'audioformat': "wav",  # convert to wav
            'outtmpl': '%(id)s.wav',  # name the file the ID of the video
            'noplaylist': True,  # only download single song, not playlist
        }
        with youtube_dl.YoutubeDL(options) as ydl:
            r = ydl.extract_info(url, download=False)
            if os.path.isfile(str(r["id"])):
                raise SongExists('This song has already been requested.')
            print("Downloading", r["title"])
            print(str(url))
            ydl.download(url)
            print("Downloaded", r["title"])
            return r["title"], r["id"]
    except youtube_dl.utils.DownloadError:
        raise InvalidURL('This URL is invalid.')

if __name__ == "__main__":
    download("https://www.youtube.com/watch?v=nvHyII4Dq-A")
As far as I can see, it appears that my script is taking the first letter out of my URL. Does anyone know why? As a "Side-quest", does anyone know how to take a search instead of a URL?
Here is my output:
[youtube] nvHyII4Dq-A: Downloading webpage
[youtube] nvHyII4Dq-A: Downloading video info webpage
[youtube] nvHyII4Dq-A: Extracting video information
WARNING: unable to extract uploader nickname
[youtube] nvHyII4Dq-A: Downloading MPD manifest
Downloading MagnusTheMagnus - Area
https://www.youtube.com/watch?v=nvHyII4Dq-A
ERROR: 'h' is not a valid URL. Set --default-search "ytsearch" (or run youtube-dl "ytsearch:h" ) to search YouTube
Traceback (most recent call last):
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\YoutubeDL.py", line 776, in extract_info
ie_result = ie.extract(url)
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\extractor\common.py", line 433, in extract
ie_result = self._real_extract(url)
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\extractor\generic.py", line 1993, in _real_extract
% (url, url), expected=True)
youtube_dl.utils.ExtractorError: 'h' is not a valid URL. Set --default-search "ytsearch" (or run youtube-dl "ytsearch:h" ) to search YouTube
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/tyler/PycharmProjects/PythonSpeakerThing/downloader.py", line 26, in download
ydl.download(url)
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\YoutubeDL.py", line 1958, in download
url, force_generic_extractor=self.params.get('force_generic_extractor', False))
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\YoutubeDL.py", line 799, in extract_info
self.report_error(compat_str(e), e.format_traceback())
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\YoutubeDL.py", line 604, in report_error
self.trouble(error_message, tb)
File "C:\Users\tyler\AppData\Local\Programs\Python\Python36-32\lib\site-packages\youtube_dl\YoutubeDL.py", line 574, in trouble
raise DownloadError(message, exc_info)
youtube_dl.utils.DownloadError: ERROR: 'h' is not a valid URL. Set --default-search "ytsearch" (or run youtube-dl "ytsearch:h" ) to search YouTube
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/tyler/PycharmProjects/PythonSpeakerThing/downloader.py", line 33, in <module>
download("https://www.youtube.com/watch?v=nvHyII4Dq-A")
File "C:/Users/tyler/PycharmProjects/PythonSpeakerThing/downloader.py", line 30, in download
raise InvalidURL('This URL is invalid.')
__main__.InvalidURL: This URL is invalid.

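As documented, the ydl.download function takes a list of URLs. A string is itself iterable, so when you pass a single URL string, download() loops over its characters and tries to treat the first one, 'h', as a URL. So instead of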
ydl.download(url)
you want to call
ydl.download([url])
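A minimal sketch of the fix, assuming you want to keep passing a single URL string around: the hypothetical helper `as_url_list` wraps a bare string in a list before it reaches `YoutubeDL.download()`. The `download` function here is a trimmed-down illustration rather than the asker's full version, and only the helper runs without youtube_dl installed.

```python
def as_url_list(urls):
    """Return urls as a list, wrapping a single URL string.

    YoutubeDL.download() loops over its argument, so a bare string would be
    iterated character by character ('h', 't', 't', ...).
    """
    if isinstance(urls, str):
        return [urls]
    return list(urls)


def download(url, options=None):
    """Fetch metadata, then download a single URL (requires youtube_dl)."""
    import youtube_dl  # imported here so as_url_list() works without it

    with youtube_dl.YoutubeDL(options or {}) as ydl:
        info = ydl.extract_info(url, download=False)
        ydl.download(as_url_list(url))  # a list, not the bare string
    return info["title"], info["id"]
```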
To run a search, first, look up the keyword by running youtube-dl --extractor-descriptions | grep search. For instance, the keyword for Soundcloud search is scsearch and the keyword for YouTube default search is ytsearch.
Then, simply pass the keyword and search terms, separated by a colon (:) as the URL.
For example, a URL of ytsearch:fluffy bunnies will find you the top video of fluffy bunnies on YouTube with the default search criteria.
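The same keyword works from Python. This sketch (the helper names `search_query` and `first_search_result` are hypothetical) builds the `keyword:terms` string, optionally with a result count such as `ytsearch5:`, and passes it to `extract_info`, which returns a playlist-like dict whose `entries` list holds the matching videos.

```python
def search_query(terms, keyword="ytsearch", count=None):
    """Build a youtube-dl search 'URL' such as 'ytsearch:fluffy bunnies'.

    With count=5 the prefix becomes 'ytsearch5:', asking for five results.
    """
    prefix = keyword if count is None else f"{keyword}{count}"
    return f"{prefix}:{terms}"


def first_search_result(terms, keyword="ytsearch"):
    """Return the info dict of the top search hit, or None (needs youtube_dl)."""
    import youtube_dl  # imported here so search_query() works without it

    with youtube_dl.YoutubeDL({"quiet": True}) as ydl:
        info = ydl.extract_info(search_query(terms, keyword), download=False)
    # A search resolves to a playlist-like dict; the hits live in "entries".
    entries = info.get("entries") or []
    return entries[0] if entries else None
```

Alternatively, setting `'default_search': 'ytsearch'` in the options dict makes youtube-dl treat plain words as a search, which is the Python counterpart of the `--default-search` flag mentioned in the error message.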

Here is a sample:
import youtube_dl

ydl_opts = {'format': 'bestaudio/best'}  # example options; use your own
sound_list = []
# bike sound
sound_list.append('https://www.youtube.com/watch?v=sRdRwHPjJPk')
# car sound
sound_list.append('https://www.youtube.com/watch?v=PPdNb-XQXR8')
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(sound_list)  # download() expects a list of URLs

Related

Pytube and the new # channel urls

I have been working on a python program for a little bit now that can take channel or video urls and convert them into a channel ID.
However, my code doesn't seem to work with links that look like "http://youtube.com/#username".
import re

from pytube import Channel

if re.search("/channel/", channelURL) or re.search("#", channelURL) or re.search("/user/", channelURL) or re.search("/c/", channelURL):
    # This code detects if the given URL is a channel. If the check comes back as True then it grabs the data using pytube.
    c = Channel(channelURL)
    channel_name = c.channel_name
    channel_id = c.channel_id
    channel_id_link = "http://youtube.com/channel/" + channel_id
    print("Channel Name: " + channel_name)
    print("Channel ID: " + channel_id)
    print("Channel Link: " + channel_id_link)
You can see the full code here. https://github.com/flyinggoatman/YouTube-Link-Extractor/blob/master/QualityYouTube.py
What I expect: the code should be able to pull the channel_name, the channel_id, and the channel_id_link.
What happens?
The code runs, but when I enter a # YouTube channel URL it returns the following:
We have logged in as QualityYouTube Bot#2815
Using Discord channel: pending-channels
The bot has now fully booted up and may be used.
Please be advised this bot only supports one Discord server at a time. Future updates will allow for more than one server to be active at a time.
2022-12-30 02:20:50 ERROR discord.client Ignoring exception in on_message
Traceback (most recent call last):
File "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 409, in _run_event
await coro(*args, **kwargs)
File "c:\Users[redacted]\test\QualityYouTube.py", line 101, in on_message
c = Channel(channelURL)
File "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\contrib\channel.py", line 24, in __init__
self.channel_uri = extract.channel_name(url)
File "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\extract.py", line 185, in channel_name
raise RegexMatchError(
pytube.exceptions.RegexMatchError: channel_name: could not find match for patterns
I understand the code won't run as it is currently written. However, can I somehow take the URL and use the regex "#(.*)" to grab the username, and then use pytube to find a video made by that channel? I could then take the video URL and use it to get the information I need instead.
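Regarding the regex idea: a rough sketch, assuming the handle is whatever follows `#` up to the next `/` or `?`. A narrower character class than `(.*)` keeps trailing path segments such as `/videos` out of the captured name. `extract_handle` and `HANDLE_RE` are hypothetical names, and the allowed character set is an assumption.

```python
import re
from typing import Optional

# Assumed handle alphabet: word characters, dots and hyphens. Anything else
# (such as "/" or "?") ends the match.
HANDLE_RE = re.compile(r"#([\w.\-]+)")


def extract_handle(url: str) -> Optional[str]:
    """Return the text after '#' in a handle-style channel URL, or None."""
    match = HANDLE_RE.search(url)
    return match.group(1) if match else None
```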
Note: I fixed the issue by adding new lines of code to "C:\Users[redacted]\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\extract.py"
https://github.com/pytube/pytube/pull/1444
I added the following lines of code:
- :samp:`https://youtube.com/#{channel_name}/*`
r"(?:(#[%\w\d_-]+)(.*)?)"
to this file in the PyTube GitHub repository:
https://github.com/pytube/pytube/blob/master/pytube/extract.py
The changes look like this...
import logging
import re

from pytube.exceptions import RegexMatchError

logger = logging.getLogger(__name__)


def channel_name(url: str) -> str:
    """Extract the ``channel_name`` or ``channel_id`` from a YouTube url.

    This function supports the following patterns:

    - :samp:`https://youtube.com/c/{channel_name}/*`
    - :samp:`https://youtube.com/channel/{channel_id}/*`
    - :samp:`https://youtube.com/u/{channel_name}/*`
    - :samp:`https://youtube.com/user/{channel_id}/*`
    - :samp:`https://youtube.com/#{channel_name}/*`

    :param str url:
        A YouTube url containing a channel name.
    :rtype: str
    :returns:
        YouTube channel name.
    """
    patterns = [
        r"(?:\/(c)\/([%\d\w_\-]+)(\/.*)?)",
        r"(?:\/(channel)\/([%\w\d_\-]+)(\/.*)?)",
        r"(?:\/(u)\/([%\d\w_\-]+)(\/.*)?)",
        r"(?:\/(user)\/([%\w\d_\-]+)(\/.*)?)",
        r"(?:(#[%\w\d_-]+)(.*)?)"
    ]
    for pattern in patterns:
        regex = re.compile(pattern)
        function_match = regex.search(url)
        if function_match:
            logger.debug("finished regex search, matched: %s", pattern)
            uri_style = function_match.group(1)
            uri_identifier = function_match.group(2)
            return f'/{uri_style}/{uri_identifier}'

    raise RegexMatchError(
        caller="channel_name", pattern="patterns"
    )
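To see what the modified `channel_name` returns for each URL style, the pattern list can be exercised on its own. This sketch copies the patterns from the pytube change but drops pytube's logger and exception (it raises `ValueError` instead), so it runs without pytube; `channel_uri` is a hypothetical name.

```python
import re

# Same patterns as the modified pytube.extract.channel_name, in the same order.
PATTERNS = [
    r"(?:\/(c)\/([%\d\w_\-]+)(\/.*)?)",
    r"(?:\/(channel)\/([%\w\d_\-]+)(\/.*)?)",
    r"(?:\/(u)\/([%\d\w_\-]+)(\/.*)?)",
    r"(?:\/(user)\/([%\w\d_\-]+)(\/.*)?)",
    r"(?:(#[%\w\d_-]+)(.*)?)",
]


def channel_uri(url):
    """Return '/<group 1>/<group 2>' for the first pattern that matches url."""
    for pattern in PATTERNS:
        match = re.search(pattern, url)
        if match:
            return f"/{match.group(1)}/{match.group(2)}"
    raise ValueError(f"no pattern matched {url!r}")
```

One thing this check makes visible: in the `#` pattern, group 1 already includes the `#` and group 2 is the rest of the path rather than an identifier, so a URL with extra segments such as `/videos` comes back with a doubled slash.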

Cannot use custom key words with pvporcupine

I have already created a Picovoice account and received an access key, but when I pass the path of my .ppn file, I get this error:
[ERROR] loading keyword file at 'C:\Users\Priyam\Desktop\hey keyazip' failed with 'IO_ERROR'
Traceback (most recent call last):
File "e:\Personal Project\import struct.py", line 13, in <module>
porcupine = pvporcupine.create(access_key='access key',
File "C:\Users\Priyam\AppData\Roaming\Python\Python310\site-packages\pvporcupine\__init__.py", line 77, in create
return Porcupine(
File "C:\Users\Priyam\AppData\Roaming\Python\Python310\site-packages\pvporcupine\porcupine.py", line 158, in __init__
raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]()
pvporcupine.porcupine.PorcupineIOError
The code is:
import struct

import pvporcupine
import pyaudio

porcupine = None
paud = None
audio_stream = None
try:
    access_key = "access key"
    porcupine = pvporcupine.create(access_key='access key',
                                   keyword_paths=['C:\\Users\Priyam\Desktop\hey keyazip'], keywords=['hey keya'])  # pvporcupine.KEYWORDS for all keywords
    paud = pyaudio.PyAudio()
    audio_stream = paud.open(rate=porcupine.sample_rate, channels=1, format=pyaudio.paInt16, input=True, frames_per_buffer=porcupine.frame_length)
    while True:
        # Read one frame of 16-bit samples and feed it to Porcupine
        keyword = audio_stream.read(porcupine.frame_length)
        keyword = struct.unpack_from("h" * porcupine.frame_length, keyword)
        keyword_index = porcupine.process(keyword)
        if keyword_index >= 0:
            print("hotword detected")
finally:
    # Release resources in reverse order of creation
    if porcupine is not None:
        porcupine.delete()
    if audio_stream is not None:
        audio_stream.close()
    if paud is not None:
        paud.terminate()
I tried the code above as well as the sample code provided by Picovoice, but I still get the same error.
It looks like Porcupine is having trouble finding your keyword file: [ERROR] loading keyword file at 'C:\Users\Priyam\Desktop\hey keyazip' failed with 'IO_ERROR'.
The Picovoice Console provides the keyword inside a compressed .zip file. You need to decompress it and update the path in your code to point to the .ppn file inside. For example: C:\\Users\Priyam\Desktop\hey-keya_en_windows_v2_1_0\hey-keya_en_windows_v2_1_0.ppn
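Unzipping can also be done from Python. A minimal sketch using only the standard library, assuming the downloaded archive contains at least one `.ppn` file somewhere inside it; `extract_ppn` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def extract_ppn(zip_path, dest_dir):
    """Unzip a Picovoice Console download and return the path of the .ppn inside.

    Assumes the archive holds at least one .ppn file; the first one found
    (in sorted order) is returned.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)  # creates dest_dir and subfolders as needed
    matches = sorted(Path(dest_dir).rglob("*.ppn"))
    if not matches:
        raise FileNotFoundError(f"no .ppn file found in {zip_path}")
    return str(matches[0])
```

The returned string can then be passed as `keyword_paths=[...]` to `pvporcupine.create`.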

Pandas - Suppress call to _validate_archive? - Python 3.9.13/Pandas 1.4.2/Openpyxl 3.0.10

I have been digging through questions/answers for the BadZipFile exception when calling read_excel() using the openpyxl engine. I looked at my error stack and dug into the Python files and it looks like ZipFile.py is being very strict when validating an archive. It is looking for an EOCD (end of central directory) signature in my XLSX archive file.
When unzipping an archive whose EOCD cannot be found or validated, unzip on Linux is supposed to report an error, but I am not seeing one. I am unsure whether the EOCD is present and correct (does anyone know of a tool to check?).
However, from looking through my stack (below) I am examining what is happening in openpyxl/reader/excel.py. At line 67, the _validate_archive function is defined. I am wondering about the examination for a "file like object".
My use case is an AWS Lambda function which has an HTTP endpoint. I POST an Excel file (I am testing with Postman and using the binary body for the request where I select my Excel file) to the endpoint. The function needs to handle both CSV and XLSX. I include a custom header in which I specify the original file name. I split the filename, look at the extension, and either call read_csv or read_excel. read_csv is working great.
Either way, the file is coming in as base64. For an XLSX file, Pandas handles this OK - up until we get to _validate_archive... What I am unsure of is how the "file like object" check at line 76...
is_file_like = hasattr(filename, 'read')
... interacts with the way the Base64 body is handled. I have tried a plain string (event["body"]), a bytes() object, the BytesIO class, and the StringIO class, all with the same BadZipFile exception.
So... is it possible in Pandas/Openpyxl to suppress the validation of the archive? I want to be able to call read_excel() but not have the archive validated and see what happens.
My error stack:
Error: (<class 'zipfile.BadZipFile'>, BadZipFile('File is not a zip file'), <traceback object at 0x7f1019589dc0>)
<class 'zipfile.BadZipFile'>
File is not a zip file
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 20, in lambda_handler
inv = pd.read_excel( bufferedString, engine='openpyxl', index_col=0 )
File "/opt/python/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/opt/python/pandas/io/excel/_base.py", line 457, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/opt/python/pandas/io/excel/_base.py", line 1419, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "/opt/python/pandas/io/excel/_openpyxl.py", line 525, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "/opt/python/pandas/io/excel/_base.py", line 518, in __init__
self.book = self.load_workbook(self.handles.handle)
File "/opt/python/pandas/io/excel/_openpyxl.py", line 536, in load_workbook
return load_workbook(
File "/opt/python/openpyxl/reader/excel.py", line 315, in load_workbook
reader = ExcelReader(filename, read_only, keep_vba,
File "/opt/python/openpyxl/reader/excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
File "/opt/python/openpyxl/reader/excel.py", line 96, in _validate_archive
archive = ZipFile(filename, 'r')
File "/var/lang/lib/python3.9/zipfile.py", line 1264, in __init__
self._RealGetContents()
File "/var/lang/lib/python3.9/zipfile.py", line 1331, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
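One way to narrow this down, assuming the Lambda receives the Excel upload as base64 text in `event["body"]`: decode it to raw bytes first and check whether the result is a ZIP archive before handing it to pandas. `zipfile.is_zipfile` accepts a file-like object and looks for the end-of-central-directory record, so it also serves as a quick EOCD check. The helper names are hypothetical, and whether your trigger actually base64-encodes the body (API Gateway proxy events flag this with `isBase64Encoded`) is an assumption to verify.

```python
import base64
import io
import zipfile


def decode_body(body):
    """Decode a base64 request body into an in-memory binary buffer."""
    return io.BytesIO(base64.b64decode(body))


def has_zip_eocd(buffer):
    """True if buffer looks like a ZIP (XLSX) archive.

    zipfile.is_zipfile() looks for the end-of-central-directory record and
    moves the read position, so rewind afterwards before pandas reads it.
    """
    result = zipfile.is_zipfile(buffer)
    buffer.seek(0)
    return result
```

If the check passes, `pd.read_excel(buffer, engine="openpyxl", index_col=0)` receives a real XLSX stream. If it still fails after decoding, the problem lies in the bytes that arrive rather than in openpyxl's validation, since an XLSX file is itself a ZIP archive.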

Python image save error - raise ValueError("unknown file extension: {}".format(ext)) from e ValueError: unknown file extension:

I have only four weeks of experience with Python. I am creating a tool using Tkinter to paste a new company logo on top of existing images.
The method below gets all images in the given directory and pastes the new logo at the initial position. The existing image, the edited image, the x-position, the y-position, a preview of the image, and some other data are stored in the instance attribute self.images_all_arr.
def get_img_copy(self):
    self.images_all_arr = []
    existing_img_fldr = self.input_frame.input_frame_data['existing_img_folder']
    for file in os.listdir(existing_img_fldr):
        img_old = Image.open(os.path.join(existing_img_fldr, file))
        img_new_copy = img_old.copy()
        self.pasteImage(img_new_copy, initpaste=True)  # process to paste new logo.
        view_new_img = ImageTk.PhotoImage(img_new_copy)
        fname, fext = file.split('.')
        formObj = {
            "fname": fname,
            "fext": fext,
            "img_old": img_old,
            "img_new": img_new_copy,
            "img_new_view": view_new_img,
            "add_logo": 1,
            "is_default": 1,
            "is_opacityImg": 0,
            "pos_x": self.defult_x.get(),
            "pos_y": self.defult_y.get()
        }
        self.images_all_arr.append(formObj)
After previewing each image in the Tkinter window, I adjust the x and y positions as needed (updating pos_x and pos_y in self.images_all_arr).
Once that is done, I need to save the edited images. The method below iterates over self.images_all_arr and calls img['img_new'].save(dir_output), since img['img_new'] holds the updated image.
def generate_imgae(self):
    if len(self.images_all_arr):
        dir_output = 'xxxxx'
        for img in self.images_all_arr:
            print(img['img_new'])
            img['img_new'].save(dir_output)
        print('completed..')
But it raises the following error:
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Program Files (x86)\Python38-32\lib\site-packages\PIL\Image.py", line 2138, in save
format = EXTENSION[ext]
KeyError: ''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Program Files (x86)\Python38-32\lib\tkinter\__init__.py", line 1883, in __call__
return self.func(*args)
File "C:\Users\662828\WORKSPACE\AAA_python_projects\AMI_automation_poc2\position_and_images.py", line 241, in generate_imgae
img['img_new'].save(dir_output)
File "C:\Program Files (x86)\Python38-32\lib\site-packages\PIL\Image.py", line 2140, in save
raise ValueError("unknown file extension: {}".format(ext)) from e
ValueError: unknown file extension:
dir_output doesn't contain a file extension; it's just xxxxx. You need to specify which image format you want, and that is what the error is telling you with "unknown file extension".
You either need to include the extension in the file name, or pass the format as the next parameter to image.save. See the Pillow documentation for Image.save for details.
For example:
image.save('image.png')
or
image.save('image', 'png')
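A small guard illustrating the rule, assuming `dir_output` was meant as a file name: the hypothetical helper `ensure_extension` appends a default extension (PNG here, an arbitrary choice) when the name has none, so Pillow can infer the format from it.

```python
import os


def ensure_extension(path, default_ext="png"):
    """Return path unchanged if it already has an extension, else append one.

    Without an explicit format argument, Pillow's Image.save() picks the
    format from the extension, so a bare name such as 'xxxxx' fails.
    """
    root, ext = os.path.splitext(path)
    if ext:
        return path
    return f"{root}.{default_ext.lstrip('.')}"
```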
The code below solved my issue by passing the full directory, file name, and extension of the image to image.save(). Here the value of opfile is C:\Users\WORKSPACE\AAA_python_projects\test\valid.png.
def generate_imgae(self):
    if len(self.images_all_arr):
        dir_output = r"C:\Users\WORKSPACE\AAA_python_projects\test"
        if not os.path.isdir(dir_output):
            os.mkdir(dir_output)
        for img in self.images_all_arr:
            opfile = os.path.join(dir_output, '{}.{}'.format(img['fname'], img['fext']))
            img['img_new'].save(opfile)
        print('completed..')
Thanks @dantechguy
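A related pitfall in `get_img_copy`: `fname, fext = file.split('.')` raises `ValueError` for names containing more than one dot, such as `logo.v2.png`. A sketch using `os.path.splitext` instead, with hypothetical helper names, produces the same `fname`/`fext` pair that the `'{}.{}'.format(...)` call in `generate_imgae` expects.

```python
import os


def split_name(filename):
    """Split 'name.ext' into ('name', 'ext'); only the last dot separates them."""
    stem, ext = os.path.splitext(filename)
    return stem, ext.lstrip(".")


def output_file(directory, fname, fext):
    """Rebuild the output path the same way generate_imgae does."""
    return os.path.join(directory, "{}.{}".format(fname, fext))
```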

Trouble sending a batch create entity request in dialogflow

I have defined the following function. Its purpose is to make a batch create-entity request with the Dialogflow client. I am using this method because sending many individual tests did not scale well.
The problem seems to be the line that defines EntityType. It seems that "entityTypes" is not valid, but that is what the Dialogflow v2 documentation shows, and v2 is the version I am using.
Any ideas on what the issue is?
def create_batch_entity_types(self):
    client = self.get_entity_client()
    print(DialogFlowClient.batch_list)
    EntityType = {
        "entityTypes": DialogFlowClient.batch_list
    }
    response = client.batch_update_entity_types(parent=AGENT_PATH, entity_type_batch_inline=EntityType)

    def callback(operation_future):
        # Handle result.
        result = operation_future.result()
        print(result)

    response.add_done_callback(callback)
After running the function, I received this error:
Traceback (most recent call last):
File "df_client.py", line 540, in <module>
create_entity_types_from_database()
File "df_client.py", line 426, in create_entity_types_from_database
df.create_batch_entity_types()
File "/Users/andrewflorial/Documents/PROJECTS/curlbot/dialogflow/dialogflow_accessor.py", line 99, in create_batch_entity_types
response = client.batch_update_entity_types(parent=AGENT_PATH, entity_type_batch_inline=EntityType)
File "/Users/andrewflorial/Documents/PROJECTS/curlbot/venv/lib/python3.7/site-packages/dialogflow_v2/gapic/entity_types_client.py", line 767, in batch_update_entity_types
update_mask=update_mask,
ValueError: Protocol message EntityTypeBatch has no "entityTypes" field.
The argument for entity_type_batch_inline must have the same form as EntityTypeBatch.
Here is what that type looks like: https://dialogflow-python-client-v2.readthedocs.io/en/latest/gapic/v2/types.html#dialogflow_v2.types.EntityTypeBatch
It has to have entity_types field, not entityTypes.
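The camelCase name comes from the REST/JSON form of the API, while the Python client expects the protobuf field names, which are snake_case. If `batch_list` was built from REST-style JSON, a small recursive converter like the hypothetical `snake_keys` below can rename keys such as `displayName` to `display_name` before the request is sent. It only renames dictionary keys and leaves values like `KIND_MAP` untouched.

```python
import re


def camel_to_snake(name):
    """'entityTypes' -> 'entity_types'; already snake_case names are unchanged."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_keys(obj):
    """Recursively rename dict keys from camelCase to snake_case."""
    if isinstance(obj, dict):
        return {camel_to_snake(key): snake_keys(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [snake_keys(item) for item in obj]
    return obj  # strings, numbers and enum names are left as-is
```

With that, the call would look like `client.batch_update_entity_types(parent=AGENT_PATH, entity_type_batch_inline={"entity_types": snake_keys(DialogFlowClient.batch_list)})`.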
