Cannot read ID3 tag from mutagen in Tag&Rename - id3

I have a weird problem writing an ID3 tag to an MP3 file with mutagen.
More specifically the album sort tag (TSOA).
The code I am using:
from mutagen.id3 import ID3, TSOA
audio = ID3("sample.mp3")
audio.add(TSOA(encoding=1, text=u"Great Album, A"))
audio.save()
The result:
>> import mutagen
>> print(mutagen.File("sample.mp3"))
{'TSOA': TSOA(encoding=<Encoding.UTF16: 1>, text=['Great Album, A'])}
However, if I open the file in Tag&Rename 3.9.15 (latest), the field is empty.
Strangely, even if I write the tag with Tag&Rename and inspect the file with mutagen the result is exactly the same as with the Python code I used. That is {'TSOA': TSOA(encoding=<Encoding.UTF16: 1>, text=['Great Album, A'])}.
Is this a flaw in the Tag&Rename software, or am I missing something? Foobar2000 seems to be able to get the value. I did manage to get most of the other tags to work this way.

Related

PyTube downloads audio file in mp4 format. How to fix?

Good afternoon. I'm a beginner in Python. I was trying to make a YouTube video/audio downloader with PyTube (for educational purposes only). I have seen many videos on YouTube and I was trying to make this tool better, so I added an option to choose video or audio and an option to choose the resolution/quality. The good thing is that I have successfully made the video downloader, but I have a problem with the audio downloader: PyTube downloads the audio file in MP4 format. I have searched on Google and YouTube, but I could not find any solution. I want to rename the file from mp4 to mp3 (because the file is OK, but the format is wrong). As a beginner, I don't know how to save a downloading file somewhere else (a temporary folder), rename it, and then transfer it to the output folder. I tried to add filename=link.title+'mp3', but it returns this error: OSError: [Errno 22] Invalid argument: 'G:/Downloaded Videos/Latest English Ringtone | Turkish Bgm Ringtone 2021 | Bad Boy | Attitude Tone | Villain Ringtone.mp3'
here is my Code:
from pytube import YouTube
link='https://www.youtube.com/watch?v=KrhPrPK2owA'
link=YouTube(link)
print('Title:',link.title+'\n'+'Views:',link.views)
streams=link.streams.filter(type='audio')
kbps_list=[]
itag_list=[]
print('Available Kbps: ',end='')
for s in streams:
    i=s.itag
    s=s.abr
    if s not in kbps_list:
        kbps_list.append(s)
        itag_list.append(i)
        print(s,end=' ')
reso=input('\nEnter Kbps to download: ')
if reso not in kbps_list:
    print('This Kbps is not available')
    from sys import exit
    exit()
reso=kbps_list.index(reso)
final=streams.get_by_itag(itag_list[reso])
print('Downloading...')
final.download('G:/Downloaded Videos/')
# final.download('G:/Downloaded Videos/', filename=link.title+'mp3')======================
# if I add custom filename, It returns the error ========================================
print('Successfully Downloaded!')
You need to set the file name. You can download it with an .mp3 name as you tried, but the audio will still be in the mp4a codec. Not sure if that matters to you.
Your title is all weird and it includes the file path, which is probably why link.title is not working. Try the code below to strip the title and the file path.
import os
head, tail = os.path.split(link.title)
final=streams.get_by_itag(itag_list[reso]).download(filename=tail.strip(" | ") + ".mp3")
You can also set output directory by passing the argument output_path="some location".
It would look like:
final=streams.get_by_itag(itag_list[reso]).download(output_path="some location", filename=tail.strip(" | ") + ".mp3")
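The OSError in the question is consistent with the title containing the | character, which Windows does not allow in file names; note that str.strip only trims the ends of a string, so it leaves a | in the middle of the title untouched. A minimal sketch, assuming the invalid characters are the cause; safe_filename is a hypothetical helper, not part of pytube:

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: str, extension: str = ".mp3") -> str:
    """Build a file name from a video title that Windows will accept."""
    # Replace every invalid character with a space.
    cleaned = _INVALID_CHARS.sub(" ", title)
    # Collapse runs of whitespace, then trim spaces and dots from the ends
    # (Windows also dislikes names that end in a space or a dot).
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Fall back to a placeholder (an arbitrary choice) if nothing is left.
    return (cleaned or "untitled") + extension
```

The result could then be passed as filename=safe_filename(link.title) together with output_path, as in the download call above. Renaming only changes the extension; it does not convert the audio.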

Reading from a binary file and decoding using Python

I have a binary file from a mainframe which I'm trying to read using Python and produce a human readable text file. I'm still gathering more information about the file. What I do know is that the file serves as input to COBOL programs.
I try to read the file into python like this:
with open('P_MF.DAT', mode='rb') as f:
    file_content = f.read(500)
When I print(file_content) I get something like:
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00########\x00\x00###\xf0\xf0\xf0\xf0\xf0\xf0\xf0\xf0\xf0\xf0\xf0\xf0\xf0\xf0\xf0####\x00\x00\x00\x00\x00\x00\x00\x00###\x00\x00\x00\x00######\xf0\xf0\xf0\xf4\x00\x00\x00\x00\x08\x02\x00\x00Q\x08c\x18\x1f\xc5###\x00\x00\x000\x00\x00\x0f\x00\x00\x00\x01\x11?\x00\x00\x10\x02F\x17o##\xd5#\xc9\xd5\xc7\xd3\xc9\xe2#\xd4\xc1\xd9\xe8#############################\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x00\x00\x00\x00\x00'
Then I tried this using the codecs module which also gives me gibberish:
import codecs
file_content1 = codecs.decode(file_content, 'cp500')
But I can see a few readable characters in the output when I print(file_content1):
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 \x00\x00 000000000000000 \x00\x00\x00\x00\x00\x00\x00\x00 \x00\x00\x00\x00 0004\x00\x00\x00\x00\x97\x02\x00\x00é\x97Ä\x18\x1fE \x00\x00\x00\x90\x00\x00\x0f\x00\x00\x00\x01\x11\x1a\x00\x00\x10\x02ã\x87? N INGLIS MARY \x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x00\x00\x00\x00\x00'
I've been googling around for a couple of days. Tried a number of things like this - Python read a binary file and decode
I feel like I'm getting nowhere with this problem. I also plan to ask how this file looks if read in a mainframe. I'd appreciate any info/help/advice at this point.
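The cp500 output above already contains readable runs ('0004', 'INGLIS MARY'), which suggests the file mixes EBCDIC text fields with binary fields, as COBOL records often do. A minimal sketch under that assumption: decode_text and unpack_comp3 are hypothetical helpers, and whether any field in this file is really packed decimal (COMP-3) is a guess. The actual field offsets and types have to come from the COBOL record layout (copybook).

```python
def decode_text(field: bytes, codec: str = "cp500") -> str:
    """Decode an EBCDIC text field and drop trailing padding spaces."""
    return field.decode(codec).rstrip()


def unpack_comp3(field: bytes) -> int:
    """Decode a packed-decimal field: two digits per byte, last nibble is the sign."""
    nibbles = []
    for byte in field:
        nibbles.append(byte >> 4)    # high nibble
        nibbles.append(byte & 0x0F)  # low nibble
    sign = nibbles.pop()             # 0xD means negative; 0xC and 0xF mean positive
    value = 0
    for digit in nibbles:
        value = value * 10 + digit
    return -value if sign == 0x0D else value


# Illustrative bytes only (0x40 is the EBCDIC space), not offsets from the real file.
name_field = b"\xc9\xd5\xc7\xd3\xc9\xe2\x40\xd4\xc1\xd9\xe8\x40\x40"
print(decode_text(name_field))
print(unpack_comp3(b"\x12\x34\x5c"))
```

Decoding the whole record with one text codec, as in the attempt above, turns the binary fields into gibberish; slicing the record field by field and decoding each slice according to its type avoids that.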

Unknown encoding of files in a resulting Beautiful Soup txt file

I downloaded 13 000 files (10-K reports from different companies) and I need to extract a specific part of these files (section 1A - Risk Factors). The problem is that I can open these files in Word easily and they look perfect, whereas when I open them in a normal text editor, the document appears to be HTML with tons of encoded strings at the end (EDIT: I suspect this is due to the XBRL format of these files). The same happens when I use BeautifulSoup.
I've tried using an online decoder, because I thought that maybe this is connected to Base64 encoding, but it seems that none of the known encodings could help me. I saw that at the beginning of some files there is something like "created with Certent Disclosure Management 6.31.0.1" and other programs, so I thought maybe this causes the encoding. Nevertheless, Word is able to open these files, so I guess there must be a known key to it. This is a sample of the encoded data:
M1G2RBE#MN)T='1,SC4,]%$$Q71T3<XU#[AHMB9#*E1=E_U5CKG&(77/*(LY9
ME$N9MY/U9DC,- ZY:4Z0EWF95RMQY#J!ZIB8:9RWF;\"S+1%Z*;VZPV#(MO
MUCHFYAJ'V#6O8*[R9L<VI8[I8KYQB7WSC#DMFGR[E6+;7=2R)N)1Q\24XQ(K
MYQDS$>UJ65%MV4+(KBRHJ3HFIAR76#G/F$%=*9FOU*DM-6TSTC$Q\[C$YC$/
And a sample file from the 13 000 that I downloaded.
Below is the BeautifulSoup code that I use to extract text. It does its job, but I need to find a clue to this encoded string and somehow decode it in the Python code below.
from bs4 import BeautifulSoup
with open("98752-TOROTEL INC-10-K-2019-07-23", "r") as f:
    contents = f.read()
soup = BeautifulSoup(contents, 'html.parser')
print(soup.getText())
with open("extracted_test.txt", "w", encoding="utf-8") as f:
    f.write(soup.getText())
What I want to achieve is to decode this dummy string at the end of the file.
Ok, this is going to be somewhat messy, but will get you close enough to what you are looking for, without using regex (which is notoriously problematic with html). The fundamental problem you'll be facing is that EDGAR filings are VERY inconsistent in their formatting, so what may work for one 10Q (or 10K or 8K) filing may not work with a similar filing (even from the same filer...) For example, the word 'item' may appear in either lower or uppercase (or mixed), hence the use of the string.lower() method, etc. So there's going to be some cleanup, under all circumstances.
Having said that, the code below should get you the RISK FACTORS sections from both filings (including the one which has none):
url = [one of these two]
import requests
from bs4 import BeautifulSoup as bs
response = requests.get(url)
soup = bs(response.content, 'html.parser')
# The table of contents links to each section; look for the anchor of Item 1A
risks = soup.find_all('a')
for risk in risks:
    if 'item' in str(risk.attrs).lower() and '1a' in str(risk.attrs).lower():
        # Print everything after the anchor until the next 'item' anchor
        for i in risk.findAllNext():
            if 'item' in str(i.attrs).lower():
                break
            else:
                print(i.text.strip())
Good luck with your project!
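As for the encoded block itself: the sample lines in the question all start with M and stay within the printable range typical of uuencoding, so the data may be a uuencoded binary attachment rather than Base64 or anything encrypted. A hedged sketch, assuming that guess is right and that the block sits between a "begin" line and an "end" line; decode_uu_lines is a hypothetical helper built on the standard library's binascii.a2b_uu:

```python
import binascii


def decode_uu_lines(lines):
    """Decode uuencoded lines to bytes, skipping the begin/end marker lines."""
    decoded = bytearray()
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and the markers that wrap a uuencoded block.
        if not line or line.startswith("begin ") or line == "end":
            continue
        # a2b_uu decodes one line and raises binascii.Error on malformed input.
        decoded += binascii.a2b_uu(line)
    return bytes(decoded)
```

If the guess holds, the decoded bytes are whatever file was embedded (for example an image or a PDF), so they may still not be readable text. For extracting section 1A it may be simpler to drop these blocks before passing the document to BeautifulSoup.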

Extracting title from pdf using pypdf2 not working

I'm trying to extract the title of PDF files using pyPDF2. The output is either none or a wrong title. I tried using PDFminer as well, still the same result. I tried using 3 different pdf files. Is there a better way to extract the title with better accuracy?
This is the code I used:
from PyPDF2 import PdfFileReader
def get_pdf_title(pdf_file_path):
    pdf_reader = PdfFileReader(open(pdf_file_path, "rb"))
    return pdf_reader.getDocumentInfo().title
title = get_pdf_title('C:/PythonPrograms/Test.pdf')
print(title)
Your code is working, at least for me on Python 3.5.2. Check in the PDF properties that it indeed has a title.
A PDF's title is part of its metadata and needs to be set explicitly. It is not mandatory, and it is not related to the content (other than by the will of the person writing it) or to the filename.
If you use your snippet on a file with no title, its output will be empty (None, or an empty string if the title field is present but blank).
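Since the metadata title is optional, a caller usually needs a fallback. A minimal sketch of one possible policy, assuming the file name is an acceptable substitute; choose_title is a hypothetical helper and does not depend on PyPDF2:

```python
from pathlib import Path
from typing import Optional


def choose_title(metadata_title: Optional[str], pdf_path: str) -> str:
    """Prefer the metadata title; fall back to the file name without its extension."""
    # Treat None, empty and whitespace-only titles as missing.
    if metadata_title and metadata_title.strip():
        return metadata_title.strip()
    return Path(pdf_path).stem
```

With the function from the question this could be used as choose_title(get_pdf_title(path), path). It does not help when the metadata holds a wrong title; detecting that would require looking at the document's content.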

Linux script to transfer (ID3) tags from FLAC to MP3

For my media server, I am looking for ways to transfer tags from my FLAC files to MP3.
In a bash script, I can extract tags using metaflac to local vars, but when tagging mp3 with id3v2, I seem to lose national characters (guess it must be unicode?)
Also I need to be able to set replay gain tags, and album art (all present in the FLAC's).
I am looking for a scripted solution to run unattended.
If you are interested in a Python solution, the mutagen library looks really good.
It could be as easy as:
from mutagen.flac import FLAC
from mutagen.easyid3 import EasyID3
flacfile = FLAC("flacfile.flac")
mp3file = EasyID3("mp3file.mp3")
for tag in flacfile:
    if tag in EasyID3.valid_keys.keys():
        mp3file[tag] = flacfile[tag]
mp3file.save()
I found this solution for copying mp3 id3 tags into FLAC files.
Try the tool eyeD3. It supports album art embedding and text encoding in latin1, utf8, utf16-BE and utf16-LE. However, replay gain is not supported. As far as I understand, it is not widely supported.
Victor's solution showed me the way. It may fail, however, if copying tags to a file you've just converted, for example, from flac to mp3. That is, it will fail if the file you are copying tags to doesn't already have any tags.
So you may need to prime the destination file first, giving it the means to have tags.
from mutagen import File
from mutagen.flac import FLAC
from mutagen.easyid3 import EasyID3
from mutagen.id3 import ID3, ID3NoHeaderError
def convert_tags(f1,f2):
    # f1: full path to file copying tags from
    # f2: full path to file copying tags to
    # http://stackoverflow.com/questions/8873364/linux-script-to-transfer-id3-tags-from-flac-to-mp3
    # http://stackoverflow.com/a/18369606/2455413
    try:
        meta = EasyID3(f2)
    except ID3NoHeaderError:
        # The destination has no ID3 header yet: create an empty tag first
        meta = File(f2, easy=True)
        meta.add_tags()
        meta.save()
    from_f = FLAC(f1)
    to_f = EasyID3(f2)
    for tag in from_f:
        if tag in EasyID3.valid_keys.keys():
            to_f[tag] = from_f[tag]
    to_f.save()
    return
Here is another solution using ffmpeg.
Eg. just define a bash function in $HOME/.bashrc:
flac2mp3()
{
    ffmpeg -i "$1" -ab 320k -map_metadata 0 -id3v2_version 3 "$(basename "$1" flac)mp3"
}
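For an unattended run over a whole library, the same ffmpeg call can be driven from Python. A minimal sketch, assuming ffmpeg is on the PATH and reusing only the flags shown above; build_ffmpeg_command, find_flac_files and convert_tree are hypothetical names:

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(flac_path: Path) -> List[str]:
    """Build the ffmpeg call that converts one FLAC file and copies its metadata."""
    mp3_path = flac_path.with_suffix(".mp3")  # written next to the source file
    return [
        "ffmpeg", "-i", str(flac_path),
        "-ab", "320k",
        "-map_metadata", "0",
        "-id3v2_version", "3",
        str(mp3_path),
    ]


def find_flac_files(root: Path) -> List[Path]:
    """Return every .flac file below root, in a stable order."""
    return sorted(root.rglob("*.flac"))


def convert_tree(root: Path) -> None:
    """Convert every FLAC file below root; stops at the first ffmpeg failure."""
    for flac_path in find_flac_files(root):
        subprocess.run(build_ffmpeg_command(flac_path), check=True)
```

Unlike the bash function, which uses basename and therefore writes into the current directory, this sketch writes each MP3 next to its source file. Passing the arguments as a list avoids shell quoting problems with spaces or national characters in file names.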
