Create and use WAV file as an object Python - python-3.x

I am creating a personal assistant in Python. I am using Snowboy to record audio, and it works very well. Snowboy has a saveMessage() method that creates and writes a wav file to the disk. This wav file is later read and used as an AudioFile object by Speech_Recognition. I find it very inefficient that the program has to write and read wav files to the disk. I would much rather have the wav file be passed around as an object without EVER saving it to the disk.
Here is the Snowboy saveMessage() method that I would like to rewrite.
def saveMessage(self):
    """
    Save the message stored in self.recordedData to a timestamped file.
    """
    filename = 'output' + str(int(time.time())) + '.wav'
    data = b''.join(self.recordedData)
    # use wave to save data
    wf = wave.open(filename, 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(self.audio.get_sample_size(
        self.audio.get_format_from_width(
            self.detector.BitsPerSample() / 8)))
    wf.setframerate(self.detector.SampleRate())
    wf.writeframes(data)
    wf.close()
    logger.debug("finished saving: " + filename)
    return filename  # INSTEAD OF RETURNING filename I WANT THIS TO RETURN THE wav file object
Please note that the AudioFile class requires that the path for the wave file OR a "file-like" object must be passed into it. I am not sure what a "file-like" object is, so I will provide the AudioFile assert statement for the wav file argument:
assert isinstance(filename_or_fileobject, (type(""), type(u""))) or hasattr(filename_or_fileobject, "read"), "Given audio file must be a filename string or a file-like object"
I tried using an instance of BytesIO to hold the WAV data, but BytesIO is apparently not a file-like object. Here is what I tried:
def saveMessage(self):
    filename = 'output' + str(int(time.time())) + '.wav'
    data = b''.join(self.recordedData)
    # use wave to save data
    with io.BytesIO() as wav_file:
        wav_writer = wave.open(wav_file, "wb")
        try:
            wav_writer.setnchannels(1)
            wav_writer.setsampwidth(self.audio.get_sample_size(
                self.audio.get_format_from_width(
                    self.detector.BitsPerSample() / 8)))
            wav_writer.setframerate(self.detector.SampleRate())
            wav_writer.writeframes(data)
            wav_data = wav_file.getvalue()
        finally:
            wav_writer.close()
    logger.debug("finished saving: " + filename)
    return wav_data
The error I got was: AssertionError: Given audio file must be a filename string or a file-like object
I am using python 3.7 on a Raspberry PI 3B+ running Raspbian Buster Lite kernel version 4.19.36.
If I can provide any additional information or clarify anything, please ask.
Thanks so much!

Something like this should work:
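from speech_recognition import AudioData

def saveMessage(self):
    filename = 'output' + str(int(time.time())) + '.wav'
    data = b''.join(self.recordedData)
    # raw PCM frames, 16000 Hz sample rate, 2 bytes (16 bits) per sample
    ad = AudioData(data, 16000, 2)
    # recognizer is assumed to be an existing speech_recognition.Recognizer instance
    result = recognizer.recognize_google(ad)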
Note that the listen method of speech_recognition's Recognizer can invoke Snowboy internally, so you probably don't need to run Snowboy separately; you can just call listen with the snowboy_configuration parameter.
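If a WAV "file-like" object is still wanted, note that the attempt in the question returned wav_file.getvalue(), which is a plain bytes object with no read method, rather than the BytesIO itself. A minimal sketch of the in-memory route, using only the standard library and assuming 16-bit mono PCM at 16 kHz (the same values passed to AudioData above); the name frames_to_wav_buffer is hypothetical:

```python
import io
import wave

def frames_to_wav_buffer(frames, sample_rate=16000, sample_width=2, channels=1):
    """Pack raw PCM frames into an in-memory WAV file and return the buffer."""
    # No `with` block for the buffer: it has to stay open for the caller.
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_writer:
        wav_writer.setnchannels(channels)
        wav_writer.setsampwidth(sample_width)
        wav_writer.setframerate(sample_rate)
        wav_writer.writeframes(b"".join(frames))
    buffer.seek(0)  # rewind so the next reader starts at the WAV header
    return buffer

# Two chunks of silence stand in for self.recordedData
wav_buffer = frames_to_wav_buffer([b"\x00\x00" * 800, b"\x00\x00" * 800])
```

The returned object has a read method, so it should satisfy the AudioFile assertion quoted in the question.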

Related

Passing base64 .docx to docx.Document results in BadZipFile exception

I'm writing an Azure function in Python 3.9 that needs to accept a base64 string created from a known .docx file which will serve as a template. My code will decode the base64, pass it to a BytesIO instance, and pass that to docx.Document(). However, I'm receiving an exception BadZipFile: File is not a zip file.
Below is a slimmed down version of my code. It fails on document = Document(bytesIODoc). I'm beginning to think it's an encoding/decoding issue, but I don't know nearly enough about it to get to the solution.
from docx import Document
from io import BytesIO
import base64

var = {
    'template': 'Some_base64_from_docx_file',
    'data': {'some': 'data'}
}

run_stuff = ParseBody(body=var)
output = run_stuff.run()

class ParseBody():
    def __init__(self, body):
        self.template = str(body['template'])
        self.contents = body['data']

    def _decode_template(self):
        b64Doc = base64.b64decode(self.template)
        bytesIODoc = BytesIO(b64Doc)
        document = Document(bytesIODoc)

    def run(self):
        self.document = self._decode_template()
I've also tried the following change to _decode_template and am getting the same exception. This is running base64.decodebytes() on the b64Doc object and passing that to BytesIO instead of directly passing b64Doc.
    def _decode_template(self):
        b64Doc = base64.b64decode(self.template)
        bytesDoc = base64.decodebytes(b64Doc)
        bytesIODoc = BytesIO(bytesDoc)
I have successfully tried the following on the same exact .docx file to be sure that this is possible. I can open the document in Python, base64 encode it, decode into bytes, pass that to a BytesIO instance, and pass that to docx.Document successfully.
file = r'WordTemplate.docx'
doc = open(file, 'rb').read()
b64Doc = base64.b64encode(doc)
bytesDoc = base64.decodebytes(b64Doc)
bytesIODoc= BytesIO(bytesDoc)
newDoc = Document(bytesIODoc)
I've tried countless other solutions to no avail; they have only led me further away from a resolution. This is the closest I've gotten. Any help is greatly appreciated!
The answer to the question "How to generate a DOCX in Python and save it in memory?" actually helped me resolve my own issue.
All I had to do was change document = Document(bytesIODoc) to the following:
document = Document()
document.save(bytesIODoc)
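When BadZipFile appears, one way to narrow it down is to inspect the decoded bytes before handing them to docx.Document: a .docx is a ZIP archive, so the payload should be decoded exactly once and should pass zipfile.is_zipfile. A minimal sketch using only the standard library; decode_docx_template is a hypothetical helper, and the small in-memory archive merely stands in for a real template:

```python
import base64
import io
import zipfile

def decode_docx_template(b64_text):
    """Decode a base64 string once and return a rewound BytesIO, or raise ValueError."""
    raw = base64.b64decode(b64_text)  # decode exactly once
    buffer = io.BytesIO(raw)
    if not zipfile.is_zipfile(buffer):
        raise ValueError("decoded payload is not a ZIP archive, first bytes: %r" % raw[:4])
    buffer.seek(0)  # is_zipfile may move the position; rewind before further use
    return buffer

# Stand-in for a real .docx: any ZIP archive is enough for this check
source = io.BytesIO()
with zipfile.ZipFile(source, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
encoded = base64.b64encode(source.getvalue()).decode("ascii")

template_buffer = decode_docx_template(encoded)
```

If the check fails, the first few bytes in the error message usually hint at what was actually received, for example text instead of the PK signature that ZIP files start with.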

How to get filename from a file object in PYTHON?

I am using the code below, where a PUT request from Postman sends a file to a machine hosting the API, which is written as a Python script.
@app.route('/uploadFIle', methods=['PUT'])
def uploadFile():
    chunk_size = 4096
    with open("/Users/xyz/Documents/filename", 'wb') as f:
        while True:
            chunk = request.stream.read(chunk_size)
            if len(chunk) == 0:
                break
            f.write(chunk)
    return jsonify({"success":"File transfer initiated"})
Is there a way to get the original filename so that I can use it when saving the file?
I could pass the name in the PUT URL itself, as below, but is that the best solution?
@app.route('/uploadFIle/<string:filename>', methods=['PUT'])
def uploadFile(filename):
Below is how I achieved it using Flask:
Choose form-data under Body in Postman.
You can use any key; I used 'file'. Then choose the 'File' option from the drop-down arrow in the key column.
Attach the file under the 'value' column and use the code below to get the file name:
from flask import request
file = request.files['file']
file_name = file.filename

Download pdf file(Not restricted) from google drive through URL

import os
import requests

def download_file(download_url: str, filename: str):
    """
    Download resume pdf file from storage
    @param download_url: URL of resume to be downloaded
    @type download_url: str
    @param filename: Name and location of file to be stored
    @type filename: str
    @return: None
    @rtype: None
    """
    file_request = requests.get(download_url)
    with open(f'{filename}.pdf', 'wb+') as file:
        file.write(file_request.content)

cand_id = "101"
time_current = "801"
file_location = f"{cand_id}_{time_current}"
download_file("https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf", file_location)

cand_id = "201"
time_current = "901"
file_location = f"{cand_id}_{time_current}"  # recomputed so the second file is saved as 201_901.pdf
download_file("https://drive.google.com/file/d/0B1HXnM1lBuoqMzVhZjcwNTAtZWI5OS00ZDg3LWEyMzktNzZmYWY2Y2NhNWQx/view?hl=en&resourcekey=0-5DqnTtXPFvySMiWstuAYdA", file_location)
----------
The first file works perfectly fine (i.e. 101_801.pdf), but the second one (i.e. 201_901.pdf) cannot be opened in any PDF reader (error: We can't open this file).
My understanding is that I am not reading and writing the file from Drive correctly, even though it is open to everyone. How can I read that file and write it to disk?
I could use the Google Drive API, but is there a better solution that avoids it?
I tried out the code and couldn't open the PDF file either. I suggest trying the gdown package. It is easy to use and can download even large files from Google Drive. I used it in my class to download .sql database files (around 20 GB) for my assignments.
If you want to build further on this code, you should probably check out the Drive API. It is well documented and fast.
I was able to find the solution for it through wget in python. Answering it so that it could help someone in the future.
import os
import re
from datetime import datetime

import wget

def download_candidate_resume(email: str, resume_url: str):
    """
    This function is used to download resume from google drive and store on the local system
    @param email: candidate email
    @type email: str
    @param resume_url: url of resume on google drive
    @type resume_url: str
    """
    file_extension = "pdf"
    current_time = datetime.now()
    file_name = f'{email}_{int(current_time.timestamp())}.{file_extension}'
    temp_file_path = os.path.join(os.getcwd(), file_name)
    # Rewrite the sharing link into a direct-download link
    downloadable_resume_url = re.sub(
        r"https://drive\.google\.com/file/d/(.*?)/.*?\?usp=sharing",
        r"https://drive.google.com/uc?export=download&id=\1",
        resume_url,
    )
    wget.download(downloadable_resume_url, out=temp_file_path)
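The regular expression above only matches links that end in ?usp=sharing, so a link like the one in the question (ending in /view?hl=en&resourcekey=...) would pass through unchanged. A slightly more tolerant sketch extracts the file ID from the /file/d/<id> path segment and builds the same uc?export=download URL. The name to_direct_download_url is hypothetical, and the sketch does not handle cases where Drive shows a confirmation page instead of the file:

```python
import re

# Captures the ID in https://drive.google.com/file/d/<id>/...
_DRIVE_FILE_ID = re.compile(r"https://drive\.google\.com/file/d/([^/?#]+)")

def to_direct_download_url(share_url):
    """Return the uc?export=download form of a Drive file link, or the input unchanged."""
    match = _DRIVE_FILE_ID.match(share_url)
    if match is None:
        return share_url  # not a recognised Drive file link
    return "https://drive.google.com/uc?export=download&id=" + match.group(1)

direct_url = to_direct_download_url(
    "https://drive.google.com/file/d/0B1HXnM1lBuoqMzVhZjcwNTAtZWI5OS00ZDg3LWEyMzktNzZmYWY2Y2NhNWQx/view?hl=en&resourcekey=0-5DqnTtXPFvySMiWstuAYdA"
)
```

The resulting URL can then be passed to wget.download or requests.get in place of the sharing link.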

Using Wand in Python to convert PDF to JPEG raises wand.exceptions.WandRuntimeError in Docker

I want to convert the first page of a PDF to an image. The code below works well in my local environment (Ubuntu 18), but when I run it in a Docker environment, it fails and raises:
wand.exceptions.WandRuntimeError: MagickReadImage returns false, but
did raise ImageMagick exception. This can occurs when a delegate is
missing, or returns EXIT_SUCCESS without generating a raster.
Am I missing a dependency? Or something else? I don't know what it's referring to as 'delegate'.
Looking at the source code, it fails here, around line 7873 of wand/image.py:
if blob is not None:
    if not isinstance(blob, abc.Iterable):
        raise TypeError('blob must be iterable, not ' +
                        repr(blob))
    if not isinstance(blob, binary_type):
        blob = b''.join(blob)
    r = library.MagickReadImageBlob(self.wand, blob, len(blob))
elif filename is not None:
    filename = encode_filename(filename)
    r = library.MagickReadImage(self.wand, filename)
if not r:
    self.raise_exception()
    msg = ('MagickReadImage returns false, but did raise ImageMagick '
           'exception. This can occurs when a delegate is missing, or '
           'returns EXIT_SUCCESS without generating a raster.')
    raise WandRuntimeError(msg)
The line r = library.MagickReadImageBlob(self.wand, blob, len(blob)) returns true in my local environment, but in Docker it returns false, even though the arguments blob and len(blob) are the same in both environments.
def pdf2img(fp, page=0):
    """
    convert pdf to jpeg image
    :param fp: a file-like object
    :param page:
    :return: (Bool, File) if False, mean the `fp` is not pdf, if True, then the `File` is a file-like object
        contain the `jpeg` format data
    """
    try:
        reader = PdfFileReader(fp, strict=False)
    except Exception as e:
        fp.seek(0)
        return False, None
    else:
        bytes_in = io.BytesIO()
        bytes_out = io.BytesIO()
        writer = PdfFileWriter()
        writer.addPage(reader.getPage(page))
        writer.write(bytes_in)
        bytes_in.seek(0)
        im = Image(file=bytes_in, resolution=120)
        im.format = 'jpeg'
        im.save(file=bytes_out)
        bytes_out.seek(0)
        return True, bytes_out
I don't know what it's referring to as 'delegate'.
With ImageMagick, a 'delegate' refers to any shared library, utility, or external program that does the actual encoding and decoding of a file type, that is, the conversion between a file format and a raster.
Am I missing a dependency?
Most likely. For PDF, you would need Ghostscript installed on the Docker instance.
Or something else?
Possible, but hard to determine without an error message. The "WandRuntimeError" exception is a catch-all. It exists because a raster could not be generated from the PDF, and neither Wand nor ImageMagick can determine why. Usually there would be an exception if the delegate failed, a security policy message, or an OS error.
Best thing would be to run a few gs commands to see if ghostscript is working correctly.
gs -sDEVICE=pngalpha -o page-%03d.png -r120 input.pdf
If the above works, then try again just with ImageMagick
convert -density 120 input.pdf page-%03d.png
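The same kind of check can be made from inside the Python process, which helps confirm that the interpreter running Wand actually sees Ghostscript on its PATH inside the container. A minimal sketch using only the standard library; it assumes the executable is named gs, as in the command above, and find_ghostscript is a hypothetical helper:

```python
import shutil
import subprocess

def find_ghostscript(executable="gs"):
    """Return (path, version) for Ghostscript, or (None, None) if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None, None
    # `gs --version` prints only the version number
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    version = result.stdout.strip() if result.returncode == 0 else None
    return path, version

gs_path, gs_version = find_ghostscript()
if gs_path is None:
    print("Ghostscript not found on PATH; the PDF delegate may be missing")
else:
    print("Ghostscript", gs_version, "found at", gs_path)
```

Running this once locally and once in the container gives a quick side-by-side comparison of the two environments.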

Validate and save a ZIP file in Flask

I'm writing an app using Flask and one of the things I want in it is the ability to upload a ZIP that conforms to a specific form; as such I have a small function that takes a FileStorage object from the form and first unzips it, checks the contents, and then tries to save. There's a problem, however - apparently unzipping it "breaks" the FileStorage object, as the following function:
def upload_modfile(modfile):
    if not modfile.filename.endswith('.zip'):
        raise ModError('Incorrect filename')
    mod_path = join(get_mods_path(), secure_filename(modfile.filename))
    if isfile(mod_path):
        raise ModError('File ' + modfile.filename + ' already exists')
    modzip = ZipFile(modfile)
    base_filename = modfile.filename[:-4]
    modzip_contents = modzip.namelist()
    if join(base_filename, 'info.json') not in modzip_contents:
        raise ModError('Could not validate file')
    modfile.save(mod_path)
    return True
saves modfile as a text file saying Archive: <filename>.zip. If I comment out the entire ZipFile bit (i.e. everything involving modzip), the file is saved just fine.
I'm pretty much brand new to Python and am a little confused as to what to do in this case, save for saving the file in /tmp. Should I somehow clone modfile by way of some stream? Is there a way of "rewinding" the stream pointer within FileStorage that I'm missing?
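On the last point: reading an archive through ZipFile leaves the underlying stream positioned away from the start, so anything that later copies the stream from its current position will not get the whole file. If the stream is seekable, seeking back to 0 before saving is one option. A minimal sketch using only the standard library, with io.BytesIO standing in for the uploaded stream (an assumption; with a FileStorage object the equivalent would presumably be modfile.stream.seek(0)); has_info_json is a hypothetical helper:

```python
import io
import zipfile

def has_info_json(stream, base_name):
    """Check the archive for <base_name>/info.json, then rewind the stream."""
    try:
        with zipfile.ZipFile(stream) as archive:
            # Names inside a ZIP archive use forward slashes on every platform
            return base_name + "/info.json" in archive.namelist()
    finally:
        stream.seek(0)  # rewind so a later save copies from the first byte

# Build a small archive in memory to stand in for the upload
upload = io.BytesIO()
with zipfile.ZipFile(upload, "w") as archive:
    archive.writestr("mymod/info.json", "{}")
original_bytes = upload.getvalue()
upload.seek(0)

is_valid = has_info_json(upload, "mymod")
saved_bytes = upload.read()  # what a save from the current position would write
```

Because ZipFile does not close a file object that was passed in, the stream remains usable after validation.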
