I am trying to read an FM signal recorded as a WAV file using GNU Radio Companion, using Python. I am attaching the .grc file used.
I can clearly hear the recorded signal, but reading the data gives an empty array ([]).
The Python code:
import soundfile as sf
data, fs = sf.read('/home/fm_record_RSM_10_01_2019_dat.wav')
for i in data:
    print(i)
This gives:
data
array([], dtype=float64)
fs
96000
When the following code is used,
import wave
input_wave_file= wave.open('/home/fm_record_RSM_10_01_2019_dat.wav', 'r')
nc,sw,fr,nf,ct,cn = input_wave_file.getparams()
another error is raised, as given below:
Error                                     Traceback (most recent call last)
<ipython-input-3-5009fe3506e7> in <module>()
      1 import wave
      2
----> 3 input_wave_file= wave.open('/home/fm_record_RSM_10_01_2019_dat.wav', 'r')
      4 nc,sw,fr,nf,ct,cn = input_wave_file.getparams()
      5 frame_data = input_wave_file.readframes(5)

~/anaconda3/lib/python3.7/wave.py in open(f, mode)
    508         mode = 'rb'
    509     if mode in ('r', 'rb'):
--> 510         return Wave_read(f)
    511     elif mode in ('w', 'wb'):
    512         return Wave_write(f)

~/anaconda3/lib/python3.7/wave.py in __init__(self, f)
    162         # else, assume it is an open file object already
    163         try:
--> 164             self.initfp(f)
    165         except:
    166             if self._i_opened_the_file:

~/anaconda3/lib/python3.7/wave.py in initfp(self, file)
    131             raise Error('file does not start with RIFF id')
    132         if self._file.read(4) != b'WAVE':
--> 133             raise Error('not a WAVE file')
    134         self._fmt_chunk_read = 0
    135         self._data_chunk = None

Error: not a WAVE file
Could someone help me find what the problem could be? Is it because of a mistake in the settings of the WAV recording block in the .grc file, or a mistake in the Python file? Kindly help.
Thanks a lot
Msr
#! /usr/bin/env python3
import soundfile as sf
import wave
import sys

if len(sys.argv) < 2:
    print("Expected filename.wav on cmdline")
    quit(1)

data, fs = sf.read(sys.argv[1])
# Print the first ten frames only; printing all of data is too much output.
for i in range(10):
    print(data[i])
print('...')

input_wave_file = wave.open(sys.argv[1], 'r')
nc, sw, fr, nf, ct, cn = input_wave_file.getparams()
print('nc', nc)
print('sw', sw)
print('fr', fr)
print('nf', nf)
print('ct', ct)
print('cn', cn)

chunk = 1024
data = input_wave_file.readframes(chunk)
print('data[0:10] =', data[0:10])
print('data[0:10] =', end='')
for i in range(10):
    print(data[i], ' ', end='')
print('')
In a Linux environment, I put the above into a file named playsound.py.
Then I executed (at the cmdline prompt):
$ chmod +x playsound.py
$ ./playsound.py file.wav
[ 0.06454468 0.05557251]
[ 0.06884766 0.05664062]
[ 0.0552063 0.06777954]
[ 0.04733276 0.0708313 ]
[ 0.05505371 0.065979 ]
[ 0.05358887 0.06677246]
[ 0.05621338 0.06045532]
[ 0.04891968 0.06298828]
[ 0.04986572 0.06817627]
[ 0.05410767 0.06661987]
...
nc 2
sw 2
fr 44100
nf 32768
ct NONE
cn not compressed
data[0:10] = b'C\x08\x1d\x07\xd0\x08#\x07\x11\x07'
data[0:10] =67 8 29 7 208 8 35 7 17 7
file.wav was some existing .wav file I had handy.
I previously tried
for i in data:
    print(i)
as you had done; that worked too, but the output was too much.
I think you should check that the filename you are supplying points to a valid WAV file.
For instance, the path you list has the form "/home/filename.wav".
Usually it will be at least "/home/username/filename.wav".
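If you want to verify the header yourself, here is a minimal sketch (the path is hypothetical): a canonical WAV file begins with the bytes b'RIFF' at offset 0 and b'WAVE' at offset 8, which is exactly what wave.py checks before raising "not a WAVE file":
# Hypothetical path; point this at the file you actually recorded.
path = '/home/username/fm_record_RSM_10_01_2019_dat.wav'

with open(path, 'rb') as f:
    header = f.read(12)

# A valid WAV file prints b'RIFF' and b'WAVE' here.
print(header[0:4], header[8:12])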
Simple inspect script for a pyc file
There is a problem with the co_names attribute: the script works well until the marshal module loads the code object, then it fails.
magic: 160d0d0a
mod_time: 1493965574
source_size: 231
code:
Traceback (most recent call last):
File "/home/ubuntu/Downloads/book-resources-master/chapter4/code-exec-eg/python/inspect.py", line 24, in <module>
inspect_code(code)
File "/home/ubuntu/Downloads/book-resources-master/chapter4/code-exec-eg/python/inspect.py", line 8, in inspect_code
print('{}{}(line:{})'.format(indent, code.co_names, code.co_firstlineno))
AttributeError: 'float' object has no attribute 'co_names'
If anyone can help, thanks!
import marshal
import types

def to_long(s):
    return s[0] + (s[1] << 8) + (s[2] << 16) + (s[3] << 24)

def inspect_code(code, indent='    '):
    print('{}{}(line:{})'.format(indent, code.co_names, code.co_firstlineno))
    for c in code.co_consts:
        if isinstance(c, types.CodeType):
            inspect_code(c, indent + '    ')

f = open('__pycache__/add.cpython-39.pyc', 'rb')
magic = f.read(4)
print('magic: {}'.format(magic.hex()))
mod_time = to_long(f.read(4))
print('mod_time: {}'.format(mod_time))
source_size = to_long(f.read(4))
print('source_size: {}'.format(source_size))
print('\ncode:')
code = marshal.load(f)
inspect_code(code)
f.close()

import dis
dis.disassemble(code)
I'm not familiar with the marshal module or with pyc content, but when I tried your code with Python 3.9, I got an error when reading the code value, and the file format seems different for my sample pyc built with Python 3.9:
magic: 610d0d0a
mod_time: 0 # <---- Unknown
source_size: 1621462747 # <---- Must be mod time
When I read 4 more bytes before reading the code value, I got this:
magic: 610d0d0a
mod_time: 0
source_size: 1621462747
4 more bytes: 14 # <----- Must be source size
Then I could read the code value:
code:
('print',)(line:1)
1 0 LOAD_NAME 0 (print)
2 LOAD_CONST 0 ('Hello')
4 CALL_FUNCTION 1
6 POP_TOP
8 LOAD_CONST 1 (None)
10 RETURN_VALUE
I'm not sure why you're able to run marshal.load() without an error, but could you try reading 4 more or fewer bytes before calling marshal.load(f)?
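For reference, here is a minimal sketch of reading the header on Python 3.7+ (based on PEP 552, which grew the pyc header to 16 bytes: magic, a bit field, then mod_time and source_size), which would explain the extra 4 bytes:
import marshal

# Python 3.7+ pyc header (PEP 552): four little-endian 32-bit words:
# magic, bit field, mod_time, source_size -- 16 bytes in total.
with open('__pycache__/add.cpython-39.pyc', 'rb') as f:
    magic = f.read(4)
    flags = int.from_bytes(f.read(4), 'little')        # new in 3.7
    mod_time = int.from_bytes(f.read(4), 'little')
    source_size = int.from_bytes(f.read(4), 'little')
    code = marshal.load(f)

print('magic: {}'.format(magic.hex()))
print(code.co_names, code.co_firstlineno)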
I want to know why I received an error running my command to read the number of bytes in the ibd file. What might be wrong in my code? Thanks a lot in advance.
I want to read my dataset, which is in the imzML format and includes a complementary ibd file. More info can be obtained from http://psi.hupo.org/ms/mzml .
import subprocess
nbytes_str = subprocess.check_output(['wc -c < \"' + fname + '.ibd\"'], shell=True)
nbytes = int(nbytes_str)
nbytes # number of bytes in the ibd file
My error is:
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-7-381047b77c3f> in <module>
----> 1 nbytes_str = subprocess.check_output(['wc -c < \"' + fname + '.ibd\"'], shell=True)
2 nbytes = int(nbytes_str)
3 nbytes # number of bytes in the ibd file
~\.conda\envs\MSI\lib\subprocess.py in check_output(timeout, *popenargs, **kwargs)
354
355 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 356 **kwargs).stdout
357
358
~\.conda\envs\MSI\lib\subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
436 if check and retcode:
437 raise CalledProcessError(retcode, process.args,
--> 438 output=stdout, stderr=stderr)
439 return CompletedProcess(process.args, retcode, stdout, stderr)
440
CalledProcessError: Command '['wc -c < "P1 lipids pos mode Processed Norm.ibd"']' returned non-zero exit status 1.
First of all, as the exception says, your command returned non-zero exit status 1. This means the called command is incorrect (it failed), so you should fix your wc -c < "P1 lipids pos mode Processed Norm.ibd" command to make your code work.
On the other hand, you can get the number of bytes like this:
my_str = "hello world" # In your case : my_str = subprocess.check_output([...
my_str_as_bytes = str.encode(my_str) # Convert string to byte type
type(my_str_as_bytes) # ensure it is byte representation
len(my_str_as_bytes) # Length (number) of bytes
BUT in Python 3, subprocess.check_output returns bytes by default, so the conversion is not needed; just take the len of the returned value.
For example:
import subprocess
nbytes_byte = subprocess.check_output(['wc -c < test.txt'], shell=True)
print(type(nbytes_byte))
print(len(nbytes_byte))
Content of test.txt:
Hello World
Output:
>>> python3 test.py
<class 'bytes'>
3
Furthermore, here is a similar question: Python: Get size of string in bytes
EDIT:
I recommend defining the path of the ibd file based on your Python file's path.
For example:
Your Python file path: /home/user/script/my_script.py
Your ibd file path: /home/user/idb_files/P1 lipids pos mode Processed Norm.ibd
In the above case, you should define the ibd file path:
import os
idb_file_path = os.path.join(os.path.dirname(__file__), "..", "idb_files", "P1 lipids pos mode Processed Norm.ibd")
Here is the complete example:
import os
import subprocess
# The "os.path.join" joins the path from the inputs
# The "os.path.dirname(__file__)" returns the path of the directory of the current script
idb_file_path = os.path.join(os.path.dirname(__file__), "..", "idb_files", "P1 lipids pos mode Processed Norm.ibd")
nbytes_byte = subprocess.check_output(['wc -c < "{}"'.format(idb_file_path)], shell=True)
print(type(nbytes_byte))
print(len(nbytes_byte))
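As a side note, a minimal sketch under the same assumed layout: if the goal is only the file's size in bytes, os.path.getsize avoids the shell call (and its quoting pitfalls) entirely:
import os

# Same hypothetical layout as above; adjust the path to your file.
idb_file_path = os.path.join(os.path.dirname(__file__), "..", "idb_files",
                             "P1 lipids pos mode Processed Norm.ibd")
nbytes = os.path.getsize(idb_file_path)  # size of the file in bytes
print(nbytes)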
I have three files: two .gz files and one .log file. These files are pretty big. Below I have a sample copy of my original data. I want to extract the entries that correspond to the last 24 hours.
a.log.1.gz
2018/03/25-00:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
2018/03/25-10:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
2018/03/25-20:08:50.486601 1.5M 7FE9D3D41706 qojfcmqcacaeia
2018/03/25-24:08:50.980519 16K 7FE9BD1AF707 user: number is 93823004
2018/03/26-00:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/26-10:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/26-15:08:51.066968 1 7FE9BDC91700 std:ZMD:
a.log.2.gz
2018/03/26-20:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
2018/03/26-24:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
2018/03/27-00:08:50.486601 1.5M 7FE9D3D41706 qojfcmqcacaeia
2018/03/27-10:08:50.980519 16K 7FE9BD1AF707 user: number is 93823004
2018/03/27-20:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/27-24:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/28-00:08:51.066968 1 7FE9BDC91700 std:ZMD:
a.log
2018/03/28-10:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
2018/03/28-20:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
I am getting the result below, but it is not cleaned.
result.txt
2018/03/27-20:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/27-24:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/28-00:08:51.066968 1 7FE9BDC91700 std:ZMD:
2018/03/28-10:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
2018/03/28-20:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
The code below pulls the last 24 hours of lines.
from datetime import datetime, timedelta
import glob
import gzip
from pathlib import Path
import shutil

def open_file(path):
    if Path(path).suffix == '.gz':
        return gzip.open(path, mode='rt', encoding='utf-8')
    else:
        return open(path, encoding='utf-8')

def parsed_entries(lines):
    for line in lines:
        yield line.split(' ', maxsplit=1)

def earlier():
    return (datetime.now() - timedelta(hours=24)).strftime('%Y/%m/%d-%H:%M:%S')

def get_files():
    return ['a.log'] + list(reversed(sorted(glob.glob('a.log.*'))))

output = open('output.log', 'w', encoding='utf-8')
files = get_files()
cutoff = earlier()
for i, path in enumerate(files):
    with open_file(path) as f:
        lines = parsed_entries(f)
        # Assumes that your files are not empty
        date, line = next(lines)
        if cutoff <= date:
            # Skip files that can just be appended to the output later
            continue
        for date, line in lines:
            if cutoff <= date:
                # We've reached the first entry of our file that should be
                # included
                output.write(line)
                break
        # Copies from the current position to the end of the file
        shutil.copyfileobj(f, output)
        break
else:
    # In case ALL the files are within the last 24 hours
    i = len(files)
for path in reversed(files[:i]):
    with open_file(path) as f:
        # Assumes that your files have trailing newlines.
        shutil.copyfileobj(f, output)
# Cleanup; it would get closed anyway when garbage collected or the process exits.
output.close()
I want to use the function below to clean the lines:
import nltk
from nltk.stem import WordNetLemmatizer

def _clean_logs(line):
    # noinspection SpellCheckingInspection
    lemmatizer = WordNetLemmatizer()
    clean_line = line.strip()
    clean_line = clean_line.lstrip('0123456789.- ')
    cleaned_log = " ".join(
        [lemmatizer.lemmatize(word) for word in nltk.word_tokenize(clean_line)])
    cleaned_log = cleaned_log.replace('"', ' ')
    return cleaned_log
Now, I want to use the above clean function to clean the dirty data. I am not sure how to use it while pulling the last 24 hours. I want to make it memory-efficient as well as fast.
To answer the "make it memory efficient" part of your question, you can use the re module instead of replace:
import re

for line in lines:
    line = re.sub('[T:-]', '', line)
This will reduce the complexity of your code and can give better performance.
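To connect the two pieces, here is a minimal sketch (an assumption about how you want the output, not something from the original code): wrap the lines in a generator that applies _clean_logs, and use it wherever the extraction code writes whole files, replacing the shutil.copyfileobj shortcut with a line-by-line copy so every line passes through the cleaner:
def cleaned(lines):
    # Lazily clean each raw log line; a generator keeps memory usage flat.
    for raw in lines:
        yield _clean_logs(raw) + '\n'

# In the extraction code, replace each
#     shutil.copyfileobj(f, output)
# with
#     output.writelines(cleaned(f))
# and write the boundary entry as
#     output.write(_clean_logs(line) + '\n')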
Hi, I am new to Python and I am trying to record sound when the loudness is over a certain value (200 in the code).
Initially, I had two different pieces of code: one for recording and one for detecting loudness. I want to merge them together and record the sound when the loudness is over 200.
But this keeps giving me an error, so I am wondering which part I am missing.
I would appreciate it if someone could help me figure this out.
import time
import grovepi
import os
import sys

sys.path.insert(0, '.')
from audio import soundrecord

loudness_sensor = 0

while True:
    try:
        # Read the sound level
        sensor_value = grovepi.analogRead(loudness_sensor)
        print("sensor_value = %d" % sensor_value)
        time.sleep(.5)
        if sensor_value > 200:
            soundrecord()
            time.sleep(10)
    except IOError:
        print("Error")
I defined the recording code below as a soundrecord function and put it in the same directory.
import pyaudio
import wave

def soundrecord():
    form_1 = pyaudio.paInt16  # 16-bit resolution
    chans = 1                 # 1 channel
    samp_rate = 44100         # 44.1kHz sampling rate
    chunk = 4096              # 2^12 samples for buffer
    record_secs = 3           # seconds to record
    dev_index = 2             # device index found by p.get_device_info_by_index(ii)
    wav_output_filename = 'test1.wav'  # name of .wav file

    audio = pyaudio.PyAudio()  # create pyaudio instantiation

    # create pyaudio stream
    stream = audio.open(format=form_1, rate=samp_rate, channels=chans,
                        input_device_index=dev_index, input=True,
                        frames_per_buffer=chunk)
    print("recording")
    frames = []

    # loop through stream and append audio chunks to frame array
    for ii in range(0, int((samp_rate / chunk) * record_secs)):
        data = stream.read(chunk)
        frames.append(data)

    print("finished recording")

    # stop the stream, close it, and terminate the pyaudio instantiation
    stream.stop_stream()
    stream.close()
    audio.terminate()

    # save the audio frames as .wav file
    wavefile = wave.open(wav_output_filename, 'wb')
    wavefile.setnchannels(chans)
    wavefile.setsampwidth(audio.get_sample_size(form_1))
    wavefile.setframerate(samp_rate)
    wavefile.writeframes(b''.join(frames))
    wavefile.close()
Expected: record the sound when the loudness is over 200.
Actual:
sensor_value = 75
sensor_value = 268
Error
sensor_value = 360
Error
sensor_value = 48
sensor_value = 39
sensor_value = 79
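As a first debugging step, here is a sketch (assuming the bare "Error" message is hiding the real failure): print the full traceback in the exception handler so you can see whether the grovepi read or the pyaudio recording is the call that raises:
import traceback

while True:
    try:
        sensor_value = grovepi.analogRead(loudness_sensor)
        print("sensor_value = %d" % sensor_value)
        time.sleep(.5)
        if sensor_value > 200:
            soundrecord()
            time.sleep(10)
    except IOError:
        # Show which call failed and why, instead of a bare "Error".
        traceback.print_exc()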
I'm using a tutorial to create a corpus of PDF files. I have the following code:
import nltk
import PyPDF2
from nltk.corpus.reader.plaintext import PlaintextCorpusReader
from PyPDF2 import PdfFileReader

def getTextPDF(pdfFileName):
    pdf_file = open(pdfFileName, 'rb')
    readpdf = PdfFileReader(pdf_file)
    text = []
    for i in range(0, readpdf.getNumPages()):
        text.append(readpdf.getPage(i).extractText())
    return '\n'.join(text)

corpusDir = 'reports/'
jun15 = getTextPDF('reports/June2015.pdf')
dec15 = getTextPDF('reports/December2015.pdf')
jun16 = getTextPDF('reports/June2016.pdf')
dec16 = getTextPDF('reports/December2016.pdf')
jun17 = getTextPDF('reports/June2017.pdf')
dec17 = getTextPDF('reports/December2017.pdf')

files = [jun15, dec15, jun16, dec16, jun17, dec17]
for idx, f in enumerate(files):
    with open(corpusDir + str(idx) + '.txt', 'w') as output:
        output.write(f)

corpus = PlaintextCorpusReader(corpusDir, '.*')
print(corpus.words())
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 print (corpus.words())

/anaconda3/lib/python3.6/site-packages/nltk/collections.py in __repr__(self)
    224         pieces = []
    225         length = 5
--> 226         for elt in self:
    227             pieces.append(repr(elt))
    228             length += len(pieces[-1]) + 2

/anaconda3/lib/python3.6/site-packages/nltk/corpus/reader/util.py in iterate_from(self, start_tok)
    400
    401             # Get everything we can from this piece.
--> 402             for tok in piece.iterate_from(max(0, start_tok-offset)):
    403                 yield tok
    404

/anaconda3/lib/python3.6/site-packages/nltk/corpus/reader/util.py in iterate_from(self, start_tok)
    294             self._current_toknum = toknum
    295             self._current_blocknum = block_index
--> 296             tokens = self.read_block(self._stream)
    297             assert isinstance(tokens, (tuple, list, AbstractLazySequence)), (
    298                 'block reader %s() should return list or tuple.' %

/anaconda3/lib/python3.6/site-packages/nltk/corpus/reader/plaintext.py in _read_word_block(self, stream)
    120     words = []
    121     for i in range(20): # Read 20 lines at a time.
--> 122         words.extend(self._word_tokenizer.tokenize(stream.readline()))
    123     return words
    124

/anaconda3/lib/python3.6/site-packages/nltk/data.py in readline(self, size)
   1166         while True:
   1167             startpos = self.stream.tell() - len(self.bytebuffer)
-> 1168             new_chars = self._read(readsize)
   1169
   1170             # If we're at a '\r', then read one extra character, since

/anaconda3/lib/python3.6/site-packages/nltk/data.py in _read(self, size)
   1398
   1399         # Decode the bytes into unicode characters
-> 1400         chars, bytes_decoded = self._incr_decode(bytes)
   1401
   1402         # If we got bytes but couldn't decode any, then read further.

/anaconda3/lib/python3.6/site-packages/nltk/data.py in _incr_decode(self, bytes)
   1429         while True:
   1430             try:
-> 1431                 return self.decode(bytes, 'strict')
   1432             except UnicodeDecodeError as exc:
   1433                 # If the exception occurs at the end of the string,

/anaconda3/lib/python3.6/encodings/utf_8.py in decode(input, errors)
     14
     15 def decode(input, errors='strict'):
---> 16     return codecs.utf_8_decode(input, errors, True)
     17
     18 class IncrementalEncoder(codecs.IncrementalEncoder):

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 395: invalid start byte
I've been looking at different posts, but I still can't tell if the problem is that I'm using the wrong methods or that I have to encode or decode something. If it's the latter, I don't know where. Any ideas would be appreciated.
It would be best to see the whole error message, but I'm guessing you are using Python 2 and your reports have some UTF-8 in them. First off, try to specify the encoding at the beginning and when you open your files:
#!/usr/bin/python
#-*- coding:utf-8 -*-
import nltk
import PyPDF2
from nltk.corpus.reader.plaintext import PlaintextCorpusReader
from PyPDF2 import PdfFileReader
import codecs

def getTextPDF(pdfFileName):
    pdf_file = codecs.open(pdfFileName, 'rb', encoding='utf8')
    readpdf = PdfFileReader(pdf_file)
    text = []
    for i in range(0, readpdf.getNumPages()):
        text.append(readpdf.getPage(i).extractText())
    return '\n'.join(text)

corpusDir = 'reports/'
jun15 = getTextPDF('reports/June2015.pdf')
dec15 = getTextPDF('reports/December2015.pdf')
jun16 = getTextPDF('reports/June2016.pdf')
dec16 = getTextPDF('reports/December2016.pdf')
jun17 = getTextPDF('reports/June2017.pdf')
dec17 = getTextPDF('reports/December2017.pdf')

files = [jun15, dec15, jun16, dec16, jun17, dec17]
for idx, f in enumerate(files):
    with codecs.open(corpusDir + str(idx) + '.txt', 'w', encoding='utf8') as output:
        output.write(f)

corpus = PlaintextCorpusReader(corpusDir, '.*')
print(corpus.words())
If that doesn't work, you can try bodging your strings, but it's not ideal:
def toUtf8(stringOrUnicode):
    '''
    Returns the argument in utf-8 encoding.
    Python 2 only: relies on the separate unicode type.
    '''
    typeArg = type(stringOrUnicode)
    if typeArg is unicode:
        return stringOrUnicode.encode('utf8').decode('utf8')
    elif typeArg is str:
        return stringOrUnicode.decode('utf8')
Otherwise, show us the error message so we can try to detect exactly where the problem is.
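One more angle worth checking, offered as a sketch rather than a confirmed fix: the traceback shows NLTK failing while decoding the corpus files, and the '.*' fileid pattern also matches any non-text files sitting in reports/ (for example .DS_Store on macOS), so restricting the pattern and passing an explicit encoding to PlaintextCorpusReader may be enough:
from nltk.corpus.reader.plaintext import PlaintextCorpusReader

corpusDir = 'reports/'
# Match only the generated .txt files and decode them explicitly;
# PlaintextCorpusReader accepts an encoding argument.
corpus = PlaintextCorpusReader(corpusDir, r'.*\.txt', encoding='utf-8')
print(corpus.words())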