ParameterError : data must be floating-point (librosa) - audio

Reference: https://github.com/librosa/librosa/blob/master/examples/LibROSA%20demo.ipynb
Code :
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
S = librosa.feature.melspectrogram(samples, sr=sample_rate, n_mels=128)
log_S = librosa.power_to_db(S, ref=np.max)
plt.figure(figsize=(12,4))
librosa.display.specshow(log_S, sr=sample_rate, x_axis='time', y_axis='mel')
plt.title('mel power spectrogram')
plt.colorbar(format='%+02.0f dB')
plt.tight_layout()
Error I am getting: ParameterError: data must be floating-point

The samples argument passed to the method below does not have the right dtype.
S = librosa.feature.melspectrogram(samples, sr=sample_rate, n_mels=128)
The samples come from wavfile.read:
sample_rate, samples = wavfile.read(str(train_audio_path) + filename)
For integer PCM files, wavfile.read returns the samples as integers (for example int16), while librosa expects floating-point data, hence the error.
Use the following line instead to get the samples in the correct dtype:
samples, sample_rate = librosa.load(str(train_audio_path) + filename)
Note that librosa.load resamples to 22050 Hz by default; pass sr=None to keep the file's native sample rate.
Reference: librosa.github.io
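If you would rather keep wavfile.read, the integer samples can also be converted to floating point by hand. A minimal sketch, assuming signed integer PCM input; the helper name pcm_to_float is hypothetical, and dividing by the dtype's largest magnitude is one common convention, not necessarily identical to what librosa.load does internally:

```python
import numpy as np

def pcm_to_float(samples: np.ndarray) -> np.ndarray:
    """Scale signed integer PCM samples to float32 in [-1.0, 1.0)."""
    if np.issubdtype(samples.dtype, np.floating):
        return samples.astype(np.float32)  # already floating point
    if not np.issubdtype(samples.dtype, np.signedinteger):
        raise TypeError("expected signed integer or floating-point samples")
    # Assumed convention: divide by the largest magnitude the dtype can hold.
    scale = float(-np.iinfo(samples.dtype).min)  # 32768.0 for int16
    return samples.astype(np.float32) / scale

# Stand-in for the array returned by wavfile.read on a 16-bit file.
int_samples = np.array([0, 16384, -32768, 32767], dtype=np.int16)
float_samples = pcm_to_float(int_samples)
print(float_samples.dtype, float_samples.min(), float_samples.max())
```

The resulting array can then be passed to librosa together with the sample rate returned by wavfile.read.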

Related

What is the output of predict.coxph() using type = "survival"?

I am trying to learn what the various outputs of predict.coxph() mean. I am currently attempting to fit a Cox model on a training set and then use the resulting coefficients to make predictions on a test set (a new set of data).
I see from the predict.coxph() help page that I could use type = "survival" to extract an individual's survival probability, which is equal to exp(-expected).
Here is a code block of what I have attempted so far, using the ISLR2 BrainCancer data.
set.seed(123)
n.training = round(nrow(BrainCancer) * 0.70) # 70:30 split
idx = sample(1:nrow(BrainCancer), size = n.training)
d.training = BrainCancer[idx, ]
d.test = BrainCancer[-idx, ]
# fit a model using the training set
fit = coxph(Surv(time, status) ~ sex + diagnosis + loc + ki + gtv + stereo, data = d.training)
# get predicted survival probabilities for the test set
pred = predict(fit, type = "survival", newdata = d.test)
The predictions generated:
predict(fit, type = "survival", newdata = d.test)
[1] 0.9828659 0.8381164 0.9564982 0.2271862 0.2883800 0.9883625 0.9480138 0.9917512 1.0000000 0.9974775 0.7703657 0.9252100 0.9975044 0.9326234 0.8718161 0.9850815 0.9545622 0.4381646 0.8236644
[20] 0.2455676 0.7289031 0.9063336 0.9126897 0.9988625 0.4399697 0.9360874
Are these survival probabilities associated with a specific time point? From the help page, it sounds like these are survival probabilities at the follow-up times in the newdata argument. Is this correct?
Additional questions:
How is the baseline hazard estimated in predict.coxph? Is it using the Breslow estimator?
If type = "expected" is used, are these values the cumulative hazard? If yes, what are the relevant time points for these?
Thank you!
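The relationship quoted from the help page, survival = exp(-expected), can be written out as a short sketch. Python is used here only for illustration, and the expected values below are made up rather than taken from the model above:

```python
import math

def survival_from_expected(expected: float) -> float:
    """Survival probability implied by an expected number of events."""
    if expected < 0:
        raise ValueError("expected number of events cannot be negative")
    return math.exp(-expected)

# Made-up expected values for three hypothetical subjects.
for expected in (0.0, 0.1, 1.5):
    print(expected, round(survival_from_expected(expected), 4))
```

Under this relationship, a predicted survival of 1 corresponds to an expected value of (approximately) 0.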

Importation of method EF 3.0 - trouble with results

I wrote a script to import the characterization factors of the LCIA method EF 3.0 (adapted) into Brightway. I think it works, since I see the right characterization factors in the Activity Browser (for example, for the Climate Change method). But when I run calculations with the method, the results are not the same as in SimaPro (where I got the CSV import file from). For instance, the result is 0 for the Climate Change method. Do you know what the issue could be?
It seems that the units are different, but that is also the case for the other methods available in Brightway.
Besides, I saw in another question that a way to import the EF 3.0 method was going to be implemented. Is it available yet?
Thank you very much for your help.
Code of the import script:
import brightway2 as bw
import csv
import uuid
from bw2data import mapping
from bw2data.utils import recursive_str_to_unicode
class import_method_EF:
    '''Class for importing the EF method from a SimaPro export CSV file into Brightway.'''
    def __init__(
        self,
        project_name,
        name_file,
    ):
        self.project_name = project_name
        self.name_file = name_file
        self.db_biosphere = bw.Database('biosphere3')
        # Dictionary mapping SimaPro categories to ecoinvent categories
        self.dict_categories = {'high. pop.': 'urban air close to ground',
                                'low. pop.': 'low population density, long-term',
                                'river': 'surface water',
                                'in water': 'in water',
                                '(unspecified)': '',
                                'ocean': 'ocean',
                                'indoor': 'indoor',
                                'stratosphere + troposphere': 'lower stratosphere + upper troposphere',
                                'low. pop., long-term': 'low population density, long-term',
                                'groundwater, long-term': 'ground-, long-term',
                                'agricultural': 'agricultural',
                                'industrial': 'industrial',
                                }
        # Dictionary mapping unit abbreviations to ecoinvent unit names
        self.dict_units = {'kg': 'kilogram',
                           'kWh': 'kilowatt hour',
                           'MJ': 'megajoule',
                           'p': 'p',
                           'unit': 'unit',
                           'km': 'kilometer',
                           'my': 'meter-year',
                           'tkm': 'ton kilometer',
                           'm3': 'cubic meter',
                           'm2': 'square meter',
                           'kBq': 'kilo Becquerel',
                           'm2a': 'm2a',  # to be changed
                           }
    def importation(self):
        """
        Imports the method from the SimaPro CSV file into Brightway.
        """
        # Set the current project
        bw.projects.set_current(self.project_name)
        self.data = self.open_CSV(self.name_file, [])
        list_methods = []
        new_flows = []
        for i in range(len(self.data)):
            #print(self.data[i])
            if self.data[i] == ['Name']:
                name_method = self.data[i+1][0]
            if self.data[i] == ['Impact category']:
                list_flows = []
                j = 4
                while len(self.data[i+j]) > 1:
                    biosphere_code = self.get_biosphere_code(self.data[i+j][2], self.data[i+j][1], self.data[i+j][0].lower())
                    if biosphere_code == 0:
                        if self.find_if_already_new_flow(i+j, new_flows)[0]:
                            code = self.find_if_already_new_flow(i+j, new_flows)[1]
                            list_flows.append((('biosphere3', code), float(self.data[i+j][4].replace(',', '.'))))
                        else:
                            code = str(uuid.uuid4())
                            while (self.db_biosphere.name, code) in mapping:
                                code = str(uuid.uuid4())
                            new_flows.append({'amount': float(self.data[i+j][4].replace(',', '.')),
                                              'CAS number': self.data[i+j][3],
                                              'categories': (self.data[i+j][0].lower(), self.dict_categories[self.data[i+j][1]]),
                                              'name': self.data[i+j][2],
                                              'unit': self.dict_units[self.data[i+j][5]],
                                              'type': 'biosphere',
                                              'code': code})
                            list_flows.append((('biosphere3', code), float(self.data[i+j][4].replace(',', '.'))))
                    else:
                        list_flows.append((('biosphere3', biosphere_code), float(self.data[i+j][4].replace(',', '.'))))
                    j += 1
                list_methods.append({'name': self.data[i+1][0],
                                     'unit': self.data[i+1][1],
                                     'flows': list_flows})
        new_flows = recursive_str_to_unicode(dict([self._format_flow(flow) for flow in new_flows]))
        if new_flows:
            print('new flows :', len(new_flows))
            self.new_flows = new_flows
            biosphere = bw.Database(self.db_biosphere.name)
            biosphere_data = biosphere.load()
            biosphere_data.update(new_flows)
            biosphere.write(biosphere_data)
            print('biosphere_data :', len(biosphere_data))
        for i in range(len(list_methods)):
            method = bw.Method((name_method, list_methods[i]['name']))
            method.register(**{'unit': list_methods[i]['unit'],
                               'description': ''})
            method.write(list_methods[i]['flows'])
            print(method.metadata)
            method.load()
    def open_CSV(self, CSV_file_name, list_rows):
        '''
        Opens a CSV file and gets a list of the rows.
        : param : CSV_file_name = str, name of the CSV file (must be in the working directory)
        : param : list_rows = list, list to get the rows
        : return : list_rows = list, list of the rows
        '''
        # Open the CSV file and read it
        with open(CSV_file_name, 'rt') as csvfile:
            data = csv.reader(csvfile, delimiter=';')
            # Write every row in the list
            for row in data:
                list_rows.append(row)
        return list_rows
    def get_biosphere_code(self, simapro_name, simapro_cat, type_biosphere):
        """
        Gets the Brightway code of a biosphere process given in a SimaPro format.
        : param : simapro_name = str, name of the biosphere process in a SimaPro format.
        : param : simapro_cat = str, category of the biosphere process (ex : high. pop., river, etc)
        : param : type_biosphere = str, type of the biosphere process (ex : Emissions to water, etc)
        : return : 0 if the process is not found in biosphere, the code otherwise
        """
        if 'GLO' in simapro_name or 'RER' in simapro_name:
            simapro_name = simapro_name[:-5]
        if '/m3' in simapro_name:
            simapro_name = simapro_name[:-3]
        # Search in the biosphere database, depending on the category
        if simapro_cat == '':
            act_biosphere = self.db_biosphere.search(simapro_name, filter={'categories': (type_biosphere,)})
        else:
            act_biosphere = self.db_biosphere.search(simapro_name, filter={'categories': (type_biosphere, self.dict_categories[simapro_cat])})
        # Why did I do this? ...
        for act in act_biosphere:
            if simapro_cat == '':
                if act['categories'] == (type_biosphere, ):
                    return act['code']
            else:
                if act['categories'] == (type_biosphere, self.dict_categories[simapro_cat]):
                    return act['code']
        return 0
    def _format_flow(self, cf):
        # TODO
        return (self.db_biosphere.name, cf['code']), {
            'exchanges': [],
            'categories': cf['categories'],
            'name': cf['name'],
            'type': ("resource" if cf["categories"][0] == "resource"
                     else "emission"),
            'unit': cf['unit'],
        }
    def find_if_already_new_flow(self, n, new_flows):
        """
        """
        for k in range(len(new_flows)):
            if new_flows[k]['name'] == self.data[n][2]:
                return True, new_flows[k]['code']
        return False, 0
Edit: I made a modification in the get_biosphere_code method and it works better (it was not finding some biosphere flows), but I still see important differences between the results I get in Brightway and those I get in SimaPro. My investigations led me to the following observations:
There are some differences in the ecoinvent activities, and especially in the lists of biosphere flows, which should be a source of differences in the results. Some flows are missing in Brightway, and also in the ecoSpold data used for the import, compared to the data in SimaPro.
It seems that the LCA calculation does not treat subcategories the same way. For example, the biosphere flow Carbon dioxide, fossil (air,) is in the list of characterization factors for the Climate Change method, and the inventory in the SimaPro LCA results shows that all Carbon dioxide, fossil flows to air contribute to the Climate Change impact, whatever their subcategory. Brightway does not work this way: it only takes into account flows that match exactly, which leads to important differences in the results.
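The second observation can be illustrated with a small sketch in plain Python. It is not Brightway or SimaPro code, and the flow keys, amounts and factors are hypothetical. It contrasts exact matching on (name, categories) with a fallback in which a factor defined for a compartment without subcategory is applied to every subcategory of that compartment:

```python
from typing import Callable, Dict, Optional, Tuple

FlowKey = Tuple[str, Tuple[str, ...]]  # (flow name, categories)

# Hypothetical characterization factors: defined only for ("air",).
cfs: Dict[FlowKey, float] = {
    ("Carbon dioxide, fossil", ("air",)): 1.0,
}

# Hypothetical inventory amounts in kg, including subcategories.
inventory: Dict[FlowKey, float] = {
    ("Carbon dioxide, fossil", ("air",)): 2.0,
    ("Carbon dioxide, fossil", ("air", "urban air close to ground")): 3.0,
    ("Carbon dioxide, fossil", ("air", "low population density, long-term")): 5.0,
}

def exact_cf(key: FlowKey) -> Optional[float]:
    """Only flows with identical name and categories are characterized."""
    return cfs.get(key)

def fallback_cf(key: FlowKey) -> Optional[float]:
    """Fall back to the factor of the top-level compartment, if any."""
    name, categories = key
    if key in cfs:
        return cfs[key]
    return cfs.get((name, categories[:1]))

def score(lookup: Callable[[FlowKey], Optional[float]]) -> float:
    """Sum amount * factor over all inventory flows that have a factor."""
    total = 0.0
    for key, amount in inventory.items():
        cf = lookup(key)
        if cf is not None:
            total += cf * amount
    return total

print("exact match:", score(exact_cf))
print("with fallback:", score(fallback_cf))
```

If the goal is to reproduce the SimaPro behaviour, one option would be to write the factor explicitly for each subcategory flow when building the list of (flow, factor) pairs in the import script.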
In LCA there's no agreement on elementary flows and archetypical emission scenarios / context (https://doi.org/10.1007/s11367-017-1354-3), and implementations of the impact assessment methods differ (https://www.lifecycleinitiative.org/portfolio_category/lcia/).
It is not unusual for the same activity and the same impact assessment method to return different results in different software. There are some attempts to improve current practice (see e.g. https://github.com/USEPA/LCIAformatter).

Doing feature generation in serving_input_fn for Tensorflow model

I've been playing around with BERT and TensorFlow following the example here and have a trained working model.
I then wanted to save and deploy the model, so used the export_saved_model function, which requires you build a serving_input_fn to handle any incoming requests when the model is reloaded.
I wanted to be able to pass a single string for sentiment analysis to the deployed model, rather than having a theoretical client side application do the tokenisation and feature generation etc, so tried to write an input function that would handle that and pass the constructed features to the model. Is this possible? I wrote the following which I feel should do what I want:
import json
import base64
def plain_text_serving_input_fn():
    input_string = tf.placeholder(dtype=tf.string, shape=None, name='input_string_text')
    # What format to expect input in.
    receiver_tensors = {'input_text': input_string}
    input_examples = [run_classifier.InputExample(guid="", text_a = str(input_string), text_b = None, label = 0)] # here, "" is just a dummy label
    input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
    variables = {}
    for i in input_features:
        variables["input_ids"] = i.input_ids
        variables["input_mask"] = i.input_mask
        variables["segment_ids"] = i.segment_ids
        variables["label_id"] = i.label_id
    feature_spec = {
        "input_ids" : tf.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
        "input_mask" : tf.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
        "segment_ids" : tf.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
        "label_ids" : tf.FixedLenFeature([], tf.int64)
    }
    string_variables = json.dumps(variables)
    encode_input = base64.b64encode(string_variables.encode('utf-8'))
    encode_string = base64.decodestring(encode_input)
    features_to_input = tf.parse_example([encode_string], feature_spec)
    return tf.estimator.export.ServingInputReceiver(features_to_input, receiver_tensors)
I would expect that this would allow me to call predict on my deployed model with
variables = {"input_text" : "This is some test input"}
predictor.predict(variables)
I've tried a range of variations of this (putting it in an array, converting to base 64 etc), but I get a range of errors either telling me
"error": "Failed to process element: 0 of 'instances' list. Error: Invalid argument: JSON Value: {\n \"input_text\": \"This is some test input\"\n} not formatted correctly for base64 data" }"
or
Object of type 'bytes' is not JSON serializable
I suspect I'm formatting my requests incorrectly, but I also can't find any examples of something similar being done in a serving_input_fn, so has anyone ever done something similar?
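The second error comes from Python itself: base64.b64encode returns bytes, and json.dumps cannot serialize bytes until they are decoded to str. A minimal sketch of building a request body by hand, assuming the TensorFlow Serving REST convention of an "instances" list and a {"b64": ...} wrapper for binary values. The input key input_text is taken from the function above; the helper name build_request is hypothetical, and whether the deployed signature expects plain or base64-wrapped strings is an assumption to check against the exported model:

```python
import base64
import json

def build_request(text: str, as_b64: bool = False) -> str:
    """Return a JSON request body holding a single text instance."""
    if as_b64:
        encoded = base64.b64encode(text.encode("utf-8"))  # bytes
        value = {"b64": encoded.decode("ascii")}          # str, JSON-safe
    else:
        value = text
    return json.dumps({"instances": [{"input_text": value}]})

print(build_request("This is some test input"))
print(build_request("This is some test input", as_b64=True))
```

Separately, note that str(input_string) applied to a placeholder yields a description of the tensor rather than the text of the request, so tokenization written in ordinary Python inside the serving function never sees the incoming string.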

load .npy file from google cloud storage with tensorflow

I'm trying to load .npy files from Google Cloud Storage into my model. I followed the example in "Load numpy array in google-cloud-ml job",
but I get this error:
'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte
Can you help me, please?
Here is a sample of the code.
Here I read the file:
with file_io.FileIO(metadata_filename, 'r') as f:
    self._metadata = [line.strip().split('|') for line in f]
And here I start processing it:
if self._offset >= len(self._metadata):
    self._offset = 0
    random.shuffle(self._metadata)
meta = self._metadata[self._offset]
self._offset += 1
text = meta[3]
if self._cmudict and random.random() < _p_cmudict:
    text = ' '.join([self._maybe_get_arpabet(word) for word in text.split(' ')])
input_data = np.asarray(text_to_sequence(text, self._cleaner_names), dtype=np.int32)
f = StringIO(file_io.read_file_to_string(
    os.path.join('gs://path', meta[0])))
linear_target = tf.Variable(initial_value=np.load(f), name='linear_target')
s = StringIO(file_io.read_file_to_string(
    os.path.join('gs://path', meta[1])))
mel_target = tf.Variable(initial_value=np.load(s), name='mel_target')
return (input_data, mel_target, linear_target, len(linear_target))
And this is a sample of the data: [data sample not shown]
This is likely because your file does not contain UTF-8 encoded text.
You may need to open the file_io.FileIO instance in binary mode using mode='rb', or set binary_mode=True in the call to read_file_to_string.
The data is then returned as a sequence of bytes rather than as a string.
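For context, byte 0x93 is the first byte of the .npy magic string, which is why decoding the file as UTF-8 fails at position 0. A minimal sketch of the binary route, using an in-memory buffer as a stand-in for the bytes read from Cloud Storage. The variable raw_bytes is hypothetical; in the question it would come from file_io.read_file_to_string(..., binary_mode=True):

```python
import io
import numpy as np

# Stand-in for the bytes of a .npy file stored in a bucket.
buffer = io.BytesIO()
np.save(buffer, np.arange(6, dtype=np.float32).reshape(2, 3))
raw_bytes = buffer.getvalue()

print(raw_bytes[:6])  # the .npy magic string, starting with byte 0x93

# Decoding the bytes as text fails in the same way as in the question.
try:
    raw_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("decode failed:", err)

# Wrapping the bytes in BytesIO (not StringIO) lets np.load parse them.
array = np.load(io.BytesIO(raw_bytes))
print(array.shape, array.dtype)
```

In the question's code this would mean replacing StringIO with io.BytesIO once the file is read as bytes.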

is it possible to get exactly the same results from tensorflow mfcc and librosa mfcc?

I'm trying to make the TensorFlow MFCC give me the same results as the Python librosa MFCC.
I have tried to match all the default parameters used by librosa
in my TensorFlow code, but I get a different result.
This is the TensorFlow code that I used:
sample_rate = 16000  # moved up: it is used by decode_wav below
waveform = contrib_audio.decode_wav(
    audio_binary,
    desired_channels=1,
    desired_samples=sample_rate,
    name='decoded_sample_data')
transwav = tf.transpose(waveform[0])
stfts = tf.contrib.signal.stft(transwav,
                               frame_length=2048,
                               frame_step=512,
                               fft_length=2048,
                               window_fn=functools.partial(tf.contrib.signal.hann_window,
                                                           periodic=False),
                               pad_end=True)
spectrograms = tf.abs(stfts)
num_spectrogram_bins = stfts.shape[-1].value
lower_edge_hertz, upper_edge_hertz, num_mel_bins = 0.0, 8000.0, 128
linear_to_mel_weight_matrix = tf.contrib.signal.linear_to_mel_weight_matrix(
    num_mel_bins, num_spectrogram_bins, sample_rate, lower_edge_hertz,
    upper_edge_hertz)
mel_spectrograms = tf.tensordot(
    spectrograms,
    linear_to_mel_weight_matrix, 1)
mel_spectrograms.set_shape(spectrograms.shape[:-1].concatenate(
    linear_to_mel_weight_matrix.shape[-1:]))
log_mel_spectrograms = tf.log(mel_spectrograms + 1e-6)
mfccs = tf.contrib.signal.mfccs_from_log_mel_spectrograms(
    log_mel_spectrograms)[..., :20]
the equivalent in librosa:
libr_mfcc = librosa.feature.mfcc(wav, 16000)
the following are the graphs of the results:
I'm the author of tf.signal. Sorry for not seeing this post sooner, but you can get librosa and tf.signal.stft to match if you center-pad the signal before passing it to tf.signal.stft. See this GitHub issue for more details.
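The center-padding idea can be sketched with NumPy alone. This is an illustration under assumptions, not the code from the GitHub issue: it assumes reflect padding of n_fft // 2 samples on each side (what librosa's center=True has used by default in many versions; newer releases may default to a different pad mode), and the helper names are hypothetical:

```python
import numpy as np

def center_pad(signal: np.ndarray, n_fft: int, mode: str = "reflect") -> np.ndarray:
    """Pad n_fft // 2 samples on both sides so that frames are centered."""
    return np.pad(signal, n_fft // 2, mode=mode)

def num_frames(num_samples: int, frame_length: int, frame_step: int) -> int:
    """Number of full frames when nothing is padded at the end."""
    return 1 + (num_samples - frame_length) // frame_step

# One second of noise at 16 kHz as a stand-in for the decoded waveform.
signal = np.random.default_rng(0).standard_normal(16000).astype(np.float32)
padded = center_pad(signal, n_fft=2048)

print(len(padded))                           # original length plus 2 * 1024
print(num_frames(len(signal), 2048, 512))    # frame count without centering
print(num_frames(len(padded), 2048, 512))    # frame count with centering
```

The padded signal would then be passed to the STFT, presumably with pad_end=False, so that both libraries frame the same samples.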
I spent a whole day trying to make them match. Even rryan's solution didn't work for me (center=False in librosa), but I finally found out that the TF and librosa STFTs match only when win_length == n_fft in librosa and frame_length == fft_length in TF. That's why rryan's Colab example works. If you set frame_length != fft_length, the amplitudes are very different (although visually, after plotting, the patterns look similar). A typical example: if you choose some win_length/frame_length and then set n_fft/fft_length to the smallest power of 2 greater than it, the results will differ. So you need to stick with the inefficient FFT size given by your window size. I don't know why this is so, but that's how it is; hopefully it will be helpful for someone.
The output of contrib_audio.decode_wav should be a DecodeWav with { audio, sample_rate }, and the audio shape is (sample_rate, 1), so what is the purpose of taking the first item of waveform and transposing it?
transwav = tf.transpose(waveform[0])
There is no straightforward way, since librosa's stft uses center=True, which does not match the TF stft.
Had it been center=False, the TF and librosa STFTs would give near enough results. See the Colab snippet.
Even so, trying to port the librosa code to TF is a big headache. Here is what I started before giving up. Near, but not near enough.
def pow2db_tf(X):
    amin = 1e-10
    top_db = 80.0
    ref_value = 1.0
    log10 = 2.302585092994046
    log_spec = (10.0/log10) * tf.log(tf.maximum(amin, X))
    log_spec -= (10.0/log10) * tf.log(tf.maximum(amin, ref_value))
    pow2db = tf.maximum(log_spec, tf.reduce_max(log_spec) - top_db)
    return pow2db
# hop_length was undefined in the original; 512 (librosa's default) is assumed here
def librosa_feature_like_tf(x, sr=16000, n_fft=2048, n_mfcc=20, hop_length=512):
    mel_basis = librosa.filters.mel(sr, n_fft).astype(np.float32)
    mel_basis = mel_basis.reshape(1, int(n_fft/2+1), -1)
    tf_stft = tf.contrib.signal.stft(x, frame_length=n_fft, frame_step=hop_length, fft_length=n_fft)
    print("tf_stft", tf_stft.shape)
    tf_S = tf.matmul(tf.abs(tf_stft), mel_basis)
    print("tf_S", tf_S.shape)
    tfdct = tf.spectral.dct(pow2db_tf(tf_S), norm='ortho')
    print("tfdct before cut", tfdct.shape)
    tfdct = tfdct[:, :, :n_mfcc]
    print("tfdct after cut", tfdct.shape)
    #tfdct = tf.transpose(tfdct,[0,2,1]);print ("tfdct after transpose", tfdct.shape)
    return tfdct
x = tf.placeholder(tf.float32, shape=[None, 16000], name='x')
tf_feature = librosa_feature_like_tf(x)
print("tf_feature", tf_feature.shape)
mfcc_rosa = librosa.feature.mfcc(wav, sr).T
print("mfcc_rosa", mfcc_rosa.shape)
For anyone still looking for this: I had a similar problem some time ago: Matching librosa's mel filterbanks/mel spectrogram to a tensorflow implementation. The solution was to use a different windowing approach for the spectrogram and librosa's mel matrix as constant tensor. See here and here.

Resources