Broken / combined colormaps - colors

I'm plotting a 2D scalar field with imshow, and I'd like to clearly contrast negative values with positive ones. Is there a way to implement a colormap composed of two others (e.g. instead of jet, hot for positive and cool for negative)?

You can read the colors from the existing colormaps and simply concatenate them. That's fairly simple but has a few drawbacks. If the original colormaps have a different number of colors, the 'edge' between them will not be centered.
If they do have the same number, the resulting cmap will be symmetric, but the 'edge' will only be at zero if the color limits are symmetric about zero, e.g. -2 & 2 or -4 & 4.
This can be done like this:
import matplotlib.colors
import matplotlib.pyplot as plt
import numpy as np
cool = plt.cm.cool
hot = plt.cm.hot
# read every color out of each colormap
cool_vals = [cool(i) for i in range(cool.N)]
hot_vals = [hot(i) for i in range(hot.N)]
# cool covers the lower half, hot the upper half
comb_vals = cool_vals + hot_vals
new_cmap = matplotlib.colors.ListedColormap(comb_vals)
# the data runs from -199 to 200, so the edge sits close to zero
plt.imshow(np.arange(20*20).reshape(20,20)-199., interpolation='none', cmap=new_cmap)
plt.colorbar()
I'm not aware of any fancier methods in Matplotlib. There is a brand new Python module, 'TrollImage', which has a really nice implementation for working with colormaps. It's aimed at satellite images, but the colormap part of course applies to any kind of image.
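To keep the edge at zero even when the color limits are not symmetric, one option is to sample the same number of colors from each map and pair the result with a norm centered on zero. This is a minimal sketch, assuming Matplotlib 3.2 or newer for TwoSlopeNorm; the helper name combine_cmaps and the sample count n are illustrative, not part of Matplotlib:

```python
import numpy as np
from matplotlib import cm
from matplotlib.colors import ListedColormap, TwoSlopeNorm


def combine_cmaps(lower, upper, n=128):
    """Stack n colors sampled from `lower` under n colors sampled from `upper`."""
    samples = np.linspace(0.0, 1.0, n)
    # Calling a colormap with floats in [0, 1] returns an (n, 4) RGBA array.
    colors = np.vstack([lower(samples), upper(samples)])
    return ListedColormap(colors, name="cool_hot")


combined = combine_cmaps(cm.cool, cm.hot)

# Deliberately asymmetric data: values run from -99 to 300.
data = np.arange(20 * 20).reshape(20, 20) - 99.0

# TwoSlopeNorm maps vmin -> 0, vcenter -> 0.5 and vmax -> 1, so zero lands
# on the boundary between the two halves of the combined colormap.
norm = TwoSlopeNorm(vmin=data.min(), vcenter=0.0, vmax=data.max())
```

Passing both to the plot, e.g. plt.imshow(data, cmap=combined, norm=norm) followed by plt.colorbar(), should then show cool colors for negative values and hot colors for positive ones.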

Detecting duplicate audio files

I have snippets of audio that are almost the same that I want to group together (samples 5 and 3 below). There are other portions that are similar, but differ (3 and 4, there is a double drum hit at the end for 3) and completely different ones (sample 8).
How can I group together samples that are almost the same? I tried taking the difference (attempting to minimize it), but that does not work since they are not aligned. I also tried to take audio features like pitch distribution, but since the sounds are similar in pitch those don't get separated well.
The files are available here: https://drive.google.com/drive/folders/14UQQDfIBUNRO_1Pv8bkPf9noi86M7lKd
Here's something that appears to work for your data but may (and likely does) have weaknesses with other data or other kinds of data. Maybe it will be helpful nonetheless.
The basic idea is to compute the MFCCs of each sample to get feature vectors and then find a distance (here plain Euclidean distance) between those feature vectors. The assumption, which seems to hold for your data, is that the least similar samples will have the largest distance and the most similar ones the smallest. Here's the code:
import librosa
import scipy.spatial.distance
import matplotlib.pyplot as plt
sample3, rate = librosa.load('sample3.wav', sr=None)
sample4, rate = librosa.load('sample4.wav', sr=None)
sample5, rate = librosa.load('sample5.wav', sr=None)
sample8, rate = librosa.load('sample8.wav', sr=None)
# cut the longer sounds to same length as the shortest
len5 = len(sample5)
sample3 = sample3[:len5]
sample4 = sample4[:len5]
sample8 = sample8[:len5]
mf3 = librosa.feature.mfcc(y=sample3, sr=rate)
mf4 = librosa.feature.mfcc(y=sample4, sr=rate)
mf5 = librosa.feature.mfcc(y=sample5, sr=rate)
mf8 = librosa.feature.mfcc(y=sample8, sr=rate)
# mean over axis 0 (the MFCC coefficients), leaving one value per frame. dubious?
amf3 = mf3.mean(axis=0)
amf4 = mf4.mean(axis=0)
amf5 = mf5.mean(axis=0)
amf8 = mf8.mean(axis=0)
f_list = [amf3, amf4, amf5, amf8]
results = []
for i, features_a in enumerate(f_list):
    results.append([])
    for features_b in f_list:
        result = scipy.spatial.distance.euclidean(features_a,
                                                  features_b)
        results[i].append(result)
plt.ion()
fig, ax = plt.subplots()
ax.imshow(results, cmap='gray_r', interpolation='nearest')
spots = [0, 1, 2, 3]
labels = ['s3', 's4', 's5', 's8']
ax.set_xticks(spots)
ax.set_xticklabels(labels)
ax.set_yticks(spots)
ax.set_yticklabels(labels)
The code plots a heatmap of the distances between all the samples. The code is lazy: it computes both halves of the matrix, which are mirror images across the diagonal, as well as the diagonal itself (which should be zero distance). Those serve as sanity checks, though, since it is nice to see white down the diagonal and a symmetric matrix.
The real information is that clip 8 is black against all the other clips (i.e. furthest from them) and clip 3 and clip 5 are the least distant from one another.
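If the redundant computations are not wanted, SciPy can build the same kind of matrix while evaluating each pair only once. This is a small sketch on stand-in data: the random features array is an assumption that takes the place of the real per-clip feature vectors.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Stand-in feature vectors, one row per clip (random numbers, not real MFCCs).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 32))

# pdist evaluates each unordered pair once; squareform expands the condensed
# result into the full symmetric matrix with zeros on the diagonal.
distances = squareform(pdist(features, metric="euclidean"))
```

With the code above, np.vstack(f_list) would play the role of features, and distances could be passed to ax.imshow in place of results.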
This basic idea could be done with a feature vector generated in a different sort of way (e.g. instead of MFCCs, you could use the embeddings from something like YAMNet) or with a different way of finding a distance between the feature vectors.
For the grouping part of what you want to do, you could experimentally work out a threshold on the distance metric below which you would consider a clip to be in the same group as another. With more clips, you could compute all these distances and then hand that distance matrix over to a clustering algorithm (like HDBSCAN) to cluster the clips.
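The threshold idea can be sketched with SciPy's hierarchical clustering, used here as a stand-in for HDBSCAN. The distance values below are made up for illustration (they are not measured from the clips), and the helper name group_by_distance and the threshold of 2.0 are assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def group_by_distance(dist_matrix, threshold):
    """Label clips so that clusters are cut where the linkage distance exceeds threshold."""
    dist = np.asarray(dist_matrix, dtype=float)
    condensed = squareform(dist, checks=False)   # square matrix -> condensed form
    tree = linkage(condensed, method="average")  # average-linkage hierarchy
    return fcluster(tree, t=threshold, criterion="distance")


# Hypothetical distances for s3, s4, s5, s8 (illustrative numbers only).
example = np.array([
    [0.0, 6.0, 1.0, 20.0],
    [6.0, 0.0, 5.5, 19.0],
    [1.0, 5.5, 0.0, 21.0],
    [20.0, 19.0, 21.0, 0.0],
])
labels = group_by_distance(example, threshold=2.0)
```

With these numbers, s3 and s5 receive the same label while s4 and s8 each end up alone. A suitable threshold for real clips would have to be found experimentally.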

Unable to see numbers on y axis for a barplot with seaborn [duplicate]

I've been trying to suppress scientific notation in pyplot for a few hours now. After trying multiple solutions without success, I would like some help.
plt.plot(range(2003,2012,1),range(200300,201200,100))
# several solutions from other questions have not worked, including
# plt.ticklabel_format(style='sci', axis='x', scilimits=(-1000000,1000000))
# ax.get_xaxis().get_major_formatter().set_useOffset(False)
plt.show()
The question "Is ticklabel_format broken?" does not resolve the issue of actually removing the offset:
plt.plot(np.arange(1e6, 3 * 1e7, 1e6))
plt.ticklabel_format(useOffset=False)
In your case, you actually want to disable the offset. Using scientific notation is a separate setting from showing things in terms of an offset value.
However, ax.ticklabel_format(useOffset=False) should have worked (though you've listed it as one of the things that didn't).
For example:
fig, ax = plt.subplots()
ax.plot(range(2003,2012,1),range(200300,201200,100))
ax.ticklabel_format(useOffset=False)
plt.show()
If you want to disable both the offset and scientific notation, you'd use ax.ticklabel_format(useOffset=False, style='plain').
Difference between "offset" and "scientific notation"
In matplotlib axis formatting, "scientific notation" refers to a multiplier for the numbers shown, while the "offset" is a separate term that is added.
Consider this example:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(1000, 1001, 100)
y = np.linspace(1e-9, 1e9, 100)
fig, ax = plt.subplots()
ax.plot(x, y)
plt.show()
The x-axis will have an offset (note the + sign) and the y-axis will use scientific notation (as a multiplier, with no plus sign).
We can disable either one separately. The most convenient way is the ax.ticklabel_format method (or plt.ticklabel_format).
For example, if we call:
ax.ticklabel_format(style='plain')
We'll disable the scientific notation on the y-axis:
And if we call
ax.ticklabel_format(useOffset=False)
We'll disable the offset on the x-axis, but leave the y-axis scientific notation untouched:
Finally, we can disable both through:
ax.ticklabel_format(useOffset=False, style='plain')
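One way to check the effect without looking at the figure is to draw it and read back the offset text of each axis. This is a sketch only; the Agg backend is an assumption made so that it runs without a display, and the data is the example from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(range(2003, 2012), range(200300, 201200, 100))
ax.ticklabel_format(useOffset=False, style="plain")

# Tick labels and the offset text are only computed when the figure is drawn.
fig.canvas.draw()

x_offset = ax.xaxis.get_offset_text().get_text()
y_offset = ax.yaxis.get_offset_text().get_text()
x_labels = [tick.get_text() for tick in ax.get_xticklabels()]
y_labels = [tick.get_text() for tick in ax.get_yticklabels()]
```

Both offset strings should come back empty, and the tick labels should show the full numbers.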

Is it possible to describe with 1 parameter when a wave is sinusoidal or square in Python?

I am using scipy and I managed to filter the data with the fft package by cutting the high frequencies. That is only useful for transforming the data, though. Instead, I want to get just 1 parameter out of the analysis.
Let's have a look at some simple code to explain what I mean:
from scipy import fftpack
import numpy as np
import pandas as pd
from scipy import signal
t = np.linspace(0, 2*np.pi, 100, endpoint=True)
sq1 = signal.square(np.pi*t)
sin1 = np.sin(np.pi*t)
fft_sq1 = fftpack.dct(sq1,norm="ortho")
fft_sin1 = fftpack.dct(sin1, norm="ortho")
After applying the discrete cosine transform (fftpack.dct), I get fft_sq1 and fft_sin1, which are arrays 100 elements long. By manipulating those coefficients I can later use fftpack.idct() to obtain a curve that does not contain noise.
The problem is that I get too many frequencies: 100 coefficients that I have to filter before reconstructing the curve.
Instead, I am interested in a filter that returns just 1 value:
0 if the curve is completely square
1 if the curve is exactly like a sinusoid
Does anything come to mind?
Obviously there are infinitely many curves in between: the flatter the periodic signal, the closer the number should be to 0, and the rounder the curve, the closer to 1.
Thanks!!
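One possible way to get such a number (a sketch, not taken from the original thread) is to measure how much of the signal's power sits in its fundamental frequency. A pure sinusoid has all of it there, while an ideal square wave has a fraction of 8/π². The function name sine_likeness and the linear rescaling to the range 0 to 1 are assumptions, and the sketch presumes a non-constant signal covering a whole number of periods:

```python
import numpy as np

# Share of the non-DC power that an ideal square wave carries in its fundamental.
SQUARE_FRACTION = 8 / np.pi**2


def sine_likeness(signal):
    """Return roughly 1.0 for a pure sinusoid and roughly 0.0 for a square wave."""
    x = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.fft(x - x.mean())) ** 2
    # Strongest positive-frequency bin, taken to be the fundamental.
    fundamental = power[1 : len(x) // 2].max()
    # The factor 2 accounts for the mirrored negative-frequency bin.
    fraction = 2 * fundamental / power.sum()
    score = (fraction - SQUARE_FRACTION) / (1 - SQUARE_FRACTION)
    return float(np.clip(score, 0.0, 1.0))


k = np.arange(1000)                         # 10 full periods of 100 samples each
sine = np.sin(2 * np.pi * k / 100)
square = np.where(k % 100 < 50, 1.0, -1.0)
```

Calling sine_likeness(sine) and sine_likeness(square) should give values close to 1 and 0 respectively. Other waveforms fall in between, although not necessarily in a way that matches a visual impression of "roundness".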

np.fft.fft not working properly

I'm trying to use Python 3.x to compute an FFT of some data, but when I plot the result I get my original data (?) rather than its FFT. I'm using MATLAB to compare the results.
I've already tried many examples from this site, but nothing seems to work. I'm not used to working with Python. How can I get a plot similar to MATLAB's? I don't mind whether I get the -f/2 to f/2 spectrum or the 0 to f/2 one.
My data
import scipy.io
import numpy as np
import matplotlib.pyplot as plt
mat = scipy.io.loadmat('sinal2.mat')
sinal2 = mat['sinal2']
Fs = 1000
L = 1997
T = 1.0/1000.0
fsig = np.fft.fft(sinal2)
freq = np.fft.fftfreq(len(sinal2), 1/Fs)
plt.figure()
plt.plot( freq, np.abs(fsig))
plt.figure()
plt.plot(freq, np.angle(fsig))
plt.show()
FFT from python:
FFT from matlab:
The imported signal sinal2 has shape (1997, 1). For 2-dimensional arrays like this, numpy.fft.fft by default computes the FFT along the last axis. In this case that means computing 1997 FFTs of size 1. As you may know, a 1-point FFT is an identity mapping (the FFT of a single value gives the same value), hence the resulting 2D array is identical to the original array.
To avoid this, you can either specify the other axis explicitly:
fsig = np.fft.fft(sinal2, axis=0)
Or otherwise convert the data to a single dimensional array, then compute the FFT of a 1D array:
sinal2 = sinal2[:,0]
fsig = np.fft.fft(sinal2)
On a final note, your FFT plot shows a horizontal line connecting the upper and lower halves of the frequency spectrum. See my answer to another question to address this problem. Since you mention that you really only need half the spectrum, you could also truncate the result to the first N//2+1 points:
plt.plot( freq[0:len(freq)//2+1], np.abs(fsig[0:len(fsig)//2+1]))
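The axis behaviour described above can be reproduced without the .mat file. This sketch uses a synthetic 50 Hz tone as a stand-in for sinal2 (the tone and the variable names are assumptions); only the shape (1997, 1) and the sampling rate are taken from the question:

```python
import numpy as np

fs = 1000   # sampling rate in Hz, as in the question
n = 1997    # number of samples, as in the question
t = np.arange(n) / fs

# Synthetic column vector with the same shape as the loaded signal.
column = np.sin(2 * np.pi * 50 * t).reshape(-1, 1)

along_last = np.fft.fft(column)            # 1997 one-point FFTs: values unchanged
along_first = np.fft.fft(column, axis=0)   # a single 1997-point FFT

freq = np.fft.fftfreq(n, d=1 / fs)
half = n // 2 + 1                          # non-negative frequencies only
peak_hz = freq[:half][np.argmax(np.abs(along_first[:half, 0]))]
```

Here along_last matches the input values, while the spectrum taken along axis 0 peaks at the bin nearest 50 Hz.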

Doubts in histogram in python: frequency graph [duplicate]

I have count data (100 values), each corresponding to a bin (0 to 99). I need to plot these data as a histogram. However, hist counts the data again and does not plot them correctly, because my data is already binned.
import random
import matplotlib.pyplot as plt
x = random.sample(range(1000), 100)
xbins = [0, len(x)]
#plt.hist(x, bins=xbins, color = 'blue')
#Does not make the histogram correct. It counts the occurrences of the individual counts.
plt.plot(x)
#plot works but I need this in histogram format
plt.show()
If I'm understanding what you want to achieve correctly then the following should give you what you want:
import matplotlib.pyplot as plt
plt.bar(range(0,100), x)
plt.show()
It doesn't use hist(), but it looks like you've already put your data into bins so there's no need.
The problem is with your xbins. You currently have
xbins = [0, len(x)]
which will give you the list [0, 100]. This means you will only see 1 bin (not 2) bounded below by 0 and above by 100. I am not totally sure what you want from your histogram. If you want to have 2 unevenly spaced bins, you can use
xbins = [0, 100, 1000]
to show everything below 100 in one bin, and everything else in the other bin. Another option would be to use an integer value to get a certain number of evenly spaced bins. In other words do
plt.hist(x, bins=50, color='blue')
where bins is the number of desired bins.
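The effect of the different bins arguments can be checked with numpy.histogram, which accepts the same kinds of bins values as plt.hist. The sample values below are made up for illustration:

```python
import numpy as np

values = np.array([5, 50, 150, 999])  # made-up sample values

# Two edges describe a single bin [0, 100]; values outside it are ignored.
one_bin, _ = np.histogram(values, bins=[0, 100])

# Three edges describe two bins: [0, 100) and [100, 1000].
two_bins, _ = np.histogram(values, bins=[0, 100, 1000])

# An integer asks for that many equal-width bins between the min and max.
even_counts, even_edges = np.histogram(values, bins=50)
```

In this example one_bin holds a single count of 2, two_bins holds 2 and 2, and the integer form yields 50 bins delimited by 51 edges.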
On a side note, whenever I can't remember how to do something with matplotlib, I will usually just go to the thumbnail gallery and find an example that looks more or less what I am trying to accomplish. These examples all have accompanying source code so they are quite helpful. The documentation for matplotlib can also be very handy.
Cool, thanks! Here's what I think the OP wanted to do:
import random
import matplotlib.pyplot as plt
x=[x/1000 for x in random.sample(range(100000),100)]
xbins=range(0,len(x))
plt.hist(x, bins=xbins, color='blue')
plt.show()
I am fairly sure that your problem is the bins argument. It is not a pair of limits but rather a list of bin edges.
xbins = [0,len(x)]
gives, in your case, the list [0, 100], indicating that you want a bin edge at 0 and one at 100. So you get one bin from 0 to 100.
What you want is:
xbins = list(range(len(x)))
Which returns:
[0,1,2,3, ... 99]
Which indicates the bin edges you want.
You can achieve this using matplotlib's hist as well, no need for numpy. You have essentially already created the bins as xbins. In this case x will be your weights.
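# xbins holds one position per bin (0 to 99); without bins=..., hist falls back to 10 bins
plt.hist(xbins, bins=len(xbins), weights=x)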
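The weights approach can be verified with numpy.histogram, which plt.hist uses for the binning. In this sketch the counts are random stand-ins for the already-binned data, and the explicit edges (an assumption, chosen so that each bin is centered on its position) avoid relying on the default number of bins:

```python
import numpy as np

# Hypothetical pre-binned data: counts[i] is the count already recorded for bin i.
rng = np.random.default_rng(1)
counts = rng.integers(0, 1000, size=100)

positions = np.arange(len(counts))         # one representative value per bin
edges = np.arange(len(counts) + 1) - 0.5   # 101 edges, unit-wide bins centered on the positions

# Same bins and weights arguments as plt.hist(positions, bins=edges, weights=counts).
heights, _ = np.histogram(positions, bins=edges, weights=counts)
```

Each bar height then equals the corresponding pre-computed count, which is the same picture plt.bar(positions, counts) would give.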
Have a look at the histogram examples in the matplotlib documentation. You should use the hist function. If it by default does not yield the result you expect, then play around with the arguments to hist and prepare/transform/modify your data before providing it to hist. It is not really clear to me what you want to achieve, so I cannot help at this point.
