Doubts in histogram in python: frequency graph [duplicate] - python-3.x

I have count data (100 values), each corresponding to a bin (0 to 99). I need to plot these data as a histogram. However, hist counts the values themselves and so does not plot them correctly, because my data is already binned.
import random
import matplotlib.pyplot as plt
x = random.sample(range(1000), 100)
xbins = [0, len(x)]
#plt.hist(x, bins=xbins, color = 'blue')
#Does not make the histogram correct. It counts the occurrences of the individual counts.
plt.plot(x)
#plot works but I need this in histogram format
plt.show()

If I'm understanding what you want to achieve correctly then the following should give you what you want:
import matplotlib.pyplot as plt
plt.bar(range(0,100), x)
plt.show()
It doesn't use hist(), but it looks like you've already put your data into bins so there's no need.

The problem is with your xbins. You currently have
xbins = [0, len(x)]
which will give you the list [0, 100]. This means you will only see 1 bin (not 2) bounded below by 0 and above by 100. I am not totally sure what you want from your histogram. If you want to have 2 unevenly spaced bins, you can use
xbins = [0, 100, 1000]
to show everything below 100 in one bin, and everything else in the other bin. Another option would be to use an integer value to get a certain number of evenly spaced bins. In other words do
plt.hist(x, bins=50, color='blue')
where bins is the number of desired bins.
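The three ways of passing bins described above can be checked without drawing anything. The sketch below uses np.histogram, which accepts the same kinds of bins argument as plt.hist; the fixed seed and the variable names are assumptions added only for illustration.

```python
import random
import numpy as np

random.seed(0)  # fixed seed only so the sketch is repeatable
x = random.sample(range(1000), 100)

# Two edges -> one bin; values outside [0, 100] are ignored.
counts_one, edges_one = np.histogram(x, bins=[0, len(x)])

# Three edges -> two unevenly spaced bins that cover every value.
counts_two, edges_two = np.histogram(x, bins=[0, 100, 1000])

# An integer -> that many evenly spaced bins between min(x) and max(x).
counts_even, edges_even = np.histogram(x, bins=50)

print(len(counts_one), len(counts_two), len(counts_even))  # 1 2 50
```

A list of n edges always yields n - 1 bins, which is why [0, 100] produces a single bar.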
On a side note, whenever I can't remember how to do something with matplotlib, I usually just go to the thumbnail gallery and find an example that looks more or less like what I am trying to accomplish. These examples all have accompanying source code, so they are quite helpful. The documentation for matplotlib can also be very handy.

Cool, thanks! Here's what I think the OP wanted to do:
import random
import matplotlib.pyplot as plt
x=[x/1000 for x in random.sample(range(100000),100)]
xbins=range(0,len(x))
plt.hist(x, bins=xbins, color='blue')
plt.show()

I am fairly sure that your problem is the bins. It is not a list of limits but rather a list of bin edges.
xbins = [0,len(x)]
which in your case gives the list [0, 100], indicating that you want one bin edge at 0 and one at 100. So you get a single bin from 0 to 100.
What you want is:
xbins = list(range(len(x)))
Which returns:
[0,1,2,3, ... 99]
Which indicates the bin edges you want.

You can achieve this using matplotlib's hist as well, no need for numpy. You have essentially already created the bins as xbins. In this case x will be your weights.
plt.hist(xbins, bins=len(xbins), weights=x)
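A self-contained sketch of the weights approach, for data that is already binned. The names counts, positions and edges are hypothetical, and explicit unit-width bin edges are an assumption made here so that every pre-binned count lands in its own bar.

```python
import random
import numpy as np
import matplotlib.pyplot as plt

counts = random.sample(range(1000), 100)   # already-binned data, one count per bin
positions = np.arange(len(counts))         # bin k is represented by the value k
edges = np.arange(len(counts) + 1)         # 101 edges -> 100 unit-width bins

fig, ax = plt.subplots()
heights, _, _ = ax.hist(positions, bins=edges, weights=counts, color="blue")
# heights now matches counts bar for bar; call plt.show() to display the figure
```

Each position contributes its weight instead of 1, so the bar heights reproduce the original counts rather than counting how often each count occurs.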

Have a look at the histogram examples in the matplotlib documentation. You should use the hist function. If it by default does not yield the result you expect, then play around with the arguments to hist and prepare/transform/modify your data before providing it to hist. It is not really clear to me what you want to achieve, so I cannot help at this point.

Related

Unable to see numbers on y axis for a barplot with seaborn [duplicate]

I've been trying to suppress scientific notation in pyplot for a few hours now. After trying multiple solutions without success, I would like some help.
import matplotlib.pyplot as plt
plt.plot(range(2003,2012,1),range(200300,201200,100))
# several solutions from other questions have not worked, including
# plt.ticklabel_format(style='sci', axis='x', scilimits=(-1000000,1000000))
# ax.get_xaxis().get_major_formatter().set_useOffset(False)
plt.show()
The question "Is ticklabel_format broken?" does not resolve the issue of actually removing the offset.
import numpy as np
import matplotlib.pyplot as plt
plt.plot(np.arange(1e6, 3 * 1e7, 1e6))
plt.ticklabel_format(useOffset=False)
In your case, you're actually wanting to disable the offset. Using scientific notation is a separate setting from showing things in terms of an offset value.
However, ax.ticklabel_format(useOffset=False) should have worked (though you've listed it as one of the things that didn't).
For example:
fig, ax = plt.subplots()
ax.plot(range(2003,2012,1),range(200300,201200,100))
ax.ticklabel_format(useOffset=False)
plt.show()
If you want to disable both the offset and scientific notation, you'd use ax.ticklabel_format(useOffset=False, style='plain').
Difference between "offset" and "scientific notation"
In matplotlib axis formatting, "scientific notation" refers to a multiplier for the numbers shown, while the "offset" is a separate term that is added.
Consider this example:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(1000, 1001, 100)
y = np.linspace(1e-9, 1e9, 100)
fig, ax = plt.subplots()
ax.plot(x, y)
plt.show()
The x-axis will have an offset (note the + sign) and the y-axis will use scientific notation (as a multiplier, with no plus sign).
We can disable either one separately. The most convenient way is the ax.ticklabel_format method (or plt.ticklabel_format).
For example, if we call:
ax.ticklabel_format(style='plain')
We'll disable the scientific notation on the y-axis.
And if we call
ax.ticklabel_format(useOffset=False)
We'll disable the offset on the x-axis, but leave the y-axis scientific notation untouched.
Finally, we can disable both through:
ax.ticklabel_format(useOffset=False, style='plain')
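To confirm that both settings took effect without inspecting the figure by eye, one option is to query the axis formatters after a draw. This is only a sketch: it assumes the default ScalarFormatter is in use on both axes, and the formatter variable names are made up for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(1000, 1001, 100)
y = np.linspace(1e-9, 1e9, 100)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.ticklabel_format(useOffset=False, style='plain')

fig.canvas.draw()  # force the tick labels to be computed

x_formatter = ax.xaxis.get_major_formatter()
y_formatter = ax.yaxis.get_major_formatter()

print(x_formatter.get_useOffset())     # False: the additive offset is disabled
print(repr(x_formatter.get_offset()))  # text shown at the end of the x-axis
print(repr(y_formatter.get_offset()))  # text shown at the top of the y-axis
```

With both features off, the offset text of each axis should be empty, meaning the tick labels carry the full values.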

Is it possible to describe with 1 parameter when a wave is sinusoidal or square in Python?

I am using scipy and managed to filter the data with the fftpack module by cutting the high frequencies, but that is only useful for transforming the data. Instead, I want to get just 1 parameter out of the analysis.
Let's have a look at some simple code to explain what I mean:
from scipy import fftpack
import numpy as np
import pandas as pd
from scipy import signal
t = np.linspace(0, 2*np.pi, 100, endpoint=True)
sq1 = signal.square(np.pi*t)
sin1 = np.sin(np.pi*t)
fft_sq1 = fftpack.dct(sq1,norm="ortho")
fft_sin1 = fftpack.dct(sin1, norm="ortho")
After applying the discrete cosine transform (DCT) I get fft_sq1 and fft_sin1, which are arrays 100 elements long. By manipulating those coefficients I can later use fftpack.idct() and obtain a curve that does not contain noise.
The problem with this is that I get too many frequencies: I get 100 parameters that I have to filter, and after that I get the curve back.
Instead of that I am interested in a filter that returns me just 1 value:
0 if the curve is completely square
1 if the curve is exactly like a sinusoid
Does anything come to mind?
Obviously there are infinitely many curves in between: if the periodic signal is flatter, the number will be closer to 0, and if the curve is rounder, the number will be closer to 1.
Thanks!!
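No answer to this question is recorded here. As one possible direction only, and not something taken from the thread, the form factor (RMS divided by mean absolute value) is exactly 1 for an ideal square wave and about 1.1107 for a pure sine, so it can be rescaled to the requested 0 to 1 range. The function name shape_score is hypothetical, and the sketch assumes a zero-centered signal sampled over a whole number of periods; other wave shapes, such as a triangle, would score above 1.

```python
import numpy as np

def shape_score(samples):
    """Rescale the form factor (RMS / mean absolute value) to 0..1.

    Assumes the signal is centered on zero and covers whole periods.
    """
    samples = np.asarray(samples, dtype=float)
    form_factor = np.sqrt(np.mean(samples**2)) / np.mean(np.abs(samples))
    sine_form_factor = np.pi / (2 * np.sqrt(2))  # about 1.1107 for a pure sine
    return (form_factor - 1.0) / (sine_form_factor - 1.0)

t = np.linspace(0, 1, 1000, endpoint=False)  # 5 whole periods (assumed)
sine = np.sin(2 * np.pi * 5 * t)
square = np.where(sine >= 0, 1.0, -1.0)      # ideal square wave built from the sine

print(shape_score(square))  # 0 for the ideal square wave
print(shape_score(sine))    # close to 1 for the sampled sine
```

Because it works in the time domain, this measure avoids filtering 100 coefficients, though it is sensitive to noise and to any DC offset in the data.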

How do I remove the vertical line that arises from plotting discontinuous data?

I have spent a tiresome amount of time trying to figure out how to remove the vertical line that arises when plotting discontinuous data. In my case I'm trying to plot some data which diverges towards infinity at a given point. I'm using Python 3.6 with matplotlib's pyplot package.
This code produces the same unpleasant flaw:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(100) * 0.09
y = 1 / (x - 5)
plt.figure(1)
plt.plot(x,y)
plt.show()
Is there anything I can do to remove that line? What am I not seeing?
Right now, it feels like I've exhausted my options. I've examined the documentation for matplotlib.pyplot.plot and matplotlib.pyplot.scatter and I am unable to fix this problem, even though it feels like this should be an insanely simple operation (I remember once dealing with this in Maple or MATLAB or something similar, where you simply set the argument discont=True to accomplish this).
Any help would be very much appreciated.
Your data is not discontinuous. You have created a single vector (y) and are asking to plot the entire vector. You can plot individual portions of any vector as long as the size of that vector matches the size of the (x) vector, or create separate vectors.
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(100) * 0.09
y = 1 / (x - 5)
plt.figure(1)
plt.plot(x[:50],y[:50])
plt.plot(x[60:],y[60:])
plt.show()
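An alternative to slicing by hand, sketched here as an assumption rather than as part of the answer above, is to insert a NaN wherever the data changes sign. Matplotlib does not draw line segments through NaN values, so the curve is split at the pole while staying a single plot call. The names jump, x_gap and y_gap are made up for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(100) * 0.09
y = 1 / (x - 5)

# Index of the first sample after each sign change (here, the pole at x = 5).
jump = np.where(np.diff(np.sign(y)) != 0)[0] + 1

# Insert a NaN at each jump so the line is broken there.
x_gap = np.insert(x, jump, np.nan)
y_gap = np.insert(y, jump, np.nan)

fig, ax = plt.subplots()
ax.plot(x_gap, y_gap)
# call plt.show() to display the figure
```

This keeps every data point, whereas the slices [:50] and [60:] drop the ten samples nearest the pole. A sign change is only a heuristic: it would also split a curve that crosses zero smoothly.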

np.fft.fft not working properly

I'm trying to use Python 3.x to compute an FFT of some data. But when I plot it I get my original data (?), not the data's FFT. I'm also using MATLAB so I can compare the results.
I've already tried many examples from this site but nothing seems to work. I'm not used to working with Python. How can I get a plot similar to MATLAB's? I don't care whether I get the -f/2 to f/2 or the 0 to f/2 spectrum.
My data
import scipy.io
import numpy as np
import matplotlib.pyplot as plt
mat = scipy.io.loadmat('sinal2.mat')
sinal2 = mat['sinal2']
Fs = 1000
L = 1997
T = 1.0/1000.0
fsig = np.fft.fft(sinal2)
freq = np.fft.fftfreq(len(sinal2), 1/Fs)
plt.figure()
plt.plot( freq, np.abs(fsig))
plt.figure()
plt.plot(freq, np.angle(fsig))
plt.show()
[Figure: FFT from Python]
[Figure: FFT from MATLAB]
The imported signal sinal2 has a size (1997,1). In case of 2 dimensional arrays like this, numpy.fft.fft by default computes the FFT along the last axis. In this case that means computing 1997 FFTs of size 1. As you may know a 1-point FFT is an identity mapping (meaning the FFT of a single value gives the same value), hence the resulting 2D array is identical to the original array.
To avoid this, you can either specify the other axis explicitly:
fsig = np.fft.fft(sinal2, axis=0)
Or otherwise convert the data to a single dimensional array, then compute the FFT of a 1D array:
sinal2 = sinal2[:,0]
fsig = np.fft.fft(sinal2)
On a final note, your FFT plot shows a horizontal line connecting the upper and lower halves of the frequency spectrum. See my answer to another question to address this problem. Since you mention that you really only need half the spectrum, you could also truncate the result to the first N//2+1 points:
plt.plot( freq[0:len(freq)//2+1], np.abs(fsig[0:len(fsig)//2+1]))
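The axis behaviour described above can be reproduced without the .mat file. The sketch below substitutes a synthetic 50 Hz sine stored as an (N, 1) column for sinal2; the tone frequency, the sample count and the variable names are assumptions made for illustration.

```python
import numpy as np

Fs = 1000   # sampling rate in Hz, as in the question
N = 1000    # number of samples (assumed)
t = np.arange(N) / Fs
column = np.sin(2 * np.pi * 50 * t).reshape(-1, 1)  # shape (N, 1), like the loaded array

along_last = np.fft.fft(column)           # default axis=-1: N separate 1-point FFTs
along_first = np.fft.fft(column, axis=0)  # one N-point FFT down the column

print(np.allclose(along_last, column))    # True: the "FFT" is just the input

freq = np.fft.fftfreq(N, 1 / Fs)
half = slice(0, N // 2)                   # non-negative frequencies only
peak_hz = freq[half][np.argmax(np.abs(along_first[half, 0]))]
print(peak_hz)                            # 50.0
```

Checking the array's shape right after loading is a quick way to catch this before any transform is applied.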

Broken / combined colormaps

I'm plotting a 2D scalar field with imshow, and I'd like to clearly contrast negative values from positive ones. Is there a way to implement a colormap composed of two others (for example, hot for positive values and cool for negative ones)?
You can read the colors from the existing cmaps and just add them. That's fairly simple but has a few drawbacks. If the original colormaps have a different number of colors, the 'edge' between them will not be centered.
If they do have the same number, the resulting cmap will be symmetric, but the 'edge' will only be at zero if the largest positive value equals the magnitude of the largest negative value, e.g. -2 & 2 or -4 & 4, etc.
This can be done like:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
cool = plt.cm.cool
hot = plt.cm.hot
# read every color out of each existing colormap
cool_vals = [cool(i) for i in range(cool.N)]
hot_vals = [hot(i) for i in range(hot.N)]
# cool covers the lower half of the new colormap, hot the upper half
comb_vals = cool_vals + hot_vals
new_cmap = ListedColormap(comb_vals)
plt.imshow(np.arange(20*20).reshape(20,20)-199., interpolation='none', cmap=new_cmap)
plt.colorbar()
plt.show()
I'm not aware of very fancy methods in Matplotlib. There is a brand new Python module 'TrollImage' which has a really nice implementation of working with colormaps. It's aimed at satellite images, but the colormap part of course applies to any kind of image.
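For data whose positive and negative ranges differ, one way to keep the 'edge' at zero is to pair the combined colormap with a norm that maps zero to the middle of the colormap. The sketch below uses matplotlib.colors.TwoSlopeNorm, which is not mentioned in the answer and assumes Matplotlib 3.2 or newer; the asymmetric test data and variable names are also assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, TwoSlopeNorm

samples = np.linspace(0, 1, 256)
cool_vals = plt.get_cmap("cool")(samples)  # RGBA rows for the negative half
hot_vals = plt.get_cmap("hot")(samples)    # RGBA rows for the positive half
new_cmap = ListedColormap(np.vstack([cool_vals, hot_vals]))

data = np.arange(20 * 20).reshape(20, 20) - 50.0  # asymmetric range: -50 .. 349

# Map vmin -> 0, zero -> 0.5 and vmax -> 1, so the cool/hot edge sits at zero.
norm = TwoSlopeNorm(vmin=data.min(), vcenter=0.0, vmax=data.max())

fig, ax = plt.subplots()
image = ax.imshow(data, interpolation="none", cmap=new_cmap, norm=norm)
fig.colorbar(image)
# call plt.show() to display the figure
```

Sampling both source colormaps at the same 256 points also sidesteps the problem of the two halves having different numbers of colors.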
