Scipy.detrend: Function changes range of values - python-3.x

I am trying to detrend this one dimensional array:
array([13.64352283, 13.48914862, 13.00767009, 13.35416524, 13.60143818,
13.40895156, 13.48349417, 13.65703125, 13.4959721 , 13.28891263,
12.97999066, 13.01112397, 12.79519705, 13.32030445, 13.19949068,
12.88691975, 13.32079707])
The function runs without errors but changes the range of values from ~[12,14] to ~[-0.4,0.4].
I believe this happens because of the small standard deviation of the values.
Any ideas how to fix this, so I can plot the original array (with its trend) and the detrended one in the same plot?
Normalization is not an option.
Please help.

Well, that is exactly what detrend does: it subtracts the least-squares linear fit from the input.
Here is a plot to illustrate what happens:
from scipy import signal
import numpy as np
import matplotlib.pyplot as plt
y = np.array([13.64352283, 13.48914862, 13.00767009, 13.35416524, 13.60143818,
13.40895156, 13.48349417, 13.65703125, 13.4959721, 13.28891263,
12.97999066, 13.01112397, 12.79519705, 13.32030445, 13.19949068,
12.88691975, 13.32079707])
plt.plot(y, color='dodgerblue')
plt.plot(signal.detrend(y), color='limegreen')
plt.plot(y - signal.detrend(y), color='crimson')
plt.show()
The red line in the plot is the linear approximation that got subtracted from the original data to obtain detrend(y).
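To put the detrended values back in the original ~[12, 14] range for a combined plot, one option (an assumption about what the asker wants, not part of the original answer) is to add the mean of y back to the residuals. The sketch below also checks that the line detrend removed is the ordinary least-squares fit that np.polyfit returns; the names `t`, `trend`, `fit` and `shifted` are illustrative.

```python
import numpy as np
from scipy import signal

y = np.array([13.64352283, 13.48914862, 13.00767009, 13.35416524, 13.60143818,
              13.40895156, 13.48349417, 13.65703125, 13.4959721, 13.28891263,
              12.97999066, 13.01112397, 12.79519705, 13.32030445, 13.19949068,
              12.88691975, 13.32079707])

t = np.arange(len(y))          # sample index used as the x-axis
detrended = signal.detrend(y)  # residuals around the least-squares line
trend = y - detrended          # the straight line that detrend removed

# The removed line matches a degree-1 least-squares fit from np.polyfit
fit = np.polyval(np.polyfit(t, y, 1), t)
print(np.allclose(trend, fit))

# Residuals of a fit with an intercept average to ~0, so adding the mean of y
# moves them back into the original value range without changing their shape
shifted = detrended + y.mean()
print(shifted.min(), shifted.max())
```

Plotting `y`, `trend` and `shifted` on the same axes then keeps all three curves on the same scale.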

Related

The plot method plots the list shifted back by one, while scatter is ok

Hi, the following code plots the cubes of the first 10 integers.
The scatter method works fine, but the plot method shifts everything one to the left.
The axis looks correct to me.
I tried to figure it out, but I don't know where I'm going wrong.
Thank you.
import matplotlib.pyplot as plt
n_values = range(1,11,1)
n_cubes = [n**3 for n in n_values]
fig, ax = plt.subplots()
ax.plot(n_cubes)
ax.scatter(n_values, n_cubes, c=n_cubes, cmap=plt.cm.Reds, s=20)
ax.axis([1, 12, 0, 1100])
print(n_cubes, n_values)
plt.style.use('seaborn')
plt.show()
If you call ax.plot() with only one argument, it generates its own x values: 0, 1, ..., N-1. Since these start at zero while n_values starts at 1, everything appears shifted one to the left.
So, you need to call the function like this:
ax.plot(n_values, n_cubes)
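A quick way to confirm this is to read back the x data that matplotlib stored on each line. The sketch assumes the non-interactive 'Agg' backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

n_values = range(1, 11)
n_cubes = [n**3 for n in n_values]

fig, ax = plt.subplots()
(implicit_line,) = ax.plot(n_cubes)            # x is generated as 0, 1, ..., 9
(explicit_line,) = ax.plot(n_values, n_cubes)  # x is taken from n_values: 1, ..., 10

print(implicit_line.get_xdata())
print(explicit_line.get_xdata())
```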

Python visualization - histograms

The following two questions are about a histogram I am trying to build.
1) I want the bins to be as follows:
[0-10,10-20,...,580-590, 590-600]. I tried the following code:
bins_range=[]
for i in range(0,610,10):
    bins_range.append(i)
plt.hist(df['something'], bins=bins_range, rwidth=0.95)
I expected to see the bins above, each with its corresponding number of samples, but instead I got only 10 bins (the default).
2) How can I change the y-axis as follows: say my largest bin contains 40 samples; instead of 40 on the y-axis I want it to show 100%, and the others correspondingly, i.e. 30 becomes 75%, 20 becomes 50% and so on.
Your code seems to be working OK. You can even pass the range object directly to the bins parameter of hist.
To get the y-axis as percentages, I think you need two passes: first calculate the histogram to know how many samples the highest bin contains, then do the plotting using 1/highest as weights. NumPy's np.histogram does all the calculations without plotting.
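The arithmetic from the question (40 becomes 100%, 30 becomes 75%, 20 becomes 50%) can be checked with np.histogram alone, before any plotting. The data below is a made-up toy sample chosen so that the three bins hold exactly 40, 30 and 20 values; it also shows a range object passed directly as bins.

```python
import numpy as np

# Hypothetical toy data: 40 values in [0, 10), 30 in [10, 20), 20 in [20, 30]
x = np.repeat([5, 15, 25], [40, 30, 20])

# First pass: counts only, no plotting; a range object works directly as bins
counts, edges = np.histogram(x, bins=range(0, 40, 10))

# Scale so that the fullest bin becomes 100%
percent = counts / counts.max() * 100

print(counts.tolist())   # [40, 30, 20]
print(percent.tolist())  # [100.0, 75.0, 50.0]
```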
Use PercentFormatter() to display the axis as percentages. Its first parameter (xmax) tells which data value corresponds to 100%. Use PercentFormatter(max(hist)) to show the highest bin as 100%. If you just want the total as 100%, pass PercentFormatter(len(x)); then there is no need to calculate the histogram twice. Because the y-axis still holds raw counts internally, the default ticks don't land at round percentages. You can use plt.yticks(np.linspace(0, max(hist), 11)) to get a tick at every 10%.
To get nicer separations between the bars, you can set an explicit edge color, preferably without rwidth=0.95.
Example code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
x = np.random.rayleigh(200, 50000)
hist, bins = np.histogram(x, bins=range(0, 610, 10))
plt.hist(x, bins=bins, ec='white', fc='darkorange')
plt.gca().yaxis.set_major_formatter(PercentFormatter(max(hist)))
plt.yticks(np.linspace(0, max(hist), 11))
plt.show()
PS: To use matplotlib's standard yticks, and having the y-axis also internally in percentages, you can use the weights parameter of hist. This can be handy when you want to interactively resize or zoom the plot, or need horizontal lines at specific percentages.
plt.hist(x, bins=bins, ec='white', fc='dodgerblue', weights=np.ones_like(x)/max(hist))
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
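A self-contained version of the weights variant is sketched below. The seeded generator and the 50% reference line are illustrative additions, not part of the original answer; the point is that with weights of 1/max(hist) the tallest bar has height 1.0 in data units, so PercentFormatter(1) labels it 100% and a horizontal line at 0.5 sits exactly at 50%.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
x = rng.rayleigh(200, 50000)
bins = np.arange(0, 610, 10)
hist, _ = np.histogram(x, bins=bins)

# Every sample contributes 1/max(hist), so bar heights are fractions of the tallest bar
heights, _, _ = plt.hist(x, bins=bins, ec='white', fc='dodgerblue',
                         weights=np.ones_like(x) / max(hist))
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))  # 1.0 is labelled 100%
plt.axhline(0.5, color='black', lw=1)  # reference line at 50%, in data units
```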

How does the standard normal distribution work in practice in NumPy and PyTorch?

I have two points to ask about:
1)
I would like to understand precisely what np.random.randn from NumPy and torch.randn from PyTorch return. Both return random numbers (a NumPy array and a PyTorch tensor, respectively) drawn from a normal distribution with mean 0 and standard deviation 1, i.e. a standard normal distribution. However, this is not the same as plugging x values into the standard normal density function and getting the corresponding y values. The values returned by PyTorch and NumPy do not look like that.
It seems to me that both np.random.randn and torch.randn return x values drawn from the distribution, not the y values of the density function that I calculated below. Is that correct?
normal = np.array([(1/np.sqrt(2*np.pi))*np.exp(-(1/2)*(i**2)) for i in range(-38,39)])
Printing the normal variable shows me something like this.
array([1.10e-314, 2.12e-298, 1.51e-282, 3.94e-267, 3.79e-252, 1.34e-237,
1.75e-223, 8.36e-210, 1.47e-196, 9.55e-184, 2.28e-171, 2.00e-159,
6.45e-148, 7.65e-137, 3.34e-126, 5.37e-116, 3.17e-106, 6.90e-097,
5.52e-088, 1.62e-079, 1.76e-071, 7.00e-064, 1.03e-056, 5.53e-050,
1.10e-043, 8.00e-038, 2.15e-032, 2.12e-027, 7.69e-023, 1.03e-018,
5.05e-015, 9.13e-012, 6.08e-009, 1.49e-006, 1.34e-004, 4.43e-003,
5.40e-002, 2.42e-001, 3.99e-001, 2.42e-001, 5.40e-002, 4.43e-003,
1.34e-004, 1.49e-006, 6.08e-009, 9.13e-012, 5.05e-015, 1.03e-018,
7.69e-023, 2.12e-027, 2.15e-032, 8.00e-038, 1.10e-043, 5.53e-050,
1.03e-056, 7.00e-064, 1.76e-071, 1.62e-079, 5.52e-088, 6.90e-097,
3.17e-106, 5.37e-116, 3.34e-126, 7.65e-137, 6.45e-148, 2.00e-159,
2.28e-171, 9.55e-184, 1.47e-196, 8.36e-210, 1.75e-223, 1.34e-237,
3.79e-252, 3.94e-267, 1.51e-282, 2.12e-298, 1.10e-314])
2) Also, if I ask these libraries for a matrix of values from a standard normal distribution, does that mean all rows and columns are drawn from the same standard normal distribution? If I want i.i.d. samples in every row, do I need to call np.random.randn in a for loop for each row and then vstack the results?
1) Yes, they give you x and not phi(x) since the formula for phi(x) gives the probability density of sampling a value x. If you want to know the probability of getting values in an interval [a,b] you need to integrate phi(x) between a and b. Intuitively, if you look at the function phi(x) you'll see that you're more likely to get values near zero than, say, values near 1.
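Integrating phi(x) over an interval has a closed form via the error function, so the sampled x values can be compared with it directly. This sketch uses math.erf and NumPy's default_rng (the seed is arbitrary) and compares two intervals of equal width, one around 0 and one around 1, which matches the intuition above. The helper `std_normal_prob` is a hypothetical name introduced for illustration.

```python
import math
import numpy as np

def std_normal_prob(a, b):
    """P(a < X < b) for X ~ N(0, 1), i.e. the integral of phi(x) from a to b."""
    return 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))

rng = np.random.default_rng(42)
samples = rng.standard_normal(200_000)  # draws are x values, not phi(x)

for a, b in [(-0.5, 0.5), (0.5, 1.5)]:
    theory = std_normal_prob(a, b)
    sampled = np.mean((samples > a) & (samples < b))
    print(f"({a}, {b}): theory {theory:.4f}, sampled {sampled:.4f}")
```

The interval around 0 gets roughly 38% of the draws and the one around 1 roughly 24%, even though both have width 1.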
An easy way to see this is to look at a histogram of the sampled values:
import numpy as np
import matplotlib.pyplot as plt
samples = np.random.normal(size=[1000])
plt.hist(samples)
plt.show()
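2) They're i.i.d.: every entry of the result is an independent draw from the same distribution, so there's no need for a loop. Just pass a 2-D size like so: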
samples = np.random.normal(size=[10, 10])
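One way to check the independence claim empirically is to draw a matrix in a single call and look at the correlation between its rows; with many columns the off-diagonal correlations should be close to zero. The shape and seed are arbitrary choices, and Generator.standard_normal (via default_rng) is the newer NumPy counterpart of np.random.randn.

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.standard_normal((3, 100_000))  # one call, no loop or vstack needed

print(m.shape)            # (3, 100000)
print(m.mean(), m.std())  # both close to 0 and 1
corr = np.corrcoef(m)     # 3 x 3 matrix of row-vs-row correlations
print(np.round(corr, 3))  # off-diagonal entries close to 0
```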

Creating a structured grid of subplots with Seaborn FacetGrid

My attempt to use FacetGrid in Seaborn does not produce the expected results.
Moreover, I would like to control the white space in the grid.
My data and code are the following:
toy.to_json()
'{"has_cus_id_but_not_acc_id":{"0":0,"1":0,"2":0,"3":0,"4":0,"5":0,"6":0,"7":0,"8":0,"9":0,"10":0,"11":0,"12":0,"13":0,"14":0,"15":0,"16":0,"17":0,"18":1,"19":0,"20":0,"21":0,"22":1,"23":0,"24":0,"25":1,"26":0,"27":1,"28":0,"29":1,"30":0,"31":1,"32":0,"33":1,"34":0,"35":1,"36":0,"37":1,"38":0,"39":0,"40":1,"41":1,"42":0,"43":1,"44":0,"45":1,"46":0,"47":1,"48":0,"49":1,"50":0,"51":1,"52":0,"53":1,"54":0,"55":1,"56":0,"57":1,"58":0,"59":1,"60":0,"61":1,"62":0,"63":1,"64":0,"65":1,"66":0,"67":1,"68":0,"69":1,"70":0,"71":1,"72":0,"73":1,"74":0,"75":1,"76":0,"77":0,"78":1,"79":0,"80":1,"81":0,"82":0,"83":1,"84":0,"85":1},"reg_year":{"0":2014.0,"1":2014.0,"2":2014.0,"3":2014.0,"4":2014.0,"5":2014.0,"6":2014.0,"7":2014.0,"8":2015.0,"9":2015.0,"10":2015.0,"11":2015.0,"12":2015.0,"13":2015.0,"14":2015.0,"15":2015.0,"16":2015.0,"17":2016.0,"18":2016.0,"19":2016.0,"20":2016.0,"21":2016.0,"22":2016.0,"23":2016.0,"24":2016.0,"25":2016.0,"26":2016.0,"27":2016.0,"28":2016.0,"29":2016.0,"30":2016.0,"31":2016.0,"32":2016.0,"33":2016.0,"34":2016.0,"35":2016.0,"36":2016.0,"37":2016.0,"38":2017.0,"39":2017.0,"40":2017.0,"41":2017.0,"42":2017.0,"43":2017.0,"44":2017.0,"45":2017.0,"46":2017.0,"47":2017.0,"48":2017.0,"49":2017.0,"50":2017.0,"51":2017.0,"52":2017.0,"53":2017.0,"54":2017.0,"55":2017.0,"56":2017.0,"57":2017.0,"58":2017.0,"59":2017.0,"60":2018.0,"61":2018.0,"62":2018.0,"63":2018.0,"64":2018.0,"65":2018.0,"66":2018.0,"67":2018.0,"68":2018.0,"69":2018.0,"70":2018.0,"71":2018.0,"72":2018.0,"73":2018.0,"74":2018.0,"75":2018.0,"76":2018.0,"77":2018.0,"78":2018.0,"79":2018.0,"80":2018.0,"81":2018.0,"82":2019.0,"83":2019.0,"84":2019.0,"85":2019.0},"reg_month":{"0":3.0,"1":5.0,"2":6.0,"3":7.0,"4":9.0,"5":10.0,"6":11.0,"7":12.0,"8":1.0,"9":3.0,"10":5.0,"11":6.0,"12":7.0,"13":8.0,"14":9.0,"15":11.0,"16":12.0,"17":1.0,"18":1.0,"19":2.0,"20":3.0,"21":4.0,"22":4.0,"23":5.0,"24":6.0,"25":6.0,"26":7.0,"27":7.0,"28":8.0,"29":8.0,"30":9.0,"31":9.0,"32":10.0,"33":10.0,"34":11.0,"35":11.0,"36":12.0,"37":12.0,"38":1.0,"39":2.0,"40":2.0,"41":3.0,"42":4.0,"43":4.0,"44":5.0,"45":5.0,"46":6.0,"47":6.0,"48":7.0,"49":7.0,"50":8.0,"51":8.0,"52":9.0,"53":9.0,"54":10.0,"55":10.0,"56":11.0,"57":11.0,"58":12.0,"59":12.0,"60":1.0,"61":1.0,"62":2.0,"63":2.0,"64":3.0,"65":3.0,"66":4.0,"67":4.0,"68":5.0,"69":5.0,"70":6.0,"71":6.0,"72":7.0,"73":7.0,"74":8.0,"75":8.0,"76":9.0,"77":10.0,"78":10.0,"79":11.0,"80":11.0,"81":12.0,"82":1.0,"83":1.0,"84":2.0,"85":2.0},"Total_Revenue":{"0":35852.02,"1":2623.97,"2":3526.67,"3":21466.71,"4":72784.1200000003,"5":103921.2899999999,"6":10852.87,"7":16522.07,"8":7443.76,"9":68962.1600000002,"10":10956.38,"11":193856.8799999985,"12":110766.6099999997,"13":123861.8599999987,"14":2722.34,"15":303488.6900000007,"16":6876.58,"17":17729.5,"18":4687.93,"19":26914.06,"20":2228.12,"21":15708.93,"22":859.58,"23":19164.89,"24":163164.4799999995,"25":33180.7300000001,"26":10033.01,"27":1114.48,"28":462613.2900000042,"29":9822.95,"30":70901.4400000003,"31":22370.29,"32":46711.8900000002,"33":2335.02,"34":7259.28,"35":11.83,"36":13590.51,"37":7677.77,"38":282.01,"39":358522.7900000003,"40":5844.0,"41":7027.28,"42":1908.71,"43":4032.35,"44":11072.6,"45":3973.15,"46":30706.23,"47":2644.13,"48":23831.75,"49":670.12,"50":6949.54,"51":4687.7,"52":9672.69,"53":7333.01,"54":12814.33,"55":689.39,"56":6962.86,"57":2283.16,"58":1259.5,"59":224.84,"60":12812.12,"61":247.68,"62":25452.65,"63":1245.02,"64":24211.36,"65":5255.25,"66":28402.76,"67":9148.55,"68":14822.61,"69":345.37,"70":12408.13,"71":989.93,"72":10601.33,"73":730.32,"74":169020.5000000001,"75":697.54,"76":3862038.6799997138,"77":6148750.9899984254,"78":194.06,"79":2379382.4500000761,"80":1174.11,"81":1729567.9000000793,"82":889650.029999995,"83":95.8,"84":415996.6999999974,"85":654.78}}'
g = sns.FacetGrid(toy, col='has_cus_id_but_not_acc_id', hue='reg_year')
g.map(sns.barplot, 'reg_month', 'Total_Revenue')
g.add_legend();
If I use bar in pyplot I get this:
g = sns.FacetGrid(toy, col='has_cus_id_but_not_acc_id', hue='reg_year')
g.map(plt.bar, 'reg_month', 'Total_Revenue')
g.add_legend();
Again, I would like to be able to define the white space of the grid.
In addition, I would like the bars to be placed next to each other rather than stacked on top of each other.
Some values from 2018 are really large compared to any of the values where has_cus_id_but_not_acc_id is 1, so the right plot is almost empty. It might make sense to use a logarithmic scale.
You have 6 years, so each month would need to show 6 bars next to each other. That makes the bars pretty narrow and the chart hard to read, but it's still possible.
The following does not use seaborn, but pandas and matplotlib:
import io
import matplotlib.pyplot as plt
import pandas as pd
toy = '{"has_cus_id_but_not_acc_id":{"0":0,"1":0,"2":0,"3":0,"4":0,"5":0,"6":0,"7":0,"8":0,"9":0,"10":0,"11":0,"12":0,"13":0,"14":0,"15":0,"16":0,"17":0,"18":1,"19":0,"20":0,"21":0,"22":1,"23":0,"24":0,"25":1,"26":0,"27":1,"28":0,"29":1,"30":0,"31":1,"32":0,"33":1,"34":0,"35":1,"36":0,"37":1,"38":0,"39":0,"40":1,"41":1,"42":0,"43":1,"44":0,"45":1,"46":0,"47":1,"48":0,"49":1,"50":0,"51":1,"52":0,"53":1,"54":0,"55":1,"56":0,"57":1,"58":0,"59":1,"60":0,"61":1,"62":0,"63":1,"64":0,"65":1,"66":0,"67":1,"68":0,"69":1,"70":0,"71":1,"72":0,"73":1,"74":0,"75":1,"76":0,"77":0,"78":1,"79":0,"80":1,"81":0,"82":0,"83":1,"84":0,"85":1},"reg_year":{"0":2014.0,"1":2014.0,"2":2014.0,"3":2014.0,"4":2014.0,"5":2014.0,"6":2014.0,"7":2014.0,"8":2015.0,"9":2015.0,"10":2015.0,"11":2015.0,"12":2015.0,"13":2015.0,"14":2015.0,"15":2015.0,"16":2015.0,"17":2016.0,"18":2016.0,"19":2016.0,"20":2016.0,"21":2016.0,"22":2016.0,"23":2016.0,"24":2016.0,"25":2016.0,"26":2016.0,"27":2016.0,"28":2016.0,"29":2016.0,"30":2016.0,"31":2016.0,"32":2016.0,"33":2016.0,"34":2016.0,"35":2016.0,"36":2016.0,"37":2016.0,"38":2017.0,"39":2017.0,"40":2017.0,"41":2017.0,"42":2017.0,"43":2017.0,"44":2017.0,"45":2017.0,"46":2017.0,"47":2017.0,"48":2017.0,"49":2017.0,"50":2017.0,"51":2017.0,"52":2017.0,"53":2017.0,"54":2017.0,"55":2017.0,"56":2017.0,"57":2017.0,"58":2017.0,"59":2017.0,"60":2018.0,"61":2018.0,"62":2018.0,"63":2018.0,"64":2018.0,"65":2018.0,"66":2018.0,"67":2018.0,"68":2018.0,"69":2018.0,"70":2018.0,"71":2018.0,"72":2018.0,"73":2018.0,"74":2018.0,"75":2018.0,"76":2018.0,"77":2018.0,"78":2018.0,"79":2018.0,"80":2018.0,"81":2018.0,"82":2019.0,"83":2019.0,"84":2019.0,"85":2019.0},"reg_month":{"0":3.0,"1":5.0,"2":6.0,"3":7.0,"4":9.0,"5":10.0,"6":11.0,"7":12.0,"8":1.0,"9":3.0,"10":5.0,"11":6.0,"12":7.0,"13":8.0,"14":9.0,"15":11.0,"16":12.0,"17":1.0,"18":1.0,"19":2.0,"20":3.0,"21":4.0,"22":4.0,"23":5.0,"24":6.0,"25":6.0,"26":7.0,"27":7.0,"28":8.0,"29":8.0,"30":9.0,"31":9.0,"32":10.0,"33":10.0,"34":11.0,"35":11.0,"36":12.0,"37":12.0,"38":1.0,"39":2.0,"40":2.0,"41":3.0,"42":4.0,"43":4.0,"44":5.0,"45":5.0,"46":6.0,"47":6.0,"48":7.0,"49":7.0,"50":8.0,"51":8.0,"52":9.0,"53":9.0,"54":10.0,"55":10.0,"56":11.0,"57":11.0,"58":12.0,"59":12.0,"60":1.0,"61":1.0,"62":2.0,"63":2.0,"64":3.0,"65":3.0,"66":4.0,"67":4.0,"68":5.0,"69":5.0,"70":6.0,"71":6.0,"72":7.0,"73":7.0,"74":8.0,"75":8.0,"76":9.0,"77":10.0,"78":10.0,"79":11.0,"80":11.0,"81":12.0,"82":1.0,"83":1.0,"84":2.0,"85":2.0},"Total_Revenue":{"0":35852.02,"1":2623.97,"2":3526.67,"3":21466.71,"4":72784.1200000003,"5":103921.2899999999,"6":10852.87,"7":16522.07,"8":7443.76,"9":68962.1600000002,"10":10956.38,"11":193856.8799999985,"12":110766.6099999997,"13":123861.8599999987,"14":2722.34,"15":303488.6900000007,"16":6876.58,"17":17729.5,"18":4687.93,"19":26914.06,"20":2228.12,"21":15708.93,"22":859.58,"23":19164.89,"24":163164.4799999995,"25":33180.7300000001,"26":10033.01,"27":1114.48,"28":462613.2900000042,"29":9822.95,"30":70901.4400000003,"31":22370.29,"32":46711.8900000002,"33":2335.02,"34":7259.28,"35":11.83,"36":13590.51,"37":7677.77,"38":282.01,"39":358522.7900000003,"40":5844.0,"41":7027.28,"42":1908.71,"43":4032.35,"44":11072.6,"45":3973.15,"46":30706.23,"47":2644.13,"48":23831.75,"49":670.12,"50":6949.54,"51":4687.7,"52":9672.69,"53":7333.01,"54":12814.33,"55":689.39,"56":6962.86,"57":2283.16,"58":1259.5,"59":224.84,"60":12812.12,"61":247.68,"62":25452.65,"63":1245.02,"64":24211.36,"65":5255.25,"66":28402.76,"67":9148.55,"68":14822.61,"69":345.37,"70":12408.13,"71":989.93,"72":10601.33,"73":730.32,"74":169020.5000000001,"75":697.54,"76":3862038.6799997138,"77":6148750.9899984254,"78":194.06,"79":2379382.4500000761,"80":1174.11,"81":1729567.9000000793,"82":889650.029999995,"83":95.8,"84":415996.6999999974,"85":654.78}}'
# Recent pandas versions expect a file-like object rather than a literal JSON string
df = pd.read_json(io.StringIO(toy))
# astype returns a new Series, so assign it back (integers also give cleaner labels)
df['reg_year'] = df['reg_year'].astype(int)
df['reg_month'] = df['reg_month'].astype(int)
u = df["has_cus_id_but_not_acc_id"].unique()
y = df['reg_year'].unique()
# One subplot per flag value; sharey=True makes both panels share the log y-axis
fig, axes = plt.subplots(1, len(u), sharey=True)
axes[0].set_yscale("log")
for ax, (_, grp) in zip(axes.flat, df.groupby("has_cus_id_but_not_acc_id")):
    # Rows: months, columns: years (pandas >= 2.0 requires keyword arguments here)
    piv = grp.pivot(index='reg_month', columns='reg_year', values='Total_Revenue')
    # Make sure all 12 months and all years exist, so both panels share bar slots and colors
    piv = piv.reindex(index=range(1, 13), columns=y)
    # Each year becomes one bar per month, placed side by side
    piv.plot.bar(ax=ax, width=0.8, legend=False)
axes[1].legend()
plt.show()
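The white-space part of the question is not covered above. In matplotlib, the horizontal gap between subplots is controlled by the wspace argument of Figure.subplots_adjust (expressed as a fraction of the average axes width). In recent seaborn versions a FacetGrid exposes its figure as g.figure (g.fig in older releases), so the same call should apply there too; that part is untested here. The sketch below compares two settings using a hypothetical helper `gap_between`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

def gap_between(wspace):
    """Return the horizontal gap (in figure fractions) between two side-by-side axes."""
    fig, axes = plt.subplots(1, 2, sharey=True)
    fig.subplots_adjust(wspace=wspace)  # white space between the columns
    left, right = axes[0].get_position(), axes[1].get_position()
    plt.close(fig)
    return right.x0 - left.x1

tight = gap_between(0.05)
loose = gap_between(0.5)
print(f"gap with wspace=0.05: {tight:.3f}, with wspace=0.5: {loose:.3f}")
```

In the pandas/matplotlib answer above, calling fig.subplots_adjust(wspace=...) just before plt.show() would be the corresponding change.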

plotting asymmetric errorbars using matplotlib

I am trying to plot asymmetric error bars that represent 95% confidence intervals. The output I get is not what I want, and I am not sure which part of the code is causing it.
import numpy as np
import matplotlib.pyplot as plt
x = (18,20,22,24,26,28,30,32,34)
apo_average = (1933.877,1954.596,2058.192,2244.664,2265.383,2265.383,2306.821,2534.731,2576.169)
std_apo=(35.88652754,0,179.4326365,35.88652754,0,0,35.88652754,35.88652696,0)
error = np.array(apo_average)
lower_error_apo=error-((4.303*(np.array(std_apo)))/np.sqrt(3))
higher_error_apo=error+((4.303*(np.array(std_apo)))/np.sqrt(3))
asymmetric_error_apo=[lower_error_apo, higher_error_apo]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(x, apo_average, marker='o', label="0 Cu", color='none', edgecolor='blue', linewidth=1)
ax.errorbar(x,apo_average,yerr=asymmetric_error_apo, markerfacecolor='blue',markeredgecolor='blue')
The outcome is 
This is quite unexpected. For instance, I intended the lower end of the first error bar to be at 1844.723, which doesn't agree with what's shown in the picture. The same happens with every error bar.
It helps to read the documentation, which states:
xerr/yerr : scalar or array-like, shape(N,) or shape(2,N), optional
If a scalar number, len(N) array-like object, or a N-element array-like object, errorbars are drawn at +/-value relative to the data. Default is None.
If a sequence of shape 2xN, errorbars are drawn at -row1 and +row2 relative to the data.
You therefore need to use the values calculated from the standard deviation directly, instead of subtracting them from or adding them to the mean.
lower_error_apo=(4.303*(np.array(std_apo)))/np.sqrt(3)
higher_error_apo=(4.303*(np.array(std_apo)))/np.sqrt(3)
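Putting this together, a minimal corrected version of the question's plot might look as follows. Since both rows are identical here, a single 1-D array (symmetric errors) would work as well; the 2xN form is kept to mirror the question. fmt='o' and capsize=4 are cosmetic choices, not part of the original answer. The sketch reads back the drawn bars to confirm that the first one now starts at about 1844.723.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; call plt.show() instead when working interactively
import numpy as np
import matplotlib.pyplot as plt

x = (18, 20, 22, 24, 26, 28, 30, 32, 34)
apo_average = np.array((1933.877, 1954.596, 2058.192, 2244.664, 2265.383,
                        2265.383, 2306.821, 2534.731, 2576.169))
std_apo = np.array((35.88652754, 0, 179.4326365, 35.88652754, 0, 0,
                    35.88652754, 35.88652696, 0))

# Half-width of the 95% interval as in the question: t * s / sqrt(n), t = 4.303, n = 3
half_width = 4.303 * std_apo / np.sqrt(3)

# yerr holds distances from the data, not absolute bounds: row 0 below, row 1 above
asymmetric_error_apo = [half_width, half_width]

fig, ax = plt.subplots()
container = ax.errorbar(x, apo_average, yerr=asymmetric_error_apo, fmt='o',
                        color='blue', markerfacecolor='none', capsize=4, label="0 Cu")
ax.legend()

# The vertical bars form a LineCollection; its first segment is the first error bar
first_bar = np.asarray(container.lines[2][0].get_segments()[0])
print(first_bar[:, 1].min())  # lower end of the first bar, about 1844.723
```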
