TA-Lib computed RSI and exchange RSI look very different

I found some mentions of similar issues but nothing concrete. Since I'm new to data science with Python, it's probably a mistake on my side.
I also tried reversing the input data, but both plots are far too different from what the exchange is showing.
All ideas appreciated.
Cheers
Pic:tradingview.com vs talib RSI
import json
import numpy as np
import requests as req
import talib as ta
import matplotlib.pyplot as plt
# Each candle is [time, low, high, open, close, volume]
price_hist = req.get("https://api.pro.coinbase.com/products/BTC-EUR/candles?granularity=3600")
data = json.loads(price_hist.content.decode('utf-8'))
candles = np.array(data)
x_data = candles[:, 0]  # assumed x-axis: candle timestamps (not defined in the original post)
close_data = candles[:, 4]
close_data_rev = np.flip(candles[:, 4], 0)
rsi_graph = ta.RSI(close_data, timeperiod=14)
rsi_graph_rev = ta.RSI(close_data_rev, timeperiod=14)
plt.figure(figsize=(12, 9))
plt.plot(x_data, rsi_graph)
plt.plot(x_data, rsi_graph_rev)
plt.xticks(rotation=45)
plt.show()

Figured it out. The flip command didn't work as intended. I now flip the array with close_data_rev = close_data_rev[::-1], and that makes the RSI look like the one on the exchanges.
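The order matters because RSI is computed from consecutive price changes, so it expects closes from oldest to newest, while the candle list here appears to arrive newest first. Below is a rough pure-Python sketch of Wilder's RSI to show the effect. It is not TA-Lib's implementation; wilder_rsi and the price series are made up for illustration.

```python
def wilder_rsi(closes, period=14):
    """RSI with Wilder smoothing; closes must be ordered oldest first."""
    if len(closes) <= period:
        return []
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period

    def to_rsi(gain, loss):
        if loss == 0:
            return 100.0  # no losses in the window
        return 100.0 - 100.0 / (1.0 + gain / loss)

    out = [to_rsi(avg_gain, avg_loss)]
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
        out.append(to_rsi(avg_gain, avg_loss))
    return out

# Hypothetical steadily rising market, listed newest first like the API response
newest_first = [float(p) for p in range(130, 100, -1)]
oldest_first = newest_first[::-1]

rsi_wrong = wilder_rsi(newest_first)  # treats the rise as a fall
rsi_right = wilder_rsi(oldest_first)  # chronological order
print(rsi_wrong[-1], rsi_right[-1])
```

With the same prices, the two orderings land at opposite ends of the RSI scale, which is why the unreversed plot looked nothing like the exchange chart.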

Related

Is there a way using librosa's waveplot to store the coordinates of the graph rather than show the image of the waveplot?

I am working on an audio project where I am using librosa and have the following code from an example online. Rather than opening an image with a graph of amplitude versus time, I want to store the coordinates that make up the graph in an array. I have tried a lot of different examples found on Stack Overflow and other websites with no luck. I am relatively new to Python and this is my first question on Stack Overflow, so please be kind.
import librosa
import librosa.display
import matplotlib.pyplot as plt
from IPython.display import display, Audio
filename = 'queen2.mp3'
# y holds the audio samples, sampleRate the sampling rate
y, sampleRate = librosa.load(filename)
display(Audio(filename))
plt.figure(figsize=(12, 4))
librosa.display.waveplot(y, sr=sampleRate, max_points=200)
plt.show()
librosa is open-source (under the ISC license), so you can look at the code to see how it does this. The documentation for each function has a handy [source] link which takes you to the code. For librosa.display.waveplot you will see that it calls a function __envelope() to compute the envelope. Presumably it is these coordinates you are after.
import numpy as np
from librosa import util
def __envelope(x, hop):
    '''Compute the max-envelope of non-overlapping frames of x at length hop
    x is assumed to be multi-channel, of shape (n_channels, n_samples).
    '''
    x_frame = np.abs(util.frame(x, frame_length=hop, hop_length=hop))
    return x_frame.max(axis=1)
# y must be 2-D, of shape (n_channels, n_samples), as the docstring says
y = np.atleast_2d(y)
hop_length = 1
y = __envelope(y, hop_length)
y_top = y[0]
y_bottom = -y[-1]
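To make the idea concrete without depending on librosa internals, here is a small pure-Python sketch of the same max-envelope computation that also stores the plot coordinates in a list. The helper max_envelope and the sample values are made up for illustration; this is not librosa's code.

```python
def max_envelope(samples, hop):
    """Return max(|x|) for each non-overlapping frame of `hop` samples."""
    n_frames = len(samples) // hop  # a trailing partial frame is dropped
    return [
        max(abs(v) for v in samples[i * hop:(i + 1) * hop])
        for i in range(n_frames)
    ]

samples = [0.1, -0.4, 0.3, 0.2, -0.9, 0.5, 0.05]  # made-up signal
hop = 3
envelope = max_envelope(samples, hop)

# (sample index, upper edge, lower edge) for each frame of the waveform plot
coords = [(i * hop, top, -top) for i, top in enumerate(envelope)]
print(coords)
```

Dividing the sample index by the sampling rate would turn the first coordinate into seconds.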

How to get the intercept from model summary in Python linearmodels?

I am running a panel regression using Python linearmodels, something like:
import pandas as pd
from linearmodels.panel import PanelOLS
data = pd.read_csv('data.csv', sep=',')
data = data.set_index(['panel_id', 'date'])
data['const'] = 1  # constant column for the intercept
controls = ['const', 'A', 'B', 'C']
model = PanelOLS(data.Y, data[controls], entity_effects=True)
result = model.fit(use_lsdv=True)
I really need to pull out the coefficient on the constant, but it looks like this does not work:
intercept = result.summary.const
I could not find the answer in linearmodels' documentation on GitHub.
More generally, does anyone know how to pull out the estimated coefficients from the linearmodels summary? Thank you!
result.params['const']
would give the intercept. In general, result.params gives the Series of regression coefficients in linearmodels.

Vast difference in cv2 imshow vs matplotlib imshow?

I am currently working on a program that requires me to read DICOM files and display them correctly. After extracting the pixel array from the DICOM file, I ran it through the imshow function of both matplotlib and cv2. To my surprise, they yield vastly different images. One has color while the other has none, and one shows more detail than the other. I'm confused as to why this is happening. I found "Difference between plt.show and cv2.imshow?" and tried converting the pixels to BGR, which cv2 uses, instead of RGB, but this changes nothing. I am wondering why these two frameworks show the same pixel buffer so differently. Below is my code and an image to show the outcomes.
import cv2
import os
import pydicom
import numpy as np
import matplotlib.pyplot as plt
inputdir = 'datasets/dicom/98890234/20030505/CT/CT2/'
outdir = 'datasets/dicom/pngs/'
test_list = [f for f in os.listdir(inputdir)]
for f in test_list[:1]:  # remove "[:1]" to convert all images
    ds = pydicom.dcmread(inputdir + f)
    img = np.array(ds.pixel_array, dtype=np.uint8)  # get image array
    rows, cols = img.shape
    cannyImg = cv2.Canny(img, cols, rows)
    # img is single-channel, so expand it to 3-channel BGR for cv2
    cv2.imshow('thing', cv2.cvtColor(img, cv2.COLOR_GRAY2BGR))
    cv2.imshow('thingCanny', cannyImg)
    plt.imshow(ds.pixel_array)
    plt.show()
    cv2.waitKey()
Using the cmap parameter with imshow() might solve the issue. Try this:
plt.imshow(arr, cmap='gray', vmin=0, vmax=255)
Refer to the docs for more info.
Not an answer but too long for a comment. I think the root cause of your problems is in the initialization of the array already:
img = np.array(ds.pixel_array, dtype = np.uint8)
uint8 is presumably not what you have in the DICOM file. First, because it looks like a CT image, which is usually stored with 10+ bits per pixel, and second, because the artifacts you are facing look very familiar to me. These kinds of artifacts (dense bones displayed in black, gradient effects) usually occur if pixel data with more than 8 bits is interpreted as 8-bit.
BTW: To me, both renderings look obviously incorrect.
Sorry for not being a Python expert: I can only tell what is wrong, not how to get it right.
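The wrap-around described in this answer can be reproduced on a handful of made-up pixel values. The sketch below assumes 12-bit data and uses a plain min-max rescale as one possible fix; it is not a proper DICOM window/level conversion, and the values are stand-ins for ds.pixel_array.

```python
import numpy as np

# Made-up 12-bit pixel values standing in for ds.pixel_array
pixels = np.array([0, 255, 256, 1000, 4095], dtype=np.uint16)

# Casting straight to uint8 keeps only the low 8 bits, so bright values wrap
wrapped = pixels.astype(np.uint8)

# Rescale the full value range to 0..255 first, then cast
lo, hi = int(pixels.min()), int(pixels.max())
scaled = ((pixels.astype(np.float64) - lo) / (hi - lo) * 255).astype(np.uint8)

print(wrapped.tolist())  # 256 becomes 0: a bright pixel turns black
print(scaled.tolist())   # brightness order is preserved
```

After the direct cast, a pixel just above 255 ends up darker than one just below it, which matches the "dense bones displayed in black" symptom.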

Adding numerical values from dict to a new column in a Pandas DataFrame

I am practicing machine learning and working with a movie/rating dataset. I am trying to create a new column in the dataframe which numerically identifies each genre (around 1300 of them). My logic was to create a dictionary of the unique genres and label each with an integer, then write a for loop to iterate through each row of the dataframe, check its genre, and assign the appropriate value to a new column named "genre_id". However, this seems to cause an infinite loop which I cannot even break with Ctrl-C. I get the same issue when working in Jupyter (Interrupt Kernel fails to stop it). Below is a summarized version of my approach.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
movies_data = pd.read_csv("C://mypython/moviedata/movies.csv")
ratings_data = pd.read_csv("C://mypython/moviedata/ratings.csv")
joined = pd.merge(movies_data, ratings_data, how='inner', on=['movieId'])
print(joined.head())
pd.options.display.float_format = '{:,.2f}'.format
genres = joined['genres'].unique()
genre_dict = {}
Id = 1
for i in genres:
    genre_dict[i] = Id
    Id += 1
joined['genre_id'] = 0
increment = 0
for i in joined['genres']:
    if i in genre_dict:
        joined['genre_id'][increment] = genre_dict[i]
    increment += 1
I know I should probably be taking a smaller sample to work with, as there are about 20,000,000 rows in the dataset, but I figured I'd try this as an exercise.
I also receive the "setting values from copy" warning, though this hasn't caused me issues in my other projects. Any thoughts on how to do this would be greatly appreciated.
EDIT: Found a solution using the Series map method.
joined['genre_id'] = joined.genres.map(genre_dict)
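For reference, here is the map approach on a tiny made-up frame (the real data comes from the merged CSV files). The original loop is most likely not infinite, just very slow: it performs one chained assignment per row across about 20 million rows, whereas map does the whole lookup in a single vectorized call. pd.factorize is shown as an alternative that builds the labels without a dictionary.

```python
import pandas as pd

# Tiny made-up frame standing in for the merged movies/ratings data
joined = pd.DataFrame({"genres": ["Comedy", "Drama", "Comedy", "Action|Thriller"]})

# Same dictionary as in the question: unique genre -> integer label from 1
genre_dict = {g: i for i, g in enumerate(joined["genres"].unique(), start=1)}

# One vectorized lookup instead of a Python-level loop over rows
joined["genre_id"] = joined["genres"].map(genre_dict)

# pd.factorize builds equivalent labels in one call (its codes start at 0)
codes, uniques = pd.factorize(joined["genres"])
print(joined)
```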
I don't have enough reputation to comment, so I am posting this as a suggestion: it is the usual procedure for handling categorical values in a dataset. You can use the built-in sklearn.preprocessing.OneHotEncoder class, which does the work you want.
For a better understanding with examples, check "One Hot Encode Sequence Data in Python". Let me know if this works for you.

Optimal way to display data with different ranges

I have an application in which I pull data from an FPGA and display it for the engineers. It is a good application... until you start displaying data with extremely different ranges,
say a signal oscillating around +4000 and another around zero (both with small peak-to-peak amplitude).
At the moment the only real workaround is to "export to CSV" and then view the data in Excel, but I would like to improve the application so that this isn't needed.
Option 1 is a more dynamic pointer that gives you readings of ALL visible plots for the present x.
Option 2 is multiple Y axes. This is where it gets a bit... tight with respect to UI area.
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import host_subplot
import mpl_toolkits.axisartist as AA
import numpy as np
t = np.arange(0, 1, 0.00001)
data = [5000*np.sin(t*2*np.pi*10),
        10*np.sin(t*2*np.pi*20),
        20*np.sin(t*2*np.pi*30),
        np.sin(t*2*np.pi*40)+5000,
        np.sin(t*2*np.pi*50)-5000,
        np.sin(t*2*np.pi*60),
        np.sin(t*2*np.pi*70),
        ]
fig = plt.figure()
host = host_subplot(111, axes_class=AA.Axes)
axis_list = [None]*7
for i in range(len(axis_list)):
    axis_list[i] = host.twinx()
    new_axis = axis_list[i].get_grid_helper().new_fixed_axis
    axis_list[i].axis['right'] = new_axis(loc='right',
                                          axes=axis_list[i],
                                          offset=(60*i, 0))
    axis_list[i].axis['right'].toggle(all=True)
    axis_list[i].plot(t, data[i])
plt.show()
for i in data:
    plt.plot(t, i)
plt.show()
This code snippet doesn't resize the figure to ensure all 7 y-axes are visible, BUT ignoring that, you can see the result is quite large...
Any advice with respect to multi-Y or a better solution to displaying no more than 7 datasets?
