How can I randomly generate 5,000 integers uniformly distributed in [1, 100] and find their mean using Python? I tried np.random.randint(100, size=5000), but I got the error message below while trying to compute the mean.
Traceback (most recent call last):
File "", line 1, in
TypeError: 'numpy.ndarray' object is not callable
You can use np.random.randint:
import numpy as np
r=np.random.randint(0,100,5000)
Then use np.mean to find the mean of that:
>>> np.mean(r)
49.4686
You can also use the array's mean() method:
>>> r.mean()
49.4686
You can do it in one line; note that randint's upper bound is exclusive, so use 101 to include 100:
np.random.randint(1, 101, size=5000).mean()
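As a side note, on newer NumPy (1.17+) the Generator API makes the inclusive upper bound explicit; a minimal sketch:

import numpy as np

rng = np.random.default_rng()
# endpoint=True makes the high value inclusive, giving integers in [1, 100]
r = rng.integers(1, 100, size=5000, endpoint=True)
print(r.mean())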
With a PyTorch DataLoader that takes a standard map-style dataset (e.g. datasets.ImageFolder), we can find the size of the dataset used by the loader with len(dataloader). However, what about WebDataset?
As WebDataset is a PyTorch Dataset, is it possible to get the size of a loader that takes a WebDataset?
https://webdataset.github.io/webdataset/
WebDataset doesn't provide a __len__ method, as it conforms to the PyTorch IterableDataset interface. IterableDataset is designed for stream-like data, and considers it wrong to have a len().
If you have code that depends on len() to be available, you can set the length to some value using with_length():
>>> dataset = wds.WebDataset(url)
>>> len(dataset)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'WebDataset' has no len()
>>> dataset = dataset.with_length(10)
>>> len(dataset)
10
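If you also want len() to work on a DataLoader built on top of it, a minimal sketch (the shard pattern and the length 10000 are placeholders; with_length() only records the number you pass, it does not count samples):

import webdataset as wds
from torch.utils.data import DataLoader

url = "shards/data-{000000..000009}.tar"   # hypothetical shard pattern
dataset = wds.WebDataset(url).with_length(10000)
loader = DataLoader(dataset, batch_size=32)

print(len(dataset))   # 10000
print(len(loader))    # PyTorch derives this from len(dataset) and batch_size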
I'm converting a code from Python 2.7 to Python 3.8.
In its Python 2.7 version, I had to use downgraded versions of scipy and numpy in order to avoid a TypeError (see below). With Python 3.8, these downgraded versions of scipy and numpy are not available anymore and I get this error, which I'm unable to fix.
1. Setup
Previous: macOS Catalina 10.15.7, Python 2.7.16, numpy 1.9.0, scipy 1.0.1
New: macOS Catalina 10.15.7, Python 3.8.6, numpy 1.20.3, scipy 1.4.0
2. Code
It happens when calling scipy.integrate.odeint:
y_trajectory = scipy.integrate.odeint(growth_a_derivs, y_start, t_array, atol=eps, args=myargs)
Here growth_a_derivs is a function, y_start and t_array are numpy arrays with dtype float64, eps is a float, and myargs is a tuple of length 1 containing a dictionary.
3. Error
The following traceback appears:
Traceback (most recent call last):
<long blah blah blah>
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyxcosmo/icosmo_cosmo.py", line 149, in growth_a
result=growth_a_ode(cosmo, a, unnorm=unnorm)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyxcosmo/icosmo_cosmo.py", line 671, in growth_a_ode
y_trajectory = scipy.integrate.odeint(growth_a_derivs, y_start, t_array, atol=eps, args=myargs)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/scipy/integrate/odepack.py", line 242, in odeint
output = _odepack.odeint(func, y0, t, args, Dfun, col_deriv, ml, mu,
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'
So the last line is in scipy's integrate package; the full call to odeint is:
output = _odepack.odeint(func, y0, t, args, Dfun, col_deriv, ml, mu,
full_output, rtol, atol, tcrit, h0, hmax, hmin,
ixpr, mxstep, mxhnil, mxordn, mxords,
int(bool(tfirst)))
Inside this, the error occurs in a compiled scipy .so file, so I can't analyze it further. I just know that the call to growth_a_derivs isn't problematic (its print() calls all work). But I don't know whether it's a numpy or scipy problem, or how to overcome it.
4. Resolution!
I was able to understand the problem thanks to @RFoxtea's answer below.
I believe this issue was caused by the change in numpy version.
The output of growth_a_derivs is:
np.array([f1, f2])
with:
type(f1) = numpy.float64
type(f2) = numpy.ndarray
f2.shape = (1,)
With numpy 1.9, this gives:
np.array([f1,f2]) = [1.234, np.array([1.234])]
np.array([f1,f2]).dtype = numpy.float64
With numpy 1.20, this gives:
np.array([f1,f2]) = [1.234, np.array([1.234])]
np.array([f1,f2]).dtype = numpy.object
And that is not accepted by scipy.integrate.odeint.
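For illustration, a minimal sketch of the dtype behaviour described above (the numbers are placeholders):

import numpy as np

f1 = np.float64(1.234)    # scalar
f2 = np.array([1.234])    # shape (1,)

# On numpy 1.20 this produces an object array (newer releases may raise instead):
mixed = np.array([f1, f2])
print(mixed.dtype)        # object

# Concatenating into a flat 1-D float array avoids the problem:
flat = np.concatenate(([f1], f2))
print(flat.dtype)         # float64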
However, notice that scipy.integrate.solve_ivp is ok with that.
Thanks for your help!
At some point while running scipy.integrate.odeint, NumPy is told to convert an array of Python objects into an array of floats, and it is telling you that it can't do that. Your description suggests the problem has to be with y_start, t_array or (maybe, I'm not sure) the return value of growth_a_derivs. Please make sure that these all have an appropriate dtype that can be converted to float. It could be very easy to fix.
Thanks to @RFoxtea, who provided the right solution.
However there is an interesting alternative. You can use scipy.integrate.solve_ivp instead of scipy.integrate.odeint.
solve_ivp can handle a return value of fun that has dtype object.
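A minimal sketch of what the switch could look like, using toy stand-ins for the question's objects (solve_ivp expects fun(t, y, *args), whereas odeint by default expects func(y, t, *args), so the argument order of the derivative function may need to be swapped; args= requires scipy >= 1.4):

import numpy as np
import scipy.integrate

# Toy stand-ins for the question's objects (placeholders, not the real model):
def growth_a_derivs(t, y, params):
    return -params["rate"] * y

y_start = np.array([1.0])
t_array = np.linspace(0.0, 5.0, 50)
eps = 1e-8
myargs = ({"rate": 0.5},)    # tuple of length 1 containing a dict, as in the question

solution = scipy.integrate.solve_ivp(
    growth_a_derivs,
    (t_array[0], t_array[-1]),   # integration interval
    y_start,
    t_eval=t_array,              # report the solution at the original time points
    atol=eps,
    args=myargs,
)
y_trajectory = solution.y.T      # same orientation as odeint's output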
This is the program for the stock prediction data, which should simply be printed...
from alpha_vantage.timeseries import TimeSeries
# Your key here
key = 'yourkeyhere'
ts = TimeSeries(key)
aapl, meta = ts.get_daily(symbol='AAPL')
print(aapl['2020-22-5'])
I get this error...
Traceback (most recent call last):
File "C:/Users/PycharmProjects/AlphaVantageTest/AlphaVantageTest.py", line 7, in <module>
print(aapl['2020-22-5'])
KeyError: '2020-22-5'
Since that didn't work, I tried getting a little more technical with it...
from alpha_vantage.timeseries import TimeSeries
from alpha_vantage.techindicators import TechIndicators
from matplotlib.pyplot import figure
import matplotlib.pyplot as plt
# Your key here
key = 'W01B6S3ALTS82VRF'
# Choose your output format, or default to JSON (python dict)
ts = TimeSeries(key, output_format='pandas')
ti = TechIndicators(key)
# Get the data, returns a tuple
# aapl_data is a pandas dataframe, aapl_meta_data is a dict
aapl_data, aapl_meta_data = ts.get_daily(symbol='AAPL')
# aapl_sma is a dict, aapl_meta_sma also a dict
aapl_sma, aapl_meta_sma = ti.get_sma(symbol='AAPL')
# Visualization
figure(num=None, figsize=(15, 6), dpi=80, facecolor='w', edgecolor='k')
aapl_data['4. close'].plot()
plt.tight_layout()
plt.grid()
plt.show()
I get these errors...
Traceback (most recent call last):
File "C:/Users/PycharmProjects/AlphaVantageTest/AlphaVantageTest.py", line 9, in <module>
ts = TimeSeries(key, output_format='pandas')
File "C:\Users\PycharmProjects\AlphaVantageTest\venv\lib\site-packages\alpha_vantage\alphavantage.py", line 66, in __init__
raise ValueError("The pandas library was not found, therefore can "
ValueError: The pandas library was not found, therefore can not be used as an output format, please install manually
How can I improve my program so that I don't receive these errors? Neither program has a syntax problem. Thank you to anyone who can help.
You need to install pandas. If you're just using pip, you can run pip install pandas; if you are using conda to manage your envs, you can use conda install pandas.
Glad it worked. According to the Meta Stack Overflow post "What if I answer a question in a comment?", I am posting my comment as an answer so you can mark the question as answered.
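As a side note on the first traceback: Alpha Vantage keys its daily series by date strings in YYYY-MM-DD order, so '2020-22-5' can never match. A minimal sketch of the lookup once pandas is installed (the date below is only an example and has to be a trading day present in the response):

from alpha_vantage.timeseries import TimeSeries

key = 'yourkeyhere'
ts = TimeSeries(key, output_format='pandas')
aapl, meta = ts.get_daily(symbol='AAPL')

# With pandas output the DataFrame is indexed by date.
print(aapl.loc['2020-05-22'])    # the closing price is in column '4. close'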
I am using numpy.loadtxt to generate a structured NumPy array from a CSV data file, which I would like to save to a MAT-file for colleagues who are more familiar with MATLAB than Python.
Sample case:
import numpy as np
import scipy.io
mydata = np.array([(1, 1.0), (2, 2.0)], dtype=[('foo', 'i'), ('bar', 'f')])
scipy.io.savemat('test.mat', mydata)
When I attempt to use scipy.io.savemat on this array, the following error is thrown:
Traceback (most recent call last):
File "C:/Project Data/General Python/test.py", line 6, in <module>
scipy.io.savemat('test.mat', mydata)
File "C:\python35\lib\site-packages\scipy\io\matlab\mio.py", line 210, in savemat
MW.put_variables(mdict)
File "C:\python35\lib\site-packages\scipy\io\matlab\mio5.py", line 831, in put_variables
for name, var in mdict.items():
AttributeError: 'numpy.ndarray' object has no attribute 'items'
I'm a Python novice (at best), but I'm assuming this is because savemat is set up to handle dicts and the structure of Numpy's structured arrays is not compatible.
I can get around this error by pulling my data into a dict:
tmp = {}
for varname in mydata.dtype.names:
    tmp[varname] = mydata[varname]
scipy.io.savemat('test.mat', tmp)
Which loads into MATLAB fine:
>> mydata = load('test.mat')
mydata =
foo: [1 2]
bar: [1 2]
But this seems like a very inefficient method since I'm duplicating the data in memory. Is there a smarter way to accomplish this?
You can do scipy.io.savemat('test.mat', {'mydata': mydata}).
This creates a struct mydata with fields foo and bar in the file.
Alternatively, you can pack your loop in a dict comprehension:
tmp = {varname: mydata[varname] for varname in mydata.dtype.names}
I don't think creating a temporary dictionary duplicates data in memory, because Python generally only stores references, and numpy in particular tries to create views into the original data whenever possible.
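Putting it together, a minimal sketch of both options (the file names below are placeholders):

import numpy as np
import scipy.io

mydata = np.array([(1, 1.0), (2, 2.0)], dtype=[('foo', 'i'), ('bar', 'f')])

# Option 1: save the structured array as a single MATLAB struct named 'mydata'
scipy.io.savemat('test_struct.mat', {'mydata': mydata})

# Option 2: one MATLAB variable per field, via a dict comprehension
tmp = {varname: mydata[varname] for varname in mydata.dtype.names}
scipy.io.savemat('test_fields.mat', tmp)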
I am a newbie and I have to classify the words of a lexicon according to the De Pauw and Wagacha (1998) method (basically, maxent on char n-grams). The data is very large (500,000 entries and millions of n-grams), so I must load the samples as a sparse matrix. But I ran into a problem:
sklearn.linear_model.LogisticRegression().fit(X, y) says it does not accept scipy.sparse.csr.csr_matrix training vectors. I got this error:
Traceback (most recent call last):
File "test-LR-4.py", line 8, in <module>
clf.fit(X,y)
File "/usr/lib/pymodules/python2.7/sklearn/svm/base.py", line 441, in fit
% type(X))
ValueError: Training vectors should be array-like, not <class 'scipy.sparse.csr.csr_matrix'>
for the following script:
from sklearn.linear_model import LogisticRegression
import numpy as np
import scipy.sparse as sp
X = sp.csr_matrix([[0, 1, 2],[1, 2, 3],[3, 2, 1]])
y = np.array(range(3))
clf=LogisticRegression(dual=True)
clf.fit(X,y)
As mentioned in the comments by @Andreas and @Fred Foo, upgrading the sklearn version (to > 0.13) will solve the problem.
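For reference, a minimal sketch of the same script on a recent scikit-learn, which accepts CSR input directly (dual=True is omitted because the default solver does not support it):

import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import LogisticRegression

X = sp.csr_matrix([[0, 1, 2], [1, 2, 3], [3, 2, 1]])
y = np.array(range(3))

# Recent scikit-learn versions handle sparse training data natively.
clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict(X))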