I attempt to construct a tensor from a generator as follows:
>>> torch.tensor(i**2 for i in range(10))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: Could not infer dtype of generator
Currently I just do:
>>> torch.tensor([i**2 for i in range(10)])
tensor([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
Is there a way to avoid needing this intermediate list?
As #blue-phoenox already points out, it is preferable to use the built-in PyTorch functions to create the tensor directly. But if you have to deal with a generator, it can be advisable to use numpy as an intermediate stage. Since PyTorch avoids copying the numpy array, this should be quite performant (compared to a plain list comprehension):
>>> import torch
>>> import numpy as np
>>> torch.from_numpy(np.fromiter((i**2 for i in range(10)), int))
tensor([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
I don't see why you want to use a generator. The list doesn't really make a difference here.
The question is: do you want to create your data in Python first and then move it to PyTorch (slower in most cases), or do you want to create it directly in PyTorch?
(A generator would always create the data in Python first.)
So if you want to load data the story is different, but if you want to generate data, I see no reason why you shouldn't do so directly in PyTorch.
If you want to directly create your list in PyTorch for your example you can do so using arange and pow:
torch.arange(10).pow(2)
Output:
tensor([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
torch.arange(10) works the same way as range in Python, so it's exactly as versatile as range. Then pow(2) just raises each element of your tensor to the 2nd power.
But you can also do all sorts of other computations instead of pow once you've created your tensor using arange.
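For instance, any other elementwise operation slots in the same way (a small illustrative sketch of my own, not from the original answer):
import torch

torch.arange(10, dtype=torch.float32).sqrt()  # tensor([0.0000, 1.0000, 1.4142, ...])
torch.arange(10).mul(3).add(1)                # tensor([ 1,  4,  7, ..., 28])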
Related
I want to invert the order of the values in a list (the largest swapped with the smallest, the second largest with the second smallest, and so on) without changing the set of values themselves.
The original list is the following:
[15, 15, 10, 8, 73, 1]
While the resulting expecting list is:
[10, 8, 15, 15, 1, 73]
The example is taken from a real data-handling problem on a more complex pandas DataFrame. I posed it as a list problem only to simplify the issue, so a pandas-based solution is also fine. My attempt so far:
lst = [15, 15, 10, 8, 73, 1]  # avoid shadowing the built-in name "list"
half = len(lst) // 2
for i in range(half):
    # positions of the i-th largest and i-th smallest values
    # (note: .index returns the first occurrence, so duplicates such as
    # the two 15s can end up swapped in the wrong order)
    a, b = lst.index(sorted(lst, reverse=True)[i]), lst.index(sorted(lst)[i])
    lst[b], lst[a] = lst[a], lst[b]
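One possible alternative (my own sketch, not part of the original question) that handles duplicates by mirroring a stable argsort, so the position of the i-th smallest value receives the i-th largest value:
import numpy as np

values = np.array([15, 15, 10, 8, 73, 1])
order = np.argsort(values, kind="stable")  # positions from smallest to largest
mirrored = values.copy()
mirrored[order] = values[order[::-1]]      # i-th smallest gets the i-th largest value
print(mirrored)                            # [10  8 15 15  1 73]
Since this works on numpy arrays, it applies directly to a pandas column via its underlying values.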
I am running the code in this notebook.
https://colab.research.google.com/github/zaidalyafeai/Notebooks/blob/master/Deep_GCN_Spam.ipynb#scrollTo=UjoTbUQVnCz8
I get an error when I change the data set to my own data set. I know this is probably an error in my code, so I removed all of the code that generates the data sets, saved the two data sets to files, and reloaded them. I really cannot see the difference between these two data sets.
The shapes and dtypes of both data sets are provided below. I can provide any other information that is needed. Can anyone help me fix this?
This is my data set
data = torch.load("dataset.pt")
data
>>>Data(edge_attr=[3585, 1], edge_index=[2, 3585], x=[352, 1], y=[352])
data.x.dtype, data.y.dtype, data.edge_attr.dtype, data.edge_index.dtype
>>>(torch.float32, torch.int64, torch.float32, torch.int64)
data.edge_index.T.numpy().shape
>>>(3585, 2)
np.unique(data.edge_index.T.numpy(), axis=0).shape
>>>(3585, 2)
data.edge_index.unique().shape
>>>torch.Size([352])
data.edge_index
>>>tensor([[ 13, 13, 13, ..., 103, 103, 103],
[ 1, 2, 3, ..., 6, 9, 10]])
This is the data set mentioned in the notebook
data2 = torch.load("spam.pt")
data2
>>>Data(edge_attr=[50344, 1], edge_index=[2, 50344], x=[1000, 1], y=[1000])
data2.x.dtype, data2.y.dtype, data2.edge_attr.dtype, data2.edge_index.dtype
>>>(torch.float32, torch.int64, torch.float32, torch.int64)
data2.edge_index
>>>tensor([[ 0, 1, 1, ..., 999, 999, 999],
[455, 173, 681, ..., 377, 934, 953]])
Python version: 3.8
PyTorch geometric version: 1.6.2
CUDA version: 10.2
System: Windows 10
Screenshot
In my case, I needed to make sure that my edge_attr values were in the range [0, 1].
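A minimal sketch of what that rescaling might look like (my own illustration, assuming simple min-max scaling of the Data object loaded above):
import torch

attr = data.edge_attr.float()
# min-max rescale so every edge attribute lies in [0, 1]
data.edge_attr = (attr - attr.min()) / (attr.max() - attr.min())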
From this thread I found out that I can use an approach based on random.choices for my needs:
import random

class Weights:
    ITEM = {
        'a': 0.5,
        'b': 0.4,
        'c': 0.3,
        'd': 0.2,
        'e': 0.1,
    }

slot_1 = random.choices(population=list(Weights.ITEM.keys()), weights=list(Weights.ITEM.values()), k=1)[0]
slot_2 = ...?
slot_3 = ...?
Is it possible to get an array with k=3 that has "unique" results (e.g. [a, b, c]), or somehow to exclude any previously selected value from the next call (with k=1)?
For example, let's say slot_1 got "b"; then slot_2 should be drawn from everything else, without the "b" value.
This step can be performance-sensitive, and I think creating new arrays on each call is not a good idea.
Maybe there is something other than random.choices that can be applied in this case.
You could take all the samples at once using numpy's random.choice with the replace=False option (assuming the weights would simply be renormalized between steps) and store them via multiple assignment, so the draw itself is a single line of code.
import numpy as np
p = np.array(list(Weights.ITEM.values()))
p = p / p.sum()  # np.random.choice requires probabilities that sum to 1
slot_1, slot_2, slot_3 = np.random.choice(list(Weights.ITEM.keys()), size=3, replace=False, p=p)
More generally, you could have a function that generates arbitrary-length subsamples (k is the length of each subsample, n is the number of subsamples):
def subsamples(n, k, values, weights):
    # draw n*k distinct items in one call, then split them into n groups of k
    draws = np.random.choice(values, size=n * k, replace=False, p=weights)
    return [list(chunk) for chunk in np.split(draws, n)]
>>> subsamples(3, 5, range(100), [.01] * 100)
[[39, 34, 27, 91, 88], [19, 98, 62, 55, 38], [37, 22, 54, 11, 84]]
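If you would rather stay with the standard library, here is a sketch of the "exclude previously selected values" idea from the question (my own illustration; random.choices accepts unnormalized weights, so renormalization happens implicitly):
import random

def weighted_sample_without_replacement(weight_map, k):
    items = dict(weight_map)  # work on a copy so the original mapping is untouched
    picks = []
    for _ in range(k):
        # draw one item, then remove it so it cannot be drawn again
        choice = random.choices(list(items), weights=list(items.values()), k=1)[0]
        picks.append(choice)
        del items[choice]
    return picks

slot_1, slot_2, slot_3 = weighted_sample_without_replacement(Weights.ITEM, 3)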
I have two arrays, a reference array and a target array. Each array holds day-of-year (DOY) values, and I am trying to find the actual number of days between the two. Here is the code:
import numpy as np
array_ref = np.array([[362,284],[89,360]])
array_ref
array([[362, 284],
[ 89, 360]])
array_n = np.array([[2, 365], [194, 10]])
array_n
array([[ 2, 365],
[194, 10]])
The absolute difference gives this,
print(abs(array_ref-array_n))
[[360 81]
[105 350]]
However, I am trying to achieve this,
[[5, 81]
[105, 15]]
I am not sure whether I have to use a datetime or timedelta function, or if there is a simpler way to achieve this. Thanks for your help.
With remainder (modulo) division:
(array_n-array_ref)%365
array([[ 5, 81],
[105, 15]], dtype=int32)
In general, you may want to check which subtraction is closer:
np.minimum((array_ref-array_n)%365, (array_n-array_ref)%365)
array([[ 5, 81],
[105, 15]], dtype=int32)
Though this will clearly fail to take leap years into account.
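A sketch of a leap-year-aware alternative (my own illustration, assuming the year of each observation is known, so each DOY can be turned into a real calendar date):
from datetime import date, timedelta

def day_diff(doy_ref, year_ref, doy_n, year_n):
    # convert each (year, day-of-year) pair to a calendar date
    d_ref = date(year_ref, 1, 1) + timedelta(days=int(doy_ref) - 1)
    d_n = date(year_n, 1, 1) + timedelta(days=int(doy_n) - 1)
    return abs((d_n - d_ref).days)

day_diff(362, 2020, 2, 2021)  # 6 -- one day more than the modulo-365 result, since 2020 is a leap year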
(Python 3.5.1)
I've been trying to use Sympy for some Project Euler problems, but I've come across something weird about how set(sympy.primerange(a, b)) and similar constructions work.
>>> import sympy
>>> PR = sympy.primerange(1, 20)
>>> set(PR)
{2, 3, 5, 7, 11, 13, 17, 19}
So far, so good. But:
>>> import sympy
>>> PR = sympy.primerange(1, 20)
>>> set(PR)
{2, 3, 5, 7, 11, 13, 17, 19}
>>> set(PR)
set()
Simply calling PR gives me <generator object primerange at 0x039C1720> after calling list(PR) once or twice. The same thing happens with for p in PR: print(p) and with list(PR).
Why does this not work:
>>> import sympy, itertools
>>> sympy.sieve.extend(100)
>>> set(itertools.takewhile(lambda p: p<20, sympy.sieve))
set()
>>> sympy.sieve
<Sieve with 25 primes sieved: 2, 3, 5, ... 89, 97>
Why don't we get the set {2, 3, 5, 7, 11, 13, 17, 19}?
The first phenomenon has to do with generators. sympy.primerange returns a generator, not a list. Generators let you iterate over their elements only once, producing them on demand. The call to set() iterates over every element in the generator PR, consuming it, so a second set(PR) finds nothing left.
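The single-use behaviour is easy to reproduce with a plain generator expression (illustration, not from the original answer):
>>> gen = (i for i in range(3))
>>> list(gen)
[0, 1, 2]
>>> list(gen)  # already exhausted
[]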
itertools.takewhile requires an iterable for its second argument. sympy.sieve is not an ordinary iterable: it allows you to request arbitrary primes by index and maintains a dynamic internal sieve, rather than yielding primes in sequence. Because takewhile can't pull elements out of sympy.sieve in order, you don't get your expected results.
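Two sketches that do produce the expected set (my own suggestions; they assume a reasonably recent sympy, where the sieve is 1-indexed with sieve[1] == 2):
>>> import sympy
>>> set(sympy.primerange(1, 20))  # re-create the generator each time you need it
{2, 3, 5, 7, 11, 13, 17, 19}
>>> {sympy.sieve[i] for i in range(1, 9)}  # the first 8 primes, by index
{2, 3, 5, 7, 11, 13, 17, 19}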
Kudos to you for doing Project Euler.