Numpy shape not including classes - python-3.x

I want to create a dataset(data.npy) that has the same format as cifar10. The sample omniglot dataset contains
data = [0001_01.png,0001_02.png,0001_03.png,0001_04.png,0002_01.png,0002_02.png,0002_03.png]
class_name,filename = data[0].split('_')
Each file is append with class.There are 1600 classes and each class has 20 samples. The expected dataset(data.npy) shape is (1600,20,784)
But the shape i got is (1,20,784). Below given is the snippet
classes = []
examples = []
prev = files[0].split('_')[0]
for f in files:
cur_id = f.split('_')[0]
cur_pic = misc.imresize(misc.imread('new_data/' + f),[28,28])
cur_pic = (np.float32(cur_pic)/255).flatten()
if prev == cur_id:
examples.append(cur_pic)
else:
classes.append(examples)
examples = [cur_pic]
prev = cur_id
np.save('data',np.asarray(classes))
Any suggestions would be greatly helpful. The above code is taken from https://github.com/zergylord/oneshot

Related

how to assign value to tensor based on tensor index

import torch
couplings_lookup = torch.tensor(range(1,901)).reshape(6,6,5,5)
fccl = torch.tensor([1,2,3,4,5,0])
xx = torch.tensor([3,2,2,2,1,0])
coupling = []
for idx, pos in enumerate(fccl):
ccoupling_mask = torch.zeros(couplings_lookup.shape)
ccoupling_mask[fccl[:idx],pos,xx[:idx],:] = 1
ccoupling = torch.sum(couplings_lookup * ccoupling_mask, axis=(0,1,2))
coupling.append(ccoupling)
print(ccoupling)
As the sample shows above, how could I convert the “for” loop into tensor operation in pytorch or einsum making the program run faster ? Or how to think about problem like this ?

perform principal component analysis on moving window of data

This was partly answered by #WhoIsJack but not completely solved given the errors I get. Basically, I'm trying to perform principal component analysis on a rolling window of data. For example, I'd run PCA on the last 200 days in the df, move forward 1 day and do PCA again on the last 200 days. So as you move forward each day, you'd include the next day's measurement and exclude the last measurement.
You have a random df:
data = np.random.random(size=(1000,10))
df = pd.DataFrame(data)
Here's window size:
window = 200
Initialize an empty df of appropriate size for the output
df_pca = pd.DataFrame( np.zeros((data.shape[0] - window + 1, data.shape[1])) )
Define PCA fit-transform function. Instead of attempting to return the result, it is written into the previously created output array.
def rolling_pca(window_data):
pca = PCA()
transf = pca.fit_transform(df.iloc[window_data])
df_pca.iloc[int(window_data[0])] = transf[0,:]
return True
Create a df containing row indices for the workaround
df_idx = pd.DataFrame(np.arange(df.shape[0]))
Use rolling to apply the PCA function
_ = df_idx.rolling(window).apply(rolling_pca)
The results should be contained here:
print(df_pca)
However, when I generate the results only the first row of data looks to contain PCAs while the rest of the rows are zero.
I also tried the following function:
def rolling_pca(x, window):
r = x.rolling(window=window)
pca = PCA(3)
y = pca.fit(r)
z = pca.fit_transform(y)
return z
window = 200
Which I thought would generate a new df with rolling PCAs:
data = df.apply(rolling_pca, window=window)
But I got the following error: setting an array element with a sequence.
I've also tried manually calculating with below. I get: "unsupported operand type(s) for /: 'Rolling' and 'int'"
def rolling_pca(x, window):
# create rolling dataframe
r = x.rolling(window=window)
# demand data
X = np.matrix(r)
X_dm = X - np.mean(X, axis = 0)
#Eigenvalue decomposition (of covariance matrix)
Cov_X = np.cov(X_dm, rowvar = False)
eigen = np.linalg.eig(Cov_X)
eig_values_X = np.matrix(eigen[0])
eig_vectors_X = np.matrix(eigen[1])
#transformed data
Y_dm = X_dm * eig_vectors_X
#assign transformed yields
yields_trans = Y_dm.copy()
# get PCs
pc1_yields = x.copy()
pcas = yields_trans[:,0:3]
return pcas
#assign window length
window = 300
rolling_pca(data, window=window)
And tried below. Get error: "LinAlgError: 0-dimensional array given. Array must be at least two-dimensional"
def pca(x):
# demand data
X = np.matrix(x.values)
X_dm = X - np.mean(X, axis = 0)
#Eigenvalue decomposition (of covariance matrix)
Cov_X = np.cov(X_dm, rowvar = False)
eigen = np.linalg.eig(Cov_X)
eig_values_X = np.matrix(eigen[0])
eig_vectors_X = np.matrix(eigen[1])
#transformed data
Y_dm = X_dm * eig_vectors_X
#assign transformed yields
yields_trans = Y_dm.copy()
# get 3 PCs
pcas = yields_trans[:,0:3]
final_pcas = pd.DataFrame(pcas)
return final_pcas
data.rolling(200).apply(pca)
Any thoughts would be appreciated!

Finding conditional mutual information from 3 discrete variable

I am trying to find conditional mutual information between three discrete random variable using pyitlib package for python with the help of the formula:
I(X;Y|Z)=H(X|Z)+H(Y|Z)-H(X,Y|Z)
The expected Conditional Mutual information value is= 0.011
My 1st code:
import numpy as np
from pyitlib import discrete_random_variable as drv
X=[0,1,1,0,1,0,1,0,0,1,0,0]
Y=[0,1,1,0,0,0,1,0,0,1,1,0]
Z=[1,0,0,1,1,0,0,1,1,0,0,1]
a=drv.entropy_conditional(X,Z)
##print(a)
b=drv.entropy_conditional(Y,Z)
##print(b)
c=drv.entropy_conditional(X,Y,Z)
##print(c)
p=a+b-c
print(p)
The answer i am getting here is=0.4632245116328402
My 2nd code:
import numpy as np
from pyitlib import discrete_random_variable as drv
X=[0,1,1,0,1,0,1,0,0,1,0,0]
Y=[0,1,1,0,0,0,1,0,0,1,1,0]
Z=[1,0,0,1,1,0,0,1,1,0,0,1]
a=drv.information_mutual_conditional(X,Y,Z)
print(a)
The answer i am getting here is=0.1583445441575102
While the expected result is=0.011
Can anybody help? I am in big trouble right now. Any kind of help will be appreciable.
Thanks in advance.
I think that the library function entropy_conditional(x,y,z) has some errors. I also test my samples, the same problem happens.
however, the function entropy_conditional with two variables is ok.
So I code my entropy_conditional(x,y,z) as entropy(x,y,z), the results is correct.
the code may be not beautiful.
def gen_dict(x):
dict_z = {}
for key in x:
dict_z[key] = dict_z.get(key, 0) + 1
return dict_z
def entropy(x,y,z):
x = np.array([x,y,z]).T
x = x[x[:,-1].argsort()] # sorted by the last column
w = x[:,-3]
y = x[:,-2]
z = x[:,-1]
# dict_w = gen_dict(w)
# dict_y = gen_dict(y)
dict_z = gen_dict(z)
list_z = [dict_z[i] for i in set(z)]
p_z = np.array(list_z)/sum(list_z)
pos = 0
ent = 0
for i in range(len(list_z)):
w = x[pos:pos+list_z[i],-3]
y = x[pos:pos+list_z[i],-2]
z = x[pos:pos+list_z[i],-1]
pos += list_z[i]
list_wy = np.zeros((len(set(w)),len(set(y))), dtype = float , order ="C")
list_w = list(set(w))
list_y = list(set(y))
for j in range(len(w)):
pos_w = list_w.index(w[j])
pos_y = list_y.index(y[j])
list_wy[pos_w,pos_y] += 1
#print(pos_w)
#print(pos_y)
list_p = list_wy.flatten()
list_p = np.array([k for k in list_p if k>0]/sum(list_p))
ent_t = 0
for j in list_p:
ent_t += -j * math.log2(j)
#print(ent_t)
ent += p_z[i]* ent_t
return ent
X=[0,1,1,0,1,0,1,0,0,1,0,0]
Y=[0,1,1,0,0,0,1,0,0,1,1,0]
Z=[1,0,0,1,1,0,0,1,1,0,0,1]
a=drv.entropy_conditional(X,Z)
##print(a)
b=drv.entropy_conditional(Y,Z)
c = entropy(X, Y, Z)
p=a+b-c
print(p)
0.15834454415751043
Based on the definitions of conditional entropy, calculating in bits (i.e. base 2) I obtain H(X|Z)=0.784159, H(Y|Z)=0.325011, H(X,Y|Z) = 0.950826. Based on the definition of conditional mutual information you provide above, I obtain I(X;Y|Z)=H(X|Z)+H(Y|Z)-H(X,Y|Z)= 0.158344. Noting that pyitlib uses base 2 by default, drv.information_mutual_conditional(X,Y,Z) appears to be computing the correct result.
Note that your use of drv.entropy_conditional(X,Y,Z) in your first example to compute conditional entropy is incorrect, you can however use drv.entropy_conditional(XY,Z), where XY is a 1D array representing the joint observations about X and Y, for example XY = [2*xy[0] + xy[1] for xy in zip(X,Y)].

'Word2Vec' object has no attribute 'index2word'

I'm getting this error "AttributeError: 'Word2Vec' object has no attribute 'index2word'" in following code in python. Anyone knows how can I solve it?
Acctually "tfidf_weighted_averaged_word_vectorizer" throws the error. "obli.csv" contains line of sentences.
Thank you.
from feature_extractors import tfidf_weighted_averaged_word_vectorizer
dataset = get_data2()
corpus, labels = dataset.data, dataset.target
corpus, labels = remove_empty_docs(corpus, labels)
# print('Actual class label:', dataset.target_names[labels[10]])
train_corpus, test_corpus, train_labels, test_labels = prepare_datasets(corpus,
labels,
test_data_proportion=0.3)
tfidf_vectorizer, tfidf_train_features = tfidf_extractor(train_corpus)
vocab = tfidf_vectorizer.vocabulary_
tfidf_wv_train_features = tfidf_weighted_averaged_word_vectorizer(corpus=tokenized_train,
tfidf_vectors=tfidf_train_features,
tfidf_vocabulary=vocab,
model=model,
num_features=100)
def get_data2():
obli = pd.read_csv('db/obli.csv').values.ravel().tolist()
cl0 = [0 for x in range(len(obli))]
nonObli = pd.read_csv('db/nonObli.csv').values.ravel().tolist()
cl1 = [1 for x in range(len(nonObli))]
all = obli + nonObli
db = Db(all,cl0 + cl1)
db.data = all
db.target = cl0 + cl1
return db
This is code from chapter 4 of Text Analytics for Python by Dipanjan Sarkar.
index2word in gensim has been moved since that text was published.
Instead of model.index2word you should use model.wv.index2word.

MullionType errors in Revit API/Dynamo script

I’m working on a Python script that takes a set of input lines and assigns a mullion to the corresponding gridline that they intersect. However, I’m getting a strange error:
that I don’t know how to correct towards the end of the script. Python is telling me that it expected a MullionType and got a Family Type (see image). I’m using a modified version of Spring Nodes’ Collector.WallTypes that collects Mullion Types instead but the output of the node is a Family Type, which the script won’t accept. Any idea how to get the Mullion Type to feed into the final Python node?
SpringNodes script:
#Copyright(c) 2016, Dimitar Venkov
# #5devene, dimitar.ven#gmail.com
import clr
clr.AddReference("RevitServices")
import RevitServices
from RevitServices.Persistence import DocumentManager
doc = DocumentManager.Instance.CurrentDBDocument
clr.AddReference("RevitAPI")
from Autodesk.Revit.DB import *
clr.AddReference("RevitNodes")
import Revit
clr.ImportExtensions(Revit.Elements)
def tolist(obj1):
if hasattr(obj1,"__iter__"): return obj1
else: return [obj1]
fn = tolist(IN[0])
fn = [str(n) for n in fn]
result, similar, names = [], [], []
fec = FilteredElementCollector(doc).OfClass(MullionType)
for i in fec:
n1 = Element.Name.__get__(i)
names.append(n1)
if any(fn1 == n1 for fn1 in fn):
result.append(i.ToDSType(True))
elif any(fn1.lower() in n1.lower() for fn1 in fn):
similar.append(i.ToDSType(True))
if len(result) > 0:
OUT = result,similar
if len(result) == 0 and len(similar) > 0:
OUT = "No exact match found. Check partial below:",similar
if len(result) == 0 and len(similar) == 0:
OUT = "No match found! Check names below:", names
The SpringNodes script outputs a Family Type, even though the collector is for Mullion Types (see above image)
Here's my script:
import clr
# Import RevitAPI
clr.AddReference("RevitAPI")
import Autodesk
from Autodesk.Revit.DB import *
# Import DocumentManager and TransactionManager
clr.AddReference("RevitServices")
import RevitServices
from RevitServices.Persistence import DocumentManager
from RevitServices.Transactions import TransactionManager
# Import ToDSType(bool) extension method
clr.AddReference("RevitNodes")
import Revit
clr.ImportExtensions(Revit.GeometryConversion)
from System import Array
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *
import math
doc = DocumentManager.Instance.CurrentDBDocument
app = DocumentManager.Instance.CurrentUIApplication.Application
walls = UnwrapElement(IN[0])
toggle = IN[1]
inputLine = IN[2]
mullionType = IN[3]
wallSrf = []
heights = []
finalPoints = []
directions = []
isPrimary = []
projectedCrvs = []
keySegments = []
keySegmentsGeom = []
gridSegments = []
gridSegmentsGeom = []
gridLines = []
gridLinesGeom = []
keyGridLines = []
keyGridLinesGeom = []
projectedGridlines = []
lineDirections = []
gridLineDirection = []
allTrueFalse = []
if toggle == True:
TransactionManager.Instance.EnsureInTransaction(doc)
for w, g in zip(walls,inputLine):
pointCoords = []
primary = []
## Get curtain wall element sketch line
originLine = Revit.GeometryConversion.RevitToProtoCurve.ToProtoType( w.Location.Curve, True )
originLineLength = w.Location.Curve.ApproximateLength
## Get curtain wall element height, loft to create surface
for p in w.Parameters:
if p.Definition.Name == 'Unconnected Height':
height = p.AsDouble()
topLine = originLine.Translate(0,0,height)
srfCurves = [originLine,topLine]
wallSrf = NurbsSurface.ByLoft(srfCurves)
## Get centerpoint of curve, determine whether it extends across entire gridline
projectedCrvCenterpoint = []
for d in g:
lineDirection = d.Direction.Normalized()
lineDirections.append(lineDirection)
curveProject= d.PullOntoSurface(wallSrf)
if abs(lineDirection.Z) == 1:
if curveProject.Length >= height-.5:
primary.append(False)
else:
primary.append(True)
else:
if curveProject.Length >= originLineLength-.5:
primary.append(False)
else:
primary.append(True)
centerPoint = curveProject.PointAtParameter(0.5)
pointList = []
projectedCrvCenterpoint.append(centerPoint)
## Project centerpoint of curve onto wall surface
for h in [centerPoint]:
pointUnwrap = UnwrapElement(centerPoint)
pointList.append(pointUnwrap.X)
pointList.append(pointUnwrap.Y)
pointList.append(pointUnwrap.Z)
pointCoords.append(pointList)
finalPoints.append(pointCoords)
isPrimary.append(primary)
projectedCrvs.append(projectedCrvCenterpoint)
TransactionManager.Instance.TransactionTaskDone()
TransactionManager.Instance.EnsureInTransaction(doc)
##Gather all segments of gridline geometry
for wall in UnwrapElement(walls):
gridSegments2 = []
gridSegmentsGeom2 = []
gridLines1 = []
gridLinesGeom1 = []
for id1 in wall.CurtainGrid.GetVGridLineIds():
gridLinesGeom1.append(Revit.GeometryConversion.RevitToProtoCurve.ToProtoType(doc.GetElement(id1).FullCurve))
gridLines1.append(doc.GetElement(id1))
VgridSegments1 = []
VgridSegmentsGeom1 = []
for i in doc.GetElement(id1).AllSegmentCurves:
VgridSegments1.append(i)
VgridSegmentsGeom1.append(Revit.GeometryConversion.RevitToProtoCurve.ToProtoType(i,True))
gridSegments2.append(VgridSegments1)
gridSegmentsGeom2.append(VgridSegmentsGeom1)
for id2 in wall.CurtainGrid.GetUGridLineIds():
gridLinesGeom1.append(Revit.GeometryConversion.RevitToProtoCurve.ToProtoType(doc.GetElement(id2).FullCurve))
gridLines1.append(doc.GetElement(id2))
UgridSegments1 = []
UgridSegmentsGeom1 = []
for i in doc.GetElement(id2).AllSegmentCurves:
UgridSegments1.append(i)
UgridSegmentsGeom1.append(Revit.GeometryConversion.RevitToProtoCurve.ToProtoType(i,True))
gridSegments2.append(UgridSegments1)
gridSegmentsGeom2.append(UgridSegmentsGeom1)
gridSegments.append(gridSegments2)
gridSegmentsGeom.append(gridSegmentsGeom2)
gridLines.append(gridLines1)
gridLinesGeom.append(gridLinesGeom1)
boolFilter = [[[[b.DoesIntersect(x) for x in d] for d in z] for b in a] for a,z in zip(projectedCrvs, gridSegmentsGeom)]
boolFilter2 = [[[b.DoesIntersect(x) for x in z] for b in a] for a,z in zip(projectedCrvs, gridLinesGeom)]
##Select gridline segments that intersect with centerpoint of projected lines
for x,y in zip(boolFilter,gridSegments):
keySegments2 = []
keySegmentsGeom2 = []
for z in x:
keySegments1 = []
keySegmentsGeom1 = []
for g,l in zip(z,y):
for d,m in zip(g,l):
if d == True:
keySegments1.append(m)
keySegmentsGeom1.append(Revit.GeometryConversion.RevitToProtoCurve.ToProtoType(m,True))
keySegments2.append(keySegments1)
keySegmentsGeom2.append(keySegmentsGeom1)
keySegments.append(keySegments2)
keySegmentsGeom.append(keySegmentsGeom2)
##Order gridlines according to intersection with projected points
for x,y in zip(boolFilter2, gridLines):
keyGridLines1 = []
keyGridLinesGeom1 = []
for z in x:
for g,l in zip(z,y):
if g == True:
keyGridLines1.append(l)
keyGridLinesGeom1.append(Revit.GeometryConversion.RevitToProtoCurve.ToProtoType(l.FullCurve,True))
keyGridLines.append(keyGridLines1)
keyGridLinesGeom.append(keyGridLinesGeom1)
##Add mullions at intersected gridline segments
TransactionManager.Instance.TransactionTaskDone()
TransactionManager.Instance.EnsureInTransaction(doc)
for x,y,z in zip(keyGridLines,keySegments,isPrimary):
projectedGridlines1 = []
for h,j,k in zip(x,y,z):
for i in j:
if i != None:
h.AddMullions(i,mullionType,k)
projectedGridlines1.append(h)
projectedGridlines.append(projectedGridlines1)
else:
None
if toggle == True:
OUT = projectedGridlines
else:
None
TransactionManager.Instance.TransactionTaskDone()
Apologies for the messiness of the code, it's a modification of another node that I've been working on. Thanks for your help.
Bo,
Your problem is rooted in how Dynamo is wrapping elements to use with its own model. That last call .ToDSType(True) is the gist of the issue. MullionType class is a subclass (it inherits properties) from a ElementType class in Revit. When Dynamo team wraps that object into a custom wrapper they only wrote a top level wrapper that treats all ElementTypes the same, hence this outputs an ElementType/FamilyType rather than a specific MullionType.
First I would suggest that you replace the line of code in your code:
mullionType = IN[3]
with:
mullionType = UnwrapElement(IN[3])
This is their built in method for unwrapping elements to be used with calls to Revit API.
If that still somehow remains an issue, you could try and retrieve the MullionType object again, this time directly in your script, before you use it. You can do so like this:
for x,y,z in zip(keyGridLines,keySegments,isPrimary):
projectedGridlines1 = []
for h,j,k in zip(x,y,z):
for i in j:
if i != None:
h.AddMullions(i,doc.GetElement(mullionType.Id),k)
projectedGridlines1.append(h)
projectedGridlines.append(projectedGridlines1)
This should make sure that you get the MullionType element before it was wrapped.
Again, try unwrapping it first, then GetElement() call if first doesn't work.

Resources