Error in retainedges[dat$seg] : invalid subscript type 'list' - spatstat

I am running the linearK function on an observed point pattern on a linear network, and I get the following error:
Error in retainedges[dat$seg] : invalid subscript type 'list'
I do not understand what it means or how I should correct it.
Calling traceback() gives the following:
> traceback()
4: thinNetwork(x, retainvertices = subi)
3: countends(L, X[-j], D[-j, j], toler = toler)
2: linearKengine(X, r = r, ..., denom = denom, correction = correction,
ratio = ratio)
1: linearK(sl2)
Could someone explain what this error means and how I can correct it?
Thank you.

Your network is a bit problematic because it is disconnected. It has one very large component with 3755 vertices and 5593 lines, plus 5 small components with only 2 or 3 vertices and 1 or 2 lines each that are not connected to anything else. Your example has only two points on this network (both in the large component, as far as I can tell). We might be able to handle this in future versions of spatstat, but for now I suggest you simply discard the small empty components. Then I think linearK works as expected for your example (although I doubt you will find much of interest in a pattern of only 2 points!).
To identify the connected components of a linear network, use connected.linnet with the argument what = "components". This returns a list of connected components, and you can use the large component to define a new lpp on a connected linnet. With your example you could do something like this (component number 1 is the main component):
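# split the network into its connected components (a list of linnet objects)
comp <- connected(as.linnet(sl2), what = "components")
# component 1 is the main component here; rebuild the point pattern on it
sl2new <- lpp(as.ppp(sl2), comp[[1]])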
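spatstat does this in R through connected.linnet. Purely to illustrate the idea of "find the components, keep the big one", here is a small Python sketch using a union-find structure on a network described as a hypothetical list of segments given as vertex index pairs. The helper names connected_components and keep_largest are made up for this example and are not part of spatstat.

```python
def connected_components(n_vertices, edges):
    """Group vertices into connected components with a union-find structure.

    n_vertices: number of vertices, labelled 0..n_vertices-1.
    edges: list of (i, j) vertex pairs, one per line segment.
    Returns a list of components (sorted vertex lists), largest first.
    """
    parent = list(range(n_vertices))

    def find(v):
        # follow parent links to the root, halving the path as we go
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components

    groups = {}
    for v in range(n_vertices):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values(), key=len, reverse=True)


def keep_largest(n_vertices, edges):
    """Return the vertices and segments of the largest component only."""
    main = set(connected_components(n_vertices, edges)[0])
    return sorted(main), [(i, j) for i, j in edges if i in main]
```

For a network like the one described, the small isolated pieces end up in their own short components and are dropped by keep_largest, which is the same effect as taking comp[[1]] in R.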


MDAnalysis comparing position of ion in trajectory to previous ts

So I have a system where I need to determine the exact position of my ions and then compute a quantity from the average position of each ion. I found that my ion positions were inconsistent because some ions wrap across the periodic boundary, which drastically changes the position within that window. This leads to an average of, say, +20 when the ion merely shuttled between +40 and -40.
I wanted to correct this by unwrapping the coordinates of ions on the edge of my box.
My idea was that, for each frame of my trajectory, MDAnalysis would check the position of ion 1 in frame 1. In frame 2 it would check the same ion again and compare it with the previous position. If, for example, it goes from positive to negative coordinates, I would add 1 to a counter, meaning that it wrapped once; if it goes from negative to positive, I would subtract 1. By the end of the trajectory I would have a number that tells me how to perform my analysis.
However, my coding skills are less than lackluster, so how would I go about implementing this? I have essentially got the counting working, but the comparison between frames is where I am confused. How would I do this comparison?
Thanks in advance
There are a few ways to answer this question. Firstly, for your plan of comparing each ion's position with its position in the previous frame and counting sign changes, you could write your own analysis class.
One untested way to do it is prototyped below; the MDAnalysis tutorial on writing your own analysis class explains in more detail what each method (_prepare, _conclude, etc.) does.
from MDAnalysis.analysis.base import AnalysisBase
import numpy as np
class CountWrappings(AnalysisBase):
    def __init__(self, universe, select="name NA"):
        # accepts a Universe or an AtomGroup; both have a .universe attribute
        super().__init__(universe.universe.trajectory)
        # these are your selected ions
        self.atomgroup = universe.select_atoms(select)
        self.n_atoms = len(self.atomgroup)
    def _prepare(self):
        # self.results is a dictionary of results; one flag per frame and ion
        self.results.wrapping_per_frame = np.zeros((self.n_frames, self.n_atoms), dtype=bool)
        # no previous frame yet; set on the first analyzed frame
        self._last_positions = None
    def _single_frame(self):
        positions = self.atomgroup.positions  # a copy of the current coordinates
        if self._last_positions is not None:
            # True where a coordinate changed sign since the previous frame
            sign_changed = np.sign(positions) != np.sign(self._last_positions)
            sign_changes_any_axis = np.any(sign_changed, axis=1)
            # _frame_index is the relative index of the frame being currently analyzed
            self.results.wrapping_per_frame[self._frame_index] = sign_changes_any_axis
        self._last_positions = positions
    def _conclude(self):
        # number of frames in which each ion changed sign
        self.results.n_wraps = self.results.wrapping_per_frame.sum(axis=0)
# my_universe is your MDAnalysis Universe
n_wraps = CountWrappings(my_universe, select="name NA CL MG")
n_wraps.run()
print(n_wraps.results.wrapping_per_frame)
print(n_wraps.results.n_wraps)
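The class above only flags frames in which a coordinate changes sign, which also happens when an ion simply crosses the middle of a box centred on 0. A numpy-only sketch of the signed count you describe (+1 for + to -, -1 for - to +) can instead treat a jump of more than half the box length between consecutive frames as a boundary crossing. This assumes a constant box and frames saved often enough that no ion really moves more than half a box between them; count_wraps and its arguments are hypothetical names.

```python
import numpy as np

def count_wraps(coords, box_length):
    """Running signed count of boundary crossings per ion along one axis.

    coords: array of shape (n_frames, n_atoms), one coordinate per ion.
    box_length: box edge along that axis (assumed constant).
    Returns an (n_frames, n_atoms) integer array with the count at each frame.
    """
    coords = np.asarray(coords, dtype=float)
    jumps = np.diff(coords, axis=0)
    # left through the + face, re-entered at the - face: big downward jump
    plus_to_minus = jumps < -box_length / 2
    # left through the - face, re-entered at the + face: big upward jump
    minus_to_plus = jumps > box_length / 2
    step = plus_to_minus.astype(int) - minus_to_plus.astype(int)
    running = np.cumsum(step, axis=0)
    # no crossings are counted before the first frame
    first = np.zeros((1, coords.shape[1]), dtype=int)
    return np.vstack([first, running])
```

With MDAnalysis you could, for example, collect one axis with np.array([ions.positions[:, 0] for ts in u.trajectory]) and pass u.dimensions[0] as the box length (u and ions being your Universe and ion selection). Multiplying the count by the box length and adding it to the wrapped coordinate gives an unwrapped coordinate.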
However, I'm not sure that addresses your actual aim, which is to unwrap the coordinates of ions on the edge of your box.
Are you computing the ion positions relative to anything? If so, you could potentially add bonds between each ion and the center so that you can use the AtomGroup.unwrap() method. Alternatively, is your data compatible with GROMACS? GROMACS's gmx trjconv has a "-pbc nojump" option that unwraps atoms jumping across box edges, e.g.
gmx trjconv -f my_trajectory.xtc -s my_topology.gro -pbc nojump -o my_unwrapped_trajectory.xtc
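The idea behind nojump-style unwrapping can be sketched in numpy for an orthorhombic box whose size never changes: take the displacement between consecutive wrapped frames, remove whole box lengths so it becomes the shortest (minimum-image) displacement, and accumulate. This is an illustration of the technique, not the GROMACS code, and because it assumes a constant box it is only appropriate for fixed-volume runs. The function name nojump_unwrap is hypothetical.

```python
import numpy as np

def nojump_unwrap(positions, box):
    """Unwrap coordinates that jump across the box edges.

    positions: (n_frames, n_atoms, 3) wrapped coordinates.
    box: (3,) edge lengths of an orthorhombic box, assumed constant.
    Returns the unwrapped coordinates, same shape as positions.
    """
    positions = np.asarray(positions, dtype=float)
    box = np.asarray(box, dtype=float)
    unwrapped = positions.copy()
    for i in range(1, len(positions)):
        # displacement between consecutive wrapped frames
        step = positions[i] - positions[i - 1]
        # subtract whole box lengths so each step is the minimum-image step
        step -= box * np.round(step / box)
        unwrapped[i] = unwrapped[i - 1] + step
    return unwrapped
```

For an ion that goes from x = 40 to x = -40 in a 100-wide box, the minimum-image step is +20, so the unwrapped coordinate becomes 60 instead of jumping to -40.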
As Lily mentioned, you could write your own analysis to do this or use GROMACS. However, both Lily's example and the GROMACS implementation of 'nojump' fail to account for box size fluctuations under the NPT ensemble (assuming you've used NPT). von Bulow et al. wrote about this widespread problem a couple of years ago. As far as I'm aware, the only implementation of nojump unwrapping that accounts for box size fluctuations is in LiPyphilic (disclaimer: I am the author of LiPyphilic).
Using LiPyphilic, you can unwrap your trajectory like so:
import MDAnalysis as mda
from lipyphilic.transformations import nojump
# pdb and xtc are the paths to your topology and trajectory files
u = mda.Universe(pdb, xtc)
ions = u.select_atoms('name NA CLA')
u.trajectory.add_transformations(
    nojump(
        ag=ions,
        nojump_x=True,
        nojump_y=True,
        nojump_z=True)
)
Then, when you do further analysis with your MDAnalysis Universe, the atoms will automatically be unwrapped at each frame.

Reducing time and memory used in loop calculation when locating earthquakes

I'm trying to locate tremor, which is a type of earthquake with smaller amplitude. I use a grid search, which finds the grid coordinate that minimizes the difference between the theoretical and observed values of the differential arrival time of seismic waves.
My code is as follows. First I defined two functions: one that calculates the distance between each station and each point on the grid (a candidate source location), and one that calculates the travel time of seismic waves using ObsPy.
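import math
# model is assumed to be an obspy.taup.TauPyModel, e.g. TauPyModel(model="iasp91")
def distance(a,i):
    # distance in degrees between station a and grid point i (flat lat/lon approximation)
    return math.sqrt(((ste[a].stats.sac.stla-la[i])**2)+((ste[a].stats.sac.stlo-lo[i])**2))
def traveltime(a):
    # S-wave travel time for a 35 km deep source at an epicentral distance of a degrees
    return model.get_travel_times(source_depth_in_km=35, distance_in_degree=a, phase_list=["S"], receiver_depth_in_km=0)[0].time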
Then I carried out the grid search with the following code.
di=[(la[i],lo[i],distance(a,i), distance(b,i)) for i in range(len(lo))
    for a in range(len(ste))
    for b in range(len(ste)) if a<b]
didf=pd.DataFrame(di)
latot=didf[0]
lotot=didf[1]
dia=didf[2]
dib=didf[3]
tt=[]
for i in range(len(di)):
    try:
        tt.append((latot[i],lotot[i],traveltime(dia[i])-traveltime(dib[i])))
    except IndexError:
        continue
ttdf=pd.DataFrame(tt)
final=[(win[j],ttdf[0][i],ttdf[1][i],(ttdf[2][i]-shift[j])**2) for i in range(len(ttdf))
       for j in range(len(ccdf))]
where la and lo are the lists of latitude and longitude coordinates at 0.01-degree intervals, and ste is the list of east-component seismograms of the stations. I need the list 'final' to proceed to the next step.
However, the three code segments above take far too long to run. Moreover, after tens of hours of computation I get an 'out of memory' error. Is there any way to reduce both the time and the memory used?
Without access to your dataset, it's a little difficult to debug, but here are a few suggestions for you.
for i in range(len(di)):
    try:
        tt.append((latot[i],lotot[i],traveltime(dia[i])-traveltime(dib[i])))
    except IndexError:
        continue
• Given the size of these lists, I think the garbage collector might be slowing down this for loop; you might consider turning it off for the duration of the loop (gc.disable()).
• In theory, the append() calls shouldn't be the source of your performance problems, since CPython lists over-allocate, as this comment from list_resize() in Objects/listobject.c explains:
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
but you already know the size of the array, so you might consider using numpy.zeros() to preallocate an array before the for loop and use the index to address each element directly. Alternatively, you could just use list comprehensions, as you did earlier, and avoid the problem altogether.
• I see that you've tagged the question with python-3.x, so range() shouldn't be an issue like it was in 2.x (otherwise you would want to consider using xrange()).
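The first two suggestions (preallocating with numpy and pausing the garbage collector) could be combined roughly as below. This is a sketch, not a tested fix for your data: fill_preallocated is a hypothetical helper, its arguments stand in for latot, lotot, dia, dib and your traveltime function, and it uses np.full with NaN rather than numpy.zeros so that the entries skipped on IndexError (as in your loop) can be dropped at the end.

```python
import gc
import numpy as np

def fill_preallocated(lat, lon, dist_a, dist_b, traveltime):
    """Fill a preallocated (n, 3) array instead of appending to a list.

    lat, lon, dist_a, dist_b: equal-length sequences (like latot, lotot, dia, dib).
    traveltime: function mapping a distance to a travel time.
    Returns rows of (lat, lon, time difference); skipped rows are removed.
    """
    n = len(dist_a)
    tt = np.full((n, 3), np.nan)  # one row per grid-point/station-pair entry
    was_enabled = gc.isenabled()
    gc.disable()  # pause the cyclic garbage collector during the loop
    try:
        for i in range(n):
            try:
                tt[i] = (lat[i], lon[i], traveltime(dist_a[i]) - traveltime(dist_b[i]))
            except IndexError:
                continue  # skip this entry, as the original loop does
    finally:
        if was_enabled:
            gc.enable()  # always restore the collector
    return tt[~np.isnan(tt[:, 2])]
```

The try/finally makes sure the garbage collector is switched back on even if the loop raises an unexpected error.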
If you update your question with more details, I could probably provide a more detailed answer. Hope this helps.

ValueError: shapes (5,14) and (16,) not aligned: 14 (dim 1) != 16 (dim 0)

I am working on the housing dataset, and when fitting the linear regression model and predicting with it I get the error above. The complete code is below.
I am not sure where the code goes wrong; I pasted it exactly as it appears in the reference book.
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(housing_prepared, housing_labels)
some_data = housing.iloc[:5]
some_labels = housing_labels.iloc[:5]
some_data_prepared = full_pipeline.transform(some_data)
print("Predictions:\t", lin_reg.predict(some_data_prepared))
ERROR: ValueError: shapes (5,14) and (16,) not aligned: 14 (dim 1) != 16 (dim 0)
What am I doing wrong here?
Explanation
Hi, I guess you are reading and following the Hands-On Machine Learning with Scikit-Learn and TensorFlow book. I ran into the same problem.
In the following part of the code you select the first 5 instances of the data set. One of the attributes, ocean_proximity, is categorical (an object column), and for the linear regression model to work with it, it must be converted to numbers, which the book does with one-hot encoding.
One-hot encoding works by collecting all the categories the attribute can take, in this case 5 ('<1H OCEAN', 'INLAND', 'NEAR OCEAN', 'NEAR BAY', 'ISLAND'), and then creating a vector of that length for each instance in which every element is 0 except the one for that instance's category, which is set to 1 (or another value). For example:
If ocean_proximity equals '<1H OCEAN' the conversion would be [1, 0, 0, 0, 0]
In this piece of code you select the first five instances of the data set, but this does not guarantee that all the categories of ocean_proximity appear. Only 3 of them might appear, or just 1. Therefore, if you apply one-hot encoding to those five rows and only 3 categories appear (for example just 'INLAND', 'ISLAND' and 'NEAR BAY'), the vectors created by the one-hot encoding will be of length 3.
some_data = housing.iloc[:5]
some_labels = housing_labels.iloc[:5]
some_data_prepared = full_pipeline.transform(some_data)
The error is telling you that, because the one-hot conversion of some_data created vectors shorter than 5, some_data_prepared has only 14 columns, fewer than the 16 columns of housing_prepared, so the model cannot make predictions.
If you convert both some_data_prepared and housing_prepared into DataFrames and call .head(), you will see the problem.
import pandas as pd
pd.DataFrame(some_data_prepared).head()
pd.DataFrame(housing_prepared).head()
Solution
To solve the problem, create the columns missing from some_data_prepared: make a numpy array of zeros of shape (5, x), with 5 the number of rows and x the number of missing columns, and concatenate it to some_data_prepared so that it matches the shape of housing_prepared.
import numpy as np
some_data = housing.iloc[:5]
some_labels = housing_labels.iloc[:5]
some_data_prepared = full_pipeline.fit_transform(some_data)
# one column of zeros for each missing one-hot column
n_missing = housing_prepared.shape[1] - some_data_prepared.shape[1]
dummy_array = np.zeros((5, n_missing))
some_data_prepared = np.c_[some_data_prepared, dummy_array]
predictions = lin_reg.predict(some_data_prepared)
print("Predictions: ", predictions)
print("Labels: ", some_labels.values)
The issue is that some category values of ocean_proximity are missing from some_data compared with the data used to build housing_prepared.
housing_prepared.shape gives (16512, 16), but some_data_prepared.shape gives (5,14), so add zeros for the missing columns:
dummy_array = np.zeros((5,2))
some_data_prepared = np.c_[some_data_prepared,dummy_array]
The 2 in np.zeros((5,2)) is the difference in the number of columns (16 - 14).
I first encountered the same issue with this piece of code. After going through the issues in the handson-ml repository, I think I understand the subtlety that causes the error here.
My guess is that (as in my case) closing the notebook caused what was in memory, in particular the trained model, to be lost. In my case, I got the result without the error by rerunning the notebook from the beginning.
More fundamentally, you should never call fit() or fit_transform() on data that is not training data (e.g. on some_data). Here, running fit_transform(some_data) and then stacking the dummy array onto some_data_prepared runs without error, but it refits the preprocessing pipeline on some_data rather than on the training data, so the features no longer correspond to those the model was trained on. That is not what you want; use full_pipeline.transform(some_data) with the pipeline fitted on the training set instead.
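To see why the number of columns depends on what the encoder was fitted on, here is a minimal numpy sketch of one-hot encoding. It only mimics the idea and is not scikit-learn's OneHotEncoder API; the helpers fit_categories and one_hot are hypothetical, and the categories are taken as the sorted unique values seen during fitting.

```python
import numpy as np

def fit_categories(values):
    """Learn the category list from the data the encoder is fitted on."""
    return sorted(set(values))

def one_hot(values, categories):
    """Encode values against a fixed category list: one column per category."""
    index = {cat: j for j, cat in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for i, v in enumerate(values):
        out[i, index[v]] = 1.0
    return out

train = ['<1H OCEAN', 'INLAND', 'NEAR OCEAN', 'NEAR BAY', 'ISLAND', 'INLAND']
subset = train[:2]

# fitted on the training data: the subset still gets all 5 columns
print(one_hot(subset, fit_categories(train)).shape)   # (2, 5)
# refitted on the subset: only the categories present in it survive
print(one_hot(subset, fit_categories(subset)).shape)  # (2, 2)
```

In scikit-learn terms, the first case corresponds to calling full_pipeline.transform(some_data) on a pipeline already fitted on the training data, and the second to calling fit_transform(some_data).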

Selecting an element of a list inside a list (python)

I'm attempting to write a turn-based, Pokémon-esque game to test my Python skills and learn new things.
I'm having trouble selecting an element from a list inside another list.
Punch = ["Punch!", 20]
Kick = ["Kick!", 40]
Moves = [Punch, Kick]
Player = ["Jamie", 100, Moves]
print ("Do you want to punch, or kick?")
attack = input(" ")
if attack == "punch":
    atk = 0
if attack == "kick":
    atk = 1
damage = Player[2[atk[1]]]
print (Player[0]," uses ", Player[2[atk[0]]])
but this results in error:
TypeError: 'int' object is not subscriptable
I understand why this error happens, but I'm wondering if there is another way to access an element of a list inside a list.
Thanks
What you want is probably something like this:
damage = Player[2][atk][1]
Beware, though: atk is only defined inside the if statements, so it may not be defined after them (for example if the user types something other than "punch" or "kick").
Also, the indices must match the list of moves: with only two moves, atk must be 0 or 1, not 1 or 2.
Note: by convention you should not capitalise variable names, as that suggests they are classes rather than variables.
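Putting these points together, a sketch of the lookup might map the typed command to an index with a dictionary, so that an unknown command is handled explicitly instead of leaving atk undefined. The names move_index and use_move are illustrative, not part of the original code.

```python
# each move is [name, damage]; the player is [name, hp, moves], as in the question
punch = ["Punch!", 20]
kick = ["Kick!", 40]
moves = [punch, kick]
player = ["Jamie", 100, moves]

# map each command to its index in the moves list (0 or 1)
move_index = {"punch": 0, "kick": 1}

def use_move(player, command):
    """Return (move name, damage) for a command, or None if it is not a known move."""
    atk = move_index.get(command.strip().lower())
    if atk is None:
        return None  # unknown command: nothing to look up
    move = player[2][atk]  # chain the indices: player -> moves -> one move
    return move[0], move[1]

result = use_move(player, "kick")
if result is not None:
    name, damage = result
    print(player[0], "uses", name)  # prints: Jamie uses Kick!
```

In the game loop you would call use_move(player, input(" ")) and ask again whenever it returns None.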
From the way I understand it:
damage=Player[2][atk][1]
As already mentioned, the key is to understand what happens when you do this.
Player[2] refers to Moves, and when you subscript it further you are subscripting Moves, so Player[2][atk] is simply Moves[atk].
Keep in mind that it is truly just a reference: if you change the list Moves, the value in Player[2] changes too.
For example:
Moves[1][1] = 1000
print(Player[2][1][1])
will output
1000

Why do I get different values every time I run hmmlearn.hmm.GaussianHMM.fit()?

I have the following program:
import numpy as np
import pandas as pd
from hmmlearn.hmm import GaussianHMM
n = 6
data=pd.read_csv('11.csv',index_col='datetime')
volume = data['TotalVolumeTraded']
close = data['ClosingPx']
logDel = np.log(np.array(data['HighPx'])) - np.log(np.array(data['LowPx']))
logRet_1 = np.array(np.diff(np.log(close)))
logRet_5 = np.log(np.array(close[5:])) - np.log(np.array(close[:-5]))
logVol_5 = np.log(np.array(volume[5:])) - np.log(np.array(volume[:-5]))
logDel = logDel[5:]
logRet_1 = logRet_1[4:]
close = close[5:]
Date = pd.to_datetime(data.index[5:])
A = np.column_stack([logDel,logRet_5,logVol_5])
# current hmmlearn versions take the observations as one 2-D array;
# older versions accepted a list of sequences, hence the original fit([A])
model = GaussianHMM(n_components= n, covariance_type="full", n_iter=2000).fit(A)
hidden_states = model.predict(A)
When I run the code a first time and then a second time, I get different values of hidden_states.
Why are the two values of hidden_states different?
I am not completely sure what happens here, but here are two possible explanations for the results you're seeing.
The model does not maintain any ordering over state labels, so a state labelled 1 in one run could end up labelled 4 in another. This is known as the label switching problem in latent variable models.
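One way to check whether two runs differ only by label switching is to search for the relabeling of one run's states that best matches the other's. The brute-force sketch below tries every permutation of the state labels, which is fine for small numbers of states (6! = 720 for n = 6); best_relabeling is a hypothetical helper, and the inputs would be the hidden_states arrays from two runs.

```python
from itertools import permutations
import numpy as np

def best_relabeling(states_a, states_b, n_states):
    """Find the relabeling of run B's states that agrees best with run A.

    Returns (mapping, agreement): mapping[k] is the run-A label assigned to
    run-B label k, and agreement is the fraction of time steps that match.
    """
    a = np.asarray(states_a)
    b = np.asarray(states_b)
    best_map, best_agree = None, -1.0
    for perm in permutations(range(n_states)):
        mapped = np.asarray(perm)[b]  # relabel every state of run B
        agree = float(np.mean(mapped == a))
        if agree > best_agree:
            best_map, best_agree = perm, agree
    return best_map, best_agree
```

If the agreement after relabeling is close to 1, the two runs found essentially the same segmentation with different names for the states; if it stays low, the runs converged to genuinely different solutions.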
GaussianHMM initializes the emission parameters via k-means, which can converge to different values depending on its random initialization. These initial parameters are passed to the EM algorithm, which is also prone to local maxima. Therefore different runs can produce different parameter estimates and, as a result, slightly different predictions.
Try to control the randomness by setting a seed through the random_state parameter when you define your model. You could also initialize startprob_ and transmat_ yourself and see how the model behaves.
That may give you a better explanation of the cause of this behavior.
