Return the total number of neighbors using PySAL's weights object

I construct a weights object:
import pysal as ps
neighbors = {0: [3, 1], 1: [0, 2, 2], 2: [1, 2], 3: [0, 1, 1]}
weights = {0: [1, 1], 1: [1, 1, 1], 2: [1, 1], 3: [1, 1, 1]}
w = ps.W(neighbors, weights)
The weights object in PySAL has a neighbors attribute:
w.neighbors
It returns a dict: {0: [3, 1], 1: [0, 2, 2], 2: [1, 2], 3: [0, 1, 1]}.
I've checked PySAL's API and found plenty of methods and attributes that return something about the number of neighbors, but nothing that returns the neighbor count for every observation.
For the above w, I want to get something like {0: 2, 1: 3, 2: 2, 3: 3} without looping over the dict like this:
n_nbrs = dict()
for key, value in w.neighbors.items():
    n_nbrs[key] = len(value)
Is there any easy way to achieve this?

You can use w.cardinalities. It will return exactly what you are looking for: {0: 2, 1: 3, 2: 2, 3: 3}.
PySAL is currently restructuring, so the weights module now lives in the libpysal package, and its documentation covers this attribute, unlike the documentation you are referring to.
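For completeness, a minimal sketch of the round trip using libpysal, the current home of the weights module (on older monolithic PySAL releases the same attribute is available on ps.W as well):
import libpysal

neighbors = {0: [3, 1], 1: [0, 2, 2], 2: [1, 2], 3: [0, 1, 1]}
weights = {0: [1, 1], 1: [1, 1, 1], 2: [1, 1], 3: [1, 1, 1]}
w = libpysal.weights.W(neighbors, weights)

# cardinalities maps each observation id to its number of neighbors.
print(w.cardinalities)  # {0: 2, 1: 3, 2: 2, 3: 3}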

Deleting an item from a list which is a value in a given dictionary

myDict = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3]}
Being new to the concept of dictionaries, I am unable to figure out how to delete an item from the list in a key-value pair. Say I am working with myDict[0]: how do I delete, for example, the values 1 and 2 from the list 0: [1, 2, 3, 4]? Thank you!
myDict = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3]}
myDict[0] = [3, 4]  # you can also assign a whole new list to a key

# You can treat myDict[0] as a normal list.
myDict[0].remove(3)  # same as list.remove(), because myDict[0] is just a list
print(myDict[0])     # [4]
print(myDict[0][0])  # prints the 0th value in the list stored at myDict[0], i.e. 4

# To delete the values 1 and 2 from the original list, remove the first
# element twice (each remove() shifts the remaining items left):
myDict = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3]}
myDict[0].remove(myDict[0][0])  # removes 1 -> [2, 3, 4]
myDict[0].remove(myDict[0][0])  # removes 2 -> [3, 4]
print(myDict)  # {0: [3, 4], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3]}
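A more direct alternative, as a minimal sketch: since the values are plain lists, deleting a slice removes a run of items by position in one step.
myDict = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3]}
del myDict[0][:2]  # drop the first two items (1 and 2) in one operation
print(myDict[0])   # [3, 4]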

For a given graph, construct a shortest-path tree - Python

Problem scale: I am working with the OSM road network of a city (6,000 nodes and 50,000 edges).
Input: the graph is read as a networkx DiGraph (weighted).
For a given node r, I want to construct the shortest-path tree. Is there a standard NetworkX function or library that can do this? If not, how can I do it efficiently (as opposed to running Dijkstra for every r-v pair)?
Input in any form is highly valued!
The single_source_shortest_path function returns the shortest paths from a given node to every reachable node.
Here's an example of its output:
For the following (very simple) graph:
G = nx.path_graph(5)
the single_source_shortest_path function:
path_1 = nx.single_source_shortest_path(G, 1)
returns:
{1: [1], 0: [1, 0], 2: [1, 2], 3: [1, 2, 3], 4: [1, 2, 3, 4]}
Where the shortest path from node 1 to any target T is returned by path_1[T].
There is also a NetworkX built-in solution that runs Dijkstra for every node pair (docs):
shortest_paths = dict(nx.all_pairs_dijkstra_path(G))
shortest_paths[1]
# returns {1: [1], 0: [1, 0], 2: [1, 2], 3: [1, 2, 3], 4: [1, 2, 3, 4]}
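Note that single_source_shortest_path ignores edge weights (it is a BFS). Since the question's graph is weighted, here is a minimal sketch of the weighted analogue, building the shortest-path tree from the Dijkstra paths (the small example graph is hypothetical, and the "weight" attribute name is NetworkX's default):
import networkx as nx

# A hypothetical small weighted digraph standing in for the OSM network.
G = nx.DiGraph()
G.add_weighted_edges_from([(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0), (2, 3, 1.0)])

r = 0  # root of the shortest-path tree
paths = nx.single_source_dijkstra_path(G, r, weight="weight")

# The union of the edges along each shortest path forms the
# shortest-path tree rooted at r.
tree = nx.DiGraph()
for path in paths.values():
    tree.add_edges_from(zip(path, path[1:]))
print(sorted(tree.edges()))  # [(0, 1), (1, 2), (2, 3)]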

Using numba to randomly sample possible combinations of categories

I am trying to speed up a function that randomly samples unique combinations of categories across a number of records (i.e. assuming there are 3 records, each of which can be either 0 or 1, I want 10 random samples of unique possible combinations of records).
If I did not use numba, I might do something like this:
import numpy as np

def myfunc(categories, NumberOfRecords, maxsamples):
    return np.unique(np.random.choice(np.arange(categories), size=(maxsamples*10, NumberOfRecords), replace=True), axis=0)[0:maxsamples]
Annoyingly, numba does not support axis in np.unique, so I can do something like this, but some of the records may turn out to be non-unique.
from numba import njit, int64
import numpy as np

@njit(int64[:, :](int64, int64, int64), cache=True)
def myfunc(categories, NumberOfRecords, maxsamples):
    return np.random.choice(np.arange(categories), size=(maxsamples, NumberOfRecords), replace=True)

myfunc(categories=2, NumberOfRecords=3, maxsamples=10)
E.g. in one call (obviously there's some randomness here), I got the below (for which the indices 1 and 6, and 3 and 4, and 7 and 9 are identical rows):
array([[0, 1, 1],
       [1, 1, 0],
       [0, 1, 0],
       [1, 0, 1],
       [1, 0, 1],
       [1, 1, 1],
       [1, 1, 0],
       [1, 0, 0],
       [0, 0, 0],
       [1, 0, 0]])
My questions are:
Is this something where I would even expect a speed up from numba?
If so, how can I get unique rows (this seems rather difficult with numba, but presumably there's a way)?
Perhaps there's a way to get at this more efficiently (perhaps without creating more random samples than I need in the end)?
In the following, I don't use numba, but all the operations use vectorized numpy functions.
Each row of the result that you generate can be interpreted as an integer expressed in base N, where N is the number of categories. With that interpretation, what you want is to sample without replacement from the integers [0, 1, ... N**R-1], where R is the number of "records". You can use the choice function for that, with the argument replace=False. Once you have that, you need to convert the chosen integers to base N. For that, I use the function int2base, which is a pared down version of a function that I wrote in a different answer.
Here's the code:
import numpy as np
def int2base(x, base, ndigits):
    # x = np.asarray(x)  # Uncomment this line for general-purpose use.
    powers = base ** np.arange(ndigits)
    digits = (x.reshape(x.shape + (1,)) // powers) % base
    return digits

def makesample(ncategories, nrecords, nsamples, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    n = ncategories ** nrecords
    choices = rng.choice(n, replace=False, size=nsamples)
    return int2base(choices, ncategories, nrecords)
In makesample, I included the optional argument rng. It allows you to specify the object that holds the choice function. If not provided, it uses np.random.default_rng().
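As a quick illustration of the decoding (a small check using int2base from the code above): 11 in base 2 with 4 digits is 1011, and int2base returns the digits least-significant first.
import numpy as np
print(int2base(np.array([11]), 2, 4))  # [[1 1 0 1]]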
Example:
In [118]: makesample(2, 3, 6)
Out[118]:
array([[0, 1, 1],
       [0, 0, 1],
       [1, 0, 1],
       [0, 0, 0],
       [1, 1, 0],
       [1, 1, 1]])

In [119]: makesample(5, 4, 12)
Out[119]:
array([[3, 4, 0, 1],
       [2, 0, 2, 0],
       [4, 2, 4, 3],
       [0, 1, 0, 4],
       [0, 2, 0, 1],
       [1, 2, 0, 1],
       [0, 3, 0, 4],
       [3, 3, 0, 3],
       [3, 4, 1, 4],
       [2, 4, 1, 1],
       [3, 4, 1, 0],
       [1, 1, 4, 4]])
makesample will raise an exception if you ask for too many samples:
In [120]: makesample(2, 3, 10)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-120-80044e78a60a> in <module>
----> 1 makesample(2, 3, 10)
~/code_snippets/python/numpy/random_samples_for_so_question.py in makesample(ncategories, nrecords, nsamples, rng)
17 rng = np.random.default_rng()
18 n = ncategories ** nrecords
---> 19 choices = rng.choice(n, replace=False, size=nsamples)
20 return int2base(choices, ncategories, nrecords)
_generator.pyx in numpy.random._generator.Generator.choice()
ValueError: Cannot take a larger sample than population when 'replace=False'
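A quick sanity check, as a minimal sketch (reusing makesample from above): because the underlying integers are drawn without replacement, the decoded rows are guaranteed to be distinct.
import numpy as np
rows = makesample(2, 3, 6)
# With replace=False every sampled integer is distinct, so every decoded row is too.
assert len(np.unique(rows, axis=0)) == len(rows)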

Delete all values from a dictionary that occur more than once

I use a dictionary that looks somewhat like this:
data = {1: [3, 5], 2: [1, 2], 3: [1, 2, 3, 4], 4: [1, 2, 3], 5: [1, 2, 3]}
I want to delete the key-value pairs whose values occur more than once in the dictionary. So my dictionary should look like this:
data = {1: [3, 5], 2: [1, 2], 3: [1, 2, 3, 4]}
I've tried to use this right here: Removing Duplicates From Dictionary
But although I tried changing it, it gets quite complicated really fast, and there is probably an easier way to do this. I've also tried using the count() function, but it did not work. Here is what it looks like; maybe I declared it the wrong way?
no_duplicates = [value for value in data.values() if data.count(value) == 1]
Is there an easy way the remove all key-value-pairs that are not unique with respect to their values?
You can do this with a dictionary comprehension, keeping only the key-value pairs whose value occurs exactly once:
def get_unique_dict(data):
    # Get the list of dictionary values
    values = list(data.values())
    # Make a new dictionary with key-value pairs where the value occurs exactly once
    return {key: value for key, value in data.items() if values.count(value) == 1}

data = {1: [3, 5], 2: [1, 2], 3: [1, 2, 3, 4], 4: [1, 2, 3], 5: [1, 2, 3]}
print(get_unique_dict(data))
The output will be
{
1: [3, 5],
2: [1, 2],
3: [1, 2, 3, 4]
}
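Note that values.count(value) runs a full scan for every entry, so the comprehension is quadratic overall. Here is a sketch of a linear-time variant, using collections.Counter and converting the list values to tuples so they are hashable:
from collections import Counter

def get_unique_dict_fast(data):
    # Count each value once; tuples are used because lists are not hashable.
    counts = Counter(tuple(v) for v in data.values())
    return {k: v for k, v in data.items() if counts[tuple(v)] == 1}

data = {1: [3, 5], 2: [1, 2], 3: [1, 2, 3, 4], 4: [1, 2, 3], 5: [1, 2, 3]}
print(get_unique_dict_fast(data))  # {1: [3, 5], 2: [1, 2], 3: [1, 2, 3, 4]}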

Address a flattened dictionary of objects containing x and y values (i.e. (x, y) coordinates) as a matrix

I have created a dictionary of an image with objects that hold the correct x and y coordinates. However, this dictionary is linear: for a 465x1000 image it goes up to key 464999 (starting at index 0).
So, currently, the entry at, say, key 197376 is reached by iterating:
for keys, values in Grids.items():
    print(keys)
    print(values)
And the output for that entry is
197376
x: 394
y: 376
The first value is the key, and x: [value], y: [value] is the string representation of the object.
How can I address this flattened dictionary of objects (which have these x and y coordinates) as a matrix?
Is there some way to convert it to a format that can be addressed like a normal list of lists,
Grids[394][376]
which gives the object at the specified coordinate?
Any other logic or variations that achieve this goal are also welcome.
You can reverse-engineer the index from your pixel coordinates. Simplified:
width = 10   # max in x
height = 5   # max in y

# Create a linear dict with demo data (a list of the coords);
# your value would be your data object.
dic = {a + b * width: [a, b] for a in range(width) for b in range(height)}

# Control output
print(dic)

# Calculate the linear index into the dict from a given tuple of coords.
def getValue(tup, d):
    idx = tup[0] + tup[1] * width
    return d[idx]

print(getValue((8, 2), dic))            # something in between
print(getValue((0, 0), dic))            # (0, 0)-based coords
print(getValue((10 - 1, 5 - 1), dic))   # highest coord for this example
Output:
{0: [0, 0], 1: [1, 0], 2: [2, 0], 3: [3, 0], 4: [4, 0], 5: [5, 0],
6: [6, 0], 7: [7, 0], 8: [8, 0], 9: [9, 0], 10: [0, 1], 11: [1, 1],
12: [2, 1], 13: [3, 1], 14: [4, 1], 15: [5, 1], 16: [6, 1], 17: [7, 1],
18: [8, 1], 19: [9, 1], 20: [0, 2], 21: [1, 2], 22: [2, 2], 23: [3, 2],
24: [4, 2], 25: [5, 2], 26: [6, 2], 27: [7, 2], 28: [8, 2], 29: [9, 2],
30: [0, 3], 31: [1, 3], 32: [2, 3], 33: [3, 3], 34: [4, 3], 35: [5, 3],
36: [6, 3], 37: [7, 3], 38: [8, 3], 39: [9, 3], 40: [0, 4], 41: [1, 4],
42: [2, 4], 43: [3, 4], 44: [4, 4], 45: [5, 4], 46: [6, 4], 47: [7, 4],
48: [8, 4], 49: [9, 4]}
[8, 2]
[0, 0]
[9, 4]
Using a pandas or numpy array in the first place would probably be smarter, though :o) I do not have much experience with those. Computing the linear index is cheap, so you can also encapsulate it away behind a small helper.
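For completeness, a minimal numpy sketch of that idea, assuming the per-pixel objects can live in an object-dtype array (here the demo [x, y] lists stand in for the real objects):
import numpy as np

width, height = 10, 5
dic = {a + b * width: [a, b] for a in range(width) for b in range(height)}

# Reshape the flat dict into a (height, width) object array, so that
# grid[y, x] addresses the entry at pixel coordinates (x, y).
grid = np.empty((height, width), dtype=object)
for idx, value in dic.items():
    grid[idx // width, idx % width] = value

print(grid[2, 8])  # [8, 2], same as getValue((8, 2), dic)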
