How to export "simplices" array from Delaunay triangulation? - python-3.x

I am using the "Delaunay triangulation" module in from "scipy.spatial."
I am able to generate an array (actually an ndarray, since I am using x, y and z coordinates) from the "simplices," but unable to export it into any format I can use for further processing.
The code is straightforward:
import numpy as np
from scipy.spatial import Delaunay

tri = Delaunay(points)                # points: an (n, 3) array of x, y, z coordinates
a = np.array(points[tri.simplices])   # shape (n_simplices, 4, 3): 4 vertices per tetrahedron
What I get looks like this:
[[7.02192702e+05, 7.53337067e+06, 1.43116411e+02],
[7.02275075e+05, 7.53339801e+06, 1.53508313e+02],
[7.02073353e+05, 7.53340902e+06, 1.40979450e+02],
[7.02288667e+05, 7.53338498e+06, 1.52185457e+02]],
...,
[[7.02038856e+05, 7.53333613e+06, 1.39584833e+02],
[7.02069568e+05, 7.53327029e+06, 1.46902739e+02],
[7.02062213e+05, 7.53331215e+06, 1.31241316e+02],
[7.02040635e+05, 7.53329922e+06, 1.30787203e+02]],...
By playing around with it I can export it into an extended string:
702299.971067+7533414.077516+163.2373+...
But I would prefer to have it in a .csv file with columns, or convert that extended string into a table or array with a set number of columns.
I assume I'm doing something wrong in saving or writing the output, but I can't find any obvious solutions for saving/exporting arrays anywhere online.
Any ideas or suggestions?

Once it's an np.ndarray, just use np.savetxt() to save the array to a .txt or .csv file (see: https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html).
This is the simplest method I know of.
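One caveat, as a minimal sketch (assuming a is the (n_simplices, 4, 3) array built in the question): np.savetxt() only accepts 1-D or 2-D arrays, so the 3-D simplices array needs to be flattened to one row per simplex first.
# Flatten each simplex (4 vertices x 3 coordinates) into one row of 12 values,
# then write the result as comma-separated text.
flat = a.reshape(a.shape[0], -1)
np.savetxt('simplices.csv', flat, delimiter=',')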

Related

Reading a set of HDF5 files and then slicing the resulting datasets without storing them in the end

I think some of my question is answered here:1
But the difference in my case is that I'm wondering whether it is possible to do the slicing step without having to re-write the datasets to another file first.
Here is the code that reads in a single HDF5 file that is given as an argument to the script:
import h5py

with h5py.File(args.H5file, 'r') as df:
    print('Here are the keys of the input file\n', df.keys())
    # Interesting point here: you need the [:] behind each of these, and we didn't
    # need it when creating datasets without the 'with' formalism above. Adding it
    # even handled the cases in 'hits' and 'truth_hadrons' where there are
    # additional dimensions...go figure.
    jetdset = df['jets'][:]
    haddset = df['truth_hadrons'][:]
    hitdset = df['hits'][:]
Then later I do some slicing operations on these datasets.
Ideally I'd be able to pass a wild-card into args.H5file and then the whole set of files, all with the same data formats, would end up in the three datasets above.
I do not want to store or make persistent these three datasets at the end of the script as the output are plots that use the information in the slices.
Any help would be appreciated!
There are at least two ways to access multiple files:
1. If all files follow a naming pattern, you can use the glob module. It uses wildcards to find files. (Note: I prefer glob.iglob; it is an iterator that yields values without creating a list, whereas glob.glob creates a list you frequently don't need.)
2. Alternatively, you could input a list of filenames and loop over the list.
Example of iglob:
import glob
import h5py

for fname in glob.iglob('img_data_0?.h5'):
    with h5py.File(fname, 'r') as h5f:
        print('Here are the keys of the input file\n', h5f.keys())
Example with a list of names:
filenames = ['img_data_01.h5', 'img_data_02.h5', 'img_data_03.h5']
for fname in filenames:
    with h5py.File(fname, 'r') as h5f:
        print('Here are the keys of the input file\n', h5f.keys())
Next, your code mentions using [:] when you access a dataset. Whether or not you need to add indices depends on the object you want returned.
If you include [()], it returns the entire dataset as a numpy array. Note [()] is now preferred over [:]. You can use any valid slice notation, e.g., [0,0,:] for a slice of a 3-axis array.
If you don't include [:], it returns an h5py dataset object, which behaves like a numpy array. (For example, you can get dtype and shape, and slice the data.) The advantage? It has a smaller memory footprint. I use h5py dataset objects unless I specifically need an array (for example, when passing image data to another package).
Examples of each method:
jets_dset = h5f['jets']       # without [()]: returns an h5py dataset object
jets_arr = h5f['jets'][()]    # with [()]: returns a numpy array object
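As a small illustration of the dataset-object behavior described above (using the names from the example):
print(jets_dset.shape, jets_dset.dtype)  # metadata comes from the file, not the data
first_row = jets_dset[0]                 # reads only this slice into memory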
Finally, if you want to create a single array that merges values from the 3 datasets, you have to create an array big enough to hold the data, then load it with slice notation. Alternatively, you can use np.concatenate(). (Be careful, though: concatenating a lot of data can be slow.)
A simple example is shown below. It assumes you know the shape of the datasets and that the shapes are the same for all 3 files (a0 and a1 are the axis lengths of one dataset). If you don't know them, you can get them from the .shape attribute.
Example for method 1 (pre-allocating array jets3x_arr):
import numpy as np

a0, a1 = 100, 100
jets3x_arr = np.empty(shape=(a0, a1, 3))  # add dtype= if not float
for cnt, fname in enumerate(glob.iglob('img_data_0?.h5')):
    with h5py.File(fname, 'r') as h5f:
        jets3x_arr[:, :, cnt] = h5f['jets']
Example for method 2 (using np.concatenate()):
a0, a1 = 100, 100
for cnt, fname in enumerate(glob.iglob('img_data_0?.h5')):
    with h5py.File(fname, 'r') as h5f:
        if cnt == 0:
            jets3x_arr = h5f['jets'][()].reshape(a0, a1, 1)
        else:
            jets3x_arr = np.concatenate(
                (jets3x_arr, h5f['jets'][()].reshape(a0, a1, 1)), axis=2)

Fast pixel processing using np.isin and np.where [issue]

I am new to Python and image processing. I am trying to find a list of given RGB values in an input image using np.isin and np.where (I wanted to avoid nested looping over all the pixels).
So, here are the input and output:
Input: https://imgur.com/eNylzA9
Output: https://imgur.com/lDctkj9
I am using the following code:
filterlist = [
    [244, 240, 255],
    [253, 239, 255],
    [255, 234, 249],
    [255, 230, 245],
    [255, 229, 243],
    [255, 228, 242]
]
# actual list has more than 100 elements

def imageTest(img, count=0):
    outImg = np.zeros(img.shape, dtype=np.uint8)
    posArray = (np.isin(img, filterlist)).all(axis=2)
    outImg[np.where(posArray)] = [255, 255, 255]
    outname = './fast/imageTest_' + str(count) + '.jpg'
    outputlist[outname] = outImg
    return
For some reason, I am not getting the output I expect. If I use a double nested loop to iterate over all the pixels, I get the desired result, but here it looks like np.isin is giving me a different output array.
Please help me identify the issue.
Here is the example idea which works perfectly:
image: https://imgur.com/zP3zuLj
Use the inRange function of OpenCV, as follows:
mask = cv2.inRange(image, lower_range, upper_range)
I understand you want to find particular pixels, so adjust the range accordingly.
In my experience, whatever task you are trying to accomplish, it is better to match a range of pixel values; that tends to be more robust. But that choice is up to you.
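A minimal sketch of that approach, assuming the bounds are the per-channel minima and maxima of the filter list above (note that OpenCV loads images in BGR order, so the RGB triples must be reversed; the filenames are hypothetical):
import cv2
import numpy as np

image = cv2.imread('input.jpg')
lower = np.array([242, 228, 244])        # min B, G, R over the filter list
upper = np.array([255, 240, 255])        # max B, G, R over the filter list

mask = cv2.inRange(image, lower, upper)  # 255 where the pixel is in range, else 0
outImg = np.zeros_like(image)
outImg[mask > 0] = [255, 255, 255]
cv2.imwrite('output.jpg', outImg)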

pandas reading data from a column in as float or int and not str despite dtype setting

I have an issue with pandas (0.23.4) on Python 3.7 where the data is being read in as scientific notation instead of just a string, despite the dtype setting. Here is an example of the data being read in:
-------------------
codes
-------------------
001234544
00023455
123456789
A1253532
780E9000
00678E10
The problem comes with lines 5 and 6 of the above because they contain, I think, 'E' characters and are being turned into scientific notation.
My reader is set up as follows:
accounts = pd.read_excel('gym_accounts.xlsx', sheet_name='Sheet1', dtype=str)
Despite that dtype=str setting, it appears that pandas uses something called a "sniffer" that detects the data type automatically, so the values are changed back to what I assume is float or int and then displayed in scientific notation. One suggestion in another thread is to use a converters argument within read_csv, like the following:
pd.read_csv('my.csv', converters = {i: str for i in range(0, 100)})
I am curious whether this is a possible solution to my problem, but I also have no idea how long that range should be, as it changes often. Is there any way to query the number of columns and feed that as a variable into that range call?
It looks like I can do something like len(accounts.index), but I can't do that until after the reader has read the file, so something like the following doesn't work:
accounts = pd.read_excel('gym_accounts.xlsx', sheet_name='Sheet1', converters={i: str for i in range(0, gym_length)})
gym_length = len(accounts.index)
The length check comes after the ... I guess you'd call it ... data read, so it obviously doesn't work.
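One possible two-pass sketch (an assumption, not something from this thread; nrows= in read_excel requires a newer pandas than the 0.23.4 mentioned above): read only the header row to count the columns, then build the converters dict from that count.
import pandas as pd

# First pass: read just the header row to learn the column count
# (nrows= in read_excel needs a newer pandas than 0.23.4).
header = pd.read_excel('gym_accounts.xlsx', sheet_name='Sheet1', nrows=0)
gym_length = len(header.columns)

# Second pass: read the full sheet, forcing every column to str.
accounts = pd.read_excel('gym_accounts.xlsx', sheet_name='Sheet1',
                         converters={i: str for i in range(gym_length)})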

Proper Syntax for List Comprehension Involving an Integer and a Float?

I have a List of Lists that looks like this (Python3):
myLOL = ["['1466279297', '703.0']", "['1466279287', '702.0']", "['1466279278', '702.0']", "['1466279268', '706.0']", "['1466279258', '713.0']"]
I'm trying to use a list comprehension to convert the first item of each inner list to an int and the second item to a float so that I end up with this:
newLOL = [[1466279297, 703.0], [1466279287, 702.0], [1466279278, 702.0], [1466279268, 706.0], [1466279258, 713.0]]
I'm learning list comprehensions; can somebody please help me with this syntax?
Thank you!
[edit - to explain why I asked this question]
This question is a means to an end - the syntax requested is needed for testing. I'm collecting sensor data on a ZigBee network, and I'm using an Arduino to format the sensor messages in JSON. These messages are published to an MQTT broker (Mosquitto) running on a Raspberry Pi. A Redis server (also running on the Pi) serves as an in-memory message store. I'm writing a service (python-MQTT client) to parse the JSON and send a LoL (a sample of the data you see in my question) to Redis. Finally, I have a dashboard running on Apache on the Pi. The dashboard utilizes Highcharts to plot the sensor data dynamically (via a web socket connection between the MQTT broker and the browser). Upon loading the page, I pull historical chart data from my Redis LoL to "very quickly" populate the charts on my dashboard (before any realtime data is added dynamically). I realize I can probably format the sensor data the way I want in the Redis store, but that is a problem I haven't worked out yet. Right now, I'm trying to get my historical data to plot correctly in Highcharts. With the data properly formatted, I can get this piece working.
Well, you could use ast.literal_eval:
from ast import literal_eval
myLOL = ["['1466279297', '703.0']", "['1466279287', '702.0']", "['1466279278', '702.0']", "['1466279268', '706.0']", "['1466279258', '713.0']"]
items = [[int(literal_eval(i)[0]), float(literal_eval(i)[1])] for i in myLOL]
Try:
import json
newLOL = [[int(a[0]), float(a[1])] for a in (json.loads(s.replace("'", '"')) for s in myLOL)]
Here I'm treating each element of the list as JSON, but since it uses ' instead of " for the strings, I have to replace those first (this only works because you said the values will only be numbers).
This may work? I wish I was more clever.
newLOL = []
for listObj in myLOL:
    listObj = listObj.replace('[', '').replace(']', '').replace("'", '').split(',')
    newListObj = [int(listObj[0]), float(listObj[1])]
    newLOL.append(newListObj)
This iterates through your current list, peels each string apart by replacing the unwanted characters and splitting on the comma, then builds a new list object with the values converted to the respective int and float, and appends the prepared newListObj to the newLOL list. Note that your documented input list actually contains strings that merely look like lists, rather than nested lists.
This is a very strange format and the best solution is likely to change the code which generates that.
That being said, you can use ast.literal_eval to safely evaluate the elements of the list as Python tokens:
>>> import ast
>>> lit = ast.literal_eval
>>> [[lit(str_val) for str_val in lit(str_list)] for str_list in myLOL]
[[1466279297, 703.0], [1466279287, 702.0], [1466279278, 702.0], [1466279268, 706.0], [1466279258, 713.0]]
We need to do it twice - once to turn the string into a list containing two strings, and then once per resulting string to convert it into a number.
Note that this will succeed even if the strings contain other valid tokens. If you want to validate the format too, you'd want to do something like:
>>> def process_str_list(str_list):
... l = ast.literal_eval(str_list)
... if not isinstance(l, list):
... raise TypeError("Expected list")
... str_int, str_float = l
... return [int(str_int), float(str_float)]
...
>>> [process_str_list(str_list) for str_list in myLOL]
[[1466279297, 703.0], [1466279287, 702.0], [1466279278, 702.0], [1466279268, 706.0], [1466279258, 713.0]]
Your input consists of a list of strings, where each string is the string representation of a list. The first task is to convert the strings back into lists:
import ast
lol2 = map(ast.literal_eval, myLOL)  # [['1466279297', '703.0'], ...]
Now, you can simply get int and float values from lol2:
newlol = [[int(a[0]), float(a[1])] for a in lol2]

How to use struct.pack for a list of strings

I want to write a list of strings to a binary file. Suppose I have a list of strings, mylist. Assume each item of the list has a '\t' at the end, except the last one, which has a '\n' at the end (to help me recover the data later). Example: ['test\t', 'test1\t', 'test2\t', 'testl\n']
For a numpy ndarray, I found the following script that worked (I got it from here: numpy to r converter):
import struct

binfile = open('myfile.bin', 'wb')
for i in range(mynpdata.shape[1]):
    binfile.write(struct.pack('%id' % mynpdata.shape[0], *mynpdata[:, i]))
binfile.close()
Does binfile.write automatically parse all the data if the variable has * in front of it (as in the *mynpdata[:,i] example above)? Would this work with a list of integers in the same way (e.g., *myIntList)?
How can I do the same with a list of strings?
I tried it on a single string using the following (which I found somewhere on the net):
oneString = 'test'
oneStringByte = bytes(oneString, 'utf-8')
struct.pack('I%ds' % (len(oneString),), len(oneString), oneStringByte)  # 's' needs bytes, not str
But I couldn't understand why the % in 'I%ds' above is filled from the tuple (len(oneString),) instead of a bare len(oneString) as in the ndarray example, and also why both len(oneString) and the string itself are passed.
Can someone help me with writing a list of strings (if necessary, assuming it is written to the same binary file where I wrote the ndarray)?
There's no need for struct. Simply join the strings and encode them using either a specified or an assumed text encoding in order to turn them into bytes.
''.join(L).encode('utf-8')
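A minimal sketch of the full round trip, assuming the tab/newline terminators from the question (the filename is hypothetical):
mylist = ['test\t', 'test1\t', 'test2\t', 'testl\n']

# Write: join the strings and encode them to bytes.
with open('myfile.bin', 'wb') as binfile:
    binfile.write(''.join(mylist).encode('utf-8'))

# Read back: decode, strip the trailing newline, and split on tabs.
with open('myfile.bin', 'rb') as binfile:
    recovered = binfile.read().decode('utf-8').rstrip('\n').split('\t')
# recovered == ['test', 'test1', 'test2', 'testl']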
