How to use struct.pack for a list of strings

I want to write a list of strings to a binary file. Suppose I have a list of strings, mylist. Assume each item of the list has a '\t' at the end, except the last one, which has a '\n' at the end (to help me recover the data later). Example: ['test\t', 'test1\t', 'test2\t', 'testl\n']
For a numpy ndarray, I found the following script that worked (got it from here numpy to r converter):
import struct

binfile = open('myfile.bin', 'wb')
for i in range(mynpdata.shape[1]):
    # write column i as shape[0] consecutive C doubles
    binfile.write(struct.pack('%id' % mynpdata.shape[0], *mynpdata[:, i]))
binfile.close()
Does binfile.write automatically parse all the data if the variable has a * in front of it (as in the *mynpdata[:,i] example above)? Would this work the same way with a list of integers (e.g. *myIntList)?
How can I do the same with a list of strings?
I tried it on a single string using the following (which I found somewhere on the net):
oneString = 'test'
oneStringByte = bytes(oneString, 'utf-8')
struct.pack('I%ds' % (len(oneStringByte),), len(oneStringByte), oneStringByte)
but I couldn't understand why the % within 'I%ds' above is replaced by (len(oneStringByte),) instead of len(oneStringByte) as in the ndarray example, and also why both len(oneStringByte) and oneStringByte are passed.
Can someone help me write a list of strings (if necessary, assuming it is written to the same binary file where I wrote out the ndarray)?

There's no need for struct. Simply join the strings and encode them using either a specified or an assumed text encoding in order to turn them into bytes.
''.join(L).encode('utf-8')
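If you do want struct (for example, to store the strings in the same binary file as the array without relying on '\t' and '\n' markers), a length-prefixed layout is one option. The sketch below is only an illustration under assumptions: pack_strings and unpack_strings are made-up helper names, and the layout (a little-endian count followed by length-prefixed UTF-8 blobs) is a choice of this example, not something struct requires. It also touches on the side questions: the * is ordinary Python argument unpacking in the call to struct.pack (binfile.write only ever sees the resulting bytes), so it works the same way for a list of integers with an integer format code such as 'i'. '%ds' % (n,) and '%ds' % n give the same format string for a single integer. The length is passed once to fill the 'I' field and the bytes are passed to fill the 's' field.

```python
import struct


def pack_strings(strings, encoding="utf-8"):
    """Serialize strings as a uint32 count followed by length-prefixed blobs."""
    chunks = [struct.pack("<I", len(strings))]
    for s in strings:
        data = s.encode(encoding)  # struct's 's' code needs bytes, not str
        # First argument fills 'I' (the byte length), second fills '<n>s'.
        chunks.append(struct.pack("<I%ds" % len(data), len(data), data))
    return b"".join(chunks)


def unpack_strings(buf, encoding="utf-8"):
    """Inverse of pack_strings: read the count, then each length and blob."""
    (count,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    strings = []
    for _ in range(count):
        (n,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        (data,) = struct.unpack_from("%ds" % n, buf, offset)
        offset += n
        strings.append(data.decode(encoding))
    return strings


mylist = ['test\t', 'test1\t', 'test2\t', 'testl\n']
blob = pack_strings(mylist)
restored = unpack_strings(blob)
```

The resulting blob could be written with binfile.write(blob) after the array data. Because every string carries its own length, the trailing '\t' and '\n' markers are no longer needed to recover the items.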

Related

Reading a set of HDF5 files and then slicing the resulting datasets without storing them in the end

I think some of my question is answered here:1
But the difference that I have is that I'm wondering if it is possible to do the slicing step without having to re-write the datasets to another file first.
Here is the code that reads in a single HDF5 file that is given as an argument to the script:
with h5py.File(args.H5file, 'r') as df:
    print('Here are the keys of the input file\n', df.keys())
    #interesting point here: you need the [:] behind each of these and we didn't need it when
    #creating datasets not using the 'with' formalism above. Adding that even handled the cases
    #in the 'hits' and 'truth_hadrons' where there are additional dimensions...go figure.
    jetdset = df['jets'][:]
    haddset = df['truth_hadrons'][:]
    hitdset = df['hits'][:]
Then later I do some slicing operations on these datasets.
Ideally I'd be able to pass a wild-card into args.H5file and then the whole set of files, all with the same data formats, would end up in the three datasets above.
I do not want to store or make persistent these three datasets at the end of the script as the output are plots that use the information in the slices.
Any help would be appreciated!
There are at least 2 ways to access multiple files:
If all files follow a naming pattern, you can use the glob module. It uses wildcards to find files. (Note: I prefer glob.iglob; it is an iterator that yields values without creating a list. glob.glob creates a list, which you frequently don't need.)
Alternatively, you could input a list of filenames and loop over the list.
Example of iglob:
import glob
import h5py

for fname in glob.iglob('img_data_0?.h5'):
    with h5py.File(fname, 'r') as h5f:
        print('Here are the keys of the input file\n', h5f.keys())
Example with a list of names:
filenames = ['img_data_01.h5', 'img_data_02.h5', 'img_data_03.h5']
for fname in filenames:
    with h5py.File(fname, 'r') as h5f:
        print('Here are the keys of the input file\n', h5f.keys())
Next, your code mentions using [:] when you access a dataset. Whether or not you need to add indices depends on the object you want returned.
If you include [()], it returns the entire dataset as a numpy array. Note [()] is now preferred over [:]. You can use any valid slice notation, e.g., [0,0,:] for a slice of a 3-axis array.
If you don't include [:], it returns an h5py dataset object, which behaves like a numpy array. (For example, you can get dtype and shape, and slice the data.) The advantage? It has a smaller memory footprint. I use h5py dataset objects unless I specifically need an array (for example, passing image data to another package).
Examples of each method:
jets_dset = h5f['jets'] # w/out [()] returns a h5py dataset object
jets_arr = h5f['jets'][()] # with [()] returns a numpy array object
Finally, if you want to create a single array that merges values from 3 datasets, you have to create an array big enough to hold the data, then load it with slice notation. Alternatively, you can use np.concatenate(). (However, be careful, as concatenating a lot of data can be slow.)
A simple example is shown below. It assumes you know the shape of the dataset and that it is the same for all 3 files (a0 and a1 are the axis lengths for one dataset). If you don't know them, you can get them from the .shape attribute.
Example for method 1 (pre-allocating array jets3x_arr):
import numpy as np

a0, a1 = 100, 100
jets3x_arr = np.empty(shape=(a0, a1, 3)) # add dtype= if not float
for cnt, fname in enumerate(glob.iglob('img_data_0?.h5')):
    with h5py.File(fname, 'r') as h5f:
        jets3x_arr[:, :, cnt] = h5f['jets']
Example for method 2 (using np.concatenate()):
a0, a1 = 100, 100
for cnt, fname in enumerate(glob.iglob('img_data_0?.h5')):
    with h5py.File(fname, 'r') as h5f:
        if cnt == 0:
            jets3x_arr = h5f['jets'][()].reshape(a0, a1, 1)
        else:
            jets3x_arr = np.concatenate(
                (jets3x_arr, h5f['jets'][()].reshape(a0, a1, 1)), axis=2)
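Putting the pieces together for the original question, something along these lines might work. This is a sketch under assumptions rather than a tested recipe for your files: load_datasets is a hypothetical helper name, the dataset names are the three from the question, every file is assumed to store them with matching dtype and trailing dimensions, and joining along the first axis is assumed to be what you want. Each dataset is collected per file and concatenated once at the end, which avoids the repeated copying of calling np.concatenate() inside the loop. Nothing is written back to disk.

```python
import glob

import h5py
import numpy as np


def load_datasets(pattern, names=("jets", "truth_hadrons", "hits")):
    """Read the named datasets from all files matching pattern into memory."""
    filenames = sorted(glob.glob(pattern))  # sorted for a reproducible order
    if not filenames:
        raise FileNotFoundError("no files match %r" % pattern)

    parts = {name: [] for name in names}
    for fname in filenames:
        with h5py.File(fname, "r") as h5f:
            for name in names:
                # [()] copies the data into a numpy array, so it stays
                # usable after the file is closed.
                parts[name].append(h5f[name][()])

    # One concatenation per dataset, along the first axis (an assumption).
    return {name: np.concatenate(chunks, axis=0)
            for name, chunks in parts.items()}


# Hypothetical usage, with the wildcard passed on the command line:
# data = load_datasets(args.H5file)
# jetdset, haddset, hitdset = data["jets"], data["truth_hadrons"], data["hits"]
```

When passing the wildcard on the command line, quote it (for example 'img_data_0?.h5') so the shell hands the pattern to the script instead of expanding it.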

How to split a String by bodySize in Groovy Script

Before anything else, I hope that this world situation is not affecting you too much and that you can be as long as possible at home and in good health.
You see, I'm very, very new to Groovy Script and I have a question: How can I separate a String based on its body size?
Assuming the String is 3,000 characters long, I get the body like this:
def body = message.getBody(java.lang.String) as String
and its size like this:
def bodySize = body.getBytes().length
I should be able to separate it into 500-character segments and save each segment in a different variable (which I will later set in a property).
I read some examples but I can't adjust them to what I need.
Thank you very much in advance.
Assuming it's ok to have a List of segment strings, you can simply do:
def segments = body.toList().collate(500)*.join()
This splits the body into a list of characters, collates these into 500 length groups, and then joins each group back to a String.
As a small example
def body = 'abcdefghijklmnopqrstuvwxyz'
def segments = body.toList().collate(5)*.join()
Then segments equals
['abcde', 'fghij', 'klmno', 'pqrst', 'uvwxy', 'z']

Stringify list back to list in Python 3

I have a list-like string that I want to convert to a list, but so far I've had no luck. The string is as follows:
my_string="[749385,435,'20/07/11 05:32','34035',1298,tmp_host_name,'312642',6577,tmp_guest_name,'-0.5,-1.0','2.5,3.0','9.5 ',tmp_league_name,'2' ,'0','0','0','4',' 2','0','1','0.0,-0.5','4.5','1.0',1]"
My problems are:
I can't use eval because some of the items in the list to be are not strings, so it gives me
eval(my_string)
>NameError: name 'tmp_host_name' is not defined
I can't use ast.literal_eval because again, it gives an error
ast.literal_eval(my_string)
>ValueError: malformed node or string: <_ast.Name object at 0x0000017E7DA9E488>
and I can't do it with strip and split because some of the items are like '2.5,3.0' and these get split as well, which I don't want
my_string.strip('][').split(',')
['749385','435',"'20/07/11 05:32'","'34035'",'1298','tmp_host_name',"'312642'",'6577','tmp_guest_name',"'-0.5","-1.0'","'2.5","3.0'","'9.5 '",'tmp_league_name',"'2' ","'0'","'0'","'0'","'4'","' 2'","'0'","'1'","'0.0","-0.5'","'4.5'","'1.0'",'1']
One possible route is to use my last approach and verify that every element has 2 ' characters, and if not, merge it with the following element, but I'm looking for something a little more pythonic.
newlist = list()
for el in k:  # k presumably holds the result of the strip/split call above
    if el.startswith("'") and el.endswith("'"):
        newlist.append(el)
    elif el.startswith("'"):
        compound = el
    elif el.endswith("'"):
        compound += el
        newlist.append(compound)
    else:
        newlist.append(el)
Problem is, if I do this, the resulting list loses its order and becomes useless
Thanks!
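One possible approach, offered as a sketch rather than a definitive answer: since eval only fails with a NameError, the string appears to be syntactically valid Python, so it can be parsed with ast.parse and each element handled individually. Bare names such as tmp_host_name are kept as strings holding the name, and everything else goes through ast.literal_eval. The function name parse_listish is made up for this example, and the sample string is a shortened version of the one in the question.

```python
import ast


def parse_listish(text):
    """Parse a list literal that may contain bare identifiers."""
    tree = ast.parse(text, mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list literal")

    result = []
    for node in tree.body.elts:
        if isinstance(node, ast.Name):
            # Bare identifier: keep its name as a plain string.
            result.append(node.id)
        else:
            # Numbers and quoted strings are evaluated safely.
            result.append(ast.literal_eval(node))
    return result


sample = "[749385,435,'20/07/11 05:32',tmp_host_name,'-0.5,-1.0',' 2',1]"
parsed = parse_listish(sample)
print(parsed)
# [749385, 435, '20/07/11 05:32', 'tmp_host_name', '-0.5,-1.0', ' 2', 1]
```

The order of the elements is preserved, and items like '-0.5,-1.0' stay in one piece because the parser, not a split on commas, decides where each element ends. Note that a bare name and a quoted string with the same text become indistinguishable in the result.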

How to export "simplices" array from Delaunay triangulation?

I am using the Delaunay triangulation class from scipy.spatial.
I am able to generate an array (actually an ndarray, since I am using x, y and z coordinates) from the "simplices," but unable to export it into any format I can use for further processing.
The code is straightforward:
tri = Delaunay(points)
a = np.array(points[tri.simplices])
What I get looks like this:
[[7.02192702e+05, 7.53337067e+06, 1.43116411e+02],
[7.02275075e+05, 7.53339801e+06, 1.53508313e+02],
[7.02073353e+05, 7.53340902e+06, 1.40979450e+02],
[7.02288667e+05, 7.53338498e+06, 1.52185457e+02]],
...,
[[7.02038856e+05, 7.53333613e+06, 1.39584833e+02],
[7.02069568e+05, 7.53327029e+06, 1.46902739e+02],
[7.02062213e+05, 7.53331215e+06, 1.31241316e+02],
[7.02040635e+05, 7.53329922e+06, 1.30787203e+02]],...
By playing around with it I can export it into an extended string:
702299.971067+7533414.077516+163.2373+...
But I would prefer to have it in a .csv file with columns, or convert that extended string into a table or array with a set number of columns.
I assume I'm doing something wrong in saving or writing the output, but can't find any obvious solutions to saving/exporting arrays online anywhere.
Any ideas? suggestions?
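Once it's in an np.ndarray, just use np.savetxt() to save the array to a text file (see: https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html). Note that np.savetxt() only accepts 1-D or 2-D arrays, so a 3-D array like the one above has to be reshaped first, for example with a.reshape(-1, 3).
This is the simplest method I know of.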
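As a rough illustration of what that could look like for a 3-D array, here is a sketch. The array a below is a small stand-in with the same layout as points[tri.simplices] (simplices x vertices x coordinates), and the column names, the extra simplex index column and the number format are all choices of this example. It writes to an in-memory buffer so it runs without touching disk; a filename such as 'simplices.csv' (hypothetical) can be passed instead.

```python
import io

import numpy as np

# Stand-in for points[tri.simplices]: 5 simplices, 4 vertices each, x/y/z.
a = np.arange(5 * 4 * 3, dtype=float).reshape(5, 4, 3)

# Flatten to one row per vertex and prepend the index of the simplex
# each vertex belongs to, so the grouping survives in the flat file.
simplex_id = np.repeat(np.arange(a.shape[0]), a.shape[1])
rows = np.column_stack([simplex_id, a.reshape(-1, a.shape[2])])

buf = io.StringIO()  # swap in a filename to write an actual .csv file
np.savetxt(buf, rows, delimiter=",", header="simplex,x,y,z", comments="",
           fmt=["%d", "%.6f", "%.6f", "%.6f"])

lines = buf.getvalue().splitlines()
```

The file can be read back with np.loadtxt(..., delimiter=",", skiprows=1), and the original 3-D shape can be restored by dropping the first column and reshaping to (-1, 4, 3).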

Proper Syntax for List Comprehension Involving an Integer and a Float?

I have a List of Lists that looks like this (Python3):
myLOL = ["['1466279297', '703.0']", "['1466279287', '702.0']", "['1466279278', '702.0']", "['1466279268', '706.0']", "['1466279258', '713.0']"]
I'm trying to use a list comprehension to convert the first item of each inner list to an int and the second item to a float so that I end up with this:
newLOL = [[1466279297, 703.0], [1466279287, 702.0], [1466279278, 702.0], [1466279268, 706.0], [1466279258, 713.0]]
I'm learning list comprehensions, can somebody please help me with this syntax?
Thank you!
[edit - to explain why I asked this question]
This question is a means to an end - the syntax requested is needed for testing. I'm collecting sensor data on a ZigBee network, and I'm using an Arduino to format the sensor messages in JSON. These messages are published to an MQTT broker (Mosquitto) running on a Raspberry Pi. A Redis server (also running on the Pi) serves as an in-memory message store. I'm writing a service (python-MQTT client) to parse the JSON and send a LoL (a sample of the data you see in my question) to Redis. Finally, I have a dashboard running on Apache on the Pi. The dashboard utilizes Highcharts to plot the sensor data dynamically (via a web socket connection between the MQTT broker and the browser). Upon loading the page, I pull historical chart data from my Redis LoL to "very quickly" populate the charts on my dashboard (before any realtime data is added dynamically). I realize I can probably format the sensor data the way I want in the Redis store, but that is a problem I haven't worked out yet. Right now, I'm trying to get my historical data to plot correctly in Highcharts. With the data properly formatted, I can get this piece working.
Well, you could use ast.literal_eval:
from ast import literal_eval
myLOL = ["['1466279297', '703.0']", "['1466279287', '702.0']", "['1466279278', '702.0']", "['1466279268', '706.0']", "['1466279258', '713.0']"]
items = [[int(literal_eval(i)[0]), float(literal_eval(i)[1])] for i in myLOL]
Try:
import json
newLOL = [[int(a[0]), float(a[1])] for a in (json.loads(s.replace("'", '"')) for s in myLOL)]
Here I'm considering each element of the list as a JSON, but since it's using ' instead of " for the strings, I have to replace it first (it only works because you said there will be only numbers).
This may work? I wish I was more clever.
newLOL = []
for listObj in myLOL:
    listObj = listObj.replace('[', '').replace(']', '').replace("'", '').split(',')
    newListObj = [int(listObj[0]), float(listObj[1])]
    newLOL.append(newListObj)
This iterates through your current list and peels each string apart into a list by replacing the unwanted characters and splitting on the comma. Then we take the resulting list and create a new list whose values are the respective int and float, and append that newListObj to the newLOL list. This assumes you want actual lists within your list; your input list actually contains strings that look like lists.
This is a very strange format and the best solution is likely to change the code which generates that.
That being said, you can use ast.literal_eval to safely evaluate the elements of the list as Python tokens:
>>> lit = ast.literal_eval
>>> [[lit(str_val) for str_val in lit(str_list)] for str_list in myLOL]
[[1466279297, 703.0], [1466279287, 702.0], [1466279278, 702.0], [1466279268, 706.0], [1466279258, 713.0]]
We need to do it twice - once to turn the string into a list containing two strings, and then once per resulting string to convert it into a number.
Note that this will succeed even if the strings contain other valid tokens. If you want to validate the format too, you'd want to do something like:
>>> def process_str_list(str_list):
...     l = ast.literal_eval(str_list)
...     if not isinstance(l, list):
...         raise TypeError("Expected list")
...     str_int, str_float = l
...     return [int(str_int), float(str_float)]
...
>>> [process_str_list(str_list) for str_list in myLOL]
[[1466279297, 703.0], [1466279287, 702.0], [1466279278, 702.0], [1466279268, 706.0], [1466279258, 713.0]]
Your input consists of a list of strings, where each string is the string representation of a list. The first task is to convert the strings back into lists:
import ast
lol2 = map(ast.literal_eval, myLOL) # yields ['1466279297', '703.0'], ...
Now, you can simply get int and float values from lol2:
newlol = [[int(a[0]), float(a[1])] for a in lol2]

Resources