Running the same function for different arguments in a loop in python - python-3.x

I have many large, same-sized 3D arrays of data, like density, temperature, pressure, entropy, … . I want to run the same function (like divergence()) on each of these arrays. The easy way is as follows:
div_density = divergence(density)
div_temperature = divergence(temperature)
div_pressure = divergence(pressure)
div_entropy = divergence(entropy)
Considering the fact that I have several arrays (about 100), I'd like to use a loop as follows:
var_list = ['density', 'temperature', 'pressure', 'entropy']
div = np.zeros((len(var_list)))
for counter, variable in enumerate(var_list):
    div[counter] = divergence(STV(variable))
I'm looking for a function like STV() which simply converts a string to the variable with that name. Is there a function like that in Python? If yes, what is that function (using such a function, the data should not be removed from the variable)?
These 3D arrays are large, and because of RAM limitations they cannot be saved in another list like:
main_data=[density, temperature, pressure, entropy]
So I cannot have a loop on main_data.

One workaround is to use exec as follows:
var_list = ['density', 'temperature', 'pressure', 'entropy']
div = np.zeros((len(var_list)))
for counter, variable in enumerate(var_list):
    s = "div[counter] = divergence(" + variable + ")"
    exec(s)
exec basically executes the string given as its argument in the Python interpreter.
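A safer alternative, named here as an aside rather than taken from the original answer: if the arrays live at module level, you can look a variable up by its name in globals(), which is essentially the STV() the question asks for. A minimal sketch, assuming density etc. are module-level numpy arrays:
var_list = ['density', 'temperature', 'pressure', 'entropy']
divs = {}
for variable in var_list:
    # globals() maps names to objects in the module namespace;
    # no data is copied, only a reference is looked up
    divs[variable] = divergence(globals()[variable])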

How about using a dictionary that links names to the variable contents?
Instead of using variable names like density = ..., use dict entries like data['density'] for the data:
data = {}
# load your variables like:
data['density'] = ...
divs = {}
for key, val in data.items():
    divs[key] = divergence(val)
Since the data you use is large and the operations you are trying to do are computationally expensive, I would have a look at some of the libraries that provide methods to handle such data structures. Some of them also use C/C++ bindings for the expensive calculations (as numpy does). Just to name some: numpy, pandas, xarray, iris (especially for earth data).
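For example, a minimal sketch with xarray (the grid dimensions and variable names here are placeholders, and density and temperature are assumed to be numpy arrays already in memory): storing each field as a variable of one Dataset lets you loop over all of them by name without exec:
import xarray as xr

# build a Dataset whose variables share the same 3D grid;
# this stores references, it does not copy the arrays
ds = xr.Dataset({
    'density': (('x', 'y', 'z'), density),
    'temperature': (('x', 'y', 'z'), temperature),
})

# apply the same function to every variable by name
divs = {name: divergence(da.values) for name, da in ds.data_vars.items()}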

Related

Reading a set of HDF5 files and then slicing the resulting datasets without storing them in the end

I think some of my question is answered here.
But the difference in my case is that I'm wondering if it is possible to do the slicing step without having to re-write the datasets to another file first.
Here is the code that reads in a single HDF5 file that is given as an argument to the script:
with h5py.File(args.H5file, 'r') as df:
    print('Here are the keys of the input file\n', df.keys())
    # interesting point here: you need the [:] behind each of these and we didn't need it when
    # creating datasets not using the 'with' formalism above. Adding that even handled the cases
    # in the 'hits' and 'truth_hadrons' where there are additional dimensions...go figure.
    jetdset = df['jets'][:]
    haddset = df['truth_hadrons'][:]
    hitdset = df['hits'][:]
Then later I do some slicing operations on these datasets.
Ideally I'd be able to pass a wild-card into args.H5file and then the whole set of files, all with the same data formats, would end up in the three datasets above.
I do not want to store or make persistent these three datasets at the end of the script as the output are plots that use the information in the slices.
Any help would be appreciated!
There are at least 2 ways to access multiple files:
If all files follow a naming pattern, you can use the glob module. It uses wildcards to find files. (Note: I prefer glob.iglob; it is an iterator that yields values without creating a list. glob.glob creates a list, which you frequently don't need.)
Alternatively, you can input a list of filenames and loop over the list.
Example of iglob:
import glob
import h5py

for fname in glob.iglob('img_data_0?.h5'):
    with h5py.File(fname, 'r') as h5f:
        print('Here are the keys of the input file\n', h5f.keys())
Example with a list of names:
filenames = ['img_data_01.h5', 'img_data_02.h5', 'img_data_03.h5']
for fname in filenames:
    with h5py.File(fname, 'r') as h5f:
        print('Here are the keys of the input file\n', h5f.keys())
Next, your code mentions using [:] when you access a dataset. Whether or not you need to add indices depends on the object you want returned.
If you include [()], it returns the entire dataset as a numpy array. Note [()] is now preferred over [:]. You can use any valid slice notation, e.g., [0,0,:] for a slice of a 3-axis array.
If you don't include [:], it returns an h5py dataset object, which behaves like a numpy array. (For example, you can get dtype and shape, and slice the data.) The advantage? It has a smaller memory footprint. I use h5py dataset objects unless I specifically need an array (for example, when passing image data to another package).
Examples of each method:
jets_dset = h5f['jets'] # w/out [()] returns a h5py dataset object
jets_arr = h5f['jets'][()] # with [()] returns a numpy array object
Finally, if you want to create a single array that merges values from 3 datasets, you have to create an array big enough to hold the data, then load with slice notation. Alternatively, you can use np.concatenate() (However, be careful, as concatenating a lot of data can be slow.)
A simple example is shown below. It assumes you know the shape of the datasets and that it is the same for all 3 files (a0 and a1 are the axis lengths for one dataset). If you don't know them, you can get them from the .shape attribute.
Example for method 1 (pre-allocating array jets3x_arr):
a0, a1 = 100, 100
jets3x_arr = np.empty(shape=(a0, a1, 3))  # add dtype= if not float
for cnt, fname in enumerate(glob.iglob('img_data_0?.h5')):
    with h5py.File(fname, 'r') as h5f:
        jets3x_arr[:, :, cnt] = h5f['jets']
Example for method 2 (using np.concatenate()):
a0, a1 = 100, 100
for cnt, fname in enumerate(glob.iglob('img_data_0?.h5')):
    with h5py.File(fname, 'r') as h5f:
        if cnt == 0:
            jets3x_arr = h5f['jets'][()].reshape(a0, a1, 1)
        else:
            jets3x_arr = np.concatenate(
                (jets3x_arr, h5f['jets'][()].reshape(a0, a1, 1)), axis=2)

Python equivalent of array of structs from MATLAB

I am familiar with the struct construct from MATLAB, specifically arrays of structs. I am trying to do the same with a dictionary in Python. Say I have initialized a dictionary:
samples = {"Name":"", "Group":"", "Timeseries":[],"GeneratedFeature":[]}
and I am provided with another dictionary called fileList whose keys are group names and whose values are tuples of file paths. Each file path will generate one sample in samples by populating the Timeseries item. Further processing will make GeneratedFeature. The Name part will be determined by the file path.
Since I don't know the contents of fileList a priori, in MATLAB, if samples were a struct and fileList just a cell array:
fileList = {{'Group A', {'filepath1', 'filepath2'}}; {'Group B', {'filepath1', 'filepath2'}}}
I would just set a counter k=1 and run a for loop (with a different index) and do something like:
k = 1;
for i = 1:numel(fileList)
    samples(k).Group = fileList{i}{1};
    for j = 1:numel(fileList{i}{2})
        samples(k).Name = makeNameFrom(fileList{i}{2}{j})
        .
        .
    end
    k = k + 1;
end
But I don't know how to do this in Python. I know I can keep the two-for-loop approach with:
for (group, samples) in fileList.items():
    for sample in samples:
But how do I tell Python that samples is allowed to be an array/list? Is there a more Pythonic approach than a for loop?
You could store your dictionary itself in a list and simply append new dictionaries in every iteration of the loop:
samplelist = []
samplelist.append(samples.copy())  # dictionary copy needed when duplicating
Accessing the elements in the list would then work as follows (For example the 'Name' field of the i-th sample):
samples_i_name = samplelist[i]["Name"]
A list of all names would be accessible by a simple list comprehension:
namelist = [s["Name"] for s in samplelist]
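Putting it together, a minimal sketch of the nested loop, assuming fileList maps group names to tuples of file paths, and where makeNameFrom and loadTimeseries are hypothetical helpers standing in for your own code:
samplelist = []
for group, filepaths in fileList.items():
    for filepath in filepaths:
        sample = {"Name": makeNameFrom(filepath),          # hypothetical helper
                  "Group": group,
                  "Timeseries": loadTimeseries(filepath),  # hypothetical helper
                  "GeneratedFeature": []}
        samplelist.append(sample)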

rpy2 access R named list items by name, low-level interface

How do I access elements of a named list by name?
I have 3 functions, all of which return a ListSexpVector of class htest. One of them has 5 elements, ['method', 'parameter', 'statistic', 'p.value', 'data.name']; the others have a different number and order. I am interested in extracting the p.value, statistic and parameter from this list. In R I can use $, like so:
p.value <- fit$p.value
statistic <- fit$statistic
param <- fit$parameter
The best equivalent I found in rpy2 goes like:
p_val = fit[list(fit.do_slot('names')).index('p.value')]
stat = fit[list(fit.do_slot('names')).index('statistic')]
param = fit[list(fit.do_slot('names')).index('parameter')]
Which is quite long-winded. Is there a better (shorter, sweeter, Pythonic) way?
There is the good-old-fashioned integer based indexing:
p_val = fit[3]
stat = fit[2]
param = fit[1]
But it doesn't work when the positions change, which is a serious limitation because I am fitting 3 different functions and each returns elements in a different order.
The high-level interface is meant to provide a friendlier interface, as the low-level interface is quite close to R's C API. With it one can do:
p_val = fit.rx2('p.value')
or
p_val = fit[fit.names.index('p.value')]
If working with the low-level interface, you will essentially have to implement your own convenience wrapper to reproduce these functionalities. For example:
def dollar(obj, name):
    """R's "$"."""
    return obj[list(obj.do_slot('names')).index(name)]
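Usage would then mirror R's $ accessor:
p_val = dollar(fit, 'p.value')
stat = dollar(fit, 'statistic')
param = dollar(fit, 'parameter')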

Abaqus Python script -- Reading 'TENSOR_3D_FULL' data from *.odb file

What I want: strain values LE11, LE22, LE12 at nodal points
My script is:
#!/usr/local/bin/python
# coding: latin-1
# making the ODB commands available to the script
from odbAccess import *
import sys
import csv

odbPath = "my *.odb path"
odb = openOdb(path=odbPath)
assembly = odb.rootAssembly
# count the number of frames
NumofFrames = 0
for v in odb.steps["Step-1"].frames:
    NumofFrames = NumofFrames + 1
# create a variable that refers to the reference (undeformed) frame
refFrame = odb.steps["Step-1"].frames[0]
# create a variable that refers to the node set 'Region Of Interest (ROI)'
ROINodeSet = odb.rootAssembly.nodeSets["ROI"]
# create a variable that refers to the reference coordinate 'REFCOORD'
refCoordinates = refFrame.fieldOutputs["COORD"]
# create a variable that refers to the coordinates of the node
# set in the test frame of the step
ROIrefCoords = refCoordinates.getSubset(region=ROINodeSet, position=NODAL)
# count the number of nodes
NumofNodes = 0
for v in ROIrefCoords.values:
    NumofNodes = NumofNodes + 1
# looping over all the frames in the step
for i1 in range(NumofFrames):
    # create a variable that refers to the current frame
    currFrame = odb.steps["Step-1"].frames[i1+1]
    # create a variable that refers to the strain 'LE'
    Str = currFrame.fieldOutputs["LE"]
    ROIStr = Str.getSubset(region=ROINodeSet, position=NODAL)
    # initialize list
    list = [[]]
    # loop over all the nodes in each frame
    for i2 in range(NumofNodes):
        strain = ROIStr.values[i2]
        list.insert(i2, [str(strain.dataDouble[0]) + ";" + str(strain.dataDouble[1]) +
                         ";" + str(strain.dataDouble[3])])
# write the list in a new *.csv file (code not included for brevity)
odb.close()
The error I get is:
strain = ROIStr.values[i2]
IndexError: Sequence index out of range
Additional info:
Details for ROIStr:
ROIStr.name
'LE'
ROIStr.type
TENSOR_3D_FULL
ROIStr.description
'Logarithmic strain components'
ROIStr.componentLabels
('LE11', 'LE22', 'LE33', 'LE12', 'LE13', 'LE23')
ROIStr.__getattribute__
'getattribute of openOdb(r'path to .odb').steps['Step-1'].frames[1].fieldOutputs['LE'].getSubset(position=INTEGRATION_POINT, region=openOdb(r'path to .odb').rootAssembly.nodeSets['ROI'])'
When I use the same code for VECTOR objects, like 'U' for nodal displacement or 'COORD' for nodal coordinates, everything works without a problem.
The error happens in the first loop iteration, so it is not the case that it cycles through several loops before the error occurs.
Question: Does anyone know what is causing the error in the above code?
Here is the reason you get an IndexError: strains are (obviously) calculated at the integration points. According to the ABQ Scripting Reference Guide:
A SymbolicConstant specifying the position of the output in the element. Possible values are:
NODAL, specifying the values calculated at the nodes.
INTEGRATION_POINT, specifying the values calculated at the integration points.
ELEMENT_NODAL, specifying the values obtained by extrapolating results calculated at the integration points.
CENTROID, specifying the value at the centroid obtained by extrapolating results calculated at the integration points.
In order to use your code, therefore, you should get the results using position=ELEMENT_NODAL:
ROIrefCoords = refCoordinates.getSubset(region=ROINodeSet, position=ELEMENT_NODAL)
With
ROIStr.values[0].data
you will then get an array containing the 6 independent components of your tensor.
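For example, a minimal sketch for pulling out LE11, LE22 and LE12 from each value, assuming ROIStr was obtained with position=ELEMENT_NODAL as above:
labels = ROIStr.componentLabels  # ('LE11', 'LE22', 'LE33', 'LE12', 'LE13', 'LE23')
i11, i22, i12 = labels.index('LE11'), labels.index('LE22'), labels.index('LE12')
for v in ROIStr.values:
    le11, le22, le12 = v.data[i11], v.data[i22], v.data[i12]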
Alternative Solution
For reading time series of results for a node set, you can use the function xyPlot.xyDataListFromField(). I noticed that this function is much faster than using odbread. The code is also shorter; the only drawback is that you have to get an abaqus license to use it (in contrast to odbread, which works with abaqus python, which only needs an installed version of abaqus and does not need a network license).
For your application, you should do something like:
from abaqus import *
from abaqusConstants import *
from abaqusExceptions import *
import visualization
import xyPlot
import displayGroupOdbToolset as dgo
results = session.openOdb(your_file + '.odb')
# without this, you won't be able to extract the results
session.viewports['Viewport: 1'].setValues(displayedObject=results)
xyList = xyPlot.xyDataListFromField(
    odb=results, outputPosition=NODAL,
    variable=(('LE', INTEGRATION_POINT,
               ((COMPONENT, 'LE11'), (COMPONENT, 'LE22'),
                (COMPONENT, 'LE33'), (COMPONENT, 'LE12'))),),
    nodeSets=('ROI',))
(Of course you have to add LE13 etc.)
You will get a list of xyData
type(xyList[0])
<type 'xyData'>
Containing the desired data for each node and each output. Its size will therefore be
len(xyList)
number_of_nodes*number_of_requested_outputs
Where the first number_of_nodes elements of the list are the LE11 at each node, then LE22, and so on.
You can then transform this into a NumPy array:
LE11_1 = np.array(xyList[0])
would be LE11 at the first node, with dimensions:
LE11_1.shape
(NumberTimeFrames, 2)
That is, for each time frame you have the time and the output variable.
NumPy arrays are also very easy to write on text files (check out numpy.savetxt).
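For instance, a one-line sketch (the file name is a placeholder):
np.savetxt('LE11_node1.csv', LE11_1, delimiter=',', header='time,LE11')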

How can I convert a string into a function in Python?

I have to deal with csv image data from a camera which exports the data with a header. In that header is a simple function for converting CCD counts into power density. This equation includes both the dark offset level as well as a calibration factor. Here is an example from one line of an image file:
Power Density,=,(n - 232) * 4.182e-005 W/cm^2
Notice the commas. The csv header can be expected to have the same structure each time, with different constants for the dark level (232) and the power density conversion (4.182e-005).
What I would like to be able to do is grab the last cell, strip off the units at the end (W/cm^2), and use what is left to define a function in Python. Something like
f = lambda n: '(n - 232) * 4.182e-005'
Is it possible to do so? If so, how?
eval and exec, which both use compile, are two ways to dynamically convert code given as text into a compiled function. If you dynamically create a new function, you only need to do the conversion once.
row = "Power Density,=,(n - 232) * 4.182e-005 W/cm^2".split(',')
expr = row[2].replace(' W/cm^2', '')
# f = eval("lambda n: " + expr)  # based on your original idea
exec("def f(n): return " + expr)  # more flexible
print(f(0))
# -0.00970224
The lambda eval and the def exec have the same result, other than f.__name__, but as usual, the def form is more flexible, even if the flexibility is not needed here.
The usual caveats about executing untrusted code apply. If you are working with photo files that are not your own and are worried about an adversary feeding you a poisoned file, then you might indeed want to tokenize expr and check that it only has the tokens expected.
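A minimal sketch of such a check with the standard tokenize module (the whitelist is an assumption; adjust it to the expressions you expect):
import io
import tokenize

# token types we expect in an expression like "(n - 232) * 4.182e-005"
ALLOWED = {tokenize.NUMBER, tokenize.OP, tokenize.NAME,
           tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}

def looks_safe(expr):
    # reject anything that is not a number, an operator, or the variable n
    for tok in tokenize.generate_tokens(io.StringIO(expr).readline):
        if tok.type not in ALLOWED:
            return False
        if tok.type == tokenize.NAME and tok.string != 'n':
            return False
    return True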
I found a way to do it using eval, but I suspect it isn't very Pythonic, so I would still be interested in seeing other answers.
Here row is the row of interest from a csv.reader object, i.e. the same string I posted in the question, divided at the commas.
# Strip the units from the string
strng = row[2].replace(' W/cm^2', '')

# Define a function based on the string
def f(n):
    return eval(strng)

# Evaluate a value
print(f(0))
# Returns: -0.00970224
