I have a structure in Octave (v5.1.0), which looks like this:
>> cal_data
cal_data =
scalar structure containing the fields:
OG_0100 =
0.045260 -62.422000 0.044310 -60.768600
0.045000 -61.576600 0.044620 -61.303400
OG_0101 =
0.044950 -61.316900 0.044110 -59.609500
0.045150 -62.235500 0.044380 -61.260800
OG_0102 =
0.045160 -61.609900 0.044550 -61.759800
0.044950 -61.725800 0.044480 -61.062300
etc., with fields named incrementally up to OG_0280. Each field has the same shape: a 2x4 array of doubles.
I would like to create a histogram for each of the 8 values across all the fields in the struct and am getting stuck.
I have tried the following, to no avail:
>> hist([cal_data])
error: hist: Y must be real-valued
error: called from
hist at line 90 column 5
Because each of your struct elements is the same size, it would be more efficient and flexible to store them as a single 3D array. This code converts your current structure to such an array:
cal_data.OG_0100 = [
0.045260 -62.422000 0.044310 -60.768600
0.045000 -61.576600 0.044620 -61.303400];
cal_data.OG_0101 = [
0.044950 -61.316900 0.044110 -59.609500
0.045150 -62.235500 0.044380 -61.260800];
cal_data.OG_0102 = [
0.045160 -61.609900 0.044550 -61.759800
0.044950 -61.725800 0.044480 -61.062300];
data = struct2cell(cal_data);
data = reshape(data,1,1,[]);
data = cell2mat(data);
data(:,:,1) is the same as cal_data.OG_0100. Etc.
You can then make a histogram for each of the values by first reshaping to a 2D matrix, where each column is a value, and each row is an observation (this requires a transpose). However, because of the very different ranges of each value, the single histogram might not be ideal. It might be better to simply draw 8 separate histograms.
data = reshape(data,[],size(data,3)).';
hist(data)
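For comparison, the same restructuring can be sketched in Python/NumPy (a hypothetical mirror, not part of the original Octave workflow; note that NumPy flattens row-major, so the column order differs from Octave's column-major reshape):

```python
import numpy as np

# two of the question's 2x4 calibration fields, copied from above
fields = {
    "OG_0100": np.array([[0.045260, -62.4220, 0.044310, -60.7686],
                         [0.045000, -61.5766, 0.044620, -61.3034]]),
    "OG_0101": np.array([[0.044950, -61.3169, 0.044110, -59.6095],
                         [0.045150, -62.2355, 0.044380, -61.2608]]),
}

# stack into a 3D array of shape (2, 4, n_fields), like data(:,:,k) in Octave
data = np.dstack(list(fields.values()))

# one row per field, one column per value; each column can be histogrammed
per_value = data.reshape(-1, data.shape[2]).T
print(per_value.shape)  # (2, 8)
```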
Related
I have many large, same-sized 3D arrays of data, like density, temperature, pressure, entropy, ... . I want to run the same function (like divergence()) on each of these arrays. The easy way is as follows:
div_density = divergence(density)
div_temperature = divergence(temperature)
div_pressure = divergence(pressure)
div_entropy = divergence(entropy)
Considering the fact that I have several arrays (about 100), I'd like to use a loop as follows:
var_list = ['density', 'temperature', 'pressure', 'entropy']
div = np.zeros((len(var_list)))
for counter, variable in enumerate(var_list):
div[counter] = divergence(STV(variable))
I'm looking for a function like STV() which simply turns the "string" into the "variable name". Is there such a function in Python? If yes, what is it (when using it, the data should not be removed from the variable)?
These 3D arrays are large and, because of the RAM limitation, cannot be saved in another list like:
main_data=[density, temperature, pressure, entropy]
So I cannot have a loop on main_data.
One workaround is to use exec as follows
var_list = ['density', 'temperature', 'pressure', 'entropy']
div = np.zeros((len(var_list)))
for counter, variable in enumerate(var_list):
s = "div[counter] = divergence("+variable+")"
exec(s)
exec executes the string given as its argument as Python code in the interpreter.
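If the arrays live at module scope, globals() provides exactly that string-to-object lookup without exec, and it returns a reference rather than a copy, so no extra RAM is used. A minimal sketch, with a dummy divergence() standing in for the real one:

```python
import numpy as np

def divergence(arr):
    # stand-in for the real divergence(); only here to make the sketch runnable
    return float(np.sum(np.gradient(arr)))

density = np.random.rand(4, 4)
temperature = np.random.rand(4, 4)

var_list = ['density', 'temperature']
div = np.zeros(len(var_list))
for counter, variable in enumerate(var_list):
    # globals()[name] returns the existing module-level object; nothing is copied
    div[counter] = divergence(globals()[variable])
```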
How about using a dictionary? It links names to the variable contents.
Instead of using variable names like density = ..., use dict entries data['density'] for the data:
data = {}
# load your variables like:
data['density'] = ...
divs = {}
for key, val in data.items():
divs[key] = divergence(val)
Since the data you use is large and the operations you are trying to do are computationally expensive, I would have a look at some of the libraries that provide methods to handle such data structures. Some of them also use C/C++ bindings for the expensive calculations (as NumPy does). Just to name some: numpy, pandas, xarray, iris (especially for earth data).
I am trying to make a heatmap.
My data comes out of a pipeline that classifies some rows as noisy; I decided to make one plot including them and one plot without them.
The problem I have: in the plot without the noisy rows, blank lines appear (the same number of lines as rows removed).
Roughly, the code looks like this (I can expand any part if required; I am trying to keep it short).
If needed, I can provide a link to similar, publicly available data.
data_frame = load_df_fromh5(file) # load a data frame from the hdf5 output
noisy = [..] # a list which indicates which rows are noisy
# I believe the problem being here:
noisy = [i for (i, v) in enumerate(noisy) if v == 1] # build the list of indices to remove
# drop the corresponding index
df_cells_noisy = df_cells[~df_cells.index.isin(noisy)].dropna(how="any")
#I tried an alternative method:
not_noisy = [0 if e == 1 else 1 for e in noisy]
df = df[np.array(not_noisy, dtype=bool)]
# then I made a clustering using scipy
Z = hierarchy.linkage(df, method="average", metric="canberra", optimal_ordering=True)
df = df.reindex(hierarchy.leaves_list(Z))
# then I plot using the df variable
# (quite a long function; I believe the problem is upstream)
plot(df)
The plotting function is quite long, but I believe it works well, because the problem only shows up with the data frame that has the noisy rows removed.
I believe pandas somehow keeps information about the deleted rows and that they are plotted as blank lines. Any help is welcome.
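One thing worth checking (an assumption, since the full pipeline isn't shown): after dropping rows, the DataFrame keeps its original index labels, while hierarchy.leaves_list(Z) returns positions 0..n-1. df.reindex with labels that no longer exist produces all-NaN rows, which plot as blank lines. A minimal sketch of the effect, using a made-up 4x3 frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3))
noisy = [1]                        # hypothetical: row 1 was flagged noisy
df = df[~df.index.isin(noisy)]     # index is now 0, 2, 3 -- with a gap

# reindexing by positions (as leaves_list would return) hits the gap:
print(df.reindex([0, 1, 2]).isna().any(axis=1).tolist())  # [False, True, False]

# resetting the index after filtering makes positions and labels agree again
df = df.reset_index(drop=True)
print(df.reindex([0, 1, 2]).isna().any(axis=1).tolist())  # [False, False, False]
```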
Context:
These are single-cell data of copy number anomalies (abnormalities in the number of copies of a genomic segment).
Rows represent individuals (here, individual cells); columns represent, for each genomic interval, the number of copies (normally 2, except for the sex chromosomes).
I am trying to create a list, which will be fed as input to the neural network of a Deep Reinforcement Learning model.
What I would like to achieve:
This list should have the properties of this code's output
vec = []
lines = open("data/" + "GSPC" + ".csv", "r").read().splitlines()
for line in lines[1:]:
vec.append(float(line.split(",")[4]))
i.e. just a flat list of float values.
The original dataframe looks like:
Out[0]:
Close sma15
0 1.26420 1.263037
1 1.26465 1.263193
2 1.26430 1.263350
3 1.26450 1.263533
but by using df.transpose() I obtained the following:
0 1 2 3
Close 1.264200 1.264650 1.26430 1.26450
sma15 1.263037 1.263193 1.26335 1.263533
from here I would like to obtain a list grouped by column, of the type:
[1.264200, 1.263037, 1.264650, 1.263193, 1.26430, 1.26335, 1.26450, 1.263533]
I tried
x = np.array(df.values.tolist(), dtype = np.float32).reshape(1,-1)
but this gives me an array with 1 row and 6 columns; how could I achieve a result that has the properties I am looking for?
From what I can understand, you just want a flattened version of the DataFrame's values. That can be done simply with the ndarray.flatten() method rather than reshaping it.
import pandas as pd

# Creating your DataFrame object
a = [[1.26420, 1.263037],
[1.26465, 1.263193],
[1.26430, 1.263350],
[1.26450, 1.263533]]
df = pd.DataFrame(a, columns=['Close', 'sma15'])
df.values.flatten()
This gives array([1.2642, 1.263037, 1.26465, 1.263193, 1.2643, 1.26335, 1.2645, 1.263533]) as is (presumably) desired.
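Since the goal is a plain Python list like the CSV-reading snippet produces, .tolist() can be chained on; a sketch with a shortened version of the same hypothetical data:

```python
import pandas as pd

a = [[1.26420, 1.263037],
     [1.26465, 1.263193]]
df = pd.DataFrame(a, columns=['Close', 'sma15'])

# flatten() gives a 1D ndarray; tolist() converts it to a plain Python list
vec = df.values.flatten().tolist()
print(vec)  # [1.2642, 1.263037, 1.26465, 1.263193]
```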
PS: I am not sure why you have not included the last row of the DataFrame in the output of your transpose operation. Is that an error?
What I want: strain values LE11, LE22, LE12 at nodal points
My script is:
#!/usr/local/bin/python
# coding: latin-1
# making the ODB commands available to the script
from odbAccess import *
import sys
import csv
odbPath = "my *.odb path"
odb = openOdb(path=odbPath)
assembly = odb.rootAssembly
# count the number of frames
NumofFrames = 0
for v in odb.steps["Step-1"].frames:
NumofFrames = NumofFrames + 1
# create a variable that refers to the reference (undeformed) frame
refFrame = odb.steps["Step-1"].frames[0]
# create a variable that refers to the node set ‘Region Of Interest (ROI)’
ROINodeSet = odb.rootAssembly.nodeSets["ROI"]
# create a variable that refers to the reference coordinate ‘REFCOORD’
refCoordinates = refFrame.fieldOutputs["COORD"]
# create a variable that refers to the coordinates of the node
# set in the test frame of the step
ROIrefCoords = refCoordinates.getSubset(region=ROINodeSet, position=NODAL)
# count the number of nodes
NumofNodes =0
for v in ROIrefCoords.values:
NumofNodes = NumofNodes +1
# looping over all the frames in the step
for i1 in range(NumofFrames):
# create a variable that refers to the current frame
currFrame = odb.steps["Step-1"].frames[i1+1]
# create a variable that refers to the strain 'LE'
Str = currFrame.fieldOutputs["LE"]
ROIStr = Str.getSubset(region=ROINodeSet, position=NODAL)
# initialize list
list = [[]]
# loop over all the nodes in each frame
for i2 in range(NumofNodes):
strain = ROIStr.values [i2]
list.insert(i2, [str(strain.dataDouble[0]) + ";" + str(strain.dataDouble[1]) +
                 ";" + str(strain.dataDouble[3])])
# write the list in a new *.csv file (code not included for brevity)
odb.close()
The error I get is:
strain = ROIStr.values [i2]
IndexError: Sequence index out of range
Additional info:
Details for ROIStr:
ROIStr.name
'LE'
ROIStr.type
TENSOR_3D_FULL
ROIStr.description
'Logarithmic strain components'
ROIStr.componentLabels
('LE11', 'LE22', 'LE33', 'LE12', 'LE13', 'LE23')
ROIStr.getattribute
'getattribute of openOdb(r'path to .odb').steps['Step-1'].frames[1].fieldOutputs['LE'].getSubset(position=INTEGRATION_POINT, region=openOdb(r'path to.odb').rootAssembly.nodeSets['ROI'])'
When I use the same code for VECTOR objects, like 'U' for nodal displacement or 'COORD' for nodal coordinates, everything works without a problem.
The error happens in the first loop. So, it is not the case where it cycles several loops before the error happens.
Question: Does anyone know what is causing the error in the above code?
Here is the reason you get an IndexError. Strains are (obviously) calculated at the integration points; according to the ABQ Scripting Reference Guide:
A SymbolicConstant specifying the position of the output in the element. Possible values are:
NODAL, specifying the values calculated at the nodes.
INTEGRATION_POINT, specifying the values calculated at the integration points.
ELEMENT_NODAL, specifying the values obtained by extrapolating results calculated at the integration points.
CENTROID, specifying the value at the centroid obtained by extrapolating results calculated at the integration points.
In order to use your code, therefore, you should get the results using position=ELEMENT_NODAL:
ROIrefCoords = refCoordinates.getSubset(region=ROINodeSet, position=ELEMENT_NODAL)
With
ROIStr.values[0].data
You will then get an array containing the 6 independent components of your tensor.
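Given the componentLabels shown above, picking out the LE11, LE22, and LE12 values the question asks for is an index lookup into that 6-element array. An Abaqus-free sketch of the lookup (the data row is made up):

```python
# componentLabels as reported by ROIStr above
labels = ('LE11', 'LE22', 'LE33', 'LE12', 'LE13', 'LE23')
row = [0.01, 0.02, 0.03, 0.004, 0.0, 0.0]   # hypothetical ROIStr.values[i].data

wanted = ('LE11', 'LE22', 'LE12')
picked = {name: row[labels.index(name)] for name in wanted}
print(picked)  # {'LE11': 0.01, 'LE22': 0.02, 'LE12': 0.004}
```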
Alternative Solution
For reading time series of results for a node set, you can use the function xyPlot.xyDataListFromField(). I noticed that this function is much faster than using odbread. The code is also shorter; the only drawback is that you need an Abaqus license to use it (in contrast to odbread, which works with abaqus python and only needs an installed version of Abaqus, not a network license).
For your application, you should do something like:
from abaqus import *
from abaqusConstants import *
from abaqusExceptions import *
import visualization
import xyPlot
import displayGroupOdbToolset as dgo
results = session.openOdb(your_file + '.odb')
# without this, you won't be able to extract the results
session.viewports['Viewport: 1'].setValues(displayedObject=results)
xyList = xyPlot.xyDataListFromField(odb=results, outputPosition=NODAL, variable=((
'LE', INTEGRATION_POINT, ((COMPONENT, 'LE11'), (COMPONENT, 'LE22'), (
COMPONENT, 'LE33'), (COMPONENT, 'LE12'), )), ), nodeSets=(
'ROI', ))
(Of course you have to add LE13 etc.)
You will get a list of xyData objects:
type(xyList[0])
<type 'xyData'>
containing the desired data for each node and each output. Its size will therefore be
len(xyList)
number_of_nodes*number_of_requested_outputs
where the first number_of_nodes elements of the list are LE11 at each node, then LE22, and so on.
You can then transform this in a NumPy array:
LE11_1 = np.array(xyList[0])
would be LE11 at the first node, with dimensions:
LE11_1.shape
(NumberTimeFrames, 2)
That is, for each time step you have time and output variable.
NumPy arrays are also very easy to write on text files (check out numpy.savetxt).
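As a sketch of that last step, writing such a (time, value) array to a semicolon-separated text file is one numpy.savetxt call (the array below is made up; a StringIO stands in for a real file):

```python
import io
import numpy as np

# hypothetical (time, LE11) pairs, as np.array(xyList[0]) would produce
LE11_1 = np.array([[0.0, 0.000], [0.1, 0.012], [0.2, 0.025]])

buf = io.StringIO()                 # swap in a filename for real output
np.savetxt(buf, LE11_1, delimiter=';', header='time;LE11')
print(buf.getvalue().splitlines()[0])  # # time;LE11
```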
I'm trying to load an array at a specific time frame (for example, if it has 50 frames or time units, then get the array corresponding to the 2nd time frame) from netCDF (.nc) files. I'm currently using vtkNetCDFCFReader and getting the data array "vwnd" from the 1st time frame like this:
vtkSmartPointer<vtkNetCDFCFReader> reader = vtkSmartPointer<vtkNetCDFCFReader>::New();
reader->SetFileName(path.c_str());
reader->UpdateMetaData();
vtkSmartPointer<vtkStructuredGridGeometryFilter> geometryFilter = vtkSmartPointer<vtkStructuredGridGeometryFilter>::New();
geometryFilter->SetInputConnection(reader->GetOutputPort());
geometryFilter->Update();
vtkSmartPointer<vtkPolyData> ncPolydata = vtkSmartPointer<vtkPolyData>::New();
ncPolydata = geometryFilter->GetOutput();
vtkSmartPointer<vtkDataArray> dataArray = ncPolydata->GetCellData()->GetArray("vwnd");
The variable arrays are: lat, lon, time, vwnd (vwnd has dimensions (lat, lon)). I'm also interested in getting arrays for lat and lon. Any help would be appreciated.
Thanks in advance
As the dimensions of lat/lon differ from those of vwnd, you will need two vtkNetCDFCFReader instances to read in data with different dimensions. Just remember to set the dimension after creating each reader.
For example in C++:
vtkNetCDFCFReader* reader = vtkNetCDFCFReader::New();
reader->SetFileName(fileName.c_str());
reader->UpdateMetaData();
//here you specify the dimension of the reader
reader->SetDimension(dim);
reader->SetVariableArrayStatus("lat", 1);
reader->SetVariableArrayStatus("lon", 1);
reader->Update();
If you do this correctly, you can read in any of the arrays and store it in a vtkDataArray.
If you want to read the vwnd data at the second time step, just skip the first lat*lon values.
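The "skip the first lat*lon values" advice generalizes: with the values laid out time-major, time step t starts at flat offset t*lat*lon. A quick sanity check of that arithmetic in Python, with hypothetical grid dimensions:

```python
# hypothetical grid: 73 latitudes x 144 longitudes, 50 time steps
lat, lon, ntime = 73, 144, 50

def offset(t):
    # flat index of the first vwnd value belonging to time step t
    return t * lat * lon

print(offset(1))  # 10512 -- skip this many values to reach the 2nd frame
```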