R package rstatix - ANOVA: What is my error?

I wanted to perform a repeated-measures ANOVA on my dataset using the rstatix package.
This is the command I used:
anova_test(data = light3, dv = gene_copies, wid = ID, within = treatment)
And this is the error it gives me:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
My data consist of 4 groups (treatment, factor class). Each group has 3x3 values (gene_copies, numeric class): 3 values per timepoint across 3 timepoints (timepoint, factor class), and each value has an individual ID in a separate column. There are no NAs in the table, and every group+timepoint combination has 3 values, so the design is balanced.
I adapted the command from this script:
https://www.datanovia.com/en/lessons/repeated-measures-anova-in-r/
My dataset has the exact same structure.
Please help
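No answer was posted; for reference, here is a minimal sketch of the long-format layout anova_test() expects, with hypothetical data using the column names described above. A frequent cause of the "0 (non-NA) cases" error is the wid or within column being character rather than factor (and note that if each ID occurs in only one treatment group, treatment is a between-subject factor, not a within one):

library(rstatix)
# hypothetical data: 9 subjects, each measured under all 4 treatments
light3 <- data.frame(
  ID          = rep(paste0("S", 1:9), times = 4),
  treatment   = rep(c("A", "B", "C", "D"), each = 9),
  gene_copies = runif(36, min = 10, max = 100)
)
# anova_test() needs the id and within-subject columns as factors
light3 <- convert_as_factor(light3, vars = c("ID", "treatment"))
str(light3)  # check: gene_copies numeric, ID and treatment factors
anova_test(data = light3, dv = gene_copies, wid = ID, within = treatment)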

Related

Are there built-in primitives for interactions in Featuretools?

Are there built-in primitives performing absolute and relative differences between two numeric columns? Two date columns?
This can currently be done for numeric columns, but not datetimes.
With interaction terms, we typically recommend you manually define the specific features you want. For example, here is how to define the difference and absolute difference between two numeric features:
import featuretools as ft

# load a small sample of the demo retail entityset
es = ft.demo.load_retail(nrows=1000)

# identity features for the two numeric columns
total = ft.Feature(es["order_products"]["total"])
unit_price = ft.Feature(es["order_products"]["unit_price"])

# arithmetic on features defines new, stacked features
difference = unit_price - total
absolute_diff = abs(difference)

fm = ft.calculate_feature_matrix(features=[difference, absolute_diff], entityset=es)
fm.head()
This returns:
unit_price - total ABSOLUTE(unit_price - total)
order_product_id
0 -21.0375 21.0375
1 -27.9675 27.9675
2 -31.7625 31.7625
3 -27.9675 27.9675
4 -27.9675 27.9675
We could also pass those features to ft.dfs as seed features if we wanted other primitives to stack on top of them.
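A sketch of that, assuming the same legacy Featuretools API as the snippet above (the target_entity value and the example "percentile" primitive are illustrative choices, not from the original answer):

# stack built-in transform primitives on top of the seed features
fm_seed, feature_defs = ft.dfs(
    entityset=es,
    target_entity="order_products",
    seed_features=[difference, absolute_diff],
    trans_primitives=["percentile"],
)
fm_seed.head()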

Get feature names for dataframe.corr

I am using the breast cancer data set from sklearn and I need to find the correlations between features. I am able to find the correlated columns, but I am not able to present them in a "nice" way, so that they can be an input for DataFrame.drop.
Here is my code:
import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer_data = load_breast_cancer()
df = pd.DataFrame(cancer_data.data, columns=cancer_data.feature_names)
corr = df.corr()
# keep the upper triangle only, then filter to find correlations above 0.6
corr_triu = corr.where(~np.tril(np.ones(corr.shape)).astype(bool))
corr_triu = corr_triu.stack()
corr_result = corr_triu[corr_triu > 0.6]
print(corr_result)
df.drop(columns=[?])
If I understand correctly, you want the columns that correlate with some other column in the dataset, i.e., drop the columns that don't appear in corr_result. So you'll want to get the unique variables from each level of the index of corr_result. There may be repeats, so take care of that as well, for example with sets:
corr_result.index = corr_result.index.remove_unused_levels()
corr_vars = set()
corr_vars.update(corr_result.index.unique(level=0))
corr_vars.update(corr_result.index.unique(level=1))
all_vars = set(df.columns)
df.drop(columns=all_vars - corr_vars)
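Note that drop returns a new frame rather than modifying df in place; a minimal usage sketch (the variable name is hypothetical):

reduced = df.drop(columns=all_vars - corr_vars)
print(sorted(all_vars - corr_vars))  # the columns that correlate with nothing above 0.6
print(reduced.shape)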

How do I check if a data set is normal or not in Python?

So I'm creating a master program for machine learning from scratch in Python, and the first step I want to take is to check whether the data set is normal or not.
PS: the data set can have many features or just a single feature.
It has to be implemented in Python 3.
Also, normalizing the data can be done by the function below, right?
# Find the min and max values for each column
def dataset_minmax(dataset):
    minmax = list()
    for i in range(len(dataset[0])):
        col_values = [row[i] for row in dataset]
        value_min = min(col_values)
        value_max = max(col_values)
        minmax.append([value_min, value_max])
    return minmax

# Rescale dataset columns to the range 0-1
def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row)):
            row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])
Thanks in advance!
Your question seems discordant: if your features do not come from a normal distribution, you cannot "normalize" them in the sense of changing their distribution. If you mean to check whether they have a mean of 0 and an SD of 1, that is a different ball game.
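If the goal is to test each feature for normality, here is a minimal sketch using SciPy's Shapiro-Wilk test (one common choice among several; the row-of-lists layout matches the dataset functions above):

from scipy import stats

# Test each column for normality; True means normality is not rejected at alpha
def check_normality(dataset, alpha=0.05):
    results = []
    for i in range(len(dataset[0])):
        col_values = [row[i] for row in dataset]
        stat, p = stats.shapiro(col_values)
        results.append(p > alpha)
    return results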

Abaqus Python script -- Reading 'TENSOR_3D_FULL' data from *.odb file

What I want: strain values LE11, LE22, LE12 at nodal points
My script is:
#!/usr/local/bin/python
# coding: latin-1
# making the ODB commands available to the script
from odbAccess import *
import sys
import csv

odbPath = "my *.odb path"
odb = openOdb(path=odbPath)
assembly = odb.rootAssembly

# count the number of frames
NumofFrames = 0
for v in odb.steps["Step-1"].frames:
    NumofFrames = NumofFrames + 1

# create a variable that refers to the reference (undeformed) frame
refFrame = odb.steps["Step-1"].frames[0]

# create a variable that refers to the node set 'Region Of Interest (ROI)'
ROINodeSet = odb.rootAssembly.nodeSets["ROI"]

# create a variable that refers to the reference coordinate 'REFCOORD'
refCoordinates = refFrame.fieldOutputs["COORD"]

# create a variable that refers to the coordinates of the node
# set in the test frame of the step
ROIrefCoords = refCoordinates.getSubset(region=ROINodeSet, position=NODAL)

# count the number of nodes
NumofNodes = 0
for v in ROIrefCoords.values:
    NumofNodes = NumofNodes + 1

# looping over all the frames in the step
for i1 in range(NumofFrames):
    # create a variable that refers to the current frame
    currFrame = odb.steps["Step-1"].frames[i1 + 1]
    # create a variable that refers to the strain 'LE'
    Str = currFrame.fieldOutputs["LE"]
    ROIStr = Str.getSubset(region=ROINodeSet, position=NODAL)
    # initialize list
    strainList = []
    # loop over all the nodes in each frame
    for i2 in range(NumofNodes):
        strain = ROIStr.values[i2]
        strainList.insert(i2, [str(strain.dataDouble[0]) + ";" +
                               str(strain.dataDouble[1]) + ";" +
                               str(strain.dataDouble[3])])
    # write the list in a new *.csv file (code not included for brevity)

odb.close()
The error I get is:
strain = ROIStr.values[i2]
IndexError: Sequence index out of range
Additional info:
Details for ROIStr:
ROIStr.name
'LE'
ROIStr.type
TENSOR_3D_FULL
ROIStr.description
'Logarithmic strain components'
ROIStr.componentLabels
('LE11', 'LE22', 'LE33', 'LE12', 'LE13', 'LE23')
ROIStr.__getattribute__
'__getattribute__ of openOdb(r'path to .odb').steps['Step-1'].frames[1].fieldOutputs['LE'].getSubset(position=INTEGRATION_POINT, region=openOdb(r'path to .odb').rootAssembly.nodeSets['ROI'])'
When I use the same code for VECTOR objects, like 'U' for nodal displacement or 'COORD' for nodal coordinates, everything works without a problem.
The error happens in the first loop. So, it is not the case where it cycles several loops before the error happens.
Question: Does anyone know what is causing the error in the above code?
Here is the reason you get an IndexError. Strains are (obviously) calculated at the integration points; according to the ABQ Scripting Reference Guide:
A SymbolicConstant specifying the position of the output in the element. Possible values are:
NODAL, specifying the values calculated at the nodes.
INTEGRATION_POINT, specifying the values calculated at the integration points.
ELEMENT_NODAL, specifying the values obtained by extrapolating results calculated at the integration points.
CENTROID, specifying the value at the centroid obtained by extrapolating results calculated at the integration points.
In order to use your code, therefore, you should get the results using position=ELEMENT_NODAL:
ROIrefCoords = refCoordinates.getSubset(region=ROINodeSet, position=ELEMENT_NODAL)
With
ROIStr.values[0].data
you will then get an array containing the 6 independent components of your tensor.
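A sketch of pulling the three components the question asks for, following the componentLabels ordering shown above ('LE11', 'LE22', 'LE33', 'LE12', ...):

# after the ELEMENT_NODAL subset: indices 0, 1 and 3 are LE11, LE22 and LE12
for v in ROIStr.values:
    le11, le22, le12 = v.data[0], v.data[1], v.data[3]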
Alternative Solution
For reading time series of results for a node set, you can use the function xyPlot.xyDataListFromField(). I noticed that this function is much faster than using odbread. The code is also shorter; the only drawback is that you have to get an Abaqus license to use it (in contrast to odbread, which works with abaqus python, only needs an installed version of Abaqus, and does not take a network license).
For your application, you should do something like:
from abaqus import *
from abaqusConstants import *
from abaqusExceptions import *
import visualization
import xyPlot
import displayGroupOdbToolset as dgo

results = session.openOdb(your_file + '.odb')
# without this, you won't be able to extract the results
session.viewports['Viewport: 1'].setValues(displayedObject=results)
xyList = xyPlot.xyDataListFromField(
    odb=results,
    outputPosition=NODAL,
    variable=(('LE', INTEGRATION_POINT, (
        (COMPONENT, 'LE11'), (COMPONENT, 'LE22'),
        (COMPONENT, 'LE33'), (COMPONENT, 'LE12'))),),
    nodeSets=('ROI',))
(Of course you have to add LE13 etc.)
You will get a list of xyData:
type(xyList[0])
<type 'xyData'>
containing the desired data for each node and each output. Its size will therefore be
len(xyList)
number_of_nodes*number_of_requested_outputs
where the first number_of_nodes elements of the list are LE11 at each node, then LE22, and so on.
You can then transform this into a NumPy array:
import numpy as np
LE11_1 = np.array(xyList[0])
would be LE11 at the first node, with dimensions:
LE11_1.shape
(NumberTimeFrames, 2)
That is, for each time step you have time and the output variable.
NumPy arrays are also very easy to write to text files (check out numpy.savetxt).
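For example, a minimal sketch (the file name is hypothetical):

# two columns per node: time and the strain component
np.savetxt('LE11_node1.csv', LE11_1, delimiter=';', header='time;LE11')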

How to find correlation between any combination of arrays

I have 10 data sets and I want to check the correlation between all possible pairs. For example, if I had:
A B C D
I want to check the correlation between AB, AC, AD, BC, etc.
I've been using the Correl function in Excel, which is fine for small data sets, but if I had 1000 data sets instead of 10, how would I do this?
This solution assumes you have the datasets in your global environment and that they can be "scraped" based on some criterion; in my case, I opted for a ".string" handle. If not, you have to come up with your own way of putting the names into a string. Another way would be to put all datasets into a list and work with indices (a sketch of that follows below).
A.string <- runif(5)
B.string <- runif(5)
C.string <- runif(5)

# find variables based on a common string
pairs <- combn(ls(pattern = "\\.string"), 2)

# for each pair, fetch variable and use function cor()
apply(pairs, MARGIN = 2, FUN = function(x) {
  cor(get(x[1]), get(x[2]))
})

[1] 0.2586141 0.7106571 0.7119712
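The list-based alternative mentioned above might look like this (a sketch; the dataset names are hypothetical):

# put all datasets into a named list and correlate every pair by index
datasets <- list(A = runif(5), B = runif(5), C = runif(5))
idx <- combn(seq_along(datasets), 2)
data.frame(
  pair = apply(idx, 2, function(i) paste(names(datasets)[i], collapse = "-")),
  cor  = apply(idx, 2, function(i) cor(datasets[[i[1]]], datasets[[i[2]]]))
)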
