Forest Plot for meta-analysis (including weight total) - forestplot

My problem is in RStudio. I have a data set with three columns: study name, effect size (derived from odds ratios) and sample size. Once I have run my meta-regression model, I cannot find code that lets me generate a forest plot similar to the one included in the link. I am quite new to R and would consider myself a beginner, so apologies for a potentially silly question.
I am using a MacBook Pro and RStudio version 2022.02.0+443.

Related

Poisson distribution transformation

I'm quite new to biostatistics so I apologize if my question is too dumb.
I'm studying data transformation in biostatistics to fit my data to the normal distribution.
I started with the Poisson distribution, which is quite common in biostatistics (daily admissions, prevalence of rare diseases, etc.). The square-root transformation is usually recommended to bring such data closer to a normal distribution.
I used Stata and this free dataset ( https://www.kaggle.com/datasets/martj42/international-football-results-from-1872-to-2017?resource=download ) with the results of a large number of football matches.
I created a new variable for this dataset: the total number of goals scored by both teams in each match. I treat it as the independent variable, and it is distributed as follows:
The distribution approximates a Poisson distribution quite well, as confirmed by the values of the mean and standard deviation.
Then I created a new variable holding the square root of this variable. Its distribution is the following (the blue line shows a normal distribution with the same mean and standard deviation):
As you can see, the transformed data are quite far from a normal distribution, as shown by normality tests and also easily visible in the Q-Q plot:
So my question is: why didn't the square root work? What can I do to transform my dataset to fit the normal distribution?
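For anyone who wants to reproduce the effect without the football data, here is a minimal sketch in Python with NumPy (my assumption; the question itself uses Stata). It uses simulated Poisson counts, and the two means (3 and 50) are arbitrary illustration values. The square root is a variance-stabilizing transformation: for large means the variance of the transformed counts approaches 1/4. It does not remove discreteness, so with a small mean the transformed variable still takes only a handful of distinct values, which normality tests on large samples tend to pick up.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def sqrt_summary(mean, n=100_000):
    """Simulate Poisson counts and summarise them before/after sqrt."""
    counts = rng.poisson(mean, size=n)
    root = np.sqrt(counts)
    return {
        "raw_var": counts.var(),              # close to `mean` for Poisson data
        "sqrt_var": root.var(),               # approaches 1/4 as `mean` grows
        "distinct": np.unique(root).size,     # number of distinct transformed values
    }

small = sqrt_summary(3.0)    # low mean, comparable to goals per match (assumed)
large = sqrt_summary(50.0)   # high mean, where the approximation works better

print("mean 3: ", small)
print("mean 50:", large)
```

With a low mean the histogram of the square roots is a few spikes rather than a bell curve, whatever monotone transformation is applied.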

How to understand the process of transforming research data into certain distribution (not statistical distribution)?

Statistics is not my major and English is not my native language. I am trying to apply for data analysis or data science work in industry. However, I do not know how to describe my research process below in a concise and professional way. I would highly appreciate it if you could help me with that.
Background: I simulate the properties of materials using different research packages, such as LAMMPS. The simulated data are only the coordinates of atoms. My data analysis is described below.
step 1: Clean the data to make sure the data are complete and each atom ID is unique and is not exchanged between different time moments (timesteps).
step 2: Calculate the distances from each center atom to its neighbor atoms to find the target species (a configuration formed by several target atoms, such as Al-O-H, Si-O-H, Al-O-H2, H3-O).
step 3: Count the number of species as a function of space and/or time, and plot the species distribution as a function of space and/or time, as well as the lifetime distribution of the species.
NOTE: such a distribution is different from a statistical distribution such as the Normal Distribution or the Binomial Distribution.
step 4: Based on the above distributions, explore and interpret the correlation between species.
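To make steps 2 and 3 concrete, here is a minimal sketch in Python with NumPy (an assumption; the post does not say which language is used). In data-analysis terms, step 2 looks like feature extraction (turning raw coordinates into a label per atom) and step 3 like aggregation into empirical frequency counts per frame. The 1.2 Å cutoff, the function names and the toy coordinates are hypothetical, and periodic boundary conditions are ignored for brevity.

```python
import numpy as np
from collections import Counter

def h_neighbour_counts(o_xyz, h_xyz, cutoff=1.2):
    """Step 2: for each O atom, count H atoms closer than `cutoff` (hypothetical value)."""
    # Pairwise O-H distances, shape (n_O, n_H); no periodic images considered.
    d = np.linalg.norm(o_xyz[:, None, :] - h_xyz[None, :, :], axis=-1)
    return (d < cutoff).sum(axis=1)

def species_per_frame(frames, cutoff=1.2):
    """Step 3: per frame, tally how many O atoms carry 0, 1, 2, ... H neighbours."""
    return [Counter(h_neighbour_counts(o, h, cutoff).tolist()) for o, h in frames]

# Two toy frames, each a pair (O coordinates, H coordinates).
oxygens = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
frame0 = (oxygens, np.array([[0.96, 0.0, 0.0], [-0.3, 0.9, 0.0], [5.0, 1.0, 0.0]]))
frame1 = (oxygens, np.array([[5.96, 0.0, 0.0], [5.0, -1.0, 0.0], [5.0, 1.0, 0.0]]))

species = species_per_frame([frame0, frame1])
for t, tally in enumerate(species):
    print(f"frame {t}: {dict(tally)}")
```

The per-frame tallies form a time series of counts; histograms of those counts, or of how long each species persists, are the empirical distributions mentioned in step 3.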
After the above steps, I study the underlying mechanism based on the materials themselves and the local environment.
Could anyone point out how to describe the above steps in statistical terms, data-analytic terms or others?
I sincerely appreciate your time and help.

Is pre-aligning molecules in RDKit before computing shape similarity with ShapeTanimotoDist() possible?

I am building a script to compare the shapes of RDKit-generated conformers of a query ligand to a reference ligand extracted from a template protein-ligand complex. For this I want to use the shape-similarity Tanimoto metric ShapeTanimotoDist() provided by RDKit. It seems, however, that this function does not pre-align the molecules when computing shape similarity. When I searched, I stumbled upon this discussion from 10 years ago in which someone attempted something similar: https://sourceforge.net/p/rdkit/mailman/message/21906484/.
Quoting Greg Landrum:
There is no alignment step. If you want reasonable shape comparisons,
you first need a reasonable alignment of the molecules. The RDKit
doesn't currently provide a practical method of doing this alignment.
So I am wondering whether this issue has been resolved since then, so that it would be reasonable to use this function on its own to compare the shapes of molecules. The documentation for ShapeTanimotoDist() states that it uses a "predefined alignment", which is not elaborated further. I have looked into the documentation for the two molecule-aligning functions RDKit provides, AlignMol and Open3DAlign (O3A): https://www.rdkit.org/docs/source/rdkit.Chem.rdMolAlign.html. For some reason AlignMol does not work for me (runtime error), but O3A, which has been supported in RDKit since 2014, did allow me to compare the conformers with the reference ligands. However, when creating an O3A object, is there a way to retrieve the coordinates of the aligned conformer and reference molecule to feed into ShapeTanimotoDist()? And perhaps also to visualize the alignment in PyMOL?
Cheers
Also perhaps useful to consult: the 3D functionality section of the RDKit Cookbook https://www.rdkit.org/docs/Cookbook.html
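A minimal sketch of the workflow described above, assuming a recent RDKit build. The two SMILES strings, the random seeds and the helper name `embed` are hypothetical placeholders, not the actual ligands. As far as I understand it, `Align()` on the O3A object moves the probe conformer in place, so the probe molecule itself carries the aligned coordinates and can be passed straight to `ShapeTanimotoDist()`.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolAlign, rdShapeHelpers

def embed(smiles, seed):
    """Build a 3D conformer from SMILES (hypothetical helper for this sketch)."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMolecule(mol, randomSeed=seed)
    return mol

ref = embed("c1ccccc1CCN", seed=7)     # stand-in for the reference ligand
probe = embed("c1ccccc1CCO", seed=11)  # stand-in for one query conformer

# Shape distance on the raw, unaligned coordinates (depends on arbitrary poses).
dist_before = rdShapeHelpers.ShapeTanimotoDist(probe, ref)

# Open3DAlign: build the alignment object, then apply it to the probe.
o3a = rdMolAlign.GetO3A(probe, ref)
rmsd = o3a.Align()  # transforms the probe's conformer in place

# Shape distance after alignment; 0 means identical shapes, 1 means no overlap.
dist_after = rdShapeHelpers.ShapeTanimotoDist(probe, ref)

# Aligned coordinates as an (n_atoms, 3) array, plus a mol block for viewing.
aligned_xyz = probe.GetConformer().GetPositions()
aligned_block = Chem.MolToMolBlock(probe)

print(f"before: {dist_before:.3f}  after: {dist_after:.3f}  O3A RMSD: {rmsd:.3f}")
```

Writing `aligned_block` and `Chem.MolToMolBlock(ref)` to `.mol` files and loading both in PyMOL should show the overlay, since the reference is left untouched by the alignment.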

Invalid Graphics State - ongoing issue for beginner RStudio/R user

I am working on an assignment for a course. The code creates variables for use in the datadist() function from rms. We then create a simple linear regression model using ols(). Creating the first plot, before the datadist() and ols() calls, is simple. We use:
plot(x,y,pch='o')
lines(x,yTrue,lty=2,lwd=.5,col='red')
Then we call datadist() and fit the ols() model, here named fit0.
mydat=data.frame(x=x,y=y)
dd=datadist(mydat)
options(datadist='dd')
fit0=ols(y~x,data=mydat)
fit0
anova(fit0)
This all works smoothly, printing out the results of the linear regression and the anova table. Then we want to predict based on the model and plot these predictions. The plot itself is drawn nicely, but the lines and points won't show up on it. The code:
ff=Predict(fit0)
plot(ff)
lines(x,yTrue,lwd=2,lty=1,col='red')
points(x,y,pch='.')
Note - this works fine in R. I much prefer to use RStudio, though I can switch to R if there's no clear solution to this issue. I've tried dev.off() several times (i.e. repeat until get, I've tried closing RStudio and re-opening it, I've uninstalled and reinstalled R, RStudio and the rms package (which depends on ggplot2), updated the packages, and made my RStudio graphics window larger. No solution I've seen works. Help!

Interpolation technique for weirdly spaced point data

I have a spatial dataset that consists of a large number of point measurements (n=10^4) that were taken along regular grid lines (500m x 500m) and along some arbitrary lines and blocks in between. Single measurements were taken with a varying spacing of about 0.3-1.0m along these lines (see the example showing every 10th point).
The data can be assumed to be normally distributed but show strong small-scale variability in some regions. There is also some trend with elevation (r=0.5) that can easily be removed.
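As a minimal sketch of what removing the elevation trend could look like (in Python with NumPy, chosen only for illustration; the synthetic data, the slope and the function name `detrend` are hypothetical): fit a straight line of the measurement against elevation and keep the residuals for interpolation.

```python
import numpy as np

def detrend(elev, z):
    """Fit z = slope * elev + intercept by least squares and return the residuals."""
    slope, intercept = np.polyfit(elev, z, 1)
    resid = z - (slope * elev + intercept)
    return resid, slope, intercept

# Synthetic stand-in for the measurements (not the real dataset).
rng = np.random.default_rng(1)
elev = rng.uniform(100.0, 500.0, size=200)
z = 0.01 * elev + rng.normal(0.0, 1.0, size=200)

resid, slope, intercept = detrend(elev, z)
print("correlation before:", np.corrcoef(elev, z)[0, 1])
print("correlation after: ", np.corrcoef(elev, resid)[0, 1])
```

The residuals would then be interpolated to the grid, and the fitted trend added back at each grid node, which assumes an elevation value is available there.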
Regardless of the coding platform, I'm looking for a good or "the optimal" way to interpolate these points to a regular 25 x 25m grid over the entire area of interest (5000 x 7000m). I know about the wide range of kriging techniques, but I wondered if somebody has a specific idea on how to handle the "oversampling along lines" combined with rather large gaps between the lines.
Thank you for any advice!
Leo
Kriging does not perform well when the points to interpolate are taken on a regular grid, because a wide range of different inter-point distances is needed to estimate the covariance model well.
Your case is a bit particular... The oversampling along the lines is not a problem at all. The main problem is the big holes you have in your grid. I think that these holes will create problems whatever interpolation technique you use.
However, it is difficult to predict a priori whether kriging will behave well. I advise you to try it anyway.
Kriging is only suited for interpolating. You cannot extrapolate with a kriging metamodel, so you won't be able to predict values in the bottom left part of your figure, for example (because you have no points there).
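To illustrate the last point, here is a minimal one-dimensional simple-kriging sketch in Python with NumPy. It is not the implementation of any of the packages listed below; the Gaussian covariance, the sill, the correlation length and the five data points are all hypothetical. Far from any observation the covariances with the data vanish, so the prediction falls back to the mean and the kriging variance rises to the full sill.

```python
import numpy as np

SILL = 1.0        # assumed variance of the field
CORR_LEN = 100.0  # assumed correlation length, same units as x

def gauss_cov(a, b):
    """Gaussian covariance between two sets of 1-D locations."""
    d = np.abs(a[:, None] - b[None, :])
    return SILL * np.exp(-(d / CORR_LEN) ** 2)

def simple_krige(x_obs, z_obs, x_new, mean, nugget=1e-6):
    """Simple kriging with a known mean; returns predictions and variances."""
    K = gauss_cov(x_obs, x_obs) + nugget * np.eye(x_obs.size)
    k = gauss_cov(x_obs, x_new)      # covariances data <-> targets
    w = np.linalg.solve(K, k)        # kriging weights, one column per target
    pred = mean + w.T @ (z_obs - mean)
    var = SILL - np.einsum("ij,ij->j", k, w)
    return pred, var

x_obs = np.array([0.0, 50.0, 100.0, 150.0, 200.0])
z_obs = np.array([1.0, 1.5, 1.2, 0.8, 1.1])
mean = z_obs.mean()

# One target between the observations, one far outside them.
pred, var = simple_krige(x_obs, z_obs, np.array([75.0, 2000.0]), mean)
print("inside :", pred[0], "variance", var[0])
print("outside:", pred[1], "variance", var[1])
```

The same behaviour is what makes the large holes between survey lines a concern: at grid nodes further than a few correlation lengths from any line, the estimate carries almost no information from the data.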
To perform kriging, I advise you to use one of the following tools (depending on the language you're most familiar with):
DiceKriging package in R (the one I use preferably)
fields package in R (which is more specialized on spatial fields)
DACE toolbox in matlab
Bonus: a link to a reference book about kriging which is available online: http://www.gaussianprocess.org/
PS: This type of question is more statistics oriented than programming and may be better suited to the stats.stackexchange.com website.
