I now get "sprintf() [function.sprintf]: Too few arguments" but I've made no page change. What could cause this? - kohana

I have a page on an inherited site that lists locations. The user enters their address, and the page searches an existing database of locations, shows the 10 nearest, and links to Google Maps. Up until two weeks ago it worked fine. Now I get "sprintf() [function.sprintf]: Too few arguments" on the locations.php page.
What causes this to happen? I'm utterly lost in this framework.
error given: An error was detected which prevented the loading of this page. If this problem persists, please contact the website administrator.
application/controllers/locations.php [87]:
sprintf() [function.sprintf]: Too few arguments
That line on that page has this:
//Count results
$query = sprintf("SELECT *, ( 3958 * acos( cos( radians('{$geocode[2]}') ) * cos( radians( latitude ) ) * cos( radians( longitude ) - radians('{$geocode[3]}') ) + sin( radians('{$geocode[2]}') ) * sin( radians( latitude ) ) ) ) AS 'distance' FROM 'locations' HAVING 'distance' < '50'");
$db=new Database;
$query = $db->query($query);
$total = $query->count();
Again, as I haven't made any changes to this page I don't understand what could have suddenly caused this issue.
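One classic trigger for this error is a percent sign arriving in the interpolated data: since sprintf() is called with the query string as its only argument, any % sequence that happens to parse as a conversion specifier (for example URL-encoded geocoder output such as %2F) leaves sprintf() with a specifier but no matching argument. A minimal Python analogue of the same failure mode (the geocode value below is a made-up illustration, not this site's actual data; Python's %-operator parses specifiers much like PHP's sprintf()):
geocode = "40.7N%2F74.0W"   # assumed example: "%2F" is URL-encoded "/"
template = "SELECT ... '" + geocode + "'"
try:
    template % ()           # printf-style formatting with zero arguments
except TypeError as e:
    print(e)                # not enough arguments for format string
If that is what is happening here, escaping literal percent signs as %% (or dropping sprintf() entirely, since the query relies on PHP string interpolation rather than sprintf placeholders) would avoid the error.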

Related

scipy.optimize.minimize() not converging giving success=False

I recently tried to implement the backpropagation algorithm in Python. I tried fmin_tnc and BFGS, but neither of them actually worked, so please help me figure out the problem.
def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def costFunction(nnparams, X, y, input_layer_size=400, hidden_layer_size=25, num_labels=10, lamda=1):
    # Unroll the parameter vector into the two weight matrices
    Theta1 = np.reshape(nnparams[0:hidden_layer_size * (input_layer_size + 1)],
                        (hidden_layer_size, input_layer_size + 1))
    Theta2 = np.reshape(nnparams[hidden_layer_size * (input_layer_size + 1):],
                        (num_labels, hidden_layer_size + 1))
    m = X.shape[0]
    J = 0
    y = y.reshape(m, 1)
    Theta1_grad = np.zeros(Theta1.shape)
    Theta2_grad = np.zeros(Theta2.shape)
    X = np.concatenate([np.ones([m, 1]), X], 1)
    # Forward pass
    a2 = sigmoid(Theta1.dot(X.T))
    a2 = np.concatenate([np.ones([1, a2.shape[1]]), a2])
    h = sigmoid(Theta2.dot(a2))
    # One-hot encode the labels (classes 1..10)
    c = np.array(range(1, 11))
    y = y == c
    # Cost
    for i in range(y.shape[0]):
        J = J + (-1 / m) * np.sum(y[i, :] * np.log(h[:, i]) + (1 - y[i, :]) * np.log(1 - h[:, i]))
    # Backpropagation
    DEL2 = np.zeros(Theta2.shape)
    DEL1 = np.zeros(Theta1.shape)
    for i in range(m):
        z2 = Theta1.dot(X[i, :].T)
        a2 = sigmoid(z2).reshape(-1, 1)
        a2 = np.concatenate([np.ones([1, a2.shape[1]]), a2])
        z3 = Theta2.dot(a2)
        a3 = sigmoid(z3).reshape(-1, 1)
        delta3 = a3 - y[i, :].T.reshape(-1, 1)
        delta2 = (Theta2.T.dot(delta3)) * (a2 * (1 - a2))
        DEL2 = DEL2 + delta3.dot(a2.T)
        DEL1 = DEL1 + (delta2[1, :]) * (X[i, :])
    # Regularized gradients (bias column left unregularized)
    Theta1_grad[:, 0] = DEL1[:, 0] * (1 / m)
    Theta1_grad[:, 1:] = DEL1[:, 1:] * (1 / m) + (lamda / m) * Theta1[:, 1:]
    Theta2_grad[:, 0] = DEL2[:, 0] * (1 / m)
    Theta2_grad[:, 1:] = DEL2[:, 1:] * (1 / m) + (lamda / m) * Theta2[:, 1:]
    grad = np.concatenate([Theta1_grad.reshape(-1, 1), Theta2_grad.reshape(-1, 1)])
    return J, grad
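A quick way to check an implementation like this is a finite-difference gradient comparison on a tiny network before handing it to the optimizer; a minimal sketch (assuming costFunction returns a (J, grad) pair as above, and that theta is a float array):
import numpy as np

def numerical_grad(f, theta, eps=1e-4):
    # Central-difference estimate of dJ/dtheta, element by element;
    # f must return (cost, grad), so f(...)[0] picks out the cost.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step)[0] - f(theta - step)[0]) / (2 * eps)
    return grad

# Usage sketch: compare numerical_grad(lambda t: costFunction(t, X, y), theta)
# against the analytic gradient that costFunction itself returns.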
This is how I called the function (op is scipy.optimize)
r2=op.minimize(fun=costFunction, x0=nnparams, args=(X, dataY.flatten()),
method='TNC', jac=True, options={'maxiter': 400})
r2 looks like this:
fun: 3.1045444063663266
jac: array([[-6.73218494e-04],
[-8.93179045e-05],
[-1.13786179e-04],
...,
[ 1.19577741e-03],
[ 5.79555099e-05],
[ 3.85717533e-03]])
message: 'Linear search failed'
nfev: 140
nit: 5
status: 4
success: False
x: array([-0.97996948, -0.44658952, -0.5689309 , ..., 0.03420931,
-0.58005183, -0.74322735])
Please help me find the correct way to minimize this function. Thanks in advance.
Finally solved it. The problem was that I used np.random.randn() to generate the random Theta values, which draws from a standard normal distribution; too many values fell within the same range, and this led to symmetry in the theta values. Because of this symmetry problem the optimization terminates in the middle of the process.
The simple solution was to use np.random.rand() (which draws from a uniform distribution) instead of np.random.randn().
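For reference, a minimal sketch of that initialization (layer sizes as in the question; the 0.12 scale is the usual symmetry-breaking convention and is an assumption, not something specified above):
import numpy as np

def init_weights(l_in, l_out, eps=0.12):
    # np.random.rand draws from U[0, 1); scale and shift so the
    # weights land uniformly in [-eps, eps], centred on zero.
    return np.random.rand(l_out, l_in + 1) * 2 * eps - eps

Theta1 = init_weights(400, 25)   # hidden layer weights, 25 x 401
Theta2 = init_weights(25, 10)    # output layer weights, 10 x 26
nnparams = np.concatenate([Theta1.ravel(), Theta2.ravel()])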

How to use warm starts in Minizinc?

I'm trying to use the warm start annotation in MiniZinc to give a known suboptimal solution to a model.
I started by trying to execute this warm start example from the MiniZinc documentation (the only one they provide):
array[1..3] of var 0..10: x;
array[1..3] of var 0.0..10.5: xf;
var bool: b;
array[1..3] of var set of 5..9: xs;
constraint b+sum(x)==1;
constraint b+sum(xf)==2.4;
constraint 5==sum( [ card(xs[i]) | i in index_set(xs) ] );
solve
:: warm_start_array( [ %%% Can be on the upper level
warm_start( x, [<>,8,4] ), %%% Use <> for missing values
warm_start( xf, array1d(-5..-3, [5.6,<>,4.7] ) ),
warm_start( xs, array1d( -3..-2, [ 6..8, 5..7 ] ) )
] )
:: seq_search( [
warm_start_array( [ %%% Now included in seq_search to keep order
warm_start( x, [<>,5,2] ), %%% Repeated warm_starts allowed but not specified
warm_start( xf, array1d(-5..-3, [5.6,<>,4.7] ) ),
warm_start( xs, array1d( -3..-2, [ 6..8, 5..7 ] ) )
] ),
warm_start( [b], [true] ),
int_search(x, first_fail, indomain_min)
] )
minimize x[1] + b + xf[2] + card( xs[1] intersect xs[3] );
The example runs, and it reaches the optimal solution. However, the output displays warnings stating that all the warm start annotations were ignored.
Warning, ignored search annotation: warm_start_array([warm_start([[xi(1), xi(2)], [i(5), i(2)]]), warm_start([[xf(0), xf(2)], [f(5.6), f(4.7)]]), warm_start([[xs(0), xs(1), xs(2)], [s(), s()]])])
Warning, ignored search annotation: warm_start([[xb(0)], [b(true)]])
Warning, ignored search annotation: warm_start_array([warm_start([[xi(1), xi(2)], [i(8), i(4)]]), warm_start([[xf(0), xf(2)], [f(5.6), f(4.7)]]), warm_start([[xs(0), xs(1), xs(2)], [s(), s()]])])
I didn't modify anything in the example; I just copy-pasted it and ran it in the MiniZinc IDE with the default Gecode solver. In case it is relevant, I'm using Windows. I have run other models and used other search annotations without problems.
In the example there are two blocks of warm starts (one after solve and one inside seq_search). I'm not sure if both are necessary. I tried removing one, then the other, but the warnings still appear for all the remaining warm start annotations. Also, I don't get why 'b' isn't referred to in the first block.
There is a similar example in an or-tools issue on GitHub (https://github.com/google/or-tools/issues/539), but it also produces the warnings.
If someone could point me out to a working example of warm_start it would be great.
Your usage of the warm_start annotations is correct, but warm start annotations are currently not supported by most solvers. At the time of writing I believe the warm start annotations are only supported by the Mixed Integer Programming interfaces (CoinBC, Gurobi, CPLEX, Xpress, and SCIP). Although we've been working on adding support for the annotation in Gecode and Chuffed, support for this annotation has not been included in any of the released versions.
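If one of those MIP backends is installed, a quick check is to run the same model through it instead of Gecode; a sketch (the solver tag "coin-bc" is an assumption about your installation; minizinc --solvers lists the tags it actually accepts):
import subprocess

# Run the model with a MIP backend, which should honour the
# warm_start annotations instead of emitting the warnings above.
result = subprocess.run(
    ["minizinc", "--solver", "coin-bc", "model.mzn"],
    capture_output=True, text=True,
)
print(result.stdout)
print(result.stderr)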

Multiple Linear Regression in Power BI

Suppose I have a set of returns and I want to compute its beta values versus different market indices. Let's use the following set of data in a table named Returns for the sake of having a concrete example:
Date Equity Duration Credit Manager
-----------------------------------------------
01/31/2017 2.907% 0.226% 1.240% 1.78%
02/28/2017 2.513% 0.493% 1.120% 3.88%
03/31/2017 1.346% -0.046% -0.250% 0.13%
04/30/2017 1.612% 0.695% 0.620% 1.04%
05/31/2017 2.209% 0.653% 0.480% 1.40%
06/30/2017 0.796% -0.162% 0.350% 0.63%
07/31/2017 2.733% 0.167% 0.830% 2.06%
08/31/2017 0.401% 1.083% -0.670% 0.29%
09/30/2017 1.880% -0.857% 1.430% 2.04%
10/31/2017 2.151% -0.121% 0.510% 2.33%
11/30/2017 2.020% -0.137% -0.020% 3.06%
12/31/2017 1.454% 0.309% 0.230% 1.28%
Now in Excel, I can just use the LINEST function to get the beta values:
= LINEST(Returns[Manager], Returns[[Equity]:[Credit]], TRUE, TRUE)
It spits out an array that looks like this:
0.077250253 -0.184974002 0.961578127 -0.001063971
0.707796954 0.60202895 0.540811546 0.008257129
0.50202386 0.009166729 #N/A #N/A
2.688342242 8 #N/A #N/A
0.000677695 0.000672231 #N/A #N/A
The betas are in the top row and using them gives me the following linear estimate:
Manager = 0.962 * Equity - 0.185 * Duration + 0.077 * Credit - 0.001
The question is how can I get these values in Power BI using DAX (preferably without having to write a custom R script)?
For simple linear regression against one column, I can go back to the mathematical definition and write a least squares implementation similar to the one given in this post.
However, when more columns become involved (I need to be able to do up to 12 columns, but not always the same number), this gets messy really quickly and I'm hoping there's a better way.
The essence:
DAX is not the way to go. Use Home > Edit Queries and then Transform > Run R Script. Insert the following R snippet to run a regression analysis using all available variables in a table:
model <- lm(Manager ~ . , dataset)
df<- data.frame(coef(model))
names(df)[names(df)=="coef.model."] <- "coefficients"
df['variables'] <- row.names(df)
Edit Manager to any of the other available variable names to change the dependent variable.
The details:
Good question! Why Microsoft has not introduced more flexible solutions is beyond my understanding. But at the time being, you won't be able to find very good approaches without using R in Power BI.
My suggested approach will therefore ignore your request regarding:
The question is how can I get these values in Power BI using DAX
(preferably without having to write a custom R script)?
My answer will however meet your requirements regarding:
A good answer should generalize to more than 3 columns (probably by
working on an unpivoted data table with the indices as values rather
than column headers).
Here we go:
I'm on a system that uses a comma as the decimal separator, so I'm going to use the following as the data source (if you copy the numbers directly into Power BI, the column separation will not be maintained; if you first paste them into Excel, copy them again and THEN paste them into Power BI, the columns will be fine):
Date Equity Duration Credit Manager
31.01.2017 2,907 0,226 1,24 1,78
28.02.2017 2,513 0,493 1,12 3,88
31.03.2017 1,346 -0,046 -0,25 0,13
30.04.2017 1,612 0,695 0,62 1,04
31.05.2017 2,209 0,653 0,48 1,4
30.06.2017 0,796 -0,162 0,35 0,63
31.07.2017 2,733 0,167 0,83 2,06
31.08.2017 0,401 1,083 -0,67 0,29
30.09.2017 1,88 -0,857 1,43 2,04
31.10.2017 2,151 -0,121 0,51 2,33
30.11.2017 2,02 -0,137 -0,02 3,06
31.12.2017 1,454 0,309 0,23 1,28
Starting from scratch in Power BI (for reproducibility purposes) I'm inserting the data using Enter Data:
Now, go to Edit Queries > Edit Queries and check that you have this:
In order to maintain flexibility with regards to the number of columns to include in your analysis, I find it is best to remove the Date Column. This will not have an impact on your regression results. Simply right-click the Date column and select Remove:
Notice that this will add a new step under Query Settings > Applied Steps:
And this is where you are going to be able to edit the few lines of R code we're going to use. Now, go to Transform > Run R Script to open this window:
Notice the line # 'dataset' holds the input data for this script. Thankfully, your question is only about ONE input table, so things aren't going to get too complicated (for multiple input tables, check out this post). The dataset variable is an R data.frame and is a good (in fact, the only) starting point for further analysis.
Insert the following script:
model <- lm(Manager ~ . , dataset)
df<- data.frame(coef(model))
names(df)[names(df)=="coef.model."] <- "coefficients"
df['variables'] <- row.names(df)
Click OK, and if all goes well you should end up with this:
Click Table, and you'll get this:
Under Applied Steps you'll see that a Run R Script step has been inserted. Click the gear icon on the right to edit it, or click on df to format the output table.
This is it! For the Edit Queries part at least.
Click Home > Close & Apply to get back to the Power BI Report section and verify that you have a new table under Visualizations > Fields:
Insert a Table or Matrix and activate Coefficients and Variables to get this:
I hope this is what you were looking for!
Now for some details about the R script:
As long as it's possible, I would avoid using numerous different R libraries. This way you'll reduce the risk of dependency issues.
The function lm() handles the regression analysis. The key to obtaining the required flexibility with regard to the number of explanatory variables lies in the Manager ~ . , dataset part. This simply says to run a regression analysis on the Manager variable in the data frame dataset, and to use all remaining columns ~ . as explanatory variables. The coef(model) part extracts the coefficient values from the estimated model. The result is a data frame with the variable names as row names. The last line simply adds these names to the data frame itself.
As there is no equivalent or handy replacement for the LINEST function in Power BI (I'm sure you've done enough research before posting the question), any attempt would mean rewriting the whole function in Power Query / M, which is already not that "simple" for the case of simple linear regression, not to mention multiple variables.
Rather than (re)inventing the wheel, it's much easier (a one-liner) to do it with an R script in Power BI.
It's not a bad option given that I have no prior R experience. After a few searches and some trial and error, I was able to come up with this:
# 'dataset' holds the input data for this script
# install.packages("broom") # uncomment to install if package does not exist
library(broom)
model <- lm(Manager ~ Equity + Duration + Credit, dataset)
model <- tidy(model)
lm is the built-in linear model function from R, and the tidy function comes with the broom package, which tidies up the output and returns a data frame for Power BI.
With the columns term and estimate, this should be sufficient to calculate the estimate you want.
The M Query for your reference:
let
Source = Csv.Document(File.Contents("returns.csv"),[Delimiter=",", Columns=5, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type text}, {"Equity", Percentage.Type}, {"Duration", Percentage.Type}, {"Credit", Percentage.Type}, {"Manager", Percentage.Type}}),
#"Run R Script" = R.Execute("# 'dataset' holds the input data for this script#(lf)# install.packages(""broom"")#(lf)library(broom)#(lf)#(lf)model <- lm(Manager ~ Equity + Duration + Credit, dataset)#(lf)model <- tidy(model)",[dataset=#"Changed Type"]),
#"""model""" = #"Run R Script"{[Name="model"]}[Value]
in
#"""model"""
I've figured out how to do this for three variables specifically but this approach doesn't scale up or down to more or fewer variables at all.
Regression =
VAR ShortNames =
SELECTCOLUMNS (
Returns,
"A", [Equity],
"D", [Duration],
"C", [Credit],
"Y", [Manager]
)
VAR n = COUNTROWS ( ShortNames )
VAR A = SUMX ( ShortNames, [A] )
VAR D = SUMX ( ShortNames, [D] )
VAR C = SUMX ( ShortNames, [C] )
VAR Y = SUMX ( ShortNames, [Y] )
VAR AA = SUMX ( ShortNames, [A] * [A] ) - A * A / n
VAR DD = SUMX ( ShortNames, [D] * [D] ) - D * D / n
VAR CC = SUMX ( ShortNames, [C] * [C] ) - C * C / n
VAR AD = SUMX ( ShortNames, [A] * [D] ) - A * D / n
VAR AC = SUMX ( ShortNames, [A] * [C] ) - A * C / n
VAR DC = SUMX ( ShortNames, [D] * [C] ) - D * C / n
VAR AY = SUMX ( ShortNames, [A] * [Y] ) - A * Y / n
VAR DY = SUMX ( ShortNames, [D] * [Y] ) - D * Y / n
VAR CY = SUMX ( ShortNames, [C] * [Y] ) - C * Y / n
VAR BetaA =
DIVIDE (
AY*DC*DC - AD*CY*DC - AY*CC*DD + AC*CY*DD + AD*CC*DY - AC*DC*DY,
AD*CC*AD - AC*DC*AD - AD*AC*DC + AA*DC*DC + AC*AC*DD - AA*CC*DD
)
VAR BetaD =
DIVIDE (
AY*CC*AD - AC*CY*AD - AY*AC*DC + AA*CY*DC + AC*AC*DY - AA*CC*DY,
AD*CC*AD - AC*DC*AD - AD*AC*DC + AA*DC*DC + AC*AC*DD - AA*CC*DD
)
VAR BetaC =
DIVIDE (
- AY*DC*AD + AD*CY*AD + AY*AC*DD - AA*CY*DD - AD*AC*DY + AA*DC*DY,
AD*CC*AD - AC*DC*AD - AD*AC*DC + AA*DC*DC + AC*AC*DD - AA*CC*DD
)
VAR Intercept =
AVERAGEX ( ShortNames, [Y] )
- AVERAGEX ( ShortNames, [A] ) * BetaA
- AVERAGEX ( ShortNames, [D] ) * BetaD
- AVERAGEX ( ShortNames, [C] ) * BetaC
RETURN
{ BetaA, BetaD, BetaC, Intercept }
This is a calculated table that returns the regression coefficients specified:
These numbers match the output from LINEST for the data provided.
Note: The LINEST values I quoted in the question are slightly different from these, as they were calculated from unrounded returns rather than the rounded returns provided in the question.
I referenced this document for the calculation setup and used Mathematica to solve the system.
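As an aside, the coefficients are easy to sanity-check outside Power BI with numpy; a sketch using the rounded returns from the question (so it should land near the calculated-table values above rather than the original LINEST output):
import numpy as np

# Equity, Duration, Credit columns (percent), one row per month.
X = np.array([
    [2.907, 0.226, 1.24],
    [2.513, 0.493, 1.12],
    [1.346, -0.046, -0.25],
    [1.612, 0.695, 0.62],
    [2.209, 0.653, 0.48],
    [0.796, -0.162, 0.35],
    [2.733, 0.167, 0.83],
    [0.401, 1.083, -0.67],
    [1.880, -0.857, 1.43],
    [2.151, -0.121, 0.51],
    [2.020, -0.137, -0.02],
    [1.454, 0.309, 0.23],
])
y = np.array([1.78, 3.88, 0.13, 1.04, 1.40, 0.63,
              2.06, 0.29, 2.04, 2.33, 3.06, 1.28])  # Manager (percent)

# Append an intercept column and solve the least-squares system.
A = np.column_stack([X, np.ones(len(X))])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # [beta_Equity, beta_Duration, beta_Credit, intercept]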

Dodging error bars in marginsplot in Stata

I am using marginsplot to draw some error bars between two different groups. The error bars overlap though, so I'm trying to dodge them slightly left-or-right from one another.
Here is an example slightly edited from the marginsplot help that illustrates the problem:
use http://www.stata-press.com/data/r13/nhanes2
quietly regress bpsystol agegrp##sex
quietly margins agegrp#sex
marginsplot, recast(scatter) ciopts(recast(rspike))
Is there any easy way to dodge the blue Male points and bars slightly to the left, and the red Female points and bars slightly to the right (or vice versa)? Like what is done in dodged bar charts.
Here it would work out fine to recast the confidence intervals to an area and make it slightly transparent as in the help example further down the line. However, for my actual use I would like to keep the points and spikes.
Here is an approach using the community-contributed commands parmest and eclplot.
The trick is to adjust the values of the group variable by a small amount, for example 0.1, and then to use the supby option of eclplot:
** a short version
use http://www.stata-press.com/data/r13/nhanes2
qui reg bpsystol agegrp##sex
qui margins agegrp#sex
qui parmest , bmat(r(b)) vmat(r(V)) level(95) fast
qui split parm, parse( . # )
qui destring parm*, replace
replace parm1 = parm1 - ( 0.05 )
eclplot estimate min95 max95 parm1, eplot(sc) rplottype(rspike) supby(parm3, spaceby(0.1))
However, the problem with this approach is that all the labels get lost, and I do not know of a good way to retrieve them other than by brute force.
The following is an extended version of the code where I tried to automate re-application of all the value labels by a brute force method:
use http://www.stata-press.com/data/r13/nhanes2, clear
** specify parameters and variables
local cilevel = 95
local groupvar agegrp
local typevar sex
local ytitle "Linear Prediction"
local title "Adjust Predictions of `groupvar'#`typevar' with `cilevel'% CIs"
local eplot scatter
local rplottype rspike
local spaceby 0.1 /* use this param to control the dodge */
** store labels of groupvar ("agegrp") and typevar ("sex")
local varlist `groupvar' `typevar'
foreach vv of var `varlist' {
local `vv'_varlab : var lab `vv'
qui levelsof `vv', local( `vv'_vals )
foreach vl of local `vv'_vals {
local `vv'_`vl'lab : lab `vv' `vl'
lab def `vv'_vallab `vl' "``vv'_`vl'lab'", add
}
}
** run analysis
qui reg bpsystol `groupvar'##`typevar'
margins `groupvar'#`typevar'
** use parmest to store estimates
preserve
parmest , bmat(r(b)) vmat(r(V)) level( `cilevel' ) fast
lab var estimate "`ytitle'"
split parm, parse( . # )
qui destring parm*, replace
rename parm1 `groupvar'
rename parm3 `typevar'
** reapply stored labels
foreach vv of var `varlist' {
lab var `vv' "``vv'_varlab'"
lab val `vv' `vv'_vallab
}
** dodge and plot
replace agegrp = agegrp - ( `spaceby' / 2 )
eclplot estimate min95 max95 agegrp ///
, eplot( `eplot' ) rplottype( `rplottype' ) ///
supby( sex, spaceby( `spaceby' ) ) ///
estopts1( mcolor( navy ) ) estopts2( mcolor( maroon ) ) ///
ciopts1( lcolor( navy ) ) ciopts2( lcolor( maroon ) ) ///
title( "`title'" )
restore

Getting all dihedral angles in Pymol

I want to get all the dihedral angles of a protein in PyMOL (phi, psi, chi1, chi2, chi3, chi4), but I have only managed to find a function that shows me phi and psi.
For instance:
PyMOL>phi_psi 1a11
SER-2: ( 67.5, 172.8 )
GLU-3: ( -59.6, -19.4 )
LYS-4: ( -66.4, -61.7 )
MET-5: ( -64.1, -17.9 )
SER-6: ( -78.3, -33.7 )
THR-7: ( -84.0, -18.1 )
ALA-8: ( -85.7, -40.8 )
ILE-9: ( -75.1, -30.8 )
SER-10: ( -77.6, -47.0 )
VAL-11: ( -61.3, -27.4 )
LEU-12: ( -60.7, -47.5 )
LEU-13: ( -71.1, -38.6 )
ALA-14: ( -46.2, -50.7 )
GLN-15: ( -69.1, -47.4 )
ALA-16: ( -41.9, -52.6 )
VAL-17: ( -82.6, -23.7 )
PHE-18: ( -53.4, -63.4 )
LEU-19: ( -61.2, -30.4 )
LEU-20: ( -61.1, -32.3 )
LEU-21: ( -80.6, -60.1 )
THR-22: ( -45.9, -34.4 )
SER-23: ( -74.5, -47.8 )
GLN-24: ( -83.5, 11.0 )
It's missing the chi angles. Does anyone know how to get all the dihedral angles?
Many thanks!
You can get arbitrary dihedral angles with get_dihedral. Create four selections, each containing a single atom, and then use it like this:
get_dihedral s1, s2, s3, s4
It's exposed to the Python API as cmd.get_dihedral(). I suggest writing a Python script that uses this function along with cmd.iterate() to loop over residues. Create a dict so that for each residue type you can look up the atom quadruples that define the chi angles.
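A sketch of that approach (the chi1 table below is deliberately partial; the atom names are the standard PDB ones, and the remaining residue types plus the chi2-chi4 quadruples would need to be filled in):
from pymol import cmd

# Atom quadruple defining chi1 per residue type; incomplete on purpose.
CHI1 = {
    "ARG": ("N", "CA", "CB", "CG"), "LYS": ("N", "CA", "CB", "CG"),
    "SER": ("N", "CA", "CB", "OG"), "CYS": ("N", "CA", "CB", "SG"),
    "THR": ("N", "CA", "CB", "OG1"), "VAL": ("N", "CA", "CB", "CG1"),
}

def all_chi1(obj):
    residues = []
    # Collect one (chain, resi, resn) triple per residue, via the CA atoms.
    cmd.iterate(obj + " and polymer and name CA",
                "residues.append((chain, resi, resn))",
                space={"residues": residues})
    for chain, resi, resn in residues:
        names = CHI1.get(resn)
        if names is None:
            continue  # residue type not in the (partial) table
        sels = ["%s and chain %s and resi %s and name %s" % (obj, chain, resi, n)
                for n in names]
        try:
            print(resn, resi, cmd.get_dihedral(*sels))
        except Exception:
            pass  # e.g. missing atoms or an unresolvable selection

Calling all_chi1("1a11") from the PyMOL console would then print one chi1 value per residue it can resolve.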
You can easily do this in R. This link contains information on how to calculate the main chain and side chain torsion/dihedral angles:
http://thegrantlab.org/bio3d/html/torsion.pdb.html
But first you have to install the Bio3D package for R: http://thegrantlab.org/bio3d/download
After installing the package, load it by typing library(bio3d) at the R console prompt.
>library(bio3d)
This R script answers your question:
#returns the file path of the current working directory.
getwd()
#sets the working directory to where you want.
setwd("home/R/Rscripts")
#fetches the pdb file from the protein data bank and saves to dataframe 'pb'
pb <- read.pdb("insert PDB ID")
#trim to protein only
pb.prot <- trim.pdb(pb, "protein")
#calculates the torsion angles of the protein and save to dataframe 'tor'
tor <- torsion.pdb(pb.prot)
#to get the number of rows and columns of 'tor'
dim(tor$tbl)
#identify each row by their chain, residue ID and residue Number obtained from your PDB entry
res_label <- paste(pb.prot$atom$chain[pb.prot$calpha], pb.prot$atom$resid[pb.prot$calpha], pb.prot$atom$resno[pb.prot$calpha], sep="-")
rownames(tor$tbl) <- res_label
#creates a table of the torsion angle
torsion <- tor$tbl
#For example, to look at the angles for VAL, residue 223 from chain A
tor$tbl["A-VAL-223",]
#writes out the table to a file
write.table(torsion, file = "torsion_angles.txt", quote = F, sep = "\t")
Your output file, which is saved in your working directory, will contain a table of chain-resID-resNo and the corresponding phi, psi, chi1, chi2, chi3, chi4, and chi5 values. Good luck!
#install bio3d library and call
library(bio3d)
#returns the file path of the current working directory.
getwd()
#sets the working directory to where you want.
setwd("home/R/Rscripts")
#fetches the pdb file from the protein data bank and saves to dataframe 'pb'
pb <- read.pdb("insert PDB ID")
#trim to protein only
pb.prot <- trim.pdb(pb, "protein")
#calculates the torsion angles of the protein and save to dataframe 'tor'
tor <- torsion.pdb(pb.prot)
#to get the number of rows and columns of 'tor'
dim(tor$tbl)
#identify each row by their chain, residue ID and residue Number obtained from your PDB entry
res_label <- paste(pb.prot$atom$chain[pb.prot$calpha], pb.prot$atom$resid[pb.prot$calpha], pb.prot$atom$resno[pb.prot$calpha], sep="-")
rownames(tor$tbl) <- res_label
#creates a table of the torsion angle
torsion <- tor$tbl
#For example, to look at the angles for GLY, residue 65 from chain A
tor$tbl["A-GLY-65",]
#convert the matrix of doubles into a data frame
dataframe_df=as.data.frame.matrix(torsion)
#write the data frame to a .csv file
write.csv(dataframe_df, file="name.csv", row.names=TRUE)
