Capture output written to a stream and store it as a string variable

Although this question relates to BioPerl, I believe it is more general than that.
Basically, I have produced a Bio::Tree::TreeI object and I am trying to convert it into a string variable.
The closest I can get is to write the tree to a stream using:
# $tree is a Bio::Tree::TreeI object (I know it is an actual tree because it prints to the terminal)
my $treeOut = Bio::TreeIO->new(-format => 'newick');
$treeOut->write_tree($tree);
The documentation for write_tree says it "Writes a tree onto the stream", but how do I capture that in a string variable? I can't find any function in Bio::TreeIO that returns a string.

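You can redirect standard output to a variable:
my $captured;
{
# Temporarily point STDOUT at an in-memory filehandle backed by $captured
local *STDOUT = do { open my $fh, ">", \$captured or die "Cannot open in-memory handle: $!"; $fh };
$treeOut->write_tree($tree);
}
# STDOUT is restored once the block ends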
print $captured;
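
The same capture-by-redirection idea is not specific to Perl. Here is a minimal sketch in Python, offered only as an illustration: write_tree is a hypothetical stand-in for any routine that prints to standard output and returns nothing, and capture_stdout is a helper name I made up.

```python
import contextlib
import io


def write_tree(tree):
    """Hypothetical stand-in for a writer that only prints to standard output."""
    print(tree)


def capture_stdout(func, *args):
    """Run func(*args) and return everything it printed as a string."""
    buffer = io.StringIO()
    # Standard output is redirected only inside the with block.
    with contextlib.redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


captured = capture_stdout(write_tree, "(A:1.0,(B:2.0,C:3.0):0.5);")
```

As in the Perl version, the redirection is scoped, so later output goes to the terminal again.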

There is an easier way to accomplish the same goal by setting the file handle for BioPerl objects, and I think it is less of a hack. Here is an example:
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::TreeIO;
my $treeio = Bio::TreeIO->new(-format => 'newick', -fh => \*DATA);
my $treeout = Bio::TreeIO->new(-format => 'newick', -fh => \*STDOUT);
while (my $tree = $treeio->next_tree) {
$treeout->write_tree($tree);
}
__DATA__
(A:9.70,(B:8.234,(C:7.932,(D:6.321,((E:2.342,F:2.321):4.231,((((G:4.561,H:3.721):3.9623,
I:3.645):2.341,J:4.893):4.671)):0.234):0.567):0.673):0.456);
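Running this script prints the Newick string to your terminal, as you would expect. To capture the output in a string instead, pass -fh a filehandle opened on a scalar reference (for example, open my $fh, '>', \my $string) in place of \*STDOUT. If you use Bio::Phylo (which I recommend), there is a to_string method (IIRC), so you don't have to create an I/O object just to print your trees; you can just do say $tree->to_string.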

Related

How to get the value after a string using perl reg expression

I have the following string :
{\"id\":01,\"start_time\":\"1477954800000\",\"stop_time\":\"1485817200000\",\"url\":http:://www.example.com\}
and I'd like to get for example the value of start_time (1477954800000).
I tried several things in https://regex101.com/ but I could not find a way to deal with the special characters (\":\") between the string and the value .
If, for example, the string were like start_time = 1477954800000,
I know that by using
start_time\":\"(\w+)/)
I'll get the value.
Any idea on how to get the value when \":\" are involved?
Your sample data looks like a stringified JSON object. If that is the case, you should use a JSON parser, not a regular expression:
#!perl
use strict;
use warnings;
use feature qw(say);
use JSON;
my $json_string = <DATA>;
chomp($json_string);
my $json_object = decode_json $json_string;
# get the value of the start_time key
say $json_object->{start_time};
# 1477954800000
__DATA__
{"id":1,"start_time":"1477954800000","stop_time":"1485817200000","url":"http://www.example.com"}
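
For comparison, here is a rough Python sketch of the same parse-don't-match approach. The helper extract_field is hypothetical, and the step that strips the backslashes assumes that the \" sequences in the question are escaping residue rather than part of the data. That naive replacement would corrupt any string value that legitimately contains an escaped quote.

```python
import json


def extract_field(text, key):
    """Parse a JSON object from text and return the value stored under key."""
    # Assumption: backslash-escaped quotes are transport residue, so undo
    # them before handing the text to the JSON parser.
    cleaned = text.replace('\\"', '"')
    return json.loads(cleaned)[key]


plain = ('{"id":1,"start_time":"1477954800000",'
         '"stop_time":"1485817200000","url":"http://www.example.com"}')
# Simulate the escaped form shown in the question.
escaped = plain.replace('"', '\\"')

start_plain = extract_field(plain, "start_time")
start_escaped = extract_field(escaped, "start_time")
```

Both calls return the start_time value as a string, because it is quoted in the source data.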

File transfer for R extension in NetLogo - filename string with backslash and quotes

I need to use the R extension in NetLogo to do some network calculations. I am creating the network in NetLogo, exporting it to a text file, having R read the text file and construct a graph and calculate properties, then getting the calculation results. The export, read, calculate and get are being controlled by NetLogo through the R extension.
However, NetLogo and R have different default working directories. Changing the directory in R breaks the connection to the extensions directory (see "R extension breaks connection to extensions directory in NetLogo"), and this is affecting my attempts to use BehaviorSpace on the model.
My new approach is to leave the R working directory alone and simply give R the full path of the exported file.
r:clear
let dir pathdir:get-model
r:eval "library(igraph)"
; read network in R (avoid bug of R change working directory)
let runstring (word "r:eval \"gg <- read_graph(file = \"" dir "\\netlogo.gml\", format = \"gml\")\"")
print runstring
run runstring
This produces the correct string to run, as shown by the output of the print statement:
r:eval "gg <- read_graph(file = "C:\Users\Jen\Desktop\Intervention Effect\netlogo.gml", format = "gml")"
But I get an error on the run runstring that this nonstandard character is not allowed. Debugging by putting my constructed string into the run command directly, I have realised it is because I am now in a string environment and have to escape ('\') all my backslashes and quotes. That is, the command that would work if directly typed or included in the NetLogo code, will not work if it is provided as a string to be run.
I haven't yet been able to construct a string to put into the line run runstring that works, even by hand. This means I don't know what the string looks like that I am trying to create. Having identified the appropriate target string, I will need code to take the variable 'dir', convert it to a string, add the various \ characters to the dir, add the various \ characters to the quotes for the rest of the command, and join it so that it runs.
Can anyone provide some bits of this to get me further along?
Update: I am still struggling with this.
I am now trying to work backwards: find a string that works, and then create it.
If I hard-code the command as follows, NetLogo closes, even though the text between the quotes does what is expected when I copy it and enter it directly into R.
let Rstring "gg <- read_graph(file = 'C:\\Users\\Jen\\Desktop\\Intervention Effect\\Networks\\netlogo.gml', format = 'gml')"
r:eval Rstring
The pathdir option ended up working. Here is example code for anyone who has a similar problem in the future.
let filename (word "Networks/netlogo" behaviorspace-run-number ".gml")
export-simple-gml filename
r:clearLocal
let dir pathdir:get-model
set filename (word dir "/" filename)
r:put "fn" filename
r:eval "gg <- read_graph(file = fn, format = 'gml')"
r:eval "V(gg)$name <- V(gg)$id" ; gml uses 'id', but igraph uses 'name'
I have a separate procedure for the actual export, which constructs a simplified gml file because the igraph import of gml format files is somewhat crippled. That's the procedure called within the code above, and the relevant piece is:
to export-simple-gml [ FN ]
carefully [ file-close-all ] [ ]
carefully [ file-delete FN ] [ ]
file-open FN
file-print <line to write to file>
...
end

Scala - proper way to print a string to file

What is the proper way to print a string - and only the string - to a file? When I try to do it the standard way known to me, i.e.:
def printToFile(o:Object,n:String) = try{
val pathToOutput = "..\\some\\parent\\directory\\"
val path = Paths.get(pathToOutput + n)
val b = new ByteArrayOutputStream()
val os = new ObjectOutputStream(b)
os.writeObject(o)
Files.write(path, b.toByteArray,
StandardOpenOption.CREATE,
StandardOpenOption.TRUNCATE_EXISTING)
}catch{
case _:Exception => println("failed to write")
}
it always seems to prepend
’ NUL ENQtSTXT
Where the part after ENQt seems to vary.
(It doesn't matter whether I declare o as an Object or a String.)
This is very annoying because I want to print a couple of .dot-Strings (Graphviz) in order to then batch-process the resulting .dot-files to .pdf-files. The prepended nonsense, however, forces me to open each .dot-file and remove it manually - which kind of defeats the purpose of batch-processing them.
This has nothing to do with Scala specifically; it's the way the Java standard library works. When you call writeObject, you write a serialized representation of the object, together with additional bytes the JVM can use to re-create that object. If you know the object is a String, type it as one (i.e., use printToFile(o: String, n: String)) and write its bytes with Files.write(path, o.getBytes, ...). Otherwise you could use o.toString.getBytes.
Generally on the JVM, if you want to write characters and not bytes, you should prefer *Writer over *OutputStream. In this case (assuming you have a File you want to write to and a String you want to write):
import java.io.{BufferedWriter, FileWriter}
val writer = new BufferedWriter(new FileWriter(file))
try {
writer.write(string)
} finally {
writer.close()
}
Or with the character-oriented overload of Files.write:
Files.write(path, Collections.singletonList(string), ...)
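
The difference between serializing an object and writing its characters can be seen in other runtimes too. The sketch below uses Python's pickle purely as an analogy for ObjectOutputStream (the byte layout is not the same as Java's), and the .dot content and file name are made up for the example.

```python
import pickle
import tempfile
from pathlib import Path

dot_source = "digraph G { a -> b; }"

# Object serialization wraps the text in extra framing bytes.
serialized = pickle.dumps(dot_source)
# Plain encoding yields exactly the characters and nothing else.
encoded = dot_source.encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "graph.dot"
    # Text-oriented write: the file holds only the string itself.
    path.write_text(dot_source, encoding="utf-8")
    on_disk = path.read_bytes()
```

The serialized form is longer than the encoded text, which is the same effect as the prepended bytes in the question. The text-oriented write leaves a file that a tool such as Graphviz can read directly.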

How do I print a hash in perl?

How do I print $stopwords? It seems to be a string ($) but when I print it I get: "HASH(0x8B694)" with the memory address changing on each run.
I am using Lingua::StopWords and I simply want to print the stop words that it's using so I know for sure what stop words are there. I would like to print these to a file.
Do I need to dereference $stopwords somehow?
Here is the code:
use Lingua::StopWords qw( getStopWords );
open(TEST, ">results_stopwords.txt") or die("Unable to open requested file.");
my $stopwords = getStopWords('en');
print $stopwords;
I've tried:
my @temp = $stopwords;
print "@temp";
But that doesn't work. Help!
Last note: I know there is a published list of stop words for Lingua::StopWords, but I am using the English (en) set and want to make absolutely sure which stop words I am using. That is why I want to print the list, ideally to a file; the file part I should already know how to do.
$ doesn't mean string. It means a scalar, which could be a string, number or reference.
$stopwords is a hash reference. To use it as a hash, you would use %$stopwords.
Use Data::Dumper as a quick way to print the contents of a hash (pass by reference):
use Data::Dumper;
...
print Dumper($stopwords);
To dereference a hashref:
my %hash = %{$hashref}; # makes a copy
So, to iterate over keys and values:
while (my ($key, $value) = each %{$hashref}) {
print "$key => $value\n";
}
Or (less efficient, but useful for didactic purposes):
for my $key (keys %{$hashref}) {
print "$key => $hashref->{$key}\n";
}
Have a look at Data::Printer as a nice alternative to Data::Dumper. It will give you pretty-printed output as well as information on methods which the object provides (if you're printing an object). So, whenever you don't know what you've got:
use Data::Printer;
p( $some_thing );
You'll be surprised at how handy it is.
getStopWords returns a hashref — a reference to a hash — so you would dereference it by prepending %. And you actually only want its keys, not its values (which are all 1), so you would use the keys function. For example:
print "$_\n" foreach keys %$stopwords;
or
print join(' ', keys %$stopwords), "\n";
You can also skip the temporary variable $stopwords, but then you need to wrap the getStopWords call in curly-brackets {...} so Perl can tell what's going on:
print join(' ', keys %{getStopWords('en')}), "\n";

MATLAB iterative filenames for saving

This question is about MATLAB:
I'm running a loop in which each iteration produces a new set of data, and I want it saved in a new file each time. I also overwrite old files by changing the name. It looks like this:
name_each_iter = strrep(some_source,'.string.mat','string_new.(j).mat')
and what I'm struggling with here is the iteration, so that I obtain files:
...string_new.1.mat
...string_new.2.mat
etc.
I tried various combinations of (), [] and {}, as well as 'string_new.'j'.mat' (which gave a syntax error).
How can it be done?
Strings are just vectors of characters. So if you want to iteratively create filenames here's an example of how you would do it:
for j = 1:10,
filename = ['string_new.' num2str(j) '.mat'];
disp(filename)
end
The above code will create the following output:
string_new.1.mat
string_new.2.mat
string_new.3.mat
string_new.4.mat
string_new.5.mat
string_new.6.mat
string_new.7.mat
string_new.8.mat
string_new.9.mat
string_new.10.mat
You could also generate all file names in advance using NUM2STR:
>> filenames = cellstr(num2str((1:10)','string_new.%02d.mat'))
filenames =
'string_new.01.mat'
'string_new.02.mat'
'string_new.03.mat'
'string_new.04.mat'
'string_new.05.mat'
'string_new.06.mat'
'string_new.07.mat'
'string_new.08.mat'
'string_new.09.mat'
'string_new.10.mat'
Now access the cell array contents as filenames{i} in each iteration.
sprintf is very useful for this:
for ii=5:12
filename = sprintf('data_%02d.mat',ii)
end
This assigns the following strings to filename:
data_05.mat
data_06.mat
data_07.mat
data_08.mat
data_09.mat
data_10.mat
data_11.mat
data_12.mat
Notice the zero padding. sprintf is useful in general whenever you want parameterized, formatted strings.
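
For readers coming from other languages, the %02d idea carries over almost unchanged. A small Python sketch that produces the same zero-padded names (the data_ prefix is taken from the loop above):

```python
# Build zero-padded file names for indices 5 through 12 (inclusive).
filenames = [f"data_{index:02d}.mat" for index in range(5, 13)]
```

The :02d format spec pads to two digits, so the names sort correctly as text up to index 99.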
To create a name based on an already existing file, you can use regexp to detect the '_new.(number).mat' suffix and change the string depending on what regexp finds:
original_filename = 'data.string.mat';
im = regexp(original_filename,'_new\.\d+\.mat'); % start index of an existing '_new.<n>.mat' suffix; dots escaped so they match literally
if isempty(im) % original file, no _new.(j) detected
newname = [original_filename(1:end-4) '_new.1.mat'];
else
num = str2double(original_filename(im(end)+5:end-4));
newname = sprintf('%s_new.%d.mat',original_filename(1:im(end)-1),num+1);
end
This does exactly that, and produces:
data.string_new.1.mat
data.string_new.2.mat
data.string_new.3.mat
...
data.string_new.9.mat
data.string_new.10.mat
data.string_new.11.mat
when the above code is applied repeatedly, starting with 'data.string.mat'.
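
The same start-or-increment logic can be sketched outside MATLAB as well. The Python version below is only an illustration: next_filename is a hypothetical helper, and it assumes the counter always appears as a trailing '_new.<n>.mat' suffix.

```python
import re

# Matches a trailing counter suffix such as "_new.7.mat".
_SUFFIX = re.compile(r"_new\.(\d+)\.mat$")


def next_filename(name):
    """Return name with its '_new.<n>.mat' counter started at 1 or incremented."""
    match = _SUFFIX.search(name)
    if match is None:
        # Original file: strip '.mat' (if present) and start the counter at 1.
        stem = name[:-len(".mat")] if name.endswith(".mat") else name
        return f"{stem}_new.1.mat"
    counter = int(match.group(1)) + 1
    return f"{name[:match.start()]}_new.{counter}.mat"


names = []
current = "data.string.mat"
for _ in range(3):
    current = next_filename(current)
    names.append(current)
```

Anchoring the pattern at the end of the name and escaping the dots keeps it from matching a similar-looking fragment elsewhere in the file name.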
