I am writing a snakemake that will run a bioinformatics pipeline for several input samples. These input files (two for each analysis, one with the partial string match R1 and the second with the partial string match R2) start with a pattern and end with the extension .fastq.gz. Eventually I want to perform multiple operations, though, for this example I just want to align the fastq reads against a reference genome using bwa mem. So for this example my input file is NIPT-N2002394-LL_S19_R1_001.fastq.gz and I want to generate NIPT-N2002394-LL.bam (see code below specifying the directories where input and output are).
My config.yaml file looks like so:
# Run_ID
run: "200311_A00154_0454_AHHHKMDRXX"
# Base directory: the analysis directory from which I will fetch the samples
bd: "/nexusb/nipt/"
# Define the prefix
# will be used to subset the folders in bd
prefix: "NIPT"
# Reference:
ref: "/nexus/bhinckel/19/ONT_projects/PGD_breakpoint/ref_hg19_local/hg19_chr1-y.fasta"
And below is my snakefile
import os
import re
#############
# config file
#############
configfile: "config.yaml"
#######################################
# Parsing variables from config.yaml
#######################################
RUN = config['run']
BD = config['bd']
PREFIX = config['prefix']
FQDIR = f'/nexusb/Novaseq/{RUN}/Unaligned/'
BASEDIR = BD + RUN
SAMPLES = [sample for sample in os.listdir(BASEDIR) if sample.startswith(PREFIX)]
# explanation: in BASEDIR I have multiple subdirectories. The names of the subdirectories starting with PREFIX will be the name of the elements I want to have in the list SAMPLES, which eventually shall be my {sample} wildcard
#############
# RULES
#############
rule all:
input:
expand("aligned/{sample}.bam", sample = SAMPLES)
rule bwa_map:
input:
REF = config['ref'],
R1 = FQDIR + "{sample}_S{s}_R1_001.fastq.gz",
R2 = FQDIR + "{sample}_S{s}_R2_001.fastq.gz"
output:
"aligned/{sample}.bam"
shell:
"bwa mem {input.REF} {input.R1} {input.R2}| samtools view -Sb - > {output}"
But I am getting:
Building DAG of jobs...
WildcardError in line 55 of /nexusb/nipt/200311_A00154_0454_AHHHKMDRXX/testMetrics/snakemake/Snakefile:
Wildcards in input files cannot be determined from output files:
's'
When calling snakemake -np
I believe my error lies in the definitions of R1 and R2 in the input directive. I find it puzzling because according to the official documentation snakemake should interpret any wildcard as the regex .+. But it is not doing that for sample NIPT-PearlPPlasma-05-PPx, whose R1 and R2 should be NIPT-PearlPPlasma-05-PPx_S5_R1_001.fastq.gz and NIPT-PearlPPlasma-05-PPx_S5_R2_001.fastq.gz, respectively.
Take a look again at the snakemake tutorial on how input is inferred from output, anyways I think the problem lies in this piece of code:
output:
expand("aligned/{sample}.bam", sample = SAMPLES)
And needs to be changed into
output:
"aligned/{sample}.bam"
What you had didn't work because before expand("aligned/{sample}.bam", sample = SAMPLES) basically becomes a list like this ["aligned/sample0.bam","aligned/sample1.bam"]. When you remove the expand, you only give a "description" of how the output should look like, and thus snakemake can infer the wildcards and input.
edit:
It's difficult to test it since I don't have the actual files, but you should do something like this. Won't work if multiple S-thingies exist.
def get_reads(wildcards):
R1 = FQDIR + f"{wildcards.sample}_S{{s}}_R1_001.fastq.gz"
R2 = FQDIR + f"{wildcards.sample}_S{{s}}_R2_001.fastq.gz"
globbed = glob_wildcards(R1)
R1, R2 = expand([R1, R2], s=globbed.s)
return {"R1": R1, "R2": R2}
rule bwa_map:
input:
unpack(get_reads),
REF = config['ref']
output:
"aligned/{sample}.bam"
shell:
"bwa mem {input.REF} {input.R1} {input.R2}| samtools view -Sb - > {output}"
The problem is here:
rule bwa_map:
input:
REF = config['ref'],
R1 = FQDIR + "{sample}_S{s}_R1_001.fastq.gz",
R2 = FQDIR + "{sample}_S{s}_R2_001.fastq.gz"
output:
"aligned/{sample}.bam"
Your output clearly defines a pattern where {sample} is a wildcard. When Snakemake builds the DAG and finds that any other rule requires a file that matches this pattern, it sets a concrete value to the wildcard.sample. At this moment all the inputs shall be defined, but you are introducing one more level of indirection: the wildcard {s} which is not defined.
The value of {s} shall be clearly inferred from the output. If you can do it in design time, substitute the it with the concrete values, otherwise you may use checkpoint feature of Snakemake.
I have groovy code as below:
def randomInt = RandomUtil.getRandomInt(1,200);
log.info randomInt
def chars = (("1".."9") + ("A".."Z") + ("a".."z")).join()
def randomString = RandomUtil.getRandomString(chars, randomInt) //works well with this code
log.info randomString
evaluate("log.info new Date()")
evaluate('RandomUtil.getRandomString(chars, randomInt)') //got error with this code
I want to evaluate a String which one like a {classname}.{methodname} in SoapUI with Groovy, just like above, but got error here, how to handle this and make it works well as I expect?
I have tried as blew:
evaluate('RandomUtil.getRandomString(chars, randomInt)') //got error with this code
Error As below:
Thu May 23 22:26:30 CST 2019:ERROR:An error occurred [No such property: getRandomString(chars, randomInt) for class: com.hypers.test.apitest.util.RandomUtil], see error log for details
The following code:
log = [info: { println(it) }]
class RandomUtil {
static def random = new Random()
static int getRandomInt(int from, int to) {
from + random.nextInt(to - from)
}
static String getRandomString(alphabet, len) {
def s = alphabet.size()
(1..len).collect { alphabet[random.nextInt(s)] }.join()
}
}
randomInt = RandomUtil.getRandomInt(1, 200)
log.info randomInt
chars = ('a'..'z') + ('A'..'Z') + ('0'..'9')
def randomString = RandomUtil.getRandomString(chars, 10) //works well with this code
log.info randomString
evaluate("log.info new Date()")
evaluate('RandomUtil.getRandomString(chars, randomInt)') //got error with this code
emulates your code, works, and produces the following output when run:
~> groovy solution.groovy
70
DDSQi27PYG
Thu May 23 20:51:58 CEST 2019
~>
I made up the RandomUtil class as you did not include the code for it.
I think the reason you are seeing the error you are seeing is that you define your variables char and randomInt using:
def chars = ...
and
def randomInt = ...
this puts the variables in local script scope. Please see this stackoverflow answer for an explanation with links to documentation of different ways of putting things in the script global scope and an explanation of how this works.
Essentially your groovy script code is implicitly an instance of the groovy Script class which in turn has an implicit Binding instance associated with it. When you write def x = ..., your variable is locally scoped, when you write x = ... or binding.x = ... the variable is defined in the script binding.
The evaluate method uses the same binding as the implicit script object. So the reason my example above works is that I omitted the def and just typed chars = and randomInt = which puts the variables in the script binding thus making them available for the code in the evaluate expression.
Though I have to say that even with all that, the phrasing No such property: getRandomString(chars, randomInt) seems really strange to me...I would have expected No such method or No such property: chars etc.
Sharing the code for for RandomUtil might help here.
I'm trying to create a field mapping to map fields from user-friendly names to member variables in a variety of domain objects. The larger context is that I'm building up an ElasticSearch query based on user-constructed rules stored in a database, but for the sake of MCVE:
class MyClass {
Integer amount = 123
}
target = new MyClass()
println "${target.amount}"
fieldMapping = [
'TUITION' : 'target.amount'
]
fieldName = 'TUITION'
valueSource = '${' + "${fieldMapping[fieldName]}" + '}'
println valueSource
value = Eval.me('valueSource')
The Eval fails. Here's the output:
123
${target.amount}
Caught: groovy.lang.MissingPropertyException: No such property: valueSource for class: Script1
groovy.lang.MissingPropertyException: No such property: valueSource for class: Script1
at Script1.run(Script1.groovy:1)
at t.run(t.groovy:17)
What's necessary to evaluate the generated variable name and return the value 123? It seems like the real problem is that it's not recognizing that valueSource has been defined, not the actual expression held in valueSource, but that could be wring, too.
You're almost there, but you need to use a slightly different mechanism: the GroovyShell. You can instantiate a GroovyShell and use it to evaluate a String as a script, returning the result. Here's your example, modified to work properly:
class MyClass {
Integer amount = 123
}
target = new MyClass()
fieldMapping = [
'TUITION' : 'target.amount'
]
fieldName = 'TUITION'
// These are the values made available to the script through the Binding
args = [target: target]
// Create the shell with the binding as a parameter
shell = new GroovyShell(args as Binding)
// Evaluate the "script", which in this case is just the string "target.amount".
// Inside the shell, "target" is available because you added it to the shell's binding.
result = shell.evaluate(fieldMapping[fieldName])
assert result == 123
assert result instanceof Integer
SOLVED ALREADY --> See edit 7
At this moment I'm fairly new on Perl, and trying to modify part of an existing page (in Wonderdesk).
The way the page works, is that it gets the information from the GET url and parses it to an SQL query.
Since this is part of a much larger system, I'm not able to modify the coding around it, and have to solve it in this script.
A working test I performed:
$input->{help_id} = ['33450','31976'];
When running this, the query that is being build returns something as
select * from table where help_id in(33450,31976)
The part of my code that does not work as expected:
my $callIDs = '33450,31450';
my #callIDs = split(/,/,$callIDs);
my $callIDsearch = \#callIDs;
$input->{help_id} = $callIDsearch;
When running this, the query that is being build returns something as
select * from table where help_id = '33450,31976'
I've tried to debug it, and used Data::Dumper to get the result of $callIDsearch, which appears as [33450, 31450] in my browser.
Can someone give me a hint on how to transform from '123,456' into ['123', '456']?
With kind regards,
Marcel
--===--
Edit:
As requested, minimal code piece that works:
$input->{help_id} = ['123','456']
Code that does not work:
$str = '123,456';
#ids = split(/,/,$str);
$input->{help_id} = \#ids;
--===--
Edit 2:
Source of the question:
The following part of the code is responsible for getting the correct information from the database:
my $input = $IN->get_hash;
my $db = $DB->table('help_desk');
foreach (keys %$input){
if (/^corr/ and !/-opt$/ and $input->{$_} or $input->{keyword}){
$db = $DB->table('help_desk','correspondence');
$input->{rs} = 'DISTINCT help_id,help_name,help_email,help_datetime,help_subject,help_website,help_category,
help_priority,help_status,help_emergency_flag,help_cus_id_fk,help_tech,help_attach';
$input->{left_join} = 1;
last;
}
}
# Do the search
my $sth = $db->query_sth($input);
my $hits = $db->hits;
Now instead of being able to provide a single parameter help_id, I want to be able to provide multiple parameters.
--===--
Edit 3:
query_sth is either of the following two, have not been able to find it out yet:
$COMPILE{query} = __LINE__ . <<'END_OF_SUB';
sub query {
# -----------------------------------------------------------
# $obj->query($HASH or $CGI);
# ----------------------------
# Performs a query based on the options in the hash.
# $HASH can be a hash ref, hash or CGI object.
#
# Returns the result of a query as fetchall_arrayref.
#
my $self = shift;
my $sth = $self->_query(#_) or return;
return $sth->fetchall_arrayref;
}
END_OF_SUB
$COMPILE{query_sth} = __LINE__ . <<'END_OF_SUB';
sub query_sth {
# -----------------------------------------------------------
# $obj->query_sth($HASH or $CGI);
# --------------------------------
# Same as query but returns the sth object.
#
shift->_query(#_)
}
END_OF_SUB
Or
$COMPILE{query} = __LINE__ . <<'END_OF_SUB';
sub query {
# -------------------------------------------------------------------
# Just performs the query and returns a fetchall.
#
return shift->_query(#_)->fetchall_arrayref;
}
END_OF_SUB
$COMPILE{query_sth} = __LINE__ . <<'END_OF_SUB';
sub query_sth {
# -------------------------------------------------------------------
# Just performs the query and returns an active sth.
#
return shift->_query(#_);
}
END_OF_SUB
--===--
Edit 4: _query
$COMPILE{_query} = __LINE__ . <<'END_OF_SUB';
sub _query {
# -------------------------------------------------------------------
# Parses the input, and runs a select based on input.
#
my $self = shift;
my $opts = $self->common_param(#_) or return $self->fatal(BADARGS => 'Usage: $obj->insert(HASH or HASH_REF or CGI) only.');
$self->name or return $self->fatal('NOTABLE');
# Clear errors.
$self->{_error} = [];
# Strip out values that are empty or blank (as query is generally derived from
# cgi input).
my %input = map { $_ => $opts->{$_} } grep { defined $opts->{$_} and $opts->{$_} !~ /^\s*$/ } keys %$opts;
$opts = \%input;
# If build_query_cond returns a GT::SQL::Search object, then we are done.
my $cond = $self->build_query_cond($opts, $self->{schema}->{cols});
if ( ( ref $cond ) =~ /(?:DBI::st|::STH)$/i ) {
return $cond;
}
# If we have a callback, then we get all the results as a hash, send them
# to the callback, and then do the regular query on the remaining set.
if (defined $opts->{callback} and (ref $opts->{callback} eq 'CODE')) {
my $pk = $self->{schema}->{pk}->[0];
my $sth = $self->select($pk, $cond) or return;
my %res = map { $_ => 1 } $sth->fetchall_list;
my $new_results = $opts->{callback}->($self, \%res);
$cond = GT::SQL::Condition->new($pk, 'IN', [keys %$new_results]);
}
# Set the limit clause, defaults to 25, set to -1 for none.
my $in = $self->_get_search_opts($opts);
my $offset = ($in->{nh} - 1) * $in->{mh};
$self->select_options("ORDER BY $in->{sb} $in->{so}") if ($in->{sb});
$self->select_options("LIMIT $in->{mh} OFFSET $offset") unless $in->{mh} == -1;
# Now do the select.
my #sel = ();
if ($cond) { push #sel, $cond }
if ($opts->{rs} and $cond) { push #sel, $opts->{rs} }
my $sth = $self->select(#sel) or return;
return $sth;
}
END_OF_SUB
--===--
Edit 5: I've uploaded the SQL module that is used:
https://www.dropbox.com/s/yz0bq8ch8kdgyl6/SQL.zip
--===--
Edit 6:
On request, the dumps (trimmed to only include the sections for help_id):
The result of the modification in Base.pm for the non-working code:
$VAR1 = [
33450,
31450
];
The result of the modification in Condition.pm for the non-working code:
$VAR1 = [
"help_id",
"IN",
[
33450,
31450
]
];
$VAR1 = [
"cus_username",
"=",
"Someone"
];
$VAR1 = [
"help_id",
"=",
"33450,31450"
];
The result for the modification in Base.pm for the working code:
$VAR1 = [
33450,
31976
];
The result for the modification in Condition.pm for the working code:
$VAR1 = [
"help_id",
"IN",
[
33450,
31976
]
];
It looks as if the value gets changed afterwards somehow :S
All I changed for the working/non-working code was to replace:
$input->{help_id} = ['33450','31976'];
With:
$input->{help_id} = [ split(/,/,'33450,31450') ];
--===--
Edit 7:
After reading all the tips, I decided to start over and found that by writing some logs to files, I could break down into the issue with more details.
I'm still not sure why, but it now works, using the same methods as before. I think it's a typo/glitch/bug in my code somewhere..
Sorry to have bothered you all, but I still recommend the points to go to amon due to his tips providing the breakthrough.
I don't have an answer, but I have found a few critical points where we need to know what is going on.
In build_query_cond (Base.pm line 528), an array argument will be transformed into an key in (...) relation:
if (ref($opts->{$field}) eq 'ARRAY' ) {
my $add = [];
for ( #{$opts->{$field}} ) {
next if !defined( $_ ) or !length( $_ ) or !/\S/;
push #$add, $_;
}
if ( #$add ) {
push #ins, [$field, 'IN', $add];
}
}
Interesting bit in sql (Condition.pm line 181). Even if there is an arrayref, an IN test will be simplified to an = test if it contains only a single element.
if (uc $op eq 'IN' || $op eq '=' and ref $val eq 'ARRAY') {
if (#$val > 1) {
$op = 'IN';
$val = '('
. join(',' => map !length || /\D/ ? quote($_) : $_, #$val)
. ')';
}
elsif (#$val == 0) {
($col, $op, $val) = (qw(1 = 0));
}
else {
$op = '=';
$val = quote($val->[0]);
}
push #output, "$col $op $val";
}
Before these two conditions, it would be interesting to insert the following code:
Carp::cluck(Data::Dumper::Dump(...));
where ... is $opts->{$field} in the first snippet or $cond in the second snippet. The resulting stack trace would allow us to find all subroutines which could have modified the value. For this to work, the following code has to be placed in your main script before starting the query:
use Carp ();
use Data::Dumper;
$Data::Dumper::Useqq = 1; # escape special characters
Once the code has been modified like this, run both the working and not-working code, and print out the resulting query with
print Dumper($result);
So for each of your code snippets, we should get two stack traces and one resulting SQL query.
A shot in the dark... there's a temporary array #callIDs created by this code:
my #callIDs = split(/,/,$callIDs);
my $callIDsearch = \#callIDs;
$input->{help_id} = $callIDsearch;
If some other part of your code modifies #callIDs, even after it's been assigned to $input->{help_id}, that could cause problems. Of course the fact that it's a lexical (my) variable means that any such changes to #callIDs are probably "nearby".
You could eliminate the named temporary array by doing the split like this:
$input->{help_id} = [ split(/,/,$callIDs) ];
I'm not sure I understand exactly why this is happening. It seems that your query builder needs an arrayref of strings. You can use map to do that
my $callIDs = '33450,31450';
my #callIDs = map {$_*1} split(/,/,$callIDs);
$input->{help_id} = \#callIDs;
This code should work
my $callIDs = '33450,31450';
$input->{help_id} = [split ",", $callIDs];
If your code somehow detect your data is number you can use
my $callIDs = '33450,31450';
$input->{help_id} = [map 0+$_, split ',', $callIDs];
If it somehow become number and you need string instead which should not in this case but advice for future work:
my $callIDs = '33450,31450';
$input->{help_id} = [map ''.$_, split ',', $callIDs];
I worked through the Tutorials at eclipse.org/Xtext/documentation and get into expanding these samples. Working with the Domainmodel.xtext sample I generate a Java-Classfile for each entity as stated in the Tut.
The DSL specifies an arbitry number of features, aka class properties:
Entity:
'entity' name = ID
('extends' superType = [Entity | QualifiedName])?
'{'
(features += Feature)*
'}'
;
In DomainmodelGenerator.xtend then I added code to generate a JAVA-classconstructor. The XTEND-Forloop cycles through all arguements - looks like this:
def compile_Constructors(Entity e) '''
public «e.name.toFirstUpper»
(
«FOR f : e.features»
«f.type.fullyQualifiedName» «f.name.toFirstUpper»,
«ENDFOR»
)
{}
'''
Problem
With this the last parameter there is still a comma emitted. How can I get control in XTEND over the loopindex, to make the generator to emit legal JAVA code?
The «FOR» loop has some options which are quite handy:
BEFORE string
SEPARATOR string
AFTER string
These allows you to emit additional strings before, between and after items. If there are no items (empty list) none of them is emitted.
So in your case just use
«FOR f : e.features SEPARATOR ', '»
How about:
def compile_Constructors(Entity e) '''
public «e.name.toFirstUpper»
(
«e.features.map[type.fullyQualifiedName + ' ' + name.toFirstUpper].join(', ')»
)
{}
'''