Nextflow with Azure Batch - Cannot find a matching VM image - azure

While trying to set up Nextflow with Azure Batch (nf-core), I am getting the following error. I tried this on multiple workflows (sarek, atacseq, etc.) and I get the same error -
N E X T F L O W ~ version 22.04.0
Pulling nf-core/atacseq ...
downloaded from https://github.com/nf-core/atacseq.git
Launching `https://github.com/nf-core/atacseq` [rhl6d5529] DSL1 - revision: 1b3a832db5 [1.2.1]
Downloading plugin nf-azure@0.13.1
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
nf-core/atacseq v1.2.1
----------------------------------------------------
Run Name : rhl6d5529
Data Type : Paired-End
Design File : https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/design.csv
Genome : Not supplied
Fasta File : https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/reference/genome.fa
GTF File : https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/reference/genes.gtf
Mitochondrial Contig : MT
MACS2 Genome Size : 1.2E+7
Min Consensus Reps : 1
MACS2 Narrow Peaks : No
MACS2 Broad Cutoff : 0.1
Trim R1 : 0 bp
Trim R2 : 0 bp
Trim 3' R1 : 0 bp
Trim 3' R2 : 0 bp
NextSeq Trim : 0 bp
Fingerprint Bins : 100
Save Genome Index : No
Max Resources : 6 GB memory, 2 cpus, 12h time per job
Container : docker - nfcore/atacseq:1.2.1
Output Dir : ./results
Launch Dir : /
Working Dir : /nextflow/atacseq/rhl6d5529
Script Dir : /.nextflow/assets/nf-core/atacseq
User : root
Config Profile : test,azurebatch
Config Description : Minimal test dataset to check pipeline function
Config Contact : Venkat Malladi (@vsmalladi)
Config URL : https://azure.microsoft.com/services/batch/
----------------------------------------------------
Uploading local `bin` scripts folder to az://nextflow/atacseq/rhl6d5529/tmp/66/bd55d79e42999df38ba04a81c3aa04/bin
[- ] process > CHECK_DESIGN -
[- ] process > CHECK_DESIGN [ 0%] 0 of 1
[- ] process > CHECK_DESIGN [ 0%] 0 of 1
Error executing process > 'CHECK_DESIGN (design.csv)'
Caused by:
Cannot find a matching VM image with publisher=microsoft-azure-batch; offer=centos-container; OS type=linux; verification type=verified
[58/55b7f7] process > CHECK_DESIGN (design.csv) [100%] 1 of 1, failed: 1
I tried looking into the Nextflow source code and found the error to be raised in AzBatchService.groovy (line linked below).
https://github.com/nextflow-io/nextflow/blob/0e593e6ab82880810d8139a4fe6e3c47ff69a531/plugins/nf-azure/src/main/nextflow/cloud/azure/batch/AzBatchService.groovy#L442
I did some further digging in my Azure Batch account. I wanted to confirm that the list of supported images returned by the Azure Batch service includes the one this pipeline requires, and I could confirm that the server did indeed respond with the required image.
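(For anyone who wants to reproduce that check from the command line, a sketch using the Azure CLI; the resource group and account names are placeholders:)
az batch account login -g <resource-group> -n <batch-account>
az batch pool supported-images list \
    --query "[?imageReference.offer=='centos-container'].[imageReference.publisher, imageReference.sku, nodeAgentSkuId]" \
    --output table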
What could be the issue here? I remember running the exact same pipeline a few weeks back and it did work a few times. Am I missing something?

Just had another look through the Azure Cloud docs and think this might be relevant:
By default, Nextflow creates CentOS 8-based pool nodes, but this behavior can be customised in the pool configuration. Below are the configurations for image reference/SKU combinations to select two popular systems.
Ubuntu 20.04:
sku = "batch.node.ubuntu 20.04"
offer = "ubuntu-server-container"
publisher = "microsoft-azure-batch"
CentOS 8 (default):
sku = "batch.node.centos 8"
offer = "centos-container"
publisher = "microsoft-azure-batch"
I think the issue here is a mismatched nodeAgentSkuId. Nextflow is expecting a CentOS 8 node agent SKU, but you have a CentOS 7 SKU. If it's not possible to change the nodeAgentSkuId itself, you should be able to override the node agent SKU that Nextflow uses by adding this to your nextflow.config:
azure.batch.pools.<name>.sku = 'batch.node.centos 7'
Where <name> is the pool identifier:
azure.batch.pools.<name>.sku
Specify the ID of the Compute Node agent SKU which the pool identified with <name> supports (default: batch.node.centos 8, requires nf-azure@0.11.0).
https://www.nextflow.io/docs/edge/azure.html#advanced-settings
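For reference, a minimal nextflow.config sketch combining the three settings (the pool name mypool is hypothetical; use the pool identifier from your own setup, and keep sku/offer/publisher consistent with one of the combinations above):
// sketch only: 'mypool' is a hypothetical pool name
azure.batch.pools.mypool.sku       = 'batch.node.centos 7'
azure.batch.pools.mypool.offer     = 'centos-container'
azure.batch.pools.mypool.publisher = 'microsoft-azure-batch'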

Related

Yocto bitbake core-image-sato with preempt-rt failed

I want to set up a Linux kernel with preempt-RT using Yocto.
According to meta/recipes-rt/README, I added the following code to build/conf/local.conf and ran bitbake core-image-sato, but bitbake fails.
MACHINE ?= "genericx86-64"
PREFERRED_PROVIDER_virtual/kernel = "linux-yocto-rt"
COMPATIBLE_MACHINE_genericx86-64 = "genericx86-64"
COMPATIBLE_MACHINE_quilt-native = "genericx86-64"
Yocto outputs the following error:
NOTE: Bitbake server didn't start within 5 seconds, waiting for 90
Loading cache: 100% |#######################################################################################################################################################################| Time: 0:00:13
Loaded 1330 entries from dependency cache.
NOTE: Resolving any missing task queue dependencies
Build Configuration:
BB_VERSION = "1.46.0"
BUILD_SYS = "x86_64-linux"
NATIVELSBSTRING = "universal"
TARGET_SYS = "x86_64-poky-linux"
MACHINE = "genericx86-64"
DISTRO = "poky"
DISTRO_VERSION = "3.1.20"
TUNE_FEATURES = "m64 core2"
TARGET_FPU = ""
meta
meta-poky
meta-yocto-bsp = "dunfell:90a6f6a110ab14890e2f6a1616e74ee259fc0f8f"
Initialising tasks: 100% |##################################################################################################################################################################| Time: 0:00:48
Sstate summary: Wanted 14 Found 0 Missed 14 Current 1203 (0% match, 98% complete)
NOTE: Executing Tasks
ERROR: linux-yocto-rt-5.4.213+gitAUTOINC+2f18e629f7_03cd66d981-r0 do_kernel_metadata: Could not locate BSP definition for genericx86-64/preempt-rt and no defconfig was provided
ERROR: linux-yocto-rt-5.4.213+gitAUTOINC+2f18e629f7_03cd66d981-r0 do_kernel_metadata: Execution of '/media/fff/disk1T/yocto/demo3/poky/build/tmp/work/genericx86_64-poky-linux/linux-yocto-rt/5.4.213+gitAUTOINC+2f18e629f7_03cd66d981-r0/temp/run.do_kernel_metadata.1138429' failed with exit code 1
ERROR: Logfile of failure stored in: /media/fff/disk1T/yocto/demo3/poky/build/tmp/work/genericx86_64-poky-linux/linux-yocto-rt/5.4.213+gitAUTOINC+2f18e629f7_03cd66d981-r0/temp/log.do_kernel_metadata.1138429
Log data follows:
| DEBUG: Executing python function extend_recipe_sysroot
| NOTE: Direct dependencies are ['/media/fff/disk1T/yocto/demo3/poky/meta/recipes-kernel/kern-tools/kern-tools-native_git.bb:do_populate_sysroot']
| NOTE: Installed into sysroot: []
| NOTE: Skipping as already exists in sysroot: ['kern-tools-native', 'quilt-native']
| DEBUG: Python function extend_recipe_sysroot finished
| DEBUG: Executing shell function do_kernel_metadata
| NOTE: do_kernel_metadata: for summary/debug, set KCONF_AUDIT_LEVEL > 0
| ERROR: Could not locate BSP definition for genericx86-64/preempt-rt and no defconfig was provided
| WARNING: exit code 1 from a shell command.
| ERROR: Execution of '/media/fff/disk1T/yocto/demo3/poky/build/tmp/work/genericx86_64-poky-linux/linux-yocto-rt/5.4.213+gitAUTOINC+2f18e629f7_03cd66d981-r0/temp/run.do_kernel_metadata.1138429' failed with exit code 1
ERROR: Task (/media/fff/disk1T/yocto/demo3/poky/meta/recipes-kernel/linux/linux-yocto-rt_5.4.bb:do_kernel_metadata) failed with exit code '1'
NOTE: Tasks Summary: Attempted 3173 tasks of which 3172 didn't need to be rerun and 1 failed.
Summary: 1 task failed:
/media/fff/disk1T/yocto/demo3/poky/meta/recipes-kernel/linux/linux-yocto-rt_5.4.bb:do_kernel_metadata
Summary: There were 2 ERROR messages shown, returning a non-zero exit code.
My CPU is an x86-64 core. The error hints that genericx86-64/preempt-rt does not exist. What should I do to generate core-image-sato with preempt-RT? Please leave a comment and help me if you are familiar with this problem.
I tried checking out various branches of Yocto (dunfell, langdale, kirkstone) but it makes no difference. I expect to use dunfell.
Following is my build/conf/bblayers.conf:
POKY_BBLAYERS_CONF_VERSION = "2"
BBPATH = "${TOPDIR}"
BBFILES ?= ""
BBLAYERS ?= " \
/media/fff/disk1T/yocto/demo3/poky/meta \
/media/fff/disk1T/yocto/demo3/poky/meta-poky \
/media/fff/disk1T/yocto/demo3/poky/meta-yocto-bsp \
"

snakemake allocates memory twice

I am noticing that all my rules request memory twice, once at a lower maximum than what I requested (mem_mb) and then what I actually requested (mem_gb). If I run the rules as localrules they do run faster. How can I make sure the default settings do not interfere?
resources: mem_mb=100, disk_mb=8620, tmpdir=/tmp/pop071.54835, partition=h24, qos=normal, mem_gb=100, time=120:00:00
The rules are as follows:
rule bwa_mem2_mem:
    input:
        R1 = "data/results/qc/{species}.{population}.{individual}_1.fq.gz",
        R2 = "data/results/qc/{species}.{population}.{individual}_2.fq.gz",
        R1_unp = "data/results/qc/{species}.{population}.{individual}_1_unp.fq.gz",
        R2_unp = "data/results/qc/{species}.{population}.{individual}_2_unp.fq.gz",
        idx = "data/results/genome/genome",
        ref = "data/results/genome/genome.fa"
    output:
        bam = "data/results/mapped_reads/{species}.{population}.{individual}.bam",
    log:
        bwa = "logs/bwa_mem2/{species}.{population}.{individual}.log",
        sam = "logs/samtools_view/{species}.{population}.{individual}.log",
    benchmark:
        "benchmark/bwa_mem2_mem/{species}.{population}.{individual}.tsv",
    resources:
        time = parameters["bwa_mem2"]["time"],
        mem_gb = parameters["bwa_mem2"]["mem_gb"],
    params:
        extra = parameters["bwa_mem2"]["extra"],
        tag = compose_rg_tag,
    threads:
        parameters["bwa_mem2"]["threads"],
    shell:
        "bwa-mem2 mem -t {threads} -R '{params.tag}' {params.extra} {input.idx} {input.R1} {input.R2} | "
        "samtools sort -l 9 -o {output.bam} --reference {input.ref} --output-fmt CRAM -@ {threads} /dev/stdin 2> {log.sam}"
and the config is:
cluster:
  mkdir -p logs/{rule} && # change the log file to logs/slurm/{rule}
  sbatch
    --partition={resources.partition}
    --time={resources.time}
    --qos={resources.qos}
    --cpus-per-task={threads}
    --mem={resources.mem_gb}
    --job-name=smk-{rule}-{wildcards}
    --output=logs/{rule}/{rule}-{wildcards}-%j.out
    --parsable # Required to pass job IDs to scancel
default-resources:
  - partition=h24
  - qos=normal
  - mem_gb=100
  - time="04:00:00"
restart-times: 3
max-jobs-per-second: 10
max-status-checks-per-second: 1
local-cores: 1
latency-wait: 60
jobs: 100
keep-going: True
rerun-incomplete: True
printshellcmds: True
use-conda: True # Required to run with a local conda environment
scheduler: greedy
cluster-status: status-sacct.sh # Required to monitor the status of the submitted jobs
cluster-cancel: scancel # Required to cancel the jobs with Ctrl + C
cluster-cancel-nargs: 50
Cheers,
Angel
Right now there are two separate memory resource requirements:
mem_mb
mem_gb
From the perspective of Snakemake these are different resources, so both will be passed to the cluster. A quick fix is to use the same units everywhere; e.g., if the job really requires only 100 MB, then the default resource should be changed to:
default-resources:
  - partition=h24
  - qos=normal
  - mem_mb=100
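Alternatively, keep the rule itself in megabytes. A sketch (assuming, as in the question, that parameters["bwa_mem2"]["mem_gb"] stores a number of gigabytes):
    resources:
        time = parameters["bwa_mem2"]["time"],
        mem_mb = parameters["bwa_mem2"]["mem_gb"] * 1024,  # convert the configured GB to MB
and submit with the matching placeholder in the cluster command:
    --mem={resources.mem_mb}
This way Snakemake tracks a single memory resource and SLURM receives it in its default unit (MB).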

Fail2ban custom regular expression

Trying to add a custom regular expression rule in order to block the log line below.
Mar 17 18:46:52 s21409974 named[1577]: client @0x7f246c107030 1.1.1.1#8523 (.): query (cache) './ANY/IN' denied
I tried online tools like this one (https://www.regextester.com), but the fail2ban-regex test command shows the line as missed.
Any suggestions about the rule, or about how to troubleshoot this better?
Thanks in advance
Why do you try to write a custom regex? This message matches the stock fail2ban filter named-refused quite well:
$ msg="Mar 17 18:46:52 s21409974 named[1577]: client @0x7f246c107030 1.1.1.1#8523 (.): query (cache) './ANY/IN' denied"
$ fail2ban-regex "$msg" named-refused
Running tests
=============
Use failregex filter file : named-refused
Use single line : Mar 17 18:46:52 s21409974 named[1577]: client @0x7...
Results
=======
Prefregex: 1 total
| ^(?:\s*\S+ (?:(?:\[\d+\])?:\s+\(?named(?:\(\S+\))?\)?:?|\(?named(?:\(\S+\))?\)?:?(?:\[\d+\])?:)\s+)?(?: error:)?\s*client(?: @\S*)? (?:\[?(?:(?:::f{4,6}:)?(?P<ip4>(?:\d{1,3}\.){3}\d{1,3})|(?P<ip6>(?:[0-9a-fA-F]{1,4}::?|::){1,7}(?:[0-9a-fA-F]{1,4}|(?<=:):)))\]?|(?P<dns>[\w\-.^_]*\w))#\S+(?: \([\S.]+\))?: (?P<content>.+)\s(?:denied|\(NOTAUTH\))\s*$
`-
Failregex: 1 total
|- #) [# of hits] regular expression
| 1) [1] ^(?:view (?:internal|external): )?query(?: \(cache\))?
`-
Ignoreregex: 0 total
Date template hits:
|- [# of hits] date format
| [1] {^LN-BEG}(?:DAY )?MON Day %k:Minute:Second(?:\.Microseconds)?(?: ExYear)?
`-
Lines: 1 lines, 0 ignored, 1 matched, 0 missed
[processed in 0.01 sec]
But if you need it, here you go (regex interpolated from fail2ban's pref- & failregex):
^\s*\S+\s+named\[\d+\]: client(?: @\S*)? <ADDR>#\S+(?: \([\S.]+\))?: (?:view (?:internal|external): )?query(?: \(cache\))? '[^']+' denied
Replace <ADDR> with <HOST> if your fail2ban version is older than 0.10.
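If you do end up needing the custom expression, a sketch of wiring it into its own filter (the filter name named-custom and the log path are assumptions; adjust both to your setup):
# /etc/fail2ban/filter.d/named-custom.conf  (hypothetical filter name)
[Definition]
failregex = ^\s*\S+\s+named\[\d+\]: client(?: @\S*)? <ADDR>#\S+(?: \([\S.]+\))?: (?:view (?:internal|external): )?query(?: \(cache\))? '[^']+' denied
Then test it the same way before enabling a jail:
fail2ban-regex /var/log/named/security.log named-custom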

Setting up Sphinx Search

I've been looking for tutorials online to set up Sphinx Search, and I have got the test database working. However, I am having trouble getting my own database to work.
sphinx.conf
source src1
{
    type      = mysql

    sql_host  = localhost
    sql_user  = root
    sql_pass  = MyPassword
    sql_db    = MyDatabase
    sql_port  = 3306

    sql_query = \
        SELECT listing_id, title, description, image_id \
        FROM listings

    sql_attr_uint  = listing_id
    sql_query_info = SELECT listing_id, title, description, image_id FROM listings
}

index test1
{
    source       = src1
    path         = /var/lib/sphinxsearch/data/test1
    docinfo      = extern
    charset_type = sbcs
}

searchd
{
    listen = 9312
    log    = /var/log/sphinxsearch/searchd.log
However, when I try and run:
sudo indexer --all --rotate
the output in PuTTY is:
using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'test1'...
WARNING: attribute 'listing_id' not found - IGNORING
WARNING: Attribute count is 0: switching to none docinfo
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 24576 kb
collected 3 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 3 docs, 49 bytes
total 0.002 sec, 16740 bytes/sec, 1024.94 docs/sec
total 2 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 6 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=911)
However, when I try and run "search df" for example, I get:
Sphinx 2.0.4-id64-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/etc/sphinxsearch/sphinx.conf'...
FATAL: 'sql_query_info' value must contain '$id'
I am running Sphinx Search on Ubuntu 14.04 using an account called "user" which is part of the sudoers file.
I have lost my mind with this, so I would appreciate someone's help.
Thanks
Your sql_query_info is invalid. It needs to contain $id, as the message says.
However, I would highly recommend not using the search tool; it's broken, and articles recommending its use are outdated. Skip it. sql_query_info is only used by search.
Move right on to starting searchd, and use test.php if you don't have an application yet. Using test.php to test your index is MUCH better.
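For completeness, if you still want the legacy search tool to run, a sketch of a valid sql_query_info for this schema ($id is substituted with the matched document ID at query time):
sql_query_info = SELECT listing_id, title, description, image_id FROM listings WHERE listing_id=$id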

Computing eigenvalues in parallel for a large matrix

I am trying to compute the eigenvalues of a big matrix in MATLAB using the Parallel Computing Toolbox.
I first tried:
A = rand(10000,2000);
A = A*A';
matlabpool open 2
spmd
    C = codistributed(A);
    tic
    [V,D] = eig(C);
    time = gop(@max, toc) % Time for all labs in the pool to complete.
end
matlabpool close
The code starts its execution:
Starting matlabpool using the 'local' profile ... connected to 2 labs.
But, after few minutes, I got the following error:
Error using distcompserialize
Out of Memory during serialization
Error in spmdlang.RemoteSpmdExecutor/initiateComputation (line 82)
fcns = distcompMakeByteBufferHandle( ...
Error in spmdlang.spmd_feval_impl (line 14)
blockExecutor.initiateComputation();
Error in spmd_feval (line 8)
spmdlang.spmd_feval_impl( varargin{:} );
I then tried to apply what I saw in tutorial videos about the Parallel Computing Toolbox:
>> job = createParallelJob('configuration', 'local');
>> task = createTask(job, @eig, 1, {A});
>> submit(job);
>> waitForState(job, 'finished');
>> results = getAllOutputArguments(job)
>> destroy(job);
But after two hours of computation, I got:
results =
Empty cell array: 2-by-0
My computer has 2 GiB of memory and an Intel dual-core CPU (2 × 2 GHz).
My questions are the following:
1/ Looking at the first error, I guess my memory is not sufficient for this problem. Is there a way I can divide the input data so that my computer can handle this matrix?
2/ Why is the second result I get empty (after two hours of computation)?
EDIT: @pm89
You were right, an error occurred during the execution:
job =
Parallel Job ID 3 Information
=============================
UserName : bigTree
State : finished
SubmitTime : Sun Jul 14 19:20:01 CEST 2013
StartTime : Sun Jul 14 19:20:22 CEST 2013
Running Duration : 0 days 0h 3m 16s
- Data Dependencies
FileDependencies : {}
PathDependencies : {}
- Associated Task(s)
Number Pending : 0
Number Running : 0
Number Finished : 2
TaskID of errors : [1 2]
- Scheduler Dependent (Parallel Job)
MaximumNumberOfWorkers : 2
MinimumNumberOfWorkers : 1
