I have working code, but I'm not sure whether this is the correct way to implement it.
The CLI program accepts 4 arguments; zero, all, or any combination of them may be present.
This is working code:
use clap::Parser;

#[derive(Parser, Default, Debug)]
struct Arguments {
    /// if areas.csv to download
    #[clap(short, long, default_value_t = false)]
    areas: bool,
    /// if markers.csv to download
    #[clap(short, long, default_value_t = false)]
    markers: bool,
    /// if tracks.csv to download
    #[clap(short, long, default_value_t = false)]
    tracks: bool,
    /// path to file of GPS tracks to download
    #[clap(short, long)]
    gpx_list_file: Option<String>,
}
What I don't like is that the -h output gives no indication that the arguments are optional.
Options:
-a, --areas if areas.csv to download
-m, --markers if markers.csv to download
-t, --tracks if tracks.csv to download
-g, --gpx-list-file <GPX_LIST_FILE> path to file of GPS tracks to download
-h, --help Print help information
-V, --version Print version information
I know that I could add text like "This is an optional field", but I have a feeling that there is a better way.
Also, what is the best way to check whether I was given 0 arguments?
Currently I use this:
if (args.areas, args.markers, args.tracks, args.gpx_list_file.is_none()) == (false, false, false, true) {
    println!("Nothing to download.");
    exit(5);
}
It's perhaps a bit confusing. When a named argument is optional, you don't see anything telling you that specifically; you just get something like this:
Usage: command [OPTIONS]
Options:
-a, --areas if areas.csv to download
-n, --number <NUMBER> [default: 1]
-h, --help Print help information
-V, --version Print version information
When --areas is required, the help output now includes --areas in the first line:
Usage: command [OPTIONS] --areas
Options:
-a, --areas if areas.csv to download
-n, --number <NUMBER> [default: 1]
-h, --help Print help information
-V, --version Print version information
The presence of --areas in the Usage section lets you know it's required, and its absence tells you it's optional.
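As for detecting that nothing was requested, the tuple comparison works, but a small helper method may read more clearly. Below is a minimal sketch, assuming the same field names as the struct in the question. The clap derive is left out so the snippet has no dependencies, and `nothing_selected` is a hypothetical name, not part of clap:

```rust
/// Stand-in for the parsed `Arguments` struct from the question (same field
/// names, but without the clap derive so the sketch has no dependencies).
#[derive(Default, Debug)]
struct Arguments {
    areas: bool,
    markers: bool,
    tracks: bool,
    gpx_list_file: Option<String>,
}

impl Arguments {
    /// Hypothetical helper: true when no download was requested at all.
    fn nothing_selected(&self) -> bool {
        !(self.areas || self.markers || self.tracks || self.gpx_list_file.is_some())
    }
}

fn main() {
    // In the real program this value would come from `Arguments::parse()`.
    let args = Arguments { tracks: true, ..Default::default() };
    if args.nothing_selected() {
        eprintln!("Nothing to download.");
        std::process::exit(5);
    }
    println!("{:?}", args);
}
```

Keeping the check in one method means a fifth option added later only needs to be handled in one place.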
I have a Nextflow script that runs a couple of processes on a single VCF file. The file is named 'bos_taurus.vcf' and is located at /input_files/bos_taurus.vcf. The input_files/ directory also contains another file, 'sacharomyces_cerevisea.vcf'. I would like my Nextflow script to process both files. I tried a glob pattern like ch_1 = channel.fromPath("/input_files/*.vcf"), but I can't find a working solution. Any help would be really appreciated.
#!/usr/bin/env nextflow
nextflow.enable.dsl=2
// here I tried to use globbing
params.input_files = "/mnt/c/Users/Lenovo/Desktop/STUDIA/BIOINFORMATYKA/SEMESTR_V/PRACOWNIA_INFORMATYCZNA/nextflow/projekt/input_files/*.vcf"
params.results_dir = "/mnt/c/Users/Lenovo/Desktop/STUDIA/BIOINFORMATYKA/SEMESTR_V/PRACOWNIA_INFORMATYCZNA/nextflow/projekt/results"
file_channel = Channel.fromPath( params.input_files, checkIfExists: true )
// how can I make this process work on two files simultaneously?
process FILTERING {
publishDir("${params.results_dir}/after_filtering", mode: 'copy')
input:
path(input_files)
output:
path("*")
script:
"""
vcftools --vcf ${input_files} --mac 1 --minQ 20 --recode --recode-INFO-all --out after_filtering.vcf
"""
}
Note that if your VCF files are actually bgzip compressed and tabix indexed, you could instead use the fromFilePairs factory method to create your input channel. For example:
params.vcf_files = "./input_files/*.vcf.gz{,.tbi}"
params.results_dir = "./results"
process FILTERING {
tag { sample }
publishDir("${params.results_dir}/after_filtering", mode: 'copy')
input:
tuple val(sample), path(indexed_vcf)
output:
tuple val(sample), path("${sample}.filtered.vcf")
"""
vcftools \\
--vcf "${indexed_vcf.first()}" \\
--mac 1 \\
--minQ 20 \\
--recode \\
--recode-INFO-all \\
--out "${sample}.filtered.vcf"
"""
}
workflow {
vcf_files = Channel.fromFilePairs( params.vcf_files, checkIfExists: true )
FILTERING( vcf_files ).view()
}
Results:
$ nextflow run main.nf
N E X T F L O W ~ version 22.10.0
Launching `main.nf` [thirsty_torricelli] DSL2 - revision: 8f69ad5638
executor > local (3)
[7d/dacad6] process > FILTERING (C) [100%] 3 of 3 ✔
[A, /path/to/work/84/f9f00097bcd2b012d3a5e105b9d828/A.filtered.vcf]
[B, /path/to/work/cb/9f6f78213f0943013990d30dbb9337/B.filtered.vcf]
[C, /path/to/work/7d/dacad693f06025a6301c33fd03157b/C.filtered.vcf]
Note that BCFtools is actively maintained and is intended as a replacement for VCFtools. In a production pipeline, BCFtools should be preferred.
Here is a little example for starters. First, you should specify a unique output name in each process. Currently, after_filtering.vcf is hardcoded, so the outputs will overwrite each other once copied to the publishDir. You can do that with the baseName property as below, storing it in the input file channel alongside the file: the first element is the sample name and the second is the actual file. I made an example process that just runs head on the VCF; you can then adapt it for what you actually need.
#! /usr/bin/env nextflow
nextflow.enable.dsl = 2
params.input_files = "/Users/atpoint/vcf/*.vcf"
params.results_dir = "/Users/atpoint/vcf/"
// A channel of tuples: the sample name and the file itself
file_channel = Channel.fromPath( params.input_files, checkIfExists: true )
.map { it -> [it.baseName, it] }
// An example process just head-ing the vcf
process VcfHead {
publishDir("${params.results_dir}/after_filtering", mode: 'copy')
input:
tuple val(name), path(vcf_in)
output:
path("*_head.vcf")
script:
"""
head -n 1 $vcf_in > ${name}_head.vcf
"""
}
// Run it
workflow {
VcfHead(file_channel)
}
The file_channel channel looks like this if you add a .view() to it:
[one, /Users/atpoint/vcf/one.vcf]
[two, /Users/atpoint/vcf/two.vcf]
I've got a Nextflow process that looks like:
process my_app {
publishDir "${outdir}/my_app", mode: params.publish_dir_mode
input:
path input_bam
path input_bai
val output_bam
val max_mem
val threads
val container_home
val outdir
output:
tuple env(output_prefix), path("${output_bam}"), path("${output_bam}.bai"), emit: tuple_ch
shell:
'''
my_script.sh \
!{input_bam} \
!{output_bam} \
!{max_mem} \
!{threads}
output_prefix=$(echo !{output_bam} | sed "s#.bam##")
'''
}
This process only publishes the two files (.bam and .bai), but my_script.sh also creates other .vcf files that are not being published in the output directory.
To retrieve the files created by the script I tried the following, but without success:
output:
tuple env(output_prefix), path("${output_bam}"), path("${output_bam}.bai"), path("${output_prefix}.*.vcf"), emit: mt_validation_simulation_tuple_ch
but in logs I can see:
Error executing process caused by:
Missing output file(s) `null.*.vcf` expected by process `my_app_wf:my_app`
What am I missing? Could you help me? Thank you in advance!
The problem is that the output_prefix has only been defined inside of the shell block. If all you need for your output prefix is the file's basename (without extension), you can just use a regular script block to check file attributes. Note that variables defined in the script block (but outside the command string) are global (within the process scope) unless they're defined using the def keyword:
process my_app {
...
output:
tuple val(output_prefix), path("${output_bam}{,.bai}"), path("${output_prefix}.*.vcf")
script:
output_prefix = output_bam.baseName
"""
my_script.sh \\
"${input_bam}" \\
"${output_bam}" \\
"${max_mem}" \\
"${threads}"
"""
}
If the process creates the BAM (and index) it might even be possible to refactor away the multiple input channels if an output prefix can be supplied up front. Usually this makes more sense, but I don't have enough details to say one way or the other. The following might suffice as an example; you may need/prefer to combine/change the output declaration(s) to suit, but hopefully you get the idea:
params.publish_dir = './results'
params.publish_mode = 'copy'
process my_app {
publishDir "${params.publish_dir}/my_app", mode: params.publish_mode
cpus 1
memory 1.GB
input:
tuple val(prefix), path(indexed_bam)
output:
tuple val(prefix), path("${prefix}.bam{,.bai}"), emit: bam_files
tuple val(prefix), path("${prefix}.*.vcf"), emit: vcf_files
"""
my_script.sh \\
"${indexed_bam.first()}" \\
"${prefix}.bam" \\
"${task.memory.toGiga()}G" \\
"${task.cpus}"
"""
}
Note that the indexed_bam expects a tuple in the form: tuple(bam, bai)
My intention is to create a language-learning project that uses gettext() for its translations, showing each string in both the user's primary language and a secondary target language.
I am new to Rust and also to GNU gettext(). I am using gettext-rs which appears to be a Rust foreign function interface that wraps the C gettext() implementation fairly directly. I don't believe my problem is Rust-specific but I haven't tested another language yet. I am using Ubuntu 20.04.
It appears that the examples in gettext() documentation suggest that setlocale() is not required/advised, but my translations don't appear to work without a call to it. Furthermore, the locale string for the setlocale() function doesn't appear to be obeyed: the system locale is used instead.
Perhaps it is an inefficient approach, but I was first going to test a proof of concept for my project by switching locales with setlocale() to generate two different translations for the same msgid between gettext() calls. It appears that, because setlocale() doesn't obey the passed-in locale string, this approach is not working.
After installing Rust, I created a new project using the Terminal:
cd /home/timotheos/dev/rust/
cargo new two-locales
I updated two-locales/Cargo.toml to:
[package]
name = "two-locales"
version = "0.1.0"
edition = "2021"
[dependencies]
gettext-rs = "0.7.0"
gettext-sys = "0.21.3"
I updated two-locales/src/main.rs to:
extern crate gettext_sys as ffi;
use std::ffi::CStr;
use std::ptr;
use gettextrs::LocaleCategory;
// gettext_rs doesn't currently expose a way to call setlocale() with a null parameter.
// The plan is to later open a pull request that adds this function to their getters.rs:
pub fn getlocale(category: LocaleCategory) -> Option<Vec<u8>> {
unsafe {
let result = ffi::setlocale(category as i32, ptr::null());
if result.is_null() {
None
} else {
Some(CStr::from_ptr(result).to_bytes().to_owned())
}
}
}
fn main() {
let new_locale = "en_GB.UTF-8";
let domain_name = "two-locales";
let locale_directory = "/home/timotheos/dev/rust/two-locales";
// Specify the name of the .mo file to use, and where to find it:
let locale_directory_path = std::path::PathBuf::from(locale_directory);
let result_path = gettextrs::bindtextdomain(domain_name, locale_directory);
if result_path.is_err() {
println!("bindtextdomain() didn't work: {:?}", result_path);
}
else {
let result_path = result_path.unwrap();
if locale_directory_path != result_path {
println!("bindtextdomain() worked but the output path didn't match: {:?}", result_path);
}
}
if gettextrs::textdomain(domain_name).is_err() {
println!("textdomain() didn't work");
}
// Ask gettext for UTF-8 strings:
let result_charset = gettextrs::bind_textdomain_codeset(domain_name, "UTF-8");
if result_charset.is_err() {
println!("bind_textdomain_codeset() didn't work: {:?}", result_charset);
}
let current_locale = getlocale(LocaleCategory::LcAll);
let locale_str = String::from_utf8(current_locale.unwrap()).unwrap();
println!("Current locale is {:?}", locale_str);
use gettextrs::*;
// This does not translate because the locale has not been set:
println!("{}", gettext("Hello (ID)"));
println!("Setting locale to {:?}", new_locale);
let new_locale = setlocale(LocaleCategory::LcAll, new_locale.as_bytes().to_vec());
if new_locale.is_some() {
let new_locale = String::from_utf8(new_locale.unwrap()).unwrap();
println!("setlocale() set the locale to {:?}", new_locale);
} else {
println!("setlocale() failed: try seeing if the specified locale is in `locale -a`");
}
// This does translate, but it is using system locale ("en_AU.UTF-8" and not the specified locale):
println!("{}", gettext("Hello (ID)"));
}
I then generated the .po files for both en_AU and en_GB for testing:
cd two-locales
cargo install xtr
find . -name "*.rs" -exec xtr {} \; # Create messages.po from main.rs
cp messages.po messages_en_AU.po
mv messages.po messages_en_GB.po
I modified the contents (only changing the msgstr) of messages_en_AU.po to:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2022-08-13 05:49+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
#: ./src/main.rs:21
msgid "Hello (ID)"
msgstr "Hello (translated AU)"
and the contents (only changing the msgstr) of messages_en_GB.po to:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2022-08-13 05:49+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
#: ./src/main.rs:21
msgid "Hello (ID)"
msgstr "Hello (translated GB)"
then made these into .mo files in the directory:
mkdir en_AU/
mkdir en_AU/LC_MESSAGES/
msgfmt -v messages_en_AU.po -o en_AU/LC_MESSAGES/two-locales.mo
mkdir en_GB/
mkdir en_GB/LC_MESSAGES/
msgfmt -v messages_en_GB.po -o en_GB/LC_MESSAGES/two-locales.mo
so the final relevant directory structure is:
src/main.rs
Cargo.toml
en_AU/LC_MESSAGES/two-locales.mo
en_GB/LC_MESSAGES/two-locales.mo
and then I ran my software using:
cargo run
The output of this was:
Compiling two-locales v0.1.0 (/home/timotheos/dev/rust/two-locales)
Finished dev [unoptimized + debuginfo] target(s) in 0.56s
Running `target/debug/two-locales`
Current locale is "C"
Hello (ID)
Setting locale to "en_GB.UTF-8"
setlocale() set the locale to "en_GB.UTF-8"
Hello (translated AU)
As seen from the output, the setlocale() call is required before gettext("Hello (ID)") will translate its passed-in string. When setlocale() is called, it does translate the string, but it grabs it from the en_AU/LC_MESSAGES/two-locales.mo file instead of the en_GB/LC_MESSAGES/two-locales.mo as I would expect.
Why doesn't my approach work? Is there a bug in setlocale() or am I missing something to make it correctly switch to the locale string specified?
I assume there might be some caching involved and setlocale() switching is not advised anyway. If my approach is incorrect, what is the best strategy for accessing GNU gettext's .po or .mo files with multiple languages concurrently, ideally efficiently?
Any improvements to my Rust code are also welcomed.
For clarity, my system locale is en_AU. If I run locale in the Terminal then it outputs:
LANG=en_AU.UTF-8
LANGUAGE=en_AU:en
LC_CTYPE="en_AU.UTF-8"
LC_NUMERIC="en_AU.UTF-8"
LC_TIME="en_AU.UTF-8"
LC_COLLATE="en_AU.UTF-8"
LC_MONETARY="en_AU.UTF-8"
LC_MESSAGES="en_AU.UTF-8"
LC_PAPER="en_AU.UTF-8"
LC_NAME="en_AU.UTF-8"
LC_ADDRESS="en_AU.UTF-8"
LC_TELEPHONE="en_AU.UTF-8"
LC_MEASUREMENT="en_AU.UTF-8"
LC_IDENTIFICATION="en_AU.UTF-8"
LC_ALL=
and if I run locale -a in the Terminal then it outputs:
C
C.UTF-8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IL
en_IL.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
fr_BE.utf8
fr_CA.utf8
fr_CH.utf8
fr_FR.utf8
fr_LU.utf8
POSIX
so I assume en_GB.UTF-8 should be a valid locale to set using setlocale() (and it did output as if it succeeded).
You absolutely need setlocale() for switching locales. There is no other way. You have two options for that. In C they look like this:
setlocale(LC_MESSAGES, "fr_FR.UTF-8");
Or:
putenv("LC_ALL=fr_FR.UTF-8");
putenv("LANG=fr_FR.UTF-8");
putenv("LANGUAGE=fr_FR.UTF-8");
setlocale(LC_MESSAGES, "");
None of these approaches is thread-safe because setlocale() is not thread-safe (the xlocale API is thread-safe).
Getting this to work platform-independently is a major nightmare because locale identifiers are not portable. My version above (lowercase language, uppercase region, uppercase codeset) works reasonably well across platforms, also on systems using GNU libc like yours.
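The reason the second variant also sets LANGUAGE is that GNU gettext consults that variable ahead of the LC_* settings whenever the active locale is not "C". With LANGUAGE=en_AU:en in the environment, as in the question's locale output, that would explain why the en_AU catalog was chosen even after setlocale() reported "en_GB.UTF-8". The lookup order can be sketched in Rust as below. This is a simplified model for illustration only, not the library's actual code, and `catalog_candidates` is a hypothetical name:

```rust
/// Simplified model of how GNU gettext picks the message catalog language(s).
/// `language_env` is the value of the LANGUAGE environment variable (if any);
/// `messages_locale` is the locale in effect for LC_MESSAGES after setlocale().
fn catalog_candidates(language_env: Option<&str>, messages_locale: &str) -> Vec<String> {
    // In the "C"/"POSIX" locale no translation is attempted at all.
    if messages_locale == "C" || messages_locale == "POSIX" {
        return Vec::new();
    }
    match language_env {
        // A non-empty LANGUAGE takes precedence and may list several languages.
        Some(list) if !list.is_empty() => list
            .split(':')
            .filter(|entry| !entry.is_empty())
            .map(String::from)
            .collect(),
        // Otherwise the locale chosen via setlocale() decides.
        _ => vec![messages_locale.to_string()],
    }
}

fn main() {
    // Before setlocale(): the locale is "C", so the msgid comes back untranslated.
    println!("{:?}", catalog_candidates(Some("en_AU:en"), "C"));
    // After setlocale(LC_ALL, "en_GB.UTF-8") with LANGUAGE=en_AU:en still set.
    println!("{:?}", catalog_candidates(Some("en_AU:en"), "en_GB.UTF-8"));
    // The same call with LANGUAGE unset.
    println!("{:?}", catalog_candidates(None, "en_GB.UTF-8"));
}
```

Under this model, running the question's program as `LANGUAGE=en_GB cargo run`, or changing LANGUAGE from within the program before calling gettext(), should pick up the en_GB catalog. Treat that as something to verify on your system rather than a guarantee.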
I am using the Rust clap library to parse command line arguments. When displaying my help text, I want to separate required arguments from optional arguments and put them under separate headings. Something along the lines of this:
HELP:
Example header 1:
Arg 1
Arg 2
Example header 2:
Arg 3
Arg 4
Is this possible?
After reading this, this and this I think it might be but I am not confident of how to go about doing so.
EDIT:
A commenter asked me to update the post with some desired output, so below is an example from one of the links above. I would like to be able to have two options sections and name them.
$ myprog --help
My Super Program 1.0
Kevin K. <kbknapp@gmail.com>
Does awesome things
USAGE:
MyApp [FLAGS] [OPTIONS] <INPUT> [SUBCOMMAND]
FLAGS:
-h, --help Prints this message
-v Sets the level of verbosity
-V, --version Prints version information
OPTIONS:
-c, --config <FILE> Sets a custom config file
ARGS:
INPUT The input file to use
SUBCOMMANDS:
help Prints this message
test Controls testing features
So changing the OPTIONS section above to be:
OPTIONS-1:
-c, --config <FILE> Sets a custom config file.
OPTIONS-2:
-a, --another <FILE> Another example command.
I think you might be looking for help_heading. It seems this has been added recently, so you'll have to grab the very latest commit.
Cargo.toml
[dependencies]
clap = { git = "https://github.com/clap-rs/clap", rev = "8145717" }
main.rs
use clap::Clap;
#[derive(Clap, Debug)]
#[clap(
name = "My Application",
version = "1.0",
author = "Jason M.",
about = "Stack Overflow"
)]
struct Opts {
#[clap(
help_heading = Some("OPTIONS-1"),
short,
long,
value_name="FILE",
about = "Sets a custom config file"
)]
config: String,
#[clap(
help_heading = Some("OPTIONS-2"),
short,
long,
value_name="FILE",
about = "Another example command"
)]
another: String,
}
fn main() {
let opts: Opts = Opts::parse();
}
Alternatively, using the builder API:
use clap::{App, Arg};
fn main() {
let app = App::new("My Application")
.version("1.0")
.author("Jason M.")
.about("Stack Overflow")
.help_heading("OPTIONS-1")
.arg(
Arg::new("config")
.short('c')
.long("config")
.value_name("FILE")
.about("Sets a custom config file"),
)
.help_heading("OPTIONS-2")
.arg(
Arg::new("another")
.short('a')
.long("another")
.value_name("FILE")
.about("Another example command"),
);
app.get_matches();
}
Either of the above will generate the following upon running cargo run -- --help:
My Application 1.0
Jason M.
Stack Overflow
USAGE:
clap_headings --config <FILE> --another <FILE>
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
OPTIONS-1:
-c, --config <FILE> Sets a custom config file
OPTIONS-2:
-a, --another <FILE> Another example command
I'm trying to use the Groovy CliBuilder to parse command line options. I'm trying to use multiple long options without a short option.
I have the following processor:
def cli = new CliBuilder(usage: 'Generate.groovy [options]')
cli.with {
h longOpt: "help", "Usage information"
r longOpt: "root", args: 1, type: GString, "Root directory for code generation"
x args: 1, type: GString, "Type of processor (all, schema, beans, docs)"
_ longOpt: "dir-beans", args: 1, argName: "directory", type: GString, "Custom location for grails bean classes"
_ longOpt: "dir-orm", args: 1, argName: "directory", type: GString, "Custom location for grails domain classes"
}
options = cli.parse(args)
println "BEANS=${options.'dir-beans'}"
println "ORM=${options.'dir-orm'}"
if (options.h || options == null) {
cli.usage()
System.exit(0)
}
According to the Groovy documentation, I should be able to use multiple "_" values for an option when I want it to ignore the short option name and use a long option name only. The documentation says:
Another example showing long options (partial emulation of arg
processing for 'curl' command line):
def cli = new CliBuilder(usage:'curl [options] <url>')
cli._(longOpt:'basic', 'Use HTTP Basic Authentication')
cli.d(longOpt:'data', args:1, argName:'data', 'HTTP POST data')
cli.G(longOpt:'get', 'Send the -d data with a HTTP GET')
cli.q('If used as the first parameter disables .curlrc')
cli._(longOpt:'url', args:1, argName:'URL', 'Set URL to work with')
Which has the following usage message:
usage: curl [options] <url>
--basic Use HTTP Basic Authentication
-d,--data <data> HTTP POST data
-G,--get Send the -d data with a HTTP GET
-q If used as the first parameter disables .curlrc
--url <URL> Set URL to work with
This example shows a common convention. When mixing short and long names, the short names are often one character in size. One character options with arguments don't require a space between the option and the argument, e.g. -Ddebug=true. The example also shows the use of '_' when no short option is applicable.
Also note that '_' was used multiple times. This is supported but if any other shortOpt or any longOpt is repeated, then the behavior is undefined.
http://groovy.codehaus.org/gapi/groovy/util/CliBuilder.html
When I use the "_" it only accepts the last one in the list (last one encountered). Am I doing something wrong or is there a way around this issue?
Thanks.
I'm not sure what you mean by "it only accepts the last one", but this should work:
def cli = new CliBuilder().with {
x 'something', args:1
_ 'something', args:1, longOpt:'dir-beans'
_ 'something', args:1, longOpt:'dir-orm'
parse "-x param --dir-beans beans --dir-orm orm".split(' ')
}
assert cli.x == 'param'
assert cli.'dir-beans' == 'beans'
assert cli.'dir-orm' == 'orm'
I learned that my original code works correctly. What is not working is the function that takes all of the options built in the with closure and prints a detailed usage. The function call built into CliBuilder that prints the usage is:
cli.usage()
The original code above prints the following usage line:
usage: Generate.groovy [options]
--dir-orm <directory> Custom location for grails domain classes
-h,--help Usage information
-r,--root Root directory for code generation
-x Type of processor (all, schema, beans, docs)
This usage line makes it look like I'm missing options. I made the mistake of not printing each individual item separately from this usage function call. That's what made it look like it only cared about the last _ item in the with closure. I added this code to prove that it was passing values:
println "BEANS=${options.'dir-beans'}"
println "ORM=${options.'dir-orm'}"
I also discovered that you must use = between a long option and its value, or it will not parse the command line options correctly (--long-option=some_value).