Groovy File() not reporting correct size / length

I have a Jenkins post-build Groovy script running out of the "Post build task plugin". From the same plugin, immediately before running the Groovy script, I check for the existence of the file and its size. The log shows:
09:14:53 -rw-r--r-- 1 aaa users 978243 Nov 4 08:53 /jk/workspace/xxxx/output/delta.txt
09:14:53 cppcheck.groovy: Checking build result: SUCCESS
09:14:53 cppcheck.groovy: workspace = /jk/workspace/xxxx
09:14:53 cppcheck.groovy: delta = /jk/workspace/xxxx/output/delta.txt
09:14:53 cppcheck.groovy: delta.txt length = 0
The Groovy script is as follows:
import hudson.model.*

def build = Thread.currentThread().executable
def result = build.getResult()

println("cppcheck.groovy: Checking build result: " + result.toString())

if (result.isBetterOrEqualTo(hudson.model.Result.SUCCESS)) {
    def workspace = build.getEnvVars()["WORKSPACE"]
    def delta = workspace + "/output/delta.txt"

    println("cppcheck.groovy: workspace = " + workspace)
    println("cppcheck.groovy: delta = " + delta)

    def f = new File(delta)
    println("cppcheck.groovy: delta.txt length = " + f.length())

    if (f.length() > 0) {
        build.setResult(hudson.model.Result.UNSTABLE)
    }
}
What am I doing wrong here?
Update: There seems to be some scepticism that the file exists, and a suspicion of some sort of race condition. To put your minds at rest, let's rule that out. I have modified the build to execute the same ls -l command after it runs the Groovy script, to prove that the file does exist and that the problem is ultimately Groovy not being able to open it. I also added an exists() check to the above Groovy script, which, as I suspected it would, reports that the file doesn't exist. I don't dispute that Groovy thinks the file doesn't exist; what I am trying to work out is why.
10:31:39 [xxxx] $ /bin/sh -xe /tmp/hudson8964729240493636268.sh
10:31:39 + ls -l /jk/workspace/xxxx/output/delta.txt
10:31:39 -rw-r--r-- 1 aaa users 978243 Nov 4 08:53 /jk/workspace/xxxx/output/delta.txt
10:31:40 cppcheck.groovy: Checking build result: SUCCESS
10:31:40 cppcheck.groovy: workspace = /jk/workspace/xxxx
10:31:40 cppcheck.groovy: delta = /jk/workspace/xxxx/output/delta.txt
10:31:40 cppcheck.groovy: delta.txt length = 0
10:31:40 cppcheck.groovy: delta.txt exists = false
10:31:40 [xxxx] $ /bin/sh -xe /tmp/hudson8007562636561507409.sh
10:31:40 + ls -l /jk/workspace/xxxx/output/delta.txt
10:31:40 -rw-r--r-- 1 aaa users 978243 Nov 4 08:53 /jk/workspace/xxxx/output/delta.txt
Also, notice that the timestamp on said file is still 08:53, when it was created.

I suspected that the Groovy script was running on the build master, as opposed to the build node that this particular build ran on. I added some debug output to print the hostname the Groovy script was running on, and sure enough it wasn't the same host that the shell variant of the script ran on.
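For anyone hitting the same thing, a minimal sketch of a fix, assuming the script keeps executing on the master: go through Jenkins' hudson.FilePath API instead of java.io.File. A FilePath obtained from the build's workspace performs its file operations on the node that owns that workspace:

// Sketch only: FilePath proxies exists()/length() to the build node,
// whereas java.io.File always resolves against the machine running the script.
def fp = build.getWorkspace().child("output/delta.txt")
println("cppcheck.groovy: delta.txt exists = " + fp.exists())
println("cppcheck.groovy: delta.txt length = " + fp.length())
if (fp.exists() && fp.length() > 0) {
    build.setResult(hudson.model.Result.UNSTABLE)
}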

Related

How do I change the line endings used by PExpect output

The returned output from pexpect.run() includes \r\n at the end of every line. Printing to the terminal using print(returnVal.decode()) correctly prints one line for each line returned. When I examine the output I see that the byte string contains \r\n, and when I log it to a file I get double returns in the log file. I'm on a Mac using Python 3.7. Is there a way to set the preferred newline when writing the output? I am using Python's logging class and its info() method to write the string. The output looks like this:
total 80

-rw-r--r-- 1 xxxx admin 1048 Nov 12 00:41 Constants.py

-rw-r--r-- 1 xxxx admin 5830 Nov 12 13:33 file1.py

-rw-r--r-- 1 xxxx admin 2255 Nov 12 00:51 file2.py
When it should look like:
total 80
-rw-r--r-- 1 xxxx admin 1048 Nov 12 00:41 Constants.py
-rw-r--r-- 1 xxxx admin 5830 Nov 12 13:33 file1.py
-rw-r--r-- 1 xxxx admin 2255 Nov 12 00:51 file2.py
Here is a simplified version of my original Logger class:
class Logger():
    def __init__( self, path ):
        msgFormat = '%(asctime)s.%(msecs)d\t%(message)s'
        dateFormat = '%m/%d/%Y %H:%M:%S'
        logging.basicConfig( format=msgFormat, datefmt=dateFormat, filename=path, level=logging.INFO )

    def Log ( self, theStr ):
        logging.info( str( theStr ))
The string being returned from Pexpect looks something like:
Line1\r\nLine2
Depending on how you log the output, it's advisable to normalize the newlines before sending the string to the logger. However, if you must override the newline behaviour of the logging module's FileHandler, and as an experiment, you can do so by monkey patching its _open method, since that functionality isn't exposed by default.
I used the source code of Python 3.8 to get the _open function's definition.
import logging

def custom_open(self):
    """
    Monkey patched _open function of class logging.FileHandler (Python 3.8)
    """
    return open(self.baseFilename, self.mode, encoding=self.encoding, newline='')

logging.FileHandler._open = custom_open

if __name__ == "__main__":
    pexpect_return = "Output\nTest"
    my_log = logging.getLogger("test_logger")
    my_log.setLevel(logging.INFO)
    my_log.addHandler(logging.FileHandler("test.log"))
    my_log.info(pexpect_return)
How it works
Python's logging module has a class FileHandler, which uses a method _open to create a file object for writing and appending to log files on disk. Its default implementation as of version 3.8 does not pass a newline parameter to open(), so the platform's default newline translation is used.
Monkey patching is when you replace or update a method/function in one of your imported classes while the program is running. The line logging.FileHandler._open = custom_open tells Python to replace the _open method of the FileHandler class with the custom_open function above. Later, when my_log.addHandler(logging.FileHandler("test.log")) runs, the new custom_open method is used to open the file with the newline parameter set.
You can further confirm that the new method is used to open the file by adding a suffix to the file name like this:
return open(self.baseFilename+"__Monkey_Patched", self.mode, encoding=self.encoding, newline='')
If you now run that demo code, the filename will be "test.log__Monkey_Patched".
This code, however, will not replace any newline characters which you pass to the logger as part of the string to log. You need to process that beforehand.
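A minimal sketch of that pre-processing, assuming the raw bytes come straight from pexpect.run() (the helper name normalize_newlines is my own, not part of pexpect or logging):

def normalize_newlines(raw: bytes) -> str:
    # Decode pexpect output and collapse CRLF (and any stray CR) to LF.
    return raw.decode().replace("\r\n", "\n").replace("\r", "\n")

# Hypothetical usage with the Logger class from the question:
#   log = Logger("test.log")
#   log.Log(normalize_newlines(returnVal))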

How do I zip files in Bazel

I have a set of files as part of my repository. How do I produce a zip file out of those files in Bazel? I found rules for tar.gz etc. but cannot find a way to achieve a zip archive.
I found references mentioning zipper but couldn't figure out how to load it and use it. Can someone more experienced with Bazel help?
A basic pkg_zip rule was added to rules_pkg recently. Here is a basic usage example from the unit tests:
load("#rules_pkg//:pkg.bzl", "pkg_zip")
pkg_zip(
name = "test_zip_basic",
srcs = [
"testdata/hello.txt",
"testdata/loremipsum.txt",
],
)
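If rules_pkg isn't in your workspace yet, a minimal WORKSPACE sketch for pulling it in looks like the following (the release version and URL are illustrative; pin a current release and fill in its sha256):

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_pkg",
    # Illustrative version; use a current release and its checksum.
    urls = ["https://github.com/bazelbuild/rules_pkg/releases/download/0.7.0/rules_pkg-0.7.0.tar.gz"],
    # sha256 = "<sha256 of the release you pin>",
)

load("@rules_pkg//:deps.bzl", "rules_pkg_dependencies")
rules_pkg_dependencies()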
You can specify the paths using the extra rules in mappings.bzl. Here is the example given by the Bazel team:
load("#rules_pkg//:mappings.bzl", "pkg_attributes", "pkg_filegroup", "pkg_files", "pkg_mkdirs", "strip_prefix")
load("#rules_pkg//:pkg.bzl", "pkg_tar", "pkg_zip")
# This is the top level BUILD for a hypothetical project Foo. It has a client,
# a server, docs, and runtime directories needed by the server.
# We want to ship it for Linux, macOS, and Windows.
#
# This example shows various techniques for specifying how your source tree
# transforms into the installation tree. As such, it favors using a lot of
# distict features, at the expense of uniformity.
pkg_files(
name = "share_doc",
srcs = [
"//docs",
],
# Required, but why?: see #354
strip_prefix = strip_prefix.from_pkg(),
# Where it should be in the final package
prefix = "usr/share/doc/foo",
)
pkg_filegroup(
name = "manpages",
srcs = [
"//src/client:manpages",
"//src/server:manpages",
],
prefix = "/usr/share",
)
pkg_tar(
name = "foo_tar",
srcs = [
"README.txt",
":manpages",
":share_doc",
"//resources/l10n:all",
"//src/client:arch",
"//src/server:arch",
],
)
pkg_zip(
name = "foo_zip",
srcs = [
"README.txt",
":manpages",
":share_doc",
"//resources/l10n:all",
"//src/client:arch",
"//src/server:arch",
],
)
The zipper utility is at @bazel_tools//tools/zip:zipper; this is its usage:
Usage: zipper [vxc[fC]] x.zip [-d exdir] [[zip_path1=]file1 ... [zip_pathn=]filen]
  v verbose - list all file in x.zip
  x extract - extract files in x.zip to current directory, or
      an optional directory relative to the current directory
      specified through -d option
  c create - add files to x.zip
  f flatten - flatten files to use with create or extract operation
  C compress - compress files when using the create operation
x and c cannot be used in the same command-line.

For every file, a path in the zip can be specified. Examples:
  zipper c x.zip a/b/__init__.py= # Add an empty file at a/b/__init__.py
  zipper c x.zip a/b/main.py=foo/bar/bin.py # Add file foo/bar/bin.py at a/b/main.py

If the zip path is not specified, it is assumed to be the file path.
So it can be used in a genrule like this:
$ tree
.
├── BUILD
├── dir
│   ├── a
│   ├── b
│   └── c
└── WORKSPACE

1 directory, 5 files
$ cat BUILD
genrule(
    name = "gen_zip",
    srcs = glob(["dir/*"]),
    tools = ["@bazel_tools//tools/zip:zipper"],
    outs = ["files.zip"],
    cmd = "$(location @bazel_tools//tools/zip:zipper) c $@ $(SRCS)",
)
$ bazel build :files.zip
INFO: Analyzed target //:files.zip (7 packages loaded, 41 targets configured).
INFO: Found 1 target...
Target //:files.zip up-to-date:
  bazel-bin/files.zip
INFO: Elapsed time: 0.653s, Critical Path: 0.08s
INFO: 1 process: 1 linux-sandbox.
INFO: Build completed successfully, 2 total actions
$ unzip -l bazel-bin/files.zip
Archive:  bazel-bin/files.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2010-01-01 00:00   dir/a
        0  2010-01-01 00:00   dir/b
        0  2010-01-01 00:00   dir/c
---------                     -------
        0                     3 files
It can similarly be used in Starlark:
def _some_rule_impl(ctx):
    zipper_inputs = []
    zipper_args = ctx.actions.args()
    zipper_args.add("c", ctx.outputs.zip.path)
    # ... (collect zipper_inputs and per-file zip_path=file arguments here) ...
    ctx.actions.run(
        inputs = zipper_inputs,
        outputs = [ctx.outputs.zip],
        executable = ctx.executable._zipper,
        arguments = [zipper_args],
        progress_message = "Creating zip...",
        mnemonic = "zipper",
    )

some_rule = rule(
    implementation = _some_rule_impl,
    attrs = {
        "deps": attr.label_list(),
        "_zipper": attr.label(default = Label("@bazel_tools//tools/zip:zipper"), cfg = "host", executable = True),
    },
    outputs = {"zip": "%{name}.zip"},
)
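Note that the tool attribute is spelled _zipper: in Starlark rules, implicit attributes (those with a default that users shouldn't override) must start with an underscore, and that is the name ctx.executable._zipper resolves. cfg = "host" builds the zipper for the host platform, which is what you want for a tool that runs during the build rather than on the target.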

--latency-wait Error MissingOutputException

I am working with Snakemake to build a pipeline for my work.
My input files (example fastq files) are:
/project/ateeq/PROJECT/snakemake-example/raw_data/H667-1_R1.fastq.gz
/project/ateeq/PROJECT/snakemake-example/raw_data/H667-1_R2.fastq.gz
/project/ateeq/PROJECT/snakemake-example/raw_data/H667-2_R1.fastq.gz
/project/ateeq/PROJECT/snakemake-example/raw_data/H667-2_R2.fastq.gz
I have written the following code for trimming the data using fastp:
"""
Author: Dave Amir
Affiliation: St Lukes
Aim: A simple Snakemake workflow to process paired-end stranded RNA-Seq.
Date: 11 June 2015
Run: snakemake -s snakefile
Latest modification:
- todo
"""
# This should be placed in the Snakefile.
##-----------------------------------------------##
## Working directory ##
## Adapt to your needs ##
##-----------------------------------------------##
BASE_DIR = "/project/ateeq/PROJECT"
WDIR = BASE_DIR + "/snakemake-example"
workdir: WDIR
#message("The current working directory is " + WDIR)
##--------------------------------------------------------------------------------------##
## Variables declaration
## Declaring some variables used by topHat and other tools...
## (GTF file, INDEX, chromosome length)
##--------------------------------------------------------------------------------------##
INDEX = BASE_DIR + "/ref_files/hg19/assembly/"
GTF = BASE_DIR + "/hg19/Hg19_CTAT_resource_lib/ref_annot.gtf"
CHR = BASE_DIR + "/static/humanhg19.annot.csv"
FASTA = BASE_DIR + "/ref_files/hg19/assembly/hg19.fasta"
##--------------------------------------------------------------------------------------##
## The list of samples to be processed
##--------------------------------------------------------------------------------------##
SAMPLES, = glob_wildcards("/project/ateeq/PROJECT/snakemake-example/raw_data/{smp}_R1.fastq.gz")
NB_SAMPLES = len(SAMPLES)
for smp in SAMPLES:
message:("Sample " + smp + " will be processed")
##--------------------------------------------------------------------------------------##
## Our First rule - sample trimming
##--------------------------------------------------------------------------------------##
rule final:
input: expand("/project/ateeq/PROJECT/snakemake-example/trimmed/{smp}_R1_trimmed.fastq", smp=SAMPLES)
rule trimming:
input: fwd="/project/ateeq/PROJECT/snakemake-example/raw_data/{smp}_R1.fastq.gz",rev="/project/ateeq/PROJECT/snakemake-example/raw_data/{smp}_R2.fastq.gz"
output: fwd="/project/ateeq/PROJECT/snakemake-example/trimmed/{smp}_R1_trimmed.fastq", rev="/project/ateeq/PROJECT/snakemake-example/trimmed/{smp}_R2_trimmed.fastq", rep="/project/ateeq/PROJECT/snakemake-example/report/{smp}.html"
threads: 30
message: """--- Trimming."""
shell: """
fastp -i {input.fwd} -I {input.rev} -o {output.fwd} -O {output.rev} --detect_adapter_for_pe --disable_length_filtering --correction --qualified_quality_phred 30 --thread 16 --html {output.rep} --report_title "Fastq Quality Control Report" &>>{input.fwd}.log
"""
When I run the pipeline, it shows the following error:
MissingOutputException in line 51 of /project/ateeq/PROJECT/snakemake-example/snakefile:
Missing files after 5 seconds:
/project/ateeq/PROJECT/snakemake-example/trimmed/H667-1_R1_trimmed.fastq
/project/ateeq/PROJECT/snakemake-example/trimmed/H667-1_R2_trimmed.fastq
/project/ateeq/PROJECT/snakemake-example/report/H667-1.html
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Though I have increased the latency wait period to 2000 seconds, it still ends up throwing the error:
snakemake -s snakefile -j 30 --latency-wait 2000
I am using Snakemake version 5.4.5 and Python 3.6.8. Please let me know where I am going wrong; it would be a great help to me.
Thanks for the kind help,
Sincerely,
Dave
I don't know if this is due to the copy & paste but the indentation is wrong:
rule trimming:
input: fwd="/project/ateeq/PROJECT/snakemake-example/raw_data/{smp}_R1.fastq.gz",rev="/project/ateeq/PROJECT/snakemake-example/raw_data/{smp}_R2.fastq.gz"
output: fwd="/project/ateeq/PROJECT/snakemake-example/trimmed/{smp}_R1_trimmed.fastq", rev="/project/ateeq/PROJECT/snakemake-example/trimmed/{smp}_R2_trimmed.fastq", rep="/project/ateeq/PROJECT/snakemake-example/report/{smp}.html"
threads: 30
message: """--- Trimming."""
shell: """
fastp -i {input.fwd} -I {input.rev} -o {output.fwd} -O {output.rev} --detect_adapter_for_pe --disable_length_filtering --correction --qualified_quality_phred 30 --thread 16 --html {output.rep} --report_title "Fastq Quality Control Report" &>>{input.fwd}.log
"""
Shouldn't it be:
rule trimming:
    input:
        fwd="/project/ateeq/PROJECT/snakemake-example/raw_data/{smp}_R1.fastq.gz",
        rev="/project/ateeq/PROJECT/snakemake-example/raw_data/{smp}_R2.fastq.gz"
    output:
        fwd="/project/ateeq/PROJECT/snakemake-example/trimmed/{smp}_R1_trimmed.fastq",
        rev="/project/ateeq/PROJECT/snakemake-example/trimmed/{smp}_R2_trimmed.fastq",
        rep="/project/ateeq/PROJECT/snakemake-example/report/{smp}.html"
    threads: 30
    message: """--- Trimming."""
    shell: """
        fastp -i {input.fwd} -I {input.rev} -o {output.fwd} -O {output.rev} --detect_adapter_for_pe --disable_length_filtering --correction --qualified_quality_phred 30 --thread 16 --html {output.rep} --report_title "Fastq Quality Control Report" &>>{input.fwd}.log
        """

Python Path .is_file() evaluates symlink as a file

In my Python3 program, I take a bunch of paths and do things based on what they are. When I evaluate the following symlinks (snippet):
lrwxrwxrwx 1 513 513 5 Aug 19 10:56 console -> ttyS0
lrwxrwxrwx 1 513 513 11 Aug 19 10:56 core -> /proc/kcore
lrwxrwxrwx 1 513 513 13 Aug 19 10:56 fd -> /proc/self/fd
the results are:
symlink console -> ttyS0
file core -> /proc/kcore
symlink fd -> /proc/self/fd
It evaluates core as if it were a file (rather than a symlink). What is the best way for me to distinguish a symlink from a file? Code below:
#!/usr/bin/python3
import sys
import os
from pathlib import Path

def filetype(filein):
    print(filein)
    if Path(filein).is_file():
        return "file"
    if Path(filein).is_symlink():
        return "symlink"
    else:
        return "doesn't match anything"

if __name__ == "__main__":
    file = sys.argv[1]
    print(str(file))
    print(filetype(file))
The result of is_file() is intended to answer the question "if I open this name, will I open a file?". For a symlink, the answer is yes if its target is a file, hence the return value.
If you want to know whether the name itself is a symlink, ask is_symlink().
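Concretely, a minimal reworking of the filetype() helper from the question: test is_symlink() first, since is_file() deliberately follows links while is_symlink() examines the name itself:

from pathlib import Path

def filetype(filein):
    p = Path(filein)
    if p.is_symlink():   # true for the link itself, regardless of target
        return "symlink"
    if p.is_file():      # only reached for non-links now
        return "file"
    return "doesn't match anything"

With this ordering, core -> /proc/kcore reports "symlink".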

Torque cannot communicate with host

I have been attempting to set up the Torque scheduler for a small cluster. I followed the steps to set up the scheduler from http://docs.adaptivecomputing.com/torque/archive/3-0-2/1.2configuring_torque_on_server.php
However, when I attempt
qterm -t quick
I get the following error:
$ sudo qterm -t quick
Unable to communicate with Terra(192.168.1.25)
Cannot connect to specified server host 'Terra'.
qterm: could not connect to server '' (111) Connection refused
The server itself starts just fine, though. However, when I attempt to run a command that uses multiple nodes, such as
qsub -l nodes=2:ppn=4 /home/user/scripts/someScript
it prints out something like
7.Terra
where Terra is the name of the head node (which is also a compute node in the cluster). That isn't the problem, though. The problem is that the job does not run, nor does it produce any output anywhere :/
The torque server log: https://ptpb.pw/EaKo
The terra node log: https://ptpb.pw/9w5M
and the Marte log: https://ptpb.pw/o4PT
I can get a job to run with a PBS script, but only with one node:
#!/bin/bash
#PBS -l pmem=1gb,nodes=1:ppn=4
#PBS -m abe
cd Documents/
wc -l largeTest.csv
Here is the output of qstat after submitting a job:
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
16.Terra                  testPerformance  justin                 0 R batch
The output of pbsnodes -a:
Terra
    state = free
    power_state = Running
    np = 4
    properties = Tower
    ntype = cluster
    status = opsys=linux,uname=Linux Terra 4.17.14-arch1-1-ARCH #1 SMP PREEMPT Thu Aug 9 11:56:50 UTC 2018 x86_64,sessions=11525 22029,nsessions=2,nusers=1,idletime=57964,totmem=8111556kb,availmem=7539284kb,physmem=8111556kb,ncpus=4,loadave=0.00,gres=,netload=30570521372,state=free,varattr= ,cpuclock=Fixed,macaddr=e0:3f:49:44:72:20,version=6.1.1.1,rectime=1534937388,jobs=
    mom_service_port = 15002
    mom_manager_port = 15003
    gpus = 1

Marte
    state = free
    power_state = Running
    np = 4
    properties = NFSServer
    ntype = cluster
    status = opsys=linux,uname=Linux Marte 4.18.1-arch1-1-ARCH #1 SMP PREEMPT Wed Aug 15 21:11:55 UTC 2018 x86_64,sessions=366 556 563,nsessions=3,nusers=2,idletime=58140,totmem=7043404kb,availmem=6703808kb,physmem=7043404kb,ncpus=4,loadave=0.02,gres=,netload=36500663511,state=free,varattr= ,cpuclock=Fixed,macaddr=c8:5b:76:4a:65:91,version=6.1.1.1,rectime=1534937359,jobs=
    mom_service_port = 15002
    mom_manager_port = 15003
And here is /var/spool/torque/server_priv/nodes:
Terra np=4 gpus=1 Tower
Marte np=4 NFSServer
Edit: Here are the most recent logs as well
Mom Log for Node: https://ptpb.pw/DhKi
Mom Log for head node: https://ptpb.pw/MTlD
and the server log: https://ptpb.pw/HPkE
