Why doesn't the Linux redirection operator capture the output of my command?

Context: I have a program (go-sigma-rule-engine by Markus Kont) on my EC2 instance that runs against a log file and prints output to the screen.
The command used to run this program is ./gsre/go-sigma-rule-engine run --rules-dir ./gsre/rules/ --sigma-input ./logs/exampleLog.json
The program produces output of the form:
INFO[2021-09-22T21:51:06Z] MATCH at offset 0 : [{[] Example Activity Found}]
INFO[2021-09-22T21:51:06Z] All workers exited, waiting on loggers to finish
INFO[2021-09-22T21:51:06Z] Stats logger done
INFO[2021-09-22T21:51:06Z] Done
Goal: I would like to capture this output and store it in a file.
Attempted Solution: I used the redirection operator to capture the output like so:
./gsre/go-sigma-rule-engine run --rules-dir ./gsre/rules/ --sigma-input ./logs/exampleLog.json > output.txt
Problem: The output.txt file is empty and didn't capture the output of the command invoking the rule engine.

Maybe the output you want to capture goes to standard error rather than standard output; log messages like these often do. Try using 2> instead of > to redirect stderr, or > output.txt 2>&1 to capture both streams in the same file.
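A minimal sketch of the difference, assuming (as the answer suspects) that the engine writes its INFO lines to stderr. `fake_engine` is a hypothetical stand-in, not the real program:

```shell
#!/bin/bash
# Hypothetical stand-in for go-sigma-rule-engine: its log lines go to stderr.
fake_engine() {
  echo 'INFO[2021-09-22T21:51:06Z] Done' >&2
}

dir=$(mktemp -d) || exit
fake_engine > "$dir/empty.txt"             # stdout only: the file stays empty
fake_engine 2> "$dir/stderr.txt"           # stderr only: captures the log lines
fake_engine > "$dir/both.txt" 2>&1         # both: move stdout first, then point stderr at it
fake_engine 2>&1 > "$dir/wrong_order.txt"  # stderr is copied to the terminal before stdout moves
```

The last line shows a common pitfall: redirections are applied left to right, so 2>&1 has to come after > file.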

Related

How to use set -x without showing stdout?

Within CI, I am running a bash script that calls many bash scripts.
./internals/declination/create "${RELEASE_VERSION}" "${CI_COMMIT_REF_NAME}" > /dev/null
This does not suppress the stdout produced by the script.
The GitLab CI runners stop logging after about 10 MB of log; the job reports Job's log exceeded limit of 10240000 bytes.
I know the log will only keep growing.
How can I reduce the size of the output log?
I don't need all of stdout. I could keep only stderr, but then it would be a long-running script that shows no progress.
Is there a way to display the commands that are running, as set -x does?
Edit
The answers did not solve my issue. I should add that I am using Node.js to run the bash script that runs the long bash script.
This is how I call my node script within .gitlab-ci.yml:
script:
  - node my_script.js
Within my_script.js, I have:
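const { spawn } = require('child_process');
const path = require('path');

exports.handler = () => {
  // Run release.sh with this process's stdin, stdout and stderr attached
  const ls = spawn('bash', [path.join(__dirname, 'release.sh')], { stdio: 'inherit' });
  ls.on('close', (code) => {
    if (code !== 0) {
      console.log(`release.sh exited with code ${code}`);
      process.exitCode = code;
    }
  });
};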
Within release.sh, I have:
./internals/declination/create "${RELEASE_VERSION}" "${CI_COMMIT_REF_NAME}" > /dev/null
You can selectively redirect file handles with exec.
exec >stdout 2>stderr
This however loses the connection to the terminal, so there is no simple way to output anything to the terminal after this point.
You can instead duplicate a file handle with m>&n, where n is the number of the existing file descriptor to duplicate and m is the number of the new copy (choose a big number like 99 so you don't accidentally clobber an existing handle).
exec 98>&1 # save stdout
exec 99>&2 # save stderr
exec >/dev/null 2>&1
: # commands whose output should be hidden go here
To re-enable output:
exec 1>&98 2>&99 98>&- 99>&- # restore, then close the saved copies
If you redirected to temporary files instead of /dev/null, you can now show the tail of those files to the caller:
tail -n 100 "$TMPDIR"/stdout "$TMPDIR"/stderr
(On a shared server, probably use mktemp to create a unique temporary directory at the beginning of your script; static hard-coded file names make it impossible to run two builds at the same time.)
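Putting the save, redirect and restore steps together with temporary files, a minimal bash sketch might look like this. `noisy_step` is a hypothetical stand-in for the build. Descriptors 98 and 99 work in bash, but POSIX only guarantees descriptors 0 through 9 in redirections, so a plain sh may reject them.

```shell
#!/bin/bash
# Hypothetical stand-in for a noisy build step.
noisy_step() { seq 1 1000; echo "warning: something odd" >&2; }

tmp=$(mktemp -d) || exit
exec 98>&1 99>&2                        # save the original stdout and stderr
exec >"$tmp"/stdout 2>"$tmp"/stderr     # from here on, output goes to the files
noisy_step
exec 1>&98 2>&99 98>&- 99>&-            # restore the originals, close the copies
tail -n 3 "$tmp"/stdout "$tmp"/stderr   # show only the end of each log
```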
Because you usually can't predict where the next error will happen, it is probably best to put all of this in a wrapper script that performs the redirection, runs the build, and finally displays the tail end of the temporary log files. Some build servers may expect to see signs of life in the log every few minutes, so you may also want to print a few lines periodically in a loop.
On the other hand, if there is just a single build command, the whole build job's stdout and stderr can simply be redirected to a log file, and you don't need to exec things back and forth. If you need to enable output selectively for portions of the script, use exec as above; but for wholesale redirection, just redirect the one command.
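In summary, your build script might look something like this:
#!/bin/sh
t=$(mktemp -t -d cibuild.XXXXXXXX) || exit
# On exit: stop the build if it is still running, show the log tails, clean up
trap 'kill "$buildpid" 2>/dev/null; wait "$buildpid" 2>/dev/null; tail -n 500 "$t"/*; rm -rf "$t"' 0
# On HUP, INT, QUIT or TERM, exit so that the cleanup above runs
trap 'exit 1' 1 2 3 15
# Your original commands here
"${initial_process_wd}"/internals/declination/create "${RELEASE_VERSION}" "${CI_COMMIT_REF_NAME}" >"$t"/stdout 2>"$t"/stderr &
buildpid=$!
# Print a heartbeat every three minutes while the build is running
while kill -0 "$buildpid" 2>/dev/null; do
  sleep 180
  date
  tail -n 1 "$t"/*
done
# Pass the build's exit status on to the caller
wait "$buildpid"
exit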
A flaw with this approach is that you lose timing information. A proper solution would let you see when each line was produced and would display standard output and standard error intermixed in the order the messages were printed, perhaps with visible timestamps and even color hints (red timestamps for stderr?).
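One way to approximate that in bash is sketched below: each line is prefixed with a time stamp and a label naming the stream it came from. `stamp` and `run_stamped` are hypothetical helper names. Because stdout and stderr travel through separate pipes, lines from the two streams can still be reordered relative to each other, so this recovers timing but not strict ordering, and it does not attempt coloring.

```shell
#!/bin/bash
# Hypothetical helper: prefix each input line with a time stamp and a label.
stamp() {
  local label=$1 line
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s %s %s\n' "$(date '+%H:%M:%S')" "$label" "$line"
  done
}

# Hypothetical wrapper: run "$@" with stdout lines tagged OUT and stderr lines
# tagged ERR, all written to this function's stdout.
run_stamped() {
  {
    # fd 3 -> pipe into "stamp OUT"; fd 4 -> the function's own stdout.
    # The command's stderr goes into "stamp ERR", its stdout goes to fd 3.
    { "$@" 2>&1 1>&3 3>&- | stamp ERR >&4; } 3>&1 | stamp OUT
  } 4>&1
}

log=$(mktemp) || exit
run_stamped sh -c 'echo building; echo "warning: disk almost full" >&2' > "$log"
cat "$log"
```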
Option 1
If your script writes its error messages to stderr, you can discard everything it writes to stdout with command > /dev/null; /dev/null is a black hole that throws away anything written to it.
Option 2
If your error messages follow a recognizable pattern, you can use grep to keep only those lines.
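A small sketch of option 2, assuming the error lines start with ERROR: or WARNING: (a made-up pattern); `noisy_build` is a hypothetical stand-in for the long-running script:

```shell
#!/bin/bash
# Hypothetical stand-in for the long-running script.
noisy_build() {
  echo "compiling module 1"
  echo "ERROR: module 2 failed" >&2
  echo "compiling module 3"
}

errors=$(mktemp) || exit
# Merge stderr into stdout, then keep only the lines matching the error pattern.
noisy_build 2>&1 | grep -E '^(ERROR|WARNING):' > "$errors"
cat "$errors"
```

The pipeline's exit status is grep's (1 when nothing matched), not the build's; set -o pipefail makes a failing build fail the pipeline too.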
Edit 1:
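To show the commands as they run, pass the -x option to bash, so your command becomes:
bash -x "${initial_process_wd}"/internals/declination/create "${RELEASE_VERSION}" "${CI_COMMIT_REF_NAME}" > /dev/null
bash prints each command it executes to stderr (prefixed with + by default), so the trace still reaches the log while stdout is discarded. The trace covers only the commands in that script itself, not those inside other scripts it runs as separate processes.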
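A quick way to see where the trace goes: in this sketch stdout is discarded and the -x output, which bash writes to stderr, is captured in a temporary file. The two echo commands are placeholders.

```shell
#!/bin/bash
trace=$(mktemp) || exit
# stdout is thrown away; the xtrace lines written to stderr end up in $trace.
bash -x -c 'echo "step one"; echo "step two"' > /dev/null 2> "$trace"
cat "$trace"
```

In bash 4.1 and later, setting BASH_XTRACEFD to an open file descriptor sends the trace there instead of to stderr.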
Edit 2:
If you want to reduce the size of the output file, you can pass it to gzip by using ${initial_process_wd}/internals/declination/create "${RELEASE_VERSION}" "${CI_COMMIT_REF_NAME}" | gzip > logfile.
To read the content of the logfile, you can use zcat logfile.
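A sketch of the same idea that also merges stderr into the compressed log; `noisy_command` is a hypothetical stand-in. It reads the file back with gzip -dc, which is equivalent to zcat.

```shell
#!/bin/bash
# Hypothetical stand-in: three lines on stdout, one on stderr.
noisy_command() { for i in 1 2 3; do echo "step $i"; done; echo "warning" >&2; }

log=$(mktemp) || exit
noisy_command 2>&1 | gzip > "$log"   # both streams, compressed as they arrive
gzip -dc "$log"                      # decompress to stdout, like zcat
```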

Prevent script running with same arguments twice

We are looking into building a logcheck script that will tail a given log file and send an email when the given arguments are found. I am having trouble accurately determining whether another instance of this script is already running with at least one of the same arguments against the same file. The script takes the following arguments:
logcheck -i <filename(s)> <searchCriterion> <optionalEmailAddresses>
I have tried using ps aux with a series of grep, sed, and cut commands, but it always ends up being more code than the script itself and seldom works very efficiently. Is there an efficient way to tell whether another instance of this script is running with the same file name and search criteria? A few examples of input:
EX1 ./logcheck -i file1,file2,file3 "foo string 0123" email#address.com
EX2 ./logcheck -s file1 Hello,World,Foo
EX3 ./logcheck -i file3 foo email#address1.com,email#address2.com
In this case, EX3 should not run, because EX1 is already running with the arguments file3 and foo.
There are many solutions to your problem. I would recommend creating a lock file with the following format:
arg1Ex1 PID#(Ex1)
arg2Ex1 PID#(Ex1)
arg3Ex1 PID#(Ex1)
arg4Ex1 PID#(Ex1)
arg1Ex2 PID#(Ex2)
arg2Ex2 PID#(Ex2)
arg3Ex2 PID#(Ex2)
arg4Ex2 PID#(Ex2)
When your script starts:
It searches the file for each of the arguments it received (with awk or grep).
If one of the arguments is present in the list, it fetches the PID of the process holding it (for example with awk '{print $2}') and checks whether that process is still running (with ps). Double-check this, both because of concurrency and because a process that previously ended abnormally may have left stale entries in the file.
If that PID is still running, the script does not run.
Otherwise, it appends its arguments to the lock file together with its own PID and runs.
At the end of the execution, remove the lines containing the arguments the script used, or simply remove all lines with its PID.
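A minimal bash sketch of that scheme, with two deviations from the answer named plainly: it uses flock(1) from util-linux to make the check-and-append step atomic (one way to handle the concurrency concern), and it tests PIDs with kill -0 instead of ps. kill -0 fails for processes owned by another user, so this assumes all logcheck instances run as the same user. The registry path, the tab-separated format and the function names are assumptions, and comma-separated lists such as file1,file2,file3 must be split into individual values before calling claim_args.

```shell
#!/bin/bash
# Sketch of the lock-file scheme. REGISTRY, the "ARG<TAB>PID" line format and
# both function names are assumptions.
REGISTRY=${REGISTRY:-/tmp/logcheck.registry}

# Succeed and record this process's PID against every argument, unless a
# running process already holds one of them.
claim_args() {
  (
    flock -x 9                        # one checker at a time (util-linux flock)
    : >> "$REGISTRY"                  # create the registry if it is missing
    for arg in "$@"; do
      while IFS=$'\t' read -r held_arg held_pid; do
        # kill -0 only tests whether the PID exists and may be signalled
        if [ "$held_arg" = "$arg" ] && kill -0 "$held_pid" 2>/dev/null; then
          echo "'$arg' is already in use by PID $held_pid" >&2
          exit 1                      # leaves the subshell, so the claim fails
        fi
      done < "$REGISTRY"
    done
    for arg in "$@"; do
      printf '%s\t%s\n' "$arg" "$$" >> "$REGISTRY"
    done
  ) 9> "$REGISTRY.lock"
}

# Remove every entry recorded by this process.
release_args() {
  (
    flock -x 9
    awk -F'\t' -v pid="$$" '$2 != pid' "$REGISTRY" > "$REGISTRY.tmp" &&
      mv "$REGISTRY.tmp" "$REGISTRY"
  ) 9> "$REGISTRY.lock"
}

# Hypothetical use inside logcheck:
#   claim_args file3 foo || exit 1
#   trap release_args EXIT
```

Entries left behind by a process that died are simply ignored by the kill -0 check rather than cleaned up.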

Bash: pipe command output to function as the second argument

In my bash script I have a function for appending messages to the log file. It is used as follows:
addLogEntry (debug|info|warning|error) message
It produces nicely formatted lines with severity indication, timestamp and calling function name.
I've been looking for a way to pass output of some standard commands like rm to this function, while still being able to specify severity as the first argument. I'd also like to capture both stdout and stderr.
Is this possible without using a variable? It just feels excessive to involve variables to record a measly log message, and it encumbers the code too.
You have two choices:
You can add support to your addLogEntry function to have it accept the message from standard input (when no message argument is given or when - is given as the message).
You can use Command Substitution to run the command and capture its output as an argument to your function:
addLogEntry info "$(rm -v .... 2>&1)"
Note that this will lose any trailing newlines in the output however (in case that matters).
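A sketch of the first choice. The question does not show the real addLogEntry, so the log path (LOGFILE), the line format and the use of FUNCNAME for the caller name are assumptions:

```shell
#!/bin/bash
# Hypothetical addLogEntry that also accepts its message on standard input.
LOGFILE=${LOGFILE:-/tmp/script.log}

addLogEntry() {
  local severity=$1 caller=${FUNCNAME[1]:-main} line
  shift
  if [ $# -eq 0 ] || [ "$1" = "-" ]; then
    # No message (or "-"): log one entry per line of standard input,
    # including a last line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
      printf '%s [%s] %s: %s\n' "$(date '+%F %T')" "$severity" "$caller" "$line"
    done
  else
    printf '%s [%s] %s: %s\n' "$(date '+%F %T')" "$severity" "$caller" "$*"
  fi >> "$LOGFILE"
}

# Usage: log both stdout and stderr of rm, one entry per line.
#   rm -v ./old-file 2>&1 | addLogEntry info
```

A function on the right side of a pipe runs in a subshell, which is harmless here because it only appends to a file.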
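You can also use xargs to accomplish this, provided addLogEntry is an executable script rather than a shell function (xargs can only run external commands):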
$ rm -v ... 2>&1 | xargs -I% addLogEntry info %
info removed 'blah1'
info removed 'blah2'
...
With this command, addLogEntry is called once for every line of input.
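If addLogEntry is a shell function rather than a script, xargs cannot call it directly. A bash-only workaround is to export the function and run it through a child bash, sketched here with a toy addLogEntry that just echoes its arguments:

```shell
#!/bin/bash
# Toy addLogEntry that only echoes its arguments (an assumption, for the demo).
addLogEntry() { echo "$1 $2"; }
export -f addLogEntry   # bash-only: makes the function visible to child bash processes

# NUL-delimit the lines so xargs treats quotes and blanks literally, then run
# the function once per line through a child bash.
printf '%s\n' "removed 'blah1'" "removed 'blah2'" | tr '\n' '\0' |
  xargs -0 -I{} bash -c 'addLogEntry info "$1"' _ {}
```

Using -0 stops xargs from interpreting the quotes in rm's output; the price is one bash process per line, which is slow for large inputs.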

Unix: What does cat by itself do?

I saw the line data=$(cat) in a bash script (just declaring an empty variable) and am mystified as to what that could possibly do.
I read the man pages, but it doesn't have an example or explanation of this. Does this capture stdin or something? Any documentation on this?
EDIT: Specifically how the heck does doing data=$(cat) allow for it to run this hook script?
#!/bin/bash
# Runs all executable pre-commit-* hooks and exits after,
# if any of them was not successful.
#
# Based on
# http://osdir.com/ml/git/2009-01/msg00308.html
data=$(cat)
exitcodes=()
hookname=`basename $0`
# Run each hook, passing through STDIN and storing the exit code.
# We don't want to bail at the first failure, as the user might
# then bypass the hooks without knowing about additional issues.
for hook in $GIT_DIR/hooks/$hookname-*; do
test -x "$hook" || continue
echo "$data" | "$hook"
exitcodes+=($?)
done
https://github.com/henrik/dotfiles/blob/master/git_template/hooks/pre-commit
cat will catenate its input to its output.
In the context of the variable capture you posted, the effect is to assign the statement's (or containing script's) standard input to the variable.
The command substitution $(command) will return the command's output; the assignment will assign the substituted string to the variable; and in the absence of a file name argument, cat will read and print standard input.
The Git hook script you found this in captures the commit data from standard input so that it can be repeatedly piped to each hook script separately. You only get one copy of standard input, so if you need it multiple times, you need to capture it somehow. (I would use a temporary file, and quote all file name variables properly; but keeping the data in a variable is certainly okay, especially if you only expect fairly small amounts of input.)
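A sketch of the temporary-file variant mentioned above: standard input is saved once and then replayed to each consumer. `count_lines`, `first_line` and `replay_stdin` are hypothetical stand-ins for the individual hooks and the loop that runs them:

```shell
#!/bin/bash
# Hypothetical consumers standing in for the individual hook scripts.
count_lines() { wc -l; }
first_line() { head -n 1; }

# Save standard input once, then give every consumer its own full copy.
replay_stdin() {
  local input consumer
  input=$(mktemp) || return
  cat > "$input"
  for consumer in "$@"; do
    "$consumer" < "$input"
  done
  rm -f "$input"
}

printf 'alpha\nbeta\n' | replay_stdin count_lines first_line
```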
Doing:
t#t:~# temp=$(cat)
hello how
are you?
t#t:~# echo $temp
hello how are you?
(A single Ctrl-D on a line by itself, following "are you?", terminates the input.)
As the manual says:
cat - concatenate files and print on the standard output
Also
cat Copy standard input to standard output.
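Here, cat copies your standard input, and the command substitution stores all of it as a single string in the variable temp. (The newline disappears in the output above only because $temp is unquoted; echo "$temp" would preserve it.)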
Say your bash script script.sh is:
#!/bin/bash
data=$(cat)
Then, the following commands will store the string STR in the variable data:
echo STR | bash script.sh
bash script.sh < <(echo STR)
bash script.sh <<< STR

Difference between "command > log.txt" and "command 1>& log.txt" in Linux command shell?

When I run the command haizea -c simulated.conf > result.txt, the program (haizea) still prints its output to the screen. But when I try haizea -c simulated.conf 1>& result.txt, the output is now on the file result.txt. I'm quite confused about this situation. What is the difference between > and 1>&, then?
What you're still seeing on the terminal is the standard error of your process. Standard output and standard error both go to the same terminal device by default (when no redirection is in effect).
The redirection >&xyz redirects both standard output and error to the file xyz.
By extension, you might expect N>&xyz to redirect file handle N and standard error to your file, but bash only special-cases 1>&xyz: it is equivalent to >&xyz, which is also equivalent to >xyz 2>&1. With any other descriptor number and a file name, bash reports an "ambiguous redirect".
The number before the > stands for the descriptor.
Standard Input - 0
Standard Output - 1
Standard Error - 2
When the target after >& is a file name rather than a descriptor number, the & sends both standard output and standard error to that file.
http://linuxdevcenter.com/pub/a/linux/lpt/13_01.html#doc2ac15b1c13
> redirects standard output alone.
>& or &> or 1>& redirect both standard output and standard error.
Your program prints to standard error, which is not redirected in your first command.
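A short demonstration of the three forms, with `noisy` as a hypothetical stand-in for haizea. The 1>& form relies on a bash special case; the last line is the portable spelling:

```shell
#!/bin/bash
# Hypothetical stand-in for haizea: one line on stdout, one on stderr.
noisy() { echo "to stdout"; echo "to stderr" >&2; }

dir=$(mktemp -d) || exit
noisy > "$dir/gt.txt"            # "to stderr" still appears on the terminal
noisy 1>& "$dir/dup.txt"         # bash special case: same as &> "$dir/dup.txt"
noisy > "$dir/posix.txt" 2>&1    # portable way to send both streams to the file
```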
