Find patterns and rename multiple files - linux

I have a list of machine names and hostnames, for example:
# cat /tmp/machine_list.txt
[one]apple machine #1 myserver1
[two]apple machine #2 myserver2
[three]apple machine #3 myserver3
There is one directory per server, and each directory contains a tar file and a file with the host name written in it.
# ls /tmp/sos1/*
sosreport1.tar.gz
hostname_map.txt
# cat /tmp/sos1/hostname_map.txt
myserver1
# ls /tmp/sos2/*
sosreport2.tar.gz
hostname_map.txt
# cat /tmp/sos2/hostname_map.txt
myserver2
# ls /tmp/sos3/*
sosreport3.tar.gz
hostname_map.txt
# cat /tmp/sos3/hostname_map.txt
myserver3
Is it possible to rename each sosreport*.tar.gz by looking up the hostname_map.txt in its directory against the /tmp/machine_list.txt file, like below?
# ls /tmp/sos1/*
[one]apple_machine_#1_myserver1_sosreport1.tar.gz
# ls /tmp/sos2/*
[two]apple_machine_#2_myserver2_sosreport2.tar.gz
# ls /tmp/sos3/*
[three]apple_machine_#3_myserver3_sosreport3.tar.gz
I can rename a single file, but how do I rename all of them in one go?

Something like this?
srvname () {
    awk -v srv="$(cat "$1")" -F '\t' '$2==srv { print $1; exit }' machine_list.txt
}
for dir in /tmp/sos*/; do
    server=$(srvname "$dir"/hostname_map.txt)
    mv "$dir"/sosreport*.tar.gz "$dir/$server.tar.gz"
done
Demo: https://ideone.com/TS5VyQ
The function assumes your mapping file is tab-delimited. If you want underscores instead of spaces in the server names, change the mapping file.
This should be portable to POSIX sh; the cat could be replaced with a Bash redirection, but I feel that it's not worth giving up portability for such a small change.
If this were my project, I'd probably make the function into a self-contained reusable script (with the input file replaced with a here document in the script itself) since there will probably be more situations where you need to perform the same mapping.
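For the exact names shown in the question (spaces turned into underscores, original file name kept as a suffix), a minimal sketch could look like the following. It assumes the list is space-separated exactly as shown in the question, with the host name as the last field of each line; the function name rename_reports is made up for illustration.

```shell
# rename_reports LIST DIR...
# Hypothetical helper: assumes each line of LIST ends with the host name.
rename_reports() {
    list=$1
    shift
    for dir in "$@"; do
        dir=${dir%/}
        host=$(cat "$dir/hostname_map.txt") || continue
        # Take the whole matching line and turn its spaces into underscores
        label=$(awk -v h="$host" '$NF == h { print; exit }' "$list" | tr ' ' '_')
        if [ -z "$label" ]; then
            echo "no entry for host '$host' in $list" >&2
            continue
        fi
        for f in "$dir"/sosreport*.tar.gz; do
            [ -e "$f" ] || continue
            mv "$f" "$dir/${label}_$(basename "$f")"
        done
    done
}
```

It would be called as rename_reports /tmp/machine_list.txt /tmp/sos*/ and is safe to run twice, because renamed files no longer match the sosreport*.tar.gz pattern.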

Related

How does one create a wrapper around a program?

I want to learn to create a wrapper around a program in linux. How does one do this? A tutorial reference web-page/link or example will do. To clarify what I want to learn, I will explain with an example.
I use vim for editing text files, and rcs as my simple revision control system. rcs allows you to check in and check out files. I would like to create a wrapper program named vir which, when I type in the shell:
$ vir temp.txt
will load the file temp.txt into rcs with ci -u temp.txt and then allows me to edit the file using vim.
When I get out and go back in, it will need to check out the file first, using co -l temp.txt, and allow me to edit the file as one normally does with vim. Then, when I save and exit, it should check in the file using ci -u temp.txt, and as part of that I should be able to add a version control comment.
Basically, all I want to be doing on the command line is:
$ vir temp.txt
as one would with vim. And the wrapper should take care of the version control for me.
Take a look at rcsvers.vim, a vim plugin for automatically saving versions in RCS; you could modify that. There are also other RCS plugins for vim at vim.org
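If a plugin is more than you need, the workflow from the question can be sketched as a small shell function. This is only an outline under stated assumptions: it relies on the standard RCS commands co -l (check out with a lock) and ci -u (check in, keep a read-only working copy), and it treats the presence of an RCS history file (FILE,v) as the sign that the file is already under version control.

```shell
# vir FILE: edit FILE with vim, keeping its history in RCS (sketch)
vir() {
    [ "$#" -eq 1 ] || { echo "usage: vir FILE" >&2; return 2; }
    file=$1
    dir=$(dirname "$file")
    base=$(basename "$file")
    # An existing history file means the file has been checked in before
    if [ -e "$dir/RCS/$base,v" ] || [ -e "$file,v" ]; then
        co -l "$file" || return    # lock it so the working copy is writable
    fi
    vim "$file" || return
    # ci asks for the log message (or the initial description) itself
    ci -u "$file"
}
```

Put the function in your shell startup file, or turn its body into a script named vir on your PATH.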
I have a wrapper that enhances the ping command (using zsh); maybe it can help you:
# ping command wrapper - Last Change: out 27 2019 18:47
# source: https://www.cyberciti.biz/tips/unix-linux-bash-shell-script-wrapper-examples.html
ping(){
# Name: ping() wrapper
# Arg: (url|domain|ip)
# Purpose: Send ping request to domain by removing urls, protocol, username:pass using system /usr/bin/ping
local array=( "$@" ) # get all args in an array
local host=${array[-1]} # get the last arg
local args=${array[1,-2]} # get all args before last arg in $@
#local _ping="/usr/bin/ping"
local _ping="/bin/ping"
local c=$(_getdomainnameonly "$host")
[ "$host" != "$c" ] && echo "Sending ICMP ECHO_REQUEST to \"$c\"..."
# pass args and host
# $_ping $args $c
# default args for ping
$_ping -n -c 2 -i 1 -W1 $c
}
_getdomainnameonly(){
# Name: _getdomainnameonly
# Arg: Url/domain/ip
# Returns: Only domain name
# Purpose: Get domain name and remove protocol part, username:password and other parts from url
# get url
local h="$1"
# upper to lowercase
local f="${h:l}"
# remove protocol part of hostname
f="${f#http://}"
f="${f#https://}"
f="${f#ftp://}"
f="${f#scp://}"
f="${f#sftp://}"
# Remove username and/or username:password part of hostname
f="${f#*:*@}"
f="${f#*@}"
# remove all /foo/xyz.html*
f=${f%%/*}
# show domain name only
echo "$f"
}
It hides the system ping behind a shell function that is also called "ping". Shell functions take precedence over commands found on your PATH, so once the function is defined, typing ping runs the function first. Inside the function I define a local variable called _ping that points to the real ping command:
local _ping="/bin/ping"
You can also notice that the args are stored in one array.
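The same shadowing idea can be sketched in POSIX sh (so it also works in bash) without hard-coding the path to the binary: the command builtin skips functions and runs the real program. The helper name strip_url is invented for this example, and the -n -c 2 options are simply copied from the wrapper above.

```shell
# strip_url URL: print only the host part (runs in a subshell, so f stays private)
strip_url() (
    f=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    f=${f#*://}     # drop a leading protocol such as http://
    f=${f#*@}       # drop user or user:password
    f=${f%%/*}      # drop the path
    printf '%s\n' "$f"
)

# ping wrapper: the function shadows the real command of the same name
ping() {
    [ "$#" -gt 0 ] || { command ping; return; }
    for target in "$@"; do :; done    # after the loop, target is the last argument
    command ping -n -c 2 "$(strip_url "$target")"
}
```

With this loaded, ping https://user:pw@Example.com/index.html would end up pinging example.com.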

Iterate through files in a directory, create output files, linux

I am trying to iterate through every file in a specific directory (called sequences), and perform two functions on each file. I know that the functions (the 'blastp' and 'cat' lines) work, since I can run them on individual files. Ordinarily I would have a specific file name as the query, output, etc., but I'm trying to use a variable so the loop can work through many files.
(Disclaimer: I am new to coding.) I believe that I am running into serious problems with trying to use my file names within my functions. As it is, my code will execute, but it creates a bunch of extra unintended files. This is what I intend for my script to do:
Line 1: Iterate through every file in my "sequences" directory. (All of which end with ".fa", if that is helpful.)
Line 3: Recognize the filename as a variable. (I know, I know, I think I've done this horribly wrong.)
Line 4: Run the blastp function using the file name as the argument for the "query" flag, always use "database.faa" as the argument for the "db" flag, and output the result in a new file that has the same name as the initial file, but with ".txt" at the end.
Line 5: Output parts of the output file from line 4 into a new file that has the same name as the initial file, but with "_top_hits.txt" at the end.
for sequence in ./sequences/{.,}*;
do
echo "$sequence";
blastp -query $sequence -db database.faa -out ${sequence}.txt -evalue 1e-10 -outfmt 7
cat ${sequence}.txt | awk '/hits found/{getline;print}' | grep -v "#">${sequence}_top_hits.txt
done
When I ran this code, it gave me six new files derived from each file in the directory (and they were all in the same directory - I'd prefer to have them all in their own folders. How can I do that?). They were all empty. Their suffixes were, ".txt", ".txt.txt", ".txt_top_hits.txt", "_top_hits.txt", "_top_hits.txt.txt", and "_top_hits.txt_top_hits.txt".
If I can provide any further information to clarify anything, please let me know.
If you're only interested in *.fa files I would limit your input to only those matching files like this:
for sequence in sequences/*.fa;
do
I can suggest the following improvements:
for fasta_file in ./sequences/*.fa # ";" is not necessary if you already have a new line for your "do"
do
    # ${variable%something} is $variable with the trailing
    # string "something" removed
    # basename path/to/file is the name of the file
    # without the full path
    # $(some command) allows you to use the result of the command as a string
    # Combining the above, we can form a string based on our fasta file
    # This string can be useful to name stuff in a clean manner later
    sequence_name=$(basename "${fasta_file%.fa}")
    echo "${sequence_name}"
    # Create a directory for the results for this sequence
    # -p option avoids a failure in case the directory already exists
    mkdir -p "${sequence_name}"
    # Define the name of the file for the results
    # (including our previously created directory in its path)
    blast_results=${sequence_name}/${sequence_name}_blast.txt
    blastp -query "${fasta_file}" -db database.faa \
        -out "${blast_results}" \
        -evalue 1e-10 -outfmt 7
    # Define a file name for the top hits
    top_hits=${sequence_name}/${sequence_name}_top_hits.txt
    # alternatively, using "%"
    #top_hits=${blast_results%_blast.txt}_top_hits.txt
    # No need to cat: awk can take a file as argument
    awk '/hits found/{getline;print}' "${blast_results}" \
        | grep -v "#" > "${top_hits}"
done
I made more intermediate variables, with (hopefully) meaningful names.
I used \ to escape line ends and allow putting commands in several lines.
I hope this improves code readability.
I haven't tested. There may be typos.
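To check the naming logic without running blastp, the path construction can be pulled into a small helper and tried on its own. The function name result_paths is made up for this sketch. It also helps to know why the original loop produced six files per input: {.,}* expands to both .* and *, so it matches every file in the directory, including the .txt outputs written there by earlier runs.

```shell
# result_paths FASTA_FILE: print the two output paths the loop would use
result_paths() {
    fasta_file=$1
    # basename with a second argument also strips that suffix
    name=$(basename "$fasta_file" .fa)
    printf '%s\n' "$name/${name}_blast.txt" "$name/${name}_top_hits.txt"
}

# Example: result_paths ./sequences/seq1.fa
```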
You should be using *.fa if you only want files with a .fa ending. Additionally, if you want to redirect your output to new folders you need to create those directories somewhere using
mkdir 'folder_name'
then you need to redirect your -o outputs to those files, something like this
'command' -o /path/to/output/folder
To help you test this script out, you can run each line one by one to test them. You need to make sure each line works by itself before combining.
One last thing: be careful with your use of semicolons. It should look something like this:
for filename in *.fa; do 'command'; done

One liner to append a file into another file but only if it hasn't already been added

I have an automated process that has a number of lines like the following pattern:
sudo cat /some/path/to/a/file >> /some/other/file
I'd like to transform that into a one liner that will only append to /some/other/file if /some/path/to/a/file has not already been added.
Edit
It's clear I need some examples here.
example 1: Updating a .bashrc script for a specific login
example 2: Creating a .screenrc for different logins
example 3: Appending to the end of a /etc/ config file
Some other caveats. The text is going to be added in a block (>>). Consequently, it should be relatively straightforward to see whether the entire block has already been added near the end of a file. I am trying to come up with a simple method for determining whether or not the file has already been appended to the original.
Thanks!
Example python script...
def check_for_appended(new_file, original_file):
    """ Checks original_file to see if it has the contents of new_file """
    new_lines = reversed(new_file.split("\n"))
    original_lines = reversed(original_file.split("\n"))
    appended = None
    for new_line, orig_line in zip(new_lines, original_lines):
        if new_line != orig_line:
            appended = False
            break
        else:
            appended = True
    return appended
Maybe this will get you started - this GNU awk script:
gawk -v RS='^$' 'NR==FNR{f1=$0;next} {print (index($0,f1) ? "present" : "absent")}' file1 file2
will tell you if the contents of "file1" are present in "file2". It cannot tell you why, e.g. because you previously concatenated file1 onto the end of file2.
Is that all you need? If not update your question to clarify/explain.
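Since the question says the block always lands at the end of the target, a narrower check is to compare only the tail of the target with the source. The sketch below uses tail -c and cmp for that; the function name append_once is hypothetical, and the check only recognises the source if it was the last thing appended.

```shell
# append_once SRC DST: append SRC to DST unless DST already ends with SRC
append_once() {
    src=$1
    dst=$2
    size=$(( $(wc -c < "$src") ))     # byte count of SRC, whitespace stripped
    # Compare the last $size bytes of DST with SRC
    if [ -f "$dst" ] && tail -c "$size" "$dst" | cmp -s - "$src"; then
        return 0                      # already there, nothing to do
    fi
    cat "$src" >> "$dst"
}
```

If the target needs root, the whole function has to run with elevated rights, since a redirection after sudo cat is still performed by the calling shell.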
Here's a technique to see if a file contains another file
contains_file_in_file() {
local small=$1
local big=$2
awk -v RS="" '{small=$0; getline; exit !index($0, small)}' "$small" "$big"
}
if ! contains_file_in_file /some/path/to/a/file /some/other/file; then
sudo cat /some/path/to/a/file >> /some/other/file
fi
EDIT: Op just told me in the comments that the files he wants to concatenate are bash scripts -- this brings us back to the good ole C preprocessor include guard tactics:
prepend every file with
if [ -z "$__<filename>__" ]; then __<filename>__=1
(of course replacing <filename> with the name of the file) and at the end
fi
this way, you surround the script in each file with a test for something that's only true once.
Does this work for you?
sudo sh -c 'set -o noclobber; date > /tmp/testfile'
noclobber prevents overwriting an existing file.
I think it doesn't, since you wrote you want to append something but this technique might help.
When the appending all occurs in one script, then use a flag:
if [ -z "${appended_the_file}" ]; then
cat /some/path/to/a/file >> /some/other/file
appended_the_file="Yes I have done it except for permission/right issues"
fi
I would continue into writing a function appendOnce { .. }, with the content above. If you really want an ugly oneliner (ugly: pain for the eye and colleague):
test -z "${ugly}" && cat /some/path/to/a/file >> /some/other/file && ugly="dirt"
Combining this with sudo:
test -z "${ugly}" && sudo sh -c 'cat /some/path/to/a/file >> /some/other/file' && ugly="dirt"
It appears that what you want is a collection of script segments which can be run as a unit. Your approach -- making them into a single file -- is hard to maintain and subject to a variety of race conditions, making its implementation tricky.
A far simpler approach, similar to that used by most modern Linux distributions, is to create a directory of scripts, say ~/.bashrc.d and keep each chunk as an individual file in that directory.
The driver (which replaces the concatenation of all those files) just runs the scripts in the directory one at a time:
if [[ -d ~/.bashrc.d ]]; then
for f in ~/.bashrc.d/*; do
if [[ -f "$f" ]]; then
source "$f"
fi
done
fi
To add a file from a skeleton directory, just make a new symlink.
add_fragment() {
    if [[ -f "$FRAGMENT_SKELETON/$1" ]]; then
        # The following will silently fail if the symlink already
        # exists. If you wanted to report that, you could add || echo...
        # Note: the tilde must stay outside the quotes, or it is not expanded.
        ln -s "$FRAGMENT_SKELETON/$1" ~/.bashrc.d/"$1" 2>/dev/null
    else
        echo "Not a valid fragment name: '$1'" >&2
        return 1
    fi
}
Of course, it is possible to effectively index the files by contents rather than by name. But in most cases, indexing by name will work better, because it is robust against editing the script fragment. If you used content checks (md5sum, for example), you would run the risk of having an old and a new version of the same fragment, both active, and without an obvious way to remove the old one.
But it should be straight-forward to adapt the above structure to whatever requirements and constraints you might have.
For example, if symlinks are not possible (because the skeleton and the instance do not share a filesystem, for example), then you can copy the files instead. You might want to avoid the copy if the file is already present and has the same content, but that's just for efficiency and it might not be very important if the script fragments are small. Alternatively, you could use rsync to keep the skeleton and the instance(s) in sync with each other; that would be a very reliable and low-maintenance solution.
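The copy-instead-of-symlink variant described above might be sketched like this. The function name install_fragment and its two-argument interface are assumptions made for the example; cmp -s is used to skip the copy when the destination already has identical content.

```shell
# install_fragment SRC DEST_DIR: copy SRC into DEST_DIR unless an identical copy exists
install_fragment() {
    src=$1
    dest_dir=$2
    dest=$dest_dir/$(basename "$src")
    [ -f "$src" ] || { echo "Not a valid fragment: '$src'" >&2; return 1; }
    mkdir -p "$dest_dir" || return
    if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
        return 0        # same content already installed
    fi
    cp "$src" "$dest"
}
```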

How do you format output string in bash script for input by another script?

I need to unzip a bunch of student assignment (jar) files so that I can use a script to submit the contents to the Moss (Stanford) plagiarism detection server. I did the same thing in Java, which was trivial, but I'm trying to re-implement it as a bash script.
I am trying to do the following:
Get a list of student names (each student has a directory).
In each student directory, sub-directories exist numbered from 1 to the
latest submission. I need to get the directory with the highest
number.
Inside of each of those submission directories contains a
jar file that I need. I copy each jar into a temp directory with the
same name as the student and unzip it.
I need that temp directory listing formatted as a string in the form
/tempDir/studentName1/*.languageExt /tempDir/studentName2/*.languageExt
The student directory has the basic structure:
Student_Root_Directory:
Student1
Student2
Student1
Sub-Directories: 1 2 3 4 5
1: student1.jar
2: student1.jar
...
Student2
Sub-Directories: 1 2 3
1. student2.jar
...
To do the first 3 steps above I did:
#!/bin/bash
# Extract all jar files into a temp directory called /home/moss/tempJarFiles/studentName
# $1 is the command line argument that contains the path to the institution submission dir.
# $2 is the language extension: .c, .cpp, .java, .py
students=`ls $1`
student_dir=$1
languageExt=$2
mossDir="/home/moss"
tempDir="/home/moss/tempJarStorage"
for student in $students
do
latestSubmissionDir=`ls -t $student_dir/$student | head -1`
for jarDir in $latestSubmissionDir
do
mkdir $tempDir/$student
cp $student_dir/$student/$jarDir/*.jar $tempDir/$student
unzip -d $tempDir/$student/ -o -j $tempDir/$student/$student.jar *.$languageExt
rm $tempDir/$student/$student.jar
done
done
...which results in a number of student directories being created in a temp directory that contains only the unzipped contents for the student submissions.
I need the ls output of the new temp directories formatted as a string that contains:
/tempDir/studentName1/*.languageExt /tempDir/studentName2/*.languageExt
I have tried variations on
find "$tempDir" -iname "*.$languageExt" -printf "%p/*.$languageExt"
using iname and not - but I either have output that contains extra directory information such as $tempDir/*.languageExt (when I just need the subdirectories $tempDir/$studentName/*.languageExt) or I have output where the path for every source file is also listed such as:
$tempDir/$studentName/studentNameA.java
$tempDir/$studentName/studentNameB.java
when I only need
$tempDir/$studentName/*.java
I think this should be really easy and I'm just over thinking it. Any hints for improving the script also appreciated.
Here's a revised version of the script that may work:
#!/bin/bash
# Extract all jar files into a temp directory called /home/moss/tempJarFiles/studentName
# $1 is the command line argument that contains the path to the institution submission dir.
# $2 is the language extension: c, cpp, java, py
students_dir=$1
languageExt=$2
studentPathsT=( "$students_dir"/*/ )
mossDir='/home/moss'
tempDir='/home/moss/tempJarStorage'
for studentPathT in "${studentPathsT[@]}"; do
student=$(basename "$studentPathT")
mkdir "$tempDir/$student"
submissionDirsT=( "$studentPathT"*/ )
latestSubmissionDirT=${submissionDirsT[${#submissionDirsT[@]}-1]}
cp "$latestSubmissionDirT"*.jar "$tempDir/$student/"
unzip -d "$tempDir/$student/" -o -j "$tempDir/$student/*.jar" "*.$languageExt"
rm "$tempDir/$student"/*.jar
done
# Note that at this point `"$tempDir"/*/*.$languageExt` would expand
# to all extracted submission files, across all students.
# Finally, output each student's extracted files as an unexpanded glob à la
# /{tempDir}/{studentName1}/*.{languageExt}
for pT in "$tempDir"/*/; do
echo "$pT*.$languageExt"
# Note: If there is a chance that your filenames contain
# embedded newlines (rare in practice) using `echo` won't work properly
# as @Charles Duffy points out.
# If that is a concern, use
# printf '%s\0' "$pT*.$languageExt"
# and process the output with a utility that can process NUL characters
# as separators, such as `xargs -0`.
done
It avoids using ls and only uses pathname expansion and array variables so as to properly deal with paths that contain embedded spaces and other shell metacharacters.
The suffix ...T in variable names indicates that a particular path or array of paths is *T*erminated, i.e., that it ends in a /.
The assumption is that the numbered subdirectories do not go beyond 9, as the implicit lexical sorting of pathname expansion is relied upon; if the numbers go higher, explicit numerical sorting must be applied.
Note that the globs (pathname patterns) passed to unzip are intentionally double-quoted, as they should be interpreted by unzip, not the shell.
Note that, based on your original code, I've assumed that $languageExt does NOT start with . (e.g., cpp rather than .cpp), despite what your comment says.
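If the submission numbers can go beyond 9, the highest-numbered subdirectory has to be picked by numeric comparison rather than by glob order. A minimal sketch, assuming the submission directories are named with plain decimal numbers (the function name latest_submission_dir is made up here):

```shell
# latest_submission_dir STUDENT_DIR: print the highest-numbered subdirectory, with a trailing /
latest_submission_dir() {
    student_dir=${1%/}
    latest=
    for d in "$student_dir"/*/; do
        [ -d "$d" ] || continue
        n=${d%/}
        n=${n##*/}                              # directory name only
        case $n in
            ''|*[!0-9]*) continue ;;            # skip names that are not numbers
        esac
        if [ -z "$latest" ] || [ "$n" -gt "$latest" ]; then
            latest=$n
        fi
    done
    [ -n "$latest" ] && printf '%s\n' "$student_dir/$latest/"
}
```

In the script above, its output could replace the array indexing used for latestSubmissionDirT.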

cat multiple files over one ssh connection and get return value for each

As said in the title, I'm trying to cat multiple files over one ssh connection (the content needs to be appended to existing files on the host) and get a return value for each, i.e. whether the cat for that particular file was successful or not.
Up to now, I did this for each file individually, by repeating the following command for each file and checking the return value.
cat specific_file | ssh user@host -i /root/.ssh/id_rsa "cat >> result/specific_file"
I then checked the return value for each transfer (automatically) and could thereby determine the status for each file. My question is: is it possible to do this over one single ssh connection, but still obtain a return value for every single file?
Thanks in advance !
EDIT:
(b) As you can see in the following, I generate a command for one specific file and then check the return code. The method containing this part of the program is then called for the different types o
If the file was successfully logged, it can be removed from the client, and that further logging to the type of file is (still) allowed. If the file is exceeding its limits, this bit is unset in another part of the program.
sprintf(command,"/bin/cat /root/%s%s | /usr/bin/ssh log-bot@192.168.5.1 -i"
" /root/.ssh/id_rsa \"/bin/cat >> result/%s%s\"", apmac,
ending_str[source], apmac, ending_str[dest]);
rc = system(command);
if(rc != 0)
{
sprintf(buffer, "[%i] - LOG ERROR from %s to %s CODE %i ", (int)time(0),
ending_str[source], ending_str[dest], rc);
sprintf(filename,"%s%s%s", LOCAL, apmac, ending_str[source]);
}
else
{
sprintf(filename,"%s%s%s", LOCAL, apmac, ending_str[source]);
remove(filename);
sprintf(buffer, "[%i] - LOG process from %s to %s ok", (int)time(0),
ending_str[source], ending_str[dest]);
switch(source)
{
case PROBE_FILE:
LOG_MASK_SET(globals->log_mask, LOG_MASK_PROB);
break;
case LIST_FILE:
LOG_MASK_SET(globals->log_mask, LOG_MASK_LIST);
break;
case SCAN_FILE:
LOG_MASK_SET(globals->log_mask, LOG_MASK_SCAN);
break;
default:
/* Other cases not YET considered */
break;
}
}
SECOND EDIT:
I'm running this code on a router as part of a program. Please note that I'm not allowed to add new libraries or non-basic functions to the system. In addition, the ssh client on the system itself doesn't allow "-M" mode.
Edit in response to the added info (and code):
To the code: I'd strongly consider writing a script/program on the receiving end to talk to the sending process over the ssh pipe. That way you have full flexibility.
The simplest thing that could work, would still appear to be sending an archive over to the receiving host. On the receiving end, filter the archive with a script that
untars each file into a temporary location
tries the appending operation cat >> specific_file
prints a 'result record' to stdout as feedback to the sender
So you'd do:
tar cf - file1 file2 file3 |
ssh log-bot@remote /home/log-bot/handle_logappends.sh |
while read -r resultcode filename
do
echo "$filename resulted in code $resultcode"
done
To handle the feedback in C/C++ you'd look at popen, that will allow you to read the streaming feedback as if from a file, simple!
An example of such a handle_logappends.sh script on the receiving end:
#!/bin/bash
set -e # bail on error
TEMPDIR="/tmp/.receiving_$RANDOM"
mkdir "$TEMPDIR"
trap "rm -rf '$TEMPDIR/'" INT ERR EXIT
tar xvf - -C "$TEMPDIR/" | while read -r filename
do
    echo "unpacked file $filename" >&2
    ## implement your file append logic here :)
    ## e.g. (?):
    ## ("|| rc=$?" keeps "set -e" from aborting before the feedback is printed)
    rc=0
    cat "$TEMPDIR/$filename" >> "result/$filename" || rc=$?
    ## HERE COMES THE FEEDBACK PART: '<code> <filename>'
    echo "$rc" "$filename"
done
The really neat part of this is, that since everything is in streaming mode, the feedback for the first file(s) may be arriving while the sending tar is still sending the later files to the receiving host. No unnecessary delays!
I included a tiny bit of sane error handling/cleanup but I would suggest
perhaps receiving the whole archive first, then iterating through the files?
doing the appends in atomic fashion (i.e. on a copy, then move the copy into place only if the whole append operation succeeded; this prevents partially appended logs)
Hope that helps!
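The atomic append suggested above could be sketched as follows: build the new content in a temporary file next to the target and only move it into place once everything has been written. The function name atomic_append is hypothetical, and the sketch assumes mktemp is available on the receiving host.

```shell
# atomic_append SRC DST: DST is replaced only if the whole append succeeded
atomic_append() {
    src=$1
    dst=$2
    # Temp file in the same directory, so the final mv is a rename on one filesystem
    tmp=$(mktemp "$dst.XXXXXX") || return 1
    if [ -e "$dst" ]; then
        cat "$dst" > "$tmp" || { rm -f "$tmp"; return 1; }
    fi
    if cat "$src" >> "$tmp" && mv "$tmp" "$dst"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}
```

One side effect to be aware of: the replaced file takes on the permissions of the temporary file created by mktemp, so a chmod may be needed afterwards.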
Older answer:
You'd usually employ devious little tricks (not) like:
tar cf - file1 file2 file3 | ssh user@host -i /root/.ssh/id_rsa "tar xf - -C result/"
Add a verbose flag to see progress details:
tar cf - file1 file2 file3 | ssh user@host -i /root/.ssh/id_rsa "tar xvf - -C result/"
If you want, you can substitute cpio for tar. Add options to get more functionality (-p for preserve permissions, e.g.)
To do various separate steps over a single logical connection, you can use a ssh Master connection:
# Assumes ControlPath is set for this host in ~/.ssh/config so later calls can reuse the master
ssh user@host -i /root/.ssh/id_rsa -MNf # login, master, background without a command
for specific_file in file1 file2 file3
do
cat "$specific_file" |
ssh user@host -i /root/.ssh/id_rsa "cat >> 'result/$specific_file'"
# check/use error code
done
How about building on libssh2 instead of scripting ssh, and using the sftp subsystem instead of building your own file-transfer system in shell?
There's an example of performing one file append in libssh2/examples/sftp_append.c, just repeat it for the multiple files you want.
If you look at the problem from a different tactical view, you could cat all the files over from another master file. That master file is a shell script that has here documents embedded with the files' contents. Then exec the master shell script and ls the files, all in one ssh session. It's not pretty or elegant but will be successful.
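A rough sketch of that here-document idea: a generator prints one shell script that appends every file and echoes a "<code> <filename>" line after each append, and that script is piped to a shell on the other side. The function name make_append_script and the delimiter __END_OF_FILE__ are invented for this example. It assumes text files that end with a newline, file names without quotes, and content that never contains the delimiter line.

```shell
# make_append_script FILE...: print a script that appends each FILE to result/FILE
make_append_script() {
    for f in "$@"; do
        name=$(basename "$f")
        # Quoted delimiter: the receiving shell does not expand anything in the body
        printf "cat >> 'result/%s' <<'__END_OF_FILE__'\n" "$name"
        cat "$f"
        printf '%s\n' '__END_OF_FILE__'
        # Report the exit status of the append that just ran
        printf 'echo "$? %s"\n' "$name"
    done
}
```

Usage would be along the lines of make_append_script file1 file2 | ssh user@host -i /root/.ssh/id_rsa sh, reading the status lines back from ssh's standard output.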

Resources