Take parts of the standard output and put them into an array variable - linux

I'm currently working on a script (using Bash) which backs up VM files to a remote server.
I want to make the script a bit more dynamic by looping through each VM listed by a "show VMs" command. My idea is to take the standard output of a command which shows all the VMs and break it up into useful variables, possibly a multi-dimensional array.
The output looks like this. Is there any way to break it all up, say by spaces and line breaks?
Vmid Name File Guest OS Version Annotation
10 FREEPBX [datastore2] FREEPBX/FREEPBX.vmx other26xLinux64Guest vmx-08
13 AdaptivNICE2Cloud [datastore2] AdaptivNICE2Cloud/AdaptivNICE2Cloud.vmx other26xLinux64Guest vmx-08
15 IVSTelManager [datastore2] IVSTelManager/IVSTelManager.vmx debian6Guest vmx-08
4 Neptune [datastore1] Neptune/Neptune.vmx winNetEnterprise64Guest vmx-08
9 Kayako [datastore2] Kayako/Kayako.vmx other26xLinux64Guest vmx-08

I guess you need this:
$ vim-cmd vmsvc/getallvms | sed -n 's|.*\[|/vmfs/volumes/|;s|\] *|/|;s|\.vmx .*|.vmx|p'
/vmfs/volumes/datastore2/FREEPBX/FREEPBX.vmx
/vmfs/volumes/datastore2/AdaptivNICE2Cloud/AdaptivNICE2Cloud.vmx
/vmfs/volumes/datastore2/IVSTelManager/IVSTelManager.vmx
/vmfs/volumes/datastore1/Neptune/Neptune.vmx
/vmfs/volumes/datastore2/Kayako/Kayako.vmx
# Prints all VMX files paths
OR
$ vim-cmd vmsvc/getallvms | sed -n 's|.*\[|/vmfs/volumes/|;s|\] *|/|;s|/[^/]*\.vmx .*||p'
/vmfs/volumes/datastore2/FREEPBX
/vmfs/volumes/datastore2/AdaptivNICE2Cloud
/vmfs/volumes/datastore2/IVSTelManager
/vmfs/volumes/datastore1/Neptune
/vmfs/volumes/datastore2/Kayako
# Prints all directories containing VMX files. These directories also contain the virtual HDDs, which you would want to back up.
(Ignore the $ in the prompt; it is still a root prompt. SO would interpret it as a comment if I used # in place of $.)
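If you do want the fields in array variables, as the question asks, a minimal sketch could look like the following. It needs bash (plain POSIX sh has no arrays) and assumes the column layout shown in the question, with no spaces in VM names or paths. The helper sample_vms is hypothetical and only stands in for vim-cmd vmsvc/getallvms so that the snippet runs anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `vim-cmd vmsvc/getallvms` (sample rows from the question).
sample_vms() {
cat <<'EOF'
Vmid Name File Guest OS Version Annotation
10 FREEPBX [datastore2] FREEPBX/FREEPBX.vmx other26xLinux64Guest vmx-08
4 Neptune [datastore1] Neptune/Neptune.vmx winNetEnterprise64Guest vmx-08
EOF
}

# Three parallel arrays; index i describes one VM.
vm_ids=() vm_names=() vm_dirs=()

# read splits each row on whitespace; "_" swallows the remaining columns.
while read -r id name store path _; do
    [[ $id =~ ^[0-9]+$ ]] || continue    # skip the header and malformed lines
    store=${store#\[}                    # strip the brackets around the datastore
    store=${store%\]}
    vm_ids+=("$id")
    vm_names+=("$name")
    vm_dirs+=("/vmfs/volumes/$store/${path%/*}")
done < <(sample_vms)

# Loop over every VM by index.
for i in "${!vm_ids[@]}"; do
    printf '%s %s %s\n' "${vm_ids[i]}" "${vm_names[i]}" "${vm_dirs[i]}"
done
```

On the real host, replace sample_vms with the vim-cmd call. Bash has no true multi-dimensional arrays, so parallel arrays sharing one index are a common substitute.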


Recursively grep string in a certain variable directory structure

I have a directory structure on a shared Windows network like this:
\\server\share\directory\anotherDirectory\[variable serial number]\config\
where the variable is a 4 digit integer. And there are hundreds of the [variable serial number] directories I have to go through recursively.
The drive share is mapped in Linux so it sees the Windows share and can traverse it.
Once inside the [variable serial number]\config\ directory, I need to grep in a .csv file that's named:
[variable serial number]_config_v1.csv
so the full path for an individual serial number file might look like this:
\\server\share\directory\anotherDirectory\1234\config\1234_config_v1.csv
There are hundreds of serial number directories I have to search through. I've tried adapting the answer from this SO question with no luck so far.
If it makes any difference, I'm doing this over VPN using Win10's Windows Subsystem for Linux with an Ubuntu distro.
Can I do something along the lines of:
for i in [list of serial numbers]
do
grep -in "string" $i_config_v1.csv >> log.txt
done
?? I'm not sure where to work the leading path in, or I could run the script from the root of where the serial numbered directories start?
You need to put the variable name i inside {} to delimit it from _config_v1.csv. Otherwise the shell will try to expand a variable named i_config_v1, which is not set. The leading path goes in front of the file name:
for i in $list_of_serial_numbers
do
grep -in "string" "/path/to/share/$i/config/${i}_config_v1.csv"
done
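If you do not have the list of serial numbers at hand, the shell can derive it from the directory names. A sketch, assuming the share is mounted somewhere on the Linux side; the variable base is hypothetical and points to a throwaway directory here so that the snippet is runnable as-is:

```shell
# Build a tiny stand-in for the mounted share (demo data only).
base=$(mktemp -d)
for serial in 1234 5678; do
    mkdir -p "$base/$serial/config"
    printf 'name,value\nmode,String-%s\n' "$serial" \
        > "$base/$serial/config/${serial}_config_v1.csv"
done

log="$base/log.txt"
: > "$log"    # start with an empty log

# One glob matches every four-digit directory, so no list has to be maintained.
for dir in "$base"/[0-9][0-9][0-9][0-9]/; do
    dir=${dir%/}                       # drop the trailing slash from the glob
    serial=$(basename "$dir")
    file="$dir/config/${serial}_config_v1.csv"
    [ -f "$file" ] || continue         # skip directories without the CSV
    # -H prints the file name even though only one file is searched
    grep -inH "string" "$file" >> "$log"
done
```

With -H each hit in log.txt is prefixed by the file it came from, which helps when hundreds of directories are searched.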

Is it possible to display a file's contents and delete that file in the same command?

I'm trying to display the output of an AWS lambda that is being captured in a temporary text file, and I want to remove that file as I display its contents. Right now I'm doing:
... && cat output.json && rm output.json
Is there a clever way to combine those last two commands into one command? My goal is to make the full combined command string as short as possible.
This works in cases where:
• it is possible to control the name of the temporary text file, and
• the file is not used by other code.
In those cases it is possible to pass "/dev/stdout" as the name of the output file.
Regarding portability: see stack exchange how portable ... /dev/stdout
POSIX 7 says they are extensions.
Base Definitions,
Section 2.1.1 Requirements:
The system may provide non-standard extensions. These are features not required by POSIX.1-2008 and may include, but are not limited to:
[...]
• Additional character special files with special properties (for example,  /dev/stdin, /dev/stdout,  and  /dev/stderr)
Using /dev/tty, which POSIX does require, would force the output to the "current" terminal. That makes it impossible to pipe the output of the whole command into a different program (or a log file), or to use the program when there is no terminal attached (a cron job or other automation tools).
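As an illustration of the /dev/stdout idea, assume a hypothetical producer named produce that insists on writing its result to a file whose name it receives as an argument (it stands in for whatever creates output.json):

```shell
# Hypothetical producer: writes its result to the file named in $1.
produce() {
    printf '{"status": "ok"}\n' > "$1"
}

# Conventional approach: temporary file, display it, delete it.
produce output.json && cat output.json && rm output.json

# Same visible result with no file left behind. This relies on the
# non-POSIX /dev/stdout being available (it is on Linux).
produce /dev/stdout
```

One caveat I am aware of on Linux: if the output of the whole command is redirected to a regular file, a producer that opens /dev/stdout with truncation (as > does here) truncates that file first.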
No, you cannot easily remove the lines of a file while displaying them. It would be highly inefficient, as it would require removing data from the beginning of the file each time you read a line. Current filesystems are pretty good at truncating a file at its end, but not at its beginning.
A simple but extremely slow method would look like this:
while [ -s output.json ]
do
head -1 output.json
sed -i 1d output.json
done
While this algorithm is plain and simple, you should know that each time you remove the first line with sed -i 1d it will copy the whole content of the file but the first line into a temporary file, resulting in approximately 0.5*n² lines written in total (where n is the number of lines in your file).
In theory you could avoid this by doing something like this:
while [ -s output.json ]
do
line=$(head -1 output.json)
printf -- '%s\n' "$line"
fallocate -c -o 0 -l $((${#line}+1)) output.json
done
But this does not account for variable newline characters (namely DOS-formatted newlines) and fallocate does not always work on xfs, among other issues.
Since you are trying to consume a file alongside its creation without leaving a trace of its existence on disk, you are essentially asking for a pipe functionality. In my opinion you should look into how your output.json file is produced and hopefully you can pipe it to a script of your own.

split a string and get the third last field from it?

I am running my shell script on multiple machines and all those machines can be in different datacenters.
If machine is in abc datacenter, then I don't want to sleep at all and move forward to next line in the shell script.
If machine is in def datacenter, then I want to sleep for 30 minutes and after that I will move to the next line in the shell script.
If machine is in pqr datacenter, then I want to sleep for 60 minutes and after that I will move to the next line in the shell script.
My machine names always look like the examples below. As you can see, the datacenter name always comes directly before .host.com.
machineA.abc.host.com
machineB.def.host.com
machineC.pqr.host.com
machinef-12341.testra.abc.host.com
.....
My shell script below already has the machine name stored in the HOSTNAME variable. How can I extract the datacenter name from it and apply the conditions above? The datacenter name is the part just before .host.com, so do I need to start from the end?
#!/usr/bin/env bash
HOSTNAME=$(hostname)
.....
# I want to execute this line after the above if/else if logic
echo "Hello World"
What is the best way to do this? I can split the lines into variables but how to get relevant portion which I need and then apply if/elseif logic here?
Two different solutions in a test loop:
hosts="machineA.abc.host.com machineB.def.host.com machineC.pqr.host.com machinef-12341.testra.abc.host.com"
for testhost in ${hosts}; do
    echo "sed ${testhost}: $(sed 's/.*\.\([^.]*\)\.host\.com$/\1/' <<< "${testhost}")"
    echo "cut ${testhost}: $(rev <<< "${testhost}" | cut -d"." -f3 | rev)"
done
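Neither variant applies the sleep logic yet. A possible sketch that uses only parameter expansion and case follows. The function names datacenter_of and delay_minutes are made up for this example, the delays are the ones from the question, and treating an unknown datacenter as "do not wait" is my assumption.

```shell
# Print the label that sits directly before ".host.com".
datacenter_of() {
    name=${1%.host.com}            # machinef-12341.testra.abc
    printf '%s\n' "${name##*.}"    # abc
}

# Map a datacenter name to the number of minutes to wait.
delay_minutes() {
    case $1 in
        abc) echo 0 ;;
        def) echo 30 ;;
        pqr) echo 60 ;;
        *)   echo 0 ;;             # assumption: unknown datacenter, no wait
    esac
}

# Fixed sample value; in the real script this would come from the host name.
host=machinef-12341.testra.abc.host.com
dc=$(datacenter_of "$host")
minutes=$(delay_minutes "$dc")

if [ "$minutes" -gt 0 ]; then
    sleep "$((minutes * 60))"      # sleep takes seconds
fi
echo "Hello World"
```

Keeping the mapping in its own function means a new datacenter only needs one extra case branch.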

Iterate through files in a directory, create output files, linux

I am trying to iterate through every file in a specific directory (called sequences), and perform two functions on each file. I know that the functions (the 'blastp' and 'cat' lines) work, since I can run them on individual files. Ordinarily I would have a specific file name as the query, output, etc., but I'm trying to use a variable so the loop can work through many files.
(Disclaimer: I am new to coding.) I believe that I am running into serious problems with trying to use my file names within my functions. As it is, my code will execute, but it creates a bunch of extra unintended files. This is what I intend for my script to do:
Line 1: Iterate through every file in my "sequences" directory. (All of which end with ".fa", if that is helpful.)
Line 3: Recognize the filename as a variable. (I know, I know, I think I've done this horribly wrong.)
Line 4: Run the blastp function using the file name as the argument for the "query" flag, always use "database.faa" as the argument for the "db" flag, and output the result in a new file that is has the same name as the initial file, but with ".txt" at the end.
Line 5: Output parts of the output file from line 4 into a new file that has the same name as the initial file, but with "_top_hits.txt" at the end.
for sequence in ./sequences/{.,}*;
do
echo "$sequence";
blastp -query $sequence -db database.faa -out ${sequence}.txt -evalue 1e-10 -outfmt 7
cat ${sequence}.txt | awk '/hits found/{getline;print}' | grep -v "#">${sequence}_top_hits.txt
done
When I ran this code, it gave me six new files derived from each file in the directory (and they were all in the same directory - I'd prefer to have them all in their own folders. How can I do that?). They were all empty. Their suffixes were, ".txt", ".txt.txt", ".txt_top_hits.txt", "_top_hits.txt", "_top_hits.txt.txt", and "_top_hits.txt_top_hits.txt".
If I can provide any further information to clarify anything, please let me know.
If you're only interested in *.fa files I would limit your input to only those matching files like this:
for sequence in sequences/*.fa;
do
I can suggest the following improvements:
for fasta_file in ./sequences/*.fa # ";" is not necessary if you already have a new line for your "do"
do
    # ${variable%something} is $variable with the trailing string
    # "something" removed
    # basename path/to/file is the name of the file
    # without the full path
    # $(some command) allows you to use the result of the command as a string
    # Combining the above, we can form a string based on our fasta file
    # This string can be useful to name stuff in a clean manner later
    sequence_name=$(basename "${fasta_file%.fa}")
    echo "${sequence_name}"
    # Create a directory for the results for this sequence
    # -p option avoids a failure in case the directory already exists
    mkdir -p "${sequence_name}"
    # Define the name of the file for the results
    # (including our previously created directory in its path)
    blast_results=${sequence_name}/${sequence_name}_blast.txt
    blastp -query "${fasta_file}" -db database.faa \
        -out "${blast_results}" \
        -evalue 1e-10 -outfmt 7
    # Define a file name for the top hits
    top_hits=${sequence_name}/${sequence_name}_top_hits.txt
    # alternatively, using "%"
    #top_hits=${blast_results%_blast.txt}_top_hits.txt
    # No need to cat: awk can take a file as argument
    # Write to ${top_hits} so the file lands in the per-sequence directory
    awk '/hits found/{getline;print}' "${blast_results}" \
        | grep -v "#" > "${top_hits}"
done
I made more intermediate variables, with (hopefully) meaningful names.
I used \ to escape line ends and allow putting commands in several lines.
I hope this improves code readability.
I haven't tested. There may be typos.
You should be using *.fa if you only want files with a .fa ending. Additionally, if you want to redirect your output to new folders you need to create those directories somewhere using
mkdir 'folder_name'
then you need to redirect your -o outputs to those files, something like this
'command' -o /path/to/output/folder
To help you test this script out, you can run each line one by one to test them. You need to make sure each line works by itself before combining.
One last thing: be careful with your use of semicolons. A one-line loop should look something like this:
for filename in *.fa; do 'command'; done

How to use sed command to delete lines without backup file?

I have a large file, about 130GB in size.
# ls -lrth
-rw-------. 1 root root 129G Apr 20 04:25 syslog.log
I need to reduce the file size by deleting the lines which start with "Nov 2", so I ran the following command:
sed -i '/Nov 2/d' syslog.log
I can't edit the file with the Vim editor either.
When I run the sed command, it also creates a backup file, but I don't have much space on the root filesystem. Please suggest an alternative way to delete particular lines from this file without using additional space on the server.
It does not create a real backup file. sed is a stream editor. When applied to a file with the option -i, it streams that file through the sed process and writes the output to a new (temporary) file; when everything is done, it renames the new file to the original name.
(There are options to create backup files also, but you didn't give them, so I won't mention that further.)
In your case you have a very large file and don't want to create any copy, however temporary. For this you need to open the file for reading and writing at the same time, then your sed process can overwrite the original. After this, you will have to truncate the file at the end of the writing.
To demonstrate how this can be done, we first perform a test case.
Create a test file, containing lots of lines:
seq 0 999999 > x
Now, let's say we want to remove all lines containing the digit 4:
grep -v 4 1<>x <x
This will open the file for reading and writing as STDOUT (1), and for reading as STDIN. The grep command will read all lines and will output only the lines not containing a 4 (option -v).
This will effectively overwrite the beginning of the original file.
You will not know how long the output is, so after the output the original contents of the file will appear:
…
999991
999992
999993
999995
999996
999997
999998
999999
537824
537825
537826
537827
537828
537829
…
You can use the Unix tool truncate to shorten your file manually afterwards. In a real scenario you will have trouble finding the right spot for this, so it makes sense to count the number of bytes written (using wc):
(Don't forget to recreate the original x for this test.)
(grep -v 4 <x | tee /dev/stderr 1<>x) |& wc -c
This will perform the step above and additionally print the number of bytes written to the terminal; in this example the output will be 3653658. Now use truncate:
truncate -s 3653658 x
Now you have the result you want.
If you want to do this in a script, i. e. without interaction, you can use this:
length=$( (grep -v 4 <x | tee /dev/stderr 1<>x) |& wc -c )
truncate -s "$length" x
I cannot guarantee that this will work for files >2GB or >4GB on your machine; depending on your operating system (32bit?) and the versions of the installed tools you might run into largefile issues. I'd perform tests with large files first (>4GB as this is typically a limit for many things) and then cross your fingers and give it a try :)
Some caveats you have to keep in mind:
Of course, nobody is supposed to append log entries to that log file while the procedure is running.
Also, any abort during the running of the process (power failure, signal caught, etc.) will leave the file in an undefined state. But re-running the command again after such a mishap will in most cases produce the correct output; some lines might be doubled, but not more than a single line should be corrupted then.
The output must be smaller than the input, of course, otherwise the writing will overtake the reading, corrupting the whole result so that lines which should be there will be missing (or truncated at the start).
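Applied to the original problem, the same steps could be wrapped up as follows. This is a sketch run on a small stand-in file, not something I have tried on a 129G log. The function name filter_in_place is made up, the pattern is anchored with ^ because the question asks for lines that start with "Nov 2", and truncate is assumed to be the GNU coreutils tool.

```shell
# Delete lines from the file "$2" without creating a temporary copy.
# $1 is a sed script; its output must not be larger than its input.
filter_in_place() {
    # tee overwrites the file from the start (1<> opens it without truncating)
    # and copies the same bytes to stderr, where wc counts them.
    length=$( (sed "$1" < "$2" | tee /dev/stderr 1<> "$2") 2>&1 | wc -c ) || return
    # Cut off the stale tail that is left over from the original contents.
    truncate -s "$length" "$2"
}

# Small stand-in for syslog.log (demo data only).
log=$(mktemp)
printf '%s\n' 'Nov 1 one' 'Nov 2 two' 'Nov 3 three' 'Nov 2 four' > "$log"

filter_in_place '/^Nov 2/d' "$log"
cat "$log"
```

Because stderr is merged into the byte count, any error message from sed would distort the length, so it is worth testing the sed script on a small sample first. All the caveats listed above still apply.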
