Recursively grep string in a certain variable directory structure - linux

I have a directory structure on a shared Windows network like this:
\\server\share\directory\anotherDirectory\[variable serial number]\config\
where [variable serial number] is a 4-digit integer. There are hundreds of these [variable serial number] directories I have to go through recursively.
The drive share is mapped in Linux so it sees the Windows share and can traverse it.
Once inside the [variable serial number]\config\ directory, I need to grep in a .csv file that's named:
[variable serial number]_config_v1.csv
so the full path for an individual serial number file might look like this:
\\server\share\directory\anotherDirectory\1234\config\1234_config_v1.csv
There are hundreds of serial number directories I have to search through. I've tried adapting the answer from this SO question with no luck so far.
If it makes any difference, I'm doing this over VPN using Win10's Windows Subsystem for Linux with an Ubuntu distro.
Can I do something along the lines of:
for i in [list of serial numbers]
do
grep -in "string" $i_config_v1.csv >> log.txt
done
I'm not sure where to work the leading path in. Or could I run the script from the root of where the serial-numbered directories start?

You need to put i inside {} to delimit it from _config_v1.csv. Otherwise it will try to read the variable named i_config_v1.
for i in $list_of_serial_numbers
do
grep -in "string" "/path/to/share/$i/config/${i}_config_v1.csv"
done
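If you would rather not maintain the list of serial numbers by hand, a glob can enumerate the directories instead. This also settles where the leading path goes: it is built into each file name, so the script can run from any directory. The sketch below assumes every serial-number directory is named with exactly four digits. A throwaway temp tree (`$root`) stands in for the mounted share; point `root` at wherever your distro mounts the share instead. `grep -H` prints the file path with each match, so the log shows which serial number a hit came from.

```shell
#!/usr/bin/env bash
# Sketch: search every <serial>/config/<serial>_config_v1.csv under a root
# directory and append matches to a log. The root below is a throwaway demo
# tree standing in for the mounted share (hypothetical layout).
set -u

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT

# Build a tiny demo tree: two serial numbers, only one file contains "string".
for serial in 1234 5678; do
    mkdir -p "$root/$serial/config"
    printf 'a,b\n' > "$root/$serial/config/${serial}_config_v1.csv"
done
printf 'STRING,found\n' >> "$root/1234/config/1234_config_v1.csv"

log="$root/log.txt"

# [0-9][0-9][0-9][0-9] matches only four-digit directory names.
for dir in "$root"/[0-9][0-9][0-9][0-9]; do
    serial=${dir##*/}                       # strip the leading path, keep "1234"
    csv="$dir/config/${serial}_config_v1.csv"
    [ -f "$csv" ] || continue               # skip directories without the file
    grep -Hin "string" "$csv" >> "$log"     # -H: print the file name with each match
done

cat "$log"
```

For the real share, delete the demo-tree lines and set `root` to the share's mount point; the loop itself stays the same.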

Related

Read only the first n lines [sublime text]

I have some files that are too big to open directly in Sublime Text. Is there any way to open only the first n lines? Something like head in bash? Thanks
If you're on Linux or Mac, or have Cygwin, Git Bash, or similar installed on a Windows machine, check out the split utility, which is part of the coreutils package. It does exactly what it says: it splits input into separate files. It is configurable via command-line options, like most Unix utilities. For example, if you wanted to split your input file into separate 10,000-line files starting with notsobigfile and using numeric suffixes ending with .txt, you would run
split -d -l 10000 --additional-suffix=".txt" reallybigfile.txt notsobigfile
and it would output files named notsobigfile00.txt, notsobigfile01.txt, etc. If this would generate more than 100 files (00 through 99), just add -a N, where N is the number of suffix digits (the default is 2).
For all the possible options, just read the man page:
man split
If you only want to output the first part of the file, check out the options for the -n/--number flag.
To figure out how many lines your input file has, run the word counting utility using the lines option:
wc -l reallybigfile.txt
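For the original goal of getting only the first lines, `head -n` writes them out directly, and GNU split's `-n l/K/N` form prints the K-th of N line-aligned chunks to standard output. A small sketch; `reallybigfile.txt` here is a generated demo file and the line counts are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: extract the beginning of a large file without opening it in an editor.
# Assumes GNU coreutils (head, split, seq, wc).
set -u

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

seq 1 100000 > reallybigfile.txt          # demo input: 100,000 numbered lines

# Option 1: exactly the first 10,000 lines.
head -n 10000 reallybigfile.txt > first10000.txt

# Option 2: the first of 10 chunks, written to stdout. Chunks are split by size
# but only at line boundaries, so they hold roughly (not exactly) equal data.
split -n l/1/10 reallybigfile.txt > firstchunk.txt

wc -l first10000.txt firstchunk.txt
```

The resulting small file can then be opened in Sublime Text as usual.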

How to address some of many files that include one of a few sequences in their name, in Linux file-name system?

I have some files in my directory named as:
...
asdfab-18-121.csv
asdfab-19-221.csv
gafaac-19-289.csv
asdfax-19-311.csv
aasdfb-20-122.csv
aasdfb-20-220.csv
aberrc-20-281.csv
aasdfb-21-127.csv
aasdfb-21-224.csv
acadff-21-286.csv
...
I need to list the files that have "-19-" OR "-20-" in the middle part of their name (e.g. lines 2-7 above) in a single command. I know that if only one character varied I could use the [seq] syntax. I tried
ls *#["-19-"|"-20-"]*
but it doesn't seem to work. Any ideas?
If you are using bash with the extglob option turned on (shopt -s extglob), or ksh93, or zsh with the KSH_GLOB option turned on:
ls *-@(19|20)-*.csv
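A sketch for checking such a pattern in bash, using the file names from the question. `@(19|20)` is the extglob form that matches exactly one of the listed alternatives; the brace-expansion line at the end is a variant that needs no shell option. In a script, `shopt -s extglob` has to run on a line before the one that uses the pattern, because bash parses each line before executing it.

```shell
#!/usr/bin/env bash
# Sketch: list only files whose middle field is -19- or -20- (bash).
set -u

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Demo files taken from the question.
touch asdfab-18-121.csv asdfab-19-221.csv gafaac-19-289.csv asdfax-19-311.csv \
      aasdfb-20-122.csv aasdfb-20-220.csv aberrc-20-281.csv aasdfb-21-127.csv

shopt -s extglob                   # must be enabled before the @(...) line is parsed
matches=( *-@(19|20)-*.csv )       # @(a|b) matches exactly one of the alternatives
printf '%s\n' "${matches[@]}"

# Without extglob: brace expansion yields two ordinary globs (*-19-*.csv *-20-*.csv).
# Unlike @(...), a glob that matches nothing is passed to ls literally.
ls *-{19,20}-*.csv
```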

Is it possible to display a file's contents and delete that file in the same command?

I'm trying to display the output of an AWS Lambda function that is being captured in a temporary text file, and I want to remove that file as I display its contents. Right now I'm doing:
... && cat output.json && rm output.json
Is there a clever way to combine those last two commands into one command? My goal is to make the full combined command string as short as possible.
For cases where it is possible to control the name of the temporary text file, and the file is not used by other code, it is possible to pass "/dev/stdout" as the name of the output file.
Regarding portability: see the Stack Exchange question "how portable ... /dev/stdout". POSIX 7 says these files are extensions.
Base Definitions, Section 2.1.1 Requirements:
The system may provide non-standard extensions. These are features not required by POSIX.1-2008 and may include, but are not limited to:
[...]
• Additional character special files with special properties (for example, /dev/stdin, /dev/stdout, and /dev/stderr)
Using /dev/tty, which POSIX does require, would force the output to the “current” terminal, making it impossible to pipe the output of the whole command into a different program (or a log file), or to use the program when there is no connected terminal (cron jobs or other automation tools).
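A minimal sketch of the /dev/stdout idea, assuming the producing command takes the output file name as an argument (with the AWS CLI, that would be the outfile argument of `aws lambda invoke`). `write_result` is a hypothetical stand-in for that command.

```shell
#!/usr/bin/env bash
# Sketch: let a command that insists on an output *file* write straight to
# standard output, so there is no temporary file to cat and rm afterwards.
# Relies on /dev/stdout, which Linux provides but POSIX does not require.
set -u

# Hypothetical stand-in for the real producer: writes JSON to the file named in $1.
write_result() {
    printf '{"statusCode": 200}\n' > "$1"
}

# Instead of:  write_result output.json && cat output.json && rm output.json
result=$(write_result /dev/stdout)
printf '%s\n' "$result"
```

In an interactive one-liner you would simply run `write_result /dev/stdout`; the capture into `result` is only there so the output can be checked.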
No, you cannot easily remove the lines of a file while displaying them. It would be highly inefficient, as it would require removing characters from the beginning of the file each time you read a line. Current filesystems are pretty good at truncating data at the end of a file, but not at the beginning.
A simple but extremely slow method would look like this:
while [ -s output.json ]
do
head -1 output.json
sed -i 1d output.json
done
While this algorithm is plain and simple, you should know that each time you remove the first line with sed -i 1d it will copy the whole content of the file but the first line into a temporary file, resulting in approximately 0.5*n² lines written in total (where n is the number of lines in your file).
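Where the 0.5·n² figure comes from: removing line 1 from an n-line file rewrites the remaining n − 1 lines, the next pass rewrites n − 2 lines, and so on down to 0, so the total number of lines written is

$$\sum_{k=1}^{n-1} k = \frac{n(n-1)}{2} \approx \frac{1}{2}n^2$$

For example, a 10,000-line file would cause 10000 × 9999 / 2 = 49,995,000 line writes, roughly 50 million.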
In theory you could avoid this by doing something like this:
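while [ -s output.json ]
do
line=$(head -1 output.json)
printf -- '%s\n' "$line"
# Collapse (remove) the first line and its newline from the file in place.
# ${#line} counts characters, so this assumes single-byte (e.g. ASCII) text.
fallocate -c -o 0 -l $((${#line}+1)) output.json
done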
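But this does not account for variable newline characters (namely DOS-formatted newlines), and fallocate -c (collapse range) is only supported on some filesystems, such as ext4 and XFS, which typically require the offset and length to be multiples of the filesystem block size, so in practice it cannot remove an arbitrary single line, among other issues.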
Since you are trying to consume a file alongside its creation without leaving a trace of its existence on disk, you are essentially asking for a pipe functionality. In my opinion you should look into how your output.json file is produced and hopefully you can pipe it to a script of your own.
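One way to get pipe behaviour from a command that insists on a file name is a named pipe (FIFO): the name exists in the filesystem, but the data passes through the kernel and is gone once read. A sketch, again with a hypothetical `write_result` standing in for the real producer. The producer runs in the background because opening a FIFO for writing blocks until a reader opens it.

```shell
#!/usr/bin/env bash
# Sketch: consume output as it is produced via a named pipe instead of a
# regular temporary file. Only the FIFO's name touches the disk, never the data.
set -u

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
fifo="$tmpdir/output.json"
mkfifo "$fifo"

# Hypothetical stand-in for the real producer: writes to the file named in $1.
write_result() {
    printf '{"statusCode": 200}\n' > "$1"
}

write_result "$fifo" &     # opening the FIFO blocks until a reader appears
shown=$(cat "$fifo")       # read everything the producer writes
wait                       # reap the background writer
printf '%s\n' "$shown"
rm "$fifo"                 # removing the name leaves nothing behind
```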

Iterate through files in a directory, create output files, linux

I am trying to iterate through every file in a specific directory (called sequences), and perform two functions on each file. I know that the functions (the 'blastp' and 'cat' lines) work, since I can run them on individual files. Ordinarily I would have a specific file name as the query, output, etc., but I'm trying to use a variable so the loop can work through many files.
(Disclaimer: I am new to coding.) I believe that I am running into serious problems with trying to use my file names within my functions. As it is, my code will execute, but it creates a bunch of extra unintended files. This is what I intend for my script to do:
Line 1: Iterate through every file in my "sequences" directory. (All of which end with ".fa", if that is helpful.)
Line 3: Recognize the filename as a variable. (I know, I know, I think I've done this horribly wrong.)
Line 4: Run the blastp function using the file name as the argument for the "query" flag, always use "database.faa" as the argument for the "db" flag, and output the result in a new file that has the same name as the initial file, but with ".txt" at the end.
Line 5: Output parts of the output file from line 4 into a new file that has the same name as the initial file, but with "_top_hits.txt" at the end.
for sequence in ./sequences/{.,}*;
do
echo "$sequence";
blastp -query $sequence -db database.faa -out ${sequence}.txt -evalue 1e-10 -outfmt 7
cat ${sequence}.txt | awk '/hits found/{getline;print}' | grep -v "#">${sequence}_top_hits.txt
done
When I ran this code, it gave me six new files derived from each file in the directory, all in that same directory (I'd prefer to have them in their own folders; how can I do that?). They were all empty. Their suffixes were ".txt", ".txt.txt", ".txt_top_hits.txt", "_top_hits.txt", "_top_hits.txt.txt", and "_top_hits.txt_top_hits.txt".
If I can provide any further information to clarify anything, please let me know.
If you're only interested in *.fa files I would limit your input to only those matching files like this:
for sequence in sequences/*.fa;
do
I can propose the following improvements:
for fasta_file in ./sequences/*.fa # ";" is not necessary if you already have a new line for your "do"
do
# ${variable%something} is $variable with a trailing "something" removed
# basename path/to/file is the name of the file
# without the full path
# $(some command) allows you to use the result of the command as a string
# Combining the above, we can form a string based on our fasta file
# This string can be useful to name stuff in a clean manner later
sequence_name=$(basename "${fasta_file%.fa}")
echo "${sequence_name}"
# Create a directory for the results for this sequence
# -p option avoids a failure in case the directory already exists
mkdir -p "${sequence_name}"
# Define the name of the file for the results
# (including our previously created directory in its path)
blast_results=${sequence_name}/${sequence_name}_blast.txt
blastp -query "${fasta_file}" -db database.faa \
-out "${blast_results}" \
-evalue 1e-10 -outfmt 7
# Define a file name for the top hits
top_hits=${sequence_name}/${sequence_name}_top_hits.txt
# alternatively, using "%"
#top_hits=${blast_results%_blast.txt}_top_hits.txt
# No need to cat: awk can take a file as argument
# Write into the per-sequence directory via ${top_hits}
awk '/hits found/{getline;print}' "${blast_results}" \
| grep -v "#" > "${top_hits}"
done
I made more intermediate variables, with (hopefully) meaningful names.
I used \ to escape line ends so that a command can span several lines.
I hope this improves code readability.
I haven't tested it, so there may be typos.
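To check the naming logic without running BLAST, the sketch below replaces `blastp` with a hypothetical `fake_blastp` that only writes a placeholder comment line to its output file; the `*.fa` glob, `${fasta_file%.fa}`, `basename`, `mkdir -p` and the awk/grep step follow the answer above. `shopt -s nullglob` is an addition: with it, the loop runs zero times (instead of once with the literal pattern) when there are no `.fa` files.

```shell
#!/usr/bin/env bash
# Sketch: one results directory per FASTA file, with output names derived
# from the input name. fake_blastp is a hypothetical placeholder for blastp.
set -u
shopt -s nullglob          # no .fa files -> the loop body never runs

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
mkdir sequences
touch sequences/geneA.fa sequences/geneB.fa

fake_blastp() {            # usage: fake_blastp INPUT OUTPUT
    printf '# placeholder result for %s\n' "$1" > "$2"
}

for fasta_file in ./sequences/*.fa; do
    sequence_name=$(basename "${fasta_file%.fa}")   # ./sequences/geneA.fa -> geneA
    mkdir -p "$sequence_name"
    blast_results="$sequence_name/${sequence_name}_blast.txt"
    top_hits="$sequence_name/${sequence_name}_top_hits.txt"
    fake_blastp "$fasta_file" "$blast_results"
    # Empty here: the placeholder has no "hits found" line.
    awk '/hits found/{getline;print}' "$blast_results" | grep -v '#' > "$top_hits"
done

# List the generated files, skipping the input directory.
find . -path ./sequences -prune -o -type f -print | sort
```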
You should be using *.fa if you only want files with a .fa ending. Additionally, if you want to redirect your output to new folders you need to create those directories somewhere using
mkdir 'folder_name'
then you need to direct your outputs (the -out option for blastp) into those folders, something like this
'command' -o /path/to/output/folder
To help you test this script out, you can run each line one by one to test them. You need to make sure each line works by itself before combining.
One last thing: be careful with your use of semicolons; it should look something like this:
for filename in *.fa; do 'command'; done

Take parts of the standard output and put them into an array variable

I'm currently working on a script (using bash) that backs up VM files to a remote server.
I want to make the script a bit more dynamic by looping through each VM from a "show VM" command. My idea is to take the standard output of a command that shows all the VMs, break it up, and turn it into useful variables, possibly a multidimensional array.
The output looks like this. Is there any way to break it all up, say by spaces and line breaks?
Vmid Name File Guest OS Version Annotation
10 FREEPBX [datastore2] FREEPBX/FREEPBX.vmx other26xLinux64Guest vmx-08
13 AdaptivNICE2Cloud [datastore2] AdaptivNICE2Cloud/AdaptivNICE2Cloud.vmx other26xLinux64Guest vmx-08
15 IVSTelManager [datastore2] IVSTelManager/IVSTelManager.vmx debian6Guest vmx-08
4 Neptune [datastore1] Neptune/Neptune.vmx winNetEnterprise64Guest vmx-08
9 Kayako [datastore2] Kayako/Kayako.vmx other26xLinux64Guest vmx-08
I guess you need this:
$ vim-cmd vmsvc/getallvms | sed -n 's|.*\[|/vmfs/volumes/|;s|\] *|/|;s|\.vmx .*|.vmx|p'
/vmfs/volumes/datastore2/FREEPBX/FREEPBX.vmx
/vmfs/volumes/datastore2/AdaptivNICE2Cloud/AdaptivNICE2Cloud.vmx
/vmfs/volumes/datastore2/IVSTelManager/IVSTelManager.vmx
/vmfs/volumes/datastore1/Neptune/Neptune.vmx
/vmfs/volumes/datastore2/Kayako/Kayako.vmx
# Prints the paths of all VMX files
OR
$ vim-cmd vmsvc/getallvms | sed -n 's|.*\[|/vmfs/volumes/|;s|\] *|/|;s|/[^/]*\.vmx .*||p'
/vmfs/volumes/datastore2/FREEPBX
/vmfs/volumes/datastore2/AdaptivNICE2Cloud
/vmfs/volumes/datastore2/IVSTelManager
/vmfs/volumes/datastore1/Neptune
/vmfs/volumes/datastore2/Kayako
# Prints all directories having VMX files. These directories also contain the virtual HDDs, which you would want to backup.
(Ignore the $ in the prompt; it is still a root prompt. SO would interpret it as a comment if I used # in place of $.)
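To get the data into arrays as the question asks, `mapfile` (bash 4+) can store the sed output from the answer one path per element, and `read` can split each data line on whitespace. Bash has no multidimensional arrays, so this sketch keeps parallel arrays indexed by line (an associative array keyed by Vmid would be another option). `list_vms` is a hypothetical stand-in that prints the question's sample output in place of `vim-cmd vmsvc/getallvms`. The sketch assumes VM names contain no spaces and that bash is available where it runs (the ESXi shell itself may not provide bash).

```shell
#!/usr/bin/env bash
# Sketch: turn getallvms-style output into bash arrays (requires bash 4+).
set -u

# Hypothetical stand-in for `vim-cmd vmsvc/getallvms`: prints the question's sample.
list_vms() {
    cat <<'EOF'
Vmid Name File Guest OS Version Annotation
10 FREEPBX [datastore2] FREEPBX/FREEPBX.vmx other26xLinux64Guest vmx-08
13 AdaptivNICE2Cloud [datastore2] AdaptivNICE2Cloud/AdaptivNICE2Cloud.vmx other26xLinux64Guest vmx-08
15 IVSTelManager [datastore2] IVSTelManager/IVSTelManager.vmx debian6Guest vmx-08
4 Neptune [datastore1] Neptune/Neptune.vmx winNetEnterprise64Guest vmx-08
9 Kayako [datastore2] Kayako/Kayako.vmx other26xLinux64Guest vmx-08
EOF
}

# One element per VM directory, using the sed expression from the answer.
mapfile -t vm_dirs < <(list_vms | sed -n 's|.*\[|/vmfs/volumes/|;s|\] *|/|;s|/[^/]*\.vmx .*||p')

# Parallel arrays: split each data line on whitespace into named fields.
vm_ids=() vm_names=()
while read -r vmid name _rest; do
    vm_ids+=("$vmid")
    vm_names+=("$name")
done < <(list_vms | tail -n +2)          # tail -n +2 skips the header line

for i in "${!vm_ids[@]}"; do
    printf '%s\t%s\t%s\n' "${vm_ids[i]}" "${vm_names[i]}" "${vm_dirs[i]}"
done
```

On the real host you would replace `list_vms` with the `vim-cmd vmsvc/getallvms` call; the three arrays then line up index by index because both are built from the same data lines in the same order.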
