I made a shell script the purpose of which is to find files that don't contain a particular string, then display the first line that isn't empty or otherwise useless. My script works well in the console, but for some reason when I try to direct the output to a .txt file, it comes out empty.
Here's my script:
#!/bin/bash
# takes user input.
echo "Input substance:"
read substance
echo "Listing media without $substance:"
cd media
# finds names of files that don't feature the substance given, then puts them inside an array.
searchresult=($(grep -L "$substance" *))
# iterates the array and prints the first line of each - contains both the number and the medium name.
# however, some files start with "Microorganisms" and the actual number and name feature after several empty lines
# the script checks for that occurence - and prints the first line that doesnt match these criteria.
for i in "${searchresult[#]}"
do
grep -m 1 -v "Microorganisms\|^$" $i
done >> output.txt
I've tried moving the >>output.txt to right after the grep line inside the loop, tried switching >> to > and 2>&1, tried using tee. No go.
I'm honestly feeling utterly stuck as to what the issue could be. I'm sure there's something I'm missing, but I'm nowhere near good enough with this to notice. I would very much appreciate any help.
EDIT: Added files to better illustrate what I'm working with. Sample inputs I tried: Glucose, Yeast extract, Agar. Link to files [140kB] - the folder was unzipped beforehand.
The script was given full permissions to execute. I don't think the output is being rewritten because even if I don't iterate and just run a single line of the loop, the file is empty.
Related
I'm trying to display the output of an AWS lambda that is being captured in a temporary text file, and I want to remove that file as I display its contents. Right now I'm doing:
... && cat output.json && rm output.json
Is there a clever way to combine those last two commands into one command? My goal is to make the full combined command string as short as possible.
For cases where
it is possible to control the name of the temporary text file.
If file is not used by other code
Possible to pass "/dev/stdout" as the.name of the output
Regarding portability: see stack exchange how portable ... /dev/stdout
POSIX 7 says they are extensions.
Base Definitions,
Section 2.1.1 Requirements:
The system may provide non-standard extensions. These are features not required by POSIX.1-2008 and may include, but are not limited to:
[...]
• Additional character special files with special properties (for example, /dev/stdin, /dev/stdout, and /dev/stderr)
Using the mandatory supported /dev/tty will force output into “current” terminal, making it impossible to pipe the output of the whole command into different program (or log file), or to use the program when there is no connected terminals (cron job, or other automation tools)
No, you cannot easily remove the lines of a file while displaying them. It would be highly inefficient as it would require removing characters from the beginning of a file each time you read a line. Current filesystems are pretty good at truncating lines at the end of a file, but not at the beginning.
A simple but extremely slow method would look like this:
while [ -s output.json ]
do
head -1 output.json
sed -i 1d output.json
done
While this algorithm is plain and simple, you should know that each time you remove the first line with sed -i 1d it will copy the whole content of the file but the first line into a temporary file, resulting in approximately 0.5*n² lines written in total (where n is the number of lines in your file).
In theory you could avoid this by do something like that:
while [ -s output.json ]
do
line=$(head -1 output.json)
printf -- '%s\n' "$line"
fallocate -c -o 0 -l $((${#len}+1)) output.json
done
But this does not account for variable newline characters (namely DOS-formatted newlines) and fallocate does not always work on xfs, among other issues.
Since you are trying to consume a file alongside its creation without leaving a trace of its existence on disk, you are essentially asking for a pipe functionality. In my opinion you should look into how your output.json file is produced and hopefully you can pipe it to a script of your own.
I am trying to iterate through every file in a specific directory (called sequences), and perform two functions on each file. I know that the functions (the 'blastp' and 'cat' lines) work, since I can run them on individual files. Ordinarily I would have a specific file name as the query, output, etc., but I'm trying to use a variable so the loop can work through many files.
(Disclaimer: I am new to coding.) I believe that I am running into serious problems with trying to use my file names within my functions. As it is, my code will execute, but it creates a bunch of extra unintended files. This is what I intend for my script to do:
Line 1: Iterate through every file in my "sequences" directory. (All of which end with ".fa", if that is helpful.)
Line 3: Recognize the filename as a variable. (I know, I know, I think I've done this horribly wrong.)
Line 4: Run the blastp function using the file name as the argument for the "query" flag, always use "database.faa" as the argument for the "db" flag, and output the result in a new file that is has the same name as the initial file, but with ".txt" at the end.
Line 5: Output parts of the output file from line 4 into a new file that has the same name as the initial file, but with "_top_hits.txt" at the end.
for sequence in ./sequences/{.,}*;
do
echo "$sequence";
blastp -query $sequence -db database.faa -out ${sequence}.txt -evalue 1e-10 -outfmt 7
cat ${sequence}.txt | awk '/hits found/{getline;print}' | grep -v "#">${sequence}_top_hits.txt
done
When I ran this code, it gave me six new files derived from each file in the directory (and they were all in the same directory - I'd prefer to have them all in their own folders. How can I do that?). They were all empty. Their suffixes were, ".txt", ".txt.txt", ".txt_top_hits.txt", "_top_hits.txt", "_top_hits.txt.txt", and "_top_hits.txt_top_hits.txt".
If I can provide any further information to clarify anything, please let me know.
If you're only interested in *.fa files I would limit your input to only those matching files like this:
for sequence in sequences/*.fa;
do
I can propose you the following improvements:
for fasta_file in ./sequences/*.fa # ";" is not necessary if you already have a new line for your "do"
do
# ${variable%something} is the part of $variable
# before the string "something"
# basename path/to/file is the name of the file
# without the full path
# $(some command) allows you to use the result of the command as a string
# Combining the above, we can form a string based on our fasta file
# This string can be useful to name stuff in a clean manner later
sequence_name=$(basename ${fasta_file%.fa})
echo ${sequence_name}
# Create a directory for the results for this sequence
# -p option avoids a failure in case the directory already exists
mkdir -p ${sequence_name}
# Define the name of the file for the results
# (including our previously created directory in its path)
blast_results=${sequence_name}/${sequence_name}_blast.txt
blastp -query ${fasta_file} -db database.faa \
-out ${blast_results} \
-evalue 1e-10 -outfmt 7
# Define a file name for the top hits
top_hits=${sequence_name}/${sequence_name}_top_hits.txt
# alternatively, using "%"
#top_hits=${blast_results%_blast.txt}_top_hits.txt
# No need to cat: awk can take a file as argument
awk '/hits found/{getline;print}' ${blast_results} \
| grep -v "#" > ${sequence_name}_top_hits.txt
done
I made more intermediate variables, with (hopefully) meaningful names.
I used \ to escape line ends and allow putting commands in several lines.
I hope this improves code readability.
I haven't tested. There may be typos.
You should be using *.fa if you only want files with a .fa ending. Additionally, if you want to redirect your output to new folders you need to create those directories somewhere using
mkdir 'folder_name'
then you need to redirect your -o outputs to those files, something like this
'command' -o /path/to/output/folder
To help you test this script out, you can run each line one by one to test them. You need to make sure each line works by itself before combining.
One last thing, be careful with your use of colons, it should look something like this:
for filename in *.fa; do 'command'; done
I want to store output of ls command in my bash script in a variable and use each file name in a loop, but for example one file in the directory has name "Hello world", when I do variable=$(ls) "Hello" and "world" end up as two separate entries, and when I try to do
for i in $variable
do
mv $i ~
done
it shows error that files "Hello" and "world" doesn't exist.
Is there any way I can access all files in current directory and run some command even if the files have space(s) in their names.
If you must, dirfiles=(/path/of/interest/*).
And accept the admonition against parsing the output of ls!
I understand you are new to this and I'd like to help. But it isn't easy for me (us?) to provide you with an answer that would be of much help to you by the way you've stated your question.
Based on what I hear so far, you don't seem to have a basic understanding on how parameter expansions work in the shell. The following two links will be useful to you:
Matching Pathnames, Parameters
Now, if your task at hand is to operate on files meeting certain criteria then find(1) will likely to do the job.
Say it with me: don't parse the output of ls! For more information, see this post on Unix.SE.
A better way of doing this is:
for i in *
do
mv -- "$i" ~
done
or simply
mv -- * ~
I'm totally new to bash scripting but i want to solve this problem..
the command is:
objfil=`echo ${srcfil} | sed -e "s,c$,o,"`
the idea about the bash script program is to check for the source files, and check if there is an adjacent object file in the OBJ directory, if so, the rest of the program runs smoothly, if not, the iteration terminates and skips the current source file, and moves on to the next one.. it works with .c files but not on the headers, since the object filenames depend on .c files.. i want to write this command so it checks the object files not just the .c but the .h files too.. but without skipping them. i know i have to do something else too, but i need to understand what this line of command does exactly to move on. Thanks. (Sorry for my english)
UPDATE:
if test -r ${curOBJdir}/${objfil}
then
cp -v ${srcfil} ./SAVEDSRC/${srcfil}
fdone="NO"
linenums=ALL
else
fdone="YES"
err="${curOBJdir}/${objfil} is missing - ${srcfil} skipped)"
echo ${err}
echo ${err} >>${log}
fi
while test ${fdone} == "NO"
do
#rest of code ...
here is the rest of the program.. i tried to comment out the "test" part to ignore the comparison just because i only want my script to work on .h files, but without checking the e.g abc.h files has an abc.o file.. (the object file generation is needed because the end of the script there's a comparison between the hexdump of the original and modified object files). The whole script is for changing the basic types with typedefs like int to sint32_t for example.
This concrete command will substitute all c's right before line-end to o:
srcfill=abcd.c
objfil=`echo ${srcfil} | sed -e "s,c$,o,"`
echo $objfil
Output:
abcd.o
P.S. It uses a different match/replace separator: default is / but it uses ,.
I'm trying to redirect(?) my standard error/output to a text file.
I did my research, but for some reason the online answers are not working for me.
What am I doing wrong?
cd /home/user1/lists/
for dir in $(ls)
do
(
echo | $dir > /root/user1/$dir" "log.txt
) > /root/Desktop/Logs/Update.log
done
I also tried
2> /root/Desktop/Logs/Update.log
1> /root/Desktop/Logs/Update.log
&> /root/Desktop/Logs/Update.log
None of these work for me :(
Help please!
Try this for the basics:
echo hello >> log.txt 2>&1
Could be read as: echo the word hello, redirecting and appending STDOUT to the file log.txt. STDERR (file descriptor 2) is redirected to wherever STDOUT is being pointed. Note that STDOUT is the default and thus there is no "1" in front of the ">>". Works on the current line only.
To redirect and append all output and error of all commands in a script, put this line near the top. It will be in effect for the length of the script instead of doing it on each line:
exec >>log.txt 2>&1
If you are trying to obtain a list of the files in /home/user1/lists, you do not need a loop at all:
ls /home/usr1/lists/ >Update.log
If you are attempting to run every file in the directory as an executable with a newline as its input, and collect the output from all these programs in Update.log, try this:
for file in /home/user1/lists/*; do
echo | "$file"
done >Update.log
(Notice how we avoid the useless use of ls and how there is no redirection inside the loop.)
If you want to create an empty file called *.log.txt for each file in the directory, you would do
for file in /home/user1/lists/*; do
touch "$(basename "$file")"log.txt
done
(Using basename to obtain the file name without the directory part avoids the cd but you could do it the other way around. Generally, we tend to avoid changing the directory in scripts, so that the tool can be run from anywhere and generate output in the current directory.)
If you want to create a file containing a single newline, regardless of whether it already exists or not,
for file in /home/user1/lists/*; do
echo >"$(basename "$file")"log.txt
done
In your original program, you redirect the echo inside the loop, which means that the redirection after done will not receive any output at all, so the created file will be empty.
These are somewhat wild guesses at what you might actually be trying to accomplish, but should hopefully help nudge you slightly in the right direction. (This should properly be a comment, I suppose, but it's way too long and complex.)