Query of the execution step of shell command `ls > list` - linux

root#VM-0-11-debian:~/linux/2023/01# ls
root#VM-0-11-debian:~/linux/2023/01# ls > list
root#VM-0-11-debian:~/linux/2023/01# ls
list
root#VM-0-11-debian:~/linux/2023/01# cat list
list
I know that > will redirect stdout to file. it will create the file if not present, otherwise replace it.
I would like to ask that is the shell command ls > list implementation process as I described below?
1)As the file named list not exists, so create a file named list first.
2)ls command will list the directory content(list). the content listed(list) will be in the standard output.
3)Add the content of the standard output(list) to the file named list in a replaced way.
My personal understanding of the implementation process as described above, I hope you can give me some guidance. Thank you.

The file redirection operator > is handled by your shell and any file to which you write will be created/truncated before the binary is started. That's why you can see the file name list in the content of the file: the file has already been created before the ls process was started.
So yes, your understanding is correct.
This is why it is not possible to do something like sort txt > txt – the file txt will be truncated before sort reads it. You will end up with an empty file.

Related

grep empty output file

I made a shell script the purpose of which is to find files that don't contain a particular string, then display the first line that isn't empty or otherwise useless. My script works well in the console, but for some reason when I try to direct the output to a .txt file, it comes out empty.
Here's my script:
#!/bin/bash
# takes user input.
echo "Input substance:"
read substance
echo "Listing media without $substance:"
cd media
# finds names of files that don't feature the substance given, then puts them inside an array.
searchresult=($(grep -L "$substance" *))
# iterates the array and prints the first line of each - contains both the number and the medium name.
# however, some files start with "Microorganisms" and the actual number and name feature after several empty lines
# the script checks for that occurence - and prints the first line that doesnt match these criteria.
for i in "${searchresult[#]}"
do
grep -m 1 -v "Microorganisms\|^$" $i
done >> output.txt
I've tried moving the >>output.txt to right after the grep line inside the loop, tried switching >> to > and 2>&1, tried using tee. No go.
I'm honestly feeling utterly stuck as to what the issue could be. I'm sure there's something I'm missing, but I'm nowhere near good enough with this to notice. I would very much appreciate any help.
EDIT: Added files to better illustrate what I'm working with. Sample inputs I tried: Glucose, Yeast extract, Agar. Link to files [140kB] - the folder was unzipped beforehand.
The script was given full permissions to execute. I don't think the output is being rewritten because even if I don't iterate and just run a single line of the loop, the file is empty.

Iterate through files in a directory, create output files, linux

I am trying to iterate through every file in a specific directory (called sequences), and perform two functions on each file. I know that the functions (the 'blastp' and 'cat' lines) work, since I can run them on individual files. Ordinarily I would have a specific file name as the query, output, etc., but I'm trying to use a variable so the loop can work through many files.
(Disclaimer: I am new to coding.) I believe that I am running into serious problems with trying to use my file names within my functions. As it is, my code will execute, but it creates a bunch of extra unintended files. This is what I intend for my script to do:
Line 1: Iterate through every file in my "sequences" directory. (All of which end with ".fa", if that is helpful.)
Line 3: Recognize the filename as a variable. (I know, I know, I think I've done this horribly wrong.)
Line 4: Run the blastp function using the file name as the argument for the "query" flag, always use "database.faa" as the argument for the "db" flag, and output the result in a new file that is has the same name as the initial file, but with ".txt" at the end.
Line 5: Output parts of the output file from line 4 into a new file that has the same name as the initial file, but with "_top_hits.txt" at the end.
for sequence in ./sequences/{.,}*;
do
echo "$sequence";
blastp -query $sequence -db database.faa -out ${sequence}.txt -evalue 1e-10 -outfmt 7
cat ${sequence}.txt | awk '/hits found/{getline;print}' | grep -v "#">${sequence}_top_hits.txt
done
When I ran this code, it gave me six new files derived from each file in the directory (and they were all in the same directory - I'd prefer to have them all in their own folders. How can I do that?). They were all empty. Their suffixes were, ".txt", ".txt.txt", ".txt_top_hits.txt", "_top_hits.txt", "_top_hits.txt.txt", and "_top_hits.txt_top_hits.txt".
If I can provide any further information to clarify anything, please let me know.
If you're only interested in *.fa files I would limit your input to only those matching files like this:
for sequence in sequences/*.fa;
do
I can propose you the following improvements:
for fasta_file in ./sequences/*.fa # ";" is not necessary if you already have a new line for your "do"
do
# ${variable%something} is the part of $variable
# before the string "something"
# basename path/to/file is the name of the file
# without the full path
# $(some command) allows you to use the result of the command as a string
# Combining the above, we can form a string based on our fasta file
# This string can be useful to name stuff in a clean manner later
sequence_name=$(basename ${fasta_file%.fa})
echo ${sequence_name}
# Create a directory for the results for this sequence
# -p option avoids a failure in case the directory already exists
mkdir -p ${sequence_name}
# Define the name of the file for the results
# (including our previously created directory in its path)
blast_results=${sequence_name}/${sequence_name}_blast.txt
blastp -query ${fasta_file} -db database.faa \
-out ${blast_results} \
-evalue 1e-10 -outfmt 7
# Define a file name for the top hits
top_hits=${sequence_name}/${sequence_name}_top_hits.txt
# alternatively, using "%"
#top_hits=${blast_results%_blast.txt}_top_hits.txt
# No need to cat: awk can take a file as argument
awk '/hits found/{getline;print}' ${blast_results} \
| grep -v "#" > ${sequence_name}_top_hits.txt
done
I made more intermediate variables, with (hopefully) meaningful names.
I used \ to escape line ends and allow putting commands in several lines.
I hope this improves code readability.
I haven't tested. There may be typos.
You should be using *.fa if you only want files with a .fa ending. Additionally, if you want to redirect your output to new folders you need to create those directories somewhere using
mkdir 'folder_name'
then you need to redirect your -o outputs to those files, something like this
'command' -o /path/to/output/folder
To help you test this script out, you can run each line one by one to test them. You need to make sure each line works by itself before combining.
One last thing, be careful with your use of colons, it should look something like this:
for filename in *.fa; do 'command'; done

Obtaining file names from directory in Bash

I am trying to create a zsh script to test my project. The teacher supplied us with some input files and expected output files. I need to diff the output files from myExecutable with the expected output files.
Question: Does $iF contain a string in the following code or some kind of bash reference to the file?
#!/bin/bash
inputFiles=~/project/tests/input/*
outputFiles=~/project/tests/output
for iF in $inputFiles
do
./myExecutable $iF > $outputFiles/$iF.out
done
Note:
Any tips in fulfilling my objectives would be nice. I am new to shell scripting and I am using the following websites to quickly write the script (since I have to focus on the project development and not wasting time on extra stuff):
Grammar for bash language
Begginer guide for bash
As your code is, $iF contains full path of file as a string.
N.B: Don't use for iF in $inputFiles
use for iF in ~/project/tests/input/* instead. Otherwise your code will fail if path contains spaces or newlines.
If you need to diff the files you can do another for loop on your output files. Grab just the file name with the basename command and then put that all together in a diff and output to a ".diff" file using the ">" operator to redirect standard out.
Then diff each one with the expected file, something like:
expectedOutput=~/<some path here>
diffFiles=~/<some path>
for oF in ~/project/tests/output/* ; do
file=`basename ${oF}`
diff $oF "${expectedOutput}/${file}" > "${diffFiles}/${file}.diff"
done

how to print the ouput/error to a text file?

I'm trying to redirect(?) my standard error/output to a text file.
I did my research, but for some reason the online answers are not working for me.
What am I doing wrong?
cd /home/user1/lists/
for dir in $(ls)
do
(
echo | $dir > /root/user1/$dir" "log.txt
) > /root/Desktop/Logs/Update.log
done
I also tried
2> /root/Desktop/Logs/Update.log
1> /root/Desktop/Logs/Update.log
&> /root/Desktop/Logs/Update.log
None of these work for me :(
Help please!
Try this for the basics:
echo hello >> log.txt 2>&1
Could be read as: echo the word hello, redirecting and appending STDOUT to the file log.txt. STDERR (file descriptor 2) is redirected to wherever STDOUT is being pointed. Note that STDOUT is the default and thus there is no "1" in front of the ">>". Works on the current line only.
To redirect and append all output and error of all commands in a script, put this line near the top. It will be in effect for the length of the script instead of doing it on each line:
exec >>log.txt 2>&1
If you are trying to obtain a list of the files in /home/user1/lists, you do not need a loop at all:
ls /home/usr1/lists/ >Update.log
If you are attempting to run every file in the directory as an executable with a newline as its input, and collect the output from all these programs in Update.log, try this:
for file in /home/user1/lists/*; do
echo | "$file"
done >Update.log
(Notice how we avoid the useless use of ls and how there is no redirection inside the loop.)
If you want to create an empty file called *.log.txt for each file in the directory, you would do
for file in /home/user1/lists/*; do
touch "$(basename "$file")"log.txt
done
(Using basename to obtain the file name without the directory part avoids the cd but you could do it the other way around. Generally, we tend to avoid changing the directory in scripts, so that the tool can be run from anywhere and generate output in the current directory.)
If you want to create a file containing a single newline, regardless of whether it already exists or not,
for file in /home/user1/lists/*; do
echo >"$(basename "$file")"log.txt
done
In your original program, you redirect the echo inside the loop, which means that the redirection after done will not receive any output at all, so the created file will be empty.
These are somewhat wild guesses at what you might actually be trying to accomplish, but should hopefully help nudge you slightly in the right direction. (This should properly be a comment, I suppose, but it's way too long and complex.)

Piping with multiple commands

Assume you have a file called “heading” as follows
echo "Permissions^V<TAB>^V<TAB>Size^V<TAB>^V<TAB>File Name" > heading
echo "-------------------------------------------------------" >> heading
Write a (single) set of commands that will create a report as follows:
make a list of the names, permissions and size of all the files in your current directory,
matching (roughly) the format of the heading you just created,
put the list of files directly following the heading, and
save it all into a file called “file.list”.
All this is to be done without destroying the heading file.
I need to be able to do this all in a pipleline without altering the file. I can't seem to do this without destroying the file. Can somebody please make a pipe for me?
You can use command group:
{ cat heading; ls -l | sed 's/:/^V<tab>^V<tab>/g'; } > file.list

Resources