Cat command giving "No such file or directory" - Linux

I am trying to use this script to create a list of file names (including their paths), with the paths separated by commas, so that ideally the output would look like: file1.txt,file2.txt,file3.txt, etc. How do I go about this?
#!/bin/bash
LEFT=/home/ndevon/USER/SRA/PE/*_1.fastq.gz
for f in $LEFT; do
    cat "${f}," >> /home/ndevon/USER/left_list.txt
done

The error comes from cat "${f},": the comma becomes part of the filename, so cat is asked to open a file that does not exist. What you want is probably
echo /home/ndevon/USER/SRA/PE/*_1.fastq.gz | tr ' ' ,
which translates spaces to commas. This works as long as your file names don't contain spaces.
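Another option, as a sketch: let printf append a comma to every name and trim the final one with sed (standard POSIX tools; the paths are the ones from the question):
# print each matching path followed by a comma, then strip the trailing comma
printf '%s,' /home/ndevon/USER/SRA/PE/*_1.fastq.gz | sed 's/,$//' > /home/ndevon/USER/left_list.txt
Like the tr version, this assumes the file names themselves contain no commas.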

Try this:
# read the filenames into an array
files=( /home/ndevon/USER/SRA/PE/*_1.fastq.gz )
# print the filenames comma-separated
IFS=,
echo "${files[*]}" > output_file
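A hedged aside: the IFS=, assignment above persists for the rest of the script, which can change how later unquoted expansions are split. Running it in a subshell keeps the change local, e.g.:
# set IFS only inside the subshell; the parent shell's IFS is untouched
( IFS=,; echo "${files[*]}" > output_file )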

Related

Why is the comparison of two md5sum files not working properly?

I have two lists of files with their md5sum checksums, and the lists have different paths for the same files.
Example content of the first file with checksums (server.list):
2c03ff18a643a1437ec0cf051b8b7b9d /tmp/fastq1_L001_R1_001.fastq.gz
c430f587aba1aa9f4fdf69aeb4526621 /tmp/fastq1_L001_R2_001.fastq.gz/
6e6bcd84f264233cf7c428c0cfdc0c03 tmp/fastq1_L002_R1_001.fastq.gz
Example content of the second file with checksums (downloaded.list):
2c03ff18a643a1437ec0cf051b8b7b9d /home/projects/fastq1_L001_R1_001.fastq.gz
c430f587aba1aa9f4fdf69aeb4526621 /home/projects/fastq1_L001_R2_001.fastq.gz
6e6bcd84f264233cf7c428c0cfdc0c03 /home/projects/fastq1_L002_R1_001.fastq.gz
When I run the following command, I get this output:
awk -F"/" 'FNR==NR{filearray[$1]=$NF; next }!($1 in filearray){printf "%s has a different md5sum\n",$NF}' downloaded.list server.list
fastq1_L001_R1_001.fastq.gz has a different md5sum
fastq1_L001_R2_001.fastq.gz has a different md5sum
fastq1_L002_R2_001.fastq.gz has a different md5sum
Why am I getting this message when the first column is the same in both files? Can someone enlighten me on this issue?
Edit:
If I remove the path and leave only the file name, it works just fine.
Edit 2:
As pointed out, a file path may also not start with /. In that case, I cannot use / as the field separator.
Assumptions:
filename (sans path) and md5sum have to match
filenames may not be listed in the same order
filenames may not exist in both files
Sample data:
$ head downloaded.list server.list
==> downloaded.list <==
2c03ff18a643a1437ec0cf051b8b7b9d /home/projects/fastq1_L001_R1_001.fastq.gz # match
YYYYf587aba1aa9f4fdf69aeb4526621 /home/projects/fastq1_L001_R5_911.fastq.gz # different md5sum
c430f587aba1aa9f4fdf69aeb4526621 /home/projects/fastq1_L001_R2_001.fastq.gz # match
MNOPf587aba1aa9f4fdf69aeb4526621 /home/projects/fastq1_L001_R8_abc.fastq.gz # filename does not exist in other file
ABCDf587aba1aa9f4fdf69aeb4526621 /home/projects/fastq1_L001_R9_004.fastq.gz # different filename but matching md5sum (vs last line of other file)
==> server.list <==
2c03ff18a643a1437ec0cf051b8b7b9d /tmp/fastq1_L001_R1_001.fastq.gz # match
c430f587aba1aa9f4fdf69aeb4526621 /tmp/fastq1_L001_R2_001.fastq.gz # match
XXXXf587aba1aa9f4fdf69aeb4526621 /tmp/fastq1_L001_R5_911.fastq.gz # different md5sum
TUVWff18a643a1437ec0cf051b8b7b9d /tmp/fastq1_L999_R6_922.fastq.gz # filename does not exist in other file
ABCDf587aba1aa9f4fdf69aeb4526621 /tmp/fastq1_L001_R7_933.fastq.gz # different filename but matching md5sum (vs last line of other file)
One awk idea that addresses the whitespace issue and also verifies that the filenames match:
awk '                                     # stick with default field delimiter of white space but ...
{   md5sum = $1
    n = split($2, arr, "/")               # split 2nd field on "/" delimiter
    fname = arr[n]
    if (FNR == NR)
        filearray[fname] = md5sum
    else {
        if (fname in filearray && filearray[fname] == $1)
            next
        printf "%s has a different md5sum\n", fname
    }
}
' downloaded.list server.list
This generates:
fastq1_L001_R5_911.fastq.gz has a different md5sum
fastq1_L999_R6_922.fastq.gz has a different md5sum
fastq1_L001_R7_933.fastq.gz has a different md5sum
The whitespace on $1 used as an array key is causing problems. Removing it:
awk -F"/" '{gsub(/ /, "", $1)}; FNR==NR{filearray[ $1]=$NF; next }!($1 in filearray){printf "%s has a different md5sum\n",$NF}' list1.txt list2.txt

How to delete the smallest file if names are duplicated

I would like to clean up a folder with videos. I have a bunch of videos that were downloaded with different resolutions, so each file will start with the same name and then end with "_480p" or "_720p" etc.
I just want to keep the largest file of each such set.
So I am looking for a way to delete files based on
check if name before "_" is identical
if true, then delete all files except largest one
Thinking of a flexible and fast way to approach the problem: gather a list of files ending in "[[:digit:]]+p", then parse the names by providing them on stdin to awk. Let awk index an array with the file prefix (path plus the part of the name before '_'), which is unique per file set, so the resolution number for each prefix can be obtained and stored at that index.
Then it's simply a matter of comparing the stored resolution number against the current file's number and deleting the lesser of the two.
Your find command to locate all matching files below the current directory, recursively, could be:
find ./tmp -type f -regex "^.*[0-9]+p$"
What I would do is pipe the filename output to a short awk script in which an array stores the last seen resolution number for a given file prefix. If the current record's (line's) resolution number is bigger than the stored value, a filename is rebuilt from the array number and that file is deleted with system() using rm. If the current line's resolution number is less than what is already stored for the prefix, you simply delete the current file.
You can do that as:
#!/usr/bin/awk -f

BEGIN { FS = "/" }

{
    num = $NF                               # last field holds number up to 'p'
    prefix = $0                             # prefix is name up to "_[[:digit:]]+p"
    sub (/^.*_/, "", num)                   # isolate number
    sub (/p$/, "", num)                     # remove 'p' at end
    sub (/_[[:digit:]]+p$/, "", prefix)     # isolate path and name prefix
    if (prefix in a) {                      # prefix already seen in array a[] ?
        rmfile = $0                         # set file to remove to current
        if (num + 0 > a[prefix] + 0) {      # current number > array number
            rmfile = prefix "_" a[prefix] "p"   # form remove filename from array
            a[prefix] = num                 # update array with higher num
        }
        system ("rm " rmfile)               # delete the file (assumes no spaces in names)
    }
    else
        a[prefix] = num                     # first num seen for prefix, store it
}
(note: the field-separator splits the fields using the directory separator so you have all file components to work with.)
Example Use/Output
With a representative set of files in a tmp/ directory below the current one, e.g.
$ ls -1 tmp
a_480p
a_720p
b_1080p
b_480p
c_1080p
c_720p
Running the find command piped to the awk script named awkparse.sh would be as follows (don't forget to make the awk script executable):
$ find ./tmp -type f -regex "^.*[0-9]+p$" | ./awkparse.sh
Looking at the directory after piping the results of find to the awk script, the tmp/ directory now only contains the highest resolution (largest) files for any given filename, e.g.
$ ls -1 tmp
a_720p
b_1080p
c_1080p
This would be highly efficient. It could also handle all files in a nested directory structure where multiple directory levels hold files you need to clean out. Look things over and let me know if you have questions.
This shell script might be what you want:
previous_prefix=
for file in *_[0-9]*[0-9]p*; do
    prefix=${file%_*}
    resolution=${file##*_}
    resolution=${resolution%%p*}
    if [ "$prefix" = "$previous_prefix" ]; then
        if [ "$resolution" -gt "$greater_resolution" ]; then
            file_to_be_removed=$greater_file
            greater_file=$file
            greater_resolution=$resolution
        else
            file_to_be_removed=$file
        fi
        echo rm -- "$file_to_be_removed"
    else
        greater_resolution=$resolution
        greater_file=$file
        previous_prefix=$prefix
    fi
done
Drop the echo if the output looks good.
I would try to:
list all non-smallest files (non-480p): *_720p* and *_1080p*
for each of them replace *_720p*/*_1080p* in the name with all possible smaller resolutions
and try to delete those files with rm -f, whether they exist or not
#!/bin/bash -e
shopt -s nullglob
for file in *_1080p*; do
    rm -f -- "${file//_1080p/_720p}"
    rm -f -- "${file//_1080p/_480p}"
done
for file in *_720p*; do
    rm -f -- "${file//_720p/_480p}"
done
And here is a Bash script using nested loops to automate the above:
#!/bin/bash -e
shopt -s nullglob
res=(_1080p _720p _480p _240p)
for r in "${res[@]}"; do
    res=("${res[@]:1}")    # remove the first element of the res array
    for file in *"$r"*; do
        for r2 in "${res[@]}"; do
            rm -f -- "${file//$r/$r2}"
        done
    done
done

How do I concatenate each line of 2 variables in bash?

I have 2 variables, NUMS and TITLES.
NUMS contains the string
1
2
3
TITLES contains the string
A
B
C
How do I get output that looks like:
1 A
2 B
3 C
paste -d' ' <(echo "$NUMS") <(echo "$TITLES")
Having multi-line strings in variables suggests that you are probably doing something wrong. But you can try
paste -d ' ' <(echo "$nums") - <<<"$titles"
The basic syntax of paste is to read two or more file names; you can use a command substitution to replace a file anywhere, and you can use a here string or other redirection to receive one of the "files" on standard input (where the file name is then conventionally replaced with the pseudo-file -).
The default column separator from paste is a tab; you can replace it with a space or some other character with the -d option.
You should avoid upper case for your private variables; see also Correct Bash and shell script variable capitalization
Bash variables can contain even very long strings, but this is often clumsy and inefficient compared to reading straight from a file or pipeline.
Convert them to arrays, like this:
NUMS=($NUMS)
TITLES=($TITLES)
Then loop over the indices of either array, let's say NUMS, like this:
for i in ${!NUMS[*]}; {
    # and echo desired output
    echo "${NUMS[$i]} ${TITLES[$i]}"
}
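One caveat with NUMS=($NUMS): the unquoted expansion is also subject to globbing, so a stray * in the data would expand to filenames. A sketch that avoids this, assuming bash 4+ for readarray:
# split each variable on newlines, without word splitting or globbing
readarray -t nums <<< "$NUMS"
readarray -t titles <<< "$TITLES"
for i in "${!nums[@]}"; do
    echo "${nums[$i]} ${titles[$i]}"
done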
Awk alternative:
awk 'FNR==NR { map[FNR]=$0; next } { print map[FNR]" "$0 }' <(echo "$NUMS") <(echo "$TITLES")
For the first file/variable (FNR==NR), set up an array called map with the record number (FNR) as the index and the line as the value. Then, for the second file, print the stored entry and the current line, separated by a space.

How to remove all lines from a text file starting at first empty line?

What is the best way to remove all lines from a text file starting at first empty line in Bash? External tools (awk, sed...) can be used!
Example
1: ABC
2: DEF
3:
4: GHI
Line 3 and 4 should be removed and the remaining content should be saved in a new file.
With GNU sed:
sed '/^$/Q' "input_file.txt" > "output_file.txt"
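Q is a GNU extension; if portability matters, a POSIX sed sketch that quits at the first empty line without printing it:
sed -n '/^$/q;p' "input_file.txt" > "output_file.txt"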
With AWK:
$ awk '/^$/{exit} 1' test.txt > output.txt
Contents of output.txt
$ cat output.txt
ABC
DEF
Walkthrough: For lines that matches ^$ (start-of-line, end-of-line), exit (the whole script). For all lines, print the whole line -- of course, we won't get to this part after a line has made us exit.
Bet there are some more clever ways to do this, but here's one using bash's read builtin. The question asks us to keep the lines before the blank in one place and send the lines after the blank somewhere else. You could split standard out between two destinations with exec and mid-script redirection, but I'm going to take a simpler approach and use a command-line argument to tell the script where the post-blank data should go:
#!/bin/bash
# script takes as argument the name of the file to send data to once a blank
# line is found
found_blank=0
while IFS= read -r stuff; do
    if [ -z "$stuff" ] && [ "$found_blank" -eq 0 ]; then
        found_blank=1
        continue                      # the first blank line itself is dropped
    fi
    if [ "$found_blank" -eq 1 ]; then
        echo "$stuff" >> "$1"         # append so every post-blank line is kept
    else
        echo "$stuff"
    fi
done
run it like this:
$ ./delete_from_empty.sh rest_of_stuff < demo
output is:
ABC
DEF
and 'rest_of_stuff' has
GHI
if you want the before-blank lines to go somewhere else besides stdout, simply redirect:
$ ./delete_from_empty.sh after_blank < input_file > before_blank
and you'll end up with two new files: after_blank and before_blank.
Perl version
perl -e '
    open $fh,  ">", "stuff";
    open $efh, ">", "rest_of_stuff";
    while (<>) {
        if ($_ !~ /\w+/) {
            $fh = $efh;
        }
        print $fh $_;
    }
' demo
This creates two output files and iterates over the demo data. When it hits a blank line, it flips the output from one file to the other.
Creates
stuff:
ABC
DEF
rest_of_stuff:
<blank line>
GHI
Another awk would be:
awk -vRS= '1;{exit}' file
By setting the record separator RS to the empty string, we define the records as paragraphs separated by sequences of empty lines. It is now easy to adapt this to select the nth block, e.g. the second:
awk -v n=2 -vRS= '(FNR==n){print;exit}' file
There is a problem with this method when processing files with DOS line-endings (CRLF): there will be no empty lines, because every line still contains a CR. This problem applies to all the methods presented here.
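For completeness, a sketch that copes with CRLF input by stripping the carriage return before testing for emptiness (standard awk):
awk '{ sub(/\r$/, "") }    # drop a trailing CR, if any
     /^$/ { exit }         # a CR-only line is now genuinely empty
     1' test.txt > output.txt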

Handling multiple filenames from FileOpenDialog()

I am writing a picture editing program and am using the below snippet to choose the files:
$var = FileOpenDialog("", @DesktopDir, "Images (*.jpg;*.bmp;*.png)", 1 + 4)
$var = StringReplace($var, "|", @CRLF)
When I select multiple files, all the file names are stored in $var separated by the | symbol. I replace that symbol with a newline character. But I need to run the program for all the filenames and I can't figure out how to separate them from the variable, so my program stops if I select multiple files.
$var = FileOpenDialog("", @DesktopDir, "Images (*.jpg;*.bmp;*.png)", 1 + 4)
$files = StringSplit($var, "|", 2)
For $i = 0 To UBound($files) - 1
    $file = $files[$i]
    ConsoleWrite($file & @CRLF) ; Do something with file
Next
For me the results look like this:
C:\Users\Manadar\Desktop
skin1.png
skin2.png
So it's:
Directory of files
File1
File2
File3
etc.
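If you need full paths rather than bare names, note that a multiple selection puts the directory in element 0, while a single selection returns one complete path and no "|" at all. A sketch handling both cases (an untested outline, using the same dialog call as above):
$var = FileOpenDialog("", @DesktopDir, "Images (*.jpg;*.bmp;*.png)", 1 + 4)
$files = StringSplit($var, "|", 2)
If UBound($files) = 1 Then
    ConsoleWrite($files[0] & @CRLF) ; single file: already a full path
Else
    For $i = 1 To UBound($files) - 1
        ConsoleWrite($files[0] & "\" & $files[$i] & @CRLF) ; dir \ filename
    Next
EndIf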
