How to iterate over files with quotes in filename? - linux

I have a set of files in /home/user/source. One file in this set is named 'e e' (with the single quotes as part of the name). When I tried to loop over this set of files and print all the filenames, this file was printed as e e (the quotes disappeared). How do I write this loop so that the quotes are preserved in the output? Here is the code:
#!/bin/bash
for existedFile in "$(ls /home/user/source)"
do
echo $existedFile
done
The confusing part is that when I just run ls /home/user/source, the output is correct.

Don't Parse the Output of ls
The output of ls can contain anything. It can contain whitespace, newlines, commas, pipe symbols, etc. This can be extremely harmful in a script.
Instead, iterate over a glob: *. An asterisk is shorthand for "everything in this directory". Bash will take care of the iteration over files. If you need to match a particular file type/pattern, you can use *.java, file*.f90, etc.
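A minimal sketch of such a loop for the directory in the question (the glob expands to full paths, so the directory part is stripped with a parameter expansion; the quoting keeps spaces and quote characters in the names intact):
#!/bin/bash
for existedFile in /home/user/source/*
do
    # "${existedFile##*/}" strips the leading directory part;
    # the double quotes preserve spaces and quote characters
    echo "${existedFile##*/}"
done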

Related

Delete files in a variable - bash

I have a variable of filenames that end with a vowel. I need to delete all of these files at once. I have tried using
rm "$vowels"
but that only seems to treat the contents of the variable as a single filename and reports "No such file or directory".
It's your use of quotes: they tell rm that your variable's contents are to be interpreted as a single argument (filename). Without the quotes, the contents will be broken into multiple arguments using the shell's word-splitting rules in effect.
Be aware that this can be risky if your filenames contain spaces, as there's no way to tell the difference between spaces between filenames and spaces in filenames.
You can get around this by using an array instead, with a quoted array expansion: rm "${array[@]}", where each element of the array is passed to rm as a separate, intact argument (see the sketch below).
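A minimal sketch of that array approach, assuming the files live in the current directory and, as in the question, their names end in a vowel:
vowels=( *[aeiou] )       # glob: every name ending in a, e, i, o or u
rm -v "${vowels[@]}"      # quoted expansion passes each name as one intact argument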
SOLUTION
assigning the variable
vowel=$(find . -type f | grep "[aeiou]$")
removing all of the files named in the variable
echo $vowel | xargs rm -v
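If the filenames may contain spaces or newlines, a null-delimited pipeline is more robust. This is only a sketch: it assumes find and xargs support -print0/-0 (GNU and BSD versions do), and it moves the vowel test into find's own -name pattern instead of grep:
find . -type f -name '*[aeiou]' -print0 | xargs -0 rm -v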

Grep for specific numbers within a text file and output per number text file

I have a text file chunk_names.txt that looks like this:
chr1_12334_64321
chr1_134435_77474
chr10_463252_74754
chr10_54265_423435
chr13_5464565_547644567
This is an example, but all chromosomes are represented (1..22, X and Y). All entries follow the same format: chr{1..22, X or Y}_*string of numbers*_*string of numbers*.
I would like to split these into per chromosome files e.g. all of the chunks starting chr10 to be put into a file called chr10.txt:
In Linux I have tried:
for i in {1..22}
do
grep chr$i chunk_names.txt > chr$i.txt
done
However, the chr1.txt output file now contains all the chromosome chunks with 1 in them (1,10,11,12, etc).
How would I modify this script to separate out the chromosomes?
I also haven't tackled how to include chromosomes X and Y within the same script and am currently running those separately.
Things I have tried:
grep -o gives me just "chr$i" as an output
grep 'chr$i' gives me blank files
grep "chr$i" has the initial problem
Many thanks for your time.
Your 'for' loop means parsing your file N times (where N is the number of chromosomes/contigs in your list). Here's a chromosome-agnostic approach using awk that parses the file just once:
awk -F '_' '{ print > ($1 ".txt") }' chunk_names.txt
If you include the _ following the number, you can distinguish between chr1_ and e.g. chr10_. To include X and Y, simply include them in the loop:
for i in {1..22} X Y
do
grep "chr${i}_" chunk_names.txt > chr$i.txt
done
To search at the beginning of the line only, you can add a leading ^ to the pattern:
grep "^chr${i}_" chunk_names.txt > chr$i.txt
Explanation about your attempts:
grep chr$i searches for the pattern anywhere in the line. The shell replaces $i with the value of the variable i, so you get chr1, chr2, etc.
If you enclose the pattern in double quotes, as in grep "chr$i", the shell will not do any filename globbing or splitting of the string, but it will still expand variables. In your case it is the same as without quotes.
If you use single quotes, the shell takes the literal string as is, so you always search for a line that contains chr$i (instead of chr1 etc.), which does not occur in your file.
Explanation about quotes:
The quotes in my proposed solution are not necessary in your case, but it is a good habit to quote everything. If your pattern contained spaces or characters that are special to the shell, the quoting would make a difference.
Example:
If your file contained chr1* instead of chr1_, the unquoted pattern chr${i}* would be replaced by the list of matching files.
Once you have created your output files chr1.txt etc., try these commands:
$ i=1; echo chr$i*
chr10.txt chr11.txt chr12.txt chr13.txt chr14.txt chr15.txt chr16.txt chr17.txt chr18.txt chr19.txt chr1.txt
$ i=1; echo "chr$i*"
chr1*
In the first case, the grep command
grep chr${i}* chunk_names.txt
would be expanded as
grep chr10.txt chr11.txt chr12.txt chr13.txt chr14.txt chr15.txt chr16.txt chr17.txt chr18.txt chr19.txt chr1.txt chunk_names.txt
which would search for the pattern chr10.txt in files chr11.txt ... chr1.txt and chunk_names.txt.

How to add sequential numbers say 1,2,3 etc. to each file name and also for each line of the file content in a directory?

I want to add sequential number for each file and its contents in a directory. The sequential number should be prefixed with the filename and for each line of its contents should have the same number prefixed. In this manner, the sequential numbers should be generated for all the files(for names and its contents) in the sub-folders of the directory.
I have tried using maxdepth, rename, and print as part of it, but it throws an error saying that "-maxdepth" is not a valid option.
I already have part of the code (to print the names and contents of text files in a directory), and this logic should be added to it.
#!bin/bash
cd home/TESTING
for file in home/TESTING;
do
find home/TESTING/ -type f -name *.txt -exec basename {} ';' -exec cat {} \;
done
P.S. - print, rename, and maxdepth are not working.
If the name of the first file is File1.txt and its content is "Louis", then the output for the filename should be 1File1.txt and the content should be "1Louis". The same should be done with 2 for the second file. In this manner, it has to traverse all the subfolders in the directory and print accordingly. I already have part of the code, and this logic should be added to it.
There should be a fail-safe if you execute cd in a script; otherwise you can end up executing commands in the wrong directory if the cd fails.
In your attempt, the output would be the same even without the for loop, as for file in home/TESTING only passes home/TESTING as a single argument to for, so the loop body runs only once. With for file in home/TESTING/* it would behave differently, iterating over each entry in the directory.
I used find without -maxdepth, so it will look into all subdirectories as well for *.txt files. If you want only the current directory, $(find /home/TESTING/* -type f -name "*.txt") could be replaced with $(ls *.txt); as long as you do not have a directory whose name ends in .txt there will be no problem.
#!/bin/bash
# try cd to directory, do things upon success.
if cd /home/TESTING ;then
# set sequence number
let "x = 1"
# pass every file that find matches to the for loop; subdirectories are included as well since there is no -maxdepth.
for file in $(find /home/TESTING/* -type f -name "*.txt") ; do
# print the sequence number and the base file name, obtained by variable substitution.
# basename could be used as well, but this is a bash built-in.
echo "${x}${file##*/}"
# print file content, and put sequence number before each line with stream editor.
sed 's#^#'"${x}"'#g' ${file}
# increase the sequence number by one.
let "x++"
done
# unset sequence number
unset 'x'
else
# print error on stderr
echo 'cd to the /home/TESTING directory failed' >&2
fi
Variable Substitution:
There are more forms; I only picked these 4 for now as they are similar.
${var#pattern} - Use the value of var after removing the text that matches pattern from the left
${var##pattern} - Same as above, but remove the longest matching piece instead of the shortest
${var%pattern} - Use the value of var after removing the text that matches pattern from the right
${var%%pattern} - Same as above, but remove the longest matching piece instead of the shortest
So ${file##*/} takes the value of $file and drops everything (*) up to and including the last slash (/). The value of $file is not modified by this, so it still contains the path and filename.
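A quick demonstration of those four forms on a hypothetical path (archive.tar.gz is just an illustrative name):
file="home/TESTING/sub/archive.tar.gz"
echo "${file#*/}"     # TESTING/sub/archive.tar.gz    (shortest */ removed from the left)
echo "${file##*/}"    # archive.tar.gz                (longest */ removed from the left)
echo "${file%.*}"     # home/TESTING/sub/archive.tar  (shortest .* removed from the right)
echo "${file%%.*}"    # home/TESTING/sub/archive      (longest .* removed from the right)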
sed 's#^#'"${x}"'#g' ${file}: sed is a stream editor; there are whole books about its usage, so this covers only this particular case. The script is usually placed in single quotes, so 's#^#1#g' will add 1 to the beginning of every line in a file. s is substitution, ^ is the beginning of the line, 1 is the text to insert, and g means global; without the g, only the first match on each line would be affected.
# is the separator; it can be something else as well, like / for example. I break out of the single quotes to let the variable be expanded, then reopen the single quotes.
If you would like to replace text, say .txt with .php, you can use sed 's#\.txt#\.php#g' file. The . has a special meaning: it matches any single character, so it needs to be escaped with \ to be used as literal text; otherwise not only file.txt would be matched but also file1txt.
sed can also read from a pipe, in which case you do not need to specify a file name; otherwise you have to provide at least one filename, which in our case was the ${file} variable. As I mentioned, variable substitution does not modify the variable's value, so it still contains the filename with its path.
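For example, piping input instead of giving a filename (using the sample content "Louis" from the question):
echo "Louis" | sed 's#^#1#g'    # prints: 1Louis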

IFS and command substitution

I am writing a shell script to read input csv files and run a java program accordingly.
#!/usr/bin/ksh
CSV_FILE=${1}
myScript="/usr/bin/java -version"
while read row
do
$myScript
IFS=$"|"
for column in $row
do
$myScript
done
done < $CSV_FILE
csv file:
a|b|c
Interestingly, $myScript outside the for loop works, but the $myScript inside the for loop says "/usr/bin/java -version: not found [No such file or directory]". I have come to know that it is because I am setting IFS. If I comment out the IFS assignment and change the csv file to
a b c
It works! I imagine the shell uses the default IFS to separate the command /usr/bin/java and then applies the -version argument. Since I changed IFS, it is taking the entire string as a single command, or that is what I think is happening.
But this is my requirement: I have a csv file with a custom delimiter, and the command has arguments in it, separated by space. How can I do this correctly?
IFS indicates how to split the values of variables in unquoted substitutions. It applies to both $row and $myScript.
If you want to use IFS to do the splitting, which is convenient in plain sh, then you need to change the value of IFS or arrange to need the same value. In this particular case, you can easily arrange to need the same value, by defining myScript as myScript="/usr/bin/java|-version". Alternatively, you can change the value of IFS just in time. In both cases, note that an unquoted substitution doesn't just split the value using IFS, it also interprets each part as a wildcard pattern and replaces it by the list of matching file names if there are any. This means that if your CSV file contains a line like
foo|*|bar
then the row won't be foo, *, bar but foo, each file name in the current directory, bar. To process the data like this, you need to turn globbing off with set -f. Also remember that read processes continuation lines when a line ends with a backslash, and strips leading and trailing IFS characters. Use IFS= read -r to turn off these two behaviors.
myScript="/usr/bin/java -version"
set -f
while IFS= read -r row
do
$myScript
IFS='|'
for column in $row
do
IFS=' '
$myScript
done
done
However there are better ways that avoid IFS-splitting altogether. Don't store a command in a space-separated string: it fails in complex cases, like commands that need an argument that contains a space. There are three robust ways to store a command:
Store the command in a function. This is the most natural approach. Running a command is code; you define code in a function. You can refer to the function's arguments collectively as "$@".
myScript () {
/usr/bin/java -version "$@"
}
…
myScript extra_argument_1 extra_argument_2
Store an executable command name and its arguments in an array.
myScript=(/usr/bin/java -version)
…
"${myScript[#]}" extra_argument_1 extra_argument_2
Store a shell command, i.e. something that is meant to be parsed by the shell. To evaluate the shell code in a string, use eval. Be sure to quote the argument, like any other variable expansion, to avoid premature wildcard expansion. This approach is more complex since it requires careful quoting. It's only really useful when you have to store the command in a string, for example because it comes in as a parameter to your script. Note that you can't sensibly pass extra arguments this way.
myScript='/usr/bin/java -version'
…
eval "$myScript"
Also, since you're using ksh and not plain sh, you don't need to use IFS to split the input line. Use read -A instead to directly split into an array.
#!/usr/bin/ksh
CSV_FILE=${1}
myScript=(/usr/bin/java -version)
while IFS='|' read -r -A columns
do
"${myScript[#]}"
for column in "${columns[#]}"
do
"${myScript[#]}"
done
done <"$CSV_FILE"
The simplest solution is to avoid changing IFS and do the splitting with read -d <delimiter>, like this:
#!/usr/bin/ksh
CSV_FILE=${1}
myScript="/usr/bin/java -version"
while read -A -d '|' columns
do
$myScript
for column in "${columns[#]}"
do
echo next is "$column"
$myScript
done
done < $CSV_FILE
IFS tells the shell which characters separate "words" when it splits the result of an unquoted expansion, that is, the different components of a command stored in a variable. So when you remove the space character from IFS and expand an unquoted variable containing foo bar, the shell sees a single word "foo bar" rather than "foo" and "bar".
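A minimal illustration of this, using the variable from the question:
myScript="/usr/bin/java -version"
IFS='|'
$myScript    # one word: the shell looks for a command literally named "/usr/bin/java -version"
IFS=' '
$myScript    # two words: runs /usr/bin/java with the argument -version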
The IFS assignment should be placed between while and read, so that it applies only to the read command:
#!/usr/bin/ksh
CSV_FILE=${1}
myScript="/usr/bin/java -version"
while IFS="|" read row
do
$myScript
for column in $row
do
$myScript
done
done < $CSV_FILE

How to remove first 16 characters of all file names in a directory?

I have a directory with many files with really long, repetitive names and I would like to remove the first 16 characters from each file name.
So I would like to rename files like this:
0123456789012345file1.fits
0123456789012345file2.fits
to this:
file1.fits
file2.fits
I would like to be able to do this from the command line in the terminal.
In bash, you can run
for f in *; do mv "$f" "${f:16}"; done
to rename all files stripping off the first 16 characters of the name.
You can change the * to a more restrictive pattern such as *.fits if you don't want to rename all files in the current directory. The quotes around the parameters to mv are necessary if any filenames contain whitespace.
bash's ${var:pos:len} syntax also supports more advanced usage than the above. You can take only the first five characters with ${f::5}, or the first five characters after removing the first 16 characters with ${f:16:5}. Many other variable substitution expressions are available in bash; see a reference such as TLDP's Bash Parameter Substitution for more information.
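For example, with one of the filenames from the question:
f="0123456789012345file1.fits"
echo "${f:16}"      # file1.fits  (everything after the first 16 characters)
echo "${f::5}"      # 01234       (the first five characters)
echo "${f:16:5}"    # file1       (five characters, starting after the first 16)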
