Bash substring from position not printing - linux

I am using the ${string:start:length} format to extract the file name from wget's .listing file, line by line.
The format for the file is something I think we are all familiar with:
04-30-13 01:41AM 7033614 some_archive.zip
04-29-13 08:13PM <DIR> DIRECTORY NAME 1
04-29-13 05:41PM <DIR> DIRECTORY NAME 2
All file names start at position 40, so setting :start to 39 (the offset is zero-based), with no :length, should (and does) return the file name for each line:
#!/bin/bash
cat .listing | while read line; do
file="${line:39}"
echo $file
done
Correctly returns:
some_archive.zip
DIRECTORY NAME 1
DIRECTORY NAME 2
However, if I get any more creative, it breaks:
#!/bin/bash
cat .listing | while read line; do
file="${line:39}"
dir=$(echo $line | egrep -o '<DIR>' | head -n1)
if [ $dir ]; then
echo "the file $file is a $dir"
fi
done
Returns:
$ ./test.sh
is a <DIR>ECTORY NAME 1
is a <DIR>ECTORY NAME 2
What gives? I lose "the file " and the rest of the test looks like it prints on top of "the file DIRECTORY NAME 1" from pos:0.
It's weird, what's it on account of?

The answer, as I keep learning the further I get with Linux, is non-printing control characters.
Piping the substring through egrep to keep only printing characters solved the problem:
#!/bin/bash
cat .listing | while read line; do
file=$(echo ${line:39} | egrep -o '[[:print:]]+' | head -n1)
dir=$(echo $line | egrep -o '<DIR>' | head -n1)
if [ $dir ]; then
echo "the file $file is a $dir"
fi
done
Correctly returns:
$ ./test.sh
the file DIRECTORY NAME 1 is a <DIR>
the file DIRECTORY NAME 2 is a <DIR>
I wish there were a better way to visualize these control characters, but what the above does is take the string segment, pull out the first run of printable characters, and assign it to the variable.
I assume there is a control character at the end of the line that returns the cursor to the beginning of the line, causing the rest of the echo to be printed there and overwrite the previous characters.
Odd.
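One common way to make such characters visible is cat -v, which prints a carriage return as ^M. The sketch below is only an illustration and not part of the original answer; show_ctrl is a hypothetical helper name, and the sample string stands in for one line of .listing.

```shell
# show_ctrl (hypothetical helper): print the argument with control
# characters made visible; cat -v renders a carriage return as ^M.
show_ctrl() {
    printf '%s\n' "$1" | cat -v
}

# A sample line ending in a carriage return, standing in for a .listing line.
sample=$(printf 'DIRECTORY NAME 1\r')
show_ctrl "$sample"
```

od -c is another option; it shows the same byte as \r.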

You can remove the \r control characters from the whole file by using the tr command on the first line of your script:
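#!/bin/bash
# Strip carriage returns (octal 015) before reading each line.
tr -d '\015' < .listing | while IFS= read -r line; do
    file="${line:39}"
    dir=$(echo "$line" | grep -Eo '<DIR>' | head -n1)
    # -n with a quoted variable is a safe "is it non-empty?" test
    if [ -n "$dir" ]; then
        echo "the file $file is a $dir"
    fi
done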
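As an alternative sketch (not taken from the answers above), the trailing carriage return can also be removed per line with parameter expansion instead of filtering the whole file through tr. strip_cr is a hypothetical helper name, and the two sample lines are made up to stand in for .listing.

```shell
# strip_cr (hypothetical helper): remove one trailing carriage return, if any.
strip_cr() {
    cr=$(printf '\r')
    # ${1%pattern} drops the shortest match of pattern from the end of $1
    printf '%s\n' "${1%"$cr"}"
}

# Demo on two sample lines with DOS line endings.
printf 'first\r\nsecond\r\n' | while IFS= read -r line; do
    line=$(strip_cr "$line")
    echo "[$line]"
done
```

Lines without a carriage return pass through unchanged, so the helper is safe to apply to every line.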

Related

Replace filename to a string of the first line in multiple files in bash

I have multiple fasta files, where the first line always contains a > with multiple words, for example:
File_1.fasta:
>KY620313.1 Hepatitis C virus isolate sP171215 polyprotein gene, complete cds
File_2.fasta:
>KY620314.1 Hepatitis C virus isolate sP131957 polyprotein gene, complete cds
File_3.fasta:
>KY620315.1 Hepatitis C virus isolate sP127952 polyprotein gene, complete cds
I would like to take the word starting with sP* from each file and rename each file to this string (for example: File_1.fasta to sP171215.fasta).
So far I have this:
$ for match in "$(grep -ro '>')";do
fname=$("echo $match|awk '{print $6}'")
echo mv "$match" "$fname"
done
But it doesn't work, I always get the error:
grep: warning: recursive search of stdin
I hope you can help me!
You can use something like this:
grep '>' *.fasta | while read -r line ; do
    new_name="$(echo "$line" | cut -d' ' -f 6)"
    old_name="$(echo "$line" | cut -d':' -f 1)"
    mv "$old_name" "$new_name.fasta"
done
It searches the *.fasta files and handles every matching line:
it splits each grep result on spaces and takes the 6th field as the new name
it splits each grep result on : and takes the first field as the old name
it moves/renames the old file name to the new file name
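There are several things going on with this code.
For a start, I actually don't get this particular error, and this might be due to different versions.
Some grep implementations read standard input when -r is given without a file or directory operand and print this warning, while newer GNU grep searches the current directory instead. Naming the directory explicitly (for example grep -r '>' .) avoids the difference. The single-quoted '>' is already passed to grep literally, so it needs no escaping. Avoid "\>": GNU grep reads it as the end-of-word anchor, not as a literal >.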
Secondly:
fname=$("echo $match|awk '{print $6}'")
The inner quotes make the shell treat the whole pipeline as a single command name. Your code should look like this, if anything:
fname="$(echo $match|awk '{print $6}')"
Lastly, to properly retrieve your data, this should be your final code:
grep -Hr '>' . | while IFS= read -r match; do
    # everything before the first colon is the file name
    fname="$(echo "$match" | cut -d: -f1)"
    new_fname="$(echo "$match" | grep -o 'sP[^ ]*').fasta"
    echo mv "$fname" "$new_fname"
done
Explanations:
grep -H -> you want your grep to explicitly use "Include Filename", just in case other shell environments decide to alias grep to grep -h (no filenames)
you don't want to be doing grep -o on your file search, as you want to have both the filename and the "new filename" in one data entry.
Although, I don't see why you would search for '>' and not directly for 'sP', as such:
grep -Hro 'sP[0-9]*' . | while IFS= read -r match; do
This is not the exact same behaviour, and has different edge cases, but it just might work for you.
Quite straightforward in (g)awk :
create a file "script.awk":
FNR == 1 {
for (i=1; i<=NF; i++) {
if (index($i, "sP")==1) {
print "mv", FILENAME, $i ".fasta"
nextfile
}
}
}
use it :
awk -f script.awk *.fasta > cmmd.txt
check the content of the output.
mv File_1.fasta sP171215.fasta
mv File_2.fasta sP131957.fasta
if ok, launch rename with . cmmd.txt
For all fasta files in directory, search their first line for the first word starting with sP and rename them using that word as the basename.
Using a bash array:
for f in *.fasta; do
    arr=( $(head -n 1 "$f") )
    for word in "${arr[@]}"; do
        [[ "$word" =~ ^sP ]] && echo mv "$f" "${word}.fasta" && break
    done
done
or using grep:
for f in *.fasta; do
    word=$(head -n 1 "$f" | grep -o "\bsP\w*" | head -n 1)
    [ -z "$word" ] || echo mv "$f" "${word}.fasta"
done
Note: remove echo after you are ok with testing.
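The same idea can be wrapped in a small dry-run helper. This is a sketch under assumptions, not code from the answers: plan_renames is a hypothetical name, it takes the directory as its only argument, and it only prints the mv commands so they can be checked first.

```shell
# plan_renames (hypothetical helper): for every *.fasta file in the given
# directory, print (but do not run) the mv command that would rename it
# after the first word on its first line that starts with sP.
plan_renames() {
    for f in "$1"/*.fasta; do
        [ -f "$f" ] || continue
        # First line, one word per line, then the first word starting with sP.
        word=$(head -n 1 "$f" | tr -s ' \t' '\n\n' | grep '^sP' | head -n 1)
        # The printed command is for review only; names are not shell-quoted.
        [ -n "$word" ] && printf 'mv %s %s\n' "$f" "$1/$word.fasta"
    done
}
```

Calling plan_renames . lists the planned renames for the current directory; files whose first line has no sP word are skipped.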

Linux: append all filenames in path to text file

I want to add the names of all files of a certain type (*.cub) in a directory to a text file in the same directory. This file will become the batch (.submit) file that I can run overnight. I also need to adapt the names a bit.
I do not really know how to describe it better, so I'll give an example:
Let's say I have three files: 001.cub, 002.cub & 003.cub
Then the final text file must be:
[program] -i 001.cub -o 001.vdb
[program] -i 002.cub -o 002.vdb
[program] -i 003.cub -o 003.vdb
It seems a fairly easy operation, but I simply can't get it right.
Also, it really has to become a .submit (or at least some text) file. I cannot run the program immediately.
I hope someone can help!
A simple for loop will do the job:
for i in *.cub
do
    b=$(basename "$i" .cub)
    echo "program -i \"$b.cub\" -o \"$b.vdb\""
done >output.txt
Create an empty sh file
List the files *.cub and loop through them
Store the sequence by splitting on dot [.]
echo the required string and append to the sh file of step 1
echo -n "" > 'Run.sh'
for filename in `ls *.cub`
do
sequence=`echo $filename | cut -d "." -f1`
echo "Program -i $filename -o $sequence.vdb" >> Run.sh
done
Directly put the stream into the file as below:
for filename in `ls *.cub`
do
sequence=`echo $filename | cut -d "." -f1`
echo "Program -i $filename -o $sequence.vdb"
done > Run.sh
For everything before the extension to be retained in the variable:
for filename in `ls *.cub`
do
sequence=`echo $filename | rev | cut -d "." -f2- | rev`
echo "Program -i $filename -o $sequence.vdb"
done > Run.sh
For extracting only the numbers from the filename and use accordingly:
for filename in `ls *.cub`
do
sequence=`echo $filename | sed 's/[^0-9]*//g'`
echo "Program -i $filename -o $sequence.vdb"
done > Run.sh
This one-liner will do what you want:
ls *.cub | sort | awk '{split($1,x,"."); print "[program] -i "$1" -o "x[1]".vdb"}' > something.sh
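For comparison, here is a sketch that avoids parsing ls output and strips the extension with parameter expansion. make_submit is a hypothetical helper name, and [program] is kept as the placeholder used in the question.

```shell
# make_submit (hypothetical helper): print one batch line per *.cub file
# found in the directory given as the first argument.
make_submit() {
    for f in "$1"/*.cub; do
        [ -e "$f" ] || continue   # no matches: the glob stays literal
        name=${f##*/}             # strip the directory part
        base=${name%.cub}         # strip the .cub suffix
        printf '[program] -i %s -o %s.vdb\n' "$name" "$base"
    done
}
```

Redirecting the output, for example make_submit . > batch.submit, produces the text file; batch.submit is just an example name.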

grep lines containing specific string (a line can be written on max 3 lines)

I need to collect all the logging calls in my project.
I'm using this command to do that:
grep -rnw $1 -e "Logger.[view]*;$" >> log.txt
This returns every line in the project directory "$1" that contains Logger.[one of these characters], except that some calls are split over 2 or 3 lines (IDE formatting). In that case I get only the first line.
What can I do to get the complete text of such a call, knowing that it always ends with ");"?
Example of such a line:
Logger.v(xxxxxxxxxxxxx
xxxxxxxxxxxxxxxx);
Here is my script:
#!/bin/bash
echo "Hello Logger!"
# get project path
echo "project directory is $1"
# get all project logs and store them into temporary file tmp.txt for processing
grep -rnw $1 -e "Logger.[view]" >> tmp.txt
echo "tmp.txt created successfully"
# remove package name from previous result and store result into log.txt
sed -r 's/.{52}//' tmp.txt >> log.txt
echo "log.txt created successfully"
The grep command returns file_path/file_name:line_number:line.
I found this command, which returns the whole statement even if it is written over 2 or 3 lines, but without the file path, file name and line number:
sed -n '/Logger.[viewd]/{:start /;/!{N;b start};/Logger.[viewd]/p}' Main.java
Is there a way to combine those two results?
Example:
/home/xxx/xxx/xxx/Main.java:97:Logger.i(xxxxxxxxxxxxx);
/home/xxx/xxx/xxx/Main.java:106:Logger.d(yyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyy);
I think that's a line-break problem. Try replacing grep -rnw $1 -e "Logger.[view]" >> tmp.txt with the following lines:
for i in `ls $1`;
do
cat $1/$i | tr '\n' ' ' | grep -nw -e "Logger.[view]" >> tmp.txt
done
Here, tr '\n' ' ' replaces each line break with a single space, and grep reads the result from standard input, so -r is not needed.
I found a solution for my problem and here is my code:
# get all project logs and store them into log.txt for processing
for i in $(find . -name "*.java")
do
    echo >> log.txt
    echo "**************** file $i ********************************" >> log.txt
    echo >> log.txt
    grep -nw 'Logger.[viewd]' "$i" | while read -r line ; do
        # remove the carriage return from the first line to avoid bad results
        line="$(echo "$line" | sed $'s/\r//')"
        # if the first line ends with ");" print it to the log file
        if [[ ${line: -2} == ");" ]]; then
            echo "$line" >> log.txt
        # else get the next line also
        else
            # get the second line number
            line_number="$(echo "$line" | cut -d : -f1)"
            next_line_number=$((line_number+1))
            # get the second line
            next_line=$(sed "${next_line_number}q;d" "$i" | sed -e 's/^[ \t]*//')
            # concatenate the first line and the second line
            line="$line $next_line"
            # print the resulting line to the log file
            echo "$line" >> log.txt
        fi
    done
done
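A single awk pass can also produce the combined file:line:statement form, including statements that span more than two lines. This is a sketch under assumptions rather than the poster's method: join_logger_calls is a hypothetical name, and a statement is taken to start at a line containing Logger.v/i/e/w/d( and to end at the first line that ends with ");".

```shell
# join_logger_calls (hypothetical helper): print file:line:statement for
# each Logger call, joining continuation lines until one ends with ");".
join_logger_calls() {
    awk '
        { sub(/\r$/, "") }                 # drop a trailing carriage return
        FNR == 1 { buf = "" }              # new file: forget unfinished text
        buf == "" && /Logger\.[viewd]\(/ { # first line of a statement
            start = FNR
            buf = $0
            sub(/^[ \t]+/, "", buf)
            if (buf ~ /\);[ \t]*$/) { print FILENAME ":" start ":" buf; buf = "" }
            next
        }
        buf != "" {                        # continuation line
            line = $0
            sub(/^[ \t]+/, "", line)
            buf = buf " " line
            if (buf ~ /\);[ \t]*$/) { print FILENAME ":" start ":" buf; buf = "" }
        }
    ' "$@"
}
```

It accepts one or more file names, for example join_logger_calls Main.java, and reports the line number where each statement starts.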

How to read file from another file

This script writes the list of unit-*-slides.txt files in a directory to filelist.txt, then goes through that list, reads each file, and writes the count of st^ lines to a file. But it does not process the files in numeric order (1, 2, 3, 4, ...); it processes them as 10, 1, 2, 3, 4, ...
How can I read them in order?
#!/bin/sh
#
outputdir=filelist
mk=$(mkdir $outputdir)
$mk
dest=$outputdir
cfile=filelist.txt
ofile="combine-slide.txt"
output=file-list.txt
path=/home/user/Desktop/script
ls $path/unit-*-slides.txt | sort -n -t '-' -k 2 > $dest/$cfile
echo "Generating files list..."
echo "Done"
#Combining
while IFS= read file
do
if [ -f "$file" ]; then
tabs=$(cat unit-*-slides.txt | grep "st^" | split -l 200)
fi
done < "$dest/$cfile"
echo "Combining Done........!"
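Try a numeric sort on the unit number:
tabs=$(cat $( ls unit-*-slides.txt | sort -t '-' -k 2,2n ) | grep "st^" | split -l 200)
Here -t '-' sets the field separator and -k 2,2n sorts numerically on the second field, the unit number. A plain sort -n does not help, because every name starts with the non-numeric prefix unit- and so compares as 0.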
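As a small illustration of numeric sorting on a key field (a sketch; sort_units is a hypothetical helper name and the sample names are made up to match the unit-*-slides.txt pattern):

```shell
# sort_units (hypothetical helper): sort names of the form unit-N-slides.txt
# read from standard input by the number N.
sort_units() {
    # -t '-' splits each line on dashes; -k 2,2n compares field 2 as a number
    sort -t '-' -k 2,2n
}

printf 'unit-10-slides.txt\nunit-1-slides.txt\nunit-2-slides.txt\n' | sort_units
```

This prints the names in the order 1, 2, 10. It assumes the names contain no other dash before the number, so it fits names in the current directory better than full paths.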

How to read path from input file

I have a text file that contains the paths to XML files. I want to read each path from the text file and print the number of tabs present in each XML file. How can I do this?
Here is what I have done.
Text file with the paths:
/home/user/Desktop/softwares/firefox/searchplugins/bing.xml
/home/user/Desktop/softwares/firefox/searchplugins/eBay.xml
/home/user/Desktop/softwares/firefox/searchplugins/answers.xml
/home/user/Desktop/softwares/firefox/searchplugins/wikipedia.xml
/home/user/Desktop/softwares/firefox/blocklist.xml
Code to count tabs in each file:
#!/bin/sh
#
FILEPATH=/home/user/Desktop/softwares/firefox/*.xml
for file in $FILEPATH; do
tabs=$(tr -cd '\t' < $file | wc -c);
echo "$tabs tabs in file $file" >> /home/user/Desktop/output.txt
done
echo "Done!"
Where /home/user/Desktop/files.txt contains the list of xml files:
#!/bin/bash
while IFS= read file
do
if [ -f "$file" ]; then
tabs=$(tr -cd '\t' < "$file" | wc -c);
echo "$tabs tabs in file $file" >> "/home/user/Desktop/output.txt"
fi
done < "/home/user/Desktop/files.txt"
echo "Done!"
sudo_O has provided an excellent answer. However, there is a chance that, mostly due to a text editor's preferences, your tabs were converted to 8 consecutive spaces. If you would prefer to count those as tabs too, replace the "tabs" definition with:
tabs=$(sed -e 's/ \{8\}/\t/g' "$file" | tr -cd '\t' | wc -c)
Full code:
#!/bin/bash
# File names might contain spaces, so avoid the unquoted glob
# FILEPATH=/home/user/Desktop/softwares/firefox/*.xml
# and quote the directory instead.
FIREFOX_DIR="/home/user/Desktop/softwares/firefox"
for file in "$FIREFOX_DIR"/*.xml
do
    if [[ -f "$file" ]]
    then
        # Turn each run of 8 spaces into one tab, then count the tabs.
        tabs=$(sed -e 's/ \{8\}/\t/g' "$file" | tr -cd '\t' | wc -c)
        echo "$tabs tabs in file $file" >> /home/user/Desktop/output.txt
    fi
done
echo "Done!"
but this is applicable only if you prefer to count 8 consecutive spaces as tabs.

Resources