How to read a path from an input file - Linux

I have a text file that contains the paths to several XML files. I want to read each path from the text file and print the number of tab characters in the corresponding XML file. How can I do this?
Here is what I have so far.
Text file with the paths:
/home/user/Desktop/softwares/firefox/searchplugins/bing.xml
/home/user/Desktop/softwares/firefox/searchplugins/eBay.xml
/home/user/Desktop/softwares/firefox/searchplugins/answers.xml
/home/user/Desktop/softwares/firefox/searchplugins/wikipedia.xml
/home/user/Desktop/softwares/firefox/blocklist.xml
Code to count the tabs in each file:
#!/bin/sh
#
FILEPATH=/home/user/Desktop/softwares/firefox/*.xml
for file in $FILEPATH; do
tabs=$(tr -cd '\t' < $file | wc -c);
echo "$tabs tabs in file $file" >> /home/user/Desktop/output.txt
done
echo "Done!"

Assuming /home/user/Desktop/files.txt contains the list of XML files:
#!/bin/bash
# read one path per line; -r keeps backslashes literal
while IFS= read -r file
do
    if [ -f "$file" ]; then
        # delete everything except tabs, then count the remaining bytes
        tabs=$(tr -cd '\t' < "$file" | wc -c)
        echo "$tabs tabs in file $file" >> "/home/user/Desktop/output.txt"
    fi
done < "/home/user/Desktop/files.txt"
echo "Done!"
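As a minimal sketch of the approach above, the counting step can be wrapped in a small function and tried on scratch data. The name count_tabs and the temporary files are made up for this illustration and are not part of the original answer.

```shell
#!/bin/sh
# count_tabs FILE: print the number of tab characters in FILE.
count_tabs() {
    # tr deletes every byte that is not a tab; wc -c counts what is left.
    # The arithmetic expansion strips the padding some wc implementations add.
    echo $(( $(tr -cd '\t' < "$1" | wc -c) ))
}

# Hypothetical demo data: one file with two tabs and a list naming it
# together with a path that does not exist.
demo_dir=$(mktemp -d)
printf 'a\tb\tc\n' > "$demo_dir/sample.xml"
printf '%s\n' "$demo_dir/sample.xml" "$demo_dir/missing.xml" > "$demo_dir/files.txt"

while IFS= read -r file; do
    if [ -f "$file" ]; then
        echo "$(count_tabs "$file") tabs in file $file"
    else
        echo "skipping $file (not a regular file)" >&2
    fi
done < "$demo_dir/files.txt"

rm -rf "$demo_dir"
```

Reporting skipped paths on standard error is a choice made for this sketch; the answer above silently ignores them.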

sudo_O has provided an excellent answer. However, it is possible that your tabs were converted to 8 consecutive spaces, usually because of a text editor's settings. If you would like to count those as tabs too, replace the definition of tabs with:
tabs=$(sed -e 's/ \{8\}/\t/g' "$file" | tr -cd '\t' | wc -c)
Full code:
#!/bin/bash
# File names might contain spaces, so instead of word-splitting an unquoted
# FILEPATH=/home/user/Desktop/softwares/firefox/*.xml
# expand the glob directly in the loop.
FIREFOX_DIR="/home/user/Desktop/softwares/firefox"
for file in "$FIREFOX_DIR"/*.xml
do
    if [[ -f "$file" ]]
    then
        # \t in the sed replacement is a GNU sed extension
        tabs=$(sed -e 's/ \{8\}/\t/g' "$file" | tr -cd '\t' | wc -c)
        echo "$tabs tabs in file $file" >> /home/user/Desktop/output.txt
    fi
done
echo "Done!"
This applies only if you want to count 8 consecutive spaces as tabs.
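The space-to-tab substitution can be checked in isolation with a small function. This is a sketch only: the name count_tabs_and_space_runs is hypothetical, and it inserts a literal tab through printf so that it does not depend on sed understanding \t in the replacement.

```shell
#!/bin/sh
# count_tabs_and_space_runs FILE: count tab characters in FILE, treating each
# run of eight consecutive spaces as one tab (an assumed tab width).
count_tabs_and_space_runs() {
    tab=$(printf '\t')   # literal tab character
    sed "s/ \{8\}/$tab/g" "$1" | tr -cd '\t' | wc -c | tr -d ' '
}

# Hypothetical sample: one real tab, one run of 8 spaces, one run of 4 spaces.
sample=$(mktemp)
printf '\tx\n        y\n    z\n' > "$sample"
echo "$(count_tabs_and_space_runs "$sample") tabs (including 8-space runs)"
rm -f "$sample"
```

A run of 16 spaces counts as two tabs here, and a run of 4 is ignored, which matches the substitution used in the answer.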

Related

Search in CSV and count the entries

I have a CSV file, uniq.csv, that lists my inventory items, sorted and deduplicated:
|Nivcomp Nivelliergerät
|Bosch Rotationslaser GRL 300 HV
|Renault T440
|Renault Master
|Spritzer Silo
...
The second CSV file, cominv.csv, contains the complete inventory. I would like to find and count the entries of uniq.csv in cominv.csv, so that I get a new CSV file like this:
|Renault T440|20
|Spritzer Silo |10
...
I tried using while read line, cat and grep, but it does not really work:
#!/usr/bin/env bash
file="cominv.csv"
newfile="uniq.csv"
if [[ -f $file ]]; then
cat $file | uniq > $newfile
fi
if [[ -f $newfile ]]; then
while read line
do
cat $file | grep $line | wc -l
done < $newfile
fi
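One possible way to produce the "entry|count" lines described above is sketched here. It assumes each inventory item sits on its own line and that a plain substring match is good enough; the function name count_entries is made up for the example. Note also that uniq only collapses adjacent duplicates, so the unique list is usually built with sort -u rather than uniq alone.

```shell
#!/bin/sh
# count_entries UNIQ_FILE INVENTORY_FILE
# For each non-empty line of UNIQ_FILE, print "entry|count", where count is
# the number of INVENTORY_FILE lines that contain the entry as literal text.
count_entries() {
    while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        # -F: fixed string instead of a regex, -c: print the number of
        # matching lines, --: the entry is not an option even if it starts with "-"
        count=$(grep -cF -- "$entry" "$2")
        printf '%s|%s\n' "$entry" "$count"
    done < "$1"
}

# Hypothetical sample data standing in for cominv.csv and uniq.csv.
work=$(mktemp -d)
printf '%s\n' '|Renault T440' '|Spritzer Silo' '|Renault T440' > "$work/cominv.csv"
sort -u "$work/cominv.csv" > "$work/uniq.csv"
count_entries "$work/uniq.csv" "$work/cominv.csv"
rm -rf "$work"
```

Quoting "$entry" matters: item names such as "Renault T440" contain spaces, and an unquoted $line would be split into several grep arguments.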
Thank you for your help,
Silvio

How to rename a file by retaining first 6 characters of my file name and remove rest of the characters?

I have file names like: Rv0012_gyrB.txt, Rv0001_Rv.txt
How to rename a file by retaining first 6 characters of my file name and remove rest of the characters?
My desired output should be:
Rv0012.txt and Rv0001.txt
Please let me know how to do this for multiple files using a script in Linux.

for file in *_*; do
    filename=${file%%_*}    # everything before the first underscore
    fileext=${file##*.}     # the extension, or the whole name if there is no dot
    if [ "$fileext" = "$file" ]; then
        mv -- "$file" "$filename"
    else
        mv -- "$file" "$filename.$fileext"
    fi
done
This should do it, assuming you want to split at the first occurrence of an underscore.

If you want to keep the first 6 characters, use this:
for file in *.txt
do
    extension="${file##*.}"
    filename="${file%.*}"
    filename=${filename:0:6}
    echo "$filename.$extension"
    mv -- "$file" "$filename.$extension"
done
If you want to keep all the characters before "_", this will do the job:
for file in *.txt
do
    extension="${file##*.}"
    filename="${file%.*}"
    filename=$(printf '%s\n' "$filename" | cut -d "_" -f1)
    echo "$filename.$extension"
    mv -- "$file" "$filename.$extension"
done
In case you have some files without extensions, try this:
for file in *
do
    extension="${file##*.}"
    filename="${file%.*}"
    filename=$(printf '%s\n' "$filename" | cut -d "_" -f1)
    # without a dot, "${file##*.}" returns the whole file name
    if [ "$file" = "$extension" ]
    then
        mv -- "$file" "$filename"
    else
        mv -- "$file" "$filename.$extension"
    fi
done
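Before running any of the loops above on real files, the name computation can be previewed without renaming anything. The sketch below uses a hypothetical helper, short_name, and assumes the only dot in a name comes after the underscore, as in the examples from the question.

```shell
#!/bin/sh
# short_name NAME: print NAME cut at its first underscore, keeping the extension.
short_name() {
    case $1 in
        *_*) ;;                              # has an underscore: shorten below
        *)   printf '%s\n' "$1"; return ;;   # nothing to cut, print unchanged
    esac
    base=${1%%_*}                            # text before the first underscore
    case $1 in
        *.*) printf '%s.%s\n' "$base" "${1##*.}" ;;
        *)   printf '%s\n' "$base" ;;
    esac
}

# Dry run on sample names; no file is touched.
for name in Rv0012_gyrB.txt Rv0001_Rv.txt Rv0042_abc; do
    echo "$name -> $(short_name "$name")"
done
```

Once the printed pairs look right, the echo line can be replaced by mv -- "$name" "$(short_name "$name")" inside a loop over the real files.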

grep lines containing specific string (a line can be written on max 3 lines)

I need to collect all the log statements in my project.
I'm using this command to do that:
grep -rnw $1 -e "Logger.[view]*;$" >> log.txt
This returns every line in the project directory "$1" that contains Logger.[one of these characters]. However, some log statements are split over 2 or 3 lines (IDE formatting), and in that case I get only the first line.
What can I do to get the complete text of such a log statement, knowing that it always ends with ");"?
Example of such a statement:
Logger.v(xxxxxxxxxxxxx
xxxxxxxxxxxxxxxx);
Here is my script:
#!/bin/bash
echo "Hello Logger!"
# get project path
echo "project directory is $1"
# get all project logs and store them into temporary file tmp.txt for processing
grep -rnw "$1" -e "Logger.[view]" >> tmp.txt
echo "tmp.txt created successfully"
# remove package name from previous result and store result into log.txt
sed -r 's/.{52}//' tmp.txt >> log.txt
echo "log.txt created successfully"
The grep command returns file_path/file_name:line_number:line.
I found the following command, which returns the whole statement even if it spans 2 or 3 lines, but without the file path, file name and line number:
sed -n '/Logger.[viewd]/{:start /;/!{N;b start};/Logger.[viewd]/p}' Main.java
Is there a way to combine these two results?
Example of the desired output:
/home/xxx/xxx/xxx/Main.java:97:Logger.i(xxxxxxxxxxxxx);
/home/xxx/xxx/xxx/Main.java:106:Logger.d(yyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyy);

I think that's a line-break problem. Try replacing grep -rnw $1 -e "Logger.[view]" >> tmp.txt with the following lines:
for i in "$1"/*
do
    [ -f "$i" ] || continue
    tr '\n' ' ' < "$i" | grep -nw -e "Logger.[view]" >> tmp.txt
done
Here, tr '\n' ' ' replaces each line break with a single space. Note that each file then reaches grep as one long line, so a match prints the whole file with line number 1.

I found a solution to my problem. Here is my code:
# get all project logs and store them in log.txt for processing
# (assumes the paths returned by find contain no whitespace)
for i in $(find . -name "*.java")
do
    echo >> log.txt
    echo "**************** file $i ********************************" >> log.txt
    echo >> log.txt
    grep -nw 'Logger.[viewd]' "$i" | while read -r line ; do
        # remove the carriage return from the first line to avoid bad results
        line="$(echo "$line" | sed $'s/\r//')"
        # if the first line ends with ");" print it to the log file
        if [[ ${line: -2} == ");" ]]; then
            echo "$line" >> log.txt
        # else get the next line as well
        else
            # get the second line's number
            line_number="$(echo "$line" | cut -d : -f1)"
            next_line_number=$((line_number+1))
            # get the second line, without its leading whitespace
            next_line=$(sed "${next_line_number}q;d" "$i" | sed -e 's/^[ \t]*//')
            # concatenate the first and second lines
            line="$line $next_line"
            # print the resulting line to the log file
            echo "$line" >> log.txt
        fi
    done
done
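A possible alternative, sketched here with awk instead of the grep and sed combination used above, keeps the path:line_number: prefix and joins any number of continuation lines until the closing ");". The function name find_logger_calls is hypothetical, and the sketch assumes that a statement never contains ");" at the end of an intermediate line.

```shell
#!/bin/sh
# find_logger_calls DIR: print "path:line:statement" for each Logger.<level>(
# call in the .java files under DIR, joining continuation lines with spaces.
find_logger_calls() {
    find "$1" -type f -name '*.java' -exec awk '
        FNR == 1 { buf = "" }                 # reset pending text for each file
        {
            sub(/\r$/, "")                    # drop a trailing carriage return
            if (buf == "" && $0 ~ /Logger\.[viewd]\(/) {
                sub(/^[ \t]+/, "")            # strip indentation
                buf = FILENAME ":" FNR ":" $0 # start a new statement
            } else if (buf != "") {
                sub(/^[ \t]+/, "")
                buf = buf " " $0              # append a continuation line
            } else {
                next                          # unrelated line
            }
            # a line ending in ");" closes the statement
            if ($0 ~ /\);[ \t]*$/) { print buf; buf = "" }
        }
    ' {} +
}

# Example call, appending to log.txt as in the script above:
# find_logger_calls "$1" >> log.txt
```

The reported line number is the one where the statement starts, which matches the desired output shown in the question.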

Can't rename a file name

I have a small bash script that downloads and renames files. The problem is with some non-standard characters that bash can't handle.
For example:
�������� ���� ���'�-2.jpg
My script:
while IFS= read -r line; do
    if [ -n "$line" ]; then
        NEW_FILENAME=$(echo "$line" | uniconv -encode Russian-Translit | uniconv -encode Latin | tr -d "[]!#\$%^&*()?'")
        mv "$line" "$NEW_FILENAME"
    fi
done <<< "$FILES_TO_CONVERT"

Why don't you delete those characters with something like:
sed 's/[^a-zA-Z0-9_\.-]//g'
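To illustrate the suggestion, the sed expression can be wrapped in a function and tried on a few names before it is used with mv. This is only a sketch: sanitize_name is a made-up name, and LC_ALL=C is added on the assumption that the input may contain bytes that are invalid in the current locale, so that sed treats every byte individually.

```shell
#!/bin/sh
# sanitize_name NAME: print NAME with everything removed except ASCII
# letters, digits, underscore, dot and hyphen.
sanitize_name() {
    printf '%s\n' "$1" | LC_ALL=C sed 's/[^a-zA-Z0-9_.-]//g'
}

# Hypothetical sample names; nothing is renamed here.
for name in 'my file!(1).jpg' 'a b#c-2.jpg'; do
    echo "$name -> $(sanitize_name "$name")"
done
```

Because characters are deleted rather than transliterated, two different names can collapse to the same result (or to an empty base name), so it is worth checking that the target does not already exist before calling mv.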

Bash substring from position not printing

I am using the ${string:start:length} format to extract the file name from wget's .listing file, line by line.
The format of the file is one I think we are all familiar with:
04-30-13 01:41AM 7033614 some_archive.zip
04-29-13 08:13PM <DIR> DIRECTORY NAME 1
04-29-13 05:41PM <DIR> DIRECTORY NAME 2
All file names start at position 40, so setting start to 39, with no length, should (and does) return the file name for each line:
#!/bin/bash
cat .listing | while read line; do
    file="${line:39}"
    echo $file
done
Correctly returns:
some_archive.zip
DIRECTORY NAME 1
DIRECTORY NAME 2
However, if I get any more creative, it breaks:
#!/bin/bash
cat .listing | while read line; do
    file="${line:39}"
    dir=$(echo $line | egrep -o '<DIR>' | head -n1)
    if [ $dir ]; then
        echo "the file $file is a $dir"
    fi
done
Returns:
$ ./test.sh
is a <DIR>ECTORY NAME 1
is a <DIR>ECTORY NAME 2
What gives? I lose "the file " and the rest of the output looks like it is printed over "the file DIRECTORY NAME 1", starting from position 0.
It's weird. What causes it?

The answer, as I am learning more and more as I progress with Linux, is non-printing control characters.
Piping through egrep to keep only printable characters solved the problem:
#!/bin/bash
cat .listing | while read line; do
file=$(echo ${line:39} | egrep -o '[[:print:]]+' | head -n1)
dir=$(echo $line | egrep -o '<DIR>' | head -n1)
if [ $dir ]; then
echo "the file $file is a $dir"
fi
done
Correctly returns:
$ ./test.sh
the file DIRECTORY NAME 1 is a <DIR>
the file DIRECTORY NAME 2 is a <DIR>
I wish there were a better way to visualize these control characters, but what the above does is take the string segment, pull out the first run of printable characters, and assign it to the variable.
I assume there is a control character at the end of the line that returns the cursor to the beginning of the line, causing the rest of the echo output to be printed there and overwrite the previous characters.
Odd.
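On the point about visualizing control characters, one option is sed's l command, which writes non-printing characters as escapes and marks the end of each line with "$". The sketch below wraps it in a hypothetical helper, show_control_chars, and feeds it a made-up line with a DOS-style ending.

```shell
#!/bin/sh
# show_control_chars: print stdin with non-printing characters made visible.
# sed's "l" command writes escapes such as \r and ends each line with "$".
show_control_chars() {
    sed -n l
}

# Hypothetical sample line ending in carriage return + newline,
# as a .listing file from a Windows FTP server might.
printf 'DIRECTORY NAME 1\r\n' | show_control_chars
```

A trailing \r before the "$" confirms the carriage return suspected above. Running cat -v .listing (which shows it as ^M) or od -c .listing gives the same information in other notations.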

You can remove the \r control characters from the whole file by using the tr command on the first line of your script:
#!/bin/bash
cat .listing | tr -d '\015' | while read line; do
file="${line:39}"
dir=$(echo $line | egrep -o '<DIR>' | head -n1)
if [ $dir ]; then
echo "the file $file is a $dir"
fi
done
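If only the end of each line needs cleaning, the carriage return can also be removed inside the loop with parameter expansion instead of piping the whole file through tr. The following is a sketch under that assumption; strip_cr and the sample input are made up for the illustration.

```shell
#!/bin/sh
cr=$(printf '\r')   # a literal carriage return

# strip_cr LINE: print LINE without one trailing carriage return, if present.
strip_cr() {
    printf '%s\n' "${1%"$cr"}"
}

# Hypothetical input standing in for two .listing lines with DOS line endings.
printf 'DIRECTORY NAME 1\r\nsome_archive.zip\r\n' |
while IFS= read -r line; do
    echo "the file $(strip_cr "$line") is listed"
done
```

Unlike tr -d '\015', this leaves carriage returns in the middle of a line untouched, which may or may not be what is wanted.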
