I have a small bash script that downloads and renames files. The problem is that some file names contain gibberish, non-standard characters that bash can't handle.
For example:
�������� ���� ���'�-2.jpg
My script:
while IFS= read -r line; do
    if [ -n "$line" ]; then
        # transliterate, then delete punctuation that is awkward in file names
        NEW_FILENAME=$(echo "$line" | uniconv -encode Russian-Translit | uniconv -encode Latin | tr -d '[]!#$%^&*()?'"'")
        mv -- "$line" "$NEW_FILENAME"
    fi
done <<< "$FILES_TO_CONVERT"
Why don't you delete those characters with something like:
sed 's/[^a-zA-Z0-9_\.-]//g'
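A minimal sketch of how the sed suggestion above could be wrapped for reuse. The function name sanitize_name is hypothetical, and forcing LC_ALL=C is an assumption made here so that the ranges only match plain ASCII letters and digits:

```bash
#!/bin/bash
# Hypothetical helper: keep only ASCII letters, digits, underscore, dot and hyphen.
sanitize_name() {
    # LC_ALL=C makes a-z, A-Z and 0-9 match ASCII only, so every other byte is deleted
    printf '%s\n' "$1" | LC_ALL=C sed 's/[^a-zA-Z0-9_.-]//g'
}

sanitize_name 'my photo [2]!.jpg'   # prints: myphoto2.jpg
```

Note that a name made up entirely of non-ASCII characters would shrink to just its extension, so two different files can end up with the same new name. Checking for an existing target before calling mv is probably worthwhile.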
I have a directory with ~250 .txt files in it. Each of these files has a title like this:
Abraham Lincoln [December 01, 1862].txt
George Washington [October 25, 1790].txt
etc...
However, these are terrible file names for reading into Python, and I want to iterate over all of them to change them to a more suitable format.
I've tried similar things for changing single variables that are shared across many files. But I can't wrap my head around how I should iterate over these files and change the formatting of their names while still keeping the same information.
The ideal output would be something like:
1862_12_01_abraham_lincoln.txt
1790_10_25_george_washington.txt
etc...
Please try the straightforward (tedious) bash script:
#!/bin/bash
declare -A map=(["January"]="01" ["February"]="02" ["March"]="03" ["April"]="04" ["May"]="05" ["June"]="06" ["July"]="07" ["August"]="08" ["September"]="09" ["October"]="10" ["November"]="11" ["December"]="12")
pat='^([^[]+) \[([A-Za-z]+) ([0-9]+), ([0-9]+)]\.txt$'
for i in *.txt; do
    if [[ $i =~ $pat ]]; then
        newname="$(printf "%s_%s_%s_%s.txt" "${BASH_REMATCH[4]}" "${map["${BASH_REMATCH[2]}"]}" "${BASH_REMATCH[3]}" "$(tr 'A-Z ' 'a-z_' <<< "${BASH_REMATCH[1]}")")"
        mv -- "$i" "$newname"
    fi
done
for file in *.txt; do
    # extract parts of the filename to be differently formatted with a regex match
    [[ $file =~ (.*)\[(.*)\] ]] || { echo "invalid file $file"; exit 1; }
    # format extracted strings and generate the new filename (date -d requires GNU date)
    formatted_date=$(date -d "${BASH_REMATCH[2]}" +"%Y_%m_%d")
    name="${BASH_REMATCH[1]// /_}" # replace spaces in the name with underscores
    f="${formatted_date}_${name,,}" # convert name to lower-case and append it to date string
    new_filename="${f::-1}.txt" # remove trailing underscore and add `.txt` extension
    # do what you need here
    echo "$new_filename"
    # mv -- "$file" "$new_filename"
done
I like to pull the filename apart, then put it back together.
Also, GNU date can parse the date string, which is simpler than using sed or a big case statement to convert "October" to "10".
#! /usr/bin/bash
if [ -z "$1" ] || [ "$1" == "--help" ]; then
    echo "Give a filename like \"Abraham Lincoln [December 01, 1862].txt\" as an argument"
    exit 2
fi
filename="$1"
# remove the brackets
filename=$(echo "$filename" | sed -e 's/\[//g;s/\]//g')
# cut out the name (assumes the name is exactly two words)
namepart=$(echo "$filename" | awk '{ print $1" "$2 }')
# cut out the date
datepart=$(echo "$filename" | awk '{ print $3" "$4" "$5 }' | sed -e 's/\.txt//')
# format up the date (relies on GNU date)
datepart=$(date --date="$datepart" +"%Y_%m_%d")
# put it back together with underscores, in lower case
final=$(echo "$namepart $datepart.txt" | tr 'A-Z' 'a-z' | sed -e 's/ /_/g')
# print the mv command rather than running it
echo mv "\"$1\"" "\"$final\""
EDIT: converted to BASH, from Bourne shell.
Suppose I have a text file, something.txt, containing:
4062,2016-12-31
I want to send "4062" in one command and "2016-12-31" as a string in another command in a script. Can it be done with BASH scripting?
IFS="," read -r a b < file
echo "$a"
echo "$b"
Output:
4062
2016-12-31
This would work in most cases, unless you are handling a very large file.
for line in $(cat yourfile.txt)
do
    field1=$(echo "$line" | cut -f1 -d,)
    field2=$(echo "$line" | cut -f2 -d,)
    your_command_1 "$field1"
    your_command_2 "$field2"
done
a=$(cat file.txt); IFS=',' read -r -a list <<< "$a"; echo "${list[0]}"; echo "${list[1]}"
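For a file with many lines, the IFS/read idea from the first answer can be put in a loop that reads one line at a time instead of loading the whole file. This is only a sketch: the function name process_csv is hypothetical, and the printf stands in for whatever two commands actually need the values.

```bash
#!/bin/bash
# Hypothetical helper: split each "id,date" line of a file into two variables.
process_csv() {
    local id date
    # the "|| [ -n "$id" ]" part also handles a last line without a trailing newline
    while IFS=, read -r id date || [ -n "$id" ]; do
        # placeholder: run the real commands with "$id" and "$date" here
        printf 'id=%s date=%s\n' "$id" "$date"
    done < "$1"
}
```

Calling process_csv something.txt on the sample file would print one id=... date=... line per input line.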
I have the following script, and for some reason it is not working:
find . -name '*---*' | while read fname3
do
    new_fname3=`echo $fname3 | tr "---" "-"`
    if [ -e $new_fname3 ]
    then
        echo "File $new_fname3 already exists. Not replacing $fname3"
    else
        echo "Creating new file $new_fname3 to replace $fname3"
        mv "$fname3" $new_fname3
    fi
done
However, if I use:
find . -name '*---*' | while read fname3
do
    new_fname3=`echo $fname3 | tr "-" "_"`
    if [ -e $new_fname3 ]
    then
        echo "File $new_fname3 already exists. Not replacing $fname3"
    else
        echo "Creating new file $new_fname3 to replace $fname3"
        mv "$fname3" $new_fname3
    fi
done
The script works, but I end up with three underscores ("___"). How can I replace the three dashes "---" with a single dash?
Thanks,
Have a look at man tr. tr will just replace single characters.
Use something like perl -wpe "s/---/-/" instead.
Also have a look at man 1p rename. It is doing pretty much exactly what you want:
rename 's/---/-/' *---*
I believe you need to replace the tr call with a sed substitution:
tr '---' '-' should be changed to sed -e 's/---/-/g'
As an example of the difference:
$ echo "a---b" | tr '---' '-'
tr: unrecognised option '---'
try `tr --help' for more information
$ echo "a---b" | sed -e 's/---/-/g'
a-b
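As an alternative to calling sed or perl, bash's own pattern substitution can do the same replacement without starting another process. A minimal sketch, with collapse_dashes as a hypothetical function name:

```bash
#!/bin/bash
# Hypothetical helper: replace every run of three dashes with a single dash.
collapse_dashes() {
    local name=$1
    # ${var//pattern/replacement} replaces all matches, not just the first
    printf '%s\n' "${name//---/-}"
}

collapse_dashes 'a---b---c.txt'   # prints: a-b-c.txt
```

When the input comes from find, the substitution also applies to directory names in the path, so it may be safer to apply it to the base name only.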
I am using the ${string:start:length} format to extract the file name from wget's .listing file, line by line.
The format for the file is something I think we are all familiar with:
04-30-13 01:41AM 7033614 some_archive.zip
04-29-13 08:13PM <DIR> DIRECTORY NAME 1
04-29-13 05:41PM <DIR> DIRECTORY NAME 2
All file names start at pos:40, so setting :start to 39, with no :length should (and does) return the file name for each line:
#!/bin/bash
cat .listing | while read line; do
    file="${line:39}"
    echo "$file"
done
Correctly returns:
some_archive.zip
DIRECTORY NAME 1
DIRECTORY NAME 2
However, if I get any more creative, it breaks:
#!/bin/bash
cat .listing | while read line; do
    file="${line:39}"
    dir=$(echo $line | egrep -o '<DIR>' | head -n1)
    if [ $dir ]; then
        echo "the file $file is a $dir"
    fi
done
Returns:
$ ./test.sh
is a <DIR>ECTORY NAME 1
is a <DIR>ECTORY NAME 2
What gives? I lose "the file " and the rest of the test looks like it prints on top of "the file DIRECTORY NAME 1" from pos:0.
It's weird, what's it on account of?
The answer, as I am learning more and more as I progress with Linux, is non-printing control characters.
Adding a pipe to egrep for only printing characters solved the problem:
#!/bin/bash
cat .listing | while read line; do
    file=$(echo ${line:39} | egrep -o '[[:print:]]+' | head -n1)
    dir=$(echo $line | egrep -o '<DIR>' | head -n1)
    if [ $dir ]; then
        echo "the file $file is a $dir"
    fi
done
Correctly returns:
$ ./test.sh
the file DIRECTORY NAME 1 is a <DIR>
the file DIRECTORY NAME 2 is a <DIR>
I wish there were a better way to visualize these control characters, but what the above does is basically take the string segment, pull out the first run of printable characters, and assign it to the variable.
I assume there is a control character at the end of the line that returns the cursor to the beginning of the line, causing the rest of the echo output to be printed there and overwrite the previous characters.
Odd.
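One way to make such characters visible is bash's printf with the %q format, which prints a string in shell-escaped form so that a trailing carriage return shows up as \r. The sketch below assumes the stray character is a single carriage return at the end of the line; strip_cr is a hypothetical helper name. Running od -c on the file is another way to inspect the raw bytes.

```bash
#!/bin/bash
# Hypothetical helper: drop one trailing carriage return, if present.
strip_cr() {
    local s=$1
    s=${s%$'\r'}          # remove a single trailing \r; other text is untouched
    printf '%s\n' "$s"
}

line=$'DIRECTORY NAME 1\r'
printf '%q\n' "$line"                 # escaped form makes the hidden \r visible
printf '%q\n' "$(strip_cr "$line")"   # same text with the \r removed
```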
You can remove the \r control characters from the whole file by using the tr command on the first line of your script:
#!/bin/bash
cat .listing | tr -d '\015' | while read line; do
    file="${line:39}"
    dir=$(echo $line | egrep -o '<DIR>' | head -n1)
    if [ $dir ]; then
        echo "the file $file is a $dir"
    fi
done
I have a txt file that contains the paths to XML files. Now I want to read each path from the text file and print the number of tabs in each XML file. How can I do this?
Here is what I have done.
The txt file with the paths:
/home/user/Desktop/softwares/firefox/searchplugins/bing.xml
/home/user/Desktop/softwares/firefox/searchplugins/eBay.xml
/home/user/Desktop/softwares/firefox/searchplugins/answers.xml
/home/user/Desktop/softwares/firefox/searchplugins/wikipedia.xml
/home/user/Desktop/softwares/firefox/blocklist.xml
Code to count tabs in each file:
#!/bin/sh
#
FILEPATH=/home/user/Desktop/softwares/firefox/*.xml
for file in $FILEPATH; do
    tabs=$(tr -cd '\t' < "$file" | wc -c)
    echo "$tabs tabs in file $file" >> /home/user/Desktop/output.txt
done
echo "Done!"
Where /home/user/Desktop/files.txt contains the list of xml files:
#!/bin/bash
while IFS= read -r file
do
    if [ -f "$file" ]; then
        tabs=$(tr -cd '\t' < "$file" | wc -c)
        echo "$tabs tabs in file $file" >> "/home/user/Desktop/output.txt"
    fi
done < "/home/user/Desktop/files.txt"
echo "Done!"
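The counting step can also be pulled into a small function, which makes it easy to try on a single file before running the whole loop. This is a sketch only, and count_tabs is a hypothetical name:

```bash
#!/bin/bash
# Hypothetical helper: print the number of tab characters in one file.
count_tabs() {
    local n
    # tr -cd '\t' deletes everything except tabs; wc -c counts what is left
    n=$(tr -cd '\t' < "$1" | wc -c)
    # arithmetic expansion strips the padding some wc implementations add
    printf '%s\n' "$((n))"
}
```

Inside the loop above, the tabs assignment could then become tabs=$(count_tabs "$file").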
sudo_O has provided an excellent answer. However, your tabs may have been converted to 8 consecutive spaces, typically because of a text editor setting. If you would prefer to count those as tabs too, replace the "tabs" definition with:
tabs=$(sed -e 's/ \{8\}/\t/g' "$file" | tr -cd '\t' | wc -c)
Full code:
#!/bin/bash
# original file names might contain spaces
# FILEPATH=/home/user/Desktop/softwares/firefox/*.xml
# a better option would be
FIREFOX_DIR="/home/user/Desktop/softwares/firefox"
for file in "$FIREFOX_DIR"/*.xml
do
    if [[ -f "$file" ]]
    then
        # turn each run of 8 spaces into one tab before counting (\t in the replacement needs GNU sed)
        tabs=$(sed -e 's/ \{8\}/\t/g' "$file" | tr -cd '\t' | wc -c)
        echo "$tabs tabs in file $file" >> /home/user/Desktop/output.txt
    fi
done
echo "Done!"
This applies only if you prefer to count 8 consecutive spaces as tabs.