iconv any encoding to UTF-8 - linux

I am trying to point iconv at a directory so that all files in it are converted to UTF-8, regardless of their current encoding.
I am using this script, but you have to specify which encoding you are converting FROM. How can I make it autodetect the current encoding?
ICONVBIN='/usr/bin/iconv' # path to iconv binary
if [ $# -lt 3 ]
then
    echo "$0 dir from_charset to_charset"
    exit 1
fi
for f in $1/*
do
    if test -f $f
    then
        echo -e "\nConverting $f"
        /bin/mv $f $f.old
        $ICONVBIN -f $2 -t $3 $f.old > $f
    else
        echo -e "\nSkipping $f - not a regular file";
    fi
done
Terminal line:
sudo convert/dir_iconv.sh convert/books CURRENT_ENCODING utf8

Maybe you are looking for enca:
Enca is an Extremely Naive Charset Analyser. It detects character set and encoding of text files and can also convert them to other encodings using either a built-in converter or external libraries and tools like libiconv, librecode, or cstocs.
Currently it supports Belarusian, Bulgarian, Croatian, Czech, Estonian, Hungarian, Latvian, Lithuanian, Polish, Russian, Slovak, Slovene, Ukrainian, Chinese, and some multibyte encodings independently of language.
Note that in general, autodetection of current encoding is a difficult process (the same byte sequence can be correct text in multiple encodings). enca uses heuristics based on the language you tell it to detect (to limit the number of encodings). You can use enconv to convert text files to a single encoding.
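The ambiguity described above is easy to reproduce. The following is a minimal sketch, assuming an iconv that knows ISO-8859-1 and ISO-8859-7; the helper name decode_as is made up for illustration. The same single byte (0xE9) is valid in both encodings but decodes to a different character in each, so the bytes alone cannot tell a detector which one is right.

```shell
# decode_as CHARSET: read raw bytes on stdin and print them converted to UTF-8.
# (decode_as is a hypothetical helper name, not part of any tool.)
decode_as() {
    iconv -f "$1" -t UTF-8
}

# The byte 0xE9 (octal 351) is a valid character in both encodings.
latin1=$(printf '\351' | decode_as ISO-8859-1)   # Latin small letter e with acute
greek=$(printf '\351' | decode_as ISO-8859-7)    # Greek small letter iota

printf 'ISO-8859-1: %s\nISO-8859-7: %s\n' "$latin1" "$greek"
```

This is why enca asks for a language hint: it narrows the set of plausible encodings before the heuristics run.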

You can get what you need using the standard utilities file and awk. Example:
file -bi .xsession-errors
gives me:
text/plain; charset=us-ascii
so
file -bi .xsession-errors | awk -F "=" '{print $2}'
gives me:
us-ascii
I use it in scripts like so:
CHARSET="$(file -bi "$i" | awk -F "=" '{print $2}')"
if [ "$CHARSET" != utf-8 ]; then
    iconv -f "$CHARSET" -t utf8 "$i" -o outfile
fi
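The awk step can also be done with plain parameter expansion. This is a small sketch, not a replacement for the answer above; charset_from_mime is a hypothetical helper name, and it only parses a string of the form that file -bi prints.

```shell
# charset_from_mime "text/plain; charset=us-ascii" prints: us-ascii
# Prints nothing and returns 1 when the string has no charset= part.
# (charset_from_mime is a hypothetical helper name.)
charset_from_mime() {
    case "$1" in
        *charset=*) printf '%s\n' "${1##*charset=}" ;;
        *) return 1 ;;
    esac
}

charset_from_mime 'text/plain; charset=us-ascii'
```

In a loop this could be used as CHARSET=$(charset_from_mime "$(file -bi "$f")") || continue, so that files without a reported charset are skipped instead of being passed to iconv with an empty -f argument.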

Combining all of the above: go to the directory and create dir2utf8.sh:
# converting all files in a dir to utf8
for f in *
do
    if test -f "$f"; then
        echo -e "\nConverting $f"
        CHARSET="$(file -bi "$f" | awk -F "=" '{print $2}')"
        if [ "$CHARSET" != utf-8 ]; then
            # write to a temporary file first so iconv never reads and writes the same file
            iconv -f "$CHARSET" -t utf8 "$f" -o "$f.tmp" && mv "$f.tmp" "$f"
        fi
    else
        echo -e "\nSkipping $f - it's not a regular file";
    fi
done

Here is my solution to convert all files in place using recode and uchardet:
apt-get -y install recode uchardet > /dev/null
find "$1" -type f | while read FFN # 'dir' should be changed...
do
    encoding=$(uchardet "$FFN")
    echo "$FFN: $encoding"
    enc=`echo $encoding | sed 's#^x-mac-#mac#'`
    set +x
    recode $enc..UTF-8 "$FFN"
done
Put it into convert-dir-to-utf8.sh and run:
bash convert-dir-to-utf8.sh /path/to/my/trash/dir
Note that sed is a workaround for mac encodings here.
Many uncommon encodings need workarounds like this.

First answer
find "<YOUR_FOLDER_PATH>" -name '*' -type f -exec grep -Iq . {} \; -print0 |
while IFS= read -r -d $'\0' LINE_FILE; do
    CHARSET=$(uchardet "$LINE_FILE")
    echo "Converting ($CHARSET) $LINE_FILE"
    # NOTE: Convert/reconvert to utf8. By Questor
    iconv -f "$CHARSET" -t utf8 "$LINE_FILE" -o "$LINE_FILE"
    # NOTE: Remove "BOM" if exists as it is unnecessary. By Questor
    # [Refs.: https://stackoverflow.com/a/2223926/3223785 ,
    # https://stackoverflow.com/a/45240995/3223785 ]
    sed -i '1s/^\xEF\xBB\xBF//' "$LINE_FILE"
    # [Refs.: https://justrocketscience.com/post/handle-encodings ,
    # https://stackoverflow.com/a/9612232/3223785 ,
    # https://stackoverflow.com/a/13659891/3223785 ]
done
FURTHER QUESTION: I do not know if my approach is the safest. I say this because I noticed that some files are not converted correctly (characters are lost) or are "truncated". I suspect this has to do with the "iconv" tool or with the charset information obtained from the "uchardet" tool. I was curious about the solution presented by @demofly because it could be safer.
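One possible contributor to the truncated files is that the command above asks iconv to write to the very file it is still reading. That is only a guess about the cause, but it is simple to rule out. The sketch below converts through a temporary file and replaces the original only when iconv reports success; convert_in_place is a hypothetical helper name, and the source charset is still assumed to come from a detector such as uchardet.

```shell
# convert_in_place FILE FROM_CHARSET
# Converts FILE to UTF-8 via a temporary file and overwrites the original
# only if iconv succeeded, so a failed conversion leaves FILE untouched.
# (convert_in_place is a hypothetical helper name.)
convert_in_place() {
    file=$1
    from=$2
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if iconv -f "$from" -t UTF-8 < "$file" > "$tmp"; then
        # cat instead of mv keeps the original file's owner and permissions
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        echo "conversion failed, left untouched: $file" >&2
        return 1
    fi
}
```

Inside the loop above it would be called as convert_in_place "$LINE_FILE" "$CHARSET".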
Another answer
Based on @demofly's answer:
find "<YOUR_FOLDER_PATH>" -name '*' -type f -exec grep -Iq . {} \; -print0 |
while IFS= read -r -d $'\0' LINE_FILE; do
    CHARSET=$(uchardet "$LINE_FILE")
    REENCSED=`echo $CHARSET | sed 's#^x-mac-#mac#'`
    echo "\"$CHARSET\" \"$LINE_FILE\""
    # NOTE: Convert/reconvert to utf8. By Questor
    recode $REENCSED..UTF-8 "$LINE_FILE" 2> STDERR_OP 1> STDOUT_OP
    STDERR_OP=$(cat STDERR_OP)
    STDOUT_OP=$(cat STDOUT_OP)
    if [ -n "$STDERR_OP" ] ; then
        # NOTE: Convert/reconvert to utf8. By Questor
        iconv -f "$CHARSET" -t utf8 "$LINE_FILE" -o "$LINE_FILE" 2> STDERR_OP 1> STDOUT_OP
        STDERR_OP=$(cat STDERR_OP)
        STDOUT_OP=$(cat STDOUT_OP)
    fi
    rm -f STDERR_OP STDOUT_OP
    # NOTE: Remove "BOM" if exists as it is unnecessary. By Questor
    # [Refs.: https://stackoverflow.com/a/2223926/3223785 ,
    # https://stackoverflow.com/a/45240995/3223785 ]
    sed -i '1s/^\xEF\xBB\xBF//' "$LINE_FILE"
    if [ -n "$STDERR_OP" ] ; then
        echo "ERROR: \"$STDERR_OP\""
    fi
    if [ -n "$STDOUT_OP" ] ; then
        echo "RESULT: \"$STDOUT_OP\""
    fi
    # [Refs.: https://justrocketscience.com/post/handle-encodings ,
    # https://stackoverflow.com/a/9612232/3223785 ,
    # https://stackoverflow.com/a/13659891/3223785 ]
done
Third answer
Hybrid solution with recode and vim:
find "<YOUR_FOLDER_PATH>" -name '*' -type f -exec grep -Iq . {} \; -print0 |
while IFS= read -r -d $'\0' LINE_FILE; do
    CHARSET=$(uchardet "$LINE_FILE")
    REENCSED=`echo $CHARSET | sed 's#^x-mac-#mac#'`
    echo "\"$CHARSET\" \"$LINE_FILE\""
    # NOTE: Convert/reconvert to utf8. By Questor
    recode $REENCSED..UTF-8 "$LINE_FILE" 2> STDERR_OP 1> STDOUT_OP
    STDERR_OP=$(cat STDERR_OP)
    rm -f STDERR_OP STDOUT_OP
    if [ -n "$STDERR_OP" ] ; then
        # NOTE: Convert/reconvert to utf8. By Questor
        bash -c "</dev/tty vim -u NONE +\"set binary | set noeol | set nobomb | set encoding=utf-8 | set fileencoding=utf-8 | wq\" \"$LINE_FILE\""
    fi
    # NOTE: Remove "BOM" if exists as it is unnecessary. By Questor
    # [Refs.: https://stackoverflow.com/a/2223926/3223785 ,
    # https://stackoverflow.com/a/45240995/3223785 ]
    sed -i '1s/^\xEF\xBB\xBF//' "$LINE_FILE"
done
This was the solution with the highest number of perfect conversions. Additionally, we did not have any truncated files.
WARNING: Make a backup of your files and use a merge tool to check/compare the changes. Problems probably will appear!
TIP: Run the conversion without the command sed -i '1s/^\xEF\xBB\xBF//' "$LINE_FILE" first and compare the results with the merge tool; execute it only afterwards, since removing the BOM can itself show up as "differences".
NOTE: The search using find returns all non-binary files from the given path ("<YOUR_FOLDER_PATH>") and its subfolders.

Check out the tools available for data conversion on the Linux command line: https://www.debian.org/doc/manuals/debian-reference/ch11.en.html
Also, it is worth getting the full list of encodings that iconv supports. Just run iconv --list and you will see that its encoding names differ from the names returned by the uchardet tool (for example: x-mac-cyrillic in uchardet vs. mac-cyrillic in iconv).
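A name mismatch like this can be caught before any file is touched. The sketch below is only an illustration: iconv_name and iconv_knows are hypothetical helper names, the only mapping handled is the x-mac- prefix mentioned above, and the probe relies on iconv exiting with a non-zero status when it does not recognise the source encoding.

```shell
# iconv_name NAME: map a detector name to the spelling iconv expects.
# Only the x-mac- prefix is handled; other names pass through unchanged.
# (iconv_name is a hypothetical helper name.)
iconv_name() {
    printf '%s\n' "$1" | sed 's/^x-mac-/mac-/'
}

# iconv_knows NAME: succeed if this iconv accepts NAME as a source encoding.
# Converting empty input is enough to make iconv reject an unknown name.
# (iconv_knows is a hypothetical helper name.)
iconv_knows() {
    iconv -f "$1" -t UTF-8 < /dev/null > /dev/null 2>&1
}

iconv_name x-mac-cyrillic
```

A conversion loop could then skip, or log, every file whose detected name fails iconv_knows instead of producing an empty or damaged output file.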

enca command doesn't work for my Simplified-Chinese text file with GB2312 encoding.
Instead, I use the following function to convert the text file for me.
You could of course re-direct the output into a file.
It requires chardet and iconv commands.
detection_cat () {
    DET_OUT=$(chardet "$1");
    ENC=$(echo "$DET_OUT" | sed "s|^.*: \(.*\) (confid.*$|\1|");
    iconv -f "$ENC" "$1"
}

Use iconv and uchardet (thanks, farseerfc).
fish shell:
cat your_file | iconv -f (uchardet your_file ) -t UTF-8
bash shell:
cat your_file | iconv -f $(uchardet your_file ) -t UTF-8
In a bash script:
for fn in "$@"; do
    iconv < "$fn" -f "$(uchardet "$fn")" -t utf8
done
by @flowinglight at the ubuntu group.


Set file modification time from the date string present in the filename

I'm restoring a number of archives with dates within their names, something along the lines of:
I want to set each file's modification date to match the date in their filename by piping the filenames to touch via xargs and using replace-str to set the dates.
touch -m -t will take a datetime in the format [CCYYMMDDhhmm], but I'm having trouble substituting inline:
find . -name "*.xz" | xargs -I {} touch -m -t $(sed -e 's/\.tar\.xz//g; s/user-//g; s/\.//g; s/\///g; s/$/0000/g' {}) {}
Returns touch: invalid date format ‘./user-2018.03.22.tar.xz’, even though this:
find . -name "*.xz" | sed -e 's/\.tar\.xz//g; s/user-//g; s/\.//g; s/\///g; s/$/0000/g'
Returns properly-formatted dates, for example 201812200000. Am I misusing command substitution in my replace string somehow?
EDIT : Yes, a simple script could do this no problem. But the question remains...
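You don't need find, sed, xargs or any third-party tools; just use the shell's built-in regex capabilities to get the timestamp from the file name:
for file in *.tar.xz; do
    [ -f "$file" ] || continue
    if [[ $file =~ ^user-([[:digit:]]+)\.([[:digit:]]+)\.([[:digit:]]+)\.tar\.xz$ ]]; then
        # year, month and day from the capture groups, plus 0000 for hhmm
        dateStr="${BASH_REMATCH[1]}${BASH_REMATCH[2]}${BASH_REMATCH[3]}0000"
        touch -m -t "$dateStr" "$file"
    fi
done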
The problem is that the command substitution will be evaluated once when you call xargs, not for each argument. You would need to spawn a shell for that:
find . -name "*.xz" \
| xargs -I {} bash -c 'touch -m --date "$(sed -e "s/\.tar\.xz//;s/user-//g; s/\.//g; s/\///g;" <<< "$1")" "$1"' -- {}
Note: xargs is not needed because you can use the -exec option of find:
find . -name "*.xz" -exec bash -c 'touch -m --date "$(sed -e "s/\.tar\.xz//;s/user-//g; s/\.//g; s/\///g;" <<< "$1")" "$1"' -- {} \;
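The difference between the two forms can be seen with a harmless stand-in for touch. This is a small demonstration, not part of the original answer: in the first pipeline the $(...) is expanded by the calling shell before xargs even starts, so it only ever sees the literal text {}; in the second, the pipeline lives inside sh -c and runs once per argument.

```shell
# 1) The $(...) runs once, before xargs starts, on the literal text "{}".
#    tr leaves "{}" unchanged, so xargs simply runs: echo a, then echo b.
early=$(printf '%s\n' a b | xargs -I {} echo "$(echo {} | tr a-z A-Z)")

# 2) Inside sh -c '...' the pipeline runs once for every argument.
late=$(printf '%s\n' a b | xargs -I {} sh -c 'echo "$1" | tr a-z A-Z' -- {})

printf 'early: %s\n' "$(echo $early)"   # prints "early: a b"
printf 'late: %s\n' "$(echo $late)"     # prints "late: A B"
```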
PS: A small for loop would be more readable:
for file in user-*.tar.xz ; do
    # remove prefix and suffix
    date="${file#user-}"
    date="${date%.tar.xz}"
    # replace dots by /
    date="${date//.//}"
    touch -m --date "${date}" "${file}"
done
This might work for you (GNU parallel):
parallel --dryrun touch -m --date '{= s/[^0-9]//g =}' {} ::: *.xz
When happy that the commands are correct, then remove the --dryrun option.
parallel touch -m --date '{= s/user-//;s/\.tar\.xz//;s/\.//g =}' {} ::: *.xz
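For shells without bash's regex operator, the same extraction can be sketched with sed. The helper name stamp_from_name is hypothetical, and the pattern assumes names of exactly the form user-YYYY.MM.DD.tar.xz, as in the question.

```shell
# stamp_from_name NAME: print a touch -t timestamp (CCYYMMDDhhmm) for names
# like user-2018.03.22.tar.xz, or return 1 if NAME does not match.
# (stamp_from_name is a hypothetical helper name.)
stamp_from_name() {
    base=${1##*/}    # drop any leading directory such as ./
    stamp=$(printf '%s\n' "$base" |
        sed -n 's/^user-\([0-9]\{4\}\)\.\([0-9]\{2\}\)\.\([0-9]\{2\}\)\.tar\.xz$/\1\2\30000/p')
    [ -n "$stamp" ] && printf '%s\n' "$stamp"
}

# Apply it to every matching archive in the current directory.
for f in ./user-*.tar.xz; do
    [ -f "$f" ] || continue
    t=$(stamp_from_name "$f") && touch -m -t "$t" "$f"
done
```

Files whose names do not match are skipped rather than passed to touch with an invalid date.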

sed with variable as argument in bash script

I am trying to write a bash script to scan for authorized_keys files and remove the keys of a couple previous employees if found. I am having one heck of a time figuring out the escaping for the sed command at the end. I am using commas instead of / since / can show up in the ssh-key. Any help would be appreciated
declare -A keys
files=`find / -name authorized_keys`
echo "Checking Authorized_Keys files on: " `hostname`
echo ""
echo "Located files: "
for file in $files; do
    echo " $file"
done
for file in $files; do
    for key in "${!keys[@]}"; do
        if grep -q ${keys[$key]} $file; then
            echo " *** Removing $key from $file"
            sed "s,${keys[$key]},d" $file
        fi
    done
done
You've made it a bit complicated I think.
You can do this using grep -vf and process substitution:
# array to hold the value you want to remove
while IFS= read -d '' -r file; do
grep -vf <(printf "%s\n" "${keys[@]}") "$file" > "$file.tmp"
mv "$file.tmp" "$file"
done < <(find / -name authorized_keys -print0)
In your case it's easy: as the delimiter, just use a character that cannot appear in base64 data, e.g. |:
sed "\|${keys[$key]}|d" $file
Explanation in the sed manual:
\%regexp%
(The % may be replaced by any other single character.)
This also matches the regular expression regexp, but allows one to use a different delimiter than /.
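If the key should be matched as literal text rather than as a regular expression, grep -F avoids the delimiter and escaping question altogether. The following is a sketch under that assumption; remove_key is a hypothetical helper name, and the file is rewritten through a temporary copy.

```shell
# remove_key FILE KEY: delete every line of FILE that contains KEY.
# grep -F treats KEY as a fixed string, so characters such as + / . in a
# base64 key need no escaping. (remove_key is a hypothetical helper name.)
remove_key() {
    tmp=$(mktemp) || return 1
    # grep exits 1 when no line survives; only a status above 1 is an error
    grep -vF -e "$2" "$1" > "$tmp" || [ "$?" -eq 1 ] || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$1"    # overwrite in place, keeping owner and permissions
    rm -f "$tmp"
}
```

In the loop from the question it would be called as remove_key "$file" "${keys[$key]}".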

How to make vim SpellCheck *not* code aware?

By default, vim's spell checker is code-aware, so it doesn't spell-check the code parts of a file. As a result, in markdown it treats (pandoc multiline) tables as code and thus doesn't spell-check their contents.
Is it possible to override this, or to enable spell-check for the entire file including code?
As far as I'm able to determine, there is no way to tell Vim to ignore the
spellcheck suggestions in the syntax file and to just "check everything".
A fairly heavy-handed workaround is to disable syntax entirely with :syn off;
you can re-enable this with :syn on.
Specifically for Markdown, you can disable highlighting of code blocks with
:syn clear markdownCodeBlock; you can reset this with :syn on as well.
Use syntax spell
:syntax spell toplevel
In that case I would contact the maintainer of the markdown syntax file and ask them to fix this issue.
I created a bash script that fixes syntax files. IT IS NOT PERFECT BUT IT IS GOOD. It can be reversed by running it again. It adds contains=@Spell to syn match and syn region definitions in all files in a given directory.
To use it:
Save the script as fix_syntax_files.sh
Give it permissions
Change path at the bottom of the script to one corresponding to your vim plugins location
Run the script
(OPTIONAL) Run script again to revert the changes
The script backs up every file before modifying it, so it should be safe to run. Still, I do not take any responsibility for potential problems caused by the script.
You can leave feedback to the script in the following repository:
function fix_file {
    sed -i -e '/exe/! {s/contains=/contains=@Spell,/g}' "$1"
    sed -i -e 's/contains=@Spell,ALL/contains=ALL/g' "$1"
    sed -i -e 's/contains=@Spell,ALLBUT/contains=ALLBUT/g' "$1"
    sed -i -e 's/contains=@Spell,TOP/contains=TOP/g' "$1"
    sed -i -e 's/contains=@Spell,CONTAINED/contains=CONTAINED/g' "$1"
    sed -i -e 's/contains=@Spell,NONE/contains=@Spell/g' "$1"
    sed -i -e '/^ *syn match/ {/contains=/! s/$/ contains=@Spell/g}' "$1"
    sed -i -e '/^ *syn region/ {/contains=/! s/$/ contains=@Spell/g}' "$1"
    return 0
}
function revert_file {
    mv "$1/$2.spellfix-backup" "$1/$2"
    return 0
}
function fix_recursively_in_catalog {
    syntax_catalogs_paths="$(find $1 -type d ! -name '*.*' -not -path '*git*' -print)"
    syntax_catalogs_count="$(echo "${syntax_catalogs_paths}" | wc -l)"
    echo "${syntax_catalogs_count} syntax catalogs found and will be scanned for files"
    echo "${syntax_catalogs_paths}" | while read -r catalog_path ; do
        echo " Scanning $catalog_path"
        ls -p "${catalog_path}" | grep -v / | grep -v .spellfix-backup | grep .vim | while read -r file_name ; do
            cp "${catalog_path}/${file_name}" "${catalog_path}/${file_name}.spellfix-backup"
            fix_file "${catalog_path}/${file_name}"
            echo " Fixing ${file_name} (backup created as ${file_name}.spellfix-backup)"
        done
    done
    echo 'Fix done.'
    echo 'Remember to REVERT FIX before updating vim plugins'
    return 0
}
function revert_recursively_in_catalog {
    syntax_catalogs_paths="$(find $1 -type d ! -name '*.*' -not -path '*git*' -print)"
    syntax_catalogs_count="$(echo "${syntax_catalogs_paths}" | wc -l)"
    echo "${syntax_catalogs_count} syntax catalogs found and will be scanned for spellfix-backup files"
    echo "${syntax_catalogs_paths}" | while read -r catalog_path ; do
        echo " Scanning $catalog_path"
        ls -p "${catalog_path}" | grep -v / | grep -v .spellfix-backup | grep .vim | while read -r file_name ; do
            revert_file "${catalog_path}" "${file_name}"
            echo " Reverting ${file_name} (from file ${file_name}.spellfix-backup)"
        done
    done
    echo 'Revert done.'
    echo 'Remember to FIX AGAIN after plugins update (or set it as a post update hook)'
    return 0
}
function main {
    syntax_catalogs_paths="$(find $1 -type d ! -name '*.*' -not -path '*git*' -print)"
    while read -r catalog_path ; do
        # any existing backup means the fix was applied before, so revert it
        if ls -p "${catalog_path}" | grep -v / | grep .spellfix-backup; then
            echo ".spellfix-backup files found, reverting fix!"
            echo "--------------------------------------------"
            revert_recursively_in_catalog $1
            return 0
        fi
    done < <(echo "${syntax_catalogs_paths}")
    echo ".spellfix-backup files NOT found, fixing!"
    echo "-----------------------------------------"
    fix_recursively_in_catalog $1
}
# placeholder path: change it to your vim plugins location
main "/path/to/your/vim/plugins"

How to move a single file with (.JPEG, .JPG, .jpeg, .jpg) extensions) and change the extension to .jpg with Linux bash

I have an inotify wait script that will move a file from one location to another whenever it detects that a file has been uploaded to the source directory.
The challenge I am facing is that I need to retain the basename of the file and convert the following extensions: .JPEG, .JPG, .jpeg to .jpg, so that the file is renamed with the .jpg extension only.
Currently I have this:
( while [ 1 ]
do inotifywait -m -r -e close_write --format %f -q \
$SRC | while read F
do mv "$SRC/$F" $TARGET
done
done ) &
So I need a way to split out and test for those non standard extensions and move the file with the correct extension. All files not having those 4 extensions just get moved as is.
if [[ "$F" =~ \.(JPEG|JPG|jpeg|jpg)$ ]]; then
    echo mv "$F" "${F%.*}.jpg"
fi
Using extglob option with some parameter expansion:
#! /bin/bash
shopt -s extglob
( while : ; do
    inotifywait -m -r -e close_write --format %f -q \
        $SRC | while read F ; do
        basename=${F##*/} # Remove everything before /
        ext=${basename##*.} # Remove everything before .
        basename=${basename%.$ext} # Remove .$ext at the end
        if [[ $ext == @(JPG|JPEG|jpeg) ]] ; then # Match any of the words
            ext=jpg
        fi
        echo mv "$F" "$TARGET/$basename.$ext"
    done
done ) &
Try this format. (Updated)
( while :; do
    inotifywait -m -r -e close_write --format %f -q "$SRC" | while IFS= read -r F; do
        case "$F" in
        *.JPEG|*.JPG|*.jpeg)
            echo mv "$SRC/$F" "$TARGET/${F%.*}.jpg" ## Move with new proper extension.
            ;;
        *)
            echo mv "$SRC/$F" "$TARGET/" ## Move as is.
            ;;
        esac
    done
done ) &
Remove echo from the mv commands once you are satisfied that the printed commands are correct. The script is meant for bash but could also be compatible with other shells. If you get an error with the read command, try removing the -r option.
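The renaming rule itself can be tested separately from inotifywait. This is a minimal sketch; target_name is a hypothetical helper name, and the three patterns are the non-standard extensions listed in the question.

```shell
# target_name NAME: print NAME with .JPEG/.JPG/.jpeg normalized to .jpg;
# any other name is printed unchanged. (target_name is a hypothetical helper.)
target_name() {
    case "$1" in
        *.JPEG|*.JPG|*.jpeg) printf '%s\n' "${1%.*}.jpg" ;;
        *) printf '%s\n' "$1" ;;
    esac
}

target_name holiday.JPEG   # prints holiday.jpg
target_name scan.JPG       # prints scan.jpg
target_name notes.txt      # prints notes.txt
```

Inside the read loop the move then becomes a single line: mv "$SRC/$F" "$TARGET/$(target_name "$F")".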

How to convert ISO8859-15 to UTF8?

I have an Arabic file encoded in ISO8859-15. How can I convert it into UTF8?
I used iconv but it doesn't work for me.
iconv -f ISO-8859-15 -t UTF-8 Myfile.txt
I wanted to attach the file, but I don't know how.
Could it be that your file is not ISO-8859-15 encoded? You should be able to check with the file command:
file YourFile.txt
Also, you can use iconv without providing the encoding of the original file:
iconv -t UTF-8 YourFile.txt
I found this to work for me:
iconv -f ISO-8859-14 Agreement.txt -t UTF-8 -o agreement.txt
I have Ubuntu 14 and the other answers were not working for me.
iconv -f ISO-8859-1 -t UTF-8 in.tex > out.tex
I found this command here
We had this problem; this is how we solved it.
Create a script file called to-utf8.sh:
#!/bin/bash
TO="UTF-8"; FILE=$1
FROM=$(file -i "$FILE" | cut -d'=' -f2)
if [[ $FROM = "binary" ]]; then
    echo "Skipping binary $FILE..."
    exit 0
fi
iconv -f "$FROM" -t "$TO" -o "$FILE.tmp" "$FILE"; ERROR=$?
if [[ $ERROR -eq 0 ]]; then
    echo "Converting $FILE..."
    mv -f "$FILE.tmp" "$FILE"
else
    echo "Error on $FILE"
fi
Set the executable bit
chmod +x to-utf8.sh
Do a conversion
./to-utf8.sh MyFile.txt
If you want to convert all files under a folder, do
find /your/folder/here | xargs -n 1 ./to-utf8.sh
Hope it helps.
I had the same problem, but I found the answer on this page. It works for me; you can try it.
iconv -f cp936 -t utf-8 
In my case, the file command reported the wrong encoding, so I tried converting with every possible encoding and found the right one.
Execute this script and check the result file.
for i in `iconv -l`
do
    echo $i
    iconv -f $i -t UTF-8 yourfile | grep "hint to tell converted success or not"
done &>/tmp/converted
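The same brute-force idea can be narrowed to a short list of candidates and made to stop at the first hit. The sketch below assumes you know a word that must appear in the correctly decoded text; guess_encoding is a hypothetical helper name.

```shell
# guess_encoding FILE EXPECTED CANDIDATE...
# Prints the first candidate charset for which the UTF-8 output of iconv
# contains the string EXPECTED; returns 1 if no candidate does.
# (guess_encoding is a hypothetical helper name.)
guess_encoding() {
    file=$1
    expected=$2
    shift 2
    for enc in "$@"; do
        if iconv -f "$enc" -t UTF-8 < "$file" 2>/dev/null | grep -qF -e "$expected"; then
            printf '%s\n' "$enc"
            return 0
        fi
    done
    return 1
}
```

For example, guess_encoding yourfile "some word you expect" cp936 ISO-8859-1 ISO-8859-15 prints the first of the three encodings that yields that word, or nothing if none does.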
You can use ISO-8859-9 encoding:
iconv -f ISO-8859-9 Agreement.txt -t UTF-8 -o agreement.txt
iconv just writes the converted text to stdout. You have to pass -o OUTPUTFILE.txt as a parameter or redirect stdout to a file (iconv -f x -t z filename.txt > OUTPUTFILE.txt, or iconv -f x -t z < filename.txt > OUTPUTFILE.txt in some iconv versions).
iconv -f encoding -t encoding inputfile
The iconv program converts the encoding of characters in inputfile from one coded character set to another.
**The result is written to standard output unless otherwise specified by the --output option.**
--from-code, -f encoding
Convert characters from encoding
--to-code, -t encoding
Convert characters to encoding
--list, -l
List known coded character sets
--output, -o file
Specify output file (instead of stdout)
--verbose
Print progress information.
