Regarding Carriage Returns - linux

I have written an sh script on Unix that compiles a program given as an argument, for example: sh shellFile cprogram.c, and feeds the compiled program a variety of input files.
These input files were created on Unix; I have tested the script with them and am happy with the results. I have even tested it with input files, made on Unix, that had an extra carriage return at the end, and I still get good results; let's call this test file 00ecr for reference. If I created a test file on Windows and transferred it to Unix, could this new test file, let's call it 00wind, produce bad results in my shell script?
This is just a theoretical question. I am wondering whether it will muck things up even though I tested my shell script with files, made on Unix, that accounted for extra carriage returns.

How about using the Linux file command in your script to identify whether the file has Windows-style line terminators:
$ file test.txt
test.txt: ASCII text, with CRLF line terminators
So your script could have a converting function like this:
#!/bin/bash
windows_file=$(file "$1" | grep CRLF)
if [[ -z "$windows_file" ]]; then
    echo "File already in unix format"
else
    # file needs converting
    dos2unix "$1"
    echo "Converted a windows file"
fi
So here we first use the file utility to output the file type and grep for the string CRLF to see whether the file needs converting. grep prints nothing if the file is not in Windows format, and we can test for an empty string with the if [[ -z "$var" ]] statement.
Then we just use the dos2unix utility to convert it:
$ dos2unix test.txt
dos2unix: converting file test.txt to Unix format ...
$ file test.txt
test.txt: ASCII text
This way you could ensure that your input is always "sanitized".
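On the theoretical question itself: a blank line at the end of a file made on Unix is an extra line feed, not a carriage return, so a passing 00ecr test says little about 00wind. A minimal sketch of how one might check this, assuming stand-in file contents created with printf and a hypothetical helper named count_cr (neither is from the original script):

```shell
# count_cr FILE: print the number of carriage-return bytes in FILE.
# (count_cr is a hypothetical helper name, not part of the original script.)
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# Stand-ins for the two test files from the question (contents are made up).
printf '1 2 3\n\n' > 00ecr     # Unix file with an extra blank line at the end
printf '1 2 3\r\n' > 00wind    # same data with a Windows (CRLF) line ending

echo "00ecr:  $(count_cr 00ecr) carriage return(s)"
echo "00wind: $(count_cr 00wind) carriage return(s)"
```

If the count for a file is greater than zero, the compiled program will see those extra bytes in its input unless the file is converted first.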

Related

check if a file is a unix file and set a bit if it's true

I want to create a shell script that checks whether a file is a Unix or a DOS file. Using an if statement, I want to decide after the check whether the file needs to be converted with "dos2unix" or not. I know the "file" command, but its output is a string, not a boolean.
So is there any way to set a boolean flag to true if the file is a Unix file?
thanks in advance...!
You could parse the output of the file command. For text files with \n line endings, it outputs ASCII text ..., while for text files with \r\n line endings, it outputs ASCII text ... with CRLF line terminators. Note that depending on the actual file contents, there can be additional information in place of the "...". By default the output also starts with the filename; the -b option suppresses that prefix so that the pattern can be anchored with ^. Hence, you could do something like
file -b YOURFILE | grep -q '^ASCII text.*with CRLF'
((is_dos_text_file=1-$?))
The variable is_dos_text_file contains the value 1, if YOURFILE was judged by file as a text file with CRLF endings. It is 0 if YOURFILE either has Unix line endings, or was not judged as textfile.
UPDATE: I just noticed that you used the shell tag in your posting and are hence looking for a POSIX shell solution. In this case, the ((...)) construct can't be used and you would have to do something like
if file -b YOURFILE | grep -q '^ASCII text.*with CRLF'
then
is_dos_text_file=1 # true
else
is_dos_text_file=0 # false
fi
to get the same effect.
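In shell, the exit status of a command already acts as the boolean, so an if can test it directly. The following is a minimal sketch that does not depend on the wording of the file output: it looks for a carriage return at the end of a line with grep. The name has_crlf and the sample files are hypothetical, and the check does not try to tell text from binary data.

```shell
# has_crlf FILE: succeed (exit status 0) if any line in FILE ends with a
# carriage return. has_crlf is a hypothetical helper name.
has_crlf() {
    cr=$(printf '\r')          # a literal CR, obtained portably
    grep -q "${cr}\$" "$1"     # pattern: CR immediately before end of line
}

printf 'hello\r\n' > sample_dos.txt     # hypothetical sample files
printf 'hello\n'   > sample_unix.txt

for f in sample_dos.txt sample_unix.txt; do
    if has_crlf "$f"; then
        echo "$f: CRLF line endings"
    else
        echo "$f: no CRLF line endings"
    fi
done
```

If a variable is still wanted, it can be set inside the two branches of the if, as in the answer above.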
You can convert the file to a Unix file and check if it is still the same. In that case it is a Unix file. Otherwise it is a DOS file.
echo unix > unix-file
echo dos | unix2dos > dos-file
for file in {dos,unix}-file; do
    if cmp -s "$file" <(dos2unix < "$file"); then
        echo "$file is a unix file"
    else
        echo "$file is a dos file"
    fi
done

Concatenate string in a loop of shell script

I want to append a suffix to a string inside a loop in a shell script, but the result confuses me. The shell script is as follows:
for i in `cat IMAGElist.txt`
do
echo $i
echo ${i}_NDVI
done
The result is:
LT51240392010131BKT01
_NDVI40392010131BKT01
LT51240392010163BKT01
_NDVI40392010163BKT01
...
The first five characters were replaced with "_NDVI".
But the expected result should be:
LT51240392010131BKT01
LT51240392010131BKT01_NDVI
LT51240392010163BKT01
LT51240392010163BKT01_NDVI
...
I think this method of string concatenation works outside a loop. Why is this result produced?
It looks as though your file may contain Windows-style line endings (carriage return + line feed), so you should convert them to UNIX-style ones. A simple way to do this is with the tool dos2unix.
Also, don't use for to read the lines of a text file; use a while read loop instead:
while read -r line
do
    echo "$line"
    echo "${line}_NDVI"
done < IMAGElist.txt
Note that you can achieve this result more efficiently with tools designed to process text, such as awk or sed.
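Note that the while read loop on its own still leaves the carriage return at the end of each line; the terminal then moves the cursor back to column 1, which is why _NDVI appears to overwrite the first five characters. If converting the file is not an option, the carriage return can be stripped inside the loop. This is a minimal sketch, assuming the list really does have CRLF endings; the file name IMAGElist_crlf.txt and the function name add_suffix are hypothetical.

```shell
cr=$(printf '\r')    # a literal carriage return

# Recreate a list with Windows line endings (an assumption about the
# original IMAGElist.txt, based on the symptom in the question).
printf 'LT51240392010131BKT01\r\nLT51240392010163BKT01\r\n' > IMAGElist_crlf.txt

# add_suffix FILE: print each line of FILE with _NDVI appended.
add_suffix() {
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%"$cr"}       # drop one trailing CR, if there is one
        echo "${line}_NDVI"
    done < "$1"
}

add_suffix IMAGElist_crlf.txt
```

The `|| [ -n "$line" ]` part only makes sure that a last line without a final newline is still processed.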

bash not parsing (cat + grep) correctly

I have a text file 1.grep
grep -P -e "^<job.+type.+rule" "Emake-4agents-1st-10-25-51.53.xml"
To make my grepping go faster, I do the following in bash
cat 1.grep | bash > 1.search
This works fine normally but in this case, I get the following:
$ cat 1.grep
grep -P -e "^<job.+type.+rule" "Emake-4agents-1st-10-25-51.53.xml"
$ cat 1.grep | bash > 2.search
: No such file or directory25-51.53.xml
Why does bash think that my .xml filename is a directory?
The immediate problem is that the file 1.grep is in DOS/Windows format, and has a carriage return followed by linefeed at the end of the line. Windows treats that two-character combination as the end-of-line marker, but unix tools like bash (and grep and ...) will treat just the linefeed as the end-of-line marker, so the carriage return is treated as part of the line. As a result, it's trying to read from a file named "Emake-4agents-1st-10-25-51.53.xml^M" (where ^M indicates the carriage return), which doesn't exist, so it prints an error message with a carriage return in the middle of it:
grep: Emake-4agents-1st-10-25-51.53.xml^M
: No such file or directory
...where the carriage return makes the second part overwrite the first part, giving the cryptic result you saw.
Solution: use something like dos2unix to convert the file to unix (line-feed-only) format, and use text editors that store in the unix format.
However, I also have to agree with several comments that said using cat | bash is ... just plain weird. I'm not sure exactly what you're trying to accomplish in the bigger picture, but I can't think of any situation where that'd be the "right" way to do it.
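To confirm a diagnosis like the one above, the hidden carriage return can be made visible. A minimal sketch, assuming a cat that supports the common -v option (GNU, BSD and BusyBox cat do; it is not required by POSIX); the helper name show_hidden and the sample file are hypothetical:

```shell
# show_hidden FILE: print FILE with non-printing characters made visible;
# a carriage return shows up as ^M. show_hidden is a hypothetical helper name.
show_hidden() {
    cat -v "$1"
}

# A stand-in command file saved with a Windows line ending (hypothetical).
printf 'grep -c pattern "some file.xml"\r\n' > sample.grep
show_hidden sample.grep
```

A line that ends in ^M in this output was saved with a Windows line ending.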

shell script to find file type

I am working on a shell script that takes a single command line parameter, a file path (might be relative or absolute). The script should examine that file and print a single line consisting of the phrase:
Windows ASCII
if the file is an ASCII text file with CR/LF line terminators, or
Something else
if the file is binary or ASCII with “Unix” LF line terminators.
Currently I have the following code:
#!/bin/sh
file=$1
if grep -q "\r\n" $file;then
echo Windows ASCII
else
echo Something else
fi
The script runs, but when I pass it something that is not a Windows ASCII file, such as /bin/cat, it still identifies it as Windows ASCII. When I pass a .txt file, it displays Something else as expected; it is just on folders that it displays Windows ASCII. I think I am not handling this properly, but I am unsure. Any pointers on how to fix this issue?
As you specify you only need to differentiate between 2 cases, this should work.
#!/bin/sh
file="$1"
case $(file "$file") in
    *"ASCII text, with CRLF line terminators" )
        echo "Windows ASCII"
        ;;
    * )
        echo "Something else"
        ;;
esac
As you have specified #!/bin/sh, or if your goal is total backward compatibility, you may need to replace
$(file "$file")
with
`file "$file"`
To use your script with filenames that include spaces, note that all $ variable names are now surrounded with double quotes. You'll also have to quote the space characters in the filename when you call the script, i.e.
myFileTester.sh "file w space.txt"
OR
myFileTester.sh 'file w space.txt'
OR
myFileTester.sh file\ w\ space.txt
Also, if you have to start discriminating all the possible cases that file can analyze, you'll have a rather large case statement on your hands. And file is notorious for the different messages it returns, depending on the contents of /etc/file/magic, OS, versions, etc.
IHTH
Use the file command to find out the file type:
$ file /etc/passwd
/etc/passwd: ASCII English text
$ file /bin/cat
/bin/cat: Mach-O 64-bit executable x86_64
$ file test.txt
test.txt: ASCII text, with CRLF line terminators

Embedded Linux shell (BusyBox) check over list of files if they exist and run commands

I'm trying to download a list of files (a text file, one filename per line, no spaces or newlines in the filenames), and then check for each file whether it exists and run commands accordingly. The first part seems to work just fine: the file from the web server is downloaded, and the first echo outputs the filenames. But the file existence check does not work.
#!/bin/sh
wget -qO- http://web.server/x/files.txt | while read file
do
echo $file
if [ -f $file ]; then
echo $file exists
else
echo $file does not exist
fi
done
Output when executed in a directory where the second file (temp.txt) does exist:
file1.tmp
does not exist
temp.txt
does not exist
file3.tmp
does not exist
file4.tmp
does not exist
The second file does exist, and the echo commands in the if statement apparently don't recognize the $file variable either.
Any help is appreciated; I tried cobbling this together with info found here. A problem might be that this is not a full Linux system, but embedded Linux (OpenELEC) with BusyBox v1.22.1.
UPDATE: thanks to the commenters, we figured out that the code as is basically works fine AS LONG as the files.txt from the web server contains only Unix EOLs; it doesn't work with Windows CRLF line endings.
Now, how could the script be made to work regardless of the line endings in the file from the web server?
dos2unix is a utility which converts Windows line endings to Unix line endings. You can use it in your script like this:
wget -qO- http://web.server/x/files.txt | dos2unix | while read file
Or, in a shell that supports process substitution (such as bash):
while read line; do
    ...
done < <(wget -qO- http://web.server/x/files.txt | dos2unix)
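If dos2unix is not available on the target system, deleting the carriage returns with tr in the same position of the pipeline is one possible fallback. The following is a minimal sketch under that assumption; the function name check_files is hypothetical, and the file list is simulated with printf instead of being downloaded.

```shell
# check_files: read filenames from standard input, tolerate CRLF or LF line
# endings, and report whether each file exists. Hypothetical helper name.
check_files() {
    tr -d '\r' | while IFS= read -r file; do
        if [ -f "$file" ]; then
            echo "$file exists"
        else
            echo "$file does not exist"
        fi
    done
}

# Stand-in for the downloaded list: two names with Windows line endings.
touch temp.txt
printf 'no_such_file_xyz.tmp\r\ntemp.txt\r\n' | check_files
```

In the original script, the printf line would be replaced by the wget command piped into check_files. Quoting "$file" in the test also keeps the check working if a name ever contains spaces.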

Resources