echo command in script with input from Excel .txt file - excel

First: Many thanks for taking the time to read and answer this post.
What I want to do:
I would like to read lines from a txt file (each line may contain 1 or 2 words) and then echo a line of the form:
curl "text.word1+word2"
How I do it:
while read line
do
    kw=( $line )
    echo 'curl "text.'${kw[0]}+${kw[1]}'"'
done
Remark: there are additional if commands in my script that I did not post here.
Problem
While this works fine when I execute it in the terminal and type a line by hand, it does not produce the desired result when I use a txt file as input.
The text file comes from Excel, where I have the text in the first column. I save the Excel file as "Windows Formatted Text (.txt)" and can open the resulting name.txt file in an editor (it looks like a normal text file).
When I now run in the terminal
./myscript.command < name.txt
the result will be
"url "text.word1+word2
instead of
curl "text.word1+word2"
Any ideas how to solve this?
Many thanks!

${kw[1]} has a carriage return character at the end of it; what you're seeing is:
curl "text.word1+word2
"
with the second line overwriting the first.
You can confirm this by doing a hex dump on the file, such as with:
od -xcb name.txt
and looking for those pesky \r characters.
The following transcript shows this in detail:
pax> od -xcb name.txt
0000000 6150 2078 6944 6261 6f6c 0a0d
P a x D i a b l o \r \n
120 141 170 040 104 151 141 142 154 157 015 012
0000014
pax> cat name.sh
#!/bin/bash
while read line ; do
    kw=( $line )
    echo 'curl "text.'${kw[0]}+${kw[1]}'"'
done <name.txt
pax> ./name.sh
"url "text.Pax+Diablo
As to how you get rid of the carriage return, there are numerous solutions, some of which are:
dos2unix;
tr -d '\r'; or
sed 's/\r$//'.
Your weapon of choice depends on what tools you have available to you; dos2unix is probably the easiest if you have it to hand.
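If you would rather leave the input file untouched, you can also strip the CR inside the loop itself. A minimal sketch in bash, assuming the same name.txt as above:
#!/bin/bash
while read -r line; do
    line=${line%$'\r'}    # drop a single trailing carriage return, if present
    kw=( $line )
    echo 'curl "text.'${kw[0]}+${kw[1]}'"'
done < name.txt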

DOS uses CRLF as the line ending (each line ends with two characters, CR then LF). Unix uses LF only (each line ends with a single \n character).
To convert dos file format into unix file format use
$ dos2unix filename
It will solve your problem.
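You can see the difference in the raw bytes with a quick printf/od sketch:
printf 'hello\r\n' | od -c    # DOS-style line:  h e l l o \r \n
printf 'hello\n' | od -c      # Unix-style line: h e l l o \n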

This code works fine:
while read line
do
    kw=( $line )
    echo 'curl "text.'${kw[0]}+${kw[1]}'"'
done < "$1"
And generates:
./text_line.sh text.txt
curl "text.aaa+aaa"
curl "text.bbb+bbb"
curl "text.ccc+"
curl "text.ddd+ddd"


What's confusing both grep and ack?

Try this: download https://www.mathworks.com/matlabcentral/fileexchange/19-delta-sigma-toolbox
In the unzipped folder, I get the following results:
ack --no-heading --no-break --matlab dsexample
Contents.m:56:% dsexample1 - Discrete-time lowpass/bandpass/quadrature modulator.
Contents.m:57:% dsexample2 - Continuous-time lowpass modulator.
dsexample1(dsm, LiveDemo);
fprintf(1,'Done.\n');
adc.sys_cs = sys_cs;
grep -nH -R --include="*.m" dsexample
Contents.m:56:% dsexample1 - Discrete-time lowpass/bandpass/quadrature modulator.
Contents.m:57:% dsexample2 - Continuous-time lowpass modulator.
dsexample1(dsm, LiveDemo); d center frequency larger Hinfation Script
fprintf(1,'Done.\n');c = c;formed.s of finite op-amp gain and capacitorased;;n for the input.
adc.sys_cs = sys_cs;snr;seed with CT simulations tora states used in the d-t model_amp); Response');
What's going on?
[Edit for clarification]: Why is there no file name and no line number on the 3rd result line? Why do the results on the 4th and 5th lines not even contain dsexample?
NB: using ack 3.40 and grep 2.16
I do not deserve any credit for this answer - it is all about line endings.
I have known for years about Windows line endings (CR-LF) and Linux line endings (LF only), but I had never heard of Legacy Mac line endings (CR only)... The latter really upsets ack, grep, and I'm sure lots of other tools.
dos2unix and unix2dos have no effect on files in Legacy Mac format - but after using this nifty little endlines tool, I could eventually bring some consistency to the source files:
endlines : 129 files converted from :
- 23 Legacy Mac (CR)
- 105 Unix (LF)
- 1 Windows (CR-LF)
Now, ack and grep are much happier.
Let's see which files contain dsexample; grep -l doesn't print the contents, just the file names:
$ grep -l dsexample *
Contents.m
demoLPandBP.m
dsexample1.m
dsexample2.m
Ok, then, file shows that they have CR line terminators. (It would say "CRLF line terminators" for Windows files.)
$ file Contents.m demoLPandBP.m dsexample*
Contents.m: ASCII text
demoLPandBP.m: ASCII text, with CR line terminators
dsexample1.m: ASCII text, with CR line terminators
dsexample2.m: ASCII text, with CR line terminators
Unlike what I commented before, Contents.m is fine. Let's look at another one and see how it prints:
$ grep dsexample demoLPandBP.m
dsexample1(dsm, LiveDemo); d center frequency larger Hinf
The output from grep is actually the whole file, since grep doesn't consider the plain CR as breaking a line -- the whole file is just one line. If we change CRs to LFs, we see it better, or can just count the lines:
$ grep dsexample demoLPandBP.m | tr '\r' '\n' | wc -l
51
These are the longest lines there, in order:
%% 5th-order lowpass with optimized zeros and larger Hinf
dsm.f0 = 1/6; % Normalized center frequency
dsexample1(dsm, LiveDemo);
With a CR in the end of each, the cursor moves back to the start of the line, partially overwriting the previous output, so you get:
dsexample1(dsm, LiveDemo); d center frequency larger Hinf
(There's a space after the semicolon on that line, so the e gets overwritten too. I checked.)
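You can reproduce the overwriting effect directly in your terminal with a one-line sketch:
printf 'a long first line\rSHORT\n'    # the terminal shows: SHORTg first line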
Someone said dos2unix can't deal with that, and well, they're not DOS or Windows files anyway, so why should it. You could do something like this instead, in Bash:
for f in *.m; do
    if [[ $(file "$f") = *"ASCII text, with CR line terminators" ]]; then
        tr '\r' '\n' < "$f" > tmptmptmp &&
            mv tmptmptmp "$f"
    fi
done
I think it was just the .m files that had the issue, hence the *.m in the loop. There was at least one PDF file there, and we don't want to break that. Though with the check on file there, it should be safe even if you just run the loop on *.
It looks like both ack and grep are getting confused by the line endings in the files. Run file *.m on your files. You'll see that some files have proper linefeeds, and some have CR line terminators.
If you clean up your line endings, things should be OK.

How to add UTF-16 characters at the beginning of an existing file using sed?

I have a large script that generates many files, and part of it doesn't work due to a missing BOM. I have to work with the file named pagecounts-${_date}, which is ultimately created like this:
cat $TMPDIR/*.filtered > $TMPDIR/pagecounts-${_date}
Then, I use sort and try to work with it in another script, but I get the BOM error. My question is: can I add a BOM for UTF-16 at the beginning of an already existing file? If yes, how can I achieve that?
I was thinking of using a temporary file like this:
cat $TMPDIR/*.filtered > $TMPDIR/tmp_pagecounts-${_date}
echo '\ufeff' > $TMPDIR/pagecounts-${_date}
cat $TMPDIR/tmp_pagecounts-${_date} | sort >> $TMPDIR/pagecounts-${_date}
But this way seems to chop off some of the UTF-16 characters.
You could use echo -e to print the Unicode \ufeff character sequence as-is:
sed "1s/^/$(echo -ne '\ufeff')/" "$TMPDIR"/pagecounts-${_date}
or use printf:
sed "1s/^/$(printf '\ufeff')/" "$TMPDIR"/pagecounts-${_date}
Confirm the byte sequence is what you expect by running hexdump -c or hexdump -C on it:
echo -ne '\ufeff' | hexdump -c
0000000 355 237 277 355 273 277
0000006
You can confirm the same bytes appear once they are applied to the file, too.
The above sed commands just print the file contents to stdout; to modify the file in-place, use the -i flag (-i '' is required for macOS's sed):
sed -i '' ...
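Note that \ufeff written by echo or printf is emitted in the current locale's encoding (typically the UTF-8 bytes ef bb bf), not as a UTF-16 BOM. If the consumer really expects UTF-16LE, the BOM is the two raw bytes FF FE; a sketch that prepends them (assuming bash's printf, with file names taken from the question and bom_tmp as a placeholder temp-file name):
# Prepend a UTF-16LE BOM (bytes FF FE) to the generated file.
printf '\xff\xfe' | cat - "$TMPDIR/pagecounts-${_date}" > "$TMPDIR/bom_tmp" &&
    mv "$TMPDIR/bom_tmp" "$TMPDIR/pagecounts-${_date}"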

Kshell: read a file, define a record length, and strip ASCII 20 characters from the end of each record

I am moving a file transfer from an OpenVMS server over to a Unix server. There is a VMS program that reads this file record by record, using a record length of 320, and then strips all ASCII 20 characters from the end of each record. How can I do this with a ksh script?
I just want to read a file record by record, strip all ASCII 20 characters from position 320 in each record back to where there is an actual good character (the real end of the record), and write it out to a new file.
Thanks in advance!
EDIT: I'm on AIX 6
You can use sed:
sed -i -r 's/(.{319})\x14(.*)$/\1\2/' file
This command matches the first 319 characters, followed by \x14 (decimal ASCII 20), and then the rest of the line. It then puts back matched group #1 and matched group #2, leaving out the \x14.
The -i (in-place) flag of sed saves the changed file back.
EDIT: Try this sed on AIX:
sed 's/\(.\{319\}\)'$'\x14''\(.*\)$/\1\2/' file > _temp && mv _temp file
Final Solution: After investigation (see comments) it was found that the file had trailing spaces (hex 20) rather than ASCII 20 (hex 14) characters. Once that was established, the following solution worked:
sed 's/ *$//' file > _temp && mv _temp file
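To check which byte actually pads the records, a quick od sketch (assuming the same file as above; a \x14 shows up as 024 in the octal output, while a space shows as a blank):
head -1 file | od -c | tail -5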
ksh:
while IFS= read -r line; do
    record=${line:0:320}              # the first 320 chars of the line
    echo "${record%%+($'\x14')}"      # strip ASCII 20 (\x14) chars from the end
done < input.file > output.file
Let me know if I've mis-read your requirements.

How to find a Windows end-of-line (EOL) character

I have several hundred GB of data that I need to paste together using the Unix paste utility in Cygwin, but it won't work properly if there are Windows EOL characters in the files. The data may or may not have Windows EOL characters, and I don't want to spend the time running dos2unix if I don't have to.
So my question is: in Cygwin, how can I figure out whether these files have Windows CRLF line endings?
I've tried creating some test data and running
sed -r 's/\r\n//' testdata.txt
But that appears to match regardless of whether dos2unix has been run or not.
Thanks.
The file(1) utility knows the difference:
$ file * | grep ASCII
2: ASCII text
3: ASCII English text
a: ASCII C program text
blah: ASCII Java program text
foo.js: ASCII C++ program text
openssh_5.5p1-4ubuntu5.dsc: ASCII text, with very long lines
windows: ASCII text, with CRLF line terminators
file(1) has been optimized to try to read as little of a file as possible, so you may be lucky and drastically reduce the amount of disk IO you need to perform when finding and fixing the CRLF terminators.
Note that some cases of CRLF should stay in place: captures of SMTP will use CRLF. But that's up to you. :)
#!/bin/bash
for i in $(find . -type f); do
    if file "$i" | grep -q CRLF; then
        echo "$i"
        file "$i"
        #dos2unix "$i"
    fi
done
Uncomment "#dos2unix "$i"" when you are ready to convert them.
You can find out using file:
file /mnt/c/BOOT.INI
/mnt/c/BOOT.INI: ASCII text, with CRLF line terminators
CRLF is the significant value here.
If you expect the exit code from sed to be different, it won't be. sed performs the substitution or not depending on the match, but its exit code is true (0) unless there's an error.
You can get a usable exit code from grep, however.
#!/bin/bash
for f in *
do
    if head -n 10 "$f" | grep -qs $'\r'
    then
        dos2unix "$f"
    fi
done
grep recursively, with a file pattern filter:
grep -Pnr --include=*file.sh '\r$' .
This outputs the file name, the line number, and the line itself:
./test/file.sh:2:here is windows line break
You can use dos2unix's -i option to get information about DOS, Unix, and Mac line breaks (in that order), BOMs, and text/binary, without converting the file.
$ dos2unix -i *.txt
6 0 0 no_bom text dos.txt
0 6 0 no_bom text unix.txt
0 0 6 no_bom text mac.txt
6 6 6 no_bom text mixed.txt
50 0 0 UTF-16LE text utf16le.txt
0 50 0 no_bom text utf8unix.txt
50 0 0 UTF-8 text utf8dos.txt
With the "c" flag dos2unix will report files that would be converted, iow files have have DOS line breaks. To report all txt files with DOS line breaks you could do this:
$ dos2unix -ic *.txt
dos.txt
mixed.txt
utf16le.txt
utf8dos.txt
To convert only these files you simply do:
dos2unix -ic *.txt | xargs dos2unix
If you need to go recursively over directories, you do:
find -name '*.txt' | xargs dos2unix -ic | xargs dos2unix
See also the man page of dos2unix.
As stated above, the 'file' solution works; maybe the following code snippet helps.
#!/bin/ksh
EOL_UNKNOWN="Unknown" # Unknown EOL
EOL_MAC="Mac" # File EOL Classic Apple Mac (CR)
EOL_UNIX="Unix" # File EOL UNIX (LF)
EOL_WINDOWS="Windows" # File EOL Windows (CRLF)
SVN_PROPFILE="name-of-file" # Filename to check.
...
# Finds the EOL used in the requested File
# $1 Name of the file (requested filename)
# $r EOL_FILE set to enumerated EOL-values.
getEolFile() {
EOL_FILE=$EOL_UNKNOWN
# Check for EOL-windows
EOL_CHECK=`file $1 | grep "ASCII text, with CRLF line terminators"`
if [[ -n $EOL_CHECK ]] ; then
EOL_FILE=$EOL_WINDOWS
return
fi
# Check for Classic Mac EOL
EOL_CHECK=`file $1 | grep "ASCII text, with CR line terminators"`
if [[ -n $EOL_CHECK ]] ; then
EOL_FILE=$EOL_MAC
return
fi
# Check for Classic Mac EOL
EOL_CHECK=`file $1 | grep "ASCII text"`
if [[ -n $EOL_CHECK ]] ; then
EOL_FILE=$EOL_UNIX
return
fi
return
} # getFileEOL
...
# Using this snippet
getEolFile $SVN_PROPFILE
echo "Found EOL: $EOL_FILE"
exit -1
Thanks for the tip to use the file(1) command; however, it needs a bit more refinement. I had a situation where not only plain text files but also some ".sh" scripts had the wrong EOL, and "file" reports them as follows regardless of the EOL:
xxx/y/z.sh: application/x-shellscript
So the "file -e soft" option was needed (at least for Linux):
bash$ find xxx -exec file -e soft {} \; | grep CRLF
This finds all the files with DOS EOLs in directory xxx and its subdirectories.
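To go one step further and convert the files it finds, a sketch (assuming dos2unix is available, and file names without embedded newlines):
find xxx -type f -exec sh -c 'file -e soft "$1" | grep -q CRLF && dos2unix "$1"' _ {} \;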

Best tool to remove dos line ends and join line back up again

I have a csv file into which some ^M DOS line ends have crept, and I want to get rid of them, as well as the 16 spaces and 3 tabs which follow. That is, I have to merge such a line with the next one down. Here's an offending record and a good one as a sample of what I mean:
"Mary had a ^M
little lamb", "Nursery Rhyme", 1878
"Mary, Mary quite contrary", "Nursery Rhyme", 1838
I can remove the ^M using sed as you can see, but I cannot work out how to remove the Unix line end to join the lines back up.
sed -e "s/^M$ //g" rhymes.csv > rhymes.csv
UPDATE
Then I read "However, the Microsoft CSV format allows embedded newlines within a double-quoted field. If embedded newlines within fields are a possibility for your data, you should consider using something other than sed to work with the data file." from:
http://sed.sourceforge.net/sedfaq4.html
So I am editing my question to ask: which tool should I be using?
With help from How can I replace a newline (\n) using sed?, I made this one:
sed -e ':a;N;$!ba;s/\r\n                \t\t\t/=/' -i rhymes.csv
<CR> <LF> <16 spaces> <3 tabs>
If you just want to delete the CR, you could use:
<yourfile tr -d "\r" | tee yourfile
(or, if the input and output files are different: <yourfile tr -d "\r" > output)
dos2unix file_name
to convert the file in place, or
dos2unix old_file new_file
to create a new file.
