Bash replacing a string I don't want it to - linux

I have a list that I want to run a command on, so I was testing it first with an echo to make sure it was correct, but it printed out wrong.
while read name; do echo $name; done < RandomNames
This prints out the list exactly how I want it, but if I put anything after the variable it replaces the start of it.
So if I write
while read name; do echo $name; done < RandomNames
It will print
Rich
Chris
Zack
but if I write
while read name; do echo $name t; done < RandomNames
it writes
tch
tris
tck
so it replaces the first two characters with whatever I put after the variable, and I have no idea why.

Your file has DOS newlines, so each input line ends with a hidden $'\r' character that moves the cursor to the beginning of the line after that line is printed, such that the next character written overwrites the line's first character.
Either use dos2unix or a similar tool to convert them to UNIX newlines, or expand ${name%$'\r'} to trim them.
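For example, a minimal sketch of the second option applied to the loop above (same RandomNames file):
while read name; do
    name=${name%$'\r'}   # strip a trailing carriage return, if present
    echo "$name t"
done < RandomNames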

Related

Unexpected End Of File Error for invalid line number [duplicate]

I need my script to send an email from terminal. Based on what I've seen here and many other places online, I formatted it like this:
/var/mail -s "$SUBJECT" "$EMAIL" << EOF
Here's a line of my message!
And here's another line!
Last line of the message here!
EOF
However, when I run this I get this warning:
myfile.sh: line x: warning: here-document at line y delimited by end-of-file (wanted 'EOF')
myfile.sh: line x+1: syntax error: unexpected end of file
...where line x is the last written line of code in the program, and line y is the line with /var/mail in it. I've tried replacing EOF with other things (ENDOFMESSAGE, FINISH, etc.) but to no avail. Nearly everything I've found online has it done this way, and I'm really new at bash so I'm having a hard time figuring it out on my own. Could anyone offer any help?
The EOF token must be at the beginning of the line, you can't indent it along with the block of code it goes with.
If you write <<-EOF you may indent it, but it must be indented with Tab characters, not spaces. So it still might not end up even with the block of code.
Also make sure you have no whitespace after the EOF token on the line.
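For example, a minimal sketch of the tab-indented form (the leading whitespace before the body and before the closing EOF must be literal Tab characters, not spaces):
if true; then
	cat <<-EOF
	Indented here-doc body.
	EOF
fi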
The line that starts or ends the here-doc probably has some non-printable or whitespace characters (for example, carriage return) which means that the second "EOF" does not match the first, and doesn't end the here-doc like it should. This is a very common error, and difficult to detect with just a text editor. You can make non-printable characters visible for example with cat:
cat -A myfile.sh
Once you see the output from cat -A the solution will be obvious: remove the offending characters.
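As a rough illustration (a hypothetical one-liner), a stray carriage return shows up as ^M and each line end as $:
printf 'EOF\r\n' | cat -A
EOF^M$
A closing delimiter that looks correct in an editor can therefore still fail to match.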
Please try to remove the preceding spaces before EOF:
/var/mail -s "$SUBJECT" "$EMAIL" <<-EOF
Using <tab> instead of <spaces> for indentation AND using <<-EOF works fine.
The "-" removes <tabs>, not <spaces>, but at least this works.
Note one can also get this error if you do this;
while read line; do
echo $line
done << somefile
Because << somefile should read < somefile in this case.
Maybe old, but I had a space after the ending EOF
<< EOF
blah
blah
EOF <-- this was the issue. Had it for years, finally looked it up here
For anyone stumbling here who googled "bash warning: here-document delimited by end-of-file", it may be that you are getting the
warning: here-document at line 74 delimited by end-of-file
...type warning because you accidentally used a here document symbol (<<) when you meant to use a here string symbol (<<<). That was my case.
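For reference, a small sketch of the difference (the command and variable are just placeholders):
grep pattern <<< "$variable"    # here string: feeds the value of $variable on stdin
grep pattern << EOF             # here document: reads the following lines until EOF
$variable
EOF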
Here is a flexible way to do deal with multiple indented lines without using heredoc.
echo 'Hello!'
sed -e 's:^\s*::' < <(echo '
    Some indented text here.
    Some indented text here.
')
if [[ true ]]; then
    sed -e 's:^\s\{4,4\}::' < <(echo '
    Some indented text here.
        Some extra indented text here.
    Some indented text here.
    ')
fi
Some notes on this solution:
if the content is expected to contain single quotes, either escape them (inside a single-quoted string the usual idiom is '\'' ) or replace the string delimiters with double quotes. In the latter case, be careful that constructs like $(command) will be interpreted. If the string contains both single and double quotes, you'll have to escape at least one kind.
the given example prints a trailing empty line; there are numerous ways to get rid of it, not included here to keep the proposal to minimal clutter.
the flexibility comes from the ease with which you can control how much leading space should stay or go, provided that you know some sed regexp, of course.
When I want to have docstrings for my bash functions, I use a solution similar to the suggestion of user12205 in a duplicate of this question.
See how I define USAGE for a solution that:
auto-formats well for me in my IDE of choice (sublime)
is multi-line
can use spaces or tabs as indentation
preserves indentations within the comment.
function foo {
    # Docstring
    read -r -d '' USAGE <<'END'
    # This method prints foo to the terminal.
    #
    # Enter `foo -h` to see the docstring.
    # It has indentations and multiple lines.
    #
    # Change the delimiter if you need hashtag for some reason.
    # This can include $$ and = and eval, but won't be evaluated
END
    if [ "$1" = "-h" ]
    then
        echo "$USAGE" | cut -d "#" -f 2 | cut -c 2-
        return
    fi
    echo "foo"
}
So foo -h yields:
This method prints foo to the terminal.
Enter `foo -h` to see the docstring.
It has indentations and multiple lines.
Change the delimiter if you need hashtag for some reason.
This can include $$ and = and eval, but won't be evaluated
Explanation
cut -d "#" -f 2: Retrieve the second portion of the # delimited lines. (Think a csv with "#" as the delimiter, empty first column).
cut -c 2-: Retrieve the 2nd to end character of the resultant string
Also note that if [ "$1" = "-h" ] evaluates as False if there is no first argument, w/o error, since it becomes an empty string.
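As a quick check of that pipeline on a single hypothetical docstring line:
echo '    # Hello, docstring' | cut -d "#" -f 2 | cut -c 2-
Hello, docstring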
Make sure that where you put the ending EOF, you put it at the beginning of a new line.
Along with the other answers mentioned by Barmar and Joni, I've noticed that I sometimes have to leave a blank line before and after my EOF when using <<-EOF.

concatenation of strings in bash results in substitution

I need to read a file into an array and concatenate a string at the end of each line. Here is my bash script:
#!/bin/bash
IFS=$'\n' read -d '' -r -a lines < ./file.list
for i in "${lines[@]}"
do
tmp="$i"
tmp="${tmp}stuff"
echo "$tmp"
done
However, when I do this, an action of replace happens, instead of concatenation.
For example, in the file.list, we have:
http://www.example1.com
http://www.example2.com
What I need is:
http://www.example1.comstuff
http://www.example2.comstuff
But after executing the script above, I get things as below on the terminal:
stuff//www.example1.com
stuff//www.example2.com
Btw, my PC is Mac OS.
The problem also occurs while concatenating strings via awk, printf, and echo commands. For example echo $tmp"stuff" or echo "${tmp}""stuff"
The file ./file.list is, most probably, generated on a Windows system or, at least, it was saved using the Windows convention for end of line.
Windows uses a sequence of two characters to mark the end of lines in a text file. These characters are CR (\r) followed by LF (\n). Unix-like systems (Linux and macOS starting with version 10) use LF as end of line character.
The assignment IFS=$'\n' in front of read in your code tells read to use LF as line separator. read doesn't store the LF characters in the array it produces (lines[]) but each entry from lines[] ends with a CR character.
The line tmp="${tmp}stuff" does what it is supposed to do, i.e. it appends the word stuff to the content of the variable tmp (a line read from the file).
The first line read from the input file contains the string http://www.example1.com followed by the CR character. After the string stuff is appended, the content of variable tmp is:
http://www.example1.com$'\r'stuff
The CR character is not printable. It has a special interpretation when it is printed on the terminal: it sends the cursor at the start of the line (column 1) without changing the line.
When echo prints the line above, it prints (starting on a new line) http://www.example1.com, then the CR character that sends the cursor back to the start of the line, where it prints the string stuff. The stuff fragment overwrites the first 5 characters already printed on that line (http:) and the result, as it is visible on screen, is:
stuff//www.example1.com
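You can reproduce the effect directly in a terminal with a hypothetical one-liner:
printf 'http://www.example1.com\rstuff\n'
stuff//www.example1.com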
The solution is to get rid of the CR characters from the input file. There are several ways to accomplish this goal.
A simple way to remove the CR characters from the input file is to use the command:
sed -i.bak s/$'\r'//g file.list
It removes all the CR characters from the content of file file.list, saves the updated string back into the file.list file and stores the original file.list file as file.list.bak (a backup copy in case it doesn't produce the output you expect).
Another way to get rid of the CR character is to ask the shell to remove it in the command where stuff is appended:
tmp="${tmp/$'\r'/}stuff"
When a variable is expanded in a construct like ${tmp/a/b}, the first appearance of a in $tmp is replaced with b (use ${tmp//a/b} to replace all of them). In this case we replace \r with nothing.
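A small illustrative example of the two forms (placeholder values):
s='a-b-c'
echo "${s/-/_}"     # a_b-c   (first match only)
echo "${s//-/_}"    # a_b_c   (all matches)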
I'm guessing it has something to do with the carriage return character.
Was your file.list created on Windows? If so, try to use dos2unix before running the script.
Edit
You can check your files using the file command.
Example:
file file.list
If you saved the file in Windows Notepad, then it will probably come up like this:
file.list: ASCII text, with no line terminators
You can use built in tools like iconv to convert the encodings. However for a simple use like this, you can just use a command that works for multiple encodings without any conversion necessary.
You could simply buffer the file through cat, and use a regular expression that applies to either:
Carriage return followed by line terminator, or
Line terminator on its own
Then append the string.
Example:
cat file.list | grep -E -v "^$" | sed -E -e "s/(\r?$)/stuff/g"
Will work with ASCII text, and ASCII text with no line terminators.
If you need to modify a stream to append a fixed string, you can use sed or awk, for instance:
sed 's/$/stuff/'
to append stuff to the end of each line.
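An equivalent awk sketch (assuming the same stream), which also drops a trailing carriage return if one is present:
awk '{ sub(/\r$/, ""); print $0 "stuff" }'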
Using "dos2unix file.list" would also solve the problem.

Concatenate string in a loop of shell script

I want to concatenate a suffix to a string in a loop of shell script, but the result makes me confused. The shell script is as follows:
for i in `cat IMAGElist.txt`
do
echo $i
echo ${i}_NDVI
done
The result is:
LT51240392010131BKT01
_NDVI40392010131BKT01
LT51240392010163BKT01
_NDVI40392010163BKT01
...
The first five characters were replaced with "_NDVI".
But the expected result should be:
LT51240392010131BKT01
LT51240392010131BKT01_NDVI
LT51240392010163BKT01
LT51240392010163BKT01_NDVI
...
I think the method for string concatenation is right when it's not in a loop, so I don't know why this result is produced.
It looks as though your file may contain Windows-style line endings (carriage return + line feed), so you should convert them to UNIX-style ones. A simple way to do this is with the tool dos2unix.
Don't use for to read lines of a text file:
while read -r line
do
    echo "$line"
    echo "${line}_NDVI"
done < IMAGElist.txt
Note that you can achieve this result more efficiently with tools designed to process text, such as awk or sed.
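For instance, a hedged awk one-liner along those lines (the sub() call guards against the Windows line endings mentioned above):
awk '{ sub(/\r$/, ""); print; print $0 "_NDVI" }' IMAGElist.txt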

Bash: Read in file, edit line, output to new file

I am new to linux and new to scripting. I am working in a linux environment using bash. I need to do the following things:
1. read a txt file line by line
2. delete the first line
3. remove the middle part of each line after the first
4. copy the changes to a new txt file
Each line after the first has three sections, the first always ends in .pdf and the third always begins with R0 but the middle section has no consistency.
Example of 2 lines in the file:
R01234567_High Transcript_01234567.pdf High School Transcript R01234567
R01891023_Application_01891023127.pdf Application R01891023
Here is what I have so far. I'm just reading the file, printing it to screen and copying it to another file.
#! /bin/bash
cd /usr/local/bin;
#echo "list of files:";
#ls;
for index in *.txt;
do
    echo "file: ${index}";
    echo "reading..."
    exec<${index}
    value=0
    while read line
    do
        #value='expr ${value} +1';
        echo ${line};
    done
    echo "read done for ${index}";
    cp ${index} /usr/local/bin/test2;
    echo "file ${index} moved to test2";
done
So my question is, how can I delete the middle bit of each line, after .pdf but before the R0...?
Using sed:
sed 's/^\(.*\.pdf\).*\(R0.*\)$/\1 \2/g' file.txt
This will remove everything between .pdf and R0 and replace it with single space.
Result for your example:
R01234567_High Transcript_01234567.pdf R01234567
R01891023_Application_01891023127.pdf R01891023
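Since the question also asks to drop the first line and write the output to a new file, a hedged variant of the same command (the file names are placeholders):
sed '1d; s/^\(.*\.pdf\).*\(R0.*\)$/\1 \2/g' file.txt > newfile.txt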
The Hard, Unreliable Way
It's a bit verbose, and much less terse and efficient than what would make sense if we knew that the fields were separated by tab literals, but the following loop does this processing in pure native bash with no external tools:
shopt -s extglob
while IFS= read -r line; do
    [[ $line = *".pdf"*R0* ]] || continue  # ignore lines that don't fit our format
    filename=${line%%.pdf*}.pdf
    id=R0${line##*R0}
    printf '%s\t%s\n' "$filename" "$id"
done
${line%%.pdf*} returns everything before the first .pdf in the line; ${line%%.pdf*}.pdf then appends .pdf to that content.
Similarly, ${line##*R0} expands to everything after the last R0; R0${line##*R0} thus expands to the final field starting with R0 (presuming that that's the only instance of that string in the file).
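Checked against the first sample line from the question, the two expansions behave like this:
line='R01234567_High Transcript_01234567.pdf High School Transcript R01234567'
echo "${line%%.pdf*}.pdf"   # R01234567_High Transcript_01234567.pdf
echo "R0${line##*R0}"       # R01234567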
The Easy Way (Using Tab Delimiters)
If cat -t file (on MacOS) or cat -A file (on Linux) shows ^I sequences between the fields (but not within the fields), use the following instead:
while IFS=$'\t' read -r filename title id; do
    printf '%s\t%s\n' "$filename" "$id"
done
This reads the three tab separated fields into variables named filename, title and id, and emits the filename and id fields.
Updated answer assuming tab delim
Since there is a tab delimiter, then this is a cinch for awk. Borrowing from my originally deleted answer and #geek1011 deleted answer:
awk -F"\t" '{print $1, $NF}' infile.txt
Here awk splits each record in your file by tab, then prints the first field $1 and the last field $NF where NF is the built in awk variable for the record's Number of Fields; by prepending a dollar sign, it says "The value of the last field in the record".
Original answer assuming space delimiter
Leaving this here in case someone has space delimited nonsense like I originally assumed.
You can use awk instead of using bash to read through the file:
awk 'NR>1{firstRec=""; for(i=1; $i!~/pdf/; ++i) firstRec=firstRec" "$i; print firstRec,$i,$NF}' yourfile.txt
awk reads files line by line and processes each record it comes across. Fields are delimited automatically by white space. The first field is $1, the second is $2 and so on. awk has built in variables; here we use NF which is the Number of Fields contained in the record, and NR which is the record number currently being processed.
This script does the following:
If the record number is greater than 1 (not the header) then
Reset firstRec for the new record, then loop through each field (separated by white space here) until we find a field that has "pdf" in it ($i!~/pdf/). Store everything we find up until that field in the variable firstRec separated by a space (firstRec=firstRec" "$i).
print out the firstRec, then print out whatever field we stopped iterating on (the one that contains "pdf") which is $i, and finally print out the last field in the record, which is $NF (print firstRec,$i,$NF)
You can direct this to another file:
awk 'NR>1{firstRec=""; for(i=1; $i!~/pdf/; ++i) firstRec=firstRec" "$i; print firstRec,$i,$NF}' yourfile.txt > outfile.txt
sed may be a cleaner way to go here since, if your pdf file name has more than one consecutive space in it, this awk approach will collapse the multiple spaces.
You can use sed on each line like that:
line="R01234567_High Transcript_01234567.pdf High School Transcript R01234567"
echo "$line" | sed 's/\.pdf.*R0/\.pdf R0/'
# output
R01234567_High Transcript_01234567.pdf R01234567
This replaces everything between .pdf and R0 with a single space.
It doesn't deal with some edge cases, but it's simple and clear.

Does shell confuse <space> with <new line> when reading text files?

I tried to run this script:
for line in $(cat song.txt)
do echo "$line" >> out.txt
done
I am running it on Ubuntu 11.04. When "song.txt" contains:
I read the news today oh boy
About a lucky man who made the grade
after running the script, "out.txt" looks like this:
I
read
the
news
today
oh
boy
About
a
lucky
man
who
made
the
grade
Can anyone tell me what I am doing wrong here?
For per-line input you should use while read, for example:
cat song.txt | while read line
do
    echo "$line" >> out.txt
done
Better (more efficient really) would be the following method:
while read line
do
    echo "$line"
done < song.txt > out.txt
That's because of the for command, which takes every word from the list it is given (in your case, the content of the song.txt file), whether the words are separated by spaces or newline characters.
What's misleading you here is that your for variable name is line, but for only works with words. Reread your script replacing line with word and it should make sense.
In the for-each-in loop, the list that is specified is assumed to be whitespace separated, and whitespace includes spaces, newlines, tabs, etc. In your case the list is the entire text of the file, so the loop runs once for every word in the file.
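A quick demonstration of that splitting with a hypothetical two-line input:
for word in $(printf 'I read\nthe news\n'); do echo "$word"; done
I
read
the
news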
