This question already has answers here:
Are shell scripts sensitive to encoding and line endings?
(14 answers)
Closed last year.
I am new to bash scripting on Linux and I have the following problem.
I'm trying to concatenate strings in a loop to create a path. I have a text file in which I stored some strings to use in the loop. I wrote this example just to show the problem:
for bio in `cat /data/giordano/species_ranges/prova_bio.txt` # list of strings: "bio_01", "bio_02"...
do
echo /data/giordano/species_range/$bio.tif # concatenation
done
The result I expect would be:
/data/giordano/species_range/bio_01.tif
/data/giordano/species_range/bio_02.tif
/data/giordano/species_range/bio_03.tif
But what actually came out was:
.tifa/giordano/species_range/bio_01
.tifa/giordano/species_range/bio_02
.tifa/giordano/species_range/bio_03
/data/giordano/species_range/bio_04.tif
I really don't understand what kind of problem it is...
I suggest that awk would be simpler for this task. We use tr to remove the CR (carriage return) line endings:
~/tests/bash $ tr -d "\r" < data/giordano/species_range/prova_bio.txt | awk '{ print "/data/giordano/species_range/" $0 ".tif"
> }'
/data/giordano/species_range/bio_1.tif
/data/giordano/species_range/bio_2.tif
/data/giordano/species_range/bio_3.tif
/data/giordano/species_range/bio_4.tif
Thank you to Charles Duffy for the improvements.
You probably have Windows line endings in your file, which contain an additional carriage return (\r). This makes the cursor go to the beginning of the line. You can remove the \rs from your file by piping to tr. Extend your first line like this:
for bio in `cat /data/giordano/species_ranges/prova_bio.txt | tr -d '\r'`
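If you prefer to avoid looping over the output of cat, a while read loop does the same job (just a sketch, reusing the paths from the question) and strips a trailing carriage return from each line:

while IFS= read -r bio; do
    bio=${bio%$'\r'}    # drop a trailing carriage return, if present
    echo "/data/giordano/species_range/$bio.tif"
done < /data/giordano/species_ranges/prova_bio.txt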
This question already has an answer here:
How can I extract the content between two brackets?
(1 answer)
Closed 4 years ago.
I have a large log file I need to sort; I want to extract the text between parentheses. The format is something like this:
<#44541545451865156> (example#6144) has left the server!
How would I go about extracting "example#6144"?
This sed should work here:
sed -E -n 's/.*\((.*)\).*$/\1/p' file_name
There are many ways to skin this cat.
Assuming you always have only one lexeme in parentheses, you can use bash parameter expansion:
while read t; do echo $(t=${t#*(}; echo ${t%)*}); done <logfile
The first substitution: ${t#*(} cuts off everything up to and including the left parenthesis, leaving you with example#6144) has left the server!; the second one: ${t%)*} cuts off the right parenthesis and everything after it.
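To see the two expansions in isolation, here is a quick sketch using the sample line from the question:

t='<#44541545451865156> (example#6144) has left the server!'
t=${t#*(}         # now: example#6144) has left the server!
echo "${t%)*}"    # prints: example#6144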
Alternatively, you can also use awk:
awk -F'[)(]' '{print $2}' logfile
-F'[)(]' tells awk to use either parenthesis as the field delimiter, so it splits the input string into three tokens: <#44541545451865156>, example#6144, and has left the server!; then {print $2} instructs it to print the second token.
cut would also do:
cut -d'(' -f 2 logfile | cut -d')' -f 1
Try this:
sed -e 's/^.*(\([^()]*\)).*$/\1/' <logfile
The /^.*(\([^()]*\)).*$/ is a regular expression or regex. Regexes are hard to read until you get used to them, but are most useful for extracting text by pattern, as you are doing here.
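For example, feeding the sample line from the question through a here-string (bash/ksh/zsh) prints only the text inside the parentheses:

$ sed -e 's/^.*(\([^()]*\)).*$/\1/' <<< '<#44541545451865156> (example#6144) has left the server!'
example#6144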
I have a file with a \n hidden at the end of each line:
input:
s3741206\n
s2561284\n
s4411364\n
s2516482\n
s2071534\n
s2074633\n
s7856856\n
s11957134\n
s682333\n
s9378200\n
s1862626\n
I want to remove the \n at the end of each line.
desired output:
s3741206
s2561284
s4411364
s2516482
s2071534
s2074633
s7856856
s11957134
s682333
s9378200
s1862626
However, I tried this:
tr -d '\n' < file1 > file2
but it came out like this, with no spaces or newlines:
s3741206s2561284s4411364s2516482s2071534s2074633s7856856s11957134s682333s9378200s1862626
I also tried sed $'s/\n//g' -i file1 and it doesn't work on macOS.
Thank you.
This is a possible solution using sed, deleting the literal \n sequences:
sed 's/\\n//g' file1 > file2
With awk:
awk '{sub(/\\n/,"")} 1' < file1 > file2
What you are describing so far in your question+comments doesn't make sense. How can you have a multi-line file with a hidden newline character at the end of each line? What you show as your input file:
s3741206\n
s2561284\n
s4411364\n
etc.
where each "\n" above according to your comment is a single newline character "\n" is impossible. If those "\n"s were newline characters then your file would simply look like:
s3741206
s2561284
s4411364
etc.
There are really only two possibilities I can think of:
1. You are wrongly interpreting what you are seeing in your input file and/or using the wrong terminology, and you actually DO have \r\n at the end of every line. Run cat -v file to see the \rs as ^Ms, and run dos2unix or similar (e.g. sed 's/\r$//' file) to remove the \rs - you do not want to remove the \ns, or you will no longer have a POSIX text file, and POSIX tools will exhibit undefined behavior when run on it. If that doesn't work for you, then copy/paste the output of cat -v file into your question so we can see for sure what is in your file.
2. It's also entirely possible that your file is a perfectly fine POSIX text file as-is and you are incorrectly assuming you will have a problem for some reason, so also include in your question a description of the actual problem you are having, an example of the command you are executing on that input file, the output you are getting, and the output you expected to get.
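As a concrete sketch of possibility 1 (assuming the file really does have Windows line endings), the diagnosis and the fix mentioned above would be:

$ cat -v file1                   # a Windows line ending shows up as ^M at the end of each line
$ sed 's/\r$//' file1 > file2    # or run: dos2unix file1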
You could use bash-native string substitution
$ cat /tmp/newline
s3741206\n
s2561284\n
s4411364\n
s2516482\n
s2071534\n
s2074633\n
s7856856\n
s11957134\n
s682333\n
s9378200\n
s1862626\n
$ for LINE in $(cat /tmp/newline); do echo "${LINE%\\n}"; done
s3741206
s2561284
s4411364
s2516482
s2071534
s2074633
s7856856
s11957134
s682333
s9378200
s1862626
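A while read loop (a sketch, reusing the same /tmp/newline file) does the same thing while avoiding the word splitting that for LINE in $(cat ...) performs:

while IFS= read -r line; do
    printf '%s\n' "${line%\\n}"    # strip a literal trailing \n sequence
done < /tmp/newline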
Say I have some arbitrary multi-line text file:
sometext
moretext
lastline
How can I remove only the last character (the e, not the newline or null) of the file without making the text file invalid?
A simpler approach (outputs to stdout, doesn't update the input file):
sed '$ s/.$//' somefile
$ is a Sed address that matches the last input line only, thus causing the following function call (s/.$//) to be executed on the last line only.
s/.$// replaces the last character on the (in this case last) line with an empty string; i.e., effectively removes the last char. (before the newline) on the line.
. matches any character on the line, and following it with $ anchors the match to the end of the line; note how the use of $ in this regular expression is conceptually related, but technically distinct from the previous use of $ as a Sed address.
Example with stdin input (assumes Bash, Ksh, or Zsh):
$ sed '$ s/.$//' <<< $'line one\nline two'
line one
line tw
To update the input file too (do not use if the input file is a symlink):
sed -i '$ s/.$//' somefile
Note:
On macOS, you'd have to use -i '' instead of just -i; for an overview of the pitfalls associated with -i, see the bottom half of this answer.
If you need to process very large input files and/or performance / disk usage are a concern and you're using GNU utilities (Linux), see ImHere's helpful answer.
truncate
truncate -s-1 file
Removes one (-1) character from the end of the file, editing the file in place - just as >> appends to the file in place.
The problem with this approach is that if the file ends with a newline, truncate -s-1 removes that newline instead of the last character.
The solution is:
if [ -n "$(tail -c1 file)" ] # if the file has not a trailing new line.
then
truncate -s-1 file # remove one char as the question request.
else
truncate -s-2 file # remove the last two characters
echo "" >> file # add the trailing new line back
fi
This works because tail takes the last byte (not char).
It takes almost no time even with big files.
Why not sed
The problem with a sed solution like sed '$ s/.$//' file is that it reads the whole file first (taking a long time with large files), then you need a temporary file (of the same size as the original):
sed '$ s/.$//' file > tempfile
rm file; mv tempfile file
And then move the tempfile to replace the file.
Here's another using ex, which I find not as cryptic as the sed solution:
printf '%s\n' '$' 's/.$//' wq | ex somefile
The $ goes to the last line, the s deletes the last character, and wq is the well known (to vi users) write+quit.
After a whole bunch of playing around with different strategies (and avoiding sed -i or perl), the best way I found to do this was with:
sed '$! { P; D; }; s/.$//' somefile
If the goal is to remove the last character in the last line, this awk should do:
awk '{a[NR]=$0} END {for (i=1;i<NR;i++) print a[i];sub(/.$/,"",a[NR]);print a[NR]}' file
sometext
moretext
lastlin
It stores all the data in an array, then prints it out, changing the last line.
Just a remark: sed -i replaces the file with a new one rather than editing it in place.
So if you are tailing the file, you'll get a "No such file or directory" warning until you reissue the tail command.
EDITED ANSWER
I created a script and put your text into a test file on my Desktop. This test file is saved as "old_file.txt":
sometext
moretext
lastline
Afterwards I wrote a small script to take the old file and eliminate the last character of the last line:
#!/bin/bash
# Count the newline characters; the last line is assumed not to end with one.
no_of_new_line_characters=$(wc -l < '/root/Desktop/old_file.txt')
let "no_of_lines=no_of_new_line_characters+1"
# Copy every line except the last one unchanged...
sed -n 1,"$no_of_new_line_characters"p '/root/Desktop/old_file.txt' > '/root/Desktop/my_new_file'
# ...then append the last line with its final character removed.
sed -n "$no_of_lines","$no_of_lines"p '/root/Desktop/old_file.txt' | sed 's/.$//g' >> '/root/Desktop/my_new_file'
Opening the new file I created showed the output as follows:
sometext
moretext
lastlin
I apologize for my previous answer (wasn't reading carefully)
sed '$ s/.$//' filename | tee newFilename
This should do the job; the $ address limits the substitution to the last line.
A couple of Perl solutions, for comparison/reference:
(echo 1a; echo 2b) | perl -e '$_=join("",<>); s/.$//; print'
(echo 1a; echo 2b) | perl -e 'while(<>){ if(eof) {s/.$//}; print }'
I find the first read-whole-file-into-memory approach can be generally quite useful (less so for this particular problem). You can now write regexes that span multiple lines, for example to combine every 3 lines of a certain format into 1 summary line.
For this problem, truncate would be faster and the sed version is shorter to type. Note that truncate requires a file to operate on, not a stream. Normally I find sed to lack the power of perl and I much prefer the extended-regex / perl-regex syntax. But this problem has a nice sed solution.
This question already has answers here:
how can I combine these lines
(4 answers)
Closed 8 years ago.
I want to convert this text in a given file:
87665
S
3243423
S
334243
N
...
to something like this:
87665,S
3243423,S
334243,N
...
I've been reading some similar questions, but their solutions didn't work for me... Is there a way to do this with a single-line command in Linux?
Thanks!
Using sed:
sed '$!N;s/\n/,/' filename
Using paste:
paste -d, - - < filename
paste would leave a trailing , in case the input has an odd number of lines.
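For example, building a small input with printf (a quick sketch):

$ printf '%s\n' 87665 S 3243423 S | paste -d, - -
87665,S
3243423,S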
Something like this might work for you:
$ awk 'NR%2{a=$0;next}{print a","$0}' file
87665,S
3243423,S
334243,N
To handle files with an odd number of lines, you can do:
awk '{printf "%s%s", $0, NR%2?",":ORS}' file
Just for fun, a pure bash solution:
while IFS= read -r l1; do
read -r l2
printf '%s\n' "$l1${l2:+,$l2}"
done < file
If there's an odd number of lines, the last line will not have a trailing comma.
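For example, with an odd number of input lines (a quick sketch; note the last line gets no trailing comma):

$ printf '%s\n' 87665 S 3243423 > file
$ while IFS= read -r l1; do read -r l2; printf '%s\n' "$l1${l2:+,$l2}"; done < file
87665,S
3243423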
I need to read a file line by line in Linux, find a substring in each line, remove it and place it at the end of that line.
Example:
Line in the original file:
a,b,c,substring,d,e,f
Line in the output file:
a,b,c,d,e,f,substring
How do I do it with a Linux command? Thanks!
sed '/substring/{ s///; s/$/substring/; }'
will handle a fixed substring. Note that if the substring you search for begins with a comma (as in ,substring), this handles your example case well. If the substring is not fixed but may be a general regular expression:
sed 's/\(substring\)\(.*\)/\2\1/'
If you are looking for general csv parsing, you should rephrase the question. (It will be difficult to apply this solution to find a fixed string at the start of a line if you are thinking of the input as comma separated fields.)
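For example, treating ,substring (with the leading comma, as suggested above) as the fixed string and testing on the example line with a here-string:

$ sed '/,substring/{ s///; s/$/,substring/; }' <<< 'a,b,c,substring,d,e,f'
a,b,c,d,e,f,substring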
I always prefer to use Perl's command line for such regex tasks - Perl is powerful enough to cover awk and sed in most of my use cases, and it is available on both Windows and Linux, so it is easy and handy for me. The solution in Perl would be like this:
perl -ne "s/^(.*?)(?:(?<comma>,)(?<substr>substring)|(?<substr>substring)(?<comma>,))(?<right>.*)$/$1$+{right}$+{comma}$+{substr}/; print" input.txt > output.txt
or a simpler one:
perl -lpe "if(s/(,substring|substring,)//){ s/$/,substring/ }" input.txt > output.txt
input.txt
substring,a,b,c,d,e,f
a,b,c,substring,d,e,f
a,b,c,d,e,f,substring
substring,a
a,substring
substring
a
output.txt
a,b,c,d,e,f,substring
a,b,c,d,e,f,substring
a,b,c,d,e,f,substring
a,substring
a,substring
substring
a
You can adjust it based on your actual input:
If there are any spaces between words and commas
If you are using a tab as the separator
Some explanation of the command line:
use Perl's -n and -e options: -n means process the input line by line in a loop; -e means the one-line program is given on the command line
use Perl's -l and -p options: -l strips the trailing newline from each input line and adds it back when printing; -p means loop over the input and always print each line
The one-line program is just a regex replacement and a print
(?:pattern) means group but don't capture the match
(?<comma>...) is a named capture group; you then access it through the %+ hash, as $+{comma}