I would like to create a file where the line feeds are removed, except in the first line.
Input:
EHH_2020_A1
CCAAGATATTTTATAT
CCATATACC
ATTAT
GTA
Desired output:
EHH_2020_A1
CCAAGATATTTTATATCCATATACCATTATGTA
Thanks a lot in advance!
Best,
Perl to the rescue!
perl -pe 'chomp unless 1 == $.' file
-p reads the input line by line and runs the code for each line,
$. stores the current line number,
chomp removes the final newline (if present)
If you want to keep the final newline, change the condition to
unless 1 == $. || eof
eof returns true when at the end of the file.
You can do it trivially in awk as well, e.g. with your input in the file genes, you would have:
$ awk 'FNR==1 {print; next} {printf "%s", $0} END {print ""}' genes
EHH_2020_A1
CCAAGATATTTTATATCCATATACCATTATGTA
Where the command takes the first record (line) where FNR==1 and simply prints it unchanged. The second rule prints all other lines without a '\n' effectively concatenating them together, and the END rule outputs the final newline.
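For comparison, here is a rough sketch of the same idea using only head, tail and tr instead of Perl or awk. The function name join_after_header is made up for illustration, and it assumes the file has at least one line and Unix line endings:

```shell
# join_after_header FILE: hypothetical helper, not taken from the answers above.
# Prints line 1 unchanged, then all later lines glued into a single line.
join_after_header() {
    head -n 1 "$1"                  # header line, newline kept
    tail -n +2 "$1" | tr -d '\n'    # remaining lines with their newlines stripped
    echo                            # terminate the joined line
}
```

Called as join_after_header genes, it should give the same two lines as the awk command. It reads the file twice, so it is a poor fit for input arriving on a pipe.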
I have a bash script which gets a text file as input and takes two parameters (the numbers of two lines), then swaps those two lines in the text. Here is the code:
#!/bin/bash
awk -v var="$1" -v var1="$2" 'NR==var {
s=$0
for(i=var+1; i < var1 ; i++) {
getline; s1=s1?s1 "\n" $0:$0
}
getline; print; print s1 s
next
}1' Ham > newHam_changed.txt
It works fine for any two lines that are not consecutive. For lines that follow each other (e.g. lines 5 and 6) it works, but creates a blank line between them. How can I fix that?
I think your actual script is not what you posted in the question. I think the line with all the prints contains:
print s1 "\n" s
The problem is that when the lines are consecutive, s1 will be empty (the for loop is skipped), but it will still print a newline before s, producing a blank line.
So you need to make that newline conditional.
awk -v var="4" -v var1="6" 'NR==var {
s=$0
for(i=var+1; i < var1 ; i++) {
getline; s1=s1?s1 "\n" $0:$0
}
getline; print; print (s1 ? s1 "\n" : "") s
next
}1' Ham > newHam_changed.txt
Using getline always makes awk scripts a bit complicated. It is better to avoid getline and just make use of the awk pattern { action } syntax, which gives perfectly readable scripts. In any other language you would just loop and fetch the next line, but in awk I think it is best to make good use of this feature.
awk -v var="$1" -v var1="$2" '
NR==var {s=$0; collect=1; next}
NR==var1 {collect=0; print; printf "%s", inbetween; print s; next}
collect {inbetween=inbetween $0 "\n"; next}
1' Ham
Here I capture the first line in s when I find it and set the collect flag. This triggers the collect block on the following lines, which gathers all lines in between. When the second line is found, collect is set back to zero and the script prints first the current line, then the in-between lines, and then s. If the lines are consecutive, inbetween is empty and printf then does nothing.
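To try the flag-based approach on both the consecutive and the non-consecutive case, it can be wrapped in a small function. This is only a sketch: the name swap_lines and the variable names are invented here, and it assumes the first number is smaller than the second and that both lines exist in the file.

```shell
# swap_lines A B FILE: hypothetical wrapper around the flag-based awk script.
# Assumes A < B and that FILE has at least B lines.
swap_lines() {
    awk -v a="$1" -v b="$2" '
        NR == a { first = $0; collect = 1; next }    # hold line A back
        NR == b {                                    # line B reached
            collect = 0
            print                                    # B goes where A was
            printf "%s", between                     # then the lines in between, if any
            print first                              # and finally A
            next                                     # skip the catch-all rule below
        }
        collect { between = between $0 "\n"; next }  # lines strictly between A and B
        1                                            # every other line is printed as-is
    ' "$3"
}
```

For example, swap_lines 5 6 Ham > newHam_changed.txt swaps two adjacent lines without producing a blank line, because between stays empty.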
Too complex for my taste, here is something quite simple that achieves the same task:
#!/bin/bash
ORIGFILE='original.txt' # original text file
PROCFILE='processed.txt' # copy of the original file to be processed
CHGL1=`sed "$1q;d" "$ORIGFILE"` # get original $1 line
CHGL2=`sed "$2q;d" "$ORIGFILE"` # get original $2 line
cat "$ORIGFILE" > "$PROCFILE"
sed -i "$2s/^.*/$CHGL1/" "$PROCFILE" # replace
sed -i "$1s/^.*/$CHGL2/" "$PROCFILE" # replace
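More code doesn't mean more useful, so keep it simple. This code does not use a for loop and instead goes directly to the specific lines. Note that it assumes the swapped lines contain no /, & or backslash, since their text is pasted into the sed replacement.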
EDIT:
A simple way on one line to do this task:
printf '%s\n' 14m26 26-m14- w q | ed -s file
Found in this answer.
I have a text file with a few lines in it. What I am trying to do is find all lines matching a pattern and, if the line before them is not blank, insert a blank line there.
Something like this, but it is not working properly:
sed -i '/[a-zA-Z0-9]/{N;/PATTERN/{s/PATTERN/\nPATTERN/}}' FILENAME
I know it could probably be done more easily and nicely in awk or perl/bash, but I would prefer a one-line/one-step solution.
Sample input file:
LINE1
LINE2
PATTERN
LINE3
PATTERN
LINE4
Expected output:
LINE1
LINE2

PATTERN
LINE3

PATTERN
LINE4
I'm not very good at sed but here's how I'd do it in awk:
awk 'prev != "" && /PATTERN/ { print "" } { prev = $0; print }' file
If prev (the previous line) is not empty and the current line matches /PATTERN/ then print a blank line. Unconditionally save the current line for comparison with the next, and print the current line.
To achieve an "in-place" edit (like sed -i), just redirect the command to a temporary file and then overwrite the original:
awk 'prev != "" && /PATTERN/ { print "" } { prev = $0; print }' file > tmp && mv tmp file
Note that since prev is initially unset, this won't print a newline at the start of the output, even if the first line matches /PATTERN/. To get around this, you can change the condition to:
(NR == 1 || prev != "") && /PATTERN/
You can also achieve the in-place edit with GNU awk, using the -i inplace option.
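If GNU awk is not available, the temporary-file variant can be made a little safer with mktemp. The following is a sketch rather than a drop-in tool: the function name insert_blank_before is invented, the pattern is passed through awk's -v (so backslashes in it are interpreted as escape sequences), and the result is written back with cat so the original file keeps its permissions.

```shell
# insert_blank_before PATTERN FILE: hypothetical helper that edits FILE in place.
# The body runs in a subshell, so its variables do not leak into the caller.
insert_blank_before() (
    pattern=$1
    file=$2
    tmp=$(mktemp) || exit 1
    # Same logic as the one-liner: emit an empty line before a matching line
    # whenever the previous line was not empty.
    if awk -v pat="$pattern" 'prev != "" && $0 ~ pat { print "" } { prev = $0; print }' "$file" > "$tmp"
    then
        cat "$tmp" > "$file"    # overwrite the contents, keep the original inode and mode
    fi
    rm -f "$tmp"
)
```

Running insert_blank_before PATTERN FILENAME a second time changes nothing, since every PATTERN line is then already preceded by a blank line.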
Take a look at this GNU sed (note that awk is a better tool for the job):
sed -i '/PATTERN/{x;/^$/!i\

x};h' input
h is a command that saves the contents of the pattern space into the hold buffer. It saves the line at the end of each cycle so that it can be used as the "previous" line in the next cycle.
x exchanges the contents of the hold and pattern spaces. Whenever the current line matches your /PATTERN/, the previously saved line is put into the pattern space. If the previous line is NOT empty (/^$/!), an empty line is inserted with the i command. The current line is then put back into the pattern space with the second x command.
If you want to add a newline even if the first line matches /PATTERN/, use:
sed -i '/PATTERN/{1h;x;/^$/! ...
Further reading:
GNU sed: Less Frequently-Used Commands
grymoire.com sed tutorial
Given a file test with the following content:
bcd://dfl
sf
I would like to append extra information to the line having certain content (starting with bcd)
While the following script works
awk '/bcd*/ {print $0", extra information"} ' test > test.old && mv test.old test
it removes the non matching lines. (sf)
Is it possible to preserve them in the output file?
As discussed in the comments, adding a next to the action and a trailing 1 after it will solve your problem:
awk '/^bcd/ {print $0", extra information"; next} 1' file
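Because /<pattern>/{<action>} is applied only to the lines matching <pattern>, something else has to print the other lines as-is: the trailing 1 is an always-true condition whose default action is to print the line. The next keeps matching lines from being printed a second time. Alternatively, modify $0 and let the 1 do all the printing: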
awk '/^bcd/ {$0 = $0 ", extra information"} 1' test
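As a small sketch of the second form, the text to append can be passed in from the shell. The function name append_to_bcd_lines is made up, and because the suffix goes through awk's -v, backslashes in it would be treated as escape sequences:

```shell
# append_to_bcd_lines FILE SUFFIX: hypothetical helper, prints the result to stdout.
append_to_bcd_lines() {
    # Lines starting with bcd get SUFFIX appended; the trailing 1 prints every line.
    awk -v suffix="$2" '/^bcd/ { $0 = $0 suffix } 1' "$1"
}
```

For the file from the question, append_to_bcd_lines test ", extra information" > test.old && mv test.old test keeps the sf line intact.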
I have to write a script which updates a file in this way.
raw file:
<?blah blah blah?>
<pen>
<?pineapple?>
<apple>
<pen>
Final file:
<?blah blah blah?><pen>
<?pineapple?><apple><pen>
Wherever in the file a newline character is not followed by
<?
we have to remove that newline, so that the line is appended to the end of the previous line.
Also it will be really helpful if you explain how your sed works.
Perl solution:
perl -pe 'chomp; substr $_, 0, 0, "\n" if $. > 1 && /^<\?/'
-p reads the input line by line, printing each line after changes
chomp removes the final newline
substr with 4 arguments modifies the input string, here it prepends newline if it's not the first line ($. is the input line number) and the line starts with <?.
Sed solution:
sed ':a;N;$!ba;s/\n\(<[^?]\)/\1/g' file > newfile
The basic idea is to replace every
\n followed by < not followed by ?
with what you matched except the \n.
If you are happy with a solution that puts every <? at the start of a line, you can combine tr with sed.
tr -d '\n' < inputfile| sed 's/<?/\n&/g;$s/$/\n/'
Explanation:
I use tr ... < inputfile and not cat inputfile | tr ..., avoiding an additional cat call.
The sed command has 2 parts.
In s/<?/\n&/g it will insert a newline and with & it will insert the matched string (in this case always <?, so it will only save one character).
With $s/$/\n/ a newline is appended at the end of the last line.
EDIT: When you only want newlines before <? where you already had them,
you can use awk:
awk 'NR > 1 && /^<\?/ {print ""} {printf("%s",$0)} END {print ""}'
Explanation:
Consider the newline as the start of the line, not the end. Then your question transposes into "write a newline when the line starts with <?" (except on the very first line, where it would produce a leading empty line). You must escape the ? and use ^ for the start of the line. Note that print "" is needed here: a bare print would output the current line itself.
awk 'NR > 1 && /^<\?/ {print ""}'
Next print the line you read without a newline character.
And you want a newline at the end.
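Put together as a function that can be tried on the sample file, the "newline belongs to the start of the line" idea looks roughly like this. The name join_continuation_lines is invented for the example, and empty input is assumed to produce no output at all:

```shell
# join_continuation_lines FILE: hypothetical helper for the <? example.
join_continuation_lines() {
    awk '
        NR > 1 && /^<\?/ { print "" }   # end the previous output line before each new <? line
        { printf "%s", $0 }             # write the current line without its newline
        END { if (NR) print "" }        # final newline, only if there was any input
    ' "$1"
}
```

On the raw file from the question this should print the two lines shown under "Final file".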
I have some data that looks like this. It comes in chunk of four lines. Each chunk starts with a # character.
#SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
::::::::::::::::::::::::;;8
#SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
88888888888888888888888888
What I want to do is to extract last line of each chunk. Yielding:
::::::::::::::::::::::::;;8
888888888888888888888888888
Note that the last line of the chunk may contain any standard ASCII character
including #.
Is there an effective one-liner to do it?
The following sed command will print the 3rd line after the pattern:
sed -n '/^#/{n;n;n;p}' file.txt
If there are no blank lines:
perl -ne 'print if $. % 4 == 0' file
$ awk 'BEGIN{RS="#";FS="\n"}{print $4 } ' file
::::::::::::::::::::::::;;8
88888888888888888888888888
If you always have those 4 lines in a chunk, some other ways
$ ruby -ne 'print if $.%4==0' file
::::::::::::::::::::::::;;8
88888888888888888888888888
$ awk 'NR%4==0' file
::::::::::::::::::::::::;;8
88888888888888888888888888
It also seems like your line is always after the line that start with "+", so
$ awk '/^\+/{getline;print}' file
::::::::::::::::::::::::;;8
88888888888888888888888888
$ ruby -ne 'gets && print if /^\+/' file
::::::::::::::::::::::::;;8
88888888888888888888888888
This prints the line before each line that starts with #, and also the last line. It works with non-uniformly sized chunks, but assumes that only a chunk's leading line starts with #.
sed -ne '1d;$p;/^#/!{x;d};/^#/{x;p}' file
Some explanation is in order:
First you don't need the first line so delete it 1d
Next you always need the last line, so print it $p
If you don't have a match swap it into the hold buffer and delete it x;d
If you do have match swap it out of the hold buffer, and print it x;p
This works similarly to dogbane's answer
awk '/^#/ {mark = NR} NR == mark + 3 {print}' inputfile
And, like that answer, will work regardless of the number of lines in each chunk (as long as there are at least 4).
The direct analog to that answer, however, would be:
awk '/^#/ {getline; getline; getline; print}' inputfile
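One caveat, given that the question says the last line of a chunk may itself start with #: in the mark-based version such a line resets mark and is therefore not printed. If every chunk really has a fixed number of lines, counting lines sidesteps the problem. A sketch, with the made-up function name last_of_each_chunk and an optional chunk size that defaults to 4:

```shell
# last_of_each_chunk FILE [N]: hypothetical helper, assumes fixed-size chunks of N lines.
last_of_each_chunk() {
    # Print every N-th line regardless of what it starts with.
    awk -v n="${2:-4}" 'NR % n == 0' "$1"
}
```

Unlike the pattern-based variants it never looks at the line contents, so it breaks as soon as a chunk has a different number of lines or the file contains stray blank lines.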
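This can be done easily using grep (note that, as written, this prints each line starting with # together with the line that follows it, so it needs further filtering to leave only the last line of each chunk):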
grep -A 1 '^#' ./infile
This might work for you (GNU sed):
sed '/^#/,+2d' file