Appending blank lines to paragraphs in Linux - linux

I have a text file containing a story with many paragraphs, and I want to print the story in the terminal. Wherever the text file has a blank line, I also want to add a second blank line, throughout the whole story, and write the result to a new file.
I have this code, but it only appends a new line when it finds a full stop (so it adds a blank line after every sentence) and writes the result to a new file:
awk -v RS="." '/^./ { print " " $0 " " }' < 52293-0.txt > output
What mistake am I making?
Many thanks,
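The command in the question sets RS=".", so awk treats every full stop as a record separator and each sentence becomes its own record. A minimal sketch of one alternative, assuming blank lines are empty or contain only spaces and tabs, keeps awk's default line-by-line records and prints one extra empty line after each blank one. The names story_sample.txt and story_output.txt are hypothetical stand-ins for 52293-0.txt and output.

```shell
# Hypothetical sample standing in for 52293-0.txt
printf 'First paragraph.\nStill first.\n\nSecond paragraph.\n' > story_sample.txt

# Print every line; after a line with no fields (empty or only blanks),
# print one extra empty line.
awk '{ print } NF == 0 { print "" }' story_sample.txt > story_output.txt

cat story_output.txt
```

If the file has Windows-style line endings, the carriage returns would need to be stripped first, because a line holding only a carriage return still counts as one field.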

Related

Add pipe delimiter at the end of each row using unix

I am new to Unix commands, so please forgive me if the line of code below is not correct.
I have files (xxxx.txt.date) on WinSCP, each with a header and a footer. I want to add N pipes (|) at the end of each row in all files, from the 2nd line to the second-to-last line (I don't want | in the header or the footer).
I have created a script in which I use the command below:
sed -e "2,\$s/$/|/" $file | column -t
2,$s/$/|/ adds | at the end of every line from line 2 onward.
These are the issues I am facing:
First, the data in the files doesn't change, although I can see the pipe added at the end of each row in Hive. How can I change the data in the files themselves?
Second, I don't want | in the footer.
Any suggestions or help will be appreciated.
Thanks in advance!
If you need to append just one "|" at the end of each line except the header and footer:
sed -i '1n; $n; s/$/|/' file_name
1n; $n; : print the first and last lines as they are.
-i : make the changes in the file itself instead of printing to STDOUT.
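As a sketch of an equivalent form, the two exclusions can also be written as negated addresses, which some readers find easier to follow than the n commands. The file rows_sample.txt and its contents are hypothetical, and -i is left out so the result goes to STDOUT.

```shell
# Hypothetical sample: header, two data rows, footer
printf 'Header\na|b\nc|d\nFooter\n' > rows_sample.txt

# 1! skips line 1 and $! skips the last line;
# every other line gets a trailing |
piped=$(sed '1!{$!s/$/|/;}' rows_sample.txt)
printf '%s\n' "$piped"
```

Once the output looks right, adding -i applies the same edit to the file itself.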
If you need to append n pipes at the end of each line except the header and footer, you can use the awk command below. awk does not edit the file in place, so you will have to redirect the output to a temporary file and then rename it.
Assumptions:
I am assuming your header and footer are standard and start with some character (e.g., H, F, T) or string (Header, Footer, Trailer, etc.).
I am assuming your original file is delimited with "|". You can specify your actual delimiter in the awk command below.
awk -F'|' -v n=7 '{if(/^Header|^Footer/) {print} else {end="";for (i=1;i<=n;i++) end=sprintf("%s%s", end, "|"); rec=sprintf("%s%s", $0, end); print rec}}' file_name
n - the number of times you want to repeat | at the end of each line.
^Header|^Footer - if the line starts with "Header" or "Footer", just print the record as it is. You can substitute the header and footer strings from your file.
for loop - builds a string "end" that contains "|" n times.
rec - holds the entire record followed by the "end" string.
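If the header and footer text is not known in advance, a sketch of a position-based variant follows. It assumes only that the first line is the header and the last line is the footer, and it holds each line back by one step so the last line can be recognized. The file rows_sample.txt, its contents, and n=3 are hypothetical.

```shell
# Hypothetical sample: header, two data rows, footer
printf 'Header\na|b\nc|d\nFooter\n' > rows_sample.txt

padded=$(awk -v n=3 '
    BEGIN { for (i = 0; i < n; i++) pad = pad "|" }  # build the n-pipe suffix
    NR > 1 { print (NR == 2 ? prev : prev pad) }     # prev is line NR-1; line 1 is the header
    { prev = $0 }                                    # hold the current line back one step
    END { if (NR > 0) print prev }                   # the held-back last line is the footer
' rows_sample.txt)
printf '%s\n' "$padded"
```

As with the command above, the output still has to be redirected to a temporary file and renamed to change the original.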

Replace fasta headers using sed command

I have a fasta file which looks like this.
>header1
ATGC....
>header2
ATGC...
My list file looks like this:
organism1
organism2
It contains the list of organisms whose names should replace the headers.
I tried a for loop with sed, as follows:
for i in `cat list7b`; do sed "s/^>/$i/g" sequence.fa; done
but it didn't work. Please tell me how I can achieve this.
The result file should look like this:
>organism1
ATGC...
>organism2
ATGC....
That is, >header1 is replaced with >organism1, and so on.
The header lines are distinguished from the ATGC lines because a header always starts with a greater-than sign (>), whereas a sequence line does not.
The header lines should be replaced in order of appearance, i.e. the first header is replaced with the first line from the list file, the second header with the second line, and so on.
I would also appreciate an explanation of the logic, if possible.
Thanks in advance.
With awk this is easy to do in one run.
Assuming your fasta file is named sequence.fa and your organisms list file is named list7b as in the question you can use
awk 'NR == FNR { o[n++] = $0; next } /^>/ && i < n { $0 = ">" o[i++] } 1' list7b sequence.fa > output.fa
Explanation:
NR == FNR is a condition that is true only while awk reads the first file (the total number of records read so far equals the number of records read from the current file).
{ o[n++] = $0; next } puts the input line into array o, counts the entries and skips further processing of the input line, so o will contain all your organism lines.
The next part is executed for the remaining file(s).
/^>/ && i < n is valid for lines that start with > as long as i is less than the number of elements n that were put into array o.
{ $0 = ">" o[i++] } replaces the current line with > followed by the array element (i.e. a line from the first file) and increments the index i to the next element.
1 is an "always true" condition with the implicit default action { print } to print the current line for every input line.
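To see the command at work, here is a small run on the sample data from the question. The temporary directory and the shortened ATGC sequences are illustrative assumptions; list7b, sequence.fa and output.fa are the names used above.

```shell
# Work in a scratch directory so no real files are overwritten
workdir=$(mktemp -d)
printf 'organism1\norganism2\n' > "$workdir/list7b"
printf '>header1\nATGC\n>header2\nATGC\n' > "$workdir/sequence.fa"

# Same awk program as in the answer: load the names, then rewrite each header
awk 'NR == FNR { o[n++] = $0; next } /^>/ && i < n { $0 = ">" o[i++] } 1' \
    "$workdir/list7b" "$workdir/sequence.fa" > "$workdir/output.fa"

cat "$workdir/output.fa"
```

Because of the i < n guard, headers beyond the end of the list are left unchanged rather than replaced with an empty name.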

AWK: How to cut and rearrange by index AND columns without overwriting the first argument to print

I'm trying to read a file line by line, do string manipulations on each line, and write the output to a file:
cat fileName | awk '{...}' >> fileOut
The specific manipulation I am trying to accomplish is, for each line: first print all the content after some index X (the same for every line), excluding the terminating newline, then " : ", then the first column (I could also get this with a substring if needed). I have found examples that assign column values to variables, set them to zero, or assign substrings to variables (with or without an end index), and that combine these with print/printf, but in all of them the use of substrings and of column indexing is mutually exclusive.
In every attempt to substitute one for the other in those examples, the content of the first column always seems to simply replace the content of the substring. As I have tried many ways around this, I will show only my most recent attempts.
Say a line of input was "1234 abcd efgh IJKL mnop" and I want to print everything from index 10, then " : " then column 1, my command would look like:
cat fileName | awk '{printf("%s : %s\n",substr($0,10),$1)}' >> fileOut
cat fileName | awk '{A=substr($0,10);B=$1;printf("%s : %s\n",A,B)}' >> fileOut
cat fileName | awk '{print substr($0,10)" : "$1}' >> fileOut
However, in every case so far, the returned string starts with " : ", followed by the contents of $1, followed by the substring with a consistent number of characters removed from its front, e.g.
" : 1234L mnop", when I expect "efgh IJKL mnop : 1234"
Why does using a column overwrite the return of substr?
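The question does not say how the input file was produced, so the following is only a guess at the cause. Output that appears to overwrite the start of the line is a typical symptom of Windows-style CRLF line endings: the substring then ends in a carriage return, which moves the terminal cursor back to the start of the line before " : " and $1 are drawn. A minimal sketch under that assumption strips the carriage return first. The file crlf_sample.txt is hypothetical, and index 11 is used instead of 10 because that is where "efgh" begins in the sample line.

```shell
# Hypothetical input line with a Windows-style CRLF ending
printf '1234 abcd efgh IJKL mnop\r\n' > crlf_sample.txt

# sub(/\r$/, "") removes a trailing carriage return before the line is used
fixed=$(awk '{ sub(/\r$/, ""); printf("%s : %s\n", substr($0, 11), $1) }' crlf_sample.txt)
printf '%s\n' "$fixed"
```

Piping the original output through od -c or cat -A is one way to check whether a carriage return is really present.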

Printing to file with vbNewLine results in double NewLine

I have the following simple line to print a new line to a log file:
Print #fileNumber, vbNewLine
However, this results in 2 newlines instead of one. My code does not have any other vbNewLines or anything that would print newlines.
If I do not have this print line, then I print no newlines, so this means this line is printing 2 newlines.
Does anyone have any ideas why?
Just tested this, and Print always adds a linebreak.
So simply using Print #fileNumber, will result in 1 blank line.
The problem is that Print already ends its output with a line break, so when you add vbNewLine you get 2 of them.

Substituting a single line with multiple lines of text

In Linux what command can I use to replace a single line of text with new multiple lines? I want to look for a keyword on a line and delete this line and replace it with multiple new lines. So in the text shown below I want to search for the line that contains "keyword" and replace the entire line with 3 new lines of text as shown.
For example replacing the line containing the keyword,
This is Line 1
This is Line 2 that has keyword
This is Line 3
changed to this:
This is Line 1
Inserted is new first line
Inserted is new second line
Inserted is new third line
This is Line 3
$ sed '/keyword/c\
> Inserted is new first line\
> Inserted is new second line\
> Inserted is new third line' input.txt
This is Line 1
Inserted is new first line
Inserted is new second line
Inserted is new third line
This is Line 3
$ and > are bash prompts.
Create a file, script.sed, containing:
/keyword/{i\
Inserted is new first line\
Inserted is new second line\
Inserted is new third line
d
}
Apply it to your data:
sed -f script.sed your_data
There are numerous variations on how to do it, using the c and a commands instead of i and/or d, but this is reasonably clean. It finds the keyword, inserts three lines of data, and then deletes the line containing the keyword. (The c command does that all, but I didn't remember that it existed, and the a command appends the text and is essentially synonymous with i in this context.)
You can do it using shell builtins too:
STRING1_WITH_MULTIPLE_LINES="your
text
here"
STRING2_WITH_MULTIPLE_LINES="more
text"
OUTPUT=""
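# IFS= and -r keep leading whitespace and backslashes intact;
# || [ "$LINE" ] also handles a last line without a trailing newline
while IFS= read -r LINE || [ "$LINE" ]; do
    case "$LINE" in
        "Entire line matches this") OUTPUT="$OUTPUT$STRING1_WITH_MULTIPLE_LINES
";;
        *"line matches this with extra before and/or after"*) OUTPUT="$OUTPUT$STRING2_WITH_MULTIPLE_LINES
";;
        *) OUTPUT="$OUTPUT$LINE
";;
    esac
done < file
# OUTPUT already ends with a newline, so print it without adding another
printf '%s' "$OUTPUT" > file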
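For comparison, the same replacement can be sketched in awk. This is a swapped-in alternative, not one of the approaches given above. The file input_sample.txt is a hypothetical copy of the example text from the question.

```shell
# Hypothetical copy of the example input from the question
printf 'This is Line 1\nThis is Line 2 that has keyword\nThis is Line 3\n' > input_sample.txt

replaced=$(awk '
    /keyword/ {
        # Emit the three new lines, then skip the matching line itself
        print "Inserted is new first line"
        print "Inserted is new second line"
        print "Inserted is new third line"
        next
    }
    { print }   # every other line passes through unchanged
' input_sample.txt)
printf '%s\n' "$replaced"
```

Like sed without -i, this writes to STDOUT, so the result has to be redirected to a new file if the original is to be replaced.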
