Bash sed replace text with file content - linux

I would like to replace string with file.txt content.
mtn="John"
fs=`cat file.txt`
lgtxt=`cat large_text.txt`
stxt1=`echo $lgtxt | sed "s/zzzz/$mtn/g"`
stxt2=`echo $stxt1 | sed "s/pppp/$fs/g"`
It replace 'zzzz' with value of 'mnt' but doesn't 'pppp'.
File file.txt contain list of names eg:
Tom jones
Ted Baker
Linda Evans
in separate lines.
I want to place them in file large_text.txt in separate lines like they are in oryginal file and separated by commas.

You don't want to use a variable for the substitution (because it may well contain newlines, for example). I'm assuming that it's GNU sed given it's linux. In which case, see whether GNU sed's r command could help you:
`r FILENAME'
As a GNU extension, this command accepts two addresses.
Queue the contents of FILENAME to be read and inserted into the
output stream at the end of the current cycle, or when the next
input line is read. Note that if FILENAME cannot be read, it is
treated as if it were an empty file, without any error indication.
As a GNU `sed' extension, the special value `/dev/stdin' is
supported for the file name, which reads the contents of the
standard input.
If pppp is on a line of its own, you could go with something like
/pppp/{
r file.txt
d
}
or, alternatively, the s command with e modifier:
`e'
This command allows one to pipe input from a shell command into
pattern space. If a substitution was made, the command that is
found in pattern space is executed and pattern space is replaced
with its output. A trailing newline is suppressed; results are
undefined if the command to be executed contains a NUL character.
This is a GNU `sed' extension.
This would look something like
s/pppp/cat file.txt/e
and is what you'll need if pppp is mid-line. Also, if you need to do further processing on file.txt, you could replace cat with whatever you need (though you need to be careful about quoting / and \).
A final option is to consider Perl, which will accept something very similar to your shell commands.

Related

How to truncate rest of the text in a file after finding a specific text pattern, in unix?

I have a HTML PAGE which I have extracted in unix using wget command, in that after the word "Check list" I need to remove all of the text and with the remaining I am trying to grep some data. I am unable to think on a way which can be helpful for removing the text after a keyword. if I do
s/Check list.*//g
It just removes the line , I want everything below that to be gone. How do I perform this?
The other solutions you have so far require non-POSIX-mandatory tools (GNU sed, GNU awk, or perl) so YMMV with their availability and will read the whole file into memory at once.
These will work in any awk in any shell on every Unix box and only read 1 line at a time into memory:
awk -F 'Check list' '{print $1} NF>1{exit}' file
or:
awk 'sub(/Check list.*/,""){f=1} {print} f{exit}' file
With GNU awk for multi-char RS you could do:
awk -v RS='Check list' '{print; exit}' file
but that would still read all of the text before Check list into memory at once.
Depending on which sed version you have, maybe
sed -z 's/Check list.*//'
The /g flag is useless as you only want to replace everything once.
If your sed does not have the -z option (which says to use the ASCII null character as line terminator instead of newline; this hinges on your file not containing any actual nulls, but that should trivially be true for any text file), try Perl:
perl -0777 -pe 's/Check list.*//s'
Unlike sed -z, this explicitly says to slurp the entire file into memory (the argument to -0 is the octal character code of a terminator character, but 777 is not a valid terminator character at all, so it always reads the entire file as a single "line") so this works even if there are spurious nulls in your file. The final s flag says to include newline in what . matches (otherwise s/.*// would still only substitute on the matching physical line).
I assume you are aware that removing everything will violate the integrity of the HTML file; it needs there to be a closing tag for every start tag near the beginning of the document (so if it starts with <html><body> you should keep </body></html> just before the end of the file, for example).
With awk you could make use of RS variable and then set field separator to regex with word boundaries and then print the very first field as per need.
awk -v RS="^$" -v FS='\\<check_list\\>' '{print $1}' Input_file
You might use q to instruct GNU sed to quit, thus ending processing, consider following simple example, let file.txt content be
123
456
789
and say you want to jettison everything beyond 5, then you could do
sed '/5/{s/5.*//;q}' file.txt
which gives output
123
4
Explanation: for line having 5, substitute 5 and everything beyond it with empty string (i.e. delete it), then q. Observe that lowercase q is used to provide printing of altered line before quiting.
(tested in GNU sed 4.7)

Output the names of all files from file.txt, having the .conf extension

I need to output from a file file.txt the names of all files with the .conf extension.
grep .conf file.txt
But in the end, I get a file called dconf and a file with the config extension. How can I output everything else, but without these two?
The '.' has a special meaning, it says "any character". If you really want to match only the dot itself, you have to mask the character with:
grep "\.conf" file.txt
The masking with backslash must also be masked for the shell itself with ".
To see a list of regular expressions, you can take a look at online regex test.
Add on:
From the comments: How to see no file from the list which is named xyz.config
Answer: You have to tell grep that the regular expression ends at the end of the word with:
grep "\.conf\>" file.txt
TL;DR: you should instead do:
grep "\.conf\>" file.txt
grep uses Regular Expressions. The . character in a regex is a command which means "match any one character." So your command means "match any string which contains one character followed by c o n f in that order."
So, your regular expression will match what you are looking for, but it will also match strings that have things after your match (your .config example) as well as anything followed by "conf" (your dconf example)
So instead you want to tell grep that you are looking for a "string literal ." by escaping that character in your regular expression by preceding it with a backslash (\), and you want to describe what the end or your string input is like, which may be a newline or it may simply be a space.

How to add character at the end of specific line in UNIX/LINUX?

Here is my input file. I want to add a character ":" into the end of lines that have ">" at the beginning of the line. I tried seq -i 's|$|:|' input.txt but ":" was added to all the ending of each line. It is also hard to call out specific line numbers because, in each of my input files, the line contains">" present in different line numbers. I want to run a loop for multiple files so it is useless.
>Pas_pyrG_2
AAAGTCACAATGGTTAAAATGGATCCTTATATTAATGTCGATCCAGGGACAATGAGCCCA
TTCCAGCATGGTGAAGTTTTTGTTACCGAAGATGGTGCAGAAACAGATCTGGATCTGGGT
>Pas_rpoB_4
CAAACTCACTATGGTCGTGTTTGTCCAATTGAAACTCCTGAAGGTCCAAACATTGGTTTG
ATCAACTCGCTTTCTGTATACGCAAAAGCGAATGACTTCGGTTTCTTGGAAACTCCATAC
CGCAAAGTTGTAGATGGTCGTGTAACTGATGATGTTGAATATTTATCTGCAATTGAAGAA
>Pas_cpn60_2
ATGAACCCAATGGATTTAAAACGCGGTATCGACATTGCAGTAAAAACTGTAGTTGAAAAT
ATCCGTTCTATTGCTAAACCAGCTGATGATTTCAAAGCAATTGAACAAGTAGGTTCAATC
TCTGCTAACTCTGATACTACTGTTGGTAAACTTATTGCTCAAGCAATGGAAAAAGTAGGT
AAAGAAGGCGTAATCACTGTAGAAGAAGGCTCAGGCTTCGAAGACGCATTAGACGTTGTA
Here is experted output file:
>Pas_pyrG_2:
AAAGTCACAATGGTTAAAATGGATCCTTATATTAATGTCGATCCAGGGACAATGAGCCCA
TTCCAGCATGGTGAAGTTTTTGTTACCGAAGATGGTGCAGAAACAGATCTGGATCTGGGT
>Pas_rpoB_4:
CAAACTCACTATGGTCGTGTTTGTCCAATTGAAACTCCTGAAGGTCCAAACATTGGTTTG
ATCAACTCGCTTTCTGTATACGCAAAAGCGAATGACTTCGGTTTCTTGGAAACTCCATAC
CGCAAAGTTGTAGATGGTCGTGTAACTGATGATGTTGAATATTTATCTGCAATTGAAGAA
>Pas_cpn60_2:
ATGAACCCAATGGATTTAAAACGCGGTATCGACATTGCAGTAAAAACTGTAGTTGAAAAT
ATCCGTTCTATTGCTAAACCAGCTGATGATTTCAAAGCAATTGAACAAGTAGGTTCAATC
TCTGCTAACTCTGATACTACTGTTGGTAAACTTATTGCTCAAGCAATGGAAAAAGTAGGT
AAAGAAGGCGTAATCACTGTAGAAGAAGGCTCAGGCTTCGAAGACGCATTAGACGTTGTA
Do seq have more option to modify or the other commands can solve this problem?
sed -i '/^>/ s/$/:/' input.txt
Search the lines of input for lines that match ^> (regex for "starts with the > character). Those that do substitute : for end-of-line (you got this part right).
/ slashes are the standard separator character in sed. If you wish to use different characters, be sure to pass -e or s|$|:| probably won't work. Since / characters, unlike | characters, are not meaningful character within the shell, it's best to use them unless the pattern also contains slashes, in which case things get unwieldy.
Be careful with sed -i. Make a backup - make sure you know what's changing by using diff to compare the files.
On OSX -i requires an argument.
Using ed to edit the file:
printf "%s\n" 'g/^>/s/$/:/' w | ed -s input.txt
For every line starting with >, add a colon to the end, and then write the changed file back to disk.

OSX Terminal: how to add same item to each line of a csv file?

I have got a csv file like this:
120,256,300
36,255,12
etc...
I want to add a fixed string like 'USA' to all lines in order to obtain:
120,256,300,USA
36,255,12,USA
etc...
How can I do that?
Thanks
From a text processing point of view that CSV file is plain text in this context, you just want to attach , USA to each line.
The easiest (and operationally least expensive) way to do so is probably:
sed -i '' 's/$/, USA/' file
What this does is to instruct sed to look for the end of line $ and "replace" it with , USA. As sed is line-based this obviously doesn't actually trim out the new line of the file.
-i '' instructs sed to make the changes in-line without creating a backup file.
If you wanted a backup you can put the desired extension instead of '', e.g. -i .bak.
You can just use sed: cat <input-file> | sed 's/\(.*\)/\1, USA/'.
Here s is the substitute command, which uses the following character as a separator between a regular expression and a substitution. For the regular expression, the escaped parenthesis are used to create a capture group, the regex .* captures the entire line. For the substitution, the \1 inserts the first capture group, and then the , USA text is appended.
You can perform the replacement in place using: sed -i .bak 's/\(.*\)/\1, USA/' <input-file>

concatenation of strings in bash results in substitution

I need to read a file into an array and concatenate a string at the end of each line. Here is my bash script:
#!/bin/bash
IFS=$'\n' read -d '' -r -a lines < ./file.list
for i in "${lines[#]}"
do
tmp="$i"
tmp="${tmp}stuff"
echo "$tmp"
done
However, when I do this, an action of replace happens, instead of concatenation.
For example, in the file.list, we have:
http://www.example1.com
http://www.example2.com
What I need is:
http://www.example1.comstuff
http://www.example2.comstuff
But after executing the script above, I get things as below on the terminal:
stuff//www.example1.com
stuff//www.example2.com
Btw, my PC is Mac OS.
The problem also occurs while concatenating strings via awk, printf, and echo commands. For example echo $tmp"stuff" or echo "${tmp}""stuff"
The file ./file.lst is, most probably, generated on a Windows system or, at least, it was saved using the Windows convention for end of line.
Windows uses a sequence of two characters to mark the end of lines in a text file. These characters are CR (\r) followed by LF (\n). Unix-like systems (Linux and macOS starting with version 10) use LF as end of line character.
The assignment IFS=$'\n' in front of read in your code tells read to use LF as line separator. read doesn't store the LF characters in the array it produces (lines[]) but each entry from lines[] ends with a CR character.
The line tmp="${tmp}stuff" does what is it supposed to do, i.e. it appends the word stuff to the content of the variable tmp (a line read from the file).
The first line read from the input file contains the string http://www.example1.com followed by the CR character. After the string stuff is appended, the content of variable tmp is:
http://www.example1.com$'\r'stuff
The CR character is not printable. It has a special interpretation when it is printed on the terminal: it sends the cursor at the start of the line (column 1) without changing the line.
When echo prints the line above, it prints (starting on a new line) http://www.example1.com, then the CR character that sends the cursor back to the start of the line where is prints the string stuff. The stuff fragment overwrites the first 5 characters already printed on that line (http:) and the result, as it is visible on screen, is:
stuff//www.example1.com
The solution is to get rid of the CR characters from the input file. There are several ways to accomplish this goal.
A simple way to remove the CR characters from the input file is to use the command:
sed -i.bak s/$'\r'//g file.list
It removes all the CR characters from the content of file file.list, saves the updated string back into the file.list file and stores the original file.list file as file.list.bak (a backup copy in case it doesn't produce the output you expect).
Another way to get rid of the CR character is to ask the shell to remove it in the command where stuff is appended:
tmp="${tmp/$'\r'/}stuff"
When a variable is expanded in a construct like ${tmp/a/b}, all the appearances of a in $tmp are replaced with b. In this case we replace \r with nothing.
I'm guessing it's have something to do with the Carriage Return character.
Did your file.list created on windows? If so, try to use dos2unix before running the script.
Edit
You can check your files using the file command.
Example:
file file.list
If you saved the file in Windows Notepad like this:
Then it will probably come up like this:
file.list: ASCII text, with no line terminators
You can use built in tools like iconv to convert the encodings. However for a simple use like this, you can just use a command that works for multiple encodings without any conversion necessary.
You could simply buffer the file through cat, and use a regular expression that applies to either:
Carriage return followed by line terminator, or
Line terminator on it's own
Then append the string.
Example:
cat file.list | grep -E -v "^$" | sed -E -e "s/(\r?$)/stuff/g"
Will work with ASCII text, and ASCII text with no line terminators.
If you need to modify a stream to append a fixed string, you can use sed or awk, for instance:
sed 's/$/stuff/'
to append stuff to the end of each line.
using "dos2unix file.list" would also solve the problem

Resources