Replace string in a file from a file [duplicate] - linux

I need help replacing strings in a file, where the "from" and "to" strings come from a given file.
fromto.txt:
"TRAVEL","TRAVEL_CHANNEL"
"TRAVEL HD","TRAVEL_HD_CHANNEL"
"FROM","TO"
The first column is what I'm searching for, which is to be replaced with the second column.
So far I wrote this small script:
while read p; do
var1=`echo "$p" | awk -F',' '{print $1}'`
var2=`echo "$p" | awk -F',' '{print $2}'`
echo "$var1" "AND" "$var2"
sed -i -e 's/$var1/$var2/g' test.txt
done <fromto.txt
Output looks good (x AND y), but for some reason it does not replace the first column ($var1) with the second ($var2).
test.txt:
"TRAVEL"
Output:
"TRAVEL" AND "TRAVEL_CHANNEL"
sed -i -e 's/"TRAVEL"/"TRAVEL_CHANNEL"/g' test.txt
"TRAVEL HD" AND "TRAVEL_HD_CHANNEL"
sed -i -e 's/"TRAVEL HD"/"TRAVEL_HD_CHANNEL"/g' test.txt
"FROM" AND "TO"
sed -i -e 's/"FROM"/"TO"/g' test.txt
$ cat test.txt
"TRAVEL"

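A note on why the loop in the question changes nothing: the sed expression is in single quotes, so the shell passes the literal text $var1 and $var2 to sed instead of their values. Below is a minimal sketch of the same loop with double quotes. It assumes GNU sed (for -i) and that the strings contain no characters that are special to sed, such as /, & or regex metacharacters; the scratch directory and the sample file contents are only for illustration.

```shell
#!/bin/bash
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Illustrative sample data in the same format as the question.
printf '%s\n' '"TRAVEL","TRAVEL_CHANNEL"' '"TRAVEL HD","TRAVEL_HD_CHANNEL"' > fromto.txt
printf '%s\n' '"TRAVEL"' '"TRAVEL HD"' > test.txt

# IFS=, splits each line on the comma, so awk is not needed here.
while IFS=, read -r from to; do
    # Double quotes let the shell expand $from and $to before sed sees them.
    sed -i -e "s/$from/$to/g" test.txt
done < fromto.txt

cat test.txt
```

Because the surrounding double quotes are part of each pattern, "TRAVEL" does not match inside "TRAVEL HD" here.
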
input:
➜ cat fromto
TRAVEL TRAVEL_CHANNEL
TRAVELHD TRAVEL_HD
➜ cat inputFile
TRAVEL
TRAVELHD
The work:
➜ awk 'BEGIN{while(getline < "fromto") {from[$1] = $2}} {for (key in from) {gsub(key,from[key])} print}' inputFile > output
and output:
➜ cat output
TRAVEL_CHANNEL
TRAVEL_CHANNEL_HD
➜
This first (BEGIN{}) loads the fromto file into an associative array, e.g. from["TRAVEL"] = "TRAVEL_CHANNEL". It then, rather inefficiently, performs a search and replace on each line of the input file for every array element and prints the result, which I redirected to a separate output file.
The caveat, you'll notice, is that the replacements can interfere with each other; the 2nd line of output is a perfect example, since a second replacement is applied to the result of the first. You can try ordering your replacements differently, or use a stricter regex that only matches whole words. Note that awk does not guarantee any particular order when iterating with for (key in from), so the result can differ between awk implementations. Something to get you started, anyway.
2nd caveat: there's probably a way to do the gsub for the whole file at once as the 2nd step of your BEGIN and make this much faster, but I'm not sure what it is.
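One way to sidestep the interference, sketched here under the assumption that each string to replace is a whole whitespace-separated field (as in the sample), is to look fields up in the array instead of calling gsub. Each field is then replaced at most once, and the iteration order of the array no longer matters. The name map and the scratch directory are only for illustration; note that assigning to a field makes awk rebuild that line with single spaces between fields.

```shell
# Work in a scratch directory with the sample files from above.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'TRAVEL TRAVEL_CHANNEL' 'TRAVELHD TRAVEL_HD' > fromto
printf '%s\n' 'TRAVEL' 'TRAVELHD' > inputFile

awk '
    # First file (fromto): remember each from -> to pair.
    NR == FNR { map[$1] = $2; next }
    # Second file: replace every field that is an exact key, then print.
    {
        for (i = 1; i <= NF; i++)
            if ($i in map) $i = map[$i]
        print
    }
' fromto inputFile > output

cat output
```

With the sample data this should give TRAVEL_CHANNEL and TRAVEL_HD, since TRAVELHD is looked up as a whole field rather than matched as a substring.
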

You can't do this in one shot; you have to use variables within a script.
Maybe something like the sed command below, for full replacement:
-bash-4.4$ cat > toto.txt
1
2
3
-bash-4.4$ cat > titi.txt
a
b
c
-bash-4.4$ sed 's|^\s*\(\S*\)\s*\(.*\)$|/^\2\\>/s//\1/|' toto.txt | sed -f - titi.txt > toto.txt
-bash-4.4$ cat toto.txt
a
b
c
-bash-4.4$


Filename manipulation

Kindly help me with a unix script to modify the filename in required format as shown below:
AN_555a_orange_20190513.txt
AN_555b_apple_20190513.txt
Required format: the first character of the fruit name should be capitalized, and the fruit name should be moved to the second position:
AN_Orange_555a_20190513.txt
AN_Apple_555b_20190513.txt
This should apply to all files present in the directory.
Below is the command I'm trying, which is not working:
for in in aaal*
do
out=${in#*_}
out=${out%_*_*_*}
out=${out%[0-9]}
out1=${out#*_}
out2=${out%_*}
AAAI_$out1$out2.txt
done
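This script is simple, but it worked with your sample. It only prints the new names; it does not rename anything:
#!/bin/bash
for i in AN*; do
    # Fields: $1=AN, $2=code, $3=fruit, $4=date.txt; capitalize the fruit and swap it with the code
    NAME=$(echo "$i" | awk -F_ '{printf "%s_%s%s_%s_%s", $1, toupper(substr($3,1,1)), substr($3,2), $2, $4}')
    echo "--> $NAME"
done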
An interesting solution for this case is to use sed, just like this:
$ ls -1 | sed 's/\(AN_\)\([^_]*_\)\([a-z]*_\)\([0-9]*.txt\)/mv "&" "\1\u\3\2\4"/e'
Note the final e at the end of the sed command. It is a GNU sed extension that tells sed to execute the result of the substitution as a shell command.
So if you remove the e (which you could do at first, to check the substitution works as expected), you would get in the console:
$ ls -1 | sed 's/\(AN_\)\([^_]*_\)\([a-z]*_\)\([0-9]*.txt\)/mv "&" "\1\u\3\2\4"/'
mv "AN_555a_orange_20190513.txt" "AN_Orange_555a_20190513.txt"
mv "AN_555b_apple_20190513.txt" "AN_Apple_555b_20190513.txt"
(The sed substitution matches the several groups of characters, reorders them and creates the mv ... ... line. Note that & in the replacement pattern denotes the whole pattern matched, and \u tells sed to put the next character as upper case.)
Then add back that final e, and instead of printing these lines sed will execute them, effectively renaming the files.
This one-liner could give you more ideas:
awk -F_ '{printf "mv %s %s_%s%s_%s_%s\n", $0, $1,toupper(substr($3,1,1)), substr($3, 2),$2,$4}' <(ls *.txt)
This will print something like:
mv AN_555a_orange_20190513.txt AN_Orange_555a_20190513.txt
mv AN_555b_apple_20190513.txt AN_Apple_555b_20190513.txt
Then, if you are happy with the results, pipe it to sh, for example:
awk -F_ '{printf "mv %s %s_%s%s_%s_%s\n", $0, $1,toupper(substr($3,1,1)), substr($3, 2),$2,$4}' <(ls *.txt) | sh
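For comparison, the parameter-expansion approach attempted in the question can be sketched as follows. This is only an illustration under the assumption that every file is named prefix_code_fruit_date.txt; the variable names are made up, and the sample files are created in a scratch directory so nothing real gets renamed.

```shell
# Create the two sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch AN_555a_orange_20190513.txt AN_555b_apple_20190513.txt

for f in AN_*_*_*.txt; do
    [ -e "$f" ] || continue          # the glob matched nothing
    prefix=${f%%_*}                  # AN
    rest=${f#*_}                     # 555a_orange_20190513.txt
    code=${rest%%_*}                 # 555a
    rest=${rest#*_}                  # orange_20190513.txt
    fruit=${rest%%_*}                # orange
    tail=${rest#*_}                  # 20190513.txt
    # Upper-case only the first character of the fruit name.
    first=$(printf '%s' "$fruit" | cut -c1 | tr '[:lower:]' '[:upper:]')
    mv -- "$f" "${prefix}_${first}${fruit#?}_${code}_${tail}"
done

ls
```

Replacing mv with echo mv is a cheap way to preview the renames first.
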

Iterative Bash Script Bug

Using a bash script, I'm trying to iterate through a text file that only has around 700 words, line-by-line, and run a case-insensitive grep search in the current directory using that word on particular files. To break it down, I'm trying to output the following to a file:
1. Append a newline to a file, then the searched word, then another newline
2. Append the results of the grep command using that search
3. Repeat steps 1 and 2 until all words in the list are exhausted
So for example, if I had this list.txt:
search1
search2
I'd want the results.txt to be:
search1:
grep result here
search2:
grep result here
I've found some answers throughout the stack exchanges on how to do this and have come up with the following implementation:
#!/usr/bin/bash
while IFS = read -r line;
do
"\n$line:\n" >> "results.txt";
grep -i "$line" *.in >> "results.txt";
done < "list.txt"
For some reason, however, this (and the numerous variants I've tried) isn't working. It seems trivial, but it's been frustrating me beyond belief. Any help is appreciated.
Your script would work if you changed it to:
while IFS= read -r line; do
    printf '\n%s:\n' "$line"
    grep -i "$line" *.in
done < list.txt > results.txt
but it'd be extremely slow. See https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for why you should think long and hard before writing a shell loop just to manipulate text. The standard UNIX tool for manipulating text is awk:
awk '
    NR==FNR { words2matches[$0]; next }
    {
        for (word in words2matches) {
            if ( index(tolower($0),tolower(word)) ) {
                words2matches[word] = words2matches[word] $0 ORS
            }
        }
    }
    END {
        for (word in words2matches) {
            print word ":" ORS words2matches[word]
        }
    }
' list.txt *.in > results.txt
The above is untested of course since you didn't provide sample input/output we could test against.
Possible problems:
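bash path - check where bash is installed; /bin/bash is a more common location than /usr/bin/bash
blank spaces - remove the spaces around = after IFS
echo - the line "\n$line:\n" has no command in front of it; use echo with the -e option so that escape sequences (here: '\n') are interpreted
semicolons - not required at end of line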
Try following script:
#!/bin/bash
while IFS= read -r line; do
    echo -e "\n$line:" >> "results.txt"
    grep -i "$line" *.in >> "results.txt"
done < "list.txt"
You do not even need to write a bash script for this purpose:
INPUT FILES:
$ more file?.in
::::::::::::::
file1.in
::::::::::::::
abc
search1
def
search3
::::::::::::::
file2.in
::::::::::::::
search2
search1
abc
def
::::::::::::::
file3.in
::::::::::::::
abc
search1
search2
def
search3
PATTERN FILE:
$ more patterns
search1
search2
search3
CMD:
$ grep -inf patterns file*.in | sort -t':' -k3 | awk -F':' 'BEGIN{OFS=FS}{if($3==buffer){print $1,$2}else{print $3; print $1,$2}buffer=$3}'
OUTPUT:
search1
file1.in:2
file2.in:2
file3.in:2
search2
file2.in:1
file3.in:3
search3
file1.in:4
file3.in:5
EXPLANATIONS:
grep -inf patterns file*.in greps all the file*.in files for every pattern listed in the patterns file (-f option); -i makes the match case-insensitive and -n adds the line numbers
sort -t':' -k3 sorts the output on the 3rd column to group identical patterns together
awk -F':' 'BEGIN{OFS=FS}{if($3==buffer){print $1,$2}else{print $3; print $1,$2}buffer=$3}' then prints the layout you want, using : as both field separator and output field separator; a buffer variable stores the pattern (3rd field), and the pattern is printed whenever it changes ($3!=buffer)

Copy first row to the last in file

The purpose here is to copy the first row of the file to the end of the file.
Here is the input file:
335418.75,2392631.25,36091,38466,1
335418.75,2392643.75,36092,38466,1
335418.75,2392656.25,36093,38466,1
335418.75,2392668.75,36094,38466,1
335418.75,2392681.25,36095,38466,1
335418.75,2392693.75,36096,38466,1
335418.75,2392706.25,36097,38466,1
335418.75,2392718.75,36098,38466,1
335418.75,2392731.25,36099,38466,1
Using the following code I got the desired output. Is there another, easier option?
awk 'NR==1 {print}' FF1-1.csv > tmp1
cat FF1-1.csv tmp1
Output desired
335418.75,2392631.25,36091,38466,1
335418.75,2392643.75,36092,38466,1
335418.75,2392656.25,36093,38466,1
335418.75,2392668.75,36094,38466,1
335418.75,2392681.25,36095,38466,1
335418.75,2392693.75,36096,38466,1
335418.75,2392706.25,36097,38466,1
335418.75,2392718.75,36098,38466,1
335418.75,2392731.25,36099,38466,1
335418.75,2392631.25,36091,38466,1
Thanks in advance.
Save the line in a variable and print at end using the END block
$ seq 5 | awk 'NR==1{fl=$0} 1; END{print fl}'
1
2
3
4
5
1
head can produce the same output as your awk, so you can cat that instead.
You can use process substitution to avoid the temporary file.
cat FF1-1.csv <(head -n 1 FF1-1.csv)
As mentioned by Sundeep, if process substitution isn't available you can simply cat the file and then head it sequentially to obtain the same result, putting both in a subshell if you need to redirect the output:
(cat FF1-1.csv; head -n1 FF1-1.csv) > dest
Another alternative would be to pipe the output of head to cat and refer to it with -, which for cat represents standard input:
head -1 FF1-1.csv | cat FF1-1.csv -
When you want to overwrite the existing file, the usual solutions can fail: do not write to a file you are still reading from.
A solution for editing the file is:
printf "%s\n" 1y $ x w q | ed -s file > /dev/null
Explanation:
printf will help for entering all commands in new lines.
1y will put the first line in a buf.
$ moves to the last line.
x will paste the contents of the buf.
w will write the results.
q will quit the editor.
ed is the editor that performs all work.
-s is suppressing diagnostics.
file is your input file.
> /dev/null is suppressing output to your screen.
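If ed is not available, the same in-place effect can be sketched with a temporary file: build the new contents elsewhere first, and only then write them back. This is a minimal illustration, assuming mktemp is installed; the name file and the scratch directory are just for the example.

```shell
# Sample file in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 1 2 3 4 5 > file

tmp=$(mktemp)
# Build the new contents in the temporary file; "file" is only read here.
{ cat file; head -n 1 file; } > "$tmp" &&
    cat "$tmp" > file    # write back only after the read has finished
rm -f "$tmp"

cat file
```

Writing back with cat rather than mv keeps the original file's permissions, at the cost of one extra copy.
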
With GNU sed:
seq 1 5 | sed '1h;$G'
Output:
1
2
3
4
5
1
1h: In first row: copy current row (pattern space) to sed's hold space
$G: In last row ($): append content from hold space to pattern space
See: man sed
The following solutions may also help:
Solution 1: using awk with RS and FS (without using variables):
awk -v RS="" -v FS="\n" '{print $0 ORS $1}' Input_file
Solution 2: using cat and head:
cat Input_file && head -n1 Input_file

Set an external variable in awk

I have written a script in which I want to count the number of columns in data.txt. My problem is that I am unable to set x in the awk script.
Any help would be highly appreciated.
while read p; do
x=1;
echo $p | awk -F' ' '{x=NF}'
echo $x;
file="$x"".txt";
echo $file;
done <$1
data.txt file:
4495125 94307025 giovy115p#live.it 94307025.094307025 12443
stazla deva1a23#gmail.com 1992/.:\1
1447585 gioao_87#hotmail.it h1st#1
saknit tomboro#seznam.cz 1233 1990
Expected output:
5.txt
3.txt
3.txt
4.txt
My output:
1.txt
1.txt
1.txt
1.txt
You cannot export a variable set in Awk to the shell context. In your example, the value of NF assigned to x inside Awk is not visible outside of it.
You need to use command substitution ($(..)) to capture the value of NF and use it later:
x=$(echo "$p" | awk '{print NF}')
Now x will contain the column count of each line. Note that you don't need -F' ', since whitespace is the default delimiter in awk.
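Put back into the loop from the question, that might look like the sketch below. The sample data and the names.txt output file are hypothetical, chosen only so that the example is self-contained.

```shell
# Hypothetical sample data with 5, 3 and 4 columns.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'a b c d e' 'f g h' 'i j k l' > data.txt

while IFS= read -r p; do
    # awk prints NF; $(...) captures that output into the shell variable x.
    x=$(printf '%s\n' "$p" | awk '{print NF}')
    file="$x.txt"
    echo "$file"
done < data.txt > names.txt

cat names.txt
```
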
Besides, your requirement can be done entirely in Awk itself.
awk 'NF{print NF".txt"}' file
Here the NF{..} pattern ensures that the action inside {..} is applied only to non-empty rows. Then, for each row, we print the number of fields with the extension .txt appended.
Awk processes a line at a time -- processing each line in a separate Awk script inside a shell while read loop is horrendously inefficient. See also https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice
Maybe something like this:
awk '{ print >(NF ".txt") }' data.txt
to create a file with the five-column rows in 5.txt, the four-column ones in 4.txt, the three-column rows in 3.txt, etc. for each unique column count.
The Awk variable NF contains the number of fields (by default, Awk splits fields on runs of whitespace -- use -F to change to some other separator) and the expression (NF ".txt") simply produces a string catenation of the number of fields with the suffix .txt which we pass as a file name to the print redirection.
With bash:
while read p; do p=($p); echo "${#p[@]}.txt"; done < file
or shorter:
while read -a p; do echo "${#p[@]}.txt"; done < file
Output:
5.txt
3.txt
3.txt
4.txt

Linux: Extract string from a line including delimiter character using sed command [duplicate]

For example
echo "abc-1234a :" | grep <do-something>
to print only abc-1234a
I think these are closer to what you're getting at, but without knowing what you're really trying to achieve, it's hard to say.
echo "abc-1234a :" | egrep -o '^[^:]+'
... though this will also match lines that have no colon. If you only want lines with colons, and you must use only grep, this might work:
echo "abc-1234a :" | grep : | egrep -o '^[^:]+'
Of course, this only makes sense if your echo "abc-1234a :" is an example that would be replaced with possibly multiple lines of input.
The smallest tool you could use is probably cut:
echo "abc-1234a :" | cut -d: -f1
And sed is always available...
echo "abc-1234a :" | sed 's/ *:.*//'
For this last one, if you only want to print lines that include a colon, change it to:
echo "abc-1234a :" | sed -ne 's/ *:.*//p'
Heck, you could even do this in pure bash:
while read line; do
field="${line%%:*}"
# do stuff with $field
done <<<"abc-1234a :"
For information on the %% bit, you can man bash and search for "Parameter Expansion".
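As a quick illustration of that expansion: %% removes the longest suffix that matches the pattern, while a single % removes the shortest. The sample string and variable names below are made up for the demonstration.

```shell
line='abc-1234a:foo:bar'

first=${line%%:*}      # longest suffix matching ":*" removed
shorter=${line%:*}     # shortest suffix matching ":*" removed

echo "$first"          # abc-1234a
echo "$shorter"        # abc-1234a:foo
```
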
UPDATE:
You said:
It's the characters in the first line of input before the colon. The
input could have multiple line though.
The solutions with grep probably aren't your best choice, then, since they'll also print data from subsequent lines that might include colons. Of course, there are many ways to solve this requirement as well. We'll start with sample input:
$ function sample { printf "abc-1234a:foo\nbar baz:\nNarf\n"; }
$ sample
abc-1234a:foo
bar baz:
Narf
You could use multiple pipes, for example:
$ sample | head -1 | grep -Eo '^[^:]*'
abc-1234a
$ sample | head -1 | cut -d: -f1
abc-1234a
Or you could use sed to process only the first line:
$ sample | sed -ne '1s/:.*//p'
abc-1234a
Or tell sed to exit after printing the first line (which is faster than reading the whole file):
$ sample | sed 's/:.*//;q'
abc-1234a
Or do the same thing but only show output if a colon was found (for safety):
$ sample | sed -ne 's/:.*//p;q'
abc-1234a
Or have awk do the same thing (as the last 3 examples, respectively):
$ sample | awk '{sub(/:.*/,"")} NR==1'
abc-1234a
$ sample | awk 'NR>1{nextfile} {sub(/:.*/,"")} 1'
abc-1234a
$ sample | awk 'NR>1{nextfile} sub(/:.*/,"")'
abc-1234a
Or in bash, with no pipes at all:
$ read line < <(sample)
$ printf '%s\n' "${line%%:*}"
abc-1234a
It is possible to do what you want with only sed.
Here is an example:
#!/bin/sh
filename=$1
pattern=yourpattern
# flag -n disables printing every line (default behavior)
sed -n "
1,/$pattern/ {
/$pattern/d # skip the line containing the pattern
p # print lines from line 1 up to the pattern
}
" "$filename"
exit 0
This works at least for GNU's sed. It should work for other sed too, except
regarding the comments (some implementations of sed don't support comments).
Source: https://www.grymoire.com/Unix/Sed.html
