Move line(s) to follow another line in a file - linux
I have a file that contains a line like this:
check=('78905905f5a4ed82160c327f3fd34cba')
I'd like to be able to move this line to follow a line that looks like this:
files=('somefile.txt')
The array can sometimes span multiple lines, though, for example:
files=('somefile.txt'
'file2.png'
'another.txt'
'andanother...')
text
in between
check=('78905905f5a4ed82160c327f3fd34cba'
'5277a9164001a4276837b59dade26af2'
'3f8b60b6fbb993c18442b62ea661aa6b')
The array/line always ends in a ) and no text in between will contain a closing parenthesis.
I got some advice that awk can do this:
awk '/files/{
f=0
print $0
for(i=1;i<=d;i++){ print a[i] }
g=0
delete a # remove array after found
next
}
/check/{ f=1; g=1 }
f{ a[++d]=$0 }
!g' file
This only works when each array is on a single line, though. I was told to expand the search:
awk '/source/ && /\)$/{
f=0
print $0
for(i=1;i<=d;i++){ print a[i] }
g=0
delete a # remove array after found
next
}
/md5sum/ && /\)$/{ f=1; g=1 }
f{ a[++d]=$0 }
!g'
I'm just learning awk, so I'd appreciate help with this. Or if there is another tool that can do this, I'd like to hear about it. Someone told me that 'ed' has these types of capabilities.
To answer your last question first: yes, awk is the typical Unix tool for this. Other candidates are the incredibly powerful Perl, Python, or my favorite, Ruby. One advantage of awk is that it's always there; it's part of the base system. Another way to solve this kind of problem is with an editor script that drives ed(1) or ex(1).
Ok, new program for the revised question. This program will move the "check" lines either up or down as necessary so that they follow the "files" lines.
BEGIN {
checkAt = 0
filesAt = 0
scanning = 0
}
/check=\(/ {
checkAt = NR
scanning = 1
}
/files=\(/ {
filesAt = NR
scanning = 1
}
/\)$/ {
if (scanning) {
if (checkAt > filesAt) {
checkEnd = NR
} else {
filesEnd = NR
}
scanning = 0
}
}
{
lines[NR] = $0
}
END {
for (i = 1; i <= NR; ++i) {
if (checkAt <= i && i <= checkEnd) {
continue
}
print lines[i]
if (i == filesEnd) {
for (j = checkAt; j <= checkEnd; ++j) {
print lines[j]
}
}
}
}
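For comparison, here is a minimal Python sketch of the same buffer-and-reinsert idea. It assumes, as the question states, that each array starts at the beginning of a line and ends on the first line that ends in ), and that there is a single check/files pair per file. The function name move_check_after_files is made up for illustration.

```python
def move_check_after_files(lines):
    """Return the lines with the check=(...) block moved after files=(...).

    Assumes one check block and one files block, each ending on the
    first line that ends with ")".
    """
    check_block = []   # lines belonging to the check=(...) array
    rest = []          # every other line, in original order
    files_end = None   # index in `rest` just after the files=(...) array
    current = None     # array we are inside: "check", "files" or None
    for line in lines:
        if current is None:
            if line.startswith("check=("):
                current = "check"
            elif line.startswith("files=("):
                current = "files"
        if current == "check":
            check_block.append(line)
        else:
            rest.append(line)
        # An array is closed by the first line ending in ")".
        if current is not None and line.rstrip().endswith(")"):
            if current == "files":
                files_end = len(rest)
            current = None
    if files_end is None:
        return list(lines)   # no files block found: leave the input alone
    return rest[:files_end] + check_block + rest[files_end:]


if __name__ == "__main__":
    sample = ["check=('abc'", "'def')", "text", "files=('a.txt'", "'b.png')", "more"]
    print("\n".join(move_check_after_files(sample)))
```

Because the check block is collected separately from everything else, it does not matter whether it appears before or after the files block in the input.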
I looked into doing this with Awk, but it looked like you wouldn't really get anything clever out of it: it would be the same logic, with some Awk pain to go with it. So I did it in Perl :)
#!/usr/bin/perl
use strict;
use warnings;
open(my $in, '<', $ARGV[0]) or die("Could not open file: $ARGV[0]");
my $buffer = "";   # check=(...) lines held back until files=(...) has been printed
my $flag = 0;      # 0 = outside any array, 1 = inside check, 2 = inside files
while (my $line = <$in>) {
    if ($flag == 0 && $line =~ /^check=/) {
        $buffer .= $line;
        $flag = 1 unless $line =~ /\)/;   # a one-line array is already closed
    } elsif ($flag == 1) {
        $buffer .= $line;
        $flag = 0 if $line =~ /\)/;
    } elsif ($flag == 2 || $line =~ /^files=/) {
        print $line;
        $flag = 2;
        if ($line =~ /\)/) {              # end of files: emit the held check block
            $flag = 0;
            print $buffer;
            $buffer = "";
        }
    } else {
        print $line;
    }
}
close($in);
And the output :)
Chill:~ rus$ cat test
check=('78905905f5a4ed82160c327f3fd34cba'
'5277a9164001a4276837b59dade26af2'
'3f8b60b6fbb993c18442b62ea661aa6b')
text in between
files=('somefile.txt'
'file2.png'
'another.txt'
'andanother...')
asdasdasd
check=('78905905f5a4ed82160c327f3fd34cba'
'5277a9164001a4276837b59dade26af2'
'3f8b60b6fbb993c18442b62ea661aa6b')
text in between
files=('somefile.txt'
'file2.png'
'another.txt'
'andanother...')
asdsd
check=('78905905f5a4ed82160c327f3fd34cba'
'5277a9164001a4276837b59dade26af2'
'3f8b60b6fbb993c18442b62ea661aa6b')
text in between
files=('somefile.txt'
'file2.png'
'another.txt'
'andanother...')
Chill:~ rus$ ./t.pl test
text in between
files=('somefile.txt'
'file2.png'
'another.txt'
'andanother...')
check=('78905905f5a4ed82160c327f3fd34cba'
'5277a9164001a4276837b59dade26af2'
'3f8b60b6fbb993c18442b62ea661aa6b')
asdasdasd
text in between
files=('somefile.txt'
'file2.png'
'another.txt'
'andanother...')
check=('78905905f5a4ed82160c327f3fd34cba'
'5277a9164001a4276837b59dade26af2'
'3f8b60b6fbb993c18442b62ea661aa6b')
asdsd
text in between
files=('somefile.txt'
'file2.png'
'another.txt'
'andanother...')
check=('78905905f5a4ed82160c327f3fd34cba'
'5277a9164001a4276837b59dade26af2'
'3f8b60b6fbb993c18442b62ea661aa6b')
ta da ?! :D
Here's how to do it with sed:
sed -e /^check=(/,/)/{H;d} -e /)/{G;s/\n//} < filename
This assumes that there are no right parentheses after the "files=..." array. If there are, you'll need more precision:
sed -e /^check=(/,/)/{H;d} -e /^files=(/,/)/{/)/{G;s/\n//}} < filename
EDIT:
Working in bash? All right, try this:
sed -e '/^check=(/,/)/H' -e '/^check=(/,/)/d' -e '/)/G;s/\n//' < filename
This seems to work, but it's not clear to me why this variant works and not a few other obvious ones. This dance of the special characters is always a problem with regexes.
@todd, I seem to have left you in the lurch after providing you the awk solution, haven't I? :)
Here's another method, this time not using flags. There are some loose ends (hint: check the patterns p and q and the output again) that I leave to you to tidy up.
gawk 'BEGIN{
RS="check=[(]"
q="files=(.*\047)" # pattern to replace files= part
p=".*(files=(.*\047)).*" # to get the whole files= part to variable
}
NR>1{
b=gensub(p, "\\1","g",$0) # get the files=part to var b
printf "%s\n\n",b
printf "check=("
gsub(q,"",$0)
print $0
}' file
NB: gensub is specific to gawk, so this only works if you have gawk.
output
$ more file
check=('5277a9164001a4276837b59dade26af2'
'5277a9164001a4276837b59dade26af2'
'3f8b60b6fbb993c18442b62ea661aa6b')
text in between one
files=('somefile1.txt'
'file1.png'
'another1.txt'
'andanother1...')
asdasdasd blah blah
check=('78905905f5a4ed82160c327f3fd34cba'
'5277a9164001a4276837b59dade26af2'
'3f8b60b6fbb993c18442b62ea661aa6b')
text in between two
files=('somefile2.txt'
'file2.png'
'another2.txt'
'andanother2...')
asdsd blaasdf aslasdfaslj aslfjsldfsa 123e12
check=('78905905fblah blah5a4ed82160c327f3fd34cba'
'5277a9164001a4276837b59dade26af2'
'3f8b60b6fbb993c18442b62ea661aa6b')
text in between
files=('somefile3.txt'
'file3.png'
'another3.txt'
'andanother3...')
$ ./shell.sh
files=('somefile1.txt'
'file1.png'
'another1.txt'
'andanother1...'
check=('5277a9164001a4276837b59dade26af2'
'5277a9164001a4276837b59dade26af2'
'3f8b60b6fbb993c18442b62ea661aa6b')
text in between one
)
asdasdasd blah blah
files=('somefile2.txt'
'file2.png'
'another2.txt'
'andanother2...'
check=('78905905f5a4ed82160c327f3fd34cba'
'5277a9164001a4276837b59dade26af2'
'3f8b60b6fbb993c18442b62ea661aa6b')
text in between two
)
asdsd blaasdf aslasdfaslj aslfjsldfsa 123e12
files=('somefile3.txt'
'file3.png'
'another3.txt'
'andanother3...'
check=('78905905fblah blah5a4ed82160c327f3fd34cba'
'5277a9164001a4276837b59dade26af2'
'3f8b60b6fbb993c18442b62ea661aa6b')
text in between
)
This might work for you:
sed ':a;$!N;/^files=.*\ncheck=/{/.*)$/!ba;s/\([^)]*)\)\(.*\)\(\ncheck=.*\)/\1\3\2/p;d};/^files=.*/ba;P;D' file
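If a scripting language is an option, the same whole-buffer substitution can be sketched in Python with regular expressions. This is only a sketch under the question's assumptions (no ) inside the arrays, each array starting at the beginning of a line); move_check, CHECK and FILES are names I made up.

```python
import re

# A block runs from "check=(" or "files=(" at the start of a line up to the
# first ")"; [^)]* also matches newlines, so multi-line arrays are covered.
CHECK = re.compile(r"^check=\([^)]*\)\n?", re.MULTILINE)
FILES = re.compile(r"^files=\([^)]*\)\n?", re.MULTILINE)


def move_check(text):
    """Cut the first check=(...) block and paste it after files=(...)."""
    m = CHECK.search(text)
    if m is None:
        return text
    block = m.group(0)
    if not block.endswith("\n"):
        block += "\n"
    without = text[:m.start()] + text[m.end():]
    f = FILES.search(without)
    if f is None:
        return text          # nothing to attach the block to
    head = without[:f.end()]
    if not head.endswith("\n"):
        head += "\n"
    return head + block + without[f.end():]


if __name__ == "__main__":
    print(move_check("check=('1'\n'2')\ntext\nfiles=('a'\n'b')\nend\n"), end="")
```

Unlike the sed one-liner, this does not care whether the check block comes before or after the files block, at the cost of reading the whole file into memory.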
Related
How to delete a block of text in a file Shell Script?
I have the following scenario: I have a block of text, for example
basketball:
  ball: round
I don't know exactly what's inside basketball:, but I'd like to delete everything inside it. For example:
men:
  height: 170
  athlete: basketball
women:
  height:180
  athlete: basketball
I want to delete only the men block, ignoring whatever is above or below this key.
The AWK script filter.awk below removes all men sections that contain basketball. Is that what you mean? Run it with awk -f filter.awk input.txt.
/^[A-Za-z0-9]/ {
    if (sectionWanted) {
        printf "%s", section
    }
    sectionWanted = 1
    section = ""
    sectionName = $1
}
/basketball/ && sectionName == "men:" {
    sectionWanted = 0
}
{
    section = section $0 "\n"
}
END {
    if (sectionWanted) {
        printf "%s", section
    }
}
Split string to fixed length chunks and write in separate line in Raku
I have a file test.txt:
Stringsplittingskills
I want to read this file and write to another file out.txt with three characters in each line, like
Str
ing
spl
itt
ing
ski
lls
What I did:
my $string = "test.txt".IO.slurp;
my $start = 0;
my $elements = $string.chars;
# open file in writing mode
my $file_handle = "out.txt".IO.open: :w;
while $start < $elements {
    my $line = $string.substr($start,3);
    if $line.chars == 3 {
        $file_handle.print("$line\n")
    } elsif $line.chars < 3 {
        $file_handle.print("$line")
    }
    $start = $start + 3;
}
# close file handle
$file_handle.close
This runs fine when the length of the string is not a multiple of 3. When the string length is a multiple of 3, it inserts an extra newline at the end of the output file. How can I avoid inserting a newline at the end when the string length is a multiple of 3?
I tried another, shorter approach:
my $string = "test.txt".IO.slurp;
my $file_handle = "out.txt".IO.open: :w;
for $string.comb(3) -> $line {
    $file_handle.print("$line\n")
}
It still suffers from the same issue. I looked here and here, but I am still unable to solve it.
spurt "out.txt", "test.txt".IO.comb(3).join("\n")
Another approach using substr-rw.
subset PositiveInt of Int where * > 0;
sub break( Str $str is copy, PositiveInt $length ) {
    my $i = $length;
    while $i < $str.chars {
        $str.substr-rw( $i, 0 ) = "\n";
        $i += $length + 1;
    }
    $str;
}
say break("12345678", 3);
Output
123
456
78
The correct answer is of course to use .comb and .join.
That said, this is how you might fix your code. You could change the if line to check if it is at the end, and use else.
if $start+3 < $elements {
    $file_handle.print("$line\n")
} else {
    $file_handle.print($line)
}
Personally I would change it so that only the addition of \n is conditional.
while $start < $elements {
    my $line = $string.substr($start,3);
    $file_handle.print( $line ~ ( "\n" x ($start+3 < $elements) ));
    $start += 3;
}
This works because < returns either True or False. Since True == 1 and False == 0, the x operator repeats the \n at most once.
'abc' x 1; # 'abc'
'abc' x True; # 'abc'
'abc' x 0; # ''
'abc' x False; # ''
If you were very cautious you could use x+?. (Which is actually 3 separate operators.)
'abc' x 3; # 'abcabcabc'
'abc' x+? 3; # 'abc'
infix:« x »( 'abc', prefix:« + »( prefix:« ? »( 3 ) ) );
I would probably use loop if I were going to structure it like this.
loop ( my $start = 0; $start < $elements ; $start += 3 ) {
    my $line = $string.substr($start,3);
    $file_handle.print( $line ~ ( "\n" x ($start+3 < $elements) ));
}
Or instead of adding a newline to the end of each line, you could add it to the beginning of every line except the first.
while $start < $elements {
    my $line = $string.substr($start,3);
    my $nl = "\n";
    # clear $nl the first time through
    once $nl = "";
    $file_handle.print($nl ~ $line);
    $start = $start + 3;
}
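The reason joining avoids the stray newline is not specific to Raku. As a rough illustration, here is the same split-then-join idea sketched in Python; chunk_lines is a hypothetical helper name, not something from the question.

```python
def chunk_lines(text, size=3):
    """Split text into pieces of `size` characters, one piece per line.

    The pieces are joined with a newline between them, so nothing is
    written after the last piece, whatever the length of the input.
    """
    pieces = [text[i:i + size] for i in range(0, len(text), size)]
    return "\n".join(pieces)


if __name__ == "__main__":
    print(chunk_lines("Stringsplittingskills"))
```

A separator placed between items can never trail the last one, which is why no length check is needed.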
At the command line prompt, three one-liner solutions are below.
Using comb and batch (retains an incomplete set of 3 letters at the end):
~$ echo 'StringsplittingskillsX' | perl6 -ne '.join.put for .comb.batch(3);'
Str
ing
spl
itt
ing
ski
lls
X
Simplifying (no batch, only comb):
~$ echo 'StringsplittingskillsX' | perl6 -ne '.put for .comb(3);'
Str
ing
spl
itt
ing
ski
lls
X
Alternatively, using comb and rotor (discards an incomplete set of 3 letters at the end):
~$ echo 'StringsplittingskillsX' | perl6 -ne '.join.put for .comb.rotor(3);'
Str
ing
spl
itt
ing
ski
lls
how to iterate over two sets of data?
I'm trying to create my own program to do a recursive listing: each line corresponds to the full path of a single file. The tricky part I'm working on now is that I don't want bind mounts to trick my program into listing files twice.
So I already have a program that produces the right output, except that if /foo is bind mounted to /bar then my program incorrectly lists
/foo/file
/bar/file
I need the program to list just what's below (EDIT: even if it was asked to list the contents of /foo)
/bar/file
One approach I thought of is to run mount | grep bind | awk '{print $1 " " $3}' and then iterate over this to sed every line of the output, then sort -u. My question is: how do I iterate over the original output (a bunch of lines) and the output from mount (another bunch of lines)? (Or is there a better approach?) This needs to be POSIX (EDIT: and work with /bin/sh).
Place the 'mount | grep bind' command into the AWK program within a BEGIN block and store the data. Something like:
PROG | awk 'BEGIN{
    # Define the data you want to store
    # Assign to global arrays
    command = "mount | grep bind";
    while ((command | getline) > 0) {
        count++;
        mount[count] = $1;
        mountPt[count] = $3
    }
}
# Assuming input is line-by-line and that mountPt is the value
# that is undesired
{
    replaceLine=0
    for (i=1; i<=count; i++) {
        idx = index($1, mountPt[i]);
        if (idx == 1) {
            replaceLine = 1;
            break;
        }
    }
    if (replaceLine == 1) {
        sub(mountPt[i], mount[i], $1);
    }
    if (printed[$1] != 1) {
        print $1;
    }
    printed[$1] = 1;
}
'
Where I assume your current program, PROG, outputs to stdout.
find YourPath -print > YourFiles.txt
mount > Bind.txt
awk 'FNR == NR && $0 ~ /bind/ {
    Bind[ $1] = $3
    if( ( ThisLevel = split( $3, Unused, "/") - 1 ) > Level) Level = ThisLevel
}
FNR != NR && $0 !~ /^ *$/ {
    RealName = $0
    for( ThisLevel = Level; ThisLevel > 0; ThisLevel--){
        match( $0, "(/[^/]*){" ThisLevel "}" )
        UnBind = Bind[ substr( $0, 1, RLENGTH) ]
        if( UnBind !~ /^$/) {
            RealName = UnBind substr( $0, RLENGTH + 1)
            ThisLevel = 0
        }
    }
    if( ! File[ RealName]++) print RealName
}
' Bind.txt YourFiles.txt
The search is based on an exact path/bind comparison against a bind array that is loaded first.
Bind.txt and YourFiles.txt could be replaced by direct redirection, making this a single instruction with no temporary files.
The first part of the awk script has to be adapted if paths in the bind list contain space characters (assumed not to be the case here).
File paths are rewritten on the fly as they are read, by comparison with an existing bind relation.
A file is printed only if it is not yet known.
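The question asks for POSIX sh, so the following is only an illustration of the rewrite-then-deduplicate logic that both answers rely on, sketched in Python. The function dedupe_paths and its aliases argument are hypothetical; how the alias table is obtained from mount is left out.

```python
def dedupe_paths(paths, aliases):
    """Rewrite aliased directory prefixes, then drop paths already seen.

    `aliases` maps a directory prefix to the prefix that should replace
    it, e.g. {"/foo": "/bar"} when /foo and /bar show the same tree.
    """
    seen = set()
    result = []
    for path in paths:
        for old, new in aliases.items():
            prefix = old.rstrip("/")
            # Match whole path components only, so /foo does not match /foobar.
            if path == old or path.startswith(prefix + "/"):
                path = new.rstrip("/") + path[len(prefix):]
                break
        if path not in seen:
            seen.add(path)
            result.append(path)
    return result


if __name__ == "__main__":
    listing = ["/foo/file", "/bar/file", "/foobar/x"]
    print("\n".join(dedupe_paths(listing, {"/foo": "/bar"})))
```

Comparing on a component boundary (prefix plus "/") is a deliberate difference from a plain index()-style prefix test, which would also rewrite unrelated paths such as /foobar.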
remove a line with special character with given pattern
I'm trying to get the lines with special characters that are not prefixed with \. Below are the special characters:
^$%.*+?!(){}[]|\
I need to check the 2nd column for any of the above special characters that is not prefixed with \. I'm trying to do this with awk, but no luck. I want the output as below.
input.txt
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
8,wor\+k
output.txt
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
The 7th row and the 5th row are in output.txt because there are 2 special characters (one with a backslash, another without).
"final" final edit: I wanted to allow "\x" whatever x is, but the OP seems to not want that, so I fixed it too.
After trying to find a "clever" regexp (which choked on "\\" or any odd number of "\", but apparently worked for the rest...), I rewrote it in awk as a state machine.
The idea:
If in "normal mode" we encounter a special char other than "\", we print the line.
If in "normal mode" we encounter a "\", we enter "escaped mode", and in that mode we ignore the next char (but if we don't have a next char, we need to print that line too!).
The script:
awk -F"," '
{
    IN_ESCAPED_MODE=0 ;
    for (i=1 ; i<=length($2) ; i++) {
        char=substr($2,i,1)
        if ( IN_ESCAPED_MODE == 0) {
            if ( index(".^$%*+?!(){}[]|",char) > 0 ) { print $0 ; break ; }
            if ( index("\\" , char ) > 0 ) { IN_ESCAPED_MODE=1 ; continue ; }
        }
        if ( IN_ESCAPED_MODE == 1) {
            if ( index(".^$%*+?!(){}[]|\\",char) > 0 ) { IN_ESCAPED_MODE=0 ; continue ; }
            else { IN_ESCAPED_MODE=0 ; print $0; break; }
        }
    }
    if (IN_ESCAPED_MODE == 1) { print $0 }
}
' input.txt > output.txt
With this change, you will have the same output as the OP, which prints a line when it contains "\e", for example... which I find weird: to me "\e" is fine, we can "escape" anything?
With that input:
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
8,wor\+k
10,\
11,\\
12,\\\
13,.
14,\.
15,..
16,^
17,\^
18,$
19,\$
20,%
21,\%
22,*
23,\*
24,+
25,\+
26,?
27,\?
28,!
29,\!
30,(
31,\(
32,)
33,\)
34,{
35,\{
36,}
37,\}
38,[
39,\[
40,]
41,\]
42,|
43,\|
it outputs:
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
10,\
12,\\\
13,.
15,..
16,^
18,$
20,%
22,*
24,+
26,?
28,!
30,(
32,)
34,{
36,}
38,[
40,]
42,|
(so it appears to really work this time!)
If you prefer to allow any "\x" and NOT only if "x" is a SPECIAL char, change the "middle lines":
if ( IN_ESCAPED_MODE == 1) {
    if ( index(".^$%*+?!(){}[]|\\",char) > 0 ) { IN_ESCAPED_MODE=0 ; continue ; }
    else { IN_ESCAPED_MODE=0 ; print $0; break; }
}
into:
if ( IN_ESCAPED_MODE == 1) {
    IN_ESCAPED_MODE=0 ; continue ;
}
For historical reasons, here is the regexp (which worked in "most" cases but choked in some, for example if there was "\\"):
egrep '[^\][].^$%*+?!(){}[|]|[^\][\][^].^$%*+?!(){}[|\]' input.txt > output.txt
But that one will not display line 12, for example...
A good read: http://www.regular-expressions.info/charclass.html and http://www.gnu.org/software/gawk/manual/html_node/Gory-Details.html (scary ...)
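For readers who find the two-mode logic easier to follow outside awk, here is a small Python sketch of the same state machine. The names has_unescaped_special, filter_lines and the strict flag are mine; strict=True corresponds to the stricter behaviour above (a backslash may only precede a special character), strict=False to the "allow any \x" variant.

```python
SPECIAL = set("^$%.*+?!(){}[]|")


def has_unescaped_special(field, strict=True):
    """True if `field` should be reported.

    Normal mode: a special character reports the field, a backslash
    switches to escaped mode. Escaped mode: the next character is
    consumed; with strict=True it must itself be special or a backslash.
    A trailing lone backslash is also reported.
    """
    escaped = False
    for ch in field:
        if escaped:
            if strict and ch not in SPECIAL and ch != "\\":
                return True
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch in SPECIAL:
            return True
    return escaped


def filter_lines(lines, strict=True):
    """Keep the comma-separated lines whose 2nd column is reported."""
    return [ln for ln in lines
            if has_unescaped_special(ln.partition(",")[2], strict)]


if __name__ == "__main__":
    sample = ["1,ap^ple", r"5,sm\(ok\e", r"6,ra\in", r"8,wor\+k"]
    print("\n".join(filter_lines(sample)))
```

Keeping the mode in a single boolean makes runs of backslashes unambiguous, which is the case the regexp attempt struggled with.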
You can try the following:
awk '
{
    line=$0
    # remove every escaped special character, then look for any that remain
    gsub(/\\[\^$%.*+?!(){}\[\]|\\]/,"")
    if(/[\^$%.*+?!(){}\[\]|\\]/) print line
}' input.txt
sed '/[]\\^$%.*+?!(){}[|]/ {
h
s/\\[]\\^$%.*+?!(){}[|]/_/g
/[]\\^$%.*+?!(){}[|]/ {
x
p
}
}' YourFile
Depending on the shell and the sed version, this could be interpreted differently (especially the \). It works on my AIX/KSH.
Why is my word frequency counter example written in Perl failing to produce useful output?
I am very new to Perl, and I am trying to write a word frequency counter as a learning exercise. However, I am not able to figure out the error in my code below, after working on it. This is my code:
$wa = "A word frequency counter.";
@wordArray = split("",$wa);
$num = length($wa);
$word = "";
$flag = 1; # 0 if previous character was an alphabet and 1 if it was a blank.
%wordCount = ("null" => 0);
if ($num == -1) {
    print "There are no words.\n";
} else {
    print "$length";
    for $i (0 .. $num) {
        if(($wordArray[$i]!=' ') && ($flag==1)) { # start of a new word.
            print "here";
            $word = $wordArray[$i];
            $flag = 0;
        } elsif ($wordArray[$i]!=' ' && $flag==0) { # continuation of a word.
            $word = $word . $wordArray[$i];
        } elsif ($wordArray[$i]==' '&& $flag==0) { # end of a word.
            $word = $word . $wordArray[$i];
            $flag = 1;
            $wordCount{$word}++;
            print "\nword: $word";
        } elsif ($wordArray[$i]==" " && $flag==1) { # series of blanks.
            # do nothing.
        }
    }
    for $i (keys %wordCount) {
        print " \nword: $i - count: $wordCount{$i} ";
    }
}
It's neither printing "here", nor the words. I am not worried about optimization at this point, though any input in that direction would also be much appreciated.
This is a good example of a problem where Perl will help you work out what's wrong if you just ask it for help. Get used to always adding the lines:
use strict;
use warnings;
to the top of your Perl programs.
First off, $wordArray[$i]!=' ' should be $wordArray[$i] ne ' ' according to the Perl documentation for comparing strings and characters. Basically, use numeric operators (==, >=, …) for numbers and string operators (eq, ne, lt, …) for text. Also, you could do @wordArray = split(" ",$wa); instead of @wordArray = split("",$wa); and then you wouldn't need the wonky character checking and you never would have had the problem. @wordArray will already be split into words and you'll just have to count the occurrences.
You seem to be writing C in Perl. The difference is not just one of style. By exploding a string into an array of individual characters, you cause the memory footprint of your script to explode as well. Also, you need to think about what constitutes a word. Below, I am not suggesting that any \w+ is a word, rather pointing out the difference between \S+ and \w+.
#!/usr/bin/env perl
use strict;
use warnings;
use YAML;
my $src = '$wa = "A word frequency counter.";';
print Dump count_words(\$src, 'w');
print Dump count_words(\$src, 'S');
sub count_words {
    my $src = shift;
    my $class = sprintf '\%s+', shift;
    my %counts;
    while ($$src =~ /(?<sequence> $class)/gx) {
        $counts{ $+{sequence} } += 1;
    }
    return \%counts;
}
Output:
---
A: 1
counter: 1
frequency: 1
wa: 1
word: 1
---
'"A': 1
$wa: 1
=: 1
counter.";: 1
frequency: 1
word: 1
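The \w+ versus \S+ distinction carries over to other regex engines. As a rough cross-check, here is the same counting idea sketched in Python; count_words is just an illustrative name, and the pattern argument is my own addition.

```python
import re
from collections import Counter


def count_words(text, pattern=r"\w+"):
    """Count every non-overlapping match of `pattern` in `text`.

    r"\w+" treats runs of letters, digits and underscores as words;
    r"\S+" treats any run of non-whitespace as a word, punctuation included.
    """
    return Counter(re.findall(pattern, text))


if __name__ == "__main__":
    sentence = "A word frequency counter."
    print(count_words(sentence))           # keys are bare words
    print(count_words(sentence, r"\S+"))   # last key keeps its trailing dot
```

Which pattern is right depends on what you decide a word is; the counting itself is the same either way.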