how to iterate over two sets of data?

I'm trying to create my own program to do a recursive listing: each line corresponds to the full path of a single file. The tricky part I'm working on now is: I don't want bind mounts to trick my program into listing files twice.
So I already have a program that produces the right output except that if /foo is bind mounted to /bar then my program incorrectly lists
/foo/file
/bar/file
I need the program to list just what's below (EDIT: even if it was asked to list the contents of /foo)
/bar/file
One approach I thought of is to mount | grep bind | awk '{print $1 " " $3}' and then iterate over this to sed every line of the output, then sort -u.
My question is how do I iterate over the original output (a bunch of lines) and the output from mount (another bunch of lines)? (or is there a better approach) This needs to be POSIX (EDIT: and work with /bin/sh)

Put the 'mount | grep bind' command inside the awk script, in a BEGIN block, and store its output in arrays.
Something like:
PROG | awk 'BEGIN {
    # Read the bind mounts once, before any input is processed,
    # and store them in global arrays.
    command = "mount | grep bind"
    while ((command | getline) > 0) {
        count++
        mount[count] = $1
        mountPt[count] = $3
    }
    close(command)
}
# Assuming input is line-by-line and that mountPt is the value
# that is undesired
{
    for (i = 1; i <= count; i++) {
        if (index($1, mountPt[i]) == 1) {
            # Replace the prefix literally; sub() would treat it as a regex.
            $1 = mount[i] substr($1, length(mountPt[i]) + 1)
            break
        }
    }
    if (!printed[$1]++) {
        print $1
    }
} '
Where I assume your current program, PROG, outputs to stdout.

find YourPath -print > YourFiles.txt
mount > Bind.txt
awk 'FNR == NR && $0 ~ /bind/ {
    # First file (mount output): map each bind source to its mount point
    Bind[$1] = $3
    # Track the deepest source path, since sources are what gets looked up below
    if ((ThisLevel = split($1, Unused, "/") - 1) > Level) Level = ThisLevel
}
FNR != NR && $0 !~ /^ *$/ {
    RealName = $0
    # Try the longest possible path prefix first
    for (ThisLevel = Level; ThisLevel > 0; ThisLevel--) {
        match($0, "(/[^/]*){" ThisLevel "}")
        UnBind = Bind[substr($0, 1, RLENGTH)]
        if (UnBind !~ /^$/) {
            RealName = UnBind substr($0, RLENGTH + 1)
            ThisLevel = 0
        }
    }
    if (!File[RealName]++) print RealName
}
' Bind.txt YourFiles.txt
The search is an exact path/bind comparison against a bind array that is loaded first.
Bind.txt and YourFiles.txt could be replaced by direct redirections, giving a single command with no temporary files.
The first part of the awk script has to be adapted if paths in the mount output contain spaces (assumed not to be the case here).
File paths are rewritten on the fly while reading, whenever they match a known bind relation.
A file is printed only if it has not been seen yet.
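A minimal sketch of a single-pipeline variant, assuming (as in the question) that mount marks bind mounts with the word "bind" and that the paths shown by mount contain no spaces. Instead of two files, both data sets travel down one pipe with a tag in front of each line, so awk can tell them apart. The function name remap_bind_paths and the M/F tags are made up for this illustration.

```shell
#!/bin/sh
# remap_bind_paths: hypothetical helper, reads a tagged stream on stdin.
#   "M <mount output line>"  -> one line of mount output
#   "F <path>"               -> one path from the recursive listing
# All M lines must come before the first F line.
remap_bind_paths() {
    awk '
        /^M / {
            # Keep only bind mounts: $2 is the source, $4 the mount point
            if ($0 ~ /bind/) { n++; src[n] = $2; dst[n] = $4 }
            next
        }
        /^F / {
            path = substr($0, 3)
            for (i = 1; i <= n; i++) {
                len = length(src[i])
                # Match the source itself or anything below "source/"
                if (path == src[i] || substr(path, 1, len + 1) == src[i] "/") {
                    path = dst[i] substr(path, len + 1)
                    break
                }
            }
            if (!seen[path]++) print path
        }
    '
}

# Intended use (not run here):
#   { mount | sed "s/^/M /"; find /some/dir -print | sed "s/^/F /"; } | remap_bind_paths
```

Comparing the prefix with substr() rather than a regex keeps characters such as "." in directory names from being treated as pattern syntax.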

Related

Any one give me a solution for SORT

I want to sort data from the shortest line to the longest. The data contains
spaces, letters, digits, "-" and ",".
I used sort -n, but it did not do the job. Many thanks for any help.
Data here
0086
0086-
0086---
0086-------
0086-1358600966
0086-18868661318
00860
00860-13081022659
00860-131111111
00860-13176880028
00860-13179488252
00860-18951041771
00861
008629-83023520
0086000
0086010-61281306
and the result I want is
0086
0086-
00860
00861
0086000
0086---
0086-------
0086-1358600966
00860-131111111
008629-83023520
0086-18868661318
0086010-61281306
00860-13081022659
00860-13176880028
00860-13179488252
00860-18951041771
I do not care which characters the lines contain, only that they go from short to long. Two lines of the same length can be in either order; that is not a problem. Many thanks.

Perl one-liner
perl -0777 -ne 'print join("\n", map {$_->[1]} sort {$a->[0] <=> $b->[0]} map {[length, $_]} split /\n/), "\n"' file
Explanation on demand.
With GNU awk, it's very simple:
gawk '
    {len[$0] = length($0)}
    END {
        PROCINFO["sorted_in"] = "@val_num_asc"
        for (line in len) print line
    }
' file
See https://www.gnu.org/software/gawk/manual/html_node/Controlling-Scanning.html#Controlling-Scanning

This may help:
awk '{ print length($0) " " $0; }' "$file" | sort -n | cut -d ' ' -f 2-
Adding -r to sort reverses the order (longest first).

Using awk:
#!/usr/bin/awk -f
(l = length($0)) && !($0 in nextof) {
    if (l in start) {
        nextof[$0] = start[l]
    } else {
        if (!max || l > max) max = l
        if (!min || l < min) min = l
        nextof[$0] = 0
    }
    start[l] = $0
    ++count[l]
}
END {
    for (i = min; i <= max; ++i) {
        if (j = count[i]) {
            t = start[i]
            print t
            while (--j) {
                t = nextof[t]
                print t
            }
        }
    }
}
Usage:
awk -f script.awk file
Output:
0086
00861
00860
0086-
0086000
0086---
0086-------
008629-83023520
00860-131111111
0086-1358600966
0086010-61281306
0086-18868661318
00860-18951041771
00860-13179488252
00860-13176880028
00860-13081022659
Another Version:
#!/usr/bin/awk -f
(l = length($0)) && !($0 in nextof) {
    if (l in start) {
        nextof[lastof[l]] = $0
    } else {
        if (!max || l > max) max = l
        if (!min || l < min) min = l
        start[l] = $0
    }
    lastof[l] = $0
    ++count[l]
}
END {
    for (i = min; i <= max; ++i) {
        if (j = count[i]) {
            t = start[i]
            print t
            while (--j) {
                t = nextof[t]
                print t
            }
        }
    }
}
Output:
0086
0086-
00860
00861
0086---
0086000
0086-------
0086-1358600966
00860-131111111
008629-83023520
0086-18868661318
0086010-61281306
00860-13081022659
00860-13176880028
00860-13179488252
00860-18951041771

remove a line with special character with given pattern

I'm trying to get the lines that contain a special character which is not prefixed with \. These are the special characters:
^$%.*+?!(){}[]|\
I need to check the 2nd column for any of the above special characters that is not prefixed with \. I tried to do this with awk, but had no luck. I want the output shown below.
input.txt
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
8,wor\+k
output.txt
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
Rows 5 and 7 are in output.txt because each contains two special characters (one with a backslash, the other without).

"final" final edit: I wanted to allow "\x" whatever x is, but the OP seems to not want that, so I fixed it too.
After trying to find a "clever" regexp (which choked on "\\" or on any odd number of "\", but apparently worked for the rest...)
I rewrote it in awk as a small state machine.
The idea:
If, in "normal mode", we encounter a special char other than "\", we print the line.
If, in "normal mode", we encounter a "\", we enter "escaped mode", and in that mode we ignore the next char
(but if there is no next char, we need to print that line too!)
the script:
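awk -F"," '
{
    IN_ESCAPED_MODE = 0
    for (i = 1; i <= length($2); i++) {
        char = substr($2, i, 1)
        if (IN_ESCAPED_MODE == 0) {
            # Normal mode: an unescaped special char selects the line
            if (index(".^$%*+?!(){}[]|", char) > 0) {
                print $0
                break
            }
            # A backslash switches to escaped mode for the next char
            if (index("\\", char) > 0) {
                IN_ESCAPED_MODE = 1
                continue
            }
        }
        if (IN_ESCAPED_MODE == 1) {
            # Escaped mode: only a special char may follow the backslash
            if (index(".^$%*+?!(){}[]|\\", char) > 0) {
                IN_ESCAPED_MODE = 0
                continue
            }
            else {
                IN_ESCAPED_MODE = 0
                print $0
                break
            }
        }
    }
    # A trailing backslash escapes nothing, so the line is printed too.
    # (No break here: we are outside the loop, where gawk rejects break.)
    if (IN_ESCAPED_MODE == 1) {
        print $0
    }
}
' input.txt > output.txt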
With this change, you will have the same output as the OP, which prints a line when it contains "\e" for example... Which I find weird: to me "\e" is fine, we can "escape" anything?
With that input:
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
8,wor\+k
10,\
11,\\
12,\\\
13,.
14,\.
15,..
16,^
17,\^
18,$
19,\$
20,%
21,\%
22,*
23,\*
24,+
25,\+
26,?
27,\?
28,!
29,\!
30,(
31,\(
32,)
33,\)
34,{
35,\{
36,}
37,\}
38,[
39,\[
40,]
41,\]
42,|
43,\|
it outputs:
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
10,\
12,\\\
13,.
15,..
16,^
18,$
20,%
22,*
24,+
26,?
28,!
30,(
32,)
34,{
36,}
38,[
40,]
42,|
(so it appears to really work this time !)
If you prefer to allow any "\x" and NOT only if "x" is a SPECIAL char:
change the "middle lines":
if ( IN_ESCAPED_MODE == 1)
{ if ( index(".^$%*+?!(){}[]|\\",char) > 0 )
{ IN_ESCAPED_MODE=0 ; continue ;
}
else
{ IN_ESCAPED_MODE=0 ; print $0; break;
}
}
into:
if ( IN_ESCAPED_MODE == 1)
{ IN_ESCAPED_MODE=0 ; continue ;
}
for historical reason : the regexp (which worked in "most" cases but choked in some, for example if there was "\\") :
egrep '[^\][].^$%*+?!(){}[|]|[^\][\][^].^$%*+?!(){}[|\]' input.txt > output.txt
But that one will not display the line 12, for example...
A good read: http://www.regular-expressions.info/charclass.html .... and http://www.gnu.org/software/gawk/manual/html_node/Gory-Details.html (scary ...)

You can try the following:
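awk '
{
    line = $0
    # Remove every escaped special character (gsub, not just the first one),
    # then check whether any unescaped special character remains
    gsub(/\\[\^$%.*+?!(){}\[\]|\\]/, "")
    if (/[\^$%.*+?!(){}\[\]|\\]/)
        print line
}' input.txt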

sed '/[]\\^$%.*+?!(){}[|]/ {
h
s/\\[]\\^$%.*+?!(){}[|]/_/g
/[]\\^$%.*+?!(){}[|]/ {
x
p
}
}' YourFile
Depending on the shell and the sed implementation, this (especially the \) could be interpreted differently. It works on my AIX/KSH.

how to check if awk array is empty

I am brand new to AWK and am trying to determine whether my array is empty so I can print a message if so. Typically I am used to length functions and can check like that, but AWK does not seem to have those. Here is my working code; I just want to print a different message if there is nothing in the array after parsing all my data.
#add to array if condition is met
if ($2 == "SOURCE" && $4 == "RESTRICTED"){
sourceAndRestricted[$3]++;
}
#print out array
for (var in sourceAndRestricted){
printf "\t\t"var"\n"
}
I've tried something like this and it's not working. Suggestions?
for (var in sourceAndRestricted){
if (var > 1){
printf "\t\t"var"\n"
}
else {
print "NONE"
}
}

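Check it with the length() function (calling length() on an array works in gawk and several other awks, but not in every implementation):
if (length(sourceAndRestricted) > 0) {
    for (var in sourceAndRestricted) {
        printf "\t\t%s\n", var
    }
}
else {
    print "NONE"
}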

$ cat tst.awk
function isEmpty(arr, idx) {for (idx in arr) return 0; return 1}
BEGIN {
map[3] = 27
print isEmpty(map)
delete map[3]
print isEmpty(map)
}
$ awk -f tst.awk
0
1
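As a sketch of how the isEmpty idea could be wired into the code from the question: the function name report_restricted is made up for this example, and the column layout ($2, $3, $4) is simply taken from the question. The emptiness test loops over the array and returns on the first element, so it does not rely on length() accepting an array.

```shell
#!/bin/sh
# report_restricted: hypothetical wrapper; reads the given files, or stdin if none.
report_restricted() {
    awk '
        # Returns 1 if arr has no elements; idx is a local variable
        function is_empty(arr,   idx) { for (idx in arr) return 0; return 1 }

        # Collect column 3 of every SOURCE/RESTRICTED record
        $2 == "SOURCE" && $4 == "RESTRICTED" { sourceAndRestricted[$3]++ }

        END {
            if (is_empty(sourceAndRestricted)) {
                print "NONE"
            } else {
                for (var in sourceAndRestricted) printf "\t\t%s\n", var
            }
        }
    ' "$@"
}
```

The check has to happen in the END block: only then has all input been parsed, so only then is "nothing matched" known.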

grep lines before and after in aix/ksh shell

I want to extract lines before and after a matched pattern.
eg: if the file contents are as follows
absbasdakjkglksagjgj
sajlkgsgjlskjlasj
hello
lkgjkdsfjlkjsgklks
klgdsgklsdgkldskgdsg
I need to find hello and display the lines before and after 'hello'
the output should be
sajlkgsgjlskjlasj
hello
lkgjkdsfjlkjsgklks
This is possible with GNU but i need a method that works in AIX / KSH SHELL WHERE NO GNU IS INSTALLED.

sed -n '/hello/{x;G;N;p;};h' filename
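An awk sketch of the same idea, in case a more explicit version is easier to adapt. It keeps the whole file in memory, so it is only meant for files of modest size. The function name context1 is made up for this example, and the pattern is treated as an awk (extended) regular expression.

```shell
#!/bin/sh
# context1: hypothetical helper printing each matching line plus one line
# before and one line after it.
#   $1: regular expression; remaining arguments: files (stdin if none)
context1() {
    pat=$1
    shift
    awk -v pat="$pat" '
        { line[NR] = $0 }                       # remember every line
        $0 ~ pat {                              # mark the match and its neighbours
            want[NR - 1] = 1; want[NR] = 1; want[NR + 1] = 1
        }
        END {
            for (i = 1; i <= NR; i++)
                if (i in want) print line[i]    # each marked line is printed once
        }
    ' "$@"
}
```

Because lines are marked rather than printed immediately, overlapping contexts (two matches on adjacent lines) do not produce duplicate output.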

I've found it is generally less frustrating to build the GNU coreutils once, and benefit from many more features http://www.gnu.org/software/coreutils/

Since you'll have Perl on the machine, you could use the following code, but you'd probably do better to install the GNU utilities. This has options -b n1 for lines before and -f n2 for lines following the match. It works with PCRE matches (so if you want case-insensitive matching, add an i after the regex instead of using a -i option). I haven't implemented -v or -l; I didn't need those.
#!/usr/bin/env perl
#
# @(#)$Id: sgrep.pl,v 1.7 2013/01/28 02:07:18 jleffler Exp $
#
# Perl-based SGREP (special grep) command
#
# Print lines around the line that matches (by default, 3 before and 3 after).
# By default, include file names if more than one file to search.
#
# Options:
# -b n1 Print n1 lines before match
# -f n2 Print n2 lines following match
# -n Print line numbers
# -h Do not print file names
# -H Do print file names
use warnings;
use strict;
use constant debug => 0;
use Getopt::Std;
my(%opts);
sub usage
{
print STDERR "Usage: $0 [-hnH] [-b n1] [-f n2] pattern [file ...]\n";
exit 1;
}
usage unless getopts('hnf:b:H', \%opts);
usage unless @ARGV >= 1;
if ($opts{h} && $opts{H})
{
print STDERR "$0: mutually exclusive options -h and -H specified\n";
exit 1;
}
my $op = shift;
print "# regex = $op\n" if debug;
# print file names if -h omitted and more than one argument
$opts{F} = (defined $opts{H} || (!defined $opts{h} and scalar @ARGV > 1)) ? 1 : 0;
$opts{n} = 0 unless defined $opts{n};
my $before = (defined $opts{b}) ? $opts{b} + 0 : 3;
my $after = (defined $opts{f}) ? $opts{f} + 0 : 3;
print "# before = $before; after = $after\n" if debug;
my @lines = (); # Accumulated lines
my $tail = 0; # Line number of last line in list
my $tbp_1 = 0; # First line to be printed
my $tbp_2 = 0; # Last line to be printed
# Print lines from @lines in the range $tbp_1 .. $tbp_2,
# leaving $leave lines in the array for future use.
sub print_leaving
{
my ($leave) = @_;
while (scalar(@lines) > $leave)
{
my $line = shift @lines;
my $curr = $tail - scalar(@lines);
if ($tbp_1 <= $curr && $curr <= $tbp_2)
{
print "$ARGV:" if $opts{F};
print "$curr:" if $opts{n};
print $line;
}
}
}
# General logic:
# Accumulate each line at end of @lines.
# ** If current line matches, record range that needs printing
# ** When the line array contains enough lines, pop line off front and,
# if it needs printing, print it.
# At end of file, empty line array, printing requisite accumulated lines.
while (<>)
{
# Add this line to the accumulated lines
push @lines, $_;
$tail = $.;
printf "# array: N = %d, last = $tail: %s", scalar(@lines), $_ if debug > 1;
if (m/$op/o)
{
# This line matches - set range to be printed
my $lo = $. - $before;
$tbp_1 = $lo if ($lo > $tbp_2);
$tbp_2 = $. + $after;
print "# $. MATCH: print range $tbp_1 .. $tbp_2\n" if debug;
}
# Print out any accumulated lines that need printing
# Leave $before lines in array.
print_leaving($before);
}
continue
{
if (eof)
{
# Print out any accumulated lines that need printing
print_leaving(0);
# Reset for next file
close ARGV;
$tbp_1 = 0;
$tbp_2 = 0;
$tail = 0;
@lines = ();
}
}

I had a situation where I was stuck with a slow telnet session on a tablet, believe it or not, and I couldn't write a Perl script very easily with that keyboard. I came up with this hacky maneuver that worked in a pinch for me with AIX's limited grep. This won't work well if your grep returns hundreds of lines, but if you just need one line and one or two above/below it, this could do it. First I ran this:
cat -n filename |grep criteria
By including the -n flag, I see the line number of the data I'm seeking, like this:
2543 my crucial data
Since cat gives the line number 2 spaces before and 1 space after, I could grep for the line number right before it like this:
cat -n filename |grep " 2542 "
I ran this a couple of times to give me lines 2542 and 2544 that bookended line 2543. Like I said, it's definitely fallible, for example if you have reams of data that might have " 2542 " all over the place, but just to grab a couple of quick lines, it worked well.
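The manual steps above can be sketched as a small function using only grep -n, head, cut and sed, with no GNU-only options. The name around_first_match is made up for this example, and it only handles the first matching line, which fits the "just one line and its neighbours" case described here.

```shell
#!/bin/sh
# around_first_match: hypothetical helper.
#   $1: pattern (basic regular expression), $2: file name
# Prints the first matching line with the line before and the line after it.
around_first_match() {
    # grep -n prefixes each match with "LINENUMBER:"; keep the first number
    n=$(grep -n -- "$1" "$2" | head -n 1 | cut -d : -f 1)
    [ -n "$n" ] || return 1            # no match at all
    # sed has no line 0, so clamp the start when the match is on line 1
    if [ "$n" -gt 1 ]; then start=$((n - 1)); else start=1; fi
    sed -n "${start},$((n + 1))p" "$2"
}
```

Asking sed for a range by line number avoids the false hits that grepping for " 2542 " can produce when the same number also appears in the data.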

Sorting files with unordered multi-part key

Using any combination of Linux tools (without going into any full featured programming language) how can I sort this list
A,C 1
C,B 2
B,A 3
into
A,B 3
A,C 1
B,C 2

Not applying for any beauty contest, this seems to come close:
#!/bin/bash
while read one two; do
one=`echo $one | sed -e 's/,/\n/g' | sort | sed -e '
1 {h; d}
$! {H; d}
H; g; s/\n/,/g;
'`
echo $one $two
done | sort

Change the internal field separator so that read splits on both the space and the comma, then compare the first two letters with ">":
(
IFS=" ,";
while read a b n; do
if [ "$a" \> "$b" ]; then
echo "$b,$a $n";
else
echo "$a,$b $n";
fi;
done;
) <<EOF | sort
A,C 1
C,B 2
B,A 3
EOF
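The same normalise-then-sort idea can also be sketched with awk doing the swap, which avoids changing IFS. The function name sort_pairs is made up for this example, and it assumes exactly two comma-separated parts in the first column, as in the sample data.

```shell
#!/bin/sh
# sort_pairs: hypothetical helper; reads the given files, or stdin if none.
sort_pairs() {
    awk '{
        split($1, k, ",")
        # Appending "" forces a string comparison, even for numeric-looking keys
        if ((k[1] "") > (k[2] "")) $1 = k[2] "," k[1]
        print
    }' "$@" | sort
}
```

Assigning to $1 makes awk rebuild the line with single spaces between fields, which is harmless for input shaped like the sample.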

In case somebody is interested: I was not really satisfied with any of the suggestions, probably because I hoped for a solution of a few lines, and no such solution exists as far as I know.
Anyway, I wrote a utility called ljoin (for left join, as in databases) which does exactly what I was asking for (of course :D)
#!/usr/bin/perl
=head1 NAME
ljoin.pl - Utility to left join files by specified key column(s)
=head1 SYNOPSIS
ljoin.pl [OPTIONS] <INFILE1>..<INFILEN> <OUTFILE>
To successfully join rows one must supply at least one input file and exactly one output file. Input files can be real file names or a pattern, like [ABC].txt or *.in etc.
=head1 DESCRIPTION
This utility merges multiple files into one using the specified column as a key
=head2 OPTIONS
=item --field-separator=<separator>, -fs <separator>
Specifies what string should be used to separate columns in plain file. Default value for this option is tab symbol.
=item --no-sort-fields, -no-sf
Do not sort columns when creating a key for merging files
=item --complex-key-separator=<separator>, -ks <separator>
Specifies what string should be used to separate multiple values in multikey column. For example "A B" in one file can be presented as "B A" meaning that this application should somehow understand that this is the same key. Default value for this option is space symbol.
=item --no-sort-complex-keys, -no-sk
Do not sort complex column values when creating a key for merging files
=item --include-primary-field, -i
Specifies whether key which is used to find matching lines in multiple files should be included in the output file. First column in output file will be the key in any case, but in case of complex column the value of first column will be sorted. Default value for this option is false.
=item --primary-field-index=<index>, -f <index>
Specifies index of the column which should be used for matching lines. You can use multiple instances of this option to specify a multi-column key made of more than one column like this "-f 0 -f 1"
=item --help, -?
Get help and documentation
=cut
use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;
my $fieldSeparator = "\t";
my $complexKeySeparator = " ";
my $includePrimaryField = 0;
my $containsTitles = 0;
my $sortFields = 1;
my $sortComplexKeys = 1;
my @primaryFieldIndexes;
GetOptions(
"field-separator|fs=s" => \$fieldSeparator,
"sort-fields|sf!" => \$sortFields,
"complex-key-separator|ks=s" => \$complexKeySeparator,
"sort-complex-keys|sk!" => \$sortComplexKeys,
"contains-titles|t!" => \$containsTitles,
"include-primary-field|i!" => \$includePrimaryField,
"primary-field-index|f=i@" => \@primaryFieldIndexes,
"help|?!" => sub { pod2usage(0) }
) or pod2usage(2);
pod2usage(0) if $#ARGV < 1;
push @primaryFieldIndexes, 0 if $#primaryFieldIndexes < 0;
my %primaryFieldIndexesHash;
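for(my $i = 0; $i <= $#primaryFieldIndexes; $i++)
{
# Key the hash by the column index itself, since it is looked up by column index later
$primaryFieldIndexesHash{$primaryFieldIndexes[$i]} = 1;
}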
print "fieldSeparator = $fieldSeparator\n";
print "complexKeySeparator = $complexKeySeparator \n";
print "includePrimaryField = $includePrimaryField\n";
print "containsTitles = $containsTitles\n";
print "primaryFieldIndexes = @primaryFieldIndexes\n";
print "sortFields = $sortFields\n";
print "sortComplexKeys = $sortComplexKeys\n";
my $fieldsCount = 0;
my %keys_hash = ();
my %files = ();
my %titles = ();
# Read columns into a memory
foreach my $argnum (0 .. ($#ARGV - 1))
{
# Find files with specified pattern
my $filePattern = $ARGV[$argnum];
my @matchedFiles = < $filePattern >;
foreach my $inputPath (@matchedFiles)
{
open INPUT_FILE, $inputPath or die $!;
my %lines;
my $lineNumber = -1;
while (my $line = <INPUT_FILE>)
{
next if $containsTitles && $lineNumber == 0;
# Don't use chomp line. It doesn't handle unix input files on windows and vice versa
$line =~ s/[\r\n]+$//g;
# Skip lines that don't have columns
next if $line !~ m/($fieldSeparator)/;
# Split fields and count them (store maximum number of columns in files for later use)
my @fields = split($fieldSeparator, $line);
$fieldsCount = $#fields+1 if $#fields+1 > $fieldsCount;
# Sort complex key
my @multipleKey;
for(my $i = 0; $i <= $#primaryFieldIndexes; $i++)
{
my @complexKey = split ($complexKeySeparator, $fields[$primaryFieldIndexes[$i]]);
@complexKey = sort(@complexKey) if $sortFields;
push @multipleKey, join($complexKeySeparator, @complexKey)
}
# sort multiple keys and create key string
@multipleKey = sort(@multipleKey) if $sortFields;
my $fullKey = join $fieldSeparator, @multipleKey;
$lines{$fullKey} = \@fields;
$keys_hash{$fullKey} = 1;
}
close INPUT_FILE;
$files{$inputPath} = \%lines;
}
}
# Open output file
my $outputPath = $ARGV[$#ARGV];
open OUTPUT_FILE, ">" . $outputPath or die $!;
my @keys = sort keys(%keys_hash);
# Leave blank places for key columns
for(my $pf = 0; $pf <= $#primaryFieldIndexes; $pf++)
{
print OUTPUT_FILE $fieldSeparator;
}
# Print column headers
foreach my $argnum (0 .. ($#ARGV - 1))
{
my $filePattern = $ARGV[$argnum];
my @matchedFiles = < $filePattern >;
foreach my $inputPath (@matchedFiles)
{
print OUTPUT_FILE $inputPath;
for(my $f = 0; $f < $fieldsCount - $#primaryFieldIndexes - 1; $f++)
{
print OUTPUT_FILE $fieldSeparator;
}
}
}
# Print merged columns
print OUTPUT_FILE "\n";
foreach my $key ( @keys )
{
print OUTPUT_FILE $key;
foreach my $argnum (0 .. ($#ARGV - 1))
{
my $filePattern = $ARGV[$argnum];
my @matchedFiles = < $filePattern >;
foreach my $inputPath (@matchedFiles)
{
my $lines = $files{$inputPath};
for(my $i = 0; $i < $fieldsCount; $i++)
{
next if exists $primaryFieldIndexesHash{$i} && !$includePrimaryField;
print OUTPUT_FILE $fieldSeparator;
print OUTPUT_FILE $lines->{$key}->[$i] if exists $lines->{$key}->[$i];
}
}
}
print OUTPUT_FILE "\n";
}
close OUTPUT_FILE;
