Scripting - copy line and the second line IF the second line has a string - linux

I have a large number of files that I need to scan, returning a line and the line that follows it, but only when the following line begins with a given string.
String one - line one must begin with 'Bill'
String two - line two must begin with 'Jones'.
If these two criteria are matched, it returns the two lines. Repeat for the whole file.
i.e. original file:
Edith Blue
Edith Green
Edith Red
Bill Blue
Jones Red
Edith Green
Bill Green
Edith Red
Jones Green
Bill Blue
I'd want it to return only:
Bill Blue
Jones Red
Any ideas? No idea where to begin with this, I only have basic scripting skills with sed/awk etc... At the moment I am using this to get the filename and its following line, but it is giving me too much useless information that I have to strip off with other sed commands.
grep -A 1 "^Bill" * > test.txt
I guess there's a far more elegant way of getting only the lines I need. Any help would be lovely!

As an extension of your initial approach, a simple solution is to grep for lines starting with "Bill", returning one line after each match, then grep that output for lines starting with "Jones", returning one line before each match:
grep -A1 "^Bill" myfile.txt | grep "^Jones" -B1
Output:
Bill Blue
Jones Red
Side note: as a true test, your input file should probably have some lines where Bill and Jones are not at the start of the line...
Edith Blue
Edith Jones
Edith Red
Bill Blue
Jones Red
Edith Bill
Bill Jones
Edith Red
Jones Green
Bill Blue

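Use awk's getline to read the line that follows each line beginning with Bill. Note that getline consumes that following line, so in a run such as Bill, Bill, Jones the second Bill line is never tested as the start of a pair:
awk '
# For each line whose first field begins with Bill, read the next line into l
$1 ~ /^Bill/ {
    # getline returns 1 on success and 0 at end of input
    if ( (getline l) > 0 && l ~ /^Jones/ ) {
        printf "%s\n%s\n", $0, l
    }
}
' infile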
It yields:
Bill Blue
Jones Red

And here is another way using awk with a flag:
$ awk '$1=="Bill"{p=1;a=$0;next};$1=="Jones"&&p{print a;print};{p=0}' file
Bill Blue
Jones Red

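Here is a simple Python script (written for Python 3):
FILE = 'test.text'
one = 'Bill'
two = 'Jones'
prev = ''
# Remember the previous line and compare it with the current one
with open(FILE, 'r') as f:
    for line in f:
        if prev.startswith(one) and line.startswith(two):
            # Strip the trailing newlines so each line prints exactly once
            print(prev.rstrip())
            print(line.rstrip())
        prev = line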
Yields:
python FileRead.py
Bill Blue
Jones Red
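Since the question mentions a large number of files, here is a minimal sketch of the same previous-line idea wrapped in a reusable generator. The name find_pairs and its parameters are made up for illustration; it accepts any iterable of lines, so an open file object can be passed in place of the list.

```python
from typing import Iterable, Iterator, Tuple


def find_pairs(lines: Iterable[str], first: str = "Bill",
               second: str = "Jones") -> Iterator[Tuple[str, str]]:
    """Yield (line, next_line) when line starts with `first` and the
    line directly after it starts with `second`."""
    prev = None
    for raw in lines:
        line = raw.rstrip("\n")
        if prev is not None and prev.startswith(first) and line.startswith(second):
            yield prev, line
        prev = line


# Sample data taken from the side note above (names not at line start included).
sample = [
    "Edith Blue", "Edith Jones", "Edith Red", "Bill Blue", "Jones Red",
    "Edith Bill", "Bill Jones", "Edith Red", "Jones Green", "Bill Blue",
]

for bill_line, jones_line in find_pairs(sample):
    print(bill_line)
    print(jones_line)
```

To scan many files, one option is to loop over the file names and call find_pairs(open_file) for each, printing the file name only when a pair is found.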

This might work for you (GNU sed):
sed -n '$!N;/^Bill.*\nJones/p;D' file

Related

How to add 10 spaces at the front of each line | Unix |

I have a text file, ABC.txt, which contains the data below
A Apple a day keeps a doctor away
B I like to play with Ball
C I have cat at my home
D My Dog name is bob
I want to display the output on my screen with 10 spaces in front of each line, followed by my file data
Expected output :
          A Apple a day keeps a doctor away
          B I like to play with Ball
          C I have cat at my home
          D My Dog name is bob
I have tried this, but it is not working
Command :
cat ABC.txt | column -t
prefix='          '
sed "s/^/$prefix/" ABC.txt
^ matches the beginning of the line, and this replaces it with 10 spaces.
If the number of spaces can vary, you can calculate the prefix variable instead of hard-coding it. See How can I repeat a character in Bash?
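For comparison, here is a small Python sketch of the same idea with a computed prefix. The function name indent_lines and the width parameter are hypothetical, chosen only for this example; like the sed command, it prefixes every line, including empty ones.

```python
def indent_lines(lines, width=10):
    """Return the lines with `width` spaces added at the front of each."""
    prefix = " " * width  # build the prefix instead of hard-coding it
    return [prefix + line for line in lines]


data = [
    "A Apple a day keeps a doctor away",
    "B I like to play with Ball",
]

for line in indent_lines(data, width=10):
    print(line)
```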

Printing First Variable in Awk but Only If It's Less than X

I have a file with words and I need to print only the lines that are less than or equal to 4 characters but I'm having trouble with my code. There is other text on the end of the lines but I shortened it for here.
file:
John Doe
Jane Doe
Mark Smith
Abigail Smith
Bill Adams
What I want to do is print the names that have less than 4 characters.
What I've tried:
awk '$1 <= 4 {print $1}' inputfile
What I'm hoping to get:
John
Jane
Mark
Bill
So far, I've got nothing. Either it prints out everything, with no length restrictions or it doesn't even print anything at all. Could someone take a look at this and see what they think?
Thanks
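First, let's understand why
awk '$1 <= 4 {print $1}' inputfile
does not give the names you want. $1 <= 4 never looks at the length of the first field; it compares the field's value with 4. Because a value such as
John
does not look like a number, awk compares it with 4 as a string, so the result depends on character ordering (digits normally sort before letters, so nothing is printed). If the field is forced into a numeric context, for example with $1+0 <= 4, then, as the GNU AWK manual section Strings And Numbers puts it,
A string is converted to a number by interpreting any numeric prefix
of the string as numerals(...)Strings that can’t be interpreted as
valid numbers convert to zero.
Therefore the numeric value of John from GNU AWK's point of view is zero, and every line would be printed.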
To get the desired output you can use the length function, which returns the number of characters, as follows
awk 'length($1)<=4{print $1}' inputfile
or alternatively a pattern that matches from 0 to 4 characters, that is
awk '$1~/^.{0,4}$/{print $1}' inputfile
where $1~ means check whether the 1st field matches, . denotes any character, {0,4} means from 0 to 4 repetitions, ^ is the beginning of the string and $ is the end of the string (these two are required, as otherwise longer strings would also match, since they contain a substring matching .{0,4})
Both commands, for inputfile
John Doe
Jane Doe
Mark Smith
Abigail Smith
Bill Adams
give output
John
Jane
Mark
Bill
(tested in gawk 4.2.1)
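The length-based test can also be sketched in Python, where the difference between a value and its length is explicit. The function name short_first_words and the max_len parameter are assumptions made for this example only.

```python
def short_first_words(lines, max_len=4):
    """Return the first word of each line when it has at most max_len characters."""
    result = []
    for line in lines:
        fields = line.split()
        # Compare the length of the first field, not the field itself.
        if fields and len(fields[0]) <= max_len:
            result.append(fields[0])
    return result


names = ["John Doe", "Jane Doe", "Mark Smith", "Abigail Smith", "Bill Adams"]

for word in short_first_words(names):
    print(word)
```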

Manipulate CSV file: increment cell coordinates/position

I have a csv file with one entry on each line; three entries form a whole dataset. What I need to do now is put each set into the columns of a single row. I have difficulties describing the problem (so my search did not give me a solution), so here's an example.
Sample CSV file:
1 Joe
2 Doe
3 7/7/1990
4 Jane
5 Done
6 6/6/2000
What I want in the end is this:
1 Name Surname Birthdate
2 Joe Doe 7/7/1990
3 Jane Done 6/6/2000
I'm trying to find a solution to make this automatically, as my actual file consists of 480 datasets, each set containing 16 entries, and it would take me days to do it manually.
I was able to fill the first line with Excel's indirect function:
=INDIRECT("A"&COLUMN()-COLUMN($A1))
As COLUMN returns the column number, if I drag the first line down in Excel, obviously this shows exactly the same as the first line:
1 Name Surname Birthdate
2 Joe Doe 7/7/1990
3 Joe Doe 7/7/1990
Now I'm looking for a way to increment the cell position by one:
A B C D
1 Joe =A1 =B1+1 =C1+1
2 Doe =D1+1
3 7/7/1990
4 Jane
Which should lead to:
A B C D
1 Joe =A1 =A2 =A3
2 Doe =A4 =A5 =A6
3 7/7/1990
4 Jane
As you can see in the example given, the cell coordinates for A increment by one, and I have no idea how to do this automatically in Excel. I think there must be a better way than using nested Excel function, as the task (increment +1) seems actually pretty easy.
I'm also open to solutions involving sed, awk (of which I only have a very superficial knowledge) or other command line tools.
Your help is appreciated very much!
awk 'BEGIN { y=1; x=1; print "Name Surname Birthdate" }
{
    if (x == 1) printf "%s", y    # start each row with its row number
    printf " %s", $2
    if (x == 3) {                 # third item: end the row and reset
        printf "\n"
        y = y + 1
        x = 1
    }
    else {
        x = x + 1
    }
}' input_file.txt
This may work for what you want to do. Your sample does not include the commas, so I'm not sure if they are really in there or not. If they are, you will need to modify the code slightly with the -F, flag so that it treats them as such.
This second code snippet will provide the output with a comma delimiter. Again, it is assuming that your sample input file did not have commas to delimit the 1 Joe and 2 Doe.
awk 'BEGIN { y=1; x=1; print "Name,Surname,Birthdate" }
{
    if (x == 1) printf "%s", y    # start each row with its row number
    printf ",%s", $2
    if (x == 3) {                 # third item: end the row and reset
        printf "\n"
        y = y + 1
        x = 1
    }
    else {
        x = x + 1
    }
}' input_file.txt
Both of the awk scripts will set x and y variables to one, where the y variable will increment your line numbering. The x variable will count up to 3 and then reset itself back to one. This is so that it prints each line in a row, until it gets to the 3rd item where it will then insert a newline character.
There are easier/more complex ways to do this with regexes and a language like perl, but since you mentioned awk, I believe this will work fine.
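As one example of another language, the same grouping can be sketched in Python by slicing the list of values into chunks. This assumes, like the awk scripts, that each input line is an index followed by a value separated by whitespace; rows_from_entries, per_row and header are names invented for this sketch. For the real file, per_row would be 16.

```python
def rows_from_entries(lines, per_row=3, header=("Name", "Surname", "Birthdate")):
    """Group the value on each 'index value' line into rows of per_row items."""
    values = []
    for line in lines:
        parts = line.split(maxsplit=1)
        if len(parts) == 2:          # skip blank or index-only lines
            values.append(parts[1].strip())
    rows = [list(header)]
    for start in range(0, len(values), per_row):
        rows.append(values[start:start + per_row])
    return rows


entries = ["1 Joe", "2 Doe", "3 7/7/1990", "4 Jane", "5 Done", "6 6/6/2000"]

for row in rows_from_entries(entries):
    print(",".join(row))
```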

How to sort lines in textfile according to a second textfile

I have two text files.
File A.txt:
john
peter
mary
alex
cloey
File B.txt
peter does something
cloey looks at him
franz is the new here
mary sleeps
I'd like to
merge the two
sort one file according to the other
put the unknown lines of B at the end
like this:
john
peter does something
mary sleeps
alex
cloey looks at him
franz is the new here
$ awk '
NR==FNR { b[$1]=$0; next }
{ print ($1 in b ? b[$1] : $1); delete b[$1] }
END { for (i in b) print b[i] }
' fileB fileA
john
peter does something
mary sleeps
alex
cloey looks at him
franz is the new here
The above will print the remaining items from fileB in a "random" order (see http://www.gnu.org/software/gawk/manual/gawk.html#Scanning-an-Array for details). If that's a problem then edit your question to clarify your requirements for the order those need to be printed in.
It also assumes the keys in each file are unique (e.g. peter only appears as a key value once in each file). If that's not the case then again edit your question to include cases where a key appears multiple times in your sample input/output and additionally explain how you want them handled.
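If a predictable order for the leftover lines matters, one option is a Python sketch like the following, which relies on dictionaries keeping insertion order (Python 3.7+), so unmatched lines from B come out in the order they appear in B. The function name merge_by_key is hypothetical, and the same uniqueness assumption about keys applies.

```python
def merge_by_key(a_lines, b_lines):
    """Order by A, substitute the matching B line, then append unmatched B lines."""
    b_by_key = {}
    for line in b_lines:
        fields = line.split()
        if fields:
            b_by_key[fields[0]] = line   # first word is the key
    merged = []
    for line in a_lines:
        fields = line.split()
        if not fields:
            continue
        key = fields[0]
        # Use the B line when the key is known, otherwise the key itself.
        merged.append(b_by_key.pop(key, key))
    merged.extend(b_by_key.values())     # leftovers, in B's original order
    return merged


file_a = ["john", "peter", "mary", "alex", "cloey"]
file_b = ["peter does something", "cloey looks at him",
          "franz is the new here", "mary sleeps"]

for line in merge_by_key(file_a, file_b):
    print(line)
```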

Expand one column while preserving another

I am trying to get column one repeated for every value in column two which needs to be on a new line.
cat ToExpand.txt
Pete horse;cat;dog
Claire car
John house;garden
My first attempt:
cat expand.awk
BEGIN {
FS="\t"
RS=";"
}
{
print $1 "\t" $2
}
awk -f expand.awk ToExpand.txt
Pete horse
cat
dog
Claire car
John
garden
The desired output is:
Pete horse
Pete cat
Pete dog
Claire car
John house
John garden
Am I on the right track here or would you use another approach? Thanks in advance.
You could also change the FS value into a regex and do something like this:
awk -F"\t|;" -v OFS="\t" '{for(i=2;i<=NF;i++) print $1, $i}' ToExpand.txt
Pete horse
Pete cat
Pete dog
Claire car
John house
John garden
I'm assuming that:
The first tab is the delimiter for the name
There's only one tab delimiter - If tab delimited data occurs after the ; section use fedorqui's implementation.
It's using an alternate form of setting the OFS value ( using the -v flag ) and loops over the fields after the first to print the expected output.
You can think of RS in your example as making "lines" out of your data ( records really ) and your print block is acting on those "lines"(records) instead of the normal newline. Then each record is further parsed by your FS. That's why you get the output from your first attempt. You can explore that by printing out the value of NF in your example.
Try:
awk '{gsub(/;/,ORS $1 OFS)}1' OFS='\t' file
This replaces every semicolon with a newline followed by the first field and the output field separator, so each item ends up on its own line prefixed with the name.
Output:
Pete horse
Pete cat
Pete dog
Claire car
John house
John garden
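For readers more comfortable outside awk, the same expansion can be sketched in Python. It assumes one tab between the name and the list, and semicolons between items, as in ToExpand.txt; expand_rows, field_sep and item_sep are names made up for this example.

```python
def expand_rows(lines, field_sep="\t", item_sep=";"):
    """Repeat the first column once for every item in the second column."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split only at the first tab: name on the left, item list on the right.
        name, _, items = line.partition(field_sep)
        for item in items.split(item_sep):
            out.append(name + field_sep + item)
    return out


to_expand = ["Pete\thorse;cat;dog", "Claire\tcar", "John\thouse;garden"]

for row in expand_rows(to_expand):
    print(row)
```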
