Expand one column while preserving another - text

I am trying to get column one repeated for every semicolon-separated value in column two, with each value on its own line.
cat ToExpand.txt
Pete horse;cat;dog
Claire car
John house;garden
My first attempt:
cat expand.awk
BEGIN {
    FS="\t"
    RS=";"
}
{
    print $1 "\t" $2
}
awk -f expand.awk ToExpand.txt
Pete horse
cat
dog
Claire car
John
garden
The desired output is:
Pete horse
Pete cat
Pete dog
Claire car
John house
John garden
Am I on the right track here or would you use another approach? Thanks in advance.

You could also change the FS value into a regex and do something like this:
awk -F"\t|;" -v OFS="\t" '{for(i=2;i<=NF;i++) print $1, $i}' ToExpand.txt
Pete horse
Pete cat
Pete dog
Claire car
John house
John garden
I'm assuming that:
The first tab is the delimiter for the name.
There's only one tab delimiter - if tab-delimited data occurs after the ";" section, use fedorqui's implementation.
It uses an alternate form of setting the OFS value (using the -v flag) and loops over the fields after the first to print the expected output.
You can think of RS in your example as making "lines" (records, really) out of your data, so your print block acts on those records instead of on the usual newline-delimited lines. Each record is then split into fields by your FS. That's why you get the output from your first attempt. You can explore that by printing out the value of NF for each record, as in the sketch below.
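A quick way to see those records (a sketch, assuming the same tab-delimited ToExpand.txt as above):
awk 'BEGIN { FS="\t"; RS=";" } { print "record " NR ": NF=" NF }' ToExpand.txt
Records like "cat" and "garden" end up with NF equal to 1, which is why they print without a name, and the record starting with "dog" swallows parts of the Claire and John lines, which is why "house" never appears in your output.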

Try:
awk '{gsub(/;/,ORS $1 OFS)}1' OFS='\t' file
This replaces every occurrence of a semicolon with a newline, the first field, and the output field separator (a tab).
Output:
Pete horse
Pete cat
Pete dog
Claire car
John house
John garden

Related

Printing First Variable in Awk but Only If It's Less than X

I have a file with names and I need to print the first field only when it is 4 characters or fewer, but I'm having trouble with my code. There is other text at the end of the lines, but I shortened it for here.
file:
John Doe
Jane Doe
Mark Smith
Abigail Smith
Bill Adams
What I want to do is print the first names that have 4 characters or fewer.
What I've tried:
awk '$1 <= 4 {print $1}' inputfile
What I'm hoping to get:
John
Jane
Mark
Bill
So far, I've got nothing. Either it prints out everything, with no length restriction, or it doesn't print anything at all. Could someone take a look at this and see what they think?
Thanks
First, let's understand why
awk '$1 <= 4 {print $1}' inputfile
doesn't do what you want: $1 <= 4 does not test the length of the field at all. A value like
John
doesn't look like a number, so GNU AWK compares it as a string against "4" rather than numerically, which is why the results seem arbitrary. And when such a string is forced into a number, the GNU AWK manual's Strings And Numbers section explains what you get:
A string is converted to a number by interpreting any numeric prefix of the string as numerals (...) Strings that can't be interpreted as valid numbers convert to zero.
So from GNU AWK's point of view the numeric value of John is zero; either way, comparing the field against 4 says nothing about how many characters it has.
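A quick way to see both behaviours (a sketch run from a BEGIN block, so no input file is needed):
awk 'BEGIN {
    print ("John" <= 4)    # string comparison against "4": prints 0 (false)
    print ("John" + 0)     # forcing a non-numeric string to a number: prints 0
    print ("4abc" + 0)     # only the numeric prefix is used: prints 4
}'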
To get the desired output you can use the length function, which returns the number of characters:
awk 'length($1)<=4{print $1}' inputfile
or, alternatively, a pattern match for 0 to 4 characters:
awk '$1~/^.{0,4}$/{print $1}' inputfile
where $1~ means "check whether the 1st field matches", . denotes any character, {0,4} means from 0 to 4 repetitions, and ^ and $ anchor the match to the beginning and end of the string (both are required, as otherwise longer strings would also match, since they contain a substring matching .{0,4}).
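A quick demonstration of why the anchors matter (a sketch; 1 means the regex matched, 0 means it did not):
awk 'BEGIN { print ("Abigail" ~ /.{0,4}/); print ("Abigail" ~ /^.{0,4}$/) }'
This prints 1 and then 0: without the anchors even a 7-character name matches, because it contains shorter substrings.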
Both solutions, for the inputfile
John Doe
Jane Doe
Mark Smith
Abigail Smith
Bill Adams
give the output
John
Jane
Mark
Bill
(tested in gawk 4.2.1)

How to sort a column with $ and ',' '.' sign bash command line?

I have a file, and I want to use something like the "cat" command on that file to print out a sorted list.
For example, a column looks like this:
Mike $1.00
Mason $1,000,000.00
Tyler $100,000.00
Nick $0.10
Result
Nick $0.10
Mike $1.00
Tyler $100,000.00
Mason $1,000,000.00
You can try this:
sort -t$ -nk2 fileName
Description:
-t$ : use $ as the field separator
-nk2 : sort numerically on column 2
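Note that GNU sort's -n only understands the comma as a thousands separator when the locale defines one. If the large amounts don't sort correctly in your environment, one locale-independent variant (a sketch, assuming the amount is the second whitespace-separated field) is to build a numeric key with the "$" and "," stripped, sort on that key, then drop it again:
awk '{ key = $2; gsub(/[$,]/, "", key); print key "\t" $0 }' fileName | sort -n | cut -f2-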

save multiple matches in a list (grep or awk)

I have a file that looks something like this:
# a mess of text
Hello. Student Joe Deere has
id number 1. Over.
# some more messy text
Hello. Student Steve Michael Smith has
id number 2. Over.
# etc.
I want to record the pairs (Joe Deere, 1), (Steve Michael Smith, 2), etc. into a list (or two separate lists with the same order). Namely, I will need to loop over those pairs and do something with the names and ids.
(names and ids are on distinct lines, but come in the order: name1, id1, name2, id2, etc. in the text). I am able to extract the lines of interest with
VAR=$(awk '/Student/,/Over/' filename.txt)
I think I know how to extract the names and ids with grep, but it will give me the result as one big block like
`Joe Deere 1 Steve Michael Smith 2 ...`
(and maybe even with a separator between names and ids). I am not sure at this point how to go forward with this, and in any case it doesn't feel like the right approach.
I am sure that there is a one-liner in awk that will do what I need. The possibilities are infinite and the documentation monumental.
Any suggestion?
$ cat tst.awk
/^id number/ {
    # strip the leading "Hello. Student " (first two words) and the
    # trailing " has" (last word) from the saved previous line, leaving just the name
    gsub(/^([^ ]+ ){2}| [^ ]+$/,"",prev)
    printf "(%s, %d)\n", prev, $3
}
{ prev = $0 }
$ awk -f tst.awk file
(Joe Deere, 1)
(Steve Michael Smith, 2)
You could also try the following:
awk '
/id number/{
    sub(/\./,"",$3)   # drop the trailing dot from the id ("1." -> "1")
    print val", "$3
    val=""
    next
}
{
    # on the other lines, strip "Hello. Student " and everything from " has" on,
    # so val holds the name when the id line arrives
    gsub(/Hello\. Student | has.*/,"")
    val=$0
}
' Input_file
grep -oP 'Hello. Student \K.+(?= has)|id number \K\d+' file | paste - -
Here \K discards the part of the match before it (so only the name or the id number is output), and paste - - joins each pair of consecutive lines with a tab.
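If you then need to loop over the pairs in the shell, one way (a sketch building on the grep/paste line above, assuming bash) is to read them into two arrays:
names=() ids=()
while IFS=$'\t' read -r name id; do
    names+=("$name")
    ids+=("$id")
done < <(grep -oP 'Hello. Student \K.+(?= has)|id number \K\d+' file | paste - -)
printf '%s -> %s\n' "${names[0]}" "${ids[0]}"    # Joe Deere -> 1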

How to sort lines in textfile according to a second textfile

I have two text files.
File A.txt:
john
peter
mary
alex
cloey
File B.txt
peter does something
cloey looks at him
franz is the new here
mary sleeps
I'd like to
merge the two
sort one file according to the other
put the unknown lines of B at the end
like this:
john
peter does something
mary sleeps
alex
cloey looks at him
franz is the new here
$ awk '
NR==FNR { b[$1]=$0; next }
{ print ($1 in b ? b[$1] : $1); delete b[$1] }
END { for (i in b) print b[i] }
' fileB fileA
john
peter does something
mary sleeps
alex
cloey looks at him
franz is the new here
The above will print the remaining items from fileB in a "random" order (see http://www.gnu.org/software/gawk/manual/gawk.html#Scanning-an-Array for details). If that's a problem then edit your question to clarify your requirements for the order those need to be printed in.
It also assumes the keys in each file are unique (e.g. peter only appears as a key value once in each file). If that's not the case then again edit your question to include cases where a key appears multiple times in your sample input/output and additionally explain how you want them handled.
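If it turns out you do want the remaining fileB lines printed in the order they appear in fileB, one variation (a sketch based on the script above) records that order in a second array:
$ awk '
NR==FNR { b[$1]=$0; order[++n]=$1; next }       # fileB: index by name, remember the order
{ print ($1 in b ? b[$1] : $1); delete b[$1] }  # fileA: as before
END { for (i=1; i<=n; i++) if (order[i] in b) print b[order[i]] }  # leftovers in fileB order
' fileB fileA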

Scripting - copy line and the second line IF the second line has a string

I have a problem where I have a large amount of files that I need to scan and return a line and its following line, but only when the following line begins with a string.
String one - line one must begin with 'Bill'
String two - line two must begin with 'Jones'.
If these two criteria are matched, it returns the two lines. Repeat for the whole file.
ie. original file:
Edith Blue
Edith Green
Edith Red
Bill Blue
Jones Red
Edith Green
Bill Green
Edith Red
Jones Green
Bill Blue
I'd want it to return only:
Bill Blue
Jones Red
Any ideas? No idea where to begin with this, I only have basic scripting skills with sed/awk etc... At the moment I am using this to get the matching line and the line after it, but it gives me too much useless information that I have to strip off with other sed commands.
grep -A 1 "^Bill" * > test.txt
I guess there's a far more elegant way of getting only the lines I need. Any help would be lovely!
As an extension of your initial approach, a simple solution is to grep for lines starting with "Bill", returning one line after, then grep for lines starting with "Jones", returning one line before:
grep -A1 "^Bill" myfile.txt | grep "^Jones" -B1
Output:
Bill Blue
Jones Red
Side note: as a true test, your input file should probably have some lines where Bill and Jones are not at the start of the line...
Edith Blue
Edith Jones
Edith Red
Bill Blue
Jones Red
Edith Bill
Bill Jones
Edith Red
Jones Green
Bill Blue
Use awk's getline instruction for each line that begins with Bill:
awk '
$1 ~ /^Bill/ {
    getline l                  # read the following line into l
    if ( l ~ /^Jones/ ) {
        printf "%s\n%s\n", $0, l
    }
}
' infile
It yields:
Bill Blue
Jones Red
And here is another way using awk with a flag:
$ awk '$1=="Bill"{p=1;a=$0;next};$1=="Jones"&&p{print a;print};{p=0}' file
Bill Blue
Jones Red
Here is a simple python script:
FILE = 'test.text'
f = open(FILE, 'r')

one = 'Bill'
two = 'Jones'

prev = ''
for line in f:
    if prev.startswith(one) and line.startswith(two):
        print prev.rstrip()
        print line.rstrip()
    prev = line
Yields:
python FileRead.py
Bill Blue
Jones Red
This might work for you (GNU sed):
sed -n '$!N;/^Bill.*\nJones/p;D' file
N appends the next line to the pattern space (skipped on the last line), the pattern prints both lines when the first starts with Bill and the second with Jones, and D then deletes up to the first newline and restarts the cycle with what remains.
