In Unix, I am trying to write a sequence of cut and paste commands (saving the result of each command in a file) that inverts every name in the file shortlist (below) and places a comma after the last name (for example, bill johnson becomes johnson, bill).
Here is my file shortlist:
2233:charles harris :g.m. :sales :12/12/52: 90000
9876:bill johnson :director :production:03/12/50:130000
5678:robert dylan :d.g.m. :marketing :04/19/43: 85000
2365:john woodcock :director :personnel :05/11/47:120000
5423:barry wood :chairman :admin :08/30/56:160000
I am able to cut from shortlist, but I am not sure how to paste the result into my filenew file on the same command line. Here is my cut command:
cut -d: -f2 shortlist
result:
charles harris
bill johnson
robert dylan
john woodcock
barry wood
Now I want this pasted into my filenew file, so that when I cat filenew the result looks like this:
harris, charles
johnson, bill
dylan, robert
woodcock, john
wood, barry
Please guide me through this. Thank you.
You could do it with a single awk:
awk -F: '{split($2,a, / /); if(a[2]) l=a[2] ", "; print l a[1]}' shortlist
I am assuming that if there is no second name, you don't want to print the comma (and that no name has more than 2 words).
Once you've used cut to split up the string, it may be easier to use awk than paste to produce the result you want:
$ cut -d":" -f2 shortlist | awk '{printf "%s, %s\n", $2, $1}'
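Since the question asks specifically for a sequence of cut and paste commands saving intermediate files, here is a minimal sketch of that approach. It assumes every name is exactly two words, as in the sample data; the first two lines of shortlist are recreated so the commands can be run as-is:

```shell
# Recreate two lines of the sample shortlist file for demonstration.
printf '%s\n' '2233:charles harris :g.m. :sales :12/12/52: 90000' \
              '9876:bill johnson :director :production:03/12/50:130000' > shortlist

cut -d: -f2 shortlist | cut -d' ' -f1 > first.tmp   # first names
cut -d: -f2 shortlist | cut -d' ' -f2 > last.tmp    # last names
paste -d' ' last.tmp first.tmp | sed 's/ /, /' > filenew
cat filenew
# harris, charles
# johnson, bill
```

The sed step just turns the single space paste inserted into ", "; the awk answers above do the same job in one process.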
I am trying to split a column of strings using the values from another column; an example may make this clearer.
The input is a table with strings in column 2 separated by commas (,).
The third column is the number of the field that should be output, using , as the delimiter in the second column.
Ben mango,apple 1
Mary apple,orange,grape 2
Sam apple,melon,* 3
Peter melon 1
The output should look like this, where records that correspond to an asterisk are not output (so the Sam row is omitted):
Ben mango
Mary orange
Peter melon
I am able to generate the desired output using a for loop, but I think it is quite cumbersome:
IFS=$'\n'
for i in $(cat input.txt)
do
    F=$(echo "$i" | cut -f3)
    paste <(echo "$i" | cut -f1) <(echo "$i" | cut -f2 | cut -d "," -f"$F") | grep -v "\*"
done
Is there any one-liner to do it maybe using sed or awk? Thanks in advance.
The key to doing it in awk is the split() function, which populates an array based on a regular expression that matches the delimiters to split a string on:
$ awk '{ split($2, fruits, /,/); if (fruits[$3] != "*") print $1, fruits[$3] }' input.txt
Ben mango
Mary orange
Peter melon
I have a CSV file with 3 columns, like below:
Jones Smith 656220665
I would like to convert it to
Jones Smith 000000000
The problem I have is that not all the numbers are the same length; some are 7 digits long. I can't seem to find a way to change them from their current format to 0s, and I have to use sed and cut.
Here are two of the commands I tried and attempted to adapt to my needs:
sed 's/\([^ ]*\) \([^_]*\)_\(.*\)/\1 \3/g' Input_file
and
$ sed 's/\(\([^,]\+,\)\{1\}\)\([^,]\+,\)\(.*\)/\1\3\3\4/' /path/to/your/file
Instead of using sed, how about this:
echo 'Jones Smith 656220665' | tr '0-9' '0'
Jones Smith 000000000
For the whole file that's then:
tr '0-9' '0' < file > file.tmp
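To actually replace the original file, write to the temporary file and then move it back; redirecting tr's output straight onto its own input would truncate the file before it is read. A quick sketch, using a one-line stand-in for the real CSV:

```shell
printf 'Jones Smith 656220665\n' > file            # stand-in sample file
tr '0-9' '0' < file > file.tmp && mv file.tmp file # mask digits in place
cat file
# Jones Smith 000000000
```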
Edit 1:
added a sed solution:
sed 's/[0-9]/0/g' smith
Jones Smith 000000000
Edit 2:
cp ClientData.csv ClientData.csv.bak
sed 's/[0-9]/0/g; w ClientData.csv' ClientData.csv.bak | cut -d" " -f 1-2
You can do this very simply with a sed global substitution:
sed 's/[0-9]/0/g'
(where the g flag provides a global replacement of all instances of [0-9] with '0'), e.g.
$ echo "Jones Smith 656220665" | sed 's/[0-9]/0/g'
Jones Smith 000000000
Give it a shot and let me know if you have further issues.
I have a sample.txt file as follows:
Name City ST Zip CTY
John Smith BrooklynNY10050USA
Paul DavidsonQueens NY10040USA
Michael SmithNY NY10030USA
George HermanBronx NY10020USA
Desired output is the same data split into separate columns:
Name City ST Zip CTY
John Smith Brooklyn NY 10050 USA
Paul Davidson Queens NY 10040 USA
Michael Smith NY NY 10030 USA
George Herman Bronx NY 10020 USA
I tried this:
#!/bin/bash
awk '{printf "%13-s %-8s %-2s %-5s %-3s\n", $1, $2, $3, $4, $5}' sample.txt > new.txt
And it's unsuccessful with this result:
Name City ST Zip CTY
John Smith BrooklynNY10050USA
Paul DavidsonQueens NY10040USA
Michael SmithNY NY10030USA
George HermanBronx NY10020USA
Would appreciate it if anyone could tweak this so the text file will be in delimited format as shown above. Thank you so much!!
You can use sed to insert spaces to specific positions:
sed -e 's#\(.\{13\}\)\(.*\)#\1 \2#g' \
    -e 's#\(.\{22\}\)\(.*\)#\1 \2#g' \
    -e '1s#\(.\{29\}\)\(.*\)#\1 \2#g' \
    -e '2,$s#\(.\{25\}\)\(.*\)#\1 \2#g' \
    -e 's#\(.\{31\}\)\(.*\)#\1 \2#g' data.txt
With gawk you can set the input field widths in the BEGIN block:
$ gawk 'BEGIN { FIELDWIDTHS = "13 8 2 5 3" } { print $1, $2, $3, $4, $5 }' fw.txt
Name City ST Zip CTY
John Smith Brooklyn NY 10050 USA
Paul Davidson Queens NY 10040 USA
Michael Smith NY NY 10030 USA
George Herman Bronx NY 10020 USA
If your awk does not have FIELDWIDTHS, it's a bit tedious but you can use substr:
$ awk '{ print substr($0,1,13), substr($0,14,8), substr($0,22,2), substr($0,24,5), substr($0,29,3) }' fw.txt
Name City ST Zip CTY
John Smith Brooklyn NY 10050 USA
Paul Davidson Queens NY 10040 USA
Michael Smith NY NY 10030 USA
George Herman Bronx NY 10020 USA
You can split the field lengths into an array then loop over $0 and gather the substrings in regular awk:
awk 'BEGIN {n=split("13 8 2 5 3",ar)}
{
j=1
s=""
sep="\t"
for(i=1;i<n;i++)
{s=s substr($0, j, ar[i]) sep; j+=ar[i]}
s=s substr($0, j, ar[i])
print s
}' file
That uses a tab to delimit the fields, but you can also use a space if preferred.
I'm trying to edit some text files by replacing single characters with a new line.
Before:
Bill Fred Jack L Max Sam
After:
Bill Fred Jack
Max Sam
This is the closest I have gotten, but the single character is not always going to be 'L'.
cat File.txt | tr "L" "\n"
You can try this:
sed "s/\s\S\s/\n/g" File.txt
Explanation:
You want to replace any word formed by a single character with a line break (\n).
\s : space and tab
\S : non-whitespace characters
bash-4.3$ cat file.txt
1.Bill Fred Jack L Max Sam
2.Bill Fred Jack M Max Sam
3.Bill Fred Jack N Max Sam
bash-4.3$ sed 's/\s[A-Z]\s/\n/g' file.txt
1.Bill Fred Jack
Max Sam
2.Bill Fred Jack
Max Sam
3.Bill Fred Jack
Max Sam
sed "s/[[:blank:]][[:alpha:]][[:blank:]]/\
/g" YourFile
POSIX version.
This assumes the single letter is inside the string, not at the start or end.
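If awk is also an option, the same substitution can be sketched with gsub, under the same assumption that the single letter is not at the start or end of the line:

```shell
printf 'Bill Fred Jack L Max Sam\n' > File.txt     # sample input
# Replace every blank-letter-blank run with a newline.
awk '{ gsub(/ [[:alpha:]] /, "\n"); print }' File.txt
# Bill Fred Jack
# Max Sam
```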
From the input file:
I am Peter
I am Mary
I am Peter Peter Peter
I am Peter Peter
I want output to be like this:
1 I am Peter
3 I am Peter Peter Peter
2 I am Peter Peter
Where 1, 3 and 2 are occurrences of "Peter".
I tried this, but the info is not formatted the way I wanted:
grep -o -n Peter inputfile
This is not easily solved with grep; I would suggest moving "two tools up" to awk:
awk '$0 ~ FS { print NF-1, $0 }' FS="Peter" inputfile
Output:
1 I am Peter
3 I am Peter Peter Peter
2 I am Peter Peter
Edit:
To answer a question in the comments:
What if I want case insensitive? and what if I want multiple pattern
like "Peter|Mary|Paul", so "I am Peter peter pAul Mary marY John",
will yield the count of 5?
If you are using GNU awk, you do it by enabling IGNORECASE and setting the pattern in FS like this:
awk '$0 ~ FS { print NF-1, $0 }' IGNORECASE=1 FS="Peter|Mary|Paul" inputfile
Output:
1 I am Peter
1 I am Mary
3 I am Peter Peter Peter
2 I am Peter Peter
5 I am Peter peter pAul Mary marY John
You don’t need -o or -n. From grep --help:
-o, --only-matching show only the part of a line matching PATTERN
...
-n, --line-number print line number with output lines
Remove them and your output will be better. I think you're misinterpreting -n: it just shows the line number, not the occurrence count.
It looks like you're trying to get the count of "Peter" appearances per line. You'd need something beyond a single grep for that; awk could be a good choice. Or you could loop over each line and use grep to count the matches in it, printing the line's count.
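That per-line grep loop might be sketched like this (the sample input is recreated first; the awk one-liner above remains the faster option, since this spawns two processes per line):

```shell
printf '%s\n' 'I am Peter' 'I am Mary' 'I am Peter Peter Peter' > inputfile  # sample

while IFS= read -r line; do
  # grep -o prints one match per line; grep -c '^' counts those lines.
  n=$(printf '%s\n' "$line" | grep -o 'Peter' | grep -c '^')
  [ "$n" -gt 0 ] && printf '%s %s\n' "$n" "$line"
done < inputfile
# 1 I am Peter
# 3 I am Peter Peter Peter
```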