(Unix) Changing A Row To A Column In A Text File - linux

I currently have a text file that has the following data in row format:
TIME (HR) 0 6 12 18 24 36 48 60 72 84 96 108 120
I would like to "flip" this row into a column so that it reads:
TIME (HR)
0
6
12
18
24
etc...
Is there a way to do this with sed/awk?

grep can do it; -P enables Perl-compatible regexes and -o prints each match on its own line, so the pattern grabs either everything up to the closing parenthesis or each run of digits:
grep -Po '.*\)|\d+' file
this line works too:
grep -Po '.*?(?= \d)|\d+' file
test:
kent$ cat f
TIME (HR) 0 6 12 18 24 36 48 60 72 84 96 108 120
kent$ grep -Po '.*\)|\d+' f
TIME (HR)
0
6
12
18
24
36
48
60
72
84
96
108
120

With RS set to a space, awk treats every space-separated token as its own record; ORS stays a space for the first record (so TIME and (HR) end up on one line) and becomes a newline from the second record on:
$ awk -v RS=' ' '{ORS=(NR<2?" ":"\n")}1' file
TIME (HR)
0
6
12
18
24

Using awk:
awk '{print $1,$2;for(i=3;i<=NF;i++) print $i}' file
Using perl:
perl -pe 's/(^\S+\s+\S+)(*SKIP)(*F)| /\n/g' file

Another perl one:
perl -pe 's/\s+(?=\d+)/\n/g'
Test:
$ echo 'TIME (HR) 0 6 12 18 24 36 48 60 72 84 96 108 120' | perl -pe 's/ (?=\d+)/\n/g'
TIME (HR)
0
6
12
18
24
36
48
60
72
84
96
108
120
Other great solutions (from the comments by @AvinashRaj):
perl -pe 's/\s+(?!\()/\n/g'
perl -pe 's/ (?=\b)/\n/g'

sed 's/ \([0-9]\)/\
\1/g' YourFile
POSIX version (so use --posix with GNU sed). It changes any space followed by a digit into a newline; the digit is captured and put back with \1 because sed has no lookahead, so the digit has to be part of the match.
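If portability is not a concern, GNU sed also understands \n in the replacement text, so the literal line break is not needed (a sketch of the same substitution):
sed 's/ \([0-9]\)/\n\1/g' YourFile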

Related

Linux execute php file with arguments

I have a .php script that takes three parameters. For example: ./execute.php 11 111 111
I have a list of data in a text file, separated by spaces. For example:
22 222 222
33 333 333
44 444 444
I was thinking of using xargs to pass in the arguments, but it's not working.
Here is my try:
cat raw.txt | xargs -I % ./execute.php %0 %1 %2
It doesn't work. Any idea?
Thanks for the help.
As per the following transcript, you are not handling the data correctly:
pax> printf '2 22 222\n3 33 333\n4 44 444\n' | xargs -I % echo %0 %1 %2
2 22 2220 2 22 2221 2 22 2222
3 33 3330 3 33 3331 3 33 3332
4 44 4440 4 44 4441 4 44 4442
Each % is giving you the entire line, and the digit following the % is just tacked on to the end.
To investigate, let's first create a fake processing file proc.sh (and chmod 700 it so we can run it easily):
#!/usr/bin/env bash
echo "$# '$1' '$2' '$3'"
Even if you switch to xargs -I % ./proc.sh %, you'll find you get one argument with embedded spaces, not three individual arguments:
pax> vi proc.sh ; printf '2 22 222\n3 33 333\n4 44 444\n' | xargs -I % ./proc.sh %
1 '2 22 222' '' ''
1 '3 33 333' '' ''
1 '4 44 444' '' ''
The easiest solution is probably to switch to a while read loop, something like:
pax:~> printf '2 22 222\n3 33 333\n4 44 444\n' | while read p1 p2 p3 ; do ./proc.sh ${p1} ${p2} ${p3} ; done
3 '2' '22' '222'
3 '3' '33' '333'
3 '4' '44' '444'
You can see there that the program is called with three arguments; you just have to adapt it to your own program:
while read p1 p2 p3 ; do ./proc.sh ${p1} ${p2} ${p3} ; done < raw.txt
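If you would rather keep xargs, a sketch that hands exactly three whitespace-separated tokens to each invocation (this assumes no field itself contains spaces):
xargs -n 3 ./execute.php < raw.txt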

Compare two files and write the unmatched numbers in a new file

I have two files where ifile1.txt is a subset of ifile2.txt.
ifile1.txt   ifile2.txt
2            2
23           23
43           33
51           43
76           50
81           51
100          72
             76
             81
             89
             100
Desired output
ofile.txt
33
50
72
89
I was trying
diff ifile1.txt ifile2.txt > ofile.txt
but it gives the output in a different format.
Since your files are sorted, you can use the comm command for this:
comm -1 -3 ifile1.txt ifile2.txt > ofile.txt
-1 means omit the lines unique to the first file, and -3 means omit the lines that are in both files, so this shows just the lines that are unique to the second file.
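One caveat: comm expects its inputs in the default (lexicographic) sort order, and numerically sorted files like these are not quite that (100 sorts before 89 lexicographically), so GNU comm may warn or misplace a line. A sketch that sorts on the fly with bash process substitution:
comm -1 -3 <(sort ifile1.txt) <(sort ifile2.txt) > ofile.txt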
This will also do the job, though it prints a blank line for each diff hunk header:
diff file1 file2 | awk '{print $2}'
You could try:
diff file1 file2 | awk '{print $2}' | grep -v '^$' > output.file
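If the files were not sorted, a fixed-string grep would also give the set difference (a sketch; -F takes literal strings, -x matches whole lines, -v inverts, and -f reads the patterns from ifile1.txt):
grep -Fxvf ifile1.txt ifile2.txt > ofile.txt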

How to identify lines ending with 5 in a file

I have a file test.lst whose contents are like below.
Using CYGWIN_NT-6.1-WOW64.
I need to select only those lines which do not end with 5.
12
23
45
56
45
23
09
12
99
100
0000
9999999
The output should be:
12
23
56
23
09
12
99
100
0000
9999999
With grep -v '5$' test.txt, I am getting the output below:
[2014-11-28 17:42.57] /drives/d/Shantanu/MyScript
[463615.PC172645] ➤ grep -v '5$' test.txt
12
23
45
56
45
23
09
12
99
100
0000
9999999
[2014-11-28 17:43.21]
Just grep them out:
grep -v '5$' file
This looks for lines ending with 5 ($ refers to the end of line). Then -v inverts the match.
For your input it returns:
12
23
56
23
09
12
99
100
0000
9999999
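Judging by the transcript in the question, where grep -v '5$' filtered nothing on Cygwin, the file probably has Windows line endings: each line then ends in 5 followed by a carriage return, so '5$' never matches. A sketch that strips the carriage returns first (assuming that is indeed the cause):
tr -d '\r' < test.lst | grep -v '5$'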
You could use an inverted grep search, or the equivalent inverted match with sed, like below:
sed -n '/5$/!p' test.txt
or
grep -v "5$" test.txt
You could use a negated character class.
grep '[^5]$' file
[^5]$ matches a final character that is not 5. By default grep prints every line that has a match.
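One difference from grep -v '5$' worth knowing: the negated class must match a character, so completely empty lines are dropped as well. A quick check on made-up input:
$ printf '15\n20\n\n99\n' | grep '[^5]$'
20
99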

remove portion of a column in a .tab in unix

Can someone please help! I'm trying to delete the last portion (following "_c0") in the second column of the following list in the bash shell, e.g. where it says "_seq1" in this particular list. I do not want to change any other info in the remaining columns.
Thanks!
XP_003962102 comp1000054_c0_seq1 24.07 54 41 0 164 3
XP_003962102 comp1000054_c0_seq1 24.07 54 41 0 164 3
XP_003962102 comp1000054_c0_seq1 24.07 54 41 0 164 3
XP_003962102 comp1000054_c0_seq1 24.07 54 41 0 164 3
Here you go, a simple substitution using sed:
sed -e 's/_seq1//'
Using sed:
sed -i.bak 's/^\(.*_c0\)[^ ]*\( .*\)$/\1\2/' file
OR using awk:
awk '{sub(/_c0[^ ]*/, "_c0", $2)} 1' file
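Either way, only the second column should change. Note that when awk modifies $2 it rebuilds the line with single spaces between fields; a sketch of the expected first line of output:
$ awk '{sub(/_c0[^ ]*/, "_c0", $2)} 1' file | head -n 1
XP_003962102 comp1000054_c0 24.07 54 41 0 164 3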

how to subset a file - select a number of rows or columns

I would like your advice/help on how to subset a big file (millions of rows or columns).
For example:
(1)
I have a big file (millions of rows, tab-delimited). I want a subset of this file with only rows 10000 to 100000.
(2)
I have a big file (millions of columns, tab-delimited). I want a subset of this file with only columns 10000 to 100000.
I know there are tools like head, tail, cut, split, and awk or sed, and I can use them for simple subsetting, but I do not know how to do this particular job.
Could you please give any advice? Thanks in advance.
Filtering rows is easy, for example with AWK:
cat largefile | awk 'NR >= 10000 && NR <= 100000 { print }'
Filtering columns is easier with cut (tab is already cut's default delimiter, so -d is not needed; in bash you could also write -d$'\t'):
cat largefile | cut -f 10000-100000
As Rahul Dravid mentioned, cat is not a must here, and as Zsolt Botykai added you can improve performance using:
awk 'NR > 100000 { exit } NR >= 10000 && NR <= 100000' largefile
cut -f 10000-100000 largefile
Some different solutions:
For row ranges:
In sed:
sed -n 10000,100000p somefile.txt
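With millions of rows it is worth telling sed to quit once the range has been printed, so it does not read the rest of the file (a sketch):
sed -n '10000,100000p; 100000q' somefile.txt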
For column ranges in awk:
awk -v f=10000 -v t=100000 '{ for (i=f; i<=t;i++) printf("%s%s", $i,(i==t) ? "\n" : OFS) }' details.txt
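Since the question says the data is tab-delimited, you may want to keep tabs in the output too; a sketch that sets the input and output separators explicitly:
awk -F'\t' -v OFS='\t' -v f=10000 -v t=100000 '{ for (i=f; i<=t; i++) printf("%s%s", $i, (i==t) ? "\n" : OFS) }' details.txt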
For the first problem, selecting a set of rows from a large file, piping tail into head is very simple. Rows 10000 through 100000 of largefile are 90001 rows: tail starts the stream at row 10000, and head keeps only the first 90001 rows of that stream.
tail -n +10000 largefile | head -n 90001
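A quick sanity check of the idea on a small stream (rows 10 through 15 of a 20-line input):
$ seq 20 | tail -n +10 | head -n 6
10
11
12
13
14
15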
Was beaten to the sed solution, so I'll post a perl ditto instead.
To print selected lines.
$ seq 100 | perl -ne 'print if $. >= 10 && $. <= 20'
10
11
12
13
14
15
16
17
18
19
20
To print selected columns, use an array slice of @F (the field array that -a creates):
perl -lane 'print "@F[1..3]"'
-F is used in conjunction with -a to choose the delimiter on which to split lines.
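For tab-delimited input, a sketch that sets the split pattern explicitly (file here is a placeholder name):
perl -F'\t' -lane 'print "@F[1..3]"' file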
To test, use seq and paste to generate some columns:
$ seq 50 | paste - - - - -
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
31 32 33 34 35
36 37 38 39 40
41 42 43 44 45
46 47 48 49 50
Let's print everything except the first and the last column:
$ seq 50 | paste - - - - - | perl -lane 'print join " ", @F[1..3]'
2 3 4
7 8 9
12 13 14
17 18 19
22 23 24
27 28 29
32 33 34
37 38 39
42 43 44
47 48 49
In the join above, the separator is actually a tab character; you can type it with Ctrl-V Tab (it renders as plain whitespace here).
