I have a PHP script that takes three parameters. For example: ./execute.php 11 111 111
I have a list of data in a text file, with the values separated by spaces. For example:
22 222 222
33 333 333
44 444 444
I was thinking of using xargs to pass in the arguments, but it's not working.
Here is my attempt:
cat raw.txt | xargs -I % ./execute.php %0 %1 %2
It doesn't work. Any ideas?
Thanks for the help.
As per the following transcript, you are not handling the data correctly:
pax> printf '2 22 222\n3 33 333\n4 44 444\n' | xargs -I % echo %0 %1 %2
2 22 2220 2 22 2221 2 22 2222
3 33 3330 3 33 3331 3 33 3332
4 44 4440 4 44 4441 4 44 4442
Each % is giving you the entire line, and the digit following the % is just tacked on to the end.
To investigate, let's first create a fake processing file proc.sh (and chmod 700 it so we can run it easily):
#!/usr/bin/env bash
echo "$# '$1' '$2' '$3'"
Even if you switch to xargs -I % ./proc.sh %, you'll find you get one argument with embedded spaces, not three individual arguments:
pax> vi proc.sh ; printf '2 22 222\n3 33 333\n4 44 444\n' | xargs -I % ./proc.sh %
1 '2 22 222' '' ''
1 '3 33 333' '' ''
1 '4 44 444' '' ''
The easiest solution is probably to switch to a while read loop, something like:
pax:~> printf '2 22 222\n3 33 333\n4 44 444\n' | while read p1 p2 p3 ; do ./proc.sh ${p1} ${p2} ${p3} ; done
3 '2' '22' '222'
3 '3' '33' '333'
3 '4' '44' '444'
You can see that the program is now called with three arguments; you just have to adapt it to your own program:
while read -r p1 p2 p3 ; do ./proc.sh "${p1}" "${p2}" "${p3}" ; done < raw.txt
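If xargs is still preferred, a minimal sketch is to let it split the input on whitespace and hand over three words per invocation with -n 3. This assumes every line holds exactly three fields with no quotes or backslashes in them; the temporary directory and the stand-in script below are illustrative, not part of the original setup.

```shell
# Hypothetical stand-in for ./execute.php: prints the argument count and the first three arguments.
workdir=$(mktemp -d)
cat > "$workdir/proc.sh" <<'EOF'
echo "$# '$1' '$2' '$3'"
EOF
printf '2 22 222\n3 33 333\n4 44 444\n' > "$workdir/raw.txt"

# -n 3: xargs splits stdin on whitespace and passes three words to each invocation.
out=$(xargs -n 3 sh "$workdir/proc.sh" < "$workdir/raw.txt")
printf '%s\n' "$out"

rm -rf "$workdir"
```

Unlike the while read loop, this approach silently regroups the words if a line has more or fewer than three fields, so the loop is the safer choice for irregular data.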
I have the following folder structure:
A/B/C/D/E/00
A/B/C/D/E/01
.
.
A/B/C/D/E/23
Similarly,
M/N/O/P/Q/00
M/N/O/P/Q/01
.
.
M/N/O/P/Q/23
Now, each folder from 00 to 23 has many files inside, which I would like to count.
If I run this simple command:
ls /A/B/C/D/E/00 | wc -l
I can get the count of files in each of these sub directories. I want to automate this or get it iteratively. Can anyone suggest a way?
Also, the final output I am looking at is a file that should look like this:
C E RESULT OF ls /A/B/C/D/E/00 | wc -l RESULT OF ls /A/B/C/D/E/01 | wc -l
M Q RESULT OF ls /M/N/O/P/Q/00 | wc -l RESULT OF ls /M/N/O/P/Q/01 | wc -l
So, the output should look like this finally
C E 23 23 4 6 7 4 76 98 57 2 67 9 12 34 67 0 2 3 78 98 12 3 57 213
M Q 12 10 2 34 32 1 35 65 87 8 32 2 65 87 98 0 4 12 1 35 34 76 9 67
Please note that the values after the letters are the file counts of the 24 folders 00, 01 through 23.
Using the eval approach: I can hardcode and get the exact results. But, I wanted it in a way that would show me the data for the previous day. So this is what I did:
d=`date --date="1 day ago" +%Y%m%d`
month=`date +%Y%m`
eval echo YZ $d '"$(ls "/A/B/YZ/$month/$d/"'{20150800..20150823})'| wc -l)"'
This works perfectly because in the given location there are files inside child directories 20150800,20150801..20150823. However when I try to generalize this like below, it gives me the total count of the folder instead of the count of each sub folder:
eval echo YZ $d '"$(ls "/A/B/YZ/$month/$d/"'{"$d"00.."$d"23})'| wc -l)"'
Something like this (not tested):
for d in [A-Z]/[A-Z]/[A-Z]/[A-Z]/[A-Z]/[0-9][0-9]
do
    [[ -d $d ]] && echo "$d : $(ls "$d" | wc -l)"
done
Note that this gives an incorrect count if one of the file names contains a newline character.
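A rough sketch of how the counts could be gathered into the one-line-per-parent format from the question, counting with a glob instead of parsing ls output so that newlines in file names do not matter. The temporary tree, the count_entries helper, and the hard-coded "C E" label are assumptions made for illustration; hidden files are not counted, just as with plain ls.

```shell
# Build a small hypothetical tree to run against.
root=$(mktemp -d)
mkdir -p "$root/A/B/C/D/E/00" "$root/A/B/C/D/E/01" "$root/A/B/C/D/E/02"
touch "$root/A/B/C/D/E/00/f1" "$root/A/B/C/D/E/00/f2" "$root/A/B/C/D/E/01/f1"

# Count the entries of a directory; runs in a subshell so n and f stay private.
count_entries() (
    n=0
    for f in "$1"/*; do
        # An unmatched glob stays literal, so check that the entry really exists.
        if [ -e "$f" ] || [ -L "$f" ]; then n=$((n + 1)); fi
    done
    echo "$n"
)

# Append one count per two-digit subdirectory to a single output line.
line="C E"
for dir in "$root"/A/B/C/D/E/[0-9][0-9]; do
    [ -d "$dir" ] && line="$line $(count_entries "$dir")"
done
printf '%s\n' "$line"

rm -rf "$root"
```

The glob expands in sorted order, so the counts appear in the order 00, 01, 02, and an empty directory contributes 0.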
I need help
From a list of numbers, I would like to add 9000 to each value, as in the example below:
Start:
1
1
13
5
14
4
1
5
12
7
8
9
4
18
3
20
11
17
13
===============================================
Final results :
9001
9001
9013
9005
9014
9004
9001
9005
9012
9007
9008
9009
9004
9018
9003
9020
9011
9017
9013
this command does not work:
sed "s/^/9000/g" file.txt
This might work for you (GNU sed):
sed -r 's/^/0000/;s/^0*(.{3})$/9\1/' file
The first command prepends zeroes to the number. The second drops the excess leading zeroes, keeps the last three characters, and puts a 9 in front of them.
This should work for you.
for num in `cat file.txt`; do if [ $num -le 9000 ]; then echo "$(($num + 9000))"; else echo $num; fi; done
You can do it like this:
for num in 1 1 13 5 14 4 1 5 12 7 8 9 4 18 3 20 11 17; do echo "$(($num + 9000))"; done
If the list also contains numbers that you don't want to process because they are already in the 90XX format, you can throw in an if statement:
for num in 1 1 13 5 14 4 1 5 12 7 8 9 4 18 3 20 11 17 9005; do if [ $(($num)) -le 9000 ]; then echo "$(($num + 9000))"; else echo $num; fi; done
For loop in bash - for; do; done;
Bash arithmetic expression to add the numbers - $((EXPR))
You can try (GNU sed):
sed 's/.*/echo $((9000+&))/e' infile
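For comparison, here is a small awk sketch that does the addition numerically. The sed command in the question prepends the text "9000" instead of adding it, so 1 becomes 90001 rather than 9001. The sample values are taken from the question; the assumption is one number per line.

```shell
# Treat the first field as a number and add 9000 to it.
out=$(printf '1\n13\n5\n20\n' | awk '{ print $1 + 9000 }')
printf '%s\n' "$out"
```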
I currently have a text file that has the following data in row format:
TIME (HR) 0 6 12 18 24 36 48 60 72 84 96 108 120
I would like to "flip" this row into a column so that it reads:
TIME (HR)
0
6
12
18
24
etc...
Is there a way to do this with sed/awk?
grep could do:
grep -Po '.*\)|\d+' file
this line works too:
grep -Po '.*?(?= \d)|\d+' file
test:
kent$ cat f
TIME (HR) 0 6 12 18 24 36 48 60 72 84 96 108 120
kent$ grep -Po '.*\)|\d+' f
TIME (HR)
0
6
12
18
24
36
48
60
72
84
96
108
120
$ awk -v RS=' ' '{ORS=(NR<2?" ":"\n")}1' file
TIME (HR)
0
6
12
18
24
Through awk,
awk '{print $1,$2;for(i=3;i<=NF;i++) print $i}' file
Through perl,
perl -pe 's/(^\S+\s+\S+)(*SKIP)(*F)| /\n/g' file
Another perl one:
perl -pe 's/\s+(?=\d+)/\n/g'
Test:
$ echo 'TIME (HR) 0 6 12 18 24 36 48 60 72 84 96 108 120' | perl -pe 's/ (?=\d+)/\n/g'
TIME (HR)
0
6
12
18
24
36
48
60
72
84
96
108
120
Other great solutions (from comments by @AvinashRaj):
perl -pe 's/\s+(?!\()/\n/g'
perl -pe 's/ (?=\b)/\n/g'
sed 's/ \([0-9]\)/\
\1/g' YourFile
This is the POSIX version (so use --posix with GNU sed).
It replaces any space followed by a digit with a newline. The digit is captured and put back, because sed regexes have no lookahead.
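A quick way to try the sed command is to feed it a shortened copy of the sample row; the shortened input is just an assumption to keep the example small.

```shell
line='TIME (HR) 0 6 12 18'

# A space followed by a digit becomes a newline; \1 restores the captured digit.
out=$(printf '%s\n' "$line" | sed 's/ \([0-9]\)/\
\1/g')
printf '%s\n' "$out"
```

The space before "(HR)" is left alone because it is not followed by a digit, so the label stays on one line.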
I want to extract the 5th column from multiple files, named in a numerical order, and paste those columns in sequence, side by side, into one output file.
The file names look like:
sample_problem1_part1.txt
sample_problem1_part2.txt
sample_problem2_part1.txt
sample_problem2_part2.txt
sample_problem3_part1.txt
sample_problem3_part2.txt
......
Each problem file (1,2,3...) has two parts (part1, part2). Each file has the same number of lines.
The content looks like:
sample_problem1_part1.txt
1 1 20 20 1
1 7 21 21 2
3 1 22 22 3
1 5 23 23 4
6 1 24 24 5
2 9 25 25 6
1 0 26 26 7
sample_problem1_part2.txt
1 1 88 88 8
1 1 89 89 9
2 1 90 90 10
1 3 91 91 11
1 1 92 92 12
7 1 93 93 13
1 5 94 94 14
sample_problem2_part1.txt
1 4 330 30 a
3 4 331 31 b
1 4 332 32 c
2 4 333 33 d
1 4 334 34 e
1 4 335 35 f
9 4 336 36 g
The output should look like: (in a sequence of problem1_part1, problem1_part2, problem2_part1, problem2_part2, problem3_part1, problem3_part2,etc.,)
1 8 a ...
2 9 b ...
3 10 c ...
4 11 d ...
5 12 e ...
6 13 f ...
7 14 g ...
I was using:
paste sample_problem1_part1.txt sample_problem1_part2.txt > \
sample_problem1_partall.txt
paste sample_problem2_part1.txt sample_problem2_part2.txt > \
sample_problem2_partall.txt
paste sample_problem3_part1.txt sample_problem3_part2.txt > \
sample_problem3_partall.txt
And then:
for i in `find . -name "sample_problem*_partall.txt"`
do
    l=`echo $i | sed 's/sample_/extracted_col_/'`
    awk '{print $5, $10}' $i > $l
done
And:
paste extracted_col_problem1_partall.txt \
extracted_col_problem2_partall.txt \
extracted_col_problem3_partall.txt > \
extracted_col_problemall_partall.txt
It works fine with a few files, but it's a crazy method when the number of files is large (over 4000).
Could anyone help me with simpler solutions that are capable of dealing with multiple files, please?
Thanks!
Here's one way using awk and a sorted glob of files:
awk '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $5 } END { for(i=1;i<=FNR;i++) print a[i] }' $(ls -1v *)
Results:
1 8 a
2 9 b
3 10 c
4 11 d
5 12 e
6 13 f
7 14 g
Explanation:
For each line of input of each input file:
Add the file's line number to an array with a value of column 5.
(a[FNR] ? a[FNR] FS : "") is a ternary operation, which is set up to build up the array's value as a record. It simply asks if the file's line number is already in the array. If so, add the array's value followed by the default field separator before adding the fifth column. Else, if the line number is not in the array, don't prepend anything, just let it equal the fifth column.
At the end of the script:
Use a C-style loop to iterate through the array, printing each of the array's values.
For only ~4000 files, you should be able to do:
find . -name 'sample_problem*_part*.txt' | xargs paste
If find is giving names in the wrong order, pipe it to sort:
find . -name 'sample_problem*_part*.txt' | sort ... | xargs paste
# print filenames in sorted order
find -name sample\*.txt | sort |
# extract the 5th column from each file and print it on a single line
xargs -n1 -I{} sh -c '{ cut -s -d " " -f 5 $0 | tr "\n" " "; echo; }' {} |
# transpose
python3 transpose.py '?'
where transpose.py is:
#!/usr/bin/env python3
"""Write lines from stdin as columns to stdout."""
import sys
from itertools import zip_longest

missing_value = sys.argv[1] if len(sys.argv) > 1 else '-'
for row in zip_longest(*[column.split() for column in sys.stdin],
                       fillvalue=missing_value):
    print(" ".join(row))
Output
1 8 a
2 9 b
3 10 c
4 11 d
5 ? e
6 ? f
? ? g
This output assumes the first and second files have fewer lines than the third one (missing values are replaced by '?').
Try this one. My script assumes that every file has the same number of lines.
# get number of lines
lines=$(wc -l < sample_problem1_part1.txt)
for ((i=1; i<=lines; i++)); do
    for file in sample_problem*; do
        # get line number $i and delete everything except the last column
        # and then print it
        # echo -n means that no newline is appended
        echo -n "$(sed -n "${i}"'s%.*\ %%p' "$file") "
    done
    echo
done
This works. For 4800 files, each 7 lines long, it took 2 minutes 57.865 seconds on an AMD Athlon(tm) X2 Dual Core Processor BE-2400.
PS: The time for my script increases linearly with the number of lines. It would take a very long time to merge files with 1000 lines. You should consider learning awk and using the script from steve. I tested it: for 4800 files, each with 1000 lines, it took only 65 seconds!
You can pass awk output to paste and redirect it to a new file as follows:
paste <(awk '{print $3}' file1) <(awk '{print $3}' file2) <(awk '{print $3}' file3) > file.txt
I would like to have your advice/help on how to subset a big file (millions of rows or lines).
For example,
(1)
I have a big file (millions of rows, tab-delimited). I want a subset of this file with only rows 10000 to 100000.
(2)
I have a big file (millions of columns, tab-delimited). I want a subset of this file with only columns 10000 to 100000.
I know there are tools like head, tail, cut, split, and awk or sed. I can use them to do simple subsetting. But, I do not know how to do this job.
Could you please give any advice? Thanks in advance.
Filtering rows is easy, for example with AWK:
cat largefile | awk 'NR >= 10000 && NR <= 100000 { print }'
Filtering columns is easier with cut (tab is its default delimiter, so no -d option is needed):
cat largefile | cut -f 10000-100000
As Rahul Dravid mentioned, cat is not a must here, and as Zsolt Botykai added you can improve performance using:
awk 'NR > 100000 { exit } NR >= 10000 && NR <= 100000' largefile
cut -f 10000-100000 largefile
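As a small sanity check of the column selection, the sketch below runs cut on a made-up five-column, tab-separated line. It relies on tab being cut's default field delimiter; passing the two-character string '\t' to -d is rejected by GNU cut, which wants a single character.

```shell
# Select fields 2 through 4 of a tab-separated line; no -d is needed for tabs.
out=$(printf 'a\tb\tc\td\te\n' | cut -f 2-4)
printf '%s\n' "$out"
```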
Some different solutions:
For row ranges:
In sed :
sed -n 10000,100000p somefile.txt
For column ranges in awk:
awk -v f=10000 -v t=100000 '{ for (i=f; i<=t;i++) printf("%s%s", $i,(i==t) ? "\n" : OFS) }' details.txt
For the first problem, selecting a set of rows from a large file, piping tail to head is very simple. You want rows 10000 through 100000 of largefile, which is 90001 rows. tail outputs largefile starting at row 10000, and head then keeps only the first 90001 of those rows.
tail -n +10000 largefile | head -n 90001 -
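The row-range approaches can be checked against each other on a small input. The sketch below uses seq as a stand-in for the large file and rows 10 to 20 as a scaled-down range; an inclusive range from row 10 to row 20 holds 11 rows, so head needs end - start + 1.

```shell
# Three ways to select rows 10 through 20 (inclusive) of a 200-line input.
a=$(seq 200 | awk 'NR > 20 { exit } NR >= 10')
b=$(seq 200 | sed -n '10,20p')
c=$(seq 200 | tail -n +10 | head -n 11)

printf '%s\n' "$a" | wc -l
[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "all three agree"
```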
I was beaten to it for the sed solution, so I'll post a Perl equivalent instead.
To print selected lines.
$ seq 100 | perl -ne 'print if $. >= 10 && $. <= 20'
10
11
12
13
14
15
16
17
18
19
20
To print selected columns, use an array slice:
perl -lane 'print join " ", @F[1..3]'
-F is used in conjunction with -a to choose the delimiter on which to split lines.
To test, use seq and paste to generate some columns:
$ seq 50 | paste - - - - -
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
31 32 33 34 35
36 37 38 39 40
41 42 43 44 45
46 47 48 49 50
Let's print everything except the first and the last column:
$ seq 50 | paste - - - - - | perl -lane 'print join " ", @F[1..3]'
2 3 4
7 8 9
12 13 14
17 18 19
22 23 24
27 28 29
32 33 34
37 38 39
42 43 44
47 48 49
The separator in the join call above is a literal tab; you can type it with Ctrl-V followed by Tab.