I have the following code to merge several tables in Linux based on the first column. However, I am looking for a way to combine several tables without matching on any column or row: I need all columns and rows of each table placed either side by side or beneath each other.
For example, if I have these three tables:
AA BB CC
25 40 20
13 36 19

DD EE
16 35
17 30

FF GG
15 35
17 38
So I would want this resulting table:
AA BB CC DD EE FF GG
25 40 20 16 35 15 35
13 36 19 17 30 17 38
I appreciate it if you can help me.
LANG=en_EN sort AFGEN_2018.txt | sed 's/  */\t/g' | cut -f 1 > tmp.tmp
for f in results/*.txt
do
    join tmp.tmp "$f" > tmpf
    mv tmpf tmp.tmp
done
mv tmp.tmp GSN_ALL.txt
cat GSN_ALL.txt
I found this code helpful in putting tables below each other (where *.txt is table 1, 2 and 3).
cat *.txt >Table.txt
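For the side-by-side case, the usual tool is paste, which joins corresponding lines of its input files. A minimal sketch, assuming the three tables are stored in hypothetical files table1.txt, table2.txt and table3.txt with the same number of rows; the function name side_by_side is made up for illustration:

```shell
#!/bin/bash
# side_by_side FILE...: print the given tables next to each other.
# The function name and the file names below are illustrative assumptions.
side_by_side() {
    # -d ' ' joins corresponding lines with a single space instead of a tab
    paste -d ' ' "$@"
}

# Example call (assumes these files exist):
# side_by_side table1.txt table2.txt table3.txt > Table.txt
```

If the files have different numbers of rows, paste leaves the missing cells empty, so the columns of later tables shift left on those rows.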
I have the following folder structure:
A/B/C/D/E/00
A/B/C/D/E/01
.
.
A/B/C/D/E/23
Similarly,
M/N/O/P/Q/00
M/N/O/P/Q/01
.
.
M/N/O/P/Q/23
Now, each folder from 00 to 23 has many files inside, which I would like to count.
If I run this simple command:
ls /A/B/C/D/E/00 | wc -l
I can get the count of files in each of these sub directories. I want to automate this or get it iteratively.
Also, the final output I am looking at is a file that should look like this:
C E RESULT OF ls /A/B/C/D/E/00 | wc -l RESULT OF ls /A/B/C/D/E/01 | wc -l
M Q RESULT OF ls /M/N/O/P/Q/00 | wc -l RESULT OF ls /M/N/O/P/Q/01 | wc -l
So, the output should look like this finally
C E 23 23 4 6 7 4 76 98 57 2 67 9 12 34 67 0 2 3 78 98 12 3 57 213
M Q 12 10 2 34 32 1 35 65 87 8 32 2 65 87 98 0 4 12 1 35 34 76 9 67
Please note that the values after the letters are the file counts of the 24 folders 00, 01 through 23.
Using the eval approach: I can hardcode and get the exact results. But, I wanted it in a way that would show me the data for the previous day. So this is what I did:
d=`date --date="1 days ago" +%Y%m%d`
month=`date +%Y%m`
eval echo YZ $d '"$(ls "/A/B/YZ/$month/$d/"'{20150800..20150823})'| wc -l)"'
This works perfectly because in the given location there are files inside child directories 20150800,20150801..20150823. However when I try to generalize this like below, it shows no such file or directory:
eval echo YZ $d '"$(ls "/A/B/YZ/$month/$d/"'{"$d"00.."$d"23})'| wc -l)"'
Is there something I am missing in the above line?
A very safe way of counting files:
find . -mindepth 1 -exec printf x \; | wc -c
To not count recursively add -maxdepth 1 before -exec.
Some other notes:
eval is evil. Don't use it. There is only one place I've ever seen where it's appropriate, and that's when using getopt.
You should not parse the output of ls.
Use $() for command substitutions.
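Putting these notes together, the per-directory counts can be produced without eval and without parsing ls. A rough sketch, assuming bash plus a find and seq that support the options used here; the function names count_entries and hourly_row are invented for illustration, and the labels (C E, M Q) are passed in by hand rather than derived from the path:

```shell
#!/bin/bash
# count_entries DIR: print the number of entries directly inside DIR.
# A missing directory counts as 0 (find's error message is discarded).
count_entries() {
    find "$1" -mindepth 1 -maxdepth 1 -exec printf x \; 2>/dev/null |
        wc -c | tr -d ' '
}

# hourly_row BASE LABEL...: print the labels followed by the entry counts
# of BASE/00, BASE/01, ... BASE/23 on a single line.
hourly_row() {
    local base=$1
    shift
    local row="$*" h
    for h in $(seq -w 0 23); do
        row+=" $(count_entries "$base/$h")"
    done
    printf '%s\n' "$row"
}

# Example calls (assume the directory layout from the question):
# hourly_row /A/B/C/D/E C E >  counts.txt
# hourly_row /M/N/O/P/Q M Q >> counts.txt
```

For the dated layout, the same loop should work with a prefix such as "$base/$d$h" in place of "$base/$h", where d holds yesterday's date; that variant is an untested assumption about the directory names.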
Something like this (not tested):
for d in [A-Z]/[A-Z]/[A-Z]/[A-Z]/[A-Z]/[0-9][0-9]
do
    [[ -d $d ]] && echo "$d : $(ls "$d" | wc -l)"
done
Note that this gives an incorrect count if one of the file names contains a newline character.
I have a file test.lst whose contents are like below.
Using CYGWIN_NT-6.1-WOW64.
I need to select only those lines which do not end with 5.
12
23
45
56
45
23
09
12
99
100
0000
9999999
The output should be:
12
23
56
23
09
12
99
100
0000
9999999
With grep -v '5$' test.txt, I get the output below:
[2014-11-28 17:42.57] /drives/d/Shantanu/MyScript
[463615.PC172645] ➤ grep -v '5$' test.txt
12
23
45
56
45
23
09
12
99
100
0000
9999999
[2014-11-28 17:43.21]
Just grep them out:
grep -v '5$' file
This looks for lines ending with 5 ($ refers to the end of line). Then -v inverts the match.
For your input it returns:
12
23
56
23
09
12
99
100
0000
9999999
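If the same command keeps the lines ending in 5, as in the question, one possible cause (an assumption, since the file itself is not shown) is Windows line endings: each line then ends in a carriage return, so 5 is no longer the last character before the newline. A small sketch that reproduces the effect with made-up data and strips the carriage returns with tr before matching; strip_cr is a hypothetical helper name:

```shell
#!/bin/bash
# strip_cr: remove carriage returns from standard input (hypothetical helper).
strip_cr() {
    tr -d '\r'
}

# Made-up sample with CRLF line endings, not the asker's actual file.
sample=$'12\r\n45\r\n56\r\n'

# Without stripping, '5$' never matches because every line ends in \r,
# so this counts all three lines.
printf '%s' "$sample" | grep -v '5$' | wc -l

# After stripping, the line 45 is filtered out as intended.
printf '%s' "$sample" | strip_cr | grep -v '5$'
```

Running od -c on the first few lines of the real file would show whether \r characters are actually present.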
You could use inverse grep search or inverse search using sed like below:
sed -n '/5$/!p' test.txt
or
grep -v "5$" test.txt
You could use a negated character class.
grep '[^5]$' file
[^5]$ matches a final character that is anything other than 5. By default grep prints every line that contains a match.
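The negated character class is not an exact replacement for -v: [^5]$ needs at least one character to match, so empty lines are dropped, while grep -v '5$' keeps them. A short sketch with made-up input showing the difference:

```shell
#!/bin/bash
# Made-up input: two numbers, an empty line, and one more number.
input=$'12\n45\n\n56\n'

# Inverted match: keeps every line that does not end in 5,
# including the empty line.
inverted=$(printf '%s' "$input" | grep -v '5$' | wc -l)

# Negated class: keeps only lines that have a last character other than 5,
# so the empty line is dropped as well.
negated=$(printf '%s' "$input" | grep '[^5]$' | wc -l)

echo "grep -v keeps $inverted lines, the negated class keeps $negated lines"
```

For a file without empty lines, such as the one in the question, both commands select the same lines.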
I want to sort a file numerically by the second character of the second column only.
The sample file looks like this:
aa 19
aa 189
aa 167
ab 13
nd 23
at 32
ca 90
I expect a result like this:
ca 90
at 32
ab 13
nd 23
aa 167
aa 189
aa 19
I use the command sort -n -k 2.2,2.2 [filename].
But it shows me the result like this:
aa 167
aa 189
aa 19
ab 13
nd 23
at 32
ca 90
It is not the right answer. Does anybody know what's wrong with my command?
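The problem is the default field separator. Without -t, sort splits fields at the transition from a non-blank to a blank character, so the leading space belongs to the second field and position 2.2 is the first digit, not the second. Setting the delimiter explicitly:
sort -t ' ' -nk 2.2,2.2
works just fine. (The end position 2.2 matters: with -k 2.2 alone the key runs to the end of the line, so 167 would be compared as 67 rather than 6.)
Edit: my man page says that blanks separate fields by default. That is true, but the leading blanks stay part of the field unless you pass -b or set the separator with -t ' '.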
sort -t ' ' -k2.2,2.2 filename
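To see which character the key position selects, here is a small sketch on the sample data from the question. It assumes exactly one space between the columns; sort_by_second_digit is a made-up function name, and LC_ALL=C is set only to make the order of tied lines predictable:

```shell
#!/bin/bash
# sort_by_second_digit [FILE]: sort numerically on the 2nd character of field 2.
# Assumes fields are separated by exactly one space (hypothetical helper name).
sort_by_second_digit() {
    # -t ' ' makes the space a pure separator, so 2.2 is the second digit;
    # without -t the leading blank would count as character 1 of the field.
    LC_ALL=C sort -t ' ' -k 2.2,2.2n "$@"
}

# Sample data from the question, fed through standard input.
printf '%s\n' 'aa 19' 'aa 189' 'aa 167' 'ab 13' 'nd 23' 'at 32' 'ca 90' |
    sort_by_second_digit
```

Lines whose key characters are equal (here ab 13 and nd 23) are ordered by comparing the whole line, which is sort's last-resort comparison.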
I would like to have your advice/help on how to subset a big file (millions of rows or lines).
For example,
(1)
I have a big file (millions of rows, tab-delimited). I want a subset of this file with only rows 10000 to 100000.
(2)
I have a big file (millions of columns, tab-delimited). I want a subset of this file with only columns 10000 to 100000.
I know there are tools like head, tail, cut, split, and awk or sed. I can use them to do simple subsetting. But, I do not know how to do this job.
Could you please give any advice? Thanks in advance.
Filtering rows is easy, for example with AWK:
cat largefile | awk 'NR >= 10000 && NR <= 100000 { print }'
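Filtering columns is easier with CUT (tab is its default delimiter, so -d is not needed; -d '\t' in single quotes would pass two characters and be rejected):
cat largefile | cut -f 10000-100000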
As Rahul Dravid mentioned, cat is not a must here, and as Zsolt Botykai added you can improve performance using:
awk 'NR > 100000 { exit } NR >= 10000 && NR <= 100000' largefile
cut -f 10000-100000 largefile
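The two commands can be wrapped so that the range is a parameter instead of being hardcoded. A minimal sketch, assuming tab-delimited input; rows and cols are hypothetical helper names, not existing tools:

```shell
#!/bin/bash
# rows FROM TO [FILE]: print lines FROM..TO and stop reading after line TO.
rows() {
    local from=$1 to=$2
    shift 2
    awk -v from="$from" -v to="$to" 'NR > to { exit } NR >= from' "$@"
}

# cols FROM TO [FILE]: print tab-separated fields FROM..TO of every line.
cols() {
    local from=$1 to=$2
    shift 2
    cut -f "$from-$to" "$@"
}

# Example calls (largefile is the file name used in the answer above):
# rows 10000 100000 largefile > rows_subset.txt
# cols 10000 100000 largefile > cols_subset.txt
```

The early exit in rows matters for files with millions of lines, because awk stops as soon as the last wanted line has been printed instead of scanning the rest of the file.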
Some different solutions:
For row ranges:
In sed:
sed -n 10000,100000p somefile.txt
For column ranges in awk:
awk -v f=10000 -v t=100000 '{ for (i=f; i<=t;i++) printf("%s%s", $i,(i==t) ? "\n" : OFS) }' details.txt
For the first problem, selecting a set of rows from a large file, piping tail to head is very simple. You want rows 10000 through 100000 of largefile, which is 90001 rows counting both ends. tail outputs the back end of largefile starting at row 10000, and then head chops off all but the first 90001 rows.
tail -n +10000 largefile | head -n 90001 -
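The number passed to head is easy to get wrong by one, so it can be computed from the two endpoints instead. A small sketch; line_range is a made-up helper name:

```shell
#!/bin/bash
# line_range FROM TO [FILE]: print lines FROM..TO, both ends included.
line_range() {
    local from=$1 to=$2
    shift 2
    # tail -n +FROM starts output at line FROM;
    # head then keeps TO - FROM + 1 lines.
    tail -n "+$from" "$@" | head -n "$(( to - from + 1 ))"
}

# Example call (largefile is the file name used in the answer above):
# line_range 10000 100000 largefile > subset.txt
```

A quick check on generated input is seq 100 | line_range 10 20, which should print the eleven numbers 10 through 20.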
I was beaten to the sed solution, so I'll post a perl ditto instead.
To print selected lines.
$ seq 100 | perl -ne 'print if $. >= 10 && $. <= 20'
10
11
12
13
14
15
16
17
18
19
20
To print selected columns, use an array slice (note that $F[1] .. $F[3] would be a range over the field values, not over the indices):
perl -lane 'print "@F[1..3]"'
-F is used in conjunction with -a, to choose the delimiter on which to split lines.
To test, use seq and paste to generate some columns
$ seq 50 | paste - - - - -
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
31 32 33 34 35
36 37 38 39 40
41 42 43 44 45
46 47 48 49 50
Let's print everything except the first and the last column
$ seq 50 | paste - - - - - | perl -lane 'print join " ", @F[1..3]'
2 3 4
7 8 9
12 13 14
17 18 19
22 23 24
27 28 29
32 33 34
37 38 39
42 43 44
47 48 49
The separator in the join statement above is a literal tab, which you can type with Ctrl-V followed by Tab.