print the count of files for each sub folder iteratively - linux

I have the following folder structure:
A/B/C/D/E/00
A/B/C/D/E/01
.
.
A/B/C/D/E/23
Similarly,
M/N/O/P/Q/00
M/N/O/P/Q/01
.
.
M/N/O/P/Q/23
Now, each folder from 00 to 23 has many files inside, which I would like to count.
If I run this simple command:
ls /A/B/C/D/E/00 | wc -l
I can get the count of files in each of these sub directories. I want to automate this or get it iteratively.
Also, the final output I am looking at is a file that should look like this:
C E RESULT OF ls /A/B/C/D/E/00 | wc -l RESULT OF ls /A/B/C/D/E/01 | wc -l
M Q RESULT OF ls /M/N/O/P/Q/00 | wc -l RESULT OF ls /M/N/O/P/Q/01 | wc -l
So, the output should look like this finally
C E 23 23 4 6 7 4 76 98 57 2 67 9 12 34 67 0 2 3 78 98 12 3 57 213
M Q 12 10 2 34 32 1 35 65 87 8 32 2 65 87 98 0 4 12 1 35 34 76 9 67
Please note, the values after the letters are the file counts of the 24 folders 00, 01 through 23.
Using the eval approach, I can hardcode the dates and get the exact results. But I wanted it in a way that would show me the data for the previous day. So this is what I did:
d=`date --date="1 days ago" +%Y%m%d`
month=`date +%Y%m`
eval echo YZ $d '"$(ls "/A/B/YZ/$month/$d/"'{20150800..20150823}' | wc -l)"'
This works perfectly because in the given location there are files inside the child directories 20150800, 20150801, ..., 20150823. However, when I try to generalize it like below, it shows "no such file or directory":
eval echo YZ $d '"$(ls "/A/B/YZ/$month/$d/"'{"$d"00.."$d"23}' | wc -l)"'
Is there something I am missing in the above line?

A very safe way of counting files:
find . -mindepth 1 -exec printf x \; | wc -c
To not count recursively add -maxdepth 1 before -exec.
Some other notes:
eval is evil. Don't use it. There is only one place I've ever seen where it's appropriate, and that's when using getopt.
You should not parse the output of ls.
Use $() for command substitutions.
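As for the question itself: brace expansion runs before parameter expansion, so bash never sees {"$d"00.."$d"23} as a numeric sequence; the endpoints of {x..y} have to be literal. A plain loop sidesteps both that pitfall and eval entirely. A minimal sketch, assuming the two trees from the question (the base paths and label letters are illustrative):
#!/bin/bash
# One output line per tree: the two label letters, then the 24 hourly counts.
for spec in 'C E /A/B/C/D/E' 'M Q /M/N/O/P/Q'; do
    read -r l1 l2 base <<< "$spec"
    line="$l1 $l2"
    for h in {00..23}; do
        # printf x emits one byte per entry, so wc -c counts files safely.
        n=$(find "$base/$h" -mindepth 1 -maxdepth 1 -exec printf x \; | wc -c)
        line="$line $n"
    done
    printf '%s\n' "$line"
done > counts.txt
For the dated layout, the subdirectory would be "/A/B/YZ/$month/$d/$d$h", with d and month set from date as above.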

Related

How to put tables side by side in linux

I have the following code to merge several tables in Linux based on the first column. But I'm looking for a way to put several tables side by side, not merged on a column or row. I need all the columns and rows of each table kept intact, with the tables placed side by side or beneath each other.
For example, if I have these three tables:
AA BB CC
25 40 20
13 36 19

DD EE
16 35
17 30

FF GG
15 35
17 38
So I would want this resulting table:
AA BB CC DD EE FF GG
25 40 20 16 35 15 35
13 36 19 17 30 17 38
I appreciate it if you can help me.
LANG=en_EN sort AFGEN_2018.txt | sed 's/ */\t/g' | cut -f 1 > tmp.tmp
for f in results/*.txt
do
    join tmp.tmp "$f" > tmpf
    mv tmpf tmp.tmp
done
mv tmp.tmp GSN_ALL.txt
cat GSN_ALL.txt
I found this code helpful in putting tables below each other (where *.txt is table 1, 2 and 3).
cat *.txt >Table.txt
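For the side-by-side arrangement asked about, paste does it directly: it joins line N of every input file into one output line. A minimal sketch, assuming the three tables are whitespace-separated text in table1.txt, table2.txt and table3.txt (hypothetical names):
# Join corresponding lines of the three tables, separated by a space.
paste -d' ' table1.txt table2.txt table3.txt > Table.txt
With the example tables above, this produces the header row AA BB CC DD EE FF GG followed by the two data rows.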

Illogical number priority in file names in BASH [duplicate]

This question already has answers here:
How to loop over files in natural order in Bash?
(7 answers)
Closed 1 year ago.
It so happens that I wrote a script in BASH, part of which is supposed to take files from a specified directory in numerical order. Obviously, files in that directory are named as follows: 1, 2, 3, 4, 5, etc. The thing is, I discovered that while running this script with 10 files in the directory, something occurs that appears quite illogical to me: the script takes the files in a strange order: 10, 1, 2, 3, etc.
How do I make it go from the smallest numeric file name to the largest?
Also, I am using the following line of code to define loop and path:
for file in /dir/*
Don't know if it matters, but I'm using Fedora 33 as OS.
Directory entries are sorted in alphabetical order, so "10" comes before "2".
If I list 20 files whose names correspond to the 20 first integers, I get:
1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9
We can pipe the names through sort -n to sort them numerically rather than alphabetically. The following command:
for i in $(ls | sort -n) ; do echo $i ; done
produces the following output:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
i.e. your command:
for file in /dir/*
should be rewritten:
for file in "dir/"$(ls /dir/* | sort -n)
If you have GNU sort then use the -V flag.
for file in /dir/* ; do echo "$file" ; done | sort -V
Or store the data in an array.
files=(/dir/*); printf '%s\n' "${files[@]}" | sort -V
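Note that this prints the names in version order but leaves the array itself unsorted. To iterate in that order, the sorted output can be read back into an array. A small sketch, assuming none of the names contain newlines:
mapfile -t sorted < <(printf '%s\n' /dir/* | sort -V)
for file in "${sorted[@]}"; do echo "$file"; done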
As an aside, if you have the option and work once ahead of time is preferable to sorting every time, you could also format the names of your directories with leading zeroes. This is frequently a better design when possible.
I made both for some comparisons.
$: echo [0-9][0-9]/ # perfect list based on default string sort
00/ 01/ 02/ 03/ 04/ 05/ 06/ 07/ 08/ 09/ 10/ 11/ 12/ 13/ 14/ 15/ 16/ 17/ 18/ 19/ 20/
That also filters out any non-numeric names, and any non-directories.
$: for d in [0-9][0-9]/; do echo "${d%/}"; done
00
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
If I show both single- and double-digit versions (I made both)
$: shopt -s extglob
$: echo @(?|??)
0 00 01 02 03 04 05 06 07 08 09 1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9
Only the single-digit versions without leading zeroes get out of order.
The shell sorts the names by the locale order (not necessarily the byte value) of each individual character. Anything that starts with 1 will go before anything that starts with 2, and so on.
There are two main ways to tackle your problem:
sort -n (numeric sort) the file list, and iterate that.
Rename or recreate the target files (if you can), so all numbers are the same length (in bytes/characters). Left pad shorter numbers with 0 (eg. 01). Then they'll expand like you want.
Using sort (properly):
mapfile -td '' myfiles < <(printf '%s\0' * | sort -zn)
for file in "${myfiles[@]}"; do
    # what you were going to do
done
sort -z for zero/null-terminated lines is common but not POSIX. It makes processing paths/data that contain newlines safe. Without -z:
mapfile -t myfiles < <(printf '%s\n' * | sort -n)
# Rest is the same.
Rename the target files:
#!/bin/bash
cd /path/to/the/number/files || exit 1
# Gets length of the highest number. Or you can just hardcode it.
length=$(printf '%s\n' * | sort -n | tail -n 1)
length=${#length}
for i in *; do
mv -n "$i" "$(printf "%.${length}d" "$i")"
done
Examples for making new files with zero padded numbers for names:
touch {000..100} # Or
for i in {000..100}; do
> "$i"
done
If it's your script that made the target files, something like $(printf %.Nd [file]) can be used to left pad the names before you write to them. But you need to know the length in characters of the highest number first (N).
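For instance, with N=3 the precision pads to three digits with leading zeroes:
$: printf '%.3d\n' 7
007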

Print the count of files in a specific format iteratively in shell script

I have the following folder structure:
A/B/C/D/E/00
A/B/C/D/E/01
.
.
A/B/C/D/E/23
Similarly,
M/N/O/P/Q/00
M/N/O/P/Q/01
.
.
M/N/O/P/Q/23
Now, each folder from 00 to 23 has many files inside, which I would like to count.
If I run this simple command:
ls /A/B/C/D/E/00 | wc -l
I can get the count of files in each of these sub directories. I want to automate this or get it iteratively. Can anyone suggest a way?
Also, the final output I am looking at is a file that should look like this:
C E RESULT OF ls /A/B/C/D/E/00 | wc -l RESULT OF ls /A/B/C/D/E/01 | wc -l
M Q RESULT OF ls /M/N/O/P/Q/00 | wc -l RESULT OF ls /M/N/O/P/Q/01 | wc -l
So, the output should look like this finally
C E 23 23 4 6 7 4 76 98 57 2 67 9 12 34 67 0 2 3 78 98 12 3 57 213
M Q 12 10 2 34 32 1 35 65 87 8 32 2 65 87 98 0 4 12 1 35 34 76 9 67
Please note, the values after the letters are the file counts of the 24 folders 00, 01 through 23.
Using the eval approach, I can hardcode the dates and get the exact results. But I wanted it in a way that would show me the data for the previous day. So this is what I did:
d=`date --date="1 days ago" +%Y%m%d`
month=`date +%Y%m`
eval echo YZ $d '"$(ls "/A/B/YZ/$month/$d/"'{20150800..20150823}' | wc -l)"'
This works perfectly because in the given location there are files inside the child directories 20150800, 20150801, ..., 20150823. However, when I try to generalize it like below, it gives me the total count for the folder instead of the count of each sub folder:
eval echo YZ $d '"$(ls "/A/B/YZ/$month/$d/"'{"$d"00.."$d"23}' | wc -l)"'
Something like this (not tested):
for d in [A-Z]/[A-Z]/[A-Z]/[A-Z]/[A-Z]/[0-9][0-9]
do
    [[ -d $d ]] && echo "$d : $(ls "$d" | wc -l)"
done
Note that this gives an incorrect count if one of the file names contains a newline character.
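As to why the eval attempt misbehaves: the endpoints of {x..y} must be literal, since brace expansion runs before parameter expansion, so {"$d"00.."$d"23} never expands to a sequence. And if the newline edge case above matters, the byte-counting trick from the first answer drops straight into the loop; a sketch along the same lines:
for d in [A-Z]/[A-Z]/[A-Z]/[A-Z]/[A-Z]/[0-9][0-9]
do
    # printf emits one x per entry; wc -c is immune to newlines in names.
    [[ -d $d ]] && echo "$d : $(find "$d" -mindepth 1 -maxdepth 1 -exec printf x \; | wc -c)"
done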

How to identify lines ending with 5 in a file

I have a file test.lst whose contents are like below.
Using CYGWIN_NT-6.1-WOW64.
I need to select only those lines which do not end with 5.
12
23
45
56
45
23
09
12
99
100
0000
9999999
The output should be:
12
23
56
23
09
12
99
100
0000
9999999
With grep -v '5$' test.txt, I am getting the output below:
[2014-11-28 17:42.57] /drives/d/Shantanu/MyScript
[463615.PC172645] ➤ grep -v '5$' test.txt
12
23
45
56
45
23
09
12
99
100
0000
9999999
[2014-11-28 17:43.21]
Just grep them out:
grep -v '5$' file
This looks for lines ending with 5 ($ refers to the end of line). Then -v inverts the match.
For your input it returns:
12
23
56
23
09
12
99
100
0000
9999999
You could use an inverted grep match, or an inverted match with sed, like below:
sed -n '/5$/!p' test.txt
or
grep -v "5$" test.txt
You could use a negated character class.
grep '[^5]$' file
[^5]$ matches a final character that is not 5; by default grep prints every line that has a match. Note this is not quite the same as grep -v '5$': an empty line has no final character at all, so the inverted grep keeps it while the character class drops it.
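One more thing worth checking here: the session is CYGWIN_NT-6.1-WOW64, and grep -v '5$' returned every line unchanged, which is the classic symptom of DOS (CRLF) line endings; each line then really ends in a carriage return, not in 5. That is an assumption about the file, but it is easy to test and work around. A sketch:
# Strip carriage returns first, then the end-of-line anchor behaves as expected.
tr -d '\r' < test.lst | grep -v '5$'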

How to extract one column from multiple files, and paste those columns into one file?

I want to extract the 5th column from multiple files, named in a numerical order, and paste those columns in sequence, side by side, into one output file.
The file names look like:
sample_problem1_part1.txt
sample_problem1_part2.txt
sample_problem2_part1.txt
sample_problem2_part2.txt
sample_problem3_part1.txt
sample_problem3_part2.txt
......
Each problem file (1,2,3...) has two parts (part1, part2). Each file has the same number of lines.
The content looks like:
sample_problem1_part1.txt
1 1 20 20 1
1 7 21 21 2
3 1 22 22 3
1 5 23 23 4
6 1 24 24 5
2 9 25 25 6
1 0 26 26 7
sample_problem1_part2.txt
1 1 88 88 8
1 1 89 89 9
2 1 90 90 10
1 3 91 91 11
1 1 92 92 12
7 1 93 93 13
1 5 94 94 14
sample_problem2_part1.txt
1 4 330 30 a
3 4 331 31 b
1 4 332 32 c
2 4 333 33 d
1 4 334 34 e
1 4 335 35 f
9 4 336 36 g
The output should look like: (in a sequence of problem1_part1, problem1_part2, problem2_part1, problem2_part2, problem3_part1, problem3_part2,etc.,)
1 8 a ...
2 9 b ...
3 10 c ...
4 11 d ...
5 12 e ...
6 13 f ...
7 14 g ...
I was using:
paste sample_problem1_part1.txt sample_problem1_part2.txt > \
sample_problem1_partall.txt
paste sample_problem2_part1.txt sample_problem2_part2.txt > \
sample_problem2_partall.txt
paste sample_problem3_part1.txt sample_problem3_part2.txt > \
sample_problem3_partall.txt
And then:
for i in `find . -name "sample_problem*_partall.txt"`
do
    l=`echo $i | sed 's/sample/extracted_col/'`
    awk '{print $5, $10}' $i > $l
done
And:
paste extracted_col_problem1_partall.txt \
extracted_col_problem2_partall.txt \
extracted_col_problem3_partall.txt > \
extracted_col_problemall_partall.txt
It works fine with a few files, but it's a crazy method when the number of files is large (over 4000).
Could anyone help me with simpler solutions that are capable of dealing with multiple files, please?
Thanks!
Here's one way using awk and a sorted glob of files:
awk '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $5 } END { for(i=1;i<=FNR;i++) print a[i] }' $(ls -1v *)
Results:
1 8 a
2 9 b
3 10 c
4 11 d
5 12 e
6 13 f
7 14 g
Explanation:
For each line of input of each input file:
Use the file's line number (FNR) as the array key, with column 5 as the value.
(a[FNR] ? a[FNR] FS : "") is a ternary expression, used here to build each array value up as a record. It simply asks whether the file's line number is already in the array. If so, it prepends the existing value followed by the default field separator before adding the fifth column. Otherwise it prepends nothing, and the value is just the fifth column.
At the end of the script:
Use a C-style loop to iterate through the array, printing each of the array's values.
For only ~4000 files, you should be able to do:
find . -name "sample_problem*_part*.txt" | xargs paste
If find is giving names in the wrong order, pipe it to sort:
find . -name "sample_problem*_part*.txt" | sort ... | xargs paste
(If the argument list is long enough that xargs splits it, paste will run more than once, and the later batches will land below rather than beside the first.)
# print filenames in sorted order
find -name sample\*.txt | sort |
# extract 5-th column from each file and print it on a single line
xargs -n1 -I{} sh -c '{ cut -s -d " " -f 5 $0 | tr "\n" " "; echo; }' {} |
# transpose
python3 transpose.py '?'
where transpose.py:
#!/usr/bin/env python3
"""Write lines from stdin as columns to stdout."""
import sys
from itertools import zip_longest

missing_value = sys.argv[1] if len(sys.argv) > 1 else '-'
for row in zip_longest(*[column.split() for column in sys.stdin],
                       fillvalue=missing_value):
    print(" ".join(row))
Output
1 8 a
2 9 b
3 10 c
4 11 d
5 ? e
6 ? f
? ? g
Assuming the first and second files have fewer lines than the third one (missing values are replaced by '?').
Try this one. My script assumes that every file has the same number of lines.
# get number of lines
lines=$(wc -l sample_problem1_part1.txt | cut -d' ' -f1)
for ((i=1; i<=$lines; i++)); do
    for file in sample_problem*; do
        # get line number $i and delete everything except the last column
        # and then print it
        # echo -n means that no newline is appended
        echo -n $(sed -n ${i}'s%.*\ %%p' $file)" "
    done
    echo
done
This works. For 4800 files, each 7 lines long, it took 2 minutes 57.865 seconds on an AMD Athlon(tm) X2 Dual Core Processor BE-2400.
PS: The time for my script increases linearly with the number of lines. It would take a very long time to merge files with 1000 lines each. You should consider learning awk and using the script from steve. I tested it: for 4800 files, each with 1000 lines, it took only 65 seconds!
You can pass awk output to paste and redirect it to a new file as follows:
paste <(awk '{print $3}' file1) <(awk '{print $3}' file2) <(awk '{print $3}' file3) > file.txt
