Row count of each file in a `.zip` folder - linux

I have a zip archive with 5 text files inside. I have to check the row count of each file without unzipping the archive.
I tried zcat file.zip | wc -l but it gives the count of the first file only.
Can you help me get the result as shown below:
File_Name Rowcount
file1 100
file2 100
file3 100
file4 100
file5 100

If your file is a gzipped tar archive, then you can simply loop over each filename in the archive to get the number of lines in each. For example, if your archive contains:
$ tar -tzf /tmp/tmp-david/zipfile.tar.gz
yon.c
yourmachinecode.s
zeroonect.c
zeros
zz
You can loop over the filenames with:
$ for i in $(tar -tzf /tmp/tmp-david/zipfile.tar.gz); do
      printf "%8d lines - %s\n" $(wc -l <"$i") "$i"
  done
61 lines - yon.c
5 lines - yourmachinecode.s
63 lines - zeroonect.c
0 lines - zeros
1 lines - zz
You can keep a sum and increment with each count as required to get the total.
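Note that the loop above runs wc -l on copies of the files already extracted to the current directory. To count straight out of the archive and keep a running total, here is a self-contained sketch (it builds a small sample archive first, then streams each member with GNU tar's -O/--to-stdout option; the filenames are placeholders):

```shell
#!/bin/bash
set -e
cd "$(mktemp -d)"

# Build a small sample archive (stand-in for your real zipfile.tar.gz)
printf 'a\nb\n' > file1
printf 'c\nd\ne\n' > file2
tar -czf archive.tar.gz file1 file2
rm file1 file2          # prove we count from the archive, not extracted copies

total=0
for i in $(tar -tzf archive.tar.gz); do
    # -O (--to-stdout) streams the member's contents without extracting to disk
    n=$(tar -xzf archive.tar.gz -O "$i" | wc -l)
    printf "%8d lines - %s\n" "$n" "$i"
    total=$((total + n))
done
printf "%8d lines - total\n" "$total"
```

Like the loop above, the $(tar -tzf ...) word-splitting assumes the member names contain no whitespace.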
If your file is a .zip (MS-DOS) archive, then you can do the same thing, but the parsing of the individual filenames from the output of unzip -l takes a bit more work, e.g.
$ unzip -l /tmp/tmp-david/zipfile.zip | grep -v '^-' | \
tail -n+3 | head -n-1 | awk '{print $4}'
(you would use the above in a command substitution to drive the for loop)

Pipe each row of csv into bash command [duplicate]

I have a single column CSV file with no header and I want to iteratively find the value of each row and count the number of times it appears in several files.
Something like this:
for i in file.csv:
zcat *json.gz | grep i | wc -l
However, I don't know how to iterate through the csv and pass the values forward
Imagine that file.csv is:
foo,
bar
If foo exists 20 times in *json.gz and bar exists 30 times in *json.gz, I would expect the output of my command to be:
20
30
Here is the solution I found:
while IFS=',' read -r column; do
    count=$(zgrep -o "$column" *json.gz | wc -l)
    echo "$column,$count"
done < file.csv
You can achieve that with a single grep operation, treating file.csv as a patterns file (one pattern per line):
grep -f file.csv -oh *.json | wc -l
-o - to print only matched parts
-h - to suppress file names from the output
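A quick way to see what -o and -h do here, using the foo/bar sample from the question (the file names below are stand-ins):

```shell
#!/bin/bash
cd "$(mktemp -d)"
printf 'foo\nbar\n' > file.csv            # patterns, one per line
printf 'foo baz\nbar foo\n' > a.json      # sample data

# -o prints each match on its own line; -h suppresses the filename prefix
grep -f file.csv -oh a.json               # foo, bar, foo
grep -f file.csv -oh a.json | wc -l       # 3
```

Note this gives one grand total across all patterns; if you need one count per pattern (20 and 30 separately, as in the expected output), the while read loop above is still the way to go.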
You can iterate through the output of cat in a command substitution:
for i in $(cat file.csv)   # iterates through all the rows in file.csv
do echo "My value is $i"; done
(Note this splits on whitespace, so it only works cleanly for single-column values.)
using chatgpt :), try this:
#!/bin/bash
# Define the name of the CSV file
csv_file="path/to/file.csv"
# Extract the unique values (and how often each appears) from the first column of the CSV file
values=$(cut -d',' -f1 "$csv_file" | sort | uniq -c)
# Loop through each file
for file in path/to/file1 path/to/file2 path/to/file3
do
    # Compare the values and print the results
    while read -r count val
    do
        # Count whole-word occurrences of the value in the file
        file_count=$(grep -ow "$val" "$file" | wc -l)
        echo "$val appears $count times in $csv_file and $file_count times in $file"
    done <<< "$values"
done

Extracting the user with the most files in a dir

I am currently working on a script that should read a directory name from standard input and output the user with the highest number of files in that directory.
I've written this so far:
#!/bin/bash
while read DIRNAME
do
ls -l $DIRNAME | awk 'NR>1 {print $4}' | uniq -c
done
and this is the output I get when I enter /etc, for instance:
26 root
1 dip
8 root
1 lp
35 root
2 shadow
81 root
1 dip
27 root
2 shadow
42 root
Now obviously root is winning in this case, but I don't want to output just this; I also want to sum the number of files per user and output only the user with the highest count.
Expected output for entering /etc:
root
Is there a simple way to filter the output I get now, so that only the user with the highest sum is kept?
ls -l /etc | awk 'BEGIN{FS=OFS=" "}{a[$4]+=1}END{ for (i in a) print a[i],i}' | sort -g -r | head -n 1 | cut -d' ' -f2
This snippet returns the group with the highest number of files in the /etc directory.
What it does:
ls -l /etc lists all the files in /etc in long form.
awk 'BEGIN{FS=OFS=" "}{a[$4]+=1}END{ for (i in a) print a[i],i}' sums the number of occurrences of unique words in the 4th column and prints the number followed by the word.
sort -g -r sorts the output descending based on numbers.
head -n 1 takes the first line
cut -d' ' -f2 takes the second column while the delimiter is a white space.
Note: In your question, you are saying that you want the user with the highest number of files, but in your code you are referring to the 4th column which is the group. My code follows your code and groups on the 4th column. If you wish to group by user and not group, change {a[$4]+=1} to {a[$3]+=1}.
Without unreliable parsing the output of ls:
read -r dirname
# List user owner of files in dirname
stat -c '%U' "$dirname"/* |
# Sort the list of users by name
sort |
# Count occurrences of user
uniq -c |
# Sort by higher number of occurrences numerically
# (first column numerically reverse order)
sort -k1nr |
# Get first line only
head -n1 |
# Keep only starting at character 9 to get user name and discard counts
cut -c9-
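The same pipeline condensed to one line, with awk '{print $2}' in place of the fixed-width cut -c9- (the column position of uniq -c output is an implementation detail, so the awk form is a little more robust):

```shell
#!/bin/bash
# Most frequent file owner among the entries of /etc (GNU stat)
stat -c '%U' /etc/* | sort | uniq -c | sort -k1nr | head -n1 | awk '{print $2}'
```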
I have an awk script to read standard input (or command line files) and sum up the unique names.
summer:
awk '
{ sum[ $2 ] += $1 }
END {
    for ( v in sum ) {
        print v, sum[v]
    }
}
' "$@"
Let's say we are using your example of /etc:
ls -l /etc | summer
yields:
0
dip 2
shadow 4
root 219
lp 1
I like to keep utilities general so I can reuse them for other purposes. Now you can just use sort and head to get the maximum result output by summer:
ls -l /etc | summer | sort -r -k2,2 -n | head -1 | cut -f1 -d' '
Yields:
root

How can I combine one file's tail with another's head?

I know how to take e.g. the first 2 lines from a .txt file and append them to the end of another .txt file. But how should I add the last 2 lines of one .txt file before the first line of another .txt file?
I've tried :
tail -n 2 test1.txt >> head test1.txt # takes last 2 lines of text and adds them to the head
Looks awfully wrong but I can't find the answer anywhere, doing it with tail and head.
tail n 2 test1.txt >> head test1.txt
cat test1.txt
Someone please correct my code so I get my expected result.
Just run the two commands one after the other -- the stdout resulting from doing so will be exactly the same as what you'd get by concatenating their output together, without needing to do an explicit/extra concatenation step:
tail -n 2 test1.txt
head -n 1 test1.txt
If you want to redirect their output together, put them in a brace group:
{
tail -n 2 test1.txt
head -n 1 test1.txt
} >out.txt
What about:
$ cat file1.txt
file 1 line 1
file 1 line 2
file 1 line 3
file 1 line 4
$ cat file2.txt
file 2 line 1
file 2 line 2
file 2 line 3
file 2 line 4
$ tail -n 2 file1.txt > output.txt
$ head -n 1 file2.txt >> output.txt
$ cat output.txt
file 1 line 3
file 1 line 4
file 2 line 1

bash remove the same in file

I have an issue with counting the number of different strings.
I have two files, for example :
file1 :
aaa1
aaa4
bbb3
ccc2
and
file2:
bbb3
ccc2
aaa4
How can I get the value 1 from this (in this case, because of the string aaa1)?
I have one query, but it calculates not only the different strings; it also takes the order of the rows into account.
diff file1 file2 | grep "<" | wc -l
Thanks.
You can use grep -c and -v together with a few other options, like this:
grep -cvwFf file2 file1
1
Options used are:
-c - get the count of matches
-v - invert matches
-w - full word match (to avoid partial matches)
-F - fixed string match
-f - Use a file for matching patterns
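Run against the sample files from the question, a quick check (the printf lines just recreate file1 and file2 as shown above):

```shell
#!/bin/bash
cd "$(mktemp -d)"
printf 'aaa1\naaa4\nbbb3\nccc2\n' > file1
printf 'bbb3\nccc2\naaa4\n' > file2

# Count the lines of file1 that match no whole-word fixed string from file2
grep -cvwFf file2 file1   # 1
```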
As far as I understand your requirements, sorting the files prior to the diff is a quick solution:
sort file1 > file1.sorted
sort file2 > file2.sorted
diff file1.sorted file2.sorted | grep -E "[<>]" | wc -l
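An order-insensitive alternative that skips the temporary files, a sketch using bash process substitution with comm (which compares two sorted inputs line by line):

```shell
#!/bin/bash
cd "$(mktemp -d)"
printf 'aaa1\naaa4\nbbb3\nccc2\n' > file1
printf 'bbb3\nccc2\naaa4\n' > file2

# -2 suppresses lines unique to file2, -3 suppresses common lines,
# leaving only lines unique to file1 (both inputs must be sorted)
comm -23 <(sort file1) <(sort file2) | wc -l   # 1
```

Unlike the diff-based count, this counts only lines missing from file2; use comm -13 instead for lines that appear only in file2.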

wc -l is NOT counting the last line of the file if it does not have an end-of-line character

I need to count all lines of a unix file. The file has 3 lines but wc -l gives a count of only 2.
I understand that it is not counting the last line because it does not have an end-of-line character.
Could anyone please tell me how to count that line as well?
grep -c returns the number of matching lines. Just use an empty string "" as your matching expression:
$ echo -n $'a\nb\nc' > 2or3.txt
$ cat 2or3.txt | wc -l
2
$ grep -c "" 2or3.txt
3
It is better to have all lines ending with EOL \n in Unix files. You can do:
{ cat file; echo ''; } | wc -l
Or this awk:
awk 'END{print NR}' file
This approach will give the correct line count regardless of whether the last line in the file ends with a newline or not.
awk will make sure that, in its output, each line it prints ends with a new line character. Thus, to be sure each line ends in a newline before sending the line to wc, use:
awk '1' file | wc -l
Here, we use the trivial awk program that consists solely of the number 1. awk interprets this cryptic statement to mean "print the line" which it does, being assured that a trailing newline is present.
Examples
Let us create a file with three lines, each ending with a newline, and count the lines:
$ echo -n $'a\nb\nc\n' >file
$ awk '1' file | wc -l
3
The correct number is found.
Now, let's try again with the last new line missing:
$ echo -n $'a\nb\nc' >file
$ awk '1' file | wc -l
3
This still provides the right number. awk automatically corrects for a missing newline but leaves the file alone if the last newline is present.
Respect
I respect the answer from John1024 and would like to expand upon it.
Line Count function
I find myself comparing line counts A LOT especially from the clipboard, so I have defined a bash function. I'd like to modify it to show the filenames and when passed more than 1 file a total. However, it hasn't been important enough for me to do so far.
# semicolons used because this is condensed to 1 line in my ~/.bash_profile
function wcl(){
    if [[ -z "${1:-}" ]]; then
        set -- /dev/stdin "$@";
    fi;
    for f in "$@"; do
        awk 1 "$f" | wc -l;
    done;
}
Counting lines without the function
# Line count of the file
$ cat file_with_newline | wc -l
3
# Line count of the file
$ cat file_without_newline | wc -l
2
# Line count of the file unchanged by cat
$ cat file_without_newline | cat | wc -l
2
# Line count of the file changed by awk
$ cat file_without_newline | awk 1 | wc -l
3
# Line count of the file changed by only the first call to awk
$ cat file_without_newline | awk 1 | awk 1 | awk 1 | wc -l
3
# Line count of the file unchanged by awk because it ends with a newline character
$ cat file_with_newline | awk 1 | awk 1 | awk 1 | wc -l
3
Counting characters (why you don't want to put a wrapper around wc)
# Character count of the file
$ cat file_with_newline | wc -c
6
# Character count of the file unchanged by awk because it ends with a newline character
$ cat file_with_newline | awk 1 | awk 1 | awk 1 | wc -c
6
# Character count of the file
$ cat file_without_newline | wc -c
5
# Character count of the file changed by awk
$ cat file_without_newline | awk 1 | wc -c
6
Counting lines with the function
# Line count function used on stdin
$ cat file_with_newline | wcl
3
# Line count function used on stdin
$ cat file_without_newline | wcl
3
# Line count function used on filenames passed as arguments
$ wcl file_without_newline file_with_newline
3
3
