reverse diff functionality - linux

I have two files, a.txt and b.txt.
I would like to produce output ONLY when the files are the same.
So:
if the files are the same -> report
if the files are different -> do not report anything
I know diff has a -s option which reports when the files are the same, but it also reports when they are different (and I want no report at all when the files differ).
Oh, one more thing: I am not able to install anything additional.

You tagged your question linux and batch-file, which is contradictory. Here is a batch-file solution:
fc file1 file2 >nul && (echo same) || (echo different)
"to not report when files are different", just skip the || (echo different) part

Chances are, if you have diff available, you will have grep available too. So pipe the diff output through grep to check which result you get, and act accordingly. diff -qs will output "Files a.txt and b.txt are identical" or "Files a.txt and b.txt differ". So you can check for the presence of "identical" in the output to find your case.
if diff -qs a.txt b.txt | grep -q identical; then
echo "Files are identical. Reporting"
else
# Do nothing
fi
Or as a one-liner:
(diff -qs a.txt b.txt | grep -q identical) && echo "Files are identical."
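If grep happens to be unavailable as well, diff's exit status already encodes the result (0 when the files are identical), so a sketch along the same lines without grep would be:
diff -q a.txt b.txt > /dev/null && echo "Files are identical."
The redirect discards the "differ" message that diff -q prints when the files are not identical.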

Related

Bash script to move first N files with specific name

I'm trying to move only 100 files with a specific extension (from the current directory to the parent directory), but the following attempt of mine does not work:
for file in $(ls -U | grep *.txt | tail -100)
do
mv $file ../
done
Can you point me to the correct approach?
Since you didn't quote *.txt, the shell expanded it to all the filenames ending in .txt. So your command is something like:
ls -U | grep file1.txt file2.txt file3.txt ... | tail -100
Since grep has filename arguments, it ignores its standard input. It outputs all the lines matching file1.txt in the remaining files. There are probably no matches, so nothing is piped to tail -100. And even if there were matches, the output would be the lines from the files, not filenames, so it wouldn't be useful for the mv command.
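You can see the expansion for yourself with echo (a hypothetical illustration, assuming the directory contains a.txt and b.txt):
$ echo grep *.txt
grep a.txt b.txt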
You can loop over the filenames directly, and use a counter variable to stop after 100 files.
counter=0
for file in *.txt
do
    if (( counter >= 100 )); then
        break
    fi
    mv "$file" ../
    ((counter++))
done
This avoids the pitfalls of parsing the output of ls.
This will do the job:
ls -U *.txt | tail -100 | while read filename; do mv "$filename" ../; done
The while read filename loop respects spaces in filenames (unlike the unquoted command substitution above); using read -r would additionally preserve backslashes.
Run this in the text file directory:
#!/bin/bash
for txt_file in ./*.txt; do
    ((c++==100)) && break   # post-increment: break once 100 files have already been moved
    mv "$txt_file" ../
done

Linux: Overwrite all files in folder with specified data?

I have a question: can Linux overwrite all the files in a specified folder with some data?
I have multiple files in a folder:
file1.mp4, file2.mp3, file3.sh, file4.jpg
Each contains some data (music, videos, etc.),
and I want to automatically overwrite these files with custom data (for example, a dummy file).
You can use tee
tee - read from standard input and write to standard output and files
$ echo "writing to file" > file1
$ echo "writing something else to all files" | tee file1 file2 file3
$ head *
==> file1 <==
writing something else to all files
==> file2 <==
writing something else to all files
==> file3 <==
writing something else to all files
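To overwrite every file in the folder in one go, the same idea works with a glob (a minimal sketch, assuming the replacement data is in a file named dummyfile):
$ cat dummyfile | tee folder/* > /dev/null
The > /dev/null just discards tee's copy of the data on standard output.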
With the cat command:
for f in folder/*; do cat dummyfile > "$f"; done

Concatenation of huge number of selective files from a directory in Shell

I have more than 50000 files in a directory, such as file1.txt, file2.txt, ....., file50000.txt. I would like to concatenate some of these files, whose file numbers are listed in the following text file (need.txt).
need.txt
1
4
35
45
71
.
.
.
I tried the following. Though it is working, I am looking for a simpler and shorter way.
n1=1
n2=$(wc -l < need.txt)
while [ $n1 -le $n2 ]
do
f1=$(awk -v n="$n1" 'NR==n {print $1}' need.txt)
cat file$f1.txt >> out.txt
(( n1++ ))
done
This might also work for you:
sed 's/.*/file&.txt/' < need.txt | xargs cat > out.txt
Something like this should work for you:
sed -e 's/.*/file&.txt/' need.txt | xargs cat > out.txt
It uses sed to translate each line into the corresponding file name and then passes the filenames to xargs, which hands them to cat.
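For example, with the need.txt shown in the question, the sed step alone would produce (a hypothetical illustration):
$ sed 's/.*/file&.txt/' need.txt
file1.txt
file4.txt
file35.txt
file45.txt
file71.txt
...
The & in the replacement stands for the whole matched line, i.e. the number.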
Using awk it could be done this way:
awk 'NR==FNR{ARGV[ARGC]="file"$1".txt"; ARGC++; next} {print}' need.txt > out.txt
This adds each file to awk's ARGV array of files to process, and the trailing {print} block then prints every line of those files.
It is possible to do it without any sed or awk command, using only the shell and cat (of course).
for i in $(cat need.txt); do cat file${i}.txt >> out.txt; done
And, as you wanted, it is quite simple.
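A closely related variant (a sketch, not from any of the answers above) reads the numbers line by line instead of word-splitting the output of cat:
while read -r i; do cat "file${i}.txt"; done < need.txt > out.txt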

save the output of a bash file

I have some files in a folder, and I need the first line of each file:
transaction1.csv
transaction2.csv
transaction3.csv
transaction4.csv
and I have the following code:
#All folders that begin with the word transaction
folder='"transaction*"'
ls `echo $folder |sed s/"\""/\/g` >testFiles
# The number of lines of testFiles that is the number of transaction files
num=`cat testFiles | wc -l`
for i in `seq 1 $num`
do
#The first transaction file
b=`cat testFiles | head -1`
#The first line of the first transaction file
cat `echo $b` | sed -n 1p
#remove the first line of the testFiles
sed -i '1d' testFiles
done
This code works; the problem is that I need to save the first line of each file into a file,
and if I change the line to:
cat `echo $b` | sed -n 1p > salida
it does not work =(
In bash:
for file in *.csv; do head -1 "$file" >> salida; done
As Adam mentioned in the comment, this has the overhead of opening the output file on each iteration. For better performance and reliability, redirect once, after the loop:
for file in *.csv; do head -1 "$file" ; done > salida
head -qn1 *.csv
head -n1 will print the first line of each file, and -q will suppress the header when more than one file is given on the command-line.
=== Edit ===
If the files are not raw text (for example, if they're compressed with "bzip2" as mentioned in your comment) and you need to do some nontrivial preprocessing on each file, you're probably best off going with a for loop. For example:
for f in *.csv.bz2 ; do
bzcat "$f" | head -n1
done > salida
(Another option would be to bunzip2 the files and then head them in two steps, such as bunzip2 *.csv.bz2 && head -qn1 *.csv > salida; however, this will of course change the files in place by decompressing them, which is probably undesirable.)
This awk one-liner should do what you want:
awk 'FNR==1{print > "output"}' *.csv
The first line of each csv will be saved into the file output.
Using sed:
for f in *.csv; do sed -n "1p" "$f"; done >salida

Problems with Grep Command in bash script

I'm having some rather unusual problems using grep in a bash script. Below is an example of the bash script code that I'm using that exhibits the behaviour:
UNIQ_SCAN_INIT_POINT=1
cat "$FILE_BASENAME_LIST" | uniq -d >> $UNIQ_LIST
sed '/^$/d' $UNIQ_LIST >> $UNIQ_LIST_FINAL
UNIQ_LINE_COUNT=`wc -l $UNIQ_LIST_FINAL | cut -d \ -f 1`
while [ -n "`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`" ]; do
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
CURRENT_DUPECHK_FILE=$FILE_DUPEMATCH-$CURRENT_LINE
grep $CURRENT_LINE $FILE_LOCTN_LIST >> $CURRENT_DUPECHK_FILE
MATCH=`grep -c $CURRENT_LINE $FILE_BASENAME_LIST`
CMD_ECHO="$CURRENT_LINE matched $MATCH times," cmd_line_echo
echo "$CURRENT_DUPECHK_FILE" >> $FILE_DUPEMATCH_FILELIST
let UNIQ_SCAN_INIT_POINT=UNIQ_SCAN_INIT_POINT+1
done
On numerous occasions, when grepping for the current line in the file location list, it has put no output to the current dupechk file even though there have definitely been matches to the current line in the file location list (I ran the command in terminal with no issues).
I've rummaged around the internet to see if anyone else has had similar behaviour, and thus far all I have found is that it has something to do with buffered and unbuffered outputs from other commands operating before the grep command in the Bash script...
However, no one seems to have found a solution, so basically I'm asking you guys whether you have ever come across this, and whether you have any ideas/tips/solutions for this problem...
Regards
Paul
The `problem' is the standard I/O library. When it is writing to a terminal
it is unbuffered, but if it is writing to a pipe then it sets up buffering.
try changing
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
to
CURRENT_LINE=`sed "$UNIQ_SCAN_INIT_POINT"'q;d' $UNIQ_LIST_FINAL`
Are there any directories with spaces in their names in $FILE_LOCTN_LIST? Because if there are, those spaces will need to be escaped somehow. Some combination of find and xargs can usually deal with that for you, especially xargs -0.
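In the same spirit, one low-effort thing to try (an assumption, since the rest of the script isn't shown) is to quote the variables in the grep calls so spaces in $CURRENT_LINE or in the list files survive word splitting:
grep "$CURRENT_LINE" "$FILE_LOCTN_LIST" >> "$CURRENT_DUPECHK_FILE"
MATCH=`grep -c "$CURRENT_LINE" "$FILE_BASENAME_LIST"`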
A small bash script using md5sum and sort that detects duplicate files in the current directory:
CURRENT="" md5sum * |
sort |
while read md5sum filename;
do
[[ $CURRENT == $md5sum ]] && echo $filename is duplicate;
CURRENT=$md5sum;
done
You tagged linux, so I assume you have tools like GNU find, md5sum, uniq, sort, etc. Here's a simple example to find duplicate files:
$ echo "hello world">file
$ md5sum file
6f5902ac237024bdd0c176cb93063dc4 file
$ cp file file1
$ md5sum file1
6f5902ac237024bdd0c176cb93063dc4 file1
$ echo "blah" > file2
$ md5sum file2
0d599f0ec05c3bda8c3b8a68c32a1b47 file2
$ find . -type f -exec md5sum "{}" \; |sort -n | uniq -w32 -D
6f5902ac237024bdd0c176cb93063dc4 ./file
6f5902ac237024bdd0c176cb93063dc4 ./file1
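Here uniq -w32 compares only the first 32 characters of each line (the md5 hash) and -D prints every member of each duplicate group. If you only want the paths, one small extension (a sketch, not part of the original answer) is to cut the hash column off:
$ find . -type f -exec md5sum "{}" \; | sort | uniq -w32 -D | cut -c35-
./file
./file1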
