How to compare filenames in two text files on Linux bash?

I have two lists list1 and list2 with a filename on each line. I want a result with all filenames that are only in list2 and not in list1, regardless of specific file extensions (but not all). Using Linux bash, any commands that do not require any extra installations. In the example lists, I do know all file extensions that I wish to ignore. I made an attempt but it does not work at all, I don't know how to fix it. Apologies for my inexperience.
I wish to ignore the following extensions:
.x
.xy
.yx
.y
.jpg
list1.txt
text.x
example.xy
file.yx
data.y
edit
edit.jpg
list2.txt
text
rainbow.z
file
data.y
sunshine
edit.test.jpg
edit.random
result.txt
rainbow.z
sunshine
edit.test.jpg
edit.random
My try:
while read LINE
do
line2=$LINE
sed -i 's/\.x$//g' $LINE $line2
sed -i 's/\.xy$//g' $LINE $line2
sed -i 's/\.yx$//g' $LINE $line2
sed -i 's/\.y$//g' $LINE $line2
then sed -i -e '$line' result.txt;
fi
done < list2.txt
Edit: I forgot two requirements. The filenames can have . in them, and not every filename has an extension. I know the extensions that must be ignored. I amended the lists accordingly.

An awk solution might be more efficient for this task:
awk '
{ f=$0; sub(/\.(xy?|yx?|jpg)$/,"",f) }
NR==FNR { a[f]; next }
!(f in a)
' list1.txt list2.txt > result.txt
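As a quick check, the awk one-liner above can be run against the sample lists from the question. This is only a sketch: the temporary directory and the result variable are introduced here for illustration.

```shell
# Recreate the question's sample lists in a scratch directory (hypothetical location).
tmp=$(mktemp -d)
printf '%s\n' text.x example.xy file.yx data.y edit edit.jpg > "$tmp/list1.txt"
printf '%s\n' text rainbow.z file data.y sunshine edit.test.jpg edit.random > "$tmp/list2.txt"

# f is the line with one ignorable extension stripped; names from list1 are
# remembered in a[], and list2 lines are printed unchanged when f is unknown.
result=$(awk '
{ f=$0; sub(/\.(xy?|yx?|jpg)$/,"",f) }
NR==FNR { a[f]; next }
!(f in a)
' "$tmp/list1.txt" "$tmp/list2.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because the original line is printed rather than the stripped key, entries such as edit.test.jpg keep their extension and the order of list2.txt is preserved.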

comm can do precisely this.
You can preprocess the input:
strip the suffixes
sort (comm expects sorted input)
remove duplicates
ss()( sed 's/\.\(x\|xy\|yx\|y\|jpg\)$//' "$@" | sort -u )
comm -13 <(ss list1.txt) <(ss list2.txt) >result.txt
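For comparison, here is a sketch of the same comm pipeline run on the question's sample lists. It uses temporary files instead of process substitution and sed -E instead of the \| alternation, both of which are substitutions made here for portability; LC_ALL=C is also an assumption, added to pin down the sort order.

```shell
tmp=$(mktemp -d)
printf '%s\n' text.x example.xy file.yx data.y edit edit.jpg > "$tmp/list1.txt"
printf '%s\n' text rainbow.z file data.y sunshine edit.test.jpg edit.random > "$tmp/list2.txt"

# Strip the ignorable suffixes, then sort and de-duplicate (comm needs sorted input).
ss() { sed -E 's/\.(x|xy|yx|y|jpg)$//' "$@" | LC_ALL=C sort -u; }

ss "$tmp/list1.txt" > "$tmp/a"
ss "$tmp/list2.txt" > "$tmp/b"

# -13 suppresses columns 1 and 3, leaving lines found only in the second file.
only_in_list2=$(LC_ALL=C comm -13 "$tmp/a" "$tmp/b")
printf '%s\n' "$only_in_list2"
rm -rf "$tmp"
```

Unlike the awk version, the output here consists of the stripped, sorted names, so edit.test.jpg appears as edit.test.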
Your code was:
while read LINE
do
line2=$LINE
sed -i 's/\.x$//g' $LINE $line2
sed -i 's/\.xy$//g' $LINE $line2
sed -i 's/\.yx$//g' $LINE $line2
sed -i 's/\.y$//g' $LINE $line2
then sed -i -e '$line' result.txt;
fi
done < list2.txt
Some issues that immediately jump out:
syntax error - then/fi but no matching if
you never access list1
you don't quote variables when you use them, so whitespace and special characters will cause problems
while read ... sed ... sed ... sed ... is inefficient - multiple invocations of sed instead of just one, and a loop that sed would perform implicitly
sed expects file arguments not strings
sed -i will try to overwrite input file arguments
you use result.txt as both input and output to sed but never assign any contents to it
you try to use data ($line) as sed commands, instead of applying sed commands to that data
because you used single-quotes, sed -i -e '$line' will attempt to run a (non-existent) sed command line on the last line of input ($)
g option to s/// does nothing when search is anchored
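Putting those points together, one possible repair of the original loop might look like the following sketch. It is not the only way to do it: the strip_ext helper, the stripped1 scratch file and the use of the sample data are assumptions made for illustration.

```shell
tmp=$(mktemp -d)
printf '%s\n' text.x example.xy file.yx data.y edit edit.jpg > "$tmp/list1.txt"
printf '%s\n' text rainbow.z file data.y sunshine edit.test.jpg edit.random > "$tmp/list2.txt"

# Hypothetical helper: drop one trailing ignorable extension, if present.
strip_ext() {
    case $1 in
        *.x|*.xy|*.yx|*.y|*.jpg) printf '%s\n' "${1%.*}" ;;
        *) printf '%s\n' "$1" ;;
    esac
}

# Strip list1 once, up front, so the loop can look names up in it.
while IFS= read -r line; do
    strip_ext "$line"
done < "$tmp/list1.txt" > "$tmp/stripped1"

# Keep each list2 line whose stripped name is not a whole line (-x) of stripped1.
while IFS= read -r line; do
    base=$(strip_ext "$line")
    grep -Fxq -- "$base" "$tmp/stripped1" || printf '%s\n' "$line"
done < "$tmp/list2.txt" > "$tmp/result.txt"

loop_result=$(cat "$tmp/result.txt")
printf '%s\n' "$loop_result"
rm -rf "$tmp"
```

Running grep once per line is still slower than a single awk or comm invocation, so this is mainly useful for seeing how the loop was meant to work.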

I'd use join:
$ join -t. -j1 -v2 -o 2.1,2.2 <(sort list1.txt) <(sort list2.txt) | sed 's/\.$//'
rainbow.z
sunshine
(The bit of sed is needed to turn sunshine. into sunshine)

Related

How to apply my sed command to some lines of all my files?

I've 95 files that looks like :
2019-10-29-18-00/dev/xx;512.00;0.4;/var/x/xx/xxx
2019-10-29-18-00/dev/xx;512.00;0.68;/xx
2019-10-29-18-00/dev/xx;512.00;1.84;/xx/xx/xx
2019-10-29-18-00/dev/xx;512.00;80.08;/opt/xx/x
2019-10-29-18-00/dev/xx;20480.00;83.44;/var/x/x
2019-10-29-18-00/dev/xx;3584.00;840.43;/var/xx/x
2019-10-30-00-00/dev/xx;2048.00;411.59;/
2019-10-30-00-00/dev/xx;7168.00;6168.09;/usr
2019-10-30-00-00/dev/xx;3072.00;1036.1;/var
2019-10-30-00-00/dev/xx;5120.00;348.72;/tmp
2019-10-30-00-00/dev/xx;20480.00;2033.19;/home
2019-10-30-12-00;/dev/xx;5120.00;348.72;/tmp
2019-10-30-12-00;/dev/hd1;20480.00;2037.62;/home
2019-10-30-12-00;/dev/xx;512.00;0.43;/xx
2019-10-30-12-00;/dev/xx;3584.00;794.39;/xx
2019-10-30-12-00;/dev/xx;512.00;0.4;/var/xx/xx/xx
2019-10-30-12-00;/dev/xx;512.00;0.68;/xx
2019-10-30-12-00;/dev/xx;512.00;1.84;/var/xx/xx
2019-10-30-12-00;/dev/xx;512.00;80.08;/opt/xx/x
2019-10-30-12-00;/dev/xx;20480.00;83.44;/var/xx/xx
2019-10-30-12-00;/dev/x;3584.00;840.43;/var/xx/xx
For some lines I've 2019-10-29-18-00/dev and for some other lines, I've 2019-10-30-12-00;/dev/
I want to add the ; before the /dev/ where it is missing, so for that I use this sed command :
sed 's/\/dev/\;\/dev/'
But How I can apply this command for each lines where the ; is missing ? I try this :
for i in $(cat /home/xxx/xxx/xxx/*.txt | grep -e "00/dev/")
do
sed 's/\/dev/\;\/dev/' $i > $i
done
But it doesn't work... Can you help me ?

Could you please try the following with GNU awk, if you are OK with it.
awk -i inplace '/00\/dev\//{gsub(/00\/dev\//,"00;/dev/")} 1' *.txt
sed solution: Tested with GNU sed for few files and it worked fine.
sed -i.bak '/00\/dev/s/00\/dev/00\;\/dev/g' *.txt

This might work for you (GNU sed & parallel):
parallel -q sed -i 's#;*/dev#;/dev#' ::: *.txt
or if you prefer:
sed -i 's#;*/dev#;/dev#' *.txt

Ignore lines with ;/dev.
sed '/;\/dev/{p;d}; s^/dev^;/dev^'
The /;\/dev/ address checks whether the line already has ;/dev. If it does, p prints the current line and d deletes the pattern space and starts the next cycle, so the substitution is skipped for that line.
You can use almost any character as the delimiter of the s command in sed. Also, there is no need to escape ; as \;, a plain ; is enough.
How I can apply this command for each lines where the ; is missing ? I try this
Don't read a file and redirect the output to that same file ($i > $i). The shell opens and truncates the output file before sed even starts, so sed reads an empty file and the result is, in most cases, an empty file. Use a temporary file, sed ... "$i" > temp.txt; mv temp.txt "$i", or use the GNU extension sed -i to edit in place.
What you want to do really is:
grep -l '00/dev/' /home/xxx/xxx/xxx/*.txt |
xargs -n1 sed -i '/;\/dev/{p;d}; s^/dev^;/dev^'
grep -l prints the list of files that match the pattern; xargs -n1 then runs sed once per file, and sed -i edits each file in place.
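The temporary-file approach mentioned in this answer can be sketched as follows. The sample file name disk.txt and its two lines are made up for the demonstration, and the substitution requires a digit before /dev (an assumption based on the data shown in the question) so that lines already containing ;/dev are left alone.

```shell
tmp=$(mktemp -d)
file="$tmp/disk.txt"    # hypothetical sample file
printf '%s\n' \
    '2019-10-29-18-00/dev/xx;512.00;0.4;/var/x/xx/xxx' \
    '2019-10-30-12-00;/dev/xx;5120.00;348.72;/tmp' > "$file"

# Write to a separate temporary file, and replace the original only if sed succeeded.
out=$(mktemp "$tmp/out.XXXXXX")
sed -E 's#([0-9])/dev#\1;/dev#' "$file" > "$out" && mv "$out" "$file"

fixed=$(cat "$file")
printf '%s\n' "$fixed"
rm -rf "$tmp"
```

The && matters: if sed fails, the original file is left untouched instead of being replaced by a partial result.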

The grep filter can be eliminated in your case; a single sed command per file accomplishes the task. Loop over the file names themselves, not over the output of cat:
for f in /home/xxx/xxx/xxx/*.txt
do
[[ -f "$f" ]] && sed -Ei '/00\/dev/ s/([^;])(\/dev)/\1;\2/' "$f"
done

The easiest way would be to adjust your regex so that it's looking a bit wider than '/dev/', e.g.
sed -i -E 's|([0-9])/dev|\1;/dev|' *.txt
(note that I'm taking advantage of sed's flexible approach to delimiters on substitute. Also, -E changes the group syntax)
Alternatively, sed lets you filter which lines it handles:
sed -i '/[0-9]\/dev/ s/\/dev/;\/dev/' *.txt
This uses the same substitution you already have but only applied on lines that match the filter regex

How to move files where the first line contains a string?

I am currently using the following command:
grep -l -Z -E '.*?FindMyRegex' /home/user/folder/*.csv | xargs -0 -I{} mv {} /home/destination/folder
This works fine. The problem is it uses grep on the entire file.
I would like to use the grep command on the FIRST line of the file only.
I have tried to use head -1 file | at the beginning, but it did not work.

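A change I would make to your script is to test only the first line of each file and move the file when it matches (grep -l cannot be used here, because on piped input it reports "(standard input)" rather than the file name):
for file in *.csv; do
head -n 1 "$file" | grep -q -E 'FindMyRegex' && mv -- "$file" /home/destination/folder
done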

You could try sed '1q' file.csv | grep ... to search for the regexp in the first line only.

You don't need grep or find, as long as your files don't have embedded newlines.
I don't know an easy way off the top of my head to get sed to delimit with nulls.
mv $( for f in /home/user/folder/*.csv;
do sed -ns '1 { /yourPattern/F; q; }' "$f";
done ) /home/destination/folder/
EDIT
Rewrote with a loop. This will run a separate instance of sed to check each file, but at least it shouldn't read beyond the first line. It will fail syntactically if there are no hits.
You might need -E depending on your regex.
-n says don't print records from the files.
-s says treat each file as a distinct input - this is so the filenames aren't always the first one.
This does require GNU sed for the F.

gawk 'FNR==1{if($0~/PATTERN/)
printf "mv %s %s\n",FILENAME, "/target";nextfile}' /path/*.csv
First of all, in your regex .*?FindMyRegex the leading .*? serves no purpose and can be removed.
The above awk (gawk) one-liner builds mv file target command lines for you. Check them and, if you are satisfied with them, pipe the output to sh to execute the commands.
Replace PATTERN with your regex pattern and /target with the real target directory.
The one-liner assumes that the filenames don't contain special characters (spaces, for example); if they do, add double quotes around the filename in the mv command.

using GNU awk to find the filenames, pipe the filenames into xargs
gawk -v pattern="myRegex" '
FNR == 1 {if ($0 ~ pattern) printf "%s\0", FILENAME; nextfile}
' *.csv | xargs -0 echo mv -t destination
If it looks OK, remove "echo"

Try this Shellcheck-clean Bash code:
#! /bin/bash
shopt -s nullglob # Globs that match nothing expand to nothing
shopt -s dotglob # Globs match files whose names start with '.'
dest=/home/destination/folder
for file in *.csv ; do
head -n 1 -- "$file" | grep -qE '.*?FindMyRegex' && mv -- "$file" "$dest"
done
shopt -s nullglob prevents an error if there are no .csv files in the directory.
shopt -s dotglob ensures that files whose name starts with '.' are handled.
The -- in the options for head and mv ensures that files whose names begin with - are handled correctly.
The quotes in "$file" and "$dest" ensure that names that contain whitespace (actually $IFS) characters (including newlines) or glob metacharacters are handled correctly.
Note that the .*? in the regular expression is probably redundant, and may not do what you think it does (grep -E doesn't do non-greedy matching).
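To see that only the first line is consulted, the loop can be tried in a scratch directory. Everything here is invented for the demonstration: the two CSV files, the pattern ^id, and the src and dest directories.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/src" "$tmp/dest"
printf '%s\n' 'id,name' '1,alice' > "$tmp/src/match.csv"      # pattern on line 1
printf '%s\n' 'name,id' 'id,bob'  > "$tmp/src/nomatch.csv"    # pattern only on line 2

for file in "$tmp"/src/*.csv; do
    # head stops after one line, so grep never sees the rest of the file.
    head -n 1 -- "$file" | grep -qE '^id' && mv -- "$file" "$tmp/dest"
done

moved=$(ls "$tmp/dest")
left=$(ls "$tmp/src")
printf 'moved: %s\nleft: %s\n' "$moved" "$left"
rm -rf "$tmp"
```

The second file stays where it is even though its second line matches, which is the behaviour the question asks for.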

How to delete a line given with a variable in sed?

I am attempting to use sed to delete a line, read from user input, from a file whose name is stored in a variable. Right now all sed does is print the line and nothing else.
This is a code snippet of the command I am using:
FILE="/home/devosion/scripts/files/todo.db"
read DELETELINE
sed -e "$DELETELINE"'d' "$FILE"
Is there something I am missing here?
Edit: Switching out the -e option with -i fixed my woes!

You need to delimit the search.
#!/bin/bash
read -r Line
sed "/$Line/d" file
This will delete any line containing the typed input.
Bear in mind that sed matches on regex though and any special characters will be seen as such.
For example, searching for 1* will not look for an actual 1 followed by a star; as a regex, 1* matches zero or more 1s, so it matches (and deletes) every line.
Also bear in mind that when the variable expands, it cannot contain the delimiters or the command will break or have unexpected results.
For example if "$Line" contained "/hello" then the sed command will fail with
sed: -e expression #1, char 4: extra characters after command.
You can either escape the / in this case or use different delimiters.
Personally I would use awk for this:
awk -vLine="$Line" '!index($0,Line)' file
Which searches for an exact string and has none of the drawbacks of the sed command.
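The difference between the regex match and the literal match can be seen on a small sample. The file contents and the input 1* below are made up for illustration.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'buy 1* today' '111' 'plain' > "$tmp/todo.db"
Line='1*'

# As a regex, 1* matches zero or more 1s, i.e. every line, so everything is deleted.
regex_kept=$(sed "/$Line/d" "$tmp/todo.db")

# index() looks for the literal two-character string, so only the first line goes.
literal_kept=$(awk -v Line="$Line" '!index($0,Line)' "$tmp/todo.db")

printf 'sed kept: [%s]\nawk kept: [%s]\n' "$regex_kept" "$literal_kept"
rm -rf "$tmp"
```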

You might have success with grep instead of sed
read -p "Enter a regex to remove lines: " filter
grep -v "$filter" "$file"
Storing in-place is a little more work:
tmp=$(mktemp)
grep -v "$filter" "$file" > "$tmp" && mv "$tmp" "$file"
or, with sponge (apt install moreutils)
grep -v "$filter" "$file" | sponge "$file"
Note: try to get out of the habit of using ALLCAPSVARS: one day you'll accidentally use PATH=... and then wonder why your script is broken.

I found this, it allows for a range deletion with variables:
#!/bin/bash
lastline=$(whatever you need to do to find the last line) # or any variation
lines="1,$lastline"
sed -i "$lines"'d' yourfile
This keeps it all in one utility.

Please try this :
sed -i "${DELETELINE}d" "$FILE"

Text formatting - sed, awk, shell

I need some assistance trying to build up a variable using a list of exclusions in a file.
So I have an exclude file I am using for rsync that looks like this:
*.log
*.out
*.csv
logs
shared
tracing
jdk*
8.6_Code
rpsupport
dbarchive
inarchive
comms
PR116PICL
**/lost+found*/
dlxwhsr*
regression
tmp
working
investigation
Investigation
dcsserver_weblogic_
dcswebrdtEAR_weblogic_
I need to build up a string to be used as a variable to feed into egrep -v, so that I can use the same exclusion list for rsync as I do when egrep -v from a find -ls.
So I have created this so far to remove all "*" and "/" - and then when it sees certain special characters it escapes them:
cat exclude-list.supt | while read line
do
echo $line | sed 's/\*//g' | sed 's/\///g' | 's/\([.-+_]\)/\\\1/g'
What I need the output to look like is this, and then I need to export it as a variable:
SEXCLUDE_supt="\.log|\.out|\.csv|logs|shared|PR116PICL|tracing|lost\+found|jdk|8\.6\_Code|rpsupport|dbarchive|inarchive|comms|dlxwhsr|regression|tmp|working|investigation|Investigation|dcsserver\_weblogic\_|dcswebrdtEAR\_weblogic\_"
Can anyone help?

A few issues with the following:
cat exclude-list.supt | while read line
do
echo $line | sed 's/\*//g' | sed 's/\///g' | 's/\([.-+_]\)/\\\1/g'
sed reads files line by line, so cat | while read line; do echo $line | sed is completely redundant. sed can also do multiple substitutions, either as a semicolon-separated list or with repeated -e options, so piping to sed three times is two too many. (The last stage of your pipeline is also missing the sed command name, and the loop has no done.) A problem with [.-+_] is that the - sits between . and +, so it is interpreted as the range .-+ (an invalid one, since . comes after + in ASCII). When using - inside a character class, put it at the beginning or end to lose this meaning, as in [._+-].
A much better way:
$ sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' file
\.log
\.out
\.csv
logs
shared
tracing
jdk
8\.6\_Code
rpsupport
dbarchive
inarchive
comms
PR116PICL
lost\+found
dlxwhsr
regression
tmp
working
investigation
Investigation
dcsserver\_weblogic\_
dcswebrdtEAR\_weblogic\_
Now we can pipe through tr '\n' '|' to replace the newlines with pipes for the alternation ready for egrep:
$ sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' file | tr "\n" "|"
\.log|\.out|\.csv|logs|shared|tracing|jdk|8\.6\_Code|rpsupport|dbarchive|...
$ EXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' file | tr "\n" "|")
$ echo $EXCLUDE
\.log|\.out|\.csv|logs|shared|tracing|jdk|8\.6\_Code|rpsupport|dbarchive|...
Note: If your file ends with a newline character you will want to remove the final trailing |, try sed 's/\(.*\)|/\1/'.
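An alternative that avoids the trailing | altogether is paste -s, which joins lines with a delimiter and adds none at the end. This is a sketch on a shortened, made-up exclude list and file listing; only the two sed expressions are taken from the answer above.

```shell
tmp=$(mktemp -d)
printf '%s\n' '*.log' 'logs' '**/lost+found*/' 'jdk*' > "$tmp/exclude-list"
printf '%s\n' 'app/server.log' 'app/main.c' 'lost+found/x' 'jdk1.8/bin' 'README' > "$tmp/listing"

# Same two substitutions as above, then join the lines with | (no trailing delimiter).
EXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' "$tmp/exclude-list" | paste -s -d '|' -)
printf '%s\n' "$EXCLUDE"

# Use the string as an extended regex to filter the listing.
kept=$(grep -Ev "$EXCLUDE" "$tmp/listing")
printf '%s\n' "$kept"
rm -rf "$tmp"
```

Quoting "$EXCLUDE" when it is passed to grep keeps the shell from interpreting the backslashes and pipes in the pattern.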

This might work for you (GNU sed):
SEXCLUDE_supt=$(sed '1h;1!H;$!d;g;s/[*\/]//g;s/\([._+-]\)/\\\1/g;s/\n/|/g' file)

This should work but I guess there are better solutions. First store everything in a variable:
SEXCLUDE_supt=$( sed -e 's/\*//g' -e 's/\///g' -e 's/\([._+-]\)/\\\1/g' exclude-list.supt)
and then process it again to substitute the white space (the unquoted echo turns the newlines into spaces, which sed then replaces with |):
SEXCLUDE_supt=$(echo $SEXCLUDE_supt |sed 's/\s/|/g')

How can I prepend a string to the beginning of each line in a file?

I have the following bash code, which loops through a text file line by line. I'm trying to prefix the word 'prefix' to each line, but instead I am getting this error:
rob@laptop:~/Desktop$ ./appendToFile.sh stusers.txt kp
stusers.txt
kp
./appendToFile.sh: line 11: /bin/sed: Argument list too long
115000_210org@house.com,passw0rd
This is the bash script ..
#!/bin/bash
file=$1
string=$2
echo "$file"
echo "$string"
for line in `cat $file`
do
sed -e 's/^/prefix/' $line
echo "$line"
done < $file
What am I doing wrong here?
Update:
Running head on the file dumps all the lines onto a single line of the terminal; is that probably related?
rob@laptop:~/Desktop$ head stusers.txt
rob@laptop:~/Desktop$ ouse.com,passw0rd

a one-line awk command should do the trick also:
awk '{print "prefix" $0}' file

Concerning your original error:
./appendToFile.sh: line 11: /bin/sed: Argument list too long
The problem is with this line of code:
sed -e 's/^/prefix/' $line
In this context $line is treated as the name of a file for sed to read, not as the text to modify. To correct your code, feed the line to sed on standard input instead:
echo "$line" | sed -e 's/^/prefix/'
(Also note that your original code should not have the < $file at the end.)
William Pursell addresses this issue correctly in both of his suggestions.
However, I believe you have correctly identified that there is an issue with your original text file. dos2unix will not correct this issue, as it only strips the carriage returns Windows sticks on the end of lines. (However, if you are attempting to read a Linux file in Windows, you would get a mammoth line with no returns.)
Assuming that it is not an issue with the end of line characters in your text file, William Pursell's, Andy Lester's, or nullrevolution's answers will work.
A variation on the while read... suggestion:
while read -r line; do echo "PREFIX " $line; done < $file
This could be run directly from the shell (no need for a batch / script file):
while read -r line; do echo "kp" $line; done < stusers.txt

The entire loop can be replaced by a single sed command that operates on the entire file:
sed -e 's/^/prefix/' "$file"

A Perl way to do it would be:
perl -p -e 's/^/prefix/' filename
or
perl -p -e'$_ = "prefix $_"' filename
In either case, that reads from filename and prints the prefixed lines to STDOUT.
If you add a -i flag, then Perl will modify the file in place. You can also specify multiple filenames and Perl will magically do all of them.

Instead of the for loop, it is more appropriate to use while read...:
while read -r line; do
echo "$line" | sed -e 's/^/prefix/'
done < "$file"
But you would be much better off with the simpler:
sed -e 's/^/prefix/' "$file"

Use sed. Just change the word prefix.
sed -e 's/^/prefix/' file.ext
If you want to save the output in another file
sed -e 's/^/prefix/' file.ext > file_new.ext

You don't need sed, just concatenate the strings in the echo command
while IFS= read -r line; do
echo "prefix$line"
done < filename
Your loop iterates over each word in the file:
for line in `cat file`; ...
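The word-by-word behaviour is easy to see on a line that contains a space. The sample file below is made up for the demonstration.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'alice smith,pw1' 'bob,pw2' > "$tmp/users.txt"

# Unquoted command substitution is split on whitespace: three items, not two lines.
for_count=0
for item in $(cat "$tmp/users.txt"); do
    for_count=$((for_count + 1))
done

# read consumes exactly one line per iteration.
read_count=0
while IFS= read -r line; do
    read_count=$((read_count + 1))
    printf 'prefix%s\n' "$line"
done < "$tmp/users.txt"

printf 'for saw %d items, while read saw %d lines\n' "$for_count" "$read_count"
rm -rf "$tmp"
```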

sed -i '1a\
Your Text' file1 file2 file3

A solution without sed/awk and while loops:
xargs -n1 printf "$prefix%s\n" < "$file"
