Linux merging files - linux

I'm doing an online Linux course, but I'm stuck on a question. You can find the question below.
You will get three files called a.bf, b.bf and c.bf. Merge the contents of these three files and write it to a new file called abc.bf. Respect the order: abc.bf must contain the contents of a.bf first, followed by those of b.bf, followed by those of c.bf.
Example
Suppose the given files have the following contents:
a.bf contains +++.
b.bf contains [][][][].
c.bf contains <><><>.
The file abc.bf should then have
+++[][][][]<><><>
as its content.
I know how to merge the three files, but when I use cat my output is:
+++
[][][][]
<><><>
When I use paste, my output is "+++ 'a lot of spaces' [][][][] 'a lot of spaces' <><><>".
The output I need is +++[][][][]<><><>; I don't want the spaces between the contents. Can someone help me?

What you want to do is delete the newline characters.
With tr:
cat {a,b,c}.bf | tr --delete '\n' > abc.bf
With echo & sed:
echo $(cat {a,b,c}.bf) | sed -E 's/ //g' > abc.bf
With xargs & sed:
<{a,b,c}.bf xargs | sed -E 's/ //g' > abc.bf
Note that sed is only used to remove the spaces.
With cat & sed (GNU sed, because of the -z option):
cat {a,b,c}.bf | sed -z 's/\n//g' > abc.bf

echo -n "$(cat a.bf)$(cat b.bf)$(cat c.bf)" > abc.bf
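echo -n suppresses the trailing newline that echo would otherwise add, and each $(...) command substitution already strips the trailing newlines from the file contents. Newlines inside a file are kept.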
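A minimal sketch of the tr approach on the example data, in case you want to try it without touching your real files. The scratch directory and the way the sample files are created are my own assumptions, not part of the exercise:

```shell
# Work in a scratch directory so nothing in the current one is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the example files; printf '%s\n' ends each one with a newline,
# which is what makes plain cat print three separate lines.
printf '%s\n' '+++' > a.bf
printf '%s\n' '[][][][]' > b.bf
printf '%s\n' '<><><>' > c.bf

# Concatenate in order and drop every newline character.
cat a.bf b.bf c.bf | tr -d '\n' > abc.bf

# Print the result followed by a newline so the prompt stays readable.
printf '%s\n' "$(cat abc.bf)"
```

Note that tr -d '\n' removes all newlines, including any inside the files, so this only fits the exercise because each file holds a single line.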

Related

How to compare filenames in two text files on Linux bash?

I have two lists list1 and list2 with a filename on each line. I want a result with all filenames that are only in list2 and not in list1, regardless of specific file extensions (but not all). Using Linux bash, any commands that do not require any extra installations. In the example lists, I do know all file extensions that I wish to ignore. I made an attempt but it does not work at all, I don't know how to fix it. Apologies for my inexperience.
I wish to ignore the following extensions:
.x
.xy
.yx
.y
.jpg
list1.txt
text.x
example.xy
file.yx
data.y
edit
edit.jpg
list2.txt
text
rainbow.z
file
data.y
sunshine
edit.test.jpg
edit.random
result.txt
rainbow.z
sunshine
edit.test.jpg
edit.random
My try:
while read LINE
do
line2=$LINE
sed -i 's/\.x$//g' $LINE $line2
sed -i 's/\.xy$//g' $LINE $line2
sed -i 's/\.yx$//g' $LINE $line2
sed -i 's/\.y$//g' $LINE $line2
then sed -i -e '$line' result.txt;
fi
done < list2.txt
Edit: I forgot two requirements. The filenames can have . in them, and not all filenames have an extension. I know the extensions that must be ignored. I amended the lists accordingly.
An awk solution might be more efficient for this task:
awk '
{ f=$0; sub(/\.(xy?|yx?|jpg)$/,"",f) }
NR==FNR { a[f]; next }
!(f in a)
' list1.txt list2.txt > result.txt
comm can do precisely this.
You can preprocess the input:
strip the suffixes
sort (comm expects sorted input)
remove duplicates
ss()( sed 's/\.\(x\|xy\|yx\|y\|jpg\)$//' "$@" | sort -u )
comm -13 <(ss list1.txt) <(ss list2.txt) >result.txt
Your code was:
while read LINE
do
line2=$LINE
sed -i 's/\.x$//g' $LINE $line2
sed -i 's/\.xy$//g' $LINE $line2
sed -i 's/\.yx$//g' $LINE $line2
sed -i 's/\.y$//g' $LINE $line2
then sed -i -e '$line' result.txt;
fi
done < list2.txt
Some issues that immediately jump out:
syntax error - then/fi but no matching if
you never access list1
you don't quote variables when you use them, so whitespace and special characters will cause problems
while read ... sed ... sed ... sed ... is inefficient - multiple invocations of sed instead of just one, and a loop that sed would perform implicitly
sed expects file arguments not strings
sed -i will try to overwrite input file arguments
you use result.txt as both input and output to sed but never assign any contents to it
you try to use data ($line) as sed commands, instead of applying sed commands to that data
because you used single-quotes, sed -i -e '$line' will attempt to run a (non-existent) sed command line on the last line of input ($)
g option to s/// does nothing when search is anchored
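If you want to keep the loop idea, here is a rough sketch that avoids the problems listed above. The helper name strip_ext and the intermediate file list1.stripped are made up for the illustration, and the lists are recreated in a scratch directory from the example in the question:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample lists taken from the question.
printf '%s\n' text.x example.xy file.yx data.y edit edit.jpg > list1.txt
printf '%s\n' text rainbow.z file data.y sunshine edit.test.jpg edit.random > list2.txt

# Remove one ignored extension from the end of each line (files or stdin).
strip_ext() { sed -E 's/\.(x|xy|yx|y|jpg)$//' "$@"; }

# Strip list1 once, up front, instead of once per line of list2.
strip_ext list1.txt | sort -u > list1.stripped

# Keep every list2 name whose stripped form is not in the stripped list1.
while IFS= read -r name; do
    base=$(printf '%s\n' "$name" | strip_ext)
    grep -qxF -- "$base" list1.stripped || printf '%s\n' "$name"
done < list2.txt > result.txt

cat result.txt
```

This prints the original names from list2 (so edit.test.jpg keeps its extension). It still starts sed and grep once per line, so for long lists the awk version is the better choice.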
I'd use join:
$ join -t. -j1 -v2 -o 2.1,2.2 <(sort list1.txt) <(sort list2.txt) | sed 's/\.$//'
rainbow.z
sunshine
(The bit of sed is needed to turn sunshine. into sunshine)

How to apply my sed command to some lines of all my files?

I've 95 files that looks like :
2019-10-29-18-00/dev/xx;512.00;0.4;/var/x/xx/xxx
2019-10-29-18-00/dev/xx;512.00;0.68;/xx
2019-10-29-18-00/dev/xx;512.00;1.84;/xx/xx/xx
2019-10-29-18-00/dev/xx;512.00;80.08;/opt/xx/x
2019-10-29-18-00/dev/xx;20480.00;83.44;/var/x/x
2019-10-29-18-00/dev/xx;3584.00;840.43;/var/xx/x
2019-10-30-00-00/dev/xx;2048.00;411.59;/
2019-10-30-00-00/dev/xx;7168.00;6168.09;/usr
2019-10-30-00-00/dev/xx;3072.00;1036.1;/var
2019-10-30-00-00/dev/xx;5120.00;348.72;/tmp
2019-10-30-00-00/dev/xx;20480.00;2033.19;/home
2019-10-30-12-00;/dev/xx;5120.00;348.72;/tmp
2019-10-30-12-00;/dev/hd1;20480.00;2037.62;/home
2019-10-30-12-00;/dev/xx;512.00;0.43;/xx
2019-10-30-12-00;/dev/xx;3584.00;794.39;/xx
2019-10-30-12-00;/dev/xx;512.00;0.4;/var/xx/xx/xx
2019-10-30-12-00;/dev/xx;512.00;0.68;/xx
2019-10-30-12-00;/dev/xx;512.00;1.84;/var/xx/xx
2019-10-30-12-00;/dev/xx;512.00;80.08;/opt/xx/x
2019-10-30-12-00;/dev/xx;20480.00;83.44;/var/xx/xx
2019-10-30-12-00;/dev/x;3584.00;840.43;/var/xx/xx
For some lines I've 2019-10-29-18-00/dev and for some other lines, I've 2019-10-30-12-00;/dev/
I want to add the ; before the /dev/ where it is missing, so for that I use this sed command :
sed 's/\/dev/\;\/dev/'
But How I can apply this command for each lines where the ; is missing ? I try this :
for i in $(cat /home/xxx/xxx/xxx/*.txt | grep -e "00/dev/")
do
sed 's/\/dev/\;\/dev/' $i > $i
done
But it doesn't work... Can you help me ?
Could you please try the following with GNU awk, if you are OK with it:
awk -i inplace '/00\/dev\//{gsub(/00\/dev\//,"00;/dev/")} 1' *.txt
sed solution: Tested with GNU sed for few files and it worked fine.
sed -i.bak '/00\/dev/s/00\/dev/00\;\/dev/g' *.txt
This might work for you (GNU sed & parallel):
parallel -q sed -i 's#;*/dev#;/dev#' ::: *.txt
or if you prefer:
sed -i 's#;*/dev#;/dev#' *.txt
Ignore lines with ;/dev.
sed '/;\/dev/{p;d}; s^/dev^;/dev^'
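The /;\/dev/ address checks whether the line already contains ;/dev. If it does, p prints the current line and d deletes the pattern space and starts the next cycle, so the substitution is skipped for that line.
You can use any character as the delimiter of the s command in sed. Also, there is no need to escape ; as \;, a plain ; is enough.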
How I can apply this command for each lines where the ; is missing ? I try this
Don't edit a file by redirecting the output back to the same file, as in $i > $i. The shell sets up the redirection first, which truncates the file, and only then starts sed, so sed reads an empty file and the result is empty. Also note that your loop iterates over the matching lines (split into words), not over file names, so $i is never a file. Use a temporary file, sed ... "$i" > temp.txt; mv temp.txt "$i", or use the GNU sed extension -i to edit in place.
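A small sketch of the temporary-file pattern, looping over file names rather than file contents. The file names and the scratch directory are assumptions for the demo; the two sample lines come from the question:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two sample files: one line lacks the ; before /dev, the other has it.
printf '%s\n' '2019-10-29-18-00/dev/xx;512.00;0.4;/var/x/xx/xxx' > one.txt
printf '%s\n' '2019-10-30-12-00;/dev/xx;5120.00;348.72;/tmp' > two.txt

for f in ./*.txt; do
    tmp=$(mktemp) || exit 1
    # Insert ; only where a digit is directly followed by /dev.
    # Write to the temp file first, then replace the original on success.
    sed 's|\([0-9]\)/dev|\1;/dev|' "$f" > "$tmp" && mv "$tmp" "$f"
done

cat one.txt two.txt
```

One caveat: files created by mktemp have restrictive permissions, so after the mv the edited file carries those permissions rather than the original ones.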
What you want to do really is:
grep -l '00/dev/' /home/xxx/xxx/xxx/*.txt |
xargs -n1 sed -i '/;\/dev/{p;d}; s^/dev^;/dev^'
grep -l prints list of files that match the pattern, then xargs for each single one -n1 of the files executes sed which -i edits files in place.
grep for filtering can be eliminated in your case, we can accomplish the task with a single sed command:
for f in /home/xxx/xxx/xxx/*.txt
do
[[ -f "$f" ]] && sed -Ei '/00\/dev/ s/([^;])(\/dev)/\1;\2/' "$f"
done
The easiest way would be to adjust your regex so that it's looking a bit wider than '/dev/', e.g.
sed -i -E 's|([0-9])/dev|\1;/dev|'
(note that I'm taking advantage of sed's flexible approach to delimiters on substitute. Also, -E changes the group syntax)
Alternatively, sed lets you filter which lines it handles:
sed -i '/[0-9]\/dev/ s/\/dev/;\/dev/'
This uses the same substitution you already have but only applied on lines that match the filter regex
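To see that the filtered form is safe to run more than once, here is a quick check on two lines from the question, reading from stdin instead of editing files. The function name fix is only for this illustration:

```shell
# Substitute only on lines where a digit is directly followed by /dev.
fix() { sed '/[0-9]\/dev/ s/\/dev/;\/dev/'; }

once=$(printf '%s\n' \
    '2019-10-29-18-00/dev/xx;512.00;0.68;/xx' \
    '2019-10-30-12-00;/dev/xx;5120.00;348.72;/tmp' | fix)

# A second pass finds no digit directly before /dev, so nothing changes.
twice=$(printf '%s\n' "$once" | fix)

printf '%s\n' "$once"
[ "$once" = "$twice" ] && echo 'second pass changed nothing'
```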

Extract strings in a text file using grep

I have file.txt with names one per line as shown below:
ABCB8
ABCC12
ABCC3
ABCC4
AHR
ALDH4A1
ALDH5A1
....
I want to grep each of these from an input.txt file.
Manually i do this one at a time as
grep "ABCB8" input.txt > output.txt
Could someone help to automatically grep all the strings in file.txt from input.txt and write it to output.txt.
You can use the -f flag as described in Bash, Linux, Need to remove lines from one file based on matching content from another file
grep -o -f file.txt input.txt > output.txt
Flag
-f FILE, --file=FILE:
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing. (-f is
specified by POSIX.)
-o, --only-matching:
Print only the matched (non-empty) parts of a matching line, with
each such part on a separate output line.
for line in `cat text.txt`; do grep $line input.txt >> output.txt; done
Contents of text.txt:
ABCB8
ABCC12
ABCC3
ABCC4
AHR
ALDH4A1
ALDH5A1
Edit:
A safer solution with while read:
cat text.txt | while read line; do grep "$line" input.txt >> output.txt; done
Edit 2:
Sample text.txt:
ABCB8
ABCB8XY
ABCC12
Sample input.txt:
You were hired to do a job; we expect you to do it.
You were hired because ABCB8 you kick ass;
we expect you to kick ass.
ABCB8XY You were hired because you can commit to a rational deadline and meet it;
ABCC12 we'll expect you to do that too.
You're not someone who needs a middle manager tracking your mouse clicks
If you don't care about the order of the lines, a quick workaround is to pipe the result through sort | uniq:
cat text.txt | while read line; do grep "$line" input.txt >> output.txt; done; cat output.txt | sort | uniq > output2.txt
The result is then in output2.txt.
Edit 3:
cat text.txt | while read line; do grep "\<${line}\>" input.txt >> output.txt; done
Is that fine?
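For what it's worth, the word-matching loop can probably be replaced by a single grep call. This sketch assumes your grep supports -w (GNU and BSD grep do) and uses -F so the names are treated as fixed strings rather than regular expressions. The sample files are recreated from the ones above in a scratch directory:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' ABCB8 ABCB8XY ABCC12 > text.txt
cat > input.txt <<'EOF'
You were hired to do a job; we expect you to do it.
You were hired because ABCB8 you kick ass;
we expect you to kick ass.
ABCB8XY You were hired because you can commit to a rational deadline and meet it;
ABCC12 we'll expect you to do that too.
EOF

# -f: read patterns from text.txt, -F: fixed strings, -w: whole words only.
# Each matching line is printed once, in the order it appears in input.txt.
grep -wFf text.txt input.txt > output.txt

cat output.txt
```

Because grep reads input.txt only once, no line is duplicated and the sort | uniq step is not needed.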

Text formatting - sed, awk, shell

I need some assistance trying to build up a variable using a list of exclusions in a file.
So I have a exclude file I am using for rsync that looks like this:
*.log
*.out
*.csv
logs
shared
tracing
jdk*
8.6_Code
rpsupport
dbarchive
inarchive
comms
PR116PICL
**/lost+found*/
dlxwhsr*
regression
tmp
working
investigation
Investigation
dcsserver_weblogic_
dcswebrdtEAR_weblogic_
I need to build up a string to be used as a variable to feed into egrep -v, so that I can use the same exclusion list for rsync as I do when egrep -v from a find -ls.
So I have created this so far to remove all "*" and "/" - and then when it sees certain special characters it escapes them:
cat exclude-list.supt | while read line
do
echo $line | sed 's/\*//g' | sed 's/\///g' | 's/\([.-+_]\)/\\\1/g'
What I need the output to look like is this, and then I want to export it as a variable:
SEXCLUDE_supt="\.log|\.out|\.csv|logs|shared|PR116PICL|tracing|lost\+found|jdk|8\.6\_Code|rpsupport|dbarchive|inarchive|comms|dlxwhsr|regression|tmp|working|investigation|Investigation|dcsserver\_weblogic\_|dcswebrdtEAR\_weblogic\_"
Can anyone help?
A few issues with the following:
cat exclude-list.supt | while read line
do
echo $line | sed 's/\*//g' | sed 's/\///g' | 's/\([.-+_]\)/\\\1/g'
sed reads its input line by line, so cat | while read line; do echo $line | sed is completely redundant. sed can also apply several substitutions in one invocation, either separated by semicolons in a single script or passed with repeated -e options, so piping to sed three times is two too many. Another problem is '[.-+_]': the - sits between . and +, so it is interpreted as the range .-+ (a reversed one, since . comes after + in ASCII). To match a literal - inside a bracket expression, put it at the beginning or the end, as in [._+-].
A much better way:
$ sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' file
\.log
\.out
\.csv
logs
shared
tracing
jdk
8\.6\_Code
rpsupport
dbarchive
inarchive
comms
PR116PICL
lost\+found
dlxwhsr
regression
tmp
working
investigation
Investigation
dcsserver\_weblogic\_
dcswebrdtEAR\_weblogic\_
Now we can pipe through tr '\n' '|' to replace the newlines with pipes for the alternation ready for egrep:
$ sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' file | tr "\n" "|"
\.log|\.out|\.csv|logs|shared|tracing|jdk|8\.6\_Code|rpsupport|dbarchive|...
$ EXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' file | tr "\n" "|")
$ echo $EXCLUDE
\.log|\.out|\.csv|logs|shared|tracing|jdk|8\.6\_Code|rpsupport|dbarchive|...
Note: If your file ends with a newline character you will want to remove the final trailing |, try sed 's/\(.*\)|/\1/'.
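As an alternative to trimming the last | afterwards, paste -s can join the lines directly, since it only puts the delimiter between lines. A sketch on a shortened exclude list (the four sample patterns are picked from the question, the scratch directory is an assumption):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A shortened exclude list; the file ends with a newline like most text files.
printf '%s\n' '*.log' 'jdk*' '**/lost+found*/' '8.6_Code' > exclude-list.supt

# Same two substitutions as above, then join all lines with | between them.
SEXCLUDE_supt=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' exclude-list.supt |
    paste -sd'|' -)

printf '%s\n' "$SEXCLUDE_supt"
```

The trailing | matters more than it looks: an empty alternative at the end of the pattern matches every line, so egrep -v would then filter out everything.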
This might work for you (GNU sed):
SEXCLUDE_supt=$(sed '1h;1!H;$!d;g;s/[*\/]//g;s/\([._+-]\)/\\\1/g;s/\n/|/g' file)
This should work, but I guess there are better solutions. First store everything in a shell variable:
SEXCLUDE_supt=$( sed -e 's/\*//g' -e 's/\///g' -e 's/\([._+-]\)/\\\1/g' exclude-list.supt)
and then process it again to substitute white space:
SEXCLUDE_supt=$(echo $SEXCLUDE_supt |sed 's/\s/|/g')

How to filter data out of tabulated stdout stream in Bash?

Here's what output looks like, basically:
? RESTRequestParamObj.cpp
? plugins/dupfields2/_DupFields.cpp
? plugins/dupfields2/_DupFields.h
I need to get the filenames from second column and pass them to rm. There's AWK script that goes like awk '{print $2}' but I was wondering if there's another solution.
If you have spaces between the ? and the filename then:
cut -c9-
If they're tabs then:
cut -f2
Placed your output in file
$> cat ./text
? RESTRequestParamObj.cpp
? plugins/dupfields2/_DupFields.cpp
? plugins/dupfields2/_DupFields.h
Edit it with sed
$> cat ./text | sed -r -e 's/(\?[\ \t]*)(.*)/\2/g'
RESTRequestParamObj.cpp
plugins/dupfields2/_DupFields.cpp
plugins/dupfields2/_DupFields.h
Sed in here is matching 2 parts of line -
? with tabs or spaces
Other characters until the end of the line
And then it changes whole line only with second part.
This might work for you:
echo "? RESTRequestParamObj.cpp" | sed -e 's/^\S\+/rm /' | sh
or using GNU sed
echo "? RESTRequestParamObj.cpp"| sed -r 's/^\S+/rm /e'
bash only solution, assuming your output comes from stdin:
while read line; do echo ${line##* }; done
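Building on the read loop, here is a sketch that goes all the way to rm without xargs. It assumes the first column is always the status character and everything after the whitespace is the file name. The scratch directory and the extra file keep.txt exist only for the demo:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create the files named in the sample output, plus one that must survive.
mkdir -p plugins/dupfields2
touch RESTRequestParamObj.cpp plugins/dupfields2/_DupFields.cpp keep.txt

# read puts the first field in status and the rest of the line in path,
# so a file name containing spaces stays in one piece.
printf '%s\n' \
    '?       RESTRequestParamObj.cpp' \
    '?       plugins/dupfields2/_DupFields.cpp' |
while read -r status path; do
    [ "$status" = '?' ] && rm -f -- "$path"
done

ls
```

The -- stops rm from treating a file name that starts with - as an option.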
Use cut or perl instead (cut splits on tabs by default):
cut -f2 | xargs rm -rf
<your output> | perl -ne '@cols = split /\t/; print $cols[1]' | xargs rm -rf

Resources