Creating a file by merging two files - linux

I would like to merge two files into a new file using a Linux command.
I have two files named a1b.txt and a1c.txt.
Content of a1b.txt
Hi,Hi,Hi
How,are,you
Content of a1c.txt
Hadoop|are|world
Data|Big|God
I need a new file called merged.txt with the content below (expected output):
Hi,Hi,Hi
How,are,you
Hadoop|are|world
Data|Big|God
To achieve that, I am running the command below in a terminal:
cat /home/cloudera/inputfiles/a1* > merged.txt
but it gives me this output instead:
Hi,Hi,Hi
How,are,youHadoop|are|world
Data|Big|God
Could somebody help me get the expected output?

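Your first file probably does not end with a newline character, so cat joins its last line to the first line of the next file. Here is how to append the missing newline to each file (with GNU sed; files that already end with a newline are left unchanged):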
$ sed -i -e '$a\' /home/cloudera/inputfiles/a1*
$ cat /home/cloudera/inputfiles/a1* > merged.txt
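If the input files should stay unmodified, one possible alternative is to let awk supply the missing newline while concatenating. The following is a minimal sketch, not part of the original answer: it recreates the two files from the question in a temporary directory (an assumption made so the example is self-contained) and relies on awk printing every input record followed by a newline.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two inputs; a1b.txt deliberately lacks a trailing newline.
printf 'Hi,Hi,Hi\nHow,are,you' > a1b.txt
printf 'Hadoop|are|world\nData|Big|God\n' > a1c.txt

# The pattern "1" is always true, so awk prints every record, each
# terminated by a newline. The inputs themselves are not modified.
awk 1 a1b.txt a1c.txt > merged.txt

cat merged.txt
```

Unlike sed -i, this approach does not alter a1b.txt or a1c.txt; only merged.txt is written.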

If you are allowed to be destructive (that is, you do not have to keep the original two files unmodified), then:
robert#debian:/tmp$ cat fileB.txt >> fileA.txt
robert#debian:/tmp$ cat fileA.txt
this is file A
This is file B.

Related

How to replace text strings (by bulk) after getting the results by using grep

One of my Linux MySQL servers suffered a crash, so I restored a backup. However, this time MySQL is running locally (localhost) instead of remotely (IP address).
Thanks to Stack Overflow users I found an excellent command to find the IP-address in all .php files in a given directory! The command I am using for this is:
grep -r -l --include="*.php" "100.110.120.130" .
This outputs the matching files with their locations, of course. If there were fewer than 10 results, I would simply change them by hand. However, I received over 200 hits.
So now I want to know if there is a safe command which replaces the IP address (example: 100.110.120.130) with the text "localhost" in all .php files in the given directory (/var/www/vhosts/), recursively.
And, if it is possible and not too much work, could the changed lines also be written to a file? I don't know if that's even possible.
Maybe someone can provide me with a working solution? To be honest, I don't dare to experiment with this on my own. That's why I created a new thread.
The most standard way of replacing a string in multiple files would be to use a tool such as sed. The list of files you've obtained via grep could be read line by line (when output to a file) using a while loop in combination with sed.
$ grep -r -l --include="*.php" "100.110.120.130" . > list.txt
# this will output all matching files to list.txt
Replacing IP in matched files:
while read -r line ; do echo "$line" >> updated.txt ; sed -i 's/100\.110\.120\.130/localhost/g' "${line}" ; done<list.txt
This reads list.txt line by line and passes each filename to sed, which replaces every occurrence of the IP address with "localhost". The dots in the pattern are escaped so that they match literal dots only; an unescaped . matches any character. The echo command before sed writes the name of each file being modified to updated.txt (it isn't strictly necessary, since list.txt contains exactly the same filenames, but it can serve as a means of verification).
To do a dry run before modifying the matched files, remove -i from the sed command; it will then print the result to stdout instead of modifying the files in place.
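The question also asks for the changed lines to be written to a file. A minimal sketch of one way to do that is shown here, under stated assumptions: the directory layout, the sample .php files, and the log name changed_lines.txt are all made up for illustration, and sed -i.bak is used so that a backup copy of each original is kept.

```shell
# Build a small sample tree in a throwaway directory (hypothetical files).
workdir=$(mktemp -d)
mkdir -p "$workdir/site/sub"
printf '%s\n' '<?php $host = "100.110.120.130"; ?>' > "$workdir/site/a.php"
printf '%s\n' '<?php $db = "100.110.120.130"; $x = "100x110y120z130"; ?>' > "$workdir/site/sub/b.php"
printf '%s\n' 'host=100.110.120.130' > "$workdir/site/notes.txt"

cd "$workdir/site" || exit 1

# Escape the dots so they match literal dots, not "any character".
pattern='100\.110\.120\.130'

# Record the lines that are about to change (file:line-number:text).
grep -rn --include='*.php' "$pattern" . > ../changed_lines.txt

# Replace in every matching file, keeping a .bak copy of each original.
grep -rl --include='*.php' "$pattern" . |
while IFS= read -r file; do
    sed -i.bak "s/$pattern/localhost/g" "$file"
done

cat ../changed_lines.txt
```

The log is written before any file is touched, so it doubles as a preview: running only the first grep shows what would change. The string 100x110y120z130 in b.php is left alone because the dots are escaped, and notes.txt is skipped because it is not a .php file.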

Merge Files and Prepend Filename and Directory

I need to merge files in a directory and include the directory, filename, and line number in each line of the output. I've found many helpful posts about including the filename and line number but not the directory name. grep -n gets line numbers, and I've seen some find commands that get some of the other parts, but I can't seem to pull them all together. (I'm using Ubuntu for all of the data processing.)
Imagine two files in a directory named "8". (Each directory in the data I have is a number. The data were provided that way.)
file1.txt
John
Paul
George
Ringo
file2.txt
Mick
Keef
Bill
Brian
Charlie
The output should look like this:
8:file1.txt:1:John
8:file1.txt:2:Paul
8:file1.txt:3:George
8:file1.txt:4:Ringo
8:file2.txt:1:Mick
8:file2.txt:2:Keef
8:file2.txt:3:Bill
8:file2.txt:4:Brian
8:file2.txt:5:Charlie
The separators don't have to be colons. Tabs would work just fine.
Thanks much!
If it's just one directory level deep, you could try something like this. We go into each directory, print each line with its filename and line number, and then prepend the directory name with sed:
$ for x in */; do
x=${x%/}   # strip the trailing slash left by the glob
(cd "$x" && grep -Hn . *) | sed -e "s/^/$x:/"
done
1:c.txt:2:B
1:c.txt:3:C
2:a.txt:1:A
2:a.txt:2:B
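The same result can also be sketched with a single awk call, which avoids changing directories. This is an illustrative alternative rather than the answer's method; it assumes files sit exactly one directory level deep and recreates the sample data from the question in a temporary directory.

```shell
# Recreate the sample data from the question in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir 8
printf '%s\n' John Paul George Ringo > 8/file1.txt
printf '%s\n' Mick Keef Bill Brian Charlie > 8/file2.txt

# FILENAME is the path as given on the command line (e.g. 8/file1.txt);
# replacing its first "/" with ":" yields "directory:filename".
# FNR is the line number within the current file.
awk '{ path = FILENAME; sub("/", ":", path); print path ":" FNR ":" $0 }' */*.txt
```

With the sample data this prints nine lines, from 8:file1.txt:1:John through 8:file2.txt:5:Charlie. Unlike grep -n ., which only prints lines containing at least one character, this version also numbers and prints empty lines.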

How do I update a file using commands run against the same file?

As an easy example, consider the following command:
$ sort file.txt
This will output the file's data in sorted order. How do I put that data right back into the same file? I want to update the file with the sorted results.
This is not the solution:
$ sort file.txt > file.txt
... as it will cause the file to come out blank. Is there a way to update this file without creating a temporary file?
Sure, I could do something like this:
sort file.txt > temp.txt; mv temp.txt file.txt
But I would rather keep the results in memory until processing is done, and then write them back to the same file. sort actually has a flag that will allow this to be possible:
sort file.txt -o file.txt
...but I'm looking for a solution that doesn't rely on the binary having a special flag for this, as not all commands are guaranteed to have one. Is there some kind of Linux command that will hold the data until the processing is finished?
For sort, you can use the -o option.
For a more general solution, you can use sponge, from the moreutils package:
sort file.txt | sponge file.txt
Note that error handling here is tricky: you may end up with an empty file if something goes wrong in the steps before sponge.
This is a duplicate of this question, which discusses the solutions above: How do I execute any command editing its file (argument) "in place" using bash?
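To illustrate the error-handling concern, here is a minimal sketch of a wrapper that replaces the file only when the command succeeds. It does use a temporary file, so it does not meet the "no temporary file" wish from the question; the function name inplace is hypothetical and not a standard utility.

```shell
# Hypothetical helper: run a filter with the file on stdin and replace
# the file only if the filter exits successfully.
inplace() {
    file=$1; shift
    # Create the temporary file next to the target so mv stays on the
    # same filesystem. Note that mktemp creates it with restrictive
    # permissions, which the replaced file then inherits.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if "$@" < "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demonstration on a throwaway file.
workdir=$(mktemp -d)
printf 'b\nd\nc\na\ne\n' > "$workdir/file.txt"

inplace "$workdir/file.txt" sort
cat "$workdir/file.txt"

# A failing command leaves the file as it was.
inplace "$workdir/file.txt" false || echo "file left unchanged"
```

Because the original is only overwritten after the command has exited with status 0, a failure partway through cannot leave an empty or truncated file.xt behind; the cost is the temporary file that the question hoped to avoid.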
You can do it with sed (with its r command) and process substitution:
sed -ni r<(sort file) file
This tells sed not to print the original lines (the -n option) and to append the output generated by <(sort file) instead.
The well-known -i option, which edits the file in place, is what writes the result back to the file.
Example
$ cat file
b
d
c
a
e
$ sed -ni r<(sort file) file
$ cat file
a
b
c
d
e
Try the Vim way:
$ ex -s +'%!sort' -cxa file.txt

Command to open a file which contains the given data

I was asked this question in an interview.
The interviewer described a situation in which there are 12 files on a Linux system,
and asked for a command that opens the file containing the data "Hello".
I told him I only know the grep command, which gives the names of the files containing "Hello".
Please tell me if there is any command to open a file in this way.
Assuming only one file contains the word hello:
less $(grep -H "hello" *.txt | sed 's/:.*//')
This first captures the filename using grep with the -H option, then uses sed to remove everything except the filename, and finally uses less to open the file.
Maybe this could help:
$ echo "foo" > file1.txt
$ echo "bar" > file2.txt
$ grep -l foo * | xargs cat
foo
You have 2 files, and you are looking for the one with the string "foo" in it. Replace cat with your command of choice to open files. You might try vi, emacs, nano, pico... (no, another flame war!)
You may want to try a different approach if several files contain the string you are looking for. This assumes that only one file contains the string.
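The case of zero or several matching files can be handled explicitly. The following is a minimal sketch under assumptions: the sample files are invented for the demonstration, and cat stands in for the opener so that the example runs non-interactively.

```shell
# Create sample files in a throwaway directory (hypothetical data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "Hello world" > file1.txt
echo "nothing here" > file2.txt

# -l prints only the names of matching files; -F treats the search
# string literally. head keeps just the first match if there are several.
match=$(grep -lF "Hello" *.txt | head -n 1)

if [ -n "$match" ]; then
    cat "$match"    # swap cat for less, vi, ... when run interactively
else
    echo "no file contains the string" >&2
fi
```

Using grep -l avoids the sed step entirely, because grep then prints each matching filename once instead of one line per match.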

Comparing part of a filename from a text file to filenames from a directory (grep + awk)

This is not exactly the easiest one to explain in a title.
I have a file inputfile.txt that contains parts of filenames:
file1.abc
filed.def
fileq.lmn
This file is an input file that I need to use to find the full filenames of an actual directory. The ends of the filenames are different from case to case, but part of them is always the same.
I figured that I could grep text from the input file to the ls command in said directory (or the ls command to a simple text file), and then use awk to output my full desired result, but I'm having some trouble doing that.
file1.abc is read from the input file inputfile.txt
It's checked against the directory contents.
If the file exists, specific directories based on the filename are created.
(I'm also in a Busybox environment.. I don't have a lot at my disposal)
Something like this...
cat lscommandoutput.txt \
| awk -F: '{print("mkdir" system("grep $0"); inputfile.txt}' \
| /bin/sh
Thank you.
Edit: My apologies for not being clear on this.
The output should be the full filename of each line found in lscommandoutput.txt using the inputfile.txt to grep those specific lines.
If inputfile.txt contains:
file1.abc
filed.def
fileq.lmn
and lscommandoutput.txt contains:
file0.oba.ca-1.fil
file1.abc.de-1.fil
filed.def.com-2.fil
fileh.jkl.open-1.fil
fileq.lmn.he-2.fil
The extra lines that aren't contained in the inputfile.txt are ignored. The ones that are in the inputfile.txt have a directory created for them with the name that got grepped from lscommandoutput.txt.
/dir/dir2/file1.abc.de-1.fil/ <-- directory in which files can be placed in
/dir/dir2/filed.def.com-2.fil/
/dir/dir2/fileq.lmn.he-2.fil/
Hopefully that is a little bit clearer.
First, you win a useless use of cat award.
Secondly, you've explained this really badly. If you can't describe the problem clearly in plain English it's not surprising you are having trouble turning it into a script or set of commands.
grep -f is a good way to get the directory names, but I don't understand what you want to do with them afterwards.
"My problem now is using the outputted file with the one file I want to put the folders"
Wut? What does "the one file I want to put the folders" mean? Where does the file come from? Is it the file named in inputfile.txt? Does it go in the directory that it matched?
If you just want to create the directories you can do:
fgrep -f ./inputfile.txt ./lscommandoutput.txt | xargs mkdir
N.B. you probably want fgrep (equivalent to grep -F) so that the input strings are treated as fixed strings rather than regular expressions, and regex metacharacters such as . match only themselves.
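As a self-contained sketch of the approach above, the following recreates the two files from the question and creates one directory per matching name. The relative target path dir/dir2 inside a temporary directory is an assumption standing in for /dir/dir2/ from the question, and a while loop is used instead of xargs so that each name is passed to mkdir exactly as read.

```shell
# Recreate the inputs from the question in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' file1.abc filed.def fileq.lmn > inputfile.txt
printf '%s\n' file0.oba.ca-1.fil file1.abc.de-1.fil filed.def.com-2.fil \
    fileh.jkl.open-1.fil fileq.lmn.he-2.fil > lscommandoutput.txt

# Assumed target location; the question uses /dir/dir2/.
mkdir -p dir/dir2

# -F: fixed strings, -f: read the search strings from a file.
grep -F -f inputfile.txt lscommandoutput.txt |
while IFS= read -r name; do
    mkdir -p "dir/dir2/$name"
done

ls dir/dir2
```

With the sample data this creates file1.abc.de-1.fil, filed.def.com-2.fil, and fileq.lmn.he-2.fil under dir/dir2, and ignores the two names that are not listed in inputfile.txt.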
