- tell me about yourself^M
20 - bye for now
21 - tell me about you^M
22 - catch you later
23 - about yourself^M
I use the following command to merge two text files:
cat 1.txt >> 2.txt
But the merged file introduced lots of ^M characters. How do I avoid that? I am working on a Mac Pro.
The ^M is a carriage return character, octal 015. Delete it with tr:
cat 1.txt | tr -d '\015' >> 2.txt
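To double-check that the carriage returns are really gone, cat -v makes any remaining ones visible as ^M at the end of each line:
cat -v 2.txt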
I'm sure this question has been answered somewhere, but while searching I can't find my exact scenario. I have two files that I am concatenating into a single file, and I am also adding user input between the two files. The problem is that a newline is being inserted after the first file; the rest works as desired.
touch newFile
cat file1 > newFile
echo -n $userInput >> newFile
cat file2 >> newFile
How do I prevent or remove the newline when file1 is added to newFile? If I cat file1, there seems to be a newline added, but everything I've read about cat says it doesn't do that. If I open file1 in vim there's no blank line at the end of the file that would indicate the newline is part of the file, so either cat is actually adding a newline, or the redirect > is doing it, or echo adds a newline at the beginning of its output, none of which would be desirable in this situation. One solution I saw was to use
cat file1 | tr -d '\n'
but that discards all the newlines in the file, also not desirable. So, to repeat my question:
How do I cat file1 into the new file and add user input without adding the newline between them?
(cat is not a requirement, but I am not familiar with printf, so if that's the solution then please elaborate on its use).
With these inputs:
userInput="Test Test Test"
echo "Line 1
Line 2
Line 3" >file1
echo "Line 4
Line 5
Line 6" >file2
I would do:
printf "%s%s%s" "$(cat file1)" "$userInput" "$(cat file2)" >newfile
Redirecting to >newfile creates the file, which is equivalent to the touch plus appends in your first version; it also makes the intent a bit easier to see.
I get:
$ cat newfile
Line 1
Line 2
Line 3Test Test TestLine 4
Line 5
Line 6
Like all other Unix tools, Vim considers \n a line terminator and not a line separator.
This means that a linefeed after the last piece of text will be considered part of the last line, and will not show an additional blank line.
If there is no trailing linefeed, Vim will instead show [noeol] in the status bar when the file is loaded:
foo
~
~
~
~
~
"file" [noeol] 1L, 3C 1,1 All
^---- Here
So no, the linefeed is definitely part of your file and not being added by bash in any way.
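You can verify this from the shell, too; this prints the last byte of the file (file1 being the name from your question), and a \n there confirms the trailing linefeed really is in the file itself:
tail -c 1 file1 | od -c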
If you want to strip all trailing linefeeds, you can do this as a side effect of command expansion:
printf '%s' "$(<file1)" >> newfile
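A quick demonstration of that side effect (demo is just a throwaway file name for illustration):
printf 'abc\n\n\n' > demo
printf '%s' "$(<demo)" | od -c   # od shows only: a b c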
touch newFile
echo -n "$(cat file1)" > newFile
echo -n $userInput >> newFile
cat file2 >> newFile
That did the trick.
Firstly, which is the best and fastest Unix command to get only the differences between two files? I tried using diff to do it (below).
I tried the answer given by Neilvert Noval over here - Compare two files line by line and generate the difference in another file
code -
diff -a --suppress-common-lines -y file1.txt file2.txt >> file3.txt
But I get a lot of spaces and also a > symbol before the different lines. How do I fix that? I was thinking of removing the trailing spaces and the first '>', but I'm not sure if that is a neat fix.
My file1.txt has -
Hello World!
Its such a nice day!
#this is a newline and not a line of text#
My file2.txt has -
Hello World!
Its such a nice day!
Glad to be here!
#this is a newline and not a line of text#
Output - " #Many spaces here# > Glad to be here:)"
Expected output - Glad to be here:)
Another way to get diff is by using awk:
awk 'FNR==NR{a[$0];next}!($0 in a)' file1 file2
Though I must admit that I haven't run any benchmarks and can't say which is the fastest solution.
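With the sample files from the question (file1.txt and file2.txt as shown above), this prints only the line that exists in the second file and not in the first:
$ awk 'FNR==NR{a[$0];next}!($0 in a)' file1.txt file2.txt
Glad to be here!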
The -y option to diff makes it produce a "side by side" diff, which is why you have the spaces. Try -u 0 for the unified format with zero lines of context. That should print:
+Glad to be here:)
The plus means the line was added, whereas a minus means it was removed.
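If you only want the added text in file3.txt without the leading +, one option is to post-process the unified output, something like this (a sketch; it keeps just the lines that start with a single +):
diff -a -U0 file1.txt file2.txt | sed -n 's/^+\([^+]\)/\1/p' >> file3.txt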
diff -a --suppress-common-lines -y file1.txt file2.txt | grep '>' | sed 's/.*>[[:space:]]*//' >> file3.txt
Suppose I have these two files:
File 1:
1 2 3 4 5 6 7
File 2:
1
2
3
4
5
6
7
Is it possible to use diff to compare these two files so that they are considered equal?
(Or if not, what other tools should I use?)
Thanks
You could collapse whitespace so file2 looks like file1, with every number on the same line:
$ cat file1
1 2 3 4 5 6 7
$ cat file2
1
2
4
3
5
6
7
$ diff <(echo $(< file1)) <(echo $(< file2))
1c1
< 1 2 3 4 5 6 7
---
> 1 2 4 3 5 6 7
Explanation:
< file # Equivalent to "cat file", but slightly faster since the shell doesn't
# have to fork a new process.
$(< file) # Capture the output of the "< file" command. Can also be written
# with backticks, as in `< file`.
echo $(< file) # Echo each word from the file. This will have the side effect of
# collapsing all of the whitespace.
<(echo $(< file)) # An advanced way of piping the output of one command to another.
# The shell opens an unused file descriptor (say fd 42) and pipes
# the echo command to it. Then it passes the filename /dev/fd/42 to
# diff. The result is that you can pipe two different echo commands
# to diff.
Alternatively, you may want to make file1 look like file2, with each number on a separate line. That will produce more useful diff output.
$ diff -u <(printf '%s\n' $(< file1)) <(printf '%s\n' $(< file2))
--- /dev/fd/63 2012-09-10 23:55:30.000000000 -0400
+++ file2 2012-09-10 23:47:24.000000000 -0400
@@ -1,7 +1,7 @@
1
2
-3
4
+3
5
6
7
This is similar to the first command with echo changed to printf '%s\n' to put a newline after each word.
Note: Both of these commands will fail if the files being diffed are overly long, because of the limit on command-line length. If that happens you will need to work around this limitation, say by storing the output of echo/printf in temporary files.
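If you do run into that limit, one way around it is to let tr do the word-splitting instead of the shell, writing intermediate files first (a sketch; the .words names are just placeholders):
tr -s '[:space:]' '\n' < file1 > file1.words
tr -s '[:space:]' '\n' < file2 > file2.words
diff -u file1.words file2.words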
Some diffs have a -b (ignore blanks) and -w (ignore whitespace) option, but since Unix utilities are all line-oriented, I don't think whitespace will include \n characters.
Double-check that your version of diff doesn't have some fancy GNU options with diff --help | less or man diff.
Is your formatting above correct, i.e. is file 1's data really all on one line? You could force file2 to match that format with
awk '{printf"%s ", $0}' file2
Or as mentioned in comments, convert file 1
awk '{for (i=1;i<=NF;i++) printf("%s\n", $i)}' file1
But I'm guessing that your data isn't really that simple. Also there are likely line length limitations that will appear when you can least afford the time to deal with them.
Probably not what you want to hear, and diffing of complicated stuff like source-code is not an exact science. So, if you still need help, create a slightly more complicated testcase and add it to your question.
Finally, you'll need to show us what you'd expect the output of such a diff project to look like. Right now I can't see any meaningful way to display such differences for a non-trivial case.
IHTH
If it turns out the data is indeed simple enough to not run into limitations, and the only difference between the files is that the first one separates by space and the second by newline, you can also do process substitution (as was suggested above) but with sed to replace the spaces in the first file with newlines:
diff <(sed 's/ /\n/g' file1) file2
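The \n in the replacement is a GNU sed feature and is not portable (BSD/macOS sed treats it differently); a tr-based equivalent for this simple space-separated case would be:
diff <(tr ' ' '\n' < file1) file2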
So I have a first file with an ID on each line, for example:
458-12-345
466-44-3-223
578-4-58-1
599-478
854-52658
955-12-32
Then I have a second file. It has an ID on each line followed by information, for example:
111-2457-1 0.2545 0.5484 0.6914 0.4222
112-4844-487 0.7475 0.4749 0.1114 0.8413
115-44-48-5 0.4464 0.8894 0.1140 0.1044
....
The first file only has 1000 lines, with the IDs of the info I need, while the second file has more than 200,000 lines.
I used the following bash command on Fedora with good results:
cat file1.txt | while read line; do cat file2.txt | egrep "^$line\ "; done > file3.txt
However, I'm now trying to replicate the results on Ubuntu, and the output is a blank file. Is there a reason for this not to work on Ubuntu?
Thanks!
You can grep for several strings at once:
grep -f id_file data_file
Assuming that id_file contains all the IDs and data_file contains the IDs and data.
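With the file names from your question, that would be the following (adding -F treats the IDs as fixed strings rather than regular expressions; note that, unlike your original egrep command, this matches an ID anywhere in the line rather than only at the start):
grep -F -f file1.txt file2.txt > file3.txt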
Typical job for awk:
awk 'FNR==NR{i[$1]=1;next} i[$1]{print}' file1 file2
This prints the lines from the second file whose first field appears in the first one. For even more speed, use mawk.
This line works fine for me on Ubuntu:
cat 1.txt | while read line; do cat 2.txt | grep "$line"; done
However, this may be slow, as the second file (200,000 lines) will be grepped 1000 times (the number of lines in the first file).
Appending can be done using the tee command:
cat file | tee -a target_file
Is there a way to do a prepend/insertion?
Thanks.
Using sed might help
example:
sed -i.bak '3 r tmp1.txt' settings.xml
will add the contents of tmp1.txt after line 3 in settings.xml (and create a backup file with the .bak extension)
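If you actually need to prepend (insert before the first line) rather than insert after a given line, note that sed's r command appends a file's contents after the addressed line, so a simple portable approach is to concatenate into a temporary file and move it back (the .new name is just a placeholder):
cat tmp1.txt settings.xml > settings.xml.new && mv settings.xml.new settings.xml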
Just a brief example: say you want to comment out specific/particular/arbitrary lines of C code:
$ echo -e "1\n2\n3\n4\n5\n6\n" | sed "3s,^,/* ,;5s,$, */,"
1
2
/* 3
4
5 */
6
Note:
A single sed command follows the format "${linenum}s/${search}/${replace}/"
Two commands can be delimited by a semicolon ';'
A comma ',' is used as the s/// delimiter here, for easier reading
Caret '^' matches the start of the line and dollar '$' matches the end of the line; since these match positions rather than characters, the s/// effectively inserts text at the start or end of the line
Of course, this should then be combined with sed's -i switch to actually replace the file contents.
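For example, applied in place to a source file (main.c here is just a placeholder name), keeping a .bak backup:
sed -i.bak "3s,^,/* ,;5s,$, */," main.c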
Cheers!
EDIT: Refs:
Can we give multiple patterns to a sed command??? - The UNIX and Linux Forums
Cute ‘sed’ Tricks to Modify Specific Lines Within File | The Linux Daily