I really don't understand what the patch command is for.
I have file1 containing 1 2 3
and file2 containing 1 2 4. I run:
diff -u file1 file2 > out.patch
patch -b file1 out.patch
Now file1 contains 1 2 4. Is it just a copy of file2, or what?
What is happening here, and what is the patch command used for?
man patch says
patch takes a patch file patchfile containing a difference listing
produced by the diff program and applies those differences to one or
more original files, producing patched versions.
Normally the patched versions are put in place of the originals.
Backups can be made; see the -b or --backup option.
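So, in your case, diff -u file1 file2 records the difference between the two files, which here is a single change: the last line, "3", is replaced by "4". The patch command then applies that difference to the original file.
Is it a copy file2 or what?
No. patch does not copy file2 and does not append anything; it edits file1 according to the recorded hunk (remove "3", add "4"). The result equals file2 here only because the patch describes every difference between the two files. Because you passed -b, the unmodified original is also kept as a backup (file1.orig by default). The practical use is that you can distribute a small patch file instead of the whole new file, and others can apply it to their own copy.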
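A minimal sketch reproducing the example above in a throwaway directory (the scratch directory and the use of printf to create the files are assumptions for illustration). It prints the hunk that diff -u records and shows that patch rewrites file1 while -b keeps the original. It assumes GNU diff and GNU patch, whose default backup suffix is .orig.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

printf '1\n2\n3\n' > file1
printf '1\n2\n4\n' > file2

# diff exits with status 1 when the files differ, which is expected here;
# only a status greater than 1 indicates a real error.
diff -u file1 file2 > out.patch || [ $? -eq 1 ] || exit 1

# Skip the two header lines (file names and timestamps) and show the hunk:
# " " marks a context line, "-" a removed line, "+" an added line.
tail -n +3 out.patch

# Apply the recorded change to file1, keeping a backup of the original (-b).
patch -b file1 out.patch

cat file1        # patched: 1 2 4
cat file1.orig   # backup of the original: 1 2 3
```

The hunk contains only the lines "1" and "2" as context plus the single removal and addition, which is why the same out.patch could be applied to another copy of file1 without shipping file2 itself.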
I have a huge plain-text file (~500 GB) on a Linux machine. I want to replace a string in the header line (the first row of the file), but all the methods I know seem slow and inefficient.
example file:
foo apple cat
1 2 2
2 3 4
3 4 6
...
expected file output:
bar apple cat
1 2 2
2 3 4
3 4 6
...
sed:
sed -i '1s/foo/bar/g' file
-i changes the file in place, but the command actually writes a temporary file to disk and then uses it to replace the original. That I/O takes a long time.
vim:
ex -c '1s/foo/bar/g' -c 'wq' file
vim doesn't generate a temporary file, but it loads the whole file into memory, which also wastes a lot of time.
Is there a better solution that reads only the first row into memory and writes it back to the original file? I know the Linux head command can extract the first line very fast.
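Could you please try the following awk command and let me know if it helps? I couldn't test it, as I don't have a file as large as 500 GB. It does not use in-place editing, but note that it still writes a complete copy (temp_file) and then renames it over Input_file, so, like sed -i, it rewrites the whole file. Also, $1="bar" replaces the entire first field and rebuilds the header with single spaces between fields.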
awk 'FNR==1{$1="bar";print;next} 1' Input_file > temp_file && mv temp_file Input_file
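If, as in the example, the replacement has exactly the same length as the text it replaces (foo and bar are both 3 bytes), the header can be overwritten in place without rewriting the rest of the file, because only those bytes change. A minimal sketch, assuming the header starts with foo; the small sample file stands in for the real one. If the lengths differ, every following byte has to move, so the whole file must be rewritten anyway.

```shell
# Create a small stand-in for the huge file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'foo apple cat\n1 2 2\n2 3 4\n3 4 6\n' > file

# head only reads the beginning of the file, so this check is cheap
# even for a very large file.
case $(head -n 1 file) in
  foo*)
    # Overwrite bytes 0-2 with "bar"; conv=notrunc stops dd from
    # truncating the file, so everything after the header is untouched.
    printf 'bar' | dd of=file bs=1 seek=0 count=3 conv=notrunc 2>/dev/null
    ;;
esac

head -n 1 file   # bar apple cat
```

The same idea works for a longer replacement only if the original header has enough trailing spaces (or other padding) to absorb the extra bytes.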
I have two files like the ones below.
file1 contains these words:
word23.cs
test.cs
only12.cs
file2 contains these words:
word231.cs
test.cs
only12.cs
The words may change. I need to compare file2 with file1, using a script or a Linux command, to find the word that differs; here the expected output is word23.cs.
Use the "diff" command to compare 2 files:
$ diff a.txt b.txt
Or, for a unified diff:
$ diff -u a.txt b.txt
Use -U0 (i.e. --unified=0) for a unified diff without context lines.
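You can use the comm, diff, or cmp commands to find the words that differ between the files (note that comm requires sorted input, and cmp only reports where the files first differ).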
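For the two sample lists, a sketch of the comm approach. comm expects sorted input, so both files are sorted into temporary copies first (the .sorted names are illustrative). -2 suppresses lines found only in file2 and -3 suppresses lines common to both, leaving the lines unique to file1.

```shell
cd "$(mktemp -d)" || exit 1
printf 'word23.cs\ntest.cs\nonly12.cs\n' > file1
printf 'word231.cs\ntest.cs\nonly12.cs\n' > file2

# comm compares sorted input line by line.
sort file1 > file1.sorted
sort file2 > file2.sorted

# Column 1: only in file1; column 2: only in file2; column 3: in both.
# Keep column 1 only.
comm -23 file1.sorted file2.sorted   # word23.cs
```

Using -13 instead lists the words found only in file2 (here word231.cs).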
A grep trick also works. The following prints the lines of file1 that do not appear as whole lines in file2 (here, word23.cs):
grep -Fxvf file2 file1
What are the differences between sort file1 -o file2 and sort file1 > file2 ? So far from what I have done they do the same thing but perhaps I'm missing something.
The following two commands behave the same as long as file1 and file2 are different files.
sort file1 -o file2 # Output redirection within sort command
sort file1 > file2 # Output redirection via shell
Let's see what happens when the input and output are the same file, i.e. when you try to sort in place:
sort file -o file # Works perfectly fine and does in-place sorting
sort file > file # Surprise! Generates empty file. Data is lost :(
In summary, the two redirection methods are similar but not the same.
Test
$ cat file
2
5
1
4
3
$ sort file -o file
$ cat file
1
2
3
4
5
$ sort file > file
$ cat file
$ ls -s file
0 file
The result is the same but in the case of -o file2 the resulting file is created by sort directly while in the other case, it is created by bash and filled with the standard output of sort. The xfopen defined in line 450 of sort.c in coreutils treats both cases (stdout and -o filename) equally.
Redirecting sort's standard output is more generic, since it can be sent to another program by using | instead of >, which the -o option makes more difficult (but not impossible).
The -o option is handy for in-place sorting: sort reads all of its input before it opens the output file, whereas redirecting to the same file leaves a truncated file, because the shell creates (and truncates) it before sort is invoked.
There is not much difference. > is the standard Unix shell output redirection operator; it says "write the output you would otherwise display on the terminal to the given file." The -o option is specific to sort; it is another way to say "write the output to this given file."
The > can be used where a tool does not specifically have a write to file argument or option.
When I try to sort a file and save the sorted output back into the same file, like this:
sort file1 > file1;
the contents of file1 are erased altogether, whereas when I do the same with the tee command, like this:
sort file1 | tee file1;
it works fine [ed: "works fine" only for small files with lucky timing, will cause lost data on large ones or with unhelpful process scheduling], i.e. it overwrites file1 with its sorted contents and also shows them on standard output.
Can someone explain why the first case is not working?
As other people explained, the problem is that the I/O redirection is done before the sort command is executed, so the file is truncated before sort gets a chance to read it. If you think for a bit, the reason why is obvious - the shell handles the I/O redirection, and must do that before running the command.
The sort command has 'always' (since at least Version 7 UNIX) supported a -o option to make it safe to output to one of the input files:
sort -o file1 file1 file2 file3
The trick with tee depends on timing and luck (and probably a small data file). If you had a megabyte or larger file, I expect it would be clobbered, at least in part, by the tee command. That is, if the file is large enough, the tee command would open the file for output and truncate it before sort finished reading it.
It doesn't work because > redirection implies truncation: the shell opens (and truncates) the file and connects it to sort's standard output before running sort, rather than collecting sort's whole output in memory first. Thus the contents of file1 are truncated before sort has a chance to read them.
It's unwise to depend on either of these commands to work the way you expect.
The way to modify a file in place is to write the modified version to a new file, then rename the new file to the original name:
sort file1 > file1.tmp && mv file1.tmp file1
This avoids the problem of reading the file after it's been partially modified, which is likely to mess up the results. It also makes it possible to deal gracefully with errors; if the file is N bytes long, and you only have N/2 bytes of space available on the file system, you can detect the failure creating the temporary file and not do the rename.
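A slightly more defensive sketch of the same temporary-file pattern, assuming mktemp is available. The temporary file is created next to the original (so mv is a rename within one file system) and is removed if sort fails, leaving the original untouched. One caveat: mktemp creates the file with mode 0600, so the sorted file1 ends up with those permissions rather than the original's.

```shell
# Throwaway example data in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '2\n5\n1\n4\n3\n' > file1

# Create the temporary file in the same directory as file1.
tmp=$(mktemp ./file1.XXXXXX) || exit 1

if sort file1 > "$tmp"; then
  mv "$tmp" file1          # replace the original only if sort succeeded
else
  rm -f "$tmp"             # e.g. disk full: keep the original untouched
  exit 1
fi

cat file1
```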
Or you can rename the original file, then read it and write to a new file with the same name:
mv file1 file1.bak && sort file1.bak > file1
Some commands have options to modify files in place (for example, perl and sed both have -i options, though the syntax of sed's -i option varies between implementations). But these options work by creating temporary files; it's just done internally.
The shell sets up redirections before it runs the command. So in the first case, > file1 is processed first and empties the file.
The first command doesn't work (sort file1 > file1) because, with the redirection operator >, the shell creates or truncates the file before the sort command is even invoked.
The second command only appears to work (sort file1 | tee file1): both commands of a pipeline run concurrently, and with a small file sort usually finishes reading before tee opens and truncates file1. That is a race, not a guarantee.
So with this or any similar command, avoid redirecting output into the same file you are reading; use a suitable in-place editor instead (e.g. ex, ed, sed), for example:
ex '+%!sort' -cwq file1
Or use other utilities such as sponge (from moreutils).
Luckily, sort has the -o parameter, which writes the results to the file (as suggested by Jonathan), so the solution is straightforward: sort -o file1 file1.
Bash opens (and truncates) the output file while setting up the redirection, and only then runs sort.
In the second case, tee opens the file itself, which usually happens after sort has already read the contents, but nothing guarantees that ordering.
You can use this method
sort file1 -o file1
This sorts the file and stores the result back in the original file. You can also use this command to remove duplicate lines:
sort -u file1 -o file1
Basically, I need a large file (around 10 GB) in a specified format. To get it, I copy the contents of my original file into the same file multiple times to increase its size. I don't care about the contents as long as they have the required format.
Initially I tried to do this with gedit, which failed miserably after a few hundred MB. I'm looking for an editor that can do this, or a suggestion for an alternative approach.
You could make 2 files and repeatedly append them to each other:
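cp file1 file2
# Each pass multiplies the size by roughly 2.6, so stop at the target
# size instead of running a fixed number of passes.
target=$((10 * 1024 * 1024 * 1024))   # about 10 GB
while [ $(wc -c < file1) -lt "$target" ]; do
cat file1 >> file2
cat file2 >> file1
done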
In Windows, from the command line:
copy file1.txt+file2.txt file3.txt
This concatenates file1.txt and file2.txt into file3.txt; repeat, or add more +file arguments, until you reach the size you need.
For Unix,
cat file1.txt file2.txt >> file3.txt
This appends file1.txt and file2.txt to file3.txt; repeat, or add more input files, until you reach the size you need.
There are probably many other ways to do this in Unix.