I'm having trouble applying a patch to my source tree, and it's not the usual -p stripping problem: patch is able to find the file to patch.
Specifically, my question is how to read / interpret the .rej files patch creates when it fails on a few hunks. Most discussions of patch/diff I have seen don't cover this.
A simple example:
$ echo -e "line 1\nline 2\nline 3" > a
$ sed -e 's/2/b/' <a >b
$ sed -e 's/2/c/' <a >c
$ diff a b > ab.diff
$ patch c < ab.diff
$ cat c.rej
***************
*** 2
- line 2
--- 2 -----
+ line b
As you can see, the old file contains line 2 and the patched file should contain line b. However, the file being patched actually contains line c - and that is not visible in the reject file.
In fact, the easiest way to resolve such problems is to take the diff fragment from the .diff/.patch file, insert it at the appropriate place in the file to be patched, and then compare the code by hand to figure out what lines actually cause the conflict.
Alternatively: get the original (unmodified) file, patch it, and run a three-way merge between the result and your locally modified file.
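For the toy example above, that three-way merge could look like this (a sketch, assuming GNU diff3 is available; c is the locally modified file, a the pristine original, and b the file the patch expects to produce):
$ diff3 -m c a b
line 1
<<<<<<< c
line c
||||||| a
line 2
=======
line b
>>>>>>> b
line 3
The conflict markers show exactly which lines collide - the very information the reject file hides.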
Wiggle is a great tool for applying .rej files when patch does not succeed.
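For example, to apply the reject from the toy example above (a sketch, assuming wiggle is installed):
$ wiggle --replace c c.rej
wiggle merges what it can into c, keeps the untouched original as c.porig, and marks any remaining conflicts with conflict markers right in the file.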
I'm not an expert on dealing with patch files, but I'd like to add some clarity on how to read them based on my understanding of the information they contain.
Your .rej files will tell you:
the difference between the original file and the new (patched) version;
where the problem code starts in the original file, and how many lines it goes on for in that file;
and where the code starts in the new file, and how many lines it goes on for in that file.
So given this message, noted in the beginning of my .rej file:
diff a/www/js/app.js b/www/js/app.js (rejected hunks)
@@ -4,12 +4,24 @@
I see that for my problem file (www/js/app.js), the rejected hunk covers the stretch starting at line 4 of the original (noted as a/www/js/app.js on the first line) and running for 12 lines (the -4,12 part of @@ -4,12 +4,24 @@ on line two), and the stretch starting at line 4 of the new version of the file (noted as b/www/js/app.js) and running for 24 lines (the +4,24 part).
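To make that concrete, here is what a complete (tiny) unified hunk looks like, reusing the a and b files from the first example (timestamps elided):
$ diff -u a b
--- a	<timestamp>
+++ b	<timestamp>
@@ -1,3 +1,3 @@
 line 1
-line 2
+line b
 line 3
So @@ -1,3 +1,3 @@ reads: this hunk covers 3 lines starting at line 1 of the old file, and 3 lines starting at line 1 of the new file.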
For further information, see the excellent overview of patch files (containing the information I note above, as well as details on lines added and/or removed between file versions) at http://blog.humphd.org/vocamus-906/.
Any corrections or clarifications welcome of course.
I made a shell script the purpose of which is to find files that don't contain a particular string, then display the first line that isn't empty or otherwise useless. My script works well in the console, but for some reason when I try to direct the output to a .txt file, it comes out empty.
Here's my script:
#!/bin/bash
# takes user input.
echo "Input substance:"
read substance
echo "Listing media without $substance:"
cd media
# finds names of files that don't feature the substance given, then puts them inside an array.
searchresult=($(grep -L "$substance" *))
# iterates the array and prints the first line of each - contains both the number and the medium name.
# however, some files start with "Microorganisms" and the actual number and name feature after several empty lines
# the script checks for that occurrence - and prints the first line that doesn't match these criteria.
for i in "${searchresult[@]}"
do
grep -m 1 -v "Microorganisms\|^$" $i
done >> output.txt
I've tried moving the >>output.txt to right after the grep line inside the loop, tried switching >> to > and 2>&1, tried using tee. No go.
I'm honestly feeling utterly stuck as to what the issue could be. I'm sure there's something I'm missing, but I'm nowhere near good enough with this to notice. I would very much appreciate any help.
EDIT: Added files to better illustrate what I'm working with. Sample inputs I tried: Glucose, Yeast extract, Agar. Link to files [140kB] - the folder was unzipped beforehand.
The script was given full permissions to execute. I don't think the output is being rewritten because even if I don't iterate and just run a single line of the loop, the file is empty.
I tried the p4 diff2 command. Please help me understand the command's output. Specifically, I want to understand lines like the following:
1c1
3a4,5
6a9
13a17,18
2,7d1
It's pretty typical; see https://unix.stackexchange.com/questions/81998/understanding-of-diff-output for a very similar description.
Lines which were (a)dded, (c)hanged, or (d)eleted are prefixed by these entries.
The numbers are line numbers from either the source file or the target file.
So 1c1 means that line 1 in the source was changed, resulting in line 1 in the target.
And 13a17,18 means that lines 17-18 in the target were added after line 13 in the source.
And 2,7d1 means that lines 2 through 7 in the source were deleted before line 1 in the target.
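A quick way to see all three markers at once (a made-up example):
$ printf 'apple\nbanana\ncherry\n' > old
$ printf 'apricot\nbanana\ncherry\ndate\n' > new
$ diff old new
1c1
< apple
---
> apricot
3a4
> date
Lines prefixed with < come from the source file and lines prefixed with > from the target, so 1c1 shows apple changing into apricot, and 3a4 shows date being added after line 3 of the source.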
I am trying to iterate through every file in a specific directory (called sequences), and perform two functions on each file. I know that the functions (the 'blastp' and 'cat' lines) work, since I can run them on individual files. Ordinarily I would have a specific file name as the query, output, etc., but I'm trying to use a variable so the loop can work through many files.
(Disclaimer: I am new to coding.) I believe that I am running into serious problems with trying to use my file names within my functions. As it is, my code will execute, but it creates a bunch of extra unintended files. This is what I intend for my script to do:
Line 1: Iterate through every file in my "sequences" directory. (All of which end with ".fa", if that is helpful.)
Line 3: Recognize the filename as a variable. (I know, I know, I think I've done this horribly wrong.)
Line 4: Run the blastp function using the file name as the argument for the "query" flag, always use "database.faa" as the argument for the "db" flag, and output the result in a new file that is has the same name as the initial file, but with ".txt" at the end.
Line 5: Output parts of the output file from line 4 into a new file that has the same name as the initial file, but with "_top_hits.txt" at the end.
for sequence in ./sequences/{.,}*;
do
echo "$sequence";
blastp -query $sequence -db database.faa -out ${sequence}.txt -evalue 1e-10 -outfmt 7
cat ${sequence}.txt | awk '/hits found/{getline;print}' | grep -v "#">${sequence}_top_hits.txt
done
When I ran this code, it gave me six new files derived from each file in the directory (and they were all in the same directory - I'd prefer to have them all in their own folders. How can I do that?). They were all empty. Their suffixes were, ".txt", ".txt.txt", ".txt_top_hits.txt", "_top_hits.txt", "_top_hits.txt.txt", and "_top_hits.txt_top_hits.txt".
If I can provide any further information to clarify anything, please let me know.
If you're only interested in *.fa files I would limit your input to only those matching files like this:
for sequence in sequences/*.fa;
do
I can propose the following improvements:
for fasta_file in ./sequences/*.fa # ";" is not necessary if you already have a new line for your "do"
do
# ${variable%something} is the part of $variable
# before the string "something"
# basename path/to/file is the name of the file
# without the full path
# $(some command) allows you to use the result of the command as a string
# Combining the above, we can form a string based on our fasta file
# This string can be useful to name stuff in a clean manner later
sequence_name=$(basename ${fasta_file%.fa})
echo ${sequence_name}
# Create a directory for the results for this sequence
# -p option avoids a failure in case the directory already exists
mkdir -p ${sequence_name}
# Define the name of the file for the results
# (including our previously created directory in its path)
blast_results=${sequence_name}/${sequence_name}_blast.txt
blastp -query ${fasta_file} -db database.faa \
-out ${blast_results} \
-evalue 1e-10 -outfmt 7
# Define a file name for the top hits
top_hits=${sequence_name}/${sequence_name}_top_hits.txt
# alternatively, using "%"
#top_hits=${blast_results%_blast.txt}_top_hits.txt
# No need to cat: awk can take a file as argument
awk '/hits found/{getline;print}' ${blast_results} \
| grep -v "#" > ${top_hits}
done
I made more intermediate variables, with (hopefully) meaningful names.
I used \ to escape line ends and allow putting commands in several lines.
I hope this improves code readability.
I haven't tested. There may be typos.
You should be using *.fa if you only want files with a .fa ending. Additionally, if you want to redirect your output to new folders you need to create those directories somewhere using
mkdir 'folder_name'
then you need to redirect your -o outputs into those directories, something like this
'command' -o /path/to/output/folder
To test this script out, run each line one by one and make sure it works by itself before combining them.
One last thing, be careful with your use of semicolons; it should look something like this:
for filename in *.fa; do 'command'; done
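Putting those pieces together, a minimal sketch (the results directory name is just an example; the blastp flags are the ones from the question):
mkdir -p results
for filename in sequences/*.fa; do
    name=$(basename "$filename" .fa)
    blastp -query "$filename" -db database.faa -out "results/${name}.txt" -evalue 1e-10 -outfmt 7
done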
I have a large file, 130 GB in size.
# ls -lrth
-rw-------. 1 root root 129G Apr 20 04:25 syslog.log
I need to reduce the file size by deleting lines which start with "Nov 2", so I ran the following command:
sed -i '/Nov 2/d' syslog.log
The file is too large to edit in vim, either.
When I run the sed command, it also creates a backup file, but I don't have much space left in root. Please suggest an alternative way to delete particular lines from this file without using additional space on the server.
It does not create a real backup file. sed is a stream editor. When applied to a file with option -i, it will stream that file through the sed process, write the output to a new (temporary) file, and when everything is done, rename the new file to the original name.
(There are options to create backup files also, but you didn't give them, so I won't mention that further.)
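You can see the rename happen by watching the file's inode number, which changes when sed -i replaces the file (a quick check, assuming GNU sed):
$ echo hello > demo.txt
$ ls -i demo.txt              # note the inode number
$ sed -i 's/hello/world/' demo.txt
$ ls -i demo.txt              # a different inode: sed wrote a new file and renamed it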
In your case you have a very large file and don't want to create any copy, however temporary. For this you need to open the file for reading and writing at the same time, then your sed process can overwrite the original. After this, you will have to truncate the file at the end of the writing.
To demonstrate how this can be done, we first perform a test case.
Create a test file, containing lots of lines:
seq 0 999999 > x
Now, let's say we want to remove all lines containing the digit 4:
grep -v 4 1<>x <x
This will open the file for reading and writing as STDOUT (1), and for reading as STDIN. The grep command will read all lines and will output only the lines not containing a 4 (option -v).
This will effectively overwrite the beginning of the original file.
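To see what the 1<> redirection does on its own, here is a tiny illustration (independent of the example above):
$ printf 'AAAA\n' > demo
$ printf 'BB' 1<>demo
$ cat demo
BBAA
The file was opened for reading and writing without truncation, so the first two bytes were overwritten in place and the rest was left alone; a plain > would have truncated the file to just BB.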
You will not know in advance how long the output is; after the point where the new output ends, the original contents of the file will still appear:
…
999991
999992
999993
999995
999996
999997
999998
999999
537824
537825
537826
537827
537828
537829
…
You can use the Unix tool truncate to shorten your file manually afterwards. In a real scenario you will have trouble finding the right spot for this, so it makes sense to count the number of bytes written (using wc):
(Don't forget to recreate the original x for this test.)
(grep -v 4 <x | tee /dev/stderr 1<>x) |& wc -c
This will perform the step above and additionally print out the number of bytes written to the terminal; in this example the output will be 3653658. Now use truncate:
truncate -s 3653658 x
Now you have the result you want.
If you want to do this in a script, i. e. without interaction, you can use this:
length=$( (grep -v 4 <x | tee /dev/stderr 1<>x) |& wc -c )
truncate -s "$length" x
I cannot guarantee that this will work for files >2GB or >4GB on your machine; depending on your operating system (32-bit?) and the versions of the installed tools you might run into large-file issues. Test with large files first (>4GB, as this is typically a limit for many things), then cross your fingers and give it a try :)
Some caveats you have to keep in mind:
Of course, nobody is supposed to append log entries to that log file while the procedure is running.
Also, any abort during the running of the process (power failure, signal caught, etc.) will leave the file in an undefined state. But re-running the command again after such a mishap will in most cases produce the correct output; some lines might be doubled, but not more than a single line should be corrupted then.
The output must be smaller than the input, of course, otherwise the writing will overtake the reading, corrupting the whole result so that lines which should be there will be missing (or truncated at the start).
How do I quickly scrape off first few lines from a large file, without opening the whole file in main memory?
UPDATE
I do not want to pipe everything after the first x lines into another file; I want to update the original file in place.
Not exactly vim, but to cut off the first 10 lines you could use
tail --lines=+11 somefile.txt > newfile.txt
tail -n+11 somefile.txt | vim -
To chop off the first 10* lines and open the file for edit, without creating a temporary file. Note that the file will have no name in vim when you open it this way. That's the only drawback.
* Note that although I used 11 in the command, this starts from line 11. So it will chop off the first 10 lines.
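If you want to give the buffer a name again, you can write it out explicitly from inside vim, e.g. back over the original (somefile.txt is the example name from above):
:w! somefile.txt
The ! is needed because the file already exists.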
The original question was never actually answered here. I believe this is a solution:
sed -i '1,500d' foo.txt
This deletes the first 500 lines of the file in place, without your having to create a temporary file yourself (sed -i manages its own behind the scenes). I think it's actually quite useful as a one-liner for, say, cleaning up log files without just erasing the entire log. A quick demonstration:
$ seq 1 502 > foo.txt
$ sed -i 1,500d foo.txt
$ cat foo.txt
501
502
vim will always want/need to read in the whole file, so there's no way to do it using (only) vim. Darcara's suggestion looks good.
This process will always involve copying all but the first part of the file somewhere else, so I don't see any way of doing it quickly.
Depending on what you need to do with the file, you may be better off using sed or awk for editing such a big file.
How about:
split the original file into 2 parts (p1: lines 1 - x) (p2: lines x+1 - n)
edit p1 since you want to edit the first x lines. We'll call it p1'
combine p1' and p2
In short
file -> p1 and p2
p1 -> p1'
p1' + p2 -> new_file
Commands (a concrete sketch follows below)
use head and tail (or split) to divide the file
use vim or editor of your choice.
use cat to combine
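A concrete sketch of those three steps, assuming we want to edit the first 10 lines of file.txt (all names are examples):
$ head -n 10 file.txt > p1     # p1: the first 10 lines
$ tail -n +11 file.txt > p2    # p2: everything from line 11 on
$ vim p1                       # edit the first part
$ cat p1 p2 > new_file         # recombine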