How to add two netCDF files with NCO?

I have two gridded netCDF files, each with several variables plus a time dimension.
The files have the same dimensions (grid and time) and the same number of variables; however, the variable names differ.
For example:
File1: var1_co, var2_co, var3_co
File2: var1_mo, var2_mo, var3_mo
The new file would be:
File3: var1_mo+var1_co, var2_co+var2_mo, var3_co+var3_mo
The following command does not work, because ncbo pairs variables by name and the names differ between the two files:
ncbo --op_typ=add File1.nc File2.nc File3.nc

Try something like this, and possibly re-rename the variables on the back end too:
ncrename -v var1_mo,var1_co -v var2_mo,var2_co -v var3_mo,var3_co File2.nc f2_tmp.nc
ncbo --op_typ=add File1.nc f2_tmp.nc File3.nc
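If you don't want the summed variables in File3.nc to keep the _co names, a possible follow-up (the _sum names here are just an illustration) is to rename them in the output:
ncrename -v var1_co,var1_sum -v var2_co,var2_sum -v var3_co,var3_sum File3.nc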

Related

How to include a static variable with time-varying variables in a netCDF file using Climate Data Operator (CDO)?

I have two netCDF files. One file, elevation.nc, contains just the 'elevation' of an area. The other file, climate.nc, has ('lat', 'lon', 'prcp', 'temp'). I have used the following:
cdo merge elevation.nc climate.nc merged.nc
The merged.nc file only has a single prcp and temp, from the date the elevation was recorded.
How do I get time-varying prcp and temp in merged.nc, as in climate.nc, but also with the static variable elevation?
Did you try reversing the order?
cdo merge climate.nc elevation.nc merged.nc
The main issue was found to be that elevation.nc has a single time step. It is essential to remove this time dimension before the merge. Hence, the steps involve (using ncwa and ncks from NCO):
ncwa -a time elevation.nc test1.nc   # average away the degenerate time dimension
ncks -O -x -v time test1.nc test2.nc # remove the leftover time variable
ncks -A -v hgt test2.nc climate.nc   # append hgt (the elevation variable) to climate.nc
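To confirm the result, you can inspect the header of the merged file with ncdump (part of the standard netCDF utilities) and check that hgt now sits alongside the time-varying prcp and temp:
ncdump -h climate.nc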

How to link a selected set of files from one directory to another in Linux?

Say for example, in the source directory I have the following files:
abc.r
xyz.sh
pqr.fam
lmn.bim
uvw.r
ttt.sh
Now I need to link only items 1, 2 and 5 from the list above (abc.r, xyz.sh and uvw.r). Most importantly, I need to link all 3 files together (i.e. link all 3 files at the same time).
I know how to link 1 file at a time (ln -s sourceDirectory/fileName targetDirectory/), but not multiple files at once.
I found ways to do this when the file names share some pattern (for example, link all the files whose names start with the letter "f"), but in my case there is no such pattern; my file names are all different.
Try this:
#!/bin/bash
for file in a.txt b.txt c.txt
do
    ln -s /sourcedir/"${file}" /targetdir/
done
Since you only have a list, you have to iterate through the list.
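Alternatively, GNU ln can create several links in one invocation when the target directory is given with -t, so for the three files from the question (absolute source paths are safest for symlinks):
ln -s -t /targetDirectory/ /sourceDirectory/abc.r /sourceDirectory/xyz.sh /sourceDirectory/uvw.r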

Iterate through files in a directory, create output files, linux

I am trying to iterate through every file in a specific directory (called sequences), and perform two functions on each file. I know that the functions (the 'blastp' and 'cat' lines) work, since I can run them on individual files. Ordinarily I would have a specific file name as the query, output, etc., but I'm trying to use a variable so the loop can work through many files.
(Disclaimer: I am new to coding.) I believe that I am running into serious problems with trying to use my file names within my functions. As it is, my code will execute, but it creates a bunch of extra unintended files. This is what I intend for my script to do:
Line 1: Iterate through every file in my "sequences" directory. (All of which end with ".fa", if that is helpful.)
Line 3: Recognize the filename as a variable. (I know, I know, I think I've done this horribly wrong.)
Line 4: Run the blastp function using the file name as the argument for the "query" flag, always use "database.faa" as the argument for the "db" flag, and output the result in a new file that has the same name as the initial file, but with ".txt" at the end.
Line 5: Output parts of the output file from line 4 into a new file that has the same name as the initial file, but with "_top_hits.txt" at the end.
for sequence in ./sequences/{.,}*;
do
    echo "$sequence";
    blastp -query $sequence -db database.faa -out ${sequence}.txt -evalue 1e-10 -outfmt 7
    cat ${sequence}.txt | awk '/hits found/{getline;print}' | grep -v "#">${sequence}_top_hits.txt
done
When I ran this code, it gave me six new files derived from each file in the directory, all of them empty, and all in the same directory (I'd prefer to have them in their own folders; how can I do that?). Their suffixes were ".txt", ".txt.txt", ".txt_top_hits.txt", "_top_hits.txt", "_top_hits.txt.txt", and "_top_hits.txt_top_hits.txt".
If I can provide any further information to clarify anything, please let me know.
If you're only interested in .fa files, I would limit your input to only those matching files, like this:
for sequence in sequences/*.fa;
do
I can propose the following improvements:
for fasta_file in ./sequences/*.fa # ";" is not necessary if you already have a new line for your "do"
do
    # ${variable%something} is the part of $variable
    # before the string "something"
    # basename path/to/file is the name of the file
    # without the full path
    # $(some command) allows you to use the result of the command as a string
    # Combining the above, we can form a string based on our fasta file
    # This string can be useful to name stuff in a clean manner later
    sequence_name=$(basename "${fasta_file%.fa}")
    echo "${sequence_name}"
    # Create a directory for the results for this sequence
    # -p option avoids a failure in case the directory already exists
    mkdir -p "${sequence_name}"
    # Define the name of the file for the results
    # (including our previously created directory in its path)
    blast_results=${sequence_name}/${sequence_name}_blast.txt
    blastp -query "${fasta_file}" -db database.faa \
        -out "${blast_results}" \
        -evalue 1e-10 -outfmt 7
    # Define a file name for the top hits
    top_hits=${sequence_name}/${sequence_name}_top_hits.txt
    # alternatively, using "%"
    #top_hits=${blast_results%_blast.txt}_top_hits.txt
    # No need to cat: awk can take a file as argument
    awk '/hits found/{getline;print}' "${blast_results}" \
        | grep -v "#" > "${top_hits}"
done
I made more intermediate variables, with (hopefully) meaningful names.
I used \ to escape line ends and allow putting commands in several lines.
I hope this improves code readability.
I haven't tested. There may be typos.
You should be using *.fa if you only want files with a .fa ending. Additionally, if you want to redirect your output to new folders, you need to create those directories first using
mkdir 'folder_name'
then you need to redirect your -o outputs to those folders, something like this:
'command' -o /path/to/output/folder
To help you test this script out, you can run each line one by one to test them. You need to make sure each line works by itself before combining.
One last thing: be careful with your use of semicolons; it should look something like this:
for filename in *.fa; do 'command'; done

Find difference line by line

I have a program which stores some data in two files stored in separate folders. /Path_1/File A and /Path_2/File B.
Now I need to compare those two files line by line for any differences. If a difference is found, I need to capture it and store it in a separate file or print it on the screen.
I tried using comm, diff and join, but none of them has worked so far. I'd appreciate any help.
Sample file looks like following.
124 days
3.10.0-327.13.1.el7.x86_64
/dev/mapper/vg_sda-lv_root ext4
devtmpfs devtmpfs
In the other file, the number of days and the kernel version may differ. I only need to capture those differences while running a script.
I tried diff -y -W 120 Source/File Destination/File and comm File1 File2.
You can try this:
diff -y --suppress-common-lines /path_1/file_a /path_2/file_b > output
where --suppress-common-lines means, per the man page, "do not output common lines" (it applies to the side-by-side format, hence the -y).

Manually merge two files using diff

I'd like to merge two files by doing the following:
Output the diff of the two files into a temp file and
Manually select the lines I want to copy/save.
The problem here is that diff -u only gives me a few lines of context, while I want to output the entire file in a unified format.
Is there any way diff can do this?
One option that might fit the bill for you: sdiff, a side-by-side diff of files.
sdiff -o merged.file left.file right.file
Once there, it will prompt you for which lines you want to keep from which file. Hit ? and then Enter for a little help, and see man sdiff for the detailed goods.
(In my distro, these come in the "diffutils" package [Fedora, CentOS].)
If you need to automate the process, you might want to try the util merge, which will mark conflicts in the files. However, that might put you back at square one.
"I want to output the entire file in a unified format. Is there any way diff can do this?"
Yes.
diff -U 9999999 file1.txt file2.txt > diff.txt
This should work, provided your files are less than 10 million lines long.
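Rather than hard-coding 9999999, you can size the context from the files themselves; a small sketch:
n=$(cat file1.txt file2.txt | wc -l)   # total line count bounds the length of either file
diff -U "$n" file1.txt file2.txt > diff.txt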
You can merge/combine the two files with diff using:
diff --line-format='%L' file1 file2
The easy answer is to use the -D flag to merge the files and surround the differences with C-style #ifdef statements.
From the documentation:
-D NAME --ifdef=NAME
Output merged file to show `#ifdef NAME' diffs.
You can use it as follows:
$ diff -D NEWSTUFF file1 file2 > merged_file
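Schematically, a changed region in the merged file looks like this (NEWSTUFF being whatever NAME you passed to -D):
#ifndef NEWSTUFF
lines as they appear in file1
#else
lines as they appear in file2
#endif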
I usually then just open the merged file in an editor and resolve the merge conflicts by hand.
You can also use options to output an ed script, etc.
If you are an emacs user, you can do this directly in emacs using the "emerge" tool:
https://www.gnu.org/software/emacs/manual/html_node/emacs/Emerge.html
Issuing M-x emerge-files will open an interactive prompt with a view of files A, B, and the merged file to allow choosing text that differs between files A & B, inserting part of A into B, and more.
