Traversing in linux - linux

I am new to Linux shell scripting. Can anyone tell me how to:
traverse a folder containing multiple XML files,
check each file for a given string, and
display the names of the files that contain that string?

Try this (-l prints only the names of the files that contain a match):
grep -l "display_name = Question" /path/to/abc/*.xml
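The glob above only looks at XML files directly inside one folder. If the files are spread across subfolders, a recursive search may be closer to "traversing". The following is a minimal sketch, assuming a grep that supports -r and --include (GNU and BSD grep both do); the function name find_xml_with is made up for illustration.

```shell
#!/usr/bin/env bash
# find_xml_with DIR STRING
# Hypothetical helper: list every .xml file under DIR (recursively)
# that contains STRING as literal text.
find_xml_with() {
    local dir=$1 needle=$2
    # -r recurse into subfolders, -l print file names only,
    # -F treat the search string as fixed text rather than a regex
    grep -rlF --include='*.xml' -- "$needle" "$dir"
}
```

For example, find_xml_with /path/to/abc "display_name = Question" prints one matching file path per line.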

Related

What command to search for ID in .bz2 file?

I am new to Linux and I'm trying to look for an ID number within a .bz2 file. This seems like a fairly straightforward requirement, but I cannot find the correct command anywhere online. I believe I need to use bzgrep.
I want to look for '123456' in the file Bulk9876.bz2
How would I construct this command?
You probably just need to tell grep that it's okay to parse that data as text:
bzgrep -a 123456 Bulk9876.bz2
If you're trying to view the compressed data (rather than decompressing it and searching the decompressed data), just use grep -a ….
Otherwise, it might make sense to verify that the desired string is even present in the file; bunzip2 it and grep -a the decompressed file. If that works, the problem is in your bzgrep instance (which is odd because it should be using the same decompression library as bunzip2).
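The verification step described above can be sketched without unpacking the archive on disk, by streaming the decompressed data into grep. This assumes bzip2 is installed; the function name search_bz2 is hypothetical.

```shell
#!/usr/bin/env bash
# search_bz2 PATTERN FILE
# Hypothetical helper: decompress FILE to stdout and search the stream,
# leaving the .bz2 file itself untouched.
search_bz2() {
    local pattern=$1 file=$2
    # bzip2 -d decompresses, -c writes to stdout;
    # grep -a treats the data as text even if it looks binary
    bzip2 -dc -- "$file" | grep -a -- "$pattern"
}
```

For example, search_bz2 123456 Bulk9876.bz2 prints the matching lines, and its exit status is non-zero when grep finds nothing.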

Two-layer search in Atom

Currently in Atom I can search for a string in my project, but is there a way to search for a string in my project and then search for a second string only in the files found by the first search?
I want to find all uses of a function in a class, but the class can be given any name in the files that import it (thank you, JavaScript). I know which files import my class from the path they import, but I have no way of searching those results for the string 'getOrder'. For example, I would like to be able to do the following:
search for the string 'meals/meals'
get a list of file names
use those file names and search for the string '.getOrder'
I think this is doable with grep, but I am no master of grep. I tried that route and failed.
You can feed the results of one grep into another with command substitution, $():
https://unix.stackexchange.com/questions/20262/how-do-i-pass-a-list-of-files-to-grep
and this answer shows how to get a list of the files that contain the string I need:
How do I find all files containing specific text on Linux?
Together these give you:
grep -F '.getOrder' $(grep -rnwl './' -e 'meals/meals')
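One caveat with the command substitution above is that the shell splits the file list on whitespace, so paths containing spaces break. A minimal sketch of the same two-stage search using NUL-separated names follows. It assumes grep supports --null and xargs supports -0 and -r (GNU tools do); the function name two_stage_grep is made up for illustration.

```shell
#!/usr/bin/env bash
# two_stage_grep FIRST SECOND [DIR]
# Hypothetical helper: list files under DIR (default: .) that contain
# both fixed strings FIRST and SECOND.
two_stage_grep() {
    local first=$1 second=$2 dir=${3:-.}
    # Stage 1 prints matching file names separated by NUL bytes (--null).
    # Stage 2 reads that list with xargs -0 and searches only those files.
    # xargs -r skips stage 2 entirely when stage 1 finds nothing.
    grep -rlF --null -- "$first" "$dir" | xargs -0 -r grep -lF -- "$second"
}
```

For example, two_stage_grep 'meals/meals' '.getOrder' lists the files that import from meals/meals and also mention .getOrder.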

Check latest file updates in directory on linux using bash shell scripting

I have basic knowledge of Linux bash shell scripting, and I am facing the following problem:
Suppose I am working in an empty directory, mydir.
A process created by a C program generates a file containing one word (e.g., file.txt contains the word "hello").
Periodically, the C program updates the file with the same word, "hello".
I want to check the file every time it is updated.
The issue is that I also want my script to do some other work while it checks for updates and, when it detects an update, to return something I can use to trigger something else.
Can anyone help?
Here is a proof of concept:
while true
do
    func1
    func2
    # check whether the file has been updated
    if updated; then
        break
    else
        continue
    fi
done
You probably want the stat command. Run man stat to see how yours works; look for the "modtime" or "time of last data modification" option. For mine that would be stat -c%Y file. Record basemodtime=$(stat -c%Y file) before the loop, read modtime=$(stat -c%Y file) after func2, and then use if [ "$modtime" != "$basemodtime" ]; then to detect an update.
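Put together, the stat approach might look like the sketch below. It assumes GNU stat (the -c %Y form mentioned above); the function names mtime and wait_for_update are hypothetical, and the work command stands in for func1 and func2. Because %Y has one-second resolution, two writes within the same second would not be distinguished.

```shell
#!/usr/bin/env bash
# mtime FILE: print the modification time in seconds since the epoch.
# Assumes GNU stat; BSD/macOS stat uses different options.
mtime() {
    stat -c %Y -- "$1"
}

# wait_for_update FILE WORK_CMD
# Hypothetical helper: run WORK_CMD repeatedly and return as soon as
# FILE's modification time differs from the value recorded at the start.
wait_for_update() {
    local file=$1 work=$2
    local base
    base=$(mtime "$file")
    while true; do
        "$work"                    # the "other operation" from the question
        if [ "$(mtime "$file")" != "$base" ]; then
            return 0               # file was updated; the caller can react
        fi
        sleep 1                    # avoid a busy loop
    done
}
```

A caller could then write wait_for_update file.txt do_work && trigger_something, where do_work and trigger_something are placeholders for your own functions.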

How to call a large list of paired files to be executed by a program in BASH?

I have a large directory of files (100+) that I'd like to pass through a program via the terminal.
The files are paired and all follow a naming scheme like such:
TS-8_S53_L001_R1_001.fastq
TS-8_S53_L001_R2_001.fastq
RS-9_S54_L001_R1_001.fastq
RS-9_S54_L001_R2_001.fastq
And the program execution looks like:
Seqprogram -i1 Blah_R1_001.fastq -i2 Blah_R2_001.fastq -o Blah_paired.fastq
All of these files are in one directory.
I'd like to be able to run the program on all of the files, with the files paired in the proper order (R1 files are passed through -i1 and R2 files through -i2; the R1 and R2 files share the same base name), and with the output file (-o) saved under the base name with some identifier attached ("_paired", etc.).
I can see how I'd do this in Python; however, I am trying to get better with Bash.
I'm familiar with how to pass multiple files to a single command, e.g., uncompressing all .gz files in a particular directory:
gunzip *.gz
But this command has two inputs, and the inputs must be ordered, so the wildcard scheme isn't sufficient.
Thanks
Use a wildcard to get one file of the pair, and then use parameter substitution to get the other corresponding filenames.
for i1 in *_R1_001.fastq; do
    i2=${i1/R1_001/R2_001}
    paired=${i1/R1_001/paired}
    Seqprogram -i1 "$i1" -i2 "$i2" -o "$paired"
done
The easiest way to do this is to match just one of the three filename patterns and derive the other two names from it.
That is to say:
for r1file in *_R1_*.fastq; do
    r2file=${r1file/_R1_/_R2_}
    pairfile=${r1file%_R1_*}_paired.fastq
    Seqprogram -i1 "$r1file" -i2 "$r2file" -o "$pairfile"
done
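With 100+ files it may be worth checking that each R1 file really has an R2 mate before running anything. The sketch below wraps the loop above in a function that takes the command to run as an argument, so it can be tried with a harmless command first. The function name run_pairs is hypothetical, and the behaviour of Seqprogram itself is not assumed beyond the options shown in the question.

```shell
#!/usr/bin/env bash
# run_pairs CMD [DIR]
# Hypothetical wrapper: for each *_R1_*.fastq in DIR (default: .), run
#   CMD -i1 R1 -i2 R2 -o BASE_paired.fastq
# and skip, with a warning on stderr, any R1 file whose R2 mate is missing.
run_pairs() {
    local cmd=$1 dir=${2:-.}
    local r1 name r2 out
    for r1 in "$dir"/*_R1_*.fastq; do
        [ -e "$r1" ] || continue             # the glob matched nothing
        name=${r1##*/}                       # file name without the folder
        r2=$dir/${name/_R1_/_R2_}
        out=$dir/${name%_R1_*}_paired.fastq
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no mate $r2" >&2
            continue
        fi
        "$cmd" -i1 "$r1" -i2 "$r2" -o "$out"
    done
}
```

Running run_pairs echo first prints the arguments that would be passed for each pair; run_pairs Seqprogram then does the real work.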

How to edit multiple file names at once?

I have a directory full of .txt files (2000 files) with very long names. I want to rename them, keeping only certain characters from inside each name.
For example:
UNCID_279113.TCGA-A6-2683-01A-01R-0821-07.100902_UNC7-RDR3001641_00025_FC_62EPOAAXX.1.trimmed.annotated.gene.quantification.txt
I want to drop these long names and keep only the part that starts at TCGA and runs up to the fourth -; for this example, the new file name would be TCGA-A6-2683-01A.
Does anybody know how I can do this for all the files in one directory?
Assuming the files are in the current directory:
library(gsubfn)
pat <- "TCGA-[^-]*-[^-]*-[^-]*"
file.names <- dir(pattern = pat)
new.names <- strapplyc(file.names, pat, simplify = TRUE)
file.rename(file.names, new.names)
Create a shell/batch script
Here is a variation. It produces a UNIX shell file or a Windows batch file. You can then review the file and run it:
# UNIX
writeLines(paste("mv", file.names, new.names), con = "tcga_rename.sh")
system("sh tcga_rename.sh")
or on Windows:
# Windows
writeLines(paste("rename", file.names, new.names), con = "tcga_rename.bat")
shell("tcga_rename.bat")
REVISED: Factored out pat, simplified and added variations.
Assuming your files are in the current working directory, try
library(stringr)
files <- list.files(".", pattern="\\.txt$")
file.rename(files, str_extract(files, "TCGA(-\\w+){3}"))
You can do something like this:
pattern <- ".*(TCGA-[^-]+-[^-]+-[^-]*).*"
file.rename(
  list.files("."),
  sub(pattern, "\\1", list.files("."))
)
But be very careful to check that the sub call does what you expect before you run the full thing (i.e., run just the sub piece first). It is hard to be sure this won't cause a problem without knowing what patterns appear in your file names.
Also, replace list.files(".") with your directory as needed. Note that you don't need to filter for the files that match the pattern first, since sub only modifies the file names that do match (this is not very efficient if many files don't match, but it is easier to write; if that is a concern, you can use the pattern argument as Greg Snow does).
You can use list.files() to get the file names in the directory, then use sub() with a regular expression to edit the names, then file.rename() to do the actual renaming.
Something like (untested):
curfiles <- list.files(pattern='TCGA') # only grab files with TCGA in them
newfiles <- sub("^.*(TCGA-[a-zA-Z0-9]+-[a-zA-Z0-9]+-[a-zA-Z0-9]+).*$", "\\1", curfiles)
file.rename(curfiles,newfiles)
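The same extraction can also be done from the shell without R. The following is a minimal sketch in bash, using its built-in regex matching; the function name tcga_rename is made up for illustration. It only prints the mv commands so they can be reviewed first, which also makes it easy to spot two long names that would collapse to the same short name.

```shell
#!/usr/bin/env bash
# tcga_rename [DIR]
# Hypothetical helper: print (not run) one mv command per file in DIR
# whose name contains TCGA followed by three dash-separated fields.
tcga_rename() {
    local dir=${1:-.} path name
    local re='TCGA(-[A-Za-z0-9]+){3}'
    for path in "$dir"/*; do
        name=${path##*/}                  # file name without the folder
        if [[ $name =~ $re ]]; then
            # BASH_REMATCH[0] holds the text matched by the whole pattern
            printf 'mv -- %q %q\n' "$path" "$dir/${BASH_REMATCH[0]}"
        fi
    done
}
```

After checking the output, the commands can be executed with tcga_rename | sh.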
