How can I remove these parts of my filenames? - linux

I have some files I need to rename in bulk. For example:
Jul-0961_S7_R2_001.fastq.gz
Jul-0967_S22_rep1_R1.fastq.gz
Jul-0974_S32_R2_001.fastq.gz
I need to remove the S* part of the filename but I don't know the right regex to use.
Specifically:
Jul-0961_S7_R2_001.fastq.gz --> Jul-0961_R2_001.fastq.gz
Something like rename 's/S*//' *.gz is what I'm looking for.
Is there a regex wizard out there who can show me the way? Thanks in advance.

You should be able to use something like this: rename 's/_S[0-9]+_/_/' *.gz

If the files are all in the same format (i.e. have the same number of underscores), you could use:
ls | awk -F_ '{ system("mv "$0" "$1"_"$3"_"$4) }'
Here we are using underscore as the delimiter and then building a command to execute with the system function
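A safer pure-shell alternative is a plain loop with mv and sed, rather than building commands inside awk. This is just a sketch: the scratch directory is made up for the demo, and the filenames are the examples from the question.

```shell
# Scratch directory so the loop is safe to try out.
rm -rf /tmp/rename_demo && mkdir -p /tmp/rename_demo
cd /tmp/rename_demo
touch Jul-0961_S7_R2_001.fastq.gz Jul-0967_S22_rep1_R1.fastq.gz

# Strip the _S<number>_ component from each name, keeping one underscore.
# [0-9][0-9]* is used instead of \+ for portability to non-GNU sed.
for f in *_S*.fastq.gz; do
  mv -- "$f" "$(printf '%s\n' "$f" | sed 's/_S[0-9][0-9]*_/_/')"
done
ls
```

After the loop, the directory contains Jul-0961_R2_001.fastq.gz and Jul-0967_rep1_R1.fastq.gz.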

Related

GREP - Searching for specific string going backwards

I would like to search file.txt with grep to locate a URL ending with ".doc". When it finds .doc, I want grep to go backwards and find "http://" at the beginning of that string.
The output would be http://somesite.com/random-code-that-changes-daily/somefilename.doc
There is only 1 .doc url on this page, so multiple search results should not be an issue.
Please excuse, I am a novice. I did locate the answer at one time but search for 1 hour and can no longer find. I am willing to read and learn but I do not think I'm using the correct search terms for what I want to do. Thank you.
You can use regular expressions,
with the marker ^ you can indicate the start of the line you are looking for.
with the marker $ you can indicate the end of the line you are looking for.
then you can do something like
grep '^http://.*\.doc$' file.txt
to match lines that start with http:// and end with .doc, or match either anchor with
grep '^http://\|\.doc$' file.txt
or, not anchoring at all but just matching the pattern itself, as @choroba suggested:
grep 'http://.*\.doc' file.txt
You can also search for http:// and print the line if it contains .doc somewhere after it:
grep 'http://.*\.doc' file.txt
If you want to only print the matching part, use the -o option (if your version of grep supports it).
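Putting the pieces together, here is a sketch of the -o variant on a made-up input file (the page contents below are hypothetical, standing in for the asker's file.txt):

```shell
# Hypothetical input: a saved page with one .doc link somewhere in it.
rm -rf /tmp/grep_demo && mkdir -p /tmp/grep_demo
cat > /tmp/grep_demo/file.txt <<'EOF'
Some text before the link.
Get it here: http://somesite.com/random-code/somefilename.doc today.
EOF

# -o prints only the matched part of the line; [^ ]* keeps the match
# from running past the next space.
grep -o 'http://[^ ]*\.doc' /tmp/grep_demo/file.txt
```

This prints just http://somesite.com/random-code/somefilename.doc, not the whole line.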

Shell scripting : to print selected text in the string

Log file name: "/home/msubra/WORK/tmo/LOG/BCH1043.9987.log"
From the above string I need to extract the content BCH1043.
The directory structure may differ, so the solution should match the string from BCH up to the dot.
No need to call basename; you can use parameter substitution that is built into the shell for the whole thing:
$ cat x.sh
filepath="/home/msubra/WORK/tmo/LOG/BCH1043.9987.log"
# Strip off the path. Everything between and including the slashes.
filename=${filepath##/*/}
# Then strip off everything after and including the first dot.
part1=${filename%%.*}
echo $part1
$ ./x.sh
BCH1043
$
A dot in the filepath will not cause trouble either.
See section 4.5.4 here for more info: http://docstore.mik.ua/orelly/unix3/korn/ch04_05.htm
Oh and resist the temptation to get tricky and do it all in one line. Breaking into separate components is much easier to debug and maintain down the road, and who knows you may need to use those components too (the path and the rest of the file name).
basename will reduce /home/msubra/WORK/tmo/LOG/BCH1043.9987.log to BCH1043.9987.log (note that basename takes its operand as an argument, not on stdin):
basename /home/msubra/WORK/tmo/LOG/BCH1043.9987.log
You can use regular expressions, awk, perl, sed etc to extract "BCH1043" from "BCH1043.9987.log". First I need to know what the range of possible filenames is before I can suggest a regular expression for you.
Use basename to extract only the filename and then use parameter expansion to strip off the data you don't want.
log=/home/msubra/WORK/tmo/LOG/BCH1043.9987.log
log=$(basename "$log")
echo "${log%%.*}"
The following is almost equivalent but doesn't use the external basename process. However there are cases where it will give different results (though whether those cases are relevant here is up to you and your usage/input). See this answer for examples/details.
log=/home/msubra/WORK/tmo/LOG/BCH1043.9987.log
log=${log##*/}
echo "${log%%.*}"
try like this:
a="/home/msubra/WORK/tmo/LOG/BCH1043.9987.log"
echo ${a##*/} | cut -d "." -f 1
or
basename $a | cut -d "." -f 1
or
var=${a##*/}; echo ${var%%.*}
output:
BCH1043
It doesn't include the dot. Your question is not clear, but you can extract it like that.
${a##*/} strips everything up to the last /, the same as basename.

Replace "{" with "x " in files

I wish to process files (.krn files that can be read as text files) and replace every occurrence of { with x. Is it possible to do this on the command line?
As I wish to do this in many files, my idea is to go through all the files in a folder and process them one by one. How can this be achieved? I understand that grep may come in handy...
You can use sed:
sed -i 's/{/x/g' *
sed is the wrong tool for this job, but if you are going to use sed, do it with y instead of s
sed 'y/{/x/'
The correct tool for translating characters is tr:
tr '{' 'x' < input.krn > output.krn
If you have the file open in vi/vim, then use this command:
:%s/{/x/g
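To cover the "go through all the files in a folder" part of the question, a sketch of the sed answer applied in a loop (the directory, filenames, and file contents below are made up for the demo; sed -i with no argument is GNU syntax, and on BSD/macOS you would write sed -i '' instead):

```shell
# Made-up .krn files to process.
rm -rf /tmp/krn_demo && mkdir -p /tmp/krn_demo
printf 'a { b { c\n' > /tmp/krn_demo/one.krn
printf '{start\n' > /tmp/krn_demo/two.krn

# Replace every { with x in each .krn file, editing in place.
for f in /tmp/krn_demo/*.krn; do
  sed -i 's/{/x/g' "$f"
done
cat /tmp/krn_demo/one.krn
```

The loop is not strictly needed here (sed -i accepts several filenames at once), but it generalizes if you want per-file logic such as backups or logging.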

awk / sed script to remove text

I am currently in need of a way to programmatically remove some text from Makefiles that I am dealing with. Now the problem is that (for whatever reason) the makefiles are being generated with link commands of -l<full_path_to_library>/<library_name> when they should be generated with -l<library_name>. So what I need is a script to find all occurrences of -l/ and then remove up to and including the next /.
Example of what I'm dealing with
-l/home/user/path/to/boost/lib/boost_filesystem
I need it to be
-lboost_filesystem
As could be imagined, this is a stopgap measure until I fix the real problem (on the generation side), but in the meantime it would be a great help to me if this could work; I am not too good with awk and sed.
Thanks for any help.
sed -i 's|-l[^ ]*/\([^/ ]*\)|-l\1|g' Makefile
Here you go
echo "-l/home/user/path/to/boost/lib/boost_filesystem" | awk -F"/" '{ print $1 $NF } '
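Here is a sketch of the sed answer run against a made-up Makefile, keeping a .bak copy of each file (the path and library names are invented for the demo; sed -i.bak works with both GNU and BSD sed):

```shell
# A made-up Makefile with the broken -l/full/path/lib flags.
rm -rf /tmp/mk_demo && mkdir -p /tmp/mk_demo
printf 'LIBS = -l/home/user/path/to/boost/lib/boost_filesystem -l/opt/lib/foo\n' \
  > /tmp/mk_demo/Makefile

# Rewrite -l/path/to/lib into -llib in place, backing up to Makefile.bak.
for mk in /tmp/mk_demo/Makefile; do
  sed -i.bak 's|-l[^ ]*/\([^/ ]*\)|-l\1|g' "$mk"
done
cat /tmp/mk_demo/Makefile
```

The capture group \([^/ ]*\) grabs the final path component (the library name), and everything from -l up to the last slash is discarded.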

Replacing a line in a csv file?

I have a set of 10 CSV files, which normally have entries of this kind:
a,b,c,d
d,e,f,g
Now, due to some error, entries in these files have become of this kind:
a,b,c,d
d,e,f,g
,,,
h,i,j,k
Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.
Is there a command you can recommend that replaces the erroneous lines in all the files?
It depends on what you mean by replace. If you mean 'remove', then a trivial variant on @wnoise's solution is:
grep -v '^,,,$' old-file.csv > new-file.csv
Note that this deletes just those lines with exactly three commas. If you want to delete mal-formed lines with any number of commas (including zero) - and no other characters on the line, then:
grep -v '^,*$' ...
There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.
Check out Mastering Regular Expressions.
sed 's/,,,/replacement/' < old-file.csv > new-file.csv
optionally followed by
mv new-file.csv old-file.csv
Replace or remove, your post is not clear... For replacement see wnoise's answer. For removing, you could use
awk '$0 !~ /,,,/ {print}' <old-file.csv > new-file.csv
What about trying to keep only the lines that match the desired format, instead of handling each exception?
If the provided input is what you really want to match:
grep -E '[a-z],[a-z],[a-z],[a-z]' < oldfile.csv > newfile.csv
If the input is different, provide it, the regular expression should not be too hard to write.
Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:
sed -i -e '/^,\+$/ D' yourfile1.csv yourfile2.csv ...
To replace: well, see wnoise's answer, or if you don't want to create new files with the output,
sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...
or
sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...
(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:
egrep -v '^,+$' < oldfile.csv > newfile.csv
I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.
EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile ) and the \+ modifier (which can be replaced with \{1,\} ).
Most simply:
$ grep -v ,,, oldfile > newfile
$ mv newfile oldfile
Yes, awk and grep are very good options if you are working on a Linux platform. On other platforms you can use Perl regexes, with their join and split functions.
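Since grep cannot edit in place, a sketch of how the grep approaches above could be run over all ten files via a temp file (the directory and filenames are made up for the demo):

```shell
# Two made-up CSVs with the comma-only garbage line in them.
rm -rf /tmp/csv_demo && mkdir -p /tmp/csv_demo
printf 'a,b,c,d\n,,,\nd,e,f,g\n' > /tmp/csv_demo/f1.csv
printf ',,,\nh,i,j,k\n' > /tmp/csv_demo/f2.csv

# Drop lines consisting only of commas (including empty lines), then
# move the filtered copy back over the original.
for f in /tmp/csv_demo/*.csv; do
  grep -v '^,*$' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
cat /tmp/csv_demo/f1.csv
```

The && after grep ensures the original is only overwritten if the filtering step succeeded.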
