I am trying to do a mass search and replace across many files, replacing a keyword in each file, let's say myKeyword, with the name of the current file.
So in file1.php the phrase myKeyword would become file1;
in file2.php it would become file2;
and so on until all the files are completed.
I was wondering if this is possible using scripts or a text editor function.
With GNU sed:
$ for filename in $(find . -type f); do sed -i "s/myKeyword/$(basename "${filename}" | cut -f 1 -d '.')/g" "${filename}"; done
basename: strip full path from filename - path/to/file1.txt -> file1.txt
cut -f 1 -d '.': strip file extension - file1.txt -> file1
On OSX (BSD sed instead of GNU) you'll need to write sed -i '' "s/myKeyword/... instead (empty string '' after -i). See this answer for the difference: https://unix.stackexchange.com/a/272041/374001
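Note that the $(find ...) loop word-splits on whitespace, so it breaks on file names containing spaces. A null-delimited variant is safer (a sketch, under the same GNU sed assumption):
find . -type f -print0 | while IFS= read -r -d '' filename; do
    sed -i "s/myKeyword/$(basename "$filename" | cut -f 1 -d '.')/g" "$filename"
done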
I am currently using the following command:
grep -l -Z -E '.*?FindMyRegex' /home/user/folder/*.csv | xargs -0 -I{} mv {} /home/destination/folder
This works fine. The problem is it uses grep on the entire file.
I would like to use the grep command on the FIRST line of the file only.
I have tried to use head -1 file | at the beginning, but it did not work.
A change I would make to your script: test only the first line with grep -q, then move the file itself. (grep -l on a pipe prints "(standard input)" rather than the file name, so it can't feed xargs here.)
for file in *.csv; do
    head -1 "$file" | grep -qE '.*?FindMyRegex' && mv "$file" /home/destination/folder;
done
You can also try sed '1q' file.csv | grep ... to search the regexp in the first line only.
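Wired into the move from the question, that could look like this (a sketch; the redundant .*? is dropped):
for f in /home/user/folder/*.csv; do
    sed '1q' "$f" | grep -qE 'FindMyRegex' && mv "$f" /home/destination/folder/
done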
You don't need grep or find, as long as your file names don't have embedded whitespace.
I don't know an easy way off the top of my head to get sed to delimit with nulls.
mv $( for f in /home/user/folder/*.csv;
do sed -ns '1 { /yourPattern/F; q; }' "$f";
done ) /home/destination/folder/
EDIT
Rewrote with a loop. This will run a separate instance of sed to check each file, but at least it shouldn't read beyond the first line. It will fail syntactically if there are no hits.
You might need -E depending on your regex.
-n says don't print records from the files.
-s says treat each file as a distinct input, so address 1 matches the first line of each file. It is redundant here since the loop runs a fresh sed per file, but it matters if you pass several files to one sed.
This does require GNU sed for the F.
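With the question's pattern and -E, the sketch becomes (still GNU sed only, because of F):
mv $( for f in /home/user/folder/*.csv;
do sed -nsE '1 { /FindMyRegex/F; q; }' "$f";
done ) /home/destination/folder/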
gawk 'FNR==1 {if ($0 ~ /PATTERN/) printf "mv %s %s\n", FILENAME, "/target"; nextfile}' /path/*.csv
First of all, in your regex .*?FindMyRegex, the .*? doesn't make any sense and can be removed.
The above awk (gawk) one-liner builds mv file target command lines for you. You can check them and, if you are satisfied with them, pipe the output to sh to execute the commands.
Replace PATTERN with your regex pattern, and /target with the real target dir.
The one-liner assumes that the filenames don't contain special characters (e.g. spaces); if they do, add quotes around the filenames in the mv command.
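For example, with quoting added and the output piped straight to sh (PATTERN and /target remain placeholders as above):
gawk 'FNR==1 {if ($0 ~ /PATTERN/) printf "mv \"%s\" %s\n", FILENAME, "/target"; nextfile}' /path/*.csv | sh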
Using GNU awk to find the filenames and piping them NUL-delimited into xargs:
gawk -v pattern="myRegex" '
FNR == 1 {if ($0 ~ pattern) printf "%s\0", FILENAME; nextfile}
' *.csv | xargs -0 echo mv -t destination
If it looks OK, remove "echo"
Try this Shellcheck-clean Bash code:
#! /bin/bash
shopt -s nullglob # Globs that match nothing expand to nothing
shopt -s dotglob # Globs match files whose names start with '.'
dest=/home/destination/folder
for file in *.csv ; do
head -n 1 -- "$file" | grep -qE '.*?FindMyRegex' && mv -- "$file" "$dest"
done
shopt -s nullglob prevents an error if there are no .csv files in the directory.
shopt -s dotglob ensures that files whose name starts with '.' are handled.
The -- in the options for head and mv ensures that files whose names begin with - are handled correctly.
The quotes in "$file" and "$dest" ensure that names that contain whitespace (actually $IFS) characters (including newlines) or glob metacharacters are handled correctly.
Note that the .*? in the regular expression is probably redundant, and may not do what you think it does (grep -E doesn't do non-greedy matching).
I have several files that have the same name, but a different extension. For example
echo "array" > A.hpp
echo "..." > A.h
echo "content" > B.hpp
echo "..." > B.h
echo "content" > C.hpp
echo "..." > C.h
I want to get a list of *.h files based on some content in the corresponding *.hpp file. In particular I am looking for a one-liner to open them in my editor.
It is fair to assume that for each *.hpp file the corresponding *.h file exists. Also, since they are source files, it may be assumed that the filenames do not contain whitespace.
Current approach
I know how to get a list of *.hpp files based on their content. An approach (but surely not the only or the best) is to
find . -type f -iname '*.hpp' -print | xargs grep -i 'content' | cut -d":" -f1
which gives
./B.hpp
./C.hpp
Opening in my editor is then done by
st `find . -type f -iname '*.hpp' -print | xargs grep -i 'content' | cut -d":" -f1`
But how can I get/open the corresponding *.h files?
You say you want to get a list of *.h files based on some content in the corresponding *.hpp file.
while read -r line ; do
    echo "${line%.hpp}.h"
done < <(grep -i 'content' *.hpp | cut -d":" -f1)
BashFAQ 001 recommends using a while loop with the read command to read a data stream.
One-liner as requested
st `while IFS= read -r line ; do echo "${line%.hpp}.h"; done < <(grep -i 'content' *.hpp| cut -d":" -f1)`
If you are dealing with filenames containing whitespace, you need to use printf instead of echo.
st `while IFS= read -r line ; do printf '%q ' "${line%.hpp}.h"; done < <(grep -i 'content' *.hpp| cut -d":" -f1)`
The %q lets printf format the output so that it can be reused as shell input.
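For example, a name containing a space comes out escaped, so the shell re-reads it as a single word:
$ printf '%q\n' "my file.h"
my\ file.h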
Explanation
You have to read it back to front. First we grep all files ending in .hpp in the current directory for the string 'content' and cut away everything but the filename.
The while loop reads the output of grep and assigns each filename to the variable line.
Inside the while loop we use bash's parameter expansion to change the file extension from .hpp to .h.
Your question still isn't clear but is this all you're trying to do (using GNU awk for gensub())?
$ awk '/content/{print gensub(/[^.]+$/,"h",1,FILENAME)}' *.hpp
B.h
C.h
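To open the matches directly in the editor from the question (filenames are assumed whitespace-free, as stated there):
st $(awk '/content/{print gensub(/[^.]+$/,"h",1,FILENAME)}' *.hpp)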
I have a list of almost 500 pdf files with the following filename structure:
XXXX-YYYY-MM-DD.pdf
where XXXX is a variable-length numeric code (1 to 4 digits), always delimited by "-", for example:
51-2016-08-22.pdf
776-2016-08-22.pdf
3881-2016-08-22.pdf
4-2016-08-22.pdf
2860-2016-08-22.pdf
The goal is to copy each file into its own directory, naming the directories after the leading code (i.e. file 776-2016-08-22.pdf goes to directory 776). How can I use awk or sed to split out the variable-length field?
Here's my code:
for f in *.pdf
do
FOLDERNAME=`echo $f| awk (awk or sed missing code here)`
mkdir /my/dir/structure/$FOLDERNAME
cp $f /my/dir/structure/$FOLDERNAME/
done
Thanks for your support.
You can use:
for f in *.pdf; do
d="${f%%-*}"
mkdir -p "$d" && cp "$f" "$d"
done
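Here ${f%%-*} strips the longest suffix starting at the first -, so 776-2016-08-22.pdf becomes 776; no awk or sed is needed.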
As rightly pointed out by ed-morton, this is NOT a recommended solution as it fails in many cases. Please follow https://stackoverflow.com/a/39089589/3834860 instead.
Keeping this answer for reference.
awk -F '-' specifies the delimiter and '{print $1}' prints the first field before the delimiter.
for f in *.pdf
do
    FOLDERNAME=$(echo "$f" | awk -F '-' '{print $1}')
    mkdir -p "/my/dir/structure/$FOLDERNAME"
    cp "$f" "/my/dir/structure/$FOLDERNAME/"
done
I need to replace the first 4 header lines of only 250 selected Erlang files (with extension .erl), but there are 400 Erlang files in total in the directory and its subdirectories. I need to avoid modifying the files which don't need the change.
I have the list of file names that are to be modified, but I don't know how to make my Linux command use it.
sed -i '1s#.*#%% This Source Code Form is subject to the terms of the Mozilla Public#' *.erl
sed -i '2s#.*#%% License, v. 2.0. If a copy of the MPL was not distributed with this file,#' *.erl
sed -i '3s#.*#%% You can obtain one at http://mozilla.org/MPL/2.0/.#' *.erl
sed -i '4s#.*##' *.erl
In the above commands, instead of passing *.erl, I want to pass the list of file names which I need to modify; doing that one by one will take me more than 3 days to complete.
Is there any way to do this?
Iterate over the shortlisted file names using awk and use xargs to execute the sed. You can execute multiple sed commands on a file using the -e option.
awk '{print $1}' your_shortlisted_file_lists | xargs sed -i -e first_sed -e second_sed
xargs appends the file names from awk to the sed command.
Try this:
< file_list.txt xargs -n1 sed -i -e 'first_cmd' -e 'second_cmd' ...
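For example, with the first command from the question, that would be:
< file_list.txt xargs -n1 sed -i '1s#.*#%% This Source Code Form is subject to the terms of the Mozilla Public#'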
Not answering your question, but a suggestion for improvement: four sed commands to replace the header is inefficient. I would instead write the new header into a file and do the following
sed -i -e '1,3d' -e '4{r header' -e 'd}' file
This will replace the first four lines of the file with the contents of header.
Another concern with your current s### approach is that you have to watch for special characters (\, & and your delimiter #) in the replacement text.
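A sketch applying that to the whole list (assuming the replacement text is in a file named header and the file names are in filelist, one per line):
while IFS= read -r f; do
    sed -i -e '1,3d' -e '4{r header' -e 'd}' "$f"
done < filelist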
You can apply the sed c (for change) command to each file of your list:
while IFS= read -r file; do
sed -i '1,4 c\
%% This Source Code Form is subject to the terms of the Mozilla Public\
%% License, v. 2.0. If a copy of the MPL was not distributed with this file,\
%% You can obtain one at http://mozilla.org/MPL/2.0/.\
' "$file"
done < filelist
Let's say you have a file called file_list.txt with all file names as content:
file1.txt
file2.txt
file3.txt
file4.txt
You can simply read all lines into a variable (here: files) and then iterate through each one:
files=$(cat file_list.txt)
for file in $files; do
echo "do something with $file"
done
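This relies on word splitting, so it breaks if any listed name contains spaces. A whitespace-safe variant reads the list line by line:
while IFS= read -r file; do
    echo "do something with $file"
done < file_list.txt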
I have this file name
1006_12_000123123_000023126.data
and I want this file name instead. I have around 300000 files.
1006_12_123123_23126.png
I tried some of these solutions, but they are for filenames like 00002323.jpg:
Bash command to remove leading zeros from all file names
I can use mv to rename.
for original_name in *.data; do
# determine new file name from original:
# remove zeroes and change extension.
new_name=$(echo "$original_name" | sed -e 's/_0*/_/g' -e 's/\.data$/.png/')
mv "$original_name" "$new_name"
done
Use this:
ls *.data | sed -e 'p;s/_0*/_/g;s/\.data$/.png/' | xargs -n2 mv
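To preview the renames before committing, put echo in front of mv:
ls *.data | sed -e 'p;s/_0*/_/g;s/\.data$/.png/' | xargs -n2 echo mv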