Unix command to replace all instances of a string in every file in a folder [closed]

I have a folder "model". In it, I need to replace all instances of the term "Test_Dbv3" with "TestDbv3". There are multiple files with names like test_host.hbm.xml, test_host2.hbm.xml, testHost.java, testHost2.java and so on. Is there any way I can do this using a Unix command or a script in any language?
I'm working on RHEL5.

sed in in-place mode along with find should probably work:
find . -type f -exec sed -e 's/Test_Dbv3/TestDbv3/g' -i.bak '{}' +
The aptly named find command finds files. Here, we start in the current working directory (.) and keep only regular files (-type f). For each batch of results, find will -exec a command: sed. The trailing + marks the end of that command and tells find to replace {} with as many file names as the operating system will allow in a single invocation.
sed will go file-by-file, line-by-line, executing commands we specify. The command we're giving it is s/Test_Dbv3/TestDbv3/g, which translates to “substitute matches of the regular expression Test_Dbv3 with the text TestDbv3, allowing multiple substitutions per line”. The -i.bak means to replace the original file with the result, saving the unmodified version with the filename suffixed with .bak.
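Once you have checked that the replacements look right, the .bak backups can be cleaned up; a minimal sketch, assuming GNU find and that you really do want every backup under the current directory gone:
# remove the .bak backups left behind by sed -i.bak
find . -type f -name '*.bak' -delete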

s/_//g is your regex, assuming you want every _ gone; otherwise the pattern has to be more specific:
For example s/^(Test|test)_/$1/g to replace test_ with test and
Test_ with Test if they are at the beginning of a line.
Or s/^(test)_/$1/gi will additionally work for all TEST_, tEsT_, etc.
If you need completely case-insensitive matching, that is only available with perl -pi -e 's/.../.../gi' or with GNU sed (which uses the I flag), not with plain POSIX sed, which also does not support $1-style backreferences (use \1 there).
If there are also filenames starting with things like Test2_ or 1EXPERIMENT_ and other words, you could use s/^([A-Za-z0-9]{3,10})_/$1/g to match any combination of letters and digits between 3 and 10 characters long, not just the Test or test you mentioned.
For even more specific patterns, search for a "regex cheatsheet", and don't be surprised when individual tools like sed or grep don't support every feature, should you decide to use them.
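As an illustration of the perl variant above (a sketch only; the file list and the .bak backup suffix are assumptions based on the question's file names, not something given in it):
# Case-insensitively strip a leading "test_" / "TEST_" / "tEsT_" prefix from each line,
# editing the files in place and keeping .bak backups
perl -pi.bak -e 's/^(test)_/$1/gi' model/*.hbm.xml model/*.java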
Should you also ever need a command that only renames files in a folder,
but does not edit their content, you can try
rename 's/search/replace/' folder/* (not matching subdirectories)
or rename search replace folder/* (depending on the version of rename).
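Applied to the folder from the original question, that might look like this (a sketch; which of the two forms works depends on whether the Perl-script rename or the util-linux rename is installed):
# Perl-script rename (takes a s/// expression)
rename 's/Test_Dbv3/TestDbv3/' model/*
# util-linux rename (takes plain search and replace strings)
rename Test_Dbv3 TestDbv3 model/*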

Related

Search and replace files (Linux)

I'm quite new to Linux. I'm using Linux Mint and I've just found a situation where I have a file which exists multiple times inside the tree/folders of a folder. I want to replace all occurrences of this file with a new version of it.
So instead of looking for that file once and again and replacing it with the new one, I wonder if there is any kind of search & replace command for files.
I've already searched for a similar question in stackoverflow, but I was only able to find commands to search & replace TEXT in files, not the file itself.
Can anyone please point me in the right direction?
Thank you.
You can always do it in parts, like:
1. Get a list of items matching your search.
2. Replace every match (using mv, for example) with your file.
Something like this (csh syntax):
foreach dir ( `ls | egrep '^(i686|amd64)\.'` )
    mv yourfile $dir
end
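For the actual question (replace every copy of a given file anywhere under a directory tree), a bash sketch using find; the path, the name target.cfg and the location of the new version are placeholders, not names from the question:
# Overwrite every file named target.cfg below /path/to/tree with ./newversion.cfg
find /path/to/tree -type f -name 'target.cfg' -exec cp ./newversion.cfg '{}' \;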

Find files that are too long for Synology encrypted shares

When trying to encrypt the homes share after the DSM6.1 update, I got a message that there are files with filenames longer than 143 characters. This is the maximum length for a filename in an encrypted Synology share.
Because there is a lot of stuff in the homes share (mostly my own) it was not practical to search for the files by hand. Nevertheless these files had to be deleted or renamed to allow the encryption of the share.
I needed an automated way to find all files in all subdirectories with a filename longer than 143 characters. Searching for the files via the network share using a Windows tool would probably have taken way too long.
I have figured out the solution by myself (with some internet research though, because I'm still a n00b) and want to share it with you, so that someone with the same problem might benefit from this.
So here it goes:
The find function in combination with grep does the trick.
find /volume1/homes/ -maxdepth 15 | grep -P '\/[^\/]{143,}[^\/]'
For my case I assumed that I probably don't have more than 15 nested directories. The maximum depth and the starting directory can be adjusted to your needs.
The -P argument needs a grep that was built with PCRE (Perl-compatible regular expression) support; most GNU grep builds include it, so Perl itself does not have to be installed.
The RegEx matches all elements that contain a / somewhere, followed by 143 or more of any character other than /, and then one more character that is not a /. This way we only get files and no directories. To include directories as well, you can leave out the last condition.
The RegEx explained for people who might not be too familiar with this:
\/ looks for a forward slash. A new file/directory name begins here.
[^\/] means: Every character except /
{143,} means: 143 or more occurrences of the preceding token
[^\/] same as above. This excludes all results that don't belong to a file.
find . -type f -iname "*" | awk -F'/' 'length($NF)>143 {print $0}'
This will print all files whose name is longer than 143 characters. Note that only the file name, not the full path, is taken into account when calculating the length. If you want the whole path to count towards the length:
find . -type f -iname "*" | awk 'length($0)>143 {print $0}'
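A small variation on the same awk filter, in case you also want to see how far over the limit each offending name is (a sketch, not part of the original answer):
# print the length of the file name followed by the full path
find . -type f | awk -F'/' 'length($NF) > 143 {print length($NF), $0}'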

Using diff for two files and send by email [closed]

I have files like the ones below. I use crontab every 5 minutes to check whether the system has added a new file, for example AIR_2015xxxxT0yyyyyyyy.cfg. Then I need to run the diff command automatically between the newest file and the one before it.
AIR_20151021T163514000.cfg
AIR_20151026T103845000.cfg
AIR_2015xxxxT0yyyyyyyy.cfg
I want to do this in a script like the one below:
#!/bin/bash
/var/opt/fds/
diff AIR_2015xxxxT0yyyyyyyy.cfg AIR_20151026T103845000.cfg > Test.txt
body(){
cat body.txt
}
(echo -e "$(body)") | -a Test.txt mailx -s 'Comparison' user#email.com
Given a list of files in the directory /var/opt/fds with names in the format:
AIR_YYYYmmddTHHMMSSfff.cfg
where the letter Y represents digits for the year, m for month, d for day, H for hour, M for minute, S for second, and f for fraction (milliseconds), you need to identify the two most recent files in the directory so you can compare them.
One way to do this is:
cd /var/opt/fds || exit 1
old=
new=
for file in AIR_20[0-9][0-9]????T?????????.cfg
do
    old=$new
    new=$file
done
if [ -n "$old" ] && [ -n "$new" ]
then
    diff "$old" "$new" > test.txt
    mailx -a test.txt -s 'Comparison' user@example.com < body.txt
fi
Note that if the new file has a name containing letters x and y as shown in the question and comments, it will be listed after the names containing the time stamp as digits, so it will be picked up as the new file. It also assumes permission to write in the /var/opt/fds directory, and that the mail body file is present in that directory too. Those assumptions can be trivially fixed if necessary. The test.txt file should be deleted after it is sent, too, and you could check that it is non-empty before sending the email (just in case the two most recent files are in fact identical). You could embed a time-stamp in the generated file name containing the diffs instead of using test.txt:
output="diff.$(date +'%Y%m%dT%H%M%S000').txt"
and then use $output in place of test.txt.
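With that change, the two lines in the script that produce and mail the report would read (a sketch):
diff "$old" "$new" > "$output"
mailx -a "$output" -s 'Comparison' user@example.com < body.txt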
The test ensures that there was both an old and a new name. The pattern match is sloppier than it could be, but using [0-9] or an appropriate subrange ([01], [0-3], [0-2], [0-5]) for the question marks makes the pattern unreadably long:
for file in AIR_20[0-9][0-9][01][0-9][0-3][0-9]T[0-2][0-9][0-5][0-9][0-5][0-9][0-9][0-9][0-9].cfg
It also probably provides very little extra in the way of protection. Of course, as shown, it imposes a Y2.1K crisis on the system, not that it is hard to fix that. You could also cut down the range of valid dates by basing it on today's date, but beware of the end of the year, etc. You might decide you only need entries from the last month or so.
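As a sketch of that date-based narrowing (restricting the glob to the current month; note that this misses files from the previous month right after a month rollover):
prefix=$(date +%Y%m)                 # e.g. 201510
for file in AIR_"$prefix"[0-3][0-9]T*.cfg
do
    old=$new
    new=$file
done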
Using globbing is generally better than trying to parse ls or find output. In this context, where the file names have a restricted set of characters in the name (no newlines, no blanks or tabs, no quotes, no dollar signs, etc), it is feasible to use either find or ls — but if you have to deal with arbitrary names created by random end users, those tools are not suitable. (The ls command does all sorts of weird stuff with weird names and basically is hard to use reliably in the face of user cussedness. The find command and its -print0 option can be used, especially if you have a sort that recognizes -z to work with null-terminated 'lines' and an xargs that supports -0 to handle such lines too — but you have to be very careful.)
Note that this scheme does not keep a record of the last file analyzed (so if no new files appear for an hour, you might send a dozen copies of the same differences), nor does it directly report on the file names (but using diff -u or diff -c would include the file names being diffed in the output). Again, these issues can be worked around if that's appropriate (and it probably is). Keeping the record of which files have been compared is probably the hardest job; even that's not too bad:
echo "$old" "$new" >> reported.diffs
to record what's been processed; then
if grep -Fq "$old $new" reported.diffs
then : Already processed
else : Process $old and $new
fi

Delete some lines from text using Linux command

I know how to match text using regex patterns but not how to manipulate them.
I have used grep to match and extract lines from a text file, but I want to remove those lines from the text. How can I achieve this without having to write a Python or bash shell script?
I have searched on Google and was recommended to use sed, but I am new to it and don't know how it works.
Can anyone point me in the right direction or help me achieve this goal?
The -v option to grep inverts the search, reporting only the lines that don't match the pattern.
Since you know how to use grep to find the lines to be deleted, using grep -v and the same pattern will give you all the lines to be kept. You can write that to a temporary file and then copy or move the temporary file over the original.
grep -v pattern original.file > tmp.file
mv tmp.file original.file
You can also use sed, as shown in shellfish's answer.
There are multiple possible refinements for the grep solution, but for most people most of the time, what is shown is more or less adequate (it would be a good idea to use a per process intermediate file name, preferably with a random name such as the mktemp command gives you). You can add code to remove the intermediate file on an interrupt; suppress interrupts while moving back; use copy and remove instead of move if the original file has multiple hard links or is a symlink; etc. The sed command more or less works around these issues for you, but it is not cognizant of multiple hard links or symlinks.
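A sketch of those refinements: mktemp for the intermediate file, removing it on an interrupt, and suppressing interrupts while moving the result back (pattern and original.file are the placeholders from above):
tmp=$(mktemp "${TMPDIR:-/tmp}/grepv.XXXXXX") || exit 1
trap 'rm -f "$tmp"; exit 1' INT TERM HUP      # clean up if interrupted
grep -v pattern original.file > "$tmp"
trap '' INT TERM HUP                          # don't allow interrupts during the move back
mv "$tmp" original.file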
Create the pattern which matches the lines using grep. Then create a sed script as follows:
sed -i '/pattern/d' file
Explanation:
The -i option means overwrite the input file in place, thus removing the lines matching pattern.
pattern is the pattern you created for grep, e.g. ^a*b\+.
d is the sed command for delete; it deletes every line matching the pattern.
file is the input file; it can be given as a relative or absolute path.
For more information see man sed.
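Putting the pieces together with the example pattern from above (a sketch, not part of the original answer; the .bak backup suffix is a GNU sed feature):
# Delete every line matching ^a*b\+ from file, saving the original as file.bak
sed -i.bak '/^a*b\+/d' file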

Search files with multiple "dot" characters

In Linux, how do I use find and regular expressions (or a similar approach, without writing a script) to search for files with multiple "dots" in the name, but ignore the extension?
For example, searching through the following files should only return the second file. In this example ".ext" is the extension.
testing1234hellothisisafile.ext
testing.1234.hello.this.is.a.file.ext
The solution should work with one or more dots in the file name (ignoring the extension dot). It should also work for any files, i.e. with any file extension.
Thanks in advance
So if I understand correctly, you want to get the filenames with at least two additional dots in the name. This would do:
$ find -regex ".*\.+[^.]*\.+[^.]*\.+.*"
./testing.1234.hello.this.is.a.file.ext
./testing1234.hellothisisafile.ext
$ find -regex ".*\.+[^.]*\.+[^.]*\.+[^.]*\.+.*"
./testing.1234.hello.this.is.a.file.ext
The key dot-detecting part is \.+ (at least one dot), coupled with the separator [^.]* (anything but a dot; the previous part already covers the dot itself, so this is mainly a safety measure against greedy matching). Together they make up the core of the regex: we don't care what comes before or after, just that somewhere there are three dots. Three, because the dot from the leading ./ of the current directory counts too; if you search from somewhere else, remove one \.+[^.]* group:
$ find delme/ -regex ".*\.+[^.]*\.+[^.]*\.+[^.]*\.+.*"
delme/testing.1234.hello.this.is.a.file.ext
$ find delme/ -regex ".*\.+[^.]*\.+[^.]*\.+.*"
delme/testing.1234.hello.this.is.a.file.ext
In this case the result is the same, since the name contains a lot of dots, but the second regex is the correct one.
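If "multiple dots" simply means at least one dot besides the extension dot, a plain -name glob on the base name is an even simpler sketch (it requires at least two dots in the file name, so the question's testing1234hellothisisafile.ext is skipped while testing.1234.hello.this.is.a.file.ext is listed):
find . -type f -name '*.*.*'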
