Find/Sed not working as expected - linux

So essentially I want to understand why this command, sent to the terminal as a one-liner, doesn't work as intended. It runs for several minutes, but my test files containing "teststring1" don't get replaced. Without radically changing the syntax or asking why I am doing this as root, can anyone identify the reason why it doesn't work?
cd /tmp;find / -maxdepth 3 -type f -print0 | xargs -0 sed -i 's/teststring1/itworked!/gI'

Quoting from man sed:
If no -e, --expression, -f, or --file option is given, then the first non-option argument is taken as the sed script to interpret. All remaining arguments are names of input files; if no input files are specified, then the standard input is read.
s/regular expression/replacement/flags
The value of flags in the substitute function is zero or more of the following:
- N Make the substitution only for the N'th occurrence of the regular expression in the pattern space.
- g Make the substitution for all non-overlapping matches of the regular expression, not just the first one.
- p Write the pattern space to standard output if a replacement was made. If the replacement string is identical to that which it replaces, it is still considered to have been a replacement.
- w file Append the pattern space to file if a replacement was made. If the replacement string is identical to that which it replaces, it is still considered to have been a replacement.
Notice that the I flag you used is not listed, so your sed does not support case-insensitive substitution. Spelling the case-insensitivity out in the pattern will work as you want:
find / -maxdepth 3 -type f -print0 | xargs -0 sed -e 's/[tT][eE][sS][tT][sS][tT][rR][iI][nN][gG]1/itworked!/g' -i
If you do not like the ugly pattern for case-insensitive matches, you can use perl instead of sed:
find / -maxdepth 3 -type f -print0 | xargs -0 perl -pe 's/teststring1/itworked!/ig' -i
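If you want to sanity-check either variant before letting it loose anywhere near /, a throwaway test file makes the behaviour easy to confirm (the path and contents here are just illustrative):
# Create a disposable file containing mixed-case variants
printf 'TestString1\nTESTSTRING1\nteststring1\n' > /tmp/sedtest.txt
sed -e 's/[tT][eE][sS][tT][sS][tT][rR][iI][nN][gG]1/itworked!/g' -i /tmp/sedtest.txt
cat /tmp/sedtest.txt   # all three lines should now read "itworked!"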

Find top 500 oldest files

How can I find top 500 oldest files?
What I've tried:
find /storage -name "*.mp4" -o -name "*.flv" -type f | sort | head -n500
Find 500 oldest files using GNU find and GNU sort:
#!/bin/bash
typeset -a files
export LC_{TIME,NUMERIC}=C
n=0
while ((n++ < 500)) && IFS=' ' read -rd '' _ x; do
files+=("$x")
done < <(find /storage -type f \( -name '*.mp4' -o -name '*.flv' \) -printf '%T# %p\0' | sort -zn)
printf '%q\n' "${files[@]}"
Update - some explanation:
As mentioned by Jonathan in the comments, the proper way to handle this involves a number of non-standard features that allow producing and consuming null-delimited lists, so that arbitrary filenames can be handled safely.
GNU find's -printf produces the mtime (using the undocumented %T# format. My guess would be that whether or not this works depends upon your C library) followed by a space, followed by the filename with a terminating \0. Two additional non-standard features process the output: GNU sort's -z option, and the read builtin's -d option, which with an empty option argument delimits input on nulls. The overall effect is to have sort order the elements by the mtime produced by find's -printf string, then read the first 500 results into an array, using IFS to split read's input on space and discard the first element into the _ variable, leaving only the filename.
Finally, we print out the array using the %q format just to display the results unambiguously with a guarantee of one file per line.
The process substitution (<(...) syntax) isn't completely necessary but avoids the subshell induced by the pipe in versions that lack the lastpipe option. That can be an advantage should you decide to make the script more complicated than merely printing out the results.
None of these features are unique to GNU. All of this can be done using e.g. AST find(1), openbsd sort(1), and either Bash, mksh, zsh, or ksh93 (v or greater). Unfortunately the find format strings are incompatible.
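To see what the loop is consuming, here is a minimal sketch using the documented %T@ format in place of %T# (everything under /storage is hypothetical):
# Print the five oldest matches as "mtime -> path" (GNU find/sort/head)
while IFS=' ' read -rd '' mtime path; do
    printf '%s -> %s\n' "$mtime" "$path"
done < <(find /storage -type f \( -name '*.mp4' -o -name '*.flv' \) -printf '%T@ %p\0' | sort -zn | head -zn 5)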
The following finds the oldest 500 files with the oldest file at the top of the list:
find . -regex '.*\.\(mp4\|flv\)' -type f -print0 | xargs -0 ls -drt --quoting-style=shell-always 2>/dev/null | head -n500
The above is a pipeline. The first step is to find the file names, which is done by find; any of find's options can be used to select the files of interest to you. The second step does the sorting: xargs passes the file names to ls, which sorts on modification time in reverse order so that the oldest files are at the top. The last step is head -n500, which takes just the first 500 file names. The first of those names will be the oldest file.
If there are more than 500 files, then head terminates before ls. When this happens, ls is killed by SIGPIPE and xargs reports a message like: ls: terminated by signal 13. I redirected stderr from the xargs command to eliminate this harmless message.
The above solution assumes that all the filenames can fit on one command line in your shell.
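If that assumption might not hold, a sketch that sidesteps the limit entirely is to let find print the timestamps itself (GNU find; this one assumes filenames contain no newlines):
find . \( -name '*.mp4' -o -name '*.flv' \) -type f -printf '%T@ %p\n' |
  sort -n | head -n500 | cut -d' ' -f2-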

using find command in unix to search for a newline

I would like to search all .java files which have the newline escape sequence \n (backslash followed by 'n') in the files.
I am using this command:
find . -name "*.java" -print | xargs grep "\n"
but the result shows all lines in .java files having the letter n.
I want to search for a newline \n.
Can you please suggest a solution?
Example:
x.java
method abc{
String msg="\n Action not allowed.";}
y.java
method getMsg(){
String errMsg = "\n get is not allowed.";}
I want to search all *.java files having these type of strings defined with newline escape sequence.
It looks like you want to find lines containing the 2-character sequence \n. To do this, use grep -F, which treats the pattern as a fixed string rather than as a regular expression or escape sequence.
find . -name "*.java" -print | xargs grep -F "\n"
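A quick way to see the difference, using a throwaway file modelled on the x.java example from the question:
# The file contains the two characters backslash + n inside a string
printf 'String msg="\\n Action not allowed.";\n' > /tmp/x.java
grep -F '\n' /tmp/x.java   # matches: the pattern is the literal 2-character string
grep "\n" /tmp/x.java      # without -F, this degrades to matching any plain "n"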
This -P grep will match a newline character, using '$'. Since each line in my file ends with a newline, it will match every line:
grep -P '$' 1.c
I don't know why you want to match a newline character in files. That is strange.
I believe you're looking for this:
find . -name "*.java" -exec grep -H '"[^"]*\\n' {} \;
Note the doubled backslash: in the pattern, \\ matches a literal backslash, so the expression finds string literals containing the two-character sequence \n rather than falling back to matching a plain n. The -H flag is to show the name of the file when there is a pattern match. If that doesn't work for you:
find . -name "*.java" -print0 | xargs -0 grep '"[^"]*\\n'
If xargs -0 doesn't work for you:
find . -name "*.java" -print | xargs grep '"[^"]*\\n'
If grep doesn't work for you:
find . -name "*.java" -print | xargs egrep '"[^"]*\\n'
I needed this last version on Solaris; on modern systems the first one should work.
Finally, not sure if the pattern covers all your corner cases.

How can I search for keywords using logical AND conditions in files on Ubuntu?

I've been trying to search for multiple keywords in my Ubuntu files. I know how to do it for one keyword:
find /[myRep] -type f | xargs grep -rl "myFunction"
I wanted to do it for two keywords, such as myFunction and myClass, to get all the files that can instantiate myFunction in myClass.
I tried to use:
find /[myRep] -type f | xargs grep -rl "myFunction" | xargs grep -rl "myClass"
I get results, but I'm not sure whether they are accurate. Plus, I wonder if there is a simple way to add more logical conditions to the search, such as "OR" or "NOT"...
Use Regex Alternation for Logical OR Conditions
If you're trying to find files that contain either "myFunction" or "myClass", you could use an extended regular expression with alternation. For example:
# Using GNU Find and GNU Grep
find . -type f -exec grep --extended-regexp --files-with-matches 'myFunction|myClass' {} +
When passed a list of files to grep, this will show you matching files that contain either word.
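If GNU grep is available, its own recursion can do the same OR search without find; an equivalent sketch (the path is illustrative):
grep -Erl 'myFunction|myClass' /myRep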
Logical AND is Trickier
A logical AND is trickier because you have to account for ordering. You can either:
Filter files on one set of requirements, then the other.
Use a more full-feature program where you can store state.
As a trivial example of the first case:
# Use nulls to separate filenames for safety.
find /etc/passwd -print0 |
xargs -0 egrep -Zl root |
xargs -0 egrep -Zl www
As a contrived example of the second case, you could use GNU awk:
# Print the name of the current file if it matches both words,
# resetting the per-file state at the start of each file.
find /etc/passwd -print0 |
xargs -0 awk 'FNR == 1 { seen_root = seen_www = 0 }
    /root/ { seen_root = 1 }
    /www/  { seen_www = 1 }
    seen_root && seen_www { print FILENAME; nextfile }'
Your command looks fine to me. You first grep all files to find those which contain "myFunction" and then pass them through another grep for "myClass". As a result, you will end up with files containing both "myFunction" and "myClass".
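For the "NOT" condition mentioned in the question, grep's -L (--files-without-match) option inverts the per-file test. A sketch in the same style (null-separated for safety; the path is illustrative):
# Files that contain myFunction but NOT myClass
grep -rlZ 'myFunction' /myRep | xargs -0r grep -L 'myClass'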

How to search and replace using grep

I need to recursively search for a specified string within all files and subdirectories within a directory and replace this string with another string.
I know that the command to find it might look like this:
grep 'string_to_find' -r ./*
But how can I replace every instance of string_to_find with another string?
Another option is to use find and then pass it through sed.
find /path/to/files -type f -exec sed -i 's/oldstring/new string/g' {} \;
I got the answer.
grep -rl matchstring somedir/ | xargs sed -i 's/string1/string2/g'
You could even do it like this:
Example
grep -rl 'windows' ./ | xargs sed -i 's/windows/linux/g'
This will search for the string 'windows' in all files relative to the current directory and replace 'windows' with 'linux' for each occurrence of the string in each file.
This works best for me on OS X:
grep -r -l 'searchtext' . | sort | uniq | xargs perl -e "s/matchtext/replacetext/" -pi
Source: http://www.praj.com.au/post/23691181208/grep-replace-text-string-in-files
Usually not with grep, but rather with sed -i 's/string_to_find/another_string/g' or perl -i.bak -pe 's/string_to_find/another_string/g'.
Other solutions mix regex syntaxes. To use perl/PCRE patterns for both search and replace, and process only matching files, this works quite well:
grep -rlIZPi 'match1' | xargs -0r perl -pi -e 's/match2/replace/gi;'
match1 and match2 are usually identical but match2 can contain more advanced features that are only relevant to the substitution, e.g. capturing groups.
Translation: grep recursively and list matching filenames, each separated by null to protect any special characters; pipe any filenames to xargs which is expecting a null-separated list; if any filenames are received, pass them to perl to perform the actual substitutions.
For case-sensitive matching, drop the i flag from grep and the i pattern modifier from the s/// expression, but not the i flag from perl itself. To include binary files, remove the I flag from grep.
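As an illustration of match2 doing more than match1, here the substitution uses a capture group while the file filter stays cheap (all identifiers are made up):
# Find candidate files with a simple match, then rewrite with a capture group
grep -rlIZ 'oldPrefix' . | xargs -0r perl -pi -e 's/oldPrefix(\w+)/newPrefix$1/g;'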
Be very careful when using find and sed in a git repo! If you don't exclude the binary files you can end up with this error:
error: bad index file sha1 signature
fatal: index file corrupt
To solve this error you need to revert the sed by replacing your new_string with your old_string. This will revert your replaced strings, so you will be back to the beginning of the problem.
The correct way to search for a string and replace it is to skip find and use grep instead in order to ignore the binary files:
sed -ri -e "s/old_string/new_string/g" $(grep -Elr --binary-files=without-match "old_string" "/files_dir")
Credit to @hobs
Here is what I would do:
find /path/to/dir -type f -iname "*filename*" -print0 | xargs -0 sed -i '/searchstring/s/old/new/g'
This will look for all files whose names contain filename under /path/to/dir, then, for every file found, replace old with new on the lines containing searchstring.
If you don't want to restrict the search to files with a specific string in their name, simply do:
find /path/to/dir -type f -print0 | xargs -0 sed -i '/searchstring/s/old/new/g'
This will do the same thing above, but to all files found under /path/to/dir.
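The /searchstring/ part is a sed address: the s/// command only runs on lines that match it. A throwaway illustration:
# Only the line containing "searchstring" is changed
printf 'old value, no marker\nsearchstring old value\n' > /tmp/demo.txt
sed -i '/searchstring/s/old/new/g' /tmp/demo.txt
cat /tmp/demo.txt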
Modern rust tools can be used to do this job.
For example, to replace "oldstring" and "oldString" with "newstring" and "newString" respectively in all (non-ignored) files, you can:
Use fd and sd
fd -tf -x sd 'old([Ss]tring)' 'new$1' {}
Use ned
ned -R -p 'old([Ss]tring)' -r 'new$1' .
Use ruplacer
ruplacer --go 'old([Ss]tring)' 'new$1' .
Ignored files
To include ignored (by .gitignore) and hidden files, you have to specify it explicitly:
use -IH for fd,
use --ignored --hidden for ruplacer.
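For fd, for example, putting the two together (a sketch; flags as listed above):
# Replace in hidden and .gitignore'd files too
fd -tf -IH -x sd 'old([Ss]tring)' 'new$1' {}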
Another option would be to just use perl with globstar.
Enabling shopt -s globstar in your .bashrc (or wherever) allows the ** glob pattern to match all sub-directories and files recursively.
Thus using perl -pXe 's/SEARCH/REPLACE/g' -i ** will recursively
replace SEARCH with REPLACE.
The -X flag tells perl to "disable all warnings" - which means that
it won't complain about directories.
The globstar also allows you to do things like sed -i 's/SEARCH/REPLACE/g' **/*.ext if you wanted to replace SEARCH with REPLACE in all child files with the extension .ext.

Remove special characters in linux files

I have a lot of files: *.java, *.xml. But a guy wrote some comments and Strings with Spanish characters. I've been searching the web for how to remove them.
I tried find . -type f -exec sed 's/[áíéóúñ]//g' DefaultAuthoritiesPopulator.java just as an example. How can I remove these characters from many other files in subfolders?
If that's what you really want, you can use find, almost as you are using it.
find -type f \( -iname '*.java' -or -iname '*.xml' \) -execdir sed -i 's/[áíéóúñ]//g' '{}' ';'
The differences:
The path . is implicit if no path is supplied.
This command only operates on *.java and *.xml files.
-execdir is more secure than -exec (read the man page).
-i tells sed to modify the file argument in place. Read the man page to see how to use it to make a backup.
{} represents a path argument which find will substitute in.
The ; is part of the find syntax for exec/execdir.
You're almost there :)
find . -type f -exec sed -i 's/[áíéóúñ]//g' {} \;
^^ ^^
From sed(1):
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied)
From find(1):
-exec command ;
Execute command; true if 0 status is returned. All
following arguments to find are taken to be arguments to
the command until an argument consisting of `;' is
encountered. The string `{}' is replaced by the current
file name being processed everywhere it occurs in the
arguments to the command, not just in arguments where it
is alone, as in some versions of find. Both of these
constructions might need to be escaped (with a `\') or
quoted to protect them from expansion by the shell. See
the EXAMPLES section for examples of the use of the -exec
option. The specified command is run once for each
matched file. The command is executed in the starting
directory. There are unavoidable security problems
surrounding use of the -exec action; you should use the
-execdir option instead.
tr is the tool for the job:
NAME
tr - translate or delete characters
SYNOPSIS
tr [OPTION]... SET1 [SET2]
DESCRIPTION
Translate, squeeze, and/or delete characters from standard input, writing to standard out‐
put.
-c, -C, --complement
use the complement of SET1
-d, --delete
delete characters in SET1, do not translate
-s, --squeeze-repeats
replace each input sequence of a repeated character that is listed in SET1 with a
single occurrence of that character
Piping your input through tr -d 'áíéóúñ' will probably do what you want.
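Since tr reads standard input and writes standard output, editing a file in place takes a temporary copy; a minimal sketch (note that GNU tr operates on bytes, so with UTF-8 input this deletes the individual bytes of those characters):
# Remove the accented characters from one file via a temp copy
tr -d 'áíéóúñ' < DefaultAuthoritiesPopulator.java > /tmp/cleaned.java &&
  mv /tmp/cleaned.java DefaultAuthoritiesPopulator.java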
Why are you trying to remove only the characters with diacritic signs? It is probably worth removing all characters with codes outside the range 0-127, so the removal regexp becomes s/[\x80-\xFF]//g, if you're sure that your files should not contain anything beyond plain ASCII.
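A sketch of that approach with GNU sed (forcing the C locale so the byte range behaves predictably; the filename is the one from the question):
# Strip every byte outside the ASCII range, in place
LC_ALL=C sed -i 's/[\x80-\xFF]//g' DefaultAuthoritiesPopulator.java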
