Why does sed leave many files around? - linux

I noticed many files in my directory, called "sedAbCdEf" or such.
Why does it create these files?
Do these have any value after a script has run?
Can I send these files to another location, e.g. /tmp/?
Update:
I checked the scripts until I found one which makes the files. Here is some sample code:
#!/bin/bash
a=1
b=`wc -l < ./file1.txt`
while [ $a -le $b ]; do
for i in `sed -n "$a"p ./file1.txt`; do
for j in `sed -n "$a"p ./file2.txt`; do
sed -i "s/$i/\nZZ$jZZ\n/g" ./file3.txt
c=`grep -c $j file3.txt`
if [ "$c" -ge 1 ]
then
echo $j >> file4.txt
echo "Replaced "$i" with "$j" "$c" times ("$a"/"$b")."
fi
echo $i" not found ("$a"/"$b")."
a=`expr $a + 1`
done
done
done

Why does it create these files?
sed -i "s/$i/\nZZ$jZZ\n/g" ./file3.txt
The -i option makes sed write its output to a temporary file instead of stdout.
After sed is done, it renames this temp file over your original file3.txt.
If something goes wrong while sed is running, these sedAbCdEf temp files are left behind.
Do these have any value after a script has run?
Usually no. If sed did not finish, your old file is untouched.
Can I send these files to another location, e.g. /tmp/?
Yes you can, see above.
Edit: see this for further reading.

If you use the -i option (edit in place), sed writes to a temporary file and then renames it over your file. Thus, if the operation is aborted, your file is left unchanged.
You can see which files are opened and renamed with strace:
$ strace -e trace=open,openat,rename sed -i 's/a/b/g' somefile
Note: somefile is opened read-only. On recent systems the open shows up as an openat call, hence openat in the filter.
It seems there is no way to override the temporary directory. GNU sed always writes in the file's directory (after optionally following symlinks). From sed/execute.c:
if (follow_symlinks)
input->in_file_name = follow_symlink (name);
else
input->in_file_name = name;
/* get the base name */
tmpdir = ck_strdup(input->in_file_name);
if ((p = strrchr(tmpdir, '/')))
*(p + 1) = 0;
else
strcpy(tmpdir, ".");
The prefix "sed" is hardcoded:
output_file.fp = ck_mkstemp (&input->out_file_name, tmpdir, "sed");
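To make the write-then-rename behaviour described above concrete, here is a minimal sketch that imitates it in plain shell. The function name inplace_edit is made up for illustration, and this is only an approximation of what sed -i does (for example, it does not restore the original file's permissions).

```shell
# Hypothetical helper, not part of sed: edit FILE with a sed SCRIPT by
# writing to a temp file in the same directory and renaming it over FILE.
inplace_edit() {
    script=$1
    file=$2
    dir=$(dirname "$file")
    # Same naming scheme as the "sed" prefix above: sed + 6 random characters.
    tmp=$(mktemp "$dir/sedXXXXXX") || return 1
    if sed -e "$script" -- "$file" > "$tmp"; then
        mv -- "$tmp" "$file"      # rename only after sed succeeded
    else
        rm -f -- "$tmp"           # on failure the original stays untouched
        return 1
    fi
}
```

If the process is killed between mktemp and mv, the temp file stays behind in the file's directory, which matches the leftover sedAbCdEf files from the question.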

This may be because you run many sed commands in a loop, and sed may be creating temp files that are not removed properly.
See "Sed creates un-deleteable files in Windows":
Take a look at that post; this issue with sed has been reported before. A workaround is to write a script or function that deletes all files whose names start with sed (a pattern like ^sed).
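A cautious version of that cleanup idea could look like the sketch below. It assumes the leftovers follow the GNU sed naming seen in the question (sed plus six random characters), and the function name list_sed_leftovers is hypothetical. It only lists candidates, so you can review them before deleting anything.

```shell
# Hypothetical helper: list files in DIR (default: current directory) whose
# names look like GNU sed temp files, i.e. "sed" followed by exactly 6 characters.
# -maxdepth is supported by GNU and BSD find, but is not POSIX.
list_sed_leftovers() {
    find "${1:-.}" -maxdepth 1 -type f -name 'sed??????' -print
}
```

Once the list looks right, the same find expression with -exec rm -- {} + instead of -print removes them. A legitimate nine-character file name starting with sed would also match, which is why listing first is the safer default.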

Related

Linux : check if something is a file [ -f not working ]

I am currently trying to list the size of all files in a directory which is passed as the first argument to the script, but the -f option in Linux is not working, or am I missing something.
Here is the code :
for tmp in "$1/*"
do
echo $tmp
if [ -f "$tmp" ]
then num=`ls -l $tmp | cut -d " " -f5`
echo $num
fi
done
How would I fix this problem?
I think the error is with your glob syntax: a glob is not expanded inside either single or double quotes.
for tmp in "$1"/*; do
..
Do the above so that the glob is expanded outside the quotes.
There are a couple more improvements possible in your script:
Double-quote your variables to prevent word-splitting, e.g. echo "$tmp"
Backtick command substitution `` is legacy syntax with several issues; use the $(..) syntax instead.
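Putting those suggestions together, a minimal sketch of the corrected loop might look like this. The function name list_sizes is made up, and wc -c is swapped in for parsing ls -l output, which is my own substitution rather than part of the answer above.

```shell
# Hypothetical helper: print "path size-in-bytes" for every regular file in DIR.
list_sizes() {
    dir=$1
    for tmp in "$dir"/*; do          # glob sits outside the quotes, so it expands
        if [ -f "$tmp" ]; then       # regular files only; directories are skipped
            size=$(wc -c < "$tmp")
            # $((size)) strips the padding some wc implementations add
            printf '%s %s\n' "$tmp" "$((size))"
        fi
    done
}
```

Called as list_sizes "$1" from the original script, it prints one line per regular file in the directory passed as the first argument.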
The [ -f "filename" ] test checks that the file exists and is a regular file. For reference:
-b FILE
FILE exists and is block special
-c FILE
FILE exists and is character special
-d FILE
FILE exists and is a directory
-e FILE
FILE exists
-f FILE
FILE exists and is a regular file
-g FILE
FILE exists and is set-group-ID
-G FILE
FILE exists and is owned by the effective group ID
I suggest you try [ -e "filename" ] and see if it works.
Cheers!
At least on the command line, this piece of script does it:
for tmp in *; do echo "$tmp"; if [ -f "$tmp" ]; then num=$(ls -l "$tmp" | sed -e 's/  */ /g' | cut -d ' ' -f5); echo "$num"; fi; done
If cut uses a space as the delimiter, it cuts at every space character. Sometimes you have more than one space between columns and the count can easily go wrong. I'm guessing that in your case you just happened to echo a space, which looks like nothing. The sed command squeezes each run of spaces into a single space.

Make SED command work for any variable

deploy.sh
USERNAME="Tom"
PASSWORD="abc123"
FILE="config.conf"
sed -i "s/\PLACEHOLDER_USERNAME/$USERNAME/g" $FILE
sed -i "s/\PLACEHOLDER_PASSWORD/$PASSWORD/g" $FILE
config.conf
deloy="PLACEHOLDER_USERNAME"
pass="PLACEHOLDER_PASSWORD"
This file puts the variables defined in deploy.sh into my config file. I can't source the file, so I want to put my variables in this way.
Question
I want a command that is generic to work for all placeholder variables using some sort of while loop rather than needing one command per variable. This means any term starting with placeholder_ in the file will try to be replaced with the value of the variable defined already in deploy.sh
All variables should be set and not empty. I guess if there is the ability to print a warning if it can't find the variable that would be good but it isn't mandatory for this.
Basically, use shell code to write a sed script and then use sed -i .bak -f sed.script config.conf to apply it:
trap "rm -f sed.script; exit 1" 0 1 2 3 13 15
for var in USERNAME PASSWORD
do
echo "s/PLACEHOLDER_$var/${!var}/"
done > sed.script
sed -i .bak -f sed.script config.conf
rm -f sed.script
trap 0
The main 'tricks' here are:
knowing that ${!var} expands to the value of the variable named by $var, and
knowing that sed will take a script full of commands via -f sed.script, and
knowing how to use trap to ensure temporary files are cleaned up.
You could also use sed -e "s/.../.../" -e "s/.../.../" -i .bak config.conf too, but the script file is easier, I think, especially if you have more than 2 values to substitute. If you want to go down this route, use a bash array to hold the arguments to sed. A more careful script would use at least $$ in the script file name, or use mktemp to create the temporary file.
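As a rough sketch of the bash-array variant mentioned above: the function name render_placeholders is hypothetical, bash is assumed (arrays and ${!var} are not POSIX sh), and the result is printed to stdout rather than edited in place so the -i differences between sed versions do not matter here.

```shell
# Hypothetical helper (bash only): replace PLACEHOLDER_<NAME> in FILE with the
# value of each shell variable NAME given after the file, printing to stdout.
render_placeholders() {
    local file=$1; shift
    local args=() var
    for var in "$@"; do
        # ${!var} is indirect expansion: the value of the variable named by $var
        args+=(-e "s/PLACEHOLDER_$var/${!var}/g")
    done
    sed "${args[@]}" "$file"
}
```

For example, render_placeholders config.conf USERNAME PASSWORD > config.new. Values containing / or & would need escaping first, and at least one variable name must be passed, otherwise sed would treat the file name as its script.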
Revised answer
The trouble is, although much closer to being generic, it is still not generic since I have to manually put in what variables I want to change. Can it not be more like "for each placeholder_, find the variable in deploy.sh and add that variable", so it can work for any number of variables?
So, find what the variables are in the configuration file, then apply the techniques of the previous answer to solve that problem:
tmp=$(mktemp) || exit 1 # temporary sed script
trap 'rm -f "$tmp"; exit 1' 0 1 2 3 13 15
for file in "$@"
do
# -n with the p flag prints only the lines that contain a placeholder
for var in $(sed -n 's/.*PLACEHOLDER_\([A-Z0-9_]*\).*/\1/p' "$file")
do
value="${!var}"
[ -z "$value" ] && { echo "$0: variable $var not set for $file" >&2; exit 1; }
echo "s/PLACEHOLDER_$var/$value/"
done > "$tmp"
sed -i .bak -f "$tmp" "$file"
rm -f "$tmp"
done
trap 0
This code still pulls the values from the environment. You need to clarify what is required if you want to extract the settings from the shell script, but it can be done — the script will have to be sufficiently self-aware to find its source so it can search it for the names. But the basics are in this answer; the rest is a question of tinkering until it does what you need.
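One way to do the "find what the variables are" step is sketched below. It uses grep -o (available in GNU and BSD grep, though not required by POSIX) so that several placeholders on one line are all picked up; the function name list_placeholders is made up for illustration.

```shell
# Hypothetical helper: print the unique variable names referenced as
# PLACEHOLDER_<NAME> in FILE, one per line.
list_placeholders() {
    grep -o 'PLACEHOLDER_[A-Z0-9_]*' "$1" | sed 's/^PLACEHOLDER_//' | sort -u
}
```

The output can feed a loop such as for var in $(list_placeholders config.conf), which then only has to look up each name.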
#!/bin/ksh
TemplateFile=$1
SourceData=$2
(sed 's/.*/#V0r:PLACEHOLDER_&:r0V#/' ${SourceData}; cat ${TemplateFile}) | sed -n "
s/$/²/
H
$ {
x
s/^\(\n *\)*//
# also reset t flag
t varxs
:varxs
s/^#V0r:\([a-zA-Z0-9_]\{1,\}\)=\([^²]*\):r0V#²\(\n.*\)\"\1\"/#V0r:\1=\2:r0V#²\3\2/
t varxs
# clean the line when there are no more occurrences in the text
s/^[^²]*:r0V#²\n//
# and next
t varxs
# clean the marker
s/²\(\n\)/\1/g
s/²$//
# display the result
p
}
"
Call it like this: YourScript.ksh YourTemplateFile YourDataSourceFile, where:
YourTemplateFile is the file that contains the structure with generic values like deloy="PLACEHOLDER_USERNAME"
YourDataSourceFile is the file that contains all the pairs of generic value = specific value, like USERNAME="Tom"

Move files and rename - one-liner

I'm encountering many files with the same content and the same name on some of my servers. I need to quarantine these files for analysis so I can't just remove the duplicates. The OS is Linux (centos and ubuntu).
I enumerate the file names and locations and put them into a text file.
Then I do a for statement to move the files to quarantine.
for file in $(cat bad-stuff.txt); do mv $file /quarantine ;done
The problem is that they have the same file name and I just need to add something unique to the filename to get it to save properly. I'm sure it's something simple but I'm not good with regex. Thanks for the help.
Since you're using Linux, you can take advantage of GNU mv's --backup.
while read -r file
do
mv --backup=numbered "$file" "/quarantine"
done < "bad-stuff.txt"
Here's an example that shows how it works:
$ cat bad-stuff.txt
./c/foo
./d/foo
./a/foo
./b/foo
$ while read -r file; do mv --backup=numbered "$file" "./quarantine"; done < "bad-stuff.txt"
$ ls quarantine/
foo foo.~1~ foo.~2~ foo.~3~
$
I'd use this
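for file in $(cat bad-stuff.txt); do mv "$file" "/quarantine/$(basename "$file").$(date -u +%s%N)"; done
You'll get every file with a timestamp appended (in nanoseconds). basename strips the directory part, so the target always sits directly in /quarantine.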
You can create a new file name composed by the directory and the filename. Thus you can add one more argument in your original code:
for ...; do mv $file /quarantine/$(echo $file | sed 's:/:_:g') ; done
Please note that you should replace the _ with a proper character which is special enough.
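Spelled out in full, the flatten-the-path idea might look like the sketch below. The function name quarantine_list is hypothetical, and _ is kept as the separator only for illustration, with the caveat from above that a more distinctive character may be needed.

```shell
# Hypothetical helper: move every path listed in LIST (one per line) into DEST,
# naming each file after its full path with "/" replaced by "_".
quarantine_list() {
    list=$1
    dest=$2
    mkdir -p -- "$dest" || return 1
    while IFS= read -r path; do
        [ -f "$path" ] || continue
        # drop a leading "./" so the new name does not start with a dot
        name=$(printf '%s\n' "${path#./}" | tr '/' '_')
        mv -- "$path" "$dest/$name"
    done < "$list"
}
```

With the list from the question, quarantine_list bad-stuff.txt /quarantine would turn ./a/foo and ./b/foo into a_foo and b_foo, so same-named files no longer collide and their origin stays visible.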

How to check if sed has changed a file

I am trying to find a clever way to figure out if the file passed to sed has been altered successfully or not.
Basically, I want to know if the file has been changed or not without having to look at the file modification date.
The reason why I need this is because I need to do some extra stuff if sed has successfully replaced a pattern.
I currently have:
grep -q $pattern $filename
if [ $? -eq 0 ]
then
sed -i s:$pattern:$new_pattern: $filename
# DO SOME OTHER STUFF HERE
else
# DO SOME OTHER STUFF HERE
fi
The above code is a bit expensive and I would love to be able to use some hacks here.
A bit late to the party but for the benefit of others, I found the 'w' flag to be exactly what I was looking for.
sed -i "s/$pattern/$new_pattern/w changelog.txt" "$filename"
if [ -s changelog.txt ]; then
# CHANGES MADE, DO SOME STUFF HERE
else
# NO CHANGES MADE, DO SOME OTHER STUFF HERE
fi
changelog.txt will contain each change (i.e. the changed text) on its own line. If there were no changes, changelog.txt will be zero bytes.
A really helpful sed resource (and where I found this info) is http://www.grymoire.com/Unix/Sed.html.
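Wrapped into a reusable function, the w-flag approach could look roughly like this. The name sed_changed is made up, a temp file from mktemp stands in for changelog.txt, and the output is written to a temp file and copied back instead of using -i, so the sketch does not depend on GNU versus BSD -i syntax.

```shell
# Hypothetical helper: replace PATTERN with REPLACEMENT in FILE.
# Returns 0 if at least one line changed, 1 if nothing changed, 2 on error.
sed_changed() {
    pattern=$1
    replacement=$2
    file=$3
    log=$(mktemp) || return 2
    out=$(mktemp "$file.XXXXXX") || { rm -f "$log"; return 2; }
    # the w flag writes every line on which a substitution happened to $log
    if sed "s/$pattern/$replacement/gw $log" "$file" > "$out"; then
        cat "$out" > "$file"
        rm -f "$out"
    else
        rm -f "$out" "$log"
        return 2
    fi
    if [ -s "$log" ]; then status=0; else status=1; fi
    rm -f "$log"
    return "$status"
}
```

Usage would be if sed_changed "$pattern" "$new_pattern" "$filename"; then ... fi. As with the original command, a pattern or replacement containing / needs escaping or a different delimiter.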
I believe you may find these GNU sed extensions useful
t label
If a s/// has done a successful substitution since the last input line
was read and since the last t or T command, then branch to label; if
label is omitted, branch to end of script.
and
q [exit-code]
Immediately quit the sed script without processing any more input, except
that if auto-print is not disabled the current pattern space will be printed.
The exit code argument is a GNU extension.
It seems like exactly what you are looking for.
This might work for you (GNU sed):
sed -i.bak '/'"$old_pattern"'/{s//'"$new_pattern"'/;h};${x;/./{x;q1};x}' file || echo changed
Explanation:
/'"$old_pattern"'/{s//'"$new_pattern"'/;h} if the pattern space (PS) contains the old pattern, replace it by the new pattern and copy the PS to the hold space (HS).
${x;/./{x;q1};x} on encountering the last line, swap to the HS and test it for the presence of any string. If a string is found in the HS (i.e. a substitution has taken place) swap back to the original PS and exit using the exit code of 1, otherwise swap back to the original PS and exit with the exit code of 0 (the default).
You can diff the original file with the sed output to see if it changed:
sed -i.bak s:$pattern:$new_pattern: "$filename"
if ! diff "$filename" "$filename.bak" &> /dev/null; then
echo "changed"
else
echo "not changed"
fi
rm "$filename.bak"
You could use awk instead:
awk '$0 ~ p { gsub(p, r); t=1} 1 END{ exit (!t) }' p="$pattern" r="$repl"
I'm ignoring the -i feature: you can use the shell to do redirections as necessary.
Sigh. Many comments below asking for basic tutorial on the shell. You can use the above command as follows:
if awk '$0 ~ p { gsub(p, r); t=1} 1 END{ exit (!t) }' \
p="$pattern" r="$repl" "$filename" > "${filename}.new"; then
cat "${filename}.new" > "${filename}"
# DO SOME OTHER STUFF HERE
else
# DO SOME OTHER STUFF HERE
fi
It is not clear to me if "DO SOME OTHER STUFF HERE" is the same in each case. Any similar code in the two blocks should be refactored accordingly.
On macOS I just do it as follows:
changes=""
changes+=$(sed -i '' "s/$to_replace/$replacement/g w /dev/stdout" "$f")
if [ "$changes" != "" ]; then
echo "CHANGED!"
fi
I checked, and this is faster than md5, cksum and sha comparisons
I know it is an old question and using awk instead of sed is perhaps the best idea, but if one wants to stick with sed, an idea is to use the w flag of the s command. The file argument to the w flag only contains the lines with a match. So, we only need to check that it is not empty.
perl -sple '$replaced++ if s/$from/$to/g;
END{if($replaced != 0){ print "[Info]: $replaced replacement done in $ARGV(from/to)($from/$to)"}
else {print "[Warning]: 0 replacement done in $ARGV(from/to)($from/$to)"}}' -- -from="FROM_STRING" -to="$DESIRED_STRING" </file/name>
Example:
The command will produce the following output, stating the number of changes made/file.
perl -sple '$replaced++ if s/$from/$to/g;
END{if($replaced != 0){ print "[Info]: $replaced replacement done in $ARGV(from/to)($from/$to)"}
else {print "[Warning]: 0 replacement done in $ARGV(from/to)($from/$to)"}}' -- -from="timeout" -to="TIMEOUT" *
[Info]: 5 replacement done in main.yml(from/to)(timeout/TIMEOUT)
[Info]: 1 replacement done in task/main.yml(from/to)(timeout/TIMEOUT)
[Info]: 4 replacement done in defaults/main.yml(from/to)(timeout/TIMEOUT)
[Warning]: 0 replacement done in vars/main.yml(from/to)(timeout/TIMEOUT)
Note: I have removed -i from the above command, so it will not update the files for people who are just trying it out. If you want in-place replacement, add -i after perl in the above command.
check if sed has changed MANY files
recursive replace of all files in one directory
produce a list of all modified files
workaround with two stages: match + replace
g='hello.*world'
s='s/hello.*world/bye world/g;'
d='./' # directory of input files
o='modified-files.txt'
grep -r -l -Z -E "$g" "$d" | tee "$o" | xargs -0 sed -i "$s"
the file paths in $o are zero-delimited
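Because the list is zero-delimited, it does not read well in a pager or editor. A small sketch for viewing it, assuming the modified-files.txt name from above and a made-up function name print_modified:

```shell
# Hypothetical helper: show a zero-delimited path list one path per line.
print_modified() {
    tr '\0' '\n' < "${1:-modified-files.txt}"
}
```

This is for display only: a path that itself contains a newline becomes ambiguous once converted, so keep feeding the original file to xargs -0 for further processing.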
$ echo hi > abc.txt
$ sed "s/hi/bye/g; t; q1;" -i abc.txt && (echo "Changed") || (echo "Failed")
Changed
$ sed "s/hi/bye/g; t; q1;" -i abc.txt && (echo "Changed") || (echo "Failed")
Failed
https://askubuntu.com/questions/1036912/how-do-i-get-the-exit-status-when-using-the-sed-command/1036918#1036918
Don't use sed to tell if it has changed a file; instead, use grep to tell if it is going to change a file, then use sed to actually change the file. Notice the single line of sed usage at the very end of the Bash function below:
# Usage: `gs_replace_str "regex_search_pattern" "replacement_string" "file_path"`
gs_replace_str() {
REGEX_SEARCH="$1"
REPLACEMENT_STR="$2"
FILENAME="$3"
num_lines_matched=$(grep -c -E "$REGEX_SEARCH" "$FILENAME")
# Count number of matches, NOT lines (`grep -c` counts lines),
# in case there are multiple matches per line; see:
# https://superuser.com/questions/339522/counting-total-number-of-matches-with-grep-instead-of-just-how-many-lines-match/339523#339523
num_matches=$(grep -o -E "$REGEX_SEARCH" "$FILENAME" | wc -l)
# If num_matches > 0
if [ "$num_matches" -gt 0 ]; then
echo -e "\n${num_matches} matches found on ${num_lines_matched} lines in file"\
"\"${FILENAME}\":"
# Now show these exact matches with their corresponding line 'n'umbers in the file
grep -n --color=always -E "$REGEX_SEARCH" "$FILENAME"
# Now actually DO the string replacing on the files 'i'n place using the `sed`
# 's'tream 'ed'itor!
sed -i "s|${REGEX_SEARCH}|${REPLACEMENT_STR}|g" "$FILENAME"
fi
}
Place that in your ~/.bashrc file, for instance. Close and reopen your terminal and then use it.
Usage:
gs_replace_str "regex_search_pattern" "replacement_string" "file_path"
Example: replace do with bo so that "doing" becomes "boing" (I know, we should be fixing spelling errors not creating them :) ):
$ gs_replace_str "do" "bo" test_folder/test2.txt
9 matches found on 6 lines in file "test_folder/test2.txt":
1:hey how are you doing today
2:hey how are you doing today
3:hey how are you doing today
4:hey how are you doing today hey how are you doing today hey how are you doing today hey how are you doing today
5:hey how are you doing today
6:hey how are you doing today?
References:
https://superuser.com/questions/339522/counting-total-number-of-matches-with-grep-instead-of-just-how-many-lines-match/339523#339523
https://unix.stackexchange.com/questions/112023/how-can-i-replace-a-string-in-a-files/580328#580328

How can I re-add a unicode byte order marker in linux?

I have a rather large SQL file which starts with the byte order marker of FFFE. I have split this file using the unicode aware linux split tool into 100,000 line chunks. But when passing these back to windows, it does not like any of the parts other than the first one as only it has the FFFE byte order marker on.
How can I add this two byte code using echo (or any other bash command)?
Based on Anonymous's sed solution, sed -i '1s/^/\xef\xbb\xbf/' foo adds the BOM to the UTF-8 encoded file foo. Usefully, it also turns ASCII files into UTF-8 with BOM.
To add BOMs to all the files that start with "foo-", you can use sed. sed has an option to make a backup.
sed -i '1s/^\(\xff\xfe\)\?/\xff\xfe/' foo-*
Running this under strace shows that sed creates a temp file with a name starting with "sed". If you know for sure there is no BOM already, you can simplify the command:
sed -i '1s/^/\xff\xfe/' foo-*
Make sure you actually need UTF-16, because other encodings, e.g. UTF-8, use a different BOM.
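If the \x escapes are not available in your sed, roughly the same "add the BOM only if it is missing" logic can be sketched with printf, head and od. The function name add_utf16le_bom is hypothetical, and octal escapes are used because they work in every POSIX printf.

```shell
# Hypothetical helper: prepend the UTF-16LE BOM (bytes FF FE) to FILE
# unless the file already starts with it.
add_utf16le_bom() {
    file=$1
    # first two bytes as hex, e.g. "fffe"
    first=$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')
    [ "$first" = "fffe" ] && return 0
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if { printf '\377\376' && cat "$file"; } > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Running it in a loop such as for f in foo-*; do add_utf16le_bom "$f"; done is safe to repeat, since files that already carry the BOM are left alone.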
For a general-purpose solution—something that sets the correct byte-order mark regardless of whether the file is UTF-8, UTF-16, or UTF-32—I would use vim’s 'bomb' option:
$ echo 'hello' > foo
$ xxd < foo
0000000: 6865 6c6c 6f0a hello.
$ vim -e -s -c ':set bomb' -c ':wq' foo
$ xxd < foo
0000000: efbb bf68 656c 6c6f 0a ...hello.
(-e means runs in ex mode instead of visual mode; -s means don’t print status messages; -c means “do this”)
Try uconv
uconv --add-signature
Something like this (back up first):
for i in $(ls *.sql)
do
cp "$i" "$i.temp"
printf '\xFF\xFE' > "$i"
cat "$i.temp" >> "$i"
rm "$i.temp"
done
Matthew Flaschen's answer is a good one, however it has a couple of flaws.
There's no check that the copy succeeded before the original file is truncated. It would be better to make everything contingent on a successful copy, or test for the existence of the temporary file, or to operate on the copy. If you're a belt-and-suspenders kind of person, you'd do a combo as I've illustrated below
The ls is unnecessary.
I'd use a better variable name than "i" - perhaps "file".
Of course, you could be very paranoid and check for the existence of the temporary file at the beginning so you don't accidentally overwrite it and/or use a UUID or a generated file name. One of mktemp, tempfile or uuidgen would do the trick.
td=$TMPDIR
export TMPDIR=
usertemp=~/temp # set this to use a temp directory on the same filesystem
# you could use ./temp to ensure that it's on the same one
# you can use mktemp -d to create the dir instead of mkdir
if [[ ! -d $usertemp ]] # if this user temp directory doesn't exist
then # then create it, unless you can't
mkdir $usertemp || export TMPDIR=$td # if you can't create it and TMPDIR is/was
fi # empty then mktemp automatically falls
# back to /tmp
for file in *.sql
do
# TMPDIR if set overrides the argument to -p
temp=$(mktemp -p $usertemp) || { echo "$0: Unable to create temp file."; exit 1; }
{ printf '\xFF\xFE' > "$temp" &&
cat "$file" >> "$temp"; } || { echo "$0: Write failed on $file"; exit 1; }
{ rm "$file" &&
mv "$temp" "$file"; } || { echo "$0: Replacement failed for $file"; exit 1; }
done
export TMPDIR=$td
Traps might be better than all the separate error handlers I've added.
No doubt all this extra caution is overkill for a one-shot script, but these techniques can save you when push comes to shove, especially in a multi-file operation.
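For comparison, a trap-based variant might look like the sketch below. The function name prepend_bom_all is made up; the body runs in a subshell so the EXIT trap is scoped to it, and the temp file is created next to each .sql file so the final mv stays on one filesystem.

```shell
# Hypothetical helper: prepend the UTF-16LE BOM to every *.sql file in DIR.
# The ( ... ) body is a subshell, so "exit" and the trap do not leak out.
prepend_bom_all() (
    dir=$1
    tmp=
    # remove a half-written temp file on any exit path
    trap 'if [ -n "$tmp" ]; then rm -f -- "$tmp"; fi' EXIT
    for file in "$dir"/*.sql; do
        [ -f "$file" ] || continue
        tmp=$(mktemp "$file.XXXXXX") || exit 1
        { printf '\377\376' && cat -- "$file"; } > "$tmp" || exit 1
        mv -- "$tmp" "$file" || exit 1
        tmp=                      # nothing left to clean up for this file
    done
)
```

The single trap replaces the per-step error handlers: any failing step just exits, and the cleanup runs once.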
$ printf '\xEF\xBB\xBF' > bom.txt
Then check:
$ grep -rl $'\xEF\xBB\xBF' .
./bom.txt
