How to directly overwrite with 'unexpand' (spaces-to-tabs conversion)? - linux

I'm trying to use something along the lines of
unexpand -t 4 *.php
but am unsure how to write this command to do what I want.
Weirdly,
unexpand -t 4 file.php > file.php
gives me an empty file. (i.e. overwriting file.php with nothing)
I can specify multiple files okay, but don't know how to then overwrite each file.
I could use my IDE, but there are ~67000 instances of spaces to be replaced across 200 files, and this would take a while.
I expect that the answers to my question(s) will be standard unix fare, but I'm still learning...

You can very seldom use output redirection to replace the input: the shell truncates file.php to zero length when it sets up the > redirection, before unexpand ever gets to read it, which is why you ended up with an empty file. In-place replacement only works with commands that support it internally (since they then do the basic steps themselves). From the shell level, it's far better to work in two steps, like so:
Do the operation on foo, creating foo.tmp
Move (rename) foo.tmp to foo, overwriting the original
This will be fast. It will require a bit more disk space, but if you do both steps before continuing to the next file, you will only need as much extra space as the largest single file; this should not be a problem.
Sketch script:
for a in *.php
do
unexpand -t 4 "$a" > "$a-notab"
mv "$a-notab" "$a"
done
You could do better (error-checking, and so on), but that is the basic outline.
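For instance, a slightly more defensive variant might look like this (a sketch; the mktemp call and the && chaining are embellishments on the outline above, not requirements):

for a in *.php
do
    tmp=$(mktemp "$a.XXXXXX") &&     # temp file created beside the original
    unexpand -t 4 "$a" > "$tmp" &&
    mv "$tmp" "$a" ||
    { rm -f "$tmp"; echo "skipping $a" >&2; }
done

As an aside, if the moreutils package happens to be installed, its sponge command buffers all of its input before writing, so unexpand -t 4 file.php | sponge file.php overwrites in place without an explicit temporary file.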

Here's the command I used:
for p in $(find . -iname "*.js")
do
unexpand -t 4 $(dirname $p)/"$(basename $p)" > $(dirname $p)/"$(basename $p)-tab"
mv $(dirname $p)/"$(basename $p)-tab" $(dirname $p)/"$(basename $p)"
done
This version changes all files within the directory hierarchy rooted at the current working directory.
In my case, I only wanted to make this change to .js files; you can omit the iname clause from find if you wish, or use different args to cast your net differently.
My version wraps filenames in quotes, but it doesn't use quotes around 'interesting' directory names that appear in the paths of matching files.
To get it all on one line, add a semicolon after lines 1, 3, & 4.
This is potentially dangerous, so make a backup or use git before running the command. If you're using git, you can verify that only whitespace was changed with git diff -w.

Related

Is mv * a destructive command on a directory with 2 or more files? What other linux commands have similar behavior?

When I run mv * with no destination directory on a directory with say 10 files, I get an error as follows
root@tryit-apparent:~/test2# ls
file1.txt file10.txt file2.txt file3.txt file4.txt file5.txt file6.txt file7.txt file8.txt file9.txt
root@tryit-apparent:~/test2# mv *
mv: target 'file9.txt' is not a directory
When I run it on a directory with two files, it overwrites one file with the other, leaving just one file.
root@tryit-apparent:~/test# ls
tempfile tempfile2
root@tryit-apparent:~/test# mv *
root@tryit-apparent:~/test# ls
tempfile2
I read the man pages but couldn't understand this behaviour. I'd like to know what's causing it and what's going on under the hood.
What other linux commands have such pitfalls and have destructive actions that are executed silently if the user is not aware of such behavior?
In Unix, unlike some other OSes, wildcards like * are expanded by the shell, before being passed to the command being run. So when you run mv * with tempfile and tempfile2 as the only files in the current directory, what the shell actually executes is mv tempfile tempfile2, which as normal will rename the first file over the second one, erasing the previous contents of tempfile2. The shell doesn't know or care that this command treats its last argument specially, and mv has no way of knowing that its two arguments came from a wildcard expansion. Hence the behavior you're seeing.
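You can preview what the shell will actually run by prefixing the command with echo:

$ ls
tempfile  tempfile2
$ echo mv *
mv tempfile tempfile2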
You can have similar issues even with more than two files. For instance, if you have files named tempfile1 through tempfile9 and a subdirectory named zyzzx, then mv * will move all your temp files into the zyzzx subdirectory.
Mostly, you just have to be aware that this is how wildcards work, and use caution with commands that treat one of their arguments specially (e.g. as a destination). cp is another one to watch out for, for the same reason. For interactive usage, you may want to get used to using the -i option to mv and cp, which asks for confirmation before overwriting files; or use an alias to make this the default.
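For example, in ~/.bashrc (adjust for your shell of choice):

alias mv='mv -i'   # prompt before overwriting
alias cp='cp -i'

Note that aliases only take effect in interactive shells, so scripts that call mv are unaffected.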
Move is intended to move or rename a file or a directory, so you need a source and a destination.
If the path of the file is unchanged, then it becomes a rename operation.
If the path changes and the name remains the same, it's a move.
You can do both by changing the path and the name.
Man pages can be challenging to wrap your head around.
Googling can help: https://www.howtoforge.com/linux-mv-command/
Off the top of my head, you could do a cp operation followed by a rm to achieve similar results, but that's two steps, rather than one.
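For example, the two-step equivalent of mv tempfile tempfile2 would be:

cp tempfile tempfile2 && rm tempfile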

Recursive Text Substitution and File Extension Rename

I am using an application that creates a text file on a Linux server. I then have the ability to execute a shell script (BASH 3.2.57) in which I need to convert the text file from Unix line endings to DOS and also change the extension of the file from .txt to .log.
I currently have a sed based command to do this. This command is rewritten by the application at run time to point to the specific folder and file name, in this example where you see ABC (all capital 3 letters in all my examples are a variable that can be any 3 letters).
pushd /rootfolder/parentfolder/ABC/
sed 's/$/\r/' prABC.txt > prABC.log
popd
The problem with this is that if a user runs the application for 2 different groups, say ABC and DEF at nearly the same time, the script will get overwritten with the DEF variables before ABC had a chance to fire off and do its thing with the file. Additionally the .txt is left in the folder regardless and I would like that to be removed.
A friend of mine came up with the following code that seems to work, if it's determined to be our best solution, but I would think and hope we have a cleaner, more dynamic way to do this. This current method also requires that whenever a user decides to add a GHI directory and file, I have to update the code. I could program my application to do that for me, but I don't want this script to be rewritten every time the application wants to use it.
pushd /rootfolder/parentfolder/ABC
if [[ -f prABC.txt ]]
then
sed 's/$/\r/' prABC.txt > prABC.log
rm prABC.txt
fi
popd
pushd /rootfolder/parentfolder/DEF
if [[ -f prDEF.txt ]]
then
sed 's/$/\r/' prDEF.txt > prDEF.log
rm prDEF.txt
fi
popd
I would like to call this script at anytime from my application and it find any file named pr*.txt below the /rootfolder/parentfolder/ directory (if that has to include the parentfolder in its search that won't be a problem) and convert the line endings from LF to CRLF and change the extension of the file from .txt to .log.
I've done a ton of searching and have found near-solutions, but not exactly what I need, and I want to be sure it's as safe as possible (there are known issues with using find with a for loop). I don't know what utilities are installed on this build, so I would like to keep it as basic and supportable as possible. Thanks in advance :)
You should almost never need pushd and popd in scripts. In fact, you rarely need cd, either.
#!/bin/bash
for d in /rootfolder/parentfolder/ABC /rootfolder/parentfolder/DEF
do
    name="pr${d##*/}"    # prABC in .../ABC, prDEF in .../DEF
    if [[ -f "$d/$name.txt" ]]
    then
        sed 's/$/\r/' "$d/$name.txt" > "$d/$name.log" &&
        rm "$d/$name.txt"
    fi
done
Recall that a && b is shorthand for
if a; then
b
fi
In other words, if sed fails (because the source file can't be read, or the destination can't be written) we don't rm the source file. There should be an error message already so we don't add another one.
Not only is this more succinct, it is also easier to change if you decide that the old file should be renamed instead of removed, or you want to filter out all lines which contain "beef" in the sed script. Generally you should avoid repeated code; see also the DRY principle on Wikipedia.
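Since the question asks for picking up any pr*.txt below /rootfolder/parentfolder without listing each group directory, here is a find-based sketch along the same lines (it assumes a find with -print0 support, which both the GNU and BSD versions provide):

#!/bin/bash
find /rootfolder/parentfolder -type f -name 'pr*.txt' -print0 |
while IFS= read -r -d '' f
do
    sed 's/$/\r/' "$f" > "${f%.txt}.log" &&
    rm "$f"
done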
Something is seriously wrong somewhere if you require DOS line endings in your files on Unix.

RH Linux Bash Script help. Need to move files with specific words in the file

I have a RedHat linux box and I had written a script in the past to move files from one location to another with a specific text in the body of the file.
I typically only write scripts once a year so every year I forget more and more... That being said,
Last year I wrote this script and used it and it worked.
For some reason, I can not get it to work today and I know it's a simple issue and I shouldn't even be asking for help but for some reason I'm just not looking at it correctly today.
Here is the script.
ls -1 /var/text.old | while read file
do
grep -q "to.move" $file && mv $file /var/text.old/TBD
done
I'm listing all the files inside the /var/text.old directory.
I'm reading each file
then I'm grep'ing for "to.move" and holding the results
then I'm moving the resulting found files to the folder /var/text.old/TBD
I am an admin and I have rights to the above files and folders.
I can see the data in each file
I can mv them manually
I have used pwd to grab the correct spelling of the directory.
If anyone can just help me to see what the heck I'm missing here that would really make my day.
Thanks in advance.
UPDATE:
The files I need to move do not have whitespace in their names.
The Error I'm getting is as follows:
grep: 9829563.msg: No such file or directory
NOTE: the file "9829563.msg" is one of the files I need to move.
Also note: I'm getting this error for every file in the directory that I'm listing.
You didn't post any error, but I'm gonna take a guess and say that you have a filename with a space or special shell character.
Let's say you have 3 files, and ls -1 gives us:
hello
world
hey there
Now, the shell's word splitting is governed by the special $IFS variable, which is set to <space><tab><newline> by default.
So instead of looping over 3 values as you expect (hello, world, and hey there), you loop over 4 values (hello, world, hey, and there).
To fix this, we can do 2 things:
Set IFS to only a newline:
IFS="
"
ls -1 /var/text.old | while read file
...
In general, I like setting IFS to a newline at the start of the script, since I consider this to be slightly "safer", but opinions on this probably vary.
But much better is to not parse the output of ls, and use for:
for file in /var/text.old/*; do
This won't fork any external processes (piping ls into while starts 2), and behaves "less surprisingly" in other ways. See here for some examples.
The second problem is that you're not quoting $file. You should always quote pathnames with double quotes: "$file". If $file has a space (or a special shell character, such as *), the meaning of your command changes:
file=hey\ *
mv $file /var/text.old/TBD
Becomes:
mv hey * /var/text.old/TBD
Which is obviously very different from what you intended! What you intended was:
mv "hey *" /var/text.old/TBD

How to exclude multiple directories with Exuberant ctags?

I have looked and tried to use exuberant ctags with no luck with what I want to do. I am on a Mac trying to work in a project where I want to exclude such directories as .git, node_modules, test, etc. When I try something like ctags -R --exclude=[.git, node_modules, test] I get nothing in return. I really only need to have it run in my core directory. Any ideas on how to accomplish this?
The --exclude option does not expect a list of files. According to ctags's man page, "This option may be specified as many times as desired." So, it's like this:
ctags -R --exclude=.git --exclude=node_modules --exclude=test
Read The Fantastic Manual should always be the first step of any attempt to solve a problem.
From $ man ctags:
--exclude=[pattern]
Add pattern to a list of excluded files and directories. This option may
be specified as many times as desired. For each file name considered by
ctags, each pattern is compared against both the complete path (e.g.
some/path/base.ext) and the base name (e.g. base.ext) of the file, thus
allowing patterns which match a given file name irrespective of its path,
or match only a specific path. If appropriate support is available from
the runtime library of your C compiler, then pattern may contain the
usual shell wildcards (not regular expressions) common on Unix (be sure
to quote the option parameter to protect the wildcards from being
expanded by the shell before being passed to ctags; also be aware that
wildcards can match the slash character, '/'). You can determine if
shell wildcards are available on your platform by examining the output
of the --version option, which will include "+wildcards" in the compiled
feature list; otherwise, pattern is matched against file names using a
simple textual comparison.
If pattern begins with the character '#', then the rest of the string is
interpreted as a file name from which to read exclusion patterns, one per
line. If pattern is empty, the list of excluded patterns is cleared.
Note that at program startup, the default exclude list contains "EIFGEN",
"SCCS", "RCS", and "CVS", which are names of directories for which it is
generally not desirable to descend while processing the --recurse option.
From the first two sentences you get:
$ ctags -R --exclude=dir1 --exclude=dir2 --exclude=dir3 .
which may be a bit verbose but that's what aliases and mappings and so on are for. As an alternative, you get this from the second paragraph:
$ ctags -R --exclude=#.ctagsignore .
with the following in .ctagsignore:
dir1
dir2
dir3
which works out to excluding those 3 directories without as much typing.
You can encapsulate a comma separated list with curly braces to handle multiples with one --exclude option:
ctags -R --exclude={folder1,folder2,folder3}
This appears to only work for folders in the root of where you're issuing the command. Excluding nested folders requires a separate --exclude option.
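For example (folder names here are placeholders):

ctags -R --exclude={folder1,folder2} --exclude=nested/folder3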
The other answers were straight to the point, and I thought a little example may help:
You can append an asterisk, shell-wildcard style, to exclude the whole contents of each directory.
ctags -R --exclude={.git/*,.env/*,.idea/*} ./
A bit late, but following on romainl's response: you could use your .gitignore file as a basis; you only need to remove any leading slashes from the file, like so:
sed "s/\///" .gitignore > .ctagsignore
ctags -R --exclude=#.ctagsignore
I really only need to have it run in my core directory.
Simply remove the -R (recursion) flag!!!
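That is, something like the following, run from the core directory (the glob is a placeholder for your actual source files):

cd core
ctags *.js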

Modifying files nested in tar archive

I am trying to do a grep and then a sed to search for specific strings inside files, which are inside multiple tars, all inside one master tar archive. Right now, I modify the files by
First extracting the master tar archive.
Then extracting all the tars inside it.
Then doing a recursive grep and then sed to replace a specific string in files.
Finally packaging everything again into tar archives, and all the archives inside the master archive.
Pretty tedious. How do I do this automatically using shell scripting?
There isn't going to be much option except automating the steps you outline, for the reasons demonstrated by the caveats in the answer by Kimvais.
tar modify operations
The tar command has some options to modify existing tar files. They are, however, not appropriate for your scenario for multiple reasons, one of them being that it is the nested tarballs that need editing rather than the master tarball. So, you will have to do the work longhand.
Assumptions
Are all the archives in the master archive extracted into the current directory or into a named/created sub-directory? That is, when you run tar -tf master.tar.gz, do you see:
subdir-1.23/tarball1.tar
subdir-1.23/tarball2.tar
...
or do you see:
tarball1.tar
tarball2.tar
(Note that nested tars should not themselves be gzipped if they are to be embedded in a bigger compressed tarball.)
master_repackager
Assuming you have the subdirectory notation, then you can do:
for master in "$@"
do
tmp=$(pwd)/xyz.$$
trap "rm -fr $tmp; exit 1" 0 1 2 3 13 15
cat $master |
(
mkdir $tmp
cd $tmp
tar -xf -
cd * # There is only one directory in the newly created one!
process_tarballs *
cd ..
tar -czf - * # There is only one directory down here
) > new.$master
rm -fr $tmp
trap 0
done
If you're working in a malicious environment, use something other than xyz.$$ for the directory name. However, this sort of repackaging is usually not done in a malicious environment, and the chosen name based on process ID is sufficient to give everything a unique name. The use of tar -f - for input and output allows you to switch directories but still handle relative pathnames on the command line. There are likely other ways to handle that if you want. I also used cat to feed the input to the sub-shell so that the top-to-bottom flow is clear; technically, I could improve things by using ) > new.$master < $master at the end, but that hides some crucial information multiple lines later.
The trap commands make sure that (a) if the script is interrupted (signals HUP, INT, QUIT, PIPE or TERM), the temporary directory is removed and the exit status is 1 (not success) and (b) once the subdirectory is removed, the process can exit with a zero status.
You might need to check whether new.$master exists before overwriting it. You might need to check that the extract operation actually extracted stuff. You might need to check whether the sub-tarball processing actually worked. If the master tarball extracts into multiple sub-directories, you need to convert the 'cd *' line into some loop that iterates over the sub-directories it creates.
All these issues can be skipped if you know enough about the contents and nothing goes wrong.
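For example, a minimal guard against clobbering an existing output file could go at the top of the loop body (my addition, not part of the original sketch):

if [ -e "new.$master" ]
then
    echo "new.$master already exists; skipping" >&2
    continue
fi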
process_tarballs
The second script is process_tarballs; it processes each of the tarballs on its command line in turn, extracting the file, making the substitutions, repackaging the result, etc. One advantage of using two scripts is that you can test the tarball processing separately from the bigger task of dealing with a tarball containing multiple tarballs. Again, life will be much easier if each of the sub-tarballs extracts into its own sub-directory; if any of them extracts into the current directory, make sure you create a new sub-directory for it.
for tarball in "$@"
do
# Extract $tarball into sub-directory
tar -xf $tarball
# Locate appropriate sub-directory.
(
cd $subdirectory
find . -type f -print0 | xargs -0 sed -i 's/name/alternative-name/g'
)
mv $tarball old.$tarball
tar -cf $tarball $subdirectory
rm -f old.$tarball
done
You should add traps to clean up here, too, so the script can be run in isolation from the master script above and still not leave any intermediate directories around. In the context of the outer script, you might not need to be so careful to preserve the old tarball before the new one is created (so rm -f $tarball instead of the move and remove commands), but treated in its own right, the script should be careful not to damage anything.
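A minimal version, echoing the trap in the master script, might be placed at the top of the loop body (a sketch; extend the cleanup to cover whatever the extraction actually creates):

trap 'rm -f "old.$tarball"; exit 1' 1 2 3 13 15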
Summary
What you're attempting is not trivial.
For debuggability, split the job into two scripts that can be tested independently.
Handling the corner cases is much easier when you know what is really in the files.
You probably can sed the actual tar stream, as tar itself does not do any compression (gzip provides that as a separate layer).
e.g.
zcat archive.tar.gz|sed -e 's/foo/bar/g'|gzip > archive2.tar.gz
However, beware that this will also replace foo with bar in filenames, usernames and group names, and it ONLY works if foo and bar are of equal length. The length restriction exists because the tar format records each member's size and offsets in fixed-size headers, so changing the byte count corrupts the archive.
