I need to implement a script (duplq.sh) that would rename all the text files existing in the current directory using the command line arguments. So if the command duplq.sh pic 0 3 was executed, it would do the following transformation:
pic0.txt will have to be renamed pic3.txt
pic1.txt to pic4.txt
pic2.txt to pic5.txt
pic3.txt to pic6.txt
etc…
So the first argument is always the base name of the files, and the second and third are always non-negative integers.
I also need to make sure that when I execute my script, the first renaming (pic0.txt to pic3.txt), does not erase the existing pic3.txt file in the current directory.
Here's what I have so far:
#!/bin/bash
name="$1"
i="$2"
j="$3"
for file in $name*
do
echo $file
find /var/log -name 'name[$i]' | sed -e 's/$i/$j/g'
i=$(($i+1))
j=$(($j+1))
done
But the find command does not seem to work. Do you have other solutions ?
The problem you're trying to solve is actually somewhat tricky, and I don't think you've fully thought it through. For instance, what's the difference between duplq.sh pic 0 3 and duplq.sh pic 2 5 -- it looks like both should just add 3 to the number, or would the second skip "pic0.txt" and "pic1.txt"? What effect would either one have on files named "pic", "pic.txt", "picture.txt", "picture2.txt", "pic2-2.txt", or "pic999.txt".
There are also a bunch of basic mistakes in the script you have so far:
You should (almost) always put variable references in double-quotes, to avoid unexpected word-splitting and wildcard expansion. So, for example, use echo "$file" instead of echo $file. In for file in $name*, you should put double-quotes around the variable but not the *, because you want that to be treated as a wildcard. Hence, the correct version is for file in "$name"*
Don't put variable references in single-quotes, they aren't expanded there. So in the find and sed commands, you aren't passing the variables' values, you're passing literal dollar signs followed by letters. Again, use double-quotes. Also, you don't have a "$" before "name", so it won't be treated as a variable even in double-quotes.
But the find and sed commands don't do what you want anyway. Consider find /var/log -name "name[1]" -- that looks for files named "name1", not "name1" + some extension. And it looks in /var/log and all of its subdirectories rather than in the current directory, which I'm pretty sure you don't want. And the "1" ("$i") may not be the number in the current filename. Suppose there are files named "pic0.jpg", "pic0.png", and "pic0.txt" -- on the first iteration, the loop might find all three with a pattern like "pic0*", then on the second and third iterations try to find "pic1*" and "pic2*", which don't exist. On the other hand, suppose there are files named "pic0.txt", "pic5.txt", and "pic8.txt" -- again, it might look for "pic0*" (ok), then "pic1*" (not found), and then "pic2*" (ditto).
Also, if you get to multi-digit numbers, the pattern "name[10]" will match "name0" and "name1", but not "name10", because the brackets define a set of single characters. I don't know why you added the brackets there, but they don't do anything you'd want.
You already have the files being listed one at a time in the $file variable, searching again with different criteria just adds confusion.
Also, at no point in the script do you actually rename anything. The find | sed line will (if it works) print the new name for the file, but not actually rename it.
BTW, when you do use the mv command, use either mv -n or mv -i to keep it from silently and irretrievably overwriting files if/when a name conflict occurs.
To prevent overwriting when incrementing file numbers, you need to do the renames in reverse numeric order (i.e. rename "pic3.txt" to "pic6.txt" before renaming "pic0.txt" to "pic3.txt"). This is especially tricky because if you just sort filenames in reverse alphabetic order, you'll get "pic7.txt" before "pic10.txt". But you can't do a numeric sort without removing the "pic" and ".txt" parts first.
IMO this is actually the trickiest problem to be solved in order to get this script to work right. It might be simplest to specify the largest index number as one of the arguments, and have it start there and count down to 0 (looping over numbers rather than files), and then for each number iterate over matching files (e.g. "pic0.jpg", "pic0.png", and "pic0.txt").
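The reverse-numeric-order idea can be sketched as below. This is only one possible approach, not a definitive solution: the function name shift_numbered and its parameters are made up for illustration, it assumes a positive delta and a fixed extension (default .txt), and it deliberately skips names such as "picture.txt" or "pic2-2.txt" where something other than digits sits between the prefix and the extension.

```shell
#!/bin/bash
# Sketch: shift the number in names like pic0.txt by a positive delta,
# renaming the highest number first so no existing file is overwritten.
# shift_numbered and its arguments are hypothetical names.
shift_numbered() {
    local prefix=$1 delta=$2 ext=${3:-.txt}
    local f num
    for f in "$prefix"*"$ext"; do
        [ -e "$f" ] || continue          # pattern matched nothing
        num=${f#"$prefix"}               # strip the prefix
        num=${num%"$ext"}                # strip the extension
        case $num in
            ''|*[!0-9]*) continue ;;     # not purely digits: leave it alone
        esac
        printf '%s\n' "$num"
    done | sort -rn | while IFS= read -r num; do
        # 10# forces base 10 so that 08 or 09 are not read as octal;
        # mv -n refuses to overwrite an existing target.
        mv -n -- "$prefix$num$ext" "$prefix$((10#$num + delta))$ext"
    done
}
```

Calling shift_numbered pic 3 in a directory containing pic0.txt through pic3.txt renames pic3.txt first and pic0.txt last. Zero padding is not preserved in this sketch (pic08.txt would become pic11.txt), and files that differ only in padding would collide, in which case mv -n leaves the later one untouched.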
So I assume that 0 3 is just a measurement for the difference of old num and new num and equivalent to 1 4 or 100 103.
To avoid overwriting existing files, create a new temp dir, move all affected files there, and move all of them back in the end.
#!/bin/bash
#
# duplq.sh pic 0 3
base="$1"
delta=$(( $3 - $2 ))
# echo delta $delta
target=$(mktemp -d)
echo $target
# /tmp/tmp.7uXD2GzqAb
add () {
f="$1"
b="$2"
d=$3
num=${f#./${b}}
# echo -e "file: $f \tnum: $num \tnum + d: $((num + d))" ;
echo -e "$((num + d))" ;
}
for f in $(find -maxdepth 1 -type f -regex ".*/${base}[0-9]+")
do
newnum=$(add "$f" "${base}" $delta)
echo mv "$f" "$target/${base}$newnum"
done
# exit
echo mv $target/${base}* .
First I tried to just use bash syntax, to check, whether removal of the prefix (pic) results in just digits remaining. I also didn't use the extension .txt - this is left as an exercise for the reader. From the question it is unclear - it is never explicitly told, that all files share the same extension, but all files in the example do.
With -regex ".*/${base}[0-9]+" in find, the part after the base name is guaranteed to be just digits.
num=${f#./${b}}
removes from file f the base ("pic"). Delta d is added.
Instead of really moving, I just echoed the mv-command.
#TODO: Implement the file name extension conservation.
And 2 other pitfalls came to my mind: If you have 3 files pic0, pic00 and pic000 they all will be renamed to pic3. And pic08 will be cut into pic and 08, 08 will then be tried to be read as octal number (or 09 or 012129 and so on) and lead to an error.
One way to solve this issue is, that you prepend the extracted number (001 or 018) with a "1", then add 3, and remove the leading 1:
001 1001 1004 004
018 1018 1021 021
but this clever solution leads to new problems:
999 1999 2002 002?
So a leading 1 has to be cut off, a leading 2 has to be reduced by 1. But now, if the delta is bigger, let's say 300:
018 1018 1318 318
918 1918 2218 1218
Well - that seems to be working.
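As an alternative to the prepend-a-1 trick, bash arithmetic can be told to read the digits as base 10, and printf can pad the result back to the original width. The following is a small sketch of that different technique, not the method described above; add_padded is a hypothetical helper name.

```shell
#!/bin/bash
# add_padded NUM DELTA: add DELTA to a zero-padded decimal string.
# (hypothetical helper, shown only to illustrate 10# and %0*d)
add_padded() {
    local num=$1 delta=$2
    # 10#$num forces base 10, so 08 and 09 are not rejected as invalid octal.
    # %0*d pads with zeros to at least the original number of digits.
    printf '%0*d\n' "${#num}" "$((10#$num + delta))"
}

add_padded 001 3     # prints 004
add_padded 018 3     # prints 021
add_padded 999 3     # prints 1002
add_padded 918 300   # prints 1218
```

The width given to printf is a minimum, so a result that needs more digits than the input (999 + 3) simply grows by one digit instead of wrapping around.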
See my previous question on assembling a specific string here.
I was given an answer to that question, but unfortunately the information didn't actually help me accomplish what I was trying to achieve.
Using the info from that post, I have been able to assemble the following set of strings: gnuplot -e "filename='output_N.csv'" 'plot.p' where N is replaced by the string representation of an integer.
The following loop will explain: (Actually, there is probably a better way of doing this loop, which you may want to point out - hopefully the following code won't upset too many people...)
#!/bin/bash
n=0
for f in output_*.csv
do
FILE="\"filename='output_"$n".csv'\""
SCRIPT="'plot.p'"
COMMAND="gnuplot -e $FILE $SCRIPT"
$COMMAND
n=$(($n+1))
done
Unfortunately this didn't work... gnuplot does run, but gives the following error message:
"filename='output_0.csv'"
^
line 0: invalid command
"filename='output_1.csv'"
^
line 0: invalid command
"filename='output_2.csv'"
^
line 0: invalid command
"filename='output_3.csv'"
^
line 0: invalid command
...
So, as I said before, I'm no expert in bash. My guess is that something isn't being interpreted correctly - either something is being interpreted as a string where it shouldn't or it is not being interpreted as a string where it should? (Just a guess?)
How can I fix this problem?
The first few (relevant) line of my gnuplot script are the following:
(Note the use of the variable filename which was entered as a command line argument. See this link.)
fit f(x) filename using 1:4:9 via b,c,e

plot filename every N_STEPS using 1:4:9 with yerrorbars title "RK45 Data", f(x) title "Landau Model"
Easy fix - I made a mistake with the quotation marks. ("")
Essentially, the double quotes around filename='output_"$n".csv' are only needed so that bash parses the command correctly when it is typed at the prompt; bash removes them before gnuplot ever sees the argument. So the command gnuplot -e "filename='output_0.csv'" 'plot.p' does run when entered into the terminal directly, but when the string is assembled in a variable beforehand, escaped quotes are stored as literal characters and passed through to gnuplot, which then reports an invalid command. They are therefore NOT required when assembling the string.
So the corrected version of the above code is:
#!/bin/bash
n=0
for f in output_*.csv
do
FILE="filename='output_"$n".csv'"
SCRIPT='plot.p'
COMMAND="gnuplot -e $FILE $SCRIPT"
$COMMAND
n=$(($n+1))
done
That is now corrected and working. Note the removal of the escaped double quotes.
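The string version above works only as long as no argument contains whitespace, because the unquoted $COMMAND is split on spaces. A more robust pattern, sketched here as a suggestion rather than as the original fix, is to keep the command in a bash array with one element per argument. The names gnuplot_cmd, plot_all and cmd are made up for this example.

```shell
#!/bin/bash
# Sketch: build the gnuplot invocation as an array, one element per argument.
# gnuplot_cmd, plot_all and cmd are hypothetical names.
gnuplot_cmd() {
    local csv=$1
    # The double quotes are shell syntax only; gnuplot receives
    # filename='<csv>' as a single argument.
    cmd=(gnuplot -e "filename='$csv'" plot.p)   # sets the global array cmd
}

plot_all() {
    local f
    for f in output_*.csv; do
        [ -e "$f" ] || continue   # no matching files: skip the literal pattern
        gnuplot_cmd "$f"
        "${cmd[@]}"               # expands to exactly the stored arguments
    done
}
```

Looping over the matched file names directly also removes the separate counter n, so the loop no longer assumes the files are numbered consecutively from 0.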
My problem is that the result is jumbled. Consider this script:
#!/bin/bash
INPUT="filelist.txt"
i=0;
while read label
do
i=$[$i+1]
echo "HELLO${label}WORLD"
done <<< $'1\n2\n3\n4'
i=0;
while read label
do
i=$[$i+1]
echo "HELLO${label}WORLD"
done < "$INPUT"
filelist.txt
5
8
15
67
...
The first loop, with the immediate input (through something I believe is called a here-string, the <<< operator), gives the expected output
HELLO1WORLD
HELLO2WORLD
HELLO3WORLD
HELLO4WORLD
The second loop, which reads from the file, gives the following jumbled output:
WORLD5
WORLD8
WORLD15
WORLD67
I've tried echo $label: This works as expected in both cases, but the concatenation fails in the second case as described. Further, the exact same code works on my Win 7, git-bash environment. This issue is on OSX 10.7 Lion.
How to concatenate strings in bash |
Bash variables concatenation |
concat string in a shell script
Well, just as I was about to hit post, the solution hit me. Sharing here so someone else can find it - it took me 3 hours to debug this (despite being on SO for almost all that time) so I see value in addressing this specific (common) use case.
The problem is that filelist.txt was created in Windows. This means it has CRLF line endings, while OSX (like other Unix-like environments) expects LF only line endings. (See more here: Difference between CR LF, LF and CR line break types?)
I used the answer here to convert the file before consumption. Using sed I managed to replace only the final line's carriage return, so I stuck to known guns and went for the perl approach. Final script is below:
#!/bin/bash
INPUTFILE="filelist.txt"
INPUT=$(perl -pe 's/\r\n|\n|\r/\n/g' "$INPUTFILE")
i=0;
while read label
do
i=$[$i+1]
echo "HELLO${label}WORLD"
done <<< "$INPUT"
Question has been asked in a different form at Bash: Concatenating strings fails when read from certain files
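The jumbled output comes from the carriage return left at the end of each label: the terminal moves the cursor back to the start of the line, so WORLD is printed over HELLO. If converting the file up front is not convenient, the trailing CR can also be stripped inside the loop. This is a minimal sketch of that alternative; greet_lines is a hypothetical function name.

```shell
#!/bin/bash
# Sketch: read a file line by line and tolerate CRLF line endings.
# greet_lines is a hypothetical name.
greet_lines() {
    local label
    # The || test keeps a final line that lacks a trailing newline.
    while IFS= read -r label || [ -n "$label" ]; do
        label=${label%$'\r'}      # drop one trailing carriage return, if any
        printf 'HELLO%sWORLD\n' "$label"
    done < "$1"
}
```

Called as greet_lines filelist.txt, it behaves the same for LF and CRLF files, since the parameter expansion is a no-op when no carriage return is present.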
The GNU bash manual tells me
An indexed array is created automatically if any variable is assigned
to using the syntax
name[subscript]=value
The subscript is treated as an arithmetic expression that must
evaluate to a number. If subscript evaluates to a number less than
zero, it is used as an offset from one greater than the array’s
maximum index (so a subscript of -1 refers to the last element of the
array).
So I figure I will give it a try and get the following result:
$ muh=(1 4 'a' 'bleh' 2)
$ echo $muh
1
$ echo ${muh[*]}
1 4 a bleh 2 # so far so good so now I'll try a negative ...
$ echo ${muh[-1]}
-bash: muh: bad array subscript # didn't go as planned!
Did I do something wrong, or is the website wrong, or is gnu bash that different from the bash I am running under CentOS? Thanks!
If you just want the last element
$ echo ${muh[*]: -1}
2
If you want next to last element
$ echo ${muh[*]: -2:1}
bleh
According to Greg Wooledge's wiki, (which links to the bash changelog) the negative index syntax was added to bash in version 4.2 alpha.
Bash before 4.2 (like the default one on Macs these days) doesn't support negative subscripts. Apart from the "substring expansion" used in the accepted answer, a possibly cleaner workaround is to count the desired index from the array start within the brackets:
$ array=(one two three)
$ echo "${array[${#array[@]}-1]}"
three
With this approach, you can pack other parameter expansion operations into the term, e.g. "remove matching prefix pattern" th:
$ echo "${array[${#array[@]}-1]#th}"
ree
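Counting from the array length assumes the indices are contiguous; for a sparse array the element count minus one is not the last index. A small helper that receives the elements as arguments avoids both that issue and the need for negative subscripts. This is a sketch, and last_element is a made-up name.

```shell
#!/bin/bash
# Sketch: print the last element of an array on any bash version.
# last_element is a hypothetical name; call it as: last_element "${array[@]}"
last_element() {
    [ "$#" -gt 0 ] || return 1    # empty array: nothing to print
    printf '%s\n' "${!#}"         # ${!#} expands to the last positional parameter
}

array=(one two three)
last_element "${array[@]}"        # prints three
```

Because "${array[@]}" expands only to the elements that exist, the helper also returns the right value for an array declared as sparse=([2]=a [7]=b).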
If you do man bash the section on arrays does not list this behavior. It might be something new (gnu?) in bash.
Fails for me in CentOS 6.3 (bash 4.1.2)
The negative subscript works perfectly fine for me on my computer with Ubuntu 14.04 / GNU bash version 4.3.11(1). However, when I try to run the same script on 4.2.46(1), it returns:
line 46: [-1]: bad array subscript
I'm trying to do my homework that is restricted to only using sed to filter an input file to a certain format of output. Here is the input file (named stocks):
Symbol;Name;Volume
================================================
BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453
================================================
And the output needs to be:
BAC, CSCO, INTC, MSFT, VZ, KO, MMM
I did come up with a solution, but it's not efficient. Here is my sed script (named try.sed):
/.*;.*;[0-9].*/ { N
N
N
N
N
N
s/\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*/\1, \2, \3, \4, \5, \6, \7/gp
}
The command that I run on shell is:
$ sed -nf try.sed stocks
My question is, is there a better way of using sed to get the same result? The script I wrote only works with 7 lines of data. If the data is longer, I need to re-modify my script. I'm not sure how I can make it any better, so I'm here asking for help!
Thanks for any recommendations.
One more way using sed:
sed -ne '/^====/,/^====/ { /;/ { s/;.*$// ; H } }; $ { g ; s/\n// ; s/\n/, /g ; p }' stocks
Output:
BAC, CSCO, INTC, MSFT, VZ, KO, MMM
Explanation:
-ne # Process each input line without printing and execute next commands...
/^====/,/^====/ # For all lines between these...
{
/;/ # If line has a semicolon...
{
s/;.*$// # Remove characters from first semicolon until end of line.
H # Append content to 'hold space'.
}
};
$ # In last input line...
{
g # Copy content of 'hold space' to 'pattern space' to work with it.
s/\n// # Remove first newline character.
s/\n/, /g # substitute the rest with output separator, comma in this case.
p # Print to output.
}
Edit: I've edited my algorithm, since I had neglected to consider the header and footer (I thought they were just for our benefit).
sed, by its design, accesses every line of an input file, and then performs expressions on ones that match some specification (or none). If you're tailoring your script to a certain number of lines, you're definitely doing something wrong! I won't write you a script since this is homework, but the general idea for one way to go about it is to write a script that does the following. Think of the ordering as the order things should be in a script.
1. Skip the first three lines using d, which deletes the pattern space and immediately moves on to the next line.
2. For each line that isn't a blank line, do the following steps. (This would all be in a single set of curly braces.)
   a. Replace everything after and including the first semicolon (;) with a comma-and-space (", ") using the s (substitute) command.
   b. Append the current pattern space into the hold buffer (look at H).
   c. Delete the pattern space and move on to the next line, like in step 1.
3. For each line that gets to this point in the script (should be the first blank line), retrieve the contents of the hold space into the pattern space. (This would be after the curly braces above.)
4. Substitute all newlines in the pattern space with nothing.
5. Next, substitute the last comma-and-space in the pattern space with nothing.
6. Finally, quit the program so you don't process any more lines. My script worked without this, but I'm not 100% sure why.
That being said, that's just one way to go about it. sed often offers varying ways of varying complexity to accomplish a task. A solution I wrote with this method is 10 lines long.
As a note, I don't bother suppressing printing (with -n) or manually printing (with p); each line is printed by default. My script runs like this:
$ sed -f companies.sed companies
BAC, CSCO, INTC, MSFT, VZ, KO, MMM
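For readers who want to check their own attempt, here is one possible reading of the steps above, wrapped in a shell function. It is a sketch and not necessarily the 10-line script mentioned: the function name symbols is made up, and it assumes the input has a blank line after the first ==== line and another after the last data line, as in the pasted terminal session further down.

```shell
#!/bin/bash
# Sketch of the described steps; symbols is a hypothetical name.
# Assumes: header line, ==== line, blank line, data lines, blank line.
symbols() {
    sed '
# skip the three header lines
1,3d
# data (non-blank) lines: keep the symbol plus ", ", stash it, next line
/./{
    s/;.*/, /
    H
    d
}
# first blank line after the data: fetch the hold space
g
# join the pieces, then drop the trailing ", "
s/\n//g
s/, $//
# print the result and stop
q
' "$@"
}
```

Usage would be symbols stocks, or piping the data into symbols. Printing is not suppressed with -n here, so the q command prints the assembled line before sed exits.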
This sed command should produce your required output:
sed -rn '/[0-9]+$/{s/^([^;]*).*$/\1/p;}' file.txt
OR on Mac:
sed -En '/[0-9]+$/{s/^([^;]*).*$/\1/p;}' file.txt
This might work for you:
sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;q};d' stocks
We don't want the headings so let's delete them. 1d
All data items are delimited by ;'s so let's concentrate on those lines. /;/
Of the things above delete everything from the first ; to the end of line and then stuff it away in the hold space (HS) {s/;.*//;H}
When you get to the last line, overwrite it with the HS using the g command, delete the first newline (generated by the H command), replace all subsequent newlines with a comma and a space and print out what's left. ${g;s/.//;s/\n/, /g;q}
Delete everything else d
Here's a terminal session showing the incremental refinement of building a sed command:
cat <<! >stock # paste the file into a here doc and pass it on to a file
> Symbol;Name;Volume
> ================================================
>
> BAC;Bank of America Corporation Com;238,059,612
> CSCO;Cisco Systems, Inc.;28,159,455
> INTC;Intel Corporation;22,501,784
> MSFT;Microsoft Corporation;23,363,118
> VZ;Verizon Communications Inc. Com;5,744,385
> KO;Coca-Cola Company (The) Common;3,752,569
> MMM;3M Company Common Stock;1,660,453
>
> ================================================
> !
sed '1d;/;/!d' stock # delete headings and everything but data lines
BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453
sed '1d;/;/{s/;.*//p};d' stock # delete all non essential data
BAC
CSCO
INTC
MSFT
VZ
KO
MMM
sed '1d;/;/{s/;.*//;H};${g;l};d' stock # use the l command to see what's really there!
\nBAC\nCSCO\nINTC\nMSFT\nVZ\nKO\nMMM$
sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;l};d' stock # refine refine
BAC, CSCO, INTC, MSFT, VZ, KO, MMM$
sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;q};d' stock # all done!
BAC, CSCO, INTC, MSFT, VZ, KO, MMM