Eliminate multiple spaces from a file and modify the original file - Linux

Specify the command that removes multiple spaces from a text file, leaving a single space in their place. Extra requirement: the original file must be modified.
I managed to come up with these 3 commands:
awk '{$2=$2};1' filename.txt
tr -s '[:space:]' < filename.txt > filename.new && mv filename.new filename.txt
sed -i 's/\s\+/ /g' filename.txt
I'm not sure whether using a temporary file is the best way to do this. Is there a more efficient way to solve the problem? It doesn't matter whether it uses tr, sed, awk or anything else; you can post all of them.
Example input:
I'm    just   giving      spaces
Output :
I'm just giving spaces
Edit: Still looking for more answers

I'd use ed over the non-standard sed -i (and the non-portable regex in your example) if you want to alter the original file:
printf "%s\n" '1,$s/[[:space:]]\{2,\}/ /g' w | ed -s filename.txt
or with perl:
perl -pi -e 's/\s{2,}/ /g' filename.txt
The {2,} regular expression construct (\{2,\} in POSIX Basic Regular Expressions, which sed and ed use) matches 2 or more of the previous token.
Both of these match any whitespace characters, not just space, because that's how your examples work. If the goal is to only compress multiple spaces, not spaces + tabs, switch out the [[:space:]] and \s for just a single space.
(Anything that modifies a file "in place", be it ed, sed -i, perl -i, or a regular editor, has a good chance that it's going to be using a temporary file under the hood, by the way. They just handle it for you so you don't have to do it manually like with your tr example.)
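For reference, the manual temporary-file approach can be wrapped in a small shell function. This is only a sketch: squeeze_in_place is a made-up helper name, and it assumes mktemp is available (common on Linux, but not required by POSIX).

```shell
# squeeze_in_place is a hypothetical helper name, not a standard command.
squeeze_in_place() {
    file=$1
    tmp=$(mktemp) || return 1
    # Replace every run of 2 or more whitespace characters with one space.
    # \{2,\} is POSIX BRE, so this avoids the GNU-only \s and \+.
    if sed 's/[[:space:]]\{2,\}/ /g' "$file" > "$tmp"; then
        # Copying back (rather than mv) keeps the original file's permissions.
        cat "$tmp" > "$file" && rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage on a throwaway file (name and contents are made up).
demo=$(mktemp)
printf 'I am   just    giving  spaces\n' > "$demo"
squeeze_in_place "$demo"
cat "$demo"
rm -f "$demo"
```

The original file is only overwritten after sed has finished successfully, which is the same guarantee the && in the tr example gives.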

Related

grep for a line in a file then remove the line

$ cat example.txt
Yields:
example
test
example
I want to remove 'test' string from this file.
$ grep -v test example.txt > example.txt
$ cat example.txt
$
The below works, but I have a feeling there is a better way!
$ grep -v test example.txt > example.txt.tmp;mv example.txt.tmp example.txt
$ cat example.txt
example
example
Worth noting that this is going to be on a file with over 10,000 lines.
Cheers
You could use sed,
sed -i '/test/d' example.txt
-i writes the changes back to the file, so you don't need to use a redirection operator.
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
You're doing it the right way but use an && before the mv to make sure the grep succeeded or you'll zap your original file:
grep -F -v test example.txt > example.txt.tmp && mv example.txt.tmp example.txt
I also added the -F option since you said you want to remove a string, not a regexp.
You COULD use sed -i but then you need to worry about figuring out and/or escaping sed delimiters and sed does not support searching for strings so you'd need to try to escape every possible combination of regexp characters in your search string to try to make sed treat them as literal chars (a process you CANNOT automate due to the position-sensitive nature of regexp chars) and all it'd save you is manually naming your tmp file since sed uses one internally anyway.
Oh, one other option - you could use GNU awk 4.* with "inplace editing". It also uses a tmp file internally like sed does but it does support string operations so you don't need to try to escape RE metacharacters and it doesn't have delimiters as part of the syntax to worry about:
awk -i inplace -v rmv="test" '!index($0,rmv)' example.txt
Any grep/sed/awk solution will run in the blink of an eye on a 10,000 line file.
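One detail worth keeping in mind with the grep ... && mv pattern: grep exits with status 1 when it prints no lines at all, so if every line of the file contains the string, the mv is skipped and the file is left unchanged. The sketch below treats both 0 and 1 as success. The function name remove_lines_containing is made up for illustration, and mktemp is assumed to be available.

```shell
# remove_lines_containing is a hypothetical helper name.
remove_lines_containing() {
    needle=$1
    file=$2
    tmp=$(mktemp) || return 1
    status=0
    # -F: treat the needle as a fixed string; -v: keep non-matching lines.
    grep -F -v -e "$needle" -- "$file" > "$tmp" || status=$?
    # grep exits 0 if it printed lines, 1 if it printed none, 2 or more on error.
    if [ "$status" -le 1 ]; then
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
    [ "$status" -le 1 ]
}

# Usage on a throwaway copy of the example data.
demo=$(mktemp)
printf 'example\ntest\nexample\n' > "$demo"
remove_lines_containing test "$demo"
cat "$demo"
rm -f "$demo"
```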

Trying to use grep to find something, then output a different part of the line

Say for instance I'm searching a line that is like this:
Color asdf
and I use grep to find that line, like grep asdf file.txt
How would I then display Color? Learning linux is hard.
With the command line tool sed you can replace strings by using regular expressions:
echo "Color asdf" | sed 's/\([^ ]*\).*/\1/'
This part: \([^ ]*\).* is a regular expression. The first part of the regex, [^ ]*, matches any character except a space as many times as possible, and whatever is between the \( and \) is captured in the backreference \1. Then you also match the remaining part of the string with .* and replace all of that with only the first word, which was captured by \([^ ]*\), by using \1 in the replacement part of the sed command.
Here is some more info about sed:
http://linux.about.com/od/commands/a/Example-Uses-Of-Sed-Cmdsedxa.htm
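The echo example above rewrites every line it is given. To combine it with the search, the substitution can be restricted to lines that match a pattern. The following is a sketch under a few assumptions: first_word_of_matches is a made-up helper name, and the pattern is assumed not to contain a / character, since it is pasted into the sed address as-is.

```shell
# first_word_of_matches is a hypothetical helper name.
first_word_of_matches() {
    pattern=$1
    file=$2
    # /pattern/ limits the s command to matching lines; -n together with
    # the p flag prints only those lines, after the substitution.
    sed -n "/$pattern/s/\([^ ]*\).*/\1/p" "$file"
}

# Usage on made-up sample data.
demo=$(mktemp)
printf 'Color asdf\nShape qwer\n' > "$demo"
first_word_of_matches asdf "$demo"
rm -f "$demo"
```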
You could use sed:
sed -n 's/[[:space:]][[:space:]]*asdf$//p' file.txt
Details:
The -n option tells sed not to print the pattern space automatically. Basically, it doesn't output anything unless you tell it to.
The s command of sed replaces text. Here, if a line ends with asdf, preceded by at least one whitespace character, we replace all of that with nothing and then print the line (notice the p flag at the end of the s command). The printing is only done if something was actually replaced. More information about the s command can be found e. g. in the GNU sed manual.
Edit for clarity: When using single quotes, parameter expansion does not work and thus, variables won't be replaced. To use variables, use double quotes:
search=asdf
sed -n "s/[[:space:]][[:space:]]*${search}\$//p" file.txt
If you'd really like to use grep here, you could pipe the output from grep into cut:
grep -h asdf *.txt | cut -s -d ' ' -f 1
Note that the space after -d has to be quoted (or backslash-escaped): it is the argument that tells cut to use a blank as the field delimiter (I'm assuming your fields are blank-delimited rather than tab-delimited). The unquoted space after it separates the -d option from the following option (-f).
But, yeah, sed or awk are probably your friends here... :-)
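Since awk is mentioned but not shown, here is one possible awk version. It is a sketch that assumes whitespace-separated fields with the search word in the last field; first_field_where_last_is is a made-up helper name.

```shell
# first_field_where_last_is is a hypothetical helper name.
first_field_where_last_is() {
    value=$1
    file=$2
    # $NF is the last field on the line, $1 is the first.
    awk -v want="$value" '$NF == want { print $1 }' "$file"
}

# Usage on made-up sample data.
demo=$(mktemp)
printf 'Color asdf\nShape qwer\n' > "$demo"
first_field_where_last_is asdf "$demo"
rm -f "$demo"
```

Unlike the grep | cut pipeline, this compares the whole field, so a line ending in asdfgh would not be selected.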
You can highlight the pattern in the line using grep:
grep --colour -o 'asdf' file.txt
Edit: the -o option prints only the matched text, not the whole line.

Replace string between square brackets with sed

I have some strings in a textfile that look like this:
[img:3gso40ßf]
I want to replace them to look like normal BBCode:
[img]
How can I do that with sed? I tried this one but it doesn't do anything:
sed -i 's/^[img:.*]/[img]/g' file.txt
Escape those square brackets
Square brackets are metacharacters: they have a special meaning in POSIX regular expressions. If you mean [ and ] literally, you need to escape those characters in your regexp:
$ sed -i .bak 's/\[img:.*\]/\[img\]/g' file.txt
Use [^]]* instead of .*
Because * is greedy, .* will capture more than what you want; see Jidder's comment. To fix this, use [^]]*, which captures a sequence of characters up to (but excluding) the first ] encountered.
$ sed -i .bak 's/\[img:[^]]*\]/\[img\]/g' file.txt
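The difference between the greedy and the bounded pattern is easiest to see on a line with more than one pair of brackets. The sample line below is made up for illustration, and the commands read from a pipe rather than editing a file.

```shell
# Made-up sample line with two img tags and another bracketed tag.
line='[img:aaaa1111] text [img:bbbb2222] more [b]bold[/b]'

# Greedy: .* runs to the LAST ] on the line, swallowing everything.
greedy=$(printf '%s\n' "$line" | sed 's/\[img:.*\]/[img]/g')

# Bounded: [^]]* stops at the first ] after each [img:
bounded=$(printf '%s\n' "$line" | sed 's/\[img:[^]]*\]/[img]/g')

printf '%s\n' "$greedy"    # [img]
printf '%s\n' "$bounded"   # [img] text [img] more [b]bold[/b]
```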
Are you using an incorrect sed -i syntax?
(Thanks to j.a. for his comment.)
Depending on the flavour of sed that you're using, you may be allowed to use sed -i without specifying any <extension> argument, as in
$ sed -i 's/foo/bar/' file.txt
However, in other versions of sed, such as the one that ships with Mac OS X, sed -i expects a mandatory <extension> argument, as in
$ sed -i .bak 's/foo/bar/' file.txt
If you omit that extension argument (.bak, here), you'll get a syntax error. You should check out your sed's man page to figure out whether that argument is optional or mandatory.
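One way to sidestep the difference, as far as I know, is to attach the suffix directly to -i with no space, a form that both GNU sed and the BSD sed shipped with Mac OS X are generally reported to accept. The sketch below wraps that in a function; sed_in_place is a made-up helper name, and the backup is deleted only if sed succeeded.

```shell
# sed_in_place is a hypothetical helper name.
sed_in_place() {
    script=$1
    file=$2
    # -i.bak (suffix attached, no space) asks sed to keep a backup as file.bak.
    sed -i.bak "$script" "$file" && rm -f "$file.bak"
}

# Usage on a throwaway file.
demo=$(mktemp)
printf 'foo\n' > "$demo"
sed_in_place 's/foo/bar/' "$demo"
cat "$demo"
rm -f "$demo"
```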
Match a specific number of characters
Is there a way to tell sed that there are always 8 random characters after the colon?
Yes, there is. If the number of characters between the colon and the closing square bracket is always the same (8, here), you can make your command more specific:
$ sed -i .bak 's/\[img:[^]]\{8\}\]/\[img\]/g' file.txt
Example
# create some content in file.txt
$ printf "[img:3gso40ßf]\nfoo [img:4t5457th]\n" > file.txt
# inspect the file
$ cat file.txt
[img:3gso40ßf]
foo [img:4t5457th]
# carry out the substitutions
$ sed -i .bak 's/\[img:[^]]\{8\}\]/\[img\]/g' file.txt
# inspect the file again and make sure everything went smoothly
$ cat file.txt
[img]
foo [img]
# if you're happy, delete the backup that sed created
$ rm file.txt.bak

clean letters and characters in files leaving only numbers using bash

I am reading files and doing something like:
cat file | sed s/\ //g |awk '$0 !~ /[^0-9]/'
With this line I want to strip out everything that is not a digit.
But I have a problem: when the file is not sorted the command works fine, but with a sorted file it does not work and the output is empty.
Can anyone help me?
grep -o '[0-9]+' does not work either, because:
I have a file like:
311435ll3e
kk13322;.
erre433
The output is:
311435
3
13322
433
The 3 ends up on a separate second line; the output that I need is:
3114353
13322
433
As a general rule, there is no reason to have both awk and sed appearing in the same pipe, due to a large overlap of capability, and frequently the same is true of awk/grep/sed combinations.
If you just want to suppress the non-digit characters within lines of characters, use (eg) sed -e 's/[^0-9]//g' file, or if you want to do it in place with no backup, sed -i -e 's/[^0-9]//g' file, or in place with backup to a .bak file, sed -i.bak -e 's/[^0-9]//g' file.
To suppress blank lines, you can append | grep -v '^$' after the sed, but it's more efficient to just use sed's d command to delete the pattern space and start next cycle if the pattern space is empty. For example,
sed -e 's/[^0-9]//g; /^$/d' file
does a d if the line is empty after substitution.
The form suggested in 1_CR's comment,
sed -e 's/[^0-9]//g' -e '/./!d'
is an alternative. That form tests if the line has at least one character in it, and if so does not do a d.
If you want to suppress everything in the file that's not digits, use tr -cd 0-9 < file. This suppresses line feeds also.
Note, the form tr -cd [0-9] < file or tr -cd '[0-9]' < file is not correct; it will fail to suppress ] and [ characters because tr will regard them as part of SET1.
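Applied to the sample data from the question, the sed form can be tried out like this. The function name digits_only is made up for illustration; it reads standard input so nothing is modified on disk.

```shell
# digits_only is a hypothetical helper name; it filters stdin to stdout.
digits_only() {
    # Drop every non-digit, then delete lines that ended up empty.
    sed -e 's/[^0-9]//g' -e '/^$/d'
}

printf '311435ll3e\nkk13322;.\nerre433\n' | digits_only
# 3114353
# 13322
# 433
```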

How to remove a special character in a string in a file using linux commands

I need to remove the character : from a file. Ex: I have numbers in the following format:
b3:07:4d
I want them to be like:
b3074d
I am using the following command:
grep ':' source.txt | sed -e 's/://' > des.txt
I am new to Linux. The file is quite big & I want to make sure I'm using the right command.
You can do without the grep:
sed -e 's/://g' source.txt > des.txt
The -i option edits the file in place.
sed -i 's/://g' source.txt
The first part (the grep) isn't right, as it will completely omit lines that don't contain :
The command below is untested but should be right. The g at the end of the s command stands for global, meaning it replaces every match on the line.
sed -e 's/://g' source.txt > out.txt
Updated to the better syntax from Jon Lin's answer, but you still want the /g, I would think.
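The effect of the g flag is easy to check on the sample value from the question. The tr line is an alternative not mentioned above, added here only for comparison.

```shell
# Without g, only the first colon on each line is removed.
first_only=$(printf 'b3:07:4d\n' | sed 's/://')

# With g, every colon on the line is removed.
all_sed=$(printf 'b3:07:4d\n' | sed 's/://g')

# tr -d deletes every occurrence of the listed characters.
all_tr=$(printf 'b3:07:4d\n' | tr -d ':')

printf '%s\n' "$first_only"   # b307:4d
printf '%s\n' "$all_sed"      # b3074d
printf '%s\n' "$all_tr"       # b3074d
```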

Resources