delete strings from file bash - string

I have a file with tons of call logs and I am trying to clean it up using bash. I figured out how to search for a string and delete the entire line it is on but that isn't what I want to accomplish.
I want to search for a string and remove only that string. For example, there are tons of MAC addresses in the file, such as MAC:00-0A-DD-84-01-33, and I want to remove them all.
There is also a call ID at the beginning of each line that looks like: 354469805 or 354469894 and I want to remove all of those as well.
I'm just starting in bash so please excuse my ignorance. I am entering 2 lines of the call log below for clarification. I want to delete the 3544 number, the MAC address, and the word TelePacific.
354469725 06/24/2013 09:34 00:03:26 Chante Squires 105 TelePacific MAC:00-0A-DD-84-01-1D TelePacific 17025290701 1
354469732 06/24/2013 09:59 00:01:16 Chante Squires 105 TelePacific MAC:00-0A-DD-84-01-1D TelePacific 12132238375 1

You could use sed:
sed -i 's/^[0-9]\{9\}\|MAC:[0-9A-Fa-f]\{2\}\([-\:][0-9A-Fa-f]\{2\}\)\{5\}//g' input.log
Between the 's/ and //g' is a regular expression that matches the call ID and the MAC address from your question. The s command in front means "substitute" (search and replace) whatever the regular expression matches. The empty replacement between the last two slashes means the match is replaced with nothing. The g flag at the end means "replace all matches" if there is more than one in a line. Finally, the -i switch means "edit the file in place".
This solution assumes that your call IDs are all 9 digits and that the MAC address has six groups of two hexadecimal digits separated by dashes or colons.
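The sed answer above leaves the word TelePacific in place, which the question also wanted gone. The following is only a sketch of one way to cover all three removals, tried on a sample line from the question. The function name clean_log is made up for illustration, and the patterns assume 9-digit call IDs and dash- or colon-separated MAC addresses, as stated above.

```shell
# clean_log is a hypothetical helper name, not part of the original answer.
# It reads log lines on stdin and writes the cleaned lines to stdout.
clean_log() {
  # 1) drop the 9-digit call ID and the space after it
  # 2) drop every MAC:xx-xx-xx-xx-xx-xx field (and the space before it)
  # 3) drop every occurrence of the word TelePacific (and the space before it)
  sed -E \
    -e 's/^[0-9]{9} //' \
    -e 's/ ?MAC:[0-9A-Fa-f]{2}([-:][0-9A-Fa-f]{2}){5}//g' \
    -e 's/ ?TelePacific//g'
}

# Sample line copied from the question.
line='354469725 06/24/2013 09:34 00:03:26 Chante Squires 105 TelePacific MAC:00-0A-DD-84-01-1D TelePacific 17025290701 1'
result=$(printf '%s\n' "$line" | clean_log)
printf '%s\n' "$result"
```

Running it on a copy of the log first (clean_log < input.log > cleaned.log) is safer than editing in place with -i.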

One way with awk (you will lose the original tabs and spacing; every field will be separated by a single space):
awk '{for(i=2;i<NF;i++) if(8>i || i>10) printf "%s ", $i; print $NF}' log

Related

Simple way to remove multi-line string using sed

Using sed, is there a way to remove multiple lines from a text file based on some starting and ending expressions?
I have known markers in the file and want to remove everything between (markers inclusive). I have seen some really complicated solutions and I would like to do this without resorting to micro commands.
My file looks something like this:
cat /tmp/foobar.txt
this is line 1
this is line 3
tomcat.util.scan.StandardJarScanFilter.jarsToSkip=\
annotations-api.jar,\
ant-junit*.jar,\
ant-launcher.jar,\
ant.jar,\
asm-*.jar,\
aspectj*.jar,\
bootstrap.jar,\
catalina-ant.jar,\
catalina-ha.jar,\
catalina-ssi.jar,\
catalina-storeconfig.jar
the end leave me
and me
I want to remove everything starting at tomcat.util all the way to the last .jar
tl;dr
I think this is the simplest way, and there is no need for the assembly-like micro commands
sed '/^tomcat\.util.*$/,/^.*[^\]$/d' /tmp/foobar.txt
which produces
this is line 1
this is line 3
the end leave me
and me
If you want to remove the lines from the file itself rather than print the result to stdout, use the in-place flag (-i):
sed -i '/^tomcat\.util.*$/,/^.*[^\]$/d' /tmp/foobar.txt
So... how does this work?
sed commands, like vi commands, operate on an address. Normally we don't specify an address, and the command then applies to all lines of the file. For example, when replacing "the" with "that" in a file we'd normally do
sed -i 's/the/that/g' /tmp/foobar.txt
i.e. applying the substitute (s) command to all lines in the file.
In this case you want to delete some lines so we can use the delete or d command. But we need to tell it where to delete. So we need to give it an address.
The format of a sed command is
[addr][!]command[options]
(see the docs )
If no address is specified then the command is applied to all lines; if the ! is specified then it is applied to all lines that don't match the address. So far so good.
The trick here is that addr can be a single address or a range of addresses. The address can be a line number or a regex pattern. You use a , between two addresses to specify a range.
So to delete lines 5 to 8 inclusive you could do
sed -i '5,8d' /tmp/foobar.txt
In this case, rather than knowing the line numbers, we know some "markers" and can use regexes instead. The first marker, a line starting with tomcat.util, is found by the regex
/^tomcat\.util.*$/
The second marker is a bit more tricky but if we look we can see that the final line to remove is the first one that does not end with a \, so we can match a line that consists of "anything but does not end with \"
/^.*[^\]$/
On its own the second regex would match a whole bunch of lines, but when the two regexes form a range, the second address means the first line after the start of the range that matches it.
Putting that all together, we want to delete (d) all lines in the range that starts at the line matching the regex for tomcat.util and ends at the first following line that does not end in \, i.e.
sed '/^tomcat\.util.*$/,/^.*[^\]$/d' /tmp/foobar.txt
hope that helps ;-)
Cheers
Karl
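The range-address command from the answer above can be tried without touching a real file. This is a minimal sketch on a shortened, made-up copy of the question's data; the variable names sample and result are illustrative only, and the regexes are trimmed versions of the ones explained above (the leading ^.* and trailing .*$ are not needed for the match).

```shell
# Shortened, hypothetical version of /tmp/foobar.txt from the question.
sample=$(cat <<'EOF'
this is line 1
tomcat.util.scan.StandardJarScanFilter.jarsToSkip=\
annotations-api.jar,\
catalina-storeconfig.jar
the end leave me
EOF
)

# Delete from the first line starting with tomcat.util up to and including
# the next line whose last character is not a backslash.
result=$(printf '%s\n' "$sample" | sed '/^tomcat\.util/,/[^\]$/d')
printf '%s\n' "$result"
```

Note that a completely empty line inside the block would not end the range, because [^\]$ needs at least one character to match.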
Awk is generally more useful than sed for anything spanning lines. Using any awk in any shell on every Unix box:
$ awk '!/\.jar/{f=0} /tomcat\.util/{f=1} !f' file
this is line 1
this is line 3
the end leave me
and me
This might work for you (GNU sed):
sed -n '/tomcat\.util/{:a;n;/\.jar/ba};p' file
Turn off implicit printing using the -n option.
Match on a line containing tomcat.util.
Keep fetching (and discarding) the following lines for as long as they contain .jar.
Print all other lines.
Alternative:
sed -E '/tomcat\.util/{:a;$!N;/\.jar(,\\)?$/s/\n//;ta;D}' file
Gather up lines beginning tomcat.util and ending either .jar,\ or .jar, removing newlines until the end-of-file or a mis-match and then delete the collection.

Linux remove whitespace first line

I have the file virt.txt, which contains:
0302 000000 23071SOCIETY 117
0602 000000000000000001 PAYMENT BANK
I want to remove the 3 whitespace characters in columns 6 to 8, on the first line only.
I tried:
sed '1s/[[:blank:]]+[[:blank:]]+[[:blank:]]//6' virt.txt
It doesn't work.
Please help.
Your regex would consume all the available blanks from a sequence of three or more (in a quite inefficient way) and replace the sixth occurrence of that. Because your first input line does not contain six or more separate stretches of three or more whitespace characters, it actually did nothing. But you can in fact use sed to do exactly what you say you want:
sed '1s/^\(.....\)   /\1/' virt.txt
(or for convenience, if you have sed -E or the variant sed -r which works on some platforms, but neither of these is standard):
sed -E '1s/^(.{5}) {3}/\1/' virt.txt # -E is not portable
The parentheses capture the first five characters into a back reference, and we then use the first back reference \1 as the replacement string, effectively replacing only the text which matched outside the parentheses.
If your sed supports the -i option, you can use that to modify the file directly; but this is also not standard, so the most portable solution is to write the result to a new file, then move it back on top of the original file if you want to replace it.
sed is convenient if you are familiar with it, but as you are clearly not, perhaps a better approach would be to use a different language, ideally one which is not write-only for many users, like sed.
If you know the three characters will always be spaces, just do a static replacement.
awk 'NR==1 { $0 = substr($0, 1, 5) substr($0, 9) } 1' virt.txt
On the first line (NR is the current input line number) replace the input line $0 with a catenation of the substrings on both sides of the part you want to cut.
For a simple replacement like that, you can also use basic Unix text manipulation utilities, though it's rather inefficient and inelegant:
head -n 1 virt.txt | cut -c1-5,9- >newfile.txt
tail -n +2 virt.txt >>newfile.txt
If you need to check that the three characters are spaces, the Awk script only needs a minor tweak.
awk 'NR==1 && /^.{5} {3}/ { $0 = substr($0, 1, 5) substr($0, 9) } 1' virt.txt
You should vaguely recognize the regex from above. Awk is less succinct, but as a consequence also quite a lot more readable, than sed.
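To see the column cut in action, here is a small sketch using the awk command from above. The two input lines are made up (the exact spacing of virt.txt is not visible in the question) and are built with printf so that the number of spaces is unambiguous: four characters, four spaces, then the rest.

```shell
# Hypothetical two-line input: "0302" + 4 spaces + "000000" on line 1,
# "0602" + 4 spaces + "PAYMENT" on line 2. Columns 6-8 of line 1 are spaces.
input=$(printf '0302%4s000000\n0602%4sPAYMENT' '' '')

# On line 1 only, keep columns 1-5 and everything from column 9 onwards.
result=$(printf '%s\n' "$input" |
  awk 'NR==1 { $0 = substr($0, 1, 5) substr($0, 9) } 1')
printf '%s\n' "$result"
```

The first line comes out with a single space between the two fields, while the second line keeps all four of its spaces.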

how to transpose values two by two using shell?

I have my data in a file, stored in lines like this:
3.172704445659,50.011996744997,3.1821975358417,50.012335988197,3.2174797791605,50.023182479597
And I would like 2 columns :
3.172704445659 50.011996744997
3.1821975358417 50.012335988197
3.2174797791605 50.023182479597
I know the sed command to replace ',' (sed "s/,/ /"), but I don't know how to insert a line break after every two numbers.
Do you have any ideas ?
One in awk:
$ awk -F, '{for(i=1;i<=NF;i++)printf "%s%s",$i,(i%2&&i!=NF?OFS:ORS)}' file
Output:
3.172704445659 50.011996744997
3.1821975358417 50.012335988197
3.2174797791605 50.023182479597
Solution viable for those without knowledge of awk command - simple for loop over an array of numbers.
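# Split the first line of the file on commas into an array.
IFS=',' read -ra NUMBERS < file
NUMBERS_ON_LINE=2
INDEX=0
for NUMBER in "${NUMBERS[@]}"; do
    if (($INDEX==$NUMBERS_ON_LINE-1)); then
        # Last number of the pair: print it and end the line.
        INDEX=0
        echo "$NUMBER"
    else
        # First number of the pair: print it followed by a space.
        ((INDEX++))
        echo -n "$NUMBER "
    fi
done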
Since you already tried sed, here is a solution using sed:
sed -r "s/(([^,]*,){2})/\1\n/g; s/,\n/\n/g" YOURFILE
-r uses sed extended regexp
there are two substitutions used:
the first substitution, with the (([^,]*,){2}) part, captures two comma separated numbers at once and store them into \1 for reuse: \1 holds in your example at the first match: 3.172704445659,50.011996744997,. Notice: both commas are present.
(([^,]*,){2}) means capture a sequence consisting of NOT comma - that is the [^,]* part followed by a ,
we want two such sequences - that is the (...){2} part
and we want to capture it for reuse in \1 - that is the outer pair of parentheses
then substitute with \1\n - that just inserts the newline after the match, in other words a newline after each second comma
as we have now a comma before the newline that we need to get rid of, we do a second substitution to achieve that:
s/,\n/\n/g
a comma followed by newline is replaced with only newline - in other words the comma is deleted
awk and sed are powerful tools, and in fact constitute programming languages in their own right. So, they can, of course, handle this task with ease.
But so can bash, which will have the benefits of being more portable (no outside dependencies), and executing faster (as it uses only built-in functions):
IFS=$', \n'
values=($(</path/to/file))
printf '%s %s\n' "${values[@]}"
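For comparison, here is a sketch of a different technique that none of the answers above use: tr turns every comma into a newline, and paste glues each pair of lines back together with a space. The function name pairs is made up for illustration, and the input is a shortened copy of the question's data.

```shell
# pairs is a hypothetical helper name. It reads comma-separated values on
# stdin and prints them two per line, separated by a space.
pairs() {
  # tr: one value per line; paste - -: join every two lines with a space.
  tr ',' '\n' | paste -d ' ' - -
}

result=$(printf '%s\n' '3.172704445659,50.011996744997,3.1821975358417,50.012335988197' | pairs)
printf '%s\n' "$result"
```

Because the values are treated as plain text, they are printed exactly as they appear in the input, with no rounding or padding.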

How to use invert "-v" in grep when I do not have a file but a long string that is just one line?

Suppose I have
echo "The first part. The second part. The third part."
and want to remove The first part and The third part to get:
The second part.
I tried:
echo "The first part. The second part. The third part." | grep -v -e "The first part." -e "The third part."
but the inverting flag appears to work only for files with multiple lines. How can I do it for a single string?
Use sed instead:
echo "The first part. The second part. The third part." \
| sed -e 's/[[:space:]]*The first part\.[[:space:]]*//g' \
-e 's/[[:space:]]*The third part\.[[:space:]]*//g'
grep is a line-based tool: it selects the lines which satisfy a condition. The task you want to implement is removing substrings from a file, which is a substitution rather than a selection. The best tool for this task is sed:
sed 's/string_to_get_rid_of//g' file
Of course, it is possible that your file is structured in records and you want to remove all records which contain a particular word; then there is another option. Assume that your file is split into records delimited by a unique character (e.g. the full stop, "."). Then it is better to use awk. Awk allows you to redefine its record separator from a newline (the default) to anything you want by defining RS, and ORS for the output:
awk 'BEGIN{RS=ORS="."}/string_that_should_not_appear/{next}1' file
Assume you have a file with the content:
foo.bar.baz.qux
quux.quuz.corge
If we want to remove all the records which contain qux, we do:
awk 'BEGIN{RS=ORS="."}/qux/{next}1' file
which returns
foo.bar.baz.quuz.corge.
Notice that the record containing "qux" contained a newline and that an extra ORS is added at the end. Also, you might get
foo.bar.baz.quuz.corge
.
This happens because, per POSIX, text files end with a newline, and that newline becomes part of the last record.
In case of the OP, it would read:
awk 'BEGIN{RS=ORS="."}/The first part/{next}/The third part/{next}1' file
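Applied to the single string from the question, the record-based awk approach behaves as sketched below. The string is fed in with printf '%s' so that no trailing newline ends up as an extra record; that choice, and the variable names, are assumptions made for this illustration.

```shell
# Each sentence becomes one record because the record separator is ".".
input='The first part. The second part. The third part.'

result=$(printf '%s' "$input" |
  awk 'BEGIN { RS = ORS = "." } /The first part/ { next } /The third part/ { next } 1')

# The surviving record keeps the space that followed the previous full stop,
# so strip one leading space with a POSIX parameter expansion.
trimmed=${result# }
printf '%s\n' "$trimmed"
```

The leading space is the price of splitting on "." alone; the sed answer above avoids it by matching the surrounding whitespace as part of each pattern.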

A Linux Shell Script Problem

I have a string separated by dot in Linux Shell,
$example=This.is.My.String
I want to
1.Add some string before the last dot, for example, I want to add "Good.Long" before the last dot, so I get:
This.is.My.Goood.Long.String
2.Get the part after the last dot, so I will get
String
3.Turn the dot into underscore except the last dot, so I will get
This_is_My.String
If you have time, please explain a little bit, I am still learning Regular Expression.
Thanks a lot!
I don't know what you mean by 'Linux Shell' so I will assume bash. This solution will also work in zsh, etcetera:
example=This.is.My.String
before_last_dot=${example%.*}
after_last_dot=${example##*.}
echo ${before_last_dot}.Goood.Long.${after_last_dot}
This.is.My.Goood.Long.String
echo ${before_last_dot//./_}.${after_last_dot}
This_is_My.String
The interim variables before_last_dot and after_last_dot should explain my usage of the % and ## operators. The //, I also think is self-explanatory but I'd be happy to clarify if you have any questions.
This doesn't use sed (or even regular expressions), but bash's inbuilt parameter substitution. I prefer to stick to just one language per script, with as few forks as possible :-)
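Since the question asked for some explanation, here is a small sketch of what the trimming operators do. The % family removes a matching suffix and the # family removes a matching prefix; a single character means the shortest match and a doubled character means the longest. The variable names on the left are made up for illustration.

```shell
example=This.is.My.String

# % and %%: strip a suffix matching the glob pattern ".*"
shortest_suffix_removed=${example%.*}    # cut from the LAST dot:  This.is.My
longest_suffix_removed=${example%%.*}    # cut from the FIRST dot: This

# # and ##: strip a prefix matching the glob pattern "*."
shortest_prefix_removed=${example#*.}    # cut up to the FIRST dot: is.My.String
longest_prefix_removed=${example##*.}    # cut up to the LAST dot:  String

printf '%s\n' "$shortest_suffix_removed" "$longest_suffix_removed" \
              "$shortest_prefix_removed" "$longest_prefix_removed"
```

These four operators are POSIX and take glob patterns, not regular expressions. The ${var//./_} replacement form used in the answer is an extension found in bash, ksh, and zsh.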
Other users have given good answers for #1 and #2. There are some disadvantages to some of the answers for #3. In one case, you have to run the substitution twice. In another, if your string has other underscores they might get clobbered. This command works in one go and only affects dots:
sed 's/\(.*\)\./\1\n./;h;s/[^\n]*\n//;x;s/\n.*//;s/\./_/g;G;s/\n//'
It splits the line before the last dot by inserting a newline and copies the result into hold space:
s/\(.*\)\./\1\n./;h
removes everything up to and including the newline from the copy in pattern space and swaps hold space and pattern space:
s/[^\n]*\n//;x
removes everything after and including the newline from the copy that's now in pattern space
s/\n.*//
changes all dots into underscores in the copy in pattern space and appends hold space onto the end of pattern space
s/\./_/g;G
removes the newline that the append operation adds
s/\n//
Then the sed script is finished and the pattern space is output.
At the end of each numbered step (some consist of two actual steps):
Step  Pattern Space         Hold Space
1     This.is.My\n.String   This.is.My\n.String
2     This.is.My\n.String   .String
3     This.is.My            .String
4     This_is_My\n.String   .String
5     This_is_My.String     .String
Solution
1.
Two versions of this, too:
Complex: sed 's/\(.*\)\([.][^.]*$\)/\1.Goood.Long\2/'
Simple: sed 's/.*\./&Goood.Long./' - thanks Dennis Williamson
2.
What do you want?
Complex: sed 's/.*[.]\([^.]*\)$/\1/'
Simpler: sed 's/.*\.//' - thanks, glenn jackman.
3.
sed 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/g'
With 3, you probably need to run the substitute (in its entirety) at least twice, in general.
Explanation
Remember, in sed, the notation \(...\) is a 'capture' that can be referenced as '\1' or similar in the replacement text.
1. Capture everything up to a string starting with a dot followed by a sequence of non-dots (which you also capture); replace by what came before the last dot, the new material, and the last dot and what came after it.
2. Ignore everything up to the last dot followed by a capture of a sequence of non-dots; replace with the capture only.
3. Find and capture a sequence of non-dots, a dot (not captured), followed by a sequence of non-dots and a dot; replace the first dot with an underscore. This is done globally, but the second and subsequent matches won't touch anything already matched. Therefore, I think you need ceil(log2(N+1)) passes, where N is the number of dots to be replaced. One pass deals with 1 dot to replace; two passes deal with 2 or 3; three passes deal with 4-7, and so on.
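The repeated passes described above can also be left to sed itself. The sketch below is a different technique from the answer's single substitution: it uses a label and the t command to repeat a substitution until nothing changes. Each round replaces the first dot that still has another dot somewhere after it, so the last dot is never touched. The function name dots_to_underscores is made up for illustration.

```shell
# dots_to_underscores is a hypothetical helper name.
dots_to_underscores() {
  # :a  defines a label
  # s   replaces the first dot that is followed (anywhere) by another dot
  # ta  jumps back to :a if the substitution changed something
  sed -e ':a' -e 's/\.\(.*\.\)/_\1/' -e 'ta'
}

result=$(printf '%s\n' 'This.is.My.String' | dots_to_underscores)
single=$(printf '%s\n' 'a.b' | dots_to_underscores)
printf '%s\n' "$result" "$single"
```

A string with only one dot passes through unchanged, because there is no second dot for the pattern to find.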
Here's a version that uses Bash's regex matching (Bash 3.2 or greater).
[[ $example =~ ^(.*)\.(.*)$ ]]
echo ${BASH_REMATCH[1]//./_}.${BASH_REMATCH[2]}
Here's a Bash version that uses IFS (Internal Field Separator).
saveIFS=$IFS
IFS=.
array=($example) # * split the string at each dot
lastword=${array[@]: -1}
unset "array[${#array[@]}-1]" # * remove the last element
IFS=_
echo "${array[*]}.$lastword" # The asterisk as a subscript when inside quotes causes IFS (an underscore in this case) to be inserted between each element of the array
IFS=$saveIFS
* use declare -p array after these steps to see what the array looks like.
1.
$ echo 'This.is.my.string' | sed 's}[^\.][^\.]*$}Good Long.&}'
This.is.my.Good Long.string
The pattern matches the run of non-dot characters at the end of the line; in the replacement, & stands for whatever the pattern matched.
2.
$ echo 'This.is.my.string' | sed 's}.*\.}}'
string
sed matches greedily, so it will extend the first closure (.*) as far as possible, i.e. to the last dot.
3.
$ echo 'This.is.my.string' | tr . _ | sed 's/_\([^_]*\)$/\.\1/'
This_is_my.string
convert all dots to _, then turn the last _ to a dot.
(caveat: this will turn 'This.is.my.string_foo' to 'This_is_my_string.foo', not 'This_is_my.string_foo')
You don't need regular expressions at all (those complex things hurt my eyes!) if you use Awk and are a little creative.
1. echo $example| awk -v ins="Good.long" -F . '{OFS="."; $NF = ins"."$NF;print}'
What this does:
-v ins="Good.long" tells awk to create a variable called 'ins' with "Good.long" as content,
-F . tells awk to use the dot as a separator for your fields for input,
OFS="." tells awk to use the dot as a separator for your fields as output,
NF is the number of fields, so $NF represents the last field,
the $NF=... part replaces the last field, it appends the current last string to what you want to insert (the variable called "ins" declared earlier).
2. echo $example| awk -F . '{print $NF}'
$NF is the last field, so that's all!
3. echo $example| awk -F . '{OFS="_"; $(NF-1) = $(NF-1)"."$NF; NF=NF-1; print}'
Here we have to be creative, as Awk AFAIK doesn't allow deleting fields. Of course, we set the output field separator to underscore.
$(NF-1) = $(NF-1)"."$NF: First, we replace the second last field with the last glued to the second last, with a dot between.
Then, we fool awk to make it think the Number of fields is equal to the number of fields minus one, hence deleting the last field!
Note you can't say $NF="", because then it would display two underscores.
