Remove string using sed or awk, grep


I'm trying to find and remove strings like:
[1126604244001,85.00], [1122204245002,85.00], [1221104246003,85.00],
[1222204247004,85.00], [1823304248005,85.00], [1424404249006,85.00],
85.00 is constant, i.e. the pattern is [xxxxxxxxxxxxx,85.00],
In Notepad++ this is simple:
find: "[^........].............,85.00]" and replace:""
I would like to use awk or sed to remove these strings automatically, without importing the file into Notepad++.
OK, I have a file:
temp.txt
[1126604244001,17.00], [1126604244001,17.00], [1126604244001,17.00],
[1126604244001,85.00], [1122204245002,85.00], [1221104246003,85.00],
[1222204247004,85.00], [1823304248005,85.00], [1424404249006,85.00], [1126604244001,17.00], [1126604244001,17.00],
My desired output:
temp.txt
[1126604244001,17.00],[1126604244001,17.00],[1126604244001,17.00],[1126604244001,17.00],[1126604244001,17.00],
Thx in advance!

With sed, simply:
sed 's/\[[^]]*,85\.00\],[[:space:]]*//g' filename
With this, everything that matches the regex \[[^]]*,85\.00\],[[:space:]]* is removed. The regex matches [ followed by any number of characters that are not ], followed by ,85.00], and any trailing whitespace; the dots are escaped so that they match only a literal dot. The only syntactically tricky bit is the [^]] character set, which matches all characters other than ].
Alternatively with awk:
awk -v RS='],' -v ORS='],' '!/,85\.00$/' filename
This splits the input into records delimited by ], and prints only those that don't end with ,85.00. A multi-character RS works in GNU awk and mawk; POSIX leaves its behavior unspecified.
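To get from the sed result to the single-line output the question asks for, one option is to strip the remaining whitespace afterwards. This is only a sketch on the sample data from the question, and it assumes that spaces and line breaks in the file carry no meaning; the variable names input and result are made up for the example.

```shell
# Sample input from the question, one printf argument per line
input=$(printf '%s\n' \
  '[1126604244001,17.00], [1126604244001,17.00], [1126604244001,17.00],' \
  '[1126604244001,85.00], [1122204245002,85.00], [1221104246003,85.00],' \
  '[1222204247004,85.00], [1823304248005,85.00], [1424404249006,85.00], [1126604244001,17.00], [1126604244001,17.00],')

# 1) drop every [...,85.00], entry (dots escaped so they match literally)
# 2) assumption: whitespace is not significant, so delete spaces and newlines
result=$(printf '%s\n' "$input" |
  sed 's/\[[^]]*,85\.00\],[[:space:]]*//g' |
  tr -d '[:space:]')

printf '%s\n' "$result"
```

Without the tr step, sed leaves the spaces between the 17.00 entries and an empty line where the second input line used to be.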

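grep -Ev '[^0-9]85\.00]' YourFile
This removes every (non-empty) line that contains the pattern. It drops whole lines, so on a line that mixes 85.00 entries with others, the other entries are lost as well.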

Related

Replace multiline string with sed

I have a file that's basically an INI/CFG file that looks like this:
[thing-a]
attribute1=foo
attribute2=bar
attribute3=foobar
attribute4=barfoo
[thing-b]
attribute1=dog
attribute3=foofoo
attribute4=castles
[thing-c]
attribute1=foo
attribute4=barfoo
[thing-d]
attribute1=123455
attribute2=dogs
attribute3=biscuits
attribute4=1234
Each 'thing' has a set of attributes that could include all the same ones or a subset thereof.
I am trying to write a small bash script that replaces the attributes for 'thing-c' with a predefined block ($a1, $a2 and $a3 are generated elsewhere in the wider script):
NEW_BLOCK="[thing-c]
attribute1=${a1}
attribute2=${a2}
attribute3=${a3}"
I can find the right block with sed like this:
THING_BLOCK=$(sed -nr "/^\[thing-c\]/ { :l /^\s*[^#].*/ p; n; /^\[/ q; b l; }" ./myThingFile)
I'm not sure whether I've gone down a rabbit hole with this, and I'm pretty sure there is a better way of doing it.
What I want to do is:
sed "s/${THING_BLOCK}/${NEW_BLOCK}/"
But I can't quite figure out the multiline aspect of this, and I'm not sure what the best route is.
Is there a way to do this sort of multiline find and replace with sed (or a better way with bash)?

Is there a way to do this sort of multiline find and replace ...
Yes there is indeed a better way, albeit using awk:
awk -v blk="$NEW_BLOCK" -v RS= '{ORS = RT} $1 == "[thing-c]" {$0 = blk} 1' file
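Setting -v RS= (an empty record separator) puts awk in paragraph mode, so records are split on blank lines rather than on every newline; this assumes the blocks in the file are separated by blank lines. RT, which holds the text that actually terminated each record, is specific to GNU awk.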
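For awks without RT, a rough equivalent can be sketched with a fixed output separator. This is an illustration, not the answer's exact command: the sample input, the replacement values and the variable names are made up, blocks are assumed to be separated by blank lines, and the replacement is passed through the environment instead of -v because some awks reject newlines in -v values.

```shell
# Hypothetical sample input: blocks separated by blank lines
input='[thing-b]
attribute1=dog

[thing-c]
attribute1=foo
attribute4=barfoo

[thing-d]
attribute1=123455'

# Hypothetical replacement block
NEW_BLOCK='[thing-c]
attribute1=new1
attribute2=new2'

# RS= enables paragraph mode; ORS re-inserts one blank line after each block.
# In paragraph mode a newline also separates fields, so $1 is the header line.
result=$(printf '%s\n' "$input" |
  NEW_BLOCK="$NEW_BLOCK" awk -v RS= -v ORS='\n\n' '
    $1 == "[thing-c]" { $0 = ENVIRON["NEW_BLOCK"] }  # swap the whole record
    { print }
  ')

printf '%s\n' "$result"
```

Unlike the RT version, this normalizes every gap between blocks to exactly one blank line.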

Another awk solution. Store the replacement block in file2, then:
$ awk -v RS="" '
NR==FNR {
b=$0
next
}
$1~/thing-c/ {
$0=b
}
{
print (++c==1?"":ORS) $0
}' file2 file1
Output:
[thing-a]
attribute1=foo
attribute2=bar
attribute3=foobar
attribute4=barfoo
[thing-b]
attribute1=dog
attribute3=foofoo
attribute4=castles
[thing-c]
attribute1=${a1}
attribute2=${a2}
attribute3=${a3}
[thing-d]
attribute1=123455
attribute2=dogs
attribute3=biscuits
attribute4=1234

If you want to use sed (IMHO awk is better here), you need "nice" data: no special characters that sed would try to interpret, and no [ inside the thing-c block.
I tested with
read -d '' -r NEW_BLOCK <<END
[thing-c]
attribute1=${a1}
attribute2=${a2}
attribute3=${a3}
END
For my solution I first need to replace newlines in $NEW_BLOCK with the two characters \n.
echo "This is the replacement string: ${NEW_BLOCK//$'\n'/\\n}"
With GNU sed's "-z" option, which treats the whole file as one NUL-terminated "line", you can do
sed -rz "s/\[thing-c\][^[]*/${NEW_BLOCK//$'\n'/\\n}\n\n/" myThingFile

Get Text after word at specific position

I have file like this
TT;12-11-18;text;abc;def;word
AA;12-11-18;tee;abc;def;gih;word
TA;12-11-18;teet abc;def;word
TT;12-11-18;tdd;abc;def;gih;jkl;word
I want output like this
TT;12-11-18;text;abc;def;word
TA;12-11-18;teet abc;def;word
I want to keep a line only if word occurs at position 5 after the date 12-11-18. I do not want lines where it appears later, i.e. at the 6th or 7th position. Positions are counted starting from the date 12-11-18.
I tried this command:
cat file.txt|grep "word" -n1
This prints every line in which the pattern word is matched. How should I solve my problem?

Try this (GNU awk):
awk -F"[; ]" '/12-11-18/ && $6=="word"' file
Or with sed:
sed -n '/12-11-18;\([^; ]*[; ]\)\{3\}word/p' file
Or grep with basically the same regex (different escaping):
grep -E "12-11-18;([^; ]*[; ]){3}word" file
[^; ] means any character that is not ; or a space.
* means zero or more repetitions of the preceding character/group.
-- [^; ]* means a string of any length that doesn't contain ; or a space; the ^ in [^; ] negates the set.
[; ] means a single ; or space.
() groups the above together.
{3} matches three repetitions of the preceding character/group.
As a whole, ([^; ]*[; ]){3} matches three fields separated by ; or space, including the delimiters.
As @kvantour points out, these can give wrong results if several spaces may occur in a row.
To treat multiple spaces as one separator, use:
awk -F"(;| +)" '/12-11-18/ && $6=="word"'
and
grep -E "12-11-18;([^; ]*(;| +)){3}word"
or GNU sed (posix/bsd/osx sed does not support |):
sed -rn '/12-11-18;([^; ]*(;| +)){3}word/p'
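As a quick check, here is the grep variant run against the four sample lines from the question. It is only a sketch; the variable names input and result are made up for the example.

```shell
# The four sample lines from the question
input=$(printf '%s\n' \
  'TT;12-11-18;text;abc;def;word' \
  'AA;12-11-18;tee;abc;def;gih;word' \
  'TA;12-11-18;teet abc;def;word' \
  'TT;12-11-18;tdd;abc;def;gih;jkl;word')

# After the date: exactly three fields, each ended by ";" or a run of spaces,
# and then "word" must follow immediately.
result=$(printf '%s\n' "$input" |
  grep -E '12-11-18;([^; ]*(;| +)){3}word')

printf '%s\n' "$result"
```

Only the first and third lines are printed, because in the other two word comes after more than three fields.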

remove from the string "89dde7.rqsnhq34h.fmu8s1vn0i94hl.tgz.tar.gz" only the ".tar.gz"

I want to remove only the .tar.gz part from the string 89dde7.rqsnhq34h.fmu8s1vn0i94hl.tgz.tar.gz, so the result should be 89dde7.rqsnhq34h.fmu8s1vn0i94hl.tgz.
There can also be files with extensions like this:
91xhq8vkxlkdfpmfg566qahrwkh01c7n0scpdsr4p4vf6.tbz.tar.bz2, and others with double extensions such as tar.tbz, tar.zip and so on.
For .tar.zip the result must be nomearchivio.tar; for 91xhq8vkxlkdfpmfg566qahrwkh01c7n0scpdsr4p4vf6.tbz.tar.bz2 it must be 91xhq8vkxlkdfpmfg566qahrwkh01c7n0scpdsr4p4vf6.tbz
I use this:
nameFile="89dde7.rqsnhq34h.fmu8s1vn0i94hl.tgz.tar.gz"
name=${nameFile%.*}
and the result is :
echo $name
89dde7.rqsnhq34h.fmu8s1vn0i94hl.tgz.tar
Can you help me? Thanks
P.S. Note that the file name also contains other dots.

Since you know exactly what you want to remove, just write it in full:
name=${nameFile%.tar.gz}
Or to remove the last two "extensions" .*.*:
name=${nameFile%.*.*}
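To see how the two expansions differ, here is a small sketch using the two file names from the question. The variable names f1, f2, fixed1 and so on are made up for the example.

```shell
# File names taken from the question
f1="89dde7.rqsnhq34h.fmu8s1vn0i94hl.tgz.tar.gz"
f2="91xhq8vkxlkdfpmfg566qahrwkh01c7n0scpdsr4p4vf6.tbz.tar.bz2"

# Fixed suffix: strips only a literal .tar.gz; other names are left unchanged
fixed1=${f1%.tar.gz}
fixed2=${f2%.tar.gz}

# Generic: strips the last two dot-separated parts, whatever they are
generic1=${f1%.*.*}
generic2=${f2%.*.*}

printf '%s\n' "$fixed1" "$fixed2" "$generic1" "$generic2"
```

Note that the generic form turns nomearchivio.tar.zip into nomearchivio; if nomearchivio.tar is wanted for that case, only one extension should be removed there (${nameFile%.*}).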

You could use sed and remove the last 7 characters (this only fits .tar.gz, which is exactly 7 characters long):
echo "$nameFile" | sed 's/.\{7\}$//'

You could give a try to awk, for example:
echo 89dde7.rqsnhq34h.fmu8s1vn0i94hl.tgz.tar.gz | awk -F '\.tgz' '{print $1".tgz"}'
It will output:
89dde7.rqsnhq34h.fmu8s1vn0i94hl.tgz
For other files:
echo "01c7n0scpdsr4p4vf6.tbz.tar.bz2" | awk -F '\.tbz' '{print $1".tbz"}'
It will output:
01c7n0scpdsr4p4vf6.tbz
In this case, awk uses your pattern (.tgz or .tbz) as the field separator via -F, prints everything to the left of it, and appends the desired extension again.

Trim a string up to 4th delimiter from right side

I have strings like the following, which should be parsed with Unix commands only (bash):
49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
I want to trim strings like the one above up to the 4th underscore from the end/right side. So the output should be
49_sftp_mac_myfile_simul_test
The number of underscores in the overall string can vary. For example, the string could be
49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
The output (after trimming up to the 4th underscore from the right) should be
49_sftp_simul_test

This is easily done with awk by reducing NF, the number of fields, by 4 after setting both the input and output field separators to underscore:
s='49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed'
awk 'BEGIN{FS=OFS="_"} {NF -= 4; $1=$1} 1' <<< "$s"
49_sftp_mac_myfile_simul_test

You can use bash's parameter expansion for that:
string="..."
echo "${string%_*_*_*_*}"
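Applied to the two sample strings from the question, the expansion can be checked like this. It is just a sketch; s1, s2, out1 and out2 are names invented for the example.

```shell
# The two sample strings from the question
s1='49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed'
s2='49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed'

# % removes the shortest suffix matching the pattern, i.e. everything
# from the 4th underscore (counted from the right) to the end
out1=${s1%_*_*_*_*}
out2=${s2%_*_*_*_*}

printf '%s\n' "$out1" "$out2"
```

If a string has fewer than four underscores the pattern does not match and the string is left unchanged.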

With GNU sed:
$ sed -E 's/(_[^_]*){4}$//' <<< "49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed"
49_sftp_mac_myfile_simul_test
From the end of line, removes 4 occurrences of _ followed by non _ characters.

Perl one-liner:
echo "$your_string" | perl -lne '$n++ while /_/g; print join "_",((split/_/)[-$n-1..-5])'
input
49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
the output
49_sftp_mac_myfile_simul_test
input
49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
the output
49_sftp_simul_test

Not the fastest, but maybe the easiest to remember and the funniest:
echo "49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed"|
rev | cut -d"_" -f5- | rev

csv file replace two character string with three character

I would like to replace a handful of strings with others (e.g. "GG" with "GGX", "GG " with "GGX", "FG" with "FGX", etc) in the first column of a big csv file using a shell command.
I know I need something like
big.csv shell_commands big.csv
but I don't know awk or sed

Using sed, replacing "GG" with "GGX" at the start of each line in big.csv would look like:
sed 's/^GG/GGX/g' big.csv >big_translated.csv
If you need to replace multiple patterns, you can use multiple replace commands in sed separated by semicolons.
sed 's/^GG/GGX/g; s/^FG/FGX/g' big.csv >big_translated.csv
The ^ character means beginning of line and ensures that we only edit the first field of the csv.
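A slightly stricter variant, sketched here on made-up sample rows, also anchors on the comma so that only a first field that is exactly GG or FG (optionally followed by spaces, as in the "GG " case from the question) is changed. The sample data and variable names are assumptions for illustration.

```shell
# Hypothetical sample rows; the first column is the one to translate
input=$(printf '%s\n' \
  'GG,1,GG' \
  'GG ,2,x' \
  'FG,3,y' \
  'GGA,4,FG')

# ^ anchors at line start, " *," requires the field to end right after the
# two letters (allowing trailing spaces), so GGA and later columns are untouched
result=$(printf '%s\n' "$input" |
  sed 's/^GG *,/GGX,/; s/^FG *,/FGX,/')

printf '%s\n' "$result"
```

To apply this to the real file, redirect to a new file as in the commands above rather than writing back to big.csv in the same pipeline.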

awk 'BEGIN{ FS = OFS = ","; r["GG"] = "GGX"; r["FG"] = "FGX" }
{ for( k in r ) if( gsub( k, r[k], $1 ) ) break } 1' input-file
The break is there to prevent multiple substitutions.

Try this (provided each string occurs only once per line):
awk '{sub("GG","GGX",$0); sub("FG","FGX",$0); print}' temp.txt

How about this?
sed -i "s/^\(..\),/\1X,/" big.csv
Or, if the field may contain trailing spaces:
sed -i "s/^\([^ ][^ ][ ]*\),/\1X,/" big.csv
