Remove all punctuation from string except full stop (.) and colon (:) in Python


I am trying to remove all punctuation marks from a string except (.) and (:).
This is what I have implemented:
import string
import re
remove = string.punctuation
remove = remove.replace(".", "")
pattern = r"[{}]".format(remove)
line = "NETWORK [listener] connection accepted from 127.0.0.1:59926 #4785 (3 connections now open)"
re.sub(pattern, "", line)
Current output:
NETWORK listener connection accepted from 12700159926 4785 3 connections now open
Desired output:
NETWORK listener connection accepted from 127.0.0.1:59926 4785 3 connections now open
What am I doing wrong? Thanks for the help!

Apart from the fact you don't remove the : from the pattern, the pattern you end up with is:
[!"#$%&'()*+,-/:;<=>?@[\]^_`{|}~]
            ^^^
Note the ,-/ bit. Inside a regex character class, that is a range: every character from , to / inclusive, which also covers - and the period. That is why the periods are stripped.
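A quick way to check this is to list the printable ASCII characters that the range actually matches. This is only a small sketch; the variable names are made up for the illustration.

```python
import re

# The accidental range from the generated pattern: comma, hyphen, slash.
range_class = re.compile(r"[,-/]")

# Collect every printable ASCII character the class matches.
matched = [chr(code) for code in range(0x20, 0x7F) if range_class.fullmatch(chr(code))]
print(matched)  # [',', '-', '.', '/']
```

The period sits between the comma and the slash in ASCII, so it is swallowed by the range.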
You would possibly be better off constructing it manually so as to avoid any tricky escaping requirements other than what you need, something like (untested so I'm not sure if more escaping is required):
pattern = r"""[!"#$%&'()*+,\-/;<=>?@[\]^_`{|}~]"""
Alternatively, I'd probably rather allow a specific set of characters to survive rather than specifying a set to remove (the regex will be a lot simpler):
re.sub(r"[^a-zA-Z0-9 :.]", "", line)
This will only allow alphanumerics, spaces, the colon and the period - everything else will be stripped.

This should work for you:
import string
import re
remove = string.punctuation
remove = re.sub(r"[.:-]+", "", remove)
pattern = r"[{}]".format(remove + '-')
line = "NETWORK [listener] connection accepted from 127.0.0.1:59926 #4785 (3 connections now open)"
re.sub(pattern, "", line)
Output:
NETWORK listener connection accepted from 127.0.0.1:59926 4785 3 connections now open
Details:
In remove = re.sub(r"[.:-]+", "", remove), the character class strips ., : and - from the punctuation list. The hyphen has to go as well, because an unescaped hyphen in the middle of a character class denotes a range rather than a literal -.
In r"[{}]".format(remove + '-'), the hyphen is added back at the end of the character class, where an unescaped - is treated as a literal.

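You don't escape the special characters in string.punctuation for your regex. You also forgot to remove the : from the list.
Use re.escape to escape the regex special characters in the punctuation string. Your final pattern will be [\!\"\#\$\%\&\'\(\)\*\+\,\-\/\;\<\=\>\?\@\[\\\]\^_\`\{\|\}\~] (newer Python versions escape fewer of these characters, but the resulting class matches the same set).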
import string
import re
remove = string.punctuation
remove = remove.replace(".", "")
remove = remove.replace(":", "")
pattern = r"[{}]".format(re.escape(remove))
line = "NETWORK [listener] connection accepted from 127.0.0.1:59926 #4785 (3 connections now open)"
line = re.sub(pattern, "", line)
output:
NETWORK listener connection accepted from 127.0.0.1:59926 4785 3 connections now open
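For comparison, the same result can be had without a regex at all, which sidesteps the escaping question entirely. This is a different technique from the answers above (str.translate instead of re.sub), and KEEP, table and cleaned are names made up for this sketch.

```python
import string

# Characters that should survive.
KEEP = ".:"

# Build a translation table that deletes every other punctuation character.
table = str.maketrans("", "", "".join(ch for ch in string.punctuation if ch not in KEEP))

line = "NETWORK [listener] connection accepted from 127.0.0.1:59926 #4785 (3 connections now open)"
cleaned = line.translate(table)
print(cleaned)
```

Because no pattern is compiled, characters such as -, ] or \ need no special treatment here.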

Related

Deleting until whitespace in Sublime Text

Is there any way to delete all characters until the first whitespace in Sublime Text? I know that you can use ctrl+delete to do that, but it stops at non-word characters (", :, &, *, etc.). When you try to delete aaa aaa 2+a from the end, it deletes 2+a only up to the + sign, but it deletes aaa up to the space. I need to change that so it deletes 2+a up to the first space. The solution can be anything: a settings change or a plug-in.

I found a solution for this, via this plugin:
https://packagecontrol.io/packages/KeyboardNavigation
The key binding for it is:
{ "keys": ["ctrl+backspace"], "command": "delete_to_beg_of_contig_boundary", "args": {"forward": false} }
It deletes characters from right to left until the first whitespace.

I have written a Sublime Text plugin to delete text as you require. It is almost identical to ST's delete_word command but breaks only at whitespace/non-whitespace.
When called the plugin deletes text from the cursor to the next or previous group of characters, the grouping being defined as either whitespace or non-whitespace characters. Thus if run several times in succession it will alternate between deleting groups of whitespace and non-whitespace characters ahead or behind the cursor. The forwards parameter of the run() method (i.e. the command's arg) controls the deletion direction.
Save the plugin somewhere in your config Packages folder hierarchy. e.g.
.../sublime-text-3/Packages/User/DeleteToWhitespace.py
Add key bindings to your user .sublime-keymap file. e.g.
//
// These key bindings override the ST 'delete_word' keys but use whatever keys you want.
// You could use `super+delete` and `super+backspace` and keep ST's delete keys intact.
//
{ "keys": ["ctrl+delete"], "command": "delete_to_whitespace", "args": {"forwards": true} },
{ "keys": ["ctrl+backspace"], "command": "delete_to_whitespace", "args": {"forwards": false} },
Below is the DeleteToWhitespace.py plugin. It has been uploaded to this GitHub Gist – this links directly to the raw source code.
#
# Name: Delete To Whitespace
# Requires: Plugin for Sublime Text v3
# Command: delete_to_whitespace
# Args: forwards: bool (delete backwards if false)
# License: MIT License
#
import sublime, sublime_plugin, re
class DeleteToWhitespaceCommand(sublime_plugin.TextCommand):
    """
    A Sublime Text plugin that deletes text from the cursor to the next or
    previous group of characters, the grouping being defined as either
    whitespace or non-whitespace characters. Thus if run several times in
    succession it will alternate between deleting groups of whitespace and
    non-whitespace ahead or behind the cursor. The forwards parameter of the
    run() method (i.e. the command's arg) controls the deletion direction.
    """
    def run(self, edit, forwards=True):
        self.edit = edit
        self.forwards = forwards
        if forwards:
            self.delete_forwards()
        else:
            self.delete_backwards()
    def delete_forwards(self):
        whitespace_regex = r"^\s+"
        non_whitespace_regex = r"^\S+"
        for sel in self.view.sel():
            if sel.size() > 0:
                self.view.erase(self.edit, sel)
                continue
            # ∴ sel.a == sel.b == sel.begin() == sel.end()
            # view.full_line() includes the trailing newline (if any).
            cursor = sel.a
            line = self.view.full_line(cursor)
            cursor_to_eol = sublime.Region(cursor, line.end())
            cursor_to_eol_str = self.view.substr(cursor_to_eol)
            match = re.search(whitespace_regex, cursor_to_eol_str)
            if match:
                self.erase_matching_characters(cursor, match)
                continue
            match = re.search(non_whitespace_regex, cursor_to_eol_str)
            if match:
                self.erase_matching_characters(cursor, match)
                continue
    def delete_backwards(self):
        whitespace_regex = r"\s+$"
        non_whitespace_regex = r"\S+$"
        for sel in self.view.sel():
            if sel.size() > 0:
                self.view.erase(self.edit, sel)
                continue
            # ∴ sel.a == sel.b == sel.begin() == sel.end()
            # view.line() excludes the trailing newline (if any).
            cursor = sel.a
            line = self.view.line(cursor)
            cursor_to_bol = sublime.Region(cursor, line.begin())
            cursor_to_bol_str = self.view.substr(cursor_to_bol)
            # Delete the newline of the 'previous' line.
            if cursor_to_bol.size() == 0 and cursor > 0:
                erase_region = sublime.Region(cursor, cursor - 1)
                self.view.erase(self.edit, erase_region)
                continue
            match = re.search(whitespace_regex, cursor_to_bol_str)
            if match:
                self.erase_matching_characters(cursor, match)
                continue
            match = re.search(non_whitespace_regex, cursor_to_bol_str)
            if match:
                self.erase_matching_characters(cursor, match)
                continue
    def erase_matching_characters(self, cursor, match):
        match_len = match.end() - match.start()
        if self.forwards:
            erase_region = sublime.Region(cursor, cursor + match_len)
        else:
            erase_region = sublime.Region(cursor, cursor - match_len)
        self.view.erase(self.edit, erase_region)
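The grouping rule can be tried outside Sublime Text on a plain string. The following is only a sketch of the backwards case under that assumption: delete_backwards here is a hypothetical standalone function that works on a (text, cursor) pair, not the plugin's method, and it uses none of the Sublime API.

```python
import re

def delete_backwards(text, cursor):
    """Return (new_text, new_cursor) after deleting one group before cursor."""
    # Only look at the current line, as the plugin does.
    line_start = text.rfind("\n", 0, cursor) + 1
    before = text[line_start:cursor]
    if not before:
        # At the beginning of a line: remove the previous newline, if any.
        if cursor > 0:
            return text[:cursor - 1] + text[cursor:], cursor - 1
        return text, cursor
    # Prefer a trailing whitespace run, otherwise a trailing non-whitespace run.
    match = re.search(r"\s+$", before) or re.search(r"\S+$", before)
    start = cursor - len(match.group())
    return text[:start] + text[cursor:], start

text, cursor = "aaa aaa 2+a", 11
text, cursor = delete_backwards(text, cursor)
print(repr(text))  # 'aaa aaa '
text, cursor = delete_backwards(text, cursor)
print(repr(text))  # 'aaa aaa'
```

The first call removes 2+a as a whole, since + is not whitespace; the second removes the space before it.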

System.JSONException: Unexpected character ('i' (code 105)): was expecting comma to separate OBJECT entries at [line:1, column:18]

#SalesforceChallenge
I'm trying to escape a string but I had no success so far.
This is the response body I'm getting:
{"text":"this \"is something\" I wrote"}
Please note the two backslashes that escape the double-quote characters. (This is a sample. The real body is large, with lots of "text" elements.)
When I try to deserialize it I get the following error:
System.JSONException: Unexpected character ('i' (code 105)): was expecting comma to separate OBJECT entries at [line:1, column:18]
I've tried to escape by using:
String my = '{"text":"this \"is something\" I wrote"}';
System.debug('test 0: ' + my);
System.debug('test 1: ' + my.replace('\"', '-'));
System.debug('test 2: ' + my.replace('\\"', '-'));
System.debug('test 3: ' + my.replace('\\\"', '-'));
System.debug('test 4: ' + my.replace('\\\\"', '-'));
--- Results:
[22]|DEBUG|test 0: {"text":"this "is something" I wrote"}
[23]|DEBUG|test 1: {-text-:-this -is something- I wrote-}
[24]|DEBUG|test 2: {"text":"this "is something" I wrote"}
[25]|DEBUG|test 3: {"text":"this "is something" I wrote"}
[26]|DEBUG|test 4: {"text":"this "is something" I wrote"}
--- What I need as result:
{"text":"this -is something- I wrote"}
Does anyone have a fix to share?
Thanks a lot.

This is the problem with your test runs in Anonymous Apex:
String my = '{"text":"this \"is something\" I wrote"}';
Because \ is an escape character, you need two backslashes in an Apex string literal to produce a backslash in the actual output:
String my = '{"text":"this \\"is something\\" I wrote"}';
Since Apex quotes strings with ', you don't have to escape the quotes for Apex; you're escaping them for the JSON parser.
The same principle applies to the strings you're trying to use to do replacements: you must escape the \ for Apex.
All that said, it's unclear why you are trying to manually alter this string. The payload
{"text":"this \"is something\" I wrote"}
is valid JSON. In general, you should not perform string replacement on inbound JSON structures in Apex unless you're attempting to compensate for a payload that contains an Apex reserved word as a key so that you can use typed deserialization.
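The two layers of escaping can be reproduced in any language whose string literals treat \ as an escape character. Here is a minimal sketch in Python rather than Apex (so it illustrates the principle, not Apex behaviour in detail); wire_payload and collapsed are names chosen for the example.

```python
import json

# Raw literal: the backslashes survive, so this is the payload as sent on the wire.
wire_payload = r'{"text":"this \"is something\" I wrote"}'
parsed = json.loads(wire_payload)
print(parsed["text"])  # this "is something" I wrote

# Non-raw literal: the language consumes each \" and leaves a bare ",
# which mirrors what happened in the Anonymous Apex test.
collapsed = '{"text":"this \"is something\" I wrote"}'
try:
    json.loads(collapsed)
    failed = False
except json.JSONDecodeError as exc:
    failed = True
    print("parse error:", exc)
```

The payload itself parses without any replacement; only the literal whose backslashes were consumed fails.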

From SSH not decoded from bytes to ASCII?

Good afternoon.
I get the example below from SSH:
b"rxmop:moty=rxotg;\x1b[61C\r\nRADIO X-CEIVER ADMINISTRATION\x1b[50C\r\nMANAGED OBJECT DATA\x1b[60C\r\n\x1b[79C\r\nMO\x1b[9;19HRSITE\x1b[9;55HCOMB FHOP MODEL\x1b[8C\r\nRXOTG-58\x1b[10;19H54045_1800\x1b[10;55HHYB"
I process it with ssh.recv(99999).decode('ASCII'),
but some sequences are not decoded, for example:
\x1b[61C
\x1b[50C
\x1b[9;55H
\x1b[9;19H
The article below explains that these are ANSI escape codes, which appear because I use invoke_shell. Everything worked before the move to another server.
Is there a simple way to get rid of junk values that come when you SSH using Python's Paramiko library and fetch output from CLI of a remote machine?
When I write to the file, I also get:
rxmop:moty=rxotg;[61C
RADIO X-CEIVER ADMINISTRATION[50C
MANAGED OBJECT DATA[60C
[79C
MO[9;19HRSITE[9;55HCOMB FHOP MODEL[8C
RXOTG-58[10;19H54045_1800[10;55HHYB
If you use PuTTY, everything is displayed cleanly.
I can't avoid invoke_shell because the connection hops from one server to another.
Sample code below:
# coding:ascii
import time
import paramiko
port = 22
data = ""
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
# host, user and secret are defined elsewhere
client.connect(hostname=host, username=user, password=secret, port=port, timeout=10)
ssh = client.invoke_shell()
ssh.send("rxmop:moty=rxotg;\n")
# Keep reading until a "<" appears in the output
while data.find("<") == -1:
    time.sleep(0.1)
    data += ssh.recv(99999).decode('ascii')
ssh.close()
client.close()
f = open('text.txt', 'w')
f.write(data)
f.close()
The normal output is below:
MO RSITE COMB FHOP MODEL
RXOTG-58 54045_1800 HYB BB G12
SWVERREPL SWVERDLD SWVERACT TMODE
B1314R081D TDM
CONFMD CONFACT TRACO ABISALLOC CLUSTERID SCGR
NODEL 4 POOL FLEXIBLE
DAMRCR CLTGINST CCCHCMD SWVERCHG
NORMAL UNLOCKED
PTA JBSDL PAL JBPTA
TGFID SIGDEL BSSWANTED PACKALG
H'0001-19B3 NORMAL
What can you recommend to get the normal output back, so that all characters are processed?
Stripping the codes with regular expressions does not help, because that shifts the layout of the record, and my code selects characters from fixed positions.
P.S. I tried ssh.invoke_shell(term='xterm'); it doesn't work.

There is an answer here:
How can I remove the ANSI escape sequences from a string in python
There are other ways...
https://unix.stackexchange.com/questions/14684/removing-control-chars-including-console-codes-colours-from-script-output
Essentially, you are 'screen-scraping' input, and you need to strip the ANSI codes. So, grab the input, and then strip the codes.
import re
import time
# ... (your ssh connection here)
data = ""
while data.find("<") == -1:
    time.sleep(0.1)
    chunk = ssh.recv(99999)
    # recv() returns bytes; decode before appending to the str buffer
    data += chunk.decode('ascii')
# ... (your ssh connection cleanup here)
ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
data = ansi_escape.sub('', data)
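To see what the stripping does without an SSH session, the pattern can be run on a piece of the bytes from the question. This is a sketch only: raw is a shortened copy of that sample, and ANSI_ESCAPE, text and cleaned are names chosen for the example.

```python
import re

# Matches two-character ESC sequences and CSI sequences such as ESC[61C or ESC[9;19H.
ANSI_ESCAPE = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')

# A shortened piece of the bytes shown in the question.
raw = (b"rxmop:moty=rxotg;\x1b[61C\r\n"
       b"RADIO X-CEIVER ADMINISTRATION\x1b[50C\r\n"
       b"MO\x1b[9;19HRSITE\x1b[9;55HCOMB FHOP MODEL\x1b[8C\r\n")

text = raw.decode("ascii")
cleaned = ANSI_ESCAPE.sub("", text)
print(cleaned)
```

Note that the cursor-positioning sequences are removed without being replaced by spaces, so the third line comes out as MORSITECOMB FHOP MODEL. If the code depends on fixed column positions, stripping alone will not restore them.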

split() on one character OR another

Python 3.6.0
I have a program that parses output from Cisco switches and routers.
I get to a point in the program where I am returning output from the 'sh ip int brief' command.
I place it in a list so I can split on the '>' character and extract the hostname.
It works perfectly. Pertinent code snippet:
ssh_channel.send("show ip int brief | exc down" + "\n")
# ssh_channel.send("show ip int brief" + "\n")
time.sleep(0.6)
outp = ssh_channel.recv(5000)
mystring = outp.decode("utf-8")
ipbrieflist = mystring.splitlines()
hostnamelist = ipbrieflist[1].split('>')
hostname = hostnamelist[0]
If the router is in 'enable' mode the command prompt has a '#' character after the hostname.
If I change my program to split on the '#' character:
hostnamelist = ipbrieflist[1].split('#')
it still works perfectly.
I need the program to handle output that has either the '>' character or the '#' character in 'ipbrieflist'.
I have found several valid references for how to handle this. Ex:
import re
text = 'The quick brown\nfox jumps*over the lazy dog.'
print(re.split('; |, |\*|\n',text))
The above code works perfectly.
However, when I modify my code as follows:
hostnamelist = ipbrieflist[1].split('> |#')
It does not work. By 'does not work' I mean it does not split on either character. No splitting at all.
The following debug is from PyCharm:
ipbrieflist = mystring.splitlines() ipbrieflist={list}: ['terminal length 0', 'rtr-1841>show ip int brief | exc down', 'Interface IP-Address OK? Method Status Protocol', 'FastEthernet0/1 192.168.1.204 YES NVRAM up up ', 'Loopback0 172.17.0.1 YES NVRAM up up ', '', 'rtr-1841>']
hostnamelist = ipbrieflist[1].split('> |#') hostnamelist={list}: ['rtr-1841>show ip int brief | exc down']
hostname = {str}'rtr-1841>show ip int brief | exc down'
As you can see the hostname variable still contains the 'show ip int brief | exc down' appended to it.
I get the same exact behavior if the hostname is followed by the '#' character.
What am I doing wrong?
Thanks.

Instead of this:
ipbrieflist[1].split('> |#')
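str.split() treats its argument as a literal separator, so the text '> |#' is searched for verbatim and never found. To split on a pattern, you want re.split() (after import re):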
re.split('>|#', ipbrieflist[1])
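A self-contained sketch of the difference, using the prompt line from the debug output. hostname_from_prompt is a hypothetical helper name, and the enable-mode line is an assumed variant of the same prompt with # in place of >.

```python
import re

def hostname_from_prompt(line):
    """Return the text before the first '>' or '#' in a prompt line."""
    return re.split(r"[>#]", line, maxsplit=1)[0]

user_mode = "rtr-1841>show ip int brief | exc down"
enable_mode = "rtr-1841#show ip int brief | exc down"

# str.split() looks for the literal text '> |#', which never occurs here,
# so the line comes back unsplit.
print(user_mode.split("> |#"))

print(hostname_from_prompt(user_mode))    # rtr-1841
print(hostname_from_prompt(enable_mode))  # rtr-1841
```

The character class [>#] is equivalent to the alternation >|# here, and maxsplit=1 stops at the first prompt character.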

is there a way to define auto-escaped string in lua (raw)?

The following lines are arbitrary regexes that I need to use in Lua.
['\";=]
!^(?:(?:[a-z]{3,10}\s+(?:\w{3,7}?://[\w\-\./]*(?::\d+)?)?/[^?#]*(?:\?[^#\s]*)?(?:#[\S]*)?|connect (?:\d{1,3}\.){3}\d{1,3}\.?(?::\d+)?|options \*)\s+[\w\./]+|get /[^?#]*(?:\?[^#\s]*)?(?:#[\S]*)?)$
'(?i:(?:c(?:o(?:n(?:t(?:entsmartz|actbot/)|cealed defense|veracrawler)|mpatible(?: ;(?: msie|\.)|-)|py(?:rightcheck|guard)|re-project/1.0)|h(?:ina(?: local browse 2\.|claw)|e(?:rrypicker|esebot))|rescent internet toolpak)|w(?:e(?:b(?: (?:downloader|by mail)|(?:(?:altb|ro)o|bandi)t|emailextract?|vulnscan|mole)|lls search ii|p Search 00)|i(?:ndows(?:-update-agent| xp 5)|se(?:nut)?bot)|ordpress(?: hash grabber|\/4\.01)|3mir)|m(?:o(?:r(?:feus fucking scanner|zilla)|zilla\/3\.mozilla\/2\.01$|siac 1.)|i(?:crosoft (?:internet explorer\/5\.0$|url control)|ssigua)|ailto:craftbot\#yahoo\.com|urzillo compatible)|p(?:ro(?:gram shareware 1\.0\.|duction bot|webwalker)|a(?:nscient\.com|ckrat)|oe-component-client|s(?:ycheclone|urf)|leasecrawl\/1\.|cbrowser|e 1\.4|mafind)|e(?:mail(?:(?:collec|harves|magne)t|(?: extracto|reape)r|(siphon|spider)|siphon|wolf)|(?:collecto|irgrabbe)r|ducate search vxb|xtractorpro|o browse)|t(?:(?: ?h ?a ?t ?' ?s g ?o ?t ?t ?a ? h ?u ?r ?|his is an exploi|akeou)t|oata dragostea mea pentru diavola|ele(?:port pro|soft)|uring machine)|a(?:t(?:(?:omic_email_hunt|spid)er|tache|hens)|d(?:vanced email extractor|sarobot)|gdm79\#mail\.ru|miga-aweb\/3\.4|utoemailspider| href=)|^(?:(google|i?explorer?\.exe|(ms)?ie( [0-9.]+)?\ ?(compatible( browser)?)?)$|www\.weblogs\.com|(?:jakart|vi)a|microsoft url|user-Agent)|s(?:e(?:archbot admin#google.com|curity scan)|(?:tress tes|urveybo)t|\.t\.a\.l\.k\.e\.r\.|afexplorer tl|itesnagger|hai)|n(?:o(?:kia-waptoolkit.* googlebot.*googlebot| browser)|e(?:(?:wt activeX; win3|uralbot\/0\.)2|ssus)|ameofagent|ikto)|f(?:a(?:(?:ntombrows|stlwspid)er|xobot)|(?:ranklin locato|iddle)r|ull web bot|loodgate|oobar/)|i(?:n(?:ternet(?: (?:exploiter sux|ninja)|-exprorer)|dy library)|sc systems irc search 2\.1)|g(?:ameBoy, powered by nintendo|rub(?: crawler|-client)|ecko\/25)|(myie2|libwen-us|murzillo compatible|webaltbot|wisenutbot)|b(?:wh3_user_agent|utch__2\.1\.1|lack hole|ackdoor)|d(?:ig(?:imarc webreader|out4uagent)|ts agent)|(?:(script|sql) 
inject|$botname/$botvers)ion|(msie .+; .*windows xp|compatible \; msie)|h(?:l_ftien_spider|hjhj#yahoo|anzoweb)|(?:8484 boston projec|xmlrpc exploi)t|u(?:nder the rainbow 2\.|ser-agent:)|(sogou develop spider|sohu agent)|(?:(?:d|e)browse|demo bot)|zeus(?: .*webster pro)?|[a-z]surf[0-9][0-9]|v(?:adixbot|oideye)|larbin#unspecified|\bdatacha0s\b|kenjin spider|; widows|rsync|\\\r))'
And there are many others where these came from.....
As you might have noticed, in the first case only the " is escaped (as \"), but not the '.
Hence,
rex_pcre.new('['\";=]')
won't work.
rex_pcre.new("['\";=]")
should work; however, other regexes contain parts such as \-. that are not meant as Lua escape sequences.
I also cannot use
[[ ]]
as there are regexes which end with ] (first example).
Breaking the lines, as in
rex_pcre.new( [[
['\";=]
]])
won't work for me in cases such as the third one, which ends with ), and it also raised an "unexpected symbol" error.
In sum, I am looking for something like the r"UNESCAPED STRING" of Python or the @"UNESCAPED STRING" of C#.
I assume there isn't one, but I wonder how to get similar functionality, given that I only consume those values (regular expressions) and have no control over how they are composed originally.
Here is my current solution.
I simply try to compile the line with [[ ]]; if that fails, I move on to " and then to '.
EscapeRegEx = function (xp)
    -- try with [[ ]]
    local opening = '[['
    local closing = ']]'
    local codeline = "rex_pcre.new(" .. opening .. xp .. closing .. ")"
    local _, err = loadstring(codeline)
    if not err then return codeline end
    -- then try with "
    opening = '"'
    closing = '"'
    codeline = "rex_pcre.new(" .. opening .. xp .. closing .. ")"
    _, err = loadstring(codeline)
    if not err then return codeline end
    -- then try with '
    opening = "'"
    closing = "'"
    codeline = "rex_pcre.new(" .. opening .. xp .. closing .. ")"
    _, err = loadstring(codeline)
    if not err then return codeline end
end

You can use longer versions of the long brackets:
[=========[the regex goes in here]=========]
The opening long bracket will only be matched by a closing long bracket of the same length.
See this for more details; you can also do a similar thing to get nested multi-line comments.
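If the Lua source lines are generated by a script, the bracket level can be picked automatically. Below is a rough sketch in Python (not Lua), since it only builds Lua source text. lua_long_string is a hypothetical helper, and starting at level 1 rather than plain [[ ]] is an assumption made to stay clear of the problems described in the question.

```python
def lua_long_string(text):
    """Wrap text in a Lua long bracket whose closer cannot occur early."""
    level = 1  # assumption: skip level 0, i.e. plain [[ ]]
    while True:
        closer = "]" + "=" * level + "]"
        # Appending closer[:-1] also catches a closer that would be formed
        # by the end of the text plus the start of the real closer.
        if closer not in text + closer[:-1]:
            break
        level += 1
    opener = "[" + "=" * level + "["
    # Caveat: Lua drops a newline that directly follows the opening bracket,
    # so a text starting with a newline would need an extra one prepended.
    return opener + text + closer

first_regex = r"""['\";=]"""
print("rex_pcre.new(" + lua_long_string(first_regex) + ")")
```

For the first regex in the question this yields rex_pcre.new([=[['\";=]]=]). Nothing inside the brackets is interpreted as an escape sequence, so the backslashes reach the regex engine unchanged.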
