Prevent shlex from splitting with colon (:) - python-3.x

I'm having trouble dealing with colons (:) in shlex. I need the following behaviour:
Sample input
text = 'hello:world ("my name is Max")'
s = shlex.shlex(instream=text, punctuation_chars=True)
s.get_token()
s.get_token()
...
Desired output
hello:world
(
"my name is Max"
)
Current output
hello
:
world
(
"my name is Max"
)
Shlex puts the colon in a separate token and I don't want that. The documentation doesn't say much about the colon. I've tried adding it to the wordchars attribute, but that messes everything up and separates the words between commas. I've also tried setting punctuation_chars to a custom list containing only parentheses, ["(", ")"], but it makes no difference. I need the punctuation_chars option set so that each parenthesis comes out as a separate token (or any other option that achieves this output).
Does anyone know how I could get this to work? Any help will be greatly appreciated.
I'm using python 3.6.9, could upgrade to python 3.7.X if necessary.

To make shlex treat : as a word char, you need to add : to wordchars:
>>> import shlex
>>> text = 'hello:world ("my name is Max")'
>>> s = shlex.shlex(instream=text, punctuation_chars=True)
>>> s.wordchars += ':'
>>> while True:
...     tok = s.get_token()
...     if not tok: break
...     print(tok)
...
hello:world
(
"my name is Max"
)
I tested that with Python 3.6.9 and 3.8.0. You need at least Python 3.6 to have the punctuation_chars initialization parameter.
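The same idea can be wrapped in a small helper. This is only a sketch: the function name tokenize and its extra_wordchars parameter are made up for illustration, and it relies on shlex objects being iterable (iteration stops at end of input).

```python
import shlex


def tokenize(text, extra_wordchars=":"):
    """Split text into tokens, keeping extra_wordchars inside words.

    Hypothetical helper: the name and parameter are illustrative only.
    """
    lexer = shlex.shlex(instream=text, punctuation_chars=True)
    # Characters in wordchars are accumulated into the current word
    # instead of being returned as single-character tokens.
    lexer.wordchars += extra_wordchars
    # Iterating over the lexer yields tokens until the input is exhausted.
    return list(lexer)


tokens = tokenize('hello:world ("my name is Max")')
for tok in tokens:
    print(tok)
```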

Related

Python - Write the header without double quotes in pandas(df.to_csv)

I might be missing a small trick here, but I haven't been able to get this right since this afternoon.
I have 4 columns that need to be separated by ~. One of these 4 columns has the ~ symbol as part of its name, which is !~ID. This is how my output should look:
!~ID~Rev~Type~Name
My code
df.to_csv(r'myout.txt', header=['!~ID','Rev','Type','Name'], index=None, sep='~', mode='w')
But this always gives me
"!~ID"~Rev~Type~Name
After seeing a couple of posts, I tried the quoting options:
df.to_csv(r'myout.txt', header=['!~ID','Rev','Type','Name'], index=None, sep='~', mode='w',
          quoting = csv.QUOTE_NONE,
          escapechar = '~')
But this gives me one extra ~ for ID. Please help.
!~~ID~Rev~Type~Name
Since the file you want isn't a valid csv, I suggest you edit the file afterwards to get the desired result:
df.to_csv(r'myout.txt', header=['!#ID','Rev','Type','Name'], index=None, sep='~') # replacing the first ~ with # (use any character you like)
with open('myout.txt', 'r+') as f:
    f.seek(1) # position of #
    f.write('~') # replacing
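An alternative that avoids patching the file afterwards is to write the header line yourself and let the CSV writer handle only the data rows. Below is a minimal sketch of that idea using the standard library csv module rather than pandas, with made-up rows (the data is hypothetical). With pandas, the equivalent would be to write the header to an open file handle and then call df.to_csv(f, header=False, index=None, sep='~') on the same handle.

```python
import csv
import io

header = ["!~ID", "Rev", "Type", "Name"]
# Hypothetical data rows, only for illustration.
rows = [[1, "A", "part", "bolt"], [2, "B", "part", "nut"]]

buf = io.StringIO()
# Write the header verbatim so the writer never gets a chance to quote it.
buf.write("~".join(header) + "\n")
# Let the CSV writer handle the data rows with ~ as the delimiter.
writer = csv.writer(buf, delimiter="~", lineterminator="\n")
writer.writerows(rows)

output = buf.getvalue()
print(output)
```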

F string is adding new line

I am trying to make a name generator. I am using an f-string to concatenate the first and last names, but instead of appearing together on one line, they are printed on separate lines.
print(f"Random Name Generated is:\n{random.choice(firstname_list)}{random.choice(surname_list)}")
This gives the output:
Random Name Generated is:
Yung
heady
Instead of:
Random Name Generated is:
Yung heady
Can someone please explain why so?
The code seems right. Perhaps the elements of your lists contain newline (\n) characters.
Check the strings in the lists.
import random
if __name__ == '__main__':
    firstname_list = ["yung1", "yung2", "yung3"]
    surname_list = ["heady1", "heady2", "heady3"]
    firstname_list = [name.replace('\n', '') for name in firstname_list]
    print(f"Random Name Generated is:\n{random.choice(firstname_list)} {random.choice(surname_list)}")
Output:
Random Name Generated is:
yung3 heady2
Since I had pulled these values from a UTF-8 encoded .txt file, readlines() did convert the names to list elements, but each one had a hidden '\xa0\n' in it.
This caused this particular printing problem. Using .strip() removed the stray whitespace.
print(f"Random Name Generated is:\n{random.choice(firstname_list).strip()} {random.choice(surname_list).strip()}")
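Instead of stripping at print time, the names can be cleaned once when the lists are built. A small sketch, assuming the raw lines look like what readlines() returned in this case (the sample names and the variable names raw_first and raw_last are made up):

```python
import random

# Simulated readlines() output: each name ends with a non-breaking
# space and a newline. The sample values are hypothetical.
raw_first = ["Yung\xa0\n", "Lil\xa0\n"]
raw_last = ["heady\xa0\n", "steady\xa0\n"]

# str.strip() with no arguments removes leading and trailing whitespace,
# which includes both '\n' and the non-breaking space '\xa0'.
firstname_list = [name.strip() for name in raw_first]
surname_list = [name.strip() for name in raw_last]

full_name = f"{random.choice(firstname_list)} {random.choice(surname_list)}"
print(f"Random Name Generated is:\n{full_name}")
```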

how can i split a full name to first name and last name in python?

I'm a novice in Python programming and I'm trying to split a full name into first name and last name. Can someone assist me with this? My example file is:
Sarah Simpson
I expect the output like this : Sarah,Simpson
You can use the split() function like so:
fullname=" Sarah Simpson"
fullname.split()
which will give you: ['Sarah', 'Simpson']
Building on that, you can do:
first=fullname.split()[0]
last=fullname.split()[-1]
print(first + ',' + last)
which would give you Sarah,Simpson with no spaces
This comes in handy: nameparser 1.0.6 - https://pypi.org/project/nameparser/
>>> from nameparser import HumanName
>>> name = "Sarah Simpson"
>>> name = HumanName(name)
>>> name.last
'Simpson'
>>> name.first
'Sarah'
>>> name.last+', '+name.first
'Simpson, Sarah'
You can try the .split() function, which returns a list of strings after splitting by a separator. In this case the separator is a space character.
First remove leading and trailing spaces using .strip(), then split by the separator.
first_name, last_name=fullname.strip().split()
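Note that unpacking into exactly two names raises a ValueError when the input has a middle name or is a single word. A minimal sketch of a more tolerant variant follows; the function name split_name is hypothetical, and treating the first word as the first name and the last word as the last name is an assumption about what you want.

```python
def split_name(fullname):
    """Return (first, last) from a full name.

    Hypothetical helper: takes the first word as the first name and the
    last word as the last name, ignoring anything in between.
    """
    parts = fullname.split()  # split() with no argument also drops extra whitespace
    if not parts:
        return "", ""
    if len(parts) == 1:
        return parts[0], ""
    return parts[0], parts[-1]


first_name, last_name = split_name("  Sarah Simpson ")
print(first_name + "," + last_name)
```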
Strings in Python are immutable. Create a new String to get the desired output.
You can use split() method of string class.
name = "Sarah Simpson"
name.split()
By default split() splits on whitespace; it also accepts an optional separator as a parameter. It returns a list:
["Sarah", "Simpson"]
Just concatenate the strings. For reference, see https://docs.python.org/3.7/library/stdtypes.html?highlight=split#str.split
Output = "Sarah", "Simpson"
name = "Thomas Winter"
LastName = name.split()[1]
(Note the parentheses on the function call split.)
split() creates a list where each element is from your original string, delimited by whitespace. You can now grab the second element using name.split()[1] or the last element using name.split()[-1]
split() is the obvious function to use here. It can be called with a separator argument or with no argument at all:
fullname="Sarah Simpson"
ls=fullname.split()
ls=fullname.split(" ") # splits on a single space character
Optional extra
And if you want the split name shown as a string delimited by a comma, you can use join() or replace():
print(",".join(ls)) #outputs Sarah,Simpson
print(fullname.replace(" ",",")) #also outputs Sarah,Simpson
Input: Sarah Simpson => suppose it is a string.
Then, to output: Sarah, Simpson. Do the following:
name_surname = "Sarah Simpson".split(" ")
to_output = name_surname[0] + ", " + name_surname[-1]
print(to_output)
The function split is executed on a string to split it by a specified argument passed to it. Then it outputs a list of all chars or words that were split.
In your case: the string is "Sarah Simpson", so, when you execute split with the argument " " -empty space- the output will be: ["Sarah", "Simpson"].
Now, to combine the names or to access any of them, you can write the name of the list followed by square brackets containing the index of the desired word. For example: name_surname[0] will return "Sarah" since its index is 0 in the list.

multiple variable in python regex

I have seen several related posts and several forums to find an answer for my question, but nothing has come up to what I need.
I am trying to use variables instead of hard-coded values in a regex that searches for any one of several words in a line.
I am able to get the desired result if I don't use variables.
<http://www.somesite.com/software/sub/a1#Msoffice>
<http://www.somesite.com/software/sub1/a1#vlc>
<http://www.somesite.com/software/sub2/a2#dell>
<http://www.somesite.com/software/sub3/a3#Notepad>
re.search(r"\#Msoffice|#vlc|#Notepad", line)
This regex will return the line which has #Msoffice OR #vlc OR #Notepad.
I tried defining a single variable using re.escape and that worked absolutely fine. However, I have tried many combinations using | and , (pipe and comma) without success.
Is there any way I can put #Msoffice, #vlc and #Notepad in separate variables so that I can change them later?
Thanks in advance!!
If I understood you correctly, you'd like to insert variables into your regex.
In your example you use a raw string, r' ', which keeps backslashes literal and makes the regex more readable. If you use an f-string, f' ', instead, you can insert variables with {your_var} and construct the regex as you like:
var1 = '#Msoffice'
var2 = '#vlc'
var3 = '#Notepad'
re.search(f'{var1}|{var2}|{var3}', line)
The most annoying issue is that, without the r prefix, every backslash in the pattern has to be doubled: for example, \b must be written as \\b.
Hope this helps.
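If the number of keywords may change, one option is to keep them in a list and build the alternation with "|".join(), passing each keyword through re.escape (which the question already mentions) so that any special characters are matched literally. A minimal sketch, assuming the sample URLs from the question; the variable names are illustrative:

```python
import re

# Keywords kept in ordinary variables so they can be changed later.
keywords = ["#Msoffice", "#vlc", "#Notepad"]

# re.escape makes each keyword match literally; "|" joins them as alternatives.
pattern = re.compile("|".join(re.escape(word) for word in keywords))

lines = [
    "<http://www.somesite.com/software/sub/a1#Msoffice>",
    "<http://www.somesite.com/software/sub1/a1#vlc>",
    "<http://www.somesite.com/software/sub2/a2#dell>",
    "<http://www.somesite.com/software/sub3/a3#Notepad>",
]

matches = [line for line in lines if pattern.search(line)]
for line in matches:
    print(line)
```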
import re
lines = ["<http://www.somesite.com/software/sub/a1#Msoffice>",
"<http://www.somesite.com/software/sub1/a1#vlc>",
"<http://www.somesite.com/software/sub2/a2#dell>",
"<http://www.somesite.com/software/sub3/a3#Notepad>"]
for line in lines:
    if re.search(r'\b(?:\#{}|\#{}|\#{})\b'.format('Msoffice', 'vlc', 'Notepad'), line):
        print(line)
Output :
<http://www.somesite.com/software/sub/a1#Msoffice>
<http://www.somesite.com/software/sub1/a1#vlc>
<http://www.somesite.com/software/sub3/a3#Notepad>

How to substitute predicate value by a variable using LXML find() with Python 3.6

I am new to Python coding. I am able to create the output XML file. I want to use a variable which holds a string value and pass it to 'predicate' of 'find()'. Is this achievable? How to make this work?
I am using LXML package with Python 3.6. Below is my code. Area of problem is commented at the end of the code.
import lxml.etree as ET
# Create root element
root = ET.Element("Base", attrib={'Name': 'My Base Node'})
# Create first child element
FirstElement = ET.SubElement(root, "FirstNode", attrib={'Name': 'My First Node', 'Comment':'Hello'})
# Create second child element
SecondElement = ET.SubElement(FirstElement, "SecondNode", attrib={'Name': 'My Second Node', 'Comment': 'World'})
# Create XML file
XML_data_as_string = ET.tostring(root, encoding='utf8')
with open("TestFile.xml", "wb") as f:
    f.write(XML_data_as_string)
# Variable to substitute in second portion of predicate
NewValue = "My Second Node"
# #### AREA OF PROBLEM ###
# Question. How to pass variable 'NewValue' in the predicate?
# Gives "SyntaxError: invalid predicate"
x = root.find("./FirstNode/SecondNode[@Name={subs}]".format(subs=NewValue))
# I commented above line and reexecuted the code with this below line
# enabled. It gave "ValueError: empty namespace prefix must be passed as None,
# not the empty string"
x = root.find("./FirstNode/SecondNode[@Name=%s]", NewValue)
As Daniel Haley said, you're missing single quotes around the value in @Name={subs}.
The following line works for me:
x = root.find("./FirstNode/SecondNode[@Name='{subs}']".format(subs=NewValue))
Since you use Python 3.6, you can utilize f-strings:
x = root.find(f"./FirstNode/SecondNode[@Name='{NewValue}']")
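For a quick check without writing a file, here is a self-contained sketch of the quoted predicate. Note the assumption: it uses xml.etree.ElementTree from the standard library instead of lxml, since find() accepts the same [@attrib='value'] predicate there; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Build the same small tree as in the question.
root = ET.Element("Base", attrib={"Name": "My Base Node"})
first = ET.SubElement(root, "FirstNode",
                      attrib={"Name": "My First Node", "Comment": "Hello"})
second = ET.SubElement(first, "SecondNode",
                       attrib={"Name": "My Second Node", "Comment": "World"})

new_value = "My Second Node"

# The attribute value must be quoted inside the predicate.
x = root.find(f"./FirstNode/SecondNode[@Name='{new_value}']")
print(x.get("Comment"))
```

This simple substitution breaks as soon as the value itself contains a single quote.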
The "proper" way to solve this would be to use XPath variables, which are not supported by find() (and consequently, aren't supported by xml.etree from the standard library either) but are supported by xpath().
NewValue = "AJL's Second Node" # Uh oh, that apostrophe is going to break something!!
x_list = root.xpath("./FirstNode/SecondNode[@Name=$subs]", subs=NewValue)
x = x_list[0]
This avoids any sort of issue you might otherwise run into with quoting and escaping.
The main caveat of this method is namespace support, since xpath() doesn't accept the curly-brace {namespace}tag syntax that find() uses:
x = root.find("./{foobar.xsd}FirstNode")
# Braces are doubled to avoid conflicting with `.format()`
x = root.find("./{{foobar.xsd}}FirstNode/SecondNode[@Name='{subs}']".format(subs=NewValue))
Instead, you must specify those in a separate dict:
ns_list = {'hello':'foobar.xsd'}
x_list = root.xpath("./hello:FirstNode/SecondNode[@Name=$subs]", namespaces=ns_list, subs=NewValue)
x = x_list[0]
