Removing whitespace after splitting a string in python - string

1) Prompt the user for a string that contains two strings separated by a comma.
2) Report an error if the input string does not contain a comma. Continue to prompt until a valid string is entered. Note: If the input contains a comma, then assume that the input also contains two strings.
3) Using string splitting, extract the two words from the input string and then remove any spaces. Output the two words.
4) Using a loop, extend the program to handle multiple lines of input. Continue until the user enters q to quit.
I wrote a program following these instructions, but I cannot work out how to remove extra spaces that may be attached to the output words. For example, if you enter "Billy, Bob" it works fine, but if you enter "Billy,Bob" you get an IndexError: list index out of range (splitting on ", " finds no separator, so the list has only one element), and if you enter "Billy , Bob", Billy is output with an extra space attached to the string. Here is my code.
usrIn=0
while usrIn!='q':
    usrIn = input("Enter input string: \n")
    if "," in usrIn:
        tokens = usrIn.split(", ")
        print("First word:",tokens[0])
        print("Second word:",tokens[1])
        print('')
        print('')
    else:
        print("Error: No comma in string.")
How do I remove spaces from the outputs so I can just use a usrIn.split(",") ?

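You can use the .strip() method, which removes leading and trailing whitespace (Python strings have no .trim() method).
Split on the comma alone and strip each piece: tokens = [t.strip() for t in usrIn.split(",")]. Alternatively, let a regex absorb the spaces around the comma: re.split(r"\s*,\s*", usrIn.strip()). This needs import re, because str.split() does not accept regular expressions.
The \s matches a whitespace character, and the * quantifier matches zero or more of them.
Hope this helps :)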
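Putting the pieces together, here is a minimal sketch of the whole exercise. The names parse_pair and run are made up for illustration; the sketch assumes, as the exercise states, that any input containing a comma holds exactly two words.

```python
def parse_pair(line):
    """Return the two comma-separated words in line, without surrounding spaces."""
    if "," not in line:
        raise ValueError("Error: No comma in string.")
    # Split on the first comma only, then strip each half
    first, second = line.split(",", 1)
    return first.strip(), second.strip()


def run():
    """Prompt repeatedly until the user enters q (hypothetical driver loop)."""
    while True:
        user_input = input("Enter input string:\n")
        if user_input.strip() == "q":
            break
        try:
            first, second = parse_pair(user_input)
        except ValueError as err:
            print(err)
            continue
        print("First word:", first)
        print("Second word:", second)
        print()


# All three spellings of the input give the same pair
for sample in ("Billy, Bob", "Billy,Bob", "Billy , Bob"):
    print(parse_pair(sample))
```

Call run() to try it interactively. Splitting on "," instead of ", " is what removes the IndexError for "Billy,Bob".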

Related

Remove newlines from a regex matched string

I have a string as below:
Financial strain: No\n?Food insecurity:\nWorry: No\nInability: No\n?Transportation needs:\nMedical: No\nNon-medical: No\nTobacco Use\n?Smoking status: Never Smoker\n?
I want to first match the substring/sentence of interest (I.e. the sentence beginning with "Food insecurity" and ending with "\n?") then remove all the newlines in this sentence apart from the last one i.e. the one before the question mark.
I have been able to match the sentence w/o its last newline and question mark with regex (Food insecurity:).*?(?=\\n\?) but I struggle to remove the first 2 newlines of the matched sentence and return the whole preprocessed string. Any advice?
You could use re.sub with a callback function:
import re

inp = "Financial strain: No\n?Food insecurity:\nWorry: No\nInability: No\n?Transportation needs:\nMedical: No\nNon-medical: No\nTobacco Use\n?Smoking status: Never Smoker\n?"
# Match the whole block, then strip the newlines inside the match only
output = re.sub(r'Food insecurity:\nWorry: No\nInability: No(?=\n\?)', lambda m: m.group().replace('\n', ''), inp)
print(output)
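The pattern above spells out the block's contents literally. If the lines under the heading vary, a more general sketch is to reuse the lazy match with lookahead from the question and add re.DOTALL so that "." can cross newlines. The function name join_block_lines is hypothetical, and the input below is a shortened version of the string from the question.

```python
import re


def join_block_lines(text, header):
    """Remove the newlines inside the block that starts with header."""
    # re.escape protects any regex metacharacters in the header.
    # The lazy .*? stops at the first newline that is followed by "?".
    pattern = re.escape(header) + r".*?(?=\n\?)"
    return re.sub(
        pattern,
        lambda m: m.group().replace("\n", ""),
        text,
        flags=re.DOTALL,
    )


inp = "Financial strain: No\n?Food insecurity:\nWorry: No\nInability: No\n?Transportation needs:\nMedical: No\n?"
output = join_block_lines(inp, "Food insecurity:")
print(output)
```

Only the matched block is rewritten; the newline before each question mark and every other block are left alone.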

How do I search for a substring in a string then find the character before the substring in python

I am making a small project in Python that lets you make notes and then read them by using specific arguments. I attempted to write an if statement that checks whether the string has a comma in it; if it does, my Python file should find the comma, then find the character right before that comma and turn it into an integer, so it can read out the notes the user created in a specific user-defined range.
If that didn't make sense then basically all I am saying is that I want to find out what line/bit of code is causing this to not work and return nothing even though notes.txt has content.
Here is what I have in my python file:
if "," not in no_cs: # no_cs is the string I am searching through
    user_out = int(no_cs[6:len(no_cs) - 1])
    notes = open("notes.txt", "r") # notes.txt is the file that stores all the notes the user makes
    notes_lines = notes.read().split("\n") # this is supposed to split all the notes into a list
    try:
        print(notes_lines[user_out])
    except IndexError:
        print("That line does not exist.")
    notes.close()
elif "," in no_cs:
    user_out_1 = int(no_cs.find(',') - 1)
    user_out_2 = int(no_cs.find(',') + 1)
    notes = open("notes.txt", "r")
    notes_lines = notes.read().split("\n")
    print(notes_lines[user_out_1:user_out_2]) # this is SUPPOSED to list all notes in a specific range but doesn't
    notes.close()
Now here is the notes.txt file:
note
note1
note2
note3
and lastly here is what I am getting in console when I attempt to run the program and type notes(0,2)
>>> notes(0,2)
jeffv : notes(0,2)
[]
A great way to do this is to use Python's str.partition() method. It splits a string at the first occurrence of a separator and returns a tuple of three parts: 0: the text before the separator, 1: the separator itself, 2: the text after the separator:
# The whole string we wish to search.. Let's use a
# Monty Python quote since we are using Python :)
# (note the space before the backslash, otherwise the two halves run together)
whole_string = "We interrupt this program to annoy you and make things \
generally more irritating."
# Here is the first word we wish to split from the entire string
first_split = 'program'
# now we use partition to pick what comes after the first split word
substring_split = whole_string.partition(first_split)[2]
# now we use python to give us the first character after that first split word
first_character = substring_split[0]
# since the above is a space, let's also show the second character so
# that it is less confusing :)
second_character = substring_split[1]
# Output
print("Here is the whole string we wish to split: " + whole_string)
print("Here is the first split word we want to find: " + first_split)
print("Now here is the text that occurred after our split word: " + substring_split)
print("The first character after the substring split is: " + first_character)
print("The second character after the substring split is: " + second_character)
Output:
Here is the whole string we wish to split: We interrupt this program to annoy you and make things generally more irritating.
Here is the first split word we want to find: program
Now here is the text that occurred after our split word:  to annoy you and make things generally more irritating.
The first character after the substring split is:
The second character after the substring split is: t
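Applied to the question itself: str.find(',') returns the position of the comma in the command string, not the number next to it. Assuming no_cs holds "notes(0,2)", the comma is at index 7, so the slice becomes notes_lines[6:8], which is empty for a four-line file. The sketch below uses partition() to pull out the two numbers instead. The helper name parse_range is hypothetical, and treating the end of the range as inclusive is an assumption about the intended behaviour.

```python
def parse_range(command):
    """Parse a command such as 'notes(0,2)' into a (start, stop) pair of ints."""
    # Keep only what sits between the parentheses, e.g. "0,2"
    inside = command.partition("(")[2].rpartition(")")[0]
    before, sep, after = inside.partition(",")
    if not sep:
        raise ValueError("expected two numbers separated by a comma")
    return int(before), int(after)


# Stand-in for the lines read from notes.txt
notes_lines = ["note", "note1", "note2", "note3"]

start, stop = parse_range("notes(0,2)")
# Assumption: the range is meant to include the last index
selected = notes_lines[start:stop + 1]
print(selected)
```

Because the numbers are converted as whole substrings, this also works for multi-digit line numbers such as notes(3,12).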

How do I get the computer to separate a conjoined string into separate items on a list depending on what it detects?

This is a follow up from a question I asked yesterday which I got brilliant responses for but now I have more problems :P
(How do I get python to detect a right brace, and put a space after that?)
Say I have this string that's in a txt document which I make Python read
!0->{100}!1o^{72}->{30}o^{72}->{30}o^{72}->{30}o^{72}->{30}o^{72}->{30}
I want to separate this conjoined string into individual components that can be indexed after detecting a certain symbol.
If it detects !0, that is considered one item.
If it detects ->{100}, that is considered another item in the list.
It separates all of them into different parts until the computer prints out:
!0, ->{100}, !1, o^{72}, ->{30}
From yesterday's code, I tried a plethora of things.
I tried this technique, which separates anything with '}' perfectly but has a hard time separating !0
text = "(->{200}o^{90}->{200}o^{90}->{200}o^{90}!0->{200}!1o^{90})" #this is an example string
my_string = ""
for character in text:
    my_string += character
    if character == "}":
        my_string+= "," #up until this point, Guimonte's code perfectly splits "}"
    elif character == "0": #here is where I tried to get it to detect !0. it splits that, but places ',' on all zeroes
        my_string+= ","
print(my_string)
The output:
(->{20,0,},o^{90,},->{20,0,},o^{90,},->{20,0,},o^{90,},!0,->{20,0,},!1o^{90,},)
I want the output to instead be:
(->{200}, o^{90}, ->{200}, o^{90}, ->{200}, o^{90}, !0, ->{200}, !1, o^{90})
It separates !0 but it also messes with the other symbols.
I'm starting to approach a checkmate scenario. Is there any way I can get it to split !0 and !1 as well as the right brace?
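One possible approach, sketched here under assumptions, is to describe each kind of token with a regular expression and let re.findall() collect them, instead of inserting commas character by character. The three token shapes below ("!" followed by digits, "->{digits}" and "o^{digits}") are inferred from the example strings in the question and may not cover every symbol in the real file.

```python
import re

# Assumed token shapes, taken from the examples in the question:
#   !0, !1        -> "!" followed by digits
#   ->{200}       -> "->" followed by digits in braces
#   o^{90}        -> "o^" followed by digits in braces
TOKEN = re.compile(r"!\d+|->\{\d+\}|o\^\{\d+\}")


def split_tokens(text):
    """Return the tokens found in text, in order; other characters are skipped."""
    return TOKEN.findall(text)


text = "(->{200}o^{90}->{200}o^{90}->{200}o^{90}!0->{200}!1o^{90})"
tokens = split_tokens(text)
print("(" + ", ".join(tokens) + ")")
```

The result is a real list, so individual items can be indexed directly (tokens[6] is "!0" for this input), and the zeros inside the braces are no longer split.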

how can i split a full name to first name and last name in python?

I'm a novice in Python programming and I'm trying to split a full name into first name and last name. Can someone assist me with this? My example file is:
Sarah Simpson
I expect output like this: Sarah,Simpson
You can use the split() function like so:
fullname=" Sarah Simpson"
fullname.split()
which will give you: ['Sarah', 'Simpson']
Building on that, you can do:
first=fullname.split()[0]
last=fullname.split()[-1]
print(first + ',' + last)
which would give you Sarah,Simpson with no spaces
This comes in handy: nameparser 1.0.6 - https://pypi.org/project/nameparser/
>>> from nameparser import HumanName
>>> name = "Sarah Simpson"
>>> name = HumanName(name)
>>> name.last
'Simpson'
>>> name.first
'Sarah'
>>> name.last+', '+name.first
'Simpson, Sarah'
You can try the .split() method, which returns a list of strings after splitting by a separator. In this case the separator is a space character.
First remove leading and trailing spaces using .strip(), then split by the separator.
first_name, last_name=fullname.strip().split()
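Unpacking into exactly two variables raises ValueError when the name has a middle name or consists of a single word. A hedged sketch that tolerates both cases follows; the function name split_full_name is made up, and taking the first and last words as first and last name is a simplifying assumption that will not suit every naming convention.

```python
def split_full_name(fullname):
    """Return (first, last) taken from the first and last words of fullname."""
    parts = fullname.split()  # no argument: splits on any run of whitespace
    if not parts:
        raise ValueError("empty name")
    first = parts[0]
    # A single-word name has no last name; return an empty string for it
    last = parts[-1] if len(parts) > 1 else ""
    return first, last


first, last = split_full_name(" Sarah Simpson")
print(first + "," + last)
```

Calling split() with no argument already ignores leading and trailing whitespace, so a separate strip() is not needed in this form.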
Strings in Python are immutable. Create a new String to get the desired output.
You can use split() method of string class.
name = "Sarah Simpson"
name.split()
split() splits on whitespace by default and also accepts a separator as a parameter. It returns a list:
["Sarah", "Simpson"]
Then just concatenate the strings. For reference, see https://docs.python.org/3.7/library/stdtypes.html?highlight=split#str.split
Output: "Sarah", "Simpson"
name = "Thomas Winter"
LastName = name.split()[1]
(Note the parentheses on the call to split.)
split() creates a list where each element is from your original string, delimited by whitespace. You can now grab the second element using name.split()[1] or the last element using name.split()[-1]
split() is obviously the function to go for;
it can take a parameter or no parameters:
fullname="Sarah Simpson"
ls=fullname.split()
ls=fullname.split(" ") # this splits on a single space character
Optional extra
And if you want the split name to be shown as a string delimited by a comma, you can use join() or replace():
print(",".join(ls)) #outputs Sarah,Simpson
print(fullname.replace(" ",","))
Input: Sarah Simpson => suppose it is a string.
Then, to output: Sarah, Simpson. Do the following:
name_surname = "Sarah Simpson".split(" ")
to_output = name_surname[0] + ", " + name_surname[-1]
print(to_output)
The split method is called on a string and splits it at the separator passed to it as an argument. It returns a list of the substrings that result.
In your case the string is "Sarah Simpson", so when you call split with the argument " " (a single space) the output will be: ["Sarah", "Simpson"].
Now, to combine the names or to access any of them, you can write the name of the list followed by square brackets containing the index of the desired word. For example: name_surname[0] will return "Sarah" since its index is 0 in the list.

regex - Making all letters in a text lowercase using re.sub in python but exclude specific string?

I am writing a script to convert all uppercase letters in a text to lower case using regex, but excluding specific strings/characters such as "TEA", "CHI", "I", "#Begin", "#Language", "ENG", "#Participants", "#Media", "#Transcriber", "#Activities", "SBR", "#Comment" and so on.
The script I have is currently shown below. However, it does not provide the desired outputs. For instance when I input "#Activities: SBR", the output given is "#Activities#activities: sbr#activities: sbrSBR". The intended output is "#Activities: SBR".
I am using Python 3.5.2
Can anyone help to provide some guidance? Thank you.
import os
from itertools import chain
import re
def lowercase_exclude_specific_string(line):
    line = line.strip()
    PATTERN = r'[^TEA|CHI|I|#Begin|#Language|ENG|#Participants|#Media|#Transcriber|#Activities|SBR|#Comment]'
    filtered_line = re.sub(PATTERN, line.lower(), line)
    return filtered_line
First, let's see why you're getting the wrong output.
For instance when I input "#Activities: SBR", the output given is
"#Activities#activities: sbr#activities: sbrSBR".
This is because your code
PATTERN = r'[^TEA|CHI|I|#Begin|#Language|ENG|#Participants|#Media|#Transcriber|#Activities|SBR|#Comment]'
filtered_line = re.sub(PATTERN, line.lower(), line)
is doing negated character class matching: [^...] matches any single character that is not listed inside the brackets (inside a character class the | is just another listed character, not alternation), and each match is replaced with line.lower() (which is "#activities: sbr").
The code will match ":" and " " (whitespace) and replace both with "#activities: sbr", giving you the result "#Activities#activities: sbr#activities: sbrSBR".
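To see this for yourself, a short check with re.findall() lists exactly which characters the original pattern matches. The variable names matched and result are only for illustration.

```python
import re

# The pattern from the question: a negated character class, not a list of words
PATTERN = r'[^TEA|CHI|I|#Begin|#Language|ENG|#Participants|#Media|#Transcriber|#Activities|SBR|#Comment]'
line = "#Activities: SBR"

# Every single character that is NOT one of the characters listed in the class
matched = re.findall(PATTERN, line)
print(matched)

# Each of those characters is replaced by the whole lowercased line
result = re.sub(PATTERN, line.lower(), line)
print(result)
```

Only the colon and the space are matched, which is why the lowercased line appears twice in the output.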
Now to fix that code. Unfortunately, there is no direct way to negate words in a line and apply substitution on the other words on that same line. Instead, you can split the line first into individual words, then apply re.sub on it using your PATTERN. Also, instead of a negated character class, you should use a negative lookahead:
(?!...)
Negative lookahead assertion. This is the opposite of the positive assertion; it succeeds if the contained expression doesn’t match at the current position in the string.
Here's the code I got:
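def lowercase_exclude_specific_string(line):
    # Matches a whole word only if it does not start with an excluded string
    PATTERN = r"^(?!TEA|CHI|I|#Begin|#Language|ENG|#Participants|#Media|#Transcriber|#Activities|SBR|#Comment).*$"
    line = line.strip()
    words = re.split(r"\s+", line)  # raw string avoids an invalid-escape warning
    result = []
    for word in words:
        # A callable replacement keeps backslashes in the word from being read as escapes
        lword = re.sub(PATTERN, lambda m: m.group().lower(), word)
        result.append(lword)
    return " ".join(result)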
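The re.sub only matches a word that does not start with one of the excluded strings, and replaces it with its lowercase value. If the word starts with an excluded string, the pattern does not match and re.sub returns the word unchanged. Because the lookahead only checks the start of the word, "#Activities:" is kept as is, but so is any word that merely begins with an excluded string, such as "TEAM".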
Each word is then stored in a list, then joined later to form the line back.
Samples:
print(lowercase_exclude_specific_string("#Activities: SBR"))
print(lowercase_exclude_specific_string("#Activities: SOME OTHER TEXT SBR"))
print(lowercase_exclude_specific_string("Begin ABCDEF #Media #Comment XXXX"))
print(lowercase_exclude_specific_string("#Begin AT THE BEGINNING."))
print(lowercase_exclude_specific_string("PLACE #Begin AT THE MIDDLE."))
print(lowercase_exclude_specific_string("I HOPe thIS heLPS."))
#Activities: SBR
#Activities: some other text SBR
begin abcdef #Media #Comment xxxx
#Begin at the beginning.
place #Begin at the middle.
I hope this helps.
EDIT:
As mentioned in the comments, apparently there is a tab between : and the next character. Since the code splits the string on \s+ and rejoins the words with single spaces, the tab is not preserved, but it can be restored by replacing ": " with ":\t" in the final result.
    return " ".join(result).replace(": ", ":\t")
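An alternative sketch that keeps the original whitespace, tabs included, is to substitute token by token with a callback and compare each token against a set of excluded strings. This is a different technique from the lookahead above, not the answer's method. The names EXCLUDED and lowercase_except are hypothetical, and ignoring trailing punctuation during the comparison is an assumption about what should count as the same word.

```python
import re

# Strings that must keep their original case (taken from the question)
EXCLUDED = {
    "TEA", "CHI", "I", "#Begin", "#Language", "ENG", "#Participants",
    "#Media", "#Transcriber", "#Activities", "SBR", "#Comment",
}


def lowercase_except(line, excluded=EXCLUDED):
    """Lowercase every whitespace-delimited token unless it is in excluded."""
    def repl(match):
        token = match.group()
        # Assumption: trailing punctuation is ignored when comparing
        core = token.rstrip(":.,;!?")
        return token if core in excluded else token.lower()

    # \S+ matches each run of non-whitespace, so spaces and tabs are untouched
    return re.sub(r"\S+", repl, line)


print(lowercase_except("#Activities:\tSBR"))
print(lowercase_except("I HOPe thIS heLPS."))
```

Because the comparison is on whole tokens, a word such as "INSIDE" is lowercased even though it starts with the excluded "I".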
