parsing words in a document using specific delimiters

I have a document that I'm parsing words from, but I want to treat anything that is not a-z, A-Z, 0-9, or an apostrophe as white space. How could I do this, given that I'm currently reading words with the following code:
ifstream file;
file.open(filePath);
while (file >> word) {
    listOfWords.push_back(word); // I want to make sure only words with the stated
                                 // range of characters exist in my list.
}
So, for example, the word hor.se would be two elements in my list, "hor" and "se".

Create a list of "whitespace characters", then check each character as you encounter it: if it is in the list, the current word has ended and the next non-whitespace character starts a new one. This example is written in Python, but the concept is the same in C++.
def get_words(whitespace_chars, string):
    words = []
    current_word = ""
    for char in string:
        if char in whitespace_chars:
            # A delimiter ends the current word (if one has started);
            # the delimiter itself is never added to a word.
            if current_word != "":
                words.append(current_word)
                current_word = ""
        else:
            # Add the current letter to the current word.
            current_word += char
    # If the last character isn't whitespace, the last word hasn't been added yet, so add it here.
    if current_word != "":
        words.append(current_word)
    return words
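For the exact rule in the question, listing every delimiter is impractical, because "anything that is not a-z, A-Z, 0-9, or an apostrophe" covers almost every character. A minimal sketch, assuming ASCII input, flips the check and tests each character against the small set of allowed word characters instead. The name split_on_non_word_chars is hypothetical, and like the answer above it's written in Python; the same idea carries over to C++ by testing each character as you read it.

```python
import string

# Hypothetical helper: a-z, A-Z, 0-9 and the apostrophe count as word characters;
# every other character is treated as a delimiter (white space).
WORD_CHARS = set(string.ascii_letters + string.digits + "'")

def split_on_non_word_chars(text):
    words = []
    current = []
    for ch in text:
        if ch in WORD_CHARS:
            current.append(ch)
        elif current:
            # First delimiter after a word: close that word.
            words.append("".join(current))
            current = []
    # Flush the final word if the text doesn't end with a delimiter.
    if current:
        words.append("".join(current))
    return words

print(split_on_non_word_chars("the hor.se isn't here!"))
```

This prints ['the', 'hor', 'se', "isn't", 'here'], so hor.se becomes two words. Under the ASCII assumption, accented letters such as é are treated as delimiters too.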

Related

Separating a string with large letters into words that begin with the same letters

Suppose you have the string "TodayIsABeautifulDay". How can we separate it in Python into words like this: ["Today", "Is", "A", "Beautiful", "Day"]?

First, create a list 'words' whose first entry holds the first letter of 'word'.
Then, in a for loop, check whether the current character is uppercase: if it is, begin a new word; otherwise, append it to the current word.
def split_words(word):
    if not word:
        return []
    # Each inner list collects the characters of one word; start with the first letter.
    words = [[word[0]]]
    for char in word[1:]:
        if char.isupper():
            # An uppercase letter starts a new word.
            words.append([char])
        else:
            words[-1].append(char)
    return [''.join(chars) for chars in words]
You can use this function:
word = "TodayIsABeautifulDay"
print(split_words(word))
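If a regular expression is acceptable, a shorter alternative (not from the answer above) uses re.findall with a pattern that matches one uppercase letter followed by any run of non-uppercase characters. This sketch assumes the string starts with an uppercase letter: any leading lowercase characters match no word and are silently dropped. The function name split_on_capitals is hypothetical.

```python
import re

def split_on_capitals(s):
    # Each match is one uppercase letter followed by any run of non-uppercase characters,
    # so consecutive capitals like "AB" become separate words "A" and "B...".
    return re.findall(r'[A-Z][^A-Z]*', s)

print(split_on_capitals("TodayIsABeautifulDay"))
```

This prints ['Today', 'Is', 'A', 'Beautiful', 'Day'].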

need to use a 'for loop' for this one. The user has to enter a sentence and any spaces must be replaced with "%"

The input:
sentence = input("Please enter a sentence:")
The for loop (incorrect here):
for i in sentence:
    print(sentence)
space_loc = sentence.index(" ")
for c in sentence:
    print(space_loc)
for b in range(space_loc):
    print("%")
I'm confused about how to get the answer out.

You can use string concatenation and slicing here.
sentence = input()
After taking the input, store the length of the string:
length = len(sentence)
Then iterate through every character in the string, and when you find a " ", split the string into two halves with slicing, one on each side of the space, and join them back together with a "%":
for i in range(length):
    if sentence[i] == " ":
        sentence = sentence[:i] + "%" + sentence[i+1:]
print(sentence)
Here, sentence[:i] is the part of the string before the space and sentence[i+1:] is the part after it. Because each space is swapped for exactly one "%", the length of the string never changes, so the indices computed from the original length stay valid.
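Since the question requires a for loop, another option is to build a new string one character at a time instead of re-slicing the original. This is a minimal sketch; the function name replace_spaces is hypothetical.

```python
def replace_spaces(sentence):
    result = ""
    for ch in sentence:
        # Swap each space for "%" and keep every other character unchanged.
        if ch == " ":
            result += "%"
        else:
            result += ch
    return result

print(replace_spaces("Hello there coders!"))
```

This prints Hello%there%coders!. Every space is replaced individually, so two spaces in a row become "%%".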

One way of solving your query:
Code
sentence = input("Please enter a sentence:")
ls=sentence.split() #Creating a list of words present in sentence
new_sentence='%'.join(ls) #Joining the list with '%'
print(new_sentence)
Output
Please enter a sentence:Hello there coders!
Hello%there%coders!
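Two details of this approach are worth checking against the assignment: split() with no argument treats any run of whitespace as a single separator, and the built-in str.replace can do the whole job in one call if a loop isn't strictly required. The comparison below uses a made-up input with a double space to show how the two differ.

```python
sentence = "Hello  there coders!"  # note the double space after "Hello"

# str.replace swaps every single space, so the double space becomes "%%".
replaced = sentence.replace(" ", "%")

# split() collapses runs of whitespace, so only one "%" appears between words.
joined = "%".join(sentence.split())

print(replaced)  # Hello%%there%coders!
print(joined)    # Hello%there%coders!
```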
EDIT
I do not understand how exactly you want to use the for loop here. If you just want to include a for loop (no restrictions), then you can do this:
Code
ls = []
a = 0
sentence = input("Please enter a sentence:")
# This loop finds the words in the sentence and stores them in a list.
# Words are delimited by spaces; each space is replaced with '%'.
for i in range(0, len(sentence)):
    if sentence[i] == ' ':
        ls.append(sentence[a:i])
        a = i
        ls.append('%')
ls.append(sentence[a:])  # Save the last word
ls1 = []
for i in ls:  # Remove any spaces left inside the list items
    j = i.replace(' ', '')
    ls1.append(j)
print(''.join(ls1))  # Display the final output
Again, your question is very open-ended, and this is just one way of using a for loop to get the desired result!

Python 2.7 - remove special characters from a string and camelCasing it

Input:
to-camel-case
to_camel_case
Desired output:
toCamelCase
My code:
def to_camel_case(text):
    lst = ['_', '-']
    if text is None:
        return ''
    else:
        for char in text:
            if text in lst:
                text = text.replace(char, '').title()
            return text
Issues:
1) The input could be an empty string; the above code does not return '' but None.
2) I am not sure that the title() method can help me get the desired output (only the first letter of each word after the first capitalized, where words are separated by '-' or '_').
I prefer not to use regex if possible.

A better way to do this is to split the text on the separators and rebuild it with a generator expression. Your loop has a few problems: "if text in lst" compares the whole string, not char, against the separators, so the replacement never happens for normal input; title() would capitalize the first word too; and in a character-by-character loop it's hard to capitalize the letter after a removed _ or - because you don't have any context about what came before or after.
def to_camel_case(text):
    # Treat None and the empty string alike, as the question's code intended.
    if not text:
        return ''
    # Start by converting - to _, then split on _ (split also removes the separators).
    parts = text.replace('-', '_').split('_')
    # Break the list into two parts: the first word is kept as is,
    # and every following word gets its first letter capitalized.
    first = parts[0]
    rest = parts[1:]
    return first + ''.join(word.capitalize() for word in rest)
And our result:
print(to_camel_case("hello-world"))
This prints helloWorld (the parentheses work in both Python 2.7 and Python 3).
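This method is quite flexible and can even handle inputs like "hello_world-how_are--you--": it returns helloWorldHowAreYou, because repeated or trailing separators produce empty strings, which capitalize to empty strings. That case could be tricky with regex if you're new to it.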
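A character-by-character loop can also work if it remembers whether the previous character was a separator. The sketch below (hypothetical name to_camel_case_loop) keeps a boolean flag for that and runs on both Python 2.7 and 3. Unlike capitalize(), it only upper-cases the first letter after a separator and leaves the rest of each word untouched.

```python
def to_camel_case_loop(text):
    separators = {'-', '_'}
    result = []
    capitalize_next = False
    # "text or ''" lets None behave like an empty string.
    for char in text or '':
        if char in separators:
            # Drop the separator and remember to upper-case the next letter.
            capitalize_next = True
        elif capitalize_next:
            result.append(char.upper())
            capitalize_next = False
        else:
            result.append(char)
    return ''.join(result)

print(to_camel_case_loop("to-camel-case"))
print(to_camel_case_loop("to_camel_case"))
```

Both calls print toCamelCase, and an empty string or None returns ''.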

How to rewrite code

My comment is in the code below.
My program censors words. It works for both one word and many words, but I had trouble making it work for many words: it would print out the sentence with the spaces censored too. I found code that makes it work, but I don't understand it.
sentence = input("Enter a sentence:")
word = input("Enter a word to replace:")
words = word
def censorWord(sentence, word):
    # I would like to rewrite this code in a way I can understand and read more clearly.
    return " ".join(["-"*len(item) if item in word else item for item in sentence.split()])
def censorWords(sentence, words):
    words1 = words.split()
    for w in words1:
        if w in sentence:
            return replaceWord(sentence, word)
print(censorWords(sentence, words))

def censorWord(sentence, word):
    result = []  # list to store the words of the new sentence
    eachword = sentence.split()  # split the sentence into words, each stored as a separate list element
    for item in eachword:  # iterate over every word in the list
        if item == word:  # if the current word matches the given word, replace it with - repeated for the word's length
            item = "-"*len(word)
        result.append(item)  # add the word to the list
    return " ".join(result)  # join all the words in the list with a space in between

You can rewrite:
s = " ".join(["-" * len(item) if item in word else item for item in sentence.split()])
Into:
arr = []
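for item in sentence.split():
    if item in word:
        arr.append("-" * len(item))
    else:
        arr.append(item)
s = " ".join(arr)
It splits the sentence on whitespace. Then, if the current item is in word, it gets replaced with its own length in hyphens. Note that when word is a single string, "item in word" is a substring test, so with word = "bad" the sentence word "a" would also be censored; use item == word, or pass a list or set of words, for exact matching.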

You seem to be a bit confused: censorWord() already censors every matching word in the sentence, while censorWords() looks like it is trying to do the same thing but returns in the middle of processing (and calls replaceWord(), which isn't defined anywhere). Just looking at censorWord():
More descriptive variable names and breaking the one-liner down would probably make it clearer, e.g.:
def redact(word):
    return '-'*len(word)
def censorWord(sentence, censored_words):
    words = sentence.split()
    return " ".join([redact(word) if word in censored_words else word for word in words])
You can always turn this into a for loop, but list comprehensions are a common part of Python and you should get comfortable with them:
def censorWord(sentence, censored_words):
    words = sentence.split()
    clean_sentence = []
    for word in words:
        if word in censored_words:
            clean_sentence.append(redact(word))
        else:
            clean_sentence.append(word)
    return " ".join(clean_sentence)
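For the multi-word case from the question, a sketch that splits the space-separated input into a set gives exact, whole-word matching and avoids the substring behaviour of "in" on a plain string. The names censor_words and words_to_censor are hypothetical; it assumes words are separated by whitespace and does not strip punctuation, so "bad!" would not match "bad".

```python
def redact(word):
    # Replace every character of the word with a hyphen.
    return '-' * len(word)

def censor_words(sentence, words_to_censor):
    # A set gives exact, whole-word membership tests ("a" won't match "bad").
    censored = set(words_to_censor.split())
    return " ".join(redact(w) if w in censored else w for w in sentence.split())

print(censor_words("that was a bad day", "bad day"))
```

This prints that was a --- ---.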

Removing a string that startswith a specific char Python

text='I miss Wonderland #feeling sad #omg'
prefix=('#','#')
for line in text:
    if line.startswith(prefix):
        text=text.replace(line,'')
print(text)
The output should be:
'I miss Wonderland'
But my output is the original string with the prefix removed

So it seems that you don't actually want to remove the whole "string" or "line", but rather individual words? Then you'll want to split your string into words:
words = text.split(' ')
And now iterate through each element in words, checking whether it starts with the prefix. Lastly, combine the kept elements back into one string:
result = ""
for word in words:
    if not word.startswith(prefix):
        result += word + " "
result = result.rstrip()  # drop the trailing space added after the last kept word

"for line in text" in your case iterates over each character in the text, not each word. So when it reaches, e.g., the '#' in '#feeling', it removes every '#', but 'feeling' remains because none of the other characters in the string start with (or are) '#'. You can confirm that your code goes character by character with:
for line in text:
    print(line)
Try the following instead, which does the filtering in a single line:
text = 'I miss Wonderland #feeling sad #omg'
prefix = ('#','#')
words = text.split() # Split the text into a list of its individual words.
# Join only those words that don't start with prefix
print(' '.join([word for word in words if not word.startswith(prefix)]))
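Filtering word by word keeps "sad", because only words that begin with '#' are dropped, so it prints I miss Wonderland sad rather than the expected I miss Wonderland. If the intent is to cut everything from the first '#' onward, splitting once on '#' does that. The sketch below compares both readings; which one the asker wants is an assumption.

```python
text = 'I miss Wonderland #feeling sad #omg'

# Reading 1: drop only the words that start with '#'.
without_hashtags = ' '.join(w for w in text.split() if not w.startswith('#'))

# Reading 2: keep only the text before the first '#', then trim the trailing space.
before_first_hash = text.split('#', 1)[0].rstrip()

print(without_hashtags)   # I miss Wonderland sad
print(before_first_hash)  # I miss Wonderland
```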
