How do I remove the unwanted parts of the string and return a list of the remaining lines? This is what I have written. Thanks in advance!
def get_poem_lines(poem):
    r""" (str) -> list of str
    Return the non-blank, non-empty lines of poem, with whitespace removed
    from the beginning and end of each line.
    >>> get_poem_lines('The first line leads off,\n\n\n'
    ... + 'With a gap before the next.\nThen the poem ends.\n')
    ['The first line leads off,', 'With a gap before the next.', 'Then the poem ends.']
    """
    list=[]
    for line in poem:
        if line == '\n' and line == '+':
            poem.remove(line)
        s = poem.remove(line)
        for a in s:
            list.append(a)
    return list
split and strip might be what you need:
s = 'The first line leads off,\n\n\n With a gap before the next.\nThen the poem ends.\n'
# filter on the stripped line, so whitespace-only lines are dropped as well
print([line.strip() for line in s.split("\n") if line.strip()])
['The first line leads off,', 'With a gap before the next.', 'Then the poem ends.']
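Not sure where the + fits in. In the doctest it only concatenates the two string literals, so it never appears in poem itself; if it is part of your real input, either strip it or remove it with str.replace. Also, avoid using list as a variable name, because it shadows Python's built-in list.
Note too that for line in poem iterates over single characters, not lines, and the condition line == '\n' and line == '+' can never be true, since one character cannot equal both.
Lastly, strings have no remove method. You can use .replace, but since strings are immutable you need to reassign poem to the return value of replace, i.e. poem = poem.replace("+", "").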
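Putting those points together, a minimal sketch of get_poem_lines could look like this. It assumes poem is an ordinary str, keeps the docstring from the question, and uses str.splitlines() instead of split("\n") (my swap, so that '\r\n' line endings are handled too):

```python
def get_poem_lines(poem):
    r""" (str) -> list of str
    Return the non-blank, non-empty lines of poem, with whitespace removed
    from the beginning and end of each line.
    >>> get_poem_lines('The first line leads off,\n\n\n'
    ... + 'With a gap before the next.\nThen the poem ends.\n')
    ['The first line leads off,', 'With a gap before the next.', 'Then the poem ends.']
    """
    lines = []  # avoid the name "list", which would shadow the built-in
    for line in poem.splitlines():  # one item per line, newline removed
        stripped = line.strip()  # remove leading/trailing whitespace
        if stripped:  # skip empty and whitespace-only lines
            lines.append(stripped)
    return lines
```

You can check it against the docstring example with python -m doctest yourfile.py (yourfile.py being a placeholder for wherever you saved it).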
You can read all non-empty lines from a file like this:
list_m = [line for line in file if line not in ("\n", "\r\n")]
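Without looking at your input sample, I am assuming that you simply want the trailing spaces removed. In that case, note that str.replace does not understand regular expressions, so use re.sub for the lookahead pattern:
import re
for x in range(len(list_m)):
    # remove runs of spaces that sit right before a newline
    list_m[x] = re.sub(r" +(?=\n)", "", list_m[x])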
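If the goal is only to drop blank lines and the whitespace around each line, a regex may not be needed at all, because str.strip() already removes spaces, tabs, '\n' and '\r'. A minimal sketch, assuming file is any iterable of text lines; io.StringIO stands in for a real file here, and read_nonblank_lines is a hypothetical name:

```python
import io


def read_nonblank_lines(file):
    """Return the lines of an open text file with surrounding whitespace
    (including the line ending) removed, skipping blank lines."""
    # Iterating over a file object yields one line at a time, newline included;
    # strip() removes it along with any other leading/trailing whitespace.
    return [line.strip() for line in file if line.strip()]


# io.StringIO behaves like a file opened with open(path) for this purpose
sample = io.StringIO("first line   \n\n  \nsecond line\r\n")
print(read_nonblank_lines(sample))  # ['first line', 'second line']
```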
Given a string, how do I move part of it onto a new line without moving the rest of the line or characters?
Example input (the words 'This' and 'this' should move to the next line):
This and this word should go in the next line
Output:
and word should go in the next line
This this
This is just an example of the output I want; the words themselves may differ. To be more clear, say I have some string elements in a list, and I have to move the second and third word of every element to a new line while printing the rest of the line as is. I've tried using \n and a for loop, but that also moves the rest of the string to a new line.
['This and this', 'word should go', 'in the next']
Output:
This word in
and this should go the next
So the 2nd and 3rd words of the elements are moved without affecting the rest of the line. Is it possible to do this without much complication? I'm aware of the format method, but I don't know how to use it in this situation.
For your first example, in case you don't know the order of the target words in advance, I would use a dictionary to store the indices of the found words. Then you can sort those to put the found words in the second line in the same order as they appeared in the text:
targets = ['this', 'This']
source = 'This and this word should go in the next line.'
# Map each target's position in the text to the target itself
target_ixs = {source.find(target): target for target in targets}
line2 = ' '.join([target_ixs[i] for i in sorted(target_ixs)])
line1 = source
for target in targets:
    line1 = line1.replace(target, '')
# Removing words leaves double spaces behind; collapse them and trim the start
line1 = line1.replace('  ', ' ').lstrip()
result = line1 + '\n' + line2
print(result)
and word should go in the next line.
This this
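One caveat with str.replace is that it also removes matches inside longer words ('thistle'.replace('this', '') gives 'tle'). A minimal sketch that compares whole words instead, assuming words are separated by whitespace and punctuation stays attached to them (so 'this,' would not match); move_words_down is a hypothetical name. Because it walks the words in order, the moved words keep their original order without needing the dictionary of indices:

```python
def move_words_down(sentence, targets):
    """Move every word that appears in targets to a second line,
    keeping the original word order on both lines."""
    targets = set(targets)
    kept, moved = [], []
    for word in sentence.split():
        # Whole-word comparison, so "thistle" is never mistaken for "this"
        (moved if word in targets else kept).append(word)
    return ' '.join(kept) + '\n' + ' '.join(moved)


print(move_words_down('This and this word should go in the next line.', ['this', 'This']))
```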
Your second example is easier, because you already know which parts of the strings to put in the second line, so you just need to split each string into a list of words and select from those:
source = ['This and this', 'word should go', 'in the next']
source_lists = [s.split() for s in source]
line1 = ' '.join([source_list[0] for source_list in source_lists])
line2 = ' '.join([' '.join(source_list[1:]) for source_list in source_lists])
result = line1 + '\n' + line2
print(result)
This word in
and this should go the next
You can probably do quite a bit without much complication using the regular expression library and some Python language features. That said, it depends on how complex the rules are for deciding which words go where. Typically, you start with a string and "tokenize" it into its constituent words. See the code example below:
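import re
sentence = "This and this word should go in the next line"
all_words = re.split(r'\W+', sentence)
# \b limits matches to whole words, so "thistle" would not match
matched = re.findall(r"\bthis\b", sentence, re.IGNORECASE)
matched_words = " ".join(matched)
# Test membership against the list of matches, not the joined string:
# a substring test would also drop short words such as "is"
unmatched_words = " ".join([word for word in all_words if word not in matched])
print(f"{unmatched_words}\n{matched_words}")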
and word should go in the next line
This this
Final Thoughts:
I am by no means a regex ninja, so there may be even cleverer things that can be done with regex patterns and functions alone. Hopefully this gives you some food for thought, at least.
Got it:
data = ['This and this', 'word should go', 'in the next']
first_line = []
second_line = []
for item in data:
    item = item.split(' ')
    first_word = item[0]
    item.remove(first_word)
    others = " ".join(item)
    first_line.append(first_word)
    second_line.append(others)
print(" ".join(first_line) + "\n" + " ".join(second_line))
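The same idea can be written with extended unpacking (first, *rest = ...), which avoids the separate remove() call. This is a stylistic alternative, and it assumes every item contains at least one word (unpacking an empty split raises ValueError):

```python
data = ['This and this', 'word should go', 'in the next']
first_line = []
second_line = []
for item in data:
    # Extended unpacking: first gets the first word, rest gets a list of the others
    first, *rest = item.split()
    first_line.append(first)
    second_line.append(' '.join(rest))
print(' '.join(first_line) + '\n' + ' '.join(second_line))
```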
My Solution:
input_data = ['This and this', 'word should go ok', 'this next']
I've slightly altered your test string to better test the code.
# Example 1
# Print all words in input_data, moving any word matching the
# string "this" (match is case insensitive) to the next line.
print('Example 1')
lines = ([], [])
for words in input_data:
    for word in words.split():
        lines[word.lower() == 'this'].append(word)
result = ' '.join(lines[0]) + '\n' + ' '.join(lines[1])
print(result)
The code in Example 1 sorts each word into one of the two lists in the tuple lines. The key part is the boolean expression that performs the string comparison: False indexes lines[0] and True indexes lines[1].
# Example 2
# Print all words in input_data, moving the second and third
# word in any string to the next line.
from itertools import count
print('\nExample 2')
lines = ([], [])
for words in input_data:
    for q in zip(count(), words.split()):
        lines[q[0] in (1, 2)].append(q[1])
result = ' '.join(lines[0]) + '\n' + ' '.join(lines[1])
print(result)
The next solution is basically the same as the first. I zip each word to an integer so you know the word's position when you get to the boolean expression which, again, sorts the words into their appropriate list in lines.
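As a small variation, the built-in enumerate() does the same pairing as zip(count(), ...) without the itertools import, and tuple unpacking replaces q[0] and q[1] with readable names. A sketch using the same input_data:

```python
input_data = ['This and this', 'word should go ok', 'this next']
lines = ([], [])
for words in input_data:
    # enumerate() yields (position, word) pairs, just like zip(count(), ...)
    for position, word in enumerate(words.split()):
        # True (index 1) for the 2nd and 3rd word, False (index 0) otherwise
        lines[position in (1, 2)].append(word)
print(' '.join(lines[0]) + '\n' + ' '.join(lines[1]))
```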
As you can see, this solution is fairly flexible and can be adjusted to fit a number of scenarios.
Good luck, and I hope this helped!
Input:
to-camel-case
to_camel_case
Desired output:
toCamelCase
My code:
def to_camel_case(text):
    lst =['_', '-']
    if text is None:
        return ''
    else:
        for char in text:
            if text in lst:
                text = text.replace(char, '').title()
            return text
Issues:
1) The input could be an empty string; in that case the above code returns None instead of '';
2) I am not sure the title() method can help me get the desired output (only the first letter of each word separated by '-' or '_' capitalized, except for the first word).
I prefer not to use regex if possible.
A better way to do this is to split the string on the separators and join the pieces back together with a generator expression. Your loop has two problems: if text in lst compares the whole string against the list instead of char, so it is only true when the entire input is a single '_' or '-'; and it's hard to capitalize the next letter after replacing a _ or - because you don't have any context about what came before or after.
def to_camel_case(text):
    # Treat None and the empty string the same way
    if not text:
        return ""
    # Split also removes the separators:
    # start by converting - to _, then split on _
    words = text.replace('-', '_').split('_')
    # Break the list into two parts: the first word stays as-is
    first = words[0]
    rest = words[1:]
    # Capitalize every later word and glue everything back together
    return first + ''.join(word.capitalize() for word in rest)
And our result:
print(to_camel_case("hello-world"))
Gives helloWorld
This method is quite flexible, and can even handle cases like "hello_world-how_are--you--", which could be difficult using regex if you're new to it.
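One thing to be aware of: str.capitalize() also lowercases the rest of each word, so "get-HTTP-response" becomes "getHttpResponse". If you would rather leave the remaining letters untouched, a minimal variant (the name to_camel_case_preserving is hypothetical) upper-cases only the first character of each later word:

```python
def to_camel_case_preserving(text):
    """Like to_camel_case, but only the first letter of each later word changes."""
    if not text:
        return ""
    first, *rest = text.replace('-', '_').split('_')
    # word[:1] is '' for empty pieces (from '--' or a trailing '-'), so they vanish
    return first + ''.join(word[:1].upper() + word[1:] for word in rest)


print(to_camel_case_preserving("get-HTTP-response"))  # getHTTPResponse
```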
How can I take only one word from a line in file and save it in some string variable?
For example, my file has the line "this, line, is, super" and I want to save only the first word ("this") in the variable word. I tried to read it character by character until I reached ",", but when I run it I get the error "TypeError: argument of type 'int' is not iterable". How can I do this?
line = file.readline() # reading "this, line, is, super"
if "," in len(line): # checking, if it contains ','
    for i in line:
        if "," not in line[i]: # while character is not ',' -> this is where I get error
            word += line[i] # add it to my string
You can do it like this, using split():
line = file.readline()
if "," in line:
    split_line = line.split(",")
    first_word = split_line[0]
    print(first_word)
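split() will create a list where each element is, in your case, a word. Commas will not be included, but every element after the first keeps its leading space (' line'), so call .strip() on those if you need them.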
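If you only ever need the first word, str.partition() is another option (a swap for split(), not what the answer above uses): it splits at the first comma only and always returns three parts, so it also works when the line has no comma. The literal string below is an assumed stand-in for file.readline():

```python
line = "this, line, is, super\n"  # stand-in for file.readline()
# partition returns (before, separator, after); keep only the part before the first comma
first_word = line.partition(",")[0].strip()
print(first_word)  # this
# With no comma at all, the whole (stripped) line comes back
print("hello\n".partition(",")[0].strip())  # hello
```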
At a glance, you are on the right track, but there are a few things wrong that you can work out if you always consider what data type is being stored where. For instance, your conditional 'if "," in len(line)' doesn't make sense, because len(line) is an integer: it translates to 'if "," in 21', and that is exactly what raises the TypeError.
Secondly, you iterate over each character in line, but your value for i is not what you think. You want the index of the character at that point in your for loop, to check whether "," is there, but line[i] is not something like line[0] as you would imagine; it is actually line['t'], because iterating over a string yields its characters. It is easy to assume that i is always an integer index into your string, but what you want is a range of integer values, equal to the length of the line, to iterate through, finding the associated character at each index.
I have reformatted your code to work the way you intended, returning word = "this", with these clarifications in mind. I hope you find this instructional (there are shorter ways and built-in methods to do this, but understanding indices is crucial in programming). Assuming line is the string "this, line, is, super":
word = ""  # start with an empty string to collect the characters in
if "," in line:  # checking that the string, not the number 21, has a comma
    for i in range(len(line)):  # for each index from 0 to 20
        if line[i] != ",":  # e.g. if line[0] does not equal comma
            word += line[i]  # add character to your string
        else:
            break  # stop at the first comma, so only the first word is stored
text='I miss Wonderland #feeling sad #omg'
prefix=('#','#')
for line in text:
    if line.startswith(prefix):
        text=text.replace(line,'')
print(text)
The output should be:
'I miss Wonderland'
But my output is the original string with the prefix removed
So it seems that you do not in fact want to remove the whole "string" or "line", but rather the word? Then you'll want to split your string into words:
words = text.split(' ')
And now iterate through each element in words, performing your check on the first letter. Lastly, combine these elements back into one string:
result = ""
for word in words:
    if not word.startswith(prefix):
        result += word + " "
result = result.rstrip()  # drop the space added after the last word
for line in text in your case will iterate over each character in the text, not each word. So when it gets to e.g., '#' in '#feeling', it will remove the #, but 'feeling' will remain because none of the other characters in that string start with/are '#' or '#'. You can confirm that your code is going character by character by doing:
for line in text:
    print(line)
Try the following instead, which does the filtering in a single line:
text = 'I miss Wonderland #feeling sad #omg'
prefix = ('#','#')
words = text.split() # Split the text into a list of its individual words.
# Join only those words that don't start with prefix
print(' '.join([word for word in words if not word.startswith(prefix)]))
I have some unpredictable log lines that I'm trying to split.
The one thing I can predict is that the first field always ends with either a . or a :.
Is there any way I can automatically split the string at whichever delimiter comes first?
Look at the index of the . and : characters in the string using the str.index() method.
Here’s a simple implementation:
def index_default(line, char):
    """Returns the index of a character in a line, or the length of the string
    if the character does not appear.
    """
    try:
        retval = line.index(char)
    except ValueError:
        retval = len(line)
    return retval
def split_log_line(line):
    """Splits a line at either a period or a colon, depending on which appears
    first in the line.
    """
    # maxsplit=1 splits only at the first delimiter, so later dots or colons
    # stay inside the rest of the line
    if index_default(line, ".") < index_default(line, ":"):
        return line.split(".", 1)
    else:
        return line.split(":", 1)
I wrapped the index() method in an index_default() function because index() raises a ValueError if the line doesn't contain the character, and I wasn't sure whether every line in your log would contain both a period and a colon.
And then here’s a quick example:
mylines = [
    "line1.split at the dot",
    "line2:split at the colon",
    "line3:a colon preceded. by a dot",
    "line4-neither a colon nor a dot"
]
for line in mylines:
    print(split_log_line(line))
which returns
['line1', 'split at the dot']
['line2', 'split at the colon']
['line3', 'a colon preceded. by a dot']
['line4-neither a colon nor a dot']
Check the indexes of both characters, then use the lowest index to split your string.
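A minimal sketch of that lowest-index idea, assuming the delimiters are single characters and only the first one should split the line; split_at_first_delimiter is a hypothetical name. For comparison it also shows re.split with a character class and maxsplit=1, which reaches the same result in one call:

```python
import re


def split_at_first_delimiter(line, delimiters=".:"):
    """Split line once, at whichever character of delimiters appears first."""
    # Only look at delimiters that actually occur; str.find would return -1 otherwise
    positions = [line.find(d) for d in delimiters if d in line]
    if not positions:
        return [line]  # no delimiter at all: keep the line whole
    i = min(positions)  # the lowest index wins
    return [line[:i], line[i + 1:]]


def split_at_first_delimiter_re(line):
    # A character class matches either delimiter; maxsplit=1 stops after the first match
    return re.split(r"[.:]", line, maxsplit=1)


for line in ["line1.split at the dot", "line3:a colon preceded. by a dot"]:
    print(split_at_first_delimiter(line), split_at_first_delimiter_re(line))
```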