Get only one word from line - string

How can I take only one word from a line in a file and save it in a string variable?
For example, my file has the line "this, line, is, super" and I want to save only the first word ("this") in the variable word. I tried to read it character by character until I reached ",", but when I check it I get the error "Argument of type 'int' is not iterable". How can I do this?
line = file.readline() # reading "this, line, is, super"
if "," in len(line): # checking, if it contains ','
    for i in line:
        if "," not in line[i]: # while character is not ',' -> this is where I get error
            word += line[i] # add it to my string

You can do it like this, using split():
line = file.readline()
if "," in line:
    split_line = line.split(",")
    first_word = split_line[0]
    print(first_word)
split() will create a list where each element is, in your case, a word. Commas will not be included.
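If only the first word is needed, str.partition() avoids building the whole list. The following is a minimal sketch, assuming the line looks like the one in the question; the helper name first_word and its sep parameter are made up for illustration:

```python
def first_word(line, sep=","):
    """Return the text before the first separator, stripped of whitespace.

    `first_word` and `sep` are illustrative names, not part of the question.
    """
    # partition() returns (head, sep, tail); head is the whole line
    # when the separator does not occur at all.
    head, _, _ = line.partition(sep)
    return head.strip()


line = "this, line, is, super\n"
word = first_word(line)
print(word)  # this
```

Because partition() returns the whole line as the head when no comma is present, no separate 'if "," in line' check is needed here.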

At a glance, you are on the right track, but a few things are wrong, and you can spot them if you always consider what data type is stored where. First, your conditional 'if "," in len(line)' doesn't make sense, because it translates to 'if "," in 21'. Second, you iterate over each character in line, so your value for i is not what you think. You want the index of the character at that point in the loop in order to check whether "," is there, but line[i] is not something like line[0]; it is actually line['t']. It is easy to assume that i is always an integer index into your string, but a for loop over a string yields its characters. What you want is a range of integer values, as long as the line, to iterate through so you can look up the character at each index. I have reformatted your code to work the way you intended, returning word = "this", with these clarifications in mind. I hope you find this instructional (there are shorter ways and built-in methods to do this, but understanding indices is crucial in programming). Assuming line is the string "this, line, is, super":
word = "" # start with an empty string
if "," in line: # checking that the string, not the number 21, has a comma
    for i in range(0, len(line)): # for each index from 0 up to len(line) - 1
        if line[i] != ",": # e.g. if line[0] does not equal comma
            word += line[i] # add character to your string
        else:
            break # break out of loop when encounter first comma, thus storing only first word

Related

Converting Input data to dictionary (no single delimiter)

I am trying to convert a file.txt into a dictionary. I know that if the delimiter is used only once per line, the code is as follows:
dict = {}
with open('file.txt') as input_file:
    for line in input_file:
        entry = line.split(":")
        dict[entry[0].strip()] = entry[1].strip()
However, how do you turn an input file into a dictionary when there is no single clear delimiter?
file.txt:
cats****5
doggie**6
ox******7
output:
dict = {'cats':5, 'doggie':6, 'ox':7}
Thank you for your help :)
You can simply split on your delimiter as before, but take the first and last field:
for line in input_file:
    entry = line.split("*")
    dict[entry[0].strip()] = entry[-1].strip()
Negative indices fetch elements from the back of the list - the index -1 is the last element, -2 is the second-to-last element, and so on.
You can also use unpacking, which allows for self-documenting variable naming:
for line in input_file:
    key, *_, value = line.split("*")
    dict[key.strip()] = value.strip()
Here, *_ consumes an arbitrary number of values - but not the first or last, since key and value are before and after it and both consume exactly one value. The symbol * denotes the arbitrary size, while _ is a regular name that is just conventionally used for unused values.
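Note that both variants store the values as strings, while the desired output in the question has integers. A small sketch of the conversion, assuming every value is a valid integer and using an in-memory list of lines in place of file.txt (the names parse_counts and lines are illustrative, not from the question):

```python
def parse_counts(lines, delimiter="*"):
    """Build a dict from lines such as 'cats****5' (hypothetical helper)."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # First field is the key, last field is the value; the empty
        # strings produced by repeated delimiters land in *_.
        key, *_, value = line.split(delimiter)
        result[key.strip()] = int(value)  # assumes an integer value
    return result


lines = ["cats****5\n", "doggie**6\n", "ox******7\n"]
print(parse_counts(lines))  # {'cats': 5, 'doggie': 6, 'ox': 7}
```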
If your delimiter also appears in the value, splitting is not robust. Use a regular expression to define the grammar of your delimiter, and capture key and value. For example, if your delimiter is . and you expect float values, the following works:
import re
kv_pattern = re.compile(r'^(.+?)\.+(.+?)$')
#                           ^    ^  ^ capture shortest match for any character sequence
#                           ^    ^ longest match of delimiter sequence
#                           ^ capture shortest match for any character sequence
data = {}
input_data = ['cats....5.0', 'doggie...6', 'ox.......7.']
for line in input_data:
    key, value = kv_pattern.match(line).groups()
    data[key.strip()] = value.strip()

How to divide text into several parts relative to one character

How to separate text from "^" to "^" and save data to different variables
Below I insert what is the main problem:
DM^126287^8209/2018^INLMDU 39942^70
It will often be the case that the number of characters between "^" will change, so I have to read from the character to the character.
Do you have any ideas?
I know how to find the indices at which the character occurs; the code is shown below:
currentWord = "DM^126287^8209/2018^INLMDU 39942^70"
guess = "^"
occurrences = currentWord.count(guess)
indices = [i for i, a in enumerate(currentWord)
           if a == guess]
print(indices)
But I need to save "8209/2018" to one variable, "INLMDU 39942" to the next variable and "70" to the last variable.
Thank you in advance
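One way to do this, sketched here on the sample string from the question: str.split("^") cuts the text at every "^" regardless of how many characters lie between them, and the resulting fields can be unpacked into separate variables. The variable names below are made up for illustration, and the sketch assumes there are always exactly five fields.

```python
current_word = "DM^126287^8209/2018^INLMDU 39942^70"

# split() cuts at every "^", whatever the length of each field
fields = current_word.split("^")
print(fields)

# Unpack into separate variables (assumes exactly five fields)
prefix, number, reference, code, amount = fields
print(reference)
print(code)
print(amount)
```

If the number of fields can vary, indexing into fields (for example fields[2]) is safer than unpacking, which raises a ValueError on a mismatch.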

Removing a string that startswith a specific char Python

text='I miss Wonderland #feeling sad #omg'
prefix=('#','#')
for line in text:
    if line.startswith(prefix):
        text=text.replace(line,'')
print(text)
The output should be:
'I miss Wonderland'
But my output is the original string with the prefix removed
So it seems that you do not in fact want to remove the whole "string" or "line", but rather the word? Then you'll want to split your string into words:
words = text.split(' ')
And now iterate through each element in words, performing your check on the first letter. Lastly, combine these elements back into one string:
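result = ""
for word in words:
    if not word.startswith(prefix): # Python uses "not", there is no "!" operator
        result += (word + " ")
result = result.rstrip() # drop the trailing space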
for line in text in your case will iterate over each character in the text, not each word. So when it gets to e.g., '#' in '#feeling', it will remove the #, but 'feeling' will remain because none of the other characters in that string start with/are '#' or '#'. You can confirm that your code is going character by character by doing:
for line in text:
    print(line)
Try the following instead, which does the filtering in a single line:
text = 'I miss Wonderland #feeling sad #omg'
prefix = ('#','#')
words = text.split() # Split the text into a list of its individual words.
# Join only those words that don't start with prefix
print(' '.join([word for word in words if not word.startswith(prefix)]))

remove the item in string

How do I remove the other stuff in the string and return a list that is made of other strings ? This is what I have written. Thanks in advance!!!
def get_poem_lines(poem):
    r""" (str) -> list of str
    Return the non-blank, non-empty lines of poem, with whitespace removed
    from the beginning and end of each line.
    >>> get_poem_lines('The first line leads off,\n\n\n'
    ... + 'With a gap before the next.\nThen the poem ends.\n')
    ['The first line leads off,', 'With a gap before the next.', 'Then the poem ends.']
    """
    list=[]
    for line in poem:
        if line == '\n' and line == '+':
            poem.remove(line)
    s = poem.remove(line)
    for a in s:
        list.append(a)
    return list
split and strip might be what you need:
s = 'The first line leads off,\n\n\n With a gap before the next.\nThen the poem ends.\n'
print([line.strip() for line in s.split("\n") if line.strip()])
['The first line leads off,', 'With a gap before the next.', 'Then the poem ends.']
I am not sure where the + fits in. If it is involved somehow, either strip it or remove it with str.replace. Also avoid using list as a variable name, because it shadows the built-in Python list.
Lastly, strings have no remove method. You can use .replace, but since strings are immutable you will need to reassign poem to the return value of replace, i.e. poem = poem.replace("+","")
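Putting these points together, a possible version of get_poem_lines could look like the sketch below. It assumes the + in the docstring is only Python string concatenation in the doctest and not part of the poem itself.

```python
def get_poem_lines(poem):
    r""" (str) -> list of str
    Return the non-blank, non-empty lines of poem, with whitespace removed
    from the beginning and end of each line.
    """
    lines = []  # avoid the name `list`, which shadows the built-in
    for line in poem.splitlines():
        stripped = line.strip()
        if stripped:  # skip empty and whitespace-only lines
            lines.append(stripped)
    return lines


poem = ('The first line leads off,\n\n\n'
        + 'With a gap before the next.\nThen the poem ends.\n')
print(get_poem_lines(poem))
```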
You can read all non-empty lines like this:
list_m = [line for line in file if line not in ["\n", "\r\n"]]
Without looking at your input sample, I am assuming that you simply want your white spaces to be removed. In that case,
import re
for x in range(0, len(list_m)):
    # str.replace() does not understand regular expressions, so use re.sub()
    list_m[x] = re.sub(r"[ ]+(?=\n)", "", list_m[x])

Split by the delimiter that comes first, Python

I have some unpredictable log lines that I'm trying to split.
The one thing I can predict is that the first field always ends with either a . or a :.
Is there any way I can automatically split the string at whichever delimiter comes first?
Look at the index of the . and : characters in the string using the index() function.
Here’s a simple implementation:
def index_default(line, char):
    """Returns the index of a character in a line, or the length of the string
    if the character does not appear.
    """
    try:
        retval = line.index(char)
    except ValueError:
        retval = len(line)
    return retval

def split_log_line(line):
    """Splits a line at either a period or a colon, depending on which appears
    first in the line.
    """
    if index_default(line, ".") < index_default(line, ":"):
        return line.split(".")
    else:
        return line.split(":")
I wrapped the index() function in an index_default() function because if the line doesn’t contain a character, index() throws a ValueError, and I wasn’t sure if every line in your log would contain both a period and a colon.
And then here’s a quick example:
mylines = [
    "line1.split at the dot",
    "line2:split at the colon",
    "line3:a colon preceded. by a dot",
    "line4-neither a colon nor a dot"
]
for line in mylines:
    print(split_log_line(line))
which returns
['line1', 'split at the dot']
['line2', 'split at the colon']
['line3', 'a colon preceded. by a dot']
['line4-neither a colon nor a dot']
Check the indexes of both characters, then use the lowest index to split your string.
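The same idea can also be written with a regular expression. This is a sketch of an alternative, not the approach from the answer above: re.split() with a character class splits at whichever of . or : comes first, and maxsplit=1 keeps any later delimiters inside the rest of the line. The function name split_first_field is made up for illustration.

```python
import re


def split_first_field(line):
    """Split once at the first '.' or ':' (hypothetical helper)."""
    # [.:] matches either delimiter; maxsplit=1 stops after the first hit
    return re.split(r"[.:]", line, maxsplit=1)


print(split_first_field("line3:a colon preceded. by a dot"))
print(split_first_field("line4-neither a colon nor a dot"))
```

Unlike split_log_line() above, which splits at every occurrence of the chosen delimiter, this version always returns at most two parts.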
