opening text file and change it in python - python-3.x

I have a big text file like this example:
</Attributes>
FovCount,555
FovCounted,536
ScanID,1803C0555
BindingDensity,0.51
As you can see, some lines are empty, some are comma-separated, and others have a different format.
I would like to open the file and look for the lines that start with one of these three words: FovCount, FovCounted, or BindingDensity. If a line starts with one of them, I want to get the number after the comma. From the numbers for FovCount and FovCounted I will compute a criterion, and the final result is a list with two items: criteria and BD (the number after BindingDensity). I wrote the following function in Python, but it does not return what I want. Do you know how to fix it?
def QC(file):
    with open(file) as f:
        for line in f.split(","):
            if line.startswith("FovCount"):
                FC = line[1]
            elif line.startswith("FovCounted"):
                FCed = line[1]
                criteria = FC/FCed
            elif line.startswith("BindingDensity"):
                BD = line[1]
    return [criteria, BD]

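You are trying to split the file on commas (,). But lines aren't separated by a comma; they are separated by a newline character (\n). A file object also has no split() method, so f.split(",") raises an AttributeError.
Try changing f.split(",") to f.read().split("\n"), or use f.readlines(), which does nearly the same thing but keeps the trailing \n on each line.
You can then split each line into comma-separated segments using segments = line.split(",").
You can check whether the first segment matches one of your keywords: if segments[0] == "FovCounted", etc. Comparing the whole first segment also avoids a pitfall in the original code: line.startswith("FovCount") is also true for lines that start with "FovCounted".
You can then obtain the value by getting the second segment: value = segments[1]. The value is a string, so convert it with float() or int() before dividing.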
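Putting the steps above together, a minimal sketch could look like the following. The name parse_qc and the choice to take an iterable of lines (rather than a file name) are assumptions made for illustration; the ratio FovCount / FovCounted is taken from the question's code.

```python
def parse_qc(lines):
    """Return [criteria, BD] from an iterable of 'key,value' lines."""
    wanted = {"FovCount", "FovCounted", "BindingDensity"}
    values = {}
    for line in lines:
        # strip() removes the trailing newline before splitting on the comma
        segments = line.strip().split(",")
        # Keep only well-formed 'key,value' lines whose key is one we want
        if len(segments) == 2 and segments[0] in wanted:
            values[segments[0]] = float(segments[1])
    # Same ratio as in the question: FovCount / FovCounted
    criteria = values["FovCount"] / values["FovCounted"]
    return [criteria, values["BindingDensity"]]
```

An open file is an iterable of lines, so it can be passed directly: with open(file) as f: result = parse_qc(f). If one of the three keys is missing from the file, this sketch raises a KeyError.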

Related

How to find a substring in a line from a text file and add that line or the characters after the searched string into a list using Python?

I have a MIB dataset of around 10k lines. I want to find a certain string (e.g. "SNMPv2-MIB::sysORID") in the text file and add each whole matching line to a list. I am running the code in Jupyter Notebook.
I used the code below to search for the string; it prints the search string along with the next two strings.
import re
basic = open('mibdata.txt')
file = basic.read()
city_name = re.search(r"SNMPv2-MIB::sysORID(?:[^a-zA-Z'-]+[a-zA-Z'-]+){1,2}", file)
city_name = city_name.group()
print(city_name)
Sample lines in file:
SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB::notificationLogMIB
SNMPv2-MIB::sysORDescr.1 = STRING: The MIB for Message Processing and Dispatching.
The output expected is
SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB::notificationLogMIB
but I get only
SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB
The problem with changing the number of strings matched after the search string is that the number of strings differs from line to line, so I cannot specify a constant. Instead, I want to use '\n' as a delimiter, but I could not find a post that shows how.
P.S. Any other solution is also welcome
EDIT
You can read the file line by line and test each line against a regex.
r"(SNMPv2-MIB::sysORID).*" finds the string in the parentheses and then matches everything that follows it up to the end of the line (by default, . does not match a newline).
import re
basic = open('file.txt')
entries = map(lambda x: re.search(r"(SNMPv2-MIB::sysORID).*", x).group() if re.search(r"(SNMPv2-MIB::sysORID).*", x) is not None else "", basic.readlines())
non_empty_entries = list(filter(lambda x: x != "", entries))
print(non_empty_entries)
If you are not comfortable with lambdas, here is what the script above does:
it takes the text from the file, splits it into lines, and checks each line individually for a regex match.
entries holds the matched text for every line (an empty string where there was no match), and non_empty_entries is the list of lines where the match was found.
EDIT vol2
When the regex doesn't match, an empty string is now added, and those empty strings are filtered out afterwards.
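Since the goal is to collect whole lines that contain a fixed string, a plain substring test may be enough and avoids both the regex and the lambdas. The following is only a sketch; find_lines is a hypothetical helper name, not something from the question or answer.

```python
def find_lines(lines, needle):
    """Return every line (without its trailing newline) that contains needle."""
    return [line.rstrip("\n") for line in lines if needle in line]


sample = [
    "SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB::notificationLogMIB\n",
    "SNMPv2-MIB::sysORDescr.1 = STRING: The MIB for Message Processing and Dispatching.\n",
]
matches = find_lines(sample, "SNMPv2-MIB::sysORID")
print(matches)
```

For a real file, the open file object can be passed in place of sample, for example with open('mibdata.txt') as f: matches = find_lines(f, "SNMPv2-MIB::sysORID").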

using title() creates an extra line in python

Purpose: write code to capitalize the first letter of each word in a file.
The steps are opening the file in read mode and calling title() on each line.
When the output is printed, there is an extra blank line between the lines of the file.
For example:
if the content is
one two three four
five six seven eight
output is (with an extra blank line after each line):
One Two Three Four
Five Six Seven Eight
I am not sure why the blank lines show up there.
I used strip() together with title() to get rid of them, but I would like to know why they appear.
inputfile = input("Enter the file name:")
openedfile = open(inputfile, 'r')
for line in openedfile:
    capitalized=line.title()
    print(capitalized)
The code above prints the output with an added blank line after each line.
I solved it using the code below:
inputfile = input("Enter the file name:")
openedfile = open(inputfile, 'r')
for line in openedfile:
    capitalized=line.title().strip()
    print(capitalized)
I expected the capitalized words to print without blank lines using just title(), not title().strip().
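The blank lines come from two newlines being written per line: each line read from a file keeps its trailing "\n", title() leaves that character in place, and print() then appends a newline of its own. A small sketch of one way to handle it follows; title_lines is a hypothetical helper, and io.StringIO stands in for the opened file.

```python
import io


def title_lines(lines):
    """Capitalize each word, removing only the trailing newline of each line."""
    return [line.rstrip("\n").title() for line in lines]


# io.StringIO plays the role of the opened file in this demo (an assumption)
sample = io.StringIO("one two three four\nfive six seven eight\n")
for text in title_lines(sample):
    print(text)
```

An alternative that keeps the original loop is print(capitalized, end=""), which tells print() not to add its own newline.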

Best way to fix inconsistent csv file in python

I have a csv file which is not consistent. It looks like this: some rows have a middle name and some do not. I don't know the best way to fix this. The middle name will always be in the second position if it exists. But if a middle name doesn't exist, the last name is in the second position.
john,doe,52,florida
jane,mary,doe,55,texas
fred,johnson,23,maine
wally,mark,david,44,florida
Let's say that you have ① wrong.csv and want to produce ② fixed.csv.
You want to read a line from ①, fix it, and write the fixed line to ②. This can be done like this:
with open('wrong.csv') as infile, open('fixed.csv', 'w') as outfile:
    for line in infile:
        line = fix(line)
        outfile.write(line)
Now we want to define the fix function...
Each line has either 4 or 5 fields, separated by commas, so we split the line using the comma as a delimiter, return the line unmodified if the number of fields is 4, and otherwise join field 0 and field 1 (Python counts from zero), reassemble the output line, and return it to the caller.
def fix(line):
    items = line.split(',') # items is a list of strings
    if len(items) == 4: # first, last, age, state: the line is OK as it stands
        return line
    # join first and middle name
    first_middle = ' '.join((items[0], items[1]))
    # we want to return a "fixed" line,
    # i.e., a string not a list of strings
    # we have to join the new name with the remaining info
    return ','.join([first_middle]+items[2:])
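The same idea can be sketched with the standard csv module, which does the splitting and joining for you. This is only an illustration under the assumption that a complete row has 4 fields and a row with a middle name has 5; fix_row is a hypothetical name, and io.StringIO stands in for the input and output files.

```python
import csv
import io


def fix_row(row):
    """Merge first and middle name when a row has 5 fields instead of 4."""
    if len(row) == 5:
        return [row[0] + " " + row[1]] + row[2:]
    return row


raw = "john,doe,52,florida\njane,mary,doe,55,texas\n"
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for row in csv.reader(io.StringIO(raw)):
    writer.writerow(fix_row(row))
print(out.getvalue())
```

With real files, open them with newline='' as the csv documentation recommends and pass the file objects to csv.reader and csv.writer.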

Reading text files and calculate the mean length of every 3rd word

How do I open a text file (containing 5 lines) and write a program to calculate the mean length of the third word in each line, over all lines in the file? (A word is defined as a group of characters surrounded by spaces and/or a line ending.)
I suggest reading "Reading and writing Files in Python", since what you are asking is a pretty basic question and I believe there are many resources out there. Just search :]
But not to leave you empty-handed...
# mean_word.py
with open('file.txt') as data_file:
    # Split each line into a list of words; split() with no argument
    # also drops the trailing newline
    word_lists = [line.split() for line in data_file]
# Total number of words, and total number of characters across all words
word_count = sum(len(line) for line in word_lists)
n_of_chars = sum(len(word) for line in word_lists for word in line)
mean_word_len = n_of_chars / word_count
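Note that the snippet above averages the length of every word in the file. For the third word of each line, as the question asks, a sketch along these lines may be closer. The function name mean_third_word_length is hypothetical, and skipping lines with fewer than three words is an assumption, since the question does not say how to treat them.

```python
def mean_third_word_length(lines):
    """Mean length of the third word, over the lines that have one."""
    lengths = []
    for line in lines:
        words = line.split()  # split on whitespace; drops the newline
        if len(words) >= 3:
            lengths.append(len(words[2]))  # index 2 is the third word
    if not lengths:
        raise ValueError("no line has a third word")
    return sum(lengths) / len(lengths)
```

An open file can be passed directly, for example with open('file.txt') as data_file: mean_third_word_length(data_file).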

Python read file contents into nested list

I have this file that contains something like this:
OOOOOOXOOOO
OOOOOXOOOOO
OOOOXOOOOOO
XXOOXOOOOOO
XXXXOOOOOOO
OOOOOOOOOOO
And I need to read it into a 2D list so it looks like this:
[[O,O,O,O,O,O,X,O,O,O,O],[O,O,O,O,O,X,O,O,O,O,O],[O,O,O,O,X,O,O,O,O,O,O],[X,X,O,O,X,O,O,O,O,O,O],[X,X,X,X,O,O,O,O,O,O,O],[O,O,O,O,O,O,O,O,O,O,O]]
I have this code:
ins = open(filename, "r")
data = []
for line in ins:
    number_strings = line.split() # Split the line on runs of whitespace
    numbers = [(n) for n in number_strings]
    data.append(numbers) # Add the "row" to your list.
return data
But it doesn't seem to be working because the O's and X's do not have spaces between them. Any ideas?
Just use data.append(list(line.rstrip())). list() accepts a string as its argument and splits it into its individual characters; rstrip() removes the trailing newline first.
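As a compact sketch of that suggestion, the whole loop can be written as a list comprehension. read_grid is a hypothetical name, and it takes any iterable of lines, so an open file works as the argument.

```python
def read_grid(lines):
    """Turn each line into a list of its characters, ignoring trailing whitespace."""
    return [list(line.rstrip()) for line in lines]


grid = read_grid(["OOX\n", "XXO\n"])
print(grid)
```

Each inner list holds one-character strings such as 'O' and 'X', which is what list() produces when given a string.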
