Best way to fix inconsistent csv file in python - python-3.x

I have a CSV file which is not consistent. It looks like the sample below, where some rows have a middle name and some do not. I don't know the best way to fix this. The middle name will always be in the second position if it exists. But if a middle name doesn't exist, the last name is in the second position.
john,doe,52,florida
jane,mary,doe,55,texas
fred,johnson,23,maine
wally,mark,david,44,florida

Let's say that you have ① wrong.csv and want to produce ② fixed.csv.
You want to read each line from ①, fix it, and write the fixed line to ②. This can be done like this:
with open('wrong.csv') as infile, open('fixed.csv', 'w') as outfile:
    for line in infile:
        line = fix(line)
        outfile.write(line)
Now we want to define the fix function...
Each line has either 4 or 5 fields, separated by commas (5 when a middle name is present). So we split the line using the comma as a delimiter and return the line unmodified if it has 4 fields; otherwise we join field 0 and field 1 (Python counts from zero) with a space, reassemble the output line, and return it to the caller.
def fix(line):
    items = line.split(',')  # items is a list of strings
    if len(items) == 4:  # no middle name, the line is OK as it stands
        return line
    # join first and middle name
    first_middle = ' '.join((items[0], items[1]))
    # we want to return a "fixed" line,
    # i.e., a string not a list of strings
    # we have to join the new name with the remaining info
    return ','.join([first_middle] + items[2:])
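The same idea can also be sketched with the standard csv module, which takes care of splitting and joining the fields. This is only an illustration under the assumption that every row has 4 or 5 fields as in the sample; the names fix_row and fix_csv are hypothetical and not part of the answer above.

```python
import csv
import io

def fix_row(row):
    """Merge first and middle name when a row has 5 fields (assumed layout)."""
    if len(row) == 5:
        return [f"{row[0]} {row[1]}"] + row[2:]
    return row

def fix_csv(infile, outfile):
    """Copy rows from infile to outfile, fixing each one on the way."""
    writer = csv.writer(outfile, lineterminator="\n")
    for row in csv.reader(infile):
        writer.writerow(fix_row(row))

# In-memory stand-ins for wrong.csv and fixed.csv
sample = io.StringIO("john,doe,52,florida\njane,mary,doe,55,texas\n")
result = io.StringIO()
fix_csv(sample, result)
print(result.getvalue())
```

With real files, the two StringIO objects would be replaced by open('wrong.csv', newline='') and open('fixed.csv', 'w', newline='').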

Related

How to search a text file using input method

I have a .txt file that I want to search for specific words, or phrases. I want to be able to use an input to do this. Then I would like the file parsed for the input and printed. Basically something like this:
input("Search For:")I WANT TO ENTER MY SEARCH TERM HERE
print(I WANT TO PRINT WHAT I SEARCHED FOR ABOVE)
I am able to do this another way by creating a variable, and then just changing the variable name as needed, but this is not ideal for me. Any ideas on how to create an input to search my .txt?
word = 'Scrubbing'
# variable to store search term
with open(r'/Users/kev/PycharmProjects/find_text/common.txt', 'r') as fp:
    lines = fp.readlines()
    # read all lines in a list
    for line in lines:
        if line.find(word) != -1:
            # check if string present on a current line
            print(word, 'string exists in file')
            print('Line Number:', lines.index(line))
            print('Line:', line)
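One possible way to drive the search from input() is to move the loop into a function that takes the search term as a parameter. The sketch below is an assumption about what is wanted, not a confirmed answer; search_file and main are hypothetical names, and the short path "common.txt" stands in for the full path used above.

```python
def search_file(path, term):
    """Return (line_number, line) pairs for every line that contains term."""
    matches = []
    with open(path) as fp:
        # enumerate gives the real line number, even for repeated lines
        for number, line in enumerate(fp, start=1):
            if term in line:
                matches.append((number, line.rstrip("\n")))
    return matches

def main():
    # Ask the user for the search term instead of hard-coding it
    term = input("Search For: ")
    for number, line in search_file("common.txt", term):
        print(f"Line {number}: {line}")
```

Calling main() prompts for a term and prints each matching line with its 1-based line number.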

Count the number of characters in a file

The question:
Write a function file_size(filename) that returns a count of the number of characters in the file whose name is given as a parameter. You may assume that when being tested in this CodeRunner question your function will never be called with a non-existent filename.
For example, if data.txt is a file containing just the following line: Hi there!
A call to file_size('data.txt') should return the value 10. This includes the newline character that will be added to the line when you're creating the file (be sure to hit the 'Enter' key at the end of each line).
What I have tried:
def file_size(data):
    """Count the number of characters in a file"""
    infile = open('data.txt')
    data = infile.read()
    infile.close()
    return len(data)
print(file_size('data.txt'))
# data.txt contains 'Hi there!' followed by a new line character.
I get the correct answer for this file; however, I fail a test that uses a larger file, which should have a character count of 81, but I still get 10. I am trying to get the code to count the correct size of any file.
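The attempt above ignores its parameter and always opens 'data.txt', which would explain why every file reports 10 characters. A minimal sketch that opens the file whose name is passed in instead (assuming the file is plain text in the default encoding):

```python
def file_size(filename):
    """Return the number of characters in the file called filename."""
    with open(filename) as infile:
        return len(infile.read())
```

The with statement closes the file automatically, so the explicit close() call is no longer needed.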

remove white spaces from the list

I am reading from a CSV file and appending the rows into a list. There are some white spaces that are causing issues in my script. I need to remove those white spaces from the list which I have managed to remove. However can someone please advise if this is the right way to do it?
import csv

ip_list = []
with open('name.csv') as open_file:
    read_file = csv.DictReader(open_file)
    for read_rows in read_file:
        ip_list.append(read_rows['column1'])
ip_list = list(filter(None, ip_list))
print(ip_list)
Or would a function be preferable?
Here is a good way to read a CSV file and store it in a list.
L = []  # Create an empty list for the main array
with open('log.csv') as f:  # Open the file and read all the lines
    for line in f:
        x = line.rstrip()  # Strip the \n from each line
        L.append(x.split(','))  # Split each line into a list and add it to the multidimensional array
print(L)
For example, given this CSV file:
This is the first line, Line1
This is the second line, Line2
This is the third line, Line3
L then holds this list (note the leading space that split(',') keeps after each comma):
[['This is the first line', ' Line1'],
 ['This is the second line', ' Line2'],
 ['This is the third line', ' Line3']]
Because CSV means comma-separated values, you can split each line on commas.
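Note that filter(None, ...) only drops empty strings; it does not trim spaces around a value or drop values made up only of spaces. A hedged sketch of one way to do both, assuming the column is called column1 as in the question (read_column is a hypothetical helper name, and the sample data is invented for illustration):

```python
import csv
import io

def read_column(csv_file, column):
    """Return the stripped, non-empty values of one column."""
    reader = csv.DictReader(csv_file)
    # "or ''" guards against short rows, where DictReader fills in None
    values = ((row[column] or "").strip() for row in reader)
    return [value for value in values if value]

# Invented sample standing in for name.csv
sample = io.StringIO("column1,column2\n 10.0.0.1 ,a\n   ,b\n10.0.0.2,c\n")
print(read_column(sample, "column1"))  # ['10.0.0.1', '10.0.0.2']
```

Wrapping the logic in a function keeps the reading and cleaning in one place, which may answer the "would a function be preferable" part of the question.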

opening text file and change it in python

I have a big text file like this example:
</Attributes>
FovCount,555
FovCounted,536
ScanID,1803C0555
BindingDensity,0.51
As you can see, some lines are empty, some are comma separated, and some others have a different format.
I would like to open the file and look for the lines that start with these 3 words: FovCount, FovCounted and BindingDensity. If a line starts with one of them, I want to get the number after the comma. From the numbers for FovCount and FovCounted I will compute criteria, and the final result is a list with 2 items: criteria and BD (the number after BindingDensity). I wrote the following function in Python, but it does not return what I want. Do you know how to fix it?
def QC(file):
    with open(file) as f:
        for line in f.split(","):
            if line.startswith("FovCount"):
                FC = line[1]
            elif line.startswith("FovCounted"):
                FCed = line[1]
                criteria = FC/FCed
            elif line.startswith("BindingDensity"):
                BD = line[1]
    return [criteria, BD]
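You are trying to split the file on commas (,). But lines aren't separated by a comma; they are separated by a newline character (\n). Also, a file object has no split method, so f.split(",") raises an AttributeError.
Try changing f.split(",") to f.read().split("\n"), or use f.readlines(), which does much the same thing but keeps the trailing newline on each line.
You can then split each line into comma-separated segments using segments = line.split(",").
You can check whether the first segment matches one of your keywords: if segments[0] == "FovCounted", etc. An exact comparison also avoids a pitfall in the original code, where startswith("FovCount") matches FovCounted lines as well.
You can then obtain the value by getting the second segment: value = segments[1]. It is a string, so convert it with float() before dividing.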
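Putting those steps together, a minimal sketch could look like this. It assumes each of the three keys appears once in the file and that criteria is FovCount divided by FovCounted, as in the question's code; the lowercase name qc and the values dictionary are illustrative choices.

```python
def qc(path):
    """Return [criteria, binding_density] parsed from the file at path."""
    wanted = ("FovCount", "FovCounted", "BindingDensity")
    values = {}
    with open(path) as f:
        for line in f:
            segments = line.strip().split(",")
            # Exact match on the first segment, so FovCount != FovCounted
            if len(segments) == 2 and segments[0] in wanted:
                values[segments[0]] = float(segments[1])
    criteria = values["FovCount"] / values["FovCounted"]
    return [criteria, values["BindingDensity"]]
```

Empty lines and lines in other formats (such as </Attributes> or ScanID,1803C0555) are skipped because they either do not split into two segments or do not start with one of the wanted keys.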

How to read a text file line by line like a set of commands

I have a text file in which each line contains a function name along with its parameters, such as "insert 3", where I need to read insert and 3 individually to call a function insert with parameter 3.
I have so far opened the file and called .readlines() on it to separate each line into a list of each line of text. I am now struggling to find a way to apply .split() to each element recursively. I am to do this with functional programming and I cannot use a for loop to apply the .split() function.
def execute(fileName):
    file = open(fileName + '.txt', 'r').readlines()
    print(file)
    reduce(lambda x, a: map(x, a), )
I would like to use each line independently with different amounts of parameters so I can call my test script and have it run each function.
Hey, I just wrote the code on repl.it; you should check it out. Here is the breakdown.
Read each line from the file.
Now you should have a list where each element is a line from the file:
lines = ["command argument", "command argument" ... "command argument"]
Now iterate through each element in the list, split the element at the " " (space character), and append the result to a new list where all the commands and their respective arguments will be stored.
for line in lines:
    commands.append(line.split(" "))
Now the commands list should be a multidimensional array containing data like
commands = [["command", "argument"], ["command", "argument"], ... ["command", "argument"]]
Now you can just iterate through each sub-list where value at index 0 is the command and value at index 1 is the argument. After this you can use if statements to check for what command/ function to run with what datatype as an argument
HERE IS THE WHOLE CODE:
command = []
with open("command_files.txt", "r") as f:
    lines = f.read().strip().split("\n")  # removing whitespace on both ends, and splitting at the newline character \n
print(lines)  # now we have a list where each element is a line from the file
# breaking each line at " " (space) to get command and the argument
for line in lines:
    # appending the list to command list
    command.append(line.split(" "))
# now the command list should be a multidimensional array
# we just have to go through each of the sub list and where the value at 0 index should be the command, and at index 1 the arguments
for i in command:
    if i[0] == "print":
        print(i[1])
    else:
        print("Command not recognized")
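Since the question asks for a functional style without a for loop, the same steps can also be sketched with map and filter plus a dictionary that maps command names to functions. Everything here is illustrative: insert and remove are hypothetical handlers, and execute takes a list of lines rather than a file name so that it is easy to try out.

```python
def insert(value):
    # Hypothetical handler standing in for the real insert function
    return f"inserted {value}"

def remove(value):
    # Hypothetical handler standing in for another command
    return f"removed {value}"

COMMANDS = {"insert": insert, "remove": remove}

def run_line(line):
    """Split one line into a command name and arguments, then dispatch."""
    name, *args = line.split()
    handler = COMMANDS.get(name)
    if handler is None:
        return f"Command not recognized: {name}"
    return handler(*args)

def execute(lines):
    """Apply run_line to every non-blank line without an explicit loop."""
    non_blank = filter(str.strip, lines)
    return list(map(run_line, non_blank))

print(execute(["insert 3\n", "remove 3\n", "jump 1\n"]))
# ['inserted 3', 'removed 3', 'Command not recognized: jump']
```

The lines argument could come from open(fileName + '.txt').readlines(), as in the question; the dispatch dictionary replaces the chain of if statements and allows commands with different numbers of parameters.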
