Python Text File Compare and Concatenate - string

I need help with concatenating two text files based on common strings.
My first txt file looks like this:
Hello abc
Wonders xyz
World abc
And my second txt file looks like this:
abc A
xyz B
abc C
I want my output file to be:
Hello abc A
Wonders xyz B
World abc C
My Code goes something like this:
a = open("file1","r")
b = open("file2","r")
c = open("output","w")
for line in b:
chk = line.split(" ")
for line_new in a:
chk_new = line_new.split(" ")
if (chk_new[0] == chk[1]):
c.write(chk[0])
c.write(chk_new[0])
c.write(chk_new[1])
But when I use this code, I get the output as:
Hello abc A
Wonders xyz B
Hello abc C
Line 3 mismatch occurs. What should I do to get it the correct way?

I'm afraid you are mistaken, your code does not produce the output you say it does.
Partly because a file can only be read once, with the exception being if you move the read cursor back to the beginning of the file (file.seek(0), docs).
Partly because the second element of a line in the first file ends with a newline character, thus you are comparing e.g. "abc" with "abc\n" etc. which will never be true.
Hence the output file will be completely empty.
So how do you solve the problem? Reading a file more than once seems overly complicated, don't do that. I suggest you do something along the lines of:
# open all the files simultaneously
with open('file1', 'r') as (f1
), open('file2', 'r') as (f2
), open('output', 'w') as (outf
):
lines_left = True
while lines_left:
f1_line = f1.readline().rstrip()
# check if there's more to read
if len(f1_line) != 0:
f1_line_tokens = f1_line.split(' ')
# no need to strip the line from the second file
f2_line_tokens = f2.readline().split(' ')
if f1_line_tokens[1] == f2_line_tokens[0]:
outf.write(f1_line + ' ' + f2_line_tokens[1])
else:
lines_left = False
I've tested it on your example input and it produces the correct output (where file1 is the first example file and file2 is the second). If we talk about huge files (millions of lines), this version will be considerably faster than aarons. In other cases the performance difference will be negligible.

The open streams aren't safe and you can only read a file once. Do this:
aLines = []
bLines = []
with open("file1","r") as a:
for line in a:
aLines.append(line.strip().split(" "))
with open("file2","r") as b:
for line in b:
bLines.append(line.strip().split(" "))
bLines.reverse()
with open("output","w") as c:
for chk in aLines:
chk_new = bLines.pop()
if chk_new[0] == chk[1]:
c.write(chk[0])
c.write(chk_new[0])
c.write(chk_new[1])

Related

How to separate lines of data read from a textfile? Customers with their orders

I have this data in a text file. (Doesn't have the spacing I added for clarity)
I am using Python3:
orders = open('orders.txt', 'r')
lines = orders.readlines()
I need to loop through the lines variable that contains all the lines of the data and separate the CO lines as I've spaced them.
CO are customers and the lines below each CO are the orders that customer placed.
The CO lines tells us how many lines of orders exist if you look at the index[7-9] of the CO string.
I illustrating this below.
CO77812002D10212020 <---(002)
125^LO917^11212020. <----line 1
235^IL993^11252020 <----line 2
CO77812002S10212020
125^LO917^11212020
235^IL993^11252020
CO95307005D06092019 <---(005)
194^AF977^06292019 <---line 1
72^L223^07142019 <---line 2
370^IL993^08022019 <---line 3
258^Y337^07072019 <---line 4
253^O261^06182019 <---line 5
CO30950003D06012019
139^LM485^06272019
113^N669^06192019
249^P530^07112019
CO37501001D05252020
479^IL993^06162020
I have thought of a brute force way of doing this but it won't work against much larger datasets.
Any help would be greatly appreciated!
You can use fileinput (source) to "simultaneously" read and modify your file. In fact, the in-place functionality that offers to modify a file while parsing it is implemented through a second backup file. Specifically, as stated here:
Optional in-place filtering: if the keyword argument inplace=True is passed to fileinput.input() or to the FileInput constructor, the file is moved to a backup file and standard output is directed to the input file (...) by default, the extension is '.bak' and it is deleted when the output file is closed.
Therefore, you can format your file as specified this way:
import fileinput
with fileinput.input(files = ['orders.txt'], inplace=True) as orders_file:
for line in orders_file:
if line[:2] == 'CO': # Detect customer line
orders_counter = 0
num_of_orders = int(line[7:10]) # Extract number of orders
else:
orders_counter += 1
# If last order for specific customer has been reached
# append a '\n' character to format it as desired
if orders_counter == num_of_orders:
line += '\n'
# Since standard output is redirected to the file, print writes in the file
print(line, end='')
Note: it's supposed that the file with the orders is formatted exactly in the way you specified:
CO...
(order_1)
(order_2)
...
(order_i)
CO...
(order_1)
...
This did what I hoping to get done!
tot_customers = []
with open("orders.txt", "r") as a_file:
customer = []
for line in a_file:
stripped_line = line.strip()
if stripped_line[:2] == "CO":
customer.append(stripped_line)
print("customers: ", customer)
orders_counter = 0
num_of_orders = int(stripped_line[7:10])
else:
customer.append(stripped_line)
orders_counter +=1
if orders_counter == num_of_orders:
tot_customers.append(customer)
customer = []
orders_counter = 0

Find items in a text file that is a incantinated string of capitalized words that begin with a certain capital letter in python

I am trying to pull a string of input names that get saved to a text file. I need to pull them by capital letter which is input. I.E. the saved text file contains names DanielDanClark, and I need to pull the names that begin with D. I am stuck at this part
for i in range(num):
print("Name",i+1," >> Enter the name:")
n=input("")
names+=n
file=open("names.txt","w")
file.write(names)
lookUp=input("Did you want to look up any names?(Y/N)")
x= ord(lookUp)
if x == 110 or x == 78:
quit()
else:
letter=input("Enter the first letter of the names you want to look up in uppercase:")
file=open("names.txt","r")
fileNames=[]
file.list()
for letter in file:
fileNames.index(letter)
fileNames.close()
I know that the last 4 lines are probably way wrong. It is what I tried in my last failed attempt
Lets break down your code block by block
num = 5
names = ""
for i in range(num)
print("Name",i+1," >> Enter the name:")
n=input("")
names+=n
I took the liberty of giving num a value of 5, and names a value of "", just so the code will run. This block has no problems. And will create a string called names with all the input taken. You might consider putting a delimiter in, which makes it more easier to read back your data. A suggestion would be to use \n which is a line break, so when you get to writing the file, you actually have one name on each line, example:
num = 5
names = ""
for i in range(num)
print("Name",i+1," >> Enter the name:")
n = input()
names += n + "\n"
Now you are going to write the file:
file=open("names.txt","w")
file.write(names)
In this block you forget to close the file, and a better way is to fully specify the pathname of the file, example:
file = open(r"c:\somedir\somesubdir\names.txt","w")
file.write(names)
file.close()
or even better using with:
with open(r"c:\somedir\somesubdir\names.txt","w") as openfile:
openfile.write(names)
The following block you are asking if the user want to lookup a name, and then exit:
lookUp=input("Did you want to look up any names?(Y/N)")
x= ord(lookUp)
if x == 110 or x == 78:
quit()
First thing is that you are using quit() which should not be used in production code, see answers here you really should use sys.exit() which means you need to import the sys module. You then proceed to get the numeric value of the answer being either N or n and you check this in a if statement. You do not have to do ord() you can use a string comparisson directly in your if statement. Example:
lookup = input("Did you want to look up any names?(Y/N)")
if lookup.lower() == "n":
sys.exit()
Then you proceed to lookup the requested data, in the else: block of previous if statement:
else:
letter=input("Enter the first letter of the names you want to look up in uppercase:")
file=open("names.txt","r")
fileNames=[]
file.list()
for letter in file:
fileNames.index(letter)
fileNames.close()
This is not really working properly either, so this is where the delimiter \n is coming in handy. When a text file is opened, you can use a for line in file block to enumerate through the file line by line, and with \n delimiter added in your first block, each line is a name. You also go wrong in the for letter in file block, it does not do what you think it should be doing. It actually returns each letter in the file, regardless of whay you type in the input earlier. Here is a working example:
letter = input("Enter the first letter of the names you want to look up in uppercase:")
result = []
with open(r"c:\somedir\somesubdir\names.txt", "r") as openfile:
for line in openfile: ## loop thru the file line by line
line = line.strip('\n') ## get rid of the delimiter
if line[0].lower() == letter.lower(): ## compare the first (zero) character of the line
result.append(line) ## append to result
print(result) ## do something with the result
Putting it all together:
import sys
num = 5
names = ""
for i in range(num)
print("Name",i+1," >> Enter the name:")
n = input("")
names += n + "\n"
with open(r"c:\somedir\somesubdir\names.txt","w") as openfile:
openfile.write(names)
lookup = input("Did you want to look up any names?(Y/N)")
if lookup.lower() == "n":
sys.exit()
letter = input("Enter the first letter of the names you want to look up in uppercase:")
result = []
with open(r"c:\somedir\somesubdir\names.txt", "r") as openfile:
for line in openfile:
line = line.strip('\n')
if line[0].lower() == letter.lower():
result.append(line)
print(result)
One caveat I like to point out, when you create the file, you open the file in w mode, which will create a new file every time, therefore overwriting the a previous file. If you like to append to a file, you need to open it in a mode, which will append to an existing file, or create a new file when the file does not exist.

Search different words between two files

I have two files (txt), for examle FILE A and FILE B and i want to
find all the words in FILE A that exist in FILE B,
for example if file A is :
HIS HOUSE IS VERY SMALL
and file B is
HIS DOG IS VERY NICE
I want to write a program that show me that HOUSE is not
in file B.
I thought to use the SPLIT command and looping over the file
but since I do not know the python well, does anyone can help me
if there is another command that can help me?
Maybe there is a better solution, but this one below will solve your problem.
import re
def is_letter(s):
return re.match('[a-z]|[A-Z]$', s)
def words_only(s):
for i, x in enumerate(s):
if not is_letter(x):
s = s[:i] + s[i:].replace(x, ' ')
s = re.sub( '\s+', ' ', s).strip().upper().split(' ')
return s
file_a = words_only(open('file_a.txt', 'r').read())
file_b = words_only(open('file_b.txt', 'r').read())
for x in file_a:
if x not in file_b:
print(x)
file_a.txt
HIS DOG IS VERY SMALL.
His wife is very nice.
file_b.txt
HIS DOG IS VERY NICE.
He is very ugly.

Something's wrong with my Python code (complete beginner)

So I am completely new to Python and can't figure out what's wrong with my code.
I need to write a program that asks for the name of the existing text file and then of the other one, that doesn't necessarily need to exist. The task of the program is to take content of the first file, convert it to upper-case letters and paste to the second file. Then it should return the number of symbols used in the file(s).
The code is:
file1 = input("The name of the first text file: ")
file2 = input("The name of the second file: ")
f = open(file1)
file1content = f.read()
f.close
f2 = open(file2, "w")
file2content = f2.write(file1content.upper())
f2.close
print("There is ", len(str(file2content)), "symbols in the second file.")
I created two text files to check whether Python performs the operations correctly. Turns out the length of the file(s) is incorrect as there were 18 symbols in my file(s) and Python showed there were 2.
Could you please help me with this one?
Issues I see with your code:
close is a method, so you need to use the () operator otherwise f.close does not do what your think.
It is usually preferred in any case to use the with form of opening a file -- then it is close automatically at the end.
the write method does not return anything, so file2content = f2.write(file1content.upper()) is None
There is no reason the read the entire file contents in; just loop over each line if it is a text file.
(Not tested) but I would write your program like this:
file1 = input("The name of the first text file: ")
file2 = input("The name of the second file: ")
chars=0
with open(file1) as f, open(file2, 'w') as f2:
for line in f:
f2.write(line.upper())
chars+=len(line)
print("There are ", chars, "symbols in the second file.")
input() does not do what you expect, use raw_input() instead.

How do I write a Python program that computes the average from a .dat file?

I have this so far but I don't know how to write over the .dat file:
def main():
fname = input("Enter filename:")
infile = open(fname, "r")
data = infile.read()
print(data)
for line in infile.readlines():
score = int(line)
counts[score] = counts[score]+1
infile.close()
total=0
for c in enumerate(counts):
total = total + i*c
average = float(total)/float(sum(counts))
print(average)
main()
Here is my .dat file:
4
3
5
6
7
My statistics professor expects us to learn Python to compute the mean and standard deviation. All I need to know is how to do the mean and then I've got the rest figured out. I want to know how does Python write over each line in a .dat file. Could someone tell me how to fix this code? I've never done programming before.
To answer your question, as I understand it, in three parts:
How to read the file in
in your example you use
infile.read()
which reads the entire contents of the file into a string and takes you to the end of file. Therefore the following
infile.readlines()
will read nothing more. You should omit the first read().
How to compute the mean
There are many ways to do this in python - more or less elegant - and also I guess it depends on exactly what the problem is. But in the simplest case you can just sum and count the values as you go , then divide sum by count at the end to get the result:
infile = open("d.dat", "r")
total = 0.0
count = 0
for line in infile.readlines():
print ("reading in line: ",line)
try:
line_value = float(line)
total += line_value
count += 1
print ("value = ",line_value, "running total =",total, "valid lines read = ",count)
except:
pass #skipping non-numeric lines or characters
infile.close()
The try/except part is just in case you have lines or characters in the file that can't be turned into floats, these will be skipped.
How to write to the .dat file
Finally you seem to be asking how to write the result back out to the d.dat file. Not sure whether you really need to do this, it should be acceptable to just display the result as in the above code. However if you do need to write it back to the same file, just close it after reading from it, reopen it for writing (in 'append' mode so output goes to the end of the file), and output the result using write().
outfile = open("d.dat","a")
outfile.write("\naverage = final total / number of data points = " + str(total/count)+"\n")
outfile.close()
fname = input("Enter filename:")
infile = open(fname, "r")
data = infile.readline() #Reads first line
print(data)
data = infile.readline() #Reads second line
print(data)
You can put this in a loop.
Also, these values will come in as Strings convert them to floats using float(data) each time.
Also, the guys over at StackOverflow are not as bad at math as you think. This could have easily been answered there. (And maybe in a better fashion)

Resources