Reading a list of tuples from a text file in python - python-3.x

I am reading a text file and I want to read a list of tuples so that I can add another tuple to it in my program and write that appended tuple back to the text file.
Example in the file
[('john', 'abc')]
Want to write back to the file as
[('john', 'abc'), ('jack', 'def')]
However, whenever I write back to the file, the appended list seems to be wrapped in extra double quotes along with the square brackets. I just want it to appear as above.

You can write a reusable function which takes two parameters: file_path (the file you want to write the tuple to) and tup (the tuple you want to append), and put your logic inside it. Later you can supply the proper data to this function and it will do the job for you.
Note: the explanation is in the code comments.
tuples.txt (Before writing)
[('john', 'abc')]
Code
def add_tuple_to_file(file_path, tup):
    with open(file_path, 'r+') as f:
        content = f.read().strip()  # read the file content and strip surrounding whitespace
        tuples = eval(content)      # convert the string back into the original list of tuples (not possible using json.loads())
        tuples.append(tup)          # append the new tuple `tup` to the old list
        f.seek(0)                   # after reading, the file pointer is at the end; move it back to the beginning
        f.truncate()                # truncate the file (erase the old content)
        f.write(str(tuples))        # write back the updated list
# Try
add_tuple_to_file("./tuples.txt", ('jack', 'def'))
tuples.txt (After writing back)
[('john', 'abc'), ('jack', 'def')]
References
https://www.geeksforgeeks.org/python-ways-to-convert-string-to-json-object/
How to open a file for both reading and writing?

You can use ast.literal_eval to get the list object from the string.
import ast
s = "[('john', 'abc')]"
o = ast.literal_eval(s)
print(repr(o)==s)
o.append(('jack', 'def'))
newstr = repr(o)
print(newstr)
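The two answers can be combined into a sketch of the full read-modify-write cycle that parses with ast.literal_eval instead of eval, so that only Python literals are accepted. The helper name mirrors the one in the first answer; the file name is illustrative.

```python
import ast

def add_tuple_to_file(file_path, tup):
    # hypothetical variant of the helper above, using ast.literal_eval
    # instead of eval so only Python literals can be parsed
    with open(file_path, 'r+') as f:
        tuples = ast.literal_eval(f.read().strip())  # safe parse into a list
        tuples.append(tup)                           # append the new tuple
        f.seek(0)                                    # rewind to the start
        f.truncate()                                 # erase the old content
        f.write(repr(tuples))                        # write back the updated list

# set up an example file, then append a tuple
with open('tuples.txt', 'w') as f:
    f.write("[('john', 'abc')]")

add_tuple_to_file('tuples.txt', ('jack', 'def'))

with open('tuples.txt') as f:
    print(f.read())  # [('john', 'abc'), ('jack', 'def')]
```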

File reading in Python using different methods

# open file in read mode
f = open(text_file, 'r')
# iterate over the string returned by read()
for line in f.read():
    print(line)
# close the file
f.close()
The content of the file is "Congratulations you have successfully opened the file"! When I run this code, the output comes out in the following form:
c (newline) o (newline) n (newline) g.................
...... that is, each character is printed individually on a new line because I used read(). But with readline() it gives the answer on a single line. Why is that?
f.read() returns one string with all characters (the full file content).
Iterating a string iterates it character by character.
Use
for line in f:  # no read()
instead to iterate line by line.
f.read() returns the whole file as a string, and for i in iterates over something; for a string, it iterates over its characters.
With readline(), it should not print the whole content like that. It would read only the first line of the file, then print it character by character, like read(). Is it possible that you used readlines(), which returns the lines as a list?
One more thing: there is with, which takes a "closable" object and auto-closes it at the end of the scope. And you can iterate over a file object directly. So your code can be improved like this:
with open(text_file, 'r') as f:
    for i in f:
        print(i)
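The difference between the read APIs can be seen side by side in a minimal sketch (the file name and content are illustrative):

```python
# create a small example file
with open('demo.txt', 'w') as f:
    f.write('first line\nsecond line\n')

with open('demo.txt') as f:
    whole = f.read()       # one string: 'first line\nsecond line\n'

with open('demo.txt') as f:
    first = f.readline()   # just the first line: 'first line\n'

with open('demo.txt') as f:
    lines = f.readlines()  # list of lines: ['first line\n', 'second line\n']

# iterating a string yields characters, which is why read() + for prints one per line
print([c for c in whole][:3])  # ['f', 'i', 'r']
```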

Separating header from the rest of the dataset

I am reading in a csv file and then trying to separate the header from the rest of the file.
The hn variable is the read-in file without the first line.
hn_header is supposed to be the first row in the dataset.
If I define just one of these two variables, the code works. If I define both of them, then the one written later does not contain any data. How is that possible?
from csv import reader
opened_file = open("hacker_news.csv")
read_file = reader(opened_file)
hn = list(read_file)[1:] #this should contain all rows except the header
hn_header = list(read_file)[0] # this should be the header
print(hn[:5]) #works
print(len(hn_header)) #empty list, does not contain the header
The CSV reader can only iterate through the file once, which it does the first time you convert it to a list. To avoid needing to iterate through multiple times, you can save the list to a variable.
hn_list = list(read_file)
hn = hn_list[1:]
hn_header = hn_list[0]
Or you can split up the file using extended iterable unpacking
hn_header, *hn = list(read_file)
Just change the line below in your code; no additional steps are needed: read_file = list(reader(opened_file)). Now your code should run perfectly.
The reader object is an iterator, and by definition iterator objects can only be used once. When they're done iterating you don't get any more out of them.
You can read more in the question Why can I only use a reader object once?, from which the summary above is taken.
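The one-shot behavior of the reader can be demonstrated with an in-memory file (the column values here are illustrative):

```python
import csv
import io

# a csv.reader is an iterator: it can only be traversed once
buf = io.StringIO('h1,h2\na,b\nc,d\n')
read_file = csv.reader(buf)

first_pass = list(read_file)   # consumes the iterator
second_pass = list(read_file)  # already exhausted -> empty list

print(first_pass)   # [['h1', 'h2'], ['a', 'b'], ['c', 'd']]
print(second_pass)  # []
```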

Erasing part of a text file in Python

I have a really big text file on my hard disk. It has around 8 million JSON objects separated by commas, and I want to remove the last one; however, because the file is so big I cannot do it in regular editors (Notepad++, Sublime, Visual Studio Code, ...). So, I decided to use Python, but I have no clue how to erase part of an existing file with it. Any kind of help would be appreciated.
P.S: My file has such a structure:
json1, json2, json3, ...
where each JSON object looks like {"a":"something", "b":"something", "c":"something"}
The easiest way would be to make the file content valid JSON by enclosing it in [ and ] so that it becomes a list of dicts. After removing the last item from the list, you can dump it back into a string and strip its first and last characters, which will be the [ and ] that your original text file does not want:
import json
with open('file.txt', 'r') as r, open('newfile.txt', 'w') as w:
    w.write(json.dumps(json.loads('[%s]' % r.read())[:-1])[1:-1])
Since you only want the last JSON object removed from the file, a much more efficient method would be to identify the first valid JSON object at the end of the file and truncate the file from where that JSON object's preceding comma is positioned.
This can be accomplished by seeking and reading backwards from the end of the file, one relatively small chunk at a time. Split each chunk by { (since it marks the beginning of a JSON object) and prepend the fragments one at a time to a buffer until the buffer is parsable as a JSON object (this lets the code handle nested dict structures). At that point, find the preceding comma in the preceding fragment and prepend it to the buffer, so that finally you can seek the file to where the buffer starts and truncate the file:
import json

chunk_size = 1024

with open('file.txt', 'rb+') as f:
    f.seek(-chunk_size, 2)
    buffer = ''
    while True:
        fragments = f.read(chunk_size).decode().split('{')
        f.seek(-chunk_size * 2, 1)
        i = len(fragments)
        for fragment in fragments[:0:-1]:
            i -= 1
            buffer = '{%s%s' % (fragment, buffer)
            try:
                json.loads(buffer)
                break
            except ValueError:
                pass
        else:
            buffer = fragments[0] + buffer
            continue
        break
    next_fragment = fragments[i - 1]
    # if we don't have a comma in the preceding fragment and it is already the first
    # fragment, we need to read backwards a little more
    if i == 1 and ',' not in fragments[0]:
        f.seek(-2, 1)
        next_fragment = f.read(2).decode() + next_fragment
    buffer = next_fragment[next_fragment.rindex(','):] + buffer
    f.seek(-len(buffer.encode()), 2)
    f.truncate()
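If, as in the question's example, no object ever contains a nested { (i.e. the values are flat strings), a much simpler sketch is possible: read only a tail chunk, find the last {, and cut at the comma before it. The file name, content, and tail size below are illustrative, and this assumes the last object fits in the tail chunk.

```python
tail_size = 4096  # assumption: the last JSON object fits in this many bytes

# create a small example file standing in for the big one
with open('file.txt', 'w') as f:
    f.write('{"a": 1}, {"b": 2}, {"c": 3}')

with open('file.txt', 'rb+') as f:
    f.seek(0, 2)
    size = f.tell()
    start = max(0, size - tail_size)
    f.seek(start)
    tail = f.read().decode()
    # the last object begins at the final '{'; cut at the comma before it
    cut = tail.rindex(',', 0, tail.rindex('{'))
    f.seek(start + len(tail[:cut].encode()))
    f.truncate()

with open('file.txt') as f:
    print(f.read())  # {"a": 1}, {"b": 2}
```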

Issue in saving a string list to a text file

I am trying to save and read the strings which are saved in a text file.
a = [['str1','str2','str3'],['str4','str5','str6'],['str7','str8','str9']]
file = 'D:\\Trails\\test.txt'
# writing list to txt file
thefile = open(file,'w')
for item in a:
thefile.write("%s\n" % item)
thefile.close()
#reading list from txt file
readfile = open(file,'r')
data = readfile.readlines()#
print(a[0][0])
print(data[0][1]) # display data read
The output:
str1
'
Both a[0][0] and data[0][0] should have the same value, but reading what I saved returns something different. What is the mistake in saving the file?
Update:
The 'a' array contains strings of different lengths. What changes can I make when saving the file so that the output will be the same?
Update:
I have made changes by saving the file as csv instead of text using this link; in case of text, how should the data be saved?
You can save the list directly to the file and use the eval function to translate the saved data back into a list. It isn't recommended, but the following code works.
a = [['str1','str2','str3'],['str4','str5','str6'],['str7','str8','str9']]
file = 'test.txt'
# writing list to txt file
thefile = open(file,'w')
thefile.write("%s" % a)
thefile.close()
#reading list from txt file
readfile = open(file,'r')
data = eval(readfile.readline())
print(data)
print(a[0][0])
print(data[0][1]) # display data read
print(a)
print(data)
a and data will not have the same value, as a is a list of three lists, whereas data is a list of three strings.
readfile.readlines() (or list(readfile)) collects all lines into a list.
So, when you perform data = readfile.readlines(), Python considers "['str1', 'str2', 'str3']\n" a single string and not a list.
So, to get your desired output, you can use the following print statement:
print(data[0][2:6])
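A cleaner alternative to eval, sketched below, is to serialize the nested list as JSON: the structure round-trips intact instead of coming back as strings (the file name is illustrative).

```python
import json

a = [['str1', 'str2', 'str3'], ['str4', 'str5', 'str6'], ['str7', 'str8', 'str9']]

# writing the nested list as JSON
with open('test_json.txt', 'w') as f:
    json.dump(a, f)

# reading it back reproduces the nested list structure
with open('test_json.txt') as f:
    data = json.load(f)

print(data == a)   # True: the structure survives the round trip
print(data[0][1])  # str2
```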

The output values are all on one line (python3/csv.write)

I write a list of dicts into a csv file, but the output is all on one line. How can I write each value on a new line?
f = open(os.getcwd() + '/friend1.csv', 'w+', newline='')
for Member in MemberList:
    f.write(str(Member))
f.close()
Take a look at the writing example in the csv module of the standard library and this question. Either that, or simply append a newline ("\n") after each write: f.write(str(Member) + "\n").
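Since MemberList holds dicts, csv.DictWriter fits naturally: it writes one row per dict and handles the header. A minimal sketch, assuming the dicts share the keys shown (the field names and data are illustrative):

```python
import csv

MemberList = [{'name': 'alice', 'id': 1}, {'name': 'bob', 'id': 2}]

with open('friend1.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'id'])
    writer.writeheader()
    writer.writerows(MemberList)  # one csv row per dict

with open('friend1.csv') as f:
    print(f.read())
# name,id
# alice,1
# bob,2
```

Note the newline='' argument on open(): the csv module manages line endings itself, and without it extra blank lines can appear on Windows.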
