How to convert Navigable String to File Object - python-3.x

I am trying to get some data from a website (using the modules named requests & BeautifulSoup) and print it in a text file but every time I try to do so, it says the following:
TypeError: descriptor 'write' requires a 'file' object but received a 'NavigableString'
I have tried using the csv library to import the data but since I couldn't add the line by line data to the csv, I decided to add all the output to a text file and then take out the data I require.
file_object = open("name-list.txt", "w") #Opening the file
name = soup.find(class_='table-responsive') #Extracting the data
name_list = name.find_all('td') #Refining the data
for final in name_list:
    all = final.contents[0] #Final result
    file.write(all) #This is where the Error Comes
file.close()
When I use print(all) in the for loop, I get the output that I need which consists of multi-line text including the names, age, gender, etc. of the people from the table on the website but when I try to print that output into the text file, the error pops up.
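The error message points at the call `file.write(all)`: the file is opened as `file_object`, but `write` is then called on the name `file`, which in Python 2 is the built-in file type rather than an open file. A minimal sketch of one possible fix follows. So that it runs without a network request, it uses a hypothetical `FakeNavigableString` class as a stand-in for BeautifulSoup's `NavigableString` (which is also a `str` subclass), and the `write_cells` helper and sample values are assumptions made for illustration.

```python
import os
import tempfile


class FakeNavigableString(str):
    """Hypothetical stand-in for bs4's NavigableString (also a str subclass)."""


def write_cells(cells, path):
    # Call write() on the opened file object, not on the name `file`,
    # and convert each cell to a plain str before writing it.
    with open(path, "w", encoding="utf-8") as file_object:
        for cell in cells:
            file_object.write(str(cell) + "\n")


# Sample values standing in for the <td> contents from the question.
cells = [FakeNavigableString("Alice"), FakeNavigableString("30")]
out_path = os.path.join(tempfile.mkdtemp(), "name-list.txt")
write_cells(cells, out_path)

with open(out_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```

In the original loop, the equivalent change would be `file_object.write(str(all))` followed by `file_object.close()`.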

Related

How to read the data and the associated field name that is in a filled-in PDF form

I am writing a Python script that needs to pull the data filled in a PDF form as part of a larger script. I tried using PyPDF3, but while it can show me the strings in the form, it does not show the filled-in data. I have a form where I have entered the value 'XXX' into a field, and I want the script to return that data and the name of the field, but I can't seem to read the data. The fillpdfs module is very helpful, but AFAICT it can return the field names but not the data.
I have this snippet:
from PyPDF3 import PdfFileWriter, PdfFileReader
# Open the PDF file
pdf_file = open('filename.pdf', 'rb')
pdf_reader = PdfFileReader(pdf_file)
# Extract text data from each page
for page_num in range(pdf_reader.numPages):
    page = pdf_reader.getPage(page_num)
    'XXX' in page.extractText()
There is a function for pdf forms:
dictionary = pdf_reader.getFormTextFields() # returns a python dictionary
print(dictionary)

Deleting a particular column/row from a CSV file using python

I want to delete a particular row from a given user input that matches with a column.
Let's say I get an employee ID and delete all its corresponding values in the row.
Not sure how to approach this problem and other sources suggest using a temporary csv file to copy all values and re-iterate.
Since these are very primitive requirements, I would just do it manually.
Read it line by line - if you want to delete the current line, just don't write it back.
If you want to delete a column, for each line, parse it as csv (using the module csv - do not use .split(',')!) and discard the correct column.
The upside of these solutions is that it's very light on the memory and as fast as it can be runtime-wise.
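The column case described above can be sketched with the csv module as follows. This is only an illustration under assumptions: the `drop_column` helper, the sample rows, and the use of in-memory `io.StringIO` buffers in place of real files are not from the original answer. The quoted field containing a comma shows why `.split(',')` would give the wrong result.

```python
import csv
import io


def drop_column(rows, column_index):
    """Yield each parsed row without the field at column_index."""
    for row in rows:
        yield row[:column_index] + row[column_index + 1:]


# In-memory stand-ins for the input and output files (assumed sample data).
source = io.StringIO('Employee ID,Data1,Data2\n111,"Some, thing",X\n')
target = io.StringIO()

writer = csv.writer(target, lineterminator="\n")
# Rows are streamed one at a time, so memory use stays small.
writer.writerows(drop_column(csv.reader(source), 1))

result = target.getvalue()
print(result)
```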
That's pretty much the way to do it.
Something like:
import shutil
file_path = "test.csv"
# Creates a test file
data = ["Employee ID,Data1,Data2",
        "111,Something,Something",
        "222,Something,Something",
        "333,Something,Something"]
with open(file_path, 'w') as write_file:
    for item in data:
        write_file.write(item + "\n")
# /Creates a test file
input("Look at the test.csv file if you like, close it, then press enter.")
employee_ID = "222"
with open(file_path) as read_file:
    with open("temp_file.csv", 'w') as temp_file:
        for line in read_file:
            if employee_ID in line:
                continue  # skip the matching row instead of copying it
            temp_file.write(line)
shutil.move("temp_file.csv", file_path)
If you have other data that may match the employee ID, then you'll have to parse the line and check the employee ID column specifically.
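A minimal sketch of that stricter check, assuming the employee ID sits in the first column: it parses each line with the csv module and compares only that field. The `delete_rows_by_id` helper, the `id_column` parameter, and the sample data are hypothetical names introduced for illustration.

```python
import csv
import os
import shutil
import tempfile


def delete_rows_by_id(file_path, employee_id, id_column=0):
    """Rewrite file_path without rows whose ID column equals employee_id."""
    temp_path = file_path + ".tmp"
    with open(file_path, newline="") as read_file, \
            open(temp_path, "w", newline="") as temp_file:
        writer = csv.writer(temp_file)
        for row in csv.reader(read_file):
            # Compare only the ID field, not the whole line.
            if row and row[id_column] == employee_id:
                continue
            writer.writerow(row)
    shutil.move(temp_path, file_path)


# Demo file in which "222" also appears as ordinary data in another row.
demo_path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(demo_path, "w", newline="") as f:
    csv.writer(f).writerows([
        ["Employee ID", "Data1", "Data2"],
        ["111", "222", "Something"],
        ["222", "Something", "Something"],
    ])

delete_rows_by_id(demo_path, "222")

with open(demo_path, newline="") as f:
    remaining = list(csv.reader(f))
print(remaining)
```

Here the row for employee 111 survives even though one of its data fields contains "222", which a plain substring test on the line would have removed.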

Is there a way to save a txt file under a new name after reading/writing the file?

I am trying to run a python program to open a template multiple times and while running through a loop, save multiple copies of the txt template under distinct file names.
An example problem is included below: The example template takes the following form:
Null Null
Null
This is the test
But there is still more text.
The code I've made to do a quick edit is as follows:
longStr = (r"C:\Users\jrwaller\Documents\Automated Eve\NewTest.txt")
import fileinput
for line in fileinput.FileInput(longStr,inplace=1):
    if "This" in line:
        line=line.replace(line,line+"added\n")
    print(line, end='')
The output of the code correctly adds the new line "added" to the text file:
Null Null
Null
This is the test
added
But there is still more text.
However, I want to save this new text as a new file name, say "New Test Edited" while keeping a copy of the old txt file available for further edits.
Here is a working example for you:
longStr = (r"C:\Users\jrwaller\Documents\Automated Eve\NewTest.txt")
with open(longStr) as old_file:
    with open(r"C:\Users\jrwaller\Documents\Automated Eve\NewTestEdited.txt", "w") as new_file:
        for line in old_file:
            if "This" in line:
                line=line.replace(line,line+"added\n")
            new_file.write(line)
A simple file read and write operation, with context managers to close the files when you're finished.
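Since the question asks for several copies saved under distinct names, the same read-and-write approach could be wrapped in a loop. The sketch below is an assumption-laden illustration: the `write_edited_copy` helper, the numbered `NewTestEdited_N.txt` naming scheme, and the temporary directory used in place of the Windows path are all hypothetical.

```python
import os
import tempfile


def write_edited_copy(template_path, output_path, marker="This", extra="added\n"):
    """Copy the template to output_path, adding `extra` after marker lines."""
    with open(template_path) as old_file, open(output_path, "w") as new_file:
        for line in old_file:
            new_file.write(line)
            if marker in line:
                new_file.write(extra)


# Build a throwaway template so the example runs anywhere.
work_dir = tempfile.mkdtemp()
template = os.path.join(work_dir, "NewTest.txt")
with open(template, "w") as f:
    f.write("Null Null\nNull\nThis is the test\nBut there is still more text.\n")

# Save three edited copies under distinct names; the template is only read.
outputs = []
for i in range(1, 4):
    out = os.path.join(work_dir, "NewTestEdited_{}.txt".format(i))
    write_edited_copy(template, out)
    outputs.append(out)

with open(outputs[0]) as f:
    first_copy = f.read()
with open(template) as f:
    template_after = f.read()
```

Because the template is never opened for writing, it stays available for further edits.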

Why the output of "open" function doesn't allow me to attribute index?

I started to learn programming in Python 3 and I am doing a project that reads the content of a text file and tells you how many words are in the file. Being me, I always want to challenge myself, so I tried to add the name of the file to the output message so that in the future I can build a GUI for it and so on.
The error that I get is : AttributeError: '_io.TextIOWrapper' object has no attribute 'index'
Here is my code:
# Open text file
document = open("text2.txt", "r+")
# Reads the text file and splits it into arrays
text_split = document.read().split()
# Count the words
words = len(text_split)
# Display the counted words
document_name = document[document.index("name=")]
output = "In the file {} there are {} words.".format(document_name, words)
print (output)
Decided to take @Jean-François Fabre's advice and abandoned the idea of also outputting the name of the file (FOR NOW).
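For reference, the object returned by `open()` is not indexable, but it does expose the path it was opened with through its `name` attribute. A minimal sketch, assuming a small sample file created in a temporary directory (the sample text is made up for illustration):

```python
import os
import tempfile

# Create a small sample file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "text2.txt")
with open(path, "w") as f:
    f.write("one two three\nfour five\n")

with open(path, "r") as document:
    # Count whitespace-separated words.
    words = len(document.read().split())
    # document.name holds the path passed to open(); keep only the file name.
    document_name = os.path.basename(document.name)

output = "In the file {} there are {} words.".format(document_name, words)
print(output)
```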

issue in saving string list in to text file

I am trying to save and read the strings which are saved in a text file.
a = [['str1','str2','str3'],['str4','str5','str6'],['str7','str8','str9']]
file = 'D:\\Trails\\test.txt'
# writing list to txt file
thefile = open(file,'w')
for item in a:
    thefile.write("%s\n" % item)
thefile.close()
#reading list from txt file
readfile = open(file,'r')
data = readfile.readlines()
print(a[0][0])
print(data[0][1]) # display data read
the output:
str1
'
Both a[0][0] and data[0][0] should have the same value, but what I read back from the file does not match. What is the mistake in saving the file?
Update:
The 'a' array holds strings of different lengths. What changes can I make when saving the file so that the output will be the same?
Update:
I have made changes by saving the file as csv instead of text using this link. In the case of a text file, how should I save the data?
You can save the list directly to the file and use the eval function to turn the saved data back into a list. It isn't recommended, but the following code works.
a = [['str1','str2','str3'],['str4','str5','str6'],['str7','str8','str9']]
file = 'test.txt'
# writing list to txt file
thefile = open(file,'w')
thefile.write("%s" % a)
thefile.close()
#reading list from txt file
readfile = open(file,'r')
data = eval(readfile.readline())
print(data)
print(a[0][0])
print(data[0][1]) # display data read
print(a)
print(data)
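As an alternative that avoids `eval`, the nested list could be stored as JSON using the standard json module. This is a swapped-in technique rather than the answer's method, and the `test.json` file name and temporary directory are assumptions for the sketch.

```python
import json
import os
import tempfile

a = [['str1', 'str2', 'str3'], ['str4', 'str5', 'str6'], ['str7', 'str8', 'str9']]
path = os.path.join(tempfile.mkdtemp(), "test.json")

# Write the nested list as JSON text.
with open(path, "w") as thefile:
    json.dump(a, thefile)

# Read it back as a list of lists of strings.
with open(path) as readfile:
    data = json.load(readfile)

print(a[0][0])
print(data[0][0])
```

Strings of different lengths round-trip unchanged, since JSON records the structure rather than character positions.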
a and data will not have the same value, because a is a list of three lists,
whereas data is a list of three strings.
readfile.readlines() or list(readfile) reads all lines into a list.
So when you run data = readfile.readlines(), Python treats ['str1','str2','str3']\n as a single string, not as a list.
So, to get your desired output, you can use the following print statement.
print(data[0][2:6])
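Slicing by position only works while every string has the same length. A sketch of a more general option, assuming the file was written one list per line as in the question: `ast.literal_eval` from the standard library parses each line back into a real list. The `lines` variable below is a stand-in for the result of `readfile.readlines()`.

```python
import ast

# Stand-in for readfile.readlines() on the file written in the question.
lines = ["['str1', 'str2', 'str3']\n", "['longer string', 'b', 'c']\n"]

# Parse each line's Python literal back into a list of strings.
data = [ast.literal_eval(line.strip()) for line in lines]

print(data[0][0])
print(data[1][0])
```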
