Python filewriting with write(), writelines(), to_csv() not working - python-3.x

I'm running a piece of code that takes input from a txt file, uses the input to scrape a Tor webpage, and then returns a list of strings called result. I'm using the tbselenium module. I need to write this list to two output files, valid.txt and address.txt. When I run the script I get the result (a list of strings), but nothing is written to the two output files. No error is raised, and the print statements inside the two functions work perfectly. The input is read successfully.
from tbselenium.tbdriver import TorBrowserDriver
import requests
import time
import pandas as pd
def read_input():
    with open('Entries.txt') as fp:
        users = fp.readlines()
    return users
users = read_input()
result = some_function(users) # This function scrapes the webpage using selenium
def write_output(result):
    with open('valid.txt', 'a+') as fw:
        fw.writelines(result)
    print('Writing to valid.txt', result)
def write_addr(result):
    with open('address.txt', 'a+') as fw:
        for x in result:
            fw.write(x.split(':')[5]+'\n')
    print('Writing to address.txt')
write_output(result)
write_addr(result)
I then tried writing the same output to a csv file.
df = pd.DataFrame(result)
print(df)
df.to_csv('valid.csv', mode='a', header=False)
The DataFrame is created, but nothing is written to the CSV file. The file is not even created if I haven't already created one in my folder.
If I skip the scraping function and write something to the output files directly, it works.

Solved. While the selenium driver is running, it changes the current working directory to the directory where the Tor browser is located, so all the files were being saved there.
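One way to guard against this, sketched here under assumptions, is to record the output directory once at startup and build every output path from it, so a later change of working directory no longer matters. The temporary directories and the os.chdir call below only simulate the driver's behaviour; the names start_dir and elsewhere are illustrative and not part of the original script.

```python
import os
import tempfile
from pathlib import Path

# Stand-in for the folder the script is started from (assumption for the demo).
start_dir = Path(tempfile.mkdtemp()).resolve()
os.chdir(start_dir)

# Capture the directory once, before anything else can change it.
output_dir = Path.cwd()

def write_output(result, out_dir=output_dir):
    """Append the lines in result to valid.txt inside out_dir."""
    path = out_dir / 'valid.txt'
    with open(path, 'a+') as fw:
        fw.writelines(result)
    return path

# Simulate a library (such as a browser driver) changing the working directory.
elsewhere = tempfile.mkdtemp()
os.chdir(elsewhere)

written = write_output(['line one\n', 'line two\n'])
print('Current directory:', Path.cwd())
print('File written to:  ', written)
```

Printing os.getcwd() right before the write calls in the original script is a quick way to confirm where relative paths such as 'valid.txt' actually end up.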

import pandas as pd
def write_output(result):
    with open('valid.txt', 'a+') as fw:
        fw.writelines(result)
    print('Writing to valid.txt', result)
def write_addr(result):
    with open('address.txt', 'a+') as fw:
        for x in result:
            fw.write(x.split(':')[5]+'\n')
    print('Writing to address.txt')
result = ['I am :scrap:ed d:ata:from tor wit:h add:ress:es\n', 'I am :scrap:ed d:ata:from tor wit:h add:ress:es\n', 'I am :scrap:ed d:ata:from tor wit:h add:ress:es\n']
write_output(result)
write_addr(result)
df = pd.DataFrame(result)
print(df)
df.to_csv('valid.csv', mode='a', header=False)
I didn't find any problem with your code (at least not with the write functions that create valid.txt, address.txt, and valid.csv).
I tested your code with my own dummy result. As the attached images show, the respective files were created successfully. I suspect the problem is in your result list. Check that the element at index 5 after split(':') is not a space character, and note that the files are created in the directory the Python script is run from (in case you are looking for the files in the wrong directory). Other than that, the functions should run properly, provided your web scraping function actually returns result.
Cheers

Related

How to write a simple test code to test a python program that has a function with two arguments?

I'm new to Python. I have written a program that reads a list of files, saves the number of occurrences of a particular character (ch) in each file in a dictionary, and then returns it.
The program works fine; now I'm trying to write a simple test for it.
I tried the following code:
def test_read_files():
    assert read_files("H:\\SomeTextFiles\\zero-k.txt", 'k') == 0, "Should be 0"
if __name__ == "__main__":
    test_read_files()
    print("Everything passed")
I saved the test as test_read_files.py
My python code is as follows:
# This function reads a list of files and saves number of
# a particular character (ch) in dictionary and returns it.
def read_files(filePaths, ch):
    # dictionary for saving the number of characters in each file
    dictionary = {}
    for filePath in filePaths:
        try:
            # using "with statement" with open() function
            with open(filePath, "r") as file_object:
                # read file content
                fileContent = file_object.read()
                dictionary[filePath] = fileContent.count(ch)
        except Exception:
            # handling exception
            print('An Error with opening the file '+filePath)
            dictionary[filePath] = -1
    return dictionary
fileLists = ["H:\\SomeTextFiles\\16.txt", "H:\\SomeTextFiles\\Statement1.txt",
             "H:\\SomeTextFiles\\zero-k.txt", "H:\\SomeTextFiles"]
print(read_files(fileLists, 'k'))
I saved it as read_files.py
When I run the test code, I get an error: NameError: name 'read_files' is not defined
The program and the test code are in the same folder (though not the Python installation folder).
Hopefully I am understanding this correctly, but if both of your Python files:
test_read_files.py
read_files.py
are in the same directory, then you should be able to just add the following import at the top of test_read_files.py:
from read_files import read_files
This imports the read_files function from your read_files.py script, so that you can call it from the other file.
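Two further details are worth checking once the import is in place. As posted, read_files takes a list of paths and returns a dictionary, so a test should pass a list and compare a dictionary entry rather than compare the whole return value to 0. Also, the module-level print call in read_files.py runs on every import unless it is moved under an if __name__ == "__main__": guard. The sketch below puts a simplified copy of the function and a test in one block so it runs on its own. The temporary files replace the H:\ paths and are assumptions for illustration.

```python
import os
import tempfile

def read_files(file_paths, ch):
    """Simplified copy of the posted function: count ch in each file, -1 on error."""
    counts = {}
    for file_path in file_paths:
        try:
            with open(file_path, "r") as file_object:
                counts[file_path] = file_object.read().count(ch)
        except OSError:
            counts[file_path] = -1
    return counts

def test_read_files():
    with tempfile.TemporaryDirectory() as tmp:
        # A file that contains no 'k' at all.
        zero_k = os.path.join(tmp, "zero-k.txt")
        with open(zero_k, "w") as handle:
            handle.write("no such letter here")
        # A path that does not exist, to exercise the error branch.
        missing = os.path.join(tmp, "missing.txt")

        result = read_files([zero_k, missing], "k")
        assert result[zero_k] == 0, "Should be 0"
        assert result[missing] == -1, "Should be -1 for an unreadable path"

if __name__ == "__main__":
    test_read_files()
    print("Everything passed")
```

In the real two-file layout, the function would stay in read_files.py and the test file would start with from read_files import read_files.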

Why my function that creates a pandas dataframe changes the dtype to none when called

I'm processing CSV files. My code worked when written without functions, apart from some problems when trying to fillna with a string, which I handled with a try and except.
For some reason it didn't work before I added the while loop.
My question: why does a DataFrame created inside a function (by reading the CSV file whose name I pass when calling the function) come back as an empty object? I thought that once the DataFrame was in memory it wouldn't be destroyed. What am I missing?
My code:
import pandas as pd
grossmargin = 1.2
def read_wholesalefile(name):
    mac = name
    apple = pd.read_csv(mac)
    apple['price'] = apple['Wholesale'] * grossmargin
    while True:
        try:
            apple.fillna('N/A', inplace=True)
            break
        except ValueError:
            print('Not Valid')
read_wholesalefile('Wholesalelist5182021.csv')
Sorry, I figured it out by myself:
I was missing the scope. Sorry again for the newbie question. I just started coding in Python a few months ago (last December) and I'm learning as I go.
What worked for me was to declare the variable global inside the function. I didn't know that DataFrames behave like any other variable inside a function.
# My modified code that works
import pandas as pd
grossmargin = 1.2
def read_wholesalefile(name):
    global apple
    mac = name
    apple = pd.read_csv(mac)
    apple['price'] = apple['Wholesale'] * grossmargin
    while True:
        try:
            apple.fillna('N/A', inplace=True)
            break
        except ValueError:
            print('Not Valid')
read_wholesalefile('Wholesalelist5182021.csv')
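The first version has no return statement, so the call evaluates to None and the local DataFrame is discarded when the function exits. A common alternative to global is to return the DataFrame and assign it at the call site. The sketch below assumes that approach. The in-memory CSV and its Item column are made-up sample data standing in for Wholesalelist5182021.csv, and the fillna step is left out to keep the example short.

```python
import io
import pandas as pd

GROSS_MARGIN = 1.2

def read_wholesale_file(source):
    """Read the CSV, add a price column, and hand the DataFrame back to the caller."""
    frame = pd.read_csv(source)
    frame['price'] = frame['Wholesale'] * GROSS_MARGIN
    return frame

# Made-up sample data; a file name such as 'Wholesalelist5182021.csv' works the same way.
sample_csv = io.StringIO("Item,Wholesale\nwidget,10\ngadget,25\n")

apple = read_wholesale_file(sample_csv)
print(apple)
```

Returning the value keeps the function free of hidden state, which makes it easier to call for several files in a row.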

Is it possible to use Google's ortool in my script without downloaded ortool library?

Basically, I have to test my script on a server to which I can't add new libraries. How do I write or change my
from ortools.algorithms import pywrapknapsack_solver
in my .py file so that I can still use Google's OR-Tools when I submit to a server without ortools installed? Is there something like an HTML tag that I can just link to in order to use the ortools library?
I have to send my whole code.py file for testing, and I can include other .py files alongside code.py.
I tried downloading the source code from Google, but I don't know how to get it to work.
Currently my code.py:
from __future__ import print_function
from ortools.algorithms import pywrapknapsack_solver
def getBestSet(W, packages):
    final_arr = []
    pID = ['1','2','3','4','5'] #sample data
    values = [20,44,12,5,16]
    weights = [10,11,21,3,9]
    solver = pywrapknapsack_solver.KnapsackSolver(
        pywrapknapsack_solver.KnapsackSolver.
        KNAPSACK_MULTIDIMENSION_BRANCH_AND_BOUND_SOLVER, 'KnapsackExample')
    solver.Init(values, [weights], [W])
    computed_value = solver.Solve()
    packed_items = []
    packed_weights = []
    total_weight = 0
    # print('Total value =', computed_value)
    for i in range(len(values)):
        if solver.BestSolutionContains(i):
            packed_items.append(i)
            packed_weights.append(weights[i])
            total_weight += weights[i]
    # print('Total weight:', total_weight)
    # print('Packed items:', packed_items)
    # print('Packed_weights:', packed_weights)
    for i in packed_items:
        final_arr.append(pID[i])
    return final_arr
You can try it on Google Colab.
To install OR-Tools, run !pip install ortools in the first cell,
then put your code in a new cell below the first one.
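If the target server really cannot install packages, another option for a single-constraint knapsack like the one in the question is to drop the dependency and solve it with a plain dynamic-programming table. This is a different technique from the OR-Tools branch-and-bound solver, not a reimplementation of it. It assumes non-negative integer weights and a single capacity, and the function name and signature below are illustrative.

```python
def get_best_set(capacity, ids, values, weights):
    """0/1 knapsack by dynamic programming; returns the ids of the chosen items."""
    n = len(values)
    # best[i][w] = best total value using only the first i items with capacity w.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        value, weight = values[i - 1], weights[i - 1]
        for w in range(capacity + 1):
            best[i][w] = best[i - 1][w]  # skip item i-1
            if weight <= w and best[i - 1][w - weight] + value > best[i][w]:
                best[i][w] = best[i - 1][w - weight] + value  # take item i-1

    # Walk the table backwards to recover which items were taken.
    chosen = []
    w = capacity
    for i in range(n, 0, -1):
        if best[i][w] != best[i - 1][w]:
            chosen.append(ids[i - 1])
            w -= weights[i - 1]
    chosen.reverse()
    return chosen

# Sample data from the question.
p_ids = ['1', '2', '3', '4', '5']
values = [20, 44, 12, 5, 16]
weights = [10, 11, 21, 3, 9]

print(get_best_set(30, p_ids, values, weights))
```

The table needs memory proportional to the number of items times the capacity, so this only suits modest capacities.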

google ads CRITERIA_PERFORMANCE_REPORT don't allow to remove first row

I have encountered a problem with the Google Ads report and I have no clue how to fix it. I use the following code to extract the data from Google Ads via an API call:
import sys
from googleads import adwords
import pandas as pd
import io
output = io.StringIO()
def main(client):
    # Initialize appropriate service.
    report_downloader = client.GetReportDownloader(version='v201809')
    # Create report query.
    report_query = (adwords.ReportQueryBuilder()
                    .Select('CampaignId', 'AdGroupId', 'Id', 'Criteria',
                            'CriteriaType', 'FinalUrls', 'Impressions', 'Clicks',
                            'Cost')
                    .From('CRITERIA_PERFORMANCE_REPORT')
                    .Where('Status').In('ENABLED', 'PAUSED')
                    .During('LAST_7_DAYS')
                    .Build())
    # You can provide a file object to write the output to. Here the
    # report is written to the in-memory io.StringIO buffer `output`.
    report_downloader.DownloadReportWithAwql(
        report_query, 'CSV', output, skip_report_header=False,
        skip_column_header=False, skip_report_summary=False,
        include_zero_impressions=True)
    output.seek(0)
    df = pd.read_csv(output)
    # to_csv returns None when given a path, so do not reassign df here.
    df.to_csv('results.csv')
if __name__ == '__main__':
    # Initialize client object.
    adwords_client = adwords.AdWordsClient.LoadFromStorage()
    main(adwords_client)
The code works as expected: it pulls the data and saves it in a CSV file. However, when I access the columns, it prints just one column, 'CRITERIA_PERFORMANCE_REPORT (Nov 5, 2019-Nov 11, 2019)'. When I open the CSV file, it looks like this:
[screenshot: result.csv]
I have tried to drop the first row with df.drop(df.index[0]) to access the rest of the data, but nothing seems to work. Is there any way to remove the first row, or to use the second row as the column names? That is the result I expected.
Thanks in advance.
I'm able to eliminate the header there with the following download request:
report_downloader.DownloadReportWithAwql(
    report_query, 'CSV', output, skip_report_header=True,
    skip_column_header=False, skip_report_summary=True,
    include_zero_impressions=True
)
I think if you include skip_report_header=True, skip_report_summary=True you'll get what you want.
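If the report has already been downloaded with the header and summary included, the same cleanup can probably be done on the pandas side by skipping the first line and the last line when parsing. The sketch below uses made-up sample data that only imitates the layout described in the question (a title line, a column header, data rows, and a closing total line). The real report's columns and summary format may differ.

```python
import io
import pandas as pd

# Made-up data imitating the report layout; not real API output.
raw_report = io.StringIO(
    '"CRITERIA_PERFORMANCE_REPORT (Nov 5, 2019-Nov 11, 2019)"\n'
    'Campaign ID,Clicks,Cost\n'
    '111,3,1500000\n'
    '222,0,0\n'
    'Total,3,1500000\n'
)

# skiprows=1 drops the title line so the next line becomes the header;
# skipfooter=1 drops the summary line and requires the Python parser engine.
df = pd.read_csv(raw_report, skiprows=1, skipfooter=1, engine='python')
print(df.columns.tolist())
print(df)
```

Asking the API to omit both lines, as in the answer above, is simpler when the download call is under your control. The pandas route is mainly useful for files that already exist.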

Unable to see the created .xlsx file in the directory

I generated a few values and wrote them into a spreadsheet using xlsxwriter. This is how I did it:
class main1():
    .
    .
    .
    .
    def fun1(self):
        workbook = xlsxwriter.Workbook(self.Output_fold+'Test'+time.strftime("%H_%M_%S_%d_%m_%Y")+'.xlsx')
        worksheet_A = workbook.add_worksheet('Sheet_A')
        .
        .
        worksheet_A.write(row,col,<val>)
        .
        .
        workbook.close()
Since I had to make multiple writes and add more complex logic, I decided to introduce another function, fun2, which writes the values accordingly. The new logic requires generating values in fun1 as well as in fun2 (by calling another function, fun3). So I decided to replace the variables workbook etc. with self.workbook and so on. My modified script looks like this:
main_file.py
import xlsxwriter
import libex
import os
import time
import sys
import string
class main_cls():
    def __init__(self):
        self.i=0
        self.t1=""
        self.t2=""
        pwd=os.getcwd().split('\\')
        base='\\'.join(pwd[0:len(pwd)-1])+'\\'
        print base
        self.Output_fold=base+"Output\\"
        self.Input_fold=base+"Input\\"
        self.workbook=xlsxwriter.Workbook(self.Output_fold+'Test_'+time.strftime("%H_%M_%S_%d_%m_%Y")+'.xlsx')
        self.worksheet_A = self.workbook.add_worksheet('Sheet_A')
        self.worksheet_A.write(self.i,self.i,"Text 1")
        self.worksheet_A.write(self.i,self.i+1,"Text 2")
        self.i+=1
    def fun1(self):
        self.t1="1"
        self.t2="2"
        self.worksheet_A.write(self.i,self.i,self.t1)
        self.worksheet_A.write(self.i,self.i+1,self.t2)
        self.i+=1
        self.eg=libex.exlib()
        self.t1=self.eg.gen(0)
        self.t2=self.eg.gen(0)
        self.fun2()
        self.workbook.close()
    def fun2(self):
        if option==1:
            self.fun3()
    def fun3(self):
        self.t1=self.eg.gen(0)
        self.t2=self.eg.gen(1)
        self.worksheet_A.write(self.i,self.i,self.t1)
        self.worksheet_A.write(self.i,self.i+1,self.t2)
        self.i+=1
option=int(sys.argv[1])
if len(sys.argv)==2:
    p=main_cls()
    if option==1:
        p.fun1()
    else:
        pass
else:
    print "Wrong command"
libex.py
class exlib():
    def __init__(self):
        self.a="duh"
    def gen(self,x):
        if int(x)==0:
            return(self.a)
        elif int(x)==1:
            self.a=str(self.a+" "+self.a+" "+self.a+" !!!")
            return(self.a)
This works in this particular case, but in the actual code it doesn't: the file is not created in the output directory. I then added the following line:
print "Workbook created at path : ",self.workbook.filename
to see whether the file is being created, and surprisingly it printed the full path!
Where could I be going wrong here, and how can I fix this?
UPDATE1: I played around with it a bit and found that removing self from self.workbook and moving workbook to __init__(self) creates the file with the initial values populated.
UPDATE2: I have replicated my code in a minimal way, as suggested, and this version works fine.
I tried to reproduce this and the file is created just fine. Maybe you have a problem with the self.Output_fold variable, with file permissions, or with your code editor's file explorer.
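To check the first suspicion, it can help to build the output path explicitly and confirm that the folder exists before handing the path to xlsxwriter. Two points seem relevant here: self.workbook.filename is only the string that was passed in, so printing it does not show that a file exists, and xlsxwriter writes the file when close() is called, which the posted script reaches only through fun1. The sketch below uses only the standard library and Python 3. The helper name build_output_path and the temporary base directory are illustrative assumptions, not part of the original script.

```python
import os
import tempfile
import time

def build_output_path(base_dir, prefix='Test_'):
    """Return an absolute .xlsx path inside base_dir/Output, creating the folder if needed."""
    output_dir = os.path.join(base_dir, 'Output')
    os.makedirs(output_dir, exist_ok=True)
    filename = prefix + time.strftime('%H_%M_%S_%d_%m_%Y') + '.xlsx'
    return os.path.abspath(os.path.join(output_dir, filename))

# Stand-in for the project folder (assumption for the demo).
base_dir = tempfile.mkdtemp()

workbook_path = build_output_path(base_dir)
print('Workbook will be written to:', workbook_path)
print('Output folder exists:', os.path.isdir(os.path.dirname(workbook_path)))
```

After workbook.close() has run, os.path.exists(workbook_path) gives a direct answer about whether the file was written, independent of any editor's file explorer.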
