I wrote a .py script called Expiration_Report.py that uses the pandas and numpy libraries. It runs perfectly fine when executed in Spyder (Python 3.6).
(I use Anaconda for everything.)
I then created another .py file called 'setup.py' with the following code in order to convert Expiration_Report.py to Expiration_Report.exe:
import sys
from cx_Freeze import setup, Executable
# Dependencies are automatically detected, but it might need fine tuning.
build_exe_options = {"packages": ["os"],
"excludes": ["tkinter"]}
# GUI applications require a different base on Windows (the default is for a
# console application).
base = None
if sys.platform == "win32":
    base = "console"
setup(name="my prog",
      version="1.0",
      description="My application!",
      options={"build_exe": build_exe_options},
      executables=[Executable("Expiration_Report.py", base=base)])
Then in the command prompt I write:
python setup.py build
It builds without any errors, and the build folder contains the .exe file. However, when I run the .exe from the build folder, nothing happens.
Here is the code from the Expiration_Report.py script:
import pandas as pd
import numpy as np
df = pd.read_excel('C:/Users/Salman/Desktop/WIP Board - 007.xlsx', index_col=None, na_values=['NA'])
df.columns = df.iloc[12]
df.columns
df.shape
df = df.dropna(axis=1, how = 'all')
df
df.columns
df1 = df.copy()
df1 = df1.iloc[13:]
df1
df1 = df1.dropna(axis=1, how = 'all')
df1.shape
from datetime import datetime
print(str(datetime.now()))
df2 = df1.copy()
df2["Now_Time"] = pd.Series([datetime.now()] * (13+len(df1)))
df2["Now_Time"]
df2
df2.fillna(value='NaN')
df2 = df2.dropna(how='any')
df2.shape
df3 = df2.copy()
df3 = df3[df3.Size>0]
df3['Lot Expiration Date'] = pd.to_datetime(df3['Lot Expiration Date'])
df3['Days_Countdown'] = df3[['Lot Expiration Date']].sub(df3['Now_Time'], axis = 0 )
df3.dtypes
df3['Hours_Countdown'] = df3['Days_Countdown'] / np.timedelta64(1, 'h')
df3 = df3.sort_values('Hours_Countdown')
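df_expiration = df3[df3.Hours_Countdown < 12].copy()  # independent copy, so the assignment below does not act on a slice of df3
df_expiration['Hours_Countdown'] = df_expiration['Hours_Countdown'].astype(int)  # astype returns a new Series, so assign it back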
df_expiration
df_expiration.to_excel('C:/Users/Salman/Desktop/WIP Board - 000.xlsx', sheet_name = 'Sheet1')
I know the method for creating an .exe with cx_Freeze is correct, because I converted a simple HelloWorld.py script to an .exe and it worked fine. The frozen program seems to fail to import the pandas library and simply exits.
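Separately from the freezing problem: the Now_Time column in the script above is built from [datetime.now()] * (13+len(df1)), which only lines up with df2 because df1's index happens to start at 13. Subtracting a single timestamp from the datetime column broadcasts it to every row instead. A minimal sketch, assuming the 'Lot Expiration Date' column from the script; the function name expiring_soon and its threshold_hours parameter are hypothetical:

```python
from datetime import datetime

import pandas as pd


def expiring_soon(df, now=None, threshold_hours=12):
    """Return rows whose 'Lot Expiration Date' is less than threshold_hours away."""
    now = datetime.now() if now is None else now
    out = df.copy()
    out["Lot Expiration Date"] = pd.to_datetime(out["Lot Expiration Date"])
    # Subtracting one timestamp broadcasts it across the whole column;
    # dividing the resulting timedeltas by one hour gives fractional hours.
    out["Hours_Countdown"] = (out["Lot Expiration Date"] - now) / pd.Timedelta(hours=1)
    out = out.sort_values("Hours_Countdown")
    return out[out["Hours_Countdown"] < threshold_hours]


if __name__ == "__main__":
    sample = pd.DataFrame({
        "Lot Expiration Date": ["2024-01-01 06:00", "2024-01-02 00:00"],
        "Size": [1, 2],
    })
    print(expiring_soon(sample, now=datetime(2024, 1, 1)))
```

Because the comparison no longer depends on index positions, the same function keeps working if the header row moves away from row 12 in the spreadsheet.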
Maybe you need to add pandas and numpy to the packages list. cx_Freeze can be a bit dodgy when it comes to finding all the necessary packages.
build_exe_options = {"packages": ["os", "numpy", "pandas"],
"excludes": ["tkinter"]}
It seems that this (listing the packages in setup.py) doesn't work for cx_Freeze 5 and 6 (as I understand it, these are the latest versions).
I had the same problem, whatever advice I followed here, including adding the packages. It is numpy that seems to cause the trouble.
You can test this by putting import numpy in your very simple test script and watching it crash once you freeze it.
The following solution worked for me under Python 3.4, but I doubt that it works under Python 3.6:
I uninstalled cx_Freeze and installed cx_Freeze 4.3.4 via pip install cx_Freeze==4.3.4, and then it worked.
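A test script along those lines might look like the sketch below (illustrative, not code from the thread). If the frozen version of this exits silently while a plain hello-world script works, the numpy import is the likely culprit rather than the cx_Freeze setup itself.

```python
import sys

import numpy as np

# If this runs unfrozen but the frozen .exe exits silently, numpy (or one of
# its compiled extension modules) is most likely what the build failed to bundle.
total = np.arange(4).sum()
print("Python", sys.version.split()[0], "numpy", np.__version__, "sum", total)
```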
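pd.read_csv works, but pd.read_excel does not once the application has been frozen with cx_Freeze.
pandas' read_excel depends on a separate package, xlrd (newer pandas versions read .xlsx files with openpyxl instead). If it is not installed, install it with pip install xlrd, and pd.read_excel will work afterwards.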
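Because pandas only imports its Excel reader when read_excel actually needs it, a freezer that follows imports statically can miss that package, which would fit read_csv working while read_excel fails. Two small checks may help, sketched below with hypothetical helper names: missing_modules uses importlib.util.find_spec to list reader packages that are not installed in the build environment, and run_safely writes any startup traceback to a file (error_log.txt is an arbitrary name), since a crashing console .exe tends to close before the error can be read. main is a placeholder for the report script's logic.

```python
import importlib.util
import traceback
from pathlib import Path


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def run_safely(func, log_path="error_log.txt"):
    """Call func(); if it raises, save the traceback to log_path and return 1."""
    try:
        func()
    except Exception:
        # A frozen console app's window closes as soon as it crashes, so keep
        # the traceback in a file that can be read afterwards.
        Path(log_path).write_text(traceback.format_exc(), encoding="utf-8")
        return 1
    return 0


def main():
    # Hypothetical stand-in for the body of Expiration_Report.py.
    import pandas as pd

    print("pandas", pd.__version__)


if __name__ == "__main__":
    # Which Excel reader pandas needs depends on the pandas version and file type.
    print("missing:", missing_modules(["pandas", "xlrd", "openpyxl"]))
    print("exit code:", run_safely(main))
```

In the frozen script itself you would probably end with sys.exit(run_safely(main)), and if xlrd turns out to be the missing piece, try adding "xlrd" to the "packages" list in build_exe_options.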
Related
I have downloaded a ZIP file containing .dbf and .shp files from a website using Python.
from zipfile import ZipFile
with ZipFile(r'.\ZipDataset\ABC.zip', 'r') as zip_object:
    print(zip_object.namelist())
    zip_object.extract('A.dbf')
    zip_object.extract('B.shp')
My question is: how do I convert these extracted files to an Excel spreadsheet using Python?
You need additional libraries such as pandas, pyshp, or geopandas.
According to this link, the easiest way is as follows.
You need to install two libraries:
pip install pandas
pip install pyshp
and then run this code:
import shapefile
import pandas as pd
#read file, parse out the records and shapes
sf = shapefile.Reader(r'.\ZipDataset\ABC.zip')
fields = [x[0] for x in sf.fields][1:]
records = sf.records()
shps = [s.points for s in sf.shapes()]
#write into a dataframe
df = pd.DataFrame(columns=fields, data=records)
df = df.assign(coords=shps)
df.to_csv('output.csv', index=False)
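A related detail: a shapefile's .shp, .shx and .dbf parts share a base name and need to sit side by side for pyshp to open them, so extracting the whole archive can be simpler than pulling out single members. A minimal sketch using only zipfile; the helper name extract_archive and the destination folder are hypothetical:

```python
from pathlib import Path
from zipfile import ZipFile


def extract_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the .shp paths found."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with ZipFile(zip_path) as zf:
        zf.extractall(dest)
    # Each .shp should now have its .shx/.dbf siblings with the same stem.
    return sorted(dest.rglob("*.shp"))
```

Each returned path can then be passed to shapefile.Reader(str(path)). Since the question asks for Excel rather than CSV, df.to_excel('output.xlsx', index=False) (with openpyxl installed) is the pandas counterpart of the to_csv call above, though the list-valued coords column may need converting to text first.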
I am trying to extract geographic coordinates in UTM format from a .pdf file with Python 3 on Ubuntu, using the following code:
from pathlib import Path
import textract
import numpy as np
import re
import os
import pdfminer
def main(_file):
    try:
        text = textract.process(_file, method="pdfminer")
    except textract.exceptions.ShellError as ex:
        print(ex)
        return
    # Write the matches to <pdf name without extension>.csv
    with open("%s.csv" % Path(_file).stem, "w+") as out:
        # find UTM coordinates such as 345.678,12
        coords = re.compile(r"\d?\.?\d+\.+\d+\,\d{2}")
        results = re.findall(coords, text.decode())
        if results:
            out.write("|".join(results))
if __name__ == "__main__":
    _file = "/home/cristian33/python_proj/folder1/buscarco.pdf"
    main(_file)
When I run it, I get the following error:
The command pdf2txt.py /home/cristian33/python_proj/folder1/buscarco.pdf failed because the executable
pdf2txt.py is not installed on your system. Please make
sure the appropriate dependencies are installed before using
textract:
http://textract.readthedocs.org/en/latest/installation.html
Does anybody know what causes this error?
Thanks.
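The message says that textract's pdfminer method shells out to the pdf2txt.py command-line script, which is installed by the pdfminer.six package; having some pdfminer package importable (the question's import pdfminer) does not necessarily put that script on PATH, so pip install pdfminer.six may be worth trying. A sketch of a pre-flight check with shutil.which, plus the question's regex isolated so it can be tested on sample text; the function names and sample string are hypothetical:

```python
import re
import shutil

# The question's pattern; it matches strings such as "345.678,12" or "4.123.456,78".
UTM_PATTERN = re.compile(r"\d?\.?\d+\.+\d+\,\d{2}")


def pdf2txt_available():
    """True if the pdf2txt.py script that textract invokes is on PATH."""
    return shutil.which("pdf2txt.py") is not None


def find_coords(text):
    """Return every substring of text that matches UTM_PATTERN."""
    return UTM_PATTERN.findall(text)


if __name__ == "__main__":
    if not pdf2txt_available():
        print("pdf2txt.py not found on PATH; try: pip install pdfminer.six")
    print(find_coords("X: 345.678,12 Y: 4.123.456,78"))
```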
I have the following Python script, example.py, which I converted to a Windows executable.
import sys
import pickle
import pandas as pd
def example(df1, df2):
    print('Started the executable with example')
    df1 = pickle.loads(df1)
    df2 = pickle.loads(df2)
    print(f'df1 has {df1.shape[0]} rows')
    print(f'df2 has {df2.shape[0]} rows')
    return pickle.dumps(pd.concat([df1, df2]))
if __name__ == "__main__":
    example(sys.argv[1], sys.argv[2])
Then I use PyInstaller to create an executable named example.exe using pyinstaller example.py -F
Next, I create random pandas DataFrames in Python; let's call them df1 and df2.
Now I would like to use the subprocess module in the main Python script, main.py, to call this executable and get the results. This is the part I need help with. Here is the code I wrote, which obviously isn't working:
import subprocess
import pickle
df_dump1 = pickle.dumps(df1)
df_dump2 = pickle.dumps(df2)
command = ['./example.exe',df_dump1, df_dump2]
result = subprocess.run(command,
input=df_dump,
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
shell = True)
print(result.returncode, result.stdout, result.stderr)
The error message I get is this:
TypeError: a bytes-like object is required, not 'str'
Clearly, I am not able to send multiple Pandas dataframes (or even one) to the executable. Any ideas about how to achieve this?
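Command-line arguments have to be text, and a pickle is arbitrary binary data (it routinely contains null bytes), so it cannot reliably travel through argv. A common alternative is to send the pickled objects on the child's stdin and have the child write its pickled result to stdout, keeping any prints on stderr so they do not corrupt the pickle. A minimal sketch, assuming you can change the child program: call_with_pickles is a hypothetical helper, and CHILD_CODE is a hypothetical stand-in for example.py, run with the current Python interpreter and combining two lists rather than DataFrames so it has no dependencies.

```python
import pickle
import subprocess
import sys


def call_with_pickles(command, *objects):
    """Send objects to a child process as one pickled tuple on stdin and
    return the object the child pickles back on stdout."""
    result = subprocess.run(
        command,
        input=pickle.dumps(objects),  # raw bytes are fine on stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return pickle.loads(result.stdout)


# Hypothetical stand-in for example.py / example.exe: read the pickled tuple
# from stdin, combine the two inputs, and write the pickled result to stdout.
# Progress messages go to stderr so they cannot corrupt the pickle on stdout.
CHILD_CODE = r"""
import pickle, sys
first, second = pickle.loads(sys.stdin.buffer.read())
print(f"got {len(first)} and {len(second)} items", file=sys.stderr)
sys.stdout.buffer.write(pickle.dumps(first + second))
"""

if __name__ == "__main__":
    combined = call_with_pickles([sys.executable, "-c", CHILD_CODE], [1, 2], [3])
    print(combined)  # [1, 2, 3]
```

For the question's case the command would be ['./example.exe'], and the child would write pickle.dumps(pd.concat([first, second])) instead of adding lists. Only unpickle data that comes from a source you trust.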
I am really new to Python programming. I have a pandasql query on a DataFrame that runs fine when I run my code with the standard Python 3 implementation. However, after cythonizing it, I always get the following exception:
sqlite3.OperationalError: no such table: dataf
Here is the snippet from the processor.pyx file:
import pandas as pd
from pandasql import sqldf
def process_date(json):
    #json has the properties format [{"x": "1", "y": "2", "z": "3"}]
    dataf = pd.read_json(json, orient='records')
    sql = """select x, y, z from dataf;"""
    result = sqldf(sql)
Could cythonizing the code make it behave differently? This exact same code runs fine with the standard Python 3 implementation.
Here is the setup.py I wrote to compile the code to C:
# several .pyx files, which I will call by their names
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
ext_modules=[
Extension("c_processor", ["processor.pyx"])]
setup(
name = 'RTP_Cython',
cmdclass = {'build_ext': build_ext},
ext_modules = ext_modules,
)
I also tried to use Numba and got the same error. Code below:
import pandas as pd
from pandasql import sqldf
from numba import jit
from numpy import arange
@jit
def process_data():
    print("In process data")
    json = "[{\"id\": 2, \"name\": \"zain\"}]"
    df = pd.read_json(json, orient='records')
    sql = "select id, name from df;"
    df = sqldf(sql)
    print("the df is %s" % df)
process_data()
If I comment out the @jit decorator, the code works fine.
Should I be using another extension of the pandas library that interoperates with C, since both Numba and Cython give me the same error?
I hope there is an easy solution to this.
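As far as I know, when sqldf is called without an environment argument, pandasql finds DataFrames by inspecting the variables of the calling Python stack frame; code compiled by Cython or wrapped by Numba does not necessarily expose its locals that way, which would explain "no such table". Passing the namespace explicitly, e.g. sqldf(sql, locals()) or sqldf(sql, {"dataf": dataf}), avoids that lookup. Another option, sketched below, drops pandasql and registers the DataFrame in an in-memory SQLite database using pandas' own to_sql and read_sql_query; the function name query_frame is hypothetical. (Numba targets numeric code on NumPy arrays, so @jit is unlikely to speed up pandas calls like these anyway.)

```python
import sqlite3
from contextlib import closing

import pandas as pd


def query_frame(df, sql, table_name="dataf"):
    """Run sql against df, registered as table_name in a throwaway SQLite DB."""
    with closing(sqlite3.connect(":memory:")) as conn:
        # to_sql accepts a plain sqlite3 connection; nothing here inspects the
        # caller's stack frame, so it should also work when called from compiled code.
        df.to_sql(table_name, conn, index=False)
        return pd.read_sql_query(sql, conn)


if __name__ == "__main__":
    dataf = pd.DataFrame([{"x": "1", "y": "2", "z": "3"}])
    print(query_frame(dataf, "select x, y, z from dataf;"))
```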
We are transitioning at work from Python 2.7 to Python 3.5. It's a company-wide change, and most of our current scripts were written in 2.7 with no additional libraries. I've taken advantage of the Anaconda distribution we are using and have already converted most of our scripts, either with the 2to3 tool or by rewriting them completely. I am stuck on one piece of code, though, which I did not write, and the original author is not here. He also did not supply comments, so I can only guess at what the script as a whole does. 95% of the script works correctly until the end, where, after it creates 7 CSV files with different parsed information, a custom function combines the CSV files into an xls workbook with each CSV as a new tab.
import csv
import os
import xlwt
import glob
import openpyxl
from openpyxl import Workbook
Parsefiles = glob.glob(directory + '/' + "Parsed*.csv")
def xlsmaker():
    for f in Parsefiles:
        (path, name) = os.path.split(f)
        (short_name, extension) = os.path.splitext(name)
        ws = wb.add_sheet(short_name)
        xreader = csv.reader(open(f, 'rb'))
        newdata = [line for line in xreader]
        for rowx, row in enumerate(newdata):
            for colx, value in enumerate(row):
                if value.isdigit():
                    ws.write(rowx, colx, value)
xlsmaker()
for f in Parsefiles:
    os.remove(f)
wb.save(directory + '/' + "Finished" + '' + oshort + '' + timestr + ".xls")
This was all written in Python 2.7 and still works correctly if I run it under Python 2.7. The issue is that it throws an error when run under Python 3.5:
File "parsetool.py", line 521, in <module>
xlsmaker()
File "parsetool.py", line 511, in xlsmaker
ws = wb.add_sheet(short_name)
File "c:\pythonscripts\workbook.py", line 168 in add_sheet
raise TypeError("The parameter you have given is not of the type '%s'"% self._worksheet_class.__name__)
TypeError: The parameter you have given is not of the type "Worksheet"
Any ideas about what should be done to fix the above error? I've tried multiple rewrites, but I get similar errors or new ones. I'm considering just figuring out a whole new method to create the xls, possibly with pandas instead.
Not sure why it errs. It is worth the effort to rewrite the code and use pandas instead. pandas can read each CSV file into a separate DataFrame and save all the DataFrames as separate sheets in an xls(x) file, using pandas' ExcelWriter. E.g.:
import pandas as pd
writer = pd.ExcelWriter('yourfile.xlsx', engine='xlsxwriter')
df = pd.read_csv('originalfile.csv')
df.to_excel(writer, sheet_name='sheetname')
writer.save()
Since you have multiple CSV files, you would probably want to read all of them and store each one as a DataFrame in a dict, then write each DataFrame to Excel under a new sheet name.
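A sketch of that dict-based approach, assuming the CSV paths are already collected in a list; load_csvs and write_workbook are hypothetical names, sheet names come from the file stems trimmed to Excel's 31-character limit, and writing needs an Excel engine such as openpyxl or xlsxwriter:

```python
from pathlib import Path

import pandas as pd


def load_csvs(paths):
    """Read each CSV into a DataFrame keyed by its file stem (max 31 chars)."""
    return {Path(p).stem[:31]: pd.read_csv(p) for p in paths}


def write_workbook(frames, out_path):
    """Write each DataFrame in frames to its own sheet of out_path."""
    # Using ExcelWriter as a context manager saves and closes the file on exit.
    with pd.ExcelWriter(out_path) as writer:
        for sheet_name, df in frames.items():
            df.to_excel(writer, sheet_name=sheet_name, index=False)
```

For the question's files this would be used as write_workbook(load_csvs(Parsefiles), "Finished.xlsx"), with the output name chosen to taste.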
Multi-csv Example:
import pandas as pd
import sys
import os
writer = pd.ExcelWriter('default.xlsx') # Arbitrary output name
for csvfilename in sys.argv[1:]:
    df = pd.read_csv(csvfilename)
    df.to_excel(writer,sheet_name=os.path.splitext(csvfilename)[0])
writer.save()
(Note that you may need to pip install openpyxl to resolve errors about a missing xlsxwriter import.)
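If you would rather keep the original xlwt-based code, the traceback points to a likely cause, though this is an inference from the code shown: the error message appears to come from openpyxl's Workbook.add_sheet (present in older openpyxl versions), which expects an existing Worksheet object, whereas xlwt's Workbook.add_sheet takes a sheet name. If wb is created elsewhere with a bare Workbook(), the from openpyxl import Workbook line makes it an openpyxl workbook; creating it as xlwt.Workbook() (or, with openpyxl, calling wb.create_sheet(title=short_name)) should avoid that. Under Python 3 the CSV files also need to be opened in text mode with newline="" instead of 'rb'. A sketch under those assumptions: read_parsed_csvs and write_sheets are hypothetical names, wb is any object with an xlwt-style add_sheet(name) whose sheets have write(row, col, value), and the cut-off if value.isdigit() branch is assumed to mean "store digit strings as integers and everything else as text".

```python
import csv
import glob
import os


def read_parsed_csvs(directory, pattern="Parsed*.csv"):
    """Return {sheet_name: rows} for every CSV in directory matching pattern."""
    sheets = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        short_name, _extension = os.path.splitext(os.path.basename(path))
        # Python 3's csv module wants text mode with newline="", not 'rb'.
        with open(path, newline="", encoding="utf-8") as fh:
            # Excel limits sheet names to 31 characters.
            sheets[short_name[:31]] = list(csv.reader(fh))
    return sheets


def write_sheets(wb, sheets):
    """Add one sheet per entry using xlwt-style add_sheet()/write() calls."""
    for name, rows in sheets.items():
        ws = wb.add_sheet(name)
        for rowx, row in enumerate(rows):
            for colx, value in enumerate(row):
                ws.write(rowx, colx, int(value) if value.isdigit() else value)
```

With xlwt this would be wb = xlwt.Workbook(), then write_sheets(wb, read_parsed_csvs(directory)), then wb.save(...) as in the original script.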
You can use the code below to read multiple .csv files into one big .xlsx Excel file.
I also added code that replaces ',' with '.' (or vice versa, depending on your locale settings) for better compatibility in Windows environments.
import glob
import io
import os
from pathlib import Path
import pandas as pd
extension = 'csv'
all_filenames = glob.glob('*.{}'.format(extension))
writer = pd.ExcelWriter('fc15.xlsx') # Arbitrary output name
for csvfilename in all_filenames:
    # Swap ',' for '.' in memory so the original CSV files are not overwritten
    txt = Path(csvfilename).read_text(encoding='utf-8').replace(',', '.')
    print("Loading " + csvfilename)
    df = pd.read_csv(io.StringIO(txt), sep=';')
    df.to_excel(writer, sheet_name=os.path.splitext(csvfilename)[0])
    print("done")
writer.save()
print("task completed")
Here's a slight extension to the accepted answer. pandas 1.5 warns that writer.save() is deprecated, and pandas 2.0 removed it. The fix is to use the writer as a context manager.
import sys
from pathlib import Path
import pandas as pd
with pd.ExcelWriter("default.xlsx") as writer:
    for csvfilename in sys.argv[1:]:
        p = Path(csvfilename)
        sheet_name = p.stem[:31]
        df = pd.read_csv(p)
        df.to_excel(writer, sheet_name=sheet_name)
This version also trims the sheet name down to fit in Excel's maximum sheet name length, which is 31 characters.
If your CSV files contain Chinese text in GBK encoding, you can use the following code (GB18030 is a superset of GBK, so it decodes GBK files as well):
import pandas as pd
import glob
import datetime
from pathlib import Path
now = datetime.datetime.now()
extension = "csv"
all_filenames = [i for i in glob.glob(f"*.{extension}")]
with pd.ExcelWriter(f"{now:%Y%m%d}.xlsx") as writer:
    for csvfilename in all_filenames:
        print("Loading " + csvfilename)
        df = pd.read_csv(csvfilename, encoding="gb18030")
        df.to_excel(writer, index=False, sheet_name=Path(csvfilename).stem)
        print("done")
print("task completed")