I am trying to read all Excel files in a directory and merge them into one DataFrame, which seems to work fine (the DataFrame is created correctly).
However, it seems that Python tries to run the loop a second time after all the files have been read, but in a different location where the Excel files do not exist. I get a 'file not found' traceback, which is expected because the files are not saved in that location.
It jumps from 'C:/users/folder/folder' to 'C:/users' and tries to loop over all the files again. Any idea why this happens?
The code I am using is the following:
import pandas as pd
import xlrd
import os
path = os.getcwd()
files = os.listdir(path)
files_xlsx = [f for f in files if f[-4:] == 'xlsx']
df = pd.DataFrame()
for f in files_xlsx:
    data = pd.read_excel(f)
    df = df.append(data)
print(df)
Here is the full traceback I get:
C:\ProgramData\Anaconda3\envs\pythonProject\python.exe "C:\Program Files (x86)\JetBrains\PyCharm 2020.2.1\plugins\python\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 58606 --file "C:/Users/Home/OneDrive/Ops/Media Management/Media Kitchen/Rezepte aktuell vom 10.05.2021/Rezepte Alphabetisch xlsx/XLSX merg.py"
pydev debugger: process 22124 is connecting
Connected to pydev debugger (build 202.6948.78)
C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\openpyxl\worksheet\header_footer.py:48: UserWarning: Cannot parse header or footer so it will be ignored
warn("""Cannot parse header or footer so it will be ignored""")
...
C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\openpyxl\worksheet\header_footer.py:48: UserWarning: Cannot parse header or footer so it will be ignored
warn("""Cannot parse header or footer so it will be ignored""")
Traceback (most recent call last):
File "C:\Program Files (x86)\JetBrains\PyCharm 2020.2.1\plugins\python\helpers\pydev\pydevd.py", line 1448, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files (x86)\JetBrains\PyCharm 2020.2.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/Home/OneDrive/Ops/Media Management/Media Kitchen/Rezepte aktuell vom 10.05.2021/Rezepte Alphabetisch xlsx/XLSX merg.py", line 46, in <module>
data = pd.read_excel(f)
File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\pandas\util\_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\pandas\io\excel\_base.py", line 336, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\pandas\io\excel\_base.py", line 1071, in __init__
ext = inspect_excel_format(
File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\pandas\io\excel\_base.py", line 949, in inspect_excel_format
with get_handle(
File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\pandas\io\common.py", line 651, in get_handle
handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\$ACHR3-00.00.xlsx'
python-BaseException
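Nothing in the traceback pins down the cause, but one way to rule out the working directory as a factor is to build absolute paths from a fixed folder instead of passing bare file names. The sketch below is only an illustration under stated assumptions: the helper name `list_xlsx` is hypothetical, and skipping names that start with `~$` (the lock files Excel creates while a workbook is open) is an extra precaution, not something the question confirms is needed.

```python
import os

def list_xlsx(directory):
    """Return sorted absolute paths of the .xlsx files directly inside `directory`."""
    directory = os.path.abspath(directory)
    paths = []
    for name in os.listdir(directory):
        # Skip Excel lock files such as "~$report.xlsx" (assumed precaution)
        if name.startswith('~$'):
            continue
        if name.lower().endswith('.xlsx'):
            # Absolute path, so reading does not depend on the current directory
            paths.append(os.path.join(directory, name))
    return sorted(paths)
```

Each returned path can be passed to pd.read_excel, with the resulting frames collected in a list and combined once with pd.concat(frames, ignore_index=True). Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.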
I work with SQL Server 2019 and Python 3.10.
When I try to read an Excel file with OPENROWSET using this statement:
SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml;Database=\\192.168.7.9\\Import\6\strtinsertinput (4)-953aee07-ca14-4213-a91e-ab0b0f7f3db2.xlsx;HDR=YES','select * FROM [Sheet1$]')
it reads the Excel file successfully.
But when I try to read it using Python from a SQL query:
EXECUTE sp_execute_external_script
    @language = N'Python',
    @script = N'import pandas as pd
df = pd.read_excel("\\192.168.7.9\\Import\6\strtinsertinput (4)-953aee07-ca14-4213-a91e-ab0b0f7f3db2.xlsx", sheet_name = "Sheet1")';
GO
I get this error:
Error in execution. Check the output for more information.
Traceback (most recent call last):
File "", line 5, in
File "C:\ProgramData\MSSQLSERVER\Temp-PY\Appcontainer1\46CB4A4F-004A-4329-A390-FEF283444F33\sqlindb_0.py", line 31, in transform
df = pd.read_excel("\192.168.7.9\Import\6\strtinsertinput (4)-953aee07-ca14-4213-a91e-ab0b0f7f3db2.xlsx", sheet_name = "Sheet1")
File "C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\util_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\util_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\io\excel.py", line 307, in read_excel
io = ExcelFile(io, engine=engine)
Msg 39019, Level 16, State 2, Line 0
An external script error occurred:
File "C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\io\excel.py", line 394, in init
self.book = xlrd.open_workbook(self.io)
File "C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\xlrd_init.py", line 111, in open_workbook
with open(filename, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: '\192.168.7.9\Import\x06\strtinsertinput (4)-953aee07-ca14-4213-a91e-ab0b0f7f3db2.xlsx'
How can I solve this issue?
Update: I have spent more than six months on this issue. Reading and writing local files from Python works fine.
My question is: can Python read and write files on a shared path on a remote server?
I would appreciate an answer to this question if possible.
I recommend having pandas use the openpyxl engine for read operations.
df = pd.read_excel(file_path, engine='openpyxl')
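The path in the FileNotFoundError above contains `\x06`, which suggests the `\6` in the string literal was read as an octal escape rather than as a backslash followed by `6`. A minimal sketch of the difference, using a shortened hypothetical share path rather than the one from the question. Whether the SQL Server service account can actually reach the share is a separate matter that this does not address.

```python
# Hypothetical, shortened UNC path used only for illustration
plain = "\\\\server\\Import\6\\data.xlsx"  # "\6" collapses into the single character chr(6)
raw = r"\\server\Import\6\data.xlsx"       # raw string: every backslash is kept as typed

print(repr(plain))
print(repr(raw))
```

With a raw string (or with every backslash doubled) the folder name `6` survives intact in the path that is handed to pd.read_excel.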
import pandas as pd
data = pd.read_excel (r'C:\Users\royli\Downloads\Product List.xlsx',sheet_name='Sheet1' )
df = pd.DataFrame(data, columns= ['Product'])
print (df)
Error Message
Traceback (most recent call last):
File "main.py", line 3, in <module>
Traceback (most recent call last):
File "main.py", line 3, in <module>
data = pd.read_excel (r'C:\Users\royli\Downloads\Product List.xlsx',sheet_name='Sheet1' )
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/util/_decorators.py", line 296, in wrapper
return func(*args, **kwargs)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 304, in read_excel
io = ExcelFile(io, engine=engine)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 867, in __init__
self._reader = self._engines[engine](self._io)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/excel/_xlrd.py", line 22, in __init__
super().__init__(filepath_or_buffer)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 353, in __init__
self.book = self.load_workbook(filepath_or_buffer)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/excel/_xlrd.py", line 37, in load_workbook
return open_workbook(filepath_or_buffer)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/xlrd/__init__.py", line 111, in open_workbook
with open(filename, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\royli\\Downloads\\Product List.xlsx'
KeyboardInterrupt
When I get this problem, I usually change each \ to \\, and that generally solves it. Try it.
I had this problem in Visual Studio Code.
table = pd.read_excel('Sales.xlsx')
When running the program in PyCharm, there were no errors.
When running the same program, unchanged, in Visual Studio Code, it raised an error.
To fix it, I had to give the full path to the file, with doubled backslashes (\\). Ex:
table = pd.read_excel('C:\\Users\\paste\\Desktop\\archives\\Sales.xlsx')
I am using PyCharm, and after reviewing the post and replies I was able to get this resolved (thanks very much). I didn't need to specify a worksheet, as there is only one sheet in the Excel file I am reading.
I had to add the r prefix (raw string), and I also removed the drive specification (c:):
data = pd.read_excel(r'\folder\subfolder\filename.xlsx')
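The answers above rely on the same underlying point: in a normal Python string literal a backslash starts an escape sequence, so a Windows path needs a raw string, doubled backslashes, or forward slashes. A small sketch comparing these spellings, reusing the example path from one of the answers; `PureWindowsPath` is used here only so the comparison runs on any operating system.

```python
from pathlib import PureWindowsPath

raw = r'C:\Users\paste\Desktop\archives\Sales.xlsx'
doubled = 'C:\\Users\\paste\\Desktop\\archives\\Sales.xlsx'
forward = 'C:/Users/paste/Desktop/archives/Sales.xlsx'

# A raw string and doubled backslashes produce the identical string
same_string = (raw == doubled)
# Windows path semantics treat forward slashes and backslashes alike
same_path = (PureWindowsPath(raw) == PureWindowsPath(forward))
print(same_string, same_path)
```

None of these spellings helps if the file is not on the machine that runs the code, for example in a hosted environment. A bare relative name such as 'Sales.xlsx' is resolved against the current working directory, which may differ between PyCharm and Visual Studio Code.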
I created a script for mailing images to my parents. But while attaching files, I'm getting a traceback with a KeyError. The source code is below:
import smtplib
import os
import imghdr
from email.message import EmailMessage
username = os.environ.get('EMAIL_USER')
password = os.environ.get('EMAIL_PASS')
boy = EmailMessage()
boy['Subject'] = 'Check this pussy'
boy['From'] = username
boy['To'] = 'kryptonite@pm.me'
boy.set_content('image attached')
with open('pussy.jpg', 'rb') as f:
    file_data = f.read
    file_type = imghdr.what(f.name)
    file_name = f.name
    # print(file_type)
boy.add_attachment(file_data, maintype='image', subtype=file_type, filename=file_name)
with smtplib.SMTP_SSL('smtp.gmail.com', 465) as app:
    app.login(username, password)
    app.send_message(boy)
For this code I'm getting the following traceback
[Running] python -u "s:\DEV_ENV\python_playground\email_test.py"
Traceback (most recent call last):
File "s:\DEV_ENV\python_playground\email_test.py", line 32, in <module>
boy.add_attachment(file_data)
File "C:\Program Files\Python37\lib\email\message.py", line 1156, in add_attachment
self._add_multipart('mixed', *args, _disp='attachment', **kw)
File "C:\Program Files\Python37\lib\email\message.py", line 1144, in _add_multipart
part.set_content(*args, **kw)
File "C:\Program Files\Python37\lib\email\message.py", line 1171, in set_content
super().set_content(*args, **kw)
File "C:\Program Files\Python37\lib\email\message.py", line 1101, in set_content
content_manager.set_content(self, *args, **kw)
File "C:\Program Files\Python37\lib\email\contentmanager.py", line 35, in set_content
handler = self._find_set_handler(msg, obj)
File "C:\Program Files\Python37\lib\email\contentmanager.py", line 58, in _find_set_handler
raise KeyError(full_path_for_error)
KeyError: 'builtins.builtin_function_or_method'
[Done] exited with code=1 in 0.159 seconds
Is there anything wrong with the code?
It turns out this might be an error with the imghdr library. See link imghdr / python - Can't detec type of some images (image extension). The .what method only works 80% of the time, according to the poster.
Monkeypatch included in this answer in the same question: https://stackoverflow.com/a/57693121/5660315.
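Separately from imghdr, the KeyError in the traceback names `builtins.builtin_function_or_method`, which is what `add_attachment` would receive if `file_data = f.read` stores the method instead of calling it; `f.read()` returns the bytes. The sketch below is one possible way to build the message under that assumption. The function name `build_message`, the example addresses and the fake image bytes are hypothetical, and `mimetypes` is swapped in for `imghdr` (which was removed from the standard library in Python 3.13).

```python
import mimetypes
from email.message import EmailMessage

def build_message(sender, recipient, file_name, file_data):
    """Build an email with one attachment; `file_data` must be bytes."""
    msg = EmailMessage()
    msg['Subject'] = 'Image attached'
    msg['From'] = sender
    msg['To'] = recipient
    msg.set_content('image attached')
    # Guess the MIME type from the file name, with a generic fallback
    ctype, _ = mimetypes.guess_type(file_name)
    maintype, subtype = (ctype or 'application/octet-stream').split('/', 1)
    msg.add_attachment(file_data, maintype=maintype, subtype=subtype,
                       filename=file_name)
    return msg

# Placeholder bytes stand in for the result of f.read() (note the parentheses)
msg = build_message('sender@example.com', 'recipient@example.com',
                    'photo.jpg', b'\xff\xd8\xff\xe0 placeholder bytes')
attachments = list(msg.iter_attachments())
```

The message returned here can then be sent with `smtplib.SMTP_SSL` exactly as in the question.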
I have thousands of files covering the year range shown below. Files for some dates are missing, so I want to skip over them. But when I try the method below and then use data_in, the variable doesn't exist. Any help would be truly appreciated. I am new to Python. Thank you.
import numpy as np
import pandas as pd
path = r'file path here'
DataYears = ['2012','2013','2014', '2015','2016','2017','2018','2019', '2020']
Years = np.float64(DataYears)
NumOfYr = Years.size
DataMonths = ['01','02','03','04','05','06','07','08','09','10','11','12']
daysofmonth=[31,28,31,30,31,30,31,31,30,31,30,31]
for yy in range(NumOfYr):
    for mm in range (12):
        try:
            data_in = pd.read_csv(path+DataYears[yy]+DataMonths[mm]+'/*.dat', skiprows=4, header=None, engine='python')
            print('Reached data_in') # EDIT
            a=data_in[0] #EDIT
        except IOError:
            pass
            #print("File not accessible")
EDIT: Error added
Traceback (most recent call last):
File "Directory/Documents/test.py", line 23, in <module>
data_in = pd.read_csv(path+'.'+DataYears[yy]+DataMonths[mm]+'/*.cod', skiprows=4, header=None, engine='python')
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1126, in _make_engine
self._engine = klass(self.f, **self.options)
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 2269, in __init__
memory_map=self.memory_map,
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/common.py", line 431, in get_handle
f = open(path_or_buf, mode, errors="replace", newline="")
FileNotFoundError: [Errno 2] No such file or directory: 'Directory/Documents/201201/*.dat'
You can adapt the code below to get a list of your date folders:
import glob
# Gives you a list of your folders with the different dates
folder_names = glob.glob("Directory/Documents/*/")
print(folder_names)
Then, with the list of folders, you can iterate through their contents. If you just want a list of all .dat files, you can do something like:
import glob
# Gives you a list of all .dat files inside the date folders
file_names = glob.glob("Directory/Documents/*/*.dat")
print(file_names)
The code above searches the contents of your directories so you bypass your problem with missing dates. The prints are there so you can see the results of glob.glob().
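As the traceback in the question shows, pd.read_csv treats '*.dat' as a literal file name, so the wildcard has to be expanded before reading. A minimal sketch of that step, assuming the layout of one sub-folder per date; the helper name `find_dat_files` is hypothetical.

```python
import glob
import os

def find_dat_files(root):
    """Map each existing sub-folder name under `root` to its sorted .dat files."""
    found = {}
    for path in sorted(glob.glob(os.path.join(root, '*', '*.dat'))):
        # The folder name (e.g. '201201') is the parent directory of the file
        folder = os.path.basename(os.path.dirname(path))
        found.setdefault(folder, []).append(path)
    return found
```

Folders for missing dates never appear in the result, so no try/except is needed to skip them. Each path in the returned lists can then be passed on its own to pd.read_csv(path, skiprows=4, header=None).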
I am trying to read data from PDF files using the Apache Tika library. I installed it with pip install tika under Python 3.
Code:
from tika import parser
parsedPDF = parser.from_file("test.pdf",serverEndpoint='http://localhost:9998')
or
from tika import parser
parsedPDF = parser.from_file("test.pdf")
Error:
Traceback (most recent call last):
File "tikaparsing-test.py", line 2, in <module>
parsedPDF = parser.from_file("test.pdf",serverEndpoint='http://localhost:9998')
File "C:\ProgramData\Anaconda3\lib\site-packages\tika\parser.py", line 36, in from_file
jsonOutput = parse1('all', filename, serverEndpoint, headers=headers)
File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 316, in parse1
headers, verbose, tikaServerJar, rawResponse=rawResponse)
File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 510, in callServer
serverEndpoint = checkTikaServer(scheme, serverHost, port, tikaServerJar, classpath)
File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 565, in checkTikaServer
startServer(jarPath, serverHost, port, classpath)
File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 609, in startServer
cmd = Popen(cmd , stdout= logFile, stderr = STDOUT, shell =True)
File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 997, in _execute_child
startupinfo)
PermissionError: [WinError 5] Access is denied