Trying to get the day from a str 10/08/2020 - python-3.x

I need to get the day value from the string 10/08/2020.
I'm getting dates from a list containing a year's worth of dates. I'm using the index number to do some date manipulation. First I need to get the day, 08, from the date string.
Code segment:
print("Today is = ",re.sub('[^!-~]+',' ',calendarData[i]).strip())
print("indexTarget is = ",indexTarget)
dateTarget = re.sub('[^!-~]+',' ',calendarData[indexTarget]).strip()
print("Target date is = ",dateTarget)
dayTarget = datetime.strptime(dateTarget,"%d")
print("Day Target = ",dayTarget)
Console output:
Today is = 10/01/2020
indexTarget is = 281
Target date is = 10/08/2020
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\tkinter\__init__.py", line 1702, in __call__
return self.func(*args)
File "C:\Users\micha\source\repos\makeAReservation\makeAReservation\makeAReservation.py", line 183,
in actual_time
alarm(set_alarm_timer)
File "C:\Users\micha\source\repos\makeAReservation\makeAReservation\makeAReservation.py", line 173, in alarm
makeAReservation()
File "C:\Users\micha\source\repos\makeAReservation\makeAReservation\makeAReservation.py", line 62, in makeAReservation
getIndex4TagetDate()
File "C:\Users\micha\source\repos\makeAReservation\makeAReservation\makeAReservation.py", line 46, in getIndex4TagetDate
dayTarget = datetime.strptime(dateTarget,"%d")
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\_strptime.py", line 565, in _strptime_datetime
tt, fraction = _strptime(data_string, format)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\_strptime.py", line 365, in _strptime
data_string[found.end():])
ValueError: unconverted data remains: /08/2020

Your dateTarget is already a string, whereas datetime.strptime is for creating a datetime object from a string. The call fails because the format "%d" matches only the first two characters and leaves /08/2020 unparsed. Since you already have the string, you can just slice out the day and print it. The date is in MM/DD/YYYY order, so the day sits at positions 3 and 4.
print("Today is = ",re.sub('[^!-~]+',' ',calendarData[i]).strip())
print("indexTarget is = ",indexTarget)
dateTarget = re.sub('[^!-~]+',' ',calendarData[indexTarget]).strip()
print("Target date is = ",dateTarget)
dayTarget = dateTarget[3:5]
print("Day Target = ",dayTarget)
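If you would rather parse the date than slice it, a minimal sketch follows. It assumes the strings are always in month/day/year order, as the console output in the question suggests; the variable names and the hard-coded sample value are only for illustration.

```python
from datetime import datetime

# Sample value copied from the question's console output.
date_target = "10/08/2020"

# Parse the whole string; "%m/%d/%Y" assumes month/day/year order.
parsed = datetime.strptime(date_target, "%m/%d/%Y")

day_number = parsed.day           # day of the month as an integer
day_text = parsed.strftime("%d")  # zero-padded, two-character string

print("Day Target =", day_text)
```

Parsing also validates the input, so a malformed entry in the list raises a ValueError instead of silently yielding the wrong two characters.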

Related

Python 3.10.6 and Camelot crashes trying to read a table in a PDF file

Using Python 3.10.6, I'm trying to read the tables in this PDF file, specifically on pages 24 and 26. I can read the tables on pages 21-23 and 25.
The only "commonality" I can find between those tables is: the unreadable tables' 2nd row is a single column, but the readable tables' 2nd row has multiple columns.
Here's the command I'm using:
tables = camelot.read_pdf( cFile
, pages=str(24)
, password=None
, flavor='lattice'
, flag_size=True
, strip_text='\n'
# , backend='poppler'
# , process_background=True
# , table_areas=['36,731,576,396']
# , table_regions=['36,731,576,396']
# , edge_tol=200
# , row_tol=10
# , line_scale=40
# , shift_text=['r', 'b']
# , copy_text=['h']
)
I've tried all those various arguments with no success.
Here's the error I'm getting:
Traceback (most recent call last):
File "G:\My Drive\Bugs\15888 - Convert PDF table data to Oracle data\Test Table 9.py", line 58, in <module>
tables = camelot.read_pdf( cFile
File "C:\Users\56663\AppData\Roaming\Python\Python310\site-packages\camelot\io.py", line 113, in read_pdf
tables = p.parse(
File "C:\Users\56663\AppData\Roaming\Python\Python310\site-packages\camelot\handlers.py", line 172, in parse
self._save_page(self.filepath, p, tempdir)
File "C:\Users\56663\AppData\Roaming\Python\Python310\site-packages\camelot\handlers.py", line 120, in _save_page
outfile.write(f)
File "C:\Users\56663\AppData\Roaming\Python\Python310\site-packages\PyPDF2\_writer.py", line 839, in write
self.write_stream(stream)
File "C:\Users\56663\AppData\Roaming\Python\Python310\site-packages\PyPDF2\_writer.py", line 812, in write_stream
self._sweep_indirect_references(self._root)
File "C:\Users\56663\AppData\Roaming\Python\Python310\site-packages\PyPDF2\_writer.py", line 961, in _sweep_indirect_references
data = self._resolve_indirect_object(data)
File "C:\Users\56663\AppData\Roaming\Python\Python310\site-packages\PyPDF2\_writer.py", line 1006, in _resolve_indirect_object
real_obj = data.pdf.get_object(data)
File "C:\Users\56663\AppData\Roaming\Python\Python310\site-packages\PyPDF2\_reader.py", line 1160, in get_object
retval = self._encryption.decrypt_object(
File "C:\Users\56663\AppData\Roaming\Python\Python310\site-packages\PyPDF2\_encryption.py", line 744, in decrypt_object
return cf.decrypt_object(obj)
File "C:\Users\56663\AppData\Roaming\Python\Python310\site-packages\PyPDF2\_encryption.py", line 185, in decrypt_object
obj[dictkey] = self.decrypt_object(value)
File "C:\Users\56663\AppData\Roaming\Python\Python310\site-packages\PyPDF2\_encryption.py", line 179, in decrypt_object
data = self.strCrypt.decrypt(obj.original_bytes)
File "C:\Users\56663\AppData\Roaming\Python\Python310\site-packages\PyPDF2\_encryption.py", line 87, in decrypt
d = aes.decrypt(data)
File "C:\Users\56663\AppData\Roaming\Python\Python310\site-packages\Crypto\Cipher\_mode_cbc.py", line 246, in decrypt
raise ValueError("Data must be padded to %d byte boundary in CBC mode" % self.block_size)
ValueError: Data must be padded to 16 byte boundary in CBC mode
I'm 99% positive the PDF is not encrypted.
Thanks for any suggestions.
edit: The PDF was created using Word in Microsoft Office Professional Plus 2019.

How do I find the specific document insert_many() fails on?

if (constant.gc in file.sheet_names):
    coll = db[constant.gc]
    print("Adding to " + constant.gc + " database")
    df = file.parse(constant.gc)
    df = clean(df)
    data_dict = df.to_dict('r')
    try:
        result = coll.insert_many(data_dict)
        nr_inserts = len(result.inserted_ids)
        print(str(nr_inserts) + "Cases added to database")
    except pymongo.errors.BulkWriteError as bwe:
        nr_inserts = bwe.details["nInserted"]
        print(nr_inserts)
I keep getting a NaTType error and I can't find which row of the dataframe has the blank date. Unfortunately, it's off of a 39k row Excel file. So, just looking through isn't going to help. I tried an except that theoretically could tell me how many were successfully inserted until the error, and therefore give me a hint on where to look, but it hasn't printed.
The error looks like this:
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Python38\lib\tkinter\__init__.py", line 1883, in __call__
return self.func(*args)
File "dataimport.py", line 71, in importFromExcel
result = coll.insert_many(data_dict)
File "C:\Python38\lib\site-packages\pymongo\collection.py", line 758, in insert_many
blk.execute(write_concern, session=session)
File "C:\Python38\lib\site-packages\pymongo\bulk.py", line 511, in execute
return self.execute_command(generator, write_concern, session)
File "C:\Python38\lib\site-packages\pymongo\bulk.py", line 345, in execute_command
client._retry_with_session(
File "C:\Python38\lib\site-packages\pymongo\mongo_client.py", line 1384, in _retry_with_session
return func(session, sock_info, retryable)
File "C:\Python38\lib\site-packages\pymongo\bulk.py", line 339, in retryable_bulk
self._execute_command(
File "C:\Python38\lib\site-packages\pymongo\bulk.py", line 295, in _execute_command
result, to_send = bwc.execute(ops, client)
File "C:\Python38\lib\site-packages\pymongo\message.py", line 898, in execute
request_id, msg, to_send = self._batch_command(docs)
File "C:\Python38\lib\site-packages\pymongo\message.py", line 890, in _batch_command
request_id, msg, to_send = _do_bulk_write_command(
File "C:\Python38\lib\site-packages\pymongo\message.py", line 1382, in _do_bulk_write_command
return _do_batched_op_msg(
File "C:\Python38\lib\site-packages\pymongo\message.py", line 1307, in _do_batched_op_msg
return _batched_op_msg(
File "pandas\_libs\tslibs\nattype.pyx", line 64, in pandas._libs.tslibs.nattype._make_error_func.f
ValueError: NaTType does not support utcoffset
At a guess, ValueError and BulkWriteError are not the same, so nInserted never prints. Does anyone have an idea of how to get the number of successful inserts before the failure?
I doubt that any inserts are performed as the error is likely occurring before the data is passed to mongodb to insert.
In any case, if you want to hunt down which row in the dataframe has the NaT value, try the following (replace 'date' with the name of the column containing the date):
null_df = df[pd.isnull(df['date'])]
print(null_df)
To remove null dated items use:
df = df[pd.notnull(df['date'])]
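If you are not sure which column holds the blank date, the same check can be run over every datetime column at once. This is a sketch only: the small DataFrame and its column names are made up to stand in for the cleaned 39k-row frame from the question.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned DataFrame from the question.
df = pd.DataFrame({
    "case": ["a", "b", "c"],
    "date": pd.to_datetime(["2020-01-05", None, "2020-03-01"]),
    "closed": pd.to_datetime(["2020-02-01", "2020-02-02", None]),
})

# Select every datetime column, then flag rows where any of them is NaT.
date_cols = df.select_dtypes(include=["datetime"]).columns
bad_rows = df[df[date_cols].isna().any(axis=1)]
print(bad_rows)

# Row labels of the offending rows, useful for locating them in the Excel file.
bad_index = bad_rows.index.tolist()
```

The row labels in bad_index refer to the DataFrame index, so the matching Excel row number depends on how the sheet was parsed (header rows, skipped rows).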

ignore missing files in loop - data did not show up

I have thousands of files covering the year range shown below. Files for some of the dates are missing, so I want to skip over them. But when I try the method below and then use data_in, the variable doesn't exist. Any help would be truly appreciated. I am new to Python. Thank you.
path = r'file path here'
DataYears = ['2012','2013','2014', '2015','2016','2017','2018','2019', '2020']
Years = np.float64(DataYears)
NumOfYr = Years.size
DataMonths = ['01','02','03','04','05','06','07','08','09','10','11','12']
daysofmonth=[31,28,31,30,31,30,31,31,30,31,30,31]
for yy in range(NumOfYr):
    for mm in range (12):
        try:
            data_in = pd.read_csv(path+DataYears[yy]+DataMonths[mm]+'/*.dat', skiprows=4, header=None, engine='python')
            print('Reached data_in') # EDIT
            a=data_in[0] #EDIT
        except IOError:
            pass
            #print("File not accessible")
EDIT: Error added
Traceback (most recent call last):
File "Directory/Documents/test.py", line 23, in <module>
data_in = pd.read_csv(path+'.'+DataYears[yy]+DataMonths[mm]+'/*.cod', skiprows=4, header=None, engine='python')
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1126, in _make_engine
self._engine = klass(self.f, **self.options)
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 2269, in __init__
memory_map=self.memory_map,
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/common.py", line 431, in get_handle
f = open(path_or_buf, mode, errors="replace", newline="")
FileNotFoundError: [Errno 2] No such file or directory: 'Directory/Documents/201201/*.dat'
You can adapt the code below to get a list of your date folders:
import glob
# Gives you a list of your folders with the different dates
folder_names = glob.glob("Directory/Documents/*/")
print(folder_names)
Then, with the list of folders, you can iterate through their contents. If you just want a list of all .dat files, you can do something like:
import glob
# Gives you a list of every .dat file inside the date folders
file_names = glob.glob("Directory/Documents/*/*.dat")
print(file_names)
The code above only searches the directories that actually exist, so you bypass your problem with missing dates. The prints are there so you can see the results of glob.glob().
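As the traceback shows, read_csv was handed the literal path 'Directory/Documents/201201/*.dat', so the wildcard was never expanded and data_in was never assigned. A rough sketch of keeping the year/month loop while letting glob do the expansion is below. The helper name find_month_files and the base directory are assumptions for illustration; reading each path with pd.read_csv would then happen inside your own loop over the result.

```python
import glob
import os


def find_month_files(base_dir, years, months, pattern="*.dat"):
    """Return {"YYYYMM": [paths]} for the month folders that contain matches."""
    found = {}
    for year in years:
        for month in months:
            folder = os.path.join(base_dir, year + month)
            matches = sorted(glob.glob(os.path.join(folder, pattern)))
            if not matches:
                # Missing folder or no matching files: skip this month.
                continue
            found[year + month] = matches
    return found


years = [str(y) for y in range(2012, 2021)]
months = [f"{m:02d}" for m in range(1, 13)]

# "Directory/Documents" is a placeholder taken from the traceback.
files_by_month = find_month_files("Directory/Documents", years, months)
print(len(files_by_month), "month folders contain .dat files")
```

Because glob.glob returns an empty list for a folder that does not exist, no try/except is needed to skip the missing dates.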

I have a problem with writing to my excel file

I am getting an error message when I try to write to my Excel file.
On this line: df_Percent_Change.to_excel(writer, sheet_name=x, startcol=8)
Traceback (most recent call last):
File "/Users/david.soderstrom/Dropbox (Diagona)/DS/Python/Byggmarknad_index/Byggmarknad_index.py", line 125, in
df_Percent_Change.to_excel(writer, sheet_name=x, startcol=8)
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 1766, in to_excel
engine=engine)
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/pandas/io/formats/excel.py", line 652, in write
freeze_panes=freeze_panes)
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 1742, in write_cells
wks = self.book.add_worksheet(sheet_name)
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/xlsxwriter/workbook.py", line 179, in add_worksheet
return self._add_sheet(name, worksheet_class=worksheet_class)
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/xlsxwriter/workbook.py", line 666, in _add_sheet
name = self._check_sheetname(name, isinstance(worksheet, Chartsheet))
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/xlsxwriter/workbook.py", line 717, in _check_sheetname
if len(sheetname) > 31:
TypeError: object of type 'int' has no len()
Exception ignored in: >
Traceback (most recent call last):
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/xlsxwriter/workbook.py", line 154, in __del__
Exception: Exception caught in workbook destructor. Explicit close() may be required for workbook.
# Calculate and save the percent change for each asset
if 'Percent_Change' not in excel_db.columns:
print('Percent_Change does not exist in excel file.')
print('Calculating...')
# Loop through and read each sheet
x = 0
for x in range(countSheets):
# Read in data for the calculation
data = pd.read_excel('databas.xlsx', sheet_name=x, index_col='Date')
# Calculate the percent change from day to day
Percent_Change = data['Adj Close'].pct_change()*100
print(type(Percent_Change))
df_Percent_Change = pd.DataFrame(Percent_Change)
print(type(df_Percent_Change))
writer = pd.ExcelWriter('databas.xlsx', engine='xlsxwriter')
df_Percent_Change.to_excel(writer, sheet_name=x, startcol=8)
# Save the result
writer.save()
writer.close()
x += 1

Uppercases convert to lowercase when loading a file with h5py

Hello, I can't load an HDF5 file with h5py:
$ python verif.py
Traceback (most recent call last):
File "verif.py", line 4, in <module>
h5f = h5py.File("../DeepFISH-Github_projects/DeepFISH/dataset/'+'LowRes_13434_overlapping_pairs.h5",'r')
File "/home/jeanpat/VirtualEnv/venv3/lib/python3.5/site-packages/h5py/_hl/files.py", line 272, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/home/jeanpat/VirtualEnv/venv3/lib/python3.5/site-packages/h5py/_hl/files.py", line 92, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-at6d2npe-build/h5py/_objects.c:2684)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-at6d2npe-build/h5py/_objects.c:2642)
File "h5py/h5f.pyx", line 76, in h5py.h5f.open (/tmp/pip-at6d2npe-build/h5py/h5f.c:1930)
OSError: Unable to open file (Unable to open file: name = '../deepfish-github_projects/deepfish/dataset/'+'lowres_13434_overlapping_pairs.h5', errno = 2, error message = 'no such file or directory', flags = 0, o_flags = 0
The string containing the path to the file:
../DeepFISH-Github_projects/DeepFISH/dataset'+'LowRes_13434_overlapping_pairs.h5
seems to be modified by h5py
../deepfish-github_projects/deepfish/dataset/lowres_13434_overlapping_pairs.h5
I could modify the directory name, but it's weird.
In this line
h5f = h5py.File("../DeepFISH-Github_projects/DeepFISH/dataset/'+'LowRes_13434_overlapping_pairs.h5",'r')
you're trying to open a file with a literal '+' in its name. The outer quotes are double quotes, so the single quotes within the string are just part of the name. What you probably wanted to use is:
h5f = h5py.File("../DeepFISH-Github_projects/DeepFISH/dataset/" + "LowRes_13434_overlapping_pairs.h5",'r')
I don't know why the error message is all lower case. Maybe the library tries to find the file in a case-insensitive way if it doesn't find it by the original name, or the underlying file system is case insensitive and this is just how the OS reports the missing-file error.
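To see the difference between the two spellings, you can compare the strings directly. This is just an illustration with shortened, made-up paths; nothing here touches h5py or the file system.

```python
# One double-quoted literal: the single quotes and the plus sign are
# ordinary characters inside the string.
wrong = "dataset/'+'LowRes.h5"

# Two literals joined by the + operator: the result has no quotes or plus.
right = "dataset/" + "LowRes.h5"

print(repr(wrong))
print(repr(right))
print("+" in wrong, "+" in right)
```

Printing the path with repr() just before calling h5py.File is a quick way to catch this kind of quoting slip.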
