wkb: could not create geometry because of errors while reading input - python-3.x

I am trying to translate EWKB coordinates into the associated longitude and latitude in Python. The EWKB strings are listed in a one-column CSV file (named "/home/nick/Documents/Sepi/WKB_coordinates_sing.csv").
I deleted the other columns for the sake of simplicity, but eventually I would like to use the original data set and read just the column that contains the EWKB.
Moreover, I would like to read and translate one line at a time, because I have files with millions of lines and coordinates to process.
I wrote the following code:
from shapely import wkb
with open ("/home/nick/Documents/Sepi/WKB_coordinates_sing.csv") as f:
    for line in f:
        hexloc=f.readline()
        print(hexloc)
        point=wkb.loads(hexloc,hex=True)
        print(point.x,point.y)
However, when I run it, I get the following:
~$ python /home/nick/Documents/Sepi/ewkb.py
0101000020E610000072604C0D47AA37402C306475ABA85140
ParseException: Premature end of HEX string
Traceback (most recent call last):
  File "/home/nick/Documents/Sepi/ewkb.py", line 7, in <module>
    point=wkb.loads(hexloc,hex=True)
  File "/home/nick/anaconda3/lib/python3.6/site-packages/shapely/wkb.py", line 14, in loads
    return reader.read_hex(data)
  File "/home/nick/anaconda3/lib/python3.6/site-packages/shapely/geos.py", line 409, in read_hex
    "Could not create geometry because of errors "
shapely.errors.WKBReadingError: Could not create geometry because of errors while reading input.
However, I can obtain longitude and latitude if I run the following code with the first hexadecimal string from my CSV file as the argument to wkb.loads:
Code:
from shapely import wkb
hexloc="0101000020E610000072604C0D47AA37402C306475ABA85140"
print(hexloc)
point=wkb.loads(hexloc,hex=True)
print(point.x,point.y)
Result:
~$ python /home/nick/Documents/Sepi/ewkb.py
0101000020E610000072604C0D47AA37402C306475ABA85140
23.665146666666665 70.63546500000001
Thank you in advance!

There seem to be several possible issues. First, your code snippet mixes iterating over the file object with direct readline() calls. With this example:
with open ("/home/nick/Documents/Sepi/WKB_coordinates_sing.csv") as f:
    for line in f:
        hexloc=f.readline()
        #do something with hexloc
hexloc will only ever receive every second line of the input file, because the for loop itself consumes the other lines. You might want to replace this with:
with open ("/home/nick/Documents/Sepi/WKB_coordinates_sing.csv") as f:
    for hexloc in f:
        #do something with hexloc
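A small sketch reproducing the effect, with io.StringIO standing in for the real file (the sample strings are made up): the loop variable and readline() share one read position, so each of them consumes alternate lines.

```python
import io

def every_other_line_bug(f):
    """Reproduce the question's loop: iterate over f but also call f.readline()."""
    seen = []
    for line in f:                 # consumes lines 1, 3, 5, ...
        seen.append(f.readline())  # consumes lines 2, 4, 6, ...
    return seen

def all_lines(f):
    """Iterate once; every line is seen, still with its trailing newline."""
    return [hexloc for hexloc in f]

sample = "AAAA\nBBBB\nCCCC\nDDDD\n"
print(every_other_line_bug(io.StringIO(sample)))  # ['BBBB\n', 'DDDD\n']
print(all_lines(io.StringIO(sample)))             # ['AAAA\n', 'BBBB\n', 'CCCC\n', 'DDDD\n']
```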
Moreover, lines read this way keep their trailing newline, which turns the 50-character hex string into an odd-length one; that is what makes loads fail with "Premature end of HEX string". I would suggest trying:
from shapely import wkb
with open ("/home/nick/Documents/Sepi/WKB_coordinates_sing.csv") as f:
    for line in f:
        hexloc = line.strip()  # drop the trailing newline
        point = wkb.loads(hexloc, hex=True)
        print(point.x, point.y)
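To show what the hex string actually contains, here is a stdlib-only sketch (not shapely's implementation) that decodes a hex-encoded 2D EWKB point with struct, and reads one row at a time from a named CSV column, which is what the question eventually needs. The column name "geom" and the helper names are hypothetical; for anything other than 2D points, keep using wkb.loads(hexloc, hex=True) in place of parse_ewkb_point.

```python
import csv
import io
import struct

WKB_POINT = 1
EWKB_SRID_FLAG = 0x20000000  # PostGIS extension: an SRID follows the type
EWKB_Z_FLAG = 0x80000000
EWKB_M_FLAG = 0x40000000

def parse_ewkb_point(hex_str):
    """Decode a hex-encoded 2D (E)WKB POINT into (srid, x, y); srid is None if absent."""
    data = bytes.fromhex(hex_str.strip())
    # Byte 0 is the byte order: 1 = little-endian (NDR), 0 = big-endian (XDR)
    order = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(order + "I", data, 1)
    offset = 5
    srid = None
    if geom_type & EWKB_SRID_FLAG:
        (srid,) = struct.unpack_from(order + "I", data, offset)
        offset += 4
    # This sketch only handles plain 2D points
    if geom_type & (EWKB_Z_FLAG | EWKB_M_FLAG) or (geom_type & 0xFFFF) != WKB_POINT:
        raise ValueError(f"not a 2D point (type 0x{geom_type:08X})")
    x, y = struct.unpack_from(order + "dd", data, offset)
    return srid, x, y

def iter_points(csv_file, column):
    """Yield (srid, x, y) one row at a time from the named EWKB column."""
    for row in csv.DictReader(csv_file):
        hexloc = row[column].strip()
        if hexloc:  # skip empty cells
            yield parse_ewkb_point(hexloc)

sample = io.StringIO("id,geom\n1,0101000020E610000072604C0D47AA37402C306475ABA85140\n")
for srid, lon, lat in iter_points(sample, "geom"):
    print(srid, lon, lat)
```

On the sample string from the question this should print SRID 4326 followed by the same coordinates shapely returned. For the real file, open it with open(path, newline="") as the csv module documentation recommends; DictReader still reads it one row at a time.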

Related

PyPDF2 - Byte Data vs Binary Data - TypeError

I am trying to write one page of a PDF to a new PDF document. I am using the following code:
from PyPDF2 import PdfFileReader, PdfFileWriter
file_path = "/file_path/.pdf"
input_pdf = PdfFileReader(file_path)
output_file = PdfFileWriter()
cover_page = input_pdf.getPage(0)
output_file.addPage(cover_page)
with open("portion.pdf", "wb") as output_file:
    output_file.write(output_file)
When I run this code I get the following error:
Traceback (most recent call last):
  File ".../Extract a portion of PDF.py", line 18, in <module>
    output_file.write(output_file)
TypeError: a bytes-like object is required, not '_io.BufferedWriter'
I have specified that the output needs to be written in binary mode, so why is it saying that I must use bytes-like objects?
Cheers,
In the with statement, you named the opened file output_file. This rebinds output_file from the PdfFileWriter() to the file stream you just opened. When you then call output_file.write(output_file), you are asking the file stream to write the file stream object itself; a binary file's write() only accepts bytes-like objects, hence the TypeError.
To fix this, simply rename the variable you used in the with statement:
with open("portion.pdf", "wb") as output_file_stream:
    output_file.write(output_file_stream)
Alternatively, you can also rename the PdfFileWriter() to output_pdf instead of output_file and change the with statement to something like:
with open("portion.pdf", "wb") as output_file:
    output_pdf.write(output_file)
which might make more sense.
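The shadowing can be reproduced without PyPDF2. In this sketch FakePdfWriter is a hypothetical stand-in that only mimics the write(stream) call used above; it is not PyPDF2's class and produces no real PDF.

```python
import os
import tempfile

class FakePdfWriter:
    """Hypothetical stand-in for PdfFileWriter: write(stream) writes bytes into a binary stream."""

    def __init__(self):
        self.pages = []

    def addPage(self, page):
        self.pages.append(page)

    def write(self, stream):
        for page in self.pages:
            stream.write(page.encode("utf-8"))

def buggy(path):
    output_file = FakePdfWriter()
    output_file.addPage("cover page")
    with open(path, "wb") as output_file:  # rebinds output_file to the file object
        output_file.write(output_file)     # file.write() receives a file object, not bytes

def fixed(path):
    output_pdf = FakePdfWriter()
    output_pdf.addPage("cover page")
    with open(path, "wb") as output_file:
        output_pdf.write(output_file)      # the writer serializes itself into the stream

error_message = None
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "portion.pdf")
    try:
        buggy(path)
    except TypeError as exc:
        error_message = str(exc)
    fixed(path)
    with open(path, "rb") as fh:
        written = fh.read()

print(error_message)
print(written)
```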

Running a function on multiple files simultaneously with python

I have a specific function that manipulates text files, given a directory and a file name.
The function is defined as follows:
def nav2xy(target_directory, target_file):
    after_rows = f'MOD {target_file}_alines.txt'
    after_columns = f'MOD {target_file}_acolumns.txt'
    # this segment is used to remove top lines(8 in this case) for work with only the actual data
    infile = open(f'{target_directory}/{target_file}', 'r').readlines()
    with open(after_rows, 'w') as outfile:
        for index, line in enumerate(infile):
            if index >= 8:
                outfile.write(line)
    # this segment removes the necessary columns, in this case leaving only coordinates for gmt use
    with open(after_rows) as In, open(after_columns, "w") as Out:
        for line in In:
            values = line.split()
            Out.write(f"{values[4]} {values[5]}\n")
I am looking for a way to run this code once on all files in the chosen directory (they could be targeted by name, or it could just process all of them).
Should I change the function to use only the file name?
I tried running the function this way, to no avail:
for i in os.listdir('Geoseas_related_files'):
    nav2xy('target_directory', i)
This way works perfectly, although somehow I still get this error with it:
(base) ms-iMac:python gan$ python3 coordinates_fromtxt.py
Traceback (most recent call last):
  File "coordinates_fromtxt.py", line 7, in <module>
    nav2xy('Geoseas_related_files', str(i))
  File "/Users/gadraifman/research/python/GAD_MSC/Nav.py", line 19, in nav2xy
    Out.write(f"{values[4]} {values[5]}\n")
IndexError: list index out of range
Any help or advice would be greatly appreciated.
From what I gather from Iterating through directories with Python, the best way to loop over the files in a directory is to use glob.
I also made some more extensive modifications to your code to simplify it and to remove the intermediate step of saving lines to a file just to read them again. If this step is mandatory, then feel free to add it back.
import os, glob

def nav2xy(target_file):
    # New file name, just appending stuff.
    # "target_file" will contain the path as defined by root_dir + current filename
    after_columns = f'{target_file}_acolumns.txt'
    with open(target_file, 'r') as infile, open(after_columns, "w") as outfile:
        content = infile.readlines()
        # Skip the first 8 lines here
        for line in content[8:]:
            # No need to write the lines to a file, just to read them again.
            # Process directly
            values = line.split()
            outfile.write(f"{values[4]} {values[5]}\n")

# I guess this is the dir you want to loop through.
# Maybe an absolute path c:\path\to\files is better.
# Give only the directory here: the "*" pattern is appended below.
root_dir = 'Geoseas_related_files'

for file_or_dir in glob.iglob(os.path.join(root_dir, "*")):
    # Skip directories, if there are any.
    if os.path.isfile(file_or_dir):
        nav2xy(file_or_dir)
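The answer above does not address the IndexError itself. A likely cause is a line with fewer than six whitespace-separated fields, for example a blank line at the end of a file: "".split() is [], so values[4] fails. A minimal sketch, assuming such lines can simply be reported and skipped (extract_xy is a hypothetical helper, not from the original code):

```python
def extract_xy(lines, skip=8, x_col=4, y_col=5):
    """Yield 'x y' strings from data lines, skipping the header and too-short lines."""
    needed = max(x_col, y_col) + 1
    for line_no, line in enumerate(lines, start=1):
        if line_no <= skip:
            continue  # header lines
        values = line.split()
        if len(values) < needed:
            # Blank or truncated line: report it instead of raising IndexError
            print(f"skipping line {line_no}: {line!r}")
            continue
        yield f"{values[x_col]} {values[y_col]}"

def nav2xy(target_file):
    """Same role as nav2xy above, built on extract_xy."""
    with open(target_file) as infile, open(f"{target_file}_acolumns.txt", "w") as outfile:
        for xy in extract_xy(infile):
            outfile.write(xy + "\n")
```

Note that the output files are written next to the inputs, so running the directory loop a second time would also pick up the *_acolumns.txt files; writing them to a separate directory avoids that.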

Python IndexError: list index out of range large file

I have a very large file (~40 GB, 674,877,098 lines) that I want to read and extract specific columns from. I can process about 3 GB of data, and then I get the following error:
Traceback (most recent call last):
  File "C:\Users\Codes\Read_cat_write.py", line 44, in <module>
    tid = int(columns[2])
IndexError: list index out of range
Sample of the data being read in:
1,100000000,100000000,39,2.704006988169216e15,310057,0
2,100000001,100000000,38,2.650346740514816e15,303904,0.01
3,100000002,100000000,37,2.136985003098112e15,245039,0.03
4,100000003,100000000,36,2.29479163101184e15,263134,0.05
5,100000004,100000000,35,1.834645477916672e15,210371,0.06
6,100000005,100000000,34,1.814063860416512e15,208011,0.08
7,100000006,100000000,33,1.808883592986624e15,207417,0.1
8,100000007,100000000,32,1.806241248575488e15,207114,0.12
9,100000008,100000000,31,1.651783621410816e15,189403,0.14
10,100000009,100000000,30,1.634821184946176e15,187458,0.16
Code
from itertools import islice
F = r'C:\Users\Outfiles\comp_cat_raw.txt'
w = open(r'C:\Users\Outfiles\comp_cat_3col.txt','a')

def filesave(TID,M,R):
    X = str(TID)
    Y = str(M)
    Z = str(R)
    w.write(X)
    w.write('\t')
    w.write(Y)
    w.write('\t')
    w.write(Z)
    w.write('\n')

N = 680000000
f = open(F) #Opens file
f.readline() # Strips Header
nlines = islice(f, N) #slices file to only read N lines
for line in nlines:
    if line !='':
        line = line.strip()
        line = line.replace(',',' ') # Replace comma with space
        columns = line.split() # Splits into column
        tid = int(columns[2])
        m = float(columns[4])
        r = float(columns[6])
        filesave(tid,m,r)
w.close()
I have looked at the file being read in at the point where the error occurs, but I don't see anything wrong with it, so I am at a loss as to the cause of this error.
Chances are there is some line with fewer than three comma-separated fields (the error is raised at columns[2]): maybe a line with a single comma, or none, or an empty line. Note that line != '' is checked before strip(), so a line consisting only of "\n" passes the check and becomes empty afterwards. Put a try-except around the parsing, catch the IndexError and print the offending line, and you should be able to find it. Besides that, a few things in your code are worth improving.
In particular, have a look at the csv module. Its parser is implemented in C for exactly this kind of task, so it should be faster. This answer shows mainly how to write the iteration with csv.
This whole slice construction seems to be superfluous. A simple for line in f: will do and is the most efficient way to handle this iteration.
Use line.split(',') directly, instead of first replacing the commas with spaces; the replace-and-split approach also collapses empty fields such as ",," and shifts the remaining columns.
Use with open(F) as f: instead of calling close() yourself. For this script it might make no difference, but this way you make sure that file handles are closed even in case of errors.
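Putting these suggestions together, a sketch of the loop using the csv module and with statements, with a try/except that reports malformed rows instead of stopping. The column positions come from the question; extract_columns and its return value are hypothetical, and skipping bad rows rather than aborting is a design choice you may not want.

```python
import csv
import io
import sys

def extract_columns(infile, outfile, skip_header=True):
    """Copy columns 2, 4 and 6 of a comma-separated file into a tab-separated one.

    Malformed rows are reported on stderr and skipped.
    Returns (rows_written, rows_skipped).
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    if skip_header:
        next(reader, None)  # drop the header row
    written = skipped = 0
    for row in reader:  # csv.reader yields [] for blank lines
        try:
            tid = int(row[2])
            m = float(row[4])
            r = float(row[6])
        except (IndexError, ValueError):
            # reader.line_num is the number of source lines read so far
            print(f"skipping line {reader.line_num}: {row!r}", file=sys.stderr)
            skipped += 1
            continue
        writer.writerow([tid, m, r])
        written += 1
    return written, skipped

# Real files (paths from the question), opened with newline="" as csv recommends:
# with open(r'C:\Users\Outfiles\comp_cat_raw.txt', newline='') as src, \
#         open(r'C:\Users\Outfiles\comp_cat_3col.txt', 'w', newline='') as dst:
#     extract_columns(src, dst)

demo_in = io.StringIO("header\n1,100000000,100000000,39,2.704006988169216e15,310057,0\n\n")
demo_out = io.StringIO()
print(extract_columns(demo_in, demo_out))
```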

User input after file input in Python?

First-year Comp Sci student here.
I have an assignment that asks us to make a simple game in Python, which takes an input file to create the game world (a 2D grid). You are then supposed to give movement commands via user input. My program reads the input file one line at a time to create the world using:
def getFile():
    try:
        line = input()
    except EOFError:
        line = EOF
    return line
...after which it creates a list to represent the line, with each member being a character in the line, and then creates a list containing each of these lists (amounting to a grid with row and column coordinates).
The thing is, I later need to take input in order to move the character, and I can't do this because it still wants to read the file input, and the last line from the file is an EOF character, causing an error. Specifically the "EOF when reading a line" error.
How can I get around this?
Sounds like you are reading the file directly from stdin -- something like:
python3 my_game.py < game_world.txt
Instead, you need to pass the file name as an argument to your program, so that stdin stays connected to the console:
python3 my_game.py game_world.txt
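and then getFile looks more like this (call it as, for example, for line in getFile(sys.argv[1]):):
def getFile(file_name):
    with open(file_name) as fh:
        for line in fh:
            # yield one line at a time; a plain return here would stop after the first line
            yield line.rstrip('\n')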
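A fuller sketch of this approach, assuming the world file name is the first command-line argument. The grid-building step follows the question's description (one list of characters per line); parse_world, read_world and the "quit" command are hypothetical names, and the movement logic itself is left out.

```python
import sys

def parse_world(lines):
    """Turn the lines of a world file into a grid: a list of rows of characters."""
    return [list(line.rstrip("\n")) for line in lines]

def read_world(file_name):
    """Read the world from a named file, leaving stdin free for commands."""
    with open(file_name) as fh:
        return parse_world(fh)

def main(argv):
    if len(argv) < 2:
        print("usage: python3 my_game.py game_world.txt")
        return
    grid = read_world(argv[1])
    print(f"loaded a world with {len(grid)} rows")
    while True:
        try:
            command = input("move> ")  # stdin is still the console here
        except EOFError:
            break  # end of input (e.g. Ctrl-D) ends the game
        if command == "quit":
            break
        # hypothetical: apply the movement command to the grid here
        print(f"you typed {command!r}")

if __name__ == "__main__":
    main(sys.argv)
```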
File interaction in Python 3 goes like this:
# open() is a built-in function; it opens a file in read-only text mode by default
f = open("path/to/file.txt")
# Option 1: read all the lines in the file and return them in a list
lines = f.readlines()
# Option 2: iterate over the lines instead. Don't combine it with option 1:
# after readlines() the file is exhausted and this loop sees nothing.
for line in f:
    # now get each character from each line
    for char_in_line in line:
        pass  # do something
# close the file
f.close()
By default (newline=None), open() reads in universal newlines mode: '\n', '\r' and '\r\n' in the file are all recognized as line endings and translated to '\n'.
If you want different behaviour, pass the newline parameter to open():
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
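To see what the newline parameter changes, this sketch decodes the same bytes (with mixed \r\n, \r and \n endings) through io.TextIOWrapper, the text layer that open() returns in text mode, under three settings. The sample bytes are made up for illustration.

```python
import io

raw = b"one\r\ntwo\rthree\n"  # Windows, old Mac and Unix line endings mixed

def lines_with(newline):
    """Decode `raw` as a text file opened with the given newline= setting."""
    return io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=newline).readlines()

print(lines_with(None))  # universal newlines, translated: ['one\n', 'two\n', 'three\n']
print(lines_with(""))    # universal newlines, untranslated: ['one\r\n', 'two\r', 'three\n']
print(lines_with("\n"))  # only \n ends a line: ['one\r\n', 'two\rthree\n']
```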

How does one add string to tarfile in Python3

I have a problem adding a str to a tar archive in Python. In Python 2 I used this method:
import io
import tarfile
fname = "archive_name"
params_src = "some arbitrary string to be added to the archive"
params_sio = io.StringIO(params_src)
archive = tarfile.open(fname+".tgz", "w:gz")
tarinfo = tarfile.TarInfo(name="params")
tarinfo.size = len(params_src)
archive.addfile(tarinfo, params_sio)
archive.close()
It's essentially the same as what can be found here.
It worked well. However, after moving to Python 3 it broke with the following error:
  File "./translate_report.py", line 67, in <module>
    main()
  File "./translate_report.py", line 48, in main
    archive.addfile(tarinfo, params_sio)
  File "/usr/lib/python3.2/tarfile.py", line 2111, in addfile
    copyfileobj(fileobj, self.fileobj, tarinfo.size)
  File "/usr/lib/python3.2/tarfile.py", line 276, in copyfileobj
    dst.write(buf)
  File "/usr/lib/python3.2/gzip.py", line 317, in write
    self.crc = zlib.crc32(data, self.crc) & 0xffffffff
TypeError: 'str' does not support the buffer interface
To be honest, I have trouble understanding where it comes from, since I do not feed any str to the tarfile module other than through the StringIO object I construct.
I know the meanings of StringIO, str, bytes and such changed a bit from Python 2 to 3, but I do not see a mistake and cannot come up with better logic to solve this task.
I create the StringIO object precisely to provide buffer methods around the string I want to add to the archive. Yet it seems that some str does not provide them. On top of that, the exception is raised around lines that seem to be responsible for checksum calculations.
Can someone please explain what I am misunderstanding, or at least give an example of how to add a simple str to the tar archive without creating an intermediate file on the file system?
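When writing to a file, you need to encode your Unicode data to bytes explicitly; StringIO objects do not do this for you, because StringIO is an in-memory text file. Use io.BytesIO() instead and encode:
params_sio = io.BytesIO(params_src.encode('utf8'))
Adjust your encoding to your data, of course. Also set tarinfo.size to the length of the encoded bytes rather than len(params_src): the two differ as soon as the string contains non-ASCII characters.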
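Putting the fix together, a sketch that adds a string to a gzip-compressed archive entirely in memory and reads it back to check the round trip. add_string_to_tar is a hypothetical helper name; for a real archive on disk, open it with tarfile.open(fname + ".tgz", "w:gz") as in the question.

```python
import io
import tarfile

def add_string_to_tar(archive, name, text, encoding="utf-8"):
    """Add `text` to an open TarFile as a member called `name`, without a temp file."""
    data = text.encode(encoding)      # tarfile streams bytes, so encode first
    info = tarfile.TarInfo(name=name)
    info.size = len(data)             # size in bytes, not characters
    archive.addfile(info, io.BytesIO(data))

# Write an in-memory .tgz
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    add_string_to_tar(archive, "params", "some arbitrary string, with ünïcödé")

# Read it back from the same buffer
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    restored = archive.extractfile("params").read().decode("utf-8")
print(restored)
```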
