Save projected raster as a variable and not as a file (Python 3)

I would like to reproject a raster and keep working on that reprojected raster instead of loading it again from a file.
To reproject a raster I use either GDAL:
import os
from osgeo import gdal, gdalconst

# Source
src = gdal.Open(vv_path, gdalconst.GA_ReadOnly)
src_proj = src.GetProjection()
src_geotrans = src.GetGeoTransform()

# We want a section of source that matches this:
match_ds = gdal.Open(sn2_red_path, gdalconst.GA_ReadOnly)
match_proj = match_ds.GetProjection()
match_geotrans = match_ds.GetGeoTransform()
wide = match_ds.RasterXSize
high = match_ds.RasterYSize

# Output / destination
dst_filename = os.path.join(sn1_processed_path, 'vv.tif')
dst = gdal.GetDriverByName('GTiff').Create(dst_filename, wide, high, 1, gdalconst.GDT_Float32)
dst.SetGeoTransform(match_geotrans)
dst.SetProjection(match_proj)

# Do the work
gdal.ReprojectImage(src, dst, src_proj, match_proj, gdalconst.GRA_NearestNeighbour)
del dst  # Flush to disk
or rasterio, following its reprojection example.
In both cases, the reprojected raster is saved to a file, and I have to load it again to process it. Is it possible to keep the reprojected raster as a variable as well?

You could use VRT datasets:
src = gdal.Open("reference.tif")
dst = gdal.Warp("warped.vrt", src, format="vrt", dstSRS="EPSG:3857")
This way only a small VRT file is created, and you can use the dst dataset in downstream processing, at which point the warping will actually be performed.
You can even create the VRT itself in memory, so nothing is written to disk at all:
dst = gdal.Warp("", src, format="vrt", dstSRS="EPSG:3857")
If your dataset fits entirely in memory, you can create the actual dataset in memory using the /vsimem virtual file system. This has the upside that the processing is performed only once, even if you use the result downstream in multiple functions:
dst = gdal.Warp("/vsimem/result_inmemory.tif", src, format="GTiff", dstSRS="EPSG:3857")
This way the processing is performed immediately, and you can then use the dataset object to, for example, write it to disk or perform additional processing.
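For example (a minimal sketch, assuming the warp above produced a single-band dataset), you can pull the result straight into NumPy without touching the file system:
band = dst.GetRasterBand(1)  # the single output band
arr = band.ReadAsArray()     # returns a NumPy array
print(arr.shape, arr.dtype)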

Related

Vertex AI scheduled notebooks don't recognize the existence of folders

I have a managed Jupyter notebook in Vertex AI that I want to schedule. The notebook works just fine as long as I start it manually, but as soon as it is scheduled, it fails. There are in fact many things that go wrong when it is scheduled; some of them are fixable. Before explaining what my trouble is, let me first give some details of the context.
The notebook gathers information from an API for several stores and saves the data in different folders before processing it, saving CSV files to store-specific folders and to BigQuery. So, in the location of the notebook, I have:
The notebook
Functions needed for the handling of data (as *.py files)
A series of folders, some of which have subfolders which also have subfolders
When I execute this manually, there is no problem: everything works well and all files end up exactly where they should, as well as in the different BigQuery tables.
However, when the execution of the notebook is scheduled, everything goes wrong. First, the *.py files cannot be imported; no problem, I added the functions directly to the notebook.
Now, the following error is where I am at a loss, because I have no idea why it doesn't work or how to fix it. The code that leads to the error is the following:
internal = "https://api.************************"
df_descriptions = []
storess = internal
response_stores = requests.get(storess, auth=HTTPBasicAuth(userInternal, keyInternal))
pathlib.Path("stores/request_1.json").write_bytes(response_stores.content)

filepath = "stores"
files = os.listdir(filepath)
for file in files:
    with open(filepath + "/" + file) as json_string:
        jsonstr = json.load(json_string)
        information = pd.json_normalize(jsonstr)
        df_descriptions.append(information)

StoreINFO = pd.concat(df_descriptions)
StoreINFO = StoreINFO.dropna()
StoreINFO = StoreINFO[StoreINFO['storeIdMappings'].map(lambda d: len(d)) > 0]

cloud_store_ids = list(set(StoreINFO.cloudStoreId))

LastWeek = datetime.date.today() - timedelta(days=2)
LastWeek = np.datetime64(LastWeek)
and the error reported is:
FileNotFoundError Traceback (most recent call last)
/tmp/ipykernel_165/2970402631.py in <module>
5 storess = internal
6 response_stores = requests.get(storess,auth = HTTPBasicAuth(userInternal, keyInternal))
----> 7 pathlib.Path("stores/request_1.json").write_bytes(response_stores.content)
8
9 filepath = "stores"
/opt/conda/lib/python3.7/pathlib.py in write_bytes(self, data)
1228 # type-check for the buffer interface before truncating the file
1229 view = memoryview(data)
-> 1230 with self.open(mode='wb') as f:
1231 return f.write(view)
1232
/opt/conda/lib/python3.7/pathlib.py in open(self, mode, buffering, encoding, errors, newline)
1206 self._raise_closed()
1207 return io.open(self, mode, buffering, encoding, errors, newline,
-> 1208 opener=self._opener)
1209
1210 def read_bytes(self):
/opt/conda/lib/python3.7/pathlib.py in _opener(self, name, flags, mode)
1061 def _opener(self, name, flags, mode=0o666):
1062 # A stub for the opener argument to built-in open()
-> 1063 return self._accessor.open(self, flags, mode)
1064
1065 def _raw_open(self, flags, mode=0o777):
FileNotFoundError: [Errno 2] No such file or directory: 'stores/request_1.json'
I would gladly find another way to do this, for instance by using GCS buckets, but my issue is the existence of sub-folders. There are many stores, and I do not wish to do this operation manually because some retailers for which I am doing this have over 1000 stores. My Python code generates all these folders, and as I understand it, this is not feasible in GCS.
How can I solve this issue?
GCS uses a flat namespace, so folders don't actually exist, but they can be simulated, as described in the documentation. For your requirement, you can either use an absolute path (starting with "/", not a relative one) or create the "stores" directory first (with mkdir). For more information you can check this blog post.
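A minimal sketch of the second option, creating the directory just before the write that currently fails (the rest reuses the question's code):
import pathlib

# ensure the target directory exists in the executor's working directory
pathlib.Path("stores").mkdir(parents=True, exist_ok=True)
pathlib.Path("stores/request_1.json").write_bytes(response_stores.content)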

How to modify TIF file's EXIF data

I am trying to modify existing metadata in Python 3. More specifically, I have GPS coordinates and altitude in my metadata, and I need to modify them.
I'm using the piexif module, and I encounter two problems.
First, I managed to change Altitude, using
exif_dict['GPS'][piexif.GPSIFD.GPSAltitude] = (140, 1)
and it works.
But I can't understand how to change latitude and longitude, as they each consist of three fields, like ((53, 1), (291191, 10000), (0, 1)).
The second problem occurs when I try to save the TIFF file with modified metadata. If I save it as TIFF:
img.save(fname_2, 'tiff', exif=exif_bytes)
the fname_2 file is created, but its metadata isn't changed. If I save as JPEG:
img.save(fname_2, 'jpeg', exif=exif_bytes)
the metadata changes, but the file is compressed from 289 MB to 15 MB, which makes it impossible to use for my purposes.
Has anyone managed to do this? It sounds like it would be very simple, but I can't seem to work it out.
import piexif
from PIL import Image

# allow very large images without PIL's decompression-bomb check
Image.MAX_IMAGE_PIXELS = 1000000000

# raw strings so the backslashes in the Windows paths are not treated as escapes
fname_1 = r'D:\EZG\Codding\photo\iiq/eee.tif'
fname_2 = r'D:\EZG\Codding\photo\iiq/eee_change.tif'

img = Image.open(fname_1)
exif_dict = piexif.load(fname_1)
latitude = exif_dict['GPS'][piexif.GPSIFD.GPSLatitude]
longitude = exif_dict['GPS'][piexif.GPSIFD.GPSLongitude]
altitude = exif_dict['GPS'][piexif.GPSIFD.GPSAltitude]
print(latitude)
print(longitude)
print(altitude)

exif_dict['GPS'][piexif.GPSIFD.GPSAltitude] = (140, 1)
exif_bytes = piexif.dump(exif_dict)
img.save(fname_2, 'tiff', exif=exif_bytes)
The fname_2 file is created, but its metadata isn't changed.
Based on other questions and answers on SO it seems that the values are encoded as fractions:
((53, 1), (291191, 10000), (0, 1))
is 53 degrees and 291191/10000 = 29.1191 minutes (the third pair is seconds, here 0); whether it is North or South comes from the separate GPSLatitudeRef tag ('N' or 'S').
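To write a coordinate back, convert decimal degrees into that rational layout. A minimal sketch, assuming the same degrees-plus-fractional-minutes encoding as the example above (the helper name is hypothetical):
def to_gps_rationals(decimal_deg):
    # whole degrees, minutes to 4 decimal places, seconds left at 0
    d = int(abs(decimal_deg))
    m = round((abs(decimal_deg) - d) * 60 * 10000)
    return ((d, 1), (m, 10000), (0, 1))

lat = 53.48532  # hypothetical target latitude
exif_dict['GPS'][piexif.GPSIFD.GPSLatitude] = to_gps_rationals(lat)
exif_dict['GPS'][piexif.GPSIFD.GPSLatitudeRef] = 'N' if lat >= 0 else 'S'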
You may also want to check this answer, as there is a better package to edit GPS coordinates in photo metadata.

How to download Sentinel images from Google Earth Engine using the Python API in TFRecord format

While trying to download a Sentinel image for a specific location, the TIF file generated by default in Drive is not readable by OpenCV or PIL.Image(). If I use TFRecord as the file format, no images are downloaded to Drive at all. Below is the code:
starting_time = '2018-12-15'
delta = 15
L = -96.98
B = 28.78
R = -97.02
T = 28.74
cordinates = [L, B, R, T]
my_scale = 30
fname = 'sinton_texas_30'

llx = cordinates[0]
lly = cordinates[1]
urx = cordinates[2]
ury = cordinates[3]
geometry = [[llx, lly], [llx, ury], [urx, ury], [urx, lly]]

tstart = datetime.datetime.strptime(starting_time, '%Y-%m-%d')
tend = tstart + datetime.timedelta(days=delta)

collSent = ee.ImageCollection('COPERNICUS/S2') \
    .filterDate(str(tstart).split(' ')[0], str(tend).split(' ')[0]) \
    .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)) \
    .map(mask2clouds)
medianSent = ee.Image(collSent.reduce(ee.Reducer.median()))
cropLand = ee.ImageCollection('USDA/NASS/CDL').filterDate('2017-01-01', '2017-12-31').first()

task_config = {
    'scale': my_scale,
    'region': geometry,
    'fileFormat': 'TFRecord'
}
f1 = medianSent.select(['B1_median', 'B2_median', 'B3_median'])
taskSent = ee.batch.Export.image(f1, fname + "_Sent", task_config)
taskSent.start()
I expect the output to be readable in Python so I can convert it into NumPy. In the case of the 'TFRecord' file format, I expect the file to be downloaded to my Drive.
I think you should consider the following things:
File format
If you want to open your file with PIL or OpenCV rather than TensorFlow, use GeoTIFF. Try that format and see if things improve.
Saving to drive
Normally, saving to your Drive is the default behavior. However, you can explicitly write to your Drive:
ee.batch.Export.image.toDrive(image=f1, ...)
You can also set up a folder the images should be written to:
ee.batch.Export.image.toDrive(image=f1, folder='foo', ...)
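Putting it together with the question's variables (the Drive folder name here is only an assumption), the export call might look like:
taskSent = ee.batch.Export.image.toDrive(
    image=f1,
    description=fname + "_Sent",
    folder='ee_exports',  # hypothetical Drive folder
    region=geometry,
    scale=my_scale,
    fileFormat='GeoTIFF'
)
taskSent.start()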
In addition, the Export data help page and this tutorial are good starting points for further research.

Pyueye image saving with wrong resolution

I'm personally pretty new to programming, and I am trying to save a high-megapixel image from an IDS camera using the pyueye module with Python.
My code saves the image, but the problem is that it saves it as a 1280x720 image inside a 4192x3104 file.
I have no idea why it is saving the small image inside the larger file, and I am asking if anyone knows what I am doing wrong and how I can fix it so the image covers the whole 4192x3104.
from pyueye import ueye
import ctypes

# initialize the camera
hcam = ueye.HIDS(0)
pccmem = ueye.c_mem_p()
memID = ueye.c_int()
hWnd = ctypes.c_voidp()
ueye.is_InitCamera(hcam, hWnd)
ueye.is_SetDisplayMode(hcam, 0)

# query the sensor and allocate an image buffer at full sensor size
sensorinfo = ueye.SENSORINFO()
ueye.is_GetSensorInfo(hcam, sensorinfo)
ueye.is_AllocImageMem(hcam, sensorinfo.nMaxWidth, sensorinfo.nMaxHeight, 24, pccmem, memID)
ueye.is_SetImageMem(hcam, pccmem, memID)
ueye.is_SetDisplayPos(hcam, 100, 100)

# capture a single frame
nret = ueye.is_FreezeVideo(hcam, ueye.IS_WAIT)
print(nret)

# save the captured frame as a BMP file
FileParams = ueye.IMAGE_FILE_PARAMS()
FileParams.pwchFileName = "python-test-image.bmp"
FileParams.nFileType = ueye.IS_IMG_BMP
FileParams.ppcImageMem = None
FileParams.pnImageID = None
nret = ueye.is_ImageFile(hcam, ueye.IS_IMAGE_FILE_CMD_SAVE, FileParams, ueye.sizeof(FileParams))
print(nret)

# clean up
ueye.is_FreeImageMem(hcam, pccmem, memID)
ueye.is_ExitCamera(hcam)
The size of the image depends on the sensor size of the camera. By printing sensorinfo.nMaxWidth and sensorinfo.nMaxHeight you will get the maximum size of the image the camera captures. I think it depends on the model of the camera; for me it is 2056x1542.
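A quick check, reusing the camera handle from the question's code (reading the fields via .value assumes pyueye's ctypes-based struct fields):
# print the maximum capture size reported by the sensor
sensorinfo = ueye.SENSORINFO()
ueye.is_GetSensorInfo(hcam, sensorinfo)
print(sensorinfo.nMaxWidth.value, sensorinfo.nMaxHeight.value)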
Could you please elaborate on the last sentence of the question?

MafftCommandline and io.StringIO

I've been trying to use the Mafft alignment tool from Bio.Align.Applications. Currently, I've had success writing my sequence information out to temporary text files that are then read by MafftCommandline(). However, I'd like to avoid redundant steps as much as possible, so I've been trying to write to a memory file instead using io.StringIO(). This is where I've been having problems. I can't get MafftCommandline() to read internal files made by io.StringIO(). I've confirmed that the internal files are compatible with functions such as AlignIO.read(). The following is my test code:
from Bio.Align.Applications import MafftCommandline
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import io
from Bio import AlignIO
sequences1 = ["AGGGGC",
              "AGGGC",
              "AGGGGGC",
              "AGGAGC",
              "AGGGGG"]
longest_length = max(len(s) for s in sequences1)
padded_sequences = [s.ljust(longest_length, '-') for s in sequences1]  # padded sequences used to test compatibility with AlignIO

ioSeq = ''
for items in padded_sequences:
    ioSeq += '>unknown\n'
    ioSeq += items + '\n'
newC = io.StringIO(ioSeq)
cLoc = str(newC).strip()
cLocEdit = cLoc[1:len(cLoc)-1]  # create string with the < and > removed
test1Handle = AlignIO.read(newC, "fasta")
#test1HandleString = AlignIO.read(cLocEdit, "fasta")  # fails to interpret the cLocEdit string

records = (SeqRecord(Seq(s)) for s in padded_sequences)
SeqIO.write(records, "msa_example.fasta", "fasta")
test1Handle1 = AlignIO.read("msa_example.fasta", "fasta")  # demonstrates working AlignIO

in_file = '.../msa_example.fasta'
mafft_exe = '/usr/local/bin/mafft'
mafft_cline = MafftCommandline(mafft_exe, input=in_file)    # have to change file path
mafft_cline1 = MafftCommandline(mafft_exe, input=cLocEdit)  # fails to read string (same as AlignIO)
mafft_cline2 = MafftCommandline(mafft_exe, input=newC)
stdout, stderr = mafft_cline()
print(stdout)   # corresponds to MafftCommandline with input file
stdout1, stderr1 = mafft_cline1()
print(stdout1)  # corresponds to MafftCommandline with internal file
I get the following error messages:
ApplicationError: Non-zero return code 2 from '/usr/local/bin/mafft <_io.StringIO object at 0x10f439798>', message "/bin/sh: -c: line 0: syntax error near unexpected token `newline'"
I believe this error results from the angle brackets ('<' and '>') present in the file path.
ApplicationError: Non-zero return code 1 from '/usr/local/bin/mafft "_io.StringIO object at 0x10f439af8"', message '/usr/local/bin/mafft: Cannot open _io.StringIO object at 0x10f439af8.'
Attempting to remove the angle brackets by converting the file path to a string and slicing it resulted in the above error.
Ultimately my goal is to reduce computation time. I hope to accomplish this by calling internal memory instead of writing out to a separate text file. Any advice or feedback regarding my goal is much appreciated. Thanks in advance.
I can't get MafftCommandline() to read internal files made by io.StringIO().
This is not surprising, for a couple of reasons:
First, as you're aware, Biopython doesn't implement Mafft; it simply provides a convenient interface for setting up a call to the mafft executable in /usr/local/bin. That executable runs as a separate process which does not have access to your Python program's internal memory, including your StringIO file.
Second, the mafft program only works with an input file; it doesn't even allow stdin as a data source. (Though it does allow stdout as a data sink.) So ultimately, there must be a file in the file system for mafft to open. Thus the need for your temporary file.
Perhaps tempfile.NamedTemporaryFile() or tempfile.mkstemp() might be a reasonable compromise.
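A minimal sketch of that compromise, reusing the records generator from the question (delete=False so mafft can open the file while it still exists; remove it yourself afterwards):
import os
import tempfile
from Bio import SeqIO
from Bio.Align.Applications import MafftCommandline

# write the sequences to a named temporary file that mafft can open
with tempfile.NamedTemporaryFile(mode='w', suffix='.fasta', delete=False) as tmp:
    SeqIO.write(records, tmp, "fasta")
    tmp_path = tmp.name

mafft_cline = MafftCommandline('/usr/local/bin/mafft', input=tmp_path)
stdout, stderr = mafft_cline()
os.remove(tmp_path)  # clean up the temporary file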
