How to retrieve bbox for osmdata from spatial feature?

How to define the bbox to download OSM data based on the extent of a spatial file?
The following example returns an error message:
...the only allowed values are floats between -90.0 and 90.0
This shows that the bbox values are out of the allowed range. It also shows that the conversion between NAD27 and EPSG:3857 did not place the spatial data where it should be.
I had similar problems with other spatial data: even though the values were within the allowed range, the data didn't appear at the expected place, and the downloaded OSM data appeared at a different place than the input spatial file.
library(sf)
library(raster)
library(osmdata)
osm_proj <- "+init=epsg:3857"
nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc, osm_proj)
bbox.nc <- as.vector(extent(nc[22,]))/100000
q <- opq(bbox = bbox.nc) %>%
  add_osm_feature(key = 'natural', value = 'water')
osm.water <- osmdata_sf(q)
How can I prepare the bbox so that the downloaded OSM data matches the spatial extent of the input spatial file?

OSM works in lat-lon, which means EPSG:4326. You need to transform the coordinates accordingly. You also don't need raster::extent(); sf::st_bbox() will be sufficient in this use case.
Or, in your context, consider this code; as this is only a toy example I am not using the whole NC state, but a single county (otherwise timeout errors may occur, which would be a separate kind of problem - this question is about bounding boxes).
library(sf)
library(osmdata)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
strelitz <- st_transform(nc, 4326) %>%
  dplyr::filter(NAME == "Mecklenburg") # as in Charlotte of Mecklenburg-Strelitz
q <- opq(bbox = sf::st_bbox(strelitz)) %>%
  add_osm_feature(key = 'natural', value = 'water') %>%
  osmdata_sf()
plot(st_geometry(strelitz))
plot(st_geometry(q$osm_lines), col = 'blue', add = TRUE)
A shameless plug: I wrote about querying OSM for points of interest a while back; you may find this post interesting :)
https://www.jla-data.net/eng/finding-pois-along-a-route/

Related

Expand netcdf to the whole globe with xarray

I have a dataset that only covers latitudes between -55.75 and 83.25. I would like to expand that dataset so that it covers the whole globe (-89.75 to 89.75 in my case) and fill the new cells with an arbitrary NA value.
Ideally I would want to do this with xarray. I have looked at .pad(), .expand_dims() and .assign_coords(), but did not really get a handle on the workings of any of those.
If someone can provide an alternative solution with cdo, I would also be grateful for that.
You could do this with nctoolkit (https://nctoolkit.readthedocs.io/en/latest/), which uses CDO as a backend.
The example below shows how you could do it. It starts by cropping a global temperature dataset to latitudes between -50 and 50; you would then regrid it to a global dataset at whatever resolution you need. The regridding uses CDO, which will extrapolate at the edges, so you probably want to set everything outside the original dataset's extent to NA; that is why the code calls masklonlatbox from CDO.
import nctoolkit as nc
# open a global sea surface temperature climatology from NOAA PSL's THREDDS server
ds = nc.open_thredds("https://psl.noaa.gov/thredds/dodsC/Datasets/COBE2/sst.mon.ltm.1981-2010.nc")
# keep only the first time step
ds.subset(time = 0)
# crop to latitudes between -50 and 50
ds.crop(lat = [-50, 50])
# regrid to a global 1-degree grid; CDO extrapolates at the edges
ds.to_latlon(lon = [-179.5, 179.5], lat = [-89.5, 89.5], res = 1)
# set everything outside the original latitude range to missing (CDO's masklonlatbox)
ds.mask_box(lon = [-179.5, 179.5], lat = [-50, 50])
ds.plot()
# convert to xarray dataset
ds_xr = ds.to_xarray()
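If you would rather stay in xarray, reindexing onto the full global grid with a fill value achieves the same padding without CDO. This is a minimal sketch under stated assumptions: a regular 0.5-degree grid like the one described, with placeholder variable and coordinate names (t2m, lat, lon):
import numpy as np
import xarray as xr
# a stand-in for the original dataset, covering latitudes -55.75 to 83.25
lat = np.arange(-55.75, 83.26, 0.5)
lon = np.arange(-179.75, 180.0, 0.5)
ds = xr.Dataset(
    {"t2m": (("lat", "lon"), np.random.rand(lat.size, lon.size))},
    coords={"lat": lat, "lon": lon},
)
# reindex onto the full global latitude grid; the added rows are filled with NaN
global_lat = np.arange(-89.75, 89.76, 0.5)
ds_global = ds.reindex(lat=global_lat, fill_value=np.nan)
Existing latitudes keep their values because reindex matches coordinates exactly; only the newly added rows get the fill value.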

OSMNx : get coordinates of nodes/corners/edges of polygons/buildings

I am trying to retrieve the coordinates of all nodes/corners/edges of each commercial building in a list. E.g. for the Aldi supermarket in Macclesfield (UK), I can get 10 nodes from the UI (all the corners/edges of the supermarket), but I can only retrieve 2 of those 10 nodes from osmnx. I need access to the complete list of nodes, but the results are truncated, giving only 2 of the 10 nodes in this case, using the code below:
import osmnx as ox
test = ox.geocode_to_gdf('aldi, Macclesfield, Cheshire, GB')
ax = ox.project_gdf(test).plot()
test.geometry
or
gdf = ox.geometries_from_place('Grosvenor, Macclesfield, Cheshire, GB', tags)
gdf.geometry
Both return just two coordinates and truncate other info/results that are available in the OpenStreetMap UI (you can see it in the first column of the attached image: geometry > POLYGON > only two coordinates, with the other results truncated...). I would appreciate some help on this; thanks in advance.
It's hard to know exactly what you're doing here because you didn't provide a reproducible example (e.g., tags is undefined), but I'll take a guess at what you're going for.
I am trying to retrieve the coordinates of all nodes/corners/edges of commercial buildings
Here I retrieve all the tagged commercial building footprints in Macclesfield, then extract the first one's polygon coordinates. You could instead filter these by other attribute values as you see fit if you only want certain kinds of buildings. Proper usage of OSMnx's geometries module is described in the documentation.
import osmnx as ox
# get the building footprints in Macclesfield
place = 'Macclesfield, Cheshire, England, UK'
tags = {'building': 'commercial'}
gdf = ox.geometries_from_place(place, tags)
# how many did we get?
print(gdf.shape) # (57, 10)
# extract the coordinates for the first building's footprint
gdf.iloc[0]['geometry'].exterior.coords
Alternatively, if you want a specific building's footprint, you can look up its OSM ID and tell OSMnx to geocode that value:
# geocode one specific building footprint by its OSM way ID
gdf = ox.geocode_to_gdf('W251154408', by_osmid=True)
polygon = gdf.iloc[0]['geometry']
polygon.exterior.coords
# the same approach for a second building
gdf = ox.geocode_to_gdf('W352332709', by_osmid=True)
polygon = gdf.iloc[0]['geometry']
polygon.exterior.coords
# materialize the coordinate sequence as a list of (lon, lat) tuples
list(polygon.exterior.coords)
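If you need the corner coordinates of every building in the result rather than one at a time, you can loop over the GeoDataFrame from geometries_from_place. A rough sketch, assuming you only want single-part Polygon footprints (Point and MultiPolygon geometries are skipped for brevity):
import osmnx as ox
from shapely.geometry import Polygon
place = 'Macclesfield, Cheshire, England, UK'
tags = {'building': 'commercial'}
gdf = ox.geometries_from_place(place, tags)
# map each OSM element's index to the (lon, lat) corner coordinates of its footprint
coords_by_building = {
    idx: list(geom.exterior.coords)
    for idx, geom in gdf['geometry'].items()
    if isinstance(geom, Polygon)
}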

decision tree in R- extract data from a specific branch

I am trying to build a classification decision tree using rpart and partykit, and I am wondering: is there any function within those packages (or any package, for that matter) that allows me to create a dataset containing the data from a specific subtree or branch?
I know that I can manually create the subset from the original data set with the decision tree rules, but I am trying to automate certain processes and finding such a function would help me immensely.
Example:
library(rpart)
library(partykit)
data("Titanic", package = "datasets")
ttnc <- as.data.frame(Titanic)
ttnc <- ttnc[rep(1:nrow(ttnc), ttnc$Freq), 1:4]
names(ttnc)[2] <- "Gender"
rp <- rpart(Survived ~ Gender + Age + Class, data = ttnc)
prp <- as.party(rp)
prp[5]
Let's say that I want to extract the data from subtree #5 - is there any function within those packages that allows me to do that?
Thank you!
In addition to the solution posted by @JakobGepp, you can use the data_party() function provided by partykit:
data_party(prp, id = 5)
Essentially, this does the same thing internally that Jakob did explicitly by hand.
I don't know if you meant this by using the DT rules, but you could use the predict() function of the partykit package to predict the nodes/branches and then split the data according to your subtree.
# assign each observation to its terminal node
ttnc$Node <- predict(prp, newdata = ttnc, type = "node")
# keep only the observations that fall into node 5
subtree <- subset(ttnc, Node == 5)

Creating a map with basemap, filling countries

I'm currently working on the final project for my coding class (my first coding class, so I'm kind of an amateur).
My idea is for the code to search every newspaper in the world for a specific word within the titles (using bs4) and then obtain a dictionary with the average mentions by country, taking into account the number of newspapers in each country. Afterwards, and this is the part where I'm stuck, I want to put this on a map.
The whole program is already working properly, until the part where I have a CSV with the following form:
'Country','Average'
'Afghanistan',10
'Albania',5
'Algeria',0
'Andorra',2
'Antigua and Barbuda',7
'Argentina',0
'Armenia',4
Now, I want to create a world map where the higher the number, the redder (or any other color) the whole polygon of the country. So far I've found many code examples that work well placing points in space, but I haven't found one that takes the CSV data presented above and then fills each country accordingly. Below is the part of the code that currently creates the world map:
# Imports used below
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from matplotlib.colors import Normalize
from matplotlib.patches import Polygon
from mpl_toolkits.basemap import Basemap
# Now we proceed with the creation of the map
fig, ax = plt.subplots(figsize=(15, 10)) # We define the size of the map
m = Basemap(resolution='c', # c, l, i, h, f or None
            projection='merc', # Mercator projection
            lat_0=24.20, lon_0=-6.67, # The center of the map, so that the whole world is shown without splitting Asia
            llcrnrlon=-180, llcrnrlat=-85, urcrnrlon=180, urcrnrlat=85) # The coordinates of the whole world
m.drawmapboundary(fill_color='#46bcec') # We choose a color for the boundary of the map
m.fillcontinents(color='#f2f2f2', lake_color='#46bcec') # We choose a color for the land and one for the lakes
m.drawcoastlines() # We choose to draw the lines of the map
m.readshapefile('Final project\\vincent_map_data-master\\ne_110m_admin_0_countries\\ne_110m_admin_0_countries', 'areas') # We import the shapefile of the whole world
df_poly = pd.DataFrame({ # We define the polygon structure
    'shapes': [Polygon(np.array(shape), True) for shape in m.areas],
    'area': [area['name'] for area in m.areas_info]
})
cmap = plt.get_cmap('Oranges')
pc = PatchCollection(df_poly.shapes, zorder=2)
norm = Normalize()
mapper = matplotlib.cm.ScalarMappable(norm=norm, cmap=cmap)
# We show the map
plt.show()
I opened the shapefile of the countries, and the way to identify each country is with the variable "sovereignty". There might be some nonsensical things within my code, since I've pulled pieces from many places; sorry about that.
If someone could help me out, I would really appreciate it.
Thanks
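For the missing coloring step, one possible continuation of the code above is to reuse the PatchCollection idea: look up each shape's country in the CSV and let the collection's colormap turn the averages into face colors. This is an untested sketch; it assumes the shapefile attribute holding the country name is 'NAME' (check m.areas_info[0].keys(), it may be 'SOVEREIGNT' or similar) and that the CSV is saved as averages.csv:
# build a lookup from country name to average mentions
df = pd.read_csv('averages.csv')  # columns: Country, Average
avg_by_country = dict(zip(df['Country'], df['Average']))
# collect one patch and one value per country we have data for
patches, values = [], []
for info, shape in zip(m.areas_info, m.areas):
    value = avg_by_country.get(info['NAME'])  # assumed attribute field name
    if value is None:
        continue  # no data for this country: keep the default fill
    patches.append(Polygon(np.array(shape), closed=True))
    values.append(value)
# the colormap and norm translate the raw averages into face colors
pc = PatchCollection(patches, cmap=cmap,
                     norm=Normalize(vmin=min(values), vmax=max(values)),
                     zorder=2)
pc.set_array(np.array(values, dtype=float))
ax.add_collection(pc)
plt.show()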

apply functions for xts

My data is currently an xts or zoo object of daily stock prices, with one row per day and a column for each company.
library(quantmod)
getSymbols("AAPL;MSFT;YHOO")
closePrices <- merge(Cl(AAPL),Cl(MSFT),Cl(YHOO))
I am still new to R and need some assistance reproducing this Excel function. My first thought was to split the function into numerator and denominator, and then compute the index:
dailyDiff <- abs(diff(closePrices,1))
numerJ <- diff(closePrices,10)
denomJ <- as.xts(rollapply(dailyDiff,11, sum))
idx <- abs(numerJ/denomJ)
This was great because the values for each portion were accurate, but denomJ is aligned to incorrect dates. For example, the tail of numerJ goes to 6/21/2012, while the tail of denomJ goes to 6/14/2012.
The output that I am looking for is:
6/21/2012 = .11
6/20/2012 = .27
6/19/2012 = .46
6/18/2012 = .39
6/15/2012 = .22
It's hard to tell exactly what your problem is without exact data, but it appears to be with rollapply. rollapply will only apply the function to whole intervals unless the argument partial is set to TRUE. Consider the following example:
require(zoo)
# make up some data
mat <- matrix(1:100, ncol = 2)
colnames(mat) <- c("x1", "x2")
dates <- seq.Date(from = as.Date("2010-01-01"), length.out = 50, by = "1 day")
zoo.obj <- zoo(mat, dates)
# apply the functions
numerJ <- diff(zoo.obj, 10) # dates okay
denomJ <- rollapply(zoo.obj, 11, sum, partial = TRUE) # right dates
denomJ2 <- rollapply(zoo.obj, 11, sum) # wrong dates
index <- abs(numerJ/denomJ) # right dates
You can use a combination of diff and either runSum or rollapplyr.
# Get the data
library(quantmod)
getSymbols("AAPL")
I think this is what you're trying to do (note the use of the lag argument to diff.xts, and the n argument to runSum):
out <- diff(Cl(AAPL), lag=10) / runSum(abs(diff(Cl(AAPL))), n=11)
tail(out['/2012-06-21'])
# AAPL.Close
#2012-06-14 -0.1047297
#2012-06-15 0.2176938
#2012-06-18 0.3888185
#2012-06-19 0.4585821
#2012-06-20 0.2653782
#2012-06-21 0.1117371
Edit
Upon closer review of your question, I do not understand why rollapplyr is not the answer you're looking for. If I take your code, exactly as is, except I change rollapply to rollapplyr, it looks to me like it's exactly the output you're looking for.
dailyDiff <- abs(diff(closePrices,1))
numerJ <- diff(closePrices,10)
denomJ <- as.xts(rollapplyr(dailyDiff,11, sum))
idx <- abs(numerJ/denomJ)
# AAPL.Close MSFT.Close YHOO.Close
#2012-06-14 0.1047297 0.03826531 0.06936416
#2012-06-15 0.2176938 0.35280899 0.25581395
#2012-06-18 0.3888185 0.33161954 0.31372549
#2012-06-19 0.4585821 0.47096774 0.34375000
#2012-06-20 0.2653782 0.32644628 0.23750000
#2012-06-21 0.1117371 0.18997912 0.10256410
Also, note that numerJ and denomJ both end on the same date if you use rollapplyr (which is the same as using rollapply with align = "right"):
end(numerJ); end(denomJ)
#[1] "2012-07-20"
#[1] "2012-07-20"
Yahoo Bug
Maybe the problem you're seeing is the Yahoo bug where sometimes -- for example, right now -- Yahoo duplicates the last (chronologically speaking) row of data. If so, try deleting the duplicated row before attempting to use the data for your calculations:
# if the last two index entries share the same date, drop the duplicated final row
tidx <- tail(index(closePrices), 2)
if (tidx[1] == tidx[2]) {
  closePrices <- closePrices[-NROW(closePrices), ]
}
