Assign a timezone label to each geometry of a geodataframe

I have a geodataframe ("timezones") that contains the timezone labels of the whole world (for example "Europe/Zurich" or "Pacific/Galapagos") and their corresponding geometries.
On the other hand, I have a geodataframe ("regions") with ~80K rows, where each row represents a region in some country, defined by a certain geometry (a Polygon or MultiPolygon).
I need to assign a timezone to each of the regions, so that the final dataframe has "region", "province", "geometry" and "timezone" columns.
I don't know how to do it; maybe using a for loop over each row and checking whether the geometry is inside a timezone with geopandas' within or contains?
for i, row in regions.iterrows():
    if regions.geometry.within(timezones.geometry) == "True":
        region['timezone'] = timezones['timezone']
This example does not work, but could the solution be something similar to this? Or maybe there is a better way to do it?
Any suggestions would be highly appreciated.
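One way this kind of task is often handled is a spatial join instead of a row-by-row loop. Below is a rough, untested sketch assuming geopandas.sjoin is available, both GeoDataFrames share a CRS, and the label column in timezones is literally called "timezone" (older geopandas versions spell the predicate argument op instead of predicate):
import geopandas as gpd

# Make sure both frames use the same coordinate reference system.
regions = regions.to_crs(timezones.crs)

# Attach the timezone polygon each region falls in. "intersects" is more
# forgiving than "within" for regions that straddle a timezone boundary.
joined = gpd.sjoin(
    regions,
    timezones[["timezone", "geometry"]],
    how="left",
    predicate="intersects",
)

# A region crossing a boundary matches several timezones; keep the first match.
joined = joined[~joined.index.duplicated(keep="first")]
result = joined[["region", "province", "geometry", "timezone"]]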

Is there a pandas function that can create a dataframe of the mean, median, and mode of selected columns?

My attempt:
import pandas as pd

# Compute the mean, median and variance for the variables sph, acous and dur. Compare their level of variability.
sad_mean = dat_songs[['spch', 'acous', 'dur']].mean()
sad_mode = dat_songs[['spch', 'acous', 'dur']].mode()
sad_median = dat_songs[['spch', 'acous', 'dur']].median()
sad_mmm = pd.DataFrame({'mean': sad_mean, 'median': sad_median, 'mode': sad_mode})
sad_mmm
Which outputs this
First of all, the median column is not right at all, and I want to know how to fix that too.
Secondly, I feel like I have seen a quicker or shorter way to do this with a simple pandas function.
My data head for reference
Simply try dat_songs.describe(). Descriptive statistics will be shown for all the numerical columns.
For selected columns:
dat_songs[['spch', 'acous', 'dur']].describe()
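If you want just those particular statistics in a single DataFrame, DataFrame.agg also works in one call. A small sketch assuming the same dat_songs frame; it covers the mean, median and variance mentioned in the assignment comment, while mode is trickier because it can return more than one value per column:
# Rows are the statistics, columns are the selected variables.
sad_stats = dat_songs[['spch', 'acous', 'dur']].agg(['mean', 'median', 'var'])
sad_stats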

Altair's selection and transform_filter via binding_range slider for datetime values doesn't seem to work with equality condition or selector itself

I wanted to bind a range slider with datetime values to filter a chart down to the data for a particular date only. Using the stocks data, what I want is to have the x-axis show the companies and the y-axis the price of the stocks for a particular day, which the user selects via a range slider.
Based on inputs from this answer and this issue, I have the following code, which shows something when the slider is moved past one particular value (with the inequality condition in transform_filter), but is empty for the rest.
What is peculiar is that with an inequality operator it at least shows something, but everything is empty when it's ==.
import altair as alt
import pandas as pd
from vega_datasets import data

source = data.stocks()

def timestamp(t):
    return pd.to_datetime(t).timestamp()

slider = alt.binding_range(
    step=86400,  # 86400 seconds is the difference between consecutive days
    min=timestamp(min(source['date'])),
    max=timestamp(max(source['date'])))

select_date = alt.selection_single(fields=['date'], bind=slider,
                                   init={'date': timestamp(min(source['date']))})

alt.Chart(source).mark_bar().encode(
    x='symbol',
    y='price',
).add_selection(select_date).transform_filter(alt.datum.date == select_date.date)
Since the output is empty, I am inclined to conclude that it's the transform_filter that is causing issues, but I have been at it for more than 6 hours now and have tried all the permutations and combinations of alt.expr.toDate and other conversions here and there, but I cannot get it to work.
I also tried just transform_filter(select_date.date) and transform_filter(date), along with other things, but nothing quite works.
The expected output is that the heights of the bars change (as the data is filtered on date) while the user drags the slider.
Any help would be really appreciated.
There are several issues here:
In Vega-Lite, timestamps are expressed in milliseconds, not seconds (see the short check after this list).
You are filtering on equality between a numerical timestamp and a string representation of a date.
Even if you parse the date in the filter expression, Python date parsing and JavaScript date parsing behave differently and the results will generally not match. Even within JavaScript, date parsing behaviour can vary from browser to browser; all this means that filtering on equality of a Python and a JavaScript timestamp is generally problematic.
The data you are using has monthly timestamps, so the slider step should account for this.
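To illustrate the first point, a quick check in plain pandas shows the factor-of-1000 difference (my own example, not part of the original answer):
import pandas as pd

t = pd.Timestamp("2004-01-01")
print(t.timestamp())         # 1072915200.0 -> seconds since the epoch (Python/pandas)
print(t.timestamp() * 1000)  # 1072915200000.0 -> milliseconds, the unit Vega-Lite expects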
Keeping all that in mind, the best course would probably be to adjust the slider values and filter on matching year and month, rather than trying to achieve equality in the exact timestamp. The result looks like this:
import altair as alt
from vega_datasets import data
import pandas as pd

source = data.stocks()

def timestamp(t):
    return pd.to_datetime(t).timestamp() * 1000

slider = alt.binding_range(
    step=30 * 24 * 60 * 60 * 1000,  # 30 days in milliseconds
    min=timestamp(min(source['date'])),
    max=timestamp(max(source['date'])))

select_date = alt.selection_single(
    fields=['date'],
    bind=slider,
    init={'date': timestamp(min(source['date']))},
    name='slider')

alt.Chart(source).mark_bar().encode(
    x='symbol',
    y='price',
).add_selection(select_date).transform_filter(
    "(year(datum.date) == year(slider.date[0])) && "
    "(month(datum.date) == month(slider.date[0]))"
)
You can view the result here: vega editor.

Code produces a 2D histogram but the results don't match hist2d

I am trying to write a histogram builder to construct a 2D histogram for my assignment work. This is my code:
def Build2DHistogramClassifier(X1, X2, T, B, x1min, x1max, x2min, x2max):
    # Initialise empty integer count arrays for the two classes.
    HF = np.zeros((B, B), dtype='int')
    HM = np.zeros((B, B), dtype='int')
    # This logic decides which bin each value goes into;
    # np.round applies the formula to all the values in the array.
    bin_row_indices = (np.round((B - 1) * (X1 - x1min) / (x1max - x1min))).astype('int32')
    bin_column_indices = (np.round((B - 1) * (X2 - x2min) / (x2max - x2min))).astype('int32')
    # enumerate gives each (row, column) pair together with its index i.
    for i, (r, c) in enumerate(zip(bin_row_indices, bin_column_indices)):
        if T[i] == 'Female':
            HF[r, c] += 1
        else:
            HM[r, c] += 1
    return [HF, HM]
But the problem is that the results (the count in each bin) I am getting do not match what I get from NumPy's histogram2d with the same bin size.
I am sorry if my code is not in the right format; please see the gist I created with the same code.
What is the mistake in my code?
How do I correct it?
Thanks
By rounding when assigning to bins you are treating the bins as bin centers. The numpy convention is to use them as bin edges.
Remove the two calls to round() from your code and change B-1 to B. You should now get the same results with your function and with np.histogram2d.
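For illustration, a hedged sketch of what the edge-based binning looks like after those changes (my own paraphrase, not the asker's code; np.clip keeps values equal to the maximum inside the last bin, matching np.histogram2d's closed right edge):
import numpy as np

def bin_index(X, xmin, xmax, B):
    # Map values to bin indices 0..B-1 using B equal-width bins over [xmin, xmax].
    idx = (B * (X - xmin) / (xmax - xmin)).astype('int32')  # truncation acts as floor for non-negative values
    return np.clip(idx, 0, B - 1)  # values equal to xmax fall into the last bin

# Example: with B = 4 bins over [0, 1], 0.999 maps to bin 3 and 1.0 also maps to bin 3.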

MKMetersBetweenMapPoints incorrectly returning distance in meters

running this code:
NSLog(@"%f", MKMetersBetweenMapPoints(MKMapPointMake(33.6523, -118.507), MKMapPointMake(34.516, -117.628)));
returns this:
0.015819
The expected output should be ~136900. What am I doing wrong?
It looks like you're giving a regular latitude and longitude to MKMapPointMake(). An MKMapPoint contains an x and a y value, not a latitude and longitude.
Use MKMapPointForCoordinate(myLocation) to convert your locations to map points, then give those to your MKMetersBetweenMapPoints() function.
Or easier still, use -distanceFromLocation: with two CLLocation objects. It gives you back a distance in meters, taking into account the curvature of the earth.

Access list element using get()

I'm trying to use get() to access a list element in R, but am getting an error.
example.list <- list()
example.list$attribute <- c("test")
get("example.list") # Works just fine
get("example.list$attribute") # breaks
## Error in get("example.list$attribute") :
## object 'example.list$attribute' not found
Any tips? I am looping over a vector of strings which identify the list names, and this would be really useful.
Here's the incantation that you are probably looking for:
get("attribute", example.list)
# [1] "test"
Or perhaps, for your situation, this:
get("attribute", eval(as.symbol("example.list")))
# [1] "test"
# Applied to your situation, as I understand it...
example.list2 <- example.list
listNames <- c("example.list", "example.list2")
sapply(listNames, function(X) get("attribute", eval(as.symbol(X))))
# example.list example.list2
# "test" "test"
Why not simply:
example.list <- list(attribute="test")
listName <- "example.list"
get(listName)$attribute
# or, if both the list name and the element name are given as arguments:
elementName <- "attribute"
get(listName)[[elementName]]
If your strings contain more than just object names, e.g. operators like here, you can evaluate them as expressions as follows:
> string <- "example.list$attribute"
> eval(parse(text = string))
[1] "test"
If your strings are all of the type "object$attribute", you could also parse them into object/attribute, so you can still get the object, then extract the attribute with [[:
> parsed <- unlist(strsplit(string, "\\$"))
> get(parsed[1])[[parsed[2]]]
[1] "test"
flodel's answer worked for my application, so I'm gonna post what I built on it, even though this is pretty uninspired. You can access each list element with a for loop, like so:
#============== List with five elements of non-uniform length ================#
example.list <-
  list(letters[1:5], letters[6:10], letters[11:15], letters[16:20], letters[21:26])
#==============================================================================#
#====== for loop that names and concatenates each consecutive element ========#
derp <- c()
for (i in 1:length(example.list)) {
  derp <- append(derp, eval(parse(text = example.list[i])))
}
derp  # Not a particularly useful application here, but it proves the point.
I'm using code like this for a function that pulls certain sets of columns from a data frame by their column names. The user supplies a list, where each element is a set of column names (each set is a group of items belonging to one measure), together with the big data frame containing all those columns. The for loop takes each list element in turn as the set of column names, applies an internal function* only to those columns of the big data frame, and writes the output into one column of a matrix per iteration. After the loop, the function returns that matrix.
Not sure if you're looking to do something similar with your list elements, but I'm happy I picked up this trick. Thanks to everyone for the ideas!
"Second example" / tangential info regarding application in graded response model factor scoring:
Here's the function I described above, just in case anyone wants to calculate graded response model factor scores* in large batches. Each column of the output matrix corresponds to an element of the list (i.e., a latent trait with ordinal indicator items specified by column name in the list element), and the rows correspond to the rows of the data frame used as input. Each row should presumably contain mutually dependent observations, as from a given individual, to whom the factor scores in the same row of the output matrix belong. I should also add that if all the items in a given list element use the exact same Likert scale rating options, the graded response model may be less appropriate for factor scoring than a rating scale model (cf. http://www.rasch.org/rmt/rmt143k.htm).
grmscores <- function(ColumnNameList, DataFrame) {
  require(ltm)  # (Rizopoulos, 2006)
  x <- matrix(NA, nrow = nrow(DataFrame), ncol = length(ColumnNameList))
  for (i in 1:length(ColumnNameList)) {  # flodel's magic featured below!
    x[, i] <- factor.scores(grm(DataFrame[, eval(parse(text = ColumnNameList[i]))]),
                            resp.patterns = DataFrame[, eval(parse(text = ColumnNameList[i]))])$score.dat$z1
  }
  x
}
Reference
*Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses, Journal of Statistical Software, 17(5), 1-25. URL: http://www.jstatsoft.org/v17/i05/
