using as.ppp on data frame to create marked process - spatstat

I am using a data frame to create a marked point process using as.ppp function. I get an error Error: is.numeric(x) is not TRUE. The data I am using is as follows:
dput(head(pointDataUTM[,1:2]))
structure(list(POINT_X = c(439845.0069, 450018.3603, 451873.2925,
446836.5498, 445040.8974, 442060.0477), POINT_Y = c(4624464.56,
4629024.646, 4624579.758, 4636291.222, 4614853.993, 4651264.579
)), .Names = c("POINT_X", "POINT_Y"), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
I can see that the first two columns are numeric, so I do not know why it is a problem.
> str(pointDataUTM)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 5028 obs. of 31 variables:
$ POINT_X : num 439845 450018 451873 446837 445041 ...
$ POINT_Y : num 4624465 4629025 4624580 4636291 4614854 ...
Then I also checked for NA, which shows no NA
> sum(is.na(pointDataUTM$POINT_X))
[1] 0
> sum(is.na(pointDataUTM$POINT_Y))
[1] 0
When I tried even only the first two columns of the data.frame, the error I get on using as.ppp is this:
Error: is.numeric(x) is not TRUE
5.stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"), ch), call. = FALSE, domain = NA)
4.stopifnot(is.numeric(x))
3.ppp(X[, 1], X[, 2], window = win, marks = marx, check = check)
2.as.ppp.data.frame(pointDataUTM[, 1:2], W = studyWindow)
1.as.ppp(pointDataUTM[, 1:2], W = studyWindow)
Could someone tell me what is the mistake here and why I get the not numeric error?
Thank you.

The critical check is whether PointDataUTM[,1] is numeric, rather than PointDataUTM$POINT_X.
Since PointDataUTM is a tbl object, and tbl is a function from the dplyr package, what is probably happening is that the subset operator for the tbl class is returning a data frame, and not a numeric vector, when a single column is extracted. Whereas the $ operator returns a numeric vector.
I suggest you convert your data to data.frame using as.data.frame() before calling as.ppp.
In the next version of spatstat we will make our code more robust against this kind of problem.

I'm on the phone, so can't check but I think it is happens because you have a tibble and not a data.frame. Please try to convert to a data.frame using as.data.frame first.

Related

How can I set a value of an indexed variable? - Pyomo

I'm working on a project related with AC OPF (Optimal Power Flow) and I was trying to solve a problem in python, using pyomo.
There are 3 buses and the bus voltage and bus angle are limited. However, the 1st bus must have a voltage=1 and an angle=0.
So, I tried this:
model.busvoltage = Var(model.bus, initialize=1, bounds=(0.95, 1.05), doc='Bus Voltage')
model.busvoltage[1].fixed=True
model.busangle = Var(model.bus, initialize=0, bounds=(-3.14, 3.14), doc='Bus angle')
model.busangle[1].fixed=True
The problem is that I just want to set the busvoltage and busangle for the first bus, without initializing the other ones with those values. I don't know if this is important to write, but I'm using ipopt as solver.
(This is my first time programming in Python) Any help would be appreciated!
You're after the .value attribute of the variable. Furthermore, setting the value of a variable and fixing it at the same time can be simplified to calling .fix():
model.busvoltage = Var(model.bus, bounds=(0.95, 1.05), doc='Bus Voltage')
model.busvoltage[1].fixed = True
model.busvoltage[1].value = 1
model.busangle = Var(model.bus, bounds=(-3.14, 3.14), doc='Bus angle')
model.busangle[1].fix(0)
Have you considered initializing the value in a constraint rather than in the Var definition?
model.busvoltage = Var(model.bus, bounds=(0.95, 1.05), doc='Bus Voltage')
model.busvoltage[1].fixed=True
def bus_voltage_1_value_constraint(model, b):
if b == 1:
return model.busvoltage[b] == 1
return Constraint.Skip
model.bus_voltage_1_value = Constraint(model.bus, rule=bus_voltage_1_value_constraint)

Python: [Informix][Informix ODBC Driver]Invalid string or buffer length. SQLCODE=-11071 when I fetch data

When I try to retreive my table in informix with ifxpy package, I get this error:
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-6-b0557e7f099b> in <module>
16 while dictionary != False:
17 tph.append(dictionary)
---> 18 dictionary = IfxPy.fetch_assoc(stmt)
19 print(pd.DataFrame(tph))
Exception: [Informix][Informix ODBC Driver]Invalid string or buffer length. SQLCODE=-11071
This is my code:
import IfxPy
import pandas as pd
ConStr = "SERVER=informix1;DATABASE=ir_fmois;HOST=127.0.0.1;SERVICE=9092;UID=informix;PWD=1234;"
# netstat -a | findstr 9088
try:
# netstat -a | findstr 9088
conn = IfxPy.connect( ConStr, "", "")
except Exception as e:
print ('ERROR: Connect failed')
print ( e )
quit()
sql = "SELECT * FROM oih"
stmt = IfxPy.exec_immediate(conn, sql)
dictionary = IfxPy.fetch_assoc(stmt)
tph=[]
while dictionary != False:
tph.append(dictionary)
dictionary = IfxPy.fetch_assoc(stmt)
print(pd.DataFrame(tph))
When I print the dataframe I see that I got only the first 4 rows.
I tried also this code and i doesn't trhow any exception but it returns also the first 4 rows of my table:
import IfxPyDbi as dbapi2
ConStr = "SERVER=informix1;DATABASE=ir_fmois;HOST=127.0.0.1;SERVICE=9092;UID=informix;PWD=1234;"
conn = dbapi2.connect( ConStr, "", "")
cur = conn.cursor()
sql = "SELECT * FROM oih"
rows = cur.fetchall()
len(rows)
>>>4
EDIT
I tried importing columns one by one, and the same error has occured when I tried to select a byte (blob) column (with text data type). This column has emty values in the first 4 rows but the fifth wasn't empty, and I think this is why the error occured in line 5.
I would really appreciate if anyone has any idea on how to solve this.
The finderr command reports:
$ finderr -11071
-11071 Invalid string or buffer length.
This Informix CLI error code is the same as SQLSTATE value S1090. The
following functions can return this error code: SQLBindCol(),
SQLBindParameter(), SQLBrowseConnect(), SQLColAttributes(),
SQLColumnPrivileges(), SQLColumns(), SQLConnect(), SQLDataSources(),
SQLDescribeCol(), SQLDriverConnect(), SQLDrivers(), SQLExecDirect(),
SQLExecute(), SQLForeignKeys(), SQLGetCursorName(), SQLGetData(),
SQLGetInfo(), SQLNativeSql(), SQLPrepare(), SQLPrimaryKeys(),
SQLProcedureColumns(), SQLProcedures(), SQLPutData(), SQLSetCursorName(),
SQLSpecialColumns(), SQLStatistics(), SQLTablePrivileges(), and SQLTables().
The value specified for the argument cbValueMax is less than zero.
Supply a value for the argument cbValueMax that is zero or greater.
For all functions, an argument that specified a string or buffer length, such
as cbCursor, cbConnStrIn, or cbSqlStr, had one or more of the following
problems: (1) It was less than 0, (2) It was less than 0 but not equal to
SQL_NTS or SQL_NULL_DATA, (3) It was less than 0 but the corresponding
pointer was not a null pointer, (4) It was equal to 1, or (5) It was too
large. Set the string or buffer length to a valid value.
Additionally for SQLExecDirect() and SQLExecute(), a parameter value that was
set with SQLBindParameter() had one of the following problems: (1) It was a
null pointer, and the parameter length was not 0, SQL_NULL_DATA,
SQL_DATA_AT_EXEC, or less than or equal to SQL_LEN_DATA_AT_EXEC_OFFSET, or
(2) It was not a null pointer, and the parameter length was less than 0 but
was not SQL_NTS, SQL_NULL_DATA, SQL_DATA_AT_EXEC, or less than or equal to
SQL_LEN_EXEC_DATA_AT_EXEC_OFFSET. Set the parameter value to a valid value.
$
Since your Python code is not calling any of those functions directly, that suggests there is a bug in the ifxpy driver. It is probably calling one of the listed functions with an incorrect argument.
As such, you should probably report it as an 'issue' on the GitHub site for the driver: https://github.com/OpenInformix/IfxPy

How do I assign multiple values in a dictionary in python?

I am working with object variables based off ICD-9 codes and am trying to create a dictionary which identifies all ICD-9 codes between E880.xx and E888.xx.
I want to code all values between E880.xx and E888.xx as a 1, and all other values as a 0.
I attempted this:
fallinjury_Dictionary = {1 : 'E880', 1 : 'E881', 1 : 'E882' ...}
but the key gets overwritten every time and I only end up with one value in the dictionary (1 = E888)
I've also tried this:
fallinjury_Dictionary = {1 : {'E880' or 'E881' or 'E882' .... or 'E888'}
which just doesn't work.
Dictionaries can only have one value for each key. If you are looking to map the value 1 to E880 (instead of the other way around), you can do that, but otherwise you can't (uniqueness of keys is in an inherent part of dictionaries).
fallinjury_Dictionary = dict.fromkeys(['E880', 'E881', 'E882'], 1)
fallinjury_Dictionary.update(dict.fromkeys(['E880', 'E883'], 0)) # to update values
If i got you right it must be "0: 'E901'" for E901
fallinjury_Dictionary = {1 : 'E880', 1 : 'E881', 1 : 'E882', ... 0: 'E901'}
# Instead i would switch the key and the value for this case
fallinjury_Dictionary = {'E880': 1, 'E881': 1, 'E882':1, ... 'E901': 0}
# You could use boolean
fallinjury_Dictionary = {'E880': True, 'E881': True, 'E882': True, ... 'E901': False}
You could also think about creating classes.

Updating multiple line plots dynamically in callback in bokeh

I have a use case where I have multiple line plots (with legends), and I need to update the line plots based on a column condition. Below is an example of two data set, based on the country, the column data source changes. But the issue I am facing is, the number of columns is not fixed for the data source, and even the types can vary. So, when I update the data source based on a callback when there is a new country selected, I get this error:
Error: attempted to retrieve property array for nonexistent field 'pay_conv_7d.content'.
I am guessing because in the new data source, the pay_conv_7d.content column doesn't exist, but in my plot those lines were already there. I have been trying to fix this issue by various means (making common columns for all country selection - adding the missing column in the data source in callback, but still get issues.
Is there any clean way to have multiple line plots updating using callback, and not do a lot of hackish way? Any insights or help would be really appreciated. Thanks much in advance! :)
def setup_multiline_plots(x_axis, y_axis, title_text, data_source, plot):
num_categories = len(data_source.data['categories'])
legends_list = list(data_source.data['categories'])
colors_list = Spectral11[0:num_categories]
# xs = [data_source.data['%s.'%x_axis].values] * num_categories
# ys = [data_source.data[('%s.%s')%(y_axis,column)] for column in data_source.data['categories']]
# data_source.data['x_series'] = xs
# data_source.data['y_series'] = ys
# plot.multi_line('x_series', 'y_series', line_color=colors_list,legend='categories', line_width=3, source=data_source)
plot_list = []
for (colr, leg, column) in zip(colors_list, legends_list, data_source.data['categories']):
xs, ys = '%s.'%x_axis, ('%s.%s')%(y_axis,column)
plot.line(xs,ys, source=data_source, color=colr, legend=leg, line_width=3, name=ys)
plot_list.append(ys)
data_source.data['plot_names'] = data_source.data.get('plot_names',[]) + plot_list
plot.title.text = title_text
def update_plot(country, timeseries_df, timeseries_source,
aggregate_df, aggregate_source, category,
plot_pay_7d, plot_r_pay_90d):
aggregate_metrics = aggregate_df.loc[aggregate_df.country == country]
aggregate_metrics = aggregate_metrics.nlargest(10, 'cost')
category_types = list(aggregate_metrics[category].unique())
timeseries_df = timeseries_df[timeseries_df[category].isin(category_types)]
timeseries_multi_line_metrics = get_multiline_column_datasource(timeseries_df, category, country)
# len_series = len(timeseries_multi_line_metrics.data['time.'])
# previous_legends = timeseries_source.data['plot_names']
# current_legends = timeseries_multi_line_metrics.data.keys()
# common_legends = list(set(previous_legends) & set(current_legends))
# additional_legends_list = list(set(previous_legends) - set(current_legends))
# for legend in additional_legends_list:
# zeros = pd.Series(np.array([0] * len_series), name=legend)
# timeseries_multi_line_metrics.add(zeros, legend)
# timeseries_multi_line_metrics.data['plot_names'] = previous_legends
timeseries_source.data = timeseries_multi_line_metrics.data
aggregate_source.data = aggregate_source.from_df(aggregate_metrics)
def get_multiline_column_datasource(df, category, country):
df_country = df[df.country == country]
df_pivoted = pd.DataFrame(df_country.pivot_table(index='time', columns=category, aggfunc=np.sum).reset_index())
df_pivoted.columns = df_pivoted.columns.to_series().str.join('.')
categories = list(set([column.split('.')[1] for column in list(df_pivoted.columns)]))[1:]
data_source = ColumnDataSource(df_pivoted)
data_source.data['categories'] = categories
Recently I had to update data on a Multiline glyph. Check my question if you want to take a look at my algorithm.
I think you can update a ColumnDataSource in three ways at least:
You can create a dataframe to instantiate a new CDS
cds = ColumnDataSource(df_pivoted)
data_source.data = cds.data
You can create a dictionary and assign it to the data attribute directly
d = {
'xs0': [[7.0, 986.0], [17.0, 6.0], [7.0, 67.0]],
'ys0': [[79.0, 69.0], [179.0, 169.0], [729.0, 69.0]],
'xs1': [[17.0, 166.0], [17.0, 116.0], [17.0, 126.0]],
'ys1': [[179.0, 169.0], [179.0, 1169.0], [1729.0, 169.0]],
'xs2': [[27.0, 276.0], [27.0, 216.0], [27.0, 226.0]],
'ys2': [[279.0, 269.0], [279.0, 2619.0], [2579.0, 2569.0]]
}
data_source.data = d
Here if you need different sizes of columns or empty columns you can fill the gaps with NaN values in order to keep column sizes. And I think this is the solution to your question:
import numpy as np
d = {
'xs0': [[7.0, 986.0], [17.0, 6.0], [7.0, 67.0]],
'ys0': [[79.0, 69.0], [179.0, 169.0], [729.0, 69.0]],
'xs1': [[17.0, 166.0], [np.nan], [np.nan]],
'ys1': [[179.0, 169.0], [np.nan], [np.nan]],
'xs2': [[np.nan], [np.nan], [np.nan]],
'ys2': [[np.nan], [np.nan], [np.nan]]
}
data_source.data = d
Or if you only need to modify a few values then you can use the method patch. Check the documentation here.
The following example shows how to patch entire column elements. In this case,
source = ColumnDataSource(data=dict(foo=[10, 20, 30], bar=[100, 200, 300]))
patches = {
'foo' : [ (slice(2), [11, 12]) ],
'bar' : [ (0, 101), (2, 301) ],
}
source.patch(patches)
After this operation, the value of the source.data will be:
dict(foo=[11, 22, 30], bar=[101, 200, 301])
NOTE: It is important to make the update in one go to avoid performance issues

Quinlan Attribute C5.0

Error in UseMethod("QuinlanAttributes") :
no applicable method for 'QuinlanAttributes' applied to an object of class "logical"
I am getting this error whenever I am running a code. I have installed several packages but this error is keep on repitiating.
it seems that C50 does not accept BOOLEAN features.
you can simpliy drop that column or replace BOOLEAN to 0/1.
if "tdata$Windy" is the BOOLEAN feature, replace the value of it.
library(C50)
tdata = read.csv('play.csv', header = TRUE, sep = ",")
xdata <- data.frame(tdata$Outlook,tdata$Temperature, tdata$Humidity, tdata$Windy)
ydata <- tdata$Play
treeModel <- C5.0(x = xdata, y = ydata )
summary(treeModel)
It's old question but I had same issue today and realized it's was due to read_sav().
I solved applying haven::as_factor to columns that should be factors.
data <- read_sav("datafile.sav")
data <- mutate(data, across(ends_with("_fct"), haven::as_factor ))

Resources