TensorFlow - Feed DataFrame to DNNClassifier For Predictions - python-3.x

I am struggling to feed data to the tf.estimator.DNNClassifier after reloading it through tf.contrib.predictor.from_saved_model. I would very much appreciate your help.
I found these links, but I am still getting an error. Below is my implementation:
Saving Model:
feature_spec = tf.feature_column.make_parse_example_spec(feat_cols)
export_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
tuned_model.export_savedmodel('./model_dir/saved_models/', export_fn)
This successfully saves the model with the following info:
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']
INFO:tensorflow:Signatures INCLUDED in export for Regress: ['regression']
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Restoring parameters from /nimble/kdalal/model_dir/model.ckpt-28917
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: ./model_dir/saved_models/temp-b'1556819228'/saved_model.pb
Reloading For Predictions:
predict_prod = tf.contrib.predictor.from_saved_model('./model_dir/saved_models/1556819228')
predict_prod(dict(X_test))
I get the following error:
ValueError: Got unexpected keys in input_dict: {'DOW', 'JOB_FUNCTION',
'ACC_SIZE', 'answered_20D', 'MatchType', 'CONTACT_STATE', 'SEASONS',
'called_20D', 'st_cb_ans_20D', 'JOB_ROLE', 'st_cb_called_20D',
'CALL_BLOCKS'} expected: {'inputs'}
My X_test is a data frame that I'm trying to get predictions for.
[EDITED]:
My input dict looks as follows:
{'JOB_ROLE': 714859 Manager-Level
714860 Manager-Level
714861 Manager-Level
714862 Manager-Level
714863 Director-Level
Name: JOB_ROLE, dtype: object,
'JOB_FUNCTION': 714859 Information Technology
714860 Information Technology
714861 Information Technology
714862 Information Technology
714863 Information Technology
Name: JOB_FUNCTION, dtype: object,
'MatchType': 714859 Work Phone
714860 Work Phone
714861 Work Phone
714862 Work Phone
714863 Account Main Phone
Name: MatchType, dtype: object,
'CALL_BLOCKS': 714859 17_18
714860 17_18
714861 17_18
714862 17_18
714863 17_18
Name: CALL_BLOCKS, dtype: object,
'ACC_SIZE': 714859 StartUps
714860 StartUps
714861 Small
714862 StartUps
714863 Small
Name: ACC_SIZE, dtype: object,
'CONTACT_STATE': 714859 WA
714860 CA
714861 CA
714862 CA
714863 CA
Name: CONTACT_STATE, dtype: object,
'SEASONS': 714859 Spring
714860 Spring
714861 Spring
714862 Spring
714863 Spring
Name: SEASONS, dtype: object,
'DOW': 714859 Monday
714860 Monday
714861 Monday
714862 Monday
714863 Monday
Name: DOW, dtype: object,
'called_20D': 714859 0.038760
714860 0.077519
714861 0.217054
714862 0.046512
714863 0.038760
Name: called_20D, dtype: float64,
'answered_20D': 714859 0.000000
714860 0.086957
714861 0.043478
714862 0.000000
714863 0.130435
Name: answered_20D, dtype: float64,
'st_cb_called_20D': 714859 0.050233
714860 0.282496
714861 0.282496
714862 0.282496
714863 0.282496
Name: st_cb_called_20D, dtype: float64,
'st_cb_ans_20D': 714859 0.059761
714860 0.314741
714861 0.314741
714862 0.314741
714863 0.314741
Name: st_cb_ans_20D, dtype: float64}
I am a beginner with tf and I don't know how to pass data frames to the model so that I can call the predict method and get the predictions.
Also, should I be converting my input data to some other dtype?

I found the answer. Please refer to the link to understand how to feed data to the imported estimator model.

ValueError: Cannot feed value of shape (75116, 12) for Tensor 'input_example_tensor:0', which has shape '(?,)'
Regarding this question: your model appears to accept only one item at a time.
You can only feed a single item, e.g. {'inputs': X_test.values[0]}.
You could change the model so that it predicts on a batch of items.
Good luck
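For reference, because the model was exported with build_parsing_serving_input_receiver_fn, the reloaded predictor expects serialized tf.train.Example protos under the single key 'inputs' rather than a dict of raw columns (that is what the "expected: {'inputs'}" error is saying). Below is a minimal sketch of feeding a DataFrame that way, assuming the string/float64 dtypes shown above; the helper name df_to_serialized_examples is mine, not part of the original post:

import tensorflow as tf

def df_to_serialized_examples(df):
    """Serialize each DataFrame row into a tf.train.Example proto."""
    examples = []
    for _, row in df.iterrows():
        feature = {}
        for col, value in row.items():
            if isinstance(value, float):
                # Numeric columns (e.g. called_20D) go into a FloatList.
                feature[col] = tf.train.Feature(
                    float_list=tf.train.FloatList(value=[value]))
            else:
                # Categorical/string columns go into a BytesList.
                feature[col] = tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[str(value).encode()]))
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        examples.append(example.SerializeToString())
    return examples

predict_prod = tf.contrib.predictor.from_saved_model('./model_dir/saved_models/1556819228')
predictions = predict_prod({'inputs': df_to_serialized_examples(X_test)})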

Related

How to get monthly cost data from Azure using Python SDK?

I'm trying to get monthly cost data from Azure using the Azure SDK for Python, but the Microsoft documentation seems very confusing and outdated, without examples. I need to create a monthly evolution chart outside the Azure Portal.
What is the right way to retrieve this information about monthly costs from Azure?
I already tried the BillingManagementClient class, the get_for_billing_period_by_billing_account method from ConsumptionManagementClient.balances, and now I'm trying to use the usage_details.list method from ConsumptionManagementClient, but I'm receiving strangely duplicated data:
consumption_client = ConsumptionManagementClient(self.credential, self.subscription_id)
start_date = "2022-11-19T00:00:00.0000000Z"
end_date = "2022-11-20T00:00:00.0000000Z"
filters = f"properties/usageStart eq '{start_date}' and properties/usageEnd eq '{end_date}'"
consumption_list = consumption_client.usage_details.list(f"/subscriptions/{subscription_id}", None, filters)
for consumption_data in consumption_list:
    print(f"date: {consumption_data.date} \nstart_date: {consumption_data.billing_period_start_date} \nend_date: {consumption_data.billing_period_end_date}\ncost: {consumption_data.cost} \n")
Script output:
date: 2022-11-20 00:00:00+00:00
start_date: 2022-11-11 00:00:00+00:00
end_date: 2022-12-10 00:00:00+00:00
cost: 0.658392
date: 2022-11-19 00:00:00+00:00
start_date: 2022-11-11 00:00:00+00:00
end_date: 2022-12-10 00:00:00+00:00
cost: 0.658392
date: 2022-11-19 00:00:00+00:00
start_date: 2022-11-11 00:00:00+00:00
end_date: 2022-12-10 00:00:00+00:00
cost: 0.67425593616
date: 2022-11-20 00:00:00+00:00
start_date: 2022-11-11 00:00:00+00:00
end_date: 2022-12-10 00:00:00+00:00
cost: 0.67425593616
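If the repeated rows are simply separate usage line items that share a billing period, one option is to aggregate the cost per usage date. This is only a sketch built on the attributes already printed above, not a confirmed explanation of the duplicates:

from collections import defaultdict

daily_costs = defaultdict(float)
consumption_list = consumption_client.usage_details.list(f"/subscriptions/{subscription_id}", None, filters)
for consumption_data in consumption_list:
    # Each usage date can appear several times (one row per line item),
    # so accumulate the cost per date.
    daily_costs[consumption_data.date] += consumption_data.cost

for usage_date, cost in sorted(daily_costs.items()):
    print(f"{usage_date}: {cost}")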

How does featuretools DiffDatetime work within dfs?

I've got the following dataset:
where:
customer id represents a unique customer
each customer has multiple invoices
each invoice is marked by a unique identifier (Invoice)
each invoice has multiple items (rows)
I want to determine the time difference between invoices for a customer, in other words, the time between one invoice and the next. Is this possible, and how should I do it with DiffDatetime?
Here is how I am setting up the entities:
es = ft.EntitySet(id="data")
es = es.add_dataframe(
    dataframe=df,
    dataframe_name="items",
    index="items",
    make_index=True,
    time_index="InvoiceDate",
)
es.normalize_dataframe(
    base_dataframe_name="items",
    new_dataframe_name="invoices",
    index="Invoice",
    copy_columns=["Customer ID"],
)
es.normalize_dataframe(
    base_dataframe_name="invoices",
    new_dataframe_name="customers",
    index="Customer ID",
)
I tried:
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="invoices",
    agg_primitives=[],
    trans_primitives=["diff_datetime"],
    verbose=True,
)
I also tried changing the target dataframe to invoices or customers, but none of those worked.
The df that I am trying to work on looks like this:
es["invoices"].head()
And what I want can be done with pandas like this:
es["invoices"].groupby("Customer ID")["first_items_time"].diff()
which returns:
489434 NaT
489435 0 days 00:01:00
489436 NaT
489437 NaT
489438 NaT
...
581582 0 days 00:01:00
581583 8 days 01:05:00
581584 0 days 00:02:00
581585 10 days 20:41:00
581586 14 days 02:27:00
Name: first_items_time, Length: 40505, dtype: timedelta64[ns]
Thank you for your question.
You can use the groupby_trans_primitives argument in the call to dfs.
Here is an example:
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="invoices",
    agg_primitives=[],
    groupby_trans_primitives=["diff_datetime"],
    return_types="all",
    verbose=True,
)
The return_types argument is required since DiffDatetime returns a Feature with Timedelta logical type. Without specifying return_types="all", DeepFeatureSynthesis will only return Features with numeric, categorical, and boolean data types.
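If it helps, you can check what came back by listing the generated feature names and pulling out the timedelta column from the matrix. This is just a quick inspection sketch; the exact generated feature name may vary by featuretools version:

# List the generated feature definitions.
for feature in feature_defs:
    print(feature.get_name())

# The Diff feature shows up as a timedelta64 column in the matrix.
print(feature_matrix.select_dtypes(include="timedelta64").head())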

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

Hi, I'm trying to run a fuzzy matcher script, but the following error appears:
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
This happens even though the columns I want to match have the same type. The variables in question are P009_01, P009_02, P009_03 on the left dataframe and apellido1, apellido2, nombres on the right dataframe.
df1.dtypes
P001 int64
P002 int64
P003 int64
P017 object
P020_01 float64
WP109 int64
WP114 int64
WP115 int64
P009_01 object
P009_02 object
P009_03 object
tenure_security float64
inv2 object
P086_01 object
inv7 object
inv11 object
P121 float64
c_ubigeo int64
dtype: object
and the other one:
df2.dtypes
COD_PREDIO object
AREA_HA float64
DEPARTAMENTO object
PREDIO object
NUM_PREDIO int64
ID_DIST int64
ID_PROV int64
P001 int64
P009_01 object
P009_02 object
P009_03 object
P002 int64
P003 int64
p_ubigeo int64
ubigeo int64
apellido1 object
apellido2 object
nombres object
PETT_0 int64
dtype: object
The variables needed for matching have the same type, so I don't see where the problem is. Can anyone spot it?
and here's the code:
matched_results = fuzzymatcher.fuzzy_left_join(df1,
                                               df2,
                                               left_on,
                                               right_on,
                                               left_id_col='P003',
                                               right_id_col='ID_DIST')
Am I missing something?
Thanks in advance!
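Not an answer from the thread, but one workaround often suggested for this fuzzymatcher error is to force the id columns to a common string dtype before the join, since the error comes from an internal merge on mixed dtypes. A sketch, with left_on and right_on reconstructed from the column names in the question:

# Hypothetical workaround: cast the id columns to strings so fuzzymatcher's
# internal merge does not mix object and int64 keys.
df1['P003'] = df1['P003'].astype(str)
df2['ID_DIST'] = df2['ID_DIST'].astype(str)

left_on = ['P009_01', 'P009_02', 'P009_03']
right_on = ['apellido1', 'apellido2', 'nombres']

matched_results = fuzzymatcher.fuzzy_left_join(df1,
                                               df2,
                                               left_on,
                                               right_on,
                                               left_id_col='P003',
                                               right_id_col='ID_DIST')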

raise OverflowError("Overflow in int64 addition") OverflowError: Overflow in int64 addition

I am trying to calculate age from the minadmit and date of birth columns.
I tried this :
patient_admission['minadmit'] = pd.to_datetime(patient_admission['minadmit'], infer_datetime_format=True)
patient_admission['DOB'] = pd.to_datetime(patient_admission['DOB'], infer_datetime_format=True)
print("*******************")
print(patient_admission['minadmit'])
print("*******************")
print(patient_admission['DOB'])
And this is the result :
*******************
0 2149-12-17 20:41:00
1 2149-12-17 20:41:00
2 2149-12-17 20:41:00
3 2188-11-12 09:22:00
4 2110-07-27 06:46:00
...
58971 2111-09-30 12:04:00
58972 2161-07-15 12:00:00
58973 2135-01-06 07:15:00
58974 2129-01-03 07:15:00
58975 2149-06-08 15:21:00
Name: minadmit, Length: 58976, dtype: datetime64[ns]
*******************
0 2075-03-13
1 2075-03-13
2 2075-03-13
3 2164-12-27
4 2090-03-15
...
58971 2026-05-25
58972 2124-07-27
58973 2049-11-26
58974 2076-07-25
58975 2098-07-25
Name: DOB, Length: 58976, dtype: datetime64[ns]
After that, I just write this :
patient_admission['age'] = list(map(lambda x: x.days , (patient_admission['minadmit'] - patient_admission['DOB'])/365.242 ))
I have this error :
raise OverflowError("Overflow in int64 addition") OverflowError: Overflow in int64 addition
What is the cause of this error, and how can I fix it?
Try this:
patient_admission['date_of_admission'] = pd.to_datetime(patient_admission['date_of_admission']).dt.date
patient_admission['DOB'] = pd.to_datetime(patient_admission['DOB']).dt.date
Along with that, I am not sure whether this alone will fix it, but you may also need to use x.dt.days in your lambda function.
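A vectorized alternative that avoids mapping over individual Timedelta objects is to take the day difference first and then divide. This is a sketch assuming both columns are already datetime64, as shown in the question:

# Difference in days as an integer Series, then convert to (approximate) years.
age_days = (patient_admission['minadmit'] - patient_admission['DOB']).dt.days
patient_admission['age'] = age_days / 365.242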

How to manually link in Brightway2 an imported exchange, given I have found the correct one in ecoinvent

I have been linking my data automatically with
import functools
from bw2data import Database
from bw2io.strategies import link_iterable_by_fields

sp.apply_strategy(functools.partial(
    link_iterable_by_fields,
    other=Database("ecoinvent 3.2 cutoff"),
    kind="technosphere",
    fields=["reference product", "name", "unit", "location"]
))
sp.statistics()
When I list the remaining unlinked datasets with
bw2io.importers.simapro_csv.SimaProCSVImporter
it outputs e.g.:
Electricity, low voltage {ENTSO-E}| market group for | Alloc Rec, U kilowatt hour ('Electricity/heat',)
Given that I found the dataset in ecoinvent:
'market group for electricity, low voltage' (kilowatt hour, ENTSO-E, None)
How do I link these datasets together?
This is a dataset from ecoinvent 3.2, for which bw2io does not yet have the migration data for the "special" SimaPro names. Normally conversion from Simapro names (e.g. Electricity, low voltage {ENTSO-E}| market group for | Alloc Rec, U) to ecoinvent activity names and reference products would be handled by the migration simapro-ecoinvent-3. But this doesn't work in this case:
In [4]: Migration('simapro-ecoinvent-3').load()['Electricity, low voltage {ENTSO-E}| market group for | Alloc Rec, U']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
You can write your own migration:
migration_data = {
    'fields': ['name'],
    'data': [
        (
            # First element is input data in the order of `fields` above
            ('Electricity, low voltage {ENTSO-E}| market group for | Alloc Rec, U',),
            # Second element is new values
            {
                'name': 'market group for electricity, low voltage',
                'reference product': 'electricity, low voltage',
                'location': 'ENTSO-E',
            }
        )
    ]
}
Migration("new-ecoinvent").write(
    migration_data,
    description="New datasets in ecoinvent 3.2"
)
And then apply this migration to your unlinked data:
sp.migrate("new-ecoinvent")
Migration only changes the data used to link; you will still have to apply link_iterable_by_fields to actually link against ecoinvent 3.2.
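For completeness, the relinking step can reuse the same strategy call from the question once the migration has been applied:

sp.migrate("new-ecoinvent")
# Re-run the same linking strategy so the migrated names can now match.
sp.apply_strategy(functools.partial(
    link_iterable_by_fields,
    other=Database("ecoinvent 3.2 cutoff"),
    kind="technosphere",
    fields=["reference product", "name", "unit", "location"]
))
sp.statistics()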
