What is the value of nested runs in MLflow? I thought a child run would inherit the params of its parent, but I don't see that:
import mlflow

with mlflow.start_run(run_name='myrun'):
    mlflow.log_param('kl', '0p0')
    mlflow.log_param('name', 'ios')
    mlflow.log_metric('mu', 1.0)
    with mlflow.start_run(run_name='myrun2', nested=True):
        mlflow.log_param('name', 'weighted')
        mlflow.log_metric('mu', 2.0)
If I collect the run info in Python:
df = mlflow.search_runs()
then evaluating
df['params.kl']
gives
0 None
1 0p0
Name: params.kl, dtype: object
From my understanding, the purpose of nested runs is to track a collection of model trainings within a single run. This gives the structure: experiment --> run 1, run 2, run 3, ... --> run 1-1, run 1-2, run 2-1, run 2-2, run 3-1, run 3-2, ...
In other words, the parent/outer mlflow.start_run creates a first-level run in the experiment, and the child/nested mlflow.start_run creates a second-level run under it. Nested runs do not inherit the parent's params, which is why params.kl is None in the child's row.
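MLflow links a nested run to its parent through the `mlflow.parentRunId` tag, which `mlflow.search_runs()` exposes as the column `tags.mlflow.parentRunId`. If you want child rows to show the parent's params, one option is to fill them in after the fact. The sketch below works on plain records (for example `mlflow.search_runs().to_dict("records")`); the helper name `inherit_parent_params` is hypothetical, and a child's own params always win over the parent's.

```python
import math
from typing import Any, Dict, List

PARENT_TAG = "tags.mlflow.parentRunId"  # column search_runs() uses for the parent link


def _is_missing(value: Any) -> bool:
    # search_runs() shows absent params as None; treat NaN the same way
    return value is None or (isinstance(value, float) and math.isnan(value))


def inherit_parent_params(runs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of `runs` with missing params.* values filled from ancestor runs."""
    by_id = {run["run_id"]: run for run in runs}
    resolved: Dict[str, Dict[str, Any]] = {}

    def resolve(run_id: str) -> Dict[str, Any]:
        if run_id in resolved:
            return resolved[run_id]
        run = dict(by_id[run_id])  # copy so the caller's records stay untouched
        parent_id = run.get(PARENT_TAG)
        if isinstance(parent_id, str) and parent_id in by_id:
            parent = resolve(parent_id)  # parent already carries its own ancestors' params
            for key, value in parent.items():
                if key.startswith("params.") and _is_missing(run.get(key)):
                    run[key] = value
        resolved[run_id] = run
        return run

    return [resolve(run["run_id"]) for run in runs]
```

A simpler alternative is to log the shared params again inside the child run with `mlflow.log_params(...)`, so they are stored on the child itself.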
I have two GPUs and when I run
import torch
print('count: ', torch.cuda.device_count()) # prints count: 2
However, my model throws an error
RuntimeError: Attempting to deserialize object on CUDA device 2 but torch.cuda.device_count() is 1
on the line
torch.load(model_path, map_location='cuda:1')
What could cause it and how to fix it?
This issue seems to be linked to Flask, because the training code itself works with torch.load(model_path, map_location='cuda:1').
This is a known Flask-CUDA issue. Run your Flask app with
print('count: ', torch.cuda.device_count())
and check whether you see
count: 2
reloading
count: 1
If so, disable the reloader: app.run(..., use_reloader=False)
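The error means torch.load was asked to place tensors on a GPU index that is not below `torch.cuda.device_count()` in the current process. As a defensive sketch (not a fix for the reloader itself), you could check the index before loading and fall back to the CPU with a warning; `pick_map_location` is a hypothetical helper, used as `torch.load(model_path, map_location=pick_map_location(1, torch.cuda.device_count()))`.

```python
import warnings


def pick_map_location(requested_index: int, device_count: int) -> str:
    """Return 'cuda:<i>' if GPU <i> is visible to this process, otherwise 'cpu'."""
    if 0 <= requested_index < device_count:
        return f"cuda:{requested_index}"
    # torch.load fails in this situation (as in the error above), so fall back loudly
    warnings.warn(
        f"cuda:{requested_index} requested but only {device_count} GPU(s) visible; using CPU"
    )
    return "cpu"
```

Falling back silently would hide the real problem, which is why the sketch warns; in production you may prefer to raise instead.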
You say:
print('count: ', torch.cuda.device_count()) # prints count: 2
But the error says:
torch.cuda.device_count() is 1
Could you confirm that you run the two in the same worker?
Edit: judging by the message I got when I tried to assign a tensor to the wrong GPU, this could be due to asynchronous CUDA calls. You can debug with os.environ['CUDA_LAUNCH_BLOCKING'] = '1', set before CUDA is initialized (for example, before importing torch).
I am implementing an anomaly detection web service using MLflow and sklearn.pipeline.Pipeline(). The model's aim is to detect web crawlers from server logs, and the response_length column is one of my features. After serving the model, I test the web service by sending the request below, which contains the first 20 columns of the training data.
$ curl --location --request POST '127.0.0.1:8000/invocations' \
--header 'Content-Type: text/csv' \
--data-binary '@datasets/test.csv'
But the web server responds with status code 400 (BAD REQUEST) and this JSON body:
{
"error_code": "BAD_REQUEST",
"message": "Incompatible input types for column response_length. Can not safely convert float64 to <U0."
}
Here is the MLflow Tracking log from training the model:
[Pipeline] ......... (step 1 of 3) Processing transform, total=11.8min
[Pipeline] ............... (step 2 of 3) Processing pca, total= 4.8s
[Pipeline] ........ (step 3 of 3) Processing rule_based, total= 0.0s
2021/07/16 04:55:12 WARNING mlflow.sklearn: Training metrics will not be recorded because training labels were not specified. To automatically record training metrics, provide training labels as inputs to the model training function.
2021/07/16 04:55:12 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/home/matin/workspace/Rahnema College/venv/lib/python3.8/site-packages/mlflow/models/signature.py:129: UserWarning: Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details."
Logged data and model in run: 8843336f5c31482c9e246669944b1370
---------- logged params ----------
{'memory': 'None',
'pca': 'PCAEstimator()',
'rule_based': 'RuleBasedEstimator()',
'steps': "[('transform', <log_transformer.LogTransformer object at "
"0x7f05a8b95760>), ('pca', PCAEstimator()), ('rule_based', "
'RuleBasedEstimator())]',
'transform': '<log_transformer.LogTransformer object at 0x7f05a8b95760>',
'verbose': 'True'}
---------- logged metrics ----------
{}
---------- logged tags ----------
{'estimator_class': 'sklearn.pipeline.Pipeline', 'estimator_name': 'Pipeline'}
---------- logged artifacts ----------
['model/MLmodel',
'model/conda.yaml',
'model/model.pkl',
'model/requirements.txt']
Could anyone tell me exactly how to fix this model-serving problem?
The problem comes from the inferred input schema, which the mlflow.utils.autologging_utils WARNING hints at.
When the model is logged, its input signature (column names and types) is saved in the MLmodel file.
You should change the signature input type of response_length from string to double, i.e. replace
{"name": "response_length", "type": "string"}
with
{"name": "response_length", "type": "double"}
so that no conversion is needed. After serving the model with the edited MLmodel file, the web server worked as expected.
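If you need to make this edit repeatedly, it can be scripted. The sketch below assumes the `inputs` entry of the MLmodel `signature` section holds a JSON list of column specs like the ones shown above; `retype_signature_column` is a hypothetical helper that rewrites one column's type and returns the new JSON string to paste back.

```python
import json


def retype_signature_column(inputs_json: str, column: str, new_type: str) -> str:
    """Return `inputs_json` with the type of `column` replaced by `new_type`."""
    specs = json.loads(inputs_json)  # e.g. [{"name": "...", "type": "..."}, ...]
    found = False
    for spec in specs:
        if spec.get("name") == column:
            spec["type"] = new_type
            found = True
    if not found:
        raise KeyError(f"column {column!r} not found in signature")
    return json.dumps(specs)
```

A more durable route is to avoid the string type in the first place: if response_length is stored as strings/objects in the training DataFrame, converting it with `astype(float)` before the model is logged should make the inferred type double.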
I'm working with the Search Console API and have made it through the basics.
Now I'm stuck on splitting and arranging the data:
when I try to split it, I get NaN, and nothing I try works.
46 ((174.0, 3753.0, 0.04636290967226219, 7.816147...
47 ((93.0, 2155.0, 0.0431554524361949, 6.59025522...
48 ((176.0, 4657.0, 0.037792570324243074, 6.90251...
49 ((20.0, 1102.0, 0.018148820326678767, 7.435571...
50 ((31.0, 1133.0, 0.02736098852603707, 8.0935569...
Name: test, dtype: object
When I try to manipulate the data like this (and with similar operations):
data = source['test'].tolist()
data
it's clear that the data isn't directly accessible:
[<searchconsole.query.Report(rows=1)>,
<searchconsole.query.Report(rows=1)>,
<searchconsole.query.Report(rows=1)>,
<searchconsole.query.Report(rows=1)>,
<searchconsole.query.Report(rows=1)>]
Does anyone have an idea how I can interact with my data?
Thanks.
For reference, this is the code and the library I'm working with:
import searchconsole

account = searchconsole.authenticate(client_config='client_secrets.json', credentials='credentials.json')
webproperty = account['https://www.example.com/']

def APIsc(date, keyword):
    # query the 30 days ending at `date` for search queries containing `keyword`
    results = webproperty.query.range(date, days=-30).filter('query', keyword, 'contains').get()
    return results

# source: existing DataFrame with 'date' and 'keyword' columns
source['test'] = source.apply(lambda x: APIsc(x.date, x.keyword), axis=1)
source
Library: https://github.com/joshcarty/google-searchconsole
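Each cell in source['test'] holds a Report object rather than numbers, so the values have to be pulled out of the reports before they can be split into columns. A minimal sketch, assuming a Report can be iterated over its rows and that each row (with no dimensions set) is a tuple of (clicks, impressions, ctr, position), as the printed values suggest (174 / 3753 ≈ 0.0464 matches the ctr). `flatten_reports` and `FIELDS` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Sequence

# Assumed metric order, inferred from the printed rows: clicks, impressions, ctr, position
FIELDS = ("clicks", "impressions", "ctr", "position")


def flatten_reports(
    reports: Iterable[Iterable[Sequence[float]]],
) -> List[Dict[str, Optional[float]]]:
    """Turn one report per DataFrame row into one flat dict per row."""
    records: List[Dict[str, Optional[float]]] = []
    for report in reports:
        rows = list(report)  # assumes a Report iterates over its result rows
        if not rows:
            records.append({field: None for field in FIELDS})  # no data for this row
            continue
        # Each report above has rows=1, so only the first row is kept
        records.append(dict(zip(FIELDS, rows[0])))
    return records
```

With pandas, `source.join(pd.DataFrame(flatten_reports(source['test']), index=source.index))` would add the four metrics as columns. The library's README also documents a `to_dataframe()` method on reports, which may be simpler if your version has it.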
I'm using TitanGraphDB + Cassandra. I'm starting Titan as follows:
cd titan-cassandra-0.3.1
bin/titan.sh config/titan-server-rexster.xml config/titan-server-cassandra.properties
I have a Rexster shell that I can use to communicate with the Titan + Cassandra setup above.
cd rexster-console-2.3.0
bin/rexster-console.sh
I'm attempting to model a network topology using Titan Graph DB. I want to program Titan Graph DB from my Python program, and I'm using the bulbs package for that.
I create five types of vertices:
- switch
- port
- device
- flow
- flow_entry
I create edges between vertices that are connected logically. The edges are not labelled.
Let us say I want to test the connectivity between vertex A and vertex B.
I have a Groovy script, is_connected.groovy:
def isConnected (portA, portB) {
    return portA.both().retain([portB]).hasNext()
}
Now, from my Rexster console:
g = rexster.getGraph("graph")
==>titangraph[embeddedcassandra:null]
rexster[groovy]> g.V('type', 'flow')
==>v[116]
==>v[100]
rexster[groovy]> g.V('type', 'flow_entry')
==>v[120]
==>v[104]
As you can see above, I have two vertices of type flow, v[116] and v[100],
and two vertices of type flow_entry, v[120] and v[104].
I want to check, for example, the connectivity between v[120] and v[116]:
rexster[groovy]> ?e is_connected.groovy
==>null
rexster[groovy]> is_connected(g.v[116],g.v[120])
==>An error occurred while processing the script for language [groovy]. All transactions across all graphs in the session have been concluded with failure: java.util.concurrent.ExecutionException: javax.script.ScriptException: javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: v for class: com.thinkaurelius.titan.graphdb.database.StandardTitanGraph
Either I am doing something very wrong, or I am missing something obvious. It would be great if you could point me in the right direction.
This syntax is not valid Groovy:
is_connected(g.v[116],g.v[120])
should be (note that the function is defined as isConnected):
isConnected(g.v(116), g.v(120))
You're mixing up Python syntax with Gremlin-Groovy syntax:
You defined the Groovy script as:
def isConnected (portA, portB) {
    return portA.both().retain([portB]).hasNext()
}
...so...
rexster[groovy]> is_connected(g.v[116], g.v[120])
...should be...
rexster[groovy]> isConnected(g.v(116), g.v(120))
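Note that `portA.both().retain([portB]).hasNext()` only checks whether portB is a direct neighbour of portA (one hop, in either edge direction); it does not look for longer paths. The plain-Python model below illustrates the difference. It is not the Titan or bulbs API, and the edge list in the check (including vertex 200) is made up for illustration.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def build_adjacency(edges: Iterable[Edge]) -> Dict[int, Set[int]]:
    adj: Dict[int, Set[int]] = {}
    for out_v, in_v in edges:
        # both(): follow edges regardless of direction
        adj.setdefault(out_v, set()).add(in_v)
        adj.setdefault(in_v, set()).add(out_v)
    return adj


def is_adjacent(adj: Dict[int, Set[int]], a: int, b: int) -> bool:
    """Mirrors portA.both().retain([portB]).hasNext(): one hop only."""
    return b in adj.get(a, set())


def is_reachable(adj: Dict[int, Set[int]], a: int, b: int) -> bool:
    """Breadth-first search: True if any undirected path links a and b."""
    if a == b:
        return True
    seen = {a}
    queue = deque([a])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, set()):
            if w == b:
                return True
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False
```

So if a flow and a flow_entry vertex are linked only through an intermediate vertex, isConnected would return false even though they are connected in the graph.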
I am using SVM-light for binary classification, running it in learning mode.
I have my train.dat file ready, but when I run this command, instead of creating the model file, it prints the following to the terminal:
My command:
./svm_learn example1/train.dat example1/model
Output:
Scanning examples...done
Reading examples into memory...Feature numbers must be larger or equal to 1!!!
: Success
LINE: -1 0:1.0 6:1.0 16:1.0 18:1.0 28:1.0 29:1.0 31:1.0 48:1.0 58:1.0 73:1.0 82:1.0 93:1.0 95:1.0 106:1.0 108:1.0 118:1.0 121:1.0 122:1.0 151:1.0 164:1.0 167:1.0 169:1.0 170:1.0 179:1.0 190:1.0 193:1.0 220:1.0 237:1.0 250:1.0 252:1.0 267:1.0 268:1.0 269:1.0 278:1.0 283:1.0 291:1.0 300:1.0 305:1.0 320:1.0 332:1.0 336:1.0 342:1.0 345:1.0 348:1.0 349:1.0 350:1.0 368:1.0 370:1.0 384:1.0 390:1.0 394:1.0 395:1.0 396:1.0 397:1.0 400:1.0 401:1.0 408:1.0 416:1.0 427:1.0 433:1.0 435:1.0 438:1.0 441:1.0 446:1.0 456:1.0 471:1.0 485:1.0 510:1.0 523:1.0 525:1.0 526:1.0 532:1.0 540:1.0 553:1.0 567:1.0 568:1.0 581:1.0 583:1.0 604:1.0 611:1.0 615:1.0 616:1.0 618:1.0 623:1.0 624:1.0 626:1.0 651:1.0 659:1.0 677:1.0 678:1.0 683:1.0 690:1.0 694:1.0 699:1.0 713:1.0 714:1.0 720:1.0 722:1.0 731:1.0 738:1.0 755:1.0 761:1.0 763:1.0 768:1.0 776:1.0 782:1.0 792:1.0 817:1.0 823:1.0 827:1.0 833:1.0 834:1.0 838:1.0 842:1.0 848:1.0 851:1.0 863:1.0 867:1.0 890:1.0 900:1.0 903:1.0 923:1.0 935:1.0 942:1.0 946:1.0 947:1.0 949:1.0 956:1.0 962:1.0 965:1.0 968:1.0 983:1.0 986:1.0 987:1.0 990:1.0 998:1.0 1007:1.0 1014:1.0 1019:1.0 1022:1.0 1024:1.0 1029:1.0 1030:1.0 1032:1.0 1047:1.0 1054:1.0 1063:1.0 1069:1.0 1076:1.0 1085:1.0 1093:1.0 1098:1.0 1108:1.0 1109:1.0 1116:1.0 1120:1.0 1133:1.0 1134:1.0 1135:1.0 1138:1.0 1139:1.0 1144:1.0 1146:1.0 1148:1.0 1149:1.0 1161:1.0 1165:1.0 1169:1.0 1170:1.0 1177:1.0 1187:1.0 1194:1.0 1212:1.0 1214:1.0 1239:1.0 1243:1.0 1251:1.0 1257:1.0 1274:1.0 1278:1.0 1292:1.0 1297:1.0 1304:1.0 1319:1.0 1324:1.0 1325:1.0 1353:1.0 1357:1.0 1366:1.0 1374:1.0 1379:1.0 1392:1.0 1394:1.0 1407:1.0 1412:1.0 1414:1.0 1419:1.0 1433:1.0 1435:1.0 1437:1.0 1453:1.0 1463:1.0 1464:1.0 1469:1.0 1477:1.0 1481:1.0 1487:1.0 1506:1.0 1514:1.0 1519:1.0 1526:1.0 1536:1.0 1549:1.0 1551:1.0 1553:1.0 1561:1.0 1569:1.0 1578:1.0 1603:1.0 1610:1.0 1615:1.0 1617:1.0 1625:1.0 1638:1.0 1646:1.0 1663:1.0 1666:1.0 1672:1.0 1681:1.0 1690:1.0 1697:1.0 1699:1.0 1706:1.0 1708:1.0 1717:1.0 1719:1.0 1732:1.0 1737:1.0 1756:1.0 1766:1.0 1771:1.0 1789:1.0 1804:1.0 1805:1.0 1808:1.0 1814:1.0 1815:1.0 1820:1.0 1824:1.0 1832:1.0 1841:1.0 1844:1.0 1852:1.0 1861:1.0 1875:1.0 1899:1.0 1902:1.0 1904:1.0 1905:1.0 1917:1.0 1918:1.0 1919:1.0 1921:1.0 1926:1.0 1934:1.0 1937:1.0 1942:1.0 1956:1.0 1965:1.0 1966:1.0 1970:1.0 1971:1.0 1980:1.0 1995:1.0 2000:1.0 2009:1.0 2010:1.0 2012:1.0 2015:1.0 2018:1.0 2022:1.0 2047:1.0 2076:1.0 2082:1.0 2095:1.0 2108:1.0 2114:1.0 2123:1.0 2130:1.0 2133:1.0 2141:1.0 2142:1.0 2143:1.0 2148:1.0 2157:1.0 2160:1.0 2162:1.0 2170:1.0 2195:1.0 2199:1.0 2201:1.0 2202:1.0 2205:1.0 2211:1.0 2218:1.0
I don't know what to do.
P.S. When I make my train.dat much shorter, everything works fine!
Thank you
From what I can tell from the log, your training set has an issue.
The first few characters of the offending training row are
-1 0:1.0 6:1.0
The issue is not the file size but the feature indexing. You start your feature indices at 0 (0:1.0), whereas SVM-light requires every feature index to be greater than or equal to 1.
Change the indexing to start at 1 and it should work fine.
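One way to re-index an existing file is to add 1 to every feature index. The sketch below assumes lines in the usual SVM-light shape, `<target> <index>:<value> ... # optional comment`, with an optional `qid:` token that is left untouched; `shift_feature_indices` and `shift_file` are hypothetical helpers. Shifting every index by the same amount keeps them in increasing order.

```python
def shift_feature_indices(line: str, offset: int = 1) -> str:
    """Add `offset` to every feature index on one SVM-light line."""
    body, sep, comment = line.partition("#")
    tokens = body.split()
    if not tokens:
        return line.rstrip("\n")  # blank or comment-only line: leave as-is
    out = [tokens[0]]  # target label, e.g. -1 or 1
    for tok in tokens[1:]:
        key, _, value = tok.partition(":")
        if key == "qid":
            out.append(tok)  # query id is not a feature index
        else:
            out.append(f"{int(key) + offset}:{value}")
    result = " ".join(out)
    if sep:
        result += " #" + comment.rstrip("\n")  # keep the trailing comment
    return result


def shift_file(src: str, dst: str, offset: int = 1) -> None:
    """Write a re-indexed copy of an SVM-light data file."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(shift_feature_indices(line, offset) + "\n")
```

For example, `shift_file("example1/train.dat", "example1/train_shifted.dat")` (output name hypothetical) before running svm_learn. Apply the same shift to any file you later pass to svm_classify so the features line up.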