Can you start Myrrix with a pre-calculated model?

I noticed Myrrix creates a file in a tmp directory that looks like a model.
Can I start Myrrix from this file, to save time and avoid re-ingesting the data?
Sat Jan 18 10:03:09 EST 2014 INFO Writing model to /tmp/DelegateGenerationManager7633240206665163912.bin.gz
Sat Jan 18 10:03:55 EST 2014 INFO Done, moving into place at /tmp/1390056408253-0/model.bin.gz
Sat Jan 18 10:03:57 EST 2014 INFO Pruning old entries...
Sat Jan 18 10:03:57 EST 2014 INFO Recomputing generation state...

Yes, you can. You just put a model.bin.gz in the local directory that it's running over. That could be a model you saved separately. You could also create one manually, although that would require some hacking on the code to serialize your own model.
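As a minimal sketch, assuming the Serving Layer runs over a local data directory such as /data/myrrix (the --localInputDir flag and the jar name are from memory and may differ by version):

cp /tmp/1390056408253-0/model.bin.gz /data/myrrix/model.bin.gz
java -jar myrrix-serving-x.y.jar --port 8080 --localInputDir /data/myrrix

On startup the Serving Layer should then load the existing model.bin.gz instead of recomputing it from raw ingest data.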

Related

Bulk insert into Hyperledger Fabric keeps timing out

We're bulk inserting records into Hyperledger Fabric, but we are hitting a timeout issue. Even if we keep increasing the timeout, the same error simply occurs at a later point.
Each transaction inserts 1000 records using PutState in a loop over all of those records (blind inserts, nothing in the read-set). We have also increased BatchTimeout to 3s and MaxMessageCount to 100 so that we get larger blocks (we see 4 transactions per block, i.e. 4 × 1000 = 4000 records inserted into the ledger with every block).
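For reference, this is roughly what that write pattern looks like as chaincode. The sketch below uses the Fabric Java chaincode shim (the class, function, and key names are hypothetical; the poster's chaincode language isn't stated):

import java.nio.charset.StandardCharsets;
import java.util.List;
import org.hyperledger.fabric.shim.ChaincodeBase;
import org.hyperledger.fabric.shim.ChaincodeStub;

public class BulkInsertChaincode extends ChaincodeBase {

    @Override
    public Response init(ChaincodeStub stub) {
        return newSuccessResponse();
    }

    @Override
    public Response invoke(ChaincodeStub stub) {
        // args: a batch ID followed by the serialized records for this transaction.
        List<String> args = stub.getParameters();
        String batchId = args.get(0);
        for (int i = 1; i < args.size(); i++) {
            // Blind write: nothing is read first, so the read-set stays empty.
            stub.putState(batchId + ":" + i, args.get(i).getBytes(StandardCharsets.UTF_8));
        }
        return newSuccessResponse();
    }

    public static void main(String[] args) {
        new BulkInsertChaincode().start(args);
    }
}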
Our assumption is that when the CouchDB bulk update fails and the peer has to retry each of the 1000 records in a transaction separately, the queries take too long and overshoot the timeout. We also found https://jira.hyperledger.org/browse/FAB-10558, but it says the issue was already fixed in v1.2.0, which is the version we are using.
The error we get is net/http: request canceled (Client.Timeout exceeded while reading body), as shown in the logs below.
We tried setting the following environment variable in the peer container:
CORE_CHAINCODE_EXECUTETIMEOUT=120s
And also req.setProposalWaitTime(120 * 1000) when using the Java SDK.
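For context, setting the proposal wait time with the Fabric Java SDK looks roughly like this (chaincode and function names are hypothetical; client and channel are assumed to be initialized elsewhere):

import java.util.Collection;
import org.hyperledger.fabric.sdk.ChaincodeID;
import org.hyperledger.fabric.sdk.Channel;
import org.hyperledger.fabric.sdk.HFClient;
import org.hyperledger.fabric.sdk.ProposalResponse;
import org.hyperledger.fabric.sdk.TransactionProposalRequest;

public class ProposalTimeoutExample {
    static Collection<ProposalResponse> sendBatch(HFClient client, Channel channel, String[] records)
            throws Exception {
        TransactionProposalRequest req = client.newTransactionProposalRequest();
        req.setChaincodeID(ChaincodeID.newBuilder().setName("bulkcc").build()); // hypothetical chaincode name
        req.setFcn("insertBatch");                                              // hypothetical function name
        req.setArgs(records);
        req.setProposalWaitTime(120 * 1000L); // wait up to 120 s for endorsements, in milliseconds
        return channel.sendTransactionProposal(req);
    }
}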
But then we just get the same timeout error at a later point. We could keep increasing the timeout value, but we believe the error would recur eventually. Is the time required for inserting into CouchDB proportional to the number of records already in CouchDB? Perhaps updating the index takes longer as the number of documents increases?
The runtime error log that we get (after 2-4 million or so records have been inserted) is as below:
October 5th 2018, 04:36:38.646
panic: Error during commit to txmgr:net/http: request canceled (Client.Timeout exceeded while reading body)
goroutine 283 [running]:
github.com/hyperledger/fabric/core/ledger/kvledger.(*kvLedger).CommitWithPvtData(0xc421fb1860, 0xc451e4f470, 0x0, 0x0)
	/opt/gopath/src/github.com/hyperledger/fabric/core/ledger/kvledger/kv_ledger.go:273 +0x870
github.com/hyperledger/fabric/core/committer.(*LedgerCommitter).CommitWithPvtData(0xc4222db8c0, 0xc451e4f470, 0xc4312ddd40, 0xdf8475800)
	/opt/gopath/src/github.com/hyperledger/fabric/core/committer/committer_impl.go:105 +0x6b
github.com/hyperledger/fabric/gossip/privdata.(*coordinator).StoreBlock(0xc422286e60, 0xc42462cd80, 0x0, 0x0, 0x0, 0xc4312dde78, 0x7329db)
	/opt/gopath/src/github.com/hyperledger/fabric/gossip/privdata/coordinator.go:236 +0xc3b
github.com/hyperledger/fabric/gossip/state.(*GossipStateProviderImpl).commitBlock(0xc4220c5a00, 0xc42462cd80, 0x0, 0x0, 0x0, 0x0, 0x0)
	/opt/gopath/src/github.com/hyperledger/fabric/gossip/state/state.go:771 +0x6c
github.com/hyperledger/fabric/gossip/state.(*GossipStateProviderImpl).deliverPayloads(0xc4220c5a00)
	/opt/gopath/src/github.com/hyperledger/fabric/gossip/state/state.go:558 +0x3c5
created by github.com/hyperledger/fabric/gossip/state.NewGossipStateProvider
	/opt/gopath/src/github.com/hyperledger/fabric/gossip/state/state.go:239 +0x681
October 5th 2018, 04:36:03.645 2018-10-04 20:36:00.783 UTC [kvledger] CommitWithPvtData -> INFO 466e Channel [mychannel]: Committed block [1719] with 4 transaction(s)
October 5th 2018, 04:35:56.644 2018-10-04 20:35:55.807 UTC [statecouchdb] commitUpdates -> WARN 465c CouchDB batch document update encountered an problem. Retrying update for document ID:32216027-da66-4ecd-91a1-a37bdf47f07d
October 5th 2018, 04:35:56.644 2018-10-04 20:35:55.866 UTC [statecouchdb] commitUpdates -> WARN 4663 CouchDB batch document update encountered an problem. Retrying update for document ID:6eaed2ae-e5c4-48b1-b063-20eb3009969b
October 5th 2018, 04:35:56.644 2018-10-04 20:35:55.870 UTC [statecouchdb] commitUpdates -> WARN 4664 CouchDB batch document update encountered an problem. Retrying update for document ID:2ca2fbcc-e78f-4ed0-be70-2c4d7ecbee69
October 5th 2018, 04:35:56.644 2018-10-04 20:35:55.904 UTC [statecouchdb] commitUpdates -> WARN 4667 CouchDB batch document update encountered an problem. ... and so on
2018-10-04 20:35:55.870 UTC [statecouchdb] commitUpdates -> WARN 4664 CouchDB batch document update encountered an problem. Retrying update for document ID:2ca2fbcc-e78f-4ed0-be70-2c4d7ecbee69
The above suggests that the POST to http://localhost:5984/db/_bulk_docs failed, and so the individual documents were retried separately.
Looking at the different parameters available to configure, increasing requestTimeout under the ledger section of the peer's core.yaml might be worth a shot.
This can be done by setting the following environment variable in the docker-compose file for your peer container:
CORE_LEDGER_STATE_COUCHDBCONFIG_REQUESTTIMEOUT=100s
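For example, in the peer's service definition (the service name below is hypothetical):

services:
  peer0.org1.example.com:
    environment:
      - CORE_LEDGER_STATE_STATEDATABASE=CouchDB
      - CORE_LEDGER_STATE_COUCHDBCONFIG_REQUESTTIMEOUT=100s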
The name of the environment variable associated with a configuration parameter can be derived as described in this answer.
Configuring CORE_CHAINCODE_EXECUTETIMEOUT and proposalWaitTime might not have had an effect because a different connection downstream (here, the HTTP connection between the peer and CouchDB) was timing out, and that timeout exception was then propagated up.

Azure minute/hourly metric collection issue using SDK

Hi, I'm trying to get the hourly/minute metrics using the SDK provided by Azure, but I'm confused. I got the result below; isn't the PartitionKey the time?
PartitionKey=20170605T1531 RowKey=system;All TimeStamp=Mon Jun 05 21:02:24 IST 2017
PartitionKey=20170605T1533 RowKey=system;All TimeStamp=Mon Jun 05 21:04:23 IST 2017
PartitionKey=20170605T1539 RowKey=system;All TimeStamp=Mon Jun 05 21:10:24 IST 2017
PartitionKey=20170605T1540 RowKey=system;All TimeStamp=Mon Jun 05 21:11:24 IST 2017
Please explain the concept of, and difference between, PartitionKey and Timestamp?
Please explain the concept of, and difference between, PartitionKey and Timestamp?
In the Azure Table service, every entity has three system properties: a partition key, a row key, and a timestamp. The Timestamp column tracks when the entity was last updated; the service maintains it automatically, and you cannot overwrite it with an arbitrary value. In the storage analytics metrics tables specifically, the PartitionKey does encode the time: it is the time bucket the metrics were captured for (e.g. 20170605T1531 means 15:31 UTC on June 5, 2017), while Timestamp is merely when that metrics row was last written by the service, which is why the two differ.
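A minimal sketch of listing those metrics rows with the (legacy) Azure Storage Java SDK; the table name is one of the standard analytics tables, and the connection string is assumed to be in an environment variable:

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.table.CloudTable;
import com.microsoft.azure.storage.table.DynamicTableEntity;
import com.microsoft.azure.storage.table.TableQuery;

public class MetricsDump {
    public static void main(String[] args) throws Exception {
        CloudStorageAccount account =
                CloudStorageAccount.parse(System.getenv("AZURE_STORAGE_CONNECTION_STRING"));
        // Minute metrics for the blob service live in this analytics table.
        CloudTable table = account.createCloudTableClient()
                .getTableReference("$MetricsMinutePrimaryTransactionsBlob");
        for (DynamicTableEntity e : table.execute(TableQuery.from(DynamicTableEntity.class))) {
            // PartitionKey = the time bucket the metrics describe;
            // Timestamp = when the service last wrote this row.
            System.out.printf("PartitionKey=%s RowKey=%s Timestamp=%s%n",
                    e.getPartitionKey(), e.getRowKey(), e.getTimestamp());
        }
    }
}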

Node.js and MongoDB Time Zone Issue UTC not being converted correctly by driver?

I have a strange thing occurring and I hope someone can point out what I am missing.
In MongoDB I have a field DT that is of Type Date
An example of what the date looks like in MongoDB is 2014-10-01 10:28:04.329-04:00
When I query MongoDB from Node.js using MongoClient, Node.js is returning this:
2014-10-01T14:28:04.329Z
As I understand it, the driver is supposed to convert UTC to local time. In my case that should be Eastern Time (EDT). Why would Node be adding 4 hours instead?
I am loading the date into MongoDB from Java using the Java driver. The variable is set using
new Date();
Node isn't adding 4 hours. Both show exactly the same instant.
2014-10-01 10:28:04.329-04:00
is exactly the same as
2014-10-01T14:28:04.329Z
only the first is rendered in the EDT time zone, which has a -04:00 offset from UTC (so it is four hours earlier there), and the second is rendered in UTC.
Probably your server is configured for EDT and your client is set to UTC, or the other way around.
Unless you need the exact same strings, I wouldn't worry about it.
Or, even better, set both the client and server machine to the same timezone, preferably UTC.
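A quick way to convince yourself that the two strings denote the same instant is to format one Date in both zones. A small Java sketch (the epoch millis below correspond to the timestamp in the question):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SameInstant {
    public static void main(String[] args) {
        Date d = new Date(1412173684329L); // the instant from the question
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        System.out.println(fmt.format(d)); // 2014-10-01T14:28:04.329Z
        fmt.setTimeZone(TimeZone.getTimeZone("America/New_York"));
        System.out.println(fmt.format(d)); // 2014-10-01T10:28:04.329-04:00
    }
}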

How to convert UTC Date Time to Local Date time without TimeZoneInfo class?

I want to convert a UTC date-time to a local date-time myself, without using .NET's TimeZoneInfo or other classes for this.
I know Tehran has a GMT offset of +03:30, so I use the code below to convert a UTC date-time to Tehran time (my local computer is in this location):
DateTime dt = DateTime.UtcNow.AddHours(3.30);
It shows a time like 5/2/2014 8:32:05 PM, but Tehran time is 5/2/2014 9:32:05 PM; there is a one-hour difference.
How can I fix it?
I know Tehran has a GMT offset of +03:30
Well, that's its offset from UTC in standard time, but it's currently observing daylight saving time (details). So the current UTC offset is actually +04:30, hence the difference of an hour.
I suspect you're really off by more than an hour though, as you're adding an offset of 3.3 hours, which is 3 hours and 18 minutes. The literal 3.30 doesn't mean "3 hours and 30 minutes"; it means 3.30 as a double literal. If you want 3 hours and 30 minutes, that's 3 and a half hours, so you'd need to use 3.5 instead. The time in Tehran when you posted was 9:46 PM... so I suspect you actually ran the code at 9:44 PM.
This sort of thing is why you should really, really, really use a proper time-zone-aware system rather than trying to code it yourself. Personally I wouldn't use TimeZoneInfo - I'd use my Noda Time library, which allows you to use either the Windows time zones via TimeZoneInfo or the IANA time zone database. The latter - also known as Olson, TZDB, or zoneinfo - is the most commonly used time zone database on non-Windows platforms.
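The question is about .NET, but the same principle shown in Java (to match the other examples on this page): let a time-zone-aware API apply the IANA rules instead of hand-adding a fixed offset.

import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class TehranTime {
    public static void main(String[] args) {
        Instant now = Instant.now();
        // The offset for Asia/Tehran (including any DST in force at this instant)
        // comes from the IANA time zone database, not from a hand-coded +03:30.
        ZonedDateTime tehran = now.atZone(ZoneId.of("Asia/Tehran"));
        System.out.println(tehran);
        System.out.println(tehran.getOffset()); // +03:30, or +04:30 while DST was observed
    }
}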

What is the minimum dataset size for the Myrrix Serving Layer?

I am okay with the example dataset from Audioscrobbler, which totals 75K users and 50K items. But mine is tiny, since I am at the start of the road, so I would be happy to know the minimum dataset that can be used with Myrrix. The reason for asking is this warning:
INFO: Converged
Aug 14, 2013 10:15:41 PM net.myrrix.online.generation.DelegateGenerationManager$RefreshCallable runFactorization
INFO: Factorization complete
Aug 14, 2013 10:15:41 PM net.myrrix.online.generation.Generation recomputeSolver
WARNING: X'*X or Y'*Y has small inf norm (0.9254986853162671); try decreasing model.als.lambda
Aug 14, 2013 10:15:41 PM net.myrrix.online.generation.DelegateGenerationManager$RefreshCallable call
WARNING: Unable to compute a valid generation yet; waiting for more data
Thank you to everybody who can assist.
I was able to ingest a file with only 10 lines of associations.
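For reference, Myrrix ingests CSV lines of the form userID,itemID or userID,itemID,value; a hypothetical 10-line file of that kind might look like:

1,101
1,102
1,103
2,101
2,104
3,102
3,103
3,105
4,104
4,105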
By the way, Myrrix is migrating to Oryx now; you can ask Sean Owen at https://groups.google.com/a/cloudera.org/forum/#!forum/oryx-user
