I would like to check how to convert UTC time to the GMT+8 timezone in Presto SQL.
http://teradata.github.io/presto/docs/127t/functions/datetime.html suggests using current_timezone, but I am not sure how.
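From the same docs, the AT TIME ZONE operator looks like the relevant piece; something along these lines is what I have in mind (utc_ts and t are placeholder names, and I have not verified this against my Presto version):

select utc_ts AT TIME ZONE 'Asia/Shanghai' from t; -- region name, follows DST rules
select utc_ts AT TIME ZONE '+08:00' from t;        -- fixed GMT+8 offset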
I have a Hive table that contains a timestamp field, and the values can be in any timezone (UTC/PST/CST/...).
I want to convert all of them to a single timezone, EST; it can be done either in Hive or PySpark.
Basically, I am using it in my PySpark application, which has grouping logic on this datetime field, and before doing that we want all the times in the Hive table converted to EST.
Sid
Note that Hive's TIMESTAMP type has a limitation on the maximum representable time, related to the Y2K38 bug, as well as a JDBC compatibility issue:
The TIMESTAMP type maps to serde2, which supports Unix timestamps (1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision, using both the LazyBinary and LazySimple SerDes.
For LazySimpleSerDe, the data is stored as JDBC-compliant, java.sql.Timestamp-parsable strings.
HIVE-2272
This relates to supporting timestamps earlier than 1970 and later than 2038.
Also, Hive JDBC doesn't support the TIMESTAMP column type.
Therefore, I think it will be better if you use the Hive DATE or STRING data type. Then you can persist any timezone offset as the default.
/* utc_timestamp is the column name */
/* below will convert a timestamp in UTC to the EST timezone */
select from_utc_timestamp(utc_timestamp, 'EST') from table1;
Hope this helps.
Hive Data Types
Sid, usually Hive uses the local timezone of the host where the data was written. The functions from_utc_timestamp() and to_utc_timestamp() can be very helpful. Instead of stating the timezone as UTC/EST, you should rather use a location/region name, since this will account for daylight saving time.
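In PySpark the same Hive functions are available through the DataFrame API. A minimal sketch, assuming a Hive table events with a timestamp column event_ts written in US Pacific time (all names here are placeholders):

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Normalize each value to UTC from the zone it was written in,
# then render it in Eastern time; region names handle DST correctly.
df = spark.table("events").withColumn(
    "event_ts_est",
    F.from_utc_timestamp(
        F.to_utc_timestamp(F.col("event_ts"), "America/Los_Angeles"),
        "America/New_York",
    ),
)

If the source zone differs per row, Spark 2.4+ also accepts a column for the zone argument; on older versions you would handle each zone separately.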
Here's a helpful link for more examples: Local Time Convert To UTC Time In Hive
In case you have further questions, please share what you have already tried, along with a sample snippet of your data, so we can investigate further.
We're bulk inserting records into Hyperledger Fabric, but we are hitting a timeout issue. Even if we keep increasing the timeout, the same error simply happens at a later point.
Each transaction inserts 1000 records using PutState in a loop over all those records (blind insert, nothing in the read-set). We have also increased BatchTimeout to 3s and MaxMessageCount to 100, so that we get larger blocks (we see 4 transactions per block, so 4000 records, 4 transactions x 1000 records each, being inserted into the ledger with every block).
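For reference, those two batching settings live under the Orderer section of configtx.yaml; ours look roughly like this (other BatchSize fields omitted):

Orderer:
    BatchTimeout: 3s
    BatchSize:
        MaxMessageCount: 100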
Our assumption is that when the bulk update fails in CouchDB and the peer has to retry each of the 1000 records in a transaction separately, the queries take too long and overshoot the timeout. We also found https://jira.hyperledger.org/browse/FAB-10558, but it says the issue was already fixed in v1.2.0, which is the version we are using.
The error we got, as seen in the logs below, is net/http: request canceled (Client.Timeout exceeded while reading body).
We tried setting the following environment variable in the peer container:
CORE_CHAINCODE_EXECUTETIMEOUT=120s
And also req.setProposalWaitTime(120 * 1000) when using the Java SDK.
But then we just get the same timeout error at a later point. We could keep increasing the timeout to a bigger number, but we believe the error will happen again later. Is the time required for inserting into CouchDB proportional to the number of records already in CouchDB? Maybe updating the index takes more time as the number of documents increases?
The runtime error log that we get (after 2-4 million or so records have been inserted) is as below:
2018-10-04 20:35:55.807 UTC [statecouchdb] commitUpdates -> WARN 465c CouchDB batch document update encountered an problem. Retrying update for document ID:32216027-da66-4ecd-91a1-a37bdf47f07d
2018-10-04 20:35:55.866 UTC [statecouchdb] commitUpdates -> WARN 4663 CouchDB batch document update encountered an problem. Retrying update for document ID:6eaed2ae-e5c4-48b1-b063-20eb3009969b
2018-10-04 20:35:55.870 UTC [statecouchdb] commitUpdates -> WARN 4664 CouchDB batch document update encountered an problem. Retrying update for document ID:2ca2fbcc-e78f-4ed0-be70-2c4d7ecbee69
2018-10-04 20:35:55.904 UTC [statecouchdb] commitUpdates -> WARN 4667 CouchDB batch document update encountered an problem. ... and so on
2018-10-04 20:36:00.783 UTC [kvledger] CommitWithPvtData -> INFO 466e Channel [mychannel]: Committed block [1719] with 4 transaction(s)
panic: Error during commit to txmgr:net/http: request canceled (Client.Timeout exceeded while reading body)

goroutine 283 [running]:
github.com/hyperledger/fabric/core/ledger/kvledger.(*kvLedger).CommitWithPvtData(0xc421fb1860, 0xc451e4f470, 0x0, 0x0)
	/opt/gopath/src/github.com/hyperledger/fabric/core/ledger/kvledger/kv_ledger.go:273 +0x870
github.com/hyperledger/fabric/core/committer.(*LedgerCommitter).CommitWithPvtData(0xc4222db8c0, 0xc451e4f470, 0xc4312ddd40, 0xdf8475800)
	/opt/gopath/src/github.com/hyperledger/fabric/core/committer/committer_impl.go:105 +0x6b
github.com/hyperledger/fabric/gossip/privdata.(*coordinator).StoreBlock(0xc422286e60, 0xc42462cd80, 0x0, 0x0, 0x0, 0xc4312dde78, 0x7329db)
	/opt/gopath/src/github.com/hyperledger/fabric/gossip/privdata/coordinator.go:236 +0xc3b
github.com/hyperledger/fabric/gossip/state.(*GossipStateProviderImpl).commitBlock(0xc4220c5a00, 0xc42462cd80, 0x0, 0x0, 0x0, 0x0, 0x0)
	/opt/gopath/src/github.com/hyperledger/fabric/gossip/state/state.go:771 +0x6c
github.com/hyperledger/fabric/gossip/state.(*GossipStateProviderImpl).deliverPayloads(0xc4220c5a00)
	/opt/gopath/src/github.com/hyperledger/fabric/gossip/state/state.go:558 +0x3c5
created by github.com/hyperledger/fabric/gossip/state.NewGossipStateProvider
	/opt/gopath/src/github.com/hyperledger/fabric/gossip/state/state.go:239 +0x681
2018-10-04 20:35:55.870 UTC [statecouchdb] commitUpdates -> WARN 4664 CouchDB batch document update encountered an problem. Retrying update for document ID:2ca2fbcc-e78f-4ed0-be70-2c4d7ecbee69
The above suggests that POST http://localhost:5984/db/_bulk_docs failed, and so the individual documents were retried separately.
Looking at the different parameters available to configure, increasing requestTimeout under the ledger section might be worth a shot.
This can be done by setting the following environment variable in the docker-compose file for your peer container:
CORE_LEDGER_STATE_COUCHDBCONFIG_REQUESTTIMEOUT=100s
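For example, in the docker-compose service definition for the peer (the service name here is a placeholder):

peer0.org1.example.com:
  environment:
    - CORE_LEDGER_STATE_COUCHDBCONFIG_REQUESTTIMEOUT=100s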
The name of the environment variable associated with a configuration parameter can be derived by looking at this answer.
Configuring CORE_CHAINCODE_EXECUTETIMEOUT and proposalWaitTime might not have had an effect because another connection downstream (here, the HTTP connection between the peer and CouchDB) was timing out, and that timeout exception was then propagated up.
I noticed Myrrix creates a file within a tmp directory that looks like a model.
Can I start Myrrix with this information, in order to save time and not have to re-ingest the data?
Sat Jan 18 10:03:09 EST 2014 INFO Writing model to /tmp/DelegateGenerationManager7633240206665163912.bin.gz
Sat Jan 18 10:03:55 EST 2014 INFO Done, moving into place at /tmp/1390056408253-0/model.bin.gz
Sat Jan 18 10:03:57 EST 2014 INFO Pruning old entries...
Sat Jan 18 10:03:57 EST 2014 INFO Recomputing generation state...
Yes, you can. You just put a model.bin.gz in the local directory that it's running over. That could be a model you saved separately. You could also create one manually, although that would require some hacking on the code to serialize your own model.
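A minimal sketch of that, assuming the standalone Serving Layer jar and its --localInputDir option (the jar name and flag are from memory, so check --help for your version):

# reuse the model that was written to /tmp instead of re-ingesting
cp /tmp/1390056408253-0/model.bin.gz /data/myrrix/model.bin.gz
# start the Serving Layer over that directory; it picks up model.bin.gz on startup
java -jar myrrix-serving-layer.jar --port 8080 --localInputDir /data/myrrix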
I am okay with the example data set from Audioscrobbler, which totals 75K users and 50K items. But mine is tiny, since I am at the start of the road. So I would be happy to know the minimum data set size that works with Myrrix. The reason for asking is this warning:
INFO: Converged
Aug 14, 2013 10:15:41 PM net.myrrix.online.generation.DelegateGenerationManager$RefreshCallable runFactorization
INFO: Factorization complete
Aug 14, 2013 10:15:41 PM net.myrrix.online.generation.Generation recomputeSolver
WARNING: X'*X or Y'*Y has small inf norm (0.9254986853162671); try decreasing model.als.lambda
Aug 14, 2013 10:15:41 PM net.myrrix.online.generation.DelegateGenerationManager$RefreshCallable call
WARNING: Unable to compute a valid generation yet; waiting for more data
Thank you to everybody who can assist.
So far I have only been able to ingest a file with 10 lines of associations.
By the way, Myrrix is migrating to Oryx now; you may ask Sean Owen on https://groups.google.com/a/cloudera.org/forum/#!forum/oryx-user