Fabric 1.1 with external CouchDB saves data also inside peers. Why?

I am saving files inside my ledger with Fabric 1.1 and LevelDB. As expected, this makes the peers' Docker containers run out of space quickly. I thought that switching to CouchDB would fix the problem (it moves the problem to the CouchDB container, but I can handle that). To my surprise, with CouchDB the data is saved in the CouchDB containers, but it is also saved inside the peers! For example, uploading a 1.3 MB file to my app, configured to use CouchDB, also creates a 1.3 MB "blockfile" in /var/hyperledger/production/ledgersData/chains/chains/mychannel inside the peers involved. How can this be? Is it possible to disable this behaviour and save data only in the CouchDB containers (or in volumes mounted for those containers)? Is this a bug fixed in newer Fabric versions? If it is not possible, how can I configure bigger peers?
I know I could change the solution to hash the attachments, save only references to these hashes inside my ledger and store the data in an external data store, but I'm working on a project with this requirement and changing the approach is not a possibility.
Thanks.

The peer has both a file-based ledger (the "blockchain") and a state database which holds / caches the last known value for any given key.
State can be stored in either goleveldb or in CouchDB. The ledger is always stored on the peer filesystem. (Note that goleveldb data files are also stored on the peer filesystem).
The location is set via peer.fileSystemPath in core.yaml and the default value is /var/hyperledger/production. You can mount an external volume for this as well if you want to store the files on the host and not inside the container filesystem.
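As a rough illustration of the volume mount, here is a minimal docker-compose sketch. The service names (couchdb0, peer0.org1.example.com) and the host directory ./peer0-data are assumptions made for the example, not taken from the question, and the fragment omits everything else a peer needs (MSP, TLS, addresses). State goes to CouchDB, while the block files under /var/hyperledger/production land on the host instead of in the container's writable layer:

```yaml
# Hypothetical fragment: service names and host paths are assumptions.
# Pin the image tags that match your Fabric release.
version: "2"
services:
  couchdb0:
    image: hyperledger/fabric-couchdb

  peer0.org1.example.com:
    image: hyperledger/fabric-peer
    environment:
      # State database in the external CouchDB container
      - CORE_LEDGER_STATE_STATEDATABASE=CouchDB
      - CORE_LEDGER_STATE_COUCHDBCONFIG_COUCHDBADDRESS=couchdb0:5984
      # Default value of peer.fileSystemPath, shown for clarity
      - CORE_PEER_FILESYSTEMPATH=/var/hyperledger/production
    volumes:
      # The block files (the ledger) are written here regardless of the
      # state database choice, so keep them on the host.
      - ./peer0-data:/var/hyperledger/production
    depends_on:
      - couchdb0
```

With a mount like this the block files still grow by the size of every stored file, but they consume host disk space that you can size as needed.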

Related

Hyperledger starting project for my own use case

So far I have used the fabric-samples repo and its network.sh script to start a network. The samples already include a connection-org.yaml file with the necessary information.
When I want to use Fabric for my own app, I know I need to start a Fabric network, right? Then I also need to create a channel and a user in it. How do I do that? Should I just copy network.sh from fabric-samples? What about connection-org.yaml? I think all of that is hardcoded, right? What should I do about it?
Every tutorial uses these prebuilt pieces without ever explaining what they are. Any help would be greatly appreciated.

Since you mention that you have used the fabric-samples repo, I assume you are familiar with the Hyperledger Fabric blockchain framework.
The following factors related to the network should be decided first.
Channel name.
How many and which organizations participate in the consortium.
How many peers per organization.
The ordering service would be Raft based, but how many orderer nodes?
Whether the state database would use CouchDB or LevelDB.
How the MSP crypto material would be generated (is Fabric CA going to be used [if yes, with your own root certificate/rootCA?] or the cryptogen tool).
Once the above has been laid out, then the next step is to start coding the network script.
The images should be already loaded into the local docker repository, and the Fabric binaries should be available in a location accessible to the script. If the docker images are not loaded, then the machine should have connectivity to internet and then to docker-hub.
It would be good to start with a docker based network setup.
The network and persistent data stores ( docker network, ports and volumes) should be planned.
Once that is sorted out, the coding of the docker compose files could start. Following are the points to be noted during this step.
Create a single compose file with all the organizations, or create individual compose files for each organization. Take a look at the docker compose YAML files that sit alongside network.sh to get an idea.
Decide on the docker subnet (network reference).
Provide the same network reference against each service / each individual compose file.
Provide the env variables for the items below.
Map the MSP folders.
Decide on SSL as applicable.
Provide CouchDB ports (if applicable), peer ports, gossip ports, orderer ports, etc.
If you plan to use cryptogen, create the config files as per the org structure. If it's CA, write the commands as per the org structure.
Now refer to the network.sh script again and try to figure out how the crypto material is generated (as applicable to your choice). Also refer to the cleanup part of network.sh to understand how it is done, what is removed, and what is retained.
Every time the script bombs, make sure that you clean up before starting again, i.e. remove all the docker containers and volumes. You could retain your MSP crypto material if you want to.
Locate the commands to create the channel and to add peers to the channel.
The content of env.sh is a good example of how to set the environment variables needed within your script.
Once all the members have joined the channel, set up the anchor peers for each organization.
Write a version of the script after referring to the example.
By the end of proper execution of the steps above, the script should be able to get a Hyperledger Fabric network up and running.
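To make the compose-file points above more concrete, here is a hedged skeleton showing a shared network reference, environment variables, MSP folder mappings and ports for one orderer and one peer. The organization names, domain, network name and host paths are all assumptions for illustration. This is not a complete, bootable configuration: TLS certificates, bootstrap settings and the remaining services are deliberately left out.

```yaml
# Illustrative skeleton only: names and paths are assumptions.
version: "2"

networks:
  fabric_net:            # every service must reference the same network

services:
  orderer.example.com:
    image: hyperledger/fabric-orderer
    environment:
      - ORDERER_GENERAL_LISTENADDRESS=0.0.0.0
      - ORDERER_GENERAL_LOCALMSPID=OrdererMSP
      - ORDERER_GENERAL_LOCALMSPDIR=/var/hyperledger/orderer/msp
      # TLS and bootstrap settings omitted in this sketch
    volumes:
      # MSP folder generated by cryptogen or Fabric CA (hypothetical host path)
      - ./crypto/orderer/msp:/var/hyperledger/orderer/msp
    ports:
      - "7050:7050"
    networks:
      - fabric_net

  peer0.org1.example.com:
    image: hyperledger/fabric-peer
    environment:
      - CORE_PEER_ID=peer0.org1.example.com
      - CORE_PEER_ADDRESS=peer0.org1.example.com:7051
      - CORE_PEER_LOCALMSPID=Org1MSP
      - CORE_PEER_MSPCONFIGPATH=/etc/hyperledger/fabric/msp
      - CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer0.org1.example.com:7051
    volumes:
      - ./crypto/org1/peer0/msp:/etc/hyperledger/fabric/msp
    ports:
      - "7051:7051"
    networks:
      - fabric_net
```

Comparing a skeleton like this against the compose files shipped with fabric-samples is a quick way to see which values are specific to the sample network and which ones you need to replace.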

Are docker volumes better option for write heavy operations than binding directories directly?

Reading through docker documentation I found this passage (located here):
Block-level storage drivers such as devicemapper, btrfs, and zfs perform better for write-heavy workloads (though not as well as Docker
volumes).
So does this mean that one should always use docker volumes when expecting lots of persistent writing?

The container-local filesystem never stores persistent data, so you don't have a choice but to mount something into the container if you want data to live on after the container exits. The "block-level storage drivers" you quote discuss particular install-time options for how images and containers are stored, and aren't related to any particular volume or bind-mount implementation.
As far as performance goes, my general expectation is that the latency of disk I/O will far outweigh any overhead of any particular implementation. Without benchmarking any particular implementation, on a native Linux host, I would expect a named volume, a bind-mount, and writes to the container filesystem to be more or less similar.
From a programming point of view, you will probably get better long-term performance improvement from figuring out how to have fewer disk accesses (for example, by grouping together related database requests into a single transaction) than by trying to optimize the Docker-level storage.
The one prominent exception to this is that bind mounts on MacOS are known to be very slow and you should avoid them if your workload involves substantial disk access. (This includes both reading and writing, and includes some interpreted languages that want to read in every possible source file at startup time.) If you're managing something like database storage where you can't usefully directly access the files anyways, use a named volume. For your application code, COPY it into an image in a Dockerfile and do not overwrite it at run time.

should always use docker volumes when expecting lots of persistent writing?
It depends.
Yes you want some kind of external to the container storage for any persistent data since data written inside the container is lost when that container is removed.
Whether that should be a host bind or a named volume depends on how you need to manage that data. A host volume is a bind mount to the host filesystem. It gives you direct access to that data, but that direct access also comes with uid/gid permission issues and loses the initialization feature of named volumes.
A named volume with all the defaults is just a bind mount to a folder under /var/lib/docker, so performance would be the same as a host volume if the underlying filesystem is the same. That said, a named volume can be configured to mount just about anything you can mount with the mount command.
Since each of these options can have varying underlying filesystem, and the performance difference comes from that underlying filesystem choice, there's no way to answer this in any generic sense. Hence, it depends.
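The options discussed above can be put side by side in a compose file. The following is only a sketch: the service name, image and all paths are placeholders. It shows a host bind, a default named volume, and a named volume whose local driver options point it at a specific host directory.

```yaml
# Sketch only: service name, image and paths are placeholders.
version: "3"

services:
  app:
    image: example/app            # hypothetical image
    volumes:
      - ./data:/data/bind         # host bind: direct access, host uid/gid apply
      - appdata:/data/named       # named volume with defaults
      - customdata:/data/custom   # named volume with custom mount options

volumes:
  appdata: {}
  customdata:
    driver: local
    driver_opts:
      # Passed through to mount; here a bind to an existing host directory
      type: none
      o: bind
      device: /srv/appdata
```

In all three cases the write performance is determined by the filesystem that ends up backing the path, which is why the answer to the original question is "it depends".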

Hyperledger Fabric: where is the blockchain state saved?

I am using a Hyperledger Fabric network with 2 organisations. My question is: where does Fabric store the blockchain state? I am facing an issue: when I bring down the organisations using docker, all of the blockchain state vanishes. How can I keep or save the blockchain state so that I don't have to start the blockchain from state zero every time? Please advise.
The same applies to the Fabric blockchain explorer.

You will need to use persistent volumes so that the data is not stored on the container filesystem; otherwise it will be destroyed when the container(s) are destroyed.
For peers, the two key attributes in core.yaml are:
peer.fileSystemPath - this defaults to /var/hyperledger/production and is where the ledger, installed chaincodes, etc are kept. The corresponding environment variable is CORE_PEER_FILESYSTEMPATH.
peer.mspConfigPath - where the local MSP info is stored. The corresponding environment variable is CORE_PEER_MSPCONFIGPATH.
For orderers, the two key attributes in orderer.yaml are:
FileLedger.Location - this defaults to /var/hyperledger/production/orderer and is where the channel ledgers are stored. The corresponding environment variable is ORDERER_FILELEDGER_LOCATION.
General.LocalMSPDir - where the local MSP info is stored. The corresponding environment variable is ORDERER_GENERAL_LOCALMSPDIR.
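A minimal compose sketch of these settings, assuming services named orderer.example.com and peer0.org1.example.com and hypothetical ./crypto/... host paths for the MSP material. Named volumes hold the ledgers, so they survive removal of the containers:

```yaml
# Hypothetical fragment: service names, volume names and host paths are assumptions.
version: "2"

volumes:
  orderer.example.com:
  peer0.org1.example.com:

services:
  orderer.example.com:
    image: hyperledger/fabric-orderer
    environment:
      - ORDERER_FILELEDGER_LOCATION=/var/hyperledger/production/orderer
      - ORDERER_GENERAL_LOCALMSPDIR=/var/hyperledger/orderer/msp
    volumes:
      - ./crypto/orderer/msp:/var/hyperledger/orderer/msp
      - orderer.example.com:/var/hyperledger/production/orderer

  peer0.org1.example.com:
    image: hyperledger/fabric-peer
    environment:
      - CORE_PEER_FILESYSTEMPATH=/var/hyperledger/production
      - CORE_PEER_MSPCONFIGPATH=/etc/hyperledger/fabric/msp
    volumes:
      - ./crypto/org1/peer0/msp:/etc/hyperledger/fabric/msp
      - peer0.org1.example.com:/var/hyperledger/production
```

Note that docker-compose down -v also deletes the named volumes declared in the file, so leave out -v when the ledger should be kept between runs.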

You can simply map the container's content to a folder outside the container and keep it there.
For example, for the orderer, all of its content is inside /var/hyperledger/production/orderer.
You can map this folder to a local folder on the host.
This way you can see the content of the container folder even without opening a shell inside the container.
You can then copy this content to another folder, let's say backup.
When you re-create the container, you can map the backup folder so that it starts with the content you had before.

Cloning Couch DB data from one server to another through file systems (without replicator)

We have two nodes with CouchDB installed. One of the nodes has data on it, and we want to copy the data from that instance to the other CouchDB instance. We want to avoid the replicator due to the volume of the data.
We tried copying data from %couchdb%/data/shards and %couchdb%/data/.shards to the corresponding locations on the target node, as per one of the suggestions from "CouchDB backups and cloning the database", but we are not able to see the data in the server's Fauxton UI. Can someone suggest what is missing?

Couchtransform lets you convert or just clone data from one db to another. It's multi-threaded, and you won't need to deal with massive files.

Move docker data volume containers between CoreOS hosts

For some scenarios a clustered file system is just too much. This is, if I got it right, the use case for the data volume container pattern. But even CoreOS needs updates from time to time. If I'd still like to minimise the downtime of applications, I'd have to move the data volume container together with the app container to another host while the old host is being updated.
Are there any established best practices? A solution mentioned quite often is the "backup" of a container with docker export on the old host and docker import on the new host. But this would involve scp-ing tar files to another host. Can this be managed with fleet?

@brejoc, I wouldn't call this a solution, but it may help:
Alternative
1: Use another OS which does have clustering, or at least doesn't prevent it. I am now experimenting with CentOS.
2: I've created a couple of tools that help in some use cases. The first tool retrieves data from S3 (usually artifacts) and is uni-directional. The second tool, which I call 'backup volume container', has a lot of potential in it, but requires some feedback. It provides a 2-way backup/restore for data, from/to many persistent data stores including S3 (but also Dropbox, which is cool). As it is implemented now, when you run it for the first time, it restores to the container. From that point on, it monitors the relevant folder in the container for changes, and upon changes (and after a quiet period), it backs up to the persistent store.
Backup volume container: https://registry.hub.docker.com/u/yaronr/backup-volume-container/
File sync from S3: https://registry.hub.docker.com/u/yaronr/awscli/
(docker run yaronr/awscli aws s3 etc etc - read aws docs)
