Google Cloud Storage: Where do I get per-bucket statistics [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I would like to pull stats on a per bucket basis. Is this possible?

(updated answer 2014/09/23 to reflect changes in the gsutil command)
gsutil du displays the amount of space (in bytes) used by the objects in the hierarchy under a given URL.
-s gives a summary total instead of the size of each object.
-h prints human-readable sizes instead of bytes.
So:
$ gsutil du -sh gs://BUCKET_NAME
261.46 GB gs://BUCKET_NAME
... gives the total size of objects in the bucket. However, it is calculated on request and can take a long time for buckets with many objects.
For production use, enable Access Logs & Storage Data. The storage data logs give you the average size in bytes per hour for each bucket for the previous day.
The access logs give details about each request to your logged buckets.
There is also information on loading the logs into BigQuery for analysis.
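For reference, log delivery is enabled with the gsutil logging command (the bucket names below are placeholders); -b names the bucket that receives the log objects and -o sets the log object name prefix:
$ gsutil logging set on -b gs://LOG_BUCKET_NAME -o AccessLog gs://BUCKET_NAME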

Delivery of access logs can be enabled per bucket as documented. When bucket logging is enabled, log files are written to the user-defined logging bucket on a best-effort hourly basis. You can pull log files from there and parse and count them with your tool of choice. If you don't want to run analytics on the raw logs yourself, you can use a service such as Qloudstat. (Disclaimer: I work for the company behind it.)

Related

Azure Service Bus message time to live setting [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I would like to ask what the best practice is for the Azure Service Bus message TTL (time-to-live) option - https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-expiration.
We use Azure Service Bus to import data from one system to another; the number of records is a couple of million.
Briefly, this option tells ASB how long a message can stay in a queue or topic before it is moved to the dead-letter queue (if one is configured) - https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-dead-letter-queues#moving-messages-to-the-dlq.
Even so, I cannot find how the TTL value impacts ASB throughput and performance. What is the difference between 5 minutes, 1 hour and 20 hours set as TTL in terms of ASB queue/topic performance?
Thank you in advance.
The time-to-live property sets the expiration time window for messages in Service Bus.
Based on the time configured for TTL, messages are either moved to the dead-letter queue or dropped from the queue. The usage of this property may differ based on the use case.
For example, if I am sure that my system will not go down and will pick up messages as soon as they are enqueued, I would configure the TTL to a very small window, say 1 minute (this helps verify the system is working by monitoring the dead-letter length of the queue). If my system is not reliable, or it runs only once a day to process the messages, then I should set a higher value for this property so that messages stay available in the queue long enough for the system to process them.
As for performance, higher TTL values will not noticeably degrade queue performance.
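For illustration, here is a minimal sketch of setting a per-message TTL with the azure-messaging-servicebus Java SDK; the question does not say which SDK is in use, and the connection string variable and queue name below are placeholders:

import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusMessage;
import com.azure.messaging.servicebus.ServiceBusSenderClient;
import java.time.Duration;

public class TtlExample {
    public static void main(String[] args) {
        // Placeholder: read the namespace connection string from the environment.
        String connectionString = System.getenv("SERVICEBUS_CONNECTION_STRING");

        ServiceBusSenderClient sender = new ServiceBusClientBuilder()
                .connectionString(connectionString)
                .sender()
                .queueName("import-queue")   // placeholder queue name
                .buildClient();

        ServiceBusMessage message = new ServiceBusMessage("record payload");
        // Per-message TTL; the effective TTL is capped by the queue's default TTL.
        message.setTimeToLive(Duration.ofHours(1));

        sender.sendMessage(message);
        sender.close();
    }
}

A default TTL can also be set on the queue or topic itself at creation time; the shorter of the entity default and the per-message value wins.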

How to set partition in azure event hub consumer java code [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I want to know the purpose of the host name in EventProcessorHost and how to set the partition on the consumer side. Right now I am able to get data from the consumer group, but all partitions go to the output stream.
Questions:
1. How to set a partition via Java code.
2. The use of the host name in EventProcessorHost.
3. An example of multiple consumers, each with its own partition, in Java code.
I highly appreciate any help.
There is a complete Java example; see the docs.
You don't need to set a partition when you use an EventProcessorHost. Instead, each instance will lease a partition to work on. So if you created the event hub with, say, 4 partitions, you should instantiate 4 EventProcessorHost instances to get better throughput. See the linked docs as well:
This tutorial uses a single instance of EventProcessorHost. To increase throughput, it is recommended that you run multiple instances of EventProcessorHost, preferably on separate machines. This provides redundancy as well. In those cases, the various instances automatically coordinate with each other in order to load balance the received events.
Leases are given out for a specific time only. After that, another receiver can take over the lease. If you give it a while, you should notice that all instances retrieve data.
About the hostname:
When receiving events from different machines, it might be useful to specify names for EventProcessorHost instances based on the machines (or roles) in which they are deployed.
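As a rough sketch of the pieces the answer describes (based on the samples for the legacy azure-eventhubs-eph SDK; the event hub name, consumer group, connection string variables, and lease container are placeholders, and the exact constructor and interface signatures may differ between SDK versions):

import com.microsoft.azure.eventhubs.EventData;
import com.microsoft.azure.eventprocessorhost.CloseReason;
import com.microsoft.azure.eventprocessorhost.EventProcessorHost;
import com.microsoft.azure.eventprocessorhost.IEventProcessor;
import com.microsoft.azure.eventprocessorhost.PartitionContext;

public class EphSketch {

    public static void main(String[] args) throws Exception {
        String eventHubConnStr = System.getenv("EVENTHUB_CONNECTION_STRING");
        String storageConnStr = System.getenv("STORAGE_CONNECTION_STRING");

        // The host name only needs to be unique per running instance;
        // createHostName appends a unique suffix to the prefix you pass in.
        EventProcessorHost host = new EventProcessorHost(
                EventProcessorHost.createHostName("worker"),
                "my-event-hub",      // event hub name (placeholder)
                "$Default",          // consumer group
                eventHubConnStr,
                storageConnStr,
                "eph-leases");       // blob container used for partition leases

        // No partition is specified here: the host leases partitions on its own.
        // Run one host per process/machine to spread partitions across consumers.
        host.registerEventProcessor(PartitionLogger.class).get();

        System.in.read();            // keep running until some input arrives
        host.unregisterEventProcessor();
    }

    // Each leased partition gets its own instance of this processor.
    public static class PartitionLogger implements IEventProcessor {
        @Override
        public void onOpen(PartitionContext context) {
            System.out.println("Opened partition " + context.getPartitionId());
        }

        @Override
        public void onClose(PartitionContext context, CloseReason reason) {
            System.out.println("Closed partition " + context.getPartitionId() + ": " + reason);
        }

        @Override
        public void onEvents(PartitionContext context, Iterable<? extends EventData> events) throws Exception {
            int count = 0;
            for (EventData ignored : events) {
                count++;
            }
            System.out.println("Partition " + context.getPartitionId() + " received " + count + " events");
            context.checkpoint();
        }

        @Override
        public void onError(PartitionContext context, Throwable error) {
            System.err.println("Error on partition " + context.getPartitionId() + ": " + error);
        }
    }
}

Running several such processes, each constructing its own EventProcessorHost with a distinct host name, is how you end up with one consumer per partition; the lease store (the blob container above) coordinates which host owns which partition.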

Azure Disk Data Lost [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers.
Closed 8 years ago.
I am new to Azure. I created a VM and stored some very important files on the temporary storage, but after a few days the temp disk was formatted. Is there any option to get my data back?
Eshant
Just wanted to provide you with clear guidance regarding the disks and why you lost data. There are 3 disks within an Azure VM; they are as follows:
C drive (127 GB): dedicated to the OS and persists across reboots. This disk is dedicated to the OS and shouldn't be used for any other purpose.
D drive (temporary drive): intended only for storing temporary data. As you noticed, it holds only the page file, and it is not recommended for storing data because it is wiped clean on stop/start, VM resize, planned and unplanned maintenance, and service healing. One key benefit of this disk is performance: I/O performance for the temporary disk is higher than for the OS disk and data disks. The size of the disk varies with the VM size. In your case the data is lost and cannot be recovered.
Data disks: you need to add data disks for any type of custom storage that needs to be persisted. Another point: while both OS and data disks reside in blob storage, the default host caching settings differ - the OS disk uses Read/Write host caching by default, while data disks use None by default.
One key point: the cost of C:\ and D:\ is included in the VM price, while data disks are charged on actual usage. Say you allocate 100 GB and use only 10 GB; then you are charged only for the 10 GB.
Regards
Krishna
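As a practical follow-up to the data-disk point above: with the current Azure CLI, a new empty data disk can be attached roughly like this (the resource group, VM, and disk names are placeholders, and the exact flags may vary between CLI versions):
$ az vm disk attach --resource-group MY_RESOURCE_GROUP --vm-name MY_VM --name my-data-disk --new --size-gb 100
The disk still has to be initialized and formatted inside the guest OS before use.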
No, there's no way to recover your data.
From this Microsoft article: Understanding the temporary drive on Windows Azure Virtual Machines
http://blogs.msdn.com/b/wats/archive/2013/12/07/understanding-the-temporary-drive-on-windows-azure-virtual-machines.aspx
Is there a way to recover data from the temporary drive?
There is no way to recover any data from the temporary drive.

Increase Azure Data Disk size [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I currently have a 1 GB data disk attached to a Windows Extra Small VM running on Azure.
How do I increase the size of data disk without losing data?
Is it possible to attach many more disks to Extra Small VM especially from other storage accounts?
It is possible to attach more disks to an Extra Small VM.
It isn't possible to increase the size of a data disk in Disk Management. But with some extra steps you can accomplish it by downloading the VHD onto a larger disk attached to a different VM, mounting and extending it there, then putting it back where it was and attaching it again to the original VM.
Please check Drew's description here.
How do I increase the size of data disk without losing data?
I don't think you can increase the size of the data disk.
Is it possible to attach many more disks to Extra Small VM especially from other storage accounts?
You can attach another data disk. You can attach up to 16 data disks to a VM. Since a data disk is essentially a page blob, you are only charged for the space you occupy and not the nominal size of the disk, so it is advisable to attach a larger data disk so that you don't run out of disk space. Though I have not tried attaching data disks from different storage accounts, I can't see a reason why it should not be possible (they should be in the same data center, though). However, you may want to keep all your OS and data disks in the same storage account for improved latency. HTH.

How stable is s3fs to mount an Amazon S3 bucket as a local directory [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
How stable is s3fs for mounting an Amazon S3 bucket as a local directory in Linux? Is it recommended/stable for high-demand production environments?
Are there any better/similar solutions?
Update: Would it be better to use EBS and to mount it via NFS to all other AMIs?
There's a good article on s3fs here; after reading it I resorted to an EBS share.
It highlights a few important considerations when using s3fs, namely related to the inherent limitations of S3:
no file can be over 5GB
you can't partially update a file, so changing a single byte will re-upload the entire file
operations on many small files are very efficient (each is a separate S3 object after all), but large files are very inefficient
though S3 supports partial/chunked downloads, s3fs doesn't take advantage of this, so if you want to read just one byte of a 1GB file you'll have to download the entire GB
Whether s3fs is a feasible option therefore depends on what you are storing. If you're storing, say, photos, where you only ever write or read an entire file and never change it incrementally, then it's fine, although one may ask: if you're doing this, why not just use S3's API directly?
If you're talking about application data (say database files or log files) where you want to make small incremental changes, then it's a definite no - S3 just doesn't work that way; you can't incrementally change a file.
The article mentioned above also talks about a similar application, s3backer, which gets around the performance issues by implementing a virtual filesystem over S3, but it has a few issues of its own:
high risk of data corruption, due to the delayed writes
too-small block sizes (e.g., the 4K default) can add significant extra costs (e.g., $130 for 50GB worth of storage with 4K blocks)
too-large block sizes can add significant data transfer and storage fees
memory usage can be prohibitive: by default it caches 1000 blocks. With the default 4K block size that's not an issue, but most users will probably want to increase the block size.
I resorted to an EBS-mounted drive shared from an EC2 instance. But you should know that, although it is the most performant option, it has one big problem.
An EBS-mounted NFS share has a single point of failure: if the machine sharing the EBS volume goes down, then you lose access on all machines that access the share.
This is a risk I was able to live with and was the option I chose in the end.
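For concreteness, the sharing setup described above boils down to a standard NFS export of the EBS mount; the mount path, subnet, and server address below are placeholders:
# /etc/exports on the EC2 instance that owns the EBS volume
/data 10.0.0.0/24(rw,sync,no_subtree_check)
$ sudo exportfs -ra
# on each client instance
$ sudo mount -t nfs NFS_SERVER_IP:/data /mnt/shared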
This is an old question so I'll share my experience over the past year with S3FS.
Initially, it had a number of bugs and memory leaks (I had a cron job to restart it every 2 hours), but with the latest release, 1.73, it's been very stable.
The best thing about S3FS is that you have one less thing to worry about and get some performance benefits for free.
Most of your S3 requests are going to be PUT (~5%) and GET (~95%). If you don't need any post-processing (thumbnail generation, for example), you shouldn't be hitting your web server in the first place; you should be uploading directly to S3 (using CORS).
If you are hitting the server, it probably means you need to do some post-processing on images. With the S3 API you'll be uploading to the server, then uploading to S3. If the user wants to crop, you'll need to download from S3 again, re-upload to the server, crop, and then upload to S3. With S3FS and local caching turned on, this orchestration is taken care of for you and saves downloading files from S3.
On caching: if you are caching to an ephemeral drive on EC2, you get the performance benefits that come with it and can purge your cache without having to worry about anything. Unless you run out of disk space, you should have no reason to purge your cache. This makes traversal operations like searching and filtering much easier.
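As a concrete illustration of that caching setup (the bucket name, mount point, and cache directory are placeholders), a typical s3fs mount with local caching looks roughly like this:
$ s3fs MY_BUCKET /mnt/s3 -o passwd_file=${HOME}/.passwd-s3fs -o use_cache=/mnt/ephemeral/s3fs-cache
The use_cache option points s3fs at a local directory (here, on the ephemeral drive) where downloaded objects are kept between reads.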
The one thing I do wish it had is full sync with S3 (rsync style). That would make it an enterprise version of Dropbox or Google Drive for S3, but without having to contend with the quotas and fees that come with them.
