How can you get the garbage collection policies for Google Cloud Bigtable?

The docs for garbage collection explain how to set the garbage collection policies with cbt but don't explain how to read the policies. The cbt reference doesn't seem to have any command to get garbage collection either. As far as I can tell, this isn't available in the GUI.

See this page about configuring garbage collection. It provides examples of how to set and update garbage-collection policies when you use Cloud Bigtable client libraries or the cbt command-line tool.
The answer to your question is that you can view the garbage-collection policies for a table's column families by running this cbt command:
cbt ls TABLE_ID
Here's another reference, written by other developers, that shows sample output from the command.
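If you would rather read the policies programmatically than through cbt, the client libraries expose the same information on each column family. Below is a minimal sketch using the Node.js client (@google-cloud/bigtable); the instance and table IDs are placeholders, and it assumes the client's Table#getFamilies() call, which returns each family's metadata including its gcRule.

// Sketch: reading garbage-collection rules per column family (placeholder IDs).
const {Bigtable} = require('@google-cloud/bigtable');

async function printGcPolicies() {
  const bigtable = new Bigtable();
  const table = bigtable.instance('my-instance').table('my-table');

  // Each family's metadata carries the configured GC rule.
  const [families] = await table.getFamilies();
  for (const family of families) {
    console.log(family.id, JSON.stringify(family.metadata.gcRule));
  }
}

printGcPolicies().catch(console.error);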

Related

Bringing a MS Graph Search Custom Connector into working mode

Recently Microsoft published the Microsoft Search API (beta), which makes it possible to index external systems by creating an MS Graph Search custom connector.
I created such a connector, and it has been successful so far. I also pushed a few items to the index, and in the MS admin center I created a result type and a vertical. Now I'm able to find the external items in question in the SharePoint Online modern search center, in a dedicated tab belonging to the search vertical created before. So far so good.
But now I wonder:
How can I ensure that the external data is continuously pushed to the MS Search index? (How can this be implemented? Is there any tutorial or a sample project? What is the underlying architecture?)
Is there a concept of Full / Incremental / Continuous Crawls for a Search Custom Connector at all? If so, how can I "hook" into a crawl in order to update changed data to the index?
Or do I have to implement it all on my own? And if so, what would be a suitable approach?
Thank you for trying out the connector APIs. I am glad to hear that you are able to get items into the index and see the results.
Regarding your questions, the logic for determining when to push items, and your crawl strategy is something that you need to implement on your own. There is no one best strategy per se, and it will depend on your data source and the type of access you have to that data. For example, do you get notifications every time the data changes? If not, how do you determine what data has changed? If none of that is possible, you might need to do a periodic full recrawl, but you will need to consider the size of your data set for ingestion.
We will look into ways to reduce the amount of code you have to write in the future, but right now, this is something you have to implement on your own.
-James
I recently implemented incremental crawling for Graph connectors using Azure Functions. I created a timer-triggered function that fetches the items updated in the data source since the time of the last function run, and then updates the search index with the updated items.
I also wrote a blog post about this approach, using a SharePoint list as the data source. The entire source code can be found at https://github.com/aakashbhardwaj619/function-search-connector-crawler. Hope it is useful.
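For illustration, here is a minimal sketch of that pattern as a timer-triggered Azure Function in JavaScript. It assumes Node 18+ (for the global fetch); the connection ID, the ACL, fetchChangedItems, and getGraphToken are placeholders you would replace with your own data-source and authentication logic. Items are upserted via the beta externalItems endpoint (PUT /external/connections/{connectionId}/items/{itemId}).

const CONNECTION_ID = 'myConnectionId'; // placeholder: your connection's ID

// Placeholder: query your data source for items modified after `since`.
async function fetchChangedItems(since) {
  return []; // e.g. call the source system's delta/changes API here
}

// Placeholder: acquire an app-only Graph access token (e.g. via @azure/identity).
async function getGraphToken() {
  return '<access-token>';
}

// Timer-triggered function: runs on a schedule and pushes only what changed.
module.exports = async function (context, timer) {
  const lastRun = new Date((timer.scheduleStatus && timer.scheduleStatus.last) || 0);
  const changed = await fetchChangedItems(lastRun);
  const token = await getGraphToken();

  for (const item of changed) {
    // PUT creates or updates (upserts) the external item in the index.
    await fetch(`https://graph.microsoft.com/beta/external/connections/${CONNECTION_ID}/items/${item.id}`, {
      method: 'PUT',
      headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({
        // Placeholder ACL: adjust to your tenant's permission model.
        acl: [{ type: 'everyone', value: 'everyone', accessType: 'grant' }],
        properties: { title: item.title, url: item.url },
        content: { type: 'text', value: item.content },
      }),
    });
  }

  context.log(`Pushed ${changed.length} updated items to the search index.`);
};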

Azure "/_partitionKey"

I created multiple collections through code without realising the importance of having a partition key. I have since read that the only way to add a partition key and redistribute the data is to delete the collection and recreate it.
I don't really want to do this, as I have quite a lot of data already and want to avoid the downtime. When I look at the Scale & Settings menu in Azure for each of my collections, I see the below.
Can someone explain this? I thought my partition key was null, but it looks like MS has given me one called _partitionKey. Can I not just add _partitionKey to my documents and run a script to update them all to the key I want to use (e.g. country)?
This is a new feature which allows non-partitioned collections (now called containers in the latest SDKs) to start using partitions with zero downtime. The big caveat is that you need to be using the latest SDKs, which will be announced GA very soon (in fact, most are already published and are just waiting on doc publishing, etc.). The portal got the feature first since it is already using the latest SDK under the covers.

arangodb 3.1 foxx docs?

I think that ArangoDB is presently the best NoSQL DB and that Foxx microservices are a great resource.
Alas, the docs that come with the 3.x version only help you build a minimalistic service.
Also, many apps you can find as examples in the ArangoDB store have been developed with deprecated tools (e.g. controllers, repositories).
And while the wizard available in the web interface makes it easy to create a new service, I don't understand why a new collection, prefixed with the mount point, has to be created. So a complete REST API is generated with great documentation, but it is absolutely useless unless I change the name of an already existing collection. Why is that?
The generator is meant as a quick boilerplate generator to allow you to build prototypes more easily. In practice it's not a great starting point for real-world projects (especially if you already have created collections manually) but if you just quickly need a REST API you can expand with your own logic it can come in handy.
As you've read the docs I'm sure you've followed this Getting Started guide: https://docs.arangodb.com/3/Manual/Foxx/GettingStarted.html
In it, the reasoning for prefixed vs non-prefixed collection names is given as such:
Because we have hardcoded the collection name, multiple copies of the service installed alongside each other in the same database will share the same collection. Because this may not always be what you want, the Foxx context also provides the collectionName method which applies a mount point specific prefix to any given collection name to make it unique to the service. It also provides the collection method, which behaves almost exactly like db._collection except it also applies the prefix before looking the collection up.
On the technical side the documentation for the Context#collection method further specifies what the method does:
Passes the given name to collectionName, then looks up the collection with the prefixed name.
The documentation for Context#collectionName:
Prefixes the given name with the collectionPrefix for this service.
And finally Context#collectionPrefix:
The prefix that will be used by collection and collectionName to derive the names of service-specific collections. This is derived from the service's mount point, e.g. /my-foxx becomes my_foxx.
So, yes, if you just want to use a collection shared by all your services, the unprefixed version (using the db object directly) is the way to go. But this often encourages tight coupling between different services, defeating the purpose of having them as separate services in the first place, and it becomes problematic when you need multiple instances of the same service that don't share data. For this reason, most examples encourage you to use the module.context.collection method instead.
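To make the difference concrete, here is a small JavaScript sketch of what that looks like inside a Foxx service mounted at /my-foxx (the collection name todos and the derived prefix are just illustrative):

'use strict';
const { db } = require('@arangodb');

// Unprefixed: every service in this database shares the same 'todos' collection.
const sharedTodos = db._collection('todos');

// Prefixed: the mount point /my-foxx is turned into the prefix my_foxx,
// so this looks up the service-specific collection 'my_foxx_todos'.
const ownTodos = module.context.collection('todos');

// collectionName only derives the prefixed name without looking anything up.
console.log(module.context.collectionName('todos')); // e.g. "my_foxx_todos"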

Partial syncing in pouchdb / couchdb with a particular scenario

I have been reading docs and articles on pouchdb/couchdb/cloudant. I am not able to create this simple architecture in my head. I need help!
So there are many users on the app. Each user has a separate database (which I read is the recommended approach in a pouch/couch/cloudant setup).
Now let's just focus on a single user. This user has some remote data already present on our server (CouchDB). He has 3 separate docs stored.
He accesses doc 1 and doc 2 from browser 1, and doc 2 and doc 3 from browser 2.
Content in both browsers must be kept in sync.
Should I be using the sync API of PouchDB? But as I read, it syncs the whole database. How can I use this API to sync only a subset of the central database? Is filtered replication the answer here?
Also, I don't want to push both docs in a single call. He can access docs as he needs them.
What is the correct approach to implement this logic with pouch/couch databases? If you can explain with a little code, that would be great. I just need the basic ideas.
Is this kind of problem easily solvable in the upcoming releases of CouchDB 2.0 and PouchDB-find?
Thanks a lot!
If you take a look at the PouchDB documentation, you should see the options.doc_ids parameter. It lets you set up replication for certain document IDs only. In your scenario, this would solve your problem.
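A minimal sketch of that, assuming each browser knows which document IDs it needs (the database name and remote URL are placeholders):

const PouchDB = require('pouchdb');

const local = new PouchDB('user_docs');
const remote = new PouchDB('https://couch.example.com/user_db'); // placeholder URL

// Browser 1 only needs doc 1 and doc 2; browser 2 would pass its own list.
// doc_ids restricts replication (in both directions here) to the given subset.
const sync = local.sync(remote, {
  live: true,   // keep syncing continuously
  retry: true,
  doc_ids: ['doc_1', 'doc_2'],
});

sync.on('change', info => console.log('synced', info))
    .on('error', err => console.error(err));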

SQL Azure Profiling

I read on the MS site that SQL Azure does not support SQL Profiler. What are people using to profile queries running on this platform?
I haven't got too far playing around with SQL Azure as yet, but from what I understand there isn't anything you can use at the moment.
From MS (probably the article you read):
Because SQL Azure performs the physical administration, any statements and options that attempt to directly manipulate physical resources will be blocked, such as Resource Governor, file group references, and some physical server DDL statements. It is also not possible to set server options and SQL trace flags or use the SQL Server Profiler or the Database Tuning Advisor utilities.
If there were to be an alternative, I'd imagine it would require the ability to set trace flags, which you can't do, hence I don't think there is an option at the moment.
Solution? I can only suggest you keep a local development copy of the db so you can run Profiler locally on it. I know that won't help with "live" issues/debugging/monitoring, but it depends on what you need it for.
Edit:
Quote from MSDN forum:
Q: Is SQL Profiler supported in SQL Azure?
A: We do not support SQL Profiler in v1 of SQL Azure.
Now, you could interpret that as a hint that Profiler will be supported in future versions. I think it will be a big requirement to get a lot of people on board, using SQL Azure seriously.
Update as of 9/17/2015:
Microsoft just announced a new feature called Index Advisor:
How does Index Advisor work? Index Advisor continuously monitors your database workload, performs the analysis and recommends new indexes that can further improve the DB performance.
Recommendations are always kept up-to-date: as the DB workload and schema evolves, Index Advisor will monitor the changes and adjust the recommendations accordingly. Each recommendation comes with the estimated impact to DB workload performance: you can use this information to prioritize the most impactful recommendations first.
In addition, Index Advisor provides a very easy and powerful way of creating the recommended indexes. Creating new indexes only takes a couple of clicks. Index Advisor measures the impact of newly created indexes and provides a report on index impact to users.
You can get started with Index Advisor and improve your database performance with the following simple steps. It literally takes five minutes to get accustomed with Index Advisor's simple and intuitive user interface. Let's get started!
Original Answer:
SQL Azure now has some native profiling. See http://blogs.msdn.com/b/benko/archive/2012/05/19/cloudtip-14-how-do-i-get-sql-profiler-info-from-sql-azure.aspx for details.
Microsoft's stated position is that SQL Server Profiler is deprecated. As much as this is a bad idea, that's what they have said.
SQL Profiler is already deprecated in SQL Server, and that's part of the reason that it doesn't make sense to bring it to SQL DB.
What this means is that you are going back 20+ years in database performance monitoring, and everyone is going to have to write their own perf monitoring scripts instead of having a standard, factory-delivered tool available on every server you go to. It's tantamount to deprecating "sp_help" and making every DBA write their own. Hope you know all your DMVs inside and out, and your INNER JOIN, OUTER JOIN, and CROSS APPLY syntax really well.
Update as of 2017/04/14:
Microsoft's Scott Guthrie today announced a lot of new features in SQL Azure (this is called SQL Azure Managed Instance, which is currently in preview), which are expected to be available in SQL Azure in the coming months. They are listed below:
1. SQL Agent
2. SQL Profiler
3. SQL CLR
4. Service Broker
5. Log shipping, transactional replication
6. Native backup/restore
7. Additional DMVs and XEvents
8. Cross-database querying
References:
https://youtu.be/0uT46lpjeQE?t=1415
Today I tried a new tool suggested by Microsoft called Azure Data Studio.
In this tool you can download an extension called Profiler, and it seems to work just as expected.
You can use the Query Store feature; look here for more details: http://azure.microsoft.com/blog/2015/06/08/query-store-a-flight-data-recorder-for-your-database/
The closest thing to SQL Profiler that I found working in Azure SQL is SQL Workload Profiler.
However, note that it is a beta version of a tool created by a single person, and it is not too convenient to use.
SQL Azure offers the following features to tune performance, profile queries in its own way, identify long-running queries, and much more:
Intelligent Performance
Performance overview
Performance recommendations
Query Performance Insight
Automatic tuning
