Get only X latest log items from SVNKit - svnkit

I want to know the latest revisions of an SVN tag. In my case, not the current one, but the one before it (or, said differently, the revision from which the tag was created).
I have this code that gets me what I want:
Collection<SVNLogEntry> log = repository.log(new String[]{""}, null, 1, -1, false, false);
List<Long> revisions = log.stream().map(entry -> entry.getRevision()).sorted().collect(Collectors.toList());
System.out.println("Wanted Revision:" + revisions.get(revisions.size()-2));
However, with a big SVN repository it is just too slow, because I retrieve a lot of revisions I don't need. I can hardly set the startRevision because I have no way to determine it programmatically...
I would be happy if I could get just the X newest entries from the log; that way it would perform well and I could get the information I need. But I can find no way to do such a thing, even though Eclipse manages to limit the history while using SVNKit, so it must be possible to get the last N revisions without explicitly setting a startRevision.
What's the big secret I'm not understanding about the SVNKit API?
I am using the following Gradle dependency:
compile group: 'org.tmatesoft.svnkit', name: 'svnkit', version: '1.8.12'
UPDATE:
I tried the following approach:
final List<SVNLogEntry> logs = new ArrayList<>();
repository.log(new String[]{""}, 1, -1, false, false, 15 /* limit */, false, null,
    new ISVNLogEntryHandler()
    {
        @Override
        public void handleLogEntry(SVNLogEntry logEntry) throws SVNException
        {
            logs.add(logEntry);
        }
    });
However, this still does the filtering in Java (the handler could also be written as a lambda), not in the repository, so it performs just the same as the sample above; on top of that, it returns the first 15 results, not the last ones...

I am thinking there is no clean solution to this problem (though I do not know the SVN internals)...
What I found that does work is the following:
SVNDirEntry info = repository.info("", -1);
Collection<SVNLogEntry> log = repository.log(new String[]{""}, null, info.getRevision() - 500, -1, false, false);
List<Long> revisions = log.stream().map(entry -> entry.getRevision()).sorted().collect(Collectors.toList());
revision = revisions.get(revisions.size()-2);
This way it does not retrieve as many results (only the last 500 revisions). For my case maybe 10 or 50 would do, but I want to be sure... I will still refactor it so that if info.getRevision() is less than 500 it falls back to revision 1.
I am still not happy with this, because I should be validating the size of the log and, if it contains fewer than 2 entries, fetching another 500; but I will live with this potential bug for now (it's a tool for my own use, not something I will deploy)... Dealing with such edge cases is just too much coding for the purpose...
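For reference, a minimal sketch of a possibly cleaner approach (an untested assumption on my side): pass the revision range in descending order, HEAD down to 1, together with a limit, which should make the repository return only the newest entries, analogous to "svn log -r HEAD:1 --limit N". Here repository is the already opened SVNRepository from the snippets above.
final List<SVNLogEntry> latest = new ArrayList<>();
repository.log(new String[]{""},
        -1,   // startRevision: -1 is interpreted as HEAD
        1,    // endRevision: walk backwards towards revision 1
        false, false,
        2,    // limit: only the two newest revisions are needed here
        false, null,
        logEntry -> latest.add(logEntry));
// if the descending order is honoured, latest.get(0) is HEAD and
// latest.get(1) is the revision the tag was created from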

Related

How to request all chats/groups of the user through Telegram Database Library (TDLib) for Node.js

The official example from Telegram explains that in order to use the getChats() command, one needs to set two parameters: 'offset_order' and 'offset_chat_id'.
I'm using this node.js wrapper for the TDLib.
So when I use getChats() with the following params:
'offset_order': '9223372036854775807',
'offset_chat_id': 0,
'limit': 100
just like it is explained in the official docs:
For example, to get a list of chats from the beginning, the
offset_order should be equal to 2^63 - 1
as a result I get 100 chats from the top of the user's list.
What I can't understand is how to iterate through that list. How do I use the API's pagination?
When I try to pass a legitimate chat_id from the middle of the first 100, I still get the same first 100, so it seems to make no difference.
If I change that offset_order to ANY other number, I get an empty list of chats in return...
Completely lost here, as every single example I found says the same thing as the official docs, i.e. how to get the first 100.
I had the same problem, tried different approaches, and re-read the documentation for a long time, and here is the solution:
1. Do getChats as you do, with the '9223372036854775807' offset_order parameter.
2. Do a getChat request with the id of the last chat you got. It's an offline request, handled locally by TDLib.
3. There you get a chat object with a positions property; take the position from it, which looks like this:
positions: [
    {
        _: 'chatPosition',
        list: [Object],
        order: '6910658003385450706',
        is_pinned: false
    }
],
4. For the next getChats request, use positions[0].order from (3) as offset_order.
5. Go to (2) if there are more chats.
It wasn't easy to figure this out, so I'd be glad if it helps anybody who came here from Google like me :)
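Putting the steps together, a rough sketch with the tdl-style client.invoke API might look like this (untested; the exact getChats parameters differ between TDLib versions, and using the last chat's id as offset_chat_id is an assumption on my part):
async function getAllChats(client) {
    const allChatIds = [];
    let offsetOrder = '9223372036854775807'; // 2^63 - 1, start from the top of the list
    let offsetChatId = 0;

    while (true) {
        // step 1 / step 4: fetch the next page of (up to) 100 chats
        const { chat_ids } = await client.invoke({
            _: 'getChats',
            offset_order: offsetOrder,
            offset_chat_id: offsetChatId,
            limit: 100
        });
        if (!chat_ids.length) break; // step 5: nothing left
        allChatIds.push(...chat_ids);

        // steps 2 and 3: offline getChat for the last chat of the page, read its position
        const lastChatId = chat_ids[chat_ids.length - 1];
        const lastChat = await client.invoke({ _: 'getChat', chat_id: lastChatId });
        offsetOrder = lastChat.positions[0].order;
        offsetChatId = lastChatId; // assumption, see note above
    }
    return allChatIds;
}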

Documentdb performance when using pagination

I have working pagination code which performs great with Azure Search and SQL, but when using it with DocumentDB it takes up to 60 seconds to load.
We believe it's a latency issue, but I can't find a workaround to speed it up.
Any documentation or ideas on where to start looking?
public PagedList(IQueryable<T> superset, int pageNumber, int pageSize, string sortExpression = null)
{
    if (pageNumber < 1)
        throw new ArgumentOutOfRangeException("pageNumber", pageNumber, "PageNumber cannot be below 1.");
    if (pageSize < 1)
        throw new ArgumentOutOfRangeException("pageSize", pageSize, "PageSize cannot be less than 1.");

    // set source to blank list if superset is null to prevent exceptions
    TotalItemCount = superset == null ? 0 : superset.Count();

    if (superset != null && TotalItemCount > 0)
    {
        Subset.AddRange(pageNumber == 1
            ? superset.Skip(0).Take(pageSize).ToList()
            : superset.Skip((pageNumber - 1) * pageSize).Take(pageSize).ToList()
        );
    }
}
While the LINQ provider for DocumentDB translates .Take() into a "TOP" SQL clause under certain circumstances, DocumentDB has no equivalent for Skip. So, I'm a little surprised it works at all but I suspect that the provider is rerunning the query from scratch to simulate Skip. In the comments here is a discussion led by a DocumentDB product manager on why they chose not to implement SKIP. tl;dr; It doesn't scale for NoSQL databases. I can confirm this with MongoDB (which does have a skip functionality). Later pages simply scan and throw away earlier documents. The later in the list you go, the slower it gets. I suspect that the LINQ implementation is doing something similar except client-side.
DocumentDB does have a mechanism for getting documents in chunks but it works a bit differently than SKIP. It uses a continuation token. You can even set a maxPageSize, however there is no guarantee that you'll get that number back.
I recommend that you implement a client-side cache of your own and use a fairly large maxPageSize. Let's say each page in your UI is 10 rows and your cache currently has 27 rows in it. If the user selects page 1 or page 2, you have enough rows to render the result from the data already cached. If the user selects page 7, then you know that you need at least 70 rows in your cache. Use the last continuation token to get more until you have at least 70 rows in your cache, and then render rows 61-70. On the plus side, continuation tokens are long lived, so you can use them later based upon user input.
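As a rough illustration of that approach (a sketch against the older DocumentDB .NET SDK, Microsoft.Azure.DocumentDB; the database, collection and document type names are placeholders), reading one chunk and keeping the continuation token for the next one could look like this:
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

public static class ChunkReader
{
    // Returns up to maxPageSize documents plus the token needed to continue later.
    public static async Task<(List<T> Items, string NextToken)> ReadChunkAsync<T>(
        DocumentClient client, string database, string collection,
        int maxPageSize, string continuationToken)
    {
        var options = new FeedOptions
        {
            MaxItemCount = maxPageSize,              // an upper bound, not a guarantee
            RequestContinuation = continuationToken  // null for the first chunk
        };

        var query = client.CreateDocumentQuery<T>(
                UriFactory.CreateDocumentCollectionUri(database, collection), options)
            .AsDocumentQuery();

        var items = new List<T>();
        string nextToken = null;

        if (query.HasMoreResults)
        {
            var response = await query.ExecuteNextAsync<T>();
            items.AddRange(response);
            nextToken = response.ResponseContinuation; // long lived, can be stored for later use
        }

        return (items, nextToken);
    }
}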

CouchDB: Single document vs "joining" documents together

I'm trying to decide the best approach for a CouchApp (no middleware). Since there are similarities to my idea, let's assume we have a Stack Overflow page stored in CouchDB. In essence it consists of the actual question on top, the answers, and the comments. Those are basically three layers.
There are two ways of storing it: either within a single document containing a suitable JSON representation of the data, or with each part of the entry stored in a separate document and combined later through a view (similar to this: http://www.cmlenz.net/archives/2007/10/couchdb-joins).
Now, both approaches may be fine, yet both have massive downsides from my current point of view. Storing a busy document (many changes by multiple users are expected) as a single entity would cause conflicts. If user A stores his/her changes to the document, user B would receive a conflict error once he/she has finished typing his/her update. I imagine it's possible to fix this without the user's knowledge by re-downloading the document before retrying.
But what if the document is rather big? I expect them to become rather bloated over time, which would put a noticeable delay on the save process, especially if the retry has to happen multiple times because many users are updating the document at the same time.
Another problem I'd see is editing. Every user should be allowed to edit his/her contributions. Now, if they're stored within one document it might be hard to write a solid auth handler.
Ok, now lets look at the multiple documents approach. Question, Answers and Comments would be stored within their own documents. Advantage: only the actual owner of the document can cause conflicts, something that won't happen too often. Being rather small elements of the whole, redownloading wouldn't take much time. Furthermore the auth routine should be quite easy to realize.
Now here's the downside. The single document is really easy to query and display. Having a lot of unsorted snippets lying around seems messy, since I haven't managed to get a view to present me with a ready-to-use JSON object containing the entire item in an ordered and structured format.
I hope I've been able to communicate the actual problem. I'm trying to decide which solution is more suitable for me and which problems are easier to overcome. I imagine the first solution to be the prettier one in terms of storage and querying, yet the second the more practical one, solvable through better key management within the view (I haven't fully grasped the principle of view keys yet).
Thank you very much for your help in advance :)
Go with your second option. It's much easier than having to deal with the conflicts. Here are some example docs showing how I might structure the data:
{
    _id: 12345,
    type: 'question',
    slug: 'couchdb-single-document-vs-joining-documents-together',
    markdown: 'I\'m trying to decide the best approach for a CouchApp (no middleware). Since there are similarities to...',
    user: 'roman-geber',
    date: 1322150148041,
    'jquery.couch.attachPrevRev': true
}

{
    _id: 23456,
    type: 'answer',
    question: 12345,
    markdown: 'Go with your second option...',
    user: 'ryan-ramage',
    votes: 100,
    date: 1322151148041,
    'jquery.couch.attachPrevRev': true
}

{
    _id: 45678,
    type: 'comment',
    question: 12345,
    answer: 23456,
    markdown: 'I really like what you have said, but...',
    user: 'somedude',
    date: 1322151158041,
    'jquery.couch.attachPrevRev': true
}
To store revisions of each one, I would store the old versions as attachments on the doc being edited. If you use the jQuery client for CouchDB, you get this for free by setting jquery.couch.attachPrevRev = true. See Versioning docs in CouchDB by jchris.
Create a view like this:
fullQuestion: {
    map: function(doc) {
        if (doc.type == 'question') emit([doc._id, null, null], null);
        if (doc.type == 'answer') emit([doc.question, doc._id, null], null);
        if (doc.type == 'comment') emit([doc.question, doc.answer, doc._id], null);
    }
}
And query the view like this
http://localhost:5984/so/_design/app/_view/fullQuestion?startkey=['12345']&endkey=['12345',{},{}]&include_docs=true
(Note: I have not url encoded this query, but it is more readable)
This will get you all of the related documents for the question that you will need to build the page. The only thing is that they will not be sorted by date. You can sort them on the client side (in JavaScript).
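For example, with include_docs=true each row carries the full doc, so (assuming the numeric date field from the sample docs above) a client-side sort could simply be:
results.rows.sort(function(a, b) { return a.doc.date - b.doc.date; });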
EDIT: Here is an alternative option for the view and query
Based on your domain, you know some facts. You know an answer can't exist before its question existed, and a comment on an answer can't exist before the answer existed. So let's make a view that might make it faster to create the display page, respecting the order of things:
fullQuestion: {
    map: function(doc) {
        if (doc.type == 'question') emit([doc._id, doc.date], null);
        if (doc.type == 'answer') emit([doc.question, doc.date], null);
        if (doc.type == 'comment') emit([doc.question, doc.date], null);
    }
}
This will keep all the related docs together, and keep them ordered by date. Here is a sample query
http://localhost:5984/so/_design/app/_view/fullQuestion?startkey=['12345']&endkey=['12345',{}]&include_docs=true
This will get back all the docs you will need, ordered from oldest to newest. You can now zip through the results, knowing that the parent objects will be before the child ones, like this:
function addAnswer(doc) {
    $('.answers').append(answerTemplate(doc));
}

function addCommentToAnswer(doc) {
    $('#' + doc.answer).append(commentTemplate(doc));
}

$.each(results.rows, function(i, row) {
    if (row.doc.type == 'question') displayQuestionInfo(row.doc);
    if (row.doc.type == 'answer') addAnswer(row.doc);
    if (row.doc.type == 'comment') addCommentToAnswer(row.doc);
});
So then you don't have to perform any client-side sorting.
Hope this helps.

Creating a pagination index in CouchDB?

I'm trying to create a pagination index view in CouchDB that lists the doc._id for every Nth document found.
I wrote the following map function, but the pageIndex variable doesn't reliably start at 1 - in fact it seems to change arbitrarily depending on the emitted value or the index length (e.g. 50, 55, 10, 25 - all start with a different file, though I seem to get the correct number of files emitted).
function(doc) {
    if (doc.type == 'log') {
        if (!pageIndex || pageIndex > 50) {
            pageIndex = 1;
            emit(doc.timestamp, null);
        }
        pageIndex++;
    }
}
What am I doing wrong here? How would a CouchDB expert build this view?
Note that I don't want to use the "startkey + count + 1" method that's been mentioned elsewhere, since I'd like to be able to jump to a particular page or the last page (user expectations and all), I'd like to have a friendly "?page=5" URI instead of "?startkey=348ca1829328edefe3c5b38b3a1f36d1e988084b", and I'd rather CouchDB did this work instead of bulking up my application, if I can help it.
Thanks!
View functions (map and reduce) are purely functional. Side-effects such as setting a global variable are not supported. (When you move your application to BigCouch, how could multiple independent servers with arbitrary subsets of the data know what pageIndex is?)
Therefore the answer will have to involve a traditional map function, perhaps keyed by timestamp.
function(doc) {
    if (doc.type == 'log') {
        emit(doc.timestamp, null);
    }
}
How can you get every 50th document? The simplest way is to add a skip=0, skip=50, or skip=100 parameter. However, that is not ideal (see below).
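For example (with hypothetical database and view names), the third page of 50 rows would be fetched with:
http://localhost:5984/db/_design/app/_view/logs?limit=50&skip=100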
A way to pre-fetch the exact IDs of every 50th document is a _list function which only outputs every 50th row. (In practice you could use Mustache.JS or another template library to build HTML.)
function() {
    var ddoc = this,
        pageIndex = 0,
        row,
        first = true;
    send("[");
    while (row = getRow()) {
        if (pageIndex % 50 == 0) {
            // separate array elements with commas so the output stays valid JSON
            if (!first) send(",");
            send(JSON.stringify(row));
            first = false;
        }
        pageIndex += 1;
    }
    send("]");
}
This will work for many situations; however, it is not perfect. Here are some considerations to keep in mind, not necessarily showstoppers, but it depends on your specific situation.
There is a reason the pretty URLs are discouraged. What does it mean if I load page 1, then a bunch of documents are inserted within the first 50, and then I click to page 2? If the data is changing a lot, there is no perfect user experience, the user must somehow feel the data changing.
The skip parameter and example _list function have the same problem: they do not scale. With skip you are still touching every row in the view starting from the beginning: finding it in the database file, reading it from disk, and then ignoring it, over and over, row by row, until you hit the skip value. For small values that's quite convenient but since you are grouping pages into sets of 50, I have to imagine that you will have thousands or more rows. That could make page views slow as the database is spinning its wheels most of the time.
The _list example has a similar problem, however you front-load all the work, running through the entire view from start to finish, and (presumably) sending the relevant document IDs to the client so it can quickly jump around the pages. But with hundreds of thousands of documents (you call them "log" so I assume you will have a ton) that will be an extremely slow query which is not cached.
In summary, for small data sets, you can get away with the page=1, page=2 form however you will bump into problems as your data set gets big. With the release of BigCouch, CouchDB is even better for log storage and analysis so (if that is what you are doing) you will definitely want to consider how high to scale.

How about running SubSonic versions 2 and 3 together?

Today I compiled SubSonic 3 with a different namespace (as subsonic3) and placed it alongside SubSonic version 2.
I also generated all the files for my database (for both versions, in different DLLs) successfully and made some simple tests, and it seems that everything works perfectly.
This is my simple test code, and you can see version 3 and version 2 running together.
var PageTitlesOn3 = from p in ATH_Store_Product.All()
                    where p.ProductID == 1
                    select p;

foreach (ATH_Store_Product Ena in PageTitlesOn3)
{
    txtOnMe.Text += string.Format("<br />{0} ) {1}",
        Ena.ProductID, Ena.ProductName);
}

AthStoreProductCollection AllMyTitlesAgainOn2 = new AthStoreProductCollection()
    .Where(AthStoreProduct.Columns.ProductID, 1)
    .Load();

foreach (AthStoreProduct Ena in AllMyTitlesAgainOn2)
{
    txtOnMe.Text += string.Format("<br />{0} ) {1}",
        Ena.ProductID, Ena.ProductName);
}
I did that because I would like to move to SubSonic 3, but I need to do it one piece at a time: some part of the code, then another part, and so on.
My project is rather large and runs online, and I do not want to take it down.
This way, with SubSonic 2 and 3 working together, I can start migrating my code over time.
Now my question is whether I am making a big mistake: are they going to crash after running together all the time, am I eating up a lot of resources, etc.?
I have looked at the source code of SubSonic, but I am not that familiar (yet) with what it really does, how it opens the database, and so on...
Now, my second question is: why not rename SubSonic v3 permanently to subsonic3, so that other users can do the same thing I did?
Thank you in advance.
I can't see anything wrong here; after all, it is as if you have two DALs, one of which will tend to disappear in favor of the other.
