Repeated 503 messages when querying DBpedia - dbpedia

I'm running a series of queries against the DBpedia SPARQL endpoint (from inside a loop). The code looks more or less like this:
for (String citySplit : citiesSplit) {
    RepositoryConnection conn = dbpediaEndpoint.getConnection();
    String sparqlQueryLat = " SELECT ?lat ?lon WHERE { "
            + "<http://dbpedia.org/resource/" + citySplit.trim().replaceAll(" ", "_") + "> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat . "
            + "<http://dbpedia.org/resource/" + citySplit.trim().replaceAll(" ", "_") + "> <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?lon ."
            + "}";
    TupleQuery queryLat = conn.prepareTupleQuery(QueryLanguage.SPARQL, sparqlQueryLat);
    TupleQueryResult resultLat = queryLat.evaluate();
}
The problem is that, after a few iterations, I get a 503 message:
httpclient.wire.header - << "HTTP/1.1 503 Service Temporarily Unavailable[\r][\n]"
(...)
org.openrdf.query.QueryInterruptedException
at org.openrdf.http.client.HTTPClient.getTupleQueryResult(HTTPClient.java:1041)
at org.openrdf.http.client.HTTPClient.sendTupleQuery(HTTPClient.java:438)
at org.openrdf.http.client.HTTPClient.sendTupleQuery(HTTPClient.java:413)
at org.openrdf.repository.http.HTTPTupleQuery.evaluate(HTTPTupleQuery.java:41)
If I understand correctly, this 503 message is from DBpedia. Am I right?
The number of consecutive queries that manage to succeed is variable. Sometimes it runs for 13 seconds before getting the message, sometimes 15 minutes.
In any case, I don't think this is normal.
What could be happening?

The Accessing the DBpedia Data Set over the Web page of the DBpedia wiki says, in section 1.1, Public SPARQL Endpoint:
Fair Use Policy: Please read this post for information about restrictions on the public DBpedia endpoint. These might also be usefull [sic]: 1, 2.
The linked post says that the public DBpedia SPARQL endpoint implements rate limiting.
The http://dbpedia.org/sparql endpoint has both rate limiting on the number of connections/sec you can make, as well as restrictions on resultset and query time, as per the following settings:
[SPARQL]
ResultSetMaxRows = 2000
MaxQueryExecutionTime = 120
MaxQueryCostEstimationTime = 1500
These are in place to make sure that everyone has an equal chance to de-reference data from dbpedia.org, as well as to guard against badly written queries/robots.
I think it is likely that you are hitting one of those limits.
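If that is the case, the usual mitigation is to pace your queries and back off when a 503 does come through. Below is a minimal sketch of such a retry helper; it reuses the openrdf classes from the question, but the retry count, the delay values and the queryWithRetry name are illustrative assumptions, not anything mandated by DBpedia:
// Hypothetical helper: run a query, backing off exponentially when the
// endpoint rejects us (e.g. with a 503, surfaced as an OpenRDFException).
private TupleQueryResult queryWithRetry(RepositoryConnection conn, String sparql)
        throws InterruptedException {
    long delayMs = 1000;                                   // arbitrary initial back-off
    for (int attempt = 0; attempt < 5; attempt++) {
        try {
            TupleQuery query = conn.prepareTupleQuery(QueryLanguage.SPARQL, sparql);
            return query.evaluate();
        } catch (OpenRDFException e) {
            Thread.sleep(delayMs);                         // wait before trying again
            delayMs *= 2;                                  // double the wait each time
        }
    }
    return null;                                           // give up after 5 attempts
}
Independently of the retry, it is worth sleeping briefly between loop iterations, reusing a single RepositoryConnection instead of opening one per city, and closing each TupleQueryResult, so that you stay under the connections-per-second limit.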

Related

What instrumentation code should be added to register each HTTP event in MeterRegistry with a specific tag & minute value? Event requests are in the millions

I need to analyse one HTTP event value which should not be greater than 30 minutes, and 95% of events should fall into this bucket. If that condition fails, an alert should be sent.
My first concern is to get the right metrics into /actuator/prometheus.
Steps I took:
In every HTTP request event, I get one integer value called eventMinute.
Using Micrometer's MeterRegistry, I tried the code below:
// MeterRegistry meterRegistry ...
meterRegistry.summary("MINUTES_ANALYSIS", tags);
where the tag is EVENT_MINUTE, which receives some integer value in each HTTP event.
But this way it floods the metrics, because there are millions of events.
Please guide me; I am a beginner at this. Thanks!
The simplest solution (which I would recommend you start with) would be to just create 2 counters:
int theThing = getTheThing();   // however you obtain the event's minute value
if (theThing > 30) {
    meterRegistry.counter("my.request.counter.abovethreshold").increment();
}
meterRegistry.counter("my.request.counter.total").increment();
You would increment the counter that matches your threshold and another that tracks all requests (or reuse another meter that does that for you).
Then it is simple to set up a chart or alarm, alerting when fewer than 95% of requests stay within the threshold:
1 - (my_request_counter_abovethreshold / my_request_counter_total) < .95
(I didn't test the code. It might need a tiny bit of tweaking)
You'll be able to do a similar thing with a DistributionSummary by setting service level objectives (I'm not familiar enough with them to offer one), but start with something simple first; if that is sufficient, you won't need the extra complexity.
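For completeness, a minimal sketch of what that DistributionSummary variant might look like; the metric name event.minutes is made up, the 30-minute boundary and the eventMinute value come from the question, and serviceLevelObjectives(double...) assumes a reasonably recent Micrometer (1.5+):
// One SLO boundary at 30 (minutes): Prometheus then exposes a bucket for
// values <= 30 and one for +Inf, which is enough for the 95% check.
DistributionSummary eventMinutes = DistributionSummary.builder("event.minutes")
        .baseUnit("minutes")
        .serviceLevelObjectives(30.0)
        .register(meterRegistry);

eventMinutes.record(eventMinute);   // call once per HTTP event
The alert is then the ratio of the le="30.0" bucket to the +Inf bucket, analogous to the two-counter expression above.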
There are a few ways to solve this problem.
1. Here is a function which receives tags, a metric name, and a value:
public void createOrUpdateHistogram(String metricName, Map<String, String> stringTags, double numericValue)
{
    // convert the plain string map into Micrometer tags
    List<Tag> tags = stringTags.entrySet().stream()
            .map(e -> Tag.of(e.getKey(), e.getValue()))
            .collect(Collectors.toList());

    DistributionSummary.builder(metricName)
            .tags(tags)
            // can enforce an SLO here if required
            .publishPercentileHistogram()
            .minimumExpectedValue(1.0D)    // choose this based on how you want your distribution
            .maximumExpectedValue(30.0D)
            .register(this.meterRegistry)
            .record(numericValue);
}
Then it produces metrics like:
delta_bucket{mode="CURRENT",le="30.0",} 11.0
delta_bucket{mode="CURRENT", le="+Inf",} 11.0
Since the +Inf bucket also counts the values at or below the threshold, subtract the le="30.0" count from the le="+Inf" count to get the number of events above 30.
Another way could be:
public void createOrUpdateHistogram(String metricName, Map<String, String> stringTags, double numericValue)
{
    List<Tag> tags = stringTags.entrySet().stream()
            .map(e -> Tag.of(e.getKey(), e.getValue()))
            .collect(Collectors.toList());

    Timer.builder(metricName)
            .tags(tags)
            .publishPercentiles(0.5D, 0.95D)
            .publishPercentileHistogram()
            .serviceLevelObjectives(Duration.ofMinutes(30L))
            .minimumExpectedValue(Duration.ofMinutes(30L))
            .maximumExpectedValue(Duration.ofMinutes(30L))
            .register(this.meterRegistry)
            .record((long) numericValue, TimeUnit.MINUTES);
}
It will only have two le buckets, the given duration and +Inf.
That can be changed based on your requirements, and it also gives you quantiles.
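As a hypothetical usage example of the first helper (the metric name delta and the mode tag are taken from the sample output above; eventMinute is the per-event value from the question):
// Record one HTTP event's minute value against the histogram helper above.
Map<String, String> tags = Map.of("mode", "CURRENT");
createOrUpdateHistogram("delta", tags, eventMinute);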

Prepared statement for updating a map column in camel-cassandraql is failing

I got this exception: "no viable alternative at input '?'". I feel it is because of the "+" in the query statement.
private static final String CQL_BEAN = "cql:bean:cassandraCluster";
String updataQuery = "UPDATE user_preference SET preference = preference + ? WHERE user_id = ? AND tenant_id = ? IF EXISTS";
.to(CQL_BEAN + "/" + cassandraProperties.getKeyspaceName() + "?cql=" + this.updataQuery + "&prepareStatements=false")
Update: it could be because you are using prepareStatements=false; it looks like in that case placeholders won't be substituted, although I'm not an expert in this integration.
....
What do you want to achieve with this syntax? Update only records that were inserted previously?
Usually LWTs (lightweight transactions) are used only in a very limited number of situations, as they require coordination between nodes in the cluster and seriously degrade performance. You can find more details on LWTs in the documentation.
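If you want to rule out the CQL itself, one option is to run the same statement as a prepared statement directly against the DataStax Java driver, outside of Camel. A minimal sketch, assuming a driver 3.x Cluster/Session, a map<text, text> preference column and text keys; the contact point, keyspace and values are placeholders:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

import java.util.Collections;
import java.util.Map;

public class PreferenceUpdateCheck {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {
            // Same CQL as in the route; the map-append "preference + ?" is valid CQL,
            // the parser only complains when the '?' is sent without being prepared.
            PreparedStatement ps = session.prepare(
                "UPDATE user_preference SET preference = preference + ? "
                + "WHERE user_id = ? AND tenant_id = ? IF EXISTS");

            Map<String, String> newPrefs = Collections.singletonMap("theme", "dark");
            session.execute(ps.bind(newPrefs, "user-1", "tenant-1"));
        }
    }
}
If that works, the query itself is fine and the problem is the unprepared path in the Camel endpoint; keeping prepareStatements at its default and supplying the bind values through the message body would be the direction to investigate.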

How to retrieve all documents in couchdb database without causing out of memory

I have a CouchDB database which contains about 200,000 tweets; the keys are tweet IDs. I have a query which needs to retrieve all documents to look for some information. I'm using LightCouch to work with CouchDB in a Java web app. If I run a query like this:
List<JsonObject> tweets = dbClient.view("_all_docs").query(JsonObject.class);
and then loop through tweets and, for each JsonObject, use
JsonObject tweetJson = dbClient.find(JsonObject.class, tweet.get("id").toString().replaceAll("\"", ""));
to retrieve each tweet one by one, it takes an extremely long time for 200,000 documents. If I load all documents in a single query using includeDocs(true):
List<JsonObject> allTweets = dbClient.view("_all_docs").includeDocs(true).query(JsonObject.class);
it causes an OutOfMemory exception, since the number of documents is too large. So how can I deal with this problem? I'm thinking about using limit(5000) to retrieve 5000 documents at a time and loop through the whole database, but I don't know how to write the loop so that it continues with the next 5000 after the first 5000 docs. One possible solution is using startKey and endKey, but I'm confused about how to use them when the key is a tweet ID.
Use queryPage, but make sure to use a String as the key.
See: https://github.com/lightcouch/LightCouch/issues/26#event-122327174
0.1.6 still seems to show this behaviour.
A workaround that I found for this goes something like this:
String since = "0";                      // last sequence we have processed so far
int size = 1;
while (size > 0) {
    ChangesResult resultSet = dbClient.changes()
            .since(since)                // continue from the previous batch's last sequence
            .includeDocs(true)
            .limit(40000)
            .getChanges();
    List<ChangesResult.Row> rowList = resultSet.getResults();
    for (ChangesResult.Row feed : rowList) {
        JsonObject doc = feed.getDoc();
        // instantiate your object from doc via gson
    }
    since = resultSet.getLastSeq();      // advance the cursor
    size = rowList.size();
}
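For completeness, if queryPage from the first answer does work in your LightCouch version, the paging loop over _all_docs might look roughly like this; the 5000 page size, the null first-page parameter and the Page accessors are assumptions based on LightCouch's paging API, not verified against the exact 0.1.x version in question:
// Walk _all_docs 5000 rows at a time using LightCouch's paging support.
String param = null;                              // null asks for the first page
while (true) {
    Page<JsonObject> page = dbClient.view("_all_docs")
            .includeDocs(true)
            .queryPage(5000, param, JsonObject.class);

    for (JsonObject tweet : page.getResultList()) {
        // process each tweet document here
    }

    if (!page.isHasNext()) {
        break;                                    // no more pages
    }
    param = page.getNextParam();                  // token for the next page
}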

Batch job: number of records retrievable by a subquery

I'm relatively new to Apex, but I have some questions about a batch job that I am creating. I want to make a query with a subquery (please see the code). Every Portal__c can have more than 200 Exporte__r records.
global Database.QueryLocator start(Database.BatchableContext BC) {
    String query = 'SELECT Id, Name, (SELECT Id FROM Exporte__r) FROM Portal__c';
    return Database.getQueryLocator(query);
}

global void execute(Database.BatchableContext BC, List<Portal__c> scope) {
    for (Portal__c portal : scope) {
        // doesn't work -> First error: Aggregate query has too many rows for direct assignment, use FOR loop
        // when using FOR loop -> System.QueryException: invalid query locator
        //List<Export__c> relatedExports = portal.Exporte__r;

        // grab all the related Export__c records using 'getSObjects' to avoid the errors described above
        Export__c[] relatedExports = portal.getSObjects('Exporte__r');
        if (relatedExports != null) {
            for (Export__c exp : relatedExports) {
                // do something
            }
        }
    }
}
I have the following questions:
If I use List<Export__c> relatedExports = portal.Exporte__r (which I commented out) to get the subquery records, I receive the error message: "Aggregate query has too many rows for direct assignment, use FOR loop". The error message makes no sense to me, as the SOQL query has already been executed. Is there any explanation?
With the solution above, the maximum number of Exporte__r records received per Portal__c from the subquery is 199, although some Portal__c records have more than 200. Why is it limited to that number? It seems all records above 199 are ignored in this case.
Is there any possibility to receive more than 199 records from a subquery? I have tried changing the batch size, but it seems to be independent of the number of records retrievable by the subquery. Any idea?
Many thanks!
As per the Salesforce doc http://www.salesforce.com/us/developer/docs/apexcode/Content/langCon_apex_loops_for_SOQL.htm:
You might get a QueryException in a SOQL for loop with the message
Aggregate query has too many rows for direct assignment, use FOR loop.
This exception is sometimes thrown when accessing a large set of child
records of a retrieved sObject inside the loop, or when getting the
size of such a record set. To avoid getting this exception, use a for
loop to iterate over the child records, as follows.
Integer count = 0;
for (Contact c : returnedAccount.Contacts) {
    count++;
    // Do some other processing
}

CRM 2011: when updating a field in the account entity there is no change

Well, my problem is the following. I created an entity, which I call new_logpuntossegmentacin, that has a 1 to ∞ relation with account. When I register the plugin on the Create message, I expect the following code to fill out the field puntosacumulados, but nothing happens:
cli is an Account from a List
total is a Decimal
total = total1 + total2 + total3 + total4 + total5 + total6;
cli.new_puntosacumulados.Insert(i, total.ToString());
svcContext.UpdateObject(cli);
svcContext.SaveChanges();
i++;
if (!String.IsNullOrEmpty(total.ToString()))
{
tracingService.Trace("Response = {0}", total.ToString());
}
tracingService.Trace("Done.");
A couple of questions which may give a little more context:
1) When you say nothing happens, do you mean that the value is not updated in the database, or that it doesn't appear updated on the form? If the latter, it may be a matter of when the plug-in is firing (pre vs post).
2) Could you perhaps post the rest of the method, as it may be useful to get some context on some of the other parameters, e.g. what is "i" iterating over here?
Thanks
