Based on my limited searching, it seems GraphQL only supports equality filtering. So, is it possible to do a GitHub GraphQL search with the following filtering conditions?
stars > 10
forks > 3
total commits >= 5
total issues >= 1
open issues <= 60
size > 2k
score > 5
last update is within a year
That is, filtering with all of the above conditions at once. Is it possible?
When querying for repositories, you can apply a filter only for some of the fields in your list:
number of stars
number of forks
size
last update
Although you cannot specify them in the search filter, you can include the other fields in your query and verify the values in the client application (see the sketch after the query below):
total number of issues
number of open issues
While, in theory, you can also query for the number of commits and apply your specific thresholds, that query returns a server error; it most probably times out. For that reason, those lines are commented out in the query below.
Here's the GraphQL query:
query {
search(
type:REPOSITORY,
query: """
stars:>10
forks:>3
size:>2000
pushed:>=2018-08-08
""",
last: 100
) {
repos: edges {
repo: node {
... on Repository {
url
allIssues: issues {
totalCount
}
openIssues: issues(states:OPEN) {
totalCount
}
# commitsCount: object(expression: "master") {
# ... on Commit {
# history {
# totalCount
# }
# }
# }
}
}
}
}
}
The specification for repository queries can be found here: https://help.github.com/en/articles/searching-for-repositories#search-by-repository-size
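For the fields that cannot be filtered in the search query itself (total issues, open issues), a minimal client-side check is sketched below in JavaScript; it assumes the parsed JSON response of the query above is held in a variable named body (the name is just for illustration):
// Keep only repositories with at least 1 issue in total and at most 60 open issues,
// matching the thresholds from the question; body is the parsed GraphQL response.
const repos = body.data.search.repos
  .map(edge => edge.repo)
  .filter(repo => repo.allIssues.totalCount >= 1 && repo.openIssues.totalCount <= 60);
console.log(repos.map(repo => repo.url));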
This is not an answer but an update of what I've collected so far.
According to "Select * for Github GraphQL Search", not all of the above criteria may be available on the Repository edge. Namely, "total commits", "open issues" and "score" might not be available.
The purpose of the question is obviously to find the valuable repositories and weed out the lower-quality ones. I've collected all the available fields that might be helpful for such an assessment here.
A copy of it as of 2018-03-18:
query SearchMostTop10Star($queryString: String!, $number_of_repos:Int!) {
search(query: $queryString, type: REPOSITORY, first: $number_of_repos) {
repositoryCount
edges {
node {
... on Repository {
name
url
description
# shortDescriptionHTML
repositoryTopics(first: 12) {nodes {topic {name}}}
primaryLanguage {name}
languages(first: 3) { nodes {name} }
releases {totalCount}
forkCount
pullRequests {totalCount}
stargazers {totalCount}
issues {totalCount}
createdAt
pushedAt
updatedAt
}
}
}
}
}
variables {
"queryString": "language:JavaScript stars:>10000",
"number_of_repos": 3
}
Anyone can try it out here.
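If you want to run it outside the explorer, here is a rough Node.js sketch (assuming Node 18+ for the built-in fetch, and a personal access token in the GITHUB_TOKEN environment variable; the query is abbreviated to a few fields):
// Post the query with its variables to GitHub's GraphQL endpoint.
const query = `query SearchMostTop10Star($queryString: String!, $number_of_repos: Int!) {
  search(query: $queryString, type: REPOSITORY, first: $number_of_repos) {
    repositoryCount
    edges { node { ... on Repository { name url stargazers { totalCount } } } }
  }
}`;
const variables = { queryString: "language:JavaScript stars:>10000", number_of_repos: 3 };

fetch("https://api.github.com/graphql", {
  method: "POST",
  headers: {
    "Authorization": `bearer ${process.env.GITHUB_TOKEN}`,
    "Content-Type": "application/json",
    "User-Agent": "graphql-example" // GitHub requires a User-Agent header
  },
  body: JSON.stringify({ query, variables })
})
  .then(res => res.json())
  .then(json => console.log(JSON.stringify(json, null, 2)));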
I have a requirement to find the count of comments in all open pull requests of a repository.
The only way I know is to get all open pull requests of the repo, iterate over each pull request, and perform a call like this:
GET /repos/:owner/:repo/pulls/:pull_number/comments
and sum up those responses, but that is too costly.
I also tried this method (Find review comments on all PRs in a repo):
GET /repos/:owner/:repo/pulls/comments
and passed state=open as a query param, like this:
https://api.github.com/repos/angular/angular/pulls/comments?per_page=30&state=open
But it returns the review comments of all pull requests, not just the open ones.
Any help will be appreciated.
Using the GitHub REST API v3, you can use a search query like this:
https://api.github.com/search/issues?q=is:pr%20state:open%20repo:angular/angular&per_page=100
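Each item in the search response carries a comments count (the issue-style comments on the PR conversation, not review comments), so you can total them client-side. A rough JavaScript sketch, ignoring pagination for brevity:
// Sum the comments field over the items returned by the search URL above.
const url = "https://api.github.com/search/issues?q=is:pr%20state:open%20repo:angular/angular&per_page=100";
fetch(url, { headers: { "User-Agent": "pr-comments-example" } }) // GitHub requires a User-Agent
  .then(res => res.json())
  .then(result => {
    const total = result.items.reduce((sum, item) => sum + item.comments, 0);
    console.log(`Issue comments on ${result.items.length} open PRs: ${total}`);
  });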
You can use the GraphQL API v4 with the following query:
{
repository(owner: "angular", name: "angular") {
pullRequests(states: OPEN, first: 100) {
nodes {
title
comments {
totalCount
}
}
}
}
}
Try it in the explorer
Or using a search query like this:
{
search(type: ISSUE, query: "is:pr state:open repo:angular/angular", first: 100) {
nodes {
... on PullRequest {
title
comments {
totalCount
}
}
}
}
}
Try it in the explorer
If you want the review count and the review comments as well, you could use:
{
search(type: ISSUE, query: "is:pr state:open repo:angular/angular", first: 100) {
nodes {
... on PullRequest {
title
comments {
totalCount
}
reviews(first: 100) {
totalCount
nodes {
comments {
totalCount
}
}
}
}
}
}
}
Try it in the explorer
Using curl:
repo_owner=angular
repo_name=angular
token=YOUR_TOKEN
curl -s -H "Authorization: bearer $token" -d '
{
"query": "query {repository(owner: \"'$repo_owner'\", name: \"'$repo_name'\") {pullRequests(states: OPEN, first: 100) {nodes {title comments {totalCount}}}}}"
}
' https://api.github.com/graphql
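Whichever variant you use, the per-pull-request counts still have to be summed on your side. A small JavaScript sketch for the first GraphQL query (repository → pullRequests), assuming its parsed response is in a variable named response:
// Total up comments.totalCount across the open PRs returned by the repository query.
const totalComments = response.data.repository.pullRequests.nodes
  .reduce((sum, pr) => sum + pr.comments.totalCount, 0);
console.log(`Total comments on open PRs: ${totalComments}`);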
I want to implement a capsule that does a calculation if the user provides the full input necessary for the calculation, or asks the user for the necessary input if the user doesn't provide the full input with the very first request. Everything works if the user provides the full request. If the user doesn't provide the full request and Bixby needs more information, I run into some strange behavior where the Calculation is called more than once and, judging by the debug graph, Bixby takes the necessary information for the Calculation from the result of another Calculation.
To demonstrate my problem more easily, I've extended the dice sample capsule (capsule-sample-dice) and added numSides and numDice to the RollResultConcept, so that I can access the number of dice and sides in the result.
RollResult.model.bxb now looks like this:
structure (RollResultConcept) {
description (The result object produced by the RollDice action.)
property (sum) {
type (SumConcept)
min (Required)
max (One)
}
property (roll) {
description (The list of results for each dice roll.)
type (RollConcept)
min (Required)
max (Many)
}
// The two properties below have been added
property (numSides) {
description (The number of sides that the dice of this roll have.)
type (NumSidesConcept)
min (Required)
max (One)
}
property (numDice) {
description (The number of dice in this roll.)
type (NumDiceConcept)
min (Required)
max (One)
}
}
I've also added single-lines in RollResult.view.bxb so that the number of sides and dice are shown to the user after a roll.
RollResult.view.bxb:
result-view {
match {
RollResultConcept (rollResult)
}
render {
layout {
section {
content {
single-line {
text {
style (Detail_M)
value ("Sum: #{value(rollResult.sum)}")
}
}
single-line {
text {
style (Detail_M)
value ("Rolls: #{value(rollResult.roll)}")
}
}
// The two single-line below have been added
single-line {
text {
style (Detail_M)
value ("Dice: #{value(rollResult.numDice)}")
}
}
single-line {
text {
style (Detail_M)
value ("Sides: #{value(rollResult.numSides)}")
}
}
}
}
}
}
}
Edit: I forgot to add the code that I changed in RollDice.js, see below:
RollDice.js
// RollDice
// Rolls a dice given a number of sides and a number of dice
// Main entry point
module.exports.function = function rollDice(numDice, numSides) {
var sum = 0;
var result = [];
for (var i = 0; i < numDice; i++) {
var roll = Math.ceil(Math.random() * numSides);
result.push(roll);
sum += roll;
}
// RollResult
return {
sum: sum, // required Sum
roll: result, // required list Roll
numSides: numSides, // required for numSides
numDice: numDice // required for numDice
}
}
End Edit
In the Simulator I now run the following query
intent {
goal: RollDice
value: NumDiceConcept(2)
}
which is missing the required NumSidesConcept.
Debug view shows the following graph, with NumSidesConcept missing (as expected).
I now run the following query in the simulator
intent {
goal: RollDice
value: NumDiceConcept(2)
value: NumSidesConcept(6)
}
which results in the following graph in the Debug view:
and it looks to me like the Calculation is being done twice in order to get to the result. I've already tried giving the models the transient feature, but that didn't change anything. Can anybody tell me what's happening here? Am I not allowed to use the same primitive models in an output, because they will be used by Bixby when trying to execute an action?
I tried modifying the code as you have but was unable to run the intent (successfully).
BEGIN EDIT
I added the additional lines in RollDice.js and was able to see the plan that you are seeing.
The reason for the double execution is that you ran the intents consecutively: Bixby derived the value of the NumSidesConcept that you did NOT specify in the first intent from the second intent, and then executed the first intent.
You can verify the above by providing a different set of values to NumSidesConcept and NumDiceConcept in each of the intents.
If you had left enough time between the two intents, the result would have been different. In your scenario, the first intent was waiting on a NumSidesConcept to be available, and as soon as the Planner found one (from the result of the second intent), the execution went through.
How can you avoid this? Make sure that you have an input-view for each of the inputs, so Bixby can prompt the user for any values that did not come through the NL (or Aligned NL); a rough sketch of such an input-view follows below.
END EDIT
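For illustration only, a rough sketch of what an input-view for NumSidesConcept might look like (the prompt text, element id and max-length are placeholders; check the dice sample and the input-view documentation for the exact syntax):
// Illustrative sketch: lets Bixby prompt the user for NumSidesConcept when it is missing.
input-view {
  match: NumSidesConcept (this) {
    to-input: RollDice
  }
  message ("How many sides should each die have?")
  render {
    form {
      elements {
        text-input {
          id (numSides)
          type (NumSidesConcept)
          max-length (3)
        }
      }
      on-submit {
        goal: NumSidesConcept
        value: viv.core.FormElement (numSides)
      }
    }
  }
}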
Here is another approach that will NOT require changing the RollResultConcept AND will work according to your expectations (of accessing the number of dice and sides in the result-view):
result-view {
match: RollResultConcept (rollResult) {
from-output: RollDice(action)
}
render {
layout {
section {
content {
single-line {
text {
style (Detail_M)
value ("Sum: #{value(rollResult.sum)}")
}
}
single-line {
text {
style (Detail_M)
value ("Rolls: #{value(rollResult.roll)}")
}
}
// The two single-line below have been added
single-line {
text {
style (Detail_M)
value ("Dice: #{value(action.numDice)}")
}
}
single-line {
text {
style (Detail_M)
value ("Sides: #{value(action.numSides)}")
}
}
}
}
}
}
}
Give it a shot and let us know if it works!
I am working with a Firebase database. I need to limit the length of a string field. How do I do that?
The path to the field is:
Col1/doc1/<any collection>/<any document>/description
That is, starting with the collection col1, then into doc1, then for all collections under doc1 and all documents under those collections, the description field needs to be limited to 100 characters.
Can someone please explain to me how to do this? Thanks
I tried to get the validation from the other answer to work but was unsuccessful. Firestore just doesn't like checking .length on resource data values.
I searched around and this article helped me:
https://fireship.io/snippets/firestore-rules-recipes/
rules_version = '2';
service cloud.firestore {
match /databases/{database}/documents {
match /posts/{postId} {
allow read: if true;
allow write: if request.auth != null
&& request.auth.uid == request.resource.data.uid
&& request.resource.data.body.size() > 0
&& request.resource.data.body.size() < 255
&& request.resource.data.title.size() > 0
&& request.resource.data.title.size() < 255;
}
}
}
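Adapting that size()-based pattern to the path in the question (every document in every subcollection directly under col1/doc1), a sketch might look like the following; the wildcard names are arbitrary, and you would add your own auth checks:
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // {subcollection} matches any collection directly under col1/doc1
    match /col1/doc1/{subcollection}/{docId} {
      allow write: if request.resource.data.description is string
                   && request.resource.data.description.size() <= 100;
    }
  }
}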
For Cloud Firestore you can validate that the description field is no longer than 100 characters with:
service cloud.firestore {
match /databases/{database}/documents {
match /col1/doc1 {
allow write: if resource.data.description.length <= 100;
match /subcollection1/{doc=**} {
allow write: if resource.data.description.length <= 100;
}
}
}
}
This applies to col1/doc1 and all documents in subcollection1. Note that these rules will not limit the length of the description, since security rules cannot modify the data that is written. Instead, the rules reject writes where the description is longer than 100 characters.
There is no way (that I know of) to apply rules to each subcollection of only one document. The closest I know is to apply it to all documents and their subcollections:
match /col1/{document=**} {
allow write: if resource.data.description.length <= 100;
}
This applies the validation to all documents in col1 and in all subcollections under that.
I have parquet data partitioned by date & hour, folder structure:
events_v3
-- event_date=2015-01-01
-- event_hour=2015-01-1
-- part10000.parquet.gz
-- event_date=2015-01-02
-- event_hour=5
-- part10000.parquet.gz
I have created a table raw_events via Spark, but when I try to query it, it scans all the directories for footers, and that slows down the initial query, even if I am querying only one day's worth of data.
query:
select * from raw_events where event_date='2016-01-01'
Similar problem: http://mail-archives.apache.org/mod_mbox/spark-user/201508.mbox/%3CCAAswR-7Qbd2tdLSsO76zyw9tvs-Njw2YVd36bRfCG3DKZrH0tw#mail.gmail.com%3E (but it's old)
Log:
App > 16/09/15 03:14:03 main INFO HadoopFsRelation: Listing leaf files and directories in parallel under: s3a://bucket/events_v3/
and then it spawns 350 tasks, since there are 350 days' worth of data.
I have disabled schema merging and have also specified the schema to read, so it should be able to go straight to the partition that I am looking at. Why does it list all the leaf files?
Listing leaf files with 2 executors takes 10 minutes, while the actual query execution takes only 20 seconds.
code sample:
val sparkSession = org.apache.spark.sql.SparkSession.builder.getOrCreate()
val df = sparkSession.read.option("mergeSchema","false").format("parquet").load("s3a://bucket/events_v3")
df.createOrReplaceTempView("temp_events")
sparkSession.sql(
"""
|select verb,count(*) from temp_events where event_date = "2016-01-01" group by verb
""".stripMargin).show()
As soon as Spark is given a directory to read from, it issues a call to listLeafFiles (org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala). This in turn calls fs.listStatus, which makes an API call to get the list of files and directories. For each directory this method is called again, and this happens recursively until no directories are left. By design this works well on HDFS, but it performs badly on S3, since each file listing is an RPC call. S3, on the other hand, supports getting all files by prefix, which is exactly what we need.
So, for example, if we had the above directory structure with 1 year's worth of data, a directory per hour and 10 subdirectories each, we would make 365 * 24 * 10 = 87,600 API calls. This can be reduced to about 138 API calls, given that there are only about 137,000 files and each S3 list call returns up to 1,000 keys.
Code:
org/apache/hadoop/fs/s3a/S3AFileSystem.java
public FileStatus[] listStatusRecursively(Path f) throws FileNotFoundException,
IOException {
String key = pathToKey(f);
if (LOG.isDebugEnabled()) {
LOG.debug("List status for path: " + f);
}
final List<FileStatus> result = new ArrayList<FileStatus>();
final FileStatus fileStatus = getFileStatus(f);
if (fileStatus.isDirectory()) {
if (!key.isEmpty()) {
key = key + "/";
}
ListObjectsRequest request = new ListObjectsRequest();
request.setBucketName(bucket);
request.setPrefix(key);
request.setMaxKeys(maxKeys);
if (LOG.isDebugEnabled()) {
LOG.debug("listStatus: doing listObjects for directory " + key);
}
ObjectListing objects = s3.listObjects(request);
statistics.incrementReadOps(1);
while (true) {
for (S3ObjectSummary summary : objects.getObjectSummaries()) {
Path keyPath = keyToPath(summary.getKey()).makeQualified(uri, workingDir);
// Skip over keys that are ourselves and old S3N _$folder$ files
if (keyPath.equals(f) || summary.getKey().endsWith(S3N_FOLDER_SUFFIX)) {
if (LOG.isDebugEnabled()) {
LOG.debug("Ignoring: " + keyPath);
}
continue;
}
if (objectRepresentsDirectory(summary.getKey(), summary.getSize())) {
result.add(new S3AFileStatus(true, true, keyPath));
if (LOG.isDebugEnabled()) {
LOG.debug("Adding: fd: " + keyPath);
}
} else {
result.add(new S3AFileStatus(summary.getSize(),
dateToLong(summary.getLastModified()), keyPath,
getDefaultBlockSize(f.makeQualified(uri, workingDir))));
if (LOG.isDebugEnabled()) {
LOG.debug("Adding: fi: " + keyPath);
}
}
}
for (String prefix : objects.getCommonPrefixes()) {
Path keyPath = keyToPath(prefix).makeQualified(uri, workingDir);
if (keyPath.equals(f)) {
continue;
}
result.add(new S3AFileStatus(true, false, keyPath));
if (LOG.isDebugEnabled()) {
LOG.debug("Adding: rd: " + keyPath);
}
}
if (objects.isTruncated()) {
if (LOG.isDebugEnabled()) {
LOG.debug("listStatus: list truncated - getting next batch");
}
objects = s3.listNextBatchOfObjects(objects);
statistics.incrementReadOps(1);
} else {
break;
}
}
} else {
if (LOG.isDebugEnabled()) {
LOG.debug("Adding: rd (not a dir): " + f);
}
result.add(fileStatus);
}
return result.toArray(new FileStatus[result.size()]);
}
/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala
def listLeafFiles(fs: FileSystem, status: FileStatus, filter: PathFilter): Array[FileStatus] = {
logTrace(s"Listing ${status.getPath}")
val name = status.getPath.getName.toLowerCase
if (shouldFilterOut(name)) {
Array.empty[FileStatus]
}
else {
val statuses = {
val stats = if(fs.isInstanceOf[S3AFileSystem]){
logWarning("Using Monkey patched version of list status")
println("Using Monkey patched version of list status")
val a = fs.asInstanceOf[S3AFileSystem].listStatusRecursively(status.getPath)
a
// Array.empty[FileStatus]
}
else{
val (dirs, files) = fs.listStatus(status.getPath).partition(_.isDirectory)
files ++ dirs.flatMap(dir => listLeafFiles(fs, dir, filter))
}
if (filter != null) stats.filter(f => filter.accept(f.getPath)) else stats
}
// statuses do not have any dirs.
statuses.filterNot(status => shouldFilterOut(status.getPath.getName)).map {
case f: LocatedFileStatus => f
// NOTE:
//
// - Although S3/S3A/S3N file system can be quite slow for remote file metadata
// operations, calling `getFileBlockLocations` does no harm here since these file system
// implementations don't actually issue RPC for this method.
//
// - Here we are calling `getFileBlockLocations` in a sequential manner, but it should not
// be a big deal since we always use to `listLeafFilesInParallel` when the number of
// paths exceeds threshold.
case f => createLocatedFileStatus(f, fs.getFileBlockLocations(f, 0, f.getLen))
}
}
}
To clarify Gaurav's answer, that code snippet is from Hadoop branch-2; it probably won't surface until Hadoop 2.9 (see HADOOP-13208), and someone needs to update Spark to use that feature (which won't harm code using HDFS, it just won't show any speedup there).
One thing to consider is what makes a good file layout for object stores:
Don't have deep directory trees with only a few files per directory
Do have shallow trees with many files
Consider using the first few characters of a file for the most changing value (such as day/hour), rather than the last. Why? Some object stores appear to use the leading characters for their hashing, not the trailing ones ... if you give your names more uniqueness then they get spread out over more servers, with better bandwidth/less risk of throttling.
If you are using the Hadoop 2.7 libraries, switch to s3a:// over s3n://. It's already faster, and getting better every week, at least in the ASF source tree.
Finally, Apache Hadoop, Apache Spark and related projects are all open source. Contributions are welcome. That's not just the code, it's documentation, testing, and, for this performance stuff, testing against your actual datasets. Even giving us details about what causes problems (and your dataset layouts) is interesting.
How would I execute a query equivalent to "select top 10" in couch db?
For example I have a "schema" like so:
title body modified
and I want to select the last 10 modified documents.
As an added bonus, it would be great if anyone can come up with a way to do the same but per category. So for:
title category body modified
return a list of the latest 10 documents in each category.
I am just wondering if such a query is possible in couchdb.
To get the first 10 documents from your db you can use the limit query option.
E.g. calling
http://localhost:5984/yourdb/_design/design_doc/_view/view_name?limit=10
You get the first 10 documents.
View rows are sorted by the key; adding descending=true to the querystring will reverse their order. You can also return only the rows you are interested in, again using the querystring to select the keys. Note that with descending=true the roles of startkey and endkey are swapped.
So in your view you write your map function like:
function(doc) {
emit([doc.category, doc.modified], doc);
}
And you query it like this (because of descending=true, the startkey is the upper bound and the endkey is the lower bound):
http://localhost:5984/yourdb/_design/design_doc/_view/view_name?startkey=["yourcategory", date_in_the_future]&endkey=["yourcategory"]&limit=10&descending=true
Here is what you need to do.
Map function
function(doc)
{
if (doc.category)
{
emit(['category', doc.category], doc.modified);
}
}
Then you need a list function that groups them. You might be tempted to abuse a reduce to do this, but it will probably throw errors about not reducing fast enough with large sets of data.
function(head, req)
{
// this sort function assumes that modified is a number
// and it sorts in descending order
function sortCategory(a,b) { b.value - a.value; }
var categories = {};
var category;
var id;
var row;
while (row = getRow())
{
if (!categories[row.key[0]])
{
categories[row.key[0]] = [];
}
categories[row.key[0]].push(row);
}
for (var cat in categories)
{
categories[cat].sort(sortCategory);
categories[cat] = categories[cat].slice(0,10);
}
send(toJSON(categories));
}
You can now get the top 10 of all categories with
http://localhost:5984/database/_design/doc/_list/top_ten/by_categories
and get the docs with
http://localhost:5984/database/_design/doc/_list/top_ten/by_categories?include_docs=true
Now you can query this with a multi-key POST to limit which categories are returned:
curl -X POST http://localhost:5984/database/_design/doc/_list/top_ten/by_categories -d '{"keys":[["category1"],["category2"],["category3"]]}'
You could also avoid hard-coding the 10 and instead pass the number in through the req variable, as sketched below.
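For example, inside the list function above you could read the limit from the querystring, falling back to 10 when it isn't given:
// e.g. ...?limit=5 — read the per-category limit from the request's query string
var limit = parseInt(req.query.limit, 10) || 10;
for (var cat in categories)
{
    categories[cat].sort(sortCategory);
    categories[cat] = categories[cat].slice(0, limit);
}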
Here is some more View/List trickery.
Slight correction: it was not sorting until I added the "return" keyword in your sortCategory function. It should be like this:
function sortCategory(a,b) { return b.value - a.value; }