How can I search for code fragments on github.com? When I search for MSG_PREPARE in the repository ErikZalm/Marlin, GitHub shows nothing.
I'm using the repository code search syntax described on https://github.com/search with
repo:ErikZalm/Marlin MSG_PREPARE
No results, but MSG_PREPARE can be found inside this repository here. Am I missing something? Is there no code search on github.com?
At the time of writing this answer, about 8 years after this question was asked, GitHub search has come a long way, though it still does not go as far as you are looking for.
GitHub code search is subject to the following rules: https://docs.github.com/en/github/searching-for-information-on-github/searching-code . Quoting:
Code in forks is only searchable if the fork has more stars than the parent repository.
Forks with fewer stars than the parent repository are not indexed for code search.
To include forks with more stars than their parent in the search results, you will need to add fork:true or fork:only to your query.
For more information, see "Searching in forks."
So we can search within forks using the fork:true qualifier. As expected, though, ErikZalm/Marlin has fewer stars than its parent MarlinFirmware/Marlin, so the code in the fork is still not indexed, and advanced search returns nothing useful apart from a match on the repository itself.
However, if you run the same search on the parent, it does show matches in the code. Here are the matches for MSG_PREPARE in the parent repo MarlinFirmware/Marlin
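To reproduce both searches without retyping them, here is a small Python sketch that builds the github.com code search URLs, with and without the fork:true qualifier quoted above. The function name code_search_url is hypothetical, and the type=code parameter (which selects the Code results tab) is an assumption about the web search URL format. The qualifier only widens what is searched; it cannot make an unindexed fork show up.

```python
from urllib.parse import urlencode


def code_search_url(term, repo, include_forks=False):
    """Build a github.com code search URL scoped to one repository.

    include_forks adds the fork:true qualifier; a fork is still only
    indexed if it has more stars than its parent.
    """
    query = f"repo:{repo} {term}"
    if include_forks:
        query += " fork:true"
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'.
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})
```

For example, code_search_url("MSG_PREPARE", "ErikZalm/Marlin", include_forks=True) gives the fork search, and code_search_url("MSG_PREPARE", "MarlinFirmware/Marlin") gives the search on the parent.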
Fortunately, one company I know of that works in this area is Sourcegraph: https://about.sourcegraph.com/
With Sourcegraph, you can easily run the search you intended:
Here are the matches for MSG_PREPARE in ErikZalm/Marlin using Sourcegraph Cloud
Update July 2013: "Preview the new Search API"
The GitHub code search API now supports fragments through text-match metadata.
Some API consumers will want to highlight the matching search terms when displaying search results. The API offers additional metadata to support this use case. To get this metadata in your search results, specify the text-match media type in your Accept header. For example, via curl, the above query would look like this:
curl -H 'Accept: application/vnd.github.preview.text-match+json' \
  'https://api.github.com/search/code?q=octokit+in:file+extension:gemspec+-repo:octokit/octokit.rb&sort=indexed'
This produces the same JSON payload as above, with an extra key called text_matches, an array of objects. These objects provide information such as the position of your search terms within the text, as well as the property that included the search term.
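The same request can be made from Python, with the fragments pulled out of text_matches. This is a minimal sketch, not an official client. It assumes a personal access token, since current GitHub documentation says the code search endpoint requires authentication. It also uses the media type application/vnd.github.text-match+json, which is the current name for the 2013 preview type shown in the curl command. search_code and text_fragments are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Current media type; the 2013 preview used
# application/vnd.github.preview.text-match+json.
TEXT_MATCH = "application/vnd.github.text-match+json"


def search_code(query, token):
    """Run a code search via the REST API, asking for text-match metadata."""
    url = "https://api.github.com/search/code?" + urlencode({"q": query})
    request = urllib.request.Request(url, headers={
        "Accept": TEXT_MATCH,
        "Authorization": f"token {token}",  # assumed: search needs a token
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def text_fragments(search_response):
    """Yield (path, fragment, matched terms) for each text_matches entry."""
    for item in search_response.get("items", []):
        for match in item.get("text_matches", []):
            terms = [m["text"] for m in match.get("matches", [])]
            yield item.get("path"), match.get("fragment"), terms
```

Usage: for path, fragment, terms in text_fragments(search_code("MSG_PREPARE repo:MarlinFirmware/Marlin", token)): print(path, terms). Keeping the parsing separate from the HTTP call lets you test it against a saved response.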
Original answer (November 2012)
I don't think you missed anything.
If you search for SdFile, you will find results in .pde files, but none in .cpp files such as this SdFile.cpp file.
The search was introduced 4 years ago (November 2008), but, as mentioned in "Search a github repository for the file defining a given function", GitHub repository code is simply not fully indexed.
Related
I used to use Bugzilla and its very powerful search engine.
But the project and its bug tracker have been moved to GitLab.
When I search the project's issues (on the online GitLab) for all issues whose title includes an item like "./", ".*." (Kronecker product), or "//" (one-line comments), no issue is returned, while many issues matching the query actually exist! I tried "\.\*\." and other variations, with no more success.
What should be the query syntax to return the right list?
When I query "operator" (with the double quotes, for exact matching), the quotes disappear when I submit the query, and I get a list of issues whose title includes operand, operation, oper, etc. How can I get only the issues that exactly match "operator"?
Is it possible to filter issues by a title matching a regular expression?
All this (and much more) was possible and very useful with Bugzilla. For the time being, I am quite handicapped and lose a lot of time when searching for things in the project on GitLab.
Thanks for any hints.
I would like to retrieve users who have repositories containing a README file whose text matches a string passed in the query. Is this possible using the GitHub API?
In addition, I would like to include location and language in the query.
Thanks.
This is not straightforward with the current API. However, you can still use the API to get what you want.
Be warned that there are over 10 million repositories on GitHub, so this will take a long time. You can only retrieve 100 repositories per request, so you need pagination: more than 100,000 requests just to list all the repositories. A user is limited to 5,000 requests per hour; once that is used up, you have to wait until the hourly limit resets. With a single user's credentials, listing alone therefore takes more than 20 hours, and fetching the READMEs adds at least one more request per repository.
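As a rough estimate, the figures above can be turned into a back-of-envelope calculation in Python. It uses the answer's numbers (10 million repositories, 100 per page, 5,000 requests per hour) and assumes exactly one README request per repository. It counts hourly rate-limit windows only and ignores latency and retries.

```python
import math


def hours_needed(requests, per_hour=5000):
    """Minimum number of hourly rate-limit windows for a request count."""
    return math.ceil(requests / per_hour)


repos = 10_000_000                                # "over 10 million"
page_size = 100                                   # repositories per list request
listing_requests = math.ceil(repos / page_size)   # pages to list everything
readme_requests = repos                           # assumed: one /readme call each

listing_hours = hours_needed(listing_requests)
total_hours = hours_needed(listing_requests + readme_requests)
```

With these inputs, listing takes 20 hourly windows, but listing plus READMEs takes 2,020. So the README step dominates, which is why the caching and parallelism suggestions below matter.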
Steps:
1. Get the JSON with all the repositories (https://developer.github.com/v3/repos/#list-all-public-repositories).
2. Use pagination to fetch 100 objects per request (https://developer.github.com/v3/#link-header).
3. Decode the JSON and retrieve the list of repositories.
4. For each repository, get the url property from the JSON, which gives you the API link to the repository.
5. Now you need to get the README contents. There are two ways:
a) Use the GitHub API: take the repository URL and send a GET request for https://api.github.com/repos/:owner/:repo/readme (https://developer.github.com/v3/repos/contents/#get-the-readme). Then either decode the file (its content is Base64-encoded) or follow the html property of the JSON, e.g. "html": "https://github.com/pengwynn/octokit/blob/master/README.md". If there is no README, you get a 404 Not Found response, so you can simply move on to the next repository.
b) Build the URL for the README yourself from the URL in step 4, e.g. https://api.github.com/repos/octocat/Hello-World, by parsing it and transforming it into https://github.com/octocat/Hello-World/blob/<branch>/README.md. However, this is more complicated: you need the branch and the README's exact file name, and you have to handle the case where there is no README.
6. Search the file for your specific text, and record whether you found it.
7. Repeat until you have gone through all the repositories.
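The steps above can be sketched in Python using only the standard library. It follows option a): it calls the /readme endpoint and treats a 404 as "no README". It takes an optional token and stops after max_pages pages, so you can try it without using up your whole quota. find_readme_matches, readme_text, next_page_url and get_json are hypothetical helper names, not part of any GitHub library.

```python
import base64
import json
import re
import urllib.error
import urllib.request

API_ROOT = "https://api.github.com"


def next_page_url(link_header):
    """Return the rel="next" URL from a GitHub Link header, or None."""
    if not link_header:
        return None
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None


def readme_text(readme_json):
    """Decode the Base64 'content' field of a README response."""
    # b64decode discards the newlines GitHub embeds in 'content'.
    return base64.b64decode(readme_json["content"]).decode("utf-8", errors="replace")


def get_json(url, token=None):
    """GET a URL and return (decoded JSON, Link header)."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        request.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response), response.headers.get("Link")


def find_readme_matches(needle, token=None, max_pages=1):
    """Yield (full_name, owner_login) for repositories whose README contains needle."""
    url = f"{API_ROOT}/repositories"          # list all public repositories
    for _ in range(max_pages):                # follow pagination page by page
        if url is None:
            break
        repos, link = get_json(url, token)
        for repo in repos:
            try:                              # repository API URL + /readme
                readme, _ = get_json(repo["url"] + "/readme", token)
            except urllib.error.HTTPError as err:
                if err.code == 404:           # no README: skip this repository
                    continue
                raise
            if needle in readme_text(readme):
                yield repo["full_name"], repo["owner"]["login"]
        url = next_page_url(link)
```

Usage: for full_name, owner in find_readme_matches("Marlin", token=my_token, max_pages=2): print(owner, full_name). Filtering by the owner's location or the repository's language would need extra requests to the user and repository endpoints, which this sketch does not make.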
Advanced: if you plan to run this regularly, I strongly recommend caching with conditional requests (https://developer.github.com/v3/#conditional-requests). You basically store the date and time of your query (or the ETag returned with the response) and send it later to check whether anything in the repository has changed. This eliminates many of your subsequent requests if you need up-to-date information. You will still have to retrieve the whole list of repositories, but you then only repeat the search for repositories that have changed.
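A minimal sketch of that caching idea in Python, assuming an in-memory dict as the cache. It stores each response's ETag and Last-Modified headers and sends them back as If-None-Match and If-Modified-Since. A 304 Not Modified reply means the cached body is still valid. conditional_headers and fetch_if_changed are hypothetical names; urllib reports the 304 as an HTTPError, which is why it is caught.

```python
import urllib.error
import urllib.request


def conditional_headers(cached):
    """Build conditional request headers from a cached entry (may be empty)."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def fetch_if_changed(url, cache):
    """Return (body, changed); reuse the cached body on 304 Not Modified."""
    cached = cache.get(url, {})
    request = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
            cache[url] = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
                "body": body,
            }
            return body, True
    except urllib.error.HTTPError as err:
        if err.code == 304:       # unchanged since the cached copy
            return cached["body"], False
        raise
```

Only repositories for which fetch_if_changed returns changed=True need their README searched again.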
Of course, to make it faster you can parallelize this algorithm: retrieve 100 repositories, then move on to retrieving the next 100, and in the meantime check whether the first 100 repositories contain a README file and whether that README has what you are looking for, and so on. You will need some sort of buffer, as you do not know which part finishes faster (getting the repository list or searching through it).
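This producer/consumer arrangement can be sketched in Python with a bounded queue.Queue as the buffer. One thread walks the pages of repositories, and several worker threads run the README check. list_pages and readme_matches are hypothetical callables you would supply, for example the pagination loop and README lookup from the steps above. Threads suit this because the work is mostly waiting on HTTP.

```python
import queue
import threading

_DONE = object()  # sentinel telling a worker to stop


def search_in_parallel(list_pages, readme_matches, workers=4, buffer_size=200):
    """list_pages: iterable of repository lists; readme_matches: repo -> bool."""
    buffer = queue.Queue(maxsize=buffer_size)  # blocks the producer when full
    hits = []
    lock = threading.Lock()

    def producer():
        for page in list_pages:
            for repo in page:
                buffer.put(repo)
        for _ in range(workers):
            buffer.put(_DONE)                  # one stop signal per worker

    def consumer():
        while True:
            repo = buffer.get()
            if repo is _DONE:
                return
            if readme_matches(repo):
                with lock:
                    hits.append(repo)

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return hits
```

The bounded buffer means a fast listing step waits for the workers instead of holding millions of repositories in memory. Results arrive in completion order, so sort them if order matters.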
Hope it helps.
I'm not sure if this is the right forum for this question, but I saw quite a few Q&As related to searching on GitHub, so I'm posting it here.
E.g. Search code inside a GitHub project
GitHub advanced search allows terms like stars:>100, but the query term only matches repository names. Is it possible to search for a term inside the files (code) and sort by stars? My aim is to see which popular repos use a particular keyword in their code. It would be very useful if GitHub's advanced search options for repositories worked for code too.
It is not possible to sort by stars when searching inside code.
From an e-mail from github.com support:
Code search does not support sorting by number of stars, but I will definitely add your +1 to that suggestion internally! I can't say if or when a change will happen, but your feedback is in the right hands.
You may want to vote for this feature: Add Stars count filter.
The feedback is related to this announcement: Improving GitHub code search
I'm developing a web app that needs to download the HTML from a website, then iterate through the code and find a specific but ever-changing value (in our case, the price of the product).
For this, I was thinking of asking the user (upon installation and setup) to provide the system with a few lines of HTML from the page that contains the price; from then on, every time we need to fetch the price, we would search for those lines and extract the price.
Now, I believe this is a horrible and slow way of doing it, but since there are no rules and the HTML can be totally different from one website to another (even the same website might change), I couldn't find a better way.
One improvement I thought about was to scan the whole page the first time and record the line at which we find the code. Subsequent searches would then start a few lines before the expected location. Any thoughts on how I can improve on this?
I posted this question on https://cstheory.stackexchange.com/ but they commented that it's not on topic and that I should post it here.
I have the code for the above and if needed I can post it, I'm simply thinking that there must be a better, faster way of doing this.
This is actually something I tried for a project recently (using BeautifulSoup and Python). The solution that worked for me was to work out CSS selectors (which can map to jQuery selectors) that targeted the elements containing the values I was looking for. In my case I was able to narrow the full document down to just the elements containing what I wanted. If you can't get exactly what you are after, you could combine this with some extra logic, such as testing whether the text looks like a price (via a regex) or checking what it sits next to.
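The answer used BeautifulSoup's CSS selectors. To keep this sketch dependency-free, it uses the standard library's html.parser instead and matches a single class name rather than a full selector, then applies a regex "does this look like a price" check. The class name you pass and the PRICE_RE pattern are assumptions you would adapt per site. A forgiving parser like BeautifulSoup copes better with badly nested real-world HTML than this simple depth counter.

```python
import re
from html.parser import HTMLParser

# Hypothetical price pattern: currency symbol, digits, optional thousands
# separators and an optional two-digit fraction.
PRICE_RE = re.compile(r"[$€£]\s?\d{1,3}(?:[,.]\d{3})*(?:[.,]\d{2})?")

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text inside elements whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0          # > 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1     # nested element inside a match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        pass                    # self-closing tags hold no text

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def find_prices(html, target_class):
    """Return price-like strings found inside elements with target_class."""
    parser = ClassTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return PRICE_RE.findall(" ".join(parser.chunks))
```

Targeting an element rather than recorded line numbers keeps working when the page layout shifts around it. The regex acts as a sanity check when the selector matches more than the price.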
When using the Advanced Database Crawler for searching in Sitecore, is it possible to combine a FieldValueSearchParam with a NumericRangeSearchParam?
For example, I would like to search for all items with a price between 100 and 200 (NumericRangeSearchParam) and in category t-shirts (FieldValueSearchParam).
I can add refinements using RelatedIds and TemplateIds, but that is not enough, as I need to check whether the value is in a specific field using:
refinements.Add("category", id);
Yes, all types of search parameters can be combined in one query with the new version of the ADC, v2.
Here are some links to get you started:
SVN source code for the v2 branch (the latest version)
A video by the author, Alex Shyba, on tools he has been working on. At one point in the video he specifically demos the features of the v2 code base and how the code works; for example, he combines different search params and uses logical operators like AND and OR with them.
Here's a direct link to a demo page (and code behind) in the above referenced source code that shows combining several types of search together. You should use this as a reference to the video example above.