How to get the number of clicks on a link from Wikipedia?

We now have code that scrapes the links in an article. We also need the number of clicks on each link. Can someone help?
So far we have this code:
String[] articles = {"Abdominal_pain"};
void setup() {
  for (int i = 0; i < articles.length; i++) {
    String article = articles[i];
    String start = "20160101"; // YYYYMMDD
    String end = "20170101"; // YYYYMMDD
    // documentation: https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/get_metrics_pageviews_per_article_project_access_agent_article_granularity_start_end
    // >> https://en.wikipedia.org/w/api.php?action=query&format=json&prop=links&meta=&titles=Albert+Einstein&pllimit=500
    String query = "https://en.wikipedia.org/w/api.php?action=query&format=json&prop=links&meta=&titles="+article+"&pllimit=500";
    String[] lines = loadStrings(query);
    for (int j = 0; j < lines.length; j++) {
      String line = lines[j];
      if (line.contains("\"title\":")) {
        println(line);
        // java string split
      }
    }
  }
}

The query you're using apparently gives you a bunch of articles that your main article "Abdominal_pain" links to.
You need to go a step further and loop through all of those links. You can make your life a lot easier by using JSONObjects instead of parsing Strings like you're currently doing. Check out the loadJSONObject() and loadJSONArray() functions for more info, but basically you'd do this:
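// The response is a JSON object; the links array sits at query.pages.<pageid>.links
JSONObject pages = loadJSONObject(query).getJSONObject("query").getJSONObject("pages");
for (Object pageId : pages.keys()) {
  JSONArray links = pages.getJSONObject((String) pageId).getJSONArray("links");
  for (int i = 0; i < links.size(); i++) {
    JSONObject link = links.getJSONObject(i);
    String title = link.getString("title");
    // fetch the info for that title
  }
}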
Once you have the title, you can then fetch the information for that page. An example query url is https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Abdominal_pain/daily/20151010/20151012 which returns this JSON:
{"items":[{"project":"en.wikipedia","article":"Abdominal_pain","granularity":"daily","timestamp":"2015101000","access":"all-access","agent":"all-agents","views":1134},{"project":"en.wikipedia","article":"Abdominal_pain","granularity":"daily","timestamp":"2015101100","access":"all-access","agent":"all-agents","views":1160},{"project":"en.wikipedia","article":"Abdominal_pain","granularity":"daily","timestamp":"2015101200","access":"all-access","agent":"all-agents","views":1313}]}
You'll have to do some aggregating to get the totals, or maybe the total is somewhere else in the API.
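As a rough illustration of that aggregation step, here is a minimal JavaScript sketch. It assumes the response has already been parsed into an object shaped like the sample above; the helper names buildPageviewsUrl and totalViews are made up for this example, and the URL pattern is simply copied from the example query.

```javascript
// Build the per-article pageviews URL (pattern copied from the example URL above).
// buildPageviewsUrl and totalViews are hypothetical helper names.
function buildPageviewsUrl(title, start, end) {
  // Spaces become underscores, then the title is URI-encoded.
  var article = encodeURIComponent(title.replace(/ /g, "_"));
  return "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/" +
    "en.wikipedia/all-access/all-agents/" + article + "/daily/" + start + "/" + end;
}

// Sum the daily "views" values of a parsed response.
function totalViews(response) {
  return response.items.reduce(function (sum, item) {
    return sum + item.views;
  }, 0);
}

// Trimmed copy of the sample response shown above (only the fields used here).
var sample = {
  items: [
    { timestamp: "2015101000", views: 1134 },
    { timestamp: "2015101100", views: 1160 },
    { timestamp: "2015101200", views: 1313 }
  ]
};

console.log(buildPageviewsUrl("Abdominal pain", "20151010", "20151012"));
console.log(totalViews(sample)); // 3607
```

Calling totalViews once per linked title would give one total per link, which could then be sorted or compared.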
You're going to have to do a little bit of research on exactly what the API can return. Reading through the documentation is a big part of programming. Luckily the Wikipedia API has great documentation, and that's where you should be looking.
I'd recommend trying something out and posting another question, along with an MCVE, if you get stuck. Good luck.
See also: How to use Wikipedia API to get the page view statistics of a particular page in wikipedia?

Inventory Assignment Sublist doesn't react to SuiteScript

I'm currently having some trouble with client-side scripting of an Inventory Detail subrecord on an Assembly Build. As you know, Assembly Builds have two Inventory Details. The one in the top right corner works as expected, and I can access every field with SuiteScript.
I'm having trouble with the bottom Inventory Detail though. I'm able to access it and use nlapiGetFieldValue() just fine. However, when I access the sublist, I am only able to look up the value of 'id'.
These are the fields that are supposed to exist, along with a less documented one called "receiptinventorynumber".
Here is my code:
//get the line items in the bottom inventory details
var bottom_line_items = [];
for(var line_index = 0; line_index < nlapiGetLineItemCount("component"); line_index++)
{
    var bottom_inv_detail = nlapiViewLineItemSubrecord("component", 'componentinventorydetail', line_index+1);
    if(bottom_inv_detail != null)
    {
        var bottom_line_count = bottom_inv_detail.getLineItemCount('inventoryassignment');
        for(var index =0; index < bottom_line_count; index++)
        {
            bottom_inv_detail.selectLineItem('inventoryassignment', index+1);
            var sn = bottom_inv_detail.getCurrentLineItemValue('inventoryassignment', 'receiptinventorynumber');
            bottom_line_items.push(sn);
        }
    }
}
console.log(bottom_line_items);
Here is the result of executing it in the browser console:
As you can see, 'id', and 'internalid' work. 'receiptinventorynumber' does not. Neither do any of the other fields.
Because of my use case, I cannot wait for the record to be saved on the server. I have to catch this client side. Any suggestions are appreciated.
It has been a long time since I have worked with Inventory Detail subrecord, but I think there is another field called 'assigninventorynumber'. Have you tried using that?
I was just able to answer my own question, but it did end up involving a server-side search. I'm pretty sure it works as I wanted it to, but I am still testing it. In essence, I grabbed the field 'issueinventorynumber', which held an id. That id was the internalid of an 'Inventory Serial Number' record, which I could then search for to get the actual number. Here's the resulting code:
//get the line items in the bottom inventory details
var bottom_line_ids = [];
for(var line_index = 0; line_index < nlapiGetLineItemCount("component"); line_index++)
{
    var bottom_inv_detail = nlapiViewLineItemSubrecord("component", 'componentinventorydetail', line_index+1);
    if(bottom_inv_detail != null)
    {
        var bottom_line_count = bottom_inv_detail.getLineItemCount('inventoryassignment');
        for(var index =0; index < bottom_line_count; index++)
        {
            bottom_inv_detail.selectLineItem('inventoryassignment', index+1);
            var sn = bottom_inv_detail.getCurrentLineItemValue('inventoryassignment', 'issueinventorynumber');
            bottom_line_ids.push(sn);
        }
    }
}
//do search to identify numbers of bottom serial numbers
var columns = [new nlobjSearchColumn('inventorynumber')];
var filters = [];
for(var index = 0; index < bottom_line_ids.length; index++)
{
    filters.push(['internalid', 'is', bottom_line_ids[index]]);
    filters.push('or');
}
//remove the last 'or'
if(filters.length > 0)
{
    filters.pop();
}
var search = nlapiCreateSearch('inventorynumber', filters, columns);
var results = search.runSearch().getResults(0,1000);
bottom_line_items = [];
if(results.length != bottom_line_ids.length)
{
    //if you get to this point, pop an error as the 'issueinventorynumber' we pulled is associated with multiple serial numbers
    //you can see which ones by doing a 'Inventory Serial Number' Saved Search
    //this is a serious problem, so we'd have to figure out what to do from there
}
for(var index = 0; index < results.length; index++)
{
    bottom_line_items.push(results[index].getValue('inventorynumber'));
}
console.log(bottom_line_items);
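The filter-building step above (push a term, push 'or', then pop the trailing 'or') could also be written so that the separator only ever goes between terms. This is a plain JavaScript sketch with no NetSuite calls in it; buildOrFilter is a hypothetical helper name and the ids are made-up sample values.

```javascript
// Build a filter expression of the form
// [ [field, 'is', id1], 'or', [field, 'is', id2], ... ] with no trailing 'or'.
// buildOrFilter is a hypothetical helper name.
function buildOrFilter(field, ids) {
  var filters = [];
  ids.forEach(function (id, position) {
    if (position > 0) {
      filters.push('or'); // separator goes between terms only
    }
    filters.push([field, 'is', id]);
  });
  return filters;
}

console.log(JSON.stringify(buildOrFilter('internalid', ['101', '102', '103'])));
```

The returned array has the same shape as the filters variable in the answer, so it should be usable in the same place, though that has not been checked against a live account.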

Distinct values in Azure Search Suggestions?

I am offloading the search feature of a relational database to Azure Search. My Products table contains columns such as serialNumber and partNumber (there can be multiple serialNumbers with the same partNumber).
I want to create a suggester that can autocomplete partNumbers. But in my scenario I am getting a lot of duplicates in the suggestions, because the partNumber match is found in multiple entries.
How can I solve this problem?
The Suggest API suggests documents, not queries. If you repeat the partNumber information for each serialNumber in your index and then suggest based on partNumber, you will get a result for each matching document. You can see this more clearly by including the key field in the $select parameter. Azure Search will eliminate duplicates within the same document, but not across documents. You will have to do that on the client side, or build a secondary index of partNumbers just for suggestions.
See this forum thread for a more in-depth discussion.
Also, feel free to vote on this UserVoice item to help us prioritize improvements to Suggestions.
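The client-side de-duplication mentioned above could look roughly like the following JavaScript sketch. It assumes the suggestions have already been fetched into an array; distinctSuggestions and the sample field values are hypothetical and not part of any Azure SDK.

```javascript
// Keep the first suggestion for each distinct key, up to `limit` entries.
// distinctSuggestions and the sample data are hypothetical.
function distinctSuggestions(results, keyOf, limit) {
  var seen = new Set();
  var unique = [];
  for (var i = 0; i < results.length && unique.length < limit; i++) {
    var key = keyOf(results[i]);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(results[i]);
    }
  }
  return unique;
}

var results = [
  { partNumber: "PN-100", serialNumber: "S1" },
  { partNumber: "PN-100", serialNumber: "S2" },
  { partNumber: "PN-200", serialNumber: "S3" },
  { partNumber: "PN-300", serialNumber: "S4" }
];
var firstTwo = distinctSuggestions(results, function (r) { return r.partNumber; }, 2);
console.log(firstTwo.map(function (r) { return r.partNumber; })); // [ 'PN-100', 'PN-200' ]
```

Because duplicates are dropped after the fetch, it helps to request more suggestions than you intend to show.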
I'm facing this problem myself. My solution does not involve a new index (that would only get messy and cost us money).
My take on this is a while-loop that adds each 'UserIdentity' already found (in your case, 'partNumber') to a filter and searches again until my take/top limit is met or no more suggestions exist:
public async Task<List<MachineSuggestionDTO>> SuggestMachineUser(string searchText, int take, string[] searchFields)
{
    var indexClientMachine = _searchServiceClient.Indexes.GetClient(INDEX_MACHINE);
    var suggestions = new List<MachineSuggestionDTO>();
    var sp = new SuggestParameters
    {
        UseFuzzyMatching = true,
        Top = 100 // Get maximum result for a chance to reduce search calls.
    };
    // Add search fields if set
    if (searchFields != null && searchFields.Count() != 0)
    {
        sp.SearchFields = searchFields;
    }
    // Loop until we have the desired amount of suggestions, or as many as exist if there are fewer.
    while (suggestions.Count < take)
    {
        if (!await DistinctSuggestMachineUser(searchText, take, searchFields, suggestions, indexClientMachine, sp))
        {
            // If no more suggestions are found, we break the while-loop
            break;
        }
    }
    // Since the list might be bigger than the take, we return a narrowed list
    return suggestions.Take(take).ToList();
}
private async Task<bool> DistinctSuggestMachineUser(string searchText, int take, string[] searchFields, List<MachineSuggestionDTO> suggestions, ISearchIndexClient indexClientMachine, SuggestParameters sp)
{
    var response = await indexClientMachine.Documents.SuggestAsync<MachineSearchDocument>(searchText, SUGGESTION_MACHINE, sp);
    if (response.Results.Count > 0)
    {
        // Fix filter if search is triggered once more
        if (!string.IsNullOrEmpty(sp.Filter))
        {
            sp.Filter += " and ";
        }
        foreach (var result in response.Results.DistinctBy(r => new { r.Document.UserIdentity, r.Document.UserName, r.Document.UserCode}).Take(take))
        {
            var d = result.Document;
            suggestions.Add(new MachineSuggestionDTO { Id = d.UserIdentity, Namn = d.UserNamn, Hkod = d.UserHkod, Intnr = d.UserIntnr });
            // Add found UserIdentity to filter
            sp.Filter += $"UserIdentity ne '{d.UserIdentity}' and ";
        }
        // Remove end of filter if it is run once more
        if (sp.Filter.EndsWith(" and "))
        {
            sp.Filter = sp.Filter.Substring(0, sp.Filter.LastIndexOf(" and ", StringComparison.Ordinal));
        }
    }
    // Returns false if no more suggestions are found
    return response.Results.Count > 0;
}
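The exclusion filter that the loop above grows step by step can also be built in one go from the list of values seen so far. The JavaScript sketch below only does the string handling; buildExclusionFilter is a hypothetical helper name, and the one addition to the original approach is that single quotes inside a value are doubled, which is how OData string literals escape them.

```javascript
// Build an OData filter that excludes values already suggested,
// e.g. "partNumber ne 'A' and partNumber ne 'B'".
// buildExclusionFilter is a hypothetical helper name.
function buildExclusionFilter(field, seenValues) {
  return seenValues
    .map(function (value) {
      // OData string literals escape a single quote by doubling it.
      return field + " ne '" + String(value).replace(/'/g, "''") + "'";
    })
    .join(" and ");
}

console.log(buildExclusionFilter("partNumber", ["A-1", "B'2"]));
```

Using join means there is no trailing " and " to strip afterwards, and an empty list simply yields an empty filter string.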
public async Task<List<string>> SuggestionsAsync(bool highlights, bool fuzzy, string term)
{
    SuggestParameters sp = new SuggestParameters()
    {
        UseFuzzyMatching = fuzzy,
        Top = 100
    };
    if (highlights)
    {
        sp.HighlightPreTag = "<em>";
        sp.HighlightPostTag = "</em>";
    }
    var suggestResult = await searchConfig.IndexClient.Documents.SuggestAsync(term, "mysuggestion", sp);
    // Convert the suggest query results to a list that can be displayed in the client.
    return suggestResult.Results.Select(x => x.Text).Distinct().Take(10).ToList();
}
After getting top 100 and using distinct it works for me.
You can use the Autocomplete API for that, which groups results by default. However, if you need more fields together with the result, such as the partNo plus a description, it doesn't support that. The partNo will be distinct, though.

Sitecore HOWTO: Search item bucket for items with specific values

I have an item bucket with more than 30,000 items inside. What I need is to quickly search for items that have a particular field set to a particular value, or even better, to run something like a SELECT WHERE fieldValue IN (1,2,3,4) statement. Are there any ready-made solutions?
I searched the web and the only thing I found is "Developer's Guide to Item Buckets and Search", but it has no code examples.
You need something like this. The Bucket item is an IIndexable so it can be searched using Sitecore 7 search API.
The code snippet below can easily be adapted to meet your needs; it's just a question of modifying the Where clause. If you need any further help with the Sitecore 7 syntax, just write a comment on the QuickStart blog post below and I'll get back to you.
var bucketItem = Sitecore.Context.Database.GetItem(bucketPath);
if (bucketItem != null && BucketManager.IsBucket(bucketItem))
{
    using (var searchContext = ContentSearchManager.GetIndex(bucketItem as IIndexable).CreateSearchContext())
    {
        var result = searchContext.GetQueryable<SearchResultItem>().Where(x => x.Name == itemName).FirstOrDefault();
        if (result != null)
            Context.Item = result.GetItem();
    }
}
Further reading on my blog post here:
http://coreblimey.azurewebsites.net/sitecore-7-search-quick-start-guide/
Using Sitecore Content Editor:
Go to the bucket item, then in the Search tab start typing the following (replace fieldname and value with the actual field name and value):
custom:fieldname|value
Then hit Enter and you will see the result of the query. You can run multiple queries at once if you want.
Using Sitecore Content Search API:
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.Linq.Utilities;
using Sitecore.ContentSearch.SearchTypes;
using Sitecore.Data;
using Sitecore.Data.Items;
ID bucketItemID = new ID("GUID of your bucket item");
ID templateID = new ID("Guid of your item's template under bucket");
string values = "1,2,3,4,5";
using (var context = ContentSearchManager.GetIndex("sitecore_web_index").CreateSearchContext())
{
    var predicate = PredicateBuilder.True<SearchResultItem>();
    predicate = predicate.And(item => item.TemplateId == templateID
        && item.Paths.Contains(bucketItemID));
    // Create the OR predicate once; re-creating it inside the loop would keep only the last value
    var innerPredicate = PredicateBuilder.False<SearchResultItem>();
    foreach (string val in values.Split(','))
    {
        string value = val; // local copy so each lambda captures its own value
        innerPredicate = innerPredicate.Or(item => item["FIELDNAME"] == value);
    }
    predicate = predicate.And(innerPredicate);
    // Run the combined predicate against the index
    var result = context.GetQueryable<SearchResultItem>().Where(predicate).GetResults();
    List<Item> ResultsItems = new List<Item>();
    foreach (var hit in result.Hits)
    {
        Item item = hit.Document.GetItem();
        if (item != null)
        {
            ResultsItems.Add(item);
        }
    }
}
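The IN (1,2,3,4) style of filter boils down to an OR over equality checks: start from a predicate that is always false and OR in one check per value. The sketch below shows that idea in plain JavaScript, outside of Sitecore; anyOf and the sample items are hypothetical and none of this is Sitecore API.

```javascript
// Compose an "IN (...)" test as an OR over equality checks.
// anyOf and the sample items are hypothetical; this is not Sitecore API.
function anyOf(fieldName, values) {
  // Start from "always false" and OR in one equality check per value.
  var predicate = function () { return false; };
  values.forEach(function (value) {
    var previous = predicate;
    predicate = function (item) {
      return previous(item) || item[fieldName] === value;
    };
  });
  return predicate;
}

var items = [
  { name: "a", category: "1" },
  { name: "b", category: "7" },
  { name: "c", category: "3" }
];
var matches = items.filter(anyOf("category", "1,2,3,4".split(",")));
console.log(matches.map(function (item) { return item.name; })); // [ 'a', 'c' ]
```

The starting predicate has to be created once, before the loop. Resetting it on every iteration would throw away all checks except the last one.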
The following links give a good start with the Search API:
http://www.fusionworkshop.co.uk/news-and-insight/tech-lab/sitecore-7-search-a-quickstart-guide#.VPw8AC4kWnI
https://www.sitecore.net/learn/blogs/technical-blogs/sitecore-7-development-team/posts/2013/06/sitecore-7-poco-explained.aspx
https://www.sitecore.net/learn/blogs/technical-blogs/sitecore-7-development-team/posts/2013/05/sitecore-7-predicate-builder.aspx
Hope this helps!

NodeJS for-loop unsuccessful at trimming URLs that end with numbers

I'm trying to take a group of Facebook Page URLs and extract only the entity title of the page. For example, for 'https://www.facebook.com/BalanceSpaBoca' I'm looking only for 'BalanceSpaBoca'. This script works great for most of the sample data I'm using (the testFBurls array), printing only the trimmed string. For others, though, it prints both the trimmed string and the original string. It seems like all of the URLs that get printed twice end with a string of numbers, but I'm not sure why that should make any difference in how the program runs.
var testFBurls = [
'http://www.facebook.com/pages/A-Yoga-Way/361702000576231',
'http://www.facebook.com/aztigurbansalon',
'https://www.facebook.com/pages/Azzurri-Salon-Spa/542579982495983',
'https://www.facebook.com/BalanceSpaBoca',
'https://www.facebook.com/BocaAmericanNailsandSpa',
'http://www.facebook.com/beachyogagirl',
'https://www.facebook.com/pages/Beauty-of-Wax/156355679240',
'http://www.facebook.com/beehivefitness.boca',
'https://www.facebook.com/pages/Believe-Day-Spa-Boutique/197615685896',
'https://www.facebook.com/photo.php?fbid=10151725966640897&set=a.10151725965355897.1073741828.197615685896&type=1&theater',
'http://facebook.com/pages/bigfoot-spa/1486364798260300',
'http://www.facebook.com/bloheartsyou',
'http://www.facebook.com/pages/The-Wellness-Center-Of-Boca-Raton/170371382995576',
'https://www.facebook.com/TherapyBodyBalanced',
'https://www.facebook.com/pages/BodyVital-Massage/177664492277158',
'https://www.facebook.com/bodyworkmall',
'https://www.facebook.com/pages/The-Bombay-Room-Yoga-Studio/148731658497764',
];
var possibleFBurlStarts = [
"https://www.facebook.com/",
"http://www.facebook.com/",
"https://www.facebook.com/pages/",
"http://www.facebook.com/pages/",
];
for (var count=0; count<testFBurls.length; count++){
    var currentURL = testFBurls[count];
    if (currentURL.indexOf(".com/photo") > -1) {
        testFBurls.splice(i, 1);
        i--;
    }
    for (var i=0; i < possibleFBurlStarts.length; i++){
        var indexOfSubstring = currentURL.indexOf(possibleFBurlStarts[i]);
        if (indexOfSubstring > -1) {
            var res = currentURL.replace(possibleFBurlStarts[i], "");
        }
    }
    if (count == testFBurls.length-1){
        console.log(testFBurls);
    }
}
Here's my console output
pages/A-Yoga-Way/361702000576231
A-Yoga-Way/361702000576231
aztigurbansalon
pages/Azzurri-Salon-Spa/542579982495983
Azzurri-Salon-Spa/542579982495983
BalanceSpaBoca
BocaAmericanNailsandSpa
beachyogagirl
pages/Beauty-of-Wax/156355679240
Beauty-of-Wax/156355679240
beehivefitness.boca
pages/Believe-Day-Spa-Boutique/197615685896
Believe-Day-Spa-Boutique/197615685896
bloheartsyou
pages/The-Wellness-Center-Of-Boca-Raton/170371382995576
The-Wellness-Center-Of-Boca-Raton/170371382995576
TherapyBodyBalanced
pages/BodyVital-Massage/177664492277158
BodyVital-Massage/177664492277158
bodyworkmall
pages/The-Bombay-Room-Yoga-Studio/148731658497764
The-Bombay-Room-Yoga-Studio/148731658497764
Notice that the first url is listed twice (first in its original form, and secondly in its truncated form), but then the second url (the third line in the output) is listed in truncated form alone. Any ideas what is causing this disparity? Only the truncated url should be printed.
You're modifying the array you're iterating through while you're iterating through it: testFBurls.splice(i, 1); which is usually not a great thing to do. In any case, I think you should be able to accomplish your goal a lot more easily with a simple regular expression:
for (var count = 0; count < testFBurls.length; count++) {
    // Optional "www.", optional "pages/" prefix, then capture everything up to the next slash
    var matches = testFBurls[count].match(/^https?:\/\/(?:www\.)?facebook\.com\/(?:pages\/)?([^\/]+)/);
    if (matches) {
        console.log('found it:', matches[1]);
    }
}
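To avoid splicing the array while looping over it, one option is to build a new array with map and filter. The sketch below wraps the regular-expression idea in a function and skips photo links; extractPageName is a hypothetical helper name, and the sample URLs are taken from the question's test data.

```javascript
// Extract the page name from a Facebook page URL, or null if it is not one.
// extractPageName is a hypothetical helper name.
function extractPageName(url) {
  if (url.indexOf(".com/photo") > -1) {
    return null; // photo links carry no page name
  }
  // Optional "www.", optional "pages/", then everything up to the next "/", "?" or "#".
  var match = url.match(/^https?:\/\/(?:www\.)?facebook\.com\/(?:pages\/)?([^\/?#]+)/);
  return match ? match[1] : null;
}

var sampleUrls = [
  'http://www.facebook.com/pages/A-Yoga-Way/361702000576231',
  'https://www.facebook.com/BalanceSpaBoca',
  'https://www.facebook.com/photo.php?fbid=10151725966640897&set=a.10151725965355897.1073741828.197615685896&type=1&theater',
  'http://facebook.com/pages/bigfoot-spa/1486364798260300'
];

// map + filter builds a new array instead of splicing the one being looped over.
var pageNames = sampleUrls.map(extractPageName).filter(function (name) {
  return name !== null;
});
console.log(pageNames); // [ 'A-Yoga-Way', 'BalanceSpaBoca', 'bigfoot-spa' ]
```

The original array is left untouched, so the loop index can never drift out of step with its contents.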

How best to manage paging with SubSonic 3.003

I'm really keen on SubSonic, but I'm not sure how to make it work with paging.
I mean, how can I get "the page" as a list, or what is the best way to work through the whole table in my database, page by page?
You'll see I tried three things:
m02colegio is a class generated from ActiveRecord
IList<m02colegio> loscolegios;
loscolegios = m02colegio.GetPaged(0, 80).ToList();
----------- and:
SubSonic.Schema.PagedList<m02colegio> loscolegios;
loscolegios = m02colegio.GetPaged(0, 80);
----------- and:
var paged = m02colegio.GetPaged(0,80).All<m02colegio>(x=>x.m02ccolnom.Contains(" "));
// 'cause i dont know how to tell it to consider all records
loscolegios = m02colegio.All().ToList();
But none of these attempts throws an exception, and loscolegios is always null.
I need to access the records in this manner,
so what is the best way?
How can I get the first page, and then how do I advance through the pages?
public ActionResult Index(int? page)
{
    // validateInt is a helper that is not shown here
    if (!validateInt(page.ToString()))
        page = 0;
    else
        page = page - 1;
    if (page < 0) page = 0;
    const int pagesize = 9;
    IQueryable<m02colegio> Mym02colegio = m02colegio.All().Where(x => x.category == "test").OrderBy(x => x.id);
    ViewData["numpages"] = m02colegio.All().Where(x => x.category == "test").OrderBy(x => x.id).Count() / pagesize;
    ViewData["curpage"] = page;
    // The list type must match the query's element type
    return View(new PagedList<m02colegio>(Mym02colegio, page ?? 0, pagesize));
}
That is written for MVC, but it gives you the idea: Index accepts null or a page number,
you query the records, and then you return a PagedList of the records you got.
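The page arithmetic behind this can be sketched independently of SubSonic. The JavaScript below assumes a zero-based page index, as in the action above; pageCount and getPage are hypothetical names, and the records array is made-up sample data.

```javascript
// Page arithmetic for a zero-based page index; names are hypothetical.
function pageCount(totalItems, pageSize) {
  return Math.ceil(totalItems / pageSize); // a partial last page still counts
}

function getPage(items, pageIndex, pageSize) {
  var start = pageIndex * pageSize;
  return items.slice(start, start + pageSize);
}

var records = [];
for (var n = 1; n <= 20; n++) {
  records.push(n);
}

console.log(pageCount(records.length, 9)); // 3
console.log(getPage(records, 2, 9));       // [ 19, 20 ]
```

Note that plain integer division, as used for numpages above, rounds down, so 20 records at 9 per page would give 2 rather than 3.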
I'm not sure if this is a bug that's been fixed in the current GitHub source or if it's by design, but I've found that GetPaged only works with a 1-based index for the first argument. So if you do the following, you should find it works as you'd expect:
IList<m02colegio> loscolegios = m02colegio.GetPaged(1, 80);
