What is the correct way to retrieve all the information from a DBpedia page?

I am working with DBpedia. My program needs to read a DBpedia JSON file like http://dbpedia.org/data/Germany.json and extract all the information as key-value pairs, the same as on the DBpedia page http://dbpedia.org/page/Germany. But I am facing some problems. For example, if you look at the JSON file (please use a JSON viewer to make it human-readable) and search for "language", you will see it is inside a JSON array, so I have to extract that information from the array. On the other hand, if you search for "seeAlso", you will find that you have to go one level up to find the information. Furthermore, some information appears on the HTML page (http://dbpedia.org/page/Germany) but is not found in the JSON file (http://dbpedia.org/data/Germany.json). For example, "birthPlace" is on the HTML page but not in the JSON file. I am totally confused about how to write code that can read and store (as a key-value mapping) the data exactly as it appears on the HTML page.

DBpedia data is organized by resource, where each "resource" is a page on Wikipedia and (presumably) a thing in the real world. Each resource is referred to with a URL. The JSON file contains a whole bunch of resources (such as http://dbpedia.org/resource/Opel_Kadett_C) that have some link with the resource you're interested in, http://dbpedia.org/resource/Germany. I think this is supposed to include all the information at http://dbpedia.org/page/Germany, but clearly some entries -- such as db:Anja_Kling -- are missing. I'm not sure why that is, but it might be a bug -- if you don't get a better answer here, you should try e-mailing your questions to the dbpedia-discussion mailing list at https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion. Hope that helps!
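For what it's worth, the JSON served at dbpedia.org/data is RDF serialized as nested maps: the top level is keyed by subject URI, each subject maps to its predicates, and each predicate maps to an array of value objects ({"value": ..., "type": ..., and sometimes "lang" or "datatype"}). Assuming that structure, a minimal Python sketch that flattens both the outgoing properties of Germany and the incoming links from other resources (which is where properties like birthPlace show up, when they are present) might look like this:

import json
import urllib.request

RESOURCE = "http://dbpedia.org/resource/Germany"

with urllib.request.urlopen("http://dbpedia.org/data/Germany.json") as resp:
    data = json.load(resp)

# Everything stated about Germany itself (outgoing properties).
for predicate, values in data.get(RESOURCE, {}).items():
    for v in values:
        print(predicate, "->", v.get("value"))

# Other resources that point at Germany (incoming properties).
# When present, this is where entries like dbo:birthPlace appear.
for subject, predicates in data.items():
    if subject == RESOURCE:
        continue
    for predicate, values in predicates.items():
        for v in values:
            if v.get("value") == RESOURCE:
                print(subject, "-", predicate, "->", RESOURCE)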

Related

Assign multiple photos to blog article in Shopify

Is it possible to extend Shopify objects in Liquid? I'm trying to find a way to have there be more than 1 photo associated with articles belonging to a specific blog. I know I can allow users to upload more photos using the settings schema, but I need to access the photo URLs outside of the Blog template/section the same way I'm able to access its direct properties (something like article.images[2]). My understanding is that anything saved from Settings is only accessible from within the Section in which it was defined. Is that accurate?
I had the idea of saving a list of URLs as the article's content and just parsing them out of article.content (and hiding the list using CSS when the page is displayed), but I'm not seeing any way to do the parsing (no regex).
I thought of using tags too, but there will be hundreds of articles and potentially several images associated with each article. I'm not sure if there is a max number of tags, but even if there isn't, it seems hacky (and probably a bit inefficient to create tags that aren't shared by multiple articles). IDK...
Does anyone have any ideas for a good way to do this?
Metafields are made for this type of use case:
https://shopify.dev/api/storefront/2022-01/objects/metafield
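For completeness, here is a rough sketch of setting such a metafield through the REST Admin API with Python. The shop domain, token, article id and namespace/key are all placeholders, and using the generic /metafields.json endpoint with owner_resource/owner_id is my reading of the docs linked above, so double-check it against your API version:

import json
import requests

SHOP = "your-store.myshopify.com"   # hypothetical shop domain
TOKEN = "shpat_xxx"                 # hypothetical Admin API access token
ARTICLE_ID = 123456789              # hypothetical article id

payload = {
    "metafield": {
        "owner_resource": "article",
        "owner_id": ARTICLE_ID,
        "namespace": "custom",
        "key": "extra_images",
        "type": "json",
        # Store the list of image URLs as a JSON value.
        "value": json.dumps([
            "https://cdn.example.com/a.jpg",
            "https://cdn.example.com/b.jpg",
        ]),
    }
}
resp = requests.post(
    f"https://{SHOP}/admin/api/2022-01/metafields.json",
    headers={"X-Shopify-Access-Token": TOKEN},
    json=payload,
)
resp.raise_for_status()
print(resp.json())

Once saved, the metafield should be readable from any template or section, not just the one that defined it, e.g. {{ article.metafields.custom.extra_images.value }} in Liquid.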

How to test an RDFa parser?

I'm trying to find a way to check whether my RDFa parser (written in Node.js) is working.
I have an RDFa parser which should print all triples found in a file or URL (with RDFa syntax).
As far as I know, there are test suites for RDFa parsing (http://rdfa.info/test-suite/rdfa1.1/html5/manifest), but I'm not sure how to use them.
Is there a good web page where this is described? Or can anyone help me in another way?
There should be some information at the rdfa.info/tests site. Basically, you need a service that will accept a GET request, where the "uri" query parameter points to the input file. The service then parses the file, and returns some other form of RDF, typically N-Triples. More information on the Github page: https://github.com/rdfa/rdfa-website/blob/master/README.md
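To make that contract concrete: regardless of the language your parser is written in, the harness just needs an HTTP endpoint shaped like the sketch below. This is a hedged Python/Flask illustration only; parse_rdfa_to_ntriples is a stand-in for your actual Node.js parser (which you might invoke via a subprocess):

from flask import Flask, request, Response
import urllib.request

app = Flask(__name__)

def parse_rdfa_to_ntriples(html: str, base_uri: str) -> str:
    # Stand-in: call your actual Node.js parser here
    # (for example via subprocess) and return N-Triples text.
    raise NotImplementedError

@app.route("/parse")
def parse():
    # The test suite calls e.g. /parse?uri=<test-document-url>
    uri = request.args["uri"]
    with urllib.request.urlopen(uri) as resp:
        html = resp.read().decode("utf-8")
    triples = parse_rdfa_to_ntriples(html, base_uri=uri)
    return Response(triples, mimetype="application/n-triples")

if __name__ == "__main__":
    app.run(port=5000)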

Getting thumbnails in OpenSearchServer search results

I need an alternative to Google Custom Search for a website I look after. It has to be something that will crawl a website, index it, allow fiddling with priorities, and then allow search queries via REST or something similar, returning XML or JSON etc. It needs to run on a Windows Server instance.
So, I'm up and running with http://www.opensearchserver.com/ and it seems to do the trick, but I can't, for the life of me, work out how to get thumbnail images in the results. I've searched the documentation and read everything I could, but can't find out how to do this (or how to get my head around it).
I'm crawling standard web pages and they all have thumbnail metadata, which I'm assuming can be parsed somehow and included in the JSON results?
Any pointers at all would be very helpful, thanks!
I figured this out; in case anyone else is struggling, here's how I did it. The answer is in the documentation, it's just not that simple.
Read http://www.opensearchserver.com/documentation/faq/crawling/how_to_extract_specific_information_from_web_pages.md - it describes the method.
Assume you set up a 'web crawler' index.
Assuming you're using a meta thumbnail like this:
<meta name="thumbnail" content="http://my_cdn.com/news/images/29637.jpg">
Go into Schema / Fields. Add a new field called 'thumbnail' with index no, store yes, vector no, analyser Text, copy of blank. Save that.
Now go to Schema / Parser list and edit the HTML parser. Go to 'Field mapping' and add a new regex for the thumbnail in the HTML. We map from 'htmlSource' to 'thumbnail' with a matching regex.
My imperfect regex (that works though) is:
htmlSource -> linked in: thumbnail -> captured by:
(?s)<meta name="thumbnail" content="(.*?)">
Now SAVE this and go to Crawl / Manual crawl, enter a URL that has a thumbnail, and check whether the field now appears in the list below once it's read. If not, check your regex, and check that you actually saved the HTML parser changes.
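If you want to sanity-check the regex offline before fiddling with the parser settings, something like this works (Python's re is close enough to the Java regex engine OpenSearchServer uses for a quick test):

import re

# Quick offline check of the thumbnail-extraction regex.
pattern = re.compile(r'(?s)<meta name="thumbnail" content="(.*?)">')

sample = """<html><head>
<meta name="thumbnail" content="http://my_cdn.com/news/images/29637.jpg">
</head><body></body></html>"""

m = pattern.search(sample)
print(m.group(1) if m else "no thumbnail found")
# prints: http://my_cdn.com/news/images/29637.jpg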
To get the thumb in your results, simply add the fieldname to the JSON you send with the query:
"returnedFields": [ "
"url",
"thumbnail"
],
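For reference, a query against the REST API might then look roughly like this in Python. The host, port, index name and endpoint path here are assumptions based on a default install, so adjust them to your setup:

import requests

# Hedged sketch: query OpenSearchServer and ask for the thumbnail back.
body = {
    "query": "germany",
    "start": 0,
    "rows": 10,
    "returnedFields": ["url", "thumbnail"],
}
resp = requests.post(
    "http://localhost:9090/services/rest/index/web_crawl/search/field",
    json=body,
)
resp.raise_for_status()
for doc in resp.json().get("documents", []):
    print(doc)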

Extract parameters and result contents from website

I have a website where I can input a list of strings and it'll display the results of each in the same format (basically a table).
What I want to do is to be able to save the results as well as their corresponding parameters (the input string that I searched for) and output them to a file to analyze later. So basically, capture my input and the output it returns. It's kind of like, if I search "stack" on Google, I want my output file to contain "stack" and all the displayed results from the search.
I've done some research on web and screen scraping, but I can't find anything that fits my needs. I looked into the cURL functions in PHP, but it looks like they can only get the contents of a specific URL, which I don't have since I'll be repeating the searches frequently.
I also looked into the HTML Agility Pack and HttpWatch, but they don't seem to be able to extract contents this dynamically.
I was wondering if anyone has ideas or tips that I could use. I was thinking maybe a plugin or application I could write that captures the parameters of my request (input strings) and the results sent from the server, but I'm not really sure how to do this. Any tips? Or maybe there's an existing one that I wasn't able to find?
Thanks in advance!
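Not a full answer, but one direction worth trying: most search pages boil down to a plain GET or POST you can replay, so you can loop over your input strings, submit each one, and save the query together with the raw response. A rough Python sketch, where the URL and the "q" form field are hypothetical (your browser's network inspector will show the real ones):

import csv
import requests

# Hypothetical search endpoint and form field; replace with the real ones.
SEARCH_URL = "http://example.com/search"
queries = ["stack", "overflow"]

with open("results.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["query", "html"])
    for q in queries:
        # Replay the site's search form for each input string.
        resp = requests.post(SEARCH_URL, data={"q": q})
        writer.writerow([q, resp.text])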

Data Set that can be used for statistics

I need some raw data to visualize with Google Charts and some other APIs. The problem is that I need raw data that includes timestamps too.
For example, visitors visiting a website: from which device (mobile/computer etc.) they accessed the website, at what time (hours:minutes:seconds:milliseconds), and which links they visited. Please let me know if anyone knows of such dummy raw data on the web.
You can build your own dataset using Google Spreadsheets.
For example, consider the spreadsheet from the link below:
https://docs.google.com/spreadsheet/pub?key=0Aj9J3uCNjN9_dG1rdmNtTlhyNWpkTUVHVHBwRzNWX2c&output=html
If you tweak the link, it can provide you with the JSON representation of the data.
https://docs.google.com/spreadsheet/tq?key=0Aj9J3uCNjN9_dG1rdmNtTlhyNWpkTUVHVHBwRzNWX2c&pub1
Basically, what you have to do in order to get the JSON response is replace the "pub" element with "tq", remove the "output=html" element at the end, and add "pub1" instead.
With this procedure you should be able to create your own datasource for your tests.
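A quick hedged sketch of fetching that JSON from Python. Note that the response is wrapped in a google.visualization.Query.setResponse(...) JavaScript call, and if the sheet contains date cells the payload may include new Date(...) literals that are not strict JSON:

import json
import urllib.request

URL = ("https://docs.google.com/spreadsheet/tq"
       "?key=0Aj9J3uCNjN9_dG1rdmNtTlhyNWpkTUVHVHBwRzNWX2c&pub1")

with urllib.request.urlopen(URL) as resp:
    body = resp.read().decode("utf-8")

# The response looks like: google.visualization.Query.setResponse({...});
# so cut out the object literal between the outermost braces.
start = body.index("{")
end = body.rindex("}") + 1
data = json.loads(body[start:end])
print(data["table"]["cols"])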
You can find more information on the Google Chart API documentation:
https://developers.google.com/chart/interactive/docs/spreadsheets
Hope it helps
