How to set up a local DBpedia server and use it - node.js

I am wondering if there is a tutorial on how I could get a local copy of the DBpedia knowledge base up and running.
I understand that I'll have to download all the .nt files that I would like to include from the DBpedia website, but I don't really get how to utilize them.
From other questions I found out that there are interfaces in different languages for working with it, but I don't know how to find the right one for node.js or Java. I found a lot of libraries that use RDF data and the SPARQL query language, but I don't get how to connect them with the .nt files.
Can anyone give a short introduction on where to start to set up the knowledge base for, e.g., node.js?
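For context, the usual pattern is to load the .nt dumps into a triple store (for example OpenLink Virtuoso or Apache Jena Fuseki) that exposes a SPARQL endpoint over HTTP; the node.js RDF/SPARQL libraries then send queries to that endpoint rather than reading the .nt files directly. A rough sketch of such a query from node.js, assuming a hypothetical local Fuseki-style endpoint URL:

```js
// Minimal sketch: query a local SPARQL endpoint from node.js (Node 18+ for global fetch).
// The endpoint URL is an assumption; point it at wherever your triple store
// (Virtuoso, Fuseki, ...) serves SPARQL after loading the .nt files.
const ENDPOINT = 'http://localhost:3030/dbpedia/sparql'; // hypothetical local endpoint

const query = `
  SELECT ?abstract WHERE {
    <http://dbpedia.org/resource/Berlin> <http://dbpedia.org/ontology/abstract> ?abstract .
    FILTER (lang(?abstract) = "en")
  } LIMIT 1
`;

async function run() {
  const res = await fetch(ENDPOINT + '?query=' + encodeURIComponent(query), {
    headers: { Accept: 'application/sparql-results+json' },
  });
  const data = await res.json();
  for (const binding of data.results.bindings) {
    console.log(binding.abstract.value);
  }
}

run().catch(console.error);
```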

Related

How to write my own video service?

I need to write my own video hosting service with a player on the client side.
My basic requirements:
The user can upload videos to the hosting service
The user can watch any video from the hosting service
I'm not asking you to write the solution for me; I'm asking where I should start in order to learn about it. Which technologies or frameworks should I learn to realize this task using Python?
P.S. Every detail will be very useful, especially links to articles, because I couldn't find anything myself without knowing exactly what I need to search for.
Added
Now I am thinking of storing the videos directly in the file system and using PostgreSQL to store additional information about the videos and users. Of course, large services use Hadoop, BigTable, etc., but I think such a solution will be enough for my task.
When a user uploads a new video, my server saves it into a temporary directory and puts it in a processing queue. Small programs take new videos one by one, generate thumbnails, reduce the quality of the videos, and move them to the main storage. Is this a good idea?
But I still can't figure out how to do video streaming.
OK, so I don't want to encourage the behavior of people thinking SO is a code-writing service, but this is a truly legitimate question. First of all, you want to choose a language. Currently I'd recommend JavaScript and Node.js (Java needs to die), though I don't know Node as well as I know Python. Python is an all-purpose language; what's important in this case is your framework (or library) of choice. There are several libraries that let you build websites in Python (or make it easier to do so), and my favorite is Flask. Flask is actually very similar to Node.js + Express.js. Use this link to get started, and take a few days to learn the insides of the framework: it is very modular and very powerful. Using basic logic and database knowledge, you can easily build a simple web service with file upload and authentication. I also know of two really good guides that will help you with streaming the videos. Strictly speaking, you don't need streaming: you could just load the requested video with a <video> tag, but streaming is a much, much more favorable solution. Take some time to learn about video streaming and compression, and then check out these links: AUDIO STREAMING GIST and MIGUEL GRINBERG FLASK VIDEO STREAMING BLOG POST.
Good luck with Flask, and a pro tip: learn about HTTP(S) and the GET and POST methods. You would never imagine how many times I struggled with a "bad request" or "method not allowed" error because I didn't do my research.
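As a rough illustration of the streaming idea (shown in Node.js + Express, which the answer compares Flask to), a video endpoint that honors HTTP range requests might look like the following sketch; the storage path and port are placeholder assumptions:

```js
// Minimal sketch: HTTP range-request video streaming with Node.js + Express.
// Paths and port are hypothetical; no input validation, sketch only.
const express = require('express');
const fs = require('fs');

const app = express();

app.get('/video/:name', (req, res) => {
  const path = `./videos/${req.params.name}.mp4`; // hypothetical storage location
  const { size } = fs.statSync(path);
  const range = req.headers.range; // e.g. "bytes=32324-"

  if (!range) {
    // No Range header: send the whole file.
    res.writeHead(200, { 'Content-Length': size, 'Content-Type': 'video/mp4' });
    return fs.createReadStream(path).pipe(res);
  }

  const [startStr, endStr] = range.replace(/bytes=/, '').split('-');
  const start = parseInt(startStr, 10);
  const end = endStr ? parseInt(endStr, 10) : size - 1;

  res.writeHead(206, {
    'Content-Range': `bytes ${start}-${end}/${size}`,
    'Accept-Ranges': 'bytes',
    'Content-Length': end - start + 1,
    'Content-Type': 'video/mp4',
  });
  fs.createReadStream(path, { start, end }).pipe(res); // stream only the requested chunk
});

app.listen(3000);
```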

What skill set is needed to set up Solr or ElasticSearch?

Two clients of mine are evaluating setting up a search server, either Solr or ElasticSearch. We're wondering what programming languages (if any) and development environments are necessary to get the search servers running. Can it be done by people mostly familiar with front-end technologies (HTML/CSS/JavaScript), or is more serious coding skill needed (e.g. an understanding of multithreading, advanced debugging, or other "pro-level" concepts)?
If only light programming skills are needed, I'm toying with the idea of suggesting that I set it up myself. I have very little Java knowledge but a basic understanding of C, ActionScript, Pascal, and even Simula, in addition to the aforementioned front-end technologies. I know basic search architecture from my time at FAST (an enterprise search vendor).
Best, Bjørn
Bit of a broad question, but I'll try to give it a shot:
You don't need any programming language in particular. They're both standalone servers with APIs that are addressable from any programming language.
ElasticSearch has a really nice API that's JSON/REST based.
SOLR's API is a lot more clunky, but also supports XML.
(If I have a choice I tend to go for ElasticSearch, unless there's a really specialized feature I need that's only in SOLR).
Getting up and running doesn't really require any knowledge of any programming language in particular.
The only time you NEED Java is when you end up needing custom plugins for SOLR/ElasticSearch itself.
You don't need any specific IDEs beyond those matching your programming language of choice.
When trying to figure out what's going on inside my ElasticSearch server, I do like elasticsearch-head:
http://mobz.github.io/elasticsearch-head/
Hope this helps.
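To give a concrete sense of the JSON/REST API mentioned above, here is a minimal sketch of a search request against a local ElasticSearch node from node.js (the index and field names are made up for illustration):

```js
// Minimal sketch: full-text search against a local ElasticSearch node over its REST API.
// Requires Node 18+ for the built-in fetch; index/field names are hypothetical.
async function searchBooks(term) {
  const res = await fetch('http://localhost:9200/books/_search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      query: { match: { title: term } }, // simple full-text match on the "title" field
      size: 10,
    }),
  });
  const data = await res.json();
  return data.hits.hits.map((hit) => hit._source);
}

searchBooks('search engines').then(console.log).catch(console.error);
```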
As pointed out already, this is quite a broad question and will most likely get closed, but I'll give it a go too.
Both ElasticSearch and Solr are quite easy to get started with. They come as a zip/tar.gz archive that you can extract.
Both require the JVM, so you need Java set up.
Once set up, playing with either is quite easy; you do not need any advanced programming skills to play around with it. Solr comes with an Admin UI page that allows you to execute queries.
ElasticSearch has clients, as @Constantijin has pointed out. elasticsearch-head is an excellent choice.
You will need quite a detailed understanding of the Lucene ecosystem, its architecture, plugins etc. Given you have an understanding of another Search Engine, the concepts around indexing and text processing should be easy enough for you.
If you want to write something more advanced than the Admin UI and you can use JavaScript, you can use AjaxSolr for making Ajax requests to your Solr instance.
For ElasticSearch, you can try using Elastic.js.
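Either way, the underlying calls are just HTTP: the same kind of query the Solr Admin UI (or AjaxSolr) runs can be sent directly to Solr's select handler, for example from node.js (the core name below is hypothetical):

```js
// Minimal sketch: query Solr's select handler directly. Core name "mycore" is
// hypothetical; requires Node 18+ for the built-in fetch.
async function solrQuery(q) {
  const url = `http://localhost:8983/solr/mycore/select?q=${encodeURIComponent(q)}&wt=json`;
  const res = await fetch(url);
  const data = await res.json();
  return data.response.docs; // matching documents
}

solrQuery('title:lucene').then(console.log).catch(console.error);
```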
Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search-engine library. Lucene is arguably the most advanced, high-performance, and fully featured search engine library in existence today—both open source and proprietary.
However, Elasticsearch is much more than just Lucene and much more than “just” full-text search. It can also be described as follows:
A distributed real-time document store where every field is indexed and searchable
A distributed search engine with real-time analytics
Capable of scaling to hundreds of servers and petabytes of structured and unstructured data
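As a tiny illustration of the "every field is indexed and searchable" point, indexing a JSON document over the REST API is a single request. The index name below is made up, and the exact URL path varies slightly between ElasticSearch versions:

```js
// Minimal sketch: index one JSON document into ElasticSearch; every field becomes
// searchable without any upfront schema. Index name is hypothetical; the "/_doc/"
// path segment is for recent versions (older ones used a mapping type name instead).
async function indexArticle() {
  const res = await fetch('http://localhost:9200/articles/_doc/1', {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      title: 'Getting started with ElasticSearch',
      author: 'Jane Doe',
      tags: ['search', 'lucene'],
      published: '2015-06-01',
    }),
  });
  return res.json(); // contains the index, id, and result of the operation
}

indexArticle().then(console.log).catch(console.error);
```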
I would like to add more details regarding how to use ElasticSearch from PHP; check out http://www.multidots.com/what-is-elasticsearch, which also covers how to integrate ElasticSearch with PHP.
By using curl, you can use ElasticSearch from your favorite programming language, since everything goes over its REST API.
You can find the PHP client API on GitHub: https://github.com/elastic/elasticsearch-php

searching and retrieving data using node.js?

I was wondering whether node.js is a good fit for searching massive amounts of data. I know its main use is for asynchronous scenarios like chat, FTP, real-time applications, etc. I am thinking of using node.js with MongoDB to search 300,000 records of books for the library at my university, as opposed to using PHP & MySQL. Any advice would be great, thanks.
Node.js would be a fine application interface for searching data... but practically, so would PHP or many other languages :).
Your backend data storage solution (MySQL, MongoDB, ...) is a harder choice and really depends on how you want to index and search the data.
If your main goal is search, you probably want to look into a search application based on something like Apache Lucene. These typically use a relational database backend, although some newer efforts like ElasticSearch have growing community support for ingesting data from sources like MongoDB (ref: MongoDB River Plugin for ElasticSearch).
Since you mentioned book search and libraries, you might also want to look into ILS (Integrated Library System) applications, which may already solve that problem. There are several open source products such as Koha and Evergreen.
Look at MongooseJS.
An absolutely perfect fit, in my opinion.
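As a rough sketch of what that could look like for the book-search use case (the connection string, schema, and field names are hypothetical, and a MongoDB text index is assumed):

```js
// Minimal sketch: full-text search over book records with Mongoose + MongoDB.
// Connection string, schema, and field names are placeholders.
const mongoose = require('mongoose');

const bookSchema = new mongoose.Schema({
  title: String,
  author: String,
  description: String,
});
// MongoDB text index so $text queries can search these fields.
bookSchema.index({ title: 'text', author: 'text', description: 'text' });

const Book = mongoose.model('Book', bookSchema);

async function searchBooks(term) {
  await mongoose.connect('mongodb://localhost/library'); // placeholder database
  const results = await Book.find(
    { $text: { $search: term } },
    { score: { $meta: 'textScore' } }        // include the relevance score
  )
    .sort({ score: { $meta: 'textScore' } }) // best matches first
    .limit(20);
  await mongoose.disconnect();
  return results;
}

searchBooks('operating systems').then(console.log).catch(console.error);
```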

Suggestions for designing a search system for files stored in S3

We are working for a client to redesign an existing system which basically deals with a lot of files.
The files (more than 5 million) are currently stored on the server's filesystem. The client wants the new system to store the files in S3.
The files also have associated metadata (name, author's name, price, description, etc.).
The search functionality is also to be redesigned. The following are the basic requirements:
Full text search should be available on file descriptions.
Filtering should be possible on other attributes of files.
Also, based on the file description, the system should be able to give recommendations for similar files.
I do not have experience creating such a solution, so I am asking for help and suggestions.
I was thinking along the lines of the following solutions:
Store the file metadata in MongoDB and use its search functionality (http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo)
Use Amazon DynamoDB. It provides an API to scan/query the dataset.
Use Lucene/Solr (I haven't worked with these yet; I still need to look deeper).
There is also this project that I found that is very similar to what I require:
http://www.thriftdb.com - on the home page it says it's a datastore with search built in.
Please let me know if this question should be a community wiki.
Thanks in advance.
You're in luck, announced today:
http://aws.amazon.com/about-aws/whats-new/2012/04/11/aws-announces-cloudsearch/
For searching files and filtering by attributes, the best would be the Sphinx search engine, which is used by FilesTube (Google was also using it years ago).
I don't know if it will work on Amazon servers.
Amazon has a custom AMI for Lucene/Solr and we have been happily using it in our projects. Lucene has a powerful indexing capability and executes at exceptional speeds. I would strongly recommend using Apache Lucene/Solr for all your search needs.
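To make that a bit more concrete, indexing the file metadata into Solr and then searching descriptions while filtering on other attributes could look roughly like the sketch below. The core name, field names, and values are assumptions; the JSON update and select handlers are standard Solr endpoints:

```js
// Minimal sketch: index S3 file metadata into a local Solr core and run a
// full-text search on descriptions with an attribute filter. Core name ("files")
// and fields are hypothetical. Requires Node 18+ for the built-in fetch.
const SOLR = 'http://localhost:8983/solr/files';

async function indexFile(doc) {
  // Solr's JSON update handler; commit=true makes the doc searchable immediately.
  await fetch(`${SOLR}/update?commit=true`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify([doc]),
  });
}

async function searchDescriptions(text) {
  const params = new URLSearchParams({
    q: `description:(${text})`, // full-text query on the description field
    fq: 'price:[0 TO 50]',      // example filter on another attribute
    wt: 'json',
  });
  const res = await fetch(`${SOLR}/select?${params}`);
  const data = await res.json();
  return data.response.docs;
}

indexFile({
  id: 's3://bucket/report.pdf', // e.g. use the S3 key as the unique id
  name: 'report.pdf',
  author: 'Jane Doe',
  price: 10,
  description: 'Annual sales report with regional breakdowns',
})
  .then(() => searchDescriptions('sales report'))
  .then(console.log)
  .catch(console.error);
```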

Best text search engine for integrating with custom web app?

We have a web app that allows users to upload documents, create their own documents, and so on. Uploaded files are stored on Amazon S3, created information is stored in a MySQL database. What I'm looking for is some sort of search engine, where I feed it all of our text documents, each with a unique ID, and it builds an index or whatever. Later, I can give it search queries, and it will pull out the best matching documents (via their ID), along with snippets of matching text.
Basically we want to allow our users to search through their repository of uploaded stuff, along with anything that other users have marked as public. The solution should run on a standard Linux server, and ideally it would be open source, but I'll also consider paid solutions if they aren't outrageously priced.
So far, I've found three potential candidates:
MySQL full-text search - some reports I've read say that it's very slow.
Apache Lucene - unfortunately written in Java, but I'll use it if I have to. Supposedly fast.
Sphinx - doesn't seem to be as popular; ideally, whatever solution I find will have lots of community support.
Please let me know if there are any other good choices that I've overlooked, or if you have experience with any of the above.
Take a look at Solr. It's based on Lucene, so it's very fast, and it's really easy to use from any platform.
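For the "snippets of matching text" requirement, Solr's highlighting feature does exactly that. A rough sketch of such a query from a web app (the core and field names are made up):

```js
// Minimal sketch: Solr query with highlighting to get matching snippets per document.
// Core name ("documents") and field names are hypothetical; Node 18+ for fetch.
async function searchWithSnippets(term) {
  const params = new URLSearchParams({
    q: `content:(${term})`,
    fl: 'id,title',        // return just the document id and title
    hl: 'true',            // enable highlighting
    'hl.fl': 'content',    // field to generate snippets from
    'hl.snippets': '3',    // up to three snippets per document
    wt: 'json',
  });
  const res = await fetch(`http://localhost:8983/solr/documents/select?${params}`);
  const data = await res.json();
  return data.response.docs.map((doc) => ({
    id: doc.id,
    title: doc.title,
    snippets: (data.highlighting[doc.id] || {}).content || [],
  }));
}

searchWithSnippets('quarterly revenue').then(console.log).catch(console.error);
```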
Sphinx may be worth your consideration, as it works well with several common RDBMSs (notably MySQL).
There is also Xapian, which is fast and quite customizable.
It has support for custom indexers, allowing one to index data that is not stored in a database, which might be useful for your documents stored on S3.
I imagine that Google will have a solution that meets your needs. Start here: Google Enterprise
There is a Ruby port of Lucene called "Ferret". In addition to the Ruby API, you can get at the underlying C implementation called "cFerret".
Lucene is very good. And although it was originally written in Java, there is a PHP implementation: http://framework.zend.com/manual/en/zend.search.lucene.html
