Identifying DBpedia URIs - dbpedia

Currently I'm using the DBpedia ontology to identify the type hierarchy, and the mapping-based properties to extract attributes from the entities and to identify relationships between them. To identify the type of an entity I use the instance types (infobox types) dataset, and I use java.net.URLDecoder to identify an entity's name. I get a lot of "not found" errors in the batch chain I built. Is there anything I'm doing wrong on a conceptual level? (Basically it works, but some entities can't be resolved to a type and default to unknown.) Do I need to use any additional datasets for identifying DBpedia types?
Update for clarity: I'm using the instance types like this:
<http://dbpedia.org/resource/Autism> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Disease> .
I'm extracting Autism and Disease to establish that Autism is a disease. To map the attributes to Autism I'm using:
<http://dbpedia.org/resource/Autism> <http://dbpedia.org/ontology/diseasesdb> "1142"@en .
<http://dbpedia.org/resource/Autism> <http://dbpedia.org/ontology/icd9> "299.00"@en .
<http://dbpedia.org/resource/Autism> <http://dbpedia.org/ontology/omim> "209850"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://dbpedia.org/resource/Autism> <http://dbpedia.org/ontology/medlineplus> "001526"@en .
<http://dbpedia.org/resource/Autism> <http://dbpedia.org/ontology/emedicineSubject> "med"@en .
<http://dbpedia.org/resource/Autism> <http://dbpedia.org/ontology/emedicineTopic> "3202"@en .
<http://dbpedia.org/resource/Autism> <http://dbpedia.org/ontology/meshId> "D001321"@en .
<http://dbpedia.org/resource/Autism> <http://xmlns.com/foaf/0.1/name> "Autism"@en .
If the last element of the triple is not a literal attribute, I'm creating a relationship between those elements. What should I do to identify a URI like
<http://dbpedia.org/resource/Autism>?
Just stripping off the angle brackets and extracting the word Autism doesn't seem to do the deal.
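For illustration, here is a minimal sketch of the kind of local-name extraction described above, assuming the entity name is simply the percent-decoded last path segment of the resource URI. The class and method names are made up, not taken from the asker's code, and the Charset overload of URLDecoder.decode requires Java 10+.
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DbpediaUriNames {

    // Extracts "Autism" from "<http://dbpedia.org/resource/Autism>".
    // Assumes the label is the percent-decoded text after the last '/'.
    static String localName(String uriRef) {
        // Strip the angle brackets of the N-Triples term, if present.
        String uri = uriRef.replaceAll("^<|>$", "");
        String lastSegment = uri.substring(uri.lastIndexOf('/') + 1);
        // DBpedia resource names are percent-encoded and use '_' for spaces.
        // Note: URLDecoder also turns a literal '+' into a space, which can
        // mangle some resource names -- one possible source of lookup misses.
        String decoded = URLDecoder.decode(lastSegment, StandardCharsets.UTF_8);
        return decoded.replace('_', ' ');
    }

    public static void main(String[] args) {
        System.out.println(localName("<http://dbpedia.org/resource/Autism>"));  // Autism
        System.out.println(localName("<http://dbpedia.org/resource/C%2B%2B>")); // C++
    }
}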

Related

JDL pattern is not correct in Java @Pattern

When I apply a pattern in JDL, the generated entity classes have a @Pattern annotation, but the value of that annotation is not the exact pattern I defined in JDL.
For example, if I've defined the pattern as pattern('/[^\\s]+.*[^\\s]+/'), in Java
it reflects as
@Pattern(regexp = "[^\\\\s]+.*[^\\\\s]+")
If you look at the Java class, there are 4 backslashes where there should be only 2. Because of this, the validation fails.
It looks to me like you are trying to use regex control characters in your pattern, which do not need to be doubled up in your JDL: see https://www.jhipster.tech/jdl/entities-fields, especially the part under "Regular Expressions" where it says: "/.../ the pattern is declared inside two slashes... \ anti-slashes needn’t be escaped"
So it's acting correctly: since you have double backslashes in your JDL, the generated Java string literal correctly contains quadruple backslashes. Your solution is just to use single backslashes in your JDL, as per the documentation.
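To make the escaping concrete, here is roughly what a JDL pattern written with a single backslash should produce in the generated entity class. The field name is hypothetical and the exact output depends on the JHipster version (newer versions import jakarta.validation instead of javax.validation).
import javax.validation.constraints.Pattern;

public class SomeEntity {

    // JDL (hypothetical field):  name String pattern(/[^\s]+.*[^\s]+/)
    // The single regex backslash is escaped once for the Java string literal,
    // so the regex engine still sees "\s" rather than a literal backslash.
    @Pattern(regexp = "[^\\s]+.*[^\\s]+")
    private String name;
}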

How do I combine lenses which contain context?

I'm starting from authorityL via authorityHostL to hostBSL - I know you can combine lenses via
(authorityL . authorityHostL . hostBSL)
but that fails with Couldn't match type ‘Authority’ with ‘Maybe Authority’. How do I properly deal with the Maybe here?
You can add a _Just in between to focus only on the successful (Just) values:
(authorityL . _Just . authorityHostL . hostBSL)
does the trick, as was said in the comments.

How to extract DBpedia categories through DBpedia Spotlight?

I'm trying to extract the types and their respective levels for an entity recognized through DBpedia Spotlight. I have already looked in forums and in the documentation on GitHub and found nothing. I would like to know a way to do this extraction. Thank you!
Given that your desired root is <http://www.w3.org/2002/07/owl#Thing>, you're actually looking for the rdf:type tree (not Wikipedia Categories, as such).
The typing of <http://dbpedia.org/resource/Semantic_Web> seems a bit odd, so I've used <http://dbpedia.org/resource/Cat> below. You'll note that the data does not always include a tree of the sort you wish.
This will get explicit rdf:type statements --
SELECT ?type
WHERE
{ <http://dbpedia.org/resource/Cat> a ?type
}
-- and this will climb to the top of any rdf:type trees --
SELECT ?type
WHERE
{ <http://dbpedia.org/resource/Cat> a+ ?type
}
A query to build the full tree would be rather more complex, but is entirely possible.
As mentioned here, you may need a SPARQL query like this to fetch the categories for a DBpedia URI --
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?subject
WHERE { dbr:Semantic_Web dct:subject ?subject }
LIMIT 100
The results can be retrieved in various serializations, for example as JSON.
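As a rough illustration (not part of the original answer), the query could be sent to the public DBpedia SPARQL endpoint over plain HTTP, requesting the standard SPARQL JSON results format via the Accept header (Java 11+ HttpClient):
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class DbpediaCategories {

    public static void main(String[] args) throws IOException, InterruptedException {
        String query =
            "PREFIX dct: <http://purl.org/dc/terms/>\n" +
            "PREFIX dbr: <http://dbpedia.org/resource/>\n" +
            "SELECT DISTINCT ?subject\n" +
            "WHERE { dbr:Semantic_Web dct:subject ?subject }\n" +
            "LIMIT 100";

        // Public endpoint; results come back as SPARQL 1.1 Query Results JSON.
        String url = "https://dbpedia.org/sparql?query="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);

        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Prints the raw JSON bindings; parse with any JSON library as needed.
        System.out.println(response.body());
    }
}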

How to create a union of FSTs from an FST archive (FAR)?

I currently have a (natural language) corpus, and these are the steps already taken:
Generated the symbol table after concatenating the corpus into one big file:
$ ngramsymbols <corpus.txt >corpus.syms
Given this symbol table, converted the corpus to a binary FST archive (FAR):
$ farcompilestrings -symbols=corpus.syms -keep_symbols=1 corpus.txt > corpus.far
I want to take the union of all the FSTs in the FAR, and compute the highest-weight path from start state to final state. To test from shell, this is what I did:
$ farextract corpus.far # generates fst files corpus-01, corpus-02, ...
$ fstarcsort --sort_type=olabel corpus.txt-01 1.fst
$ fstarcsort --sort_type=ilabel corpus.txt-02 2.fst
$ fstunion 1.fst 2.fst 12.fst
But I keep running into the following error:
WARNING: CompatSymbols: first symbol table present but second missing
ERROR: Union: input/output symbol tables of 1st argument do not match input/output symbol tables of 2nd argument
This error, of course, persists if I try to run a binary operation without sorting the FSTs first.
I think I am not sorting the FSTs correctly, or ... I have completely misunderstood how to use the symbol tables. Any idea why the union (or any other binary operation, for that matter) is failing like this?
When you extract the components from the FAR archive, the symbol table is attached to the first FST from the archive. When combining FSTs, the symbol tables embedded in the individual FSTs need to match each other. For example, the union operation needs the input symbols of the components to be the same, and the output symbols of the components to be the same. Composition needs the output symbols of the left machine to match the input symbols of the right machine.
You can clear symbols from an FST using the fstsymbols command:
fstsymbols --clear_isymbols --clear_osymbols with-syms.fst > no-syms.fst
Removing the symbols from corpus.txt-01 should solve this problem. Alternatively, you can compile the FAR file without the --keep_symbols flag.
For the union command you don't need to sort the arcs of the component machines before combining them; however, you would normally need to sort them before composing them.
If your text corpus is large, you might find it much quicker to construct the unioned FST directly from the text file using the C++ interface or some other bindings such as pyfst.

XSD Problem: How to restrict an element/attribute to be used only under another schema

I need to solve this and it looks like I need help.
Here is the problem definition:
We have an existing schema X [X is an industry-standard schema] for which we are building some extensions in a new schema Y (with a different target namespace).
Now the problem is that we want to restrict usage of the elements/attributes of schema Y so that they may only appear as members of defined elements/types of X. [Schema validation should fail in case of invalid usage.]
How do we achieve this? What is the best way to do this?
RM
I'd say it depends on how much you are modifying and what the schemas look like. One aspect is whether your extensions are near the root or near the leaves.
Here is a general approach for extensions near the root.
X.xsd (target namespace xns)
  element name=foo type=fooType
  complexType fooType
    sequence
      element name=bar type=BarType

Y.xsd
  import namespace=xns schemaLocation=X.xsd
  element name=foo type=foo2Type
  complexType foo2Type
    sequence
      element name=bar type=xns:BarType
      element name=baz type=BazType

Here you have added a new element baz of your own definition, but bar will still contain all the children required by the industry standard.
Import X into Y (with the import element).
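As a side note (not from the original answers), once X.xsd and Y.xsd exist you can check that invalid usage is actually rejected by validating an instance against both schemas with the standard javax.xml.validation API; the file names below are the hypothetical ones from the sketch above.
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class ValidateAgainstXAndY {

    public static void main(String[] args) throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Load both schemas so references across the two namespaces resolve.
        Schema schema = factory.newSchema(new Source[] {
                new StreamSource(new File("X.xsd")),
                new StreamSource(new File("Y.xsd"))
        });
        Validator validator = schema.newValidator();
        // Throws SAXException if the instance uses Y elements outside the
        // places permitted by the combined schemas.
        validator.validate(new StreamSource(new File("instance.xml")));
        System.out.println("instance.xml is valid against X.xsd + Y.xsd");
    }
}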
