What changed in the SPARQL code for this parliamentary term?

I have been successfully retrieving data for the 5th, 6th and 7th parliamentary terms of the EP from the following Linked Open Data dataset, which I then clean in Stata: http://linkedpolitics.ops.few.vu.nl/web/html/home.html
However, the coding seems to differ for the 8th term, because I get far fewer speeches when I use the lpv:translatedText property that I used before. I can't help thinking that a LOT more speeches should fall within the timeframe I am specifying than the SPARQL endpoint returns. Can anyone help me figure out what I am doing wrong?
Here is the query I used for national parties (here with the dates for everything after the 7th term):
SELECT DISTINCT ?name ?countryname ?birth ?gender ?partyname ?start ?end ?date ?speechnr ?parlterm ?dictionary
WHERE {
?speech lpv:translatedText ?text.
?speech dcterms:date ?date.
?speech lpv:docno ?speechnr.
?speech lpv:speaker ?speaker.
?speaker lpv:name ?name.
?speaker lpv:dateOfBirth ?birth.
?speaker lpv:gender ?gender.
?speaker lpv:politicalFunction ?function.
?function lpv:institution ?party.
?party rdf:type lpv:NationalParty.
?party rdfs:label ?partyname.
?function lpv:beginning ?start.
?function lpv:end ?end.
?speaker lpv:countryOfRepresentation ?country.
?country rdfs:label ?countryname.
BIND("8" as ?parlterm)
BIND("representation" as ?dictionary)
FILTER ( ?date > "2014-07-01"^^xsd:date )
FILTER(langMatches(lang(?text), "en"))
FILTER(CONTAINS(?text, 'female representation') || CONTAINS(?text, 'women’s representation') || CONTAINS(?text, 'equal representation') || CONTAINS(?text, 'gender representation') || CONTAINS(?text, 'women in science') || CONTAINS(?text, 'women in business') || CONTAINS(?text, 'women’s leadership'))
} ORDER BY ?date ?speechnr
And here is the query I used for the FEMM committee (again, everything after the 7th parliamentary term):
SELECT DISTINCT ?name ?countryname ?birth ?gender ?start_com ?end_com ?date ?speechnr ?parlterm ?dictionary ?FEMM
WHERE {
?speech lpv:translatedText ?text.
?speech dcterms:date ?date.
?speech lpv:docno ?speechnr.
?speech lpv:speaker ?speaker.
?speaker lpv:name ?name.
?speaker lpv:dateOfBirth ?birth.
?speaker lpv:gender ?gender.
?speaker lpv:politicalFunction ?function.
?function lpv:institution ?institution.
?institution rdfs:label ?committee.
FILTER CONTAINS (?committee, "Committee on Women's Rights and Gender Equality")
BIND("Yes" as ?FEMM).
?function lpv:beginning ?start_com.
?function lpv:end ?end_com.
?speaker lpv:countryOfRepresentation ?country.
?country rdfs:label ?countryname.
BIND("8" as ?parlterm)
BIND("representation" as ?dictionary)
FILTER ( ?date > "2014-07-01"^^xsd:date )
FILTER(langMatches(lang(?text), "en"))
FILTER(CONTAINS(?text, 'female representation') || CONTAINS(?text, 'women’s representation') || CONTAINS(?text, 'equal representation') || CONTAINS(?text, 'gender representation') || CONTAINS(?text, 'women in science') || CONTAINS(?text, 'women in business') || CONTAINS(?text, 'women’s leadership'))
} ORDER BY ?date ?speechnr
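Both queries repeat the same hand-written chain of CONTAINS tests, so one option is to keep the dictionary in a single list and generate the FILTER line from it. A minimal Python sketch; `contains_filter` is a hypothetical helper, not part of any SPARQL library. Keep in mind that SPARQL's CONTAINS is case-sensitive and compares characters exactly, so if the transcripts use a straight apostrophe (') where the dictionary uses a curly one (’), or the other way round, those phrases will not match. The optional `ignore_case` flag wraps the variable in LCASE, a standard SPARQL 1.1 function.

```python
def contains_filter(var, terms, ignore_case=False):
    """Return a SPARQL FILTER line that holds when ?var contains any term.

    Hypothetical helper for generating query text; not part of any library.
    """
    # LCASE is a standard SPARQL 1.1 function; lowercasing both sides makes
    # the match case-insensitive.
    target = f"LCASE(?{var})" if ignore_case else f"?{var}"
    clauses = []
    for term in terms:
        if ignore_case:
            term = term.lower()
        # Escape backslashes and single quotes so each term stays a valid
        # single-quoted SPARQL string literal.
        escaped = term.replace("\\", "\\\\").replace("'", "\\'")
        clauses.append(f"CONTAINS({target}, '{escaped}')")
    return "FILTER(" + " || ".join(clauses) + ")"


# The "representation" dictionary from the queries above (curly apostrophes kept).
representation = [
    "female representation",
    "women’s representation",
    "equal representation",
    "gender representation",
    "women in science",
    "women in business",
    "women’s leadership",
]

print(contains_filter("text", representation))
```

The printed line can replace the hand-written FILTER in either query, so both stay in sync when the dictionary changes.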
Thank you.

Related

Big storage overhead when moving from Oracle to YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
I'm migrating an Oracle database to YugabyteDB (YB). While testing, I noticed that after the import the data takes up three times as much space on each node: 50GB of Oracle data ended up occupying 150GB on each node of a 3-node cluster. Isn't YugabyteDB able to compress natively? Please note that RF (replication factor) is set to 3. I'm just curious about the reason for such a significant increase in storage requirements.
This is my table schema:
CREATE TABLE "CT2_INVOICELINE_ORIGINAL"
( "INVOICEID" NUMBER(10,0) NOT NULL ENABLE,
"INVOICELINEID" NUMBER(10,0) NOT NULL ENABLE,
"INVOFIELD01" VARCHAR2(100),
"INVOFIELD02" VARCHAR2(100),
"INVOFIELD03" VARCHAR2(100),
"INVOFIELD04" VARCHAR2(100),
"INVOFIELD05" VARCHAR2(100),
"INVOFIELD06" VARCHAR2(100),
"INVOFIELD07" VARCHAR2(100),
"INVOFIELD08" VARCHAR2(100),
"INVOFIELD09" VARCHAR2(100),
"INVOFIELD10" VARCHAR2(100),
"INVOFIELD11" VARCHAR2(100),
"INVOFIELD12" VARCHAR2(100),
"INVOFIELD13" VARCHAR2(100),
"INVOFIELD14" VARCHAR2(100),
"INVOFIELD15" VARCHAR2(100),
"INVOFIELD16" VARCHAR2(100),
"INVOFIELD17" VARCHAR2(100),
"INVOFIELD18" VARCHAR2(100),
"INVOFIELD19" VARCHAR2(100),
"INVOFIELD20" VARCHAR2(100),
"INVOFIELD21" VARCHAR2(100),
"INVOFIELD22" VARCHAR2(100),
"INVOFIELD23" VARCHAR2(100),
"INVOFIELD24" VARCHAR2(100),
"INVOFIELD25" VARCHAR2(100),
"INVOFIELD26" VARCHAR2(100),
"INVOFIELD27" VARCHAR2(100),
"INVOFIELD28" VARCHAR2(100),
"INVOFIELD29" VARCHAR2(100),
"INVOFIELD30" VARCHAR2(100),
"INVOFIELD31" VARCHAR2(100),
"INVOFIELD32" VARCHAR2(100),
"INVOFIELD33" VARCHAR2(100),
"INVOFIELD34" VARCHAR2(100),
"INVOFIELD35" VARCHAR2(100),
"INVOFIELD36" VARCHAR2(100),
"INVOFIELD37" VARCHAR2(100),
"INVOFIELD38" VARCHAR2(100),
"INVOFIELD39" VARCHAR2(100),
"INVOFIELD40" VARCHAR2(100),
"INVOFIELD41" VARCHAR2(100),
"INVOFIELD42" VARCHAR2(100),
"INVOFIELD43" VARCHAR2(100),
"INVOFIELD44" VARCHAR2(100),
"INVOFIELD45" VARCHAR2(100),
"INVOFIELD46" VARCHAR2(100),
"INVOFIELD47" VARCHAR2(100),
"INVOFIELD48" VARCHAR2(100),
"INVOFIELD49" VARCHAR2(100),
"INVOFIELD50" VARCHAR2(100),
"INVOFIELD51" VARCHAR2(100),
"INVOFIELD52" VARCHAR2(100),
"INVOFIELD53" VARCHAR2(100),
"INVOFIELD54" VARCHAR2(100),
"INVOFIELD55" VARCHAR2(100),
"INVOFIELD56" VARCHAR2(100),
"INVOFIELD57" VARCHAR2(100),
"INVOFIELD58" VARCHAR2(100),
"INVOFIELD59" VARCHAR2(100),
"INVOFIELD60" VARCHAR2(100)
) SEGMENT CREATION IMMEDIATE
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
STORAGE(INITIAL 26255360 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "MISEK_DATA" ;
In the source table, most columns hold only 1-3 characters, and around 10 or more columns are null.
When you put only 1 character in a VARCHAR(100), you have 4x storage overhead on YugabyteDB. So if you have very small or null values, this can explain what you are seeing.
This is because of the way YugabyteDB currently stores data: each column is a separate key-value pair in the underlying storage engine (RocksDB), while Oracle stores the full row packed into a single tuple. This is explained in detail in the docs: https://docs.yugabyte.com/latest/architecture/docdb/persistence/
A “packed row” format that will fix this overhead is under development: https://github.com/yugabyte/yugabyte-db/issues/3520
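A rough way to see why tiny values inflate the footprint is a toy size model. The Python sketch below is only an illustration: `KEY_OVERHEAD` and `VALUE_OVERHEAD` are invented numbers, not YugabyteDB's actual encoding (see the DocDB persistence docs linked above for that), and NULL columns are assumed to cost nothing. It also does the replication arithmetic: with RF=3 on a 3-node cluster, each node holds a replica of every tablet, so with zero overhead each node would hold about as much as the logical data set. 150GB per node for 50GB of Oracle data therefore points to roughly 3x per copy, not to replication itself.

```python
from typing import List, Optional

# Illustrative overheads only; these are NOT YugabyteDB's real on-disk sizes.
KEY_OVERHEAD = 20   # hypothetical bytes per key (row key, column id, timestamp)
VALUE_OVERHEAD = 2  # hypothetical per-value framing bytes

Row = List[Optional[str]]


def raw_bytes(row: Row) -> int:
    """Payload only: the characters actually stored (NULLs assumed free)."""
    return sum(len(v) for v in row if v is not None)


def one_kv_per_column(row: Row) -> int:
    """Each non-null column is its own key-value pair, so the key cost repeats."""
    return sum(KEY_OVERHEAD + VALUE_OVERHEAD + len(v) for v in row if v is not None)


def packed_row(row: Row) -> int:
    """The whole row is a single key-value pair, so the key cost is paid once."""
    return KEY_OVERHEAD + sum(VALUE_OVERHEAD + len(v) for v in row if v is not None)


def per_node_gb(logical_gb: float, replication_factor: int, nodes: int) -> float:
    """Data per node, assuming replicas are spread evenly over the cluster."""
    return logical_gb * replication_factor / nodes


# Shaped like the question: 60 VARCHAR2 columns, most holding 1-3 chars, 10 NULL.
row: Row = ["ab"] * 50 + [None] * 10
print(raw_bytes(row), one_kv_per_column(row), packed_row(row))  # 100 1200 220
print(per_node_gb(50, 3, 3))  # 50.0: with RF=3 on 3 nodes, each node holds a full copy
```

Under these made-up numbers the per-column layout is about 5x larger than the packed layout for this row; the real ratio depends on how YugabyteDB actually encodes keys and values.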

SPARQL Filter in-Text multiple terms

I would like to filter speeches for certain terms from a dictionary. Ideally, the resulting table would contain the speeches from the dataset that include one or more of the defined terms.
I have tried two versions so far.
In the first, I try to match the respective terms directly:
SELECT ?name ?gender ?partyname ?countryname ?date ?speechnr
WHERE {
?speech tpf:match (lpv:text 'domestic abuse OR domestic violence OR intimate partner violence' ?text).
?speech lpv:spokenAs ?function.
?function lpv:institution ?party.
?party rdf:type lpv:NationalParty.
?party rdfs:label ?partyname.
?speech lpv:docno ?speechnr.
?speech dcterms:date ?date.
?speech lpv:speaker ?speaker.
?speaker lpv:name ?name.
?speaker lpv:gender ?gender.
?speaker lpv:countryOfRepresentation ?country.
?country rdfs:label ?countryname.
FILTER ( ?date >= "1999-07-20"^^xsd:date && ?date <= "2004-07-19"^^xsd:date )
} ORDER BY ?date ?speechnr LIMIT 10
In the second, I filter for the terms at the end:
SELECT ?name ?gender ?partyname ?countryname ?date ?speechnr ?text
WHERE {
?speech lpv:spokenText ?text.
?speech lpv:spokenAs ?function.
?function lpv:institution ?party.
?party rdf:type lpv:NationalParty.
?party rdfs:label ?partyname.
?speech lpv:docno ?speechnr.
?speech dcterms:date ?date.
?speech lpv:speaker ?speaker.
?speaker lpv:name ?name.
?speaker lpv:gender ?gender.
?speaker lpv:countryOfRepresentation ?country.
?country rdfs:label ?countryname.
FILTER ( ?date >= "1999-07-20"^^xsd:date && ?date <= "2004-07-19"^^xsd:date )
FILTER(langMatches(lang(?text), "en"))
FILTER(?text = 'domestic abuse' || 'domestic violence' || 'intimate partner violence')
} ORDER BY ?date ?speechnr LIMIT 10
I also tried the following filter but don't know how I can include an OR there:
FILTER(CONTAINS(?text, 'domestic abuse')).
The problems:
If I use OR, I only get back speeches in which one of the given terms is contained.
Using the logical || did not even return speeches containing the terms I was looking for.
Additionally, for terms like 'domestic violence', I only want speeches returned if the words are adjacent (e.g., not ones that merely contain 'violence').
Sorry for the long text; I would really appreciate your help.
The website, if needed: https://linkedpolitics.project.cwi.nl/web/html/home.html
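About the second query: `?text = 'domestic abuse' || 'domestic violence' || 'intimate partner violence'` is not three comparisons. Only the first operand compares anything (and at most it tests whole-text equality, not containment); the other two are bare strings, and under the SPARQL 1.1 rules a non-empty string has an effective boolean value of true, so the filter as a whole does not restrict by phrase. The usual form is one CONTAINS per phrase joined with ||, e.g. FILTER(CONTAINS(?text, 'domestic abuse') || CONTAINS(?text, 'domestic violence')), and testing for the full phrase already requires its words to be adjacent. To prototype the matching rules offline (for example on exported speeches), here is a minimal Python sketch; `phrase_pattern` and `matches_any` are hypothetical helpers, and the case-insensitivity and whitespace tolerance are choices made here, not something CONTAINS does.

```python
import re


def phrase_pattern(phrases):
    """Compile one case-insensitive regex that matches any of the phrases.

    The words of each phrase must be adjacent, but may be separated by any run
    of whitespace (spaces, line breaks). Hypothetical helper for offline checks.
    """
    alternatives = [r"\s+".join(re.escape(word) for word in p.split()) for p in phrases]
    # \b on both ends stops "domestic abuse" from matching inside "domestic abuser".
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def matches_any(text, phrases):
    """True if text contains at least one of the phrases."""
    return phrase_pattern(phrases).search(text) is not None


terms = ["domestic abuse", "domestic violence", "intimate partner violence"]
print(matches_any("We must end Domestic Violence now.", terms))         # True
print(matches_any("Violence in the domestic sphere must end.", terms))  # False
```

The second call shows the adjacency requirement: both words occur, but not as the phrase "domestic violence".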

Is there a compatibility list for Angular / Angular-CLI and Node.js?

I periodically run into the problem of having to spin up old Angular projects with deprecated Angular dependencies.
Because I usually run the latest Node.js version (at least the latest LTS version), I often wasn't able to get the old projects running.
I solved this by using a Node version manager, but I'm still often unsure which Node.js version is best to use for Angular version X.
Sadly, the official release notes cover this topic only superficially and are not much help, especially if you want to know from which Angular version onward you can no longer use a specific Node.js version.
Is there a complete compatibility list to check which Angular version is compatible with which Node.js version?
Another way to get these details is to run npx ngvm compat.
Wondering what NGVM is? Check it out: https://youtu.be/tWCM69cucOA?t=1975
Angular CLI       Angular            Node.js                           TypeScript         RxJS
~15.0.0           ~15.0.0            ^14.20.0 || ^16.13.0 || ^18.10.0  ~4.8.4             ^6.5.5 || ^7.4.0
~14.2.0           ~14.2.0            ^14.15.0 || ^16.10.0              >= 4.6.4 < 4.9.0   ^6.5.5 || ^7.4.0
~14.1.3           ~14.1.3            ^14.15.0 || ^16.10.0              >= 4.6.4 < 4.8.0   ^6.5.5 || ^7.4.0
~14.0.7           ~14.0.7            ^14.15.0 || ^16.10.0              >= 4.6.4 < 4.8.0   ^6.5.5 || ^7.4.0
~13.3.0           ~13.3.0            ^12.20.2 || ^14.15.0 || ^16.10.0  >= 4.4.4 < 4.7.0   ^6.5.5 || ^7.4.0
~13.2.6           ~13.2.7            ^12.20.2 || ^14.15.0 || ^16.10.0  >= 4.4.4 <= 4.5.5  ^6.5.5 || ^7.4.0
~13.1.4           ~13.1.3            ^12.20.2 || ^14.15.0 || ^16.10.0  >= 4.4.4 <= 4.5.5  ^6.5.5 || ^7.4.0
~13.0.4           ~13.0.3            ^12.20.2 || ^14.15.0 || ^16.10.0  ~4.4.4             ^6.5.5 || ^7.4.0
~12.2.18          ~12.2.17           ^12.14.1 || ^14.15.0              >= 4.2.4 <= 4.3.5  ^6.5.5 || ^7.0.1
~12.1.4           ~12.1.5            ^12.14.1 || ^14.15.0              >= 4.2.4 <= 4.3.5  ^6.5.5
~12.0.5           ~12.0.5            ^12.14.1 || ^14.15.0              ~4.2.4             ^6.5.5
~11.2.19          ~11.2.14           ^10.13.0 || ^12.11.1              >= 4.0.8 <= 4.1.6  ^6.5.5
~11.1.4           ~11.1.2            ^10.13.0 || ^12.11.1              >= 4.0.8 <= 4.1.6  ^6.5.5
~11.0.7           ~11.0.9            ^10.13.0 || ^12.11.1              ~4.0.8             ^6.5.5
~10.2.4           ~10.2.5            ^10.13.0 || ^12.11.1              >= 3.9.4 <= 4.0.8  ^6.5.5
~10.1.7           ~10.1.6            ^10.13.0 || ^12.11.1              >= 3.9.4 <= 4.0.8  ^6.5.5
~10.0.8           ~10.0.14           ^10.13.0 || ^12.11.1              ~3.9.4             ^6.5.5
~9.1.15           ~9.1.13            ^10.13.0 || ^12.11.1              >= 3.6.5 <= 3.8.3  ^6.5.5
~9.0.7            ~9.0.7             ^10.13.0 || ^12.11.1              >= 3.6.5 <= 3.7.7  ^6.5.5
~8.3.29           ~8.2.14            ^10.9.0                           ~3.5.3             ^6.4.0
~8.2.2            ~8.2.14            ^10.9.0                           ~3.4.5             ^6.4.0
~8.1.3            ~8.1.3             ^10.9.0                           ~3.4.5             ^6.4.0
~8.0.6            ~8.0.3             ^10.9.0                           ~3.4.5             ^6.4.0
~7.3.9            ~7.2.15            ^8.9.4 || ^10.9.0                 ~3.2.4             ^6.3.3
~7.2.4            ~7.2.15            ^8.9.4 || ^10.9.0                 ~3.2.4             ^6.3.3
~7.1.4            ~7.1.4             ^8.9.4 || ^10.9.0                 ~3.1.6             ^6.3.3
~7.0.7            ~7.0.4             ^8.9.4 || ^10.9.0                 ~3.1.6             ^6.3.3
~6.2.9            ~6.1.10            ^8.9.4                            ~2.9.2             ^6.2.2
~6.1.5            ~6.1.10            ^8.9.4                            ~2.7.2             ^6.2.2
~6.0.8            ~6.0.9             ^8.9.4                            ~2.7.2             ^6.0.0
~1.7.4            ~5.2.11            ^6.9.5 || ^8.9.4                  ~2.5.3             <= 5.5.12 < 6.0.0
~1.6.7            ~5.2.11            ^6.9.5 || ^8.9.4                  ~2.5.3             <= 5.5.12 < 6.0.0
~1.5.6            >= 5.0.5 <= 5.1.3  ^6.9.5 || ^8.9.4                  >= 2.4.2 <= 2.5.3  <= 5.5.12 < 6.0.0
~1.4.10           >= 4.2.6 <= 4.4.7  ^6.9.5 || ^8.9.4                  ~2.4.2             ^5.0.3
~1.3.2            >= 4.2.6 <= 4.4.7  ^6.9.5                            ~2.4.2             ^5.0.3
~1.2.7            >= 4.0.3 <= 4.1.3  ^6.9.5                            ~2.3.4             ^5.0.3
~1.1.3            >= 4.0.3 <= 4.1.3  ^6.9.5                            ~2.3.4             ^5.0.3
~1.0.6            >= 4.0.3 <= 4.1.3  ^6.9.5                            ~2.2.2             ^5.0.3
1.0.0-rc.4        ~2.4.10            ^6.9.5                            ~2.0.10            ^5.0.3
1.0.0-beta.30     ~2.3.1             ^6.9.5                            ~2.0.10            ^5.0.3
1.0.0-beta.22-1*  ~2.2.4             ^6.9.5                            ~2.0.10            ^5.0.3
1.0.0-beta.20-1*  ~2.1.2             ^6.9.5                            ~2.0.10            ^5.0.3
1.0.0-beta.17*    ~2.0.2             ^6.9.5                            ~2.0.10            ^5.0.3
* Package name: angular-cli
Credits: https://gist.github.com/LayZeeDK/c822cc812f75bb07b7c55d07ba2719b3 by Lars Gyrup Brink Nielsen
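To check an installed Node.js version (the output of node -v) against one of the Node.js ranges in the table, here is a rough Python sketch. `satisfies` is a hypothetical helper, not the npm semver package: it understands only the syntax that appears in the table (^, ~, >=, <=, >, <, space-separated comparators that must all hold, and || alternatives) and ignores prerelease tags and npm's special caret handling for 0.x versions. For real tooling, the semver package's satisfies function is the reference implementation.

```python
import re

# One comparator: optional operator, optional "v", then 1 to 3 numeric parts.
COMPARATOR = re.compile(r"(\^|~|>=|<=|>|<|=)?\s*v?(\d+(?:\.\d+){0,2})")


def parse(version):
    """'16.10' -> (16, 10, 0); a leading 'v' (as printed by node -v) is ignored."""
    parts = [int(p) for p in version.strip().lstrip("v").split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def check(v, op, bound):
    b = parse(bound)
    if op == "^":  # same major, at least b (npm's special 0.x rules are ignored)
        return v >= b and v[0] == b[0]
    if op == "~":  # same major.minor, at least b
        return v >= b and v[:2] == b[:2]
    if op == ">=":
        return v >= b
    if op == "<=":
        return v <= b
    if op == ">":
        return v > b
    if op == "<":
        return v < b
    return v == b  # no operator or "=": exact match


def satisfies(version, spec):
    """True if version matches any '||' alternative; comparators in one alternative are ANDed."""
    v = parse(version)
    for alternative in spec.split("||"):
        comparators = COMPARATOR.findall(alternative)
        if comparators and all(check(v, op, bound) for op, bound in comparators):
            return True
    return False


# A few Node.js versions checked against the Angular 14.2 and 15.0 rows of the table.
print(satisfies("v16.14.0", "^14.15.0 || ^16.10.0"))              # True
print(satisfies("18.12.0", "^14.15.0 || ^16.10.0"))               # False
print(satisfies("18.12.0", "^14.20.0 || ^16.13.0 || ^18.10.0"))   # True
```

For example, satisfies("v12.20.0", "^14.15.0 || ^16.10.0") returns False, which is consistent with the "v14.15 or 16.10" error message quoted in the last comment below.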
I acknowledge that this does not actually answer your question, but it does provide some relevant information for the current version (which is what brought me here).
Here is the official word from Angular on the current version:
https://angular.io/guide/setup-local
"Angular requires a current, active LTS, or maintenance LTS version of Node.js."
In the notes you will see a link to a package.json file that contains an "engines" section. For Angular 11 it says:
"engines": {
"node": ">= 10.13.0",
"npm": ">= 6.11.0",
"yarn": ">= 1.13.0"
},
Perhaps you could look at the package.json of each released version on GitHub and read its engines.node setting?
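Following that idea, once you have the package.json of a given Angular release (for example from the corresponding release tag on GitHub; fetching it is not shown here), reading engines.node is straightforward. A minimal Python sketch, assuming the file contents are already in a string; `node_engine_range` and `meets_minimum` are hypothetical helper names, and `meets_minimum` deliberately handles only a single ">=" bound like the Angular 11 example.

```python
import json


def node_engine_range(package_json_text):
    """Return the "engines.node" range from package.json text, or None if missing."""
    data = json.loads(package_json_text)
    return data.get("engines", {}).get("node")


def meets_minimum(version, node_range):
    """Check a version against a single '>= X.Y.Z' range; anything else is rejected."""
    node_range = node_range.strip()
    if not node_range.startswith(">="):
        raise ValueError(f"only '>=' ranges are handled here, got {node_range!r}")

    def as_tuple(v):
        return tuple(int(p) for p in v.strip().lstrip("v").split("."))

    return as_tuple(version) >= as_tuple(node_range[2:])


# The "engines" section quoted above for Angular 11, wrapped in a JSON object.
sample = '{"engines": {"node": ">= 10.13.0", "npm": ">= 6.11.0", "yarn": ">= 1.13.0"}}'
required = node_engine_range(sample)
print(required, meets_minimum("v12.20.0", required))  # >= 10.13.0 True
```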
I have a similar problem. I uninstalled the Angular CLI and reinstalled a previous version several times, going back as far as Angular CLI v9, but when I try to run "ionic serve" I still get the same message: "The Angular CLI requires a minimum Node.js version of either v14.15 or 16.10". I always get this message, even though I now have Angular CLI v9 and Node v12.20.0 (I can't install a newer version of Node.js because I have Windows 7 on an old notebook and I can't buy a new one).

Tabulator Tables With ExpressJS

I'm very new to development and I'm trying to get Tabulator to work in a Node.js environment using Express. I've learned that I can't run the script on the client side because there is no require function available there. I know there are ways around that, but I figured I would try to run it on the server side. I used express-generator with --view=pug, added div(id=example-table) in index.pug, and installed Tabulator with npm install tabulator-tables. I tried to use the following in app.js:
var Tabulator = require('tabulator-tables');
app.get('/', function() {
var table = new Tabulator("#example-table", {
height:205,
layout:"fitColumns", //fit columns to width of table (optional)
columns:[
{title:"Name", field:"name", width:150},
{title:"Age", field:"age", align:"left", formatter:"progress"},
{title:"Favourite Color", field:"col"},
{title:"Date Of Birth", field:"dob", sorter:"date", align:"center"},
],
rowClick:function(e, row){
alert("Row " + row.getData().id + " Clicked!!!!");
},
});
var tabledata = [
{id:1, name:"Oli Bob", age:"12", col:"red", dob:""},
{id:2, name:"Mary May", age:"1", col:"blue", dob:"14/05/1982"},
{id:3, name:"Christine Lobowski", age:"42", col:"green", dob:"22/05/1982"},
{id:4, name:"Brendon Philips", age:"125", col:"orange", dob:"01/08/1980"},
{id:5, name:"Margret Marmajuke", age:"16", col:"yellow", dob:"31/01/1999"},
];
table.setData(tabledata);
});
It does nothing. The site loads with everything else in layout.pug and index.pug, and the page source shows <div id=example-table></div>, so I know that part at least worked.
I just switched to Vue and got it working.

Spark number of partitions logic for Join operator

I would like to understand how Spark calculates the number of partitions when joining data.
I am using Spark 1.6.2 with YARN and Hadoop.
I have the following code:
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.storage.StorageLevel

val df1 = .....
val df2 = .... .cache() //small cached dataframe
//cartesian join
val joined = df1 join(broadcast(df2)) persist(StorageLevel.MEMORY_AND_DISK_SER)
println(df1.rdd.partitions.size ) //prints 10
println(df2.rdd.partitions.size ) //prints 28
println(joined.rdd.partitions.size ) //prints 33
Can someone explain why the result is 33?
EDIT
== Optimized Logical Plan ==
Project [key1#6L,key2#9,key3#21L,temp_index#33L,CASE WHEN (key1_type#2 = business) THEN ((rand#179 * 2000.0) + 10000.0) ELSE ((rand#179 * 2000.0) + 5000.0) AS amount#180]
+- Project [key1_type#2,key1#6L,key2#9,key3#21L,temp_index#33L,randn(-5800712378829663042) AS rand#179]
   +- Join Inner, None
      :- InMemoryRelation [key1_type#2,key1#6L,key2#9,key3#21L], true, 10000, StorageLevel(true, true, false, false, 1), BroadcastNestedLoopJoin BuildRight, Inner, None, None
      +- BroadcastHint
         +- InMemoryRelation [temp_index#33L], true, 10000, StorageLevel(true, true, false, true, 1), Project [id#32L AS temp_index#33L], None
== Physical Plan ==
Project [key1#6L,key2#9,key3#21L,temp_index#33L,CASE WHEN (key1_type#2 = business) THEN ((rand#179 * 2000.0) + 10000.0) ELSE ((rand#179 * 2000.0) + 5000.0) AS amount#180]
+- Project [key1_type#2,key1#6L,key2#9,key3#21L,temp_index#33L,randn(-5800712378829663042) AS rand#179]
   +- BroadcastNestedLoopJoin BuildRight, Inner, None
      :- InMemoryColumnarTableScan [key1_type#2,key1#6L,key2#9,key3#21L], InMemoryRelation [key1_type#2,key1#6L,key2#9,key3#21L], true, 10000, StorageLevel(true, true, false, false, 1), BroadcastNestedLoopJoin BuildRight, Inner, None, None
      +- InMemoryColumnarTableScan [temp_index#33L], InMemoryRelation [temp_index#33L], true, 10000, StorageLevel(true, true, false, true, 1), Project [id#32L AS temp_index#33L], None
