I want to run OpenIE extraction and build the input file, but I cannot figure out how to convert the raw sentences into the format I need. My example is below:
Example(input raw sentence):
32.7 % of all households were made up of individuals and 15.7 % had someone living alone who was 65 years of age or older .
output:
32.7 % of all households were made up of individuals and 15.7 % had someone living alone who was 65 years of age or older . were made up of 32.7 % of all households individuals
32.7 % of all households were made up of individuals and 15.7 % had someone living alone who was 65 years of age or older . had 15.7 % of all households someone living alone who was 65 years of age or older
I know that the output file holds the extractions, but I cannot find the code, or a method, that produces output formatted like the sentences above.
Is there any method to do so?
Please help!
I've already tried Ollie and OpenIE-5, but their output format is not the one I want.
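That said, once any extractor has produced (argument1, relation, argument2) triples, reshaping them into the layout above is plain string formatting. Below is a minimal Python sketch; the tab separator, the openie_input.txt filename, and the hard-coded triples are assumptions standing in for whatever your extractor actually returns:

sentence = ("32.7 % of all households were made up of individuals and "
            "15.7 % had someone living alone who was 65 years of age or older .")

# Hypothetical triples, standing in for the extractor's real output.
triples = [
    ("32.7 % of all households", "were made up of", "individuals"),
    ("15.7 % of all households", "had",
     "someone living alone who was 65 years of age or older"),
]

# One line per extraction: raw sentence, then relation, arg1, arg2.
with open("openie_input.txt", "w") as out:
    for arg1, rel, arg2 in triples:
        out.write(f"{sentence}\t{rel}\t{arg1}\t{arg2}\n")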
A car's fuel consumption may be expressed in many different ways. For example, in Europe, it is shown as the amount of fuel consumed per 100 kilometers.
In the USA, it is shown as the number of miles traveled by a car using one gallon of fuel.
Your task is to write a pair of functions converting l/100km into mpg, and vice versa.
The functions:
are named l100kmtompg and mpgtol100km respectively;
take one argument (the value corresponding to their names)
Complete the code in the editor.
Run your code and check whether your output is the same as ours.
Here is some information to help you:
1 American mile = 1609.344 metres;
1 American gallon = 3.785411784 litres.
def l100kmtompg(liters):
    pass  # to be completed

def mpgtol100km(miles):
    pass  # to be completed
I know the question is confusing; I spent hours on this, but this really is the actual exercise. The value given is 3.9, so for the first function the computation is 100 * 0.625 / (3.9 * 0.265), roughly 60.5 (0.625 is approximately miles per kilometre, and 0.265 approximately gallons per litre). For the second function the value given is 60.3, and the computation is (3.78 / (60.3 * 1.6)) * 100, roughly 3.9.
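For what it's worth, here is a sketch of the completed functions in Python, using the exact constants from the prompt instead of the rounded 0.625 and 0.265. It reproduces the arithmetic above (about 60.3 mpg for 3.9 l/100km, and about 3.9 l/100km for 60.3 mpg):

KM_PER_MILE = 1609.344 / 1000     # 1 American mile = 1609.344 metres
LITRES_PER_GALLON = 3.785411784   # 1 American gallon = 3.785411784 litres

def l100kmtompg(liters):
    # litres per 100 km -> miles per gallon
    return (100 / KM_PER_MILE) / (liters / LITRES_PER_GALLON)

def mpgtol100km(miles):
    # miles per gallon -> litres per 100 km
    return 100 * LITRES_PER_GALLON / (miles * KM_PER_MILE)

print(l100kmtompg(3.9))   # ~60.31
print(mpgtol100km(60.3))  # ~3.90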
I'm new to the field of ML. I'm trying to learn NLTK and scikit-learn, and as part of the learning process I'm trying to convert certain text into structured data, but I'm lost after a certain point.
Input: Price variation every year: 2018 2017 2016 Grocery $ 100 $ 150 $ 200 Gas $ 40 $ 50 $ 60 Utilities $ 36 $ 33 $ 31 Clothes $ 100 $ 100 $ 110
Output:
            2018    2017    2016
Grocery    $ 100   $ 150   $ 200
Gas        $  40   $  50   $  60
Utilities  $  36   $  33   $  31
Clothes    $ 100   $ 100   $ 110
I would really appreciate it if anyone could provide pointers, documents, or links I could use to convert the text into a structured format.
I tried using NLTK, tokenizing the text and getting the POS tags, but that brute-force approach was not successful. I also tried splitting the string at regular intervals, but that is not fruitful either, since it breaks as soon as the string changes.
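For a string with this exact shape, plain regular expressions may get further than POS tagging. Below is a minimal Python sketch; it assumes category names are single words, prices are integers prefixed with $, and the years are the bare four-digit numbers before the first $. Anything looser than that needs a more robust parser:

import re

text = ("Price variation every year: 2018 2017 2016 "
        "Grocery $ 100 $ 150 $ 200 Gas $ 40 $ 50 $ 60 "
        "Utilities $ 36 $ 33 $ 31 Clothes $ 100 $ 100 $ 110")

body = text.split(':')[1]

# Years: the bare four-digit numbers before the first '$'.
years = re.findall(r'\b\d{4}\b', body.split('$')[0])

# Rows: a word followed by one '$ <number>' per year.
table = {}
for name, prices in re.findall(r'([A-Za-z]+)((?:\s*\$\s*\d+)+)', body):
    table[name] = dict(zip(years, map(int, re.findall(r'\d+', prices))))

print(table)
# {'Grocery': {'2018': 100, '2017': 150, '2016': 200}, ...}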
Here is an example where the authors have taken this image and analyzed it using spatstat. To do that, they extracted the coordinates, as can be seen here. I would like to do the same, and I am wondering how one could get a sample point pattern out of an image like this. Directly converting the image into a ppp object creates lots of artifacts.
The manual for their R package BioC2015Oles doesn't seem to exist. Thanks.
The spatstat package assumes you have already processed your image and ended up with a list of coordinates, possibly with some attributes (marks) associated with each pair (x, y). Ideally you should also have a window of observation indicating where the points can potentially occur. There is a raster/image format in spatstat called im which can store an image/raster, but it is mainly used to store auxiliary information from the experiment, which can help explain the occurrence or absence of points in areas of the observation window; it is not meant for image processing per se.
To convert a noisy microscope image to, e.g., a list of cell centres, people usually use various image processing tools and techniques (watershed, morphological opening and closing, etc.). The presentation you refer to seems to build on the R package EBImage (which is on Bioconductor and does have a manual), and you can try to extract the cells using that. Alternatively, there are other packages in R, or entirely different open-source systems focused on image analysis, such as QuPath, ImageJ, and many others. I cannot really guide you as to which one is the better tool for your task.
You can do
library(raster)
x <- brick('tLh2E.jpg')
#plotRGB(x)
All cells with coordinates:
xyz <- rasterToPoints(x)
head(xyz)
# x y tLh2E.1 tLh2E.2 tLh2E.3
#[1,] 0.5 1103.5 222 222 224
#[2,] 1.5 1103.5 214 214 216
#[3,] 2.5 1103.5 223 223 225
#[4,] 3.5 1103.5 220 220 222
#[5,] 4.5 1103.5 197 197 199
#[6,] 5.5 1103.5 198 198 200
Or a sample:
s1 <- sampleRandom(x, 100, xy=TRUE)
s2 <- sampleRegular(x, 100, xy=TRUE)
To look at the locations of the samples
plotRGB(x)
points(s1[, 1:2])
points(s2[, 1:2], col='red', pch=20)
To create an image from a regular sample
r <- sampleRegular(x, 1000, asRaster=TRUE)
plotRGB(r)
For stratified sampling you would have to define the regions. You could draw them with raster::drawPoly() followed by rasterize, or model them (see raster::predict). Here is a very simple, and perhaps not very good, approach based on eyeballing. It turns out that the second, "green", layer (of the red-green-blue image) carries most of the information. This gets you close:
r <- reclassify(x[[2]], rbind(c(0,100,1), c(100,175,2), c(175,255,3)))
plot(r, col=c('red', 'blue','gray'))
You can now do the following to find out which color each point has:
extract(r, s1[,1:2])
I am trying to summarise some text using Gensim in Python and want exactly 3 sentences in my summary. There doesn't seem to be an option to do this, so I have used the following workaround:
import gensim

with open('speeches/' + speech, 'r') as myfile:
    speech = myfile.read()

sentences = speech.count('.')
x = gensim.summarization.summarize(speech, ratio=3.0/sentences)
However, this code is only giving me two sentences. Furthermore, as I incrementally increase the 3 to 5, nothing changes.
Any help would be most appreciated.
You may not be able to use 'ratio' for this. If you give ratio=0.3 and you have 10 sentences (assuming every sentence has the same word count), your output will have 3 sentences, 6 sentences for 20, and so on.
As per the gensim docs:
ratio (float, optional) – Number between 0 and 1 that determines the proportion of the number of sentences of the original text to be chosen for the summary.
Instead, you might want to try using word_count: summarize(speech, word_count=60). If you need exactly three sentences, see the sketch below.
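One possible workaround is to over-summarize and then truncate. This is only a sketch, assuming gensim 3.x, where summarize() accepts split=True and returns the summary as a list of sentences:

from gensim.summarization import summarize

# `speech` is the text read from file in the question's code.
# Ask for a generous share of the text, get the summary back as a
# list of sentences, then keep exactly the first three.
picked = summarize(speech, ratio=0.5, split=True)
summary = ' '.join(picked[:3])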
This question is a bit old; in case you found a better solution, please share.
In my thesis I'm trying to discover which factors influence the CSR (corporate social responsibility, GSE_RAW) behavior of companies. Two groups of possible factors / variables have been identified: company-specific and country-specific.
First, the company-specific variables are (among others):
MKT_AVG_LN: the market value of the company
SIGN: the number of CSR treaties the company has signed
INCID: the number of reported CSR incidents the company has been involved in
Second, each of the 4,000 companies in the dataset is headquartered in one of 35 countries. For each country I have gathered some country-specific data, including:
LAW_FAM: the legal family the country's legal system stems from (French, English, Scandinavian, or German)
LAW_SR: the relative protection the country's company law gives to shareholders (for instance, in case of company default)
LAW_LE: the relative effectiveness of the country's legal system (a higher value means more effective, and thus, for instance, less corrupt)
COM_CLA: a measurement of the intensity of internal market competition
GCI_505: a measurement of the quality of primary education
GCI_701: a measurement of the quality of secondary education
HOF_PDI: power distance (a higher value means a more hierarchical society)
HOF_LTO: time orientation (a higher value means a more long-term orientation)
DEP_AVG: the country's GDP per capita
CON_AVG: the country's average inflation over the 2008-2010 timeframe
In order to analyze this data, I "raised" the country-level data to the company level. For instance, if Belgium has a COM_CLA value of 23, then every Belgian company in the dataset gets a COM_CLA value of 23. The variable LAW_FAM is split into 4 dummy variables (LAW_FRA, LAW_SCA, LAW_ENG, LAW_GER), giving each company a 1 for exactly one of these dummies.
This all results in a dataset like this:
COMPANY MKT_AVG_LN .. INCID .. LAW_FRA LAW_SCA .. LAW_SR LAW_LE COM_CLA .. etc
------------------------------------------------------------------------------
1 1.54 55 0 1 34 65 53
2 1.44 16 0 1 34 65 53
3 0.11 2 0 1 34 65 53
4 0.38 12 1 0 18 40 27
5 1.98 114 1 0 18 40 27
. . . . . . . .
. . . . . . . .
4,000 0.87 9 0 1 5 14 18
Here, companies 1 to 3 are from the same country A, and 4 and 5 from country B.
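As a cross-check of this construction outside SPSS, here is a hypothetical miniature in Python with pandas; the country codes and values are made up for illustration, and pandas will name the dummies LAW_French and LAW_Scandinavian rather than LAW_FRA and LAW_SCA:

import pandas as pd

# Country-level values are merged onto each company, and LAW_FAM is
# expanded into one 0/1 dummy per legal family, as described above.
companies = pd.DataFrame({"COMPANY": [1, 2, 3, 4, 5],
                          "COUNTRY": ["A", "A", "A", "B", "B"]})
countries = pd.DataFrame({"COUNTRY": ["A", "B"],
                          "COM_CLA": [53, 27],
                          "LAW_FAM": ["Scandinavian", "French"]})
df = companies.merge(countries, on="COUNTRY")
df = pd.concat([df, pd.get_dummies(df["LAW_FAM"], prefix="LAW")], axis=1)
print(df)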
First, I tried analyzing using OLS, but the model seemed very "unstable". The first model has an r-squared of .516. Adding only two variables changes many of the betas and significance levels, as well as the r-squared (now .591). Of course the r-squared increases when variables are added, but this is quite an increase from .516.
Eventually, it was suggested in another post that I should not use OLS here but mixed models, because of the categorical country-level data. However, I am confused as to how to perform this in SPSS. The examples I found online are not comparable to mine, so I don't know what to fill in, among other things, in the mixed-models dialog.
Could somebody who uses SPSS please explain how to perform this analysis, so that I can arrive at a regression model (CSR = b1*MKT_AVG_LN + b2*SIGN + ... + b13*CON_AVG) and conclude whether CSR is determined by company features, country features, neither, or both?
I believe I have to enter the company-level variables as covariates and the country-level variables as factors. Is this correct? Second, I am unsure what to do with the LAW_FRA/LAW_SCA/LAW_ENG/LAW_GER dummy variables.
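For reference, the intended model structure (company- and country-level predictors as fixed effects, plus a random intercept per country) can be written down compactly outside SPSS. Below is a sketch in Python's statsmodels, on fabricated toy data, with COUNTRY as a hypothetical country-identifier column; it is only meant to make the model explicit, not to replace the SPSS analysis:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Fabricated toy data standing in for the 4,000-company table.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "COUNTRY": rng.integers(0, 15, n),      # hypothetical country id
    "MKT_AVG_LN": rng.normal(1.0, 0.6, n),
    "SIGN": rng.integers(0, 20, n),
    "INCID": rng.integers(0, 120, n),
    "LAW_SR": rng.integers(0, 40, n),
})
df["GSE_RAW"] = 0.5 * df["MKT_AVG_LN"] + 0.01 * df["SIGN"] + rng.normal(0, 1, n)

# Predictors enter as fixed effects; each country contributes a
# random intercept shared by all of its companies.
model = smf.mixedlm("GSE_RAW ~ MKT_AVG_LN + SIGN + INCID + LAW_SR",
                    data=df, groups=df["COUNTRY"])
print(model.fit().summary())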
Any help is greatly appreciated!