Clone an existing document to a new sibling class document using mongoengine

I have the following classes:
class ParentDocument(Document):
    ...

class Child1Document(ParentDocument):
    ...

class Child2Document(ParentDocument):
    ...
Now let's say that I have a document of type Child1Document. Is it possible to clone it to a new document of type Child2Document?
I have tried to do:
doc1 = Child1Document()
doc1.attr1 = foo
doc1.save()
doc2 = Child2Document()
doc2 = doc1
but this converts doc2 to a Child1Document type. Is there a way to copy all the contents of doc1 to doc2 without converting doc2?

Yes, it is possible, but you need to use deepcopy.
Your code would look something like this:
from copy import deepcopy
doc1 = Child1Document()
doc1.attr1 = foo
doc1.save()
doc2 = deepcopy(doc1)
doc2.id = None  # clear the primary key so save() inserts a new document
doc2.save()
Cloned!
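The mechanics of the clone can be illustrated without mongoengine; this is a plain-Python sketch in which the `Doc` class and its `id` attribute merely stand in for a saved document and its primary key:

```python
from copy import deepcopy

class Doc:
    """Hypothetical stand-in for a saved mongoengine document."""
    def __init__(self, attr1):
        self.id = 42          # stands in for the saved primary key
        self.attr1 = attr1

doc1 = Doc(attr1=['foo'])
doc2 = deepcopy(doc1)
doc2.id = None                # with no id, save() would insert a new document

# The copy is fully independent: mutating doc1 does not touch doc2.
doc1.attr1.append('bar')
print(doc2.attr1)             # ['foo']
```

One caveat: `deepcopy` preserves the object's class, so strictly this clones to another `Child1Document`; to end up with a `Child2Document` you would construct one and copy the shared field values across.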

Related

Concatenate two spacy docs together?

How do I concatenate two spaCy docs together, i.e. merge them into one?
import spacy
nlp = spacy.load('en')
doc1 = nlp(u'This is the doc number one.')
doc2 = nlp(u'And this is the doc number two.')
new_doc = doc1+doc2
Of course that returns an error, as Doc objects do not support concatenation by default. Is there a straightforward solution to do that?
I looked at this:
https://github.com/explosion/spaCy/issues/2229
The issue seems closed so it sounds like they have implemented a solution but I cannot find a simple example of that being used.
What about this:
import spacy
from spacy.tokens import Doc
nlp = spacy.blank('en')
doc1 = nlp(u'This is the doc number one.')
doc2 = nlp(u'And this is the doc number two.')
# Will work for few Docs, but see further recommendations below
docs=[doc1, doc2]
# `c_doc` is your "merged" doc
c_doc = Doc.from_docs(docs)
print("Merged text: ", c_doc.text)
# Some quick checks: should not trigger any error.
assert len(list(c_doc.sents)) == len(docs)
assert [str(ent) for ent in c_doc.ents] == [str(ent) for doc in docs for ent in doc.ents]
For "a lot" of different sentences, it might be better to use nlp.pipe as shown in the documentation.
Hope it helps.

python regex to get key value pair

var abcConfig={
"assetsPublicPath":"https://abcd.cloudfront.net/"
}
var abcConfig1={
"assetsPublicPath1":"https://abcd.cloudfront.net/1"
}
I am looking for a regex which generates key-value pairs:
[
abcConfig: {"assetsPublicPath": "https://abcd.cloudfront.net/"},
abcConfig1: {"assetsPublicPath1": "https://abcd.cloudfront.net/1"}
]
Using your sample, I came up with the following example code in Python:
import re

test = '''var abcConfig={
"assetsPublicPath":"https://abcd.cloudfront.net/"
}
var abcConfig1={
"assetsPublicPath1":"https://abcd.cloudfront.net/1"
}'''

rs = r'var\s+(?P<key>\w+)\s*=\s*(?P<val>\{\s*.*?\s*\})'
for mm in re.finditer(rs, test):
    gd = mm.groupdict()
    print(gd['key'], gd['val'])
This captures the key and value from the text into named groups, which get stored in the groupdict. You could then use these values to build your dictionary or do whatever you need to do.
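To go one step further and produce the requested mapping: since each matched value is itself valid JSON, the standard library can parse it. A short sketch, with the sample from the question inlined as `text`:

```python
import json
import re

text = '''var abcConfig={
"assetsPublicPath":"https://abcd.cloudfront.net/"
}
var abcConfig1={
"assetsPublicPath1":"https://abcd.cloudfront.net/1"
}'''

pattern = r'var\s+(?P<key>\w+)\s*=\s*(?P<val>\{.*?\})'
# re.DOTALL lets .*? cross the newlines inside the braces.
config = {m['key']: json.loads(m['val'])
          for m in re.finditer(pattern, text, re.DOTALL)}
print(config['abcConfig']['assetsPublicPath'])
```

This yields a real dict of dicts rather than raw strings, so the values are immediately usable.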

Get full paper content from PubMed via API and list of IDs

I'm hoping to query the PubMed API based on a list of paper IDs and return the title, abstract and content.
So far I have been able to get the title and abstract by doing the following:
from metapub import PubMedFetcher
pmids = [2020202, 1745076, 2768771, 8277124, 4031339]
fetch = PubMedFetcher()
title_list = []
abstract_list = []
for pmid in pmids:
    article = fetch.article_by_pmid(pmid)
    abstract = article.abstract  # str
    abstract_list.append(abstract)
    title = article.title  # str
    title_list.append(title)
Or get the full paper content, but then the query is based on keywords rather than IDs:
from pymed import PubMed

email = 'myemail@gmail.com'
pubmed = PubMed(tool="PubMedSearcher", email=email)
## PUT YOUR SEARCH TERM HERE ##
search_term = "test"
results = pubmed.query(search_term, max_results=300)
articleList = []
articleInfo = []
for article in results:
    # Each result can be either a PubMedBookArticle or a PubMedArticle;
    # convert it to a dictionary with the available helper.
    articleDict = article.toDict()
    articleList.append(articleDict)

# Build a list of dict records holding all article details that can be fetched from the PubMed API
for article in articleList:
    # Sometimes article['pubmed_id'] contains a newline-separated list of ids - take the first one, that's this article's pubmedId
    pubmedId = article['pubmed_id'].partition('\n')[0]
    # Append the article info to the list of records
    articleInfo.append({u'pubmed_id': pubmedId,
                        u'title': article['title'],
                        u'keywords': article['keywords'],
                        u'journal': article['journal'],
                        u'abstract': article['abstract'],
                        u'conclusions': article['conclusions'],
                        u'methods': article['methods'],
                        u'results': article['results'],
                        u'copyrights': article['copyrights'],
                        u'doi': article['doi'],
                        u'publication_date': article['publication_date'],
                        u'authors': article['authors']})
Any help is appreciated!
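On the missing piece: PubMed itself generally serves only abstracts; full text is available through PMC (for the open-access subset) via NCBI's E-utilities EFetch endpoint. A minimal sketch of building such a request URL with the standard library; the `build_efetch_url` helper and the sample PMC ID are illustrative, so check the E-utilities documentation for the exact parameters your use case needs:

```python
from urllib.parse import urlencode

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def build_efetch_url(pmcid, email):
    # db must be "pmc": full-text XML lives in PMC, not in the pubmed db.
    params = {"db": "pmc", "id": pmcid, "retmode": "xml", "email": email}
    return EFETCH_BASE + "?" + urlencode(params)

url = build_efetch_url("PMC1790863", "myemail@example.com")
print(url)
```

The returned XML can then be fetched with any HTTP client and parsed for the body sections.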

How do I build a string of variable names?

I'm trying to build a string that contains all attributes of a class-object. The object name is jsonData and it has a few attributes, some of them being
jsonData.Serial,
jsonData.InstrumentSerial,
jsonData.Country
I'd like to build a string that has those attribute names in the format of this:
'Serial InstrumentSerial Country'
End goal is to define a schema for a Spark dataframe.
I'm open to alternatives, as long as I know order of the string/object because I need to map the schema to appropriate values.
You'll have to be careful about filtering out unwanted attributes, but try this:
' '.join([x for x in dir(jsonData) if '__' not in x])
That filters out all the "magic methods" like __init__ or __new__.
To include those, do
' '.join(dir(jsonData))
These take advantage of Python's built-in dir function, which returns a list of all attributes of an object.
I don't quite understand why you want to group the attribute names in a single string.
You could simply keep a list of attribute names, since the order of a Python list is preserved.
attribute_names = [x for x in dir(jsonData) if '__' not in x]
From there you can create your dataframe. If you don't need to specify the Spark types, you can just do (where spark is your SparkSession):
df = spark.createDataFrame(data, schema=attribute_names)
You could also create a StructType and specify the types in your schema.
I guess that you are going to have a list of jsonData records that you want to treat as Rows.
Let's consider it as a list of objects, but the logic would still be the same.
You can do that as follows:
my_object_list = [
    jsonDataClass(Serial=1, InstrumentSerial='TDD', Country='France'),
    jsonDataClass(Serial=2, InstrumentSerial='TDI', Country='Suisse'),
    jsonDataClass(Serial=3, InstrumentSerial='TDD', Country='Grece')]

from operator import attrgetter

def build_record(obj, attr_names):
    return attrgetter(*attr_names)(obj)

The data argument referred to previously would then be constructed as:
data = [build_record(x, attribute_names) for x in my_object_list]
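Putting the two answers together, the whole round trip can be checked with a plain class standing in for the record type; the `JsonData` class below is just an illustration:

```python
from operator import attrgetter

class JsonData:
    """Hypothetical stand-in for the jsonData object."""
    def __init__(self, Serial, InstrumentSerial, Country):
        self.Serial = Serial
        self.InstrumentSerial = InstrumentSerial
        self.Country = Country

jsonData = JsonData(Serial=1, InstrumentSerial='TDD', Country='France')

# dir() returns names sorted alphabetically, so the order is reproducible.
attribute_names = [x for x in dir(jsonData) if '__' not in x]
print(attribute_names)   # ['Country', 'InstrumentSerial', 'Serial']

# attrgetter pulls the values out in that same order, giving one row.
row = attrgetter(*attribute_names)(jsonData)
print(row)               # ('France', 'TDD', 1)
```

Note that `dir`'s alphabetical order need not match the order the attributes were defined in; if the schema requires a specific column order, reorder `attribute_names` explicitly rather than relying on `dir`.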

how to get specific elements from Collection tdl with stanford nlp parser

I am using the Stanford NLP parser.
I want to extract some elements, like nsubj and more, from the Collection tdl.
My code is:
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
Collection tdl = gs.typedDependenciesCollapsed();
but my problem is that I don't know how to compare the elements that I get from the Collection.
Thanks a lot for helping!
It is a collection of TypedDependency and can then be examined or manipulated in all the usual Java ways. For example, this code prints out just the nsubj relations:
Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed(true);
for (TypedDependency td : tdl) {
    if (td.reln().equals(EnglishGrammaticalRelations.NOMINAL_SUBJECT)) {
        System.out.println("Nominal Subj relation: " + td);
    }
}
