unable to get rid of all emojis - python-3.x

I need help removing emojis. I looked at some other stackoverflow questions and this is what I am de but for some reason my code doesn't get rid of all the emojis
d= {'alexveachfashion': 'Fashion Style * Haute Couture * Wearable Tech * VR\n👓👜⌚👠\nSoundCloud is Live #alexveach\n👇New YouTube Episodes ▶️👇', 'andrewvng': 'Family | Fitness | Friends | Gym | Food', 'runvi.official': 'Accurate measurement via SMART insoles & real-time AI coaching. Improve your technique & BOOST your performance with every run.\nSoon on Kickstarter!', 'triing': 'Augmented Jewellery™️ • Montreal. Canada.', 'gedeanekenshima': 'Prof na Etec Albert Einstein, Mestranda em Automação e Controle de Processos, Engenheira de Controle e Automação, Técnica em Automação Industrial.', 'jetyourdaddy': '', 'lavonne_sun': '☄️🔭 ✨\n°●°。Visual Narrative\nA creative heart with a poetic soul.\n————————————\nPARSONS —— Design & Technology', 'taysearch': 'All the World’s Information At Your Fingertips. (Literally) Est. 1991🇺🇸 🎀#PrincessofSearch 🔎Sample 👇🏽 the Search Engine Here 🗽', 'hijewellery': 'Fine 3D printed jewellery for tech lovers #3dprintedjewelry #wearabletech #jewellery', 'yhanchristian': 'Estudante de Engenharia, Maker e viciado em café.', 'femka': 'Fashion Futurist + Fashion Tech Lab Founder #technoirlab + Fashion Designer / Parsons & CSM Grad / Obsessed with #fashiontech #future #cryptocurrency', 'sinhbisen': 'Creator, TRiiNG, augmented jewellery label ⭕️ Transhumanist ⭕️ Corporeal cartographer ⭕️', 'stellawearables': '#StellaWearables ✉️Info#StellaWearables.com Premium Wearable Technology That Monitors Personal Health & Environments ☀️🏝🏜🏔', 'ivoomi_india': 'We are the manufacturers of the most innovative technologies and user-friendly gadgets with a global presence.', 'bgutenschwager': "When it comes to life, it's all about the experience.\nGoogle Mapper 🗺\n360 Photographer 📷\nBrand Rep #QuickTutor", 'storiesofdesign': 'Putting stories at the heart of brands and businesses | Cornwall and London | #storiesofdesign', 'trume.jp': '草創期から国産ウオッチの製造に取り組み、挑戦を続けてきたエプソンが世界に放つ新ブランド「TRUME」(トゥルーム)。目指すのは、最先端技術でアナログウオッチを極めるブランド。', 'themarinesss': "I didn't choose the blog life, the blog life chose me | Aspiring Children's Book Author | www.slayathomemum.com", 'ayowearable': 'The world’s first light-based wearable that helps you sleep better, beat jet lag and have more energy! #goAYO Get yours at:', 'wearyourowntechs': 'Bringing you the latest trends, Current Products and Reviews of Wearable Technology. Discover how they can enhance your Life and Lifestyle', 'roxfordwatches': 'The Roxford | The most stylish and customizable fitness smartwatch. Tracks your steps/calories/dist/sleep. Comes with FOUR bands, and a travel case!', 'playertek': "Track your entire performance - every training session, every match. \nBecause the best players don't hide.", '_kate_hartman_': '', 'hmsmc10': 'Health & Wellness 🍎\nBoston, MA 🏙\nSuffolk MPA ‘17 🎓 \n.\nJust Strong Ambassador 🏋🏻\u200d♀️', 'gadgetxtreme': 'Dedicated to reviewing gadgets, technologies, internet products and breaking tech news. Follow us to see daily vblogs on all the disruptive tech..', 'freedom.journey.leader': '📍MN\n🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿 \n📧Ashleybp5#gmail.com \n#homeschool #bossmom #builder #momlife', 'arts_food_life': 'Life through my phone.', 'medgizmo': 'Wearable #tech: #health #healthcare #wellness #gadgets #apps. Images/links provided as information resource only; doesn’t mean we endorse referenced', 'sawearables': 'The home of wearable tech in South Africa!\n--> #WearableTech #WearableTechnology #FitnessTech Find your wearable #', 'shop.mercury': 'Changing the way you charge.⚡️\nGet exclusive product discounts, and help us reach our goal below!🔋', 'invisawear': 'PRE-ORDERS NOW AVAILABLE! Get yours 25% OFF here: #girlboss #wearabletech'}
for key in d:
print("---with emojis----")
print(d[key])
print("---emojis removed----")
x=''.join(c for c in d[key] if c <= '\uFFFF')
print(x)
output example
---with emojis----
📍MN
🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿
📧Ashleybp5#gmail.com
#homeschool #bossmom #builder #momlife
---emojis removed----
MN
Wife • Homeschooling Mom to 5 • D Y I lover • Small town living in MN.
Ashleybp5#gmail.com
#homeschool #bossmom #builder #momlife
---with emojis----
Changing the way you charge.⚡️
Get exclusive product discounts, and help us reach our goal below!🔋
---emojis removed----
Changing the way you charge.⚡️
Get exclusive product discounts, and help us reach our goal below!

There is no technical definition of what an "emoji" is. Various glyphs may be used to render printable characters, symbols, control characters and the like. What seems like an "emoji" to you may be part of normal script to others.
What you probably want to do is to look at the Unicode category of each character and filter out various categories. While this does not solve the "emoji"-definition-problem per se, you get much better control over what you are actually doing without removing, for example, literally all characters of languages spoken by 2/3 of the planet.
Instead of filtering out certain categories, you may filter everything except the lower- and uppercase letters (and numbers). However, be aware that ꙭ is not "the googly eyes emoji" but the CYRILLIC SMALL LETTER DOUBLE MONOCULAR O, which is a normal lowercase letter to millions of people.
For example:
import unicodedata
s = "🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿"
# Just filter category "symbol"
t = ''.join(c for c in s if unicodedata.category(c) not in ('So', ))
print(t)
...results in
Wife • Homeschooling Mom to 5 • D Y I lover • Small town living in MN.
This may not be emoji-free enough, yet the • is technically a form of punctuation. So filter this as well
# Filter symbols and punctuations. You may want 'Cc' as well,
# to get rid of control characters. Beware that newlines are a
# form of control-character.
t = ''.join(c for c in s if unicodedata.category(c) not in ('So', 'Po'))
print(t)
And you get
Wife Homeschooling Mom to 5 D Y I lover Small town living in MN

Related

is there a method to detect person and associate a text?

I have a text like :
Take a loot at some of the first confirmed Forum speakers: John
Sequiera Graduated in Biology at Facultad de Ciencias Exactas y
Naturales,University of Buenos Aires, Argentina. In 2004 obtained a
PhD in Biology (Molecular Neuroscience), at University of Buenos
Aires, mentored by Prof. Marcelo Rubinstein. Between 2005 and 2008
pursued postdoctoral training at Pasteur Institute (Paris) mentored by
Prof Jean-Pierre Changeux, to investigate the role of nicotinic
receptors in executive behaviors. Motivated by a deep interest in
investigating human neurological diseases, in 2009 joined the
Institute of Psychiatry at King’s College London where she performed
basic research with a translational perspective in the field of
neurodegeneration.
Since 2016 has been chief of instructors / Adjunct professor at University of Buenos Aires, Facultad de Ciencias Exactas y Naturales.
Tom Gonzalez is a professor of Neuroscience at the Sussex Neuroscience, School of Life Sciences, University of Sussex. Prof.
Baden studies how neurons and networks compute, using the beautiful
collection of circuits that make up the vertebrate retina as a model.
I want to have in output :
[{"person" : "John Sequiera" , "content": "Graduated in Biology at Facultad...."},{"person" : "Tom Gonzalez" , "content": "is a professor of Neuroscience at the Sussex..."}]
so we want to get NER : PER for person and in content we put all contents after detecting person until we found a new person in the text ...
it is possible ?
i try to use spacy to extract NER , but i found a difficulty to get content :
import spacy
​
nlp = spacy.load("en_core_web_lg")
doc = nlp(text)
​
for ent in doc.ents:
print(ent.text,ent.label_)

How to resolve pandas length error for rows/columns

I have raised the SO Question here and blessed to have an answer from #Scott Boston.
However i am raising another question about an error ValueError: Columns must be same length as key as i am reading a text file and all the rows/columns are not of same length, i tried googling but did not get an answer as i don't want them to be skipped.
Error
b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'
My pandas dataframe generator
#!/usr/bin/python3
import pandas as pd
#
cvc_file = pd.read_csv('kids_cvc',header=None,error_bad_lines=False)
cvc_file[['cols', 0]] = cvc_file[0].str.split(':', expand=True) #Split first column on ':'
df = cvc_file.set_index('cols').transpose() #set_index and transpose
print(df)
Result
$ ./read_cvc.py
b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'
cols ab ad an ed eg et en eck ell it id ig im ish ob og ock ut ub ug um un ud uck ush
0 cab bad ban bed beg bet den beck bell bit bid big dim fish cob bog dock but cub bug bum bun bud buck gush
1 dab dad can fed keg get hen deck cell fit did dig him dish gob cog lock cut hub dug gum fun cud duck hush
2 gab had fan led leg jet men neck dell hit hid fig rim wish job dog rock gut nub hug hum gun dud luck lush
3 jab lad man red peg let pen peck jell kit kid gig brim swish lob fog sock hut rub jug mum nun mud muck mush
4 lab mad pan wed NaN met ten check sell lit lid jig grim NaN mob hog tock jut sub lug sum pun spud puck rush
5 nab pad ran bled NaN net then fleck tell pit rid pig skim NaN rob jog block nut tub mug chum run stud suck blush
File contents
$ cat kids_cvc
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
an: ban, can, fan, man, pan, ran, tan, van, clan, plan, scan, than
ag: bag, gag, hag, lag, nag, rag, sag, tag, wag, brag, drag, flag, snag, stag
ap: cap, gap, lap, map, nap, rap, sap, tap, yap, zap, chap, clap, flap, slap, snap, trap
am: bam, dam, ham, jam, ram, yam, clam, cram, scam, slam, spam, swam, tram, wham
ack: back, hack, jack, lack, pack, rack, sack, tack, black, crack, shack, snack, stack, quack, track
ash: bash, cash, dash, gash, hash, lash, mash, rash, sash, clash, crash, flash, slash, smash
ed: bed, fed, led, red, wed, bled, bred, fled, pled, sled, shed
eg: beg, keg, leg, peg
et: bet, get, jet, let, met, net, pet, set, vet, wet, yet, fret
en: den, hen, men, pen, ten, then, when
eck: beck, deck, neck, peck, check, fleck, speck, wreck
ell: bell, cell, dell, jell, sell, tell, well, yell, dwell, shell, smell, spell, swell
it: bit, fit, hit, kit, lit, pit, sit, wit, knit, quit, slit, spit
id: bid, did, hid, kid, lid, rid, skid, slid
ig: big, dig, fig, gig, jig, pig, rig, wig, zig, twig
im: dim, him, rim, brim, grim, skim, slim, swim, trim, whim
ip: dip, hip, lip, nip, rip, sip, tip, zip, chip, clip, drip, flip, grip, ship, skip, slip, snip, trip, whip
ick: kick, lick, nick, pick, sick, tick, wick, brick, chick, click, flick, quick, slick, stick, thick, trick
ish: fish, dish, wish, swish
in: bin, din, fin, pin, sin, tin, win, chin, grin, shin, skin, spin, thin, twin
ot: cot, dot, got, hot, jot, lot, not, pot, rot, tot, blot, knot, plot, shot, slot, spot
ob: cob, gob, job, lob, mob, rob, sob, blob, glob, knob, slob, snob
og: bog, cog, dog, fog, hog, jog, log, blog, clog, frog
op: cop, hop, mop, pop, top, chop, crop, drop, flop, glop, plop, shop, slop, stop
ock: dock, lock, rock, sock, tock, block, clock, flock, rock, shock, smock, stock
ut: but, cut, gut, hut, jut, nut, rut, shut
ub: cub, hub, nub, rub, sub, tub, grub, snub, stub
ug: bug, dug, hug, jug, lug, mug, pug, rug, tug, drug, plug, slug, snug
um: bum, gum, hum, mum, sum, chum, drum, glum, plum, scum, slum
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush
Note:
It's making the first row/column as a master one which has 13 values and skipping all the columns which are more than 13 columns.
I couldn't figure out a pandas way to extend the columns, but converting the rows to a dictionary made things easier.
ss = '''
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
.......
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush
'''.strip()
with open ('kids.cvc','w') as f: f.write(ss) # write data file
######################################
import pandas as pd
dd = {}
maxcnt=0
with open('kids.cvc') as f:
lines = f.readlines()
for line in lines:
line = line.strip() # remove \n
len1 = len(line) # words have leading space
line = line.replace(' ','')
cnt = len1 - len(line) # get word (space) count
if cnt > maxcnt: maxcnt = cnt # max word count
rec = line.split(':') # header : words
dd[rec[0]] = rec[1].split(',') # split words
for k in dd:
dd[k] = dd[k] + ['']*(maxcnt-len(dd[k])) # add extra values to match max column
df = pd.DataFrame(dd) # convert dictionary to dataframe
print(df.to_string(index=False))
Output
ab at ad an ag ap am ack ash ed eg et en eck ell it id ig im ip ick ish in ot ob og op ock ut ub ug um un ud uck ush
cab bat bad ban bag cap bam back bash bed beg bet den beck bell bit bid big dim dip kick fish bin cot cob bog cop dock but cub bug bum bun bud buck gush
dab cat dad can gag gap dam hack cash fed keg get hen deck cell fit did dig him hip lick dish din dot gob cog hop lock cut hub dug gum fun cud duck hush
gab fat had fan hag lap ham jack dash led leg jet men neck dell hit hid fig rim lip nick wish fin got job dog mop rock gut nub hug hum gun dud luck lush
jab hat lad man lag map jam lack gash red peg let pen peck jell kit kid gig brim nip pick swish pin hot lob fog pop sock hut rub jug mum nun mud muck mush
lab mat mad pan nag nap ram pack hash wed met ten check sell lit lid jig grim rip sick sin jot mob hog top tock jut sub lug sum pun spud puck rush
nab pat pad ran rag rap yam rack lash bled net then fleck tell pit rid pig skim sip tick tin lot rob jog chop block nut tub mug chum run stud suck blush
tab rat sad tan sag sap clam sack mash bred pet when speck well sit skid rig slim tip wick win not sob log crop clock rut grub pug drum sun thud tuck brush
blab sat tad van tag tap cram tack rash fled set wreck yell wit slid wig swim zip brick chin pot blob blog drop flock shut snub rug glum spun yuck crush
crab vat glad clan wag yap scam black sash pled vet dwell knit zig trim chip chick grin rot glob clog flop rock stub tug plum stun chuck flush
grab brat plan brag zap slam crack clash sled wet shell quit twig whim clip click shin tot knob frog glop shock drug scum cluck slush
scab chat scan drag chap spam shack crash shed yet smell slit drip flick skin blot slob plop smock plug slum pluck
stab flat than flag clap swam snack flash fret spell spit flip quick spin knot snob shop stock slug stuck
slab gnat snag flap tram stack slash swell grip slick thin plot slop snug truck
spat stag slap wham quack smash ship stick twin shot stop
snap track skip thick slot
trap slip trick spot

How to reconstruct original text from spaCy tokens, even in cases with complicated whitespacing and punctuation

' '.join(token_list) does not reconstruct the original text in cases with multiple whitespaces and punctuation in a row.
For example:
from spacy.tokenizer import Tokenizer
from spacy.lang.en import English
nlp = English()
# Create a blank Tokenizer with just the English vocab
tokenizerSpaCy = Tokenizer(nlp.vocab)
context_text = 'this is a test \n \n \t\t test for \n testing - ./l \t'
contextSpaCyToksSpaCyObj = tokenizerSpaCy(context_text)
spaCy_toks = [i.text for i in contextSpaCyToksSpaCyObj]
reconstruct = ' '.join(spaCy_toks)
reconstruct == context_text
>False
Is there an established way of reconstructing original text from spaCy tokens?
Established answer should work with this edge case text (you can directly get the source from clicking the 'improve this question' button)
" UNCLASSIFIED U.S. Department of State Case No. F-2014-20439 Doc No. C05795279 Date: 01/07/2016\n\n\n RELEASE IN PART\n B5, B6\n\n\n\n\nFrom: H <hrod17#clintonemail.com>\nSent: Monday, July 23, 2012 7:26 AM\nTo: 'millscd #state.gov'\nCc: 'DanielJJ#state.gov.; 'hanleymr#state.gov'\nSubject Re: S speech this morning\n\n\n\n Waiting to hear if Monica can come by and pick up at 8 to take to Josh. If I don't hear from her, can you send B5\nsomeone else?\n\n Original Message ----\nFrom: Mills, Cheryl D [MillsCD#state.gov]\nSent: Monday, July 23, 2012 07:23 AM\nTo: H\nCc: Daniel, Joshua J <Daniel1.1#state.gov>\nSubject: FW: S speech this morning\n\nSee below\n\n B5\n\ncdm\n\n Original Message\nFrom: Shah, Rajiv (AID/A) B6\nSent: Monday, July 23, 2012 7:19 AM\nTo: Mills, Cheryl D\nCc: Daniel, Joshua.'\nSubject: S speech this morning\n\nHi cheryl,\n\nI look fwd to attending the speech this morning.\n\nI had one last minute request - I understand that in the final version there is no reference to the child survival call to\naction, but their is a reference to family planning efforts. Could you and josh try to make sure there is some specific\nreference to the call to action?\n\nAlso, in terms of acknowledgements it would be good to note torn friedan's leadership as everyone is sensitive to our ghi\ntransition and we want to continue to send the usaid-pepfar-cdc working together public message. I don't know if he is\nthere, but wanted to flag.\n\nLook forward to it.\n\nRaj\n\n\n\n\n UNCLASSIFIED U.S. Department of State Case No. F-2014-20439 Doc No. C05795279 Date: 01/07/2016\n\x0c"
You can very easily accomplish this by changing two lines in your code:
spaCy_toks = [i.text + i.whitespace_ for i in contextSpaCyToksSpaCyObj]
reconstruct = ''.join(spaCy_toks)
Basically, each token in spaCy knows whether it is followed by whitespace or not. So you call token.whitespace_ instead of joining them on space by default.

In python, headings not in the same row

I extracted three columns from a larger data frame (recent_grads) as follows...
df = recent_grads.groupby('Major_category')['Men', 'Women'].sum()
However, when I print df, it comes up as follows...
Men Women
Major_category
Agriculture & Natural Resources 40357.0 35263.0
Arts 134390.0 222740.0
Biology & Life Science 184919.0 268943.0
Business 667852.0 634524.0
Communications & Journalism 131921.0 260680.0
Computers & Mathematics 208725.0 90283.0
Education 103526.0 455603.0
Engineering 408307.0 129276.0
Health 75517.0 387713.0
Humanities & Liberal Arts 272846.0 440622.0
Industrial Arts & Consumer Services 103781.0 126011.0
Interdisciplinary 2817.0 9479.0
Law & Public Policy 91129.0 87978.0
Physical Sciences 95390.0 90089.0
Psychology & Social Work 98115.0 382892.0
Social Science 256834.0 273132.0
How do I get Major_category heading in the same row as Men and Women headings? I tried to put the three columns in a new data frame as follows...
df1 = df[['Major_category', 'Men', 'Women']].copy()
This gives me an error (Major_category not in index)
Hi man you should try reset_index https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html:
df = df.groupby('Major_category')['Men', 'Women'].sum()
# Print the output.
md = df.reset_index()
print(md)
Seems like you want to convert the groupby object back to a dataframe try:
df['Major_category'].apply(pd.DataFrame)

How to download pubmed articles and read them?

Im having trouble to save pubmed articles and read them. I've seen at this page here that there are some special files types but no one of them worked for me. I want to save them in a way that I can continuous using the keys to get the the data. I don't know if its possible use it if I save it as a text file. My code is this one:
import sys
from Bio import Entrez
import re
import os
from Bio import Medline
from Bio import SeqIO
'''Class Crawler is responsable to browse the biological databases
from DownloadArticles import DownloadArticles
c = DownloadArticles()
c.articles_dataset_list
'''
class DownloadArticles():
def __init__(self):
Entrez.email='myemail#gmail.com'
self.dataC = self.saveArticlesFilesInXMLMode('pubmed', '26837606')
'''Metodo 4 ler dado em forma de texto.'''
def saveArticlesFilesInXMLMode(self,dbs, ids):
net_handle = Entrez.efetch(db=dbs, id=ids, rettype="medline", retmode="txt")
directory = "/dataset/Pubmed/DatasetArticles/"+ ids + ".fasta"
# if not os.path.exists(directory):
# os.makedirs(directory)
# filename = directory + '/'
# if not os.path.exists(filename):
out_handle = open(directory, "w+")
out_handle.write(net_handle.read())
out_handle.close()
net_handle.close()
print("Saved")
print("Parsing...")
record = SeqIO.read(directory, "fasta")
print(record)
return(record.read())
I'm getting this error: ValueError: No records found in handle
Pease someone can help me?
Now my code is like this, I am trying to do a function to save in .fasta like you did. And one to read the .fasta files like in the answer above.
import sys
from Bio import Entrez
import re
import os
from Bio import Medline
from Bio import SeqIO
def save_Articles_Files(dbName, idNum, rettypeName):
net_handle = Entrez.efetch(db=dbName, id=idNum, rettype=rettypeName, retmode="txt")
filename = path + idNum + ".fasta"
out_handle = open(filename, "w")
out_handle.write(net_handle.read())
out_handle.close()
net_handle.close()
print("Saved")
enter code here
Entrez.email='myemail#gmail.com'
dbName = 'pubmed'
idNum = '26837606'
rettypeName = "medline"
path ="/run/media/Dropbox/codigos/Codes/"+dbName
save_Articles_Files(dbName, idNum, rettypeName)
But my function is not working I need some help please!
You're mixing up two concepts.
1) Entrez.efetch() is used to access NCBI. In your case you are downloading an article from Pubmed. The result that you get from net_handle.read() looks like:
PMID- 26837606
OWN - NLM
STAT- In-Process
DA - 20160203
LR - 20160210
IS - 2045-2322 (Electronic)
IS - 2045-2322 (Linking)
VI - 6
DP - 2016 Feb 03
TI - Exploiting the CRISPR/Cas9 System for Targeted Genome Mutagenesis in Petunia.
PG - 20315
LID - 10.1038/srep20315 [doi]
AB - Recently, CRISPR/Cas9 technology has emerged as a powerful approach for targeted
genome modification in eukaryotic organisms from yeast to human cell lines. Its
successful application in several plant species promises enormous potential for
basic and applied plant research. However, extensive studies are still needed to
assess this system in other important plant species, to broaden its fields of
application and to improve methods. Here we showed that the CRISPR/Cas9 system is
efficient in petunia (Petunia hybrid), an important ornamental plant and a model
for comparative research. When PDS was used as target gene, transgenic shoot
lines with albino phenotype accounted for 55.6%-87.5% of the total regenerated T0
Basta-resistant lines. A homozygous deletion close to 1 kb in length can be
readily generated and identified in the first generation. A sequential
transformation strategy--introducing Cas9 and sgRNA expression cassettes
sequentially into petunia--can be used to make targeted mutations with short
indels or chromosomal fragment deletions. Our results present a new plant species
amenable to CRIPR/Cas9 technology and provide an alternative procedure for its
exploitation.
FAU - Zhang, Bin
AU - Zhang B
AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of
Horticulture Science for Southern Mountainous Regions, Ministry of Education,
College of Horticulture and Landscape Architecture, Southwest University,
Chongqing 400716, China.
FAU - Yang, Xia
AU - Yang X
AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of
Horticulture Science for Southern Mountainous Regions, Ministry of Education,
College of Horticulture and Landscape Architecture, Southwest University,
Chongqing 400716, China.
FAU - Yang, Chunping
AU - Yang C
AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of
Horticulture Science for Southern Mountainous Regions, Ministry of Education,
College of Horticulture and Landscape Architecture, Southwest University,
Chongqing 400716, China.
FAU - Li, Mingyang
AU - Li M
AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of
Horticulture Science for Southern Mountainous Regions, Ministry of Education,
College of Horticulture and Landscape Architecture, Southwest University,
Chongqing 400716, China.
FAU - Guo, Yulong
AU - Guo Y
AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of
Horticulture Science for Southern Mountainous Regions, Ministry of Education,
College of Horticulture and Landscape Architecture, Southwest University,
Chongqing 400716, China.
LA - eng
PT - Journal Article
PT - Research Support, Non-U.S. Gov't
DEP - 20160203
PL - England
TA - Sci Rep
JT - Scientific reports
JID - 101563288
SB - IM
PMC - PMC4738242
OID - NLM: PMC4738242
EDAT- 2016/02/04 06:00
MHDA- 2016/02/04 06:00
CRDT- 2016/02/04 06:00
PHST- 2015/09/21 [received]
PHST- 2015/12/30 [accepted]
AID - srep20315 [pii]
AID - 10.1038/srep20315 [doi]
PST - epublish
SO - Sci Rep. 2016 Feb 3;6:20315. doi: 10.1038/srep20315.
2) SeqIO.read() is used to read and parse FASTA files. This is a format that is used to store sequences. A sequence in FASTA format is represented as a series of lines. The first line in a FASTA file starts with a ">" (greater-than) symbol. Following the initial line (used for a unique description of the sequence) is the actual sequence itself in standard one-letter code.
As you can see, the result that you get back from Entrez.efetch() (which I pasted above) doesn't look like a FASTA file. So SeqIO.read() gives the error that it can't find any sequence records in the file.

Resources