unable to get rid of all emojis - python-3.x
I need help removing emojis. I looked at some other Stack Overflow questions and this is what I came up with, but for some reason my code doesn't get rid of all the emojis.
d= {'alexveachfashion': 'Fashion Style * Haute Couture * Wearable Tech * VR\n👓👜⌚👠\nSoundCloud is Live #alexveach\n👇New YouTube Episodes ▶️👇', 'andrewvng': 'Family | Fitness | Friends | Gym | Food', 'runvi.official': 'Accurate measurement via SMART insoles & real-time AI coaching. Improve your technique & BOOST your performance with every run.\nSoon on Kickstarter!', 'triing': 'Augmented Jewellery™️ • Montreal. Canada.', 'gedeanekenshima': 'Prof na Etec Albert Einstein, Mestranda em Automação e Controle de Processos, Engenheira de Controle e Automação, Técnica em Automação Industrial.', 'jetyourdaddy': '', 'lavonne_sun': '☄️🔭 ✨\n°●°。Visual Narrative\nA creative heart with a poetic soul.\n————————————\nPARSONS —— Design & Technology', 'taysearch': 'All the World’s Information At Your Fingertips. (Literally) Est. 1991🇺🇸 🎀#PrincessofSearch 🔎Sample 👇🏽 the Search Engine Here 🗽', 'hijewellery': 'Fine 3D printed jewellery for tech lovers #3dprintedjewelry #wearabletech #jewellery', 'yhanchristian': 'Estudante de Engenharia, Maker e viciado em café.', 'femka': 'Fashion Futurist + Fashion Tech Lab Founder #technoirlab + Fashion Designer / Parsons & CSM Grad / Obsessed with #fashiontech #future #cryptocurrency', 'sinhbisen': 'Creator, TRiiNG, augmented jewellery label ⭕️ Transhumanist ⭕️ Corporeal cartographer ⭕️', 'stellawearables': '#StellaWearables ✉️Info#StellaWearables.com Premium Wearable Technology That Monitors Personal Health & Environments ☀️🏝🏜🏔', 'ivoomi_india': 'We are the manufacturers of the most innovative technologies and user-friendly gadgets with a global presence.', 'bgutenschwager': "When it comes to life, it's all about the experience.\nGoogle Mapper 🗺\n360 Photographer 📷\nBrand Rep #QuickTutor", 'storiesofdesign': 'Putting stories at the heart of brands and businesses | Cornwall and London | #storiesofdesign', 'trume.jp': '草創期から国産ウオッチの製造に取り組み、挑戦を続けてきたエプソンが世界に放つ新ブランド「TRUME」(トゥルーム)。目指すのは、最先端技術でアナログウオッチを極めるブランド。', 'themarinesss': "I didn't choose the blog 
life, the blog life chose me | Aspiring Children's Book Author | www.slayathomemum.com", 'ayowearable': 'The world’s first light-based wearable that helps you sleep better, beat jet lag and have more energy! #goAYO Get yours at:', 'wearyourowntechs': 'Bringing you the latest trends, Current Products and Reviews of Wearable Technology. Discover how they can enhance your Life and Lifestyle', 'roxfordwatches': 'The Roxford | The most stylish and customizable fitness smartwatch. Tracks your steps/calories/dist/sleep. Comes with FOUR bands, and a travel case!', 'playertek': "Track your entire performance - every training session, every match. \nBecause the best players don't hide.", '_kate_hartman_': '', 'hmsmc10': 'Health & Wellness 🍎\nBoston, MA 🏙\nSuffolk MPA ‘17 🎓 \n.\nJust Strong Ambassador 🏋🏻\u200d♀️', 'gadgetxtreme': 'Dedicated to reviewing gadgets, technologies, internet products and breaking tech news. Follow us to see daily vblogs on all the disruptive tech..', 'freedom.journey.leader': '📍MN\n🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿 \n📧Ashleybp5#gmail.com \n#homeschool #bossmom #builder #momlife', 'arts_food_life': 'Life through my phone.', 'medgizmo': 'Wearable #tech: #health #healthcare #wellness #gadgets #apps. Images/links provided as information resource only; doesn’t mean we endorse referenced', 'sawearables': 'The home of wearable tech in South Africa!\n--> #WearableTech #WearableTechnology #FitnessTech Find your wearable #', 'shop.mercury': 'Changing the way you charge.⚡️\nGet exclusive product discounts, and help us reach our goal below!🔋', 'invisawear': 'PRE-ORDERS NOW AVAILABLE! Get yours 25% OFF here: #girlboss #wearabletech'}
for key in d:
    print("---with emojis----")
    print(d[key])
    print("---emojis removed----")
    x = ''.join(c for c in d[key] if c <= '\uFFFF')
    print(x)

Output example:
---with emojis----
📍MN
🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿
📧Ashleybp5#gmail.com
#homeschool #bossmom #builder #momlife
---emojis removed----
MN
Wife • Homeschooling Mom to 5 • D Y I lover • Small town living in MN.
Ashleybp5#gmail.com
#homeschool #bossmom #builder #momlife
---with emojis----
Changing the way you charge.⚡️
Get exclusive product discounts, and help us reach our goal below!🔋
---emojis removed----
Changing the way you charge.⚡️
Get exclusive product discounts, and help us reach our goal below!
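The ⚡ in the second example survives because the filter c <= '\uFFFF' only drops characters outside the Basic Multilingual Plane (code points above U+FFFF), and many emoji live inside it. A quick check:

```python
# The filter only removes characters whose code point exceeds U+FFFF.
print(hex(ord('🍃')))  # 0x1f343 -> above 0xFFFF, so it is removed
print(hex(ord('⚡')))  # 0x26a1  -> below 0xFFFF, so it survives the filter
```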
There is no technical definition of what an "emoji" is. Various glyphs may be used to render printable characters, symbols, control characters and the like. What seems like an "emoji" to you may be part of normal script to others.
What you probably want to do is to look at the Unicode category of each character and filter out various categories. While this does not solve the "emoji"-definition-problem per se, you get much better control over what you are actually doing without removing, for example, literally all characters of languages spoken by 2/3 of the planet.
Instead of filtering out certain categories, you may filter everything except the lower- and uppercase letters (and numbers). However, be aware that ꙭ is not "the googly eyes emoji" but the CYRILLIC SMALL LETTER DOUBLE MONOCULAR O, which is a normal lowercase letter to millions of people.
For example:
import unicodedata
s = "🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿"
# Just filter category "symbol"
t = ''.join(c for c in s if unicodedata.category(c) not in ('So', ))
print(t)
...results in
Wife • Homeschooling Mom to 5 • D Y I lover • Small town living in MN.
This may not be emoji-free enough for your purposes; also note that the • is technically a form of punctuation. So filter that category as well:
# Filter symbols and punctuations. You may want 'Cc' as well,
# to get rid of control characters. Beware that newlines are a
# form of control-character.
t = ''.join(c for c in s if unicodedata.category(c) not in ('So', 'Po'))
print(t)
And you get
Wife Homeschooling Mom to 5 D Y I lover Small town living in MN
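If you prefer the allowlist approach mentioned above (keep only letters, numbers, and whitespace, and drop everything else), here is a minimal sketch; keep_word_chars is a hypothetical helper name, not part of any library:

```python
import unicodedata

def keep_word_chars(s):
    # Keep letters (categories starting with 'L'), numbers ('N'),
    # and whitespace; drop all symbols, punctuation, etc.
    return ''.join(
        c for c in s
        if unicodedata.category(c)[0] in ('L', 'N') or c.isspace()
    )

s = "🍃Wife • Homeschooling Mom to 5 🐵"
print(keep_word_chars(s))
```

Consistent with the caveat above, this keeps ꙭ and other non-Latin letters, since they are ordinary letters by Unicode category.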
Related
Is there a method to detect a person and associate text with them?
I have a text like:

Take a look at some of the first confirmed Forum speakers: John Sequiera Graduated in Biology at Facultad de Ciencias Exactas y Naturales, University of Buenos Aires, Argentina. In 2004 obtained a PhD in Biology (Molecular Neuroscience), at University of Buenos Aires, mentored by Prof. Marcelo Rubinstein. Between 2005 and 2008 pursued postdoctoral training at Pasteur Institute (Paris) mentored by Prof Jean-Pierre Changeux, to investigate the role of nicotinic receptors in executive behaviors. Motivated by a deep interest in investigating human neurological diseases, in 2009 joined the Institute of Psychiatry at King’s College London where she performed basic research with a translational perspective in the field of neurodegeneration. Since 2016 has been chief of instructors / Adjunct professor at University of Buenos Aires, Facultad de Ciencias Exactas y Naturales. Tom Gonzalez is a professor of Neuroscience at the Sussex Neuroscience, School of Life Sciences, University of Sussex. Prof. Baden studies how neurons and networks compute, using the beautiful collection of circuits that make up the vertebrate retina as a model.

I want to have as output:

[{"person": "John Sequiera", "content": "Graduated in Biology at Facultad...."}, {"person": "Tom Gonzalez", "content": "is a professor of Neuroscience at the Sussex..."}]

So we want to run NER, detect PER for each person, and put in "content" all the text after the detected person until we find a new person in the text. Is it possible? I tried to use spaCy to extract the NER, but I have difficulty getting the content:

import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
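The "content between persons" step can be sketched independently of the NER step. split_by_persons below is a hypothetical helper, not part of spaCy: it takes (start, end, name) character spans, which you could build from doc.ents by keeping entities with ent.label_ == 'PERSON' and reading ent.start_char / ent.end_char:

```python
def split_by_persons(text, person_spans):
    """Given (start, end, name) spans of PERSON entities sorted by start
    offset, return [{'person': name, 'content': text up to next person}]."""
    out = []
    for i, (start, end, name) in enumerate(person_spans):
        # Content runs from the end of this person's name to the start
        # of the next person's name (or to the end of the text).
        next_start = person_spans[i + 1][0] if i + 1 < len(person_spans) else len(text)
        out.append({'person': name, 'content': text[end:next_start].strip()})
    return out

text = "John Sequiera Graduated in Biology. Tom Gonzalez is a professor."
spans = [(0, 13, "John Sequiera"), (36, 48, "Tom Gonzalez")]
print(split_by_persons(text, spans))
```

Note this assumes the NER model actually finds every speaker name; mis-detected or missed entities will merge adjacent bios.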
How to resolve pandas length error for rows/columns
I have raised the SO question here and was blessed to have an answer from @Scott Boston. However, I am raising another question about an error, ValueError: Columns must be same length as key, as I am reading a text file and the rows/columns are not all of the same length. I tried googling but did not get an answer, as I don't want them to be skipped.

Error:

b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'

My pandas dataframe generator:

#!/usr/bin/python3
import pandas as pd

cvc_file = pd.read_csv('kids_cvc', header=None, error_bad_lines=False)
cvc_file[['cols', 0]] = cvc_file[0].str.split(':', expand=True)  # split first column on ':'
df = cvc_file.set_index('cols').transpose()                      # set_index and transpose
print(df)

Result:

$ ./read_cvc.py
b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'
cols ab ad an ed eg et en eck ell it id ig im ish ob og ock ut ub ug um un ud uck ush
0 cab bad ban bed beg bet den beck bell bit bid big dim fish cob bog dock but cub bug bum bun bud buck gush
1 dab dad can fed keg get hen deck cell fit did dig him dish gob cog lock cut hub dug gum fun cud duck hush
2 gab had fan led leg jet men neck dell hit hid fig rim wish job dog rock gut nub hug hum gun dud luck lush
3 jab lad man red peg let pen peck jell kit kid gig brim swish lob fog sock hut rub jug mum nun mud muck mush
4 lab mad pan wed NaN met ten check sell lit lid jig grim NaN mob hog tock jut sub lug sum pun spud puck rush
5 nab pad ran bled NaN net then fleck tell pit rid pig skim NaN rob jog block nut tub mug chum run stud suck blush

File contents:

$ cat kids_cvc
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
an: ban, can, fan, man, pan, ran, tan, van, clan, plan, scan, than
ag: bag, gag, hag, lag, nag, rag, sag, tag, wag, brag, drag, flag, snag, stag
ap: cap, gap, lap, map, nap, rap, sap, tap, yap, zap, chap, clap, flap, slap, snap, trap
am: bam, dam, ham, jam, ram, yam, clam, cram, scam, slam, spam, swam, tram, wham
ack: back, hack, jack, lack, pack, rack, sack, tack, black, crack, shack, snack, stack, quack, track
ash: bash, cash, dash, gash, hash, lash, mash, rash, sash, clash, crash, flash, slash, smash
ed: bed, fed, led, red, wed, bled, bred, fled, pled, sled, shed
eg: beg, keg, leg, peg
et: bet, get, jet, let, met, net, pet, set, vet, wet, yet, fret
en: den, hen, men, pen, ten, then, when
eck: beck, deck, neck, peck, check, fleck, speck, wreck
ell: bell, cell, dell, jell, sell, tell, well, yell, dwell, shell, smell, spell, swell
it: bit, fit, hit, kit, lit, pit, sit, wit, knit, quit, slit, spit
id: bid, did, hid, kid, lid, rid, skid, slid
ig: big, dig, fig, gig, jig, pig, rig, wig, zig, twig
im: dim, him, rim, brim, grim, skim, slim, swim, trim, whim
ip: dip, hip, lip, nip, rip, sip, tip, zip, chip, clip, drip, flip, grip, ship, skip, slip, snip, trip, whip
ick: kick, lick, nick, pick, sick, tick, wick, brick, chick, click, flick, quick, slick, stick, thick, trick
ish: fish, dish, wish, swish
in: bin, din, fin, pin, sin, tin, win, chin, grin, shin, skin, spin, thin, twin
ot: cot, dot, got, hot, jot, lot, not, pot, rot, tot, blot, knot, plot, shot, slot, spot
ob: cob, gob, job, lob, mob, rob, sob, blob, glob, knob, slob, snob
og: bog, cog, dog, fog, hog, jog, log, blog, clog, frog
op: cop, hop, mop, pop, top, chop, crop, drop, flop, glop, plop, shop, slop, stop
ock: dock, lock, rock, sock, tock, block, clock, flock, rock, shock, smock, stock
ut: but, cut, gut, hut, jut, nut, rut, shut
ub: cub, hub, nub, rub, sub, tub, grub, snub, stub
ug: bug, dug, hug, jug, lug, mug, pug, rug, tug, drug, plug, slug, snug
um: bum, gum, hum, mum, sum, chum, drum, glum, plum, scum, slum
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush

Note: It's treating the first row, which has 13 values, as the master one and skipping all the rows that have more than 13 fields.
I couldn't figure out a pandas way to extend the columns, but converting the rows to a dictionary made things easier.

ss = '''
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
.......
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush
'''.strip()

with open('kids.cvc', 'w') as f:
    f.write(ss)  # write data file

######################################

import pandas as pd

dd = {}
maxcnt = 0
with open('kids.cvc') as f:
    lines = f.readlines()
for line in lines:
    line = line.strip()                # remove \n
    len1 = len(line)                   # words have leading space
    line = line.replace(' ', '')
    cnt = len1 - len(line)             # get word (space) count
    if cnt > maxcnt:
        maxcnt = cnt                   # max word count
    rec = line.split(':')              # header : words
    dd[rec[0]] = rec[1].split(',')     # split words
for k in dd:
    dd[k] = dd[k] + [''] * (maxcnt - len(dd[k]))  # add extra values to match max column
df = pd.DataFrame(dd)  # convert dictionary to dataframe
print(df.to_string(index=False))

Output:

ab at ad an ag ap am ack ash ed eg et en eck ell it id ig im ip ick ish in ot ob og op ock ut ub ug um un ud uck ush cab bat bad ban bag cap bam back bash bed beg bet den beck bell bit bid big dim dip kick fish bin cot cob bog cop dock but cub bug bum bun bud buck gush dab cat dad can gag gap dam hack cash fed keg get hen deck cell fit did dig him hip lick dish din dot gob cog hop lock cut hub dug gum fun cud duck hush gab fat had fan hag lap ham jack dash led leg jet men neck dell hit hid fig rim lip nick wish fin got job dog mop rock gut nub hug hum gun dud luck lush jab hat lad man lag map jam lack gash red peg let pen peck jell kit kid gig brim nip pick swish pin hot lob fog pop sock hut rub jug mum nun mud
muck mush lab mat mad pan nag nap ram pack hash wed met ten check sell lit lid jig grim rip sick sin jot mob hog top tock jut sub lug sum pun spud puck rush nab pat pad ran rag rap yam rack lash bled net then fleck tell pit rid pig skim sip tick tin lot rob jog chop block nut tub mug chum run stud suck blush tab rat sad tan sag sap clam sack mash bred pet when speck well sit skid rig slim tip wick win not sob log crop clock rut grub pug drum sun thud tuck brush blab sat tad van tag tap cram tack rash fled set wreck yell wit slid wig swim zip brick chin pot blob blog drop flock shut snub rug glum spun yuck crush crab vat glad clan wag yap scam black sash pled vet dwell knit zig trim chip chick grin rot glob clog flop rock stub tug plum stun chuck flush grab brat plan brag zap slam crack clash sled wet shell quit twig whim clip click shin tot knob frog glop shock drug scum cluck slush scab chat scan drag chap spam shack crash shed yet smell slit drip flick skin blot slob plop smock plug slum pluck stab flat than flag clap swam snack flash fret spell spit flip quick spin knot snob shop stock slug stuck slab gnat snag flap tram stack slash swell grip slick thin plot slop snug truck spat stag slap wham quack smash ship stick twin shot stop snap track skip thick slot trap slip trick spot
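For what it's worth, pandas can also do the padding itself: DataFrame.from_dict with orient='index' pads unequal-length lists with NaN, so the manual maxcnt bookkeeping goes away. A sketch on a two-line sample of the data:

```python
import pandas as pd

lines = [
    'ab: cab, dab, gab',
    'eg: beg, keg, leg, peg',
]
dd = {}
for line in lines:
    key, words = line.split(':')
    dd[key] = [w.strip() for w in words.split(',')]

# orient='index' makes one row per key and pads short lists with NaN;
# transpose to get the keys back as columns, then blank out the NaNs.
df = pd.DataFrame.from_dict(dd, orient='index').transpose().fillna('')
print(df.to_string(index=False))
```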
How to reconstruct original text from spaCy tokens, even in cases with complicated whitespacing and punctuation
' '.join(token_list) does not reconstruct the original text in cases with multiple whitespaces and punctuation in a row. For example:

from spacy.tokenizer import Tokenizer
from spacy.lang.en import English

nlp = English()  # Create a blank Tokenizer with just the English vocab
tokenizerSpaCy = Tokenizer(nlp.vocab)
context_text = 'this is a test \n \n \t\t test for \n testing - ./l \t'
contextSpaCyToksSpaCyObj = tokenizerSpaCy(context_text)
spaCy_toks = [i.text for i in contextSpaCyToksSpaCyObj]
reconstruct = ' '.join(spaCy_toks)
reconstruct == context_text
> False

Is there an established way of reconstructing original text from spaCy tokens? An established answer should work with this edge-case text:

" UNCLASSIFIED U.S. Department of State Case No. F-2014-20439 Doc No. C05795279 Date: 01/07/2016\n\n\n RELEASE IN PART\n B5, B6\n\n\n\n\nFrom: H <hrod17#clintonemail.com>\nSent: Monday, July 23, 2012 7:26 AM\nTo: 'millscd #state.gov'\nCc: 'DanielJJ#state.gov.; 'hanleymr#state.gov'\nSubject Re: S speech this morning\n\n\n\n Waiting to hear if Monica can come by and pick up at 8 to take to Josh. If I don't hear from her, can you send B5\nsomeone else?\n\n Original Message ----\nFrom: Mills, Cheryl D [MillsCD#state.gov]\nSent: Monday, July 23, 2012 07:23 AM\nTo: H\nCc: Daniel, Joshua J <Daniel1.1#state.gov>\nSubject: FW: S speech this morning\n\nSee below\n\n B5\n\ncdm\n\n Original Message\nFrom: Shah, Rajiv (AID/A) B6\nSent: Monday, July 23, 2012 7:19 AM\nTo: Mills, Cheryl D\nCc: Daniel, Joshua.'\nSubject: S speech this morning\n\nHi cheryl,\n\nI look fwd to attending the speech this morning.\n\nI had one last minute request - I understand that in the final version there is no reference to the child survival call to\naction, but their is a reference to family planning efforts. 
Could you and josh try to make sure there is some specific\nreference to the call to action?\n\nAlso, in terms of acknowledgements it would be good to note torn friedan's leadership as everyone is sensitive to our ghi\ntransition and we want to continue to send the usaid-pepfar-cdc working together public message. I don't know if he is\nthere, but wanted to flag.\n\nLook forward to it.\n\nRaj\n\n\n\n\n UNCLASSIFIED U.S. Department of State Case No. F-2014-20439 Doc No. C05795279 Date: 01/07/2016\n\x0c"
You can accomplish this by changing two lines in your code:

spaCy_toks = [i.text + i.whitespace_ for i in contextSpaCyToksSpaCyObj]
reconstruct = ''.join(spaCy_toks)

Basically, each token in spaCy knows whether it is followed by whitespace or not, so you call token.whitespace_ instead of joining them on a space by default.
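The same trick can be mimicked without spaCy, which makes the idea easy to see in isolation: keep each token's trailing whitespace attached to it (analogous to spaCy's token.text_with_ws). A sketch with a hypothetical tokens_with_ws helper:

```python
import re

def tokens_with_ws(text):
    # \S+\s* grabs a token plus whatever whitespace follows it;
    # the bare \s+ alternative catches leading whitespace runs.
    return re.findall(r'\S+\s*|\s+', text)

context_text = 'this is a test \n \n \t\t test for \n testing - ./l \t'
toks = tokens_with_ws(context_text)
print(''.join(toks) == context_text)  # True: the round-trip is lossless
```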
In Python, headings not in the same row
I extracted three columns from a larger data frame (recent_grads) as follows...

df = recent_grads.groupby('Major_category')['Men', 'Women'].sum()

However, when I print df, it comes up as follows...

                                          Men     Women
Major_category
Agriculture & Natural Resources       40357.0   35263.0
Arts                                 134390.0  222740.0
Biology & Life Science               184919.0  268943.0
Business                             667852.0  634524.0
Communications & Journalism          131921.0  260680.0
Computers & Mathematics              208725.0   90283.0
Education                            103526.0  455603.0
Engineering                          408307.0  129276.0
Health                                75517.0  387713.0
Humanities & Liberal Arts            272846.0  440622.0
Industrial Arts & Consumer Services  103781.0  126011.0
Interdisciplinary                      2817.0    9479.0
Law & Public Policy                   91129.0   87978.0
Physical Sciences                     95390.0   90089.0
Psychology & Social Work              98115.0  382892.0
Social Science                       256834.0  273132.0

How do I get the Major_category heading in the same row as the Men and Women headings? I tried to put the three columns in a new data frame as follows...

df1 = df[['Major_category', 'Men', 'Women']].copy()

This gives me an error (Major_category not in index).
You should try reset_index (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html):

df = recent_grads.groupby('Major_category')[['Men', 'Women']].sum()
md = df.reset_index()  # 'Major_category' moves from the index back to a column
print(md)
Seems like you want to convert the groupby result back to a regular dataframe. After the groupby, Major_category lives in the index, which is why df['Major_category'] raises the "not in index" error; df.reset_index() turns it back into an ordinary column.
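A self-contained illustration of the reset_index approach, using made-up numbers in place of the real recent_grads data:

```python
import pandas as pd

recent_grads = pd.DataFrame({
    'Major_category': ['Arts', 'Arts', 'Business'],
    'Men': [10, 20, 30],
    'Women': [40, 50, 60],
})
df = recent_grads.groupby('Major_category')[['Men', 'Women']].sum()
md = df.reset_index()  # 'Major_category' becomes a regular column again
print(md)
```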
How to download pubmed articles and read them?
I'm having trouble saving PubMed articles and reading them. I've seen at this page here that there are some special file types, but none of them worked for me. I want to save them in a way that lets me keep using the keys to get the data. I don't know if that's possible if I save them as a text file. My code is this one:

import sys
import re
import os
from Bio import Entrez
from Bio import Medline
from Bio import SeqIO

'''Class Crawler is responsible for browsing the biological databases.

from DownloadArticles import DownloadArticles
c = DownloadArticles()
c.articles_dataset_list
'''

class DownloadArticles():
    def __init__(self):
        Entrez.email = 'myemail@gmail.com'
        self.dataC = self.saveArticlesFilesInXMLMode('pubmed', '26837606')

    '''Method 4: read data as text.'''
    def saveArticlesFilesInXMLMode(self, dbs, ids):
        net_handle = Entrez.efetch(db=dbs, id=ids, rettype="medline", retmode="txt")
        directory = "/dataset/Pubmed/DatasetArticles/" + ids + ".fasta"
        # if not os.path.exists(directory):
        #     os.makedirs(directory)
        # filename = directory + '/'
        # if not os.path.exists(filename):
        out_handle = open(directory, "w+")
        out_handle.write(net_handle.read())
        out_handle.close()
        net_handle.close()
        print("Saved")
        print("Parsing...")
        record = SeqIO.read(directory, "fasta")
        print(record)
        return record.read()

I'm getting this error:

ValueError: No records found in handle

Please, can someone help me? Now my code is like this: I am trying to write a function to save in .fasta like you did, and one to read the .fasta files like in the answer above.
import sys
import re
import os
from Bio import Entrez
from Bio import Medline
from Bio import SeqIO

def save_Articles_Files(dbName, idNum, rettypeName):
    net_handle = Entrez.efetch(db=dbName, id=idNum, rettype=rettypeName, retmode="txt")
    filename = path + idNum + ".fasta"
    out_handle = open(filename, "w")
    out_handle.write(net_handle.read())
    out_handle.close()
    net_handle.close()
    print("Saved")

Entrez.email = 'myemail@gmail.com'
dbName = 'pubmed'
idNum = '26837606'
rettypeName = "medline"
path = "/run/media/Dropbox/codigos/Codes/" + dbName
save_Articles_Files(dbName, idNum, rettypeName)

But my function is not working. I need some help, please!
You're mixing up two concepts.

1) Entrez.efetch() is used to access NCBI. In your case you are downloading an article from PubMed. The result that you get from net_handle.read() looks like:

PMID- 26837606
OWN - NLM
STAT- In-Process
DA  - 20160203
LR  - 20160210
IS  - 2045-2322 (Electronic)
IS  - 2045-2322 (Linking)
VI  - 6
DP  - 2016 Feb 03
TI  - Exploiting the CRISPR/Cas9 System for Targeted Genome Mutagenesis in Petunia.
PG  - 20315
LID - 10.1038/srep20315 [doi]
AB  - Recently, CRISPR/Cas9 technology has emerged as a powerful approach for targeted genome modification in eukaryotic organisms from yeast to human cell lines. Its successful application in several plant species promises enormous potential for basic and applied plant research. However, extensive studies are still needed to assess this system in other important plant species, to broaden its fields of application and to improve methods. Here we showed that the CRISPR/Cas9 system is efficient in petunia (Petunia hybrid), an important ornamental plant and a model for comparative research. When PDS was used as target gene, transgenic shoot lines with albino phenotype accounted for 55.6%-87.5% of the total regenerated T0 Basta-resistant lines. A homozygous deletion close to 1 kb in length can be readily generated and identified in the first generation. A sequential transformation strategy--introducing Cas9 and sgRNA expression cassettes sequentially into petunia--can be used to make targeted mutations with short indels or chromosomal fragment deletions. Our results present a new plant species amenable to CRIPR/Cas9 technology and provide an alternative procedure for its exploitation.
FAU - Zhang, Bin
AU  - Zhang B
AD  - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of Horticulture Science for Southern Mountainous Regions, Ministry of Education, College of Horticulture and Landscape Architecture, Southwest University, Chongqing 400716, China.
FAU - Yang, Xia
AU  - Yang X
AD  - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of Horticulture Science for Southern Mountainous Regions, Ministry of Education, College of Horticulture and Landscape Architecture, Southwest University, Chongqing 400716, China.
FAU - Yang, Chunping
AU  - Yang C
AD  - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of Horticulture Science for Southern Mountainous Regions, Ministry of Education, College of Horticulture and Landscape Architecture, Southwest University, Chongqing 400716, China.
FAU - Li, Mingyang
AU  - Li M
AD  - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of Horticulture Science for Southern Mountainous Regions, Ministry of Education, College of Horticulture and Landscape Architecture, Southwest University, Chongqing 400716, China.
FAU - Guo, Yulong
AU  - Guo Y
AD  - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of Horticulture Science for Southern Mountainous Regions, Ministry of Education, College of Horticulture and Landscape Architecture, Southwest University, Chongqing 400716, China.
LA  - eng
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
DEP - 20160203
PL  - England
TA  - Sci Rep
JT  - Scientific reports
JID - 101563288
SB  - IM
PMC - PMC4738242
OID - NLM: PMC4738242
EDAT- 2016/02/04 06:00
MHDA- 2016/02/04 06:00
CRDT- 2016/02/04 06:00
PHST- 2015/09/21 [received]
PHST- 2015/12/30 [accepted]
AID - srep20315 [pii]
AID - 10.1038/srep20315 [doi]
PST - epublish
SO  - Sci Rep. 2016 Feb 3;6:20315. doi: 10.1038/srep20315.

2) SeqIO.read() is used to read and parse FASTA files. This is a format that is used to store sequences. A sequence in FASTA format is represented as a series of lines: the first line starts with a ">" (greater-than) symbol and holds a unique description of the sequence; the following lines hold the actual sequence itself in standard one-letter code.
As you can see, the result that you get back from Entrez.efetch() (which I pasted above) doesn't look like a FASTA file. So SeqIO.read() gives the error that it can't find any sequence records in the file.
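Biopython's Bio.Medline.parse is the intended tool for parsing what efetch returns here. Just to make the line-oriented "TAG - value" structure of the format concrete, here is a minimal hand-rolled sketch (parse_medline_fields is a hypothetical name, and the sample record is heavily abbreviated):

```python
def parse_medline_fields(text):
    """Minimal parser for MEDLINE 'TAG - value' lines.
    Continuation lines start with whitespace and extend the previous value."""
    fields = {}
    tag = None
    for line in text.splitlines():
        if line[:4].strip() and '-' in line[:6]:
            # New field: tag sits in the first 4 columns, value after the dash.
            tag = line[:4].strip()
            fields.setdefault(tag, []).append(line.split('-', 1)[1].strip())
        elif tag and line.startswith(' '):
            # Continuation of the previous field's value.
            fields[tag][-1] += ' ' + line.strip()
    return fields

sample = (
    "PMID- 26837606\n"
    "TI  - Exploiting the CRISPR/Cas9 System for Targeted Genome\n"
    "      Mutagenesis in Petunia.\n"
)
rec = parse_medline_fields(sample)
print(rec['TI'][0])
```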