Scraping youtube playlist - python-3.x
I've been trying to write a Python script that fetches the names of the songs in a playlist whose link is provided from the terminal, e.g. https://www.youtube.com/watch?v=foE1mO2yM04&list=RDGMEMYH9CUrFO7CfLJpaD7UR85wVMfoE1mO2yM04.
I've found that the names can be extracted using the "li" tag or the "h4" tag.
I wrote the following code,
import sys
from bs4 import BeautifulSoup
import requests

link = sys.argv[1]
req = requests.get(link)
try:
    req.raise_for_status()
except Exception as exc:
    print('There was a problem:', exc)
soup = BeautifulSoup(req.text, "html.parser")
Then I tried using the li tag:

i = soup.findAll('li')
print(type(i))
for o in i:
    print(o.get('data-video-title'))
But it printed "None" that many times. I believe it is not able to reach the li tags which contain the data-video-title attribute.
Then I tried using the div and h4 tags:

for i in soup.findAll('div', attrs={'class': 'playlist-video-description'}):
    o = i.find('h4')
    print(o.text)
But again nothing happens.
import requests
from bs4 import BeautifulSoup

url = 'https://www.youtube.com/watch?v=foE1mO2yM04&list=RDGMEMYH9CUrFO7CfLJpaD7UR85wVMfoE1mO2yM04'
data = requests.get(url)
data = data.text
soup = BeautifulSoup(data, "html.parser")  # pass a parser explicitly to avoid a bs4 warning
h4 = soup.find_all("h4")
for h in h4:
    print(h.text)
output:
Mike Posner - I Took A Pill In Ibiza (Seeb Remix) (Explicit)
Alan Walker - Faded
Calvin Harris - This Is What You Came For (Official Video) ft. Rihanna
Coldplay - Hymn For The Weekend (Official video)
Jonas Blue - Fast Car ft. Dakota
Calvin Harris & Disciples - How Deep Is Your Love
Galantis - No Money (Official Video)
Kungs vs Cookin’ on 3 Burners - This Girl
Clean Bandit - Rockabye ft. Sean Paul & Anne-Marie [Official Video]
Major Lazer - Light It Up (feat. Nyla & Fuse ODG) [Remix] (Official Lyric Video)
Robin Schulz - Sugar (feat. Francesco Yates) (OFFICIAL MUSIC VIDEO)
DJ Snake - Middle ft. Bipolar Sunshine
Jonas Blue - Perfect Strangers ft. JP Cooper
David Guetta ft. Zara Larsson - This One's For You (Music Video) (UEFA EURO 2016™ Official Song)
DJ Snake - Let Me Love You ft. Justin Bieber
Duke Dumont - Ocean Drive
Galantis - Runaway (U & I) (Official Video)
Sigala - Sweet Lovin' (Official Video) ft. Bryn Christopher
Martin Garrix - Animals (Official Video)
David Guetta & Showtek - Bad ft.Vassy (Lyrics Video)
DVBBS & Borgeous - TSUNAMI (Original Mix)
AronChupa - I'm an Albatraoz | OFFICIAL VIDEO
Lilly Wood & The Prick and Robin Schulz - Prayer In C (Robin Schulz Remix) (Official)
Kygo - Firestone ft. Conrad Sewell
DEAF KEV - Invincible [NCS Release]
Eiffel 65 - Blue (KNY Factory Remix)
Ok guys, I have figured out what was happening. My code was perfect and it works fine; the problem was that I was passing the link as an argument from the terminal, and coincidentally the link contained symbols which the shell interprets specially, e.g. '&'.
Now I am passing the link as a quoted string in the terminal and everything works fine. Such a dumb yet time-consuming mistake.
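A minimal sketch of why the unquoted link broke (using the URL from the question): an unquoted & tells the shell to background the command, so sys.argv[1] only receives the part before it. Python's shlex.quote shows the shell-safe form of the argument:

```python
import shlex

# An unquoted '&' makes the shell run everything before it in the
# background, so sys.argv[1] would only get the part up to '&'.
url = "https://www.youtube.com/watch?v=foE1mO2yM04&list=RDGMEMYH9CUrFO7CfLJpaD7UR85wVMfoE1mO2yM04"

# shlex.quote wraps the URL in single quotes so the shell passes it
# through as one argument.
print(shlex.quote(url))
```

Quoting the URL by hand in the terminal has the same effect.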
Related
Trying to replace image extensions like "<filename>.<extension>" to "<filename>_resized.<extension>"
I'm trying to use this code in Python, using a regular expression, to get all the image files (of types jpg, png and bmp) in my current folder and add the word "resized" in between the filename and the extension.

Input
Batman - The Grey Ghost.png
Mom and Dad - Young.jpg

Expected Output
Batman - The Grey Ghost_resized.png
Mom and Dad - Young_resized.jpg

Query
But my output is not as expected. Somehow the 2nd letter of the extension is getting replaced. I have tried tutorials online, but didn't see one which answers my query. Any help would be appreciated.

Code:
import glob
import re

files = glob.glob('*.[jp][pn]g') + glob.glob('*.bmp')
for x in files:
    new_file = re.sub(r'([a-z|0-9]).([jpb|pnm|ggp])$', r'\1_resized.\2', x)
    print(new_file, ' : ', x)

Code Output
Ma image scan - Copy.j_resized.g : Ma image scan - Copy.jpg
Ma image scan.j_resized.g : Ma image scan.jpg
Mom and Dad - Young.j_resized.g : Mom and Dad - Young.jpg
PPF - SBI - 4.j_resized.g : PPF - SBI - 4.jpg
when-youre-a-noob-programmer-and-you-think-your-loop-64102565.p_resized.g : when-youre-a-noob-programmer-and-you-think-your-loop-64102565.png
Sample.b_resized.p : Sample.bmp
Try this: r'([a-zA-Z0-9_ -]+)\.(bmp|jpg|png)$'

In your pattern, [jpb|pnm|ggp] is a character class, which matches a single character rather than the three-letter alternatives you intended, and the unescaped . matches any character, which is why the middle letter of the extension was being consumed. Escaping the dot and spelling the extensions out as real alternatives fixes that.

Input: Batman - The Grey Ghost.png
Output: Batman - The Grey Ghost_resized.png
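A quick sketch of the suggested pattern applied to the sample names from the question (the filenames are hard-coded, so no real files are needed):

```python
import re

# Sample filenames from the question; no real files needed.
files = ["Batman - The Grey Ghost.png", "Mom and Dad - Young.jpg", "Sample.bmp"]
renamed = []
for x in files:
    # Escape the dot and use real alternation for the extensions.
    renamed.append(re.sub(r'([a-zA-Z0-9_ -]+)\.(bmp|jpg|png)$', r'\1_resized.\2', x))
print(renamed)
# ['Batman - The Grey Ghost_resized.png', 'Mom and Dad - Young_resized.jpg', 'Sample_resized.bmp']
```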
New to using Spotipy and Python 3, I want to use new_releases() to print the artist name and the artist's album
So far I was able to print out all albums by a person of my choosing using this:

spotify = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials(client_id, client_secret))
results = spotify.artist_albums(posty_uri, album_type='album')
albums = results['items']
while results['next']:
    results = spotify.next(results)
    albums.extend(results['items'])
for album in albums:
    print(album['name'])

I was trying to do a similar process for new_releases() by doing this:

newReleases = spotify.new_releases()
test = newReleases['items']

but this throws me an error on the line test = newReleases['items']. If anyone is familiar with Spotipy and knows how to return things like release date, artist name, album name from new_releases() I would greatly appreciate it.
I'm a little confused because the documentation says that the new_releases method returns a list. In any event, it is a one-item dictionary which contains a list. However, that list contains dictionaries which seem a bit unwieldy, so I understand why you're asking this question. You can make use of the collections.namedtuple data structure to make it easier to see the relevant information. I don't claim that this is the best way to transform this data, but it seems to me a decent way.

import collections as co

# namedtuple data structure that will be easier to understand and use
Album = co.namedtuple(typename='Album', field_names=['album_name', 'artist_name', 'release_date'])

newReleases2 = []  # couldn't think of a better name
for album in newReleases['albums']['items']:
    artist_sublist = []
    for artist in album['artists']:
        artist_sublist.append(artist['name'])
    newReleases2.append(Album(album_name=album['name'],
                              artist_name=artist_sublist,
                              release_date=album['release_date']))

This results in the following list of namedtuples:
[Album(album_name='Only Wanna Be With You (Pokémon 25 Version)', artist_name=['Post Malone'], release_date='2021-02-25'), Album(album_name='AP (Music from the film Boogie)', artist_name=['Pop Smoke'], release_date='2021-02-26'), Album(album_name='Like This', artist_name=['2KBABY', 'Marshmello'], release_date='2021-02-26'), Album(album_name='Go Big (From The Amazon Original Motion Picture Soundtrack Coming 2 America)', artist_name=['YG', 'Big Sean'], release_date='2021-02-26'), Album(album_name='Here Comes The Shock', artist_name=['Green Day'], release_date='2021-02-21'), Album(album_name='Spaceman', artist_name=['Nick Jonas'], release_date='2021-02-25'), Album(album_name='Life Support', artist_name=['Madison Beer'], release_date='2021-02-26'), Album(album_name="Drunk (And I Don't Wanna Go Home)", artist_name=['Elle King', 'Miranda Lambert'], release_date='2021-02-26'), Album(album_name='PROBLEMA', artist_name=['Daddy Yankee'],
release_date='2021-02-26'), Album(album_name='Leave A Little Love', artist_name=['Alesso', 'Armin van Buuren'], release_date='2021-02-26'), Album(album_name='Rotate', artist_name=['Becky G', 'Burna Boy'], release_date='2021-02-22'), Album(album_name='BED', artist_name=['Joel Corry', 'RAYE', 'David Guetta'], release_date='2021-02-26'), Album(album_name='A N N I V E R S A R Y (Deluxe)', artist_name=['Bryson Tiller'], release_date='2021-02-26'), Album(album_name='Little Oblivions', artist_name=['Julien Baker'], release_date='2021-02-26'), Album(album_name='Money Long (feat. 42 Dugg)', artist_name=['DDG', 'OG Parker'], release_date='2021-02-26'), Album(album_name='El Madrileño', artist_name=['C. Tangana'], release_date='2021-02-26'), Album(album_name='Skegee', artist_name=['JID'], release_date='2021-02-23'), Album(album_name='Coyote Cry', artist_name=['Ian Munsick'], release_date='2021-02-26'), Album(album_name='Rainforest', artist_name=['Noname'], release_date='2021-02-26'), Album(album_name='The American Negro', artist_name=['Adrian Younge'], release_date='2021-02-26')] If you wanted to see the artist(s) associated with the 11th album in this list, you could do this: In [62]: newReleases2[10].artist_name Out[62]: ['Becky G', 'Burna Boy'] Edit: in a comment on this answer, OP requested getting album cover as well. 
Please see the helper function, and slightly modified code, below:

import os
import requests

def download_album_cover(url):
    # helper function to download album cover
    # using code from: https://stackoverflow.com/a/13137873/42346
    download_path = os.getcwd() + os.sep + url.rsplit('/', 1)[-1]
    r = requests.get(url, stream=True)
    if r.status_code == 200:
        with open(download_path, 'wb') as f:
            for chunk in r.iter_content(1024):
                f.write(chunk)
    return download_path

# modified data structure
Album = co.namedtuple(typename='Album', field_names=['album_name', 'album_cover', 'artist_name', 'release_date'])

# modified retrieval code
newReleases2 = []
for album in newReleases['albums']['items']:
    album_cover = download_album_cover(album['images'][0]['url'])
    artist_sublist = []
    for artist in album['artists']:
        artist_sublist.append(artist['name'])
    newReleases2.append(Album(album_name=album['name'],
                              album_cover=album_cover,
                              artist_name=artist_sublist,
                              release_date=album['release_date']))

Result:
[Album(album_name='Scary Hours 2', album_cover='/home/adamcbernier/ab67616d0000b2738b20e4631fa15d3953528bbc', artist_name=['Drake'], release_date='2021-03-05'), Album(album_name='Boogie: Original Motion Picture Soundtrack', album_cover='/home/adamcbernier/ab67616d0000b27395e532805e8c97be7a551e3a', artist_name=['Various Artists'], release_date='2021-03-05'), Album(album_name='Hold On', album_cover='/home/adamcbernier/ab67616d0000b273f33d3618aca6b3cfdcd2fc43', artist_name=['Justin Bieber'], release_date='2021-03-05'), Album(album_name='Serotonin', album_cover='/home/adamcbernier/ab67616d0000b2737fb30ee0638c764d6f3247d2', artist_name=['girl in red'], release_date='2021-03-03'), Album(album_name='Leave The Door Open', album_cover='/home/adamcbernier/ab67616d0000b2736f9e6abbd6fa43ac3cdbeee0', artist_name=['Bruno Mars', 'Anderson .Paak', 'Silk Sonic'], release_date='2021-03-05'), Album(album_name='Real As It Gets (feat.
EST Gee)', album_cover='/home/adamcbernier/ab67616d0000b273f0f6f6144929a1ff72001f5e', artist_name=['Lil Baby', 'EST Gee'], release_date='2021-03-04'), Album(album_name='Life’s A Mess II (with Clever & Post Malone)', album_cover='/home/adamcbernier/ab67616d0000b2732e8d23414fd0b81c35bdedea', artist_name=['Juice WRLD'], release_date='2021-03-05'), Album(album_name='slower', album_cover='/home/adamcbernier/ab67616d0000b273b742c96d78d9091ce4a1c5c1', artist_name=['Tate McRae'], release_date='2021-03-03'), Album(album_name='Sacrifice', album_cover='/home/adamcbernier/ab67616d0000b27398bfcce8be630dd5f2f346e4', artist_name=['Bebe Rexha'], release_date='2021-03-05'), Album(album_name='Poster Girl', album_cover='/home/adamcbernier/ab67616d0000b273503b16348e47bc3c1c823eba', artist_name=['Zara Larsson'], release_date='2021-03-05'), Album(album_name='Beautiful Mistakes (feat. Megan Thee Stallion)', album_cover='/home/adamcbernier/ab67616d0000b273787f41be59050c46f69db580', artist_name=['Maroon 5', 'Megan Thee Stallion'], release_date='2021-03-03'), Album(album_name='Pay Your Way In Pain', album_cover='/home/adamcbernier/ab67616d0000b273a1e1b4608e1e04b40113e6e1', artist_name=['St. 
Vincent'], release_date='2021-03-04'), Album(album_name='My Head is a Moshpit', album_cover='/home/adamcbernier/ab67616d0000b2733db806083e3b649f1d969a4e', artist_name=['Verzache'], release_date='2021-03-05'), Album(album_name='When You See Yourself', album_cover='/home/adamcbernier/ab67616d0000b27377253620f08397c998d18d78', artist_name=['Kings of Leon'], release_date='2021-03-05'), Album(album_name='Mis Manos', album_cover='/home/adamcbernier/ab67616d0000b273d7210e8d6986196b28d084ef', artist_name=['Camilo'], release_date='2021-03-04'), Album(album_name='Retumban2', album_cover='/home/adamcbernier/ab67616d0000b2738a79a82236682469aecdbbdf', artist_name=['Ovi'], release_date='2021-03-05'), Album(album_name='Take My Hand', album_cover='/home/adamcbernier/ab67616d0000b273b7839c3ba191de59f5d3a3d7', artist_name=['LP Giobbi'], release_date='2021-03-05'), Album(album_name="Ma' G", album_cover='/home/adamcbernier/ab67616d0000b27351b5ebb959c37913ac61b033', artist_name=['J Balvin'], release_date='2021-02-28'), Album(album_name='Aspen', album_cover='/home/adamcbernier/ab67616d0000b27387d1d17d16cf131765ce4be8', artist_name=['Young Dolph', 'Key Glock'], release_date='2021-03-05'), Album(album_name='Only The Family - Lil Durk Presents: Loyal Bros', album_cover='/home/adamcbernier/ab67616d0000b273a3df38e11e978b34b47583d0', artist_name=['Only The Family'], release_date='2021-03-05')]
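For reference, the transformation above can be tried without Spotify credentials by mocking the shape of the new_releases() payload. The mock dictionary below is an assumption that includes only the keys the answer actually uses:

```python
import collections as co

Album = co.namedtuple('Album', ['album_name', 'artist_name', 'release_date'])

# Hypothetical stand-in for spotify.new_releases(); only the keys
# used by the answer are included.
newReleases = {'albums': {'items': [
    {'name': 'Like This',
     'artists': [{'name': '2KBABY'}, {'name': 'Marshmello'}],
     'release_date': '2021-02-26'},
]}}

albums = [Album(album_name=a['name'],
                artist_name=[artist['name'] for artist in a['artists']],
                release_date=a['release_date'])
          for a in newReleases['albums']['items']]
print(albums[0].artist_name)  # ['2KBABY', 'Marshmello']
```

The key point is that the items live under newReleases['albums']['items'], not newReleases['items'], which is what raised the error in the question.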
How to resolve pandas length error for rows/columns
I have raised the SO Question here and was blessed to have an answer from #Scott Boston. However, I am raising another question about an error, ValueError: Columns must be same length as key, as I am reading a text file and all the rows/columns are not of the same length. I tried googling but did not get an answer, as I don't want them to be skipped.

Error

b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'

My pandas dataframe generator

#!/usr/bin/python3
import pandas as pd
#
cvc_file = pd.read_csv('kids_cvc', header=None, error_bad_lines=False)
cvc_file[['cols', 0]] = cvc_file[0].str.split(':', expand=True)  # Split first column on ':'
df = cvc_file.set_index('cols').transpose()  # set_index and transpose
print(df)

Result

$ ./read_cvc.py
b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'
cols ab ad an ed eg et en eck ell it id ig im ish ob og ock ut ub ug um un ud uck ush
0 cab bad ban bed beg bet den beck bell bit bid big dim fish cob bog dock but cub bug bum bun bud buck gush
1 dab dad can fed keg get hen deck cell fit did dig him dish gob cog lock cut hub dug gum fun cud duck hush
2 gab had fan led leg jet men neck dell hit hid fig rim wish job dog rock gut nub hug hum gun dud luck lush
3 jab lad man red peg let pen peck jell kit kid gig brim swish lob fog sock hut rub jug mum nun mud muck mush
4 lab mad pan wed NaN met ten check sell lit lid jig grim NaN mob hog tock jut sub lug sum pun spud puck rush
5 nab pad ran bled NaN net then fleck tell pit rid pig skim NaN rob jog block nut tub mug chum run stud suck blush

File contents

$ cat kids_cvc
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
an: ban, can, fan, man, pan, ran, tan, van, clan, plan, scan, than
ag: bag, gag, hag, lag, nag, rag, sag, tag, wag, brag, drag, flag, snag, stag
ap: cap, gap, lap, map, nap, rap, sap, tap, yap, zap, chap, clap, flap, slap, snap, trap
am: bam, dam, ham, jam, ram, yam, clam, cram, scam, slam, spam, swam, tram, wham
ack: back, hack, jack, lack, pack, rack, sack, tack, black, crack, shack, snack, stack, quack, track
ash: bash, cash, dash, gash, hash, lash, mash, rash, sash, clash, crash, flash, slash, smash
ed: bed, fed, led, red, wed, bled, bred, fled, pled, sled, shed
eg: beg, keg, leg, peg
et: bet, get, jet, let, met, net, pet, set, vet, wet, yet, fret
en: den, hen, men, pen, ten, then, when
eck: beck, deck, neck, peck, check, fleck, speck, wreck
ell: bell, cell, dell, jell, sell, tell, well, yell, dwell, shell, smell, spell, swell
it: bit, fit, hit, kit, lit, pit, sit, wit, knit, quit, slit, spit
id: bid, did, hid, kid, lid, rid, skid, slid
ig: big, dig, fig, gig, jig, pig, rig, wig, zig, twig
im: dim, him, rim, brim, grim, skim, slim, swim, trim, whim
ip: dip, hip, lip, nip, rip, sip, tip, zip, chip, clip, drip, flip, grip, ship, skip, slip, snip, trip, whip
ick: kick, lick, nick, pick, sick, tick, wick, brick, chick, click, flick, quick, slick, stick, thick, trick
ish: fish, dish, wish, swish
in: bin, din, fin, pin, sin, tin, win, chin, grin, shin, skin, spin, thin, twin
ot: cot, dot, got, hot, jot, lot, not, pot, rot, tot, blot, knot, plot, shot, slot, spot
ob: cob, gob, job, lob, mob, rob, sob, blob, glob, knob, slob, snob
og: bog, cog, dog, fog, hog, jog, log, blog, clog, frog
op: cop, hop, mop, pop, top, chop, crop, drop, flop, glop, plop, shop, slop, stop
ock: dock, lock, rock, sock, tock, block, clock, flock, rock, shock, smock, stock
ut: but, cut, gut, hut, jut, nut, rut, shut
ub: cub, hub, nub, rub, sub, tub, grub, snub, stub
ug: bug, dug, hug, jug, lug, mug, pug, rug, tug, drug, plug, slug, snug
um: bum, gum, hum, mum, sum, chum, drum, glum, plum, scum, slum
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush

Note: It's making the first row the master one, which has 13 values, and skipping all the lines that have more than 13 columns.
I couldn't figure out a pandas way to extend the columns, but converting the rows to a dictionary made things easier.

ss = '''
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
.......
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush
'''.strip()

with open('kids.cvc', 'w') as f:
    f.write(ss)  # write data file

######################################

import pandas as pd

dd = {}
maxcnt = 0
with open('kids.cvc') as f:
    lines = f.readlines()
for line in lines:
    line = line.strip()               # remove \n
    len1 = len(line)                  # words have leading space
    line = line.replace(' ', '')
    cnt = len1 - len(line)            # get word (space) count
    if cnt > maxcnt:
        maxcnt = cnt                  # max word count
    rec = line.split(':')             # header : words
    dd[rec[0]] = rec[1].split(',')    # split words
for k in dd:
    dd[k] = dd[k] + [''] * (maxcnt - len(dd[k]))  # add extra values to match max column
df = pd.DataFrame(dd)  # convert dictionary to dataframe
print(df.to_string(index=False))

Output

ab at ad an ag ap am ack ash ed eg et en eck ell it id ig im ip ick ish in ot ob og op ock ut ub ug um un ud uck ush cab bat bad ban bag cap bam back bash bed beg bet den beck bell bit bid big dim dip kick fish bin cot cob bog cop dock but cub bug bum bun bud buck gush dab cat dad can gag gap dam hack cash fed keg get hen deck cell fit did dig him hip lick dish din dot gob cog hop lock cut hub dug gum fun cud duck hush gab fat had fan hag lap ham jack dash led leg jet men neck dell hit hid fig rim lip nick wish fin got job dog mop rock gut nub hug hum gun dud luck lush jab hat lad man lag map jam lack gash red peg let pen peck jell kit kid gig brim nip pick swish pin hot lob fog pop sock hut rub jug mum nun mud
muck mush lab mat mad pan nag nap ram pack hash wed met ten check sell lit lid jig grim rip sick sin jot mob hog top tock jut sub lug sum pun spud puck rush nab pat pad ran rag rap yam rack lash bled net then fleck tell pit rid pig skim sip tick tin lot rob jog chop block nut tub mug chum run stud suck blush tab rat sad tan sag sap clam sack mash bred pet when speck well sit skid rig slim tip wick win not sob log crop clock rut grub pug drum sun thud tuck brush blab sat tad van tag tap cram tack rash fled set wreck yell wit slid wig swim zip brick chin pot blob blog drop flock shut snub rug glum spun yuck crush crab vat glad clan wag yap scam black sash pled vet dwell knit zig trim chip chick grin rot glob clog flop rock stub tug plum stun chuck flush grab brat plan brag zap slam crack clash sled wet shell quit twig whim clip click shin tot knob frog glop shock drug scum cluck slush scab chat scan drag chap spam shack crash shed yet smell slit drip flick skin blot slob plop smock plug slum pluck stab flat than flag clap swam snack flash fret spell spit flip quick spin knot snob shop stock slug stuck slab gnat snag flap tram stack slash swell grip slick thin plot slop snug truck spat stag slap wham quack smash ship stick twin shot stop snap track skip thick slot trap slip trick spot
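For completeness, there is a pandas-only way to pad the unequal columns: build each column as a Series first and pandas will align the lengths itself, filling the short columns with NaN. A sketch with a shortened stand-in for the parsed dictionary (the kids.cvc parsing stays as above):

```python
import pandas as pd

# Shortened stand-in for the dd dictionary built from kids.cvc above.
dd = {'ab': ['cab', 'dab', 'gab'],
      'eg': ['beg', 'keg', 'leg', 'peg'],
      'ish': ['fish', 'dish']}

# A dict of Series may have unequal lengths; pandas pads the shorter
# ones with NaN, which fillna('') turns into empty cells.
df = pd.DataFrame({k: pd.Series(v) for k, v in dd.items()}).fillna('')
print(df.to_string(index=False))
```

This removes the need to compute maxcnt and pad the lists by hand.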
unable to get rid of all emojis
I need help removing emojis. I looked at some other Stack Overflow questions and this is what I am doing, but for some reason my code doesn't get rid of all the emojis:

d = {'alexveachfashion': 'Fashion Style * Haute Couture * Wearable Tech * VR\n👓👜⌚👠\nSoundCloud is Live #alexveach\n👇New YouTube Episodes ▶️👇',
'andrewvng': 'Family | Fitness | Friends | Gym | Food',
'runvi.official': 'Accurate measurement via SMART insoles & real-time AI coaching. Improve your technique & BOOST your performance with every run.\nSoon on Kickstarter!',
'triing': 'Augmented Jewellery™️ • Montreal. Canada.',
'gedeanekenshima': 'Prof na Etec Albert Einstein, Mestranda em Automação e Controle de Processos, Engenheira de Controle e Automação, Técnica em Automação Industrial.',
'jetyourdaddy': '',
'lavonne_sun': '☄️🔭 ✨\n°●°。Visual Narrative\nA creative heart with a poetic soul.\n————————————\nPARSONS —— Design & Technology',
'taysearch': 'All the World’s Information At Your Fingertips. (Literally) Est. 1991🇺🇸 🎀#PrincessofSearch 🔎Sample 👇🏽 the Search Engine Here 🗽',
'hijewellery': 'Fine 3D printed jewellery for tech lovers #3dprintedjewelry #wearabletech #jewellery',
'yhanchristian': 'Estudante de Engenharia, Maker e viciado em café.',
'femka': 'Fashion Futurist + Fashion Tech Lab Founder #technoirlab + Fashion Designer / Parsons & CSM Grad / Obsessed with #fashiontech #future #cryptocurrency',
'sinhbisen': 'Creator, TRiiNG, augmented jewellery label ⭕️ Transhumanist ⭕️ Corporeal cartographer ⭕️',
'stellawearables': '#StellaWearables ✉️Info#StellaWearables.com Premium Wearable Technology That Monitors Personal Health & Environments ☀️🏝🏜🏔',
'ivoomi_india': 'We are the manufacturers of the most innovative technologies and user-friendly gadgets with a global presence.',
'bgutenschwager': "When it comes to life, it's all about the experience.\nGoogle Mapper 🗺\n360 Photographer 📷\nBrand Rep #QuickTutor",
'storiesofdesign': 'Putting stories at the heart of brands and businesses | Cornwall and London |
#storiesofdesign',
'trume.jp': '草創期から国産ウオッチの製造に取り組み、挑戦を続けてきたエプソンが世界に放つ新ブランド「TRUME」(トゥルーム)。目指すのは、最先端技術でアナログウオッチを極めるブランド。',
'themarinesss': "I didn't choose the blog life, the blog life chose me | Aspiring Children's Book Author | www.slayathomemum.com",
'ayowearable': 'The world’s first light-based wearable that helps you sleep better, beat jet lag and have more energy! #goAYO Get yours at:',
'wearyourowntechs': 'Bringing you the latest trends, Current Products and Reviews of Wearable Technology. Discover how they can enhance your Life and Lifestyle',
'roxfordwatches': 'The Roxford | The most stylish and customizable fitness smartwatch. Tracks your steps/calories/dist/sleep. Comes with FOUR bands, and a travel case!',
'playertek': "Track your entire performance - every training session, every match. \nBecause the best players don't hide.",
'_kate_hartman_': '',
'hmsmc10': 'Health & Wellness 🍎\nBoston, MA 🏙\nSuffolk MPA ‘17 🎓 \n.\nJust Strong Ambassador 🏋🏻\u200d♀️',
'gadgetxtreme': 'Dedicated to reviewing gadgets, technologies, internet products and breaking tech news. Follow us to see daily vblogs on all the disruptive tech..',
'freedom.journey.leader': '📍MN\n🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿 \n📧Ashleybp5#gmail.com \n#homeschool #bossmom #builder #momlife',
'arts_food_life': 'Life through my phone.',
'medgizmo': 'Wearable #tech: #health #healthcare #wellness #gadgets #apps. Images/links provided as information resource only; doesn’t mean we endorse referenced',
'sawearables': 'The home of wearable tech in South Africa!\n--> #WearableTech #WearableTechnology #FitnessTech Find your wearable #',
'shop.mercury': 'Changing the way you charge.⚡️\nGet exclusive product discounts, and help us reach our goal below!🔋',
'invisawear': 'PRE-ORDERS NOW AVAILABLE!
Get yours 25% OFF here: #girlboss #wearabletech'}

for key in d:
    print("---with emojis----")
    print(d[key])
    print("---emojis removed----")
    x = ''.join(c for c in d[key] if c <= '\uFFFF')
    print(x)

output example

---with emojis----
📍MN
🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿
📧Ashleybp5#gmail.com
#homeschool #bossmom #builder #momlife
---emojis removed----
MN
Wife • Homeschooling Mom to 5 • D Y I lover • Small town living in MN.
Ashleybp5#gmail.com
#homeschool #bossmom #builder #momlife
---with emojis----
Changing the way you charge.⚡️
Get exclusive product discounts, and help us reach our goal below!🔋
---emojis removed----
Changing the way you charge.⚡️
Get exclusive product discounts, and help us reach our goal below!
There is no technical definition of what an "emoji" is. Various glyphs may be used to render printable characters, symbols, control characters and the like. What seems like an "emoji" to you may be part of normal script to others.

What you probably want to do is to look at the Unicode category of each character and filter out various categories. While this does not solve the "emoji"-definition-problem per se, you get much better control over what you are actually doing without removing, for example, literally all characters of languages spoken by 2/3 of the planet.

Instead of filtering out certain categories, you may filter everything except the lower- and uppercase letters (and numbers). However, be aware that ꙭ is not "the googly eyes emoji" but the CYRILLIC SMALL LETTER DOUBLE MONOCULAR O, which is a normal lowercase letter to millions of people.

For example:

import unicodedata

s = "🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿"

# Just filter category "symbol"
t = ''.join(c for c in s if unicodedata.category(c) not in ('So', ))
print(t)

...results in

Wife • Homeschooling Mom to 5 • D Y I lover • Small town living in MN.

This may not be emoji-free enough, yet the • is technically a form of punctuation. So filter this as well:

# Filter symbols and punctuations. You may want 'Cc' as well,
# to get rid of control characters. Beware that newlines are a
# form of control-character.
t = ''.join(c for c in s if unicodedata.category(c) not in ('So', 'Po'))
print(t)

And you get

Wife Homeschooling Mom to 5 D Y I lover Small town living in MN
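The whitelist variant mentioned above (keep only letters and numbers, plus whitespace so the text stays readable, which is an added assumption) can be sketched like this:

```python
import unicodedata

s = "🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿"

# Keep characters whose major Unicode category is Letter (L) or
# Number (N), plus whitespace; everything else is dropped.
t = ''.join(c for c in s if unicodedata.category(c)[0] in ('L', 'N') or c.isspace())
print(t)
```

Note this also drops legitimate punctuation such as the final period, so the blacklist approach above is usually the better trade-off.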
How to download pubmed articles and read them?
I'm having trouble saving PubMed articles and reading them. I've seen on this page here that there are some special file types, but none of them worked for me. I want to save them in a way that I can keep using the keys to get the data. I don't know if it's possible to use them if I save them as a text file. My code is this one:

import sys
from Bio import Entrez
import re
import os
from Bio import Medline
from Bio import SeqIO

'''Class Crawler is responsible for browsing the biological databases

from DownloadArticles import DownloadArticles
c = DownloadArticles()
c.articles_dataset_list
'''
class DownloadArticles():
    def __init__(self):
        Entrez.email = 'myemail#gmail.com'
        self.dataC = self.saveArticlesFilesInXMLMode('pubmed', '26837606')

    '''Method 4: read data as text.'''
    def saveArticlesFilesInXMLMode(self, dbs, ids):
        net_handle = Entrez.efetch(db=dbs, id=ids, rettype="medline", retmode="txt")
        directory = "/dataset/Pubmed/DatasetArticles/" + ids + ".fasta"
        # if not os.path.exists(directory):
        #     os.makedirs(directory)
        # filename = directory + '/'
        # if not os.path.exists(filename):
        out_handle = open(directory, "w+")
        out_handle.write(net_handle.read())
        out_handle.close()
        net_handle.close()
        print("Saved")
        print("Parsing...")
        record = SeqIO.read(directory, "fasta")
        print(record)
        return(record.read())

I'm getting this error:

ValueError: No records found in handle

Please, can someone help me?

Now my code is like this: I am trying to write a function to save in .fasta like you did, and one to read the .fasta files like in the answer above.
import sys
from Bio import Entrez
import re
import os
from Bio import Medline
from Bio import SeqIO

def save_Articles_Files(dbName, idNum, rettypeName):
    net_handle = Entrez.efetch(db=dbName, id=idNum, rettype=rettypeName, retmode="txt")
    filename = path + idNum + ".fasta"
    out_handle = open(filename, "w")
    out_handle.write(net_handle.read())
    out_handle.close()
    net_handle.close()
    print("Saved")

Entrez.email = 'myemail#gmail.com'
dbName = 'pubmed'
idNum = '26837606'
rettypeName = "medline"
path = "/run/media/Dropbox/codigos/Codes/" + dbName
save_Articles_Files(dbName, idNum, rettypeName)

But my function is not working. I need some help, please!
You're mixing up two concepts. 1) Entrez.efetch() is used to access NCBI. In your case you are downloading an article from Pubmed. The result that you get from net_handle.read() looks like: PMID- 26837606 OWN - NLM STAT- In-Process DA - 20160203 LR - 20160210 IS - 2045-2322 (Electronic) IS - 2045-2322 (Linking) VI - 6 DP - 2016 Feb 03 TI - Exploiting the CRISPR/Cas9 System for Targeted Genome Mutagenesis in Petunia. PG - 20315 LID - 10.1038/srep20315 [doi] AB - Recently, CRISPR/Cas9 technology has emerged as a powerful approach for targeted genome modification in eukaryotic organisms from yeast to human cell lines. Its successful application in several plant species promises enormous potential for basic and applied plant research. However, extensive studies are still needed to assess this system in other important plant species, to broaden its fields of application and to improve methods. Here we showed that the CRISPR/Cas9 system is efficient in petunia (Petunia hybrid), an important ornamental plant and a model for comparative research. When PDS was used as target gene, transgenic shoot lines with albino phenotype accounted for 55.6%-87.5% of the total regenerated T0 Basta-resistant lines. A homozygous deletion close to 1 kb in length can be readily generated and identified in the first generation. A sequential transformation strategy--introducing Cas9 and sgRNA expression cassettes sequentially into petunia--can be used to make targeted mutations with short indels or chromosomal fragment deletions. Our results present a new plant species amenable to CRIPR/Cas9 technology and provide an alternative procedure for its exploitation. FAU - Zhang, Bin AU - Zhang B AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of Horticulture Science for Southern Mountainous Regions, Ministry of Education, College of Horticulture and Landscape Architecture, Southwest University, Chongqing 400716, China. 
FAU - Yang, Xia AU - Yang X AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of Horticulture Science for Southern Mountainous Regions, Ministry of Education, College of Horticulture and Landscape Architecture, Southwest University, Chongqing 400716, China. FAU - Yang, Chunping AU - Yang C AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of Horticulture Science for Southern Mountainous Regions, Ministry of Education, College of Horticulture and Landscape Architecture, Southwest University, Chongqing 400716, China. FAU - Li, Mingyang AU - Li M AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of Horticulture Science for Southern Mountainous Regions, Ministry of Education, College of Horticulture and Landscape Architecture, Southwest University, Chongqing 400716, China. FAU - Guo, Yulong AU - Guo Y AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of Horticulture Science for Southern Mountainous Regions, Ministry of Education, College of Horticulture and Landscape Architecture, Southwest University, Chongqing 400716, China. LA - eng PT - Journal Article PT - Research Support, Non-U.S. Gov't DEP - 20160203 PL - England TA - Sci Rep JT - Scientific reports JID - 101563288 SB - IM PMC - PMC4738242 OID - NLM: PMC4738242 EDAT- 2016/02/04 06:00 MHDA- 2016/02/04 06:00 CRDT- 2016/02/04 06:00 PHST- 2015/09/21 [received] PHST- 2015/12/30 [accepted] AID - srep20315 [pii] AID - 10.1038/srep20315 [doi] PST - epublish SO - Sci Rep. 2016 Feb 3;6:20315. doi: 10.1038/srep20315. 2) SeqIO.read() is used to read and parse FASTA files. This is a format that is used to store sequences. A sequence in FASTA format is represented as a series of lines. The first line in a FASTA file starts with a ">" (greater-than) symbol. Following the initial line (used for a unique description of the sequence) is the actual sequence itself in standard one-letter code. 
As you can see, the result that you get back from Entrez.efetch() (which I pasted above) doesn't look like a FASTA file. So SeqIO.read() gives the error that it can't find any sequence records in the file.
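To keep using keys after saving, parse the saved text as MEDLINE rather than FASTA; Biopython provides Bio.Medline.read for this. The snippet below is a stdlib-only sketch of the same tag/value format, so the idea is visible without a network download (the sample lines are abbreviated from the record shown above):

```python
# Minimal sketch of parsing MEDLINE-formatted text; Bio.Medline.read
# in Biopython does this robustly and should be preferred.
def parse_medline(text):
    record = {}
    key = None
    for line in text.splitlines():
        if len(line) > 5 and line[4] == '-':            # "TAG - value" line
            key = line[:4].strip()
            record.setdefault(key, []).append(line[6:].strip())
        elif key is not None and line.startswith('      '):  # continuation line
            record[key][-1] += ' ' + line.strip()
    return record

sample = """PMID- 26837606
TI  - Exploiting the CRISPR/Cas9 System for Targeted Genome Mutagenesis in
      Petunia.
PG  - 20315"""

record = parse_medline(sample)
print(record["PMID"][0])  # 26837606
print(record["TI"][0])
```

With Biopython the equivalent would be opening the saved file and calling Medline.read(handle), after which fields are available by key, e.g. record["TI"] for the title.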