Scraping youtube playlist - python-3.x

I've been trying to write a python script which will fetch me the name of the songs contained in the playlist whose link will be provided. for eg.https://www.youtube.com/watch?v=foE1mO2yM04&list=RDGMEMYH9CUrFO7CfLJpaD7UR85wVMfoE1mO2yM04 from the terminal.
I've found out that names could be extracted by using "li" tag or "h4" tag.
I wrote the following code,
import sys
link = sys.argv[1]
from bs4 import BeautifulSoup
import requests
req = requests.get(link)
try:
req.raise_for_status()
except Exception as exc:
print('There was a problem:',exc)
soup = BeautifulSoup(req.text,"html.parser")
Then I tried using li-tag as:
i=soup.findAll('li')
print(type(i))
for o in i:
print(o.get('data-video-title'))
But it printed "None" those number of time. I belive it is not able to reach those li tags which contains data-video-title attribute.
Then I tried using div and h4 tags as,
for i in soup.findAll('div', attrs={'class':'playlist-video-description'}):
o = i.find('h4')
print(o.text)
But nothing happens again..

import requests
from bs4 import BeautifulSoup
url = 'https://www.youtube.com/watch?v=foE1mO2yM04&list=RDGMEMYH9CUrFO7CfLJpaD7UR85wVMfoE1mO2yM04'
data = requests.get(url)
data = data.text
soup = BeautifulSoup(data)
h4 = soup.find_all("h4")
for h in h4:
print(h.text)
output:
Mike Posner - I Took A Pill In Ibiza (Seeb Remix) (Explicit)
Alan Walker - Faded
Calvin Harris - This Is What You Came For (Official Video) ft. Rihanna
Coldplay - Hymn For The Weekend (Official video)
Jonas Blue - Fast Car ft. Dakota
Calvin Harris & Disciples - How Deep Is Your Love
Galantis - No Money (Official Video)
Kungs vs Cookin’ on 3 Burners - This Girl
Clean Bandit - Rockabye ft. Sean Paul & Anne-Marie [Official Video]
Major Lazer - Light It Up (feat. Nyla & Fuse ODG) [Remix] (Official Lyric Video)
Robin Schulz - Sugar (feat. Francesco Yates) (OFFICIAL MUSIC VIDEO)
DJ Snake - Middle ft. Bipolar Sunshine
Jonas Blue - Perfect Strangers ft. JP Cooper
David Guetta ft. Zara Larsson - This One's For You (Music Video) (UEFA EURO 2016™ Official Song)
DJ Snake - Let Me Love You ft. Justin Bieber
Duke Dumont - Ocean Drive
Galantis - Runaway (U & I) (Official Video)
Sigala - Sweet Lovin' (Official Video) ft. Bryn Christopher
Martin Garrix - Animals (Official Video)
David Guetta & Showtek - Bad ft.Vassy (Lyrics Video)
DVBBS & Borgeous - TSUNAMI (Original Mix)
AronChupa - I'm an Albatraoz | OFFICIAL VIDEO
Lilly Wood & The Prick and Robin Schulz - Prayer In C (Robin Schulz Remix) (Official)
Kygo - Firestone ft. Conrad Sewell
DEAF KEV - Invincible [NCS Release]
Eiffel 65 - Blue (KNY Factory Remix)

Ok guys, I have figured out what was happening. My code was perfect and it works fine, the problem was that I was passing the link as an argument from the terminal and co-incidentally, the link contained some symbols which were interpreted in some other fashion for eg. ('&').
Now I am passing the link as a string in the terminal and everything works fine. So dumb yet time-consuming mistake.

Related

Trying to replace image extensions like "<filename>.<extension>" to "<filename>_resized.<extension>"

I'm trying to use this code in Python using regular expression to get all the image files (of types jpg, png and bmp) in my current folder and add a word "resized" inbetween the filename and the extension
Input
Batman - The Grey Ghost.png
Mom and Dad - Young.jpg
Expected Output
Batman - The Grey Ghost_resized.png
Mom and Dad - Young_resized.jpg
Query
But my output is not as expected. Somehow the 2nd letter of the extension is getting replaced. I have tried tutorials online, but didn't see one which answers my query. Any help would be appreciated.
Code:
import glob
import re
files=glob.glob('*.[jp][pn]g')+glob.glob('*.bmp')
for x in files:
new_file = re.sub(r'([a-z|0-9]).([jpb|pnm|ggp])$',r'\1_resized.\2',x)
print(new_file,' : ',x)
Code Output
Ma image scan - Copy.j\_resized.g : Ma image scan - Copy.jpg
Ma image scan.j\_resized.g : Ma image scan.jpg
Mom and Dad - Young.j\_resized.g : Mom and Dad - Young.jpg
PPF - SBI - 4.j\_resized.g : PPF - SBI - 4.jpg
when-youre-a-noob-programmer-and-you-think-your-loop-64102565.p\_resized.g : when-youre-a-noob-programmer-and-you-think-your-loop-64102565.png
Sample.b\_resized.p : Sample.bmp
Try this:
r'([a-zA-Z0-9_ -]+)\.(bmp|jpg|png)$'
Input:
Batman - The Grey Ghost.png
Output:
Batman - The Grey Ghost_resized.png
See live demo.

New to using Spotipy and Python3, I want to use new_releases() to print Artist name and the artist album

So far I was able to print out all albums by a person of my choosing using this
spotify = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials(client_id, client_secret))
results = spotify.artist_albums(posty_uri, album_type='album')
albums = results['items']
while results['next']:
results = spotify.next(results)
albums.extend(results['items'])
for album in albums:
print(album['name'])
I was trying to do a similar process for new_releases() by doing this
newReleases = spotify.new_releases()
test = newReleases['items']
but this throws me an error on the line test = newReleases['items']. If anyone is familiar with Spotipy and knows how to return things like release date, artist name, album name from new_releases() I would greatly appreciate it.
I'm a little confused because the documentation says that the new_releases method returns a list. In any event, it is a one-item dictionary which contains a list.
However that list contains dictionaries which seem a bit unwieldy, so I understand why you're asking this question.
You can make use of the collections.namedtuple data structure to make it easier to see the relevant information. I don't claim that this is the best way to transform this data, but it seems to me a decent way.
import collecdtions as co
# namedtuple data structure that will be easier to understand and use
Album = co.namedtuple(typename='Album',field_names=['album_name',
'artist_name',
'release_date'])
newReleases2 = [] # couldn't think of a better name
for album in newReleases['albums']['items']:
artist_sublist = []
for artist in album['artists']:
artist_sublist.append(artist['name'])
newReleases2.append(Album(album_name=album['name'],
artist_name=artist_sublist,
release_date=album['release_date']))
This results in the following list of namedtuples:
[Album(album_name='Only Wanna Be With You (Pokémon 25 Version)', artist_name=['Post Malone'], release_date='2021-02-25'),
Album(album_name='AP (Music from the film Boogie)', artist_name=['Pop Smoke'], release_date='2021-02-26'),
Album(album_name='Like This', artist_name=['2KBABY', 'Marshmello'], release_date='2021-02-26'),
Album(album_name='Go Big (From The Amazon Original Motion Picture Soundtrack Coming 2 America)', artist_name=['YG', 'Big Sean'], release_date='2021-02-26'),
Album(album_name='Here Comes The Shock', artist_name=['Green Day'], release_date='2021-02-21'),
Album(album_name='Spaceman', artist_name=['Nick Jonas'], release_date='2021-02-25'),
Album(album_name='Life Support', artist_name=['Madison Beer'], release_date='2021-02-26'),
Album(album_name="Drunk (And I Don't Wanna Go Home)", artist_name=['Elle King', 'Miranda Lambert'], release_date='2021-02-26'),
Album(album_name='PROBLEMA', artist_name=['Daddy Yankee'], release_date='2021-02-26'),
Album(album_name='Leave A Little Love', artist_name=['Alesso', 'Armin van Buuren'], release_date='2021-02-26'),
Album(album_name='Rotate', artist_name=['Becky G', 'Burna Boy'], release_date='2021-02-22'),
Album(album_name='BED', artist_name=['Joel Corry', 'RAYE', 'David Guetta'], release_date='2021-02-26'),
Album(album_name='A N N I V E R S A R Y (Deluxe)', artist_name=['Bryson Tiller'], release_date='2021-02-26'),
Album(album_name='Little Oblivions', artist_name=['Julien Baker'], release_date='2021-02-26'),
Album(album_name='Money Long (feat. 42 Dugg)', artist_name=['DDG', 'OG Parker'], release_date='2021-02-26'),
Album(album_name='El Madrileño', artist_name=['C. Tangana'], release_date='2021-02-26'),
Album(album_name='Skegee', artist_name=['JID'], release_date='2021-02-23'),
Album(album_name='Coyote Cry', artist_name=['Ian Munsick'], release_date='2021-02-26'),
Album(album_name='Rainforest', artist_name=['Noname'], release_date='2021-02-26'),
Album(album_name='The American Negro', artist_name=['Adrian Younge'], release_date='2021-02-26')]
If you wanted to see the artist(s) associated with the 11th album in this list, you could do this:
In [62]: newReleases2[10].artist_name
Out[62]: ['Becky G', 'Burna Boy']
Edit: in a comment on this answer, OP requested getting album cover as well.
Please see helper function, and slightly modified code below:
import os
import requests
def download_album_cover(url):
# helper function to download album cover
# using code from: https://stackoverflow.com/a/13137873/42346
download_path = os.getcwd() + os.sep + url.rsplit('/', 1)[-1]
r = requests.get(url, stream=True)
if r.status_code == 200:
with open(download_path, 'wb') as f:
for chunk in r.iter_content(1024):
f.write(chunk)
return download_path
# modified data structure
Album = co.namedtuple(typename='Album',field_names=['album_name',
'album_cover',
'artist_name',
'release_date'])
# modified retrieval code
newReleases2 = []
for album in newReleases['albums']['items']:
album_cover = download_album_cover(album['images'][0]['url'])
artist_sublist = []
for artist in album['artists']:
artist_sublist.append(artist['name'])
newReleases2.append(Album(album_name=album['name'],
album_cover=album_cover,
artist_name=artist_sublist,
release_date=album['release_date']))
Result:
[Album(album_name='Scary Hours 2', album_cover='/home/adamcbernier/ab67616d0000b2738b20e4631fa15d3953528bbc', artist_name=['Drake'], release_date='2021-03-05'),
Album(album_name='Boogie: Original Motion Picture Soundtrack', album_cover='/home/adamcbernier/ab67616d0000b27395e532805e8c97be7a551e3a', artist_name=['Various Artists'], release_date='2021-03-05'),
Album(album_name='Hold On', album_cover='/home/adamcbernier/ab67616d0000b273f33d3618aca6b3cfdcd2fc43', artist_name=['Justin Bieber'], release_date='2021-03-05'),
Album(album_name='Serotonin', album_cover='/home/adamcbernier/ab67616d0000b2737fb30ee0638c764d6f3247d2', artist_name=['girl in red'], release_date='2021-03-03'),
Album(album_name='Leave The Door Open', album_cover='/home/adamcbernier/ab67616d0000b2736f9e6abbd6fa43ac3cdbeee0', artist_name=['Bruno Mars', 'Anderson .Paak', 'Silk Sonic'], release_date='2021-03-05'),
Album(album_name='Real As It Gets (feat. EST Gee)', album_cover='/home/adamcbernier/ab67616d0000b273f0f6f6144929a1ff72001f5e', artist_name=['Lil Baby', 'EST Gee'], release_date='2021-03-04'),
Album(album_name='Life’s A Mess II (with Clever & Post Malone)', album_cover='/home/adamcbernier/ab67616d0000b2732e8d23414fd0b81c35bdedea', artist_name=['Juice WRLD'], release_date='2021-03-05'),
Album(album_name='slower', album_cover='/home/adamcbernier/ab67616d0000b273b742c96d78d9091ce4a1c5c1', artist_name=['Tate McRae'], release_date='2021-03-03'),
Album(album_name='Sacrifice', album_cover='/home/adamcbernier/ab67616d0000b27398bfcce8be630dd5f2f346e4', artist_name=['Bebe Rexha'], release_date='2021-03-05'),
Album(album_name='Poster Girl', album_cover='/home/adamcbernier/ab67616d0000b273503b16348e47bc3c1c823eba', artist_name=['Zara Larsson'], release_date='2021-03-05'),
Album(album_name='Beautiful Mistakes (feat. Megan Thee Stallion)', album_cover='/home/adamcbernier/ab67616d0000b273787f41be59050c46f69db580', artist_name=['Maroon 5', 'Megan Thee Stallion'], release_date='2021-03-03'),
Album(album_name='Pay Your Way In Pain', album_cover='/home/adamcbernier/ab67616d0000b273a1e1b4608e1e04b40113e6e1', artist_name=['St. Vincent'], release_date='2021-03-04'),
Album(album_name='My Head is a Moshpit', album_cover='/home/adamcbernier/ab67616d0000b2733db806083e3b649f1d969a4e', artist_name=['Verzache'], release_date='2021-03-05'),
Album(album_name='When You See Yourself', album_cover='/home/adamcbernier/ab67616d0000b27377253620f08397c998d18d78', artist_name=['Kings of Leon'], release_date='2021-03-05'),
Album(album_name='Mis Manos', album_cover='/home/adamcbernier/ab67616d0000b273d7210e8d6986196b28d084ef', artist_name=['Camilo'], release_date='2021-03-04'),
Album(album_name='Retumban2', album_cover='/home/adamcbernier/ab67616d0000b2738a79a82236682469aecdbbdf', artist_name=['Ovi'], release_date='2021-03-05'),
Album(album_name='Take My Hand', album_cover='/home/adamcbernier/ab67616d0000b273b7839c3ba191de59f5d3a3d7', artist_name=['LP Giobbi'], release_date='2021-03-05'),
Album(album_name="Ma' G", album_cover='/home/adamcbernier/ab67616d0000b27351b5ebb959c37913ac61b033', artist_name=['J Balvin'], release_date='2021-02-28'),
Album(album_name='Aspen', album_cover='/home/adamcbernier/ab67616d0000b27387d1d17d16cf131765ce4be8', artist_name=['Young Dolph', 'Key Glock'], release_date='2021-03-05'),
Album(album_name='Only The Family - Lil Durk Presents: Loyal Bros', album_cover='/home/adamcbernier/ab67616d0000b273a3df38e11e978b34b47583d0', artist_name=['Only The Family'], release_date='2021-03-05')]

How to resolve pandas length error for rows/columns

I have raised the SO Question here and blessed to have an answer from #Scott Boston.
However i am raising another question about an error ValueError: Columns must be same length as key as i am reading a text file and all the rows/columns are not of same length, i tried googling but did not get an answer as i don't want them to be skipped.
Error
b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'
My pandas dataframe generator
#!/usr/bin/python3
import pandas as pd
#
cvc_file = pd.read_csv('kids_cvc',header=None,error_bad_lines=False)
cvc_file[['cols', 0]] = cvc_file[0].str.split(':', expand=True) #Split first column on ':'
df = cvc_file.set_index('cols').transpose() #set_index and transpose
print(df)
Result
$ ./read_cvc.py
b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'
cols ab ad an ed eg et en eck ell it id ig im ish ob og ock ut ub ug um un ud uck ush
0 cab bad ban bed beg bet den beck bell bit bid big dim fish cob bog dock but cub bug bum bun bud buck gush
1 dab dad can fed keg get hen deck cell fit did dig him dish gob cog lock cut hub dug gum fun cud duck hush
2 gab had fan led leg jet men neck dell hit hid fig rim wish job dog rock gut nub hug hum gun dud luck lush
3 jab lad man red peg let pen peck jell kit kid gig brim swish lob fog sock hut rub jug mum nun mud muck mush
4 lab mad pan wed NaN met ten check sell lit lid jig grim NaN mob hog tock jut sub lug sum pun spud puck rush
5 nab pad ran bled NaN net then fleck tell pit rid pig skim NaN rob jog block nut tub mug chum run stud suck blush
File contents
$ cat kids_cvc
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
an: ban, can, fan, man, pan, ran, tan, van, clan, plan, scan, than
ag: bag, gag, hag, lag, nag, rag, sag, tag, wag, brag, drag, flag, snag, stag
ap: cap, gap, lap, map, nap, rap, sap, tap, yap, zap, chap, clap, flap, slap, snap, trap
am: bam, dam, ham, jam, ram, yam, clam, cram, scam, slam, spam, swam, tram, wham
ack: back, hack, jack, lack, pack, rack, sack, tack, black, crack, shack, snack, stack, quack, track
ash: bash, cash, dash, gash, hash, lash, mash, rash, sash, clash, crash, flash, slash, smash
ed: bed, fed, led, red, wed, bled, bred, fled, pled, sled, shed
eg: beg, keg, leg, peg
et: bet, get, jet, let, met, net, pet, set, vet, wet, yet, fret
en: den, hen, men, pen, ten, then, when
eck: beck, deck, neck, peck, check, fleck, speck, wreck
ell: bell, cell, dell, jell, sell, tell, well, yell, dwell, shell, smell, spell, swell
it: bit, fit, hit, kit, lit, pit, sit, wit, knit, quit, slit, spit
id: bid, did, hid, kid, lid, rid, skid, slid
ig: big, dig, fig, gig, jig, pig, rig, wig, zig, twig
im: dim, him, rim, brim, grim, skim, slim, swim, trim, whim
ip: dip, hip, lip, nip, rip, sip, tip, zip, chip, clip, drip, flip, grip, ship, skip, slip, snip, trip, whip
ick: kick, lick, nick, pick, sick, tick, wick, brick, chick, click, flick, quick, slick, stick, thick, trick
ish: fish, dish, wish, swish
in: bin, din, fin, pin, sin, tin, win, chin, grin, shin, skin, spin, thin, twin
ot: cot, dot, got, hot, jot, lot, not, pot, rot, tot, blot, knot, plot, shot, slot, spot
ob: cob, gob, job, lob, mob, rob, sob, blob, glob, knob, slob, snob
og: bog, cog, dog, fog, hog, jog, log, blog, clog, frog
op: cop, hop, mop, pop, top, chop, crop, drop, flop, glop, plop, shop, slop, stop
ock: dock, lock, rock, sock, tock, block, clock, flock, rock, shock, smock, stock
ut: but, cut, gut, hut, jut, nut, rut, shut
ub: cub, hub, nub, rub, sub, tub, grub, snub, stub
ug: bug, dug, hug, jug, lug, mug, pug, rug, tug, drug, plug, slug, snug
um: bum, gum, hum, mum, sum, chum, drum, glum, plum, scum, slum
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush
Note:
It's making the first row/column as a master one which has 13 values and skipping all the columns which are more than 13 columns.
I couldn't figure out a pandas way to extend the columns, but converting the rows to a dictionary made things easier.
ss = '''
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
.......
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush
'''.strip()
with open ('kids.cvc','w') as f: f.write(ss) # write data file
######################################
import pandas as pd
dd = {}
maxcnt=0
with open('kids.cvc') as f:
lines = f.readlines()
for line in lines:
line = line.strip() # remove \n
len1 = len(line) # words have leading space
line = line.replace(' ','')
cnt = len1 - len(line) # get word (space) count
if cnt > maxcnt: maxcnt = cnt # max word count
rec = line.split(':') # header : words
dd[rec[0]] = rec[1].split(',') # split words
for k in dd:
dd[k] = dd[k] + ['']*(maxcnt-len(dd[k])) # add extra values to match max column
df = pd.DataFrame(dd) # convert dictionary to dataframe
print(df.to_string(index=False))
Output
ab at ad an ag ap am ack ash ed eg et en eck ell it id ig im ip ick ish in ot ob og op ock ut ub ug um un ud uck ush
cab bat bad ban bag cap bam back bash bed beg bet den beck bell bit bid big dim dip kick fish bin cot cob bog cop dock but cub bug bum bun bud buck gush
dab cat dad can gag gap dam hack cash fed keg get hen deck cell fit did dig him hip lick dish din dot gob cog hop lock cut hub dug gum fun cud duck hush
gab fat had fan hag lap ham jack dash led leg jet men neck dell hit hid fig rim lip nick wish fin got job dog mop rock gut nub hug hum gun dud luck lush
jab hat lad man lag map jam lack gash red peg let pen peck jell kit kid gig brim nip pick swish pin hot lob fog pop sock hut rub jug mum nun mud muck mush
lab mat mad pan nag nap ram pack hash wed met ten check sell lit lid jig grim rip sick sin jot mob hog top tock jut sub lug sum pun spud puck rush
nab pat pad ran rag rap yam rack lash bled net then fleck tell pit rid pig skim sip tick tin lot rob jog chop block nut tub mug chum run stud suck blush
tab rat sad tan sag sap clam sack mash bred pet when speck well sit skid rig slim tip wick win not sob log crop clock rut grub pug drum sun thud tuck brush
blab sat tad van tag tap cram tack rash fled set wreck yell wit slid wig swim zip brick chin pot blob blog drop flock shut snub rug glum spun yuck crush
crab vat glad clan wag yap scam black sash pled vet dwell knit zig trim chip chick grin rot glob clog flop rock stub tug plum stun chuck flush
grab brat plan brag zap slam crack clash sled wet shell quit twig whim clip click shin tot knob frog glop shock drug scum cluck slush
scab chat scan drag chap spam shack crash shed yet smell slit drip flick skin blot slob plop smock plug slum pluck
stab flat than flag clap swam snack flash fret spell spit flip quick spin knot snob shop stock slug stuck
slab gnat snag flap tram stack slash swell grip slick thin plot slop snug truck
spat stag slap wham quack smash ship stick twin shot stop
snap track skip thick slot
trap slip trick spot

unable to get rid of all emojis

I need help removing emojis. I looked at some other stackoverflow questions and this is what I am de but for some reason my code doesn't get rid of all the emojis
d= {'alexveachfashion': 'Fashion Style * Haute Couture * Wearable Tech * VR\n👓👜⌚👠\nSoundCloud is Live #alexveach\n👇New YouTube Episodes ▶️👇', 'andrewvng': 'Family | Fitness | Friends | Gym | Food', 'runvi.official': 'Accurate measurement via SMART insoles & real-time AI coaching. Improve your technique & BOOST your performance with every run.\nSoon on Kickstarter!', 'triing': 'Augmented Jewellery™️ • Montreal. Canada.', 'gedeanekenshima': 'Prof na Etec Albert Einstein, Mestranda em Automação e Controle de Processos, Engenheira de Controle e Automação, Técnica em Automação Industrial.', 'jetyourdaddy': '', 'lavonne_sun': '☄️🔭 ✨\n°●°。Visual Narrative\nA creative heart with a poetic soul.\n————————————\nPARSONS —— Design & Technology', 'taysearch': 'All the World’s Information At Your Fingertips. (Literally) Est. 1991🇺🇸 🎀#PrincessofSearch 🔎Sample 👇🏽 the Search Engine Here 🗽', 'hijewellery': 'Fine 3D printed jewellery for tech lovers #3dprintedjewelry #wearabletech #jewellery', 'yhanchristian': 'Estudante de Engenharia, Maker e viciado em café.', 'femka': 'Fashion Futurist + Fashion Tech Lab Founder #technoirlab + Fashion Designer / Parsons & CSM Grad / Obsessed with #fashiontech #future #cryptocurrency', 'sinhbisen': 'Creator, TRiiNG, augmented jewellery label ⭕️ Transhumanist ⭕️ Corporeal cartographer ⭕️', 'stellawearables': '#StellaWearables ✉️Info#StellaWearables.com Premium Wearable Technology That Monitors Personal Health & Environments ☀️🏝🏜🏔', 'ivoomi_india': 'We are the manufacturers of the most innovative technologies and user-friendly gadgets with a global presence.', 'bgutenschwager': "When it comes to life, it's all about the experience.\nGoogle Mapper 🗺\n360 Photographer 📷\nBrand Rep #QuickTutor", 'storiesofdesign': 'Putting stories at the heart of brands and businesses | Cornwall and London | #storiesofdesign', 'trume.jp': '草創期から国産ウオッチの製造に取り組み、挑戦を続けてきたエプソンが世界に放つ新ブランド「TRUME」(トゥルーム)。目指すのは、最先端技術でアナログウオッチを極めるブランド。', 'themarinesss': "I didn't choose the blog life, the blog life chose me | Aspiring Children's Book Author | www.slayathomemum.com", 'ayowearable': 'The world’s first light-based wearable that helps you sleep better, beat jet lag and have more energy! #goAYO Get yours at:', 'wearyourowntechs': 'Bringing you the latest trends, Current Products and Reviews of Wearable Technology. Discover how they can enhance your Life and Lifestyle', 'roxfordwatches': 'The Roxford | The most stylish and customizable fitness smartwatch. Tracks your steps/calories/dist/sleep. Comes with FOUR bands, and a travel case!', 'playertek': "Track your entire performance - every training session, every match. \nBecause the best players don't hide.", '_kate_hartman_': '', 'hmsmc10': 'Health & Wellness 🍎\nBoston, MA 🏙\nSuffolk MPA ‘17 🎓 \n.\nJust Strong Ambassador 🏋🏻\u200d♀️', 'gadgetxtreme': 'Dedicated to reviewing gadgets, technologies, internet products and breaking tech news. Follow us to see daily vblogs on all the disruptive tech..', 'freedom.journey.leader': '📍MN\n🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿 \n📧Ashleybp5#gmail.com \n#homeschool #bossmom #builder #momlife', 'arts_food_life': 'Life through my phone.', 'medgizmo': 'Wearable #tech: #health #healthcare #wellness #gadgets #apps. Images/links provided as information resource only; doesn’t mean we endorse referenced', 'sawearables': 'The home of wearable tech in South Africa!\n--> #WearableTech #WearableTechnology #FitnessTech Find your wearable #', 'shop.mercury': 'Changing the way you charge.⚡️\nGet exclusive product discounts, and help us reach our goal below!🔋', 'invisawear': 'PRE-ORDERS NOW AVAILABLE! Get yours 25% OFF here: #girlboss #wearabletech'}
for key in d:
print("---with emojis----")
print(d[key])
print("---emojis removed----")
x=''.join(c for c in d[key] if c <= '\uFFFF')
print(x)
output example
---with emojis----
📍MN
🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿
📧Ashleybp5#gmail.com
#homeschool #bossmom #builder #momlife
---emojis removed----
MN
Wife • Homeschooling Mom to 5 • D Y I lover • Small town living in MN.
Ashleybp5#gmail.com
#homeschool #bossmom #builder #momlife
---with emojis----
Changing the way you charge.⚡️
Get exclusive product discounts, and help us reach our goal below!🔋
---emojis removed----
Changing the way you charge.⚡️
Get exclusive product discounts, and help us reach our goal below!
There is no technical definition of what an "emoji" is. Various glyphs may be used to render printable characters, symbols, control characters and the like. What seems like an "emoji" to you may be part of normal script to others.
What you probably want to do is to look at the Unicode category of each character and filter out various categories. While this does not solve the "emoji"-definition-problem per se, you get much better control over what you are actually doing without removing, for example, literally all characters of languages spoken by 2/3 of the planet.
Instead of filtering out certain categories, you may filter everything except the lower- and uppercase letters (and numbers). However, be aware that ꙭ is not "the googly eyes emoji" but the CYRILLIC SMALL LETTER DOUBLE MONOCULAR O, which is a normal lowercase letter to millions of people.
For example:
import unicodedata
s = "🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿"
# Just filter category "symbol"
t = ''.join(c for c in s if unicodedata.category(c) not in ('So', ))
print(t)
...results in
Wife • Homeschooling Mom to 5 • D Y I lover • Small town living in MN.
This may not be emoji-free enough, yet the • is technically a form of punctuation. So filter this as well
# Filter symbols and punctuations. You may want 'Cc' as well,
# to get rid of control characters. Beware that newlines are a
# form of control-character.
t = ''.join(c for c in s if unicodedata.category(c) not in ('So', 'Po'))
print(t)
And you get
Wife Homeschooling Mom to 5 D Y I lover Small town living in MN

How to download pubmed articles and read them?

Im having trouble to save pubmed articles and read them. I've seen at this page here that there are some special files types but no one of them worked for me. I want to save them in a way that I can continuous using the keys to get the the data. I don't know if its possible use it if I save it as a text file. My code is this one:
import sys
from Bio import Entrez
import re
import os
from Bio import Medline
from Bio import SeqIO
'''Class Crawler is responsable to browse the biological databases
from DownloadArticles import DownloadArticles
c = DownloadArticles()
c.articles_dataset_list
'''
class DownloadArticles():
def __init__(self):
Entrez.email='myemail#gmail.com'
self.dataC = self.saveArticlesFilesInXMLMode('pubmed', '26837606')
'''Metodo 4 ler dado em forma de texto.'''
def saveArticlesFilesInXMLMode(self,dbs, ids):
net_handle = Entrez.efetch(db=dbs, id=ids, rettype="medline", retmode="txt")
directory = "/dataset/Pubmed/DatasetArticles/"+ ids + ".fasta"
# if not os.path.exists(directory):
# os.makedirs(directory)
# filename = directory + '/'
# if not os.path.exists(filename):
out_handle = open(directory, "w+")
out_handle.write(net_handle.read())
out_handle.close()
net_handle.close()
print("Saved")
print("Parsing...")
record = SeqIO.read(directory, "fasta")
print(record)
return(record.read())
I'm getting this error: ValueError: No records found in handle
Pease someone can help me?
Now my code is like this, I am trying to do a function to save in .fasta like you did. And one to read the .fasta files like in the answer above.
import sys
from Bio import Entrez
import re
import os
from Bio import Medline
from Bio import SeqIO
def save_Articles_Files(dbName, idNum, rettypeName):
net_handle = Entrez.efetch(db=dbName, id=idNum, rettype=rettypeName, retmode="txt")
filename = path + idNum + ".fasta"
out_handle = open(filename, "w")
out_handle.write(net_handle.read())
out_handle.close()
net_handle.close()
print("Saved")
enter code here
Entrez.email='myemail#gmail.com'
dbName = 'pubmed'
idNum = '26837606'
rettypeName = "medline"
path ="/run/media/Dropbox/codigos/Codes/"+dbName
save_Articles_Files(dbName, idNum, rettypeName)
But my function is not working I need some help please!
You're mixing up two concepts.
1) Entrez.efetch() is used to access NCBI. In your case you are downloading an article from Pubmed. The result that you get from net_handle.read() looks like:
PMID- 26837606
OWN - NLM
STAT- In-Process
DA - 20160203
LR - 20160210
IS - 2045-2322 (Electronic)
IS - 2045-2322 (Linking)
VI - 6
DP - 2016 Feb 03
TI - Exploiting the CRISPR/Cas9 System for Targeted Genome Mutagenesis in Petunia.
PG - 20315
LID - 10.1038/srep20315 [doi]
AB - Recently, CRISPR/Cas9 technology has emerged as a powerful approach for targeted
genome modification in eukaryotic organisms from yeast to human cell lines. Its
successful application in several plant species promises enormous potential for
basic and applied plant research. However, extensive studies are still needed to
assess this system in other important plant species, to broaden its fields of
application and to improve methods. Here we showed that the CRISPR/Cas9 system is
efficient in petunia (Petunia hybrid), an important ornamental plant and a model
for comparative research. When PDS was used as target gene, transgenic shoot
lines with albino phenotype accounted for 55.6%-87.5% of the total regenerated T0
Basta-resistant lines. A homozygous deletion close to 1 kb in length can be
readily generated and identified in the first generation. A sequential
transformation strategy--introducing Cas9 and sgRNA expression cassettes
sequentially into petunia--can be used to make targeted mutations with short
indels or chromosomal fragment deletions. Our results present a new plant species
amenable to CRIPR/Cas9 technology and provide an alternative procedure for its
exploitation.
FAU - Zhang, Bin
AU - Zhang B
AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of
Horticulture Science for Southern Mountainous Regions, Ministry of Education,
College of Horticulture and Landscape Architecture, Southwest University,
Chongqing 400716, China.
FAU - Yang, Xia
AU - Yang X
AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of
Horticulture Science for Southern Mountainous Regions, Ministry of Education,
College of Horticulture and Landscape Architecture, Southwest University,
Chongqing 400716, China.
FAU - Yang, Chunping
AU - Yang C
AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of
Horticulture Science for Southern Mountainous Regions, Ministry of Education,
College of Horticulture and Landscape Architecture, Southwest University,
Chongqing 400716, China.
FAU - Li, Mingyang
AU - Li M
AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of
Horticulture Science for Southern Mountainous Regions, Ministry of Education,
College of Horticulture and Landscape Architecture, Southwest University,
Chongqing 400716, China.
FAU - Guo, Yulong
AU - Guo Y
AD - Chongqing Engineering Research Centre for Floriculture, Key Laboratory of
Horticulture Science for Southern Mountainous Regions, Ministry of Education,
College of Horticulture and Landscape Architecture, Southwest University,
Chongqing 400716, China.
LA - eng
PT - Journal Article
PT - Research Support, Non-U.S. Gov't
DEP - 20160203
PL - England
TA - Sci Rep
JT - Scientific reports
JID - 101563288
SB - IM
PMC - PMC4738242
OID - NLM: PMC4738242
EDAT- 2016/02/04 06:00
MHDA- 2016/02/04 06:00
CRDT- 2016/02/04 06:00
PHST- 2015/09/21 [received]
PHST- 2015/12/30 [accepted]
AID - srep20315 [pii]
AID - 10.1038/srep20315 [doi]
PST - epublish
SO - Sci Rep. 2016 Feb 3;6:20315. doi: 10.1038/srep20315.
2) SeqIO.read() is used to read and parse FASTA files. This is a format that is used to store sequences. A sequence in FASTA format is represented as a series of lines. The first line in a FASTA file starts with a ">" (greater-than) symbol. Following the initial line (used for a unique description of the sequence) is the actual sequence itself in standard one-letter code.
As you can see, the result that you get back from Entrez.efetch() (which I pasted above) doesn't look like a FASTA file. So SeqIO.read() gives the error that it can't find any sequence records in the file.

Resources