So far I have been able to print all albums by an artist of my choosing using this:
spotify = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials(client_id, client_secret))
results = spotify.artist_albums(posty_uri, album_type='album')
albums = results['items']
while results['next']:
    results = spotify.next(results)
    albums.extend(results['items'])
for album in albums:
    print(album['name'])
I tried to do something similar for new_releases() like this:
newReleases = spotify.new_releases()
test = newReleases['items']
but this raises an error on the line test = newReleases['items']. If anyone is familiar with Spotipy and knows how to get things like release date, artist name and album name from new_releases(), I would greatly appreciate it.
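I'm a little confused because the documentation says that the new_releases method returns a list. In practice it returns a dictionary with a single key, 'albums', and the list you want sits under that key's 'items' entry, which is why newReleases['items'] fails.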
However that list contains dictionaries which seem a bit unwieldy, so I understand why you're asking this question.
You can make use of the collections.namedtuple data structure to make it easier to see the relevant information. I don't claim that this is the best way to transform this data, but it seems to me a decent way.
import collections as co

# namedtuple data structure that will be easier to understand and use
Album = co.namedtuple(typename='Album', field_names=['album_name',
                                                     'artist_name',
                                                     'release_date'])
newReleases2 = []  # couldn't think of a better name
for album in newReleases['albums']['items']:
    # an album can have several artists, so collect all of their names
    artist_sublist = []
    for artist in album['artists']:
        artist_sublist.append(artist['name'])
    newReleases2.append(Album(album_name=album['name'],
                              artist_name=artist_sublist,
                              release_date=album['release_date']))
This results in the following list of namedtuples:
[Album(album_name='Only Wanna Be With You (Pokémon 25 Version)', artist_name=['Post Malone'], release_date='2021-02-25'),
Album(album_name='AP (Music from the film Boogie)', artist_name=['Pop Smoke'], release_date='2021-02-26'),
Album(album_name='Like This', artist_name=['2KBABY', 'Marshmello'], release_date='2021-02-26'),
Album(album_name='Go Big (From The Amazon Original Motion Picture Soundtrack Coming 2 America)', artist_name=['YG', 'Big Sean'], release_date='2021-02-26'),
Album(album_name='Here Comes The Shock', artist_name=['Green Day'], release_date='2021-02-21'),
Album(album_name='Spaceman', artist_name=['Nick Jonas'], release_date='2021-02-25'),
Album(album_name='Life Support', artist_name=['Madison Beer'], release_date='2021-02-26'),
Album(album_name="Drunk (And I Don't Wanna Go Home)", artist_name=['Elle King', 'Miranda Lambert'], release_date='2021-02-26'),
Album(album_name='PROBLEMA', artist_name=['Daddy Yankee'], release_date='2021-02-26'),
Album(album_name='Leave A Little Love', artist_name=['Alesso', 'Armin van Buuren'], release_date='2021-02-26'),
Album(album_name='Rotate', artist_name=['Becky G', 'Burna Boy'], release_date='2021-02-22'),
Album(album_name='BED', artist_name=['Joel Corry', 'RAYE', 'David Guetta'], release_date='2021-02-26'),
Album(album_name='A N N I V E R S A R Y (Deluxe)', artist_name=['Bryson Tiller'], release_date='2021-02-26'),
Album(album_name='Little Oblivions', artist_name=['Julien Baker'], release_date='2021-02-26'),
Album(album_name='Money Long (feat. 42 Dugg)', artist_name=['DDG', 'OG Parker'], release_date='2021-02-26'),
Album(album_name='El Madrileño', artist_name=['C. Tangana'], release_date='2021-02-26'),
Album(album_name='Skegee', artist_name=['JID'], release_date='2021-02-23'),
Album(album_name='Coyote Cry', artist_name=['Ian Munsick'], release_date='2021-02-26'),
Album(album_name='Rainforest', artist_name=['Noname'], release_date='2021-02-26'),
Album(album_name='The American Negro', artist_name=['Adrian Younge'], release_date='2021-02-26')]
If you wanted to see the artist(s) associated with the 11th album in this list, you could do this:
In [62]: newReleases2[10].artist_name
Out[62]: ['Becky G', 'Burna Boy']
Edit: in a comment on this answer, OP requested getting album cover as well.
Please see helper function, and slightly modified code below:
import os
import requests
def download_album_cover(url):
    # helper function to download album cover
    # using code from: https://stackoverflow.com/a/13137873/42346
    download_path = os.getcwd() + os.sep + url.rsplit('/', 1)[-1]
    r = requests.get(url, stream=True)
    if r.status_code == 200:
        with open(download_path, 'wb') as f:
            for chunk in r.iter_content(1024):
                f.write(chunk)
    return download_path
# modified data structure
Album = co.namedtuple(typename='Album', field_names=['album_name',
                                                     'album_cover',
                                                     'artist_name',
                                                     'release_date'])
# modified retrieval code
newReleases2 = []
for album in newReleases['albums']['items']:
    album_cover = download_album_cover(album['images'][0]['url'])
    artist_sublist = []
    for artist in album['artists']:
        artist_sublist.append(artist['name'])
    newReleases2.append(Album(album_name=album['name'],
                              album_cover=album_cover,
                              artist_name=artist_sublist,
                              release_date=album['release_date']))
Result:
[Album(album_name='Scary Hours 2', album_cover='/home/adamcbernier/ab67616d0000b2738b20e4631fa15d3953528bbc', artist_name=['Drake'], release_date='2021-03-05'),
Album(album_name='Boogie: Original Motion Picture Soundtrack', album_cover='/home/adamcbernier/ab67616d0000b27395e532805e8c97be7a551e3a', artist_name=['Various Artists'], release_date='2021-03-05'),
Album(album_name='Hold On', album_cover='/home/adamcbernier/ab67616d0000b273f33d3618aca6b3cfdcd2fc43', artist_name=['Justin Bieber'], release_date='2021-03-05'),
Album(album_name='Serotonin', album_cover='/home/adamcbernier/ab67616d0000b2737fb30ee0638c764d6f3247d2', artist_name=['girl in red'], release_date='2021-03-03'),
Album(album_name='Leave The Door Open', album_cover='/home/adamcbernier/ab67616d0000b2736f9e6abbd6fa43ac3cdbeee0', artist_name=['Bruno Mars', 'Anderson .Paak', 'Silk Sonic'], release_date='2021-03-05'),
Album(album_name='Real As It Gets (feat. EST Gee)', album_cover='/home/adamcbernier/ab67616d0000b273f0f6f6144929a1ff72001f5e', artist_name=['Lil Baby', 'EST Gee'], release_date='2021-03-04'),
Album(album_name='Life’s A Mess II (with Clever & Post Malone)', album_cover='/home/adamcbernier/ab67616d0000b2732e8d23414fd0b81c35bdedea', artist_name=['Juice WRLD'], release_date='2021-03-05'),
Album(album_name='slower', album_cover='/home/adamcbernier/ab67616d0000b273b742c96d78d9091ce4a1c5c1', artist_name=['Tate McRae'], release_date='2021-03-03'),
Album(album_name='Sacrifice', album_cover='/home/adamcbernier/ab67616d0000b27398bfcce8be630dd5f2f346e4', artist_name=['Bebe Rexha'], release_date='2021-03-05'),
Album(album_name='Poster Girl', album_cover='/home/adamcbernier/ab67616d0000b273503b16348e47bc3c1c823eba', artist_name=['Zara Larsson'], release_date='2021-03-05'),
Album(album_name='Beautiful Mistakes (feat. Megan Thee Stallion)', album_cover='/home/adamcbernier/ab67616d0000b273787f41be59050c46f69db580', artist_name=['Maroon 5', 'Megan Thee Stallion'], release_date='2021-03-03'),
Album(album_name='Pay Your Way In Pain', album_cover='/home/adamcbernier/ab67616d0000b273a1e1b4608e1e04b40113e6e1', artist_name=['St. Vincent'], release_date='2021-03-04'),
Album(album_name='My Head is a Moshpit', album_cover='/home/adamcbernier/ab67616d0000b2733db806083e3b649f1d969a4e', artist_name=['Verzache'], release_date='2021-03-05'),
Album(album_name='When You See Yourself', album_cover='/home/adamcbernier/ab67616d0000b27377253620f08397c998d18d78', artist_name=['Kings of Leon'], release_date='2021-03-05'),
Album(album_name='Mis Manos', album_cover='/home/adamcbernier/ab67616d0000b273d7210e8d6986196b28d084ef', artist_name=['Camilo'], release_date='2021-03-04'),
Album(album_name='Retumban2', album_cover='/home/adamcbernier/ab67616d0000b2738a79a82236682469aecdbbdf', artist_name=['Ovi'], release_date='2021-03-05'),
Album(album_name='Take My Hand', album_cover='/home/adamcbernier/ab67616d0000b273b7839c3ba191de59f5d3a3d7', artist_name=['LP Giobbi'], release_date='2021-03-05'),
Album(album_name="Ma' G", album_cover='/home/adamcbernier/ab67616d0000b27351b5ebb959c37913ac61b033', artist_name=['J Balvin'], release_date='2021-02-28'),
Album(album_name='Aspen', album_cover='/home/adamcbernier/ab67616d0000b27387d1d17d16cf131765ce4be8', artist_name=['Young Dolph', 'Key Glock'], release_date='2021-03-05'),
Album(album_name='Only The Family - Lil Durk Presents: Loyal Bros', album_cover='/home/adamcbernier/ab67616d0000b273a3df38e11e978b34b47583d0', artist_name=['Only The Family'], release_date='2021-03-05')]
I am trying to parse raw data results from a text file into an organised tuple but having trouble getting it right.
My raw data from the textfile looks something like this:
Episode Cumulative Results
EpisodeXD0281119
Date collected21/10/2019
Time collected10:00
Real time PCR for M. tuberculosis (Xpert MTB/Rif Ultra):
PCR result Mycobacterium tuberculosis complex NOT detected
Bacterial Culture:
Bottle: Type FAN Aerobic Plus
Result No growth after 5 days
EpisodeST32423457
Date collected23/02/2019
Time collected09:00
Gram Stain:
Neutrophils Occasional
Gram positive bacilli Moderate (2+)
Gram negative bacilli Numerous (3+)
Gram negative cocci Moderate (2+)
EpisodeST23423457
Date collected23/02/2019
Time collected09:00
Bacterial Culture:
A heavy growth of
1) Klebsiella pneumoniae subsp pneumoniae (KLEPP)
ensure that this organism does not spread in the ward/unit.
A heavy growth of
2) Enterococcus species (ENCSP)
Antibiotic/Culture KLEPP ENCSP
Trimethoprim-sulfam R
Ampicillin / Amoxic R S
Amoxicillin-clavula R
Ciprofloxacin R
Cefuroxime (Parente R
Cefuroxime (Oral) R
Cefotaxime / Ceftri R
Ceftazidime R
Cefepime R
Gentamicin S
Piperacillin/tazoba R
Ertapenem R
Imipenem S
Meropenem R
S - Sensitive ; I - Intermediate ; R - Resistant ; SDD - Sensitive Dose Dependant
Comment for organism KLEPP:
** Please note: this is a carbapenem-RESISTANT organism. Although some
carbapenems may appear susceptible in vitro, these agents should NOT be used as
MONOTHERAPY in the treatment of this patient. **
Please isolate this patient and practice strict contact precautions. Please
inform Infection Prevention and Control as contact screening might be
indicated.
For further advice on the treatment of this isolate, please contact.
The currently available laboratory methods for performing colistin
susceptibility results are unreliable and may not predict clinical outcome.
Based on published data and clinical experience, colistin is a suitable
therapeutic alternative for carbapenem resistant Acinetobacter spp, as well as
carbapenem resistant Enterobacteriaceae. If colistin is clinically indicated,
please carefully assess clinical response.
EpisodeST234234057
Date collected23/02/2019
Time collected09:00
Authorised by xxxx on 27/02/2019 at 10:35
MIC by E-test:
Organism Klebsiella pneumoniae (KLEPN)
Antibiotic Meropenem
MIC corrected 4 ug/mL
MIC interpretation Resistant
Antibiotic Imipenem
MIC corrected 1 ug/mL
MIC interpretation Sensitive
Antibiotic Ertapenem
MIC corrected 2 ug/mL
MIC interpretation Resistant
EpisodeST23423493
Date collected18/02/2019
Time collected03:15
Potassium 4.4 mmol/L 3.5 - 5.1
EpisodeST45445293
Date collected18/02/2019
Time collected03:15
Creatinine 32 L umol/L 49 - 90
eGFR (MDRD formula) >60 mL/min/1.73 m2
Creatinine 28 L umol/L 49 - 90
eGFR (MDRD formula) >60 mL/min/1.73 m2
Essentially the pattern is that ALL information starts with a unique EPISODE NUMBER and follows with a DATE and TIME and then the result of whatever test. This is the pattern throughout.
What I am trying to parse into my tuple is the date, time, name of the test and the result - whatever it might be. I have the following code:
from collections import namedtuple

with open(filename) as f:
    data = f.read()
    data = data.splitlines()
DS = namedtuple('DS', 'date time name value')
parsed = list()
# every record contains exactly one 'Date collected' line, so use those as chunk boundaries
idx_date = [i for i, r in enumerate(data) if r.strip().startswith('Date')]
for start, stop in zip(idx_date[:-1], idx_date[1:]):
    chunk = data[start:stop]
    date = time = name = value = None
    for row in chunk:
        if not row: continue
        row = row.strip()
        if row.startswith('Episode'): continue
        if row.startswith('Date'):
            _, date = row.split()
            date = date.replace('collected', '')
        elif row.startswith('Time'):
            _, time = row.split()
            time = time.replace('collected', '')
        else:
            name, value, *_ = row.split()
            print (name)
            parsed.append(DS(date, time, name, value))
print(parsed)
My problem is that I cannot find a way to parse the heterogeneous test RESULT into a form I can use later. For example, for the tuple DS ('DS', 'date time name value') I would like:
DATE = 21/10/2019
TIME = 10:00
NAME = Real time PCR for M tuberculosis or Potassium
RESULT = Negative or 4.7
Any advice appreciated. I have hit a brick wall.
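One possible direction, offered as a rough sketch rather than a finished parser: since every record starts with an episode line followed by a date and a time, you could first group the file by episode and keep the result lines untouched, then interpret each kind of result separately. The names Episode and parse_episodes and the three regular expressions are my own assumptions based on the sample above (for example, that the episode number follows the word "Episode" with no space, which also skips the "Episode Cumulative Results" header).

```python
import re
from collections import namedtuple

# Hypothetical record type: one entry per episode, result lines kept raw.
Episode = namedtuple('Episode', 'episode date time lines')

# Patterns assumed from the sample file layout.
EPISODE_RE = re.compile(r'^Episode(\S+)$')
DATE_RE = re.compile(r'^Date collected(\d{2}/\d{2}/\d{4})$')
TIME_RE = re.compile(r'^Time collected(\d{2}:\d{2})$')


def parse_episodes(text):
    """Group the raw text into episodes without interpreting the results."""
    episodes = []
    current = None
    for raw in text.splitlines():
        row = raw.strip()
        if not row:
            continue
        m = EPISODE_RE.match(row)
        if m:
            # a new episode starts: store the previous one, if any
            if current is not None:
                episodes.append(Episode(**current))
            current = {'episode': m.group(1), 'date': None,
                       'time': None, 'lines': []}
            continue
        if current is None:
            continue  # header text before the first episode
        m = DATE_RE.match(row)
        if m:
            current['date'] = m.group(1)
            continue
        m = TIME_RE.match(row)
        if m:
            current['time'] = m.group(1)
            continue
        # everything else belongs to the result of the current episode
        current['lines'].append(row)
    if current is not None:
        episodes.append(Episode(**current))
    return episodes
```

Each Episode then carries its date, time and a list of raw result lines, so a single-line result such as the potassium value and a multi-line culture report can be handled by separate, simpler functions.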
I'm still new to Node.js and am currently developing a small app for my kitchen. The app scans recipes and uses OCR to extract the data. For the OCR step I'm using the ocr-space web API. Afterwards I need to parse the raw text into a JSON structure and send it to my database. I've also tested this recipe with AWS Textract, which gave me an even poorer result.
Currently I'm struggling with the parsing part using RegEx in Node.js.
Here is the JSON structure I parse the recipe data into:
receipt = {
    title: 'title of receipt',
    items: [
        'item1',
        'item2',
        'item3'
    ],
    preparation: 'preparation text'
}
As most recipes have an items part followed by a preparation part, my general approach so far is the following:
1. Search for keywords like 'items' and 'preparation' in the raw text
2. Parse the text between these keywords
3. Do further string processing, like fixing missing whitespace, trimming, etc.
This approach doesn't work if these keywords are missing. Take for example the following recipe, which I'm struggling to parse into my JSON structure. The recipe is in German and there are no corresponding keywords ('items' or 'Zutaten', 'preparation' or 'Zubereitung').
The following information from the raw text is needed:
title: line 1
items: lines 2 - 9
preparation: line 10 until end
Do you have any hints or tips on how to get closer to a solution? Or do you have any other ideas for handling such situations?
Quinoa-Brot
30 g Chiasamen
350 g Quinoa
70 ml Olivenöl
1/2 TL Speisenatron
1 Prise Salz
Saft von 1/2 Zitrone
1 Handvoll Sonnenblumenkerne
30 g Schwarzkümmelsamen
1 Chiasamen mit 100 ml Wasser
verrühren und 30 Minuten quel-
len lassen. Den Ofen auf 200 oc
vorheizen, eine kleine Kastenform
mit Backpapier auslegen.
2 Quinoa mit der dreifachen
Menge Wasser in einen Topf ge-
ben, einmal aufkochen und dann
3 Minuten köcheln lassen - die
Quinoa wird so nur teilweise ge-
gegart. In ein Sieb abgießen, kalt
abschrecken und anschließend
gut abtropfen lassen.
The lines are separated by \n newline characters.
The parsed receipt should look like this:
receipt = {
    title: 'Quinoa-Brot',
    items: [
        '30 g Chiasamen',
        '350 g Quinoa',
        '70 ml Olivenöl',
        '1/2 TL Speisenatron',
        '1 Prise Salz',
        'Saft von 1/2 Zitrone',
        '1 Handvoll Sonnenblumenkerne',
        '30 g Schwarzkümmelsamen'
    ],
    preparation: '1 Chiasamen mit 100 ml Wasser verrühren und 30 Minuten quellen lassen. Den Ofen auf 200 oc vorheizen, eine kleine Kastenform mit Backpapier auslegen. 2 Quinoa mit der dreifachen Menge Wasser in einen Topf geben, einmal aufkochen und dann 3 Minuten köcheln lassen - die Quinoa wird so nur teilweise gegegart. In ein Sieb abgießen, kalt abschrecken und anschließend gut abtropfen lassen.'
}
Pattern matching solutions like RegExp don't sound suitable for this sort of a categorization problem. You might want to consider clustering (k-means, etc.) - training a model to differentiate between ingredients and instructions. This can be done by labeling a number of recipes (the more the better), or using unsupervised ML by clustering line by line.
If you need to stick to RegExp for some reason, you could try keeping track of repeated words. It is a weak methodology, but ingredient names (Chiasamen, Quinoa) will be referenced in the instructions, so you can match across multiple lines to find where the same word is repeated later on:
(?<=\b| )([^ ]+)(?= |$).+(\1)
If you run this in a loop, plus some logic, you can find ingredient-instruction pairs and work through the document with silhouette information.
You might be able to take advantage of ingredient lines containing numeric data like numbers or words like "piece(s), sticks, leaves" which you might store in a dictionary. That can enrich the word boundary input matches.
I would reconsider using RegExp here at all...
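To make the unit-dictionary idea above a bit more concrete, here is a minimal sketch. It is written in Python for brevity (the same logic should port to Node.js) and it is only a heuristic under stated assumptions: the first line is the title, the preparation starts at the first line that begins with a bare step number followed by a word that is not a known unit, and the UNITS set is a hypothetical, incomplete dictionary you would have to extend. The helper names are my own.

```python
import re

# Hypothetical dictionary of unit words; extend it for your own recipes.
UNITS = {'g', 'kg', 'ml', 'l', 'TL', 'EL', 'Prise', 'Handvoll', 'Stück'}

# A whole number, whitespace, then the next word (e.g. "1 Chiasamen ...").
STEP_RE = re.compile(r'^\d+\s+(\S+)')


def is_step_start(line):
    """True if the line looks like a numbered step rather than a quantity."""
    m = STEP_RE.match(line)
    return bool(m) and m.group(1) not in UNITS


def join_lines(lines):
    """Join OCR lines into one string, gluing words hyphenated at line ends."""
    text = ''
    for line in lines:
        if re.search(r'\w-$', text):
            text = text[:-1] + line   # "quel-" + "len" -> "quellen"
        elif text:
            text += ' ' + line
        else:
            text = line
    return text


def parse_recipe(raw):
    """Split non-empty raw OCR text into title, items and preparation."""
    lines = [ln.strip() for ln in raw.split('\n') if ln.strip()]
    title, body = lines[0], lines[1:]
    # index of the first line that looks like a preparation step
    start = next((i for i, ln in enumerate(body) if is_step_start(ln)),
                 len(body))
    return {'title': title,
            'items': body[:start],
            'preparation': join_lines(body[start:])}
```

On the Quinoa-Brot text this treats "1 Prise Salz" as an ingredient because "Prise" is in the unit set, and "1 Chiasamen mit 100 ml Wasser" as the first step because "Chiasamen" is not. It will misfire on recipes whose steps are not numbered, which is one more reason to consider the non-RegExp approaches.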
I am using spaCy models to find modal verbs (MD) in the following languages:
en
de
fr
es
ru
From tag_map.py of en and de it is clear that "VerbType": "mod" marks a modal verb. But the tag_map.py files for fr, es and ru have no such property. How can I identify modal verbs in these three languages (which properties should I focus on)? Also, is there a generic way to find the modal verbs of any language that spaCy releases in the future, say Greek?
Note: I am not looking for high-level tags but for low-level tags. In spaCy terms, I prefer the token.tag_ property.
I don't think there is currently a language-independent way to do so. But modal verbs are a closed class of words, so checking whether token.tag_ == 'AUX' (although in German, modal verbs are tagged as VERB) and whether token.lemma_ is in a set of modal verbs should do the job.
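A minimal sketch of that lemma-set check, with several assumptions spelled out: the lemma sets are hand-picked and incomplete, the function name find_modals is my own, and I test the coarse-grained token.pos_ for 'AUX' or 'VERB' rather than token.tag_. The stand-in tokens only mimic the three attributes used, so the example runs without downloading a model; with spaCy you would pass the doc returned by nlp(...) instead, and you should check which lemmas your model actually produces.

```python
from types import SimpleNamespace

# Hand-picked, incomplete lemma sets (an assumption for illustration only).
MODAL_LEMMAS = {
    'en': {'can', 'could', 'may', 'might', 'must',
           'shall', 'should', 'will', 'would'},
    'de': {'dürfen', 'können', 'mögen', 'müssen', 'sollen', 'wollen'},
    'fr': {'pouvoir', 'devoir', 'vouloir'},
    'es': {'poder', 'deber', 'querer'},
}


def find_modals(doc, lang):
    """Return the text of tokens whose lemma is a known modal for lang."""
    lemmas = MODAL_LEMMAS[lang]
    return [tok.text for tok in doc
            if tok.pos_ in ('AUX', 'VERB') and tok.lemma_.lower() in lemmas]


# Stand-in tokens with made-up lemma values; a real spaCy Doc works the same way.
doc = [
    SimpleNamespace(text='Ich', pos_='PRON', lemma_='ich'),
    SimpleNamespace(text='muss', pos_='VERB', lemma_='müssen'),
    SimpleNamespace(text='hören', pos_='VERB', lemma_='hören'),
]
print(find_modals(doc, 'de'))  # prints ['muss']
```

Supporting a new language then only means adding one more entry to the dictionary, at the cost of maintaining the lists by hand.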
Currently I am working on a similar project.
For English, you can use token.dep_ == 'aux' and token.tag_ == 'MD'.
For instance:
for token in doc:
    if token.dep_ == 'aux' and token.tag_ == 'MD':
        print(token.text)
For German, the closest I can get is this:
import spacy

sent = 'Ich muss auf den Lehrer hören'
nlp = spacy.load("de_core_news_sm")
doc = nlp(sent)
print(doc.text)
for token in doc:
    print(token.text, token.pos_, token.dep_, token.morph)
Ich muss auf den Lehrer hören
Ich PRON sb Case=Nom|Number=Sing|Person=1|PronType=Prs
muss VERB ROOT Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin
auf ADP mo
den DET nk Case=Acc|Definite=Def|Gender=Masc|Number=Sing|PronType=Art
Lehrer NOUN nk Case=Acc|Gender=Masc|Number=Sing
hören VERB oc VerbForm=Inf
for token in doc:
    if token.morph.get('VerbForm') == ['Fin']:
        print(token.text)
muss
You can also create a stoplist of German modals and check token.lemma_ against it:
modals = ['mussen']
for token in doc:
    if token.lemma_ in modals:
        print(token.text)
muss
I am not sure about other languages, but the stoplist approach might work there too.
I need to convert a list of lists of strings into a three-column table where the first column is 1 space wider than the longest string. I have figured out how to identify the longest string and how long it is, but getting the table to form has been quite tricky. Here is the program with the lists in it, and it shows you that the longest one is 26 characters long.
def main():
    mycities = [['Cape Girardeau', 'MO', '63780'], ['Columbia', 'MO', '65201'],
                ['Kansas City', 'MO', '64108'], ['Rolla', 'MO', '65402'],
                ['Springfield', 'MO', '65897'], ['St Joseph', 'MO', '64504'],
                ['St Louis', 'MO', '63111'], ['Ames', 'IA', '50010'],
                ['Enid', 'OK', '73773'], ['West Palm Beach', 'FL', '33412'],
                ['International Falls', 'MN', '56649'],
                ['Frostbite Falls', 'MN', '56650']]
    col_width = max(len(item) for sub in mycities for item in sub)
    print(col_width)

main()
Now I just need to get it to print like this:
Cape Girardeau      MO 63780
Columbia            MO 65201
Kansas City         MO 64108
Springfield         MO 65897
St Joseph           MO 64504
St Louis            MO 63111
Ames                IA 50010
Enid                OK 73773
West Palm Beach     FL 33412
International Falls MN 56649
Frostbite Falls     MN 56650
You're off to a good start. Given the structure of your lists, you can use the col_width you calculated to work out how many spaces to append after each city name:
for city in mycities:
    # pad the city name with spaces up to col_width, plus one separating space
    city_padded = city[0] + " " + " "*(col_width-len(city[0]))
    print(city_padded + city[1] + " " + city[2])
Given your example, this will produce:
Cape Girardeau      MO 63780
Columbia            MO 65201
Kansas City         MO 64108
Rolla               MO 65402
Springfield         MO 65897
St Joseph           MO 64504
St Louis            MO 63111
Ames                IA 50010
Enid                OK 73773
West Palm Beach     FL 33412
International Falls MN 56649
Frostbite Falls     MN 56650
Note that in the original version of your question the sublists in your mycities variable were missing commas, which I've added in an edit.
As a side note, it is convention in Python that words be separated by underscores in variable names for readability, so you might rename mycities to my_cities.
pep8 ref: (https://www.python.org/dev/peps/pep-0008/#function-and-variable-names)
"String name".ljust(26) will add spaces to the end of your string. For example,
'Ames'.ljust(26) will result in 'Ames' followed by 22 spaces, and then the next column will print after it. If you are not sure what the longest city name will be, you could replace the 26 with len(sortedCities[-1]) after sorting the city names by length. To do this, you can use sortedCities = sorted(cityListVariable, key=len).
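Putting the ljust idea together with a computed width, a small sketch could look like the following. The function name format_table is hypothetical, and it assumes each row is a [city, state, zip] list of strings as in the question.

```python
def format_table(rows):
    """Return one aligned line per row; column 1 is the longest name plus one space."""
    width = max(len(row[0]) for row in rows) + 1
    # ljust pads the city name with spaces on the right up to the column width
    return [row[0].ljust(width) + row[1] + ' ' + row[2] for row in rows]


my_cities = [['Ames', 'IA', '50010'],
             ['International Falls', 'MN', '56649']]
for line in format_table(my_cities):
    print(line)
```

Computing the width from the data avoids hard-coding a number that goes stale when the list changes.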
def main():
    # create list of information
    cities = ['Cape Girardeau, MO 63780', 'Columbia, MO 65201', 'Kansas City, MO 64108', 'Rolla, MO 65402',
              'Springfield, MO 65897', 'St Joseph, MO 64504', 'St Louis, MO 63111', 'Ames, IA 50010',
              'Enid, OK 73773', 'West Palm Beach, FL 33412', 'International Falls, MN 56649', 'Frostbite Falls, MN 56650',
              'Charlotte, NC 28241', 'Upper Marlboro, MD 20774', 'Camdenton, MO 65020', 'San Fransisco, CA 94016']
    for x in cities:
        col = x.split(",")
        if len(col) == 2:
            city = col[0].strip()
            temp = col[1].strip()
        else:
            # no comma: assume the city name fills the first 15 characters
            city = x[:15].strip()
            temp = x[15:].strip()
        state = temp[:2]
        zipCode = int(temp[-5:])
        # left-align the city in a 20-character column, then tab-separate state and zip
        print("%-20s\t%s\t%d" % (city, state, zipCode))

main()