So far I have been able to print all albums by an artist of my choosing using this:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
spotify = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials(client_id, client_secret))
results = spotify.artist_albums(posty_uri, album_type='album')
albums = results['items']
# follow the paging links until there are no more pages
while results['next']:
    results = spotify.next(results)
    albums.extend(results['items'])
for album in albums:
    print(album['name'])
I tried a similar approach for new_releases() like this:
newReleases = spotify.new_releases()
test = newReleases['items']
but this raises an error on the line test = newReleases['items']. If anyone is familiar with Spotipy and knows how to get things like the release date, artist name, and album name from new_releases(), I would greatly appreciate the help.
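I'm a little confused, because the documentation says that the new_releases method returns a list. In any event, what it actually returns is a one-item dictionary: its only key, 'albums', maps to another dictionary that holds the list of albums under 'items'. That is why newReleases['items'] raises a KeyError, and why the code below reads newReleases['albums']['items'].
However, that list contains dictionaries that are a bit unwieldy, so I understand why you're asking this question.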
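To make that nesting concrete, here is a small sketch that runs against a hypothetical, heavily trimmed stand-in for the new_releases() response (a real response has many more keys per album). It shows where the KeyError comes from and pulls out album name, artist names, and release date; summarize_new_releases is an illustrative name, not part of Spotipy.

```python
# Hypothetical, heavily trimmed stand-in for what spotify.new_releases() returns.
sample_response = {
    "albums": {
        "items": [
            {
                "name": "Like This",
                "release_date": "2021-02-26",
                "artists": [{"name": "2KBABY"}, {"name": "Marshmello"}],
            },
        ],
        "next": None,
    }
}


def summarize_new_releases(response):
    """Return (album name, [artist names], release date) tuples."""
    # The album list lives under response['albums']['items'], not response['items'].
    return [
        (album["name"], [artist["name"] for artist in album["artists"]], album["release_date"])
        for album in response["albums"]["items"]
    ]


try:
    sample_response["items"]
except KeyError as exc:
    # This is the same failure as newReleases['items'] in the question.
    print("KeyError:", exc)

print(summarize_new_releases(sample_response))
```

With a real client, the same function would be called as summarize_new_releases(spotify.new_releases()).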
You can make use of the collections.namedtuple data structure to make it easier to see the relevant information. I don't claim that this is the best way to transform this data, but it seems to me a decent way.
import collections as co
# namedtuple data structure that will be easier to understand and use
Album = co.namedtuple(typename='Album', field_names=['album_name',
                                                     'artist_name',
                                                     'release_date'])
newReleases2 = []  # couldn't think of a better name
for album in newReleases['albums']['items']:
    # an album can have several artists, so collect all their names
    artist_sublist = []
    for artist in album['artists']:
        artist_sublist.append(artist['name'])
    newReleases2.append(Album(album_name=album['name'],
                              artist_name=artist_sublist,
                              release_date=album['release_date']))
This results in the following list of namedtuples:
[Album(album_name='Only Wanna Be With You (Pokémon 25 Version)', artist_name=['Post Malone'], release_date='2021-02-25'),
Album(album_name='AP (Music from the film Boogie)', artist_name=['Pop Smoke'], release_date='2021-02-26'),
Album(album_name='Like This', artist_name=['2KBABY', 'Marshmello'], release_date='2021-02-26'),
Album(album_name='Go Big (From The Amazon Original Motion Picture Soundtrack Coming 2 America)', artist_name=['YG', 'Big Sean'], release_date='2021-02-26'),
Album(album_name='Here Comes The Shock', artist_name=['Green Day'], release_date='2021-02-21'),
Album(album_name='Spaceman', artist_name=['Nick Jonas'], release_date='2021-02-25'),
Album(album_name='Life Support', artist_name=['Madison Beer'], release_date='2021-02-26'),
Album(album_name="Drunk (And I Don't Wanna Go Home)", artist_name=['Elle King', 'Miranda Lambert'], release_date='2021-02-26'),
Album(album_name='PROBLEMA', artist_name=['Daddy Yankee'], release_date='2021-02-26'),
Album(album_name='Leave A Little Love', artist_name=['Alesso', 'Armin van Buuren'], release_date='2021-02-26'),
Album(album_name='Rotate', artist_name=['Becky G', 'Burna Boy'], release_date='2021-02-22'),
Album(album_name='BED', artist_name=['Joel Corry', 'RAYE', 'David Guetta'], release_date='2021-02-26'),
Album(album_name='A N N I V E R S A R Y (Deluxe)', artist_name=['Bryson Tiller'], release_date='2021-02-26'),
Album(album_name='Little Oblivions', artist_name=['Julien Baker'], release_date='2021-02-26'),
Album(album_name='Money Long (feat. 42 Dugg)', artist_name=['DDG', 'OG Parker'], release_date='2021-02-26'),
Album(album_name='El Madrileño', artist_name=['C. Tangana'], release_date='2021-02-26'),
Album(album_name='Skegee', artist_name=['JID'], release_date='2021-02-23'),
Album(album_name='Coyote Cry', artist_name=['Ian Munsick'], release_date='2021-02-26'),
Album(album_name='Rainforest', artist_name=['Noname'], release_date='2021-02-26'),
Album(album_name='The American Negro', artist_name=['Adrian Younge'], release_date='2021-02-26')]
If you wanted to see the artist(s) associated with the 11th album in this list, you could do this:
In [62]: newReleases2[10].artist_name
Out[62]: ['Becky G', 'Burna Boy']
Edit: in a comment on this answer, OP requested getting the album cover as well.
Please see the helper function and the slightly modified code below:
import os
import requests
def download_album_cover(url):
    # helper function to download album cover
    # using code from: https://stackoverflow.com/a/13137873/42346
    # the file is saved in the current working directory, named after the last URL segment
    download_path = os.path.join(os.getcwd(), url.rsplit('/', 1)[-1])
    r = requests.get(url, stream=True)
    if r.status_code == 200:
        with open(download_path, 'wb') as f:
            for chunk in r.iter_content(1024):
                f.write(chunk)
        return download_path
    # nothing was written, so don't return a path to a file that doesn't exist
    return None
# modified data structure
Album = co.namedtuple(typename='Album', field_names=['album_name',
                                                     'album_cover',
                                                     'artist_name',
                                                     'release_date'])
# modified retrieval code
newReleases2 = []
for album in newReleases['albums']['items']:
    album_cover = download_album_cover(album['images'][0]['url'])
    artist_sublist = []
    for artist in album['artists']:
        artist_sublist.append(artist['name'])
    newReleases2.append(Album(album_name=album['name'],
                              album_cover=album_cover,
                              artist_name=artist_sublist,
                              release_date=album['release_date']))
Result:
[Album(album_name='Scary Hours 2', album_cover='/home/adamcbernier/ab67616d0000b2738b20e4631fa15d3953528bbc', artist_name=['Drake'], release_date='2021-03-05'),
Album(album_name='Boogie: Original Motion Picture Soundtrack', album_cover='/home/adamcbernier/ab67616d0000b27395e532805e8c97be7a551e3a', artist_name=['Various Artists'], release_date='2021-03-05'),
Album(album_name='Hold On', album_cover='/home/adamcbernier/ab67616d0000b273f33d3618aca6b3cfdcd2fc43', artist_name=['Justin Bieber'], release_date='2021-03-05'),
Album(album_name='Serotonin', album_cover='/home/adamcbernier/ab67616d0000b2737fb30ee0638c764d6f3247d2', artist_name=['girl in red'], release_date='2021-03-03'),
Album(album_name='Leave The Door Open', album_cover='/home/adamcbernier/ab67616d0000b2736f9e6abbd6fa43ac3cdbeee0', artist_name=['Bruno Mars', 'Anderson .Paak', 'Silk Sonic'], release_date='2021-03-05'),
Album(album_name='Real As It Gets (feat. EST Gee)', album_cover='/home/adamcbernier/ab67616d0000b273f0f6f6144929a1ff72001f5e', artist_name=['Lil Baby', 'EST Gee'], release_date='2021-03-04'),
Album(album_name='Life’s A Mess II (with Clever & Post Malone)', album_cover='/home/adamcbernier/ab67616d0000b2732e8d23414fd0b81c35bdedea', artist_name=['Juice WRLD'], release_date='2021-03-05'),
Album(album_name='slower', album_cover='/home/adamcbernier/ab67616d0000b273b742c96d78d9091ce4a1c5c1', artist_name=['Tate McRae'], release_date='2021-03-03'),
Album(album_name='Sacrifice', album_cover='/home/adamcbernier/ab67616d0000b27398bfcce8be630dd5f2f346e4', artist_name=['Bebe Rexha'], release_date='2021-03-05'),
Album(album_name='Poster Girl', album_cover='/home/adamcbernier/ab67616d0000b273503b16348e47bc3c1c823eba', artist_name=['Zara Larsson'], release_date='2021-03-05'),
Album(album_name='Beautiful Mistakes (feat. Megan Thee Stallion)', album_cover='/home/adamcbernier/ab67616d0000b273787f41be59050c46f69db580', artist_name=['Maroon 5', 'Megan Thee Stallion'], release_date='2021-03-03'),
Album(album_name='Pay Your Way In Pain', album_cover='/home/adamcbernier/ab67616d0000b273a1e1b4608e1e04b40113e6e1', artist_name=['St. Vincent'], release_date='2021-03-04'),
Album(album_name='My Head is a Moshpit', album_cover='/home/adamcbernier/ab67616d0000b2733db806083e3b649f1d969a4e', artist_name=['Verzache'], release_date='2021-03-05'),
Album(album_name='When You See Yourself', album_cover='/home/adamcbernier/ab67616d0000b27377253620f08397c998d18d78', artist_name=['Kings of Leon'], release_date='2021-03-05'),
Album(album_name='Mis Manos', album_cover='/home/adamcbernier/ab67616d0000b273d7210e8d6986196b28d084ef', artist_name=['Camilo'], release_date='2021-03-04'),
Album(album_name='Retumban2', album_cover='/home/adamcbernier/ab67616d0000b2738a79a82236682469aecdbbdf', artist_name=['Ovi'], release_date='2021-03-05'),
Album(album_name='Take My Hand', album_cover='/home/adamcbernier/ab67616d0000b273b7839c3ba191de59f5d3a3d7', artist_name=['LP Giobbi'], release_date='2021-03-05'),
Album(album_name="Ma' G", album_cover='/home/adamcbernier/ab67616d0000b27351b5ebb959c37913ac61b033', artist_name=['J Balvin'], release_date='2021-02-28'),
Album(album_name='Aspen', album_cover='/home/adamcbernier/ab67616d0000b27387d1d17d16cf131765ce4be8', artist_name=['Young Dolph', 'Key Glock'], release_date='2021-03-05'),
Album(album_name='Only The Family - Lil Durk Presents: Loyal Bros', album_cover='/home/adamcbernier/ab67616d0000b273a3df38e11e978b34b47583d0', artist_name=['Only The Family'], release_date='2021-03-05')]
I received a CSV file that contains a combination of plain string fields and structured, curly-braced fields, and I cannot find a way to parse it properly. Am I missing something obvious?
csvfile
"presentation_id","presentation_name","sectionId","sectionNumber","courseId","courseIdentifier","courseName","activity_id","activity_prompt","activity_content","solution","event_timestamp","answer_id","answer","isCorrect","userid","firstname","lastname","email","role"
"26cc7957-5a6b-4bde-a996-dd823f54ece7","3-Axial Skeleton F18","937c47b0-cc66-4938-81de-1b1b58388499","001","3b5b5e49-1798-4eab-86d7-186cf59149b4","MOVESCI 230","Human Musculoskeletal Anatomy","62d059e8-9ab4-41d4-9eb8-00ba67d9fac9","A blow to which side of the knee might tear the medial collateral ligament?","{"choices":["medial","lateral"],"type":"MultipleChoice"}","{"solution":[1],"selectAll":false,"type":"MultipleChoice"}","2018-09-30 23:54:16.000","7b5048e5-7460-49f8-a64a-763b7f62d771","{"solution":[1],"type":"MultipleChoice"}","1","57ba970d-d02b-4a10-a64d-56f02336ee08","Student","One","student1#example.com","Student"
"26cc7957-5a6b-4bde-a996-dd823f54ece7","3-Axial Skeleton F18","937c47b0-cc66-4938-81de-1b1b58388499","001","3b5b5e49-1798-4eab-86d7-186cf59149b4","MOVESCI 230","Human Musculoskeletal Anatomy","f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2","What is the name of this movement?","{"choices":["right rotation","left rotation","right lateral rotation","left lateral rotation"],"type":"MultipleChoice"}","{"solution":[1],"selectAll":false,"type":"MultipleChoice"}","2018-09-30 23:20:33.000","d6cce4d9-37ae-409e-afc5-54ad79f86226","{"solution":[3],"type":"MultipleChoice"}","0","921d1b9b-f550-4289-89f1-2a805b27eeb3","Student","Two","student2#example.com","Student"
where the first row contains the column titles and the data starts on the second row.
import csv
with open(filepathcsv, newline='') as csvfile:
    readCSV = csv.reader(csvfile)
    for row in readCSV:
        numcolumns = len(row)
        print(numcolumns, ": ", row)
yields:
20 : ['presentation_id', 'presentation_name', 'sectionId', 'sectionNumber', 'courseId', 'courseIdentifier', 'courseName', 'activity_id', 'activity_prompt', 'activity_content', 'solution', 'event_timestamp', 'answer_id', 'answer', 'isCorrect', 'userid', 'firstname', 'lastname', 'email', 'role']
25 : ['26cc7957-5a6b-4bde-a996-dd823f54ece7', '3-Axial Skeleton F18', '937c47b0-cc66-4938-81de-1b1b58388499', '001', '3b5b5e49-1798-4eab-86d7-186cf59149b4', 'MOVESCI 230', 'Human Musculoskeletal Anatomy', '62d059e8-9ab4-41d4-9eb8-00ba67d9fac9', 'A blow to which side of the knee might tear the medial collateral ligament?', '{choices":["medial"', 'lateral]', 'type:"MultipleChoice"}"', '{solution":[1]', 'selectAll:false', 'type:"MultipleChoice"}"', '2018-09-30 23:54:16.000', '7b5048e5-7460-49f8-a64a-763b7f62d771', '{solution":[1]', 'type:"MultipleChoice"}"', '1', '57ba970d-d02b-4a10-a64d-56f02336ee08', 'William', 'Muter', 'wmuter#umich.edu', 'Student']
27 : ['26cc7957-5a6b-4bde-a996-dd823f54ece7', '3-Axial Skeleton F18', '937c47b0-cc66-4938-81de-1b1b58388499', '001', '3b5b5e49-1798-4eab-86d7-186cf59149b4', 'MOVESCI 230', 'Human Musculoskeletal Anatomy', 'f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2', 'What is the name of this movement?', '{choices":["right rotation"', 'left rotation', 'right lateral rotation', 'left lateral rotation]', 'type:"MultipleChoice"}"', '{solution":[1]', 'selectAll:false', 'type:"MultipleChoice"}"', '2018-09-30 23:20:33.000', 'd6cce4d9-37ae-409e-afc5-54ad79f86226', '{solution":[3]', 'type:"MultipleChoice"}"', '0', '921d1b9b-f550-4289-89f1-2a805b27eeb3', 'Noah', 'Willett', 'willettn#umich.edu', 'Student']
csv.reader splits each row differently because of the complicated structure with embedded curly-braced elements, but I expect 20 elements in each row.
The problem is in the records, not the code. Your code works fine. To solve it, you need to fix the CSV file, because the fields with JSON content weren't serialised correctly: the double quotes inside those quoted fields were not escaped.
Just replace each embedded double quote " with two double quotes "" to escape it.
Here is an example of a fixed CSV row:
"26cc7957-5a6b-4bde-a996-dd823f54ece7","3-Axial Skeleton F18","937c47b0-cc66-4938-81de-1b1b58388499","001","3b5b5e49-1798-4eab-86d7-186cf59149b4","MOVESCI 230","Human Musculoskeletal Anatomy","f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2","What is the name of this movement?","{""choices"":[""right rotation"",""left rotation"",""right lateral rotation"",""left lateral rotation""],""type"":""MultipleChoice""}","{""solution"":[1],""selectAll"":false,""type"":""MultipleChoice""}","2018-09-30 23:20:33.000","d6cce4d9-37ae-409e-afc5-54ad79f86226","{""solution"":[3],""type"":""MultipleChoice""}","0","921d1b9b-f550-4289-89f1-2a805b27eeb3","Student","Two","student2#example.com","Student"
And the output of your code after the fix:
20 : ['26cc7957-5a6b-4bde-a996-dd823f54ece7', '3-Axial Skeleton F18', '937c47b0-cc66-4938-81de-1b1b58388499', '001', '3b5b5e49-1798-4eab-86d7-186cf59149b4', 'MOVESCI 230', 'Human Musculoskeletal Anatomy', 'f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2', 'What is the name of this movement?', '{"choices":["right rotation","left rotation","right lateral rotation","left lateral rotation"],"type":"MultipleChoice"}', '{"solution":[1],"selectAll":false,"type":"MultipleChoice"}', '2018-09-30 23:20:33.000', 'd6cce4d9-37ae-409e-afc5-54ad79f86226', '{"solution":[3],"type":"MultipleChoice"}', '0', '921d1b9b-f550-4289-89f1-2a805b27eeb3', 'Student', 'Two', 'student2#example.com', 'Student']
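If you control whatever produces the file, you can let Python's csv module do the escaping instead of doubling quotes by hand. This is a minimal sketch that assumes the JSON fields start out as Python dicts. The three-column row is hypothetical and only mimics the shape of the data above. csv.QUOTE_ALL quotes every field, as in the original file, and the writer's default doublequote=True writes each embedded " as "".

```python
import csv
import io
import json

# Hypothetical row: a plain-text field, a field holding JSON text, and a flag.
activity_content = {"choices": ["medial", "lateral"], "type": "MultipleChoice"}
row = [
    "A blow to which side of the knee might tear the medial collateral ligament?",
    json.dumps(activity_content, separators=(",", ":")),  # compact JSON like the file
    "1",
]

buffer = io.StringIO()
# QUOTE_ALL wraps every field in quotes; embedded quotes are doubled ("").
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow(row)
line = buffer.getvalue()
print(line)

# Reading the line back yields exactly one field per column again,
# and the JSON field can be decoded into a dict.
parsed = next(csv.reader(io.StringIO(line)))
print(len(parsed))
print(json.loads(parsed[1]))
```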
Thank you all for your suggestions!
Also, my apologies: I did not include the raw CSV file I was trying to parse (example below):
"b5ae18d3-b6dd-4d0a-84fe-7c43df472571"|"Climate_Rapid_Change_W18.pdf"|"18563b1e-a467-44b3-aed7-3607a1acd712"|"001"|"c86c8c8d-dca6-41cd-a010-a83e40d93e75"|"CLIMATE 102"|"Extreme Weather"|"278c4561-c834-4343-a770-3f544966f633"|"Which European city is at the same latitude as Ann Arbor?"|"{"choices":["Stockholm, Sweden","Berlin, Germany","London, England","Paris, France","Madrid, Spain"],"type":"MultipleChoice"}"|"{"solution":[4],"selectAll":false,"type":"MultipleChoice"}"|"2019-01-31 22:11:08.000"|"81392cd3-28e9-4e2e-8a33-018104b1f4d1"|"{"solution":[3,4],"type":"MultipleChoice"}"|"0"|"2db10c95-b507-4211-8244-394361148b22"|"Student"|"One"|"student1#umich.edu"|"Student"
"ee73fdaf-a926-4899-b0f7-9b942f1b44ad"|"6-Elbow, Wrist, Hand W19"|"48539109-529e-4359-83b9-2ae81be0532c"|"001"|"3b5b5e49-1798-4eab-86d7-186cf59149b4"|"MOVESCI 230"|"Human Musculoskeletal Anatomy"|"fcd7c673-d944-48c3-8a09-f458e03f8c44"|"What is the name of this movement?"|"{"choices":["first phalangeal joint","first proximal interphalangeal joint","first distal interphalangeal joint","first interphalangeal joint"],"type":"MultipleChoice"}"|"{"solution":[3],"selectAll":false,"type":"MultipleChoice"}"|"2019-01-31 22:07:32.000"|"9016f36c-41f5-4e14-84a9-78eea682c802"|"{"solution":[3],"type":"MultipleChoice"}"|"1"|"7184708d-4dc7-42e0-b1ea-4aca51f00fcd"|"Student"|"Two"|"student2#umich.edu"|"Student"
You are correct that the problem was the form of the CSV file.
I changed readCSV = csv.reader(csvfile) to readCSV = csv.reader(csvfile, delimiter="|", quotechar='|')
I then took the resulting list and removed the extraneous quotation marks from each element.
The rest of the program now works properly.
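A related option for this pipe-delimited layout, sketched here as an alternative rather than as what the original program does, is to tell csv.reader to ignore quote characters entirely with quoting=csv.QUOTE_NONE. The reader then splits only on |. Stripping the outer quotes from each field leaves valid JSON in the structured columns, which json.loads can decode. The sample row is a hypothetical, shortened version of the first raw line above, and parse_pipe_row is an illustrative name.

```python
import csv
import io
import json

# Hypothetical, shortened row in the same style as the raw file:
# fields separated by | and wrapped in unescaped double quotes.
raw = ('"278c4561"|"Which European city is at the same latitude as Ann Arbor?"|'
       '"{"choices":["Stockholm, Sweden","Berlin, Germany"],"type":"MultipleChoice"}"|"0"\n')


def parse_pipe_row(fields):
    """Strip the outer quotes from each field and decode fields that hold JSON."""
    cleaned = []
    for field in fields:
        if len(field) >= 2 and field[0] == field[-1] == '"':
            field = field[1:-1]
        if field.startswith("{"):
            field = json.loads(field)
        cleaned.append(field)
    return cleaned


# QUOTE_NONE makes the reader treat " as an ordinary character,
# so the only thing it splits on is the | delimiter.
reader = csv.reader(io.StringIO(raw), delimiter="|", quoting=csv.QUOTE_NONE)
for fields in reader:
    row = parse_pipe_row(fields)
    print(len(row), row[2]["choices"])
```

This relies on no field containing a literal |; the sample lines above satisfy that.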
Here is my code
#course registration
list_courses=[]
for line in open("courses.txt",'r').readlines():
    list_courses.append(line.strip())
print ("Gathering course information from file: \n",list_courses)
close("courses.txt")
list_student=[]
for line in open("students.txt",'r').readlines():
    list_student.append(line.strip())
print("Here is student info: \n",list_student)
close("students.txt")
This gives me errors when I try to close the files. How do I close them? I am basically reading the contents of each file and storing them in a list, and afterwards I want to close the open files. That is where I get the error.
I edited the code as per the suggestions below.
The new code is:
list_courses=[]
with open("courses.txt",'r') as myfile1:
    list_courses=myfile1.readlines()
list_courses=[x.strip() for x in list_courses]
print ("Gathering course information from file: \n",list_courses)
list_student=[]
with open("students.txt",'r') as myfile1:
    list_student=myfile1.readlines()
list_student=[x.strip() for x in list_student]
print("Here is student info: \n",list_student)
The information in courses.txt is:
cs101,C programming
cs102,Digital logic and design
cs103,Electrical engineering
cs231,IT networks
cs232,IT Workshop
cs233,IT programming
cs301,Compilers and automata
cs302,Operating Systems
cs303,Networks
cs401,Game Theory
cs402,Systems Programming
cs403,Automata
ec101,Digitization
ec102,Analog cicuit design
ec103,IP Telephony
ec201,Wireless Network
ec202,Microwave engineering
ec203,Antenna
ec301,Maths2
ec302,Theory of Circuits
ec303,PCB design
ec401,PLC programming
ec402,Scada
ec403,VLSI
When I run the code, I get this output:
Gathering course information from file:
['cs101,C programming', 'cs102,Digital logic and design', 'cs103,Electrical engineering', 'cs231,IT networks', 'cs232,IT Workshop', 'cs233,IT programming', 'cs301,Compilers and automata', 'cs302,Operating Systems', 'cs303,Networks', 'cs401,Game Theory', 'cs402,Systems Programming', 'cs403,Automata', 'ec101,Digitization', 'ec102,Analog cicuit design', 'ec103,IP Telephony', 'ec201,Wireless Network', 'ec202,Microwave engineering', 'ec203,Antenna', 'ec301,Maths2', 'ec302,Theory of Circuits', 'ec303,PCB design', 'ec401,PLC programming', 'ec402,Scada', 'ec403,VLSI']
Instead, what I want is for cs101 from the first line of courses.txt to go into list_courses[0] and for list_courses[1] to hold C programming, i.e.
list_courses[0]=cs101
list_courses[1]=C programming
I tried approaches that read each line and stored it as one list element, but each line in courses.txt has a comma separating two values, and those comma-separated values should be separate list elements.
This will work for you:
codes=[]
subjects=[]
with open("courses.txt","r") as f:
    for line in f:
        # drop the line ending, then split only on the first comma
        eachl=line.rstrip("\n").split(",", 1)
        codes.append(eachl[0])
        subjects.append(eachl[1])
You will get two lists, one containing the course codes and another one with the subjects.
The codes list will look like:
['cs101', 'cs102', 'cs103', 'cs231', 'cs232', 'cs233', 'cs301', 'cs302', 'cs303', 'cs401', 'cs402', 'cs403', 'ec101', 'ec102', 'ec103', 'ec201', 'ec202', 'ec203', 'ec301', 'ec302', 'ec303', 'ec401', 'ec402', 'ec403']
The subjects list will look like:
['C programming', 'Digital logic and design', 'Electrical engineering', 'IT networks', 'IT Workshop', 'IT programming', 'Compilers and automata', 'Operating Systems', 'Networks', 'Game Theory', 'Systems Programming', 'Automata', 'Digitization', 'Analog cicuit design', 'IP Telephony', 'Wireless Network', 'Microwave engineering', 'Antenna', 'Maths2', 'Theory of Circuits', 'PCB design', 'PLC programming', 'Scada', 'VLSI']
Why don't you try:
students = []
courses = []
# the with block closes the file automatically, so no f.close() is needed
with open("courses.txt", "r") as f:
    for line in f:
        a, b = line.rstrip("\n").split(",", 1)
        students.append(a)
        courses.append(b)
This will produce two lists, students and courses.
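Since courses.txt is really a two-column CSV file, the csv module can also do the splitting. Here is a sketch of that swapped-in approach; read_courses is a hypothetical helper, and io.StringIO stands in for the real file. It keeps each line as a (code, name) pair and then builds the two flat lists from the pairs. It assumes course names never contain a comma themselves, which holds for the file shown above.

```python
import csv
import io


def read_courses(lines):
    """Split each 'code,name' line into a (code, name) pair."""
    # csv.reader handles the comma split and drops the line ending;
    # blank lines come back as [] and are skipped.
    return [(row[0], row[1]) for row in csv.reader(lines) if row]


# Hypothetical stand-in for open("courses.txt", newline="")
sample = io.StringIO("cs101,C programming\ncs102,Digital logic and design\n")
courses = read_courses(sample)
codes = [code for code, _ in courses]
names = [name for _, name in courses]
print(codes)
print(names)
```

With the real file, `with open("courses.txt", newline="") as f: courses = read_courses(f)` reads everything and closes the file when the block ends.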
I have this csv file with only two entries. Here it is:
Meat One,['Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers']
The first one is the title and the second is the business headings.
The problem lies with entry two.
Here is my code:
import csv
with open('phonebookCOMPK-Directory.csv', "rt") as textfile:
    reader = csv.reader(textfile)
    for row in reader:
        row5 = row[5].replace("[", "").replace("]", "")
        listt = [(''.join(row5))]
        print (listt[0])
it prints:
'Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers'
What I need is to create a list containing these words and then print every item on its own line using a for loop, like this:
Abattoirs
Exporters
Food Delivery
Butchers Retail
Meat Dealers-Retail
Meat Freezer
Meat Packers
Actually, I am trying to reformat my current CSV file and clean it so that it is more precise and understandable.
The complete first line of the CSV is this:
Meat One,+92-21-111163281,Al Shaheer Corporation,Retailers,2008,"['Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers']","[[' Outlets Address : Shop No. Z-10, Station Shopping Complex, MES Market, Malir-Cantt, Karachi. Landmarks : MES Market, Station Shopping Complex City : Karachi UAN : +92-21-111163281 '], [' Outlets Address : Shop 13, Ground Floor, Plot 14-D, Sky Garden, Main Tipu Sultan Road, KDA Scheme No.1, Karachi. Landmarks : Nadra Chowrangi, Sky Garden, Tipu Sultan Road City : Karachi UAN : +92-21-111163281 '], ["" Outlets Address : Near Jan's Broast, Boat Basin, Khayaban-e-Roomi, Block 5, Clifton, Karachi. Landmarks : Boat Basin, Jans Broast, Khayaban-e-Roomi City : Karachi UAN : +92-21-111163281 View Map ""], [' Outlets Address : Gulistan-e-Johar, Karachi. Landmarks : Perfume Chowk City : Karachi UAN : +92-21-111163281 '], [' Outlets Address : Tee Emm Mart, Creek Vista Appartments, Khayaban-e-Shaheen, Phase VIII, DHA, Karachi. Landmarks : Creek Vista Appartments, Nueplex Cinema, Tee Emm Mart, The Place City : Karachi Mobile : 0302-8333666 '], [' Outlets Address : Y-Block, DHA, Lahore. Landmarks : Y-Block City : Lahore UAN : +92-42-111163281 '], [' Outlets Address : Adj. PSO, Main Bhittai Road, Jinnah Supermarket, F-7 Markaz, Islamabad. Landmarks : Bhittai Road, Jinnah Super Market, PSO Petrol Pump City : Islamabad UAN : +92-51-111163281 ']]","Agriculture, fishing & Forestry > Farming equipment & services > Abattoirs in Pakistan"
First column is Name
Second column is Number
Third column is Owner
Fourth column is Business type
Fifth column is Y.O.E
Sixth column is Business Headings
Seventh column is Outlets (List of lists containing every branch address)
Eighth column is classification
I am not restricted to csv.reader; I am open to any technique for cleaning my file.
Think of it in terms of two separate tasks:
Collect some data items from a ‘dirty’ source (this CSV file)
Store that data somewhere so that it’s easy to access and manipulate programmatically (according to what you want to do with it)
Processing dirty CSV
One way to do this is to have a function deserialize_business() that distills structured business information from each incoming line in your CSV. This function can be complex because that's the nature of the task, but it's still advisable to split it into smaller self-contained functions (such as get_outlets(), get_headings(), and so on). This function can return a dictionary, but depending on what you want, it could also return a [named] tuple, a custom object, etc.
This function would be an ‘adapter’ for this particular CSV data source.
Example of deserialization function:
def deserialize_business(csv_line):
    """
    Distills structured business information from given raw CSV line.
    Returns a dictionary like {name, phone, owner,
    btype, yoe, headings[], outlets[], category}.
    """
    pieces = [piece.strip("[[\"\']] ") for piece in csv_line.strip().split(',')]
    name = pieces[0]
    phone = pieces[1]
    owner = pieces[2]
    btype = pieces[3]
    yoe = pieces[4]
    # the first piece starting with "Outlets Address" marks where the outlets begin
    outlets_start = next(i for i, piece in enumerate(pieces)
                         if piece.startswith("Outlets Address"))
    # after yoe headings begin, until substring Outlets Address
    headings = pieces[5:outlets_start]
    # outlets go from substring Outlets Address until category
    outlet_pieces = pieces[outlets_start:-1]
    # combine each individual outlet information into a string
    # and let ``deserialize_outlet()`` deal with that
    raw_outlets = ', '.join(outlet_pieces).split("Outlets Address")
    outlets = [deserialize_outlet(outlet) for outlet in raw_outlets if outlet.strip()]
    # category is the last piece (a category that itself contains commas
    # only yields its last comma-separated part here)
    category = pieces[-1]
    return {
        'name': name,
        'phone': phone,
        'owner': owner,
        'btype': btype,
        'yoe': yoe,
        'headings': headings,
        'outlets': outlets,
        'category': category,
    }
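Splitting on commas is fragile here, because addresses and the category contain commas of their own. As an alternative sketch (a swapped-in technique, not the approach above), csv.reader can honour the quoting in the file. The headings and outlets columns are then the text of Python list literals, which ast.literal_eval parses without executing code. deserialize_business_row is a hypothetical name, and the sample is a shortened version of the line from the question.

```python
import ast
import csv
import io


def deserialize_business_row(row):
    """Turn one 8-column csv.reader row into the same kind of dict as above."""
    name, phone, owner, btype, yoe, raw_headings, raw_outlets, category = row
    # "['Abattoirs', 'Exporters']" -> ['Abattoirs', 'Exporters']
    headings = ast.literal_eval(raw_headings)
    # The outlets column is a list of one-element lists, each holding one address blob.
    outlets = [outlet[0].strip() for outlet in ast.literal_eval(raw_outlets)]
    return {
        'name': name,
        'phone': phone,
        'owner': owner,
        'btype': btype,
        'yoe': yoe,
        'headings': headings,
        'outlets': outlets,
        'category': category,
    }


# Hypothetical, shortened version of the first CSV line from the question.
sample = io.StringIO(
"""Meat One,+92-21-111163281,Al Shaheer Corporation,Retailers,2008,"['Abattoirs', 'Exporters']","[[' Outlets Address : Y-Block, DHA, Lahore. '], [' Outlets Address : Gulistan-e-Johar, Karachi. ']]","Agriculture, fishing & Forestry > Farming equipment & services > Abattoirs in Pakistan"
""")

for row in csv.reader(sample):
    business = deserialize_business_row(row)
    # one heading per line, as asked in the question
    for heading in business['headings']:
        print(heading)
```

The doubled quotes around the "Near Jan's Broast" outlet in the real line are undone by csv.reader before literal_eval sees them, so that entry parses the same way.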
Example of calling it:
import logging
log = logging.getLogger(__name__)
with open("phonebookCOMPK-Directory.csv") as f:
    for lineno, line in enumerate(f, start=1):
        try:
            business = deserialize_business(line)
        except Exception:
            # Bad line formatting?
            log.exception("Failed to deserialize line #%s!", lineno)
        else:
            # All is well
            store_business(business)
Storing the data
You'll have the store_business() function take your data structure and write it somewhere. Maybe it'll be another CSV that's better structured, maybe multiple CSVs, a JSON file, or an SQLite relational database, since Python has SQLite support built in.
It all depends on what you want to do later.
Relational example
In this case your data would be split across multiple tables. (I'm using the word "table", but each one can just as well be a CSV file or a table in an SQLite DB.)
Table identifying all possible business headings:
business heading ID, name
1, Abattoirs
2, Exporters
3, Food Delivery
4, Butchers Retail
5, Meat Dealers-Retail
6, Meat Freezer
7, Meat Packers
Table identifying all possible categories:
category ID, parent category, name
1, NULL, "Agriculture, fishing & Forestry"
2, 1, "Farming equipment & services"
3, 2, "Abattoirs in Pakistan"
Table identifying businesses:
business ID, name, phone, owner, type, yoe, category
1, Meat One, +92-21-111163281, Al Shaheer Corporation, Retailers, 2008, 3
Table describing their outlets:
business ID, city, address, landmarks, phone
1, Karachi, "Shop 13, Ground Floor, Plot 14-D, Sky Garden, Main Tipu Sultan Road, KDA Scheme No.1, Karachi", "Nadra Chowrangi, Sky Garden, Tipu Sultan Road", +92-21-111163281
1, Karachi, "Near Jan's Broast, Boat Basin, Khayaban-e-Roomi, Block 5, Clifton, Karachi", "Boat Basin, Jans Broast, Khayaban-e-Roomi", +92-21-111163281
Table describing their headings:
business ID, business heading ID
1, 1
1, 2
1, 3
…
Handling all this would require a complex store_business() function. It may be worth looking into SQLite and an ORM framework if you go the relational way of keeping the data.
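As a rough idea of what the relational route could look like with Python's built-in sqlite3 module, here is a sketch covering only the businesses, headings, and business-headings tables from the example above. The table and column names are assumptions, and this store_business takes the connection as an extra parameter, unlike the one-argument version used earlier. Outlets and categories would follow the same pattern.

```python
import sqlite3

# In-memory database for illustration; pass a filename to keep the data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE heading (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, phone TEXT,
                       owner TEXT, btype TEXT, yoe INTEGER);
CREATE TABLE business_heading (
    business_id INTEGER REFERENCES business(id),
    heading_id  INTEGER REFERENCES heading(id),
    PRIMARY KEY (business_id, heading_id)
);
""")


def store_business(conn, business):
    """Hypothetical minimal store_business(): one business row plus its headings."""
    cur = conn.execute(
        "INSERT INTO business (name, phone, owner, btype, yoe) VALUES (?, ?, ?, ?, ?)",
        (business["name"], business["phone"], business["owner"],
         business["btype"], int(business["yoe"])),
    )
    business_id = cur.lastrowid
    for heading in business["headings"]:
        # Insert the heading only if this name has not been seen before.
        conn.execute("INSERT OR IGNORE INTO heading (name) VALUES (?)", (heading,))
        (heading_id,) = conn.execute(
            "SELECT id FROM heading WHERE name = ?", (heading,)).fetchone()
        conn.execute("INSERT INTO business_heading VALUES (?, ?)", (business_id, heading_id))
    conn.commit()
    return business_id


store_business(conn, {
    "name": "Meat One", "phone": "+92-21-111163281", "owner": "Al Shaheer Corporation",
    "btype": "Retailers", "yoe": "2008",
    "headings": ["Abattoirs", "Exporters", "Food Delivery"],
})

rows = conn.execute("""
    SELECT h.name FROM heading h
    JOIN business_heading bh ON bh.heading_id = h.id
    JOIN business b ON b.id = bh.business_id
    WHERE b.name = ? ORDER BY h.id
""", ("Meat One",)).fetchall()
print([name for (name,) in rows])
```

Storing headings once and linking them through business_heading is what lets two businesses share a heading such as "Exporters" without duplicating it.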
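You can just replace the line:
print(listt[0])
with:
print(*[item.strip(" '") for item in listt[0].split(",")], sep='\n')
Because listt[0] is a single string, it has to be split into the individual headings first; print(*listt[0], sep='\n') on its own would print it one character per line.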
Perl newbie here, looking for some help.
I have a directory of files and a "keywords" file which has the attributes to search for and the attribute type.
For example:
Keywords.txt
Attribute1 boolean
Attribute2 boolean
Attribute3 search_and_extract
Attribute4 chunk
For each file in the directory, I have to:
look up keywords.txt
search based on the attribute type
something like the below.
IF attribute_type = boolean THEN
    search for attribute;
    set found = Y if attribute found;
ELSIF attribute_type = search_and_extract THEN
    extract string where attribute is found
ELSIF attribute_type = chunk THEN
    extract the complete chunk of paragraph where attribute is found.
This is what I have so far and I'm sure there is a more efficient way to do this.
I'm hoping someone can guide me in the right direction to do the above.
Thanks & regards,
SiMa
# Reads attributes from config file
# First set boolean attributes. IF keyword is found in text,
# variable flag is set to Y else N
# End Code: For each text file in directory loop.
# Run the below for each document.
use strict;
use warnings;
# open Doc
open(my $doc_fh, '<', 'Final_CLP.txt') or die "Cannot open Final_CLP.txt: $!";
while (my $doc_line = <$doc_fh>) {
    chomp $doc_line;
    # open the attribute config file
    open(my $cfg_fh, '<', 'attribute_config.txt') or die "Cannot open attribute_config.txt: $!";
    while (my $cfg_line = <$cfg_fh>) {
        chomp $cfg_line;
        my ($attribute, $attribute_type) = split /\t/, $cfg_line;
        my $is_boolean = ($attribute_type eq "boolean") ? "Y" : "N";
        # For each boolean attribute, check if the keyword exists
        # in the file and return Y or N
        if ($is_boolean eq "Y") {
            print "Yes\n";
            # search for keyword in doc and assign values
        }
        print "Attribute: $attribute\n";
        print "Attribute_Type: $attribute_type\n";
        print "is_boolean: $is_boolean\n";
        print "-----------\n";
    }
    close($cfg_fh);
}
close($doc_fh);
exit;
It is a good idea to start your specs/question with a story ("I have a ..."). But such a story (whether true or made up, because you can't disclose the truth) should give
a vivid picture of the situation/problem/task
the reason(s) why all the work must be done
definitions for uncommon(ly used) terms
So I'd start with: I'm working in a prison and have to scan the emails of the inmates for
names (like "Al Capone") mentioned anywhere in the text; the director wants to read those mails in toto
order lines (like "weapon: AK 4711 quantity: 14"); the ordnance officer wants that info to calculate the amount of ammunition and rack space needed
paragraphs containing 'family' keywords like "wife", "child", ...; the parson wants to prepare her sermons efficiently
Taken by itself, each of the terms "keyword" (~running text) and "attribute" (~structured text) may be 'clear', but if both are applied to "the X I have to search for", things get mushy. Instead of general ("chunk") and technical ("string") terms, you should use 'real-world' (line) and specific (paragraph) words. Samples of your input:
From: Robin Hood
To: Scarface
Hi Scarface,
tell Al Capone to send a car to the prison gate on sunday.
For the riot we need:
weapon: AK 4711 quantity: 14
knife: Bowie quantity: 8
Tell my wife in Folsom to send some money to my son in
Alcatraz.
Regards
Robin
and your expected output:
--- Robin.txt ----
keywords:
Al Capone: Yes
Billy the Kid: No
Scarface: Yes
order lines:
knife:
knife: Bowie quantity: 8
machine gun:
stinger rocket:
weapon:
weapon: AK 4711 quantity: 14
social relations paragraphs:
Tell my wife in Folsom to send some money to my son in
Alcatraz.
Pseudocode should begin at the top level. If you start with
for each file in folder
    load search list
    process current file('s content) using search list
it's obvious that
load search list
for each file in folder
    process current file using search list
would be much better.
Based on this story, the examples, and the top-level plan, I would try to come up with proof-of-concept code for a simplified version of the "process current file('s content) using search list" task:
given file/text to search in and list of keywords/attributes
    print file name
    print "keywords:"
    for each boolean item
        print boolean item text
        if found anywhere in whole text
            print "Yes"
        else
            print "No"
    print "order lines:"
    for each line item
        print line item text
        if found anywhere in whole text
            print whole line
    print "social relations paragraphs:"
    for each paragraph
        for each social relation item
            if found
                print paragraph
                no need to check for other items
First implementation attempt:
use Modern::Perl;
#use English qw(-no_match_vars);
use English;
exit step_00();
sub step_00 {
    # given file/text to search in
    # (paragraphs are separated by blank lines; the paragraph loop below relies on that)
    my $whole_text = <<"EOT";
From: Robin Hood
To: Scarface

Hi Scarface,

tell Al Capone to send a car to the prison gate on sunday.

For the riot we need:
weapon: AK 4711 quantity: 14
knife: Bowie quantity: 8

Tell my wife in Folsom to send some money to my son in
Alcatraz.

Regards
Robin
EOT
    # print file name
    say "--- Robin.txt ---";
    # print "keywords:"
    say "keywords:";
    # for each boolean item
    for my $bi ("Al Capone", "Billy the Kid", "Scarface") {
        # print boolean item text
        printf " %s: ", $bi;
        # if found anywhere in whole text
        if ($whole_text =~ /$bi/) {
            # print "Yes"
            say "Yes";
        # else
        } else {
            # print "No"
            say "No";
        }
    }
    # print "order lines:"
    say "order lines:";
    # for each line item
    for my $li ("knife", "machine gun", "stinger rocket", "weapon") {
        # print line item text
        # if found anywhere in whole text
        if ($whole_text =~ /^$li.*$/m) {
            # print whole line
            say " ", $MATCH;
        }
    }
    # print "social relations paragraphs:"
    say "social relations paragraphs:";
    # for each paragraph
    for my $para (split /\n\n/, $whole_text) {
        # for each social relation item
        for my $sr ("wife", "son", "husband") {
            # if found
            if ($para =~ /$sr/) {
            ## if ($para =~ /\b$sr\b/) {
                # print paragraph
                say $para;
                # no need to check for other items
                last;
            }
        }
    }
    return 0;
}
output:
perl 16953439.pl
--- Robin.txt ---
keywords:
 Al Capone: Yes
 Billy the Kid: No
 Scarface: Yes
order lines:
 knife: Bowie quantity: 8
 weapon: AK 4711 quantity: 14
social relations paragraphs:
tell Al Capone to send a car to the prison gate on sunday.
Tell my wife in Folsom to send some money to my son in
Alcatraz.
Such (premature) code helps you to
clarify your specs (Should not-found keywords go into the output? Is your search list really flat, or should it be structured/grouped?)
check your assumptions about how to do things (Should the order line search be done on the array of lines of the whole text?)
identify topics for further research/RTFM (e.g. regex (prison!))
plan your next steps (folder loop, read input file)
(In addition, people in the know will point out all my bad practices, so you can avoid them from the start.)
Good luck!