Reformat csv file using python? - python-3.x

I have this csv file with only two entries. Here it is:
Meat One,['Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers']
The first entry is the title and the second is the business headings.
The problem lies with the second entry.
Here is my code:
import csv
with open('phonebookCOMPK-Directory.csv', "rt") as textfile:
    reader = csv.reader(textfile)
    for row in reader:
        row5 = row[5].replace("[", "").replace("]", "")
        listt = [(''.join(row5))]
        print(listt[0])
it prints:
'Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers'
What I need is to create a list containing these words and then use a for loop to print every item separately, like this:
Abattoirs
Exporters
Food Delivery
Butchers Retail
Meat Dealers-Retail
Meat Freezer
Meat Packers
Actually I am trying to reformat my current csv file and clean it so it can be more precise and understandable.
Complete 1st line of csv is this:
Meat One,+92-21-111163281,Al Shaheer Corporation,Retailers,2008,"['Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers']","[[' Outlets Address : Shop No. Z-10, Station Shopping Complex, MES Market, Malir-Cantt, Karachi. Landmarks : MES Market, Station Shopping Complex City : Karachi UAN : +92-21-111163281 '], [' Outlets Address : Shop 13, Ground Floor, Plot 14-D, Sky Garden, Main Tipu Sultan Road, KDA Scheme No.1, Karachi. Landmarks : Nadra Chowrangi, Sky Garden, Tipu Sultan Road City : Karachi UAN : +92-21-111163281 '], ["" Outlets Address : Near Jan's Broast, Boat Basin, Khayaban-e-Roomi, Block 5, Clifton, Karachi. Landmarks : Boat Basin, Jans Broast, Khayaban-e-Roomi City : Karachi UAN : +92-21-111163281 View Map ""], [' Outlets Address : Gulistan-e-Johar, Karachi. Landmarks : Perfume Chowk City : Karachi UAN : +92-21-111163281 '], [' Outlets Address : Tee Emm Mart, Creek Vista Appartments, Khayaban-e-Shaheen, Phase VIII, DHA, Karachi. Landmarks : Creek Vista Appartments, Nueplex Cinema, Tee Emm Mart, The Place City : Karachi Mobile : 0302-8333666 '], [' Outlets Address : Y-Block, DHA, Lahore. Landmarks : Y-Block City : Lahore UAN : +92-42-111163281 '], [' Outlets Address : Adj. PSO, Main Bhittai Road, Jinnah Supermarket, F-7 Markaz, Islamabad. Landmarks : Bhittai Road, Jinnah Super Market, PSO Petrol Pump City : Islamabad UAN : +92-51-111163281 ']]","Agriculture, fishing & Forestry > Farming equipment & services > Abattoirs in Pakistan"
First column is Name
Second column is Number
Third column is Owner
Fourth column is Business type
Fifth column is Y.O.E
Sixth column is Business Headings
Seventh column is Outlets (List of lists containing every branch address)
Eighth column is classification
There is no restriction to using csv.reader; I am open to any technique available to clean my file.

Think of it in terms of two separate tasks:
Collect some data items from a ‘dirty’ source (this CSV file)
Store that data somewhere so that it’s easy to access and manipulate programmatically (according to what you want to do with it)
Processing dirty CSV
One way to do this is to have a function deserialize_business() that distills structured business information from each incoming line in your CSV. This function can be complex because that's the nature of the task, but it's still advisable to split it into self-contained smaller functions (such as get_outlets(), get_headings(), and so on). This function can return a dictionary, but depending on what you want it can be a [named] tuple, a custom object, etc.
This function would be an ‘adapter’ for this particular CSV data source.
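For instance, a get_headings() helper could lean on the standard library: the headings column is a quoted Python-list literal, so csv handles the quoting and ast.literal_eval can parse the list. A minimal sketch of one possible shape:
import ast
import csv

def get_headings(csv_line):
    """Parse the headings column (a quoted Python-list literal) into a clean list."""
    row = next(csv.reader([csv_line]))
    return [heading.strip() for heading in ast.literal_eval(row[5])]

# get_headings(line) -> ['Abattoirs', 'Exporters', 'Food Delivery', ...]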
Example of deserialization function:
def deserialize_business(csv_line):
    """
    Distills structured business information from the given raw CSV line.
    Returns a dictionary like {name, phone, owner,
    btype, yoe, headings[], outlets[], category}.
    """
    pieces = [piece.strip("[[\"\']] ") for piece in csv_line.strip().split(',')]
    name = pieces[0]
    phone = pieces[1]
    owner = pieces[2]
    btype = pieces[3]
    yoe = pieces[4]
    # headings begin after yoe and run until the substring "Outlets Address"
    headings = pieces[5:pieces.index("Outlets Address")]
    # outlets go from the substring "Outlets Address" until category
    outlet_pieces = pieces[pieces.index("Outlets Address"):-1]
    # combine each individual outlet's information into a string
    # and let ``deserialize_outlet()`` deal with that
    raw_outlets = ', '.join(outlet_pieces).split("Outlets Address")
    outlets = [deserialize_outlet(outlet) for outlet in raw_outlets]
    # category is the last piece
    category = pieces[-1]
    return {
        'name': name,
        'phone': phone,
        'owner': owner,
        'btype': btype,
        'yoe': yoe,
        'headings': headings,
        'outlets': outlets,
        'category': category,
    }
Example of calling it:
with open("phonebookCOMPK-Directory.csv") as f:
lineno = 0
for line in f:
lineno += 1
try:
business = deserialize_business(line)
except:
# Bad line formatting?
log.exception(u"Failed to deserialize line #%s!", lineno)
else:
# All is well
store_business(business)
Storing the data
You’ll have the store_business() function take your data structure and write it somewhere. Maybe it’ll be another CSV that’s better structured, maybe multiple CSVs, a JSON file, or you can make use of SQLite relational database facilities since Python has it built-in.
It all depends on what you want to do later.
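For instance, a minimal store_business() that appends each record to a JSON-lines file could be as small as this (the filename is arbitrary):
import json

def store_business(business):
    """Append one deserialized business as a JSON line (one possible storage choice)."""
    with open("businesses.jsonl", "a", encoding="utf-8") as out:
        out.write(json.dumps(business) + "\n")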
Relational example
In this case your data would be split across multiple tables. (I'm using the word "table", but each one could just as well be a CSV file or a table in an SQLite database.)
Table identifying all possible business headings:
business heading ID, name
1, Abattoirs
2, Exporters
3, Food Delivery
4, Butchers Retail
5, Meat Dealers-Retail
6, Meat Freezer
7, Meat Packers
Table identifying all possible categories:
category ID, parent category, name
1, NULL, "Agriculture, fishing & Forestry"
2, 1, "Farming equipment & services"
3, 2, "Abattoirs in Pakistan"
Table identifying businesses:
business ID, name, phone, owner, type, yoe, category
1, Meat One, +92-21-111163281, Al Shaheer Corporation, Retailers, 2008, 3
Table describing their outlets:
business ID, city, address, landmarks, phone
1, Karachi UAN, "Shop 13, Ground Floor, Plot 14-D, Sky Garden, Main Tipu Sultan Road, KDA Scheme No.1, Karachi", "Nadra Chowrangi, Sky Garden, Tipu Sultan Road", +92-21-111163281
1, Karachi UAN, "Near Jan's Broast, Boat Basin, Khayaban-e-Roomi, Block 5, Clifton, Karachi", "Boat Basin, Jans Broast, Khayaban-e-Roomi", +92-21-111163281
Table describing their headings:
business ID, business heading ID
1, 1
1, 2
1, 3
…
Handling all this would require a more complex store_business() function. It may be worth looking into SQLite and an ORM framework if you go the relational way of keeping the data.
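A rough sketch of what the SQLite variant of store_business() might look like (schema and table names illustrative):
import sqlite3

conn = sqlite3.connect("directory.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS business (
        id INTEGER PRIMARY KEY, name TEXT, phone TEXT, owner TEXT,
        type TEXT, yoe TEXT, category_id INTEGER);
    CREATE TABLE IF NOT EXISTS heading (
        id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE IF NOT EXISTS business_heading (
        business_id INTEGER, heading_id INTEGER);
""")

def store_business(business):
    """Insert one deserialized business and link it to its headings (illustrative schema)."""
    cur = conn.execute(
        "INSERT INTO business (name, phone, owner, type, yoe) VALUES (?, ?, ?, ?, ?)",
        (business['name'], business['phone'], business['owner'],
         business['btype'], business['yoe']))
    business_id = cur.lastrowid
    for heading in business['headings']:
        conn.execute("INSERT OR IGNORE INTO heading (name) VALUES (?)", (heading,))
        heading_id = conn.execute(
            "SELECT id FROM heading WHERE name = ?", (heading,)).fetchone()[0]
        conn.execute(
            "INSERT INTO business_heading (business_id, heading_id) VALUES (?, ?)",
            (business_id, heading_id))
    conn.commit()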

You can split the cleaned string into a list and unpack it into print(). Replace the line:
print(listt[0])
with:
print(*[h.strip(" '") for h in listt[0].split(',')], sep='\n')
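Putting it together with your reading loop, something like this should print each heading on its own line (assuming the headings stay in column index 5):
import csv

with open('phonebookCOMPK-Directory.csv', newline='') as textfile:
    for row in csv.reader(textfile):
        raw = row[5].strip("[]")                      # drop the surrounding brackets
        headings = [h.strip(" '") for h in raw.split(',')]
        print(*headings, sep='\n')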

Related

Need help in aligning the content in python for self automation

I am trying to create an anime series search using the tool anilistpython, but I am not able to ignore the newline characters in the plot tag, and I need help aligning the output in a proper view format.
Tried code :
from AnilistPython import Anilist
import pandas as pd
import re
# db access online
anilist = Anilist()
# User input
ani_search = anilist.get_anime(input('Enter the Anime Name\t:\t'), manual_select=True)
df = ani_search
# for Genres split
cate = []
for gen in df['genres']:
    cate.append(gen)
cate1 = (' , '.join(cate))
# for Checking Episode
if df['airing_status'] == 'RELEASING':
    print('Ongoing')
    x = 'Ongoing'
    y = df['next_airing_ep']
    print(y['episode'])
    y1 = y['episode']
elif df['airing_status'] == 'FINISHED':
    print('Ended')
    x = 'Ended'
    y = df['airing_episodes']
    print(y)
    y1 = y
else:
    print('None')
# print other details
print(f"\nTitle_Name\t:\t{df['name_english']}\nRomji_Title\t:\t{df['name_romaji']}\nPlot\t:\t{re.split('<br>', df['desc'])}\nAiring_Format\t:\t{df['airing_format']}\nStatus\t:\t{x}\nEpisodes_Count\t:\t{y1}\nGenres\t:\t{cate1}\nRating\t:\t{df['average_score']}/100\n")
The output it generated :
Enter the Anime Name : Bleach
1. BLEACH
2. BEACH
3. Akkanbee da
Please select the anime that you are searching for in number: 1
Title_Name : Bleach
Romji_Title : BLEACH
Plot : ["Ichigo Kurosaki is a rather normal high school student apart from the fact he has the ability to see ghosts. This ability never impacted his life in a major way until the day he encounters the Shinigami Kuchiki Rukia, who saves him and his family's lives from a Hollow, a corrupt spirit that devours human souls. \n", '', '\nWounded during the fight against the Hollow, Rukia chooses the only option available to defeat the monster and passes her Shinigami powers to Ichigo. Now forced to act as a substitute until Rukia recovers, Ichigo hunts down the Hollows that plague his town. \n\n\n']
Airing_Format : TV
Status : Ended
Episodes_Count : 366
Genres : Action , Adventure , Supernatural
Rating : 76/100
I am looking for the format to look like this:
Title_Name : Bleach
Romji_Title : BLEACH
Plot : Ichigo Kurosaki is a rather normal high school student apart from the fact he has the ability to see ghosts. This ability never impacted his life in a major way until the day he encounters the Shinigami Kuchiki Rukia, who saves him and his family's lives from a Hollow, a corrupt spirit that devours human souls. Wounded during the fight against the Hollow, Rukia chooses the only option available to defeat the monster and passes her Shinigami powers to Ichigo. Now forced to act as a substitute until Rukia recovers, Ichigo hunts down the Hollows that plague his town.
Airing_Format : TV
Status : Ended
Episodes_Count : 366
Genres : Action , Adventure , Supernatural
Rating : 76/100
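In case it helps, one way to get the plot onto a single line is to join the pieces returned by re.split and collapse the whitespace; a minimal sketch built on the variables from the code above:
import re

# join the <br> fragments, drop empty pieces, then collapse stray whitespace
plot = ' '.join(part.strip() for part in re.split('<br>', df['desc']) if part.strip())
plot = re.sub(r'\s+', ' ', plot)
print(f"Plot\t:\t{plot}")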

Best way to handle element of dict that has multiple key/value pairs inside it

[{'id': 2, 'Registered Address': 'Line 1: 1 Any Street Line 2: Any locale City: Any City Region / State: Any Region Postcode / Zip code: BA2 2SA Country: GB Jurisdiction: Any Jurisdiction'}]
I have the above read into a dataframe, and that is the output so far. The issue is that I need to break out the individual elements; because of place names etc., the values may or may not contain spaces. Looking at the above, my keys are Line 1, Line 2, City, Region / State, Postcode / Zip, Country, Jurisdiction.
The output required for the "Registered Address" key is the keys and values:
"Line 1": "1 Any Street"
"Line 2": "Any locale"
"City": "Any City"
"Region / State": "Any Region"
"Postcode / Zip code": "BA2 2SA"
"Country": "GB"
"Jurisdiction": "Any Jurisdiction"
Just struggling to find a way to get to the end result. I have tried to pop the element out and use urllib.parse but fell short - is anyone able to point me in the best direction please?
I tried to write code that generalizes your question, but there were some limitations regarding your data format. Anyway, I would do this:
def address_spliter(my_data, my_keys):
    address_data = my_data[0]['Registered Address']
    key_address = {}
    for i, k in enumerate(my_keys):
        print(k)
        if k == 'Jurisdiction:':
            key_address[k] = address_data.split('Jurisdiction:')[1].removeprefix(' ').removesuffix(' ')
        else:
            key_address[k] = address_data.split(k)[1].split(my_keys[i + 1])[0].removeprefix(' ').removesuffix(' ')
    return key_address
where you can call this function like this:
my_data = [{'id': 2, 'Registered Address': 'Line 1: 1 Any Street Line 2: Any locale City: Any City Region / State: Any Region Postcode / Zip code: BA2 2SA Country: GB Jurisdiction: Any Jurisdiction'}]
and
my_keys = ['Line 1:', 'Line 2:', 'City:', 'Region / State:', 'Postcode / Zip code:', 'Country:', 'Jurisdiction:']
As you can see, it'll only work if the sequence of keys is not changed. But you can work around this idea and adapt it to your problem if it doesn't go as expected.
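Calling it with the inputs above should then give something like this (note that removeprefix/removesuffix need Python 3.9+):
result = address_spliter(my_data, my_keys)
# {'Line 1:': '1 Any Street', 'Line 2:': 'Any locale', 'City:': 'Any City',
#  'Region / State:': 'Any Region', 'Postcode / Zip code:': 'BA2 2SA',
#  'Country:': 'GB', 'Jurisdiction:': 'Any Jurisdiction'}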

Need to get actor name out of the Json file

I want to get the actor name out of this JSON file's page_title and then match it against a database. I tried using nltk and spacy, but there I have to train data. Do I have to train for each and every sentence? I have more than 100k sentences; if I sit down to train the data it will take a month or more. Is there any way that I can dump the K_actor database to train spacy, nltk, or any other way?
{"page_title": "Sonakshi Sinha To Auction Sketch Of Buddha To Help Migrant Labourers", "description": "Sonakshi Sinha took to Instagram to share a timelapse video of a sketch of Buddha that she made to auction to raise funds for migrant workers affected by Covid-19 crisis. ", "image_url": "https://images.news18.com/ibnlive/uploads/2020/05/1589815261_1589815196489_copy_875x583.jpg", "post_url": "https://www.news18.com/news/movies/sonakshi-sinha-to-auction-sketch-of-buddha-to-help-migrant-labourers-2626123.html"}
{"page_title": "Anushka Sharma Calls Virat Kohli 'A Liar' on IG Live, Nushrat Bharucha Gets Propositioned on Twitter", "description": "In an Instagram live interaction with Sunil Chhetri, Virat Kohli was left embarrassed after Anushka Sharma called him a 'jhootha' from behind the camera. This and more in today's wrap.", "image_url": "https://images.news18.com/ibnlive/uploads/2020/05/1589813980_1589813933996_copy_875x583.jpg", "post_url": "https://www.news18.com/news/movies/anushka-sharma-calls-virat-kohli-a-liar-on-ig-live-nushrat-bharucha-gets-propositioned-on-twitter-2626093.html"}
{"page_title": "Ranveer Singh Shares a Throwback to the Days When WWF was His Life", "description": "Ranveer Singh shared a throwback picture from his childhood where he could be seen posing in front of a poster of WWE legend Hulk Hogan.", "image_url": "https://images.news18.com/ibnlive/uploads/2020/05/1589812401_screenshot_20200518-195906_chrome_copy_875x583.jpg", "post_url": "https://www.news18.com/news/movies/ranveer-singh-shares-a-throwback-to-the-days-when-wwf-was-his-life-2626067.html"}
{"page_title": "Salman Khan's Love Song 'Tere Bina' Gets 26 Million Views", "description": "Salman Khan's song Tere Bina, which was launched a few days ago, had garnered 12 million views within 24 hours. As it continues to trend, it has garnered 26 million views in less than a week.", "image_url": "https://images.news18.com/ibnlive/uploads/2020/05/1589099778_screenshot_20200510-135934_chrome_copy_875x583.jpg", "post_url": "https://www.news18.com/news/movies/salman-khans-love-song-tere-bina-gets-26-million-views-2626077.html"}
{"page_title": "Yash And Radhika Pandit Pose With Their Kids For a Perfect Family Picture", "description": "Kannada actor Yash tied the knot with actress Radhika Pandit in 2016. The couple shares two kids together.", "image_url": "https://images.news18.com/ibnlive/uploads/2020/05/1589812187_yash.jpg", "post_url": "https://www.news18.com/news/movies/yash-and-radhika-pandit-pose-with-their-kids-for-a-perfect-family-picture-2626055.html"}
{"page_title": "Malaika Arora Shares Beach Vacay Boomerang With Hopeful Note", "description": "Malaika Arora shared a throwback boomerang from a beach vacation where she could be seen playfully spinning. She also shared a hopeful message along with it.", "image_url": "https://images.news18.com/ibnlive/uploads/2020/05/1589810291_screenshot_20200518-192603_chrome_copy_875x583.jpg", "post_url": "https://www.news18.com/news/movies/malaika-arora-shares-beach-vacay-boomerang-with-hopeful-note-2626019.html"}
{"page_title": "Actor Nawazuddin Siddiqui's Wife Aaliya Sends Legal Notice To Him Demanding Divorce, Maintenance", "description": "The notice was sent to the ", "image_url": "https://images.news18.com/ibnlive/uploads/2019/10/Nawazuddin-Siddiqui.jpg", "post_url": "https://www.news18.com/news/movies/actor-nawazuddin-siddiquis-wife-aaliya-sends-legal-notice-to-him-demanding-divorce-maintenance-2626035.html"}
{"page_title": "Lisa Haydon Celebrates Son Zack\u2019s 3rd Birthday With Homemade Cake And 'Spiderman' Surprise", "description": "Lisa Haydon took to Instagram to share some glimpses from the special day. In the pictures, we can spot a man wearing a Spiderman costume.", "image_url": "https://images.news18.com/ibnlive/uploads/2020/05/1589807960_lisa-rey.jpg", "post_url": "https://www.news18.com/news/movies/lisa-haydon-celebrates-son-zacks-3rd-birthday-with-homemade-cake-and-spiderman-surprise-2625953.html"}
{"page_title": "Chiranjeevi Recreates Old Picture with Wife, Says 'Time Has Changed'", "description": "Chiranjeevi was last seen in historical-drama Sye Raa Narasimha Reddy. He was shooting for his next film, Acharya, before the coronavirus lockdown.", "image_url": "https://images.news18.com/ibnlive/uploads/2020/05/1589808242_pjimage.jpg", "post_url": "https://www.news18.com/news/movies/chiranjeevi-recreates-old-picture-with-wife-says-time-has-changed-2625973.html"}
{"page_title": "Amitabh Bachchan, Rishi Kapoor\u2019s Pout Selfie Recreated By Abhishek, Ranbir is Priceless", "description": "A throwback picture that has gone viral on the internet shows Ranbir Kapoor and Abhishek Bachchan recreating a selfie of their fathers Rishi Kapoor and Amitabh Bachchan.", "image_url": "https://images.news18.com/ibnlive/uploads/2020/05/1589807772_screenshot_20200518-184521_chrome_copy_875x583.jpg", "post_url": "https://www.news18.com/news/movies/amitabh-bachchan-rishi-kapoors-pout-selfie-recreated-by-abhishek-ranbir-is-priceless-2625867.html"}
Something that you can do is create an annotator script wherein you replace actor names with '###' or some other placeholder string (which will be replaced later with actor names (entities) for training).
I trained 68K sentences in 9 hrs on my i3 laptop. You can dump data like this, and the output file can be used for training the model.
That will save time and also give you a ready-made training data format for spaCy.
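For reference, the tuples this script writes out follow the (text, {'entities': [(start, end, label)]}) shape that spaCy's v2-style NER training expects, e.g. (illustrative values):
('sonakshi sinha to auction sketch of buddha to help migrant labourers',
 {'entities': [(0, 14, 'ACTOR_NAMES')]})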
from nltk import word_tokenize
from pandas import read_csv
import re
import os.path
def annot(Label, entity, textlist):
    finaldict = []
    for text_token in textlist:
        textbk = text_token
        for value in entity:
            # if entity has multi tokens
            text = textbk
            text = text_token
            text = str(text).replace('###', value)
            text = text.lower()
            text = re.sub(r'[^a-zA-Z0-9\n.]', ' ', text)
            if len(word_tokenize(value)) < 2:
                # print('I am here')
                newtext = word_tokenize(text)
                traindata = []
                prev_length = 0
                prev_pos = 0
                k = 0
                while k != len(newtext):
                    if k == 0:
                        prev_pos = 0
                        prev_length = len(newtext[k])
                        if value.lower() == str(newtext[k]):
                            ent = Label
                            tup = (prev_pos, prev_length, ent)
                            traindata.append(tup)
                        else:
                            pass
                    else:
                        prev_pos = prev_length + 1
                        prev_length = prev_length + len(newtext[k]) + 1
                        if value.lower() == str(newtext[k]):
                            ent = Label
                            tup = (prev_pos, prev_length, ent)
                            traindata.append(tup)
                        else:
                            pass
                    k = k + 1
                mydict = {'entities': traindata}
                finaldict.append((text, mydict))
            else:
                traindata = []
                try:
                    begin = text.index(value.lower())
                    ent = Label
                    tup = (begin, len(value.lower()), ent)
                    traindata.append(tup)
                except ValueError:
                    pass
                mydict = {'entities': traindata}
                finaldict.append((text, mydict))
    return finaldict

def getEntities(csv_file, column):
    df = read_csv(csv_file)
    return df[column].to_list()

def getSentences(file_name):
    with open(file_name) as file1:
        sentences = [line1.rstrip('\n') for line1 in file1]
    return sentences

def saveData(data, filename, path):
    filename = os.path.join(path, filename)
    with open(filename, 'a') as file:
        for sent in data:
            file.write("{}\n".format(sent))

ents = getEntities(csv_file, column_name)  # Actor names in your case
entities = [ent for ent in ents if str(ent) != 'nan']
sentences = getSentences(filepathandname)  # Considering you have the sentences in a text file
label = 'ACTOR_NAMES'
data = annot(label, entities, sentences)
saveData(data, 'train_data.txt', path)
Hope this is a relevant answer for your question.

How to parse complicated CSV file

I received a CSV file that includes a combination of string and tuple elements and cannot find a way to parse it properly. Am I missing something obvious?
csvfile
"presentation_id","presentation_name","sectionId","sectionNumber","courseId","courseIdentifier","courseName","activity_id","activity_prompt","activity_content","solution","event_timestamp","answer_id","answer","isCorrect","userid","firstname","lastname","email","role"
"26cc7957-5a6b-4bde-a996-dd823f54ece7","3-Axial Skeleton F18","937c47b0-cc66-4938-81de-1b1b58388499","001","3b5b5e49-1798-4eab-86d7-186cf59149b4","MOVESCI 230","Human Musculoskeletal Anatomy","62d059e8-9ab4-41d4-9eb8-00ba67d9fac9","A blow to which side of the knee might tear the medial collateral ligament?","{"choices":["medial","lateral"],"type":"MultipleChoice"}","{"solution":[1],"selectAll":false,"type":"MultipleChoice"}","2018-09-30 23:54:16.000","7b5048e5-7460-49f8-a64a-763b7f62d771","{"solution":[1],"type":"MultipleChoice"}","1","57ba970d-d02b-4a10-a64d-56f02336ee08","Student","One","student1#example.com","Student"
"26cc7957-5a6b-4bde-a996-dd823f54ece7","3-Axial Skeleton F18","937c47b0-cc66-4938-81de-1b1b58388499","001","3b5b5e49-1798-4eab-86d7-186cf59149b4","MOVESCI 230","Human Musculoskeletal Anatomy","f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2","What is the name of this movement?","{"choices":["right rotation","left rotation","right lateral rotation","left lateral rotation"],"type":"MultipleChoice"}","{"solution":[1],"selectAll":false,"type":"MultipleChoice"}","2018-09-30 23:20:33.000","d6cce4d9-37ae-409e-afc5-54ad79f86226","{"solution":[3],"type":"MultipleChoice"}","0","921d1b9b-f550-4289-89f1-2a805b27eeb3","Student","Two","student2#example.com","Student"
where 1st row is titles, 2nd starts the data
with open(filepathcsv) as csvfile:
    readCSV = csv.reader(csvfile)
    for row in readCSV:
        numcolumns = len(row)
        print(numcolumns, ": ", row)
yields:
20 : ['presentation_id', 'presentation_name', 'sectionId', 'sectionNumber', 'courseId', 'courseIdentifier', 'courseName', 'activity_id', 'activity_prompt', 'activity_content', 'solution', 'event_timestamp', 'answer_id', 'answer', 'isCorrect', 'userid', 'firstname', 'lastname', 'email', 'role']
25 : ['26cc7957-5a6b-4bde-a996-dd823f54ece7', '3-Axial Skeleton F18', '937c47b0-cc66-4938-81de-1b1b58388499', '001', '3b5b5e49-1798-4eab-86d7-186cf59149b4', 'MOVESCI 230', 'Human Musculoskeletal Anatomy', '62d059e8-9ab4-41d4-9eb8-00ba67d9fac9', 'A blow to which side of the knee might tear the medial collateral ligament?', '{choices":["medial"', 'lateral]', 'type:"MultipleChoice"}"', '{solution":[1]', 'selectAll:false', 'type:"MultipleChoice"}"', '2018-09-30 23:54:16.000', '7b5048e5-7460-49f8-a64a-763b7f62d771', '{solution":[1]', 'type:"MultipleChoice"}"', '1', '57ba970d-d02b-4a10-a64d-56f02336ee08', 'William', 'Muter', 'wmuter#umich.edu', 'Student']
27 : ['26cc7957-5a6b-4bde-a996-dd823f54ece7', '3-Axial Skeleton F18', '937c47b0-cc66-4938-81de-1b1b58388499', '001', '3b5b5e49-1798-4eab-86d7-186cf59149b4', 'MOVESCI 230', 'Human Musculoskeletal Anatomy', 'f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2', 'What is the name of this movement?', '{choices":["right rotation"', 'left rotation', 'right lateral rotation', 'left lateral rotation]', 'type:"MultipleChoice"}"', '{solution":[1]', 'selectAll:false', 'type:"MultipleChoice"}"', '2018-09-30 23:20:33.000', 'd6cce4d9-37ae-409e-afc5-54ad79f86226', '{solution":[3]', 'type:"MultipleChoice"}"', '0', '921d1b9b-f550-4289-89f1-2a805b27eeb3', 'Noah', 'Willett', 'willettn#umich.edu', 'Student']
csv.reader is parsing each row differently because of the complicated structure with embedded curly-braced elements, but I expect 20 elements in each row.
The problem is in the records, not the code. Your code works fine. To solve the problem you need to fix the csv file, because the fields with JSON content weren't serialised correctly.
Just double each embedded quote sign (change " to "") to escape it.
Here is an example of a fixed csv row.
"26cc7957-5a6b-4bde-a996-dd823f54ece7","3-Axial Skeleton F18","937c47b0-cc66-4938-81de-1b1b58388499","001","3b5b5e49-1798-4eab-86d7-186cf59149b4","MOVESCI 230","Human Musculoskeletal Anatomy","f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2","What is the name of this movement?","{""choices"":[""right rotation"",""left rotation"",""right lateral rotation"",""left lateral rotation""],""type"":""MultipleChoice""}","{""solution"":[1],""selectAll"":false,""type"":""MultipleChoice""}","2018-09-30 23:20:33.000","d6cce4d9-37ae-409e-afc5-54ad79f86226","{""solution"":[3],""type"":""MultipleChoice""}","0","921d1b9b-f550-4289-89f1-2a805b27eeb3","Student","Two","student2#example.com","Student"
And the result of your code after fix:
20 : ['26cc7957-5a6b-4bde-a996-dd823f54ece7', '3-Axial Skeleton F18', '937c47b0-cc66-4938-81de-1b1b58388499', '001', '3b5b5e49-1798-4eab-86d7-186cf59149b4', 'MOVESCI 230', 'Human Musculoskeletal Anatomy', 'f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2', 'What is the name of this movement?', '{"choices":["right rotation","left rotation","right lateral rotation","left lateral rotation"],"type":"MultipleChoice"}', '{"solution":[1],"selectAll":false,"type":"MultipleChoice"}', '2018-09-30 23:20:33.000', 'd6cce4d9-37ae-409e-afc5-54ad79f86226', '{"solution":[3],"type":"MultipleChoice"}', '0', '921d1b9b-f550-4289-89f1-2a805b27eeb3', 'Student', 'Two', 'student2#example.com', 'Student']
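For what it's worth, this doubled-quote escaping is exactly what Python's own csv.writer produces, so regenerating the export through it (or fixing the upstream serialiser) yields well-formed rows. A small illustration:
import csv, io, json

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(
    ["abc", json.dumps({"choices": ["medial", "lateral"], "type": "MultipleChoice"})])
print(buf.getvalue())
# "abc","{""choices"": [""medial"", ""lateral""], ""type"": ""MultipleChoice""}"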
Thank you all for your suggestions!
Also, my apologies, as I did not include the raw CSV file I was trying to parse (example here:)
"b5ae18d3-b6dd-4d0a-84fe-7c43df472571"|"Climate_Rapid_Change_W18.pdf"|"18563b1e-a467-44b3-aed7-3607a1acd712"|"001"|"c86c8c8d-dca6-41cd-a010-a83e40d93e75"|"CLIMATE 102"|"Extreme Weather"|"278c4561-c834-4343-a770-3f544966f633"|"Which European city is at the same latitude as Ann Arbor?"|"{"choices":["Stockholm, Sweden","Berlin, Germany","London, England","Paris, France","Madrid, Spain"],"type":"MultipleChoice"}"|"{"solution":[4],"selectAll":false,"type":"MultipleChoice"}"|"2019-01-31 22:11:08.000"|"81392cd3-28e9-4e2e-8a33-018104b1f4d1"|"{"solution":[3,4],"type":"MultipleChoice"}"|"0"|"2db10c95-b507-4211-8244-394361148b22"|"Student"|"One"|"student1#umich.edu"|"Student"
"ee73fdaf-a926-4899-b0f7-9b942f1b44ad"|"6-Elbow, Wrist, Hand W19"|"48539109-529e-4359-83b9-2ae81be0532c"|"001"|"3b5b5e49-1798-4eab-86d7-186cf59149b4"|"MOVESCI 230"|"Human Musculoskeletal Anatomy"|"fcd7c673-d944-48c3-8a09-f458e03f8c44"|"What is the name of this movement?"|"{"choices":["first phalangeal joint","first proximal interphalangeal joint","first distal interphalangeal joint","first interphalangeal joint"],"type":"MultipleChoice"}"|"{"solution":[3],"selectAll":false,"type":"MultipleChoice"}"|"2019-01-31 22:07:32.000"|"9016f36c-41f5-4e14-84a9-78eea682c802"|"{"solution":[3],"type":"MultipleChoice"}"|"1"|"7184708d-4dc7-42e0-b1ea-4aca51f00fcd"|"Student"|"Two"|"student2#umich.edu"|"Student"
You are correct that the problem was the form of the CSV file.
I changed readCSV = csv.reader(csvfile) to readCSV = csv.reader(csvfile, delimiter="|", quotechar='|')
I then took the resulting list and removed the extraneous quotation marks from each element.
The rest of the program now works properly.
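For anyone following along, the approach described above looks roughly like this (using quoting=csv.QUOTE_NONE, which should have the same net effect as the quotechar='|' trick):
import csv

with open(filepathcsv) as csvfile:
    readCSV = csv.reader(csvfile, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in readCSV:
        cleaned = [field.strip('"') for field in row]
        print(len(cleaned), ": ", cleaned)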

Specific string sorting [Python 2.7]

I am fairly new to python, and I was trying to sort this string in a certain way (Taken off a database):
6392079|||| 1.0|03/09/2017|PARADIGM REAL-TIME REVEL INSULIN INFUSION PUMP|INSULIN INFUSION PUMP / SENSOR AUGMENTED|MEDTRONIC MINIMED|18000 DEVONSHIRE STREET||NORTHRIDGE|CA|91325||US|91325||MMT-723LNAH|MMT-723LNAH|||0LP|R|01/29/2014|OYC||Y
This is the standard format for these types of strings:
MDR_REPORT_KEY|DEVICE_EVENT_KEY|IMPLANT_FLAG|DATE_REMOVED_FLAG|DEVICE_SEQUENCE_NO|DATE_RECEIVED|BRAND_NAME|GENERIC_NAME|MANUFACTURER_D_NAME|MANUFACTURER_D_ADDRESS_1|MANUFACTURER_D_ADDRESS_2|MANUFACTURER_D_CITY|MANUFACTURER_D_STATE_CODE|MANUFACTURER_D_ZIP_CODE|MANUFACTURER_D_ZIP_CODE_EXT|MANUFACTURER_D_COUNTRY_CODE|MANUFACTURER_D_POSTAL_CODE|EXPIRATION_DATE_OF_DEVICE|MODEL_NUMBER|CATALOG_NUMBER|LOT_NUMBER|OTHER_ID_NUMBER|DEVICE_OPERATOR|DEVICE_AVAILABILITY|DATE_RETURNED_TO_MANUFACTURER|DEVICE_REPORT_PRODUCT_CODE|DEVICE_AGE_TEXT|DEVICE_EVALUATED_BY_MANUFACTUR
Is there any way I can print out this string sorted with the specific datatype next to the value?
For example as an output I would like to have
Report key: 6392079
Device sequence number: 1.0
Date received: 03/09/2017
Brand name: PARADIGM REAL-TIME REVEL INSULIN INFUSION PUMP
etc.etc. with the other values. I think I would need to use the "|" as a divider to separate the data, but I'm not sure how to. I also cannot use sorting with the index number, because there are many variations of the string above which are all different lengths.
Also as you can see in the string some of the data such as device_event_key, implant_flag, date_removed_flag, and device_sequence number are absent, but there are still corresponding empty vertical slashes.
Any help would be greatly appreciated, thanks.
@nsortur, you can try the code below to get the output.
I have used list comprehension, the zip() function, and the split() and join() methods defined on string objects.
You can try running the code online at http://rextester.com/MBDXB29573 (it works with both Python 2 and Python 3).
string1 = "6392079|||| 1.0|03/09/2017|PARADIGM REAL-TIME REVEL INSULIN INFUSION PUMP|INSULIN INFUSION PUMP / SENSOR AUGMENTED|MEDTRONIC MINIMED|18000 DEVONSHIRE STREET||NORTHRIDGE|CA|91325||US|91325||MMT-723LNAH|MMT-723LNAH|||0LP|R|01/29/2014|OYC||Y"
keys = ["Report key", "Device sequence number","Date received", "Brand name"];
values = [key.strip() for key in string1.split("|") if key.strip()];
output = "\n".join([key + ": " + str(value) for key, value in zip(keys, values)]);
print(output);
Output:
Report key: 6392079
Device sequence number: 1.0
Date received: 03/09/2017
Brand name: PARADIGM REAL-TIME REVEL INSULIN INFUSION PUMP
Use zip to merge the two lists into tuple pairs:
data = '6392079|||| 1.0|03/09/2017|PARADIGM REAL-TIME REVEL INSULIN INFUSION PUMP|INSULIN INFUSION PUMP / SENSOR AUGMENTED|MEDTRONIC MINIMED|18000 DEVONSHIRE STREET||NORTHRIDGE|CA|91325||US|91325||MMT-723LNAH|MMT-723LNAH|||0LP|R|01/29/2014|OYC||Y'
format = 'MDR_REPORT_KEY|DEVICE_EVENT_KEY|IMPLANT_FLAG|DATE_REMOVED_FLAG|DEVICE_SEQUENCE_NO|DATE_RECEIVED|BRAND_NAME|GENERIC_NAME|MANUFACTURER_D_NAME|MANUFACTURER_D_ADDRESS_1|MANUFACTURER_D_ADDRESS_2|MANUFACTURER_D_CITY|MANUFACTURER_D_STATE_CODE|MANUFACTURER_D_ZIP_CODE|MANUFACTURER_D_ZIP_CODE_EXT|MANUFACTURER_D_COUNTRY_CODE|MANUFACTURER_D_POSTAL_CODE|EXPIRATION_DATE_OF_DEVICE|MODEL_NUMBER|CATALOG_NUMBER|LOT_NUMBER|OTHER_ID_NUMBER|DEVICE_OPERATOR|DEVICE_AVAILABILITY|DATE_RETURNED_TO_MANUFACTURER|DEVICE_REPORT_PRODUCT_CODE|DEVICE_AGE_TEXT|DEVICE_EVALUATED_BY_MANUFACTUR'
for label, value in zip(format.split('|'), data.split('|')):
    print("%s: %s" % (label.replace('_', ' ').capitalize(), value))
This outputs:
Mdr report key: 6392079
Device event key:
Implant flag:
Date removed flag:
Device sequence no: 1.0
Date received: 03/09/2017
Brand name: PARADIGM REAL-TIME REVEL INSULIN INFUSION PUMP
Generic name: INSULIN INFUSION PUMP / SENSOR AUGMENTED
Manufacturer d name: MEDTRONIC MINIMED
Manufacturer d address 1: 18000 DEVONSHIRE STREET
Manufacturer d address 2:
Manufacturer d city: NORTHRIDGE
Manufacturer d state code: CA
Manufacturer d zip code: 91325
Manufacturer d zip code ext:
Manufacturer d country code: US
Manufacturer d postal code: 91325
Expiration date of device:
Model number: MMT-723LNAH
Catalog number: MMT-723LNAH
Lot number:
Other id number:
Device operator: 0LP
Device availability: R
Date returned to manufacturer: 01/29/2014
Device report product code: OYC
Device age text:
Device evaluated by manufactur: Y
This can be achieved with the string's split() method: split('|') keeps empty strings for the empty values between two | characters, and the result can then be matched against a dict that maps each attribute name to its index.
query = '6392079|||| 1.0|03/09/2017|PARADIGM REAL-TIME REVEL INSULIN INFUSION PUMP|INSULIN INFUSION PUMP / SENSOR AUGMENTED|MEDTRONIC MINIMED|18000 DEVONSHIRE STREET||NORTHRIDGE|CA|91325||US|91325||MMT-723LNAH|MMT-723LNAH|||0LP|R|01/29/2014|OYC||Y'
def get_detail(str_):
    key_finder = {'Report Key': 0, 'Device Sequence Number': 4, 'Device Recieved': 5, 'Brand Name': 6}
    split_by = str_.split('|')
    print('Report Key : {}'.format(split_by[key_finder['Report Key']]))
    print('Device Seq Num : {}'.format(split_by[key_finder['Device Sequence Number']]))
    print('Device Recieved : {}'.format(split_by[key_finder['Device Recieved']]))
    print('Brand Name : {}'.format(split_by[key_finder['Brand Name']]))
>>> get_detail(query)
Report Key : 6392079
Device Seq Num : 1.0
Device Recieved : 03/09/2017
Brand Name : PARADIGM REAL-TIME REVEL INSULIN INFUSION PUMP
This works because the split string is indexed from 0, so Report Key has its value at index 0 of the split string, and so on for the other values. Each value is looked up via the dict key_finder, which stores the index for each field.
