adding, copying and creating new csv file - python-3.x

I got an assignment to import a CSV file with some fields, and I need to create a new CSV file with different fields that contains the original fields (in a different order).
Original CSV header:
full name,Position,Phone,Email,LinkedIn,Source,Comment
I searched online and this is as far as I got:
import csv
with open("mobileTL.csv", 'r') as csv_file:
    reader = csv.reader(csv_file)
    newcsvdict = {"First name": [], "Middle name": [], "Last name": [], "Email": [], "Creation date": [], "Status": [],
                  "Position": [], "ID/SSN": [], "Source": [], "Source type": [], "Availability": [], "Salary expectations": [],
                  "Phone": [], "Mobile": [], "Street Adress": [], "City": [], "State": [], "Country": [], "Zip": [],
                  "LinkedIn URL": [], "Resume file name": [], "Migration ID": [], "Comment": [], "Comment2": []}
    next(reader)
    for row in reader:
        first = ""
        last = ""
        if row[0] != "":
            first = row[0].split()[0]
            last = row[0].split()[1]
        newcsvdict["First name"].append(first)
        newcsvdict["Last name"].append(last)
        newcsvdict["Phone"].append(row[2])
        newcsvdict["Position"].append(row[1])
        newcsvdict["Email"].append(row[3])
        newcsvdict["Source"].append(row[5])
        newcsvdict["Comment"].append(row[6])
        newcsvdict["LinkedIn URL"].append(row[4])
with open('new.csv', 'w') as csv_file:
    w = csv.DictWriter(csv_file, newcsvdict.keys())
    w.writeheader()
    w.writerows(newcsvdict)
It does create a new file but for some reason only the header is written.

First, only the header is written because the script fails with an error right after writing it:
Traceback (most recent call last):
  File "test.py", line 29, in <module>
    w.writerows(newcsvdict)
  ...
    wrong_fields = rowdict.keys() - self.fieldnames
AttributeError: 'str' object has no attribute 'keys'
You need to learn not to ignore error messages. The cause of the problem is that you used writerows (plural, which expects an iterable of rows) instead of writerow (singular, which expects just one row). Iterating over a dict yields its keys, so writerows treated each key string as a row and failed when it called .keys() on a str. To use writerows, you need to pass a list of dicts like this:
w.writerows([newcsvdict, newcsvdict, newcsvdict])
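The difference between the two calls can be reproduced in a few lines. This is only a minimal sketch: the two-column field list, the sample names, and the write_all helper are made up for illustration, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import io

# Hypothetical, shortened field list and sample rows for illustration only.
fieldnames = ["First name", "Last name"]
rows = [
    {"First name": "Ada", "Last name": "Lovelace"},
    {"First name": "Alan", "Last name": "Turing"},
]


def write_all(rows_to_write):
    """Write a header plus the given rows to an in-memory buffer."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows_to_write)
    return buffer.getvalue()


# A list of dicts: one output line per dict.
good_output = write_all(rows)

# A single dict: iterating it yields key strings, which are not rows.
try:
    write_all(rows[0])
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

print(good_output)
print(error_name)
```

Passing a single dict fails after the header has already been written, which matches the symptom of a file that contains only the header.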
You should be using writerow, since you only have one row, newcsvdict. However, when I went ahead and did that, the output did not look like what you need:
First name,Middle name,Last name,Email,Creation date,Status,Position,ID/SSN,Source,Source type,Availability,Salary expectations,Phone,Mobile,Street Adress,City,State,Country,Zip,LinkedIn URL,Resume file name,Migration ID,Comment,Comment2
"['aaa', 'bbb', 'ccc']",[],"['AAA', 'BBB', 'CCC']","['aaa#email.com', 'bbb#email.com', 'ccc#email.com']",[],[],"['Pos1', 'Pos2', 'Pos3']",[],"['aaa', 'bbb', 'ccc']",[],[],[],"['123', '456', '789']",[],[],[],[],[],[],"['aaa', 'bbb', 'ccc']",[],[],"['aaa', 'bbb', 'ccc']",[]
That looks weird because you created a dict with a list for each value (e.g. "First name": []). Maybe that's what you want, but my understanding of your requirement is that the new CSV should have the same number of rows as the old one, just with different columns.
For that, it does not make sense to store the values as lists. One solution is to read one row, create a dict for it, write it with writerow, and repeat those steps for every row. You can also use DictReader to access the values from the old CSV as a dict.
import csv
with open("new.csv", "w", newline="") as new_file:
    new_row = dict.fromkeys([
        "First name", "Middle name", "Last name", "Email",
        "Creation date", "Status", "Position", "ID/SSN",
        "Source", "Source type", "Availability", "Salary expectations",
        "Phone", "Mobile", "Street Adress", "City",
        "State", "Country", "Zip", "LinkedIn URL",
        "Resume file name", "Migration ID", "Comment", "Comment2"
    ])
    writer = csv.DictWriter(new_file, fieldnames=new_row.keys())
    writer.writeheader()
    with open("old.csv", 'r', newline="") as old_file:
        old_csv = csv.DictReader(old_file)
        for row in old_csv:
            # first word is the first name, last word is the last name;
            # this also copes with empty or single-word names
            parts = row["full name"].split()
            first = parts[0] if parts else ""
            last = parts[-1] if len(parts) > 1 else ""
            new_row["First name"] = first
            new_row["Last name"] = last
            # the keys used on row must match the header of old.csv exactly
            new_row["Phone"] = row["Phone"]
            new_row["Position"] = row["Position"]
            new_row["Email"] = row["Email"]
            new_row["Source"] = row["Source"]
            new_row["Comment"] = row["Comment"]
            new_row["LinkedIn URL"] = row["LinkedIn"]
            writer.writerow(new_row)
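Since the new file also has a "Middle name" column, the name handling could be pulled into a small helper. The sketch below is only one possible convention, not something the assignment specifies: split_full_name is a hypothetical helper, and treating everything between the first and last word as the middle name is an assumption.

```python
def split_full_name(full_name):
    """Split a full name into (first, middle, last) on whitespace.

    Assumption: the first word is the first name, the last word is the
    last name, and anything in between is the middle name.
    """
    parts = full_name.split()
    if not parts:
        return "", "", ""
    if len(parts) == 1:
        return parts[0], "", ""
    return parts[0], " ".join(parts[1:-1]), parts[-1]


print(split_full_name("Ada Lovelace"))
print(split_full_name("John Ronald Reuel Tolkien"))
```

Inside the loop, the three values could then be assigned to "First name", "Middle name" and "Last name" before calling writerow.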

Related

handle <NULL> value in JSON array in python

I have an array of JSON objects that arrives from a source I can't control, and sometimes it contains <NULL> values, like this:
[{"name": "HDD", "brand": "Samsung", "price": "$100"},
<NULL>,
{"name": "Mouse", "brand": "Logitech", "price": "$10"}]
Is there any way to handle this in Python? I'm getting a syntax error when reading the value.
I tried to handle it this way:
try:
    products = sorted(products, key=lambda k: k['price'], reverse=True)
except SyntaxError:
    print("Error")
but no luck.
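A try/except around sorted cannot help here: if the array is pasted into the source file, the SyntaxError is raised when the file is compiled, before any of its code runs, and if it is read as text, json.loads raises json.JSONDecodeError instead. One possible approach, sketched below, is to clean the raw text before parsing. It assumes the payload is available as a string, that the token <NULL> never occurs inside a legitimate string value, and that prices always look like "$100"; price_as_number is a hypothetical helper.

```python
import json

# Sample payload as raw text, copied from the question.
raw_text = """[{"name": "HDD", "brand": "Samsung", "price": "$100"},
<NULL>,
{"name": "Mouse", "brand": "Logitech", "price": "$10"}]"""

# Assumption: "<NULL>" only ever appears as a bare placeholder element.
cleaned_text = raw_text.replace("<NULL>", "null")

# JSON null becomes None in Python; drop those entries.
products = [item for item in json.loads(cleaned_text) if item is not None]


def price_as_number(product):
    """Turn a price such as "$100" into 100.0 so sorting is numeric."""
    return float(product["price"].lstrip("$"))


products = sorted(products, key=price_as_number, reverse=True)
print([product["name"] for product in products])
```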

find and replace element in file using python 3.7

I am learning Python, and in the code below I am trying to replace every "id": null, with "id": na,...
There may not always be an id in every block. If c does not have that pattern in it, it should be skipped. Thank you :)
file
{
  "objects": [
    {
      "version": "1",
      "id": null,
      "date": 1,
      ...,
    },
    {
      "version": "1",
      "id": 1xx1,
      "date": 1,
      ...,
    },
desired
{
  "objects": [
    {
      "version": "1",
      "id": na,
      "date": 1,
      ...,
    },
    {
      "version": "1",
      "id": 1xx1,
      "date": 1,
      ...,
    },
python3
import json
with open('file.json',encoding='utf8') as in_file:
    data = json.load(in_file)
    for c in data['objects']: # read all elements in object into c
        if c = "\"id\""\: null,: # check to see if element in c is "id": null,
            data[c] = "\"id\""\: na, # replace c with
JSON null is loaded as None (of type NoneType) in Python, so after json.load() you compare against None rather than null. Because None is falsy, a plain "if not obj['id']:" also works, although it matches other falsy values such as 0 or an empty string too:
import json
with open('file.json', encoding='utf8') as in_file:
    data = json.load(in_file)
    for obj in data['objects']:
        # skip blocks that have no "id" key at all
        if 'id' in obj and not obj['id']:
            obj['id'] = 'na'
    print(data)
"id": na is invalid as JSON, and in a Python dictionary na would have to be a defined variable, so I think you want to use the string form, "id": "na".
Or more concisely use comprehensions:
import json
with open('file.json', encoding='utf8') as in_file:
    print(json.dumps({'objects': [
        {k: 'na' if k == 'id' and not v else v for k, v in entry.items()}
        for entry in json.load(in_file)['objects']
    ]}))
As @Mohamed Fathallah suggests, use json.dump() to write the data to a file, or json.dumps() to display it as a JSON-formatted string.
The error is in the following code, where = is an assignment rather than a comparison and c is a dict rather than a piece of text:
        if c = "\"id\""\: null,: # check to see if element in c is "id": null,
            data[c] = "\"id\""\: na, # replace c with
As @Dan-Dev mentioned, what languages such as C# and C++ call null is None in Python. You can use the code below, and treat it as a supplement to the answer above.
import json
with open('file.json', 'r') as file:
    data = json.load(file)
for obj in data['objects']:
    if obj.get('id') is None:
        obj['id'] = 'na'
# if you want to write back
with open('data.json', 'w') as file:
    json.dump(data, file, indent=2)
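The question says that blocks without an id should be skipped, and the two checks used in the answers behave differently in that case: obj.get('id') is None is also true when the key is missing, so it would add an id to those blocks. A minimal sketch of a check that only touches an explicit null follows; the sample JSON string is made up for illustration, with "1xx1" quoted so that it parses.

```python
import json

# Made-up sample: one null id, one real id, one block without an id.
raw = (
    '{"objects": ['
    '{"version": "1", "id": null, "date": 1}, '
    '{"version": "1", "id": "1xx1", "date": 1}, '
    '{"version": "1", "date": 1}]}'
)

data = json.loads(raw)
for obj in data["objects"]:
    # Replace only an explicit null; leave blocks without "id" untouched.
    if "id" in obj and obj["id"] is None:
        obj["id"] = "na"

result = json.dumps(data)
print(result)
```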

Is there a shorter way of exporting python nested dictionary to CSV?

I have a nested dictionary made up of multiple lists and dictionaries. The "stations" key contains the values that I want to convert to a CSV file, and I am only after certain values. A snippet of the dictionary is below:
data = { "brands": {...},
         "fueltypes": {...},
         "stations": {"items": [
             {
                 "brandid": "",
                 "stationid": "",
                 "brand": "Shell",
                 "code": "2126",
                 "name": "Cumnock General Store",
                 "address": "31 Obley St, CUMNOCK NSW 2867",
                 "location": {
                     "latitude": -32.928744,
                     "longitude": 148.755153
                 },
                 "state": "NSW"
             },
             {
                 "brandid": "",
                 "stationid": "",
                 "brand": "Shell",
                 "code": "2200",
                 "name": "Tea Tree Cafe",
                 "address": "160 Mount Darragh Rd, SOUTH PAMBULA NSW 2549",
                 "location": {
                     "latitude": -36.944277,
                     "longitude": 149.845399
                 },
                 "state": "NSW"
             }....]}}
To obtain certain values under the "stations" key, I created an empty list for each of those values and appended to them accordingly. I then used zip to combine the lists and wrote the result to a CSV file. The code I used is below:
import csv
Station_Code = []
Station_Name = []
Latitude = []
Longitude = []
Address = []
Brand = []
for k,v in data["stations"].items():
    for item in range(len(v)):
        Station_Code.append(v[item]["code"])
        Station_Name.append(v[item]["name"])
        Latitude.append(v[item]["location"]["latitude"])
        Longitude.append(v[item]["location"]["longitude"])
        Address.append(v[item]["address"])
        Brand.append(v[item]["brand"])
        #print(f'{v[item]["code"]} - {v[item]["name"]} - {v[item]["location"]["latitude"]}')
rows = zip(Station_Code, Station_Name, Latitude, Longitude, Address, Brand )
with open("Exported_File.csv", "w") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
Is there a shorter or alternative way of extracting this information?
If you're using pandas, there's a fairly easy way to do this.
import pandas as pd
# Convert dict to a pandas DataFrame
df = pd.DataFrame(data["stations"]["items"])
# 'location' is a dict, so we need to extract the 'latitude' and 'longitude'.
df['latitude'] = df['location'].apply(lambda x: x['latitude'])
df['longitude'] = df['location'].apply(lambda x: x['longitude'])
# Select subset of columns for final csv
df = df[['code', 'name', 'latitude', 'longitude', 'address', 'brand']]
df.to_csv('exported-file.csv', index=False, header=False)
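If pandas is not available, the standard library alone can also shorten the original code. The sketch below is one possible variant, not the only one: station_rows is a hypothetical helper, the data dict is trimmed to the two stations from the question, and the rows are written to an in-memory buffer, which could be swapped for a file opened with newline="".

```python
import csv
import io

# Trimmed copy of the sample data from the question.
data = {"stations": {"items": [
    {"brand": "Shell", "code": "2126", "name": "Cumnock General Store",
     "address": "31 Obley St, CUMNOCK NSW 2867",
     "location": {"latitude": -32.928744, "longitude": 148.755153}},
    {"brand": "Shell", "code": "2200", "name": "Tea Tree Cafe",
     "address": "160 Mount Darragh Rd, SOUTH PAMBULA NSW 2549",
     "location": {"latitude": -36.944277, "longitude": 149.845399}},
]}}


def station_rows(stations):
    """Yield one tuple per station with only the wanted values."""
    for station in stations:
        location = station["location"]
        yield (station["code"], station["name"], location["latitude"],
               location["longitude"], station["address"], station["brand"])


buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(station_rows(data["stations"]["items"]))
csv_text = buffer.getvalue()
print(csv_text)
```

Building each row directly from the station dict avoids the six parallel lists and the zip step.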

Convert CSV to dictionary without using libraries

I have this CSV:
color,property,type,id
red,house,building,02
I'm trying to convert a CSV to a dictionary with the following structure:
{
"0": {"val1": 1, "val2": 2, "val3": 3, ..., "valn": n},
"1": {"val1": 45, "val2": 7, "val3": None, ..., "valn": 68},
}
where val1, val2 and so on are the header names of the columns, and "0" and "1" are the row numbers.
So for CSV content like this:
color,property,type,id
red,house,building,02
blue,department,flat,04
we should have:
{
"0": {"color": "red", "property": "house", "type": "building", ..., "valn": n},
"1": {"color": "blue", "property": "department", "type": "flat", ..., "valn": n},
}
How can I achieve this result without using any library? I'd like to implement it from scratch, without the csv library or the like.
Thank you.
Try this approach:
inp = """color,property,type,id
red,house,building,02
blue,department,flat,04
cyan,,flat,10
"""
# splitlines() does not produce a trailing empty entry for the final newline
lines = inp.splitlines()
colnames = list(map(lambda x: x.strip(), lines[0].split(',')))
lines = lines[1:]
res = {}
for i, line in enumerate(lines):
    # row numbers are used as string keys ("0", "1", ...), empty cells become None
    res[str(i)] = {
        colname: val if val != '' else None
        for colname, val in zip(colnames, map(lambda x: x.strip(), line.split(',')))
    }
print(res)
However, additional features such as type deduction will make the code more complex: you can follow the answers to this question
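As a rough idea of what the type deduction mentioned above could look like without any library, here is a minimal sketch. deduce_type and parse_csv are hypothetical helpers, and the rule of trying int, then float, then falling back to the original text is an assumption; note that it turns an id such as "02" into the number 2, which may not be wanted.

```python
def deduce_type(text):
    """Return None for empty cells, an int or float when the text parses
    as one, and the stripped text otherwise."""
    text = text.strip()
    if text == "":
        return None
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text


def parse_csv(text):
    """Parse simple comma-separated text (no quoted fields) into a dict
    keyed by the row number as a string."""
    lines = [line for line in text.splitlines() if line.strip()]
    colnames = [name.strip() for name in lines[0].split(",")]
    return {
        str(index): {
            name: deduce_type(value)
            for name, value in zip(colnames, line.split(","))
        }
        for index, line in enumerate(lines[1:])
    }


inp = """color,property,type,id
red,house,building,02
blue,department,flat,04
cyan,,flat,10
"""
res = parse_csv(inp)
print(res)
```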

I don't know why the second if block doesn't work?

#!/usr/bin/python
from TwitterSearch import *
import sys
import csv
tso = TwitterSearchOrder() # create a TwitterSearchOrder object
tso.set_keywords(['gmo']) # let's define all words we would like to have a look for
tso.set_language('en') # we want to see English tweets only
tso.set_include_entities(False) # and don't give us all those entity information
max_range = 1 # search range in kilometres
num_results = 500 # minimum results to obtain
outfile = "output.csv"
# create twitter API object
twitter = TwitterSearch(
    access_token = "YOUR_ACCESS_TOKEN",
    access_token_secret = "YOUR_ACCESS_TOKEN_SECRET",
    consumer_key = "YOUR_CONSUMER_KEY",
    consumer_secret = "YOUR_CONSUMER_SECRET"
)
# Create an array of USA states
ustates = [
"AL",
"AK",
"AS",
"AZ",
"AR",
"CA",
"CO",
"CT",
"DE",
"DC",
"FM",
"FL",
"GA",
"GU",
"HI",
"ID",
"IL",
"IN",
"IA",
"KS",
"KY",
"LA",
"ME",
"MH",
"MD",
"MA",
"MI",
"MN",
"MS",
"MO",
"MT",
"NE",
"NV",
"NH",
"NJ",
"NM",
"NY",
"NC",
"ND",
"MP",
"OH",
"OK",
"OR",
"PW",
"PA",
"PR",
"RI",
"SC",
"SD",
"TN",
"TX",
"UT",
"VT",
"VI",
"VA",
"WA",
"WV",
"WI",
"WY",
"USA"
]
def linearSearch(item, obj, start=0):
    for i in range(start, len(obj)):
        if item == obj[i]:
            return True
    return False
# open a file to write (mode "w"), and create a CSV writer object
csvfile = file(outfile, "w")
csvwriter = csv.writer(csvfile)
# add headings to our CSV file
row = [ "user", "text", "place"]
csvwriter.writerow(row)
#-----------------------------------------------------------------------
# the twitter API only allows us to query up to 100 tweets at a time.
# to search for more, we will break our search up into 10 "pages", each
# of which will include 100 matching tweets.
#-----------------------------------------------------------------------
result_count = 0
last_id = None
while result_count < num_results:
    # perform a search based on latitude and longitude
    # twitter API docs: https://dev.twitter.com/docs/api/1/get/search
    query = twitter.search_tweets_iterable(tso)
    for result in query:
        state = 0
        if result["place"]:
            user = result["user"]["screen_name"]
            text = result["text"]
            text = text.encode('utf-8', 'replace')
            place = result["place"]["full_name"]
            state = place.split(",")[1]
        if linearSearch(state,ustates):
            print state
            # now write this row to our CSV file
            row = [ user, text, place ]
            csvwriter.writerow(row)
            result_count += 1
        last_id = result["id"]
    print "got %d results" % result_count
csvfile.close()
I am trying to filter the tweets by my ustates array, but the second if block doesn't seem to work, and I have no idea why. My approach is a linear search: if the item is equal to an item in my array, I write it into the CSV file.
It looks like the problem is leftover whitespace: splitting a place name such as "City, WY" on "," leaves " WY" with a leading space, which never equals "WY". You can use .strip() to remove it:
>>> x=" WY "
>>> x.strip()
'WY'
>>>
Some other tips:
To speed up the membership test in ustates, use a set instead of a list: a set lookup takes constant time on average, while a list needs a linear search.
The preferred way to open a file is with a context manager, which ensures the file is closed at the end of the block or if an error occurs inside it. Also use open instead of file.
With those tips, the code should look like this:
#!/usr/bin/python
... # all the previous stuff
# Create a set of USA states
ustates = {
    "AL", "AK", "AS", "AZ", "AR",
    "CA", "CO", "CT",
    "DE", "DC",
    "FM", "FL",
    "GA", "GU",
    "HI",
    "ID", "IL", "IN", "IA",
    "KS", "KY",
    "LA",
    "ME", "MH", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "MP",
    "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND",
    "OH", "OK", "OR",
    "PW", "PA", "PR",
    "RI",
    "SC", "SD",
    "TN", "TX",
    "UT",
    "VT", "VI", "VA",
    "WA", "WV", "WI", "WY",
    "USA"
} # this arrangement just takes fewer lines while grouping the states alphabetically
# open a file to write (mode "w"), and create a CSV writer object
with open(outfile,"w") as csvfile:
    ... # the rest is the same
    while result_count < num_results:
        # perform a search based on latitude and longitude
        # twitter API docs: https://dev.twitter.com/docs/api/1/get/search
        query = twitter.search_tweets_iterable(tso)
        for result in query:
            state = 0
            if result["place"]:
                ... # all the other stuff
                # strip only when a place was found, so state is a string here
                state = state.strip() #<--- the strip part, add the .upper() if needed or just in case
            if state in ustates:
                ... # all the other stuff
            ... # the rest of stuff
        print "got %d results" % result_count
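The strip and set-membership steps can be tried in isolation, without the Twitter API. The following is a minimal Python 3 sketch: extract_state and is_us_place are hypothetical helpers, the set is deliberately shortened, and the "City, ST" shape of the place name is assumed from the question's use of split(",")[1].

```python
# Shortened set for illustration; the full list is in the code above.
US_STATES = {"CA", "NY", "TX", "WY", "USA"}


def extract_state(full_name):
    """Return the text after the first comma, stripped and upper-cased,
    or None when the place name contains no comma."""
    parts = full_name.split(",")
    if len(parts) < 2:
        return None
    return parts[1].strip().upper()


def is_us_place(full_name):
    """Check the extracted state against the set (constant time on average)."""
    return extract_state(full_name) in US_STATES


print(extract_state("Cheyenne, WY"))
print(is_us_place("Cheyenne, WY"))
print(is_us_place("Paris, France"))
```

Returning None for names without a comma also avoids the IndexError that split(",")[1] would raise on such input.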
