Python3 Pygal Worldmap Data not showing - python-3.x

I'm loading a JSON file into a list and converting it to a dictionary in order to load it into the pygal worldmap. When I print the dictionary the data looks OK (to me), however the map opens (from an SVG file) with no data plotted.
No traceback errors are occurring.
I'm not running the latest version of pygal.
Sample output from dictionary:
{'AF': 9733784, 'AL': 1437590, 'DZ': 92215683, 'AO': 17394550, 'AG': 19061, 'AR': 0}
Code below:
import json
import pygal

# Load the data into a list.
filename = 'test.json'
with open(filename, 'rb') as f:
    sr_data = json.load(f)

# Build the country-code -> exposure dictionary.
sr_exp = {}
for sr_dict in sr_data:
    country_code = sr_dict['CountryCode']
    gross = int(float(sr_dict['Exposed']))
    if country_code:
        sr_exp[country_code] = gross

# Create map.
wm = pygal.Worldmap()
wm.title = 'SR Data'
wm.add('', sr_exp)
wm.render_to_file('sr.svg')

The country codes used as keys in your dictionary need to be in lower case. The simplest fix would be to change the line
sr_exp[country_code] = gross
to
sr_exp[country_code.lower()] = gross
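For clarity, a minimal sketch of the corrected loop (assuming the same test.json structure as above):

sr_exp = {}
for sr_dict in sr_data:
    country_code = sr_dict['CountryCode']
    gross = int(float(sr_dict['Exposed']))
    if country_code:
        # Pygal's Worldmap keys countries by lower-case ISO codes, e.g. 'af', 'dz'.
        sr_exp[country_code.lower()] = gross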

Related

Passing Key,Value into a Function

I want to check a YouTube video's views and keep track of them over time. I wrote a script that works great:
import requests
import re
import pandas as pd
from datetime import datetime
import time
def check_views(link):
    todays_date = datetime.now().strftime('%d-%m')
    now_time = datetime.now().strftime('%H:%M')
    # get the site
    r = requests.get(link)
    text = r.text
    tag = re.compile(r'\d+ views')
    views = re.findall(tag, text)[0]
    # get the digit number of views. It's returned in a list so I need to get that item out
    cleaned_views = re.findall(r'\d+', views)[0]
    print(cleaned_views)
    # append to the df
    df.loc[len(df)] = [todays_date, now_time, int(cleaned_views)]
    #df = df.append([todays_date, now_time, int(cleaned_views)], axis=0)
    df.to_csv('views.csv')
    return df

df = pd.DataFrame(columns=['Date', 'Time', 'Views'])

while True:
    df = check_views('https://www.youtube.com/watch?v=gPHgRp70H8o&t=3s')
    time.sleep(1800)
But now I want to use this function for multiple links. I want a different CSV file for each link. So I made a dictionary:
link_dict = {'link1': 'https://www.youtube.com/watch?v=gPHgRp70H8o&t=3s',
             'link2': 'https://www.youtube.com/watch?v=ZPrAKuOBWzw'}
# this makes it easy for each csv file to be named for the corresponding link
The loop then becomes:
for key, value in link_dict.items():
    df = check_views(value)
That seems to work, passing the value of the dict (the link) into the function. Inside the function, I just made sure to load the correct csv file at the beginning:
# Existing csv files
df = pd.read_csv(k + '.csv')
But then I get an error when I go to append a new row to the df ("cannot set a row with mismatched columns"). I don't get that, since it works just fine in the code written above. This is the part giving me the error:
df.loc[len(df)] = [todays_date, now_time, int(cleaned_views)]
What am I missing here? This dictionary method feels super messy (I only have 2 links I want to check, but rather than just duplicating the function I wanted to experiment more). Any tips? Thanks!
Figured it out! The problem was that I was saving the df as a csv and then trying to read that csv back later. When I saved the csv I didn't use index=False with df.to_csv(), so there was an extra column! When I was just testing with the dictionary, I was reusing the df in memory, and even though I was saving it to a csv, the script kept using that df to do the actual adding of rows.
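A minimal sketch of that fix, assuming each csv file is named after its dictionary key:

def check_views(key, link):
    # read the existing csv for this link (saved without an index column)
    df = pd.read_csv(key + '.csv')
    # ... scrape the view count and append the row exactly as before ...
    df.to_csv(key + '.csv', index=False)  # index=False avoids the extra column
    return df

for key, value in link_dict.items():
    check_views(key, value)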

For loop into a pandas dataframe

I have the following piece of code and it works, printing out the data as it should. I'm trying (unsuccessfully) to put the results into a dataframe so I can export them to a csv file.
I am looping through a json file and the results are correct; I just need the two columns that print out to go into a dataframe instead of being printed. I took out the code that was causing the error so it will run.
import json
import requests
import re
import pandas as pd

data = {}
df = pd.DataFrame(columns=['subtechnique', 'name'])
df

RE_FOR_SUB_TECHNIQUE = r"(T\d+)\.(\d+)"

r = requests.get('https://raw.githubusercontent.com/mitre/cti/master/enterprise-attack/enterprise-attack.json', verify=False)
data = r.json()
objects = data['objects']

for obj in objects:
    ext_ref = obj.get('external_references', [])
    revoked = obj.get('revoked') or '*****'
    subtechnique = obj.get('x_mitre_is_subtechnique')
    name = obj.get('name')
    for ref in ext_ref:
        ext_id = ref.get('external_id') or ''
        if ext_id:
            re_match = re.match(RE_FOR_SUB_TECHNIQUE, ext_id)
            if re_match:
                technique = re_match.group(1)
                sub_technique = re_match.group(2)
                print('{},{}'.format(technique + '.' + sub_technique, name))
Unless there is an easier way to take the result of each row in the loop and have it appended to a csv file.
Any help is appreciated.
Thanks
In this instance, it's likely easier to just write the csv file directly, rather than go through Pandas:
with open("enterprise_attack.csv", "w") as f:
my_writer = csv.writer(f)
for obj in objects:
ext_ref = obj.get('external_references',[])
revoked = obj.get('revoked') or '*****'
subtechnique = obj.get('x_mitre_is_subtechnique')
name = obj.get('name')
for ref in ext_ref:
ext_id = ref.get('external_id') or ''
if ext_id:
re_match = re.match(RE_FOR_SUB_TECHNIQUE, ext_id)
if re_match:
technique = re_match.group(1)
sub_technique = re_match.group(2)
print('{},{}'.format(technique+'.'+sub_technique, name))
my_writer.writerow([technique+"."+sub_technique, name])
It should be noted that the above will overwrite the output of any previous runs. If you wish to keep the output of multiple runs, change the file mode to "a":
with open("enterprise_attack.csv", "a") as f:

Converting list to dictionary, and tokenizing the key values - possible?

So basically I have a folder of files I'm opening and reading into python.
I want to search these files and count the keywords in each file, to make a dataframe like the attached image.
I have managed to open and read these files into a list, but my problem is as follows:
Edit 1:
I decided to try and import the files as a dictionary instead. It works, but when I try to lower-case the values, I get a 'list' object attribute error - even though in my variable explorer, it's defined as a dictionary.
import os

filenames = os.listdir('.')
file_dict = {}
for file in filenames:
    with open(file) as f:
        items = [i.strip() for i in f.read().split(",")]
    file_dict[file.replace(".txt", "")] = items

def lower_dict(d):
    new_dict = dict((k, v.lower()) for k, v in d.items())
    return new_dict

print(lower_dict(file_dict))
Output:
AttributeError: 'list' object has no attribute 'lower'
Pre-edit post:
1. Each list value doesn't retain the filename key. So I don't have the rows I need.
2. I can't conduct a search of keywords in the list anyway, because it is not tokenized. So I can't count the keywords per file.
Here's my code for opening the files, converting them to lowercase and storing them in a list.
How can I transform this into a dictionary that retains the filename and has tokenized values? Additionally, is it better to somehow import the files and their contents into a dictionary directly? Can I still tokenize and lower-case everything?
import os
import nltk

# create list of filenames to loop over
filenames = os.listdir('.')

# create empty lists for storage
Lcase_content = []
tokenized = []
num = 0

# read files from folder, convert to lower case
for filename in filenames:
    if filename.endswith(".txt"):
        with open(os.path.join('.', filename)) as file:
            content = file.read()
            # convert to lower-case value
            Lcase_content.append(content.lower())
            ## these two lines below don't work - index out of range error
            tokenized[num] = nltk.tokenize.word_tokenize(tokenized[num])
            num = num + 1
You can compute the count of each token with collections.Counter: it takes a list of strings and returns a dictionary-like Counter with each token as a key and its count as the value. Since NLTK's word_tokenize takes a string and returns a list of tokens, to get a dictionary of tokens and their counts you can basically do this:
Counter(nltk.tokenize.word_tokenize(content))
Since you want your file names as the index (first column), make it a nested dictionary, with a file name as the key and another dictionary of tokens and counts as the value, which looks like this:
{'file1.txt': Counter({'cat': 4, 'dog': 0, 'squirrel': 12, 'sea horse': 3}),
 'file2.txt': Counter({'cat': 11, 'dog': 4, 'squirrel': 17, 'sea horse': 0})}
If you are familiar with Pandas, you can convert this dictionary to a Pandas dataframe, which makes it much easier to export the result as a tsv/csv/excel file. Make sure you apply .lower() to your file content and pass orient='index' so that the file names become your index.
import os
import nltk
from collections import Counter
import pandas as pd

result = dict()
filenames = os.listdir('.')
for filename in filenames:
    if filename.endswith(".txt"):
        with open(os.path.join('.', filename)) as file:
            content = file.read().lower()
            result[filename] = Counter(nltk.tokenize.word_tokenize(content))

df = pd.DataFrame.from_dict(result, orient='index').fillna(0)
df['total words'] = df.sum(axis=1)
df.to_csv('words_count.csv', index=True)
Re: your first attempt: since items is a list (see [i.strip() for i in f.read().split(",")]), you can't apply .lower() to it directly.
Re: your second attempt: tokenized is empty because it was initialized as tokenized = []. That's why tokenized[num] = nltk.tokenize.word_tokenize(tokenized[num]) with num = 0 gives you the index out of range error.
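If you want to keep your first, list-based dictionary, a sketch of lower-casing each string inside the list values instead of the list itself:

def lower_dict(d):
    # apply .lower() to every string in each list value
    return {k: [s.lower() for s in v] for k, v in d.items()}

print(lower_dict(file_dict))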

Can't store the scraped results in third and fourth column in a csv file

I've written a script which scrapes the Address and Phone number of certain shops based on Name and Lid. It takes the Name and Lid stored in column A and column B respectively of a csv file. After fetching the result for each search, I expected the parser to put those results in column C and column D respectively, as shown in the second image. At this point I'm stuck: I don't know how to manipulate the third and fourth columns with the reading or writing methods so that the data ends up there. This is what I'm trying now:
import csv
import requests
from lxml import html

Names, Lids = [], []
with open("mytu.csv", "r") as f:
    reader = csv.DictReader(f)
    for line in reader:
        Names.append(line["Name"])
        Lids.append(line["Lid"])

with open("mytu.csv", "r") as f:
    reader = csv.DictReader(f)
    for entry in reader:
        Page = "https://www.yellowpages.com/los-angeles-ca/mip/{}-{}".format(entry["Name"].replace(" ", "-"), entry["Lid"])
        response = requests.get(Page)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//article[contains(@class,"business-card")]')
        for title in titles:
            Address = title.xpath('.//p[@class="address"]/span/text()')[0]
            Contact = title.xpath('.//p[@class="phone"]/text()')[0]
            print(Address, Contact)
(The original post included two screenshots: one of how the csv file looks now, and one of the desired output with the Address and Phone results filled into columns C and D.)
You can do it like this. Create a fresh output csv file whose header is based on the input csv, with the addition of the two columns. When you read a csv row it's available as a dictionary, in this case called entry. You can add the new values to this dictionary from the stuff you've gleaned on the 'net. Then write each newly created row out to file.
import csv
import requests
from lxml import html

with open("mytu.csv", "r") as f, open('new_mytu.csv', 'w', newline='') as g:
    reader = csv.DictReader(f)
    newfieldnames = reader.fieldnames + ['Address', 'Phone']
    writer = csv.DictWriter(g, fieldnames=newfieldnames)
    writer.writeheader()
    for entry in reader:
        Page = "https://www.yellowpages.com/los-angeles-ca/mip/{}-{}".format(entry["Name"].replace(" ", "-"), entry["Lid"])
        response = requests.get(Page)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//article[contains(@class,"business-card")]')
        #~ for title in titles:
        title = titles[0]
        Address = title.xpath('.//p[@class="address"]/span/text()')[0]
        Contact = title.xpath('.//p[@class="phone"]/text()')[0]
        print(Address, Contact)
        new_row = entry
        new_row['Address'] = Address
        new_row['Phone'] = Contact
        writer.writerow(new_row)

Trouble Writing to a new excel file

I'm very new to python and got an assignment asking me to:
Design your own code in the "do something here" part to save the title, id, share count and comment count of each news media site in separate columns of an Excel (.xls) file.
Design your own code to read the share count and comment count from the Excel file created in step 3, and calculate the average share count and comment count of those news media websites.
Here is my current code:
from urllib import request
import json
from pprint import pprint
import xlwt
'''
import xlrd
from xlutils import copy
'''

website_list = [
    'http://www.huffingtonpost.com/',
    'http://www.cnn.com/',
    'https://www.nytimes.com/',
    'http://www.foxnews.com/',
    'http://www.nbcnews.com/'
]  # place your list of website urls, e.g., http://jmu.edu

for website in website_list:
    url_str = 'https://graph.facebook.com/' + website  # create the url for facebook graph api
    response = request.urlopen(url_str)  # read the response into computer
    html_str = response.read().decode("utf-8")  # convert the response into string
    json_data = json.loads(html_str)  # convert the string into json
    pprint(json_data)

book = xlwt.Workbook()
sheet_test = book.add_sheet('keys')
sheet_test.write(0, 0, 'Title')
sheet_test.write(0, 1, 'ID')
sheet_test.write(0, 2, 'Share Count')
sheet_test.write(0, 3, 'Comment Count')

for i in range(0, 5):
    for website in website_list[i]:
        sheet_test.write(i, 0, json_data['og_object']['title'])
        sheet_test.write(i, 1, json_data['id'])
        sheet_test.write(i, 2, json_data['share']['share_count'])
        sheet_test.write(i, 3, json_data['share']['comment_count'])

book.save('C:\\Users\\stinesr\\Downloads\\Excel\\keys.xls')

'''
reading_book = xlrd.open_workbook('C:\\Users\\stinesr\\Downloads\\Excel\\key.xls')
sheet_read = reading_book.sheet_by_name('keys')
num_record = sheet_read.nrows

writing_book = copy(reading_book)
sheet_write = writing_book.get_sheet(0)
print(sheet_write.name)

for i in range(num_record):
    row = sheet_read.row_values(i)
    if i == 0:
        sheet_write.write(0, 4, 'Share Count Average')
        sheet_write.write(0, 5, 'Comment Count Average')
    else:
        sheet_write.write(i, 4, row[2])
        sheet_write.write(i, 5, row[3])

writing_book.save('C:\\Users\\stinesr\\Downloads\\Excel\\keys.xls')
'''
Any and all help is appreciated, thank you.
The traceback error says that in the nested for-loops on lines 40-45 you are attempting to overwrite row 0 from the previous lines. You need to start from row 1, since row 0 already contains the header.
But before that, json_data only keeps the last response; you'll want to create a list of responses and append each response to that list.
You then need only one for-loop at line 40.
In summary:
website_list = [
    'http://www.huffingtonpost.com/',
    'http://www.cnn.com/',
    'https://www.nytimes.com/',
    'http://www.foxnews.com/',
    'http://www.nbcnews.com/'
]  # place your list of website urls, e.g., http://jmu.edu

json_list = []
for website in website_list:
    url_str = 'https://graph.facebook.com/' + website  # create the url for facebook graph api
    response = request.urlopen(url_str)  # read the response into computer
    html_str = response.read().decode("utf-8")  # convert the response into string
    json_data = json.loads(html_str)  # convert the string into json
    json_list.append(json_data)

pprint(json_list)

book = xlwt.Workbook()
sheet_test = book.add_sheet('keys')
sheet_test.write(0, 0, 'Title')
sheet_test.write(0, 1, 'ID')
sheet_test.write(0, 2, 'Share Count')
sheet_test.write(0, 3, 'Comment Count')

for i in range(len(json_list)):
    sheet_test.write(i + 1, 0, json_list[i]['og_object']['title'])
    sheet_test.write(i + 1, 1, json_list[i]['id'])
    sheet_test.write(i + 1, 2, json_list[i]['share']['share_count'])
    sheet_test.write(i + 1, 3, json_list[i]['share']['comment_count'])

book.save('C:\\Users\\stinesr\\Downloads\\Excel\\keys.xls')
This should give you an Excel document with a header row followed by one row per website (a screenshot of the result was attached to the original answer).
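For the second half of the assignment (reading the counts back and averaging them), a minimal sketch using xlrd, assuming the keys.xls layout written above (header in row 0, share count in column 2, comment count in column 3):

import xlrd

reading_book = xlrd.open_workbook('C:\\Users\\stinesr\\Downloads\\Excel\\keys.xls')
sheet_read = reading_book.sheet_by_name('keys')

# skip the header row, then pull the two numeric columns
share_counts = [sheet_read.cell_value(i, 2) for i in range(1, sheet_read.nrows)]
comment_counts = [sheet_read.cell_value(i, 3) for i in range(1, sheet_read.nrows)]

print('Average share count:', sum(share_counts) / len(share_counts))
print('Average comment count:', sum(comment_counts) / len(comment_counts))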
