List all system fonts as dictionary. | python - python-3.x

I want to get all system fonts (inside C:\Windows\Fonts) as a dictionary, since I need to differentiate between bold, italic, etc. However, when listing the contents of the directory via os.listdir or in the terminal, it's not possible (at least in most cases) to tell which file is which font. Furthermore, even if you iterated through all the fonts, you could barely tell whether a file is the 'regular' font or a variant.
Windows lists the folder as follows:
Each of these 'font folders' looks like this (depending on their styles):
Finally, this is what I get via the list command (unreadable and unusable in most cases):
This is the output I wish I could achieve (or something similar):
path = r"C:\Windows\Fonts"
# do some magic
dictionary = {
    'Arial': {'Regular': 'Arial-Regular.ttf', 'Bold': 'Arial-Bold.ttf'},
    'Carlito': {'Regular': '8514fix.fon', 'Bold': 'someweirdotherfile.fon'},
}
The only thing I have managed so far is to get the bare installed font names, not their file names.
So if there is any way to get the contents as a dictionary, or to get the file names of the fonts, please be so kind as to give me a tip :)

I know you posted this 2 years ago, but it so happens that I wrote a piece of code that does something similar to what you're asking for. I've only been programming for 1.5 months (and this is my first answer on Stack Overflow ever), so it can probably be improved in many ways, but maybe it will help someone or give them an idea of how to write it.
from fontTools import ttLib
import os
import json

path = r"C:\Windows\Fonts"
fonts_path = []
for (dirpath, dirnames, filenames) in os.walk(path):
    for i in filenames:
        # Compare case-insensitively: some Windows font files use upper-case extensions.
        if i.lower().endswith(('.ttf', '.otf', '.ttc', '.ttz', '.woff', '.woff2')):
            fonts_path.append(os.path.join(dirpath, i))

def getFont(font, font_path):
    """Return (family, style, path) from the font's 'name' table."""
    x = lambda name_id: font['name'].getDebugName(name_id)
    # Prefer the typographic family/subfamily (name IDs 16/17) and fall back
    # to the legacy family/subfamily (name IDs 1/2) when they are missing.
    if x(16) is None:
        return x(1), x(2), font_path
    return x(16), x(17) or x(2), font_path

fonts = []
for j in fonts_path:
    if not j.lower().endswith('.ttc'):
        fonts.append(getFont(ttLib.TTFont(j), j))
    else:
        # A .ttc collection holds several fonts: open them by index until
        # fontTools raises TTLibError for an index past the last font.
        try:
            for k in range(100):
                fonts.append(getFont(ttLib.TTFont(j, fontNumber=k), j))
        except ttLib.TTLibError:
            pass

# Unique font families, in order of first appearance.
no_dups = []
for i in fonts:
    index_0 = i[0]
    if index_0 not in no_dups:
        no_dups.append(index_0)

fonts_dict = {}
for i in fonts:
    for k in no_dups:
        if i[0] == k:
            fonts_dict[k] = {str(i[1]): str(i[2]).split('\\')[-1]}
            for j in fonts:
                if i[0] == j[0]:
                    fonts_dict[k][j[1]] = j[2].split('\\')[-1]
print(json.dumps(fonts_dict, indent=2))
Here is some sample output (shortened, because otherwise it would be too long):
{
  "CAC Moose PL": {
    "Regular": "CAC Moose PL.otf"
  },
  "Calibri": {
    "Bold Italic": "calibriz.ttf",
    "Regular": "calibri.ttf",
    "Bold": "calibrib.ttf",
    "Italic": "calibrii.ttf",
    "Light": "calibril.ttf",
    "Light Italic": "calibrili.ttf"
  },
  "Cambria": {
    "Bold Italic": "cambriaz.ttf",
    "Regular": "cambria.ttc",
    "Bold": "cambriab.ttf",
    "Italic": "cambriai.ttf"
  },
  "Cambria Math": {
    "Regular": "cambria.ttc"
  },
  "Candara": {
    "Bold Italic": "Candaraz.ttf",
    "Regular": "Candara.ttf",
    "Bold": "Candarab.ttf",
    "Italic": "Candarai.ttf",
    "Light": "Candaral.ttf",
    "Light Italic": "Candarali.ttf"
  },
  "Capibara Mono B PL": {
    "Regular": "Capibara Mono B PL.otf"
  }
}
If you need the full path to each font, the only thing you need to change is to remove .split('\\')[-1] in the last for loop; the output will then look like this:
"Arial": {
  "Black": "C:\\Windows\\Fonts\\ariblk.ttf",
  "Regular": "C:\\Windows\\Fonts\\arial.ttf",
  "Bold": "C:\\Windows\\Fonts\\arialbd.ttf",
  "Bold Italic": "C:\\Windows\\Fonts\\arialbi.ttf",
  "Italic": "C:\\Windows\\Fonts\\ariali.ttf"
}
A postscript: Windows stores fonts in two folders, one for global fonts (installed for all users) and one for per-user fonts. Global fonts are stored in 'C:\Windows\Fonts', while per-user fonts are stored in 'C:\Users\username\AppData\Local\Microsoft\Windows\Fonts', so keep that in mind. You can get the username into a variable with os.getlogin().
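A minimal sketch of scanning both folders, assuming the standard WINDIR and LOCALAPPDATA environment variables point at the Windows directory and the per-user local app data folder. It reads those variables instead of building the path from os.getlogin(), because the profile folder name does not always match the login name. The helper names windows_font_dirs and collect_font_paths are hypothetical.

```python
import os

# Outline-font extensions kept by the answer above (.fon is skipped on purpose).
FONT_EXTENSIONS = ('.ttf', '.otf', '.ttc', '.ttz', '.woff', '.woff2')


def windows_font_dirs():
    """Return the global and per-user font folders.

    Assumes WINDIR and LOCALAPPDATA are set, which is normally the case on Windows;
    any variable that is missing is simply skipped.
    """
    dirs = []
    windir = os.environ.get('WINDIR')
    if windir:
        dirs.append(os.path.join(windir, 'Fonts'))
    local = os.environ.get('LOCALAPPDATA')
    if local:
        dirs.append(os.path.join(local, 'Microsoft', 'Windows', 'Fonts'))
    return dirs


def collect_font_paths(dirs):
    """Walk every folder in dirs and return the full paths of font files."""
    paths = []
    for root in dirs:
        # os.walk yields nothing for a folder that does not exist.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                # Case-insensitive match, since some font files use .TTF etc.
                if name.lower().endswith(FONT_EXTENSIONS):
                    paths.append(os.path.join(dirpath, name))
    return paths
```

With these helpers, fonts_path = collect_font_paths(windows_font_dirs()) could replace the single-folder walk in the answer.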
I decided to ignore .fon fonts because they caused me too many problems.
Some explanation of the code:
fonts_path = []
for (dirpath, dirnames, filenames) in os.walk(path):
    for i in filenames:
        if i.lower().endswith(('.ttf', '.otf', '.ttc', '.ttz', '.woff', '.woff2')):
            fonts_path.append(os.path.join(dirpath, i))
This takes only .ttf, .otf, .ttc, .ttz, .woff and .woff2 fonts from the Windows global fonts folder and builds a list (fonts_path) of their full paths.
def getFont(font, font_path):
    x = lambda name_id: font['name'].getDebugName(name_id)
    if x(16) is None:
        return x(1), x(2), font_path
    return x(16), x(17) or x(2), font_path
This function takes a ttLib.TTFont object (created with ttLib.TTFont(font_path) from the fontTools library) plus the font path, and reads the font's debug names. Debug names come from the font's 'name' table: predefined metadata containing information such as the font name, the font family, etc. You can read about them here: https://learn.microsoft.com/en-us/typography/opentype/spec/name#name-ids. For example, to get the full font name:
from fontTools import ttLib

NameID = 4  # name ID 4 = full font name
font = ttLib.TTFont(font_path)
font_full_name = font['name'].getDebugName(NameID)
print(font_full_name)
Example output: Candara Bold Italic
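Basically, we only need the font family name and the style (subfamily) name. The catch is that name IDs 16 and 17 (typographic family and subfamily) are present only in some fonts, so the family and style are taken from name IDs 16 and 17 when available and from 1 and 2 otherwise. This matters for families with more than the four basic styles: a light weight typically has a legacy family such as "Calibri Light" in name ID 1, but "Calibri" in ID 16 and "Light" in ID 17, which is why "Light" appears under "Calibri" in the sample output. After that, every font is packed into a tuple: (font_family, font_name, font_path)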
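The fallback rule can be checked without opening any font files. This sketch uses a plain dict in place of the 'name' table; family_and_style is a hypothetical helper, and the Calibri Light values below only illustrate how such fonts are typically named. It also applies the spec's rule that a missing ID 17 falls back to ID 2.

```python
def family_and_style(names):
    """Pick (family, style) from a mapping of name ID -> string.

    names is a stand-in for the font's 'name' table, e.g. built with
    {nid: font['name'].getDebugName(nid) for nid in (1, 2, 16, 17)}.
    """
    typographic_family = names.get(16)
    if typographic_family is None:
        # No typographic names: use the legacy family/subfamily (IDs 1 and 2).
        return names.get(1), names.get(2)
    # ID 17 is optional even when ID 16 exists; fall back to ID 2 then.
    return typographic_family, names.get(17) or names.get(2)


print(family_and_style({1: 'Calibri Light', 2: 'Regular', 16: 'Calibri', 17: 'Light'}))
# ('Calibri', 'Light')
```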
fonts = []
for j in fonts_path:
    if not j.lower().endswith('.ttc'):
        fonts.append(getFont(ttLib.TTFont(j), j))
    else:
        try:
            for k in range(100):
                fonts.append(getFont(ttLib.TTFont(j, fontNumber=k), j))
        except ttLib.TTLibError:
            pass
.ttc fonts need special treatment because a .ttc file (TrueType Collection) contains more than one font, so we must specify which font to load. ttLib.TTFont(font_path) therefore needs one more argument, fontNumber: ttLib.TTFont(font_path, fontNumber=font_index). The loop simply tries indices from 0 upwards and stops when opening fails because the index is past the last font in the collection.
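Instead of probing indices until an error occurs, the number of fonts can be read from the collection header. A minimal stdlib sketch, based on the OpenType TTC header layout (tag b'ttcf', majorVersion and minorVersion as uint16, numFonts as uint32, all big-endian); ttc_font_count is a hypothetical helper name.

```python
import struct


def ttc_font_count(path):
    """Return how many fonts a .ttc collection contains, read from its header."""
    with open(path, 'rb') as f:
        header = f.read(12)
    if len(header) < 12:
        raise ValueError(f'{path!r} is too short to be a font collection')
    # '>4sHHI' = big-endian tag, majorVersion, minorVersion, numFonts.
    tag, _major, _minor, num_fonts = struct.unpack('>4sHHI', header)
    if tag != b'ttcf':
        raise ValueError(f'{path!r} is not a font collection (tag {tag!r})')
    return num_fonts
```

The .ttc branch could then loop over range(ttc_font_count(j)) and pass each index as fontNumber.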
After that we have a list full of tuples in the order (font_family, font_name, font_path).
no_dups = []
for i in fonts:
    index_0 = i[0]
    if index_0 not in no_dups:
        no_dups.append(index_0)
This builds a list (no_dups) of all font families (stored at index 0 of each tuple) without duplicates.
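The same deduplication can be written in one line, because dict keys are unique and keep insertion order (Python 3.7+). A sketch; unique_families is a hypothetical name.

```python
def unique_families(fonts):
    """Return the distinct families (index 0 of each tuple), in first-seen order."""
    # dict.fromkeys drops repeated keys but keeps the order of first appearance.
    return list(dict.fromkeys(font[0] for font in fonts))


fonts = [('Calibri', 'Regular', 'calibri.ttf'),
         ('Cambria', 'Regular', 'cambria.ttc'),
         ('Calibri', 'Bold', 'calibrib.ttf')]
print(unique_families(fonts))  # ['Calibri', 'Cambria']
```

This also avoids scanning the growing list with `in` for every font.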
fonts_dict = {}
for i in fonts:
    for k in no_dups:
        if i[0] == k:
            fonts_dict[k] = {str(i[1]): str(i[2]).split('\\')[-1]}
            for j in fonts:
                if i[0] == j[0]:
                    fonts_dict[k][j[1]] = j[2].split('\\')[-1]
This code creates an entry for every font family in the dictionary, and the value of each entry is itself a dictionary mapping style names to file names.
Step by step, for a family such as Cambria:
It starts in the list:
[..., 'Cambria', ...]
A dict entry is created with one value at a time:
{..., "Cambria": {"Bold Italic": "cambriaz.ttf"}, ...}
More keys and values are added to the sub-dict:
{..., "Cambria": {"Bold Italic": "cambriaz.ttf", "Regular": "cambria.ttc"}, ...}
This continues until every font has been assigned to the dictionary.
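The same {family: {style: file}} dictionary can also be built in a single pass with dict.setdefault. This is a sketch rather than the author's code: group_by_family is a hypothetical name, and it uses ntpath.basename so Windows paths are split correctly even when the script runs on another OS.

```python
import ntpath


def group_by_family(fonts, full_path=False):
    """Build {family: {style: file}} from (family, style, path) tuples in one pass.

    Pass full_path=True to keep the whole path instead of just the file name.
    """
    result = {}
    for family, style, font_path in fonts:
        value = font_path if full_path else ntpath.basename(font_path)
        # setdefault creates the inner dict the first time a family is seen.
        result.setdefault(family, {})[style] = value
    return result


fonts = [('Cambria', 'Bold Italic', 'C:\\Windows\\Fonts\\cambriaz.ttf'),
         ('Cambria', 'Regular', 'C:\\Windows\\Fonts\\cambria.ttc'),
         ('Cambria Math', 'Regular', 'C:\\Windows\\Fonts\\cambria.ttc')]
print(group_by_family(fonts))
```

As with the nested loops, if two files share the same family and style, the last one wins.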
And lastly:
print(json.dumps(fonts_dict, indent=2))
This prints the result in a nicely formatted way.
I hope this helps somebody.

Related

Reading images from a PDF and extracting text from them

Problem statement: I have a PDF with n pages, and each page has one image whose text I need to read and then process.
What I tried: I have to do this in Python, and the library that gave me the best results is pytesseract.
Here is the sample code I tried:
fn = kw['fn'] = self.env.context.get('wfg_pg', kw['fn'])
zoom, zoom_config = self.get_zoom_for_doc(index), ' -c tessedit_do_invert=0'
if 3.3 < zoom < 3.5:
    zoom_config += ' --oem 3 --psm 4'
elif 0 != page_number_list[0]:
    zoom_config += ' --psm 6'
full_text, page_length = '', kw['doc'].pageCount
if recursion and index >= 10:
    return fn.get('most_correct') or fn.get(page_number_list[0])
mat = fitz.Matrix(zoom, zoom)  # increase resolution
for page_no in page_number_list:
    page = kw['doc'].loadPage(page_no)  # number of page
    pix = page.getPixmap(matrix=mat)
    with Image.open(io.BytesIO(pix.getImageData())) as img:
        text_of_each_page = str(pytesseract.image_to_string(img, config='%s' % zoom_config)).strip()
    fn[page_no] = text_of_each_page
    full_text = '\n'.join((full_text, text_of_each_page, '\n'))
_logger.critical(f"full text in load immage {full_text}")
args = (full_text, page_number_list)
load = recursion and self.run_recursion_to_load_new_image_to_text(*args, **kw)
if recursion and load:
    return self.load_image
return full_text
The issue: my PDF contains dates like 1/13 and 1/7, which the library reads as 143 and 1n, and in some places it reads 17 as 1). Also, after the text it randomly adds symbols like { & . , = that are not in the PDF at all.
To improve accuracy:
1. I tried converting the image to .tiff format, but it didn't help.
2. I tried adjusting the resolution of the image.
You can use the pdftoppm tool to convert your PDF pages to images really fast; through the pdf2image wrapper you can use multithreading just by passing thread_count=(number of threads).
You can refer to this link for more info on this tool. Also, better-quality images can increase the accuracy of Tesseract.
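A minimal sketch of that pipeline, assuming the third-party pdf2image and pytesseract packages plus the poppler and tesseract binaries are installed. ocr_pdf and tesseract_config are hypothetical helpers; the dpi, thread and page-segmentation values are starting points to tune, and whether tessedit_char_whitelist is honoured depends on your Tesseract version.

```python
def tesseract_config(psm=6, whitelist=None):
    """Build a Tesseract config string.

    --psm 6 treats the image as one uniform block of text; a whitelist
    (for example '0123456789/' for dates) restricts the allowed characters.
    """
    parts = [f'--psm {psm}']
    if whitelist:
        parts.append(f'-c tessedit_char_whitelist={whitelist}')
    return ' '.join(parts)


def ocr_pdf(pdf_path, dpi=300, threads=4, config=None):
    """Render every page with pdftoppm (via pdf2image) and OCR it with pytesseract."""
    # Imported here so tesseract_config stays usable without the OCR stack installed.
    from pdf2image import convert_from_path
    import pytesseract

    # Higher dpi gives Tesseract sharper glyphs; thread_count parallelises pdftoppm.
    pages = convert_from_path(pdf_path, dpi=dpi, thread_count=threads)
    cfg = config if config is not None else tesseract_config()
    return [pytesseract.image_to_string(page, config=cfg).strip() for page in pages]
```

For example, ocr_pdf('scan.pdf', config=tesseract_config(6, '0123456789/')) would try to read only digits and slashes.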

Is there a way to collect data intelligently from a website?

I want to get data from this link: https://meshb.nlm.nih.gov/treeView
The problem is that to get the whole tree, I have to click on + for each line to expand its child nodes,
but I want to display the whole tree with just one click and then copy all of its content.
Any ideas, please?
Well, it all depends on what you mean by "intelligently". Not sure if this meets the criteria, but you might want to try this.
import json
import string
import requests

abc = string.ascii_uppercase
base_url = "https://meshb.nlm.nih.gov/api/tree/children/"
follow_url = "https://meshb.nlm.nih.gov/record/ui?ui="

tree = {}
for letter in abc[:1]:
    res = requests.get(f"{base_url}{letter}").json()
    tree[letter] = {
        "Records": [i["RecordName"] for i in res],
        "FollowURLS": [f"{follow_url}{i['RecordUI']}" for i in res],
    }

print(json.dumps(tree, indent=2))
This prints:
{
  "A": {
    "Records": [
      "Body Regions",
      "Musculoskeletal System",
      "Digestive System",
      "Respiratory System",
      "Urogenital System",
      "Endocrine System",
      "Cardiovascular System",
      "Nervous System",
      "Sense Organs",
      "Tissues",
      "Cells",
      "Fluids and Secretions",
      "Animal Structures",
      "Stomatognathic System",
      "Hemic and Immune Systems",
      "Embryonic Structures",
      "Integumentary System",
      "Plant Structures",
      "Fungal Structures",
      "Bacterial Structures",
      "Viral Structures"
    ],
    "FollowURLS": [
      "https://meshb.nlm.nih.gov/record/ui?ui=D001829",
      "https://meshb.nlm.nih.gov/record/ui?ui=D009141",
      "https://meshb.nlm.nih.gov/record/ui?ui=D004064",
      "https://meshb.nlm.nih.gov/record/ui?ui=D012137",
      "https://meshb.nlm.nih.gov/record/ui?ui=D014566",
      "https://meshb.nlm.nih.gov/record/ui?ui=D004703",
      "https://meshb.nlm.nih.gov/record/ui?ui=D002319",
      "https://meshb.nlm.nih.gov/record/ui?ui=D009420",
      "https://meshb.nlm.nih.gov/record/ui?ui=D012679",
      "https://meshb.nlm.nih.gov/record/ui?ui=D014024",
      "https://meshb.nlm.nih.gov/record/ui?ui=D002477",
      "https://meshb.nlm.nih.gov/record/ui?ui=D005441",
      "https://meshb.nlm.nih.gov/record/ui?ui=D000825",
      "https://meshb.nlm.nih.gov/record/ui?ui=D013284",
      "https://meshb.nlm.nih.gov/record/ui?ui=D006424",
      "https://meshb.nlm.nih.gov/record/ui?ui=D004628",
      "https://meshb.nlm.nih.gov/record/ui?ui=D034582",
      "https://meshb.nlm.nih.gov/record/ui?ui=D018514",
      "https://meshb.nlm.nih.gov/record/ui?ui=D056229",
      "https://meshb.nlm.nih.gov/record/ui?ui=D056226",
      "https://meshb.nlm.nih.gov/record/ui?ui=D056224"
    ]
  }
}
If you want all of it, just remove [:1] from the loop. If there's no entry for a given letter on the page you'll get, well, an empty entry in the dictionary.
Obviously, you can dump the entire response, but that's just a proof of concept.
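To expand the whole tree rather than just the top level, the children endpoint can be called recursively for every node that reports children. A sketch under a loud assumption: that /api/tree/children/ also accepts deeper tree numbers such as "A01", not just single letters. build_tree is a hypothetical helper; it takes the fetch function as a parameter so it can be tested without network access, and it issues one request per expandable node, so a full run is slow.

```python
def build_tree(tree_number, fetch_children):
    """Recursively expand a MeSH tree node into nested dicts.

    fetch_children(tree_number) must return that node's child records, each with
    'RecordName', 'TreeNumber' and 'HasChildren' keys (the fields seen in this thread).
    """
    nodes = []
    for child in fetch_children(tree_number):
        node = {'name': child['RecordName'], 'tree_number': child['TreeNumber']}
        if child.get('HasChildren'):
            # Descend only into nodes that report children, saving requests.
            node['children'] = build_tree(child['TreeNumber'], fetch_children)
        nodes.append(node)
    return nodes


# Example with requests (assumes deeper tree numbers are accepted by the endpoint):
# fetch = lambda tn: requests.get(base_url + tn, timeout=30).json()
# tree = build_tree("A", fetch)
```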
Try this; some parts are a bit tricky, but it manages to give you the whole tree:
import requests as r
import operator
import string

link = 'https://meshb.nlm.nih.gov/api/tree/children/{}'
all_data = []
for i in string.ascii_uppercase:
    all_data.append({'RecordName': i, 'RecordUI': '', 'TreeNumber': i, 'HasChildren': True})
    res = r.get(link.format(i))
    data_json = res.json()
    all_data += data_json

# This request gets all the rest of the data at once, other than A-Z or A..-Z..
# It takes time to load depending on your network: the response is 3 million+ characters
res = r.get(link.format('.*'))
data_json = res.json()
all_data += data_json

# Sorting the data by TreeNumber puts every node right after its parent
all_data.sort(key=operator.itemgetter('TreeNumber'))

# Printing the tree using tabs: "A" gets no indent, "A01" one tab, "A01.047" two, ...
for row in all_data:
    tree_number = row['TreeNumber']
    depth = 0 if len(tree_number) == 1 else len(tree_number.split('.'))
    print('\t' * depth + row['RecordName'])

How to get python to recognize missing value when converting list to dictionary?

I'm working in Python and have tried a number of different variations; this is my latest. I'm trying to convert the users list to a dictionary that looks like this:
{
    "Grae Drake": 98110,
    "Bethany Kok": None,
    "Alex Nussbacher": 94101,
    "Darrell Silver": 11201,
}
It should show each user's name and zip code, but one user is missing a zip code, so I want it to show None where the zip code is missing. Converting isn't the issue; I'm trying to make it more dynamic so that it recognizes the missing zip code and inserts None instead.
users = [["Grae Drake", 98110], ["Bethany Kok"], ["Alex Nussbacher", 94101], ["Darrell Silver", 11201]]

def user_contacts():
    for name, z in users:
        [None if z is None else z for z in users]
    user_dict = dict(users)
    return user_dict
One possible solution:
users = [["Grae Drake", 98110], ["Bethany Kok"], ["Alex Nussbacher", 94101], ["Darrell Silver", 11201]]
d = dict((u + [None])[:2] for u in users)
print(d)
Prints:
{'Grae Drake': 98110, 'Bethany Kok': None, 'Alex Nussbacher': 94101, 'Darrell Silver': 11201}
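An equivalent, more explicit version uses extended unpacking (name, *rest), which may be easier to read than the (u + [None])[:2] trick. A sketch; users_to_dict is a hypothetical name.

```python
def users_to_dict(users):
    """Map each name to its zip code, using None when the zip code is missing."""
    result = {}
    for name, *rest in users:
        # rest is [] for ["Bethany Kok"] and [zip_code] for the others.
        result[name] = rest[0] if rest else None
    return result


users = [["Grae Drake", 98110], ["Bethany Kok"], ["Alex Nussbacher", 94101], ["Darrell Silver", 11201]]
print(users_to_dict(users))
# {'Grae Drake': 98110, 'Bethany Kok': None, 'Alex Nussbacher': 94101, 'Darrell Silver': 11201}
```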

Receiving Error: NameError: name 'size_cost' is not defined

I'm having some issues running my code: I keep getting NameError: name 'size_cost' is not defined.
A little background about my code: I am trying to create a program that runs in the VS Code terminal, asks the user which pizza size they want, and then returns the price associated with that size in a dictionary called size_cost.
I believe the issue is either the location of the dictionary (size_cost) or the class/functions I am trying to create.
Here is the code I am running:
class PizzaOrderingSys:
size_cost = {
'small': 9.75,
'large': 12.23,
'extra large': 13.80,
'party size': 26.50
}
pizza_size_order = []
available_toppings = ['Anchovies', 'Artichoke Hearts', 'Bacon', 'Basil (Fresh)', 'Bell Peppers', 'Black Olives', 'Chicken', 'Extra Cheese',
'Green Chiles', 'Green Olives','Pepperoni', 'Ground Beef', 'Jalapenos', 'Mushrooms']
customer_requested_toppings = []
number_of_toppings = 0
def __init__(self, size, toppings):
self.size = size
self.toppings= toppings
def shop_title():
print("Hello and thank you for choosing The Pizza Pie Place! \nBegin your order by telling us what size of pizza you would like.")
print("After you have chosen your pizza size you will pick your toppings")
return None
def size_order():
print('\n12 inch - Small ($9.75 + Tax), 14 inch- Large ($12.30 + Tax), 16 inch- Extra Large ($13.80 + Tax), 24 inch Party Pizza($26.50 + Tax)')
print('\tWhat size pizza do you want?')
user_size = input('')
print(f'Your {user_size} pizza will cost ${size_cost[user_size] }')
list_pizza = size_cost[user_size]
pizza_size_order.append(list_pizza)
order_1 = shop_title()
order_1= size_order()
So what I am asking is: why do I keep getting this error message? Is it because of where my dictionary is located, or is something wrong with my class/functions, and if so, what?
I am fairly new to the coding world so thought I would start working with some fundamental elements of python.
ANY advice would be greatly appreciated! Thank you!
As ForceBru pointed out, size_cost does not exist inside the function because it is a class variable; inside a method it has to be accessed as self.size_cost (or PizzaOrderingSys.size_cost). However, I don't understand how you are initialising your class. Your second function (are the functions inside the class? That's hard to tell because there is no indentation) asks the user for a size, but isn't that size passed in when you initialise the class? To answer your question: yes, you can put the dictionary inside the function. However, if other class methods need to access it, I think you'd be better off keeping it as a class variable, which is accessible through a class instance.
As a minimal example:
class PizzaOrderingSys:
    size_cost = {
        'small': 9.75,
        'large': 12.23,
        'extra large': 13.80,
        'party size': 26.50
    }
    pizza_size_order = []
    available_toppings = ['Anchovies', 'Artichoke Hearts', 'Bacon', 'Basil (Fresh)', 'Bell Peppers', 'Black Olives', 'Chicken', 'Extra Cheese',
                          'Green Chiles', 'Green Olives', 'Pepperoni', 'Ground Beef', 'Jalapenos', 'Mushrooms']
    customer_requested_toppings = []
    number_of_toppings = 0

    def __init__(self):
        pass

    def my_function(self, string):
        self.cost = self.size_cost[string]

if __name__ == '__main__':
    order_1 = PizzaOrderingSys()
    order_1.my_function('small')
    print(order_1.cost)
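Applied to the question's size_order function, the same idea looks like the hypothetical refactor below. It is a sketch, not the asker's final program: it takes the size as an argument instead of calling input() so it is easy to test, and it moves pizza_size_order into __init__, because a mutable list defined at class level is shared by every instance.

```python
class PizzaOrderingSys:
    # Class attribute: shared by all instances and reachable as self.size_cost.
    size_cost = {
        'small': 9.75,
        'large': 12.23,
        'extra large': 13.80,
        'party size': 26.50,
    }

    def __init__(self):
        # Per-instance list, so two orders do not append to the same list object.
        self.pizza_size_order = []

    def size_order(self, user_size):
        """Record a pizza size and return its price (raises KeyError for unknown sizes)."""
        price = self.size_cost[user_size]
        self.pizza_size_order.append(price)
        return price


order_1 = PizzaOrderingSys()
print(f'Your small pizza will cost ${order_1.size_order("small")}')
```

In the interactive program, the call would become order_1.size_order(input('')).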

List index out of range, but it seems impossible since it happens after only 3 questions

kanji = ['上','下','大','工','八','入','山','口','九','一','人','力','川','七','十','三','二','女',]
reading = ['じょう','か','たい','こう','はち','にゅう','さん','こう','く','いち','にん','りょく','かわ','しち','じゅう','さん','に','じょ']
definition = ['above','below','big','construction','eight','enter','mountain','mouth','nine','one','person','power','river','seven','ten','three','two','woman']
score = number_of_questions = kanji_item = 0

def question_format(prompt_type,lang,solution_selection):
    global reading,definition,score,num_of_questions,kanji_item
    question_prompt = 'What is the '+str(prompt_type)+' for "'+str(kanji[kanji_item])+'"? (Keyboard:'+str(lang)+')\n'
    solution_selection = [reading,definition]
    usr = input(question_prompt)
    if usr in solution_selection[kanji_item] and kanji[kanji_item]:
        score += 1
        num_of_questions += 1
    else:
        pass
    kanji_item += 1

while number_of_questions != 18:
    question_format('READING','Japanese',[0])
print('You got ',score,'/',number_of_questions)
while number_of_questions != 36:
    question_format('DEFINITION','English',[1])
print('You got ',score,'/',number_of_questions)
I can't get past 大, but I can't see where it's going wrong. I've tried changing pretty much everything. kanji_item is supposed to provide a common index so that the answers match up. It gets through the first two questions without any hassle, but for some reason it refuses to accept the third one.
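Problems:
- solution_selection = [reading, definition] has only two elements, so solution_selection[kanji_item] raises "IndexError: list index out of range" as soon as kanji_item reaches 2, i.e. on the third kanji, 大
- wrong name: number_of_questions vs. num_of_questions
- wrong truthiness check in if usr in solution_selection[kanji_item] and kanji[kanji_item]: the last part is always True because it is a non-empty string
- lots of globals, which is not considered very good style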
It would be easier to zip your three lists together so you get tuples of (kanji, reading, definition) and pass two of those into your function depending on what you want to test. You do this twice: once for the reading, once for the definition.
You can even randomize your list of tuples to get different "orders" in which questions are asked:
import random

kanji = ['上', '下', '大', '工', '八', '入', '山', '口', '九', '一' , '人',
         '力', '川', '七', '十', '三', '二', '女',]
reading = ['じょう', 'か', 'たい', 'こう', 'はち', 'にゅう', 'さん', 'こう', 'く',
           'いち', 'にん', 'りょく', 'かわ', 'しち', 'じゅう', 'さん', 'に', 'じょ']
definition = ['above', 'below', 'big', 'construction', 'eight', 'enter', 'mountain',
              'mouth', 'nine', 'one', 'person', 'power', 'river', 'seven', 'ten', 'three',
              'two', 'woman']

data = list(zip(kanji, reading, definition))
random.shuffle(data)

def question_format(prompt_type, lang, kanji, solution):
    """Creates a question about *kanji*; the correct answer is *solution*.
    Returns 1 if correct else 0."""
    question_prompt = f'What is the {prompt_type} for {kanji}? (Keyboard: {lang})'
    usr = input(question_prompt)
    if usr == solution:
        return 1
    else:
        return 0

questions_asked = 0
correct = 0
for (kanji, reading, _) in data:
    correct += question_format('READING', 'Japanese', kanji, reading)
    questions_asked += 1
print('You got ', correct, '/', questions_asked)
for (kanji, _, definition) in data:
    correct += question_format('DEFINITION', 'ENGLISH', kanji, definition)
    questions_asked += 1
print('You got ', correct, '/', questions_asked)
After zipping the lists and shuffling them, data looks like this (the order changes on every run):
[('山', 'さん', 'mountain'), ('女', 'じょ', 'woman'), ('力', 'りょく', 'power'),
('上', 'じょう', 'above'), ('九', 'く', 'nine'), ('川', 'かわ', 'river'),
('入', 'にゅう', 'enter'), ('三', 'さん', 'three'), ('口', 'こう', 'mouth'),
('二', 'に', 'two'), ('人', 'にん', 'person'), ('七', 'しち', 'seven'),
('一', 'いち', 'one'), ('工', 'こう', 'construction'), ('下', 'か', 'below'),
('八', 'はち', 'eight'), ('十', 'じゅう', 'ten'), ('大', 'たい', 'big')]
