extending list after for loop append fails - python-3.x

I am extracting data from pdfs into lists
list1=[]
for page in pages:
    for lobj in element:
        if isinstance(lobj, LTTextBox):
            x, y, text = lobj.bbox[0], lobj.bbox[3], lobj.get_text()
        if isinstance(lobj, LTTextContainer):
            for text_line in lobj:
                for character in text_line:
                    if isinstance(character, LTChar):
                        Font_size = character.size
            list1.append([Font_size,(lobj.get_text())])
        if isinstance(lobj, LTTextContainer):
            for text_line in lobj:
                for character in text_line:
                    if isinstance(character, LTChar):
                        font_name = character.fontname
            list1.append(font_name)
print(list1)
This gives me a list in which font_name is added as a separate element instead of inside each inner list together with the size and text:
list = [[12.0, 'aaa'], 'IJEAMP+Times-Bold', [12.0, 'bbb'], 'IJEAOO+Times-Roman', [12.0, 'ccc'], 'IJEAMP+Times-Bold', [10.0, 'ddd'], 'IJEAOO+Times-Roman', [10.0, 'eee'], 'IJEAOO+Times-Roman', [8.0, '2\n'], 'IJEAOO+Times-Roman', 'IJEAOO+Times-Roman']
How the list of lists should look:
list = [[12.0, 'aaa', 'IJEAMP+Times-Bold'], [12.0, 'bbb', 'IJEAOO+Times-Roman'], [12.0, 'ccc', 'IJEAMP+Times-Bold'], [10.0, 'ddd', 'IJEAOO+Times-Roman'], [10.0, 'eee', 'IJEAOO+Times-Roman'], [8.0, '2\n', 'IJEAOO+Times-Roman'], 'IJEAOO+Times-Roman']
If possible, I would like an answer that fixes the error in my code. I believe it is possible, so that I don't need to create two lists and zip them afterwards.
I tried list2.extend([list1, font_name]), but that doesn't do it, as the font_name keeps getting split into individual letters.

You are appending to the outer list, not the list you just added into it.
This adds your inner list:
list1.append([Font_size,(lobj.get_text())])
If you want to extend that added list, you can do so by using
list1[-1].append(font_name)
instead of
list1.append(font_name)
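A minimal, self-contained sketch of the same idea. The `extracted` tuples are a hypothetical stand-in for the values read from the PDF, so the example runs without pdfminer. It also shows why extend() with a bare string splits the font name into letters:

```python
# Hypothetical stand-in for the values pulled from each text box in the PDF
extracted = [
    (12.0, 'aaa', 'IJEAMP+Times-Bold'),
    (12.0, 'bbb', 'IJEAOO+Times-Roman'),
]

list1 = []
for font_size, text, font_name in extracted:
    list1.append([font_size, text])   # adds a new inner list to list1
    list1[-1].append(font_name)       # grows that inner list, not list1

print(list1)

# extend() iterates over its argument, so a bare string is added one character at a time
row = [12.0, 'aaa']
row.extend('Bold')
print(row)
```

Here list1[-1] always refers to the inner list that was appended most recently, which is why the font name ends up next to its size and text.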

Related

dictionary with nested list values to single list of values

I am trying to make a single list of values for each key in a dictionary. Below is the problem. I am parsing the lists in Jinja, and it looks better to flatten them into one list before passing them to the template.
Problem:
{'EFTPOS': [[10.0, 5.0], 15.0], 'StoreDeposit': [[5.0, 6.0], 11.0]}
Result:
{'EFTPOS': [10.0, 5.0, 15.0], 'StoreDeposit': [5.0, 6.0, 11.0]}
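Please try this code snippet. I have defined a function that removes the nesting and returns a flat list.
def rmNest(ls, output=None):
    # Recursively collect the non-list items of ls into one flat list.
    # A fresh list is created per top-level call, so no global state is shared between keys.
    if output is None:
        output = []
    for i in ls:
        if isinstance(i, list):
            rmNest(i, output)
        else:
            output.append(i)
    return output
a_dict = {'EFTPOS': [[10.0, 5.0], 15.0], 'StoreDeposit': [[5.0, 6.0], 11.0]}
new_dict = {}
for i in a_dict:
    new_dict[i] = rmNest(a_dict[i])
print(new_dict)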
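An alternative sketch that avoids a shared output list altogether by using a recursive generator. The names `flatten` and `flat` are made up for this illustration, and it assumes the nested values are plain lists:

```python
def flatten(items):
    """Yield the non-list items of a nested list, depth first."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

a_dict = {'EFTPOS': [[10.0, 5.0], 15.0], 'StoreDeposit': [[5.0, 6.0], 11.0]}

# Build a new dictionary with one flat list per key
flat = {key: list(flatten(value)) for key, value in a_dict.items()}
print(flat)
```

Because each call to flatten() produces its own sequence, nothing has to be reset between keys.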

comparing the second element in a list of tuples

Learning python here.
In this exercise the input is of the form [('Alice', 'R'), ('Bob', 'B'), ('Claire', 'R'), ('Dave', 'R'), ('Elsa', 'B')] whereby each element represents a person and the colour hat they wear (red or blue).
I need to compare the colours of the hats. How do I do that? Is there a way to slice the list such that I compare one person's hat to the other without losing track of their order and who wears what?
OK: Here's an example:
people = [('Alice', 'R'), ('Bob', 'B'),
          ('Claire', 'R'), ('Dave', 'R'), ('Elsa', 'B')]
redhats = [item for item in people if item[1] == 'R']
# printing output
print(redhats)
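If the goal is to compare each person's hat with the next person's, one possible sketch pairs neighbours with zip() so that names and order stay attached to the colours. The variable name `same_as_next` is made up for this illustration, and "compare" is assumed to mean "same colour as the next person":

```python
people = [('Alice', 'R'), ('Bob', 'B'),
          ('Claire', 'R'), ('Dave', 'R'), ('Elsa', 'B')]

# zip(people, people[1:]) yields each person together with the person after them
same_as_next = [
    (name_a, name_b)
    for (name_a, hat_a), (name_b, hat_b) in zip(people, people[1:])
    if hat_a == hat_b
]

print(same_as_next)
```

Nothing is sliced away from the original list, so people still holds everyone in their original order.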

Out of order dictionary keys after merging in Python

After merging two dictionaries, the subkeys of the resulting dictionary are out of order. Subkeys are months ['jan', 'feb', 'mar', ... , 'dec']. In some cases, an original dictionary might not contain a subkey (month), so the output gets disordered by the merge.
So I have two dictionaries, both with the following structure:
{Model: {'Jan': [1], 'Feb': [2], 'Jun': [5], ...}}
As you can see in this example, some subkeys (months) are not represented, so they can't be found in the original dicts. But what I need is for the merged dict to keep the monthly order, no matter what the original dicts looked like.
The merging function:
def merge_dicts(dict1, dict2):
    '''Merge two dicts by adding up (accumulating) values in each key.
    Returns: A merge (addition) of two dictionaries by adding values of same keys
    '''
    # Merge dictionaries and add values of same keys
    out = {**dict1, **dict2}
    for key, value in out.items():
        if key in dict1 and key in dict2:
            out[key] = [value, dict1[key]]
            #Things got harder, out[key] appends in list of list of list... no itertools can help here.
            lst = [] #the one dimensional list to fix the problem of list of list with out[key]
            for el in out[key]:
                try:
                    #if inside out[key] there is a list of list we split it
                    for sub_el in el:
                        lst.append(sub_el)
                except:
                    #if inside out[key] there is only a single float
                    lst.append(el)
            #Replace the old key with the one dimensional list
            out[key] = lst
    return out
How I merge it:
for c in range(len([*CMs_selection.keys()])):
    if c == 0:
        #First merge, with dict0 & dict1
        merged_dict = {cm:merge_dicts(CMs_selection[[*CMs_selection.keys()][c]][cm], CMs_selection[[*CMs_selection.keys()][c + 1]][cm])
                       for cm in CMs_selection[[*CMs_selection.keys()][0]]}
    elif c > 0 and c < (len(years) - 1):
        #Second merge to n merge, starting with dict_merged and dict 2
        merged_dict = {cm:merge_dicts(merged_dict[cm], CMs_selection[[*CMs_selection.keys()][c + 1]][cm])
                       for cm in CMs_selection[[*CMs_selection.keys()][0]]}
Right now, after trying every way of merging I could think of, I always get this result:
{'Model1': {'Jan': [-0.0952586755156517,
0.1015196293592453,
-0.10572463274002075],
'Oct': [-0.02473766915500164,
0.0678798109292984,
0.08870666474103928,
-0.06378963589668274],
'Nov': [-0.08730728179216385,
0.013518977910280228,
0.023245899006724358,
-0.03917887806892395],
'Jul': [-0.07940272241830826, -0.04912888631224632, -0.07454635202884674],
'Dec': [-0.061335086822509766, -0.0033914903178811073, 0.09630533307790756],
'Mar': [0.029064208269119263, 0.11327305436134338, 0.009556809440255165],
'Apr': [-0.04433680325746536, -0.08620205521583557],
'Jun': [-0.036688946187496185, 0.05543896555900574, -0.07162825018167496],
'Aug': -0.03712410107254982,
'Sep': [0.007421047426760197, 0.008665643632411957],
'Feb': [-0.02879650704562664, 0.013025006279349327]},
'Model2': {'Feb': -0.05173473060131073,
'Jun': [-0.09126871824264526,
-0.09009774029254913,
0.10458160936832428,
-0.09445420652627945,
-0.04294373467564583],
'Aug': [-0.07917020469903946, 0.011026041582226753],
'Oct': [-0.10164830088615417, ....
....
With disorderly months. Please help me!
If we just focus on merging dictionaries, we first need to define the normal order of months and then do the merging in that order, because Python doesn't know this order. It cannot put "Mar" between "Feb" and "Apr" if "Mar" doesn't exist in the first dictionary. So we need to define the order ourselves.
Also, you need two different solutions for merging float values and merging lists. I added a mode parameter to my solution.
def merge_dicts(list_of_dicts, mode):
    keys = set(key for d in list_of_dicts for key in d.keys())
    months = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
    ordered_keys = [month for month in months if month in keys]
    out = {}
    for key in ordered_keys:
        out[key] = []
        for d in list_of_dicts:
            if key in d:
                if mode == "append":
                    out[key].append(d[key])
                elif mode == "extend":
                    out[key].extend(d[key])
    return out
CMs_selection = {2006: {'Model1': {'Jan': -0.1, 'Oct': -0.063}, 'Model2': {'Feb': -0.051, 'Jun': -0.04, 'Oct': 0.07}, 'Model3': {'Mar': -0.030, 'Jun': 0.02, 'Aug': 0.0561,}, 'Model4': {'Feb': -0.026, 'Dec': -0.06}}, 2007: {'Model1': {'Jul': -0.07, 'Oct': 0.8, 'Nov': 0.38, 'Dec': 0.1}, 'Model2': {'Jun': -0.09, 'Aug': -0.079, 'Sep': -0.7}}}
for key in CMs_selection:
    CMs_selection[key] = merge_dicts(CMs_selection[key].values(), "append")
print(CMs_selection)
result = merge_dicts(CMs_selection.values(), "extend")
print(result)
Output:
{2006: {'Jan': [-0.1], 'Feb': [-0.051, -0.026], 'Mar': [-0.03], 'Jun': [-0.04, 0.02], 'Aug': [0.0561], 'Oct': [-0.063, 0.07], 'Dec': [-0.06]}, 2007: {'Jun': [-0.09], 'Jul': [-0.07], 'Aug': [-0.079], 'Sep': [-0.7], 'Oct': [0.8], 'Nov': [0.38], 'Dec': [0.1]}}
{'Jan': [-0.1], 'Feb': [-0.051, -0.026], 'Mar': [-0.03], 'Jun': [-0.04, 0.02, -0.09], 'Jul': [-0.07], 'Aug': [0.0561, -0.079], 'Sep': [-0.7], 'Oct': [-0.063, 0.07, 0.8], 'Nov': [0.38], 'Dec': [-0.06, 0.1]}
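If a dictionary has already been merged and only its key order is wrong, a smaller sketch is to re-sort it afterwards using the same explicit month order. The names `MONTHS`, `MONTH_INDEX` and `sort_by_month` are made up for this illustration; it assumes every key is one of the twelve abbreviations and relies on dicts keeping insertion order (Python 3.7+):

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Map each month abbreviation to its position in the calendar
MONTH_INDEX = {month: i for i, month in enumerate(MONTHS)}

def sort_by_month(d):
    """Return a new dict with the same items, ordered by calendar month."""
    return dict(sorted(d.items(), key=lambda item: MONTH_INDEX[item[0]]))

merged = {'Oct': [-0.063, 0.07], 'Jan': [-0.1], 'Jun': [-0.04, 0.02]}
ordered = sort_by_month(merged)
print(ordered)
```

A key outside MONTHS would raise a KeyError here, which may or may not be what you want for unexpected input.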

Pandas apply, map and iterrows behaving strangely

I am trying to prune texts in a list based on texts in another list. The following function works fine when called directly on two lists
def remove_texts(texts, texts2):
    to_remove = []
    for i in texts2:
        if i in texts:
            to_remove.append(i)
    texts = [j for j in texts if j not in to_remove]
    return texts
However, the following does nothing and I get no errors
df_other.texts = df_other.texts.map(lambda x: remove_texts(x, df_other.to_remove_split))
Nor does the following. Again no error is returned
for i, row in df_other.iterrows():
    row['texts'] = remove_texts(row['texts'], row['to_remove_split'])
Any thoughts appreciated.
You actually want to find the set difference between texts
and texts2. Assume that they contain:
texts = [ 'AAA', 'BBB', 'DDD', 'EEE', 'FFF', 'GGG', 'HHH' ]
texts2 = [ 'CCC', 'EEE' ]
Then the shortest solution is to compute just the set difference,
without using Pandas:
set(texts).difference(texts2)
gives:
{'AAA', 'BBB', 'DDD', 'FFF', 'GGG', 'HHH'}
Or if you want just a list (not set), write:
sorted(set(texts).difference(texts2))
And if for some reason you want to use Pandas, then start by
creating both DataFrames:
df = pd.DataFrame(texts, columns=['texts'])
df2 = pd.DataFrame(texts2, columns=['texts'])
Then you can compute the set difference as:
df.query('texts not in @df2.texts')
or
df.texts[~df.texts.isin(df2.texts)]
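As for why the two attempts in the question appear to do nothing: in the map() call each x is a single row's value while df_other.to_remove_split is the whole column, and the row yielded by iterrows() is a copy, so assigning to it does not write back to the DataFrame. A row-wise sketch under the assumption that both columns hold lists of strings (the sample data below is invented, and remove_texts is a shortened version of the function from the question):

```python
import pandas as pd

def remove_texts(texts, texts2):
    # Keep the items of texts that do not appear in texts2
    to_remove = set(texts2)
    return [t for t in texts if t not in to_remove]

# Invented sample data: one list of texts and one list to remove per row
df_other = pd.DataFrame({
    "texts": [["a", "b", "c"], ["d", "e"]],
    "to_remove_split": [["b"], ["d", "x"]],
})

# Pair the two columns row by row and assign the result back to the column
df_other["texts"] = [
    remove_texts(texts, to_remove)
    for texts, to_remove in zip(df_other["texts"], df_other["to_remove_split"])
]

print(df_other)
```

The key point is that the result is assigned back to the column as a whole instead of being written into a temporary row object.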

convert a list of lists to a list of string

I have a list of lists like this
list1 = [['I am a student'], ['I come from China'], ['I study computer science']]
len(list1) = 3
Now I would like to convert it into a list of string like this
list2 = ['I', 'am', 'a', 'student','I', 'come', 'from', 'China', 'I','study','computer','science']
len(list2) = 12
I am aware that I could convert it in this way
new_list = [','.join(x) for x in list1]
But it returns
['I,am,a,student','I,come,from,China','I,study,computer,science']
len(new_list) = 3
I also tried this
new_list = [''.join(x for x in list1)]
but it gives the following error
TypeError: sequence item 0: expected str instance, list found
How can I extract each word in the sublist of list1 and convert it into a list of string? I'm using python 3 in windows 7.
Following your edit, I think the most transparent approach is now the one that was adopted by another answer (an answer which has since been deleted, I think). I've added some whitespace to make it easier to understand what's going on:
list1 = [['I am a student'], ['I come from China'], ['I study computer science']]
list2 = [
    word
    for sublist in list1
    for sentence in sublist
    for word in sentence.split()
]
print(list2)
Prints:
['I', 'am', 'a', 'student', 'I', 'come', 'from', 'China', 'I', 'study', 'computer', 'science']
Given a list of lists where each sublist contains strings, this could be solved using jez's strategy like:
list2 = ' '.join([' '.join(strings) for strings in list1]).split()
Where the list comprehension transforms list1 to a list of strings:
>>> [' '.join(strings) for strings in list1]
['I am a student', 'I come from China', 'I study computer science']
The join will then create a string from the strings and split will create a list split on spaces.
If the sublists only contain single strings, you could simplify the list comprehension:
list2 = ' '.join([l[0] for l in list1]).split()
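Another possible variant, sketched here with itertools.chain.from_iterable from the standard library, flattens the sublists first and then splits each sentence. It should handle sublists with any number of strings:

```python
from itertools import chain

list1 = [['I am a student'], ['I come from China'], ['I study computer science']]

# chain.from_iterable(list1) yields every sentence from every sublist in order
list2 = [
    word
    for sentence in chain.from_iterable(list1)
    for word in sentence.split()
]

print(list2)
print(len(list2))
```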
