How to customize table width in python-docx - python-3.x

I want to make a table in a Word document using python-docx, but the table width always stretches to the full width of the ruler. How can I customize this?
My code:
def table_columns(text, my_rows):
    row = table.rows[0].cells
    paragraph = row[my_rows].add_paragraph()
    get_paragraph = paragraph.add_run(text)
    paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER
    get_paragraph.bold = True
    font = get_paragraph.font
    font.size = Pt(10)

table = doc.add_table(rows = 1, cols = 5, style = 'Table Grid')
columns_width = {
    0: 2,
    1: 35,
    2: 35,
    3: 42,
    4: 170
}
for column_idx in range(len(table.columns)):
    table.cell(0, column_idx).width = Cm(columns_width[column_idx])
for rows_idx in range(len(table.rows)):
    table.rows[rows_idx].height = Cm(1.25)
columns_names = {
    0: 'NO',
    1: 'VALUE1',
    2: 'VALUE2',
    3: 'VALUE3',
    4: 'VALUE4'
}
for column_idx in range(len(table.columns)):
    table_columns(columns_names[column_idx], column_idx)
I have also tried changing the columns_width values, but I get the same result.
Here is the current result and what I want to achieve:
Thanks for your help.

Cell width is what matters here. You are using:
columns_width = {
    0: 2,
    1: 35,
    2: 35,
    3: 42,
    4: 170
}
table.cell(0, column_idx).width = Cm(columns_width[column_idx])
to set the cell widths, which is fine, but you are passing very large Cm() (centimetre) lengths. For example, 170 cm is 1.7 metres, far wider than any page.
If you use Mm() or possibly Pt() instead, I think you'll get better results.


Python, print colored duplicates found in 2 list

I'm 63 and have just started with Python (my first steps, with Udemy).
I'm Croatian, so the program's messages are in Croatian, but you will understand when you run it. I know it could be cleaner, shorter, more elegant, etc., but as I mentioned before, I'm a beginner.
import random

jedan = random.sample(range(1, 99), 15)
dva = random.sample(range(1, 99), 15)

def raspaljot(jedan, dva, i):
    for x in jedan:
        for y in dva:
            if y == x:
                index1 = jedan.index(x)
                index1_str = str(index1)
                index2 = dva.index(y)
                index2_str = str(index2)
                i += 1
                x = str(x)
                print(" Broj \033[31m" + x + "\033[0m,je dupli i nalazi se u listi jedan: na poziciji: \033[34m"
                      + index1_str + "\033[0m a u listi dva na poziciji: \033[35m" + index2_str + "\033[0m")
    print()
    print(jedan)
    print(dva)
    if i != 0:
        print("\n *** Ukupno ima ", i, 'duplih brojeva. ***')
    elif i == 0:
        print("Nema duplih brojeva. :) ")

i = 0
raspaljot(jedan, dva, i)
What the program does is find the duplicates in the two random lists, print them in color, and report each one's position in list 1 and list 2.
What I'm trying to do is print list1 and list2 with the duplicates shown in color.
For example:
[14, 78, 85, 31, 5, 54, 13, 46, 83, 4, 35, 41, 52, 51, 32]
[72, 40, 67, 85, 54, 76, 77, 39, 51, 36, 91, 70, 71, 38, 55]
Here we have 3 duplicates (85, 54, 51). In the example above, the console printed these lists in white, but I want those 3 numbers shown in red in the two lines above.
Is this possible? I couldn't find a solution.
PS. Wing Pro version 7 on Fedora 33 Workstation / In WingIDE, colors are only displayed in an external console, not in the Debug I/O tool. :)
A simple solution would be something like this:
# Change the list to a string
jedan_str = str(jedan)
# Create a set with the numbers that need a new color
num_set = {"85", "54", "51"}
# Iterate over every number and wrap it with a color change
for i in num_set:
    # Note that I used an f-string to format the string,
    # but you can also do this as "\033[31m" + i + "\033[0m"
    jedan_str = jedan_str.replace(i, f"\033[31m{i}\033[0m")
# Print the string that represents the list
print(jedan_str)
Following the idea of using a set to determine which elements are in both lists (as Cv4niak proposed in his answer), I created a function to print the output as you desire. There are numerous other ways of achieving it, but I think this is a simple yet effective one.
The idea is to use the cprint() function from the termcolor package. You can install it with pip install termcolor, then print all elements normally except the duplicates, which are printed with cprint(item, "red").
The "{:0>2d}" formatting in each item print serves only to pad the number with zeros (so 2 will be printed as 02, for example), so that the output of both lists stays aligned.
import random
from termcolor import cprint

def mark_duplicates(first, second):
    duplicates = list(set(first).intersection(second))
    if duplicates:
        for list_ in [first, second]:
            print("[", end="")
            for item in list_:
                if item in duplicates:
                    cprint("{:0>2d}".format(item), "red", end=",")
                else:
                    print("{:0>2d}".format(item), end=",")
            print("\b]")
    else:
        print("No duplicates.")

jedan = random.sample(range(1, 99), 15)
dva = random.sample(range(1, 99), 15)
mark_duplicates(jedan, dva)
With this, if there are no duplicates, the No duplicates. string is printed. You can also change the color with little effort, and use other nice functionality from the termcolor package.
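If you prefer to avoid an extra dependency, the same effect is possible with plain ANSI escape codes, as in the first answer. A sketch (the format_with_duplicates helper and the RED/RESET names are my own):

```python
import random

RED = "\033[31m"
RESET = "\033[0m"

def format_with_duplicates(values, duplicates):
    """Render a list as a string, wrapping any duplicate in red ANSI codes."""
    parts = [
        f"{RED}{v}{RESET}" if v in duplicates else str(v)
        for v in values
    ]
    return "[" + ", ".join(parts) + "]"

jedan = random.sample(range(1, 99), 15)
dva = random.sample(range(1, 99), 15)
duplicates = set(jedan) & set(dva)

print(format_with_duplicates(jedan, duplicates))
print(format_with_duplicates(dva, duplicates))
```

Working on the list elements directly (rather than on str(list)) also avoids the pitfall where replace("5", ...) would recolor the "5" inside "85".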

Is there a way to convert a list of integers to a single variable?

I'd like to convert a list of integers into a single variable.
I tried this (found in another question):
r = len(message) - 1
res = 0
for n in message:
    res += n * 10 ** r
    r -= 1
This does not work for me at all.
I basically need this:
message = [17, 71, 34, 83, 81]
(This can vary in length as I use a variable to change each one)
To convert into this:
new_message = 1771348381
A combination of join, map and str will do:
message = [17, 71, 34, 83, 81]
new_message = int(''.join(map(str, message)))
# 1771348381
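For completeness, the arithmetic approach from the question can also be made to work: the loop fails because it shifts by a fixed power of 10 per element, while each two-digit element actually needs a shift of two decimal places. Shifting by the digit count of the next element fixes it (a small sketch):

```python
message = [17, 71, 34, 83, 81]

new_message = 0
for n in message:
    # Shift left by as many decimal digits as n has, then append n
    new_message = new_message * 10 ** len(str(n)) + n

print(new_message)  # 1771348381
```

This version also handles elements of mixed digit counts, e.g. [1, 23, 456].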

Dynamically updating a nested dictionary with multiprocessing.pool (speed issue)

I have written a simple program to understand how the lack of communication between child processes leads to a random result when using multiprocessing.Pool. I pass a nested dictionary, as a dictproxy object made by multiprocessing.Manager:
manager = Manager()
my_dict = manager.dict()
my_dict['nested'] = nested
into a Pool of 16 worker processes. The nested dictionary is defined below. The function my_function simply squares each number stored in the elements of the nested dictionary.
As expected, because of the shared memory in multithreading, I get the correct result when I use multiprocessing.dummy:
{0: 1, 1: 4, 2: 9, 3: 16}
{0: 4, 1: 9, 2: 16, 3: 25}
{0: 9, 1: 16, 2: 25, 3: 36}
{0: 16, 1: 25, 2: 36, 3: 49}
{0: 25, 1: 36, 2: 49, 3: 64}
but when I use multiprocessing, the result is incorrect and completely random in each run. One example of the incorrect result is:
{0: 1, 1: 2, 2: 3, 3: 4}
{0: 4, 1: 9, 2: 16, 3: 25}
{0: 3, 1: 4, 2: 5, 3: 6}
{0: 16, 1: 25, 2: 36, 3: 49}
{0: 25, 1: 36, 2: 49, 3: 64}
In this particular run, the 'data' in 'elements' 1 and 3 was not updated. I understand that this happens due to the lack of communication between the child processes, which prevents the "updated" nested dictionary in each child process from being properly sent to the others. However, can someone help me use Manager.Queue to organize this inter-child communication and get the correct results, ideally with minimal runtime?
Code (Python 3.5)
from multiprocessing import Pool, Manager
import numpy as np

def my_function(A):
    arg1 = A[0]
    my_dict = A[1]
    temporary_dict = my_dict['nested']
    for arg2 in np.arange(len(my_dict['nested']['elements'][arg1]['data'])):
        temporary_dict['elements'][arg1]['data'][arg2] = temporary_dict['elements'][arg1]['data'][arg2] ** 2
    my_dict['nested'] = temporary_dict

if __name__ == '__main__':
    # nested dictionary definition
    strs1 = {}
    strs2 = {}
    strs3 = {}
    strs4 = {}
    strs5 = {}
    strs1['data'] = {}
    strs2['data'] = {}
    strs3['data'] = {}
    strs4['data'] = {}
    strs5['data'] = {}
    for i in [0, 1, 2, 3]:
        strs1['data'][i] = i + 1
        strs2['data'][i] = i + 2
        strs3['data'][i] = i + 3
        strs4['data'][i] = i + 4
        strs5['data'][i] = i + 5
    nested = {}
    nested['elements'] = [strs1, strs2, strs3, strs4, strs5]
    nested['names'] = ['series1', 'series2', 'series3', 'series4', 'series5']
    # parallel processing
    pool = Pool(processes = 16)
    manager = Manager()
    my_dict = manager.dict()
    my_dict['nested'] = nested
    sequence = np.arange(len(my_dict['nested']['elements']))
    pool.map(my_function, ([seq, my_dict] for seq in sequence))
    pool.close()
    pool.join()
    # printing the data in all elements of the nested dictionary
    print(my_dict['nested']['elements'][0]['data'])
    print(my_dict['nested']['elements'][1]['data'])
    print(my_dict['nested']['elements'][2]['data'])
    print(my_dict['nested']['elements'][3]['data'])
    print(my_dict['nested']['elements'][4]['data'])
One way to work around this and get correct results is to use multiprocessing.Lock, but that kills the speed:
from multiprocessing import Pool, Manager, Lock
import numpy as np

def init(l):
    global lock
    lock = l

def my_function(A):
    arg1 = A[0]
    my_dict = A[1]
    with lock:
        temporary_dict = my_dict['nested']
        for arg2 in np.arange(len(my_dict['nested']['elements'][arg1]['data'])):
            temporary_dict['elements'][arg1]['data'][arg2] = temporary_dict['elements'][arg1]['data'][arg2] ** 2
        my_dict['nested'] = temporary_dict

if __name__ == '__main__':
    # nested dictionary definition
    strs1 = {}
    strs2 = {}
    strs3 = {}
    strs4 = {}
    strs5 = {}
    strs1['data'] = {}
    strs2['data'] = {}
    strs3['data'] = {}
    strs4['data'] = {}
    strs5['data'] = {}
    for i in [0, 1, 2, 3]:
        strs1['data'][i] = i + 1
        strs2['data'][i] = i + 2
        strs3['data'][i] = i + 3
        strs4['data'][i] = i + 4
        strs5['data'][i] = i + 5
    nested = {}
    nested['elements'] = [strs1, strs2, strs3, strs4, strs5]
    nested['names'] = ['series1', 'series2', 'series3', 'series4', 'series5']
    # parallel processing
    manager = Manager()
    l = Lock()
    my_dict = manager.dict()
    my_dict['nested'] = nested
    pool = Pool(processes = 16, initializer=init, initargs=(l,))
    sequence = np.arange(len(my_dict['nested']['elements']))
    pool.map(my_function, ([seq, my_dict] for seq in sequence))
    pool.close()
    pool.join()
    # printing the data in all elements of the nested dictionary
    print(my_dict['nested']['elements'][0]['data'])
    print(my_dict['nested']['elements'][1]['data'])
    print(my_dict['nested']['elements'][2]['data'])
    print(my_dict['nested']['elements'][3]['data'])
    print(my_dict['nested']['elements'][4]['data'])
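An alternative that avoids shared state (and locks) entirely is to have each worker return its squared data and let the parent reassemble the result: Pool.map already collects return values in input order, so no Manager or Queue is needed for this access pattern. A sketch of that approach (the square_data name and the five-element setup mirroring the question's data are my own):

```python
from multiprocessing import Pool

def square_data(data):
    """Square every value in one element's data dict; runs in a worker."""
    return {k: v ** 2 for k, v in data.items()}

if __name__ == '__main__':
    # Same shape as the question's nested['elements']: five dicts of four values
    elements = [{'data': {i: i + offset for i in range(4)}}
                for offset in range(1, 6)]

    with Pool(processes=5) as pool:
        # map() returns the workers' results in the same order as the inputs
        results = pool.map(square_data, [e['data'] for e in elements])

    # Reassemble in the parent; no cross-process writes ever happen
    for element, squared in zip(elements, results):
        element['data'] = squared

    for element in elements:
        print(element['data'])
```

Since each worker only reads its own input and the parent does all the writing, the output is deterministic without any locking.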

Two-sample dependent t-test and rank-sum test in Python

I have a data set with two labels: label 1 = 0 (case) and label 2 = 1 (control). I have already calculated the mean for the two labels. I now need to calculate a two-sample (dependent) t-test and a two-sample rank-sum test. My data set looks like:
SRA ID   ERR169499            ERR169500            ERR169501            mean_ctrl  mean_case
Label    1                    0                    1
TaxID    PRJEB3251_ERR169499  PRJEB3251_ERR169500  PRJEB3251_ERR169501
333046   0.05                 0                    0.4
1049     0.03                 0.9                  0
337090   0.01                 0.6                  0.7
I am new to statistics. The code I have so far is this:
label = []
data = {}
x = open('final_out_transposed.csv', 'rt')
for r in x:
    datas = r.split(',')
    if datas[0] == ' Label':
        label.append(r.split(",")[1:])
label = label[0]
label[-1] = label[-1].replace('\n', '')
counter = len(label)
for row in file1:
    content = row.split(',')
    if content[0] == 'SRA ID' or content[0] == 'TaxID' or content[0] == ' Label':
        pass
    else:
        dt = row.split(',')
        dt[-1] = dt[-1].replace('\n', '')
        data[dt[0]] = dt[1:]
keys = list(data)
sum_file = open('sum.csv', 'w')
for key in keys:
    sum_case = 0
    sum_ctrl = 0
    count_case = 0
    count_ctrl = 0
    mean_case = 0
    mean_ctrl = 0
    print(len(label))
    for i in range(counter):
        print(i)
        if label[i] == '0' or label[i] == 0:
            sum_case = np.float64(sum_case) + np.float64(data[key][i])
            count_case = count_case + 1
            mean_case = sum_case / count_case
        else:
            sum_ctrl = np.float64(sum_ctrl) + np.float64(data[key][i])
            count_ctrl = count_ctrl + 1
            mean_ctrl = sum_ctrl / count_ctrl
Any help will be highly appreciated.
Instead of using open to read your csv file, I would use Pandas. That will place the data in a DataFrame, which will be easier to work with:
import pandas as pd
data_frame = pd.read_csv('final_out_transposed.csv')
For a two-sample dependent t-test you want to use ttest_rel; notice that ttest_ind is for independent groups. Since you specifically asked for dependent groups, use ttest_rel.
It's hard to see from your example above where your two columns of sample data are, but imagine I had the following made-up 'case' and 'control' data. I could calculate a dependent two-sample t-test using pandas as shown below:
import pandas as pd
from scipy.stats import ttest_rel

data_frame = pd.DataFrame({
    'case': [55, 43, 51, 62, 35, 48, 58, 45, 48, 54, 56, 32],
    'control': [48, 38, 53, 58, 36, 42, 55, 40, 49, 50, 58, 25]})
(t_stat, p) = ttest_rel(data_frame['control'], data_frame['case'])
print(t_stat)
print(p)
p would be the p-value and t_stat the t-statistic. You can read more about this in the documentation.
In a similar manner, once your sample .csv data is in a DataFrame, you can perform a rank-sum test:
from scipy.stats import ranksums
(t_stat, p) = ranksums(data_frame['control'], data_frame['case'])
documentation for ranksums

How to do a threshold line on a bar chart using plotly

I have currently written this code, which produces a bar chart, but I would like to add a threshold line. Could anyone help me, please?
def make_bar_chart(data):
    """Takes a list of dicts with a time and price"""
    # Times
    chart_x = []
    # Prices
    chart_y = []
    # Create the relevant arrays
    for item in data:
        chart_x.append(item["time"])
        chart_y.append(item["price"])
    # Make the chart
    the_graph = Bar(x = chart_x, y = chart_y, name = "Stocks")
    graph_data = Data([the_graph])
    the_layout = Layout(title = "Stocks", xaxis = dict(title = "Time"), yaxis = dict(title = "Price"))
    the_figure = Figure(data = graph_data, layout = the_layout)
    plotly.offline.plot(the_figure, filename = "stocks.html")
Try something like this. In plotly, lines of this kind are provided via shapes:
the_layout = Layout(title = "Stocks",
                    xaxis = dict(title = "Time"),
                    yaxis = dict(title = "Price"),
                    shapes = [
                        {
                            'type': 'line',
                            'xref': 'paper',
                            'x0': 0,
                            'y0': 100,  # use an absolute value or a variable here
                            'x1': 1,
                            'y1': 100,  # ditto
                            'line': {
                                'color': 'rgb(50, 171, 96)',
                                'width': 1,
                                'dash': 'dash',
                            },
                        },
                    ],
                    )
I haven't tested this, as you haven't provided sample data. Well done for supplying code in your first question, but on Stack Overflow it's best to provide a completely self-contained example that people can copy and run as-is.
