Multiple value-formats in a formatted string - python-3.x

Is it possible to combine more than one colon-format-string?
Example:
val = 2.123
print(f'This is a float-value with one digit: {val:.1f} and balanced to right with {val:>10}')
so, something like {val:.1f:>10}?

Refer to the Format Specification Mini-Language. You were just off with the order: f'{val:>10.1f}' works for this specific example.
> is align, 10 is width, .1 is precision and f is type.
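As a small illustration of that ordering, here is a sketch that applies the same spec with different alignments. The variable names other than val are made up for the example, and the nested-field form at the end is just one way to pass width and precision from variables.

```python
val = 2.123

# Order inside the spec: [align][width][.precision][type]
right = f"{val:>10.1f}"   # right-aligned in 10 columns, one decimal place
left = f"{val:<10.1f}"    # left-aligned in 10 columns
center = f"{val:^10.1f}"  # centered in 10 columns

# Width and precision can also be supplied by variables via nested fields.
width, digits = 10, 1
nested = f"{val:>{width}.{digits}f}"

print(repr(right))   # '       2.1'
print(repr(left))    # '2.1       '
print(repr(center))
print(repr(nested))  # same as right
```

Writing two colons, as in {val:.1f:>10}, does not work because everything after the first colon is read as a single format spec.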

Related

Changing printing format for fractions using sympy and PythonTex

Below is a minimal working problem, of what I am working on.
The file is a standard LaTeX file using sympy within pythontex, where I want to change
how sympy displays fractions.
Concretely I would like to make the following changes, but have been struggling:
How can I make sympy display the full, and not the inline, version of some of its fractions? In particular I would like the last fraction 1/5 to instead be displayed in full,
e.g. \frac{1}{5}
In the expression for the derivative, I have simplified the results, but I struggle to substitute the variable x with the fraction a/b. Whenever I substitute this expression into the fraction it fully simplifies the expression, which is not what I want. I just want to replace x with the fraction a/b (in this case 2/3 or 1/3 depending on the seed).
Below I have attached two images displaying what my code produces, and what I would like it to display. Do note that this is also stated in the two bullets above
Current output
Desired output
Code
\documentclass{article}
\usepackage{pythontex}
\usepackage{mathtools,amssymb}
\usepackage{amsmath}
\usepackage{enumitem}
\begin{document}
\begin{pycode}
import math
from sympy import *
from random import randint, seed
seed(2021)
\end{pycode}
\paragraph{Oppgave 3}
\begin{pycode}
a, b = randint(1,2), 3
ab = Rational(a,b)
pressure_num = lambda x: 1-x
pressure_denom = lambda x: 1+x
def pressure(x):
    return (1-x)/(1+x)
pressure_ab = Rational(pressure_num(ab),pressure_denom(ab))
x, y, z = symbols('x y z')
pressure_derivative = simplify(diff(pressure(x), x))
pressure_derivative_ab = pressure_derivative.xreplace({ x : Rational(a,b)})
\end{pycode}
The partial pressure of some reaction is given as
%
\begin{pycode}
print(r"\begin{align*}")
print(r"\rho(\zeta)")
print(r"=")
print(latex(pressure(Symbol(r'\zeta'))))
print(r"\qquad \text{for} \ 0 \leq \zeta \leq 1.")
print(r"\end{align*}")
\end{pycode}
%
\begin{enumerate}[label=\alph*)]
\item Evaluate $\rho(\py{a}/\py{b})$. Give a physical interpretation of your
answer.
\begin{equation*}
\rho(\py{a}/\py{b})
= \frac{1-(\py{ab})}{1+\py{ab}}
= \frac{\py{pressure_num(ab)}}{\py{pressure_denom(ab)}}
\cdot \frac{\py{b}}{\py{b}}
= \py{pressure_ab}
\end{equation*}
\end{enumerate}
The derivative is given as
%
\begin{pycode}
print(r"\begin{align*}")
print(r"\rho'({})".format(ab))
print(r"=")
print(latex(pressure_derivative))
print(r"=")
print(latex(simplify(pressure_derivative_ab)))
print(r"\end{align*}")
\end{pycode}
\end{document}
Whenever I substitute this expression into the fraction it fully simplifies the expression, which is not what I want. I just want to replace x with the fraction a/b (in this case 2/3 or 1/3 depending on the seed).
It's possible to do this, if we use a with expression to temporarily disable evaluation for that code block, and then we use two dummy variables in order to represent the fraction, and finally we do the substitution with numerical values.
So the following line in your code:
pressure_derivative_ab = pressure_derivative.xreplace({ x : Rational(a,b)})
can be changed to:
with evaluate(False):
    a1,b1=Dummy('a'),Dummy('b')
    pressure_derivative_ab = pressure_derivative.subs(x,a1/b1).subs({a1: a,b1: b})
The expressions pressure_derivative and pressure_derivative_ab after this are:
How can I make sympy display the full, and not the inline, version of some of its fractions? In particular I would like the last fraction 1/5 to instead be displayed in full, e.g. \frac{1}{5}
For this, you only need to change this line:
= \py{pressure_ab}
into this line:
= \py{latex(pressure_ab)}
This makes pythontex use the sympy LaTeX printer instead of the plain ASCII string printer.
To summarize, the changes between the original code and the modified code can be viewed here.
All the code in this post is also available in this repo.

Groovy: How to define float numbers that would generate output as 2 digit decimal points

The algorithm needs to generate random numbers with two decimal places, such as 4.78, 3.88, etc.
How can we achieve this?
You can round a float value using the DecimalFormat class, like:
new java.text.DecimalFormat('#.##').format(yourFloat)
Check out Apache Groovy - Why and How You Should Use It article for more information on using Groovy scripting in JMeter
If you return a BigDecimal, you can tell it how many decimal places to keep. E.g.:
println(new BigDecimal(0.666).setScale(2, java.math.RoundingMode.HALF_UP))
// => 0.67
Only do this at the very end of your calculations.

How to make sure strings would not overlap each other in java processing?

I need to make sure the words I take from an external file do NOT overlap each other. I have over 50 words that get random text sizes and positions when the sketch runs, but they overlap.
How can I make them NOT overlap each other? The result would probably look like a word cloud.
In case my code helps, here it is:
String[] words;
int index = 0;

void setup() {
  size(500, 500);
  background(255);
  String[] lines = loadStrings("alice_just_text.txt");
  String entireplay = join(lines, " "); // joins all lines into one string
  words = splitTokens(entireplay, ",.?!:-;:()03 "); // splits it into words
  for (int i = 0; i < 50; i++) {
    float x = random(width);
    float y = random(height);
    int index = int(random(words.length)); // random word (shadows the global index)
    textSize(random(60)); // random font size
    fill(0);
    textAlign(CENTER);
    text(words[index], x, y, width/2, height/2);
    println(words[index]);
    index++;
  }
}
Stack Overflow isn't really designed for general "how do I do this" type questions. You'll have much better luck if you post a more specific "I tried X, expected Y, but got Z instead" type question. But I'll try to help in a general sense:
You need to break your problem down into smaller pieces and then take on those pieces one at a time.
For example, you can isolate your problem to making sure rectangles don't overlap, which you can break down even further. There are a number of ways to do that:
You could use a grid to lay out your rectangles. Figure out how many squares a line of text takes up, then find a place in your grid where that word will fit. You could use something like a 2D array of boolean values, for example.
Or you could generate a random location, and then check whether there's already a rectangle there. If so, pick a new random location until you find a clear spot.
In any case, you'll probably need to use collision detection (either point-rectangle or rectangle-rectangle) to determine whether your rectangles are overlapping.
Start small. Create a small example program that just shows two rectangles on the screen. Hardcode their positions at first, but make it so they turn red if they're colliding. Work your way up from there. Make it so you can add rectangles using the mouse, but only let the user add them if there is no overlap. Then add the random location choosing. If you get stuck on a specific step, then post a MCVE and we'll go from there. Good luck.
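The "random location, then check for a clear spot" idea can be sketched as follows. This is only an illustration in Python rather than Processing, and the names overlaps, place_rectangles and max_tries are made up for the example. Each rectangle stands for the bounding box of one word; in Processing the box width could be taken from textWidth() for the chosen text size.

```python
import random

def overlaps(a, b):
    """Axis-aligned rectangle test; each rectangle is (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def place_rectangles(sizes, canvas_w, canvas_h, max_tries=1000, rng=random):
    """Try random positions for each (w, h) until one is clear of every
    rectangle placed so far. A size that never fits is skipped."""
    placed = []
    for w, h in sizes:
        for _ in range(max_tries):
            x = rng.uniform(0, canvas_w - w)
            y = rng.uniform(0, canvas_h - h)
            candidate = (x, y, w, h)
            if not any(overlaps(candidate, other) for other in placed):
                placed.append(candidate)
                break
    return placed

rng = random.Random(0)
sizes = [(rng.randint(20, 80), rng.randint(10, 30)) for _ in range(20)]
boxes = place_rectangles(sizes, 500, 500, rng=rng)
print(len(boxes), "rectangles placed without overlap")
```

Placing the largest words first usually leaves fewer failed attempts, since small boxes fit into the remaining gaps more easily.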

What is an efficient way to compute the Dice coefficient between 900,000 strings?

I have a corpus of 900,000 strings. They vary in length, but have an average character count of about 4,500. I need to find the most efficient way of computing the Dice coefficient of every string as it relates to every other string. Unfortunately, this results in the Dice coefficient algorithm being used some 810,000,000,000 times.
What is the best way to structure this program for increased efficiency? Obviously, I can prevent computing the Dice of sections A and B, and then B and A--but this only halves the work required. Should I consider taking some shortcuts or creating some sort of binary tree?
I'm using the following implementation of the Dice coefficient algorithm in Java:
public static double diceCoefficient(String s1, String s2) {
    Set<String> nx = new HashSet<String>();
    Set<String> ny = new HashSet<String>();
    for (int i = 0; i < s1.length() - 1; i++) {
        char x1 = s1.charAt(i);
        char x2 = s1.charAt(i + 1);
        String tmp = "" + x1 + x2;
        nx.add(tmp);
    }
    for (int j = 0; j < s2.length() - 1; j++) {
        char y1 = s2.charAt(j);
        char y2 = s2.charAt(j + 1);
        String tmp = "" + y1 + y2;
        ny.add(tmp);
    }
    Set<String> intersection = new HashSet<String>(nx);
    intersection.retainAll(ny);
    double totcombigrams = intersection.size();
    return (2 * totcombigrams) / (nx.size() + ny.size());
}
My ultimate goal is to output an ID for every section that has a Dice coefficient of greater than 0.9 with another section.
Thanks for any advice that you can provide!
Make a single pass over all the Strings, and build up a HashMap which maps each bigram to a set of the indexes of the Strings which contain that bigram. (Currently you are building the bigram set 900,000 times, redundantly, for each String.)
Then make a pass over all the sets, and build a HashMap of [index,index] pairs to common-bigram counts. (The latter Map should not contain redundant pairs of keys, like [1,2] and [2,1] -- just store one or the other.)
Both of these steps can easily be parallelized. If you need some sample code, please let me know.
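A minimal sketch of the two passes described above, written in Python for brevity rather than Java. The function names and the tiny sample strings are assumptions made for the example, and nothing here is parallelized.

```python
from collections import defaultdict
from itertools import combinations

def bigrams(s):
    """Set of distinct character bigrams in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}

def similar_pairs(strings, threshold=0.9):
    # Pass 1: one bigram set per string, plus bigram -> ids of strings containing it.
    sets = [bigrams(s) for s in strings]
    index = defaultdict(list)
    for sid, grams in enumerate(sets):
        for g in grams:
            index[g].append(sid)

    # Pass 2: count common bigrams per pair. Ids are appended in increasing
    # order, so every key is (i, j) with i < j and no pair is stored twice.
    shared = defaultdict(int)
    for ids in index.values():
        for i, j in combinations(ids, 2):
            shared[(i, j)] += 1

    result = {}
    for (i, j), common in shared.items():
        dice = 2 * common / (len(sets[i]) + len(sets[j]))
        if dice > threshold:
            result[(i, j)] = dice
    return result

pairs = similar_pairs(["night", "nacht", "nights", "night"], threshold=0.5)
print(pairs)
```

Pairs that share no bigram never appear in the map at all, which is where the saving comes from. A bigram that occurs in most strings still produces a very long id list, so the second pass can remain expensive for such bigrams.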
NOTE one thing, though: from the 26 letters of the English alphabet, a total of 26x26 = 676 bigrams can be formed. Many of these will never or almost never be found, because they don't conform to the rules of English spelling. Since you are building up sets of bigrams for each String, and the Strings are so long, you will probably find almost the same bigrams in each String. If you were to build up lists of bigrams for each String (in other words, if the frequency of each bigram counted), it's more likely that you would actually be able to measure the degree of similarity between Strings, but then the calculation of Dice's coefficient as given in the Wikipedia article wouldn't work; you'd have to find a new formula.
I suggest you continue researching algorithms for determining similarity between Strings, try implementing a few of them, and run them on a smaller set of Strings to see how well they work.
You should come up with some kind of inequality like: if D(X1,X2) > 1-p, D(X1,X3) < 1-q and p < q, then D(X2,X3) < 1-q+p. Or something like that. Now, if 1-q+p < 0.9, then you probably don't have to evaluate D(X2,X3).
PS: I am not sure about this exact inequality, but I have a gut feeling that this might be right (but I do not have enough time to actually do the derivations now). Look for some of the inequalities with other similarity measures and see if any of them are valid for Dice co-efficient.
=== Also ===
If set A has a elements and your threshold is r (= 0.9), then the number of elements b in set B must satisfy r*a/(2-r) <= b <= (2-r)*a/r. This follows because the intersection can be no larger than the smaller set, so the coefficient is at most 2*min(a,b)/(a+b). This should eliminate the need for lots of comparisons, IMHO. You can probably sort the strings by the size of their bigram sets (string length is only a rough proxy) and use the window described above to limit comparisons.
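The size window can be turned into a pre-filter along these lines. This is a sketch under the assumption that the bigram-set size of every string is already known; size_window and candidate_pairs are hypothetical helper names.

```python
def size_window(a, r=0.9):
    """Range of set sizes b that can still reach a Dice coefficient of r
    against a set of size a."""
    return r * a / (2 - r), (2 - r) * a / r

def candidate_pairs(sizes, r=0.9):
    """Yield index pairs whose set sizes pass the window test.
    sizes[i] is the bigram-set size of string i."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    for pos, i in enumerate(order):
        upper = (2 - r) * sizes[i] / r
        for k in range(pos + 1, len(order)):
            j = order[k]
            if sizes[j] > upper:
                break  # sorted by size, so every later set is too large as well
            yield (i, j)

low, high = size_window(100)
survivors = list(candidate_pairs([100, 110, 130, 300]))
print(low, high, survivors)
```

Only the surviving pairs need the full coefficient. With r = 0.9 the window is narrow (about 82 to 122 for a set of 100 bigrams), so it prunes well only when the set sizes actually vary.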
Disclaimer first: This will not reduce the number of comparisons you'll have to make. But this should make a Dice comparison faster.
1) Don't build your HashSets every time you do a diceCoefficient() call! It should speed things up considerably if you just do it once for each string and keep the result around.
2) Since you only care about whether a particular bigram is present in the string, you could get away with a BitSet with a bit for each possible bigram, rather than a full HashSet. Coefficient calculation would then be reduced to ANDing two bit sets and counting the number of set bits in the result (plus the two per-string bit counts, which can be cached).
3) Or, if you have a huge number of possible bigrams (Unicode, perhaps?) - or monotonous strings with only a handful of bigrams each - a sorted array of bigrams might provide faster, more space-efficient comparisons.
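Points 1) and 2) can be sketched together like this, using Python integers as bit sets instead of Java's BitSet. The helper names and the growing ids dictionary are assumptions made for the example.

```python
def bigram_bits(s, ids):
    """Encode the set of bigrams in s as an int used as a bit set.
    ids maps each bigram to a bit position and grows as new bigrams appear."""
    bits = 0
    for i in range(len(s) - 1):
        pos = ids.setdefault(s[i:i + 2], len(ids))
        bits |= 1 << pos
    return bits

def popcount(n):
    """Number of set bits in a non-negative int."""
    return bin(n).count("1")

def dice_bits(x, y):
    total = popcount(x) + popcount(y)
    return 2 * popcount(x & y) / total if total else 0.0

ids = {}
a = bigram_bits("night", ids)    # computed once per string, then reused
b = bigram_bits("nights", ids)
print(dice_bits(a, b))
```

In a real run the bit set and its popcount would be stored per string up front, so that each comparison is a single AND plus one bit count.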
Is the charset limited somehow? If it is, you can count the occurrences of each character code in each string and compare these counts. This pre-computation will occupy 2*900K*S bytes of memory (assuming no character occurs more than 65K times in the same string), where S is the number of distinct characters. Computing the coefficient would then take O(S) time. Of course, this only helps if S < 4500.

Ways to calculate similarity

I am doing a community website that requires me to calculate the similarity between any two users. Each user is described with the following attributes:
age, skin type (oily, dry), hair type (long, short, medium), lifestyle (active outdoor lover, TV junky) and others.
Can anyone tell me how to go about this problem or point me to some resources?
Another way is to compute (in R) all the pairwise dissimilarities (distances) between observations in the data set. The original variables may be of mixed types. The handling of nominal, ordinal, and (a)symmetric binary data is achieved by using the general dissimilarity coefficient of Gower (Gower, J. C. (1971) A general coefficient of similarity and some of its properties, Biometrics 27, 857–874). For more check out this on page 47. If x contains any columns of these data-types, Gower's coefficient will be used as the metric.
For example
x1 <- factor(c(10, 12, 25, 14, 29))
x2 <- factor(c("oily", "dry", "dry", "dry", "oily"))
x3 <- factor(c("medium", "short", "medium", "medium", "long"))
x4 <- factor(c("active outdoor lover", "TV junky", "TV junky", "active outdoor lover", "TV junky"))
x <- cbind(x1,x2,x3,x4)
library(cluster)
daisy(x, metric = "euclidean")
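you'll get (cbind() turns the factors into their integer level codes, so the Euclidean distance here is computed on those codes):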
Dissimilarities :
         1        2        3        4
2 2.000000
3 3.316625 2.236068
4 2.236068 1.732051 1.414214
5 4.242641 3.741657 1.732051 2.645751
If you are interested in a method for dimensionality reduction for categorical data (also a way to arrange variables into homogeneous clusters) check this
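To make the idea behind Gower's coefficient concrete, here is a hand-rolled sketch in Python. It is not the cluster package's implementation: it assumes equal weights, no missing values, and that every non-numeric column is nominal. The name gower_matrix is made up for the example. A numeric column contributes the absolute difference divided by the column's range, and a nominal column contributes 0 for a match and 1 for a mismatch.

```python
def gower_matrix(rows, numeric):
    """rows: list of equal-length tuples; numeric: set of column indexes
    treated as numbers (all other columns are treated as nominal)."""
    ncols = len(rows[0])
    ranges = {}
    for k in numeric:
        col = [row[k] for row in rows]
        ranges[k] = (max(col) - min(col)) or 1  # avoid dividing by zero

    def dist(p, q):
        total = 0.0
        for k in range(ncols):
            if k in numeric:
                total += abs(p[k] - q[k]) / ranges[k]
            else:
                total += 0.0 if p[k] == q[k] else 1.0
        return total / ncols  # average over all variables, so 0 <= dist <= 1

    return [[dist(p, q) for q in rows] for p in rows]

users = [
    (10, "oily", "medium", "active outdoor lover"),
    (12, "dry", "short", "TV junky"),
    (25, "dry", "medium", "TV junky"),
    (14, "dry", "medium", "active outdoor lover"),
    (29, "oily", "long", "TV junky"),
]
d = gower_matrix(users, numeric={0})
for row in d:
    print(["%.3f" % v for v in row])
```

Unlike the Euclidean distance on level codes, this does not depend on the arbitrary order in which the category levels happen to be numbered.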
Give each attribute an appropriate weight, and add the differences between values.
enum SkinType
    Dry, Medium, Oily
enum HairLength
    Bald, Short, Medium, Long

UserDifference(user1, user2)
    total := 0
    total += abs(user1.Age - user2.Age) * 0.1
    total += abs((int)user1.Skin - (int)user2.Skin) * 0.5
    total += abs((int)user1.Hair - (int)user2.Hair) * 0.8
    # etc...
    return total
If you really need similarity instead of difference, use 1 / (1 + UserDifference(a, b)), which avoids dividing by zero for identical users.
You should probably take a look at:
Data Mining and Data Warehousing (Essential)
Machine Learning (Extra)
Artificial Neural Networks (Especially SOM)
Pattern Recognition (Related)
These topics will let your program recognize similarities and clusters in your user collection and try to adapt to them...
You can then discover hidden groups of related users... (e.g. users with green hair usually do not like watching TV.)
As a piece of advice, try to use ready-made tools for this feature instead of implementing it yourself...
Take a look at Open Directory Data Mining Projects
Three steps to achieve a simple subjective metric for difference between two datapoints that might work fine in your case:
1) Capture each of your variables in a representative numeric variable, for example: skin type (oily=-1, dry=1), hair type (long=2, short=0, medium=1), lifestyle (active outdoor lover=1, TV junky=-1); age is already a number.
2) Scale all numeric ranges so that they fit the relative importance you give them for indicating difference. For example: an age difference of 10 years is about as different as the difference between long and medium hair, and the difference between oily and dry skin. So 10 on the age scale is as different as 1 on the hair scale and as 2 on the skin scale; therefore scale the difference in age by 0.1, that in hair by 1 and that in skin by 0.5.
3) Use an appropriate distance metric to combine the differences between two people on the various scales into one overall difference. The smaller this number, the more similar they are. I'd suggest the simple quadratic (Euclidean) difference as a first attempt at your distance function.
Then the difference between two people could be calculated with (I assume Person.age, .skin, .hair, etc. have already gone through step 1 and are numeric):
double Difference(Person p1, Person p2) {
    double agescale = 0.1;
    double skinscale = 0.5;
    double hairscale = 1;
    double lifestylescale = 1;
    double agediff = (p1.age - p2.age) * agescale;
    double skindiff = (p1.skin - p2.skin) * skinscale;
    double hairdiff = (p1.hair - p2.hair) * hairscale;
    double lifestylediff = (p1.lifestyle - p2.lifestyle) * lifestylescale;
    // Square by multiplying: ^ is not a power operator in Java or C.
    double diff = Math.sqrt(agediff * agediff + skindiff * skindiff
                            + hairdiff * hairdiff + lifestylediff * lifestylediff);
    return diff;
}
Note that diff in this example is not on a nice scale like (0..1). Its value can range from 0 (no difference) to something large (high difference). Also, this method is almost completely unscientific; it is just designed to quickly give you a working difference metric.
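The three steps can be put together in a short Python sketch. The encodings and scale factors are the ones suggested in the steps; the dictionary layout, the encode helper and the similarity function are assumptions added for the example.

```python
import math

# Step 1: numeric encodings for the categorical attributes.
SKIN = {"oily": -1, "dry": 1}
HAIR = {"short": 0, "medium": 1, "long": 2}
LIFESTYLE = {"active outdoor lover": 1, "TV junky": -1}

# Step 2: scale factors expressing the relative importance of each attribute.
SCALES = {"age": 0.1, "skin": 0.5, "hair": 1.0, "lifestyle": 1.0}

def encode(age, skin, hair, lifestyle):
    return {"age": age, "skin": SKIN[skin], "hair": HAIR[hair],
            "lifestyle": LIFESTYLE[lifestyle]}

def difference(p1, p2):
    # Step 3: Euclidean distance over the scaled per-attribute differences.
    return math.sqrt(sum(((p1[k] - p2[k]) * SCALES[k]) ** 2 for k in SCALES))

def similarity(p1, p2):
    # Hypothetical mapping from [0, inf) to (0, 1]; identical users score 1.0.
    return 1.0 / (1.0 + difference(p1, p2))

alice = encode(25, "oily", "long", "TV junky")
bob = encode(35, "dry", "medium", "TV junky")
print(difference(alice, bob), similarity(alice, bob))
```

For these two sample users the age, skin and hair terms each contribute 1 after scaling and lifestyle contributes 0, so the difference is the square root of 3.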
Look at algorithms for computing string difference. It's very similar to what you need. Store your attributes as a bit string and compute the distance between the strings.
You should read up on these two topics:
The most popular clustering algorithm, k-means
Similarity matrices, which are essential in clustering
