Removing special characters in a string - Python 3

st = 'Lorem ipsum dolor sit amet, consectetur adipis.cing elit. Aliquam sem odio...'
n = []
for i in st:
    n.append(i)
for i in n:
    if i in [',','.']:
        n.remove(i)
string = ''
for i in n:
    string += i
print(string)
input string :
Lorem ipsum dolor sit amet, consectetur adipis.cing elit. Aliquam sem odio...
output :
Lorem ipsum dolor sit amet consectetur adipiscing elit Aliquam sem odio.
expected output :
Lorem ipsum dolor sit amet consectetur adipiscing elit Aliquam sem odio
There is one dot . at the end of the sentence that is not getting removed.

You can use str.join for the task:
st = "Lorem ipsum dolor sit amet, consectetur adipis.cing elit. Aliquam sem odio..."
print("".join(ch for ch in st if ch not in {*",."}))
Prints:
Lorem ipsum dolor sit amet consectetur adipiscing elit Aliquam sem odio

How about using replace() for both commas and periods?
>>> st.replace(",", "").replace(".", "")
'Lorem ipsum dolor sit amet consectetur adipiscing elit Aliquam sem odio'

Calling some_list.remove(something) while iterating over some_list changes the length of the list and can cause elements to be skipped. The solution is to iterate over a copy of the list instead. See this thread.
Also, remove() removes the first occurrence, not the element at the current index, so you may get surprising results using it. Even in the best case, repeatedly scanning the list from the front hurts time complexity. I don't often find remove() useful in practice.
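To make the skipping concrete, here is a minimal sketch that reproduces the leftover dot on a shortened input and compares it with iterating over a copy. The function names remove_in_place and remove_from_copy are made up for this illustration.

```python
def remove_in_place(chars, unwanted):
    # Pattern from the question: the list is mutated while it is being iterated,
    # so the iterator's index runs past elements that shifted left.
    for ch in chars:
        if ch in unwanted:
            chars.remove(ch)
    return "".join(chars)


def remove_from_copy(chars, unwanted):
    # Iterate over a snapshot so removals do not disturb the iteration.
    for ch in list(chars):
        if ch in unwanted:
            chars.remove(ch)
    return "".join(chars)


print(remove_in_place(list("odio..."), {",", "."}))   # one dot survives
print(remove_from_copy(list("odio..."), {",", "."}))  # all dots removed
```

The copy fixes the skipping, but each remove() call still scans the list from the front, so this is shown only to explain the bug.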
I'd write this using a simple regex:
>>> import re
>>> st = 'Lorem ipsum dolor sit amet, consectetur adipis.cing elit. Aliquam sem odio...'
>>> re.sub(r"[.,]", "", st)
'Lorem ipsum dolor sit amet consectetur adipiscing elit Aliquam sem odio'
Other remarks:
This is Shlemiel the painter's algorithm:
for i in n:
    string += i
Better is "".join(n).
n is usually reserved for "number". Prefer lst or L for a generic list.
Allocating a list inside a loop adds unnecessary overhead: if i in [',','.']:.
The code:
n = []
for i in st:
    n.append(i)
can be better expressed as list(st).
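Putting these remarks together, the original loop-based approach might be rewritten roughly as follows. This is only a sketch; the names unwanted, chars and kept are chosen here for illustration.

```python
st = "Lorem ipsum dolor sit amet, consectetur adipis.cing elit. Aliquam sem odio..."

unwanted = {",", "."}  # built once, outside any loop

chars = list(st)  # replaces the manual append loop
kept = [ch for ch in chars if ch not in unwanted]  # new list, no remove() calls
result = "".join(kept)  # replaces repeated string concatenation

print(result)
```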

Related

How to format long text file with spaces?

I have a .txt file that contains text that is formatted like so:
Neque porro, quisquam est qui
dolorem ipsum quia, dolor (sit amet)
consectetur, adipisci velit,Lorem Ipsum
dolor sit amet, consectetur adipiscing elit
tempor ipsum quia, minim (sit minim)
consectetur, adipisci velit,Lorem Ipsum
There are multiple text items like this. I want each item on a single line so I can paste them into Excel, like so:
Neque porro, quisquam est qui dolorem ipsum quia, dolor (sit amet) consectetur, adipisci velit, Lorem Ipsum
dolor sit amet, consectetur adipiscing elit tempor ipsum quia, minim (sit minim) consectetur, adipisci velit,Lorem Ipsum
Would there be any way to do this for files with a lot of text that are like this?
If you do want to use Excel (as your question indicates), this formula works if you have all the text in a single cell:
=SUBSTITUTE(SUBSTITUTE(A1,CHAR(10)&" "," "),char(10),REPT(char(10),2))
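Outside Excel, the same two substitutions could be sketched in Python. This mirrors the formula only, so it carries the formula's assumption that every continuation line starts with a single space; the function name join_wrapped_lines and the sample string are made up for illustration.

```python
def join_wrapped_lines(text: str) -> str:
    # Inner SUBSTITUTE: a newline followed by a space is treated as a wrapped line.
    joined = text.replace("\n ", " ")
    # Outer SUBSTITUTE: double each remaining newline to separate the items.
    return joined.replace("\n", "\n\n")


sample = (
    "Neque porro, quisquam est qui\n"
    " dolorem ipsum quia, dolor (sit amet)\n"
    "dolor sit amet, consectetur adipiscing elit\n"
    " tempor ipsum quia, minim (sit minim)"
)
print(join_wrapped_lines(sample))
```

If the file separates items some other way (for example with blank lines), the replacement strings would need to change accordingly.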

Regex to find Text between different chars

I'm looking for a fast way to find the words that start with "TEST-" in a huge text:
Lorem ipsum dolor sit amet, consectetur adipisici elit "TEST-12345"
Lorem ipsum dolor sit amet, consectetur adipisici elit 'TEST-12345'
Lorem ipsum dolor sit amet, consectetur adipisici elit " TEST-12345 "
Lorem ipsum dolor sit amet, consectetur adipisici elit "TEST-12345" sed diam nonumy eirmod tempor invidunt
Lorem ipsum dolor sit amet, consectetur adipisici elit " TEST-12345 "
I have tried to achieve this with different loops, but I'm getting "Unexpected identifier" every time:
var text = 'Lorem ipsum dolor sit amet, consectetur adipisici elit "TEST-12345"\
Lorem ipsum dolor sit amet, consectetur adipisici elit \'TEST-12345\'\
Lorem ipsum dolor sit amet, consectetur adipisici elit " TEST-12345 "\
Lorem ipsum dolor sit amet, consectetur adipisici elit "TEST-12345" sed diam nonumy eirmod tempor invidunt\
Lorem ipsum dolor sit amet, consectetur adipisici elit " TEST-12345 "'.split("\\")
for (var x in text) {
  console.log(text[x])
}
Use the .match method:
text.match(/TEST-\d{5}/g)
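For comparison, the same pattern can be applied in Python with re.findall. This is a sketch on a shortened sample; like the pattern above, it assumes the identifier is always "TEST-" followed by exactly five digits.

```python
import re

text = '''Lorem ipsum dolor sit amet, consectetur adipisici elit "TEST-12345"
Lorem ipsum dolor sit amet, consectetur adipisici elit 'TEST-12345'
Lorem ipsum dolor sit amet, consectetur adipisici elit " TEST-12345 "'''

# findall returns every non-overlapping match, like the /g flag in JavaScript.
matches = re.findall(r"TEST-\d{5}", text)
print(matches)
```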

Generating all Possible Combinations of a Sentence with Variables

Say I have a sentence with multiple variables as follows:
"lorem ipsum {a, b} dolor {c, d, e} sit amet"
Assuming the letters in the braces are variables, how would one go about generating a group of sentences out of all possible combinations of the variables?
Note:
The number of variable groups or variable count within each group of variables is unknown.
The expected output for this particular example would be:
"lorem ipsum {a} dolor {c} sit amet"
"lorem ipsum {b} dolor {c} sit amet"
"lorem ipsum {a} dolor {d} sit amet"
"lorem ipsum {b} dolor {d} sit amet"
"lorem ipsum {a} dolor {e} sit amet"
"lorem ipsum {b} dolor {e} sit amet"
In the general case ("number of variable groups... count within each group ... is unknown") we should parse the initial string (let's do it with the help of regular expressions) and then enumerate all the combinations.
C# Code:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
...
private static IEnumerable<string> Generator(string source) {
  // parsing: variables extracted: array of variables and their possible values
  string[][] variables = Regex
    .Matches(source, @"\{.*?\}")
    .OfType<Match>()
    .Select(match => match
      .Value
      .Trim('{', '}')
      .Split(',')
      .Select(item => "{" + item.Trim() + "}")
      .ToArray())
    .ToArray();
  // now we should enumerate all possible variables' values
  int[] indexes = new int[variables.Length];
  do {
    // code golf : ugly side effects but short code
    int at = 0;
    yield return Regex.Replace(source, @"\{.*?\}", match => variables[at][indexes[at++]]);
    for (int i = 0; i < indexes.Length; ++i)
      if (indexes[i] < variables[i].Length - 1) {
        indexes[i] = indexes[i] + 1;
        break;
      }
      else
        indexes[i] = 0;
  }
  while (!indexes.All(index => index == 0));
}
Demo:
string source = @"lorem ipsum {a, b} dolor {c, d, e} sit amet";
string report = string.Join(Environment.NewLine, Generator(source));
Console.Write(report);
Outcome:
lorem ipsum {a} dolor {c} sit amet
lorem ipsum {b} dolor {c} sit amet
lorem ipsum {a} dolor {d} sit amet
lorem ipsum {b} dolor {d} sit amet
lorem ipsum {a} dolor {e} sit amet
lorem ipsum {b} dolor {e} sit amet
Another example:
// 3 groups of variables with strange names
string source = @"lorem ipsum {A + 2, B, C?} dolor {XY, PQR} sit {eh?, bla-bla-bla} amet";
Console.Write(string.Join(Environment.NewLine, Generator(source)));
Outcome:
lorem ipsum {A + 2} dolor {XY} sit {eh?} amet
lorem ipsum {B} dolor {XY} sit {eh?} amet
lorem ipsum {C?} dolor {XY} sit {eh?} amet
lorem ipsum {A + 2} dolor {PQR} sit {eh?} amet
lorem ipsum {B} dolor {PQR} sit {eh?} amet
lorem ipsum {C?} dolor {PQR} sit {eh?} amet
lorem ipsum {A + 2} dolor {XY} sit {bla-bla-bla} amet
lorem ipsum {B} dolor {XY} sit {bla-bla-bla} amet
lorem ipsum {C?} dolor {XY} sit {bla-bla-bla} amet
lorem ipsum {A + 2} dolor {PQR} sit {bla-bla-bla} amet
lorem ipsum {B} dolor {PQR} sit {bla-bla-bla} amet
lorem ipsum {C?} dolor {PQR} sit {bla-bla-bla} amet
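The same parse-then-enumerate idea can be sketched in Python, with itertools.product standing in for the manual index counter. This is an illustration under the same assumptions as the code above (groups are delimited by braces and values are separated by commas); the names GROUP and generate are made up here.

```python
import itertools
import re

GROUP = re.compile(r"\{(.*?)\}")


def generate(source):
    # Parse: one list of trimmed values per {...} group, in order of appearance.
    groups = [
        [item.strip() for item in match.group(1).split(",")]
        for match in GROUP.finditer(source)
    ]
    # product() varies the last argument fastest; feeding it the groups in
    # reverse makes the first group vary fastest, as in the expected output.
    for combo in itertools.product(*reversed(groups)):
        values = iter(reversed(combo))
        # Replace each group, left to right, with its chosen value.
        yield GROUP.sub(lambda match: "{" + next(values) + "}", source)


for sentence in generate("lorem ipsum {a, b} dolor {c, d, e} sit amet"):
    print(sentence)
```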
So basically you want to iterate over two different arrays to get all possible combinations of a single value from each array. Nested loops are probably the best option.
Here's some C# code to do that, with comments on each line for easy translation to other languages:
var values0 = new string[] {"a", "b"}; // All possible values for first slot
var values1 = new string[] {"c", "d", "e"}; // All possible values for second slot
foreach(var val0 in values0) // Iterate first array
{
    foreach(var val1 in values1) // Iterate second array
    {
        var result = $"Lorem ipsum {val0} dolor {val1} sit amet"; // Insert values to slots
        Console.WriteLine(result); // output
    }
}
Result:
Lorem ipsum a dolor c sit amet
Lorem ipsum a dolor d sit amet
Lorem ipsum a dolor e sit amet
Lorem ipsum b dolor c sit amet
Lorem ipsum b dolor d sit amet
Lorem ipsum b dolor e sit amet

LilyPond: formatting long footnotes

When writing a long footnote with LilyPond 2.17.25, the text is not breaking into several lines or respecting the margin limits. I would love to have it set to justified alignment as well, if that is possible.
Here is a tiny example:
\version "2.17.25"
{
  \footnote #'(-1 . 1)
  \markup{Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut eget ante venenatis mi consectetur ornare. Cras facilisis dictum venenatis. Donec.}
  a'4 b' c'' d''
}
Thanks a lot!
The solution is to simply add \justify or \wordwrap to the \markup command, as:
\version "2.17.25"
{
  \footnote #'(-1 . 1)
  \markup\justify{Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut eget ante venenatis mi consectetur ornare. Cras facilisis dictum venenatis. Donec.}
  a'4 b' c'' d''
}

How to cut some text?

I have long section titles in my document like:
Lorem ipsum dolor sit amet, consectetur adipisicing elit. Proin nibh augue, suscipit a, scelerisque sed, lacinia in, mi.
Now I want to place it in the page header, but it is too long for that. Is there any way to cut text in LaTeX? I want it to look like this:
Lorem ipsum dolor sit amet, consectetur...
Is that possible?
https://tex.stackexchange.com/questions/6862/how-can-i-display-a-short-chapter-name-in-the-header-and-a-long-chapter-name-in-t
