Random Characters in string python - python-3.x

How can I make a program that will take a sentence like
"i can take you places you've never been before" (check the photo, as it won't let me add the code here) and add extra random characters and words that I choose beforehand, in random places?
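A minimal sketch of one possible approach, assuming the extra words and characters are supplied as lists (the specific values below are placeholders chosen for illustration):

import random

def scramble(sentence, extra_words, extra_chars, n_insertions=3):
    # Split into words, then insert randomly chosen extras at random positions.
    tokens = sentence.split()
    for _ in range(n_insertions):
        extra = random.choice(extra_words + extra_chars)
        position = random.randint(0, len(tokens))
        tokens.insert(position, extra)
    return " ".join(tokens)

print(scramble("i can take you places you've never been before",
               ["banana", "pickle"], ["#", "7", "%"]))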

Related

How to find strings of a list in a text with typos

I'm trying to check whether the strings in a list appear in a given text. But the given text can have some typos. For example, let's take this.
text: The brownw focx and the cat are in th eforest.
and my list is: [brown fox, forest, cat]
What I actually do is separate my text into multiple groups, groups of one word and two words, like so:
[The, brownw, focx, and, the, cat, are, in, th, eforest, The brownw, brownw focx, focx and, and the, the cat, cat are, are in, in th, th eforest]
Then I iterate over each group of words and use the Levenshtein algorithm to check how closely the two strings match. If they match more than 90%, I consider them the same.
This approach, however, is very time-consuming, and I wonder if there is an alternative.
Instead of using the full Levenshtein distance (which is slow to compute), you could do a couple of sanity checks beforehand, to try and exclude candidates which are obviously wrong:
Word length: the will never match brown fox, as it is far too short. Count the word length, and exclude all candidates that are more than a few letters shorter or longer.
Letters: just check which letters are in the word. For example, the does not contain a single letter from fox, so you can rule it out straight away. With short words this might not make a big difference in performance, but for longer words it will. An additional optimisation: look for rare characters (x, q, w) first, or simply ignore common ones (e, t, s) which are more likely to be present anyway.
Heuristics such as these will of course not give you the right answer, but they can help to filter out those that are definitely not going to match. Then you only need to perform the more expensive full check on a much smaller number of candidate words.
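A rough sketch of such a prefilter, assuming a length tolerance and a minimum number of shared letters (both thresholds are arbitrary choices for illustration):

def plausible(candidate, target, max_len_diff=3, min_shared=2):
    # Length check: rule out candidates far shorter or longer than the target.
    if abs(len(candidate) - len(target)) > max_len_diff:
        return False
    # Letter check: require at least a couple of characters in common.
    if len(set(candidate) & set(target)) < min_shared:
        return False
    return True

groups = ["The", "brownw", "focx", "the cat", "brownw focx", "th eforest"]
survivors = [g for g in groups if plausible(g, "brown fox")]
# Only the survivors need the expensive Levenshtein comparison.
print(survivors)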

Make a model to identify a string

I have a string like this
ODQ1OTc3MzY0MDcyNDk3MTUy.YKoz0Q.wlST3vVZ3IN8nTtVX1tz8Vvq5O8
The first part of the string is a random 18-digit number in base64 format, the second is a Unix timestamp in base64 too, and the last is an HMAC.
I want to make a model to recognize a string like this.
How may I do it?
While I have not necessarily thought deeply about it, this is what comes to my mind first.
You certainly don't need machine learning for this. In fact, machine learning would not only be inefficient for a problem like this, but may even perform worse, depending on the approach.
Here, an exact solution can be achieved, simply by understanding the problem.
One way people often go about matching strings with a certain structure is with so-called regular expressions, or RegExp.
Regular expressions allow you to match string patterns of varying complexity.
To give a simple example in Python:
import re
your_string = "ODQ1OTc3MzY0MDcyNDk3MTUy.YKoz0Q.wlST3vVZ3IN8nTtVX1tz8Vvq5O8"
regexp_pattern = r"(.+)\.(.+)\.(.+)"
re.findall(regexp_pattern, your_string)
>>> [('ODQ1OTc3MzY0MDcyNDk3MTUy', 'YKoz0Q', 'wlST3vVZ3IN8nTtVX1tz8Vvq5O8')]
Now one problem with this is how you know where your string starts and stops. Most of the time there are certain anchors, especially in strings that were created programmatically. For instance, if we knew that the word Token: appears immediately before each string you want to match, you could include that in your RegExp pattern: r"Token: (.+)\.(.+)\.(.+)".
Other ways to avoid mismatches would be to define the pattern requirements more clearly. Right now we simply match a pattern with any number of characters and two . separating them into three sequences.
If you knew which implementation of base64 you were using, you could limit the alphabet of potential characters from . (thus any) to the alphabet used by your base64 implementation. If, for example, that alphabet were abcdefgh1234, the pattern could be refined to r"([abcdefgh1234]+)\.([abcdefgh1234]+)\.(.+)".
The same applies to the HMAC code.
Furthermore, you could specify the allowed length of each substring.
For instance, you said you have 18 random digits. This likely means each is encoded as 1 byte, which translates to 18*8 = 144 bits, which in base64 corresponds to 24 tokens (each token encodes a sextet, i.e. 6 bits of information). The same could be done with the timestamp: assuming a 32-bit timestamp, this would likely require 6 base64 tokens (representing 36 bits, since 32 cannot be divided evenly into sextets).
With this information, you could further refine the pattern to
r"([abcdefgh1234]{24})\.([abcdefgh1234]{6})\.(.+)".
In addition, the same could be applied to the HMAC code.
I leave it to you to read a bit about RegExp but I'd guess it is the easiest solution and certainly more appropriate than any kind of machine learning.
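To make that concrete, here is how the refined pattern might look with a real base64-style alphabet (the exact character class and length limits below are my own assumptions; adjust them to whatever alphabet and lengths your encoder actually uses):

import re
# Assumed alphabet: standard base64 plus the URL-safe characters - and _.
pattern = r"([A-Za-z0-9+/_-]{24})\.([A-Za-z0-9+/_-]{6})\.([A-Za-z0-9+/_-]+)"
token = "ODQ1OTc3MzY0MDcyNDk3MTUy.YKoz0Q.wlST3vVZ3IN8nTtVX1tz8Vvq5O8"
re.findall(pattern, token)
>>> [('ODQ1OTc3MzY0MDcyNDk3MTUy', 'YKoz0Q', 'wlST3vVZ3IN8nTtVX1tz8Vvq5O8')]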

Keeping Count Of Words

I am writing a program to keep count of 'good' and 'bad' words. The program is using two text files, one with good words and one with bad words, to detect the score. I currently have the following:
...
The program runs without errors, but I can't get it to keep count of the score. I'm not sure what's wrong.
There are no obvious errors in the code. Here are some things to check:
1) Do the lines in the pos/neg file have just one word? If not, they need to be split.
2) Is the case the same? If not, be sure to casefold both the target words and the input text.
3) Use of str.split() usually isn't the best way to split natural text that might contain punctuation. Consider something like re.findall(r"[A-Za-z\'\-]+", text).
4) You will get much better lookup performance if the pos/neg words are stored in sets rather than lists.
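A rough sketch that puts points 1-4 together (the file names, sample text, and scoring rule are placeholders, since the original code isn't shown):

import re

# Hypothetical file names; one word per line is assumed.
with open("positive.txt") as f:
    good = {line.strip().casefold() for line in f if line.strip()}
with open("negative.txt") as f:
    bad = {line.strip().casefold() for line in f if line.strip()}

text = "This movie was great, but the ending was awful"
words = re.findall(r"[A-Za-z'-]+", text.casefold())

# Score: +1 for every good word, -1 for every bad word.
score = sum(w in good for w in words) - sum(w in bad for w in words)
print("Score:", score)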

Password safety

Many of you probably know this image. Some would argue that it's very vulnerable to dictionary attacks, so I wrote my own password generator. I'm using the default 2048-word list from the Bitcoin passphrase scheme, additionally adding a single number, symbol, and uppercase letter.
If using a password manager (e.g. Lastpass), I can easily create something like this:
#guwV&zMot#v2YJE0^!x5vCnd1M$r&MNCvJgP*j7k3v6F5nM#hf1jUITOl^HP&D9
But entering this string on your smartphone isn't fun at all. Here are some examples of what my generated passwords look like (source code):
pull 2 mule slot F milk lecture treat crew _ dry amateur pyramid
hole # agree domain 9 execute exist loud column bounce G another
pledge 4 moral rely slow _ fork M tired fame razor cradle derive
report ! crop addict choice fiction fashion 9 mail W sun weather
My thought is this:
The attacker tries a dictionary attack, without success because of the extra characters. The next try would be brute force (lowercase letters). After that he needs to try uppercase letters, numbers, and symbols too. Because he doesn't know the length of my password or the quantity of numbers/symbols/uppercase letters, he has to try all possible combinations.
So, given that the attacker doesn't know the pattern or length of the password, and given how easily the dictionary, characters, numbers and symbols can be swapped out or modified, it should be as strong as the one Lastpass generated. Is my idea correct, or am I missing something and these passwords aren't that much safer either?
(Edit) Updated the source code: replaced $RANDOM with a function which generates a random number from /dev/random, added some comments and variable names are clearer now.
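For reference, a minimal Python sketch of the same idea (the original generator is a shell script; the word-list file name, the number of words, the symbol set, and the placement of the extra number/symbol/uppercase letter are all assumptions here):

import secrets
import string

# Assumed: a 2048-entry word list, one word per line.
with open("wordlist.txt") as f:
    words = [line.strip() for line in f if line.strip()]

parts = [secrets.choice(words) for _ in range(10)]
# Insert one digit, one symbol and one uppercase letter at random positions.
for extra in (secrets.choice(string.digits),
              secrets.choice("!#$%^&*_"),
              secrets.choice(string.ascii_uppercase)):
    parts.insert(secrets.randbelow(len(parts) + 1), extra)

print(" ".join(parts))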

Extracting information in a string

I would like to parse strings with an arbitrary number of parameters, such as P1+05 or P2-01, all put together like P1+05P2-02. I can get that data from strings with a rather large (too much to post around...) IF tree and a variable keeping track of the position within the string. When reaching a key letter (like P) it knows how many characters to read and proceeds accordingly, nothing special. In this example, say I have two players in a game and I want to give +05 and -01 health to players 1 and 2, respectively (hence the + and -; I want the strings to be somewhat readable).
It works, but I feel this could be done better. I am using Lua to parse the strings, so maybe there is some built-in function within Lua to ease that process? Or maybe some general hints, or references for better approaches?
Here is some code:
for w in string.gmatch("P1+05P2-02", "%u[^%u]+") do
  print(w)
end
It assumes that each "word" begins with an uppercase letter and its parameters contain no uppercase letters.
