Splitting string into tuple - string

Given a string in this format: "Here is some random text, like 5/4=3./12/3/4" (so "Again some spaces\n/12/12/12" would count as well), I have to convert it into a four-part tuple, such as (12,3,4,"Here is some random text, like 5/4=3"), where the three numbers are converted to integers. How would I go about doing this using split and slicing?
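One possible sketch, assuming the three numbers are always the last three "/"-separated fields of the string. Splitting from the right with a limit of three leaves any earlier slashes in the free text (such as "5/4=3") untouched. The function name to_tuple is made up for illustration, and the text part is returned exactly as it appears before the last three slashes.

```python
def to_tuple(s):
    # Split from the right at most three times, so slashes inside the
    # free text are not treated as separators.
    text, a, b, c = s.rsplit('/', 3)
    return (int(a), int(b), int(c), text)


print(to_tuple("Again some spaces\n/12/12/12"))
# (12, 12, 12, 'Again some spaces\n')
```

If the text needs trimming (for example a trailing character before the first number), slicing the first field afterwards would be one way to do it.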

Related

Format in Python

I have a list of values as follows:
no  column
1.  111-222-11
2.  112-333-12
3.  113-444-13
I want to format the value from 111-222-11 to 111-222-011 and format the other values similarly. Here is my code snippet in Python 3, which I am trying to use for that:
'{:03}-{:06}-{:03}'.format(column)
I hope that you can help.
Assuming that column is a variable that can be assigned string values 111-222-11, 112-333-12, 113-444-13 and so on, which you want to change to 111-222-011, 112-333-012, 113-444-013 and so on, it appears that you tried to use a combination of slice notation and format method to achieve this.
Slice notation
Slice notation, when applied to a string, treats it as a list-like sequence of characters. The positional index of a character counted from the beginning of the string starts at zero. The positional index of a character counted from the end of the string starts at -1. The colon : separates the beginning and the end of a slice. The beginning of the slice is included in it; the end is not. You write slices in square brackets, just as you would write the index of an item in a list:
'111-222-11'[0:8]
would return
'111-222-'
When a slice starts at the first character or runs through the last one, the corresponding index is usually omitted and implied by the colon.
Knowing the exact position where you need to add a leading zero before the last two digits of a string assigned to column, you could do it just with slice notation:
column[:8] + '0' + column[-2:]
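As a small check of the slice boundaries described above, here is a sketch that assumes every value has the same fixed layout (eight characters, then two digits). The variable names head, tail and padded are only illustrative.

```python
column = '111-222-11'

head = column[:8]    # characters at indexes 0 through 7: '111-222-'
tail = column[-2:]   # the last two characters: '11'
padded = head + '0' + tail

print(padded)  # 111-222-011
```

If the groups can vary in length, fixed indexes like these would no longer line up, and splitting on the separator is the safer route.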
format method
The format method is a string formatting method. It is called on a string, so the template has to be wrapped in single or double quotes:
'your output string here'.format('your input string here')
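The numbers in the curly brackets are not slices. The brackets are placeholders where the values passed to the format method are inserted. In '{0}', the number is the positional index of the argument; in '{:03}', the part after the colon is a format specification (zero-pad to a width of 3) that applies to a single argument. So, combining slices and the format method, you could add a leading zero before the last two digits of a column string like this: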
'{0}0{1}'.format(column[:8], column[-2:])
Making more slices is not necessary because there is only one place where you want to insert a character.
split method
An alternative to slicing is to use the split method to split the string on a delimiter. The split method returns a list of strings. You need to prefix that list with the * operator to unpack it into separate arguments before passing them to the format method. Otherwise, the whole list is passed to the first placeholder.
'{0}-{1}-0{2}'.format(*column.split('-'))
This splits the string into a list, treating - as the separator, and inserts each item into a new string that has a 0 character before the last item.
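A variant of the same idea, sketched under the assumption that each value has exactly three groups separated by "-" and that the last group is numeric: converting the last group to int lets a zero-padding format specification do the work, so values whose last group already has three digits are left alone. The function name pad_last_group is hypothetical.

```python
def pad_last_group(value):
    # Unpack the three '-'-separated groups; raises ValueError if the
    # value does not have exactly three groups.
    first, second, last = value.split('-')
    # {:03d} zero-pads the integer to a minimum width of three digits.
    return '{}-{}-{:03d}'.format(first, second, int(last))


for value in ['111-222-11', '112-333-12', '113-444-13']:
    print(pad_last_group(value))
```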

Get the index of the first occurrence of integer in a string in Pandas

I would like to ask what the best approach would be to find the index of the first occurrence of an integer in a string using Pandas.
I have this sample code,
df["column"] = "sample code is 1234 just like that 6789"
My goal is to be able to separate "sample code is" and "1234 just like that 6789". And to do that I have to determine where to separate the string, i.e. to look for the first occurrence of an integer.
I expect this result,
df["column1"] = sample code is
df["column2"] = 1234 just like that 6789
I used this code,
df["column"].str.find(r'[0-9]')
But, it returns -1 (False).
split
df[['column1', 'column2']] = df.column.str.split(r'\s*(?=\d)', n=1, expand=True)
df
                                    column         column1                   column2
0  sample code is 1234 just like that 6789  sample code is  1234 just like that 6789
Details
df.column.str.split takes three arguments here:
A regex pattern that matches whitespace of zero to arbitrary length that is followed by a digit. Note that the digit itself is not part of the split separator.
# The (?=\d) is a lookahead pattern
'\s*(?=\d)'
The second argument, 1 (the n parameter), specifies how many splits to perform.
The third argument, expand=True, states that the pieces should be returned as separate columns of a dataframe.
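The pattern can be tried outside pandas with the standard re module. This is only a sketch on a single string, not the pandas call itself. It also shows one way to get the index the question asks about: a regex search, since str.find looks for a literal substring rather than a pattern, which is the likely reason it returned -1.

```python
import re

text = "sample code is 1234 just like that 6789"

# Index of the first digit; re.search returns None when there is no digit.
match = re.search(r'\d', text)
first_digit = match.start() if match else -1

# Split once on optional whitespace that is followed by a digit.
parts = re.split(r'\s*(?=\d)', text, maxsplit=1)

print(first_digit)  # 15
print(parts)        # ['sample code is', '1234 just like that 6789']
```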

Generate partial strings which have predefined minimum lengths (Matlab)

I have an initial string Init={ABCDEFGH}. How can I randomly generate 100 partial strings from Init that satisfy these conditions:
A predefined minimum length.
The elements in each partial string are in order from 'A' to 'Z'.
No repeated characters in each partial string.
The expected output should be as follows: 100 partial strings, with a minimum length of 5 for each partial string
Output = {'BCEGH';'ACEFG';'ABCDEF';'BCFGH';'BCDEG';....;'ABEFH';'ABCEGH'}
numel(Output) = 100
To do this, I started by generating random numbers for the length of each partial string. Then I generated random numbers corresponding to each letter in each string. Then I transferred those numbers into their corresponding letters. The comments should explain the rest.
n=100 %// how many samples to take
C='ABCDEFGH' %// take samples from these letters
maxL=numel(C) %// the longest string
minL=5 %// the shortest string
len=randi([minL maxL],[n 1]) %// generate length of each partial string
arrayfun(@(l) C(randsample(1:8,l)),len,'uni',0) %// randomly sample letters to give strings of correct length
and n=4 gives, for example
ans =
'CFHABEDG'
'CFHABE'
'FAHBE'
'DGHFABE'
I'm not sure this is truly random because it assumes that there are the same number of strings of each length, but I don't think this is true. I think len should be weighted with respect to the number of strings of each length. I think (but I'm not sure) that this should fix that:
for i=1:(maxL-minL+1)
w(i)=factorial(minL-1+i)*nchoosek(maxL,minL-1+i);
end
len=minL-1+randsample(1:(maxL-minL+1),n,true,w./sum(w))
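For comparison, here is a rough Python sketch of the weighting idea (it is not a translation of the Matlab code above). It assumes the ordering condition from the question is enforced, so each subset of letters corresponds to exactly one valid string; in that case the weight for length k is the number of combinations alone, without the factorial term. The function name and its parameters are made up for illustration.

```python
import math
import random


def random_partial_strings(letters="ABCDEFGH", min_len=5, n=100, seed=None):
    rng = random.Random(seed)
    lengths = list(range(min_len, len(letters) + 1))
    # Number of alphabetically ordered strings of each length:
    # one per combination of letters.
    weights = [math.comb(len(letters), k) for k in lengths]
    result = []
    for _ in range(n):
        # Pick a length in proportion to how many strings have that length.
        k = rng.choices(lengths, weights=weights)[0]
        # Sample k distinct positions, then sort them to keep A-to-Z order.
        positions = sorted(rng.sample(range(len(letters)), k))
        result.append("".join(letters[i] for i in positions))
    return result


print(random_partial_strings(n=4))
```

Sampling positions without replacement guarantees there are no repeated characters, and sorting the positions keeps the letters in their original order.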

Lexicographically larger strings

I'm trying to understand the concept of lexicographically larger or smaller strings. My book gives some examples of strings that are lexicographically larger or smaller than each other and an intermediary string that is between the two in size.
string 1: a
string 2: c
intermediary string: b
string 1: aaa
string 2: zzz
intermediary string: yyy
string 1: abcdefg
string 2: abcdefh
intermediary string: (none)
I'm not sure what the requirement is for a string to be lexicographically in between the two strings. Is it that every letter of the intermediary string has to have a larger ASCII value than the corresponding letter of the first string and a smaller ASCII value than the corresponding letter of the second string?
For example, "bcdefg" is the intermediary string between "abcdef" and "cdefgh". Can "stuvx" be the intermediary between "stuvw" and "stuvy"?
Lexicographical ordering simply means dictionary ordering. I say "simply" but there may actually be all sorts of wonderful edge cases such as how you treat apostrophes, what you do with diphthongs, whether you "fold" accented letters into the unaccented ones, such as transforming {À,Á,Â,Ã,Ä} -> A. All these rules on how you collate letters will affect the ordering of words as well.
English is fairly easy if you restrict yourself to the twenty-six actual letters of the alphabet. You can consider a word to be "lesser" than another word if, in the first character position that is different between the two, the character from the first word comes before that of the second.
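And, in fact, there is a solution to the third example, provided the intermediary string doesn't have to be the same length as the others. A string that extends another sorts after it, so "abcdefga" comes after "abcdefg", and it still comes before "abcdefh" because g comes before h in the seventh position: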
string 1: abcdefg
string 2: abcdefh
intermediary string: abcdefga
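The examples can be checked with Python's built-in string comparison, which compares character by character using code points; for lowercase ASCII letters that matches the dictionary ordering described above. The helper name is_between is only illustrative.

```python
def is_between(low, mid, high):
    # Chained comparison: True only if low < mid and mid < high.
    return low < mid < high


print(is_between("a", "b", "c"))                    # True
print(is_between("aaa", "yyy", "zzz"))              # True
print(is_between("abcdefg", "abcdefga", "abcdefh")) # True
print(is_between("stuvw", "stuvx", "stuvy"))        # True
```

Only the first differing position matters, so the intermediary string does not need every letter to fall between the corresponding letters of the other two.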

Python 3 - Adding string numbers to a list

So if the user input is:
user = input('Enter numbers here in the following format "10 12 14": ')
and the user then enters the numbers 10 11 12 in exactly that way (not separated by commas, all in one string, separated by spaces), how can I add the numbers to a list and convert them to int instead of str?
One approach is the following:
list(map(int,input('Enter Numbers: ').split()))
This prompts the user for numbers, and input returns whatever was typed as a single str object. The split call at the end, with no arguments, splits that string on whitespace and returns a list containing those numbers. They are still str objects at this point.
Next, the map function accepts a function and an iterable, and applies the function to each item. In this case, it converts each str in the list to an int.
The last step would be to convert the map object we had created into a list via the list() call.
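The same steps can be sketched as a small function that takes the text as a parameter instead of calling input, which makes it easier to try out. The name parse_numbers is hypothetical.

```python
def parse_numbers(text):
    # split() with no arguments splits on any run of whitespace,
    # map applies int to each piece, and list collects the results.
    return list(map(int, text.split()))


print(parse_numbers("10 12 14"))  # [10, 12, 14]
```

Any token that is not a valid integer, such as "10," with a trailing comma, makes int raise a ValueError.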
