Convert a string to a set without splitting the characters - python-3.x

I have a quick question: a='Tom' is of type str. I want to make it into a set with one item. If I use the command b = set(a), I get a set with 3 items in it, set(['m', 'T', 'o']). I want set(['Tom']). How can I get it? Thanks.

The set builtin makes sets out of iterables. Iterating over a string yields each character one-by-one, so wrap the string in some other iterable:
set(['Tom'])
set(('Tom',))
If you're used to the mathematical notation for sets, you can just use curly braces (don't confuse this with the notation for dictionaries; {} on its own is an empty dict, not an empty set):
{'Tom'}
{'Tom', 'Bob'}
The resulting sets are equivalent:
>>> {'Tom'} == set(['Tom']) == set(('Tom',))
True
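A minimal sketch of the behaviour described above: set() iterates over its argument, so a bare string is split into characters, while wrapping it (or using a set literal) keeps it whole. The variable names here are illustrative only.

```python
name = 'Tom'

# set() iterates over its argument, and iterating a str yields characters.
chars = set(name)

# Wrapping the string in a one-element list or tuple, or using a set
# literal, gives a set whose single element is the whole string.
from_list = set([name])
from_tuple = set((name,))
literal = {name}

print(sorted(chars))                                  # ['T', 'm', 'o']
print(from_list == from_tuple == literal == {'Tom'})  # True

# {} on its own is an empty dict, not an empty set; use set() for that.
print(type({}).__name__, type(set()).__name__)        # dict set
```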

set(['Tom'])
You just answered your own question (pass a list instead of a string).

Like this:
a = "Tom"
b = set([a])
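If a variable may hold either a single string or a collection of strings, one option is to special-case str before calling set(). The helper below, to_set, is a hypothetical name used only for this sketch; it assumes you want a lone str (or bytes) kept whole and any other iterable expanded.

```python
from collections.abc import Iterable

def to_set(value):
    """Hypothetical helper: keep a lone str/bytes as one item,
    expand any other iterable, and wrap non-iterables."""
    if isinstance(value, (str, bytes)):
        return {value}          # do not split the string into characters
    if isinstance(value, Iterable):
        return set(value)       # e.g. a list of names
    return {value}              # e.g. an int

print(to_set("Tom"))                   # {'Tom'}
print(sorted(to_set(["Tom", "Bob"])))  # ['Bob', 'Tom']
```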

Related

Remove part of string (regular expressions)

I am a beginner in programming. I have strings such as "test:1" and "test:2", and I want to remove ":1" and ":2" (including the colon). How can I do it using a regular expression?
Hi Andrew, it's pretty easy. You can index into a string much like an array of chars (letters), but JavaScript strings are immutable, so assigning to length does not shorten them. If the part of the string you want to delete is always at the end of the string and always the same length, take a slice that leaves off the last two characters:
var exampleString = 'test:1';
exampleString = exampleString.slice(0, -2); // 'test'
That's it: you just removed the last two characters (letters) of the string.
If you can't be sure it's always at the end, or how many characters to delete, you'd need to use szymon's version.
There are at least a few ways to do it with Groovy. If you want to stick to regular expressions, you can pass the expression ^([^:]+) (which means all characters from the beginning of the string up to the first :) to the StringGroovyMethods.find(regexp) method, e.g.
def str = "test:1".find(/^([^:]+)/)
assert str == 'test'
Alternatively you can use good old String.split(String delimiter) method:
def str = "test:1".split(':')[0]
assert str == 'test'
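For comparison, here is a Python sketch of the same approaches (regex substitution, a regex match from the start like ^([^:]+), a split on the first colon, and fixed-length slicing). It assumes the inputs look like the two samples in the question.

```python
import re

samples = ["test:1", "test:2"]

# Regex: delete a colon followed by digits at the end of the string.
via_regex = [re.sub(r":\d+$", "", s) for s in samples]

# Regex: keep the run of non-colon characters at the start
# (assumes the string does not begin with ':', else match() is None).
via_match = [re.match(r"[^:]+", s).group(0) for s in samples]

# No regex: split once on the first colon and keep the left part.
via_partition = [s.partition(":")[0] for s in samples]

# Fixed-length suffix: slicing returns a new string (str is immutable).
via_slice = [s[:-2] for s in samples]

print(via_regex)  # ['test', 'test']
```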

Pyparsing - matching the outermost set of nested brackets

I'm trying to use pyparsing to build a parser that will match on all text within an arbitrarily nested set of brackets. If we consider a string like this:
"[A,[B,C],[D,E,F],G] Random Middle text [H,I,J]"
What I would like is for a parser to match in a way that it returns two matches:
[
"[A,[B,C],[D,E,F],G]",
"[H,I,J]"
]
I was able to accomplish a somewhat-working version of this using a barrage of originalTextFor mashed up with nestedExpr, but this breaks when your nesting is deeper than the number of originalTextFor expressions.
Is there a straightforward way to only match on the outermost expression grabbed by nestedExpr, or a way to modify its logic so that everything after the first paired match is treated as plaintext rather than being parsed?
update: One thing that seems to come close to what I want to accomplish is this modified version of the logic from nestedExpr:
from pyparsing import CharsNotIn, Forward, ParserElement, Suppress, ZeroOrMore, empty, originalTextFor
def mynest(opener='{', closer='}'):
    content = (empty.copy() + CharsNotIn(opener + closer + ParserElement.DEFAULT_WHITE_CHARS))
    ret = Forward()
    ret <<= (Suppress(opener) + originalTextFor(ZeroOrMore(ret | content)) + Suppress(closer))
    return ret
This gets me most of the way there, although there's an extra level of list wrapping in there that I really don't need, and what I'd really like is for those brackets to be included in the string (without getting into an infinite recursion situation by not suppressing them).
parser = mynest("[","]")
result = parser.searchString("[A,[B,C],[D,E,F],G] Random Middle text [H,I,J]")
result.asList()
# [['A,[B,C],[D,E,F],G'], ['H,I,J']]
I know I could strip these out with a simple list comprehension, but it would be ideal if I could just eliminate that second, redundant level.
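The list-comprehension workaround mentioned above could look like the sketch below. It works on a plain copy of the result shown earlier (not a live pyparsing object) and re-adds the brackets that mynest suppresses.

```python
# The nested result shown above, copied as a plain Python list.
result = [['A,[B,C],[D,E,F],G'], ['H,I,J']]

# Unpack each one-element sublist and put the suppressed brackets back.
flat = ['[' + inner + ']' for (inner,) in result]

print(flat)  # ['[A,[B,C],[D,E,F],G]', '[H,I,J]']
```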
Not sure why this wouldn't work:
from pyparsing import nestedExpr, originalTextFor
sample = "[A,[B,C],[D,E,F],G] Random Middle text [H,I,J]"
scanner = originalTextFor(nestedExpr('[', ']'))
for match in scanner.searchString(sample):
    print(match[0])
prints:
[A,[B,C],[D,E,F],G]
[H,I,J]
What is the situation where "this breaks when your nesting is deeper than the number of OriginalTextFor expressions"?
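For comparison, the "outermost group only" idea can also be expressed without pyparsing by counting bracket depth. This is a swapped-in technique, not how nestedExpr works internally, and it assumes brackets are balanced and never appear inside quoted strings.

```python
def outermost_groups(text, opener="[", closer="]"):
    """Return each top-level opener...closer span, brackets included."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == opener:
            if depth == 0:
                start = i                 # a new outermost group begins
            depth += 1
        elif ch == closer and depth > 0:  # stray closers at depth 0 are ignored
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups

sample = "[A,[B,C],[D,E,F],G] Random Middle text [H,I,J]"
print(outermost_groups(sample))
# ['[A,[B,C],[D,E,F],G]', '[H,I,J]']
```

Because depth is tracked with a counter rather than a fixed number of nested expressions, arbitrarily deep nesting is handled the same way.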

match part of string in R

I'm stuck with something that is usually pretty easy in other programming languages.
I want to test whether a string is inside another one in R. For example I tried:
match("Diagnosi Prenatale,Esercizio Fisico", "Diagnosi Prenatale")
pmatch("Diagnosi Prenatale,Esercizio Fisico", "Diagnosi Prenatale")
grep("Diagnosi Prenatale,Esercizio Fisico", "Diagnosi Prenatale")
None of them worked. To make it work, I would first have to split the first string with strsplit and extract the first element.
NOTE: I'd like to do this on a vector of strings and get a yes/no vector back, so the function should accept a vector rather than a single string. But of course, if it doesn't work for a single string, it won't work for a whole vector of them...
Any ideas?
Try grepl
grepl("Diagnosi Prenatale","Diagnosi Prenatale,Esercizio Fisico" )
[1] TRUE
You can also do this with character vectors, for example:
x <- c("Diagnosi Prenatale,Esercizio Fisico", "Diagnosi Prenatale")
grepl("Diagnosi Prenatale",x)
#[1] TRUE TRUE
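For comparison, a Python sketch of the same vectorised yes/no test. The third element of x is added here only to show a non-match; the in operator is a literal substring test (closer to grepl(..., fixed = TRUE)), while re.search treats the pattern as a regular expression like the default grepl.

```python
import re

x = ["Diagnosi Prenatale,Esercizio Fisico", "Diagnosi Prenatale", "Esercizio Fisico"]
needle = "Diagnosi Prenatale"

# Literal substring test for each element.
literal_hits = [needle in s for s in x]

# Regular-expression test for each element.
regex_hits = [re.search(needle, s) is not None for s in x]

print(literal_hits)  # [True, True, False]
```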

Multiline string literal in Matlab?

Is there a multiline string literal syntax in Matlab or is it necessary to concatenate multiple lines?
I found the verbatim package, but it only works in an m-file or function and not interactively within editor cells.
EDIT: I am particularly after readability and ease of modifying the literal in the code (imagine it contains indented blocks of different levels). It is easy to make multiline strings, but I am looking for the most convenient syntax for doing that.
So far I have
t = {...
'abc'...
'def'};
t = cellfun(@(x) [x sprintf('\n')], t, 'Unif', false);
t = horzcat(t{:});
which gives size(t) = [1 8], but is obviously a bit of a mess.
EDIT 2: Basically verbatim does what I want, except that it doesn't work in Editor cells, so maybe my best bet is to update it so that it does. I think it should be possible to get the currently open file and cursor position from the Java interface to the Editor. The problem would be distinguishing between multiple verbatim calls in the same cell.
I'd go for:
multiline = sprintf([ ...
'Line 1\n'...
'Line 2\n'...
]);
Matlab is an oddball in that escape processing in strings is a function of the printf family of functions instead of the string literal syntax. And no multiline literals. Oh well.
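By contrast, Python processes escapes in the literal itself (raw strings r"..." turn that off) and has triple-quoted multiline literals. A minimal sketch, using textwrap.dedent so the literal can be indented along with the surrounding code while keeping relative indentation of nested blocks:

```python
import textwrap

# The backslash after the opening quotes suppresses the first newline;
# dedent() strips the indentation common to all lines.
multiline = textwrap.dedent("""\
    Line 1
      indented Line 2
    Line 3
""")

print(repr(multiline))  # 'Line 1\n  indented Line 2\nLine 3\n'
```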
I've ended up doing two things. First, make CR() and LF() functions that just return processed \r and \n respectively, so you can use them as pseudo-literals in your code. I prefer doing this way rather than sending entire strings through sprintf(), because there might be other backslashes in there you didn't want processed as escape sequences (e.g. if some of your strings came from function arguments or input read from elsewhere).
function out = CR()
    out = char(13); % same as sprintf('\r')
function out = LF()
    out = char(10); % same as sprintf('\n')
Second, make a join(glue, strs) function that works like Perl's join or the cellfun/horzcat code in your example, but without the final trailing separator.
function out = join(glue, strs)
    strs = strs(:)';
    strs(2,:) = {glue};
    strs = strs(:)';
    strs(end) = [];
    out = cat(2, strs{:});
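To make the steps of that join function explicit, here is a Python sketch that mirrors them one by one: interleave the glue after every string, drop the final glue, then concatenate. Python's built-in str.join already does this, so the mirror is for illustration only.

```python
def join(glue, strs):
    """Mirror of the MATLAB join() above, step by step."""
    pieces = []
    for s in strs:
        pieces.append(s)
        pieces.append(glue)   # like strs(2,:) = {glue}
    if pieces:
        pieces.pop()          # like strs(end) = [], removing the trailing glue
    return "".join(pieces)    # like cat(2, strs{:})

print(repr(join("\n", ["abc", "defghi", "jklm"])))  # 'abc\ndefghi\njklm'
```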
And then use it with cell literals like you do.
str = join(LF, {
'abc'
'defghi'
'jklm'
});
You don't need the "..." ellipses in cell literals like this; omitting them does a vertical vector construction, and it's fine if the rows have different lengths of char strings because they're each getting stuck inside a cell. That alone should save you some typing.
Bit of an old thread but I got this
multiline = join([
"Line 1"
"Line 2"
], newline)
I think it makes things pretty easy, but obviously it depends on what one is looking for :)

MATLAB string handling

I want to calculate the frequency of each word in a string. For that, I need to turn the string into an array (matrix) of words.
For example take "Hello world, can I ask you on a date?" and turn it into
['Hello' 'world,' 'can' 'I' 'ask' 'you' 'on' 'a' 'date?']
Then I can go over each entry and count every appearance of a particular word.
Is there a way to make an array (matrix) of words in MATLAB, instead of array of just chars?
Here is a slightly simpler regexp:
words = regexp(s,'\w+','match');
\w here means any symbol that can appear in words (including underscore).
Notice that the final question mark will not be included. Do you actually need it for counting words?
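A Python sketch of the same \w+ extraction, followed by the word-frequency count the question is ultimately after. collections.Counter is a swapped-in convenience, not part of the MATLAB answer.

```python
import re
from collections import Counter

s = "Hello world, can I ask you on a date?"

# \w+ matches runs of letters, digits and underscore, so punctuation is dropped.
words = re.findall(r"\w+", s)

# Count how often each word appears.
freq = Counter(words)

print(words)        # ['Hello', 'world', 'can', 'I', 'ask', 'you', 'on', 'a', 'date']
print(freq["can"])  # 1
```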
Regular expressions
s = 'Hello world, can I ask you on a date?'
slist = regexp(s, '[^ ]*', 'match')
yields
slist =
'Hello' 'world,' 'can' 'I' 'ask' 'you' 'on' 'a' 'date?'
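The '[^ ]*' pattern effectively splits on spaces and keeps punctuation attached to the words. A Python sketch of the same result:

```python
s = "Hello world, can I ask you on a date?"

# Splitting on single spaces keeps 'world,' and 'date?' intact.
# (s.split() with no argument would also collapse runs of whitespace.)
tokens = s.split(" ")

print(tokens)
# ['Hello', 'world,', 'can', 'I', 'ask', 'you', 'on', 'a', 'date?']
```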
Another way to do it is like this:
s = cell(java.lang.String('Hello world, can I ask you on a date?').split('[^\w]+'));
I.e., by creating a Java String object and using its methods to do the work, then converting back to a cell array of strings. It's not necessarily the best way to do a job this simple, but Java has a rich library of string-handling methods and classes that can come in handy.
Matlab's ability to switch into Java at the drop of a hat can come in handy sometimes - for example, when parsing & writing XML.
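One subtlety when porting the Java split('[^\w]+') idea elsewhere: Java's String.split(regex) discards trailing empty strings, while Python's re.split keeps the empty string produced after the final '?'. A sketch of the Python side:

```python
import re

s = "Hello world, can I ask you on a date?"

# re.split keeps the empty string that follows the trailing '?'.
parts = re.split(r"[^\w]+", s)
print(parts[-1] == "")  # True

# Filtering out empty strings gives the same words here; note it would also
# drop a leading empty string, which Java's split would keep.
words = [p for p in parts if p]
print(words)
# ['Hello', 'world', 'can', 'I', 'ask', 'you', 'on', 'a', 'date']
```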
