What happens when creating a string in Smalltalk?

I am a newbie in Smalltalk, but I need to understand some things for my thesis. What exactly happens when creating a string or any other object? For example, let's do this:
fruit <- 'apple'
When I inspect the object fruit, I see it has 5 inst vars. If I had assigned 'pear' to fruit, it would have had 4 inst vars. So did the interpreter create a new instance of ByteString, add the required inst vars for every character and assign them the proper values? I believe there is something more going on, but I can't find it anywhere and I don't have time to properly learn Smalltalk. Can you explain it to me, or give me a link where I can find it?

Strings are objects. Objects contain instance variables and respond to messages. In Smalltalk there are basically two kinds of instance variables: named instance variables are referenced by name (like name or phoneNumber in a Person object) and indexed instance variables are referenced by numbers. String uses indexed instance variables.
Consider the following example:
fruit := String new: 5.
fruit at: 1 put: $a;
at: 2 put: $p;
at: 3 put: $p;
at: 4 put: $l;
at: 5 put: $e.
This creates a String with space for 5 characters. It then gets the fruit variable to point to that object. Then it writes 5 characters into the string. The result is the string 'apple'.
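For readers who know Python better than Smalltalk, the allocate-then-fill steps above have a rough analogue in a bytearray. This is only a sketch of the idea of indexed slots, not a description of how a Smalltalk VM lays out a ByteString; the function name build_apple is made up for the illustration, and Python indexes from 0 where Smalltalk indexes from 1.

```python
def build_apple() -> str:
    # Allocate 5 zero-filled indexed slots, loosely like `String new: 5`.
    fruit = bytearray(5)
    # Fill each slot by index, loosely like `at: i put: ch` (0-based here).
    for i, ch in enumerate("apple"):
        fruit[i] = ord(ch)
    # The number of slots equals the number of characters.
    assert len(fruit) == 5
    return fruit.decode("ascii")


print(build_apple())
```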
Since Strings are so commonly used, the compiler supports a special syntax to create strings at compile time.
fruit := 'apple'
In this example, 'apple' is a String literal. The Smalltalk compiler creates the string when it compiles the line. When you run the line, you will make fruit point to the string 'apple' which has 5 indexed instance variables containing Character objects.

They're not instance variables, they're positions in an indexable object, pretty similar to what happens when you create an Array or any other kind of collection.
A String, in Smalltalk, is just a Collection of Characters, where each Character is stored in the position it occupies inside the String.
Some examples to get you acquainted with Strings being just like Arrays:
'Good Morning' at: 3.
#(1 'hi' $d 5.34) at: 3.
'Good Morning' fourth.
#(1 'hi' $d 5.34) fourth.
'Good Morning' reversed.
#(1 'hi' $d 5.34) reversed.
'Good Morning' select: [ :each | each ~= $d ].
#(1 'hi' $d 5.34) select: [ :each | each ~= $d ].
As you can see, Strings are just another kind of Collection.

Strings are indexable objects, which means that they are arrays and the slots are numbered instead of "labeled" ...

First of all, the expression you give as an example does not create a string.
It is a simple assignment.
fruit := 'apple'
does not create a string. It assigns the existing string 'apple' to the fruit variable.
If you want to create new strings, you should use
(Byte)String new:
similar to
Array new: ..
This is how the compiler actually creates new strings when compiling source code.

Related

Extracting a specific word and a number of tokens on each side of it from each string in a column in SAS?

Extracting a specific word and a number of tokens on each side of it from each string in a column in SAS EG ?
For example,
row1: the sun is nice
row2: the sun looks great
row3: the sun left me
Is there a code that would produce the following result column (2 words where sun is the first):
SUN IS
SUN LOOKS
SUN LEFT
and possibly a second column with COUNT in case of duplicate matches.
So if there were 20 occurrences of SUN LOOKS, they would be grouped and have a count of 20.
Thanks
I think you can use the functions findw() and scan() to do what you want. Both of those functions operate on the concept of word boundaries. findw() returns the position of the word in the string (with the 'e' modifier used below, it returns the word's number rather than its character position). Once you know the position, you can use scan() in a loop to get the next word or words following it.
Here is a simple example to show you the concept. It is by no means a finished or polished solution, but is intended to point you in the right direction. The input data set (text) contains the sentences you provided in your question with slight modifications. The data step finds the word "sun" in the sentence and creates a variable named fragment that contains 3 words ("sun" + the next 2 words).
data text2;
  set text;
  length fragment $15;
  word = 'sun'; * search term;
  fragment_len = 3; * number of words in target output;
  * the e modifier makes findw return the word number, not the character position;
  word_pos = findw(sentence, word, ' ', 'e');
  if word_pos then do;
    do i = 0 to fragment_len-1;
      fragment = catx(' ', fragment, scan(sentence, word_pos+i));
    end;
  end;
run;
Here is a partial print of the output data set.
You can use a combination of the INDEX, SUBSTR and SCAN functions to achieve this functionality.
INDEX - takes two arguments and returns the position at which a given substring appears in a string. You might use:
INDEX(str,'sun')
SUBSTR - simply returns a substring of the provided string, taking a second numeric argument referring to the starting position of the substring. Combine this with your INDEX function:
SUBSTR(str,INDEX(str,'sun'))
This returns the substring of str from the point where the word 'sun' first appears.
SCAN - returns the 'words' from a string, taking the string as the first argument, followed by a number referring to the 'word'. There is also a third argument that specifies the delimiter, but this defaults to space, so you wouldn't need it in your example.
To pick out the word after 'sun' you might do this:
SCAN(SUBSTR(str,INDEX(str,'sun')),2)
Now all that's left to do is build a new string containing the words of interest. That can be achieved with concatenation operators. To see how to concatenate two strings, run this illustrative example:
data _NULL_;
a = 'Hello';
b = 'World';
c = a||' - '||b;
put c;
run;
The log should contain this line:
Hello - World
This is the result of displaying the value of the c variable with the put statement. Several functions can also be used to concatenate strings; see CAT, CATX and CATS in the documentation for some examples.
Hopefully there is enough here to help you.
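Neither answer covers the COUNT column the question asks for. As a language-neutral illustration (in Python, not SAS), the sketch below pulls out the search word plus the tokens after it and counts duplicate fragments. The function name fragments and its parameters are invented for this example, and whitespace tokenization and case-insensitive matching are assumptions.

```python
from collections import Counter


def fragments(sentences, word="sun", n_words=2):
    """Count upper-cased fragments made of `word` plus the tokens that follow it."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        lowered = [t.lower() for t in tokens]
        if word in lowered:
            start = lowered.index(word)  # first occurrence only
            fragment = " ".join(tokens[start:start + n_words]).upper()
            counts[fragment] += 1
    return counts


rows = ["the sun is nice", "the sun looks great", "the sun left me", "the sun looks fine"]
for fragment, count in fragments(rows).items():
    print(fragment, count)
```

In SAS itself the grouping step would more likely be a PROC FREQ or PROC SQL GROUP BY over the fragment variable; that part is not shown here.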

Perl Morgan and a String?

I am trying to solve this problem on hackerrank:
So the problem is:
Jack and Daniel are friends. Both of them like letters, especially upper-case ones.
They are cutting upper-case letters from newspapers, and each one of them has their collection of letters stored in separate stacks.
One beautiful day, Morgan visited Jack and Daniel. He saw their collections. Morgan wondered what is the lexicographically minimal string, made of that two collections. He can take a letter from a collection when it is on the top of the stack.
Also, Morgan wants to use all the letters in the boys' collections.
This is my attempt in Perl:
#!/usr/bin/perl
use strict;
use warnings;
chomp(my $n=<>);
while($n>0){
    chomp(my $string1=<>);
    chomp(my $string2=<>);
    lexi($string1,$string2);
    $n--;
}
sub lexi{
    my($str1,$str2)=@_;
    my @str1=split(//,$str1);
    my @str2=split(//,$str2);
    my $final_string="";
    while(@str2 && @str1){
        my $st2=$str2[0];
        my $st1=$str1[0];
        if($st1 le $st2){
            $final_string.=$st1;
            shift @str1;
        }
        else{
            $final_string.=$st2;
            shift @str2;
        }
    }
    if(@str1){
        $final_string=$final_string.join('',@str1);
    }
    else{
        $final_string=$final_string.join('',@str2);
    }
    print $final_string,"\n";
}
Sample Input:
2
JACK
DANIEL
ABACABA
ABACABA
The first line contains the number of test cases, T.
Every next two lines have such format: the first line contains string A, and the second line contains string B.
Sample Output:
DAJACKNIEL
AABABACABACABA
It gives the right results for the sample test case, but wrong results for other test cases. One case for which it gives an incorrect result is
1
AABAC
AACAB
It outputs AAAABACCAB instead of AAAABACABC.
I don't know what is wrong with the algorithm and why it is failing with other test cases?
Update:
As per @squeamishossifrage's comments, if I add
($str1,$str2)=sort{$a cmp $b}($str1,$str2);
the results become the same irrespective of the input order, but the test case still fails.
The problem is in your handling of the equal characters. Take the following example:
ACBA
BCAB
When faced with two identical characters (C in my example), you naïvely chose the one from the first string, but that's not always correct. You need to look ahead to break ties. You may even need to look many characters ahead. In this case, next character after C of the second string is lower than the next character of the first string, so you should take the C from the second string first.
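By leaving the strings as strings, a simple string comparison will compare as many characters as needed to determine which character to consume. One caveat: when one remaining string is a prefix of the other (C versus CAB in the failing case from the question), the shorter one compares as smaller, which picks the wrong stack. Appending a sentinel that sorts after every upper-case letter to both strings avoids that.
sub lexi {
    my ($str1, $str2) = @_;
    utf8::downgrade($str1);  # Makes sure length() will be fast
    utf8::downgrade($str2);  # since we only have ASCII letters.
    # Sentinel that sorts after every upper-case letter, so that a string
    # that is a prefix of the other does not win the comparison.
    $str1 .= 'z';
    $str2 .= 'z';
    my $final_string = "";
    while (length($str1) > 1 && length($str2) > 1) {
        $final_string .= substr($str1 le $str2 ? $str1 : $str2, 0, 1, '');
    }
    chop($str1);  # Drop the sentinels before appending the leftovers.
    chop($str2);
    $final_string .= $str1;
    $final_string .= $str2;
    print $final_string, "\n";
}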
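To make the look-ahead idea easy to check against the test cases in this thread, here is a minimal Python sketch of the same suffix-comparison approach. The function name morgan is invented for the example, and the lower-case "z" sentinel is an assumption that relies on the input being upper-case ASCII only. Slicing on every step makes this quadratic in the worst case, so it illustrates the logic rather than a solution tuned for large inputs.

```python
def morgan(a: str, b: str) -> str:
    """Merge two stacks of upper-case letters into the smallest possible string."""
    # Sentinel sorts after every upper-case letter, so an exhausted stack
    # (or one that is a prefix of the other) never wins a tie.
    a += "z"
    b += "z"
    i = j = 0
    out = []
    while i < len(a) - 1 and j < len(b) - 1:
        # Comparing the remaining suffixes looks ahead as far as needed.
        if a[i:] <= b[j:]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # Append whatever is left, without the sentinels.
    out.append(a[i:-1])
    out.append(b[j:-1])
    return "".join(out)


print(morgan("JACK", "DANIEL"))
print(morgan("AABAC", "AACAB"))
```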
Too little rep to comment thus the answer:
What you need to do is look ahead when the two characters match. You currently do a simple le comparison, and in the case of
ZABB
ZAAA
you'll get ZABBZAAA, since the first Z is le the other Z. So what you need to do (a naive solution which most likely won't be very efficient) is to keep looking as long as the strings/chars match, so:
Z eq Z
ZA eq ZA
ZAB gt ZAA
At that point you will know that the second string is the one you want to pop from for the first character.
Edit
You updated the question to sort the strings first, but as I wrote, you still need to look ahead. Sorting solves the two strings above, but fails with these two:
ZABAZA
ZAAAZB
With sorting, your code produces:
ZAAAZBZABAZA
The correct answer here is ZAAAZABAZAZB, and you can't find that by simply comparing character by character.

AWK - enclose found strings with symbols in one command

I have a large body of text and I print only lines that contain one of several strings. Each line can contain more than one string.
Example of the rule:
(house|mall|building)
I want to mark the found string for making the result easier to read.
Example of the result I want:
New record: Two New York houses under contract for nearly $5 million each.
New record: Two New York #house#s under contract for nearly $5 million each.
I know I can find the location, trim, add marker, add string etc.
I am asking if there is a way to mark the found string in one command.
Thanks.
http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html
gsub(ere, repl[, in])
Behave like sub (see below), except that it shall replace all occurrences of the regular expression ...
sub(ere, repl[, in ])
Substitute the string repl in place of the first instance of the
extended regular expression ERE in string in and return the number of
substitutions. An ampersand ( '&' ) appearing in the string repl shall
be replaced by the string from in that matches the ERE ...
BEGIN {
r = "house|mall|building"
s = "Two New York houses under contract for nearly $5 million each."
gsub(r, "#&#", s)
print s
}
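For comparison, the same "wrap every match" idea can be sketched in Python with re.sub. This is only an illustration of what the ampersand does in the awk replacement string, not part of the awk answer; the function name mark and its parameters are made up for the example.

```python
import re


def mark(text, words=("house", "mall", "building"), marker="#"):
    """Wrap every occurrence of any of `words` in `marker`, like awk's "#&#"."""
    pattern = re.compile("|".join(re.escape(w) for w in words))
    # m.group(0) is the matched text, the counterpart of awk's & in repl.
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)


print(mark("Two New York houses under contract for nearly $5 million each."))
```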

How can I grep nth column of tab delimited file in Groovy?

My source file is tab delimited and I need to grep the 4th column of values. How can I do this in Groovy? Here's my code which doesn't work. Is it even close?
def tab_file = new File('source_file.tab')
tab_file.eachline { line -> println line.grep('\t\t\t\t'}
You could split by tab character, that would give you an array you can index into to get the column:
groovy:000> s = "aaa\tbbb\tccc\tddd\teee";
===> aaa bbb ccc ddd eee
groovy:000> s.split("\\t")[3]
===> ddd
Something like the following should work:
tab_file.eachLine { line ->
println ((line =~ /([^\t]*\t){3}([^\t]*)/)[0][2])
}
Explanation:
The =~ operator creates a java.util.regex.Matcher object using the pattern on the right-hand side. Groovy lets you then implicitly execute find() via the array subscript operator. If your regex has groups in it, this results in a List for each result. This list has the whole matched area as element 0, then the groups as further elements. So [0][2] is the first match of the regex (zero-indexed), specifically the 2nd group match. (Btw, if there were no groups in the regex, the result is just a string with the match). Details/Examples here.
Update/Aside:
I was just looking into the grep() functionality added to Object, as I was curious. I'm not sure I see the utility outside of collection types, but when applied to Strings, it doesn't do what you might expect: it appears to loop through the characters in the string and compare each character against the passed-in String (collecting matches in a list). If your passed-in String is longer than 1 character, you'll never get a match, as the character under inspection in each iteration will never equal the whole passed-in string (in your example, any \t != "\t\t\t\t").
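The split-and-index approach from the first answer is easy to state independently of Groovy. As a hedged sketch in Python (the function name nth_column and its parameters are invented here, and skipping short lines is an assumption about how incomplete rows should be handled):

```python
def nth_column(lines, n=4, sep="\t"):
    """Yield the n-th (1-based) field of each line that has at least n fields."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) >= n:
            yield fields[n - 1]


sample = ["aaa\tbbb\tccc\tddd\teee\n", "too\tshort\n"]
print(list(nth_column(sample)))
```

To read from a file, the same generator can be fed an open file object, for example nth_column(open('source_file.tab')).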

MATLAB string handling

I want to calculate the frequency of each word in a string. For that I need to turn string into an array (matrix) of words.
For example take "Hello world, can I ask you on a date?" and turn it into
['Hello' 'world,' 'can' 'I' 'ask' 'you' 'on' 'a' 'date?']
Then I can go over each entry and count every appearance of a particular word.
Is there a way to make an array (matrix) of words in MATLAB, instead of array of just chars?
Here is a little simpler regexp:
words = regexp(s,'\w+','match');
\w here means any symbol that can appear in words (including underscore).
Notice that the last question mark will not be included. Do you need it for counting words actually?
Using regular expressions,
s = 'Hello world, can I ask you on a date?'
slist = regexp(s, '[^ ]*', 'match')
yields
slist =
'Hello' 'world,' 'can' 'I' 'ask' 'you' 'on' 'a' 'date?'
Another way to do it is like this:
s = cell(java.lang.String('Hello world, can I ask you on a date?').split('[^\w]+'));
I.e. by creating a Java String object and using its methods to do the work, then converting back to a cell array of strings. Not necessarily the best way to do a job this simple, but Java has a rich library of string handling methods & classes that can come in handy.
Matlab's ability to switch into Java at the drop of a hat can come in handy sometimes - for example, when parsing & writing XML.
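The answers stop at splitting the string; the counting step the question is ultimately after can be sketched as follows. This is a Python illustration rather than MATLAB, the function name word_frequencies is invented, and lower-casing the words before counting is an assumption. It uses the same \w+ pattern as the first answer, so punctuation such as the trailing question mark is dropped.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word (a run of \\w characters) occurs, ignoring case."""
    return Counter(word.lower() for word in re.findall(r"\w+", text))


freq = word_frequencies("Hello world, can I ask you on a date?")
for word, count in freq.items():
    print(word, count)
```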
