how to display specific words per page - string

hey guys,
I have a string which contains, say, 100 words. I want to split that string into words and display 10 words per page. How can this be done? I'm not getting the logic for it.
Please reply with how this could be resolved.

If the words are separated by whitespace, count them and build arrays containing 10 words each: use split (Java) or explode (PHP) to get an array of words, then combine 10 at a time into a new array and display one array per page.

If you are talking about PHP, you can do something like this:
<?php
$words = explode(' ', $string); // create an array with each word as an item.
$start = 0; // index of the first word
$number = 10; // number of words per page
$words_part = array_slice($words, $start, $number); // take 10 words from the array, starting at $start
foreach($words_part as $word) {
echo $word;
}
?>
Basically, we separate the words using a space as the separator, then we take the relevant part of the resulting array (the first ten words).
You can easily change the number of words taken in each part by modifying the $number variable, or the starting position with the $start variable.
An implementation in another language would be much the same.
I hope this helps.
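For comparison, here is the same slicing idea as a quick Python sketch (not from the original thread; the function name is mine):

```python
def paginate_words(text, per_page=10):
    """Split text into pages of at most per_page words each."""
    words = text.split()  # split on any whitespace
    return [words[i:i + per_page] for i in range(0, len(words), per_page)]

# 25 words -> pages of 10, 10 and 5 words
pages = paginate_words(" ".join("word%d" % i for i in range(25)))
```

To render page n you would take pages[n] and join it back with spaces.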

Related

I have a string, need that string to be compared with list of strings in TCL

Need to compare string1 with string2 in TCL
set string1 {laptop Keyboard mouse MONITOR PRINTER}
set string2 {mouse}
Well, you can use:
if {$string2 in $string1} {
puts "present in the list"
}
Or you can use lsearch if you want to know where (it returns the index that it finds the element at, or -1 if it isn't there). This is most useful when you want to know where in the list the value is. It also has options to do binary searching (if you know the list is sorted) which is far faster than a linear search.
set idx [lsearch -exact $string1 $string2]
if {$idx >= 0} {
puts "present in the list at index $idx"
}
But if you are doing a lot of searching, it can be best to create a hash table using an array or a dictionary. Those are extremely fast but require some setup. Whether the setup costs are worth it depends on your application.
set words {}
foreach word $string1 {dict set words $word 1}
if {[dict exists $words $string2]} {
puts "word is present"
}
Note that if you're dealing with ordinary user input, you probably want a sanitization step or two. Tcl lists aren't exactly sentences, and the differences can really catch you out once you move to production. The two main tools for that are split and regexp -all -inline.
set words [split $sentence]
set words [regexp -all -inline {\S+} $sentence]
Understanding how to do the cleanup requires understanding your input data more completely than I do.
There is also string first:
if {[string first $string2 $string1] != -1} {
puts "string1 contains string2"
}
or
if {[string match *$string2* $string1]} {
puts "string1 contains string2"
}

Extracting a specific word and a number of tokens on each side of it from each string in a column in SAS?

Extracting a specific word and a number of tokens on each side of it from each string in a column in SAS EG ?
For example,
row1: the sun is nice
row2: the sun looks great
row3: the sun left me
Is there code that would produce the following result column (2 words, where "sun" is the first):
SUN IS
SUN LOOKS
SUN LEFT
and possibly a second column with COUNT in case of duplicate matches.
So if there were 20 instances of SUN LOOKS, they would be grouped and have a count of 20.
Thanks
I think you can use the functions findw() and scan() to do what you want. Both of those functions operate on the concept of word boundaries. findw() returns the position of the word in the string. Once you know the position, you can use scan() in a loop to get the next word or words following it.
Here is a simple example to show you the concept. It is by no means a finished or polished solution, but it is intended to point you in the right direction. The input data set (text) contains the sentences you provided in your question with slight modifications. The data step finds the word "sun" in the sentence and creates a variable named fragment that contains 3 words ("sun" + the next 2 words).
data text2;
set text;
length fragment $15;
word = 'sun'; * search term;
fragment_len = 3; * number of words in target output;
word_pos = findw(sentence, word, ' ', 'e');
if word_pos then do;
do i = 0 to fragment_len-1;
fragment = catx(' ', fragment, scan(sentence, word_pos+i));
end;
end;
run;
Here is a partial print of the output data set.
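As a cross-check of the logic (not SAS), here is a Python sketch of the same idea, including the duplicate count the question asked for; the helper name is mine:

```python
from collections import Counter

def fragment_after(sentence, word, extra=1):
    """Return `word` plus the next `extra` words, upper-cased, or None if absent."""
    tokens = sentence.split()
    if word not in tokens:
        return None
    pos = tokens.index(word)
    return " ".join(tokens[pos:pos + 1 + extra]).upper()

rows = ["the sun is nice", "the sun looks great", "the sun left me"]
fragments = [fragment_after(r, "sun") for r in rows]
counts = Counter(fragments)  # groups duplicate fragments with a count
```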
You can use a combination of the INDEX, SUBSTR and SCAN functions to achieve this functionality.
INDEX - takes two arguments and returns the position at which a given substring appears in a string. You might use:
INDEX(str,'sun')
SUBSTR - simply returns a substring of the provided string, taking a second numeric argument referring to the starting position of the substring. Combine this with your INDEX function:
SUBSTR(str,INDEX(str,'sun'))
This returns the substring of str from the point where the word 'sun' first appears.
SCAN - returns the 'words' from a string, taking the string as the first argument, followed by a number referring to the 'word'. There is also a third argument that specifies the delimiter, but this defaults to space, so you wouldn't need it in your example.
To pick out the word after 'sun' you might do this:
SCAN(SUBSTR(str,INDEX(str,'sun')),2)
Now all that's left to do is build a new string containing the words of interest. That can be achieved with concatenation operators. To see how to concatenate two strings, run this illustrative example:
data _NULL_;
a = 'Hello';
b = 'World';
c = a||' - '||b;
put c;
run;
The log should contain this line:
Hello - World
That line is the value of the c variable, displayed by the put statement. There are a number of functions that can be used to concatenate strings; look in the documentation at CAT, CATX, and CATS for some examples.
Hopefully there is enough here to help you.

Perl Morgan and a String?

I am trying to solve this problem on hackerrank:
So the problem is:
Jack and Daniel are friends. Both of them like letters, especially upper-case ones.
They are cutting upper-case letters from newspapers, and each one of them has their collection of letters stored in separate stacks.
One beautiful day, Morgan visited Jack and Daniel and saw their collections. Morgan wondered what the lexicographically minimal string made from those two collections would be. He can take a letter from a collection only when it is on the top of its stack.
Also, Morgan wants to use all the letters in the boys' collections.
This is my attempt in Perl:
#!/usr/bin/perl
use strict;
use warnings;
chomp(my $n=<>);
while($n>0){
chomp(my $string1=<>);
chomp(my $string2=<>);
lexi($string1,$string2);
$n--;
}
sub lexi{
my($str1,$str2)=@_;
my @str1=split(//,$str1);
my @str2=split(//,$str2);
my $final_string="";
while(@str2 && @str1){
my $st2=$str2[0];
my $st1=$str1[0];
if($st1 le $st2){
$final_string.=$st1;
shift @str1;
}
else{
$final_string.=$st2;
shift @str2;
}
}
if(@str1){
$final_string=$final_string.join('',@str1);
}
else{
$final_string=$final_string.join('',@str2);
}
print $final_string,"\n";
}
Sample Input:
2
JACK
DANIEL
ABACABA
ABACABA
The first line contains the number of test cases, T.
Every next two lines have such format: the first line contains string A, and the second line contains string B.
Sample Output:
DAJACKNIEL
AABABACABACABA
It gives the right result for the sample test case, but wrong results for other test cases. One case for which it gives an incorrect result is
1
AABAC
AACAB
It outputs AAAABACCAB instead of AAAABACABC.
I don't know what is wrong with the algorithm or why it fails the other test cases.
Update:
As per @squeamishossifrage's comment, if I add
($str1,$str2)=sort{$a cmp $b}($str1,$str2);
the results become the same irrespective of the order of the inputs, but the test case still fails.
The problem is in your handling of the equal characters. Take the following example:
ACBA
BCAB
When faced with two identical characters (C in my example), you naïvely chose the one from the first string, but that's not always correct. You need to look ahead to break ties. You may even need to look many characters ahead. In this case, next character after C of the second string is lower than the next character of the first string, so you should take the C from the second string first.
By leaving the strings as strings, a simple string comparison will compare as many characters as needed to determine which character to consume. One subtlety remains: when one remaining string is a prefix of the other (e.g. "C" vs "CAB"), plain le picks the shorter string, which gives the wrong result; appending a sentinel character that sorts after 'Z' fixes that.
sub lexi {
my ($str1, $str2) = @_;
utf8::downgrade($str1); # Makes sure length() will be fast
utf8::downgrade($str2); # since we only have ASCII letters.
$str1 .= '~'; # Sentinels: '~' sorts after 'Z', so when one remainder
$str2 .= '~'; # is a prefix of the other we consume the longer one first.
my $final_string = "";
while (length($str1) > 1 && length($str2) > 1) {
$final_string .= substr($str1 le $str2 ? $str1 : $str2, 0, 1, '');
}
chop($str1); # Drop the sentinels.
chop($str2);
$final_string .= $str1;
$final_string .= $str2;
print $final_string, "\n";
}
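A Python sketch of the same greedy merge, for comparison: compare the whole remainders so ties are broken by look-ahead, and append a sentinel that sorts after 'Z' so a remainder that is a prefix of the other (e.g. "C" vs "CAB") is consumed last. Repeated slicing makes this O(n²), which is fine for a sketch.

```python
def lexi_merge(a, b):
    """Lexicographically smallest interleaving that keeps each string's order."""
    a += "~"  # '~' sorts after every upper-case letter, so when one
    b += "~"  # remainder is a prefix of the other we take from the longer one
    out = []
    while len(a) > 1 and len(b) > 1:
        if a <= b:  # compares as many characters ahead as needed
            out.append(a[0]); a = a[1:]
        else:
            out.append(b[0]); b = b[1:]
    return "".join(out) + a[:-1] + b[:-1]
```

This handles the failing case from the question: lexi_merge('AABAC', 'AACAB') gives 'AAAABACABC'.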
Too little rep to comment thus the answer:
What you need to do is to look ahead if the two characters match. You currently do a simple le match and in the case of
ZABB
ZAAA
You'll get ZABBZAAA, since at the first match Z le Z holds and you take from the first string. So what you need to do (a naive solution, which most likely won't be very efficient) is to keep looking ahead as long as the strings/chars match, so:
Z eq Z
ZA eq ZA
ZAB gt ZAA
and at that point will you know that the second string is the one you want to pop from for the first character.
Edit
You updated with sorting the strings, but as I wrote, you still need to look ahead. The sorting will solve the two strings above but will fail with these two:
ZABAZA
ZAAAZB
which produces ZAAAZBZABAZA. The correct answer here is ZAAAZABAZAZB, and you can't find it by simply comparing character by character.

String partitioning (converting complex string to an array) in perl

There is a large string s, that contains item codes which are comma delimited.
e.g.:
$s="90320,328923,SKJS32767,DSIKUDIU,829EUE,AUSIUD0Q897,AJIUE98,
387493420DA,93RE,AKDJ93,SADI983,90439,JADKJ84";
In my application these strings are passed to a function which returns the price of these items, i.e. the output of the function is the corresponding price for the item code given as input.
However, due to certain limitations, the length of any string passed to the function must not exceed 16 characters; if it does, an exception is thrown. Thus, $s should be partitioned into an array such that the length of each element is less than or equal to 16.
e.g.: After partitioning $s, the array is:
$Arr[0]='90320,328923', # Note: the first 16 chars are '90320,328923,SKJ', but 'SKJ' is dropped as an incomplete (partial) item code.
$Arr[1]='SKJS32767',
$Arr[2]='DSIKUDIU,829EUE',
$Arr[3]='AUSIUD0Q897',
$Arr[4]='AJIUE98',
$Arr[5]='387493420DA,93RE'
For a given $s, the function should return an array, following the constraints noted above.
My approach has been to use the substr function, and extract a string up to a 16 offset, from an updated position index. Can it be done in a better way?
This is very simple using a global /g regular expression match.
This program demonstrates. The regex pattern looks for as many characters as possible up to a maximum of sixteen that must be followed by a comma or the end of the string.
However, my first thought was the same as RobEarl's comment - why not just put one field from the string into each element of the array? Is there really a need to pack more than one into an element just because it is possible?
use strict;
use warnings;
use 5.010;
my $s = '90320,328923,SKJS32767,DSIKUDIU,829EUE,AUSIUD0Q897,AJIUE98,387493420DA,93RE,AKDJ93,SADI983,90439,JADKJ84';
my @partitions;
while ( $s =~ /\G(.{0,16})(?:,|\z)/g ) {
push @partitions, $1;
}
say for @partitions;
output
90320,328923
SKJS32767
DSIKUDIU,829EUE
AUSIUD0Q897
AJIUE98
387493420DA,93RE
AKDJ93,SADI983
90439,JADKJ84
You need to look at the length of the current string plus the length of the next item code to determine whether the combination would be too long.
Split the long string into single item codes. Concatenate each code onto the last element of the new list if the result stays below 17 chars; otherwise push the code onto the list as a fresh string.
use Data::Dump qw(dd); # provides dd() for dumping the result

my $s="90320,328923,SKJS32767,DSIKUDIU,829EUE,AUSIUD0Q897,AJIUE98,387493420DA,93RE,AKDJ93,SADI983,90439,JADKJ84";
my @items = split /,/, $s;
my @strings = ( shift @items );
while ( my $item = shift @items ) {
if ( length($strings[-1]) + length($item) > 15) { # 15 because of the comma
push @strings, $item;
} else {
$strings[-1] .= ',' . $item;
}
}
dd \@strings;
__END__
__END__
[
"90320,328923",
"SKJS32767",
"DSIKUDIU,829EUE",
"AUSIUD0Q897",
"AJIUE98",
"387493420DA,93RE",
"AKDJ93,SADI983",
"90439,JADKJ84",
]
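The same greedy packing as a Python sketch, for comparison (the function name is mine):

```python
def partition_items(s, limit=16):
    """Pack comma-separated item codes into strings of at most `limit` chars,
    never splitting a code."""
    chunks = []
    for item in s.split(","):
        # +1 accounts for the comma that joins item to the current chunk
        if chunks and len(chunks[-1]) + 1 + len(item) <= limit:
            chunks[-1] += "," + item
        else:
            chunks.append(item)
    return chunks

s = ("90320,328923,SKJS32767,DSIKUDIU,829EUE,AUSIUD0Q897,AJIUE98,"
     "387493420DA,93RE,AKDJ93,SADI983,90439,JADKJ84")
parts = partition_items(s)
```

Note that, like the Perl version, a single code longer than the limit still becomes its own (oversized) chunk.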

Programmatically determining the difference between a comma-separated list and a paragraph

I am working on a data migration where, on the old system, users were allowed to enter their interests in a large text field with no formatting rules at all. As a result, some wrote in bio format and others wrote in comma-separated-list format. There are a few other formats, but these are the primary ones.
Now I know how to identify a comma-separated list (CSL); that is easy enough. But how about determining whether a string is a CSL (maybe a short one with two terms or phrases) or just a paragraph someone wrote that happens to contain a comma?
One thought I have is to automatically ignore strings that contain punctuation and strings that don't contain commas. However, I am concerned that this won't be enough or will leave much to be desired, so I would like to query the community to see what you think. In the meantime I will try out my idea.
UPDATE:
Ok guys, I have my algorithm. Here it is below...
MY CODE:
//Process our interests text field and get the list of interests
function process_interests($interests)
{
$interest_list = array();
if ( preg_match('/(\.)/', $interests) == 0 ) //Skip strings with periods - likely paragraphs
{
//NOTE: the code that detects $delimiter and counts $delimiter_cnt / $word_cnt
//was lost from the original post.
if ($delimiter_cnt > 0 && $word_cnt > 0)
$ratio = $delimiter_cnt / $word_cnt;
//If delimiter is found with the right ratio then we can go forward with this.
//Should not be any more than 5 words per delimiter (ratio = delimiter / words ... this must be at least 0.2)
if (!empty($delimiter) && $ratio > 0 && $ratio >= 0.2)
{
//Check for label with colon after it
$interests = remove_colon($interests);
//Now we make our array
$interests = explode($delimiter, $interests);
foreach ($interests AS $val)
{
$val = humanize($val);
if (!empty($val))
$interest_list[] = $val;
}
}
}
return $interest_list;
}
//Cleans up strings a bit
function humanize($str)
{
if (empty($str))
return ''; //Lets not waste processing power on empty strings
$str = remove_colon($str); //We do this one more time for inline labels too.
$str = trim($str); //Remove unused bits
$str = ltrim($str, ' -'); //Remove leading dashes
$str = str_replace('  ', ' ', $str); //Remove double spaces, replace with single spaces
$str = str_replace(array(".", "(", ")", "\t"), '', $str); //Replace some unwanted junk
if ( strtolower( substr($str, 0, 3) ) == 'and')
$str = substr($str, 3); //Remove leading "and" from term
$str = ucwords(preg_replace('/[_]+/', ' ', strtolower(trim($str))));
return $str;
}
//Check for label with colon after it and remove the label
function remove_colon($str)
{
//Check for label with colon after it
if (strstr($str, ':'))
{
$str = explode(':', $str); //If we find it we must remove it
unset($str[0]); //To remove it we just explode it and take everything to the right of it.
$str = trim(implode(':', $str)); //Sometimes colons are still used elsewhere, I am going to allow this
}
return $str;
}
Thank you for all your help and suggestions!
You could, in addition to the filtering you mentioned, create a ratio of number of commas to string length. In CSLs, this ratio will tend to be high, in paragraphs low. You could set some kind of a threshold, and choose based on whether or not the entry has a high enough ratio. Ones with ratios close to the threshold could be marked as prone to error, and could then be check by a moderator.
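A minimal Python sketch of that ratio heuristic (the 0.2 threshold is a guess and would need tuning against real data):

```python
import re

def looks_like_csl(text, threshold=0.2):
    """Guess whether text is a comma-separated list: the commas-per-word
    ratio tends to be high for lists and low for prose."""
    words = re.findall(r"\S+", text)
    commas = text.count(",")
    if not words or commas == 0:
        return False
    return commas / len(words) >= threshold

# A terse interest list vs. a sentence that merely contains a comma
interests = "hiking, cooking, jazz, chess"
bio = "I grew up in a small town, where I spent most of my time reading."
```

Entries whose ratio lands near the threshold could be flagged for manual review, as suggested above.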
