Make first letter of words uppercase in a string

I have a large array of strings such as this one:
"INTEGRATED ENGINEERING 5 Year (BSC with a Year in Industry)"
I want to capitalise the first letter of the words and make the rest of the words lowercase. So INTEGRATED would become Integrated.
A second spanner in the works - I want an exception to a few words such as and, in, a, with.
So the above example would become:
"Integrated Engineering 5 Year (Bsc with a Year in Industry)"
How would I do this in Go? I can code the loop/arrays to manage the change but the actual string conversion is what I struggle with.

There is a function in the built-in strings package called Title.
s := "INTEGRATED ENGINEERING 5 Year (BSC with a Year in Industry)"
fmt.Println(strings.Title(strings.ToLower(s)))
https://go.dev/play/p/THsIzD3ZCF9

You can use regular expressions for this task. A \w+ regexp matches every word, and Regexp.ReplaceAllStringFunc lets you replace each match with the intended content while skipping stop words. In your case, strings.ToLower and strings.Title will also be helpful.
Example:
str := "INTEGRATED ENGINEERING 5 Year (BSC with a Year in Industry)"
// Function replacing words (assuming lower case input)
replace := func(word string) string {
switch word {
case "and", "with", "in", "a":
return word
}
return strings.Title(word)
}
r := regexp.MustCompile(`\w+`)
str = r.ReplaceAllStringFunc(strings.ToLower(str), replace)
fmt.Println(str)
// Output:
// Integrated Engineering 5 Year (Bsc with a Year in Industry)
https://play.golang.org/p/uMag7buHG8
You can easily adapt this to your array of strings.

Below is an alternative to the accepted answer, which relies on the now-deprecated strings.Title:
package main
import (
"fmt"
"golang.org/x/text/cases"
"golang.org/x/text/language"
)
func main() {
msg := "INTEGRATED ENGINEERING 5 Year (BSC with a Year in Industry)"
fmt.Println(cases.Title(language.English, cases.Compact).String(msg))
}

As of Go 1.18, strings.Title() is deprecated.
The deprecation notice in the strings package documentation says what to use now:
you should use cases.Title from golang.org/x/text/cases instead.

Well, you didn't specify the language you're using, so I'll give you a general answer. You have an array with a bunch of strings in it. First I'd make the entire string lower case, then go through each character in the string (capitalize the first one, the rest stay lower case). At this point you need to look for spaces, which divide each string into words. The first character after a space starts a new word and should be capitalized. Before capitalizing, you can check that the next word isn't "and", "in", "with" or "a".
I'm not at a computer so I can't give you a specific example, but I hope this gets you going in the right direction at least.
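The language-agnostic steps above can be sketched in Python. This is only an illustration under assumptions: the names title_case, capitalize_first_letter and STOP_WORDS are made up for this example, words are split on single spaces, and a stop word with punctuation attached (such as "(with") would not be recognised.

```python
# Hypothetical exception list taken from the question's wording.
STOP_WORDS = {"and", "in", "a", "with"}


def capitalize_first_letter(word):
    """Uppercase the first alphabetic character, so "(bsc" becomes "(Bsc"."""
    for i, ch in enumerate(word):
        if ch.isalpha():
            return word[:i] + ch.upper() + word[i + 1:]
    return word  # no letters at all, e.g. "5"


def title_case(text, stop_words=STOP_WORDS):
    """Lowercase everything, then capitalize each word that is not a stop word."""
    result = []
    for word in text.lower().split(" "):
        if word in stop_words:
            result.append(word)
        else:
            result.append(capitalize_first_letter(word))
    return " ".join(result)


print(title_case("INTEGRATED ENGINEERING 5 Year (BSC with a Year in Industry)"))
```

Applying title_case to each element of the array gives the converted list.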

Related

Basic string slicing from indices

I will state the obvious: I am a beginner. I should also mention that I have been coding in Zybooks, which affects things. My textbook hasn't helped me much.
I tried sub_lyric= rhyme_lyric[ : ]
Zybooks supplies index numbers as input, and the program should output only that part of the sentence, but my book doesn't explain how to do that. If it supplies [4:7], the output should be cow. Hopefully I have explained everything well.
You need to write:
sub_lyric = rhyme_lyric[start_index:end_index]
A string is a sequence of characters, and you can use string slicing to extract any sub-text from it. As you have observed:
sub_lyric = rhyme_lyric[:]
will copy the entire content of rhyme_lyric to sub_lyric.
To select only a portion of the text, specify start_index (indexing starts at 0) and end_index (not included).
sub_lyric = rhyme_lyric[4:7]
will extract characters in rhyme_lyric from position 4 (included) to position 7 (not included) so the result will be cow.
You can check more on string slicing here: Python 3 introduction
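As a small worked example, here is a sketch with the indices held in variables. The lyric text is an assumption (the question does not show the actual string); it was chosen so that positions 4 to 6 spell cow, and slice_lyric is a name made up for this illustration.

```python
def slice_lyric(rhyme_lyric, start_index, end_index):
    """Return the characters from start_index (included) to end_index (excluded)."""
    return rhyme_lyric[start_index:end_index]


# Assumed example text; the real rhyme_lyric comes from the exercise.
rhyme_lyric = "The cow jumped over the moon."

sub_lyric = slice_lyric(rhyme_lyric, 4, 7)
print(sub_lyric)
```

In an exercise that supplies the indices as input, the two numbers would typically be read with int(input()) and passed in the same way.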

Perl Morgan and a String?

I am trying to solve this problem on hackerrank:
So the problem is:
Jack and Daniel are friends. Both of them like letters, especially upper-case ones.
They are cutting upper-case letters from newspapers, and each one of them has their collection of letters stored in separate stacks.
One beautiful day, Morgan visited Jack and Daniel. He saw their collections. Morgan wondered what is the lexicographically minimal string, made of that two collections. He can take a letter from a collection when it is on the top of the stack.
Also, Morgan wants to use all the letters in the boys' collections.
This is my attempt in Perl:
#!/usr/bin/perl
use strict;
use warnings;
chomp(my $n=<>);
while($n>0){
    chomp(my $string1=<>);
    chomp(my $string2=<>);
    lexi($string1,$string2);
    $n--;
}
sub lexi{
    my($str1,$str2)=@_;
    my @str1=split(//,$str1);
    my @str2=split(//,$str2);
    my $final_string="";
    while(@str2 && @str1){
        my $st2=$str2[0];
        my $st1=$str1[0];
        if($st1 le $st2){
            $final_string.=$st1;
            shift @str1;
        }
        else{
            $final_string.=$st2;
            shift @str2;
        }
    }
    if(@str1){
        $final_string=$final_string.join('',@str1);
    }
    else{
        $final_string=$final_string.join('',@str2);
    }
    print $final_string,"\n";
}
Sample Input:
2
JACK
DANIEL
ABACABA
ABACABA
The first line contains the number of test cases, T.
Every next two lines have such format: the first line contains string A, and the second line contains string B.
Sample Output:
DAJACKNIEL
AABABACABACABA
It gives the right results for the sample test case but wrong results for other test cases. One case for which it gives an incorrect result is
1
AABAC
AACAB
It outputs AAAABACCAB instead of AAAABACABC.
I don't know what is wrong with the algorithm and why it is failing with other test cases?
Update:
As per @squeamishossifrage's comments, if I add
($str1,$str2)=sort{$a cmp $b}($str1,$str2);
the results become the same irrespective of the input order, but the test case still fails.
The problem is in your handling of the equal characters. Take the following example:
ACBA
BCAB
When faced with two identical characters (C in my example), you naïvely chose the one from the first string, but that's not always correct. You need to look ahead to break ties. You may even need to look many characters ahead. In this case, next character after C of the second string is lower than the next character of the first string, so you should take the C from the second string first.
By leaving the strings as strings, a simple string comparison will compare as many characters as needed to determine which character to consume.
sub lexi {
    my ($str1, $str2) = @_;
    utf8::downgrade($str1); # Makes sure length() will be fast
    utf8::downgrade($str2); # since we only have ASCII letters.
    my $final_string = "";
    while (length($str2) && length($str1)) {
        $final_string .= substr($str1 le $str2 ? $str1 : $str2, 0, 1, '');
    }
    $final_string .= $str1;
    $final_string .= $str2;
    print $final_string, "\n";
}
Too little rep to comment thus the answer:
What you need to do is to look ahead if the two characters match. You currently do a simple le match and in the case of
ZABB
ZAAA
You'll get ZABBZAAA since the first match Z will be le Z. So what you need to do (a naive solution which most likely won't be very efficient) is to keep looking as long as the strings/chars match, so:
Z eq Z
ZA eq ZA
ZAB gt ZAA
At that point you will know that the second string is the one you want to pop from for the first character.
Edit
You updated the question to sort the strings, but as I wrote, you still need to look ahead. Sorting fixes the two strings above but fails with these two:
ZABAZA
ZAAAZB
With sorting, your code outputs ZAAAZBZABAZA, but the correct answer is ZAAAZABAZAZB, and you can't find that by simply comparing character by character.
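For readers who want to experiment with the look-ahead idea from the answers above, here is a sketch in Python. It compares the whole remaining strings instead of only their first characters. One detail is an addition that does not appear in the answers: a sentinel "z" (greater than any upper-case letter) is appended to both strings, on the assumption that the input is upper-case ASCII only, so that a string that is a prefix of the other does not win the tie by default. The function name morgan is made up for this illustration, and the repeated slicing makes it quadratic in the worst case.

```python
def morgan(a, b):
    """Merge two stacks of upper-case letters into the smallest possible string."""
    # Sentinel (an assumption of this sketch): sorts after every upper-case letter.
    a += "z"
    b += "z"
    i = j = 0
    out = []
    # Stop as soon as one string has only its sentinel left.
    while i < len(a) - 1 and j < len(b) - 1:
        # Comparing the remaining suffixes looks ahead as far as needed.
        if a[i:] <= b[j:]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # Append whatever is left, without the sentinels.
    out.append(a[i:-1])
    out.append(b[j:-1])
    return "".join(out)


print(morgan("JACK", "DANIEL"))
print(morgan("AABAC", "AACAB"))
```

The second call is the failing case from the question; with the suffix comparison it produces the expected AAAABACABC.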

String matching without using builtin functions

I want to search for a query (a string) in a subject (another string).
The query may appear in whole or in parts, but will not be rearranged. For instance, if the query is 'da', and the subject is 'dura', it is still a match.
I am not allowed to use string functions like strfind or find.
The constraints make this actually quite straightforward with a single loop. Imagine you have two indices initially pointing at the first character of both strings, now compare them - if they don't match, increment the subject index and try again. If they do, increment both. If you've reached the end of the query at that point, you've found it. The actual implementation should be simple enough, and I don't want to do all the work for you ;)
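If you get stuck, the two-index idea can be sketched as follows. It is written in Python rather than MATLAB, purely as an illustration; is_subsequence is a name made up for this example, and no string-search functions are used.

```python
def is_subsequence(query, subject):
    """Return True if the characters of query appear in subject in the same order."""
    qi = 0  # index of the next query character still to be found
    for ch in subject:  # the subject index advances on every step
        if qi < len(query) and ch == query[qi]:
            qi += 1  # match: advance the query index as well
    return qi == len(query)  # found only if every query character was matched


print(is_subsequence("da", "dura"))
print(is_subsequence("ad", "dura"))
```

The same loop translates to MATLAB almost line for line with a while loop and two counters.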
If this is homework, I suggest you look at the explanation which precedes the code and then try for yourself, before looking at the actual code.
The code below looks for all occurrences of chars of the query string within the subject string (variables m; and related ii, jj). It then tests all possible orders of those occurrences (variable test). An order is "acceptable" if it contains all desired chars (cond1) in increasing positions (cond2). The result (variable result) is affirmative if there is at least one acceptable order.
subject = 'this is a test string';
query = 'ten';
m = bsxfun(@eq, subject.', query);
% m: test if each char of query equals each char of subject
[ii jj] = find(m);
jj = jj.'; % jj: which char of query is found within subject...
ii = ii.'; % ii: ... and at which position
test = nchoosek(1:numel(jj),numel(query)).'; % test all possible orders
cond1 = all(jj(test) == repmat((1:numel(query)).',1,size(test,2)));
% cond1: for each order, are all chars of query found in subject?
cond2 = all(diff(ii(test))>0);
% cond2: for each order, are the found chars in increasing positions?
result = any(cond1 & cond2); % final result: 1 or 0
The code could be improved by building test more cleverly, i.e. by not testing every possible order given by nchoosek.
Matlab allows you to view the source of built-in functions, so you could always try reading the code to see how the Matlab developers did it (although it will probably be very complex). (thanks Luis for the correction)
Finding a string in another string is a basic computer science problem. You can read up on it in any number of resources, such as Wikipedia.
Your requirement of non-rearranging partial matches recalls the bioinformatics problem of mapping splice variants to a genomic sequence.
You may solve your problem by using a sequence alignment algorithm such as Smith-Waterman, modified to work with all English characters and not just DNA bases.
Is this question actually from bioinformatics? If so, you should tag it as such.

MATLAB string handling

I want to calculate the frequency of each word in a string. For that I need to turn string into an array (matrix) of words.
For example take "Hello world, can I ask you on a date?" and turn it into
['Hello' 'world,' 'can' 'I' 'ask' 'you' 'on' 'a' 'date?']
Then I can go over each entry and count every appearance of a particular word.
Is there a way to make an array (matrix) of words in MATLAB, instead of array of just chars?
Here is a little simpler regexp:
words = regexp(s,'\w+','match');
\w here means any symbol that can appear in words (including underscore).
Notice that the last question mark will not be included. Do you need it for counting words actually?
Regular expressions
s = 'Hello world, can I ask you on a date?'
slist = regexp(s, '[^ ]*', 'match')
yields
slist =
'Hello' 'world,' 'can' 'I' 'ask' 'you' 'on' 'a' 'date?'
Another way to do it is like this:
s = cell(java.lang.String('Hello world, can I ask you on a date?').split('[^\w]+'));
I.e. by creating a Java String object and using its methods to do the work, then converting back to a cell array of strings. Not necessarily the best way to do a job this simple, but Java has a rich library of string handling methods & classes that can come in handy.
Matlab's ability to switch into Java at the drop of a hat can come in handy sometimes - for example, when parsing & writing XML.
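To show the full pipeline the question describes (split into words, then count each word), here is a sketch in Python rather than MATLAB. It uses the same \w+ pattern as the first answer, so punctuation such as the trailing question mark is dropped; word_frequencies is a name made up for this illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Split text into words with \\w+ and count how often each word occurs."""
    words = re.findall(r"\w+", text)
    return Counter(words)


freq = word_frequencies("Hello world, can I ask you on a date?")
for word, count in freq.items():
    print(word, count)
```

In MATLAB, the cell array returned by regexp could be counted in a similar way, for example with unique and a loop over the distinct words.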

Function to confirm the presence of both letters and numbers / ignoring excess characters

So, I'm trying to build a program in MATLAB according to some instructions from my teacher, and I ran into some obstacles which would give me a better grade if I could get them right. Here they are:
The user is asked to enter a string, but it can't have more than 20 characters. If it does, the excess characters are ignored and the string is saved with the first 20 characters the user entered. How do I ignore the excess characters in a string and save it anyway?
isletter is a function that tells us if the elements are all letters. In this program, the user is asked to insert a string that needs to include both numbers and letters, so that strings with just letters or just numbers are excluded, and then I'll use a while to keep asking for a string with these characteristics.
Could you please help me? This is my first semester with MATLAB. Thank you!
If you want to disallow characters other than letters and numbers (i.e. '/#!' or whitespace) and require that the string they enter has to have at least 1 letter and 1 number, then you can use the ISSTRPROP function (which is more general than ISLETTER) to check for other types of characters. The idea to use INPUTDLG to prompt for the string (as suggested in Aabaz's answer) is a good one, so here's a nice condensed solution using INPUTDLG that achieves what you want:
answer = ''; %# Initialize answer to be an empty string
while any(~isstrprop(answer, 'alphanum')) || ... %# Check for alphanumeric chars
~any(isletter(answer)) || ... %# Check for at least 1 letter
~any(isstrprop(answer, 'digit')) %# Check for at least 1 number
answer = inputdlg('Enter string:'); %# Prompt for input
answer = answer{1}(1:min(20, end)); %# Trim answer to max of 20 chars
end
Note how the functions MIN and END are used to trim the string to 20 characters.
For the first part of your problem you can use the Matlab function inputdlg which prompts a dialog box asking for user input. Then you can trim the input as you like.
For the second part of your problem, the function isletter that you mentioned tells you for each character individually whether it is an alphabetic letter, so you could sum that result and check that it is at least 1 and less than the length of the string (between 1 and 19 for a 20-character string, for example). That tells you the string contains both letters and non-letters; a separate check for digits confirms that the non-letters include numbers.
Finally, you can put your code inside a while loop and change a variable when your conditions are met so that you can break outside of the loop.
This example code demonstrates this:
tryagain=1;
while(tryagain)
    answer=inputdlg('Insert a 20 character string that contains both letters and numbers','User input');
    answer=answer{1};
    if(numel(answer)>20)
        answer=answer(1:20);
    end
    letters=sum(isletter(answer));
    numbers=sum(~arrayfun(@(x)isempty(str2num(x)),answer));
    if(letters>0 && numbers>0)
        tryagain=0;
    end
end
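The trim-then-validate logic from both answers can also be written as a single function, which makes it easy to test without a dialog box. The sketch below is in Python rather than MATLAB and is only an illustration: trim_and_check is a made-up name, and Python's isalpha, isdigit and isalnum are Unicode-aware, so they accept more characters than plain ASCII letters and digits.

```python
def trim_and_check(raw, max_len=20):
    """Keep the first max_len characters and report whether the result is valid.

    Valid means: only letters and digits, with at least one of each.
    """
    trimmed = raw[:max_len]  # excess characters are silently dropped
    has_letter = any(c.isalpha() for c in trimmed)
    has_digit = any(c.isdigit() for c in trimmed)
    only_alnum = trimmed.isalnum()  # False for the empty string
    return trimmed, (only_alnum and has_letter and has_digit)


print(trim_and_check("abc123"))
print(trim_and_check("abcdef"))
```

A prompt loop would call this function repeatedly and stop once the second element of the returned pair is True.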
