Convert string to sentence case but leave acronyms untouched - string

This is not a language-specific question.
I have a string in ALL CAPS. This string comes in from a separate source and for some reason is always in all caps.
I've been given the task of making the string a little more reader-friendly so I decided to just slap a sentence case converter method on it using simple regex.
The thing is, there are a lot of acronyms used in this string and I would like to keep them unaffected. Things like country codes(US, CA, JP, FR, etc...), or airport codes(LAX, LGA) and sometimes many others.
Now I'm guessing I would first need a list of the acronyms in a database or something, of all the possible airport codes, country codes and a list of commonly used acronyms like ETA, COD, etc...
Once I have this database created, how can I apply it to the string in question?? How can I prevent the word "us" being changed to US and vice-versa?? What I basically wanna know is, how do I take what's in the DB and apply all the necessary changes to the string?
Remember, I get the original string in ALL CAPS so there's no way to differentiate.
Any ideas would be greatly appreciated!!
Thanks!!!

Something close to this can be done with ActiveSupport::Inflector, which provides the titleize method (which does the work for String.titleize).
First, define your own inflections in an initializer.
# config/initializers/inflections.rb
ActiveSupport::Inflector.inflections do |inflect|
inflect.acronym 'US'
end
Restart your app to pick up the change. Now titleize knows how to handle "US". Fire up a Rails console to check it out:
> "us".titleize
=> "US"
Next, check out the source code for titleize. Once you understand it, reopen the Inflector class in an initializer and define your own method that doesn't capitalize the first letter of each word. Call it something nifty, like decapitalize.
module ActiveSupport::Inflector
def decapitalize(word)
humanize(underscore(word)) # you may enhance this a bit
end
end
class String
def decapitalize
ActiveSupport::Inflector.decapitalize(self)
end
end
Caveats and Limitations
You may need to tweak the code, but I think it's close.
Here are some sentences this solution won't handle very well:
> "US STATES VISITED BY US".titleize
=> "US States Visited By US"
> "COLUMBIA (CO) EXPORTS ARE PROCESSED BY ACME BUILDING CO.".decapitalize
=> "Columbia (CO) exports are processed by acme building CO."

Related

Terraform string manipulation to remove X elements

I have several strings that I just want to get a subset of like so:
my-bucket-customer-staging-ie-app-data
my-bucket-customer2-longname-prod-uk-app-data
and I just need to get the customers name from the string, so with the above examples I'd be left with
customer
customer2-longname
There's probably a simple way of doing this with regex although I've failed miserably in my attempts.
I'm able to strip the first part of the string using
trimprefix("my-bucket-customer-staging-ie-app-data", "my-bucket-")
trimprefix("my-bucket-customer-longname-prod-uk-app-data", "my-bucket-")
resulting in
customer-staging-ie-app-data
customer-longname-prod-uk-app-data
However Terraform's trimsuffix won't work as there can be several different regions/environments used.
What I'd like to do is slice the string and ignore the last 4 elements, which should then return the customer name regardless of whether it contains an additional delimiter in it.
Something like this captures the customer, however fails for long customer names:
element(split("-",trimprefix("my-bucket-customer-staging-uk-app-data", "my-bucket-")), length(split("-",trimprefix("my-bucket-customer-staging-uk-app-data", "my-bucket-")))-5)
customer
and is also quite messy.
Is there a more obvious solution I'm missing
I believe it's what regex is for.
> try(one(regex("\\w*-\\w*-(\\w*(?:-\\w*)*)-\\w*-\\w*-\\w*-\\w*","my-bucket-customer-staging-ie-app-data")),"")
"customer"
> try(one(regex("\\w*-\\w*-(\\w*(?:-\\w*)*)-\\w*-\\w*-\\w*-\\w*","my-bucket-customer2-longname-prod-uk-app-data")),"")
"customer2-longname"
> try(one(regex("\\w*-\\w*-(\\w*(?:-\\w*)*)-\\w*-\\w*-\\w*-\\w*","my-bucket-customer2-longname-even-longer-prod-uk-app-data")),"")
"customer2-longname-even-longer"
Reference: https://www.terraform.io/language/functions/regex

Way to find a number at the end of a string in Smalltalk

I have different commands my program is reading in (i.e., print, count, min, max, etc.). These words can also include a number at the end of them (i.e., print3, count1, min2, max6, etc.). I'm trying to figure out a way to extract the command and the number so that I can use both in my code.
I'm struggling to figure out a way to find the last element in the string in order to extract it, in Smalltalk.
You didn't told which incarnation of Smalltalk you use, so I will explain what I would do in Pharo, that is the one I'm familiar with.
As someone that is playing with Pharo a few months at most, I can tell you the sheer amount of classes and methods available can feel overpowering at first, but the environment actually makes easy to find things. For example, when you know the exact input and output you want, but doesn't know if a method already exists somewhere, or its name, the Finder actually allow you to search by giving a example. You can open it in the world menu, as shown bellow:
By default it seeks selectors (method names) matching your input terms:
But this default is not what we need right now, so you must change the option in the upper right box to "Examples", and type in the search field a example of the input, followed by the output you want, both separated by a ".". The input example I used was the string 'max6', followed by the desired result, the number 6. Pharo then gives me a list of methods that match that:
To get what would return us the text part, you can make a new search, changing the example output from number 6 to the string 'max':
Fortunately there is several built-in methods matching the description of your problem.
There are more elegant ways, I suppose, but you can make use of the fact that String>>#asNumber only parses the part it can recognize. So you can do
'print31' reversed asNumber asString reversed asNumber
to give you 31. That only works if there actually is a number at the end.
This is one of those cases where we can presume the input data has a specific form, ie, the only numbers appear at the end of the string, and you want all those numbers. In that case it's not too hard to do, really, just:
numText := 'Kalahari78' select: [ :each | each isDigit ].
num := numText asInteger. "78"
To get the rest of the string without the digits, you can just use this:
'Kalahari78' withoutTrailingDigits. "Kalahari"6
As some of the Pharo "OGs" pointed out, you can take a look at the String class (just type CMD-Return, type in String, hit Return) and you will find an amazing number of methods for all kinds of things. Usually you can get some ideas from those. But then there are times when you really just need an answer!

How to compare Strings and put it into another program?

i´ve got small problem and before I spend even more time in trying to solve it i´d like to know if what I want to do is even possible ( and maybe input on how to do it^^).
My problem:
I want to take some text and then split it into different strings at every whitespace (for example "Hello my name is whatever" into "Hello" "my" "name" "is" "whatever").
Then I want to set every string with it´s own variable so that I get something alike to a= "Hello" b= "my" and so on. Then I want to compare the strings with other strings (the idea is to get addresses from applications without having to search through them so I thought I could copy a telephone book to define names and so on) and set matching input to variables like Firstname , LastName and street.
Then, and here comes the "I´d like to know if it´s possible" part I want it to put it into our database, this means I want it to copy the string into a text field and then to go to the next field via tab. I´ve done something like this before with AutoIT but i´ve got no idea how to tell AutoIT whats inside the strings so I guess it must be done through the programm itself.
I´ve got a little bit of experience with c++, python and BATCH files so it would be nice if anyone could tell me if this can even be done using those languages (and I fear C++ can do it and I´m just to stupid to do so).
Thanks in advance.
Splitting a string is very simple, there is usually a built in method called .split() which will help you, the method varies from language to language.
When you've done a split, it will be assigned to an array, you can then use an index to get the variables, for example you'd have:
var str = "Hello, my name is Bob";
var split = str.split(" ");
print split[0]; // is "Hello,"
print split[1]; // is "my" etc
You can also use JSON to return data so you could have an output like
print split["LastName"];
What you're asking for is defiantly possible.
Some links that could be useful:
Split a string in C++?
https://code.google.com/p/cpp-json/

How to get (translatable) strings from specific domain with POEdit

I have been trying for hours finding a way to setup POEdit so that it can grab the text from specific domain only
My gettext function looks like this:
function ri($id, $parameters = array(), $domain = 'default', $locale = null)
A sample call:
echo ri('Text %xyz%', array('%xyz%'=>100), 'myDomain');
I will need to grab only the text with the domain myDomain to translate, or at least I want POEdit to put these texts into domain specific files. Is there a way to do it?
I found several questions that are similar but the answers don't really tell me what to do (I think I'm such a noob it must be explained in plain English for me to understand):
How to set gettext text domain in Poedit?
How to get list of translatable messages
So I finally figured it out after days of searching, I finally found the answer here:
http://sourceforge.net/mailarchive/message.php?msg_id=27691818
xgettext recognizes context in strings, and gives a msgctxt field in the *.pot file, which is recognized by translation software as a
context and is shown as such (check image of Pootle showing context
below)
This can be done in 3 ways:
String in code should be in the format _t('context','string'); and xgettext invocation should be in the form --keyword=_t:1c,2
(this basically explains to xgettext that there are 2 arguments in
the keyword function, 1st one is context, 2nd one is string)
String in code in the format _t('string','context'); and xgettext invocation should be in the form --keyword=_t:1,2c
String in the code should be as _t('context|string') and xgettext invocation should be in the form --keyword=_t:1g
So to answer my own question, I added this to the "sources keywords" tab of Poedit:
ri:1,3c
ri is the function name, 1 is the location of the stringid, 3 is the location of the context/domain
Hope this helps someone else, I hate all these cryptic documents
(This is a repost of my answer to the same thing here.)
Neither GNU gettext tools nor Poedit (which uses them) support this particular misuse of gettext.
In gettext, domain is roughly “a piece of software” — a program, a library, a plugin, a theme. As such, it typically resides in a single directory tree and is alone there — or at the very least, if you have multiple pieces=domains, you have them organized sanely into some subdirectories that you can limit the extraction to.
Mixing and matching domains within a single file as you do is not how gettext was intended to be used, and there’s no reasonable solution to handle it other than using your own helper function, e.g. by wrapping all myDomain texts into __mydomain (which you must define, obviously) and adding that to the list of keywords in Poedit when extracting for myDomain and not adding that to the list of keywords for other domains' files.

sas generate all possible miss spelling

Does any one know how to generate the possible misspelling ?
Example : unemployment
- uemployment
- onemploymnet
-- etc.
If you just want to generate a list of possible misspellings, you might try a tool like this one. Otherwise, in SAS you might be able to use a function like COMPGED to compute a measure of the similarity between the string someone entered, and the one you wanted them to type. If the two are "close enough" by your standard, replace their text with the one you wanted.
Here is an example that computes the Generalized Edit Distance between "unemployment" and a variety of plausible mispellings.
data misspell;
input misspell $16.;
length misspell string $16.;
retain string "unemployment";
GED=compged(misspell, string,'iL');
datalines;
nemployment
uemployment
unmployment
uneployment
unemloyment
unempoyment
unemplyment
unemploment
unemployent
unemploymnt
unemploymet
unemploymen
unemploymenyt
unemploymenty
unemploymenht
unemploymenth
unemploymengt
unemploymentg
unemploymenft
unemploymentf
blahblah
;
proc print data=misspell label;
label GED='Generalized Edit Distance';
var misspell string GED;
run;
Essentially you are trying to develop a list of text strings based on some rule of thumb, such as one letter is missing from the word, that a letter is misplaced into the wrong spot, that one letter was mistyped, etc. The problem is that these rules have to be explicitly defined before you can write the code, in SAS or any other language (this is what Chris was referring to). If your requirement is reduced to this one-wrong-letter scenario then this might be managable; otherwise, the commenters are correct and you can easily create massive lists of incorrect spellings (after all, all combinations except "unemployment" constitute a misspelling of that word).
Having said that, there are many ways in SAS to accomplish this text manipulation (rx functions, some combination of other text-string functions, macros); however, there are probably better ways to accomplish this. I would suggest an external Perl process to generate a text file that can be read into SAS, but other programmers might have better alternatives.
If you are looking for a general spell checker, SAS does have proc spell.
It will take some tweaking to get it working for your situation; it's very old and clunky. It doesn't work well in this case, but you may have better results if you try and use another dictionary? A Google search will show other examples.
filename name temp lrecl=256;
options caps;
data _null_;
file name;
informat name $256.;
input name &;
put name;
cards;
uemployment
onemploymnet
;
proc spell in=name
dictionary=SASHELP.BASE.NAMES
suggest;
run;
options nocaps;

Resources