Convert string to title case - Emacs Lisp - string

I am looking for an elisp function that accepts a string and returns the same in title case (i.e., all words capitalized, except for "a", "an", "on", "the", etc.).
I found this script, which requires a marked region.
However, I need a function that accepts a string variable, so that I can use it with replace-regexp. I would love to see a version of the above script that can accept either a string or a marked region.

Something like this?
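(progn
  (defun title-case (input)
    "Return INPUT in title case.
The first and last words are always capitalized; the minor words
listed below are downcased everywhere else."
    (let* ((words (split-string input))
           (first (pop words))
           (last (car (last words)))
           (do-not-capitalize '("the" "of" "from" "and" "yet"))) ; etc
      ;; Collect the pieces in a list so that inputs of one or two
      ;; words neither signal an error nor produce doubled spaces.
      (mapconcat #'identity
                 (append (and first (list (capitalize first)))
                         (mapcar (lambda (w)
                                   (if (not (member (downcase w) do-not-capitalize))
                                       (capitalize w)
                                     (downcase w)))
                                 (butlast words))
                         (and last (list (capitalize last))))
                 " ")))
  (title-case "the presentation of this HEADING OF my own from my keyboard and yet\n"))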
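Since the question mentions regexp replacement: interactively, M-x replace-regexp accepts a replacement such as \,(title-case \&), where \& is the matched text as a string. From Lisp, replace-regexp-in-string takes a function in place of the replacement string. The following is only a sketch under assumptions: the heading pattern "^\\* .*$" and the name my-capitalize-headings are made up for illustration, and the built-in capitalize stands in for whichever title-casing function you end up using.

```elisp
;; Hypothetical helper: the regexp and the function name are
;; illustrative assumptions, not part of the question.
(defun my-capitalize-headings (text)
  "Return TEXT with every line starting with \"* \" capitalized word by word."
  (replace-regexp-in-string
   "^\\* .*$"
   #'capitalize ; stand-in; any function from string to string works here
   text
   t    ; FIXEDCASE: do not let Emacs adjust the case of the replacement
   t))  ; LITERAL: insert the function's result verbatim

(my-capitalize-headings "* the quick fox\nbody text\n* ANOTHER heading")
;; => "* The Quick Fox\nbody text\n* Another Heading"
```

Passing t for FIXEDCASE matters here, because otherwise the case of the replacement may be adjusted to follow the case pattern of the text being replaced.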

I'd say that the script you linked to does a good job at title casing. You can use it as-is.
That leaves us with two more questions:
How can we make it accept a string?
How can we write a function which accepts either a string or a (marked) region?
Working with strings in Emacs is idiomatically done in temporary buffers which are not displayed. You could write a wrapper like this:
(defun title-capitalization-string (s)
  (with-temp-buffer
    (erase-buffer)
    (insert s)
    (title-capitalization (point-min)
                          (point-max))
    (buffer-substring-no-properties (point-min)
                                    (point-max))))
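The same temporary-buffer idiom works for any command that operates on a region. As a self-contained sketch (the helper name my-region-function-on-string is an assumption, and the built-in capitalize-region merely stands in for title-capitalization, which comes from the linked script):

```elisp
;; Hypothetical generic wrapper around the temporary-buffer idiom.
(defun my-region-function-on-string (region-fn s)
  "Return S after applying REGION-FN to it in a temporary buffer.
REGION-FN is called with two arguments, the start and end positions."
  (with-temp-buffer
    (insert s)
    (funcall region-fn (point-min) (point-max))
    (buffer-string)))

(my-region-function-on-string #'capitalize-region "the LORD of the rings")
;; => "The Lord Of The Rings"
```

The temporary buffer is never displayed and is killed when the form exits, so the caller only ever sees the returned string.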
Now, for a function which magically does what you mean, consider something like this:
(defun title-capitalization-dwim (&optional arg)
  (interactive)
  (cond
   (arg
    (title-capitalization-string arg))
   ((use-region-p)
    (title-capitalization-string
     (buffer-substring-no-properties (region-beginning)
                                     (region-end))))
   (t
    ;; `point-at-bol' and `point-at-eol' are obsolete as of Emacs 29.
    (title-capitalization-string
     (buffer-substring-no-properties (line-beginning-position)
                                     (line-end-position))))))
It accepts an optional argument or an active region, or falls back to the text on the current line. Note that this function is not really useful when called interactively, because it only returns the result and leaves the buffer unchanged. Hat tip also to https://www.emacswiki.org/emacs/titlecase.el
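If you do want a visible effect when calling it interactively, one option is to write the result back into the buffer. A minimal sketch, assuming a hypothetical helper my-apply-to-region-or-line; the built-in upcase-initials is used as the string function only so that the example is self-contained:

```elisp
;; Hypothetical helper: replaces buffer text instead of only returning it.
(defun my-apply-to-region-or-line (fn)
  "Replace the active region, or else the current line, with FN of its text.
FN is any function from string to string.  Return the new text."
  (let* ((beg (if (use-region-p) (region-beginning) (line-beginning-position)))
         (end (if (use-region-p) (region-end) (line-end-position)))
         (new (funcall fn (buffer-substring-no-properties beg end))))
    (save-excursion
      (delete-region beg end)
      (goto-char beg)
      (insert new))
    new))

(defun my-upcase-initials-dwim ()
  "Upcase word initials in the active region or on the current line."
  (interactive)
  (my-apply-to-region-or-line #'upcase-initials))
```

Swapping #'upcase-initials for a title-casing string function gives an interactive command that changes the text in place.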
License
I put all this code under the Apache License 2.0 and the GPL 2.0 (or later at your option) in addition to the site's default license.

Use M-x upcase-initials-region:
upcase-initials-region is an interactive built-in function in ‘C source code’.
(upcase-initials-region BEG END)
Upcase the initial of each word in the region. This means that each
word’s first character is converted to either title case or upper
case, and the rest are left unchanged. In programs, give two
arguments, the starting and ending character positions to operate on.
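For strings there is a matching built-in, upcase-initials, so no region is needed. The short comparison below shows how it differs from capitalize; note that neither of them skips minor words such as "of" or "the".

```elisp
;; Both functions are built in and accept a string argument.
(upcase-initials "the LORD of the rings") ; => "The LORD Of The Rings"
(capitalize "the LORD of the rings")      ; => "The Lord Of The Rings"
```

upcase-initials leaves the rest of each word untouched, which is useful when the input contains acronyms.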

Related

what is the interactive REPL IO function?

I have been learning Common Lisp for a while, and I have run into a question:
how can I implement a function that lets the user input words until the user enters "exit"?
(Actually, I want to know which command-line interactive I/O functions fit such a requirement.)
For example:
Prompt "please input a word: " in the REPL, store the user's inputs in a global variable my-words, and exit when the user enters "exit".
Your specification is a little bit incomplete (e.g. what constitutes a word in your problem? What if the user adds multiple words? What if the input is empty?). Below, I am using CL-PPCRE to split the input into different words and add them all at once, because it seems useful in general. In your case you might want to add more error checking.
If you want to interact with the user, you should read and write from and to the *QUERY-IO* stream. Here I'll present a version with a global variable, as you requested, as well as another one without side effects (apart from input/output).
With a global variable
Define the global variable and initialize it with an empty adjustable array.
I am using an array so that it is easy to add words at the end, but you could also use a queue.
(defvar *my-words* (make-array 10 :fill-pointer 0 :adjustable t))
The following function mutates the global variable:
(defun side-effect-word-repl ()
  (loop
    (format *query-io* "~&Please input a word: ")
    (finish-output *query-io*)
    (let ((words (ppcre:split
                  '(:greedy-repetition 1 nil :whitespace-char-class)
                  (read-line *query-io*))))
      (dolist (w words)
        (when (string-equal w "exit") ; ignore case
          (return-from side-effect-word-repl))
        (vector-push-extend w *my-words*)))))
The LOOP uses the simple syntax where there are only expressions and no loop-specific keywords. I first write the prompt to *QUERY-IO*. The ~& FORMAT directive performs the same operation as FRESH-LINE. As Rainer pointed out in comments, we have to call FINISH-OUTPUT to ensure the message is effectively printed before the user is expected to reply.
Then, I read a whole line from the same bidirectional stream, and split it into a list of words, where a word is a string of non-whitespace characters.
With DOLIST, I iterate over the list and add words into the global array with VECTOR-PUSH-EXTEND. But as soon as I encounter "exit", I terminate the loop; since I rely on STRING-EQUAL, the test is done case-insensitively.
Side-effect free approach
Having a global variable as done above is discouraged. If you only need to have a prompt which returns a list of words, then the following will be enough. Here, I use the PUSH/NREVERSE idiom to build the resulting list of words.
(defun pure-word-repl ()
  (let ((result '()))
    (loop
      (format *query-io* "~&Please input a word: ")
      (finish-output *query-io*)
      (let ((words (ppcre:split
                    '(:greedy-repetition 1 nil :whitespace-char-class)
                    (read-line *query-io*))))
        (dolist (w words)
          (when (string-equal w "exit")
            (return-from pure-word-repl (nreverse result)))
          (push w result))))))
Note about words
As jkiiski commented, it might be better to split words at :word-boundary. I tried different combinations and the following result seems satisfying with weird example strings:
(mapcan (lambda (string)
          (ppcre:split :word-boundary string))
        (ppcre:split
         '(:greedy-repetition 1 nil :whitespace-char-class)
         "amzldk 'amlzkd d;:azdl azdlk"))
=> ("amzldk" "'" "amlzkd" "d" ";:" "azdl" "azdlk")
I first remove all whitespace and split the string into a list of strings, which can contain punctuation marks. Then, each string is itself split at :word-boundary, and the results are concatenated with MAPCAN to form a list of separate words. However, I can't really guess what your actual needs are, so you should probably define your own SPLIT-INTO-WORDS function to validate and split an input string.
CL-USER 23 > (progn
               (format t "~%enter a list of words:~%")
               (finish-output)
               (setf my-words (read))
               (terpri))
enter a list of words:
(foo bar baz)
or
CL-USER 28 > (loop with word = nil
                   do
                   (format t "~%enter a word or exit:~%")
                   (finish-output)
                   (setf word (read))
                   (terpri)
                   until (eql word 'exit)
                   collect word)
enter a word or exit:
foo
enter a word or exit:
bar
enter a word or exit:
baz
enter a word or exit:
exit
(FOO BAR BAZ)

Lisp - Displaying a String to List

I've been looking for a way to convert user input (read-line) to a list of atoms that I can manipulate more easily.
For example:
SendInput()
This is my input. Hopefully this works.
and I want to get back..
(This is my input. Hopefully this works.)
Eventually it'd be ideal to remove any periods, commas, quotes, etc. But for now I just want to store the user's input in a list (NOT AS A STRING).
So, for now I'm using
(setf stuff (coerce (read-line) 'list))
and that returns...
(#\T #\h #\i #\s #\Space #\i #\s #\Space #\m #\y #\Space #\i #\n #\p #\u #\t #\. #\Space #\H #\o #\p #\e #\f #\u #\l #\l #\y #\Space #\t #\h #\i #\s #\Space #\w #\o #\r #\k #\s #\.)
So now I'm on the hunt for a function that can take that list and format it properly...
Any help would be greatly appreciated!
Rainer's answer is better in that it's a bit more lightweight (and general), but you could also use CL-PPCRE , if you already have it loaded (I know I always do).
You can use SPLIT directly on the string you get from READ-LINE, like so:
(cl-ppcre:split "[ .]+" (read-line))
(Now you have two problems)
What you want to do is to split a sequence of characters (a string) into a list of smaller strings or symbols.
Use one of the split-sequence functions available from a Lisp library (see for example cl-utilities).
In LispWorks (which comes with a SPLIT-SEQUENCE function) I would, for example, write:
CL-USER 8 > (mapcar #'intern
                    (split-sequence '(#\space #\.)
                                    "This is my input. Hopefully this works."
                                    :coalesce-separators t))
(|This| |is| |my| |input| |Hopefully| |this| |works|)
Remember, to get symbols with case preserving names, they are surrounded by vertical bars. The vertical bars are not part of the symbol name - just like the double quotes are not part of a string - they are delimiters.
You can also print it:
CL-USER 19 > (princ (mapcar #'intern
                            (split-sequence '(#\space #\.)
                                            "This is my input. Hopefully this works."
                                            :coalesce-separators t)))
(This is my input Hopefully this works)
(|This| |is| |my| |input| |Hopefully| |this| |works|)
The above prints the list. The first output is the data printed by PRINC and the second output is the return value printed by the REPL.
If you don't want symbols, but strings:
CL-USER 9 > (split-sequence '(#\space #\.)
                            "This is my input. Hopefully this works."
                            :coalesce-separators t)
("This" "is" "my" "input" "Hopefully" "this" "works")

Make String from Sequence of Characters

This code does not work as I expected. Could you please explain why?
(defn make-str [s c]
  (let [my-str (ref s)]
    (dosync (alter my-str str c))))
(defn make-str-from-chars
  "make a string from a sequence of characters"
  ([chars] make-str-from-chars chars "")
  ([chars result]
   (if (== (count chars) 0) result
       (recur (drop 1 chars) (make-str result (take 1 chars))))))
Thank you!
This is a very slow and incorrect way to create a string from a seq of characters. The main problem is that changes aren't propagated: ref creates a new reference to the existing string, but once the function exits, that reference is discarded.
The correct way to do this is:
(apply str seq)
for example,
user=> (apply str [\1 \2 \3 \4])
"1234"
If you want to make it more efficient, you can use Java's StringBuilder to collect all the data into a string. (Strings in Java are also immutable.)
You pass a sequence with one character in it to your make-str function, not the character itself. Using first instead of take should give you the desired effect.
Also there is no need to use references. In effect your use of them is a gross misuse of them. You already use an accumulator in your function, so you can use str directly.
(defn make-str-from-chars
  "make a string from a sequence of characters"
  ([chars] (make-str-from-chars chars ""))
  ([chars result]
   (if (zero? (count chars))
     result
     (recur (drop 1 chars) (str result (first chars))))))
Of course count is not very nice in this case, because it always has to walk the whole sequence to figure out its length. So you traverse the input sequence several times unnecessarily. One normally uses seq to identify when a sequence is exhausted. We can also use next instead of drop to save some overhead of creating unnecessary sequence objects. Be sure to capture the return value of seq to avoid overhead of object creations later on. We do this in the if-let.
(defn make-str-from-chars
  "make a string from a sequence of characters"
  ([chars] (make-str-from-chars chars ""))
  ([chars result]
   (if-let [chars (seq chars)]
     (recur (next chars) (str result (first chars)))
     result)))
Functions like this, which just return the accumulator upon fully consuming their input, cry out for reduce.
(defn make-str-from-chars
  "make a string from a sequence of characters"
  [chars]
  (reduce str "" chars))
This is already nice and short, but in this particular case we can do even a little better by using apply. Then str can use the underlying StringBuilder to its full power.
(defn make-str-from-chars
  "make a string from a sequence of characters"
  [chars]
  (apply str chars))
Hope this helps.
You can also use clojure.string/join, as follows:
(require '[clojure.string :as str] )
(assert (= (vec "abcd") [\a \b \c \d] ))
(assert (= (str/join (vec "abcd")) "abcd" ))
There is an alternate form of clojure.string/join which accepts a separator. See:
http://clojuredocs.org/clojure_core/clojure.string/join

Call function based on a string

I am passing in command line arguments to my Lisp program and they are formatted like this when they hit my main function:
("1 1 1" "dot" "2 2 2")
I have a dot function (which takes two vectors as arguments) and would like to call it directly from the argument, but this isn't possible because something like (funcall (second args)...) receives "dot" and not dot as the function name.
I tried variations of this function:
(defun remove-quotes (s)
  (setf (aref s 0) '""))
to no avail, before realizing that the quotes were not really a part of the string. Is there a simple way to do this, or should I just check each string and then call the appropriate function?
"1 1 1" is a string of five characters: 1, space, 1, space and 1. The double quotes are not part of the string.
("1 1 1" "dot" "2 2 2") is a list of three strings.
There are no " characters above. The " are used to delimit strings in s-expressions.
If you have a dot function you need to tell us what kind of input data it expects.
Does it expect two lists of numbers? Then you have to convert the string "1 1 1" into a list of numbers.
(with-input-from-string (in "1 1 1")
  (loop for data = (read in nil in)
        until (eq data in)
        collect data))
To get the function DOT from the string "dot" first find the symbol DOT and then get its symbol function.
(symbol-function (find-symbol (string-upcase "dot")))
For find-symbol one might need to specify also the package, if there is a special package where the symbol is in.
Converting a list to a vector then is the next building block.
So you need to convert the arguments for your function to vectors (probably first converting them to lists as I showed above). Then you need to find the function (see above). If you have then the function and the arguments, then you can call the function using FUNCALL or APPLY (whatever is more convenient).
The question is a bit unclear, but as far as I understand it you want, when given the list ("1 1 1" "dot" "2 2 2") as input to evaluate the expression (dot "1 1 1" "2 2 2"). In that case you can do this:
(defun apply-infix (arg1 f arg2)
  (apply (intern (string-upcase f)) (list arg1 arg2)))
(defun apply-list-infix (lst)
  (apply 'apply-infix lst))
(apply-list-infix '("1 1 1" "dot" "2 2 2"))
funcall does not accept a string as a function designator. You need to give it a symbol instead. What you probably want to do is:
Convert the string to upper case (Lisp symbols are usually upper case, and even though it may look like Lisp is case-insensitive, that's just because the reader upcases all symbols it reads by default) (string-upcase).
Create or find a symbol with the given name (intern). Note that, if *package* is not set according to the package your function's name lives in, you need to supply the package name as the second argument to intern.
For instance, for a function named dot in package cl-user:
(funcall (intern (string-upcase "dot") 'cl-user) ...)

How can I emulate Vim's * search in GNU Emacs?

In Vim the * key in normal mode searches for the word under the cursor. In GNU Emacs the closest native equivalent would be:
C-s C-w
But that isn't quite the same. It opens up the incremental search mini buffer and copies from the cursor in the current buffer to the end of the word. In Vim you'd search for the whole word, even if you are in the middle of the word when you press *.
I've cooked up a bit of elisp to do something similar:
(defun find-word-under-cursor (arg)
  (interactive "p")
  (if (looking-at "\\<") () (re-search-backward "\\<" (point-min)))
  (isearch-forward))
That trots backwards to the start of the word before firing up isearch. I've bound it to C-+, which is easy to type on my keyboard and similar to *, so when I type C-+ C-w it copies from the start of the word to the search mini-buffer.
However, this still isn't perfect. Ideally it would regexp search for "\<" word "\>" to not show partial matches (searching for the word "bar" shouldn't match "foobar", just "bar" on its own). I tried using search-forward-regexp and concat'ing \<> but this doesn't wrap in the file, doesn't highlight matches and is generally pretty lame. An isearch-* function seems the best bet, but these don't behave well when scripted.
Any ideas? Can anyone offer any improvements to the bit of elisp? Or is there some other way that I've overlooked?
Based on your feedback to my first answer, how about this:
(defun my-isearch-word-at-point ()
  (interactive)
  (call-interactively 'isearch-forward-regexp))
(defun my-isearch-yank-word-hook ()
  (when (equal this-command 'my-isearch-word-at-point)
    (let ((string (concat "\\<"
                          (buffer-substring-no-properties
                           (progn (skip-syntax-backward "w_") (point))
                           (progn (skip-syntax-forward "w_") (point)))
                          "\\>")))
      (if (and isearch-case-fold-search
               (eq 'not-yanks search-upper-case))
          (setq string (downcase string)))
      (setq isearch-string string
            isearch-message
            (concat isearch-message
                    (mapconcat 'isearch-text-char-description
                               string ""))
            isearch-yank-flag t)
      (isearch-search-and-update))))
(add-hook 'isearch-mode-hook 'my-isearch-yank-word-hook)
The highlight-symbol Emacs extension provides this functionality. In particular, the recommended .emacs setup:
(require 'highlight-symbol)
(global-set-key [(control f3)] 'highlight-symbol-at-point)
(global-set-key [f3] 'highlight-symbol-next)
(global-set-key [(shift f3)] 'highlight-symbol-prev)
Allows jumping to the next symbol at the current point (F3), jumping to the previous symbol (Shift+F3) or highlighting symbols matching the one under the cursor (Ctrl+F3). The commands continue to do the right thing if your cursor is mid-word.
Unlike vim's super star, highlighting symbols and jumping between symbols are bound to two different commands. I personally don't mind the separation, but you could bind the two commands under the same keystroke if you wanted to precisely match vim's behaviour.
There are lots of ways to do this:
http://www.emacswiki.org/emacs/SearchAtPoint
scottfrazer's answer works well for me, except for words that end in '_' (or perhaps other non-word characters?). I found that the code for light-symbol mode was using a different regex for word boundary depending on the version of emacs, and that fixed it for me. Here is the modified code:
(defconst my-isearch-rx-start
  (if (< emacs-major-version 22)
      "\\<"
    "\\_<")
  "Start-of-symbol regular expression marker.")
(defconst my-isearch-rx-end
  (if (< emacs-major-version 22)
      "\\>"
    "\\_>")
  "End-of-symbol regular expression marker.")
(defun my-isearch-word-at-point ()
  (interactive)
  (call-interactively 'isearch-forward-regexp))
(defun my-isearch-yank-word-hook ()
  (when (equal this-command 'my-isearch-word-at-point)
    (let ((string (concat my-isearch-rx-start
                          (buffer-substring-no-properties
                           (progn (skip-syntax-backward "w_") (point))
                           (progn (skip-syntax-forward "w_") (point)))
                          my-isearch-rx-end)))
      (if (and isearch-case-fold-search
               (eq 'not-yanks search-upper-case))
          (setq string (downcase string)))
      (setq isearch-string string
            isearch-message
            (concat isearch-message
                    (mapconcat 'isearch-text-char-description
                               string ""))
            isearch-yank-flag t)
      (isearch-search-and-update))))
(add-hook 'isearch-mode-hook 'my-isearch-yank-word-hook)
How about the built-in commands M-b C-s C-w (start of word, search, word search)?
Mickey of Mastering Emacs blog reintroduced a cool "Smart Scan" lib that gives global bindings of M-n and M-p for navigating symbols under the cursor in the buffer. Doesn't affect search register so it's not a * replacement as is, but a clever and usable alternative to the navigation problem.
I have not tried it but there is some code here called Grep-O-Matic.
With this you should be able to do C-* while in isearch mode.
(define-key isearch-mode-map [?\C-*] 'kmk-isearch-yank-thing)
(defun kmk-isearch-yank-thing ()
  "Pull next thing from buffer into search string."
  (interactive)
  (let ((string (regexp-quote (thing-at-point 'word))))
    (setq isearch-string
          ;; Wrap the word in word-boundary markers.
          (concat isearch-string "\\<" string "\\>")
          isearch-message
          (concat isearch-message
                  (mapconcat 'isearch-text-char-description
                             string ""))
          ;; Don't move cursor in reverse search.
          isearch-yank-flag t))
  (setq isearch-regexp t isearch-word nil isearch-success t isearch-adjusted t)
  (isearch-search-and-update))
;Here is my version: Emulates Visual Studio/Windows key bindings
; C-F3 - Start searching the word at the point
; F3 searches forward and Shift F3 goes reverse
(setq my-search-wrap nil)
(defun my-search-func (dir)
  (interactive)
  (let* ((text (car search-ring)) newpoint)
    (when my-search-wrap
      (goto-char (if (= dir 1) (point-min) (point-max)))
      (setq my-search-wrap nil))
    (setq newpoint (search-forward text nil t dir))
    (if newpoint
        (set-mark (if (= dir 1) (- newpoint (length text))
                    (+ newpoint (length text))))
      (message "Search Failed: %s" text) (ding)
      (setq my-search-wrap text))))
(defun my-search-fwd () (interactive) (my-search-func 1))
(defun my-search-bwd () (interactive) (my-search-func -1))
(defun yank-thing-into-search ()
  (interactive)
  (let ((text (if mark-active
                  (buffer-substring-no-properties (region-beginning) (region-end))
                (or (current-word) ""))))
    (when (> (length text) 0)
      (isearch-update-ring text)
      (setq my-search-wrap nil)
      (my-search-fwd))))
(global-set-key (kbd "<f3>") 'my-search-fwd) ; Visual Studio like search keys
(global-set-key (kbd "<S-f3>") 'my-search-bwd)
(global-set-key (kbd "<C-f3>") 'yank-thing-into-search)
Since Emacs 24.4, searching a symbol under the cursor is available with the global key sequence M-s .
Here is more information about this key sequence and the function it invokes obtained using the describe system of Emacs (C-h k M-s .):
M-s . runs the command isearch-forward-symbol-at-point (found in
global-map), which is an interactive compiled Lisp function in
‘isearch.el’.
It is bound to M-s ., <menu-bar> <edit> <search> <i-search>
<isearch-forward-symbol-at-point>.
(isearch-forward-symbol-at-point &optional ARG)
Do incremental search forward for a symbol found near point.
Like ordinary incremental search except that the symbol found at point
is added to the search string initially as a regexp surrounded
by symbol boundary constructs \_< and \_>.
See the command ‘isearch-forward-symbol’ for more information.
With a prefix argument, search for ARGth symbol forward if ARG is
positive, or search for ARGth symbol backward if ARG is negative.
Probably introduced at or before Emacs version 24.4.
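The same symbol-boundary constructs can also be used from Lisp, for instance to jump between occurrences without entering isearch. The following is only a sketch: the names my-symbol-regexp and my-search-symbol-at-point-forward are hypothetical, and the wrap-around behaviour is a design choice made here to resemble Vim's *.

```elisp
(require 'thingatpt)

;; Hypothetical helper: build a whole-symbol regexp.
(defun my-symbol-regexp (symbol-name)
  "Return a regexp that matches SYMBOL-NAME only as a whole symbol."
  (concat "\\_<" (regexp-quote symbol-name) "\\_>"))

;; Hypothetical command: not part of Emacs.
(defun my-search-symbol-at-point-forward ()
  "Move point after the next whole-symbol match of the symbol at point.
Wrap around to the start of the buffer if nothing is found below
point.  Return the new position."
  (interactive)
  (let ((bounds (bounds-of-thing-at-point 'symbol)))
    (unless bounds
      (user-error "No symbol at point"))
    (let ((regexp (my-symbol-regexp
                   (buffer-substring-no-properties (car bounds) (cdr bounds)))))
      ;; Start after the current symbol so that it is not matched again.
      (goto-char (cdr bounds))
      (or (re-search-forward regexp nil t)
          ;; Nothing below point: wrap around and search from the top.
          (progn (goto-char (point-min))
                 (re-search-forward regexp nil t)))
      (point))))
```

With point inside "bar" in a buffer containing "bar foobar bar", this skips "foobar" and lands after the last "bar"; a second call wraps to the first one.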
