Given the following string:
(def text "this is the first sentence . And this is the second sentence")
I wanted to count the number of times a word like "this" appears in the text, by appending the count after each occurrence of the word. Like this:
["this: 1" "is" "the" "first" "sentence" "." "And" "this: 2" ...]
As a first step, I tokenized the string:
(def words (clojure.string/split text #" "))
Then I created a helper function to get the number of times "this" appears in the text:
(defn count-this [] (count (re-seq #"this" text)))
Finally I tried to use the result of the count-this function inside this loop:
(for [x words]
  (if (= x "this")
    (str "this: " (apply str (take (count-this) (iterate inc 0))))
    x))
Here is what I get:
("this: 01" "is" "the" "first" "sentence" "." "And" "this: 01" "is" ...)
This can be achieved fairly succinctly using reduce to thread a counter through your vector traversal, in addition to building the new strings as needed:
(def text "this is the first sentence. And this is the second sentence.")
(defn notate-occurrences [word string]
  (->
   (reduce
    (fn [[count string'] member]
      (if (= member word)
        (let [count' (inc count)]
          [count' (conj string' (str member ": " count'))])
        [count (conj string' member)]))
    [0 []]
    (clojure.string/split string #" "))
   second))
(notate-occurrences "this" text)
;; ["this: 1" "is" "the" "first" "sentence." "And" "this: 2" "is" "the" "second" "sentence."]
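For comparison, the same accumulator idea can be sketched outside Clojure. The Python below is only an illustration and not part of the answer above: the name notate_occurrences and the (count, tokens) state tuple are my own choices, and functools.reduce stands in for Clojure's reduce.

```python
from functools import reduce


def notate_occurrences(word, text):
    """Return the tokens of `text`, appending a running count to each `word`."""

    def step(state, token):
        # state is (occurrences seen so far, output tokens built so far)
        count, out = state
        if token == word:
            count += 1
            return count, out + [f"{token}: {count}"]
        return count, out + [token]

    _, result = reduce(step, text.split(" "), (0, []))
    return result


print(notate_occurrences("this", "this is the first sentence . And this is the second sentence"))
```

The counter never lives outside the reduction: each step receives the old state and returns the new one, which is exactly what the [count string'] pair does in the Clojure version.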
(defn split-by-word [word text]
  (remove empty?
    (flatten
      (map #(if (number? %)
              (str word ": " (+ 1 %))
              (clojure.string/split (clojure.string/trim %) #" "))
           (butlast (interleave
                      (clojure.string/split (str text " ") (java.util.regex.Pattern/compile (str "\\b" word "\\b")))
                      (range)))))))
You need to keep some state as you are going along. reduce, loop/recur and iterate all do this. iterate just transitions from one state to another. Here is the transition function:
(defn transition [word]
  (fn [[[head & tail] counted out]]
    (let [[next-counted to-append] (if (= word head)
                                     [(inc counted) (str head ": " (inc counted))]
                                     [counted head])]
      [tail next-counted (conj out to-append)])))
Then you can use iterate to exercise this function until there is no input left:
;; assumes (require '[clojure.string :as s])
(let [in (s/split "this is the first sentence . And this is the second sentence" #" ")
      step (transition "this")]
  (->> (iterate step [in 0 []])
       (drop-while (fn [[[head & _] _ _]]
                     head))
       (map #(nth % 2))
       first))
;; => ["this: 1" "is" "the" "first" "sentence" "." "And" "this: 2" "is" "the" "second" "sentence"]
The problem with that approach is that (apply str (take (count-this)(iterate inc 0))) is going to evaluate to the same thing every time.
To exert complete control over variables you generally want to use the loop form.
For example:
;; assumes (require '[clojure.string :as str])
(defn add-indexes [word phrase]
  (let [words (str/split phrase #"\s+")]
    (loop [src words
           dest []
           counter 1]
      (if (seq src)
        (if (= word (first src))
          (recur (rest src) (conj dest (str word " " counter)) (inc counter))
          (recur (rest src) (conj dest (first src)) counter))
        dest))))
user=> (add-indexes "this" "this is the first sentence . And this is the second sentence")
["this 1" "is" "the" "first" "sentence" "." "And" "this 2" "is" "the" "second" "sentence"]
loop lets you specify the value of each loop variable on every pass, so you can decide whether to change it according to your own logic.
If you're willing to dip into Java and maybe do something that feels like cheating, this would work too.
(defn add-indexes2 [word phrase]
  (let [count (java.util.concurrent.atomic.AtomicInteger. 1)]
    (map #(if (= word %) (str % " " (.getAndIncrement count)) %)
         (str/split phrase #"\s+"))))
user=> (add-indexes2 "this" "this is the first sentence . And this is the second sentence")
("this 1" "is" "the" "first" "sentence" "." "And" "this 2" "is" "the" "second" "sentence")
Using the mutable counter may not be pure, but on the other hand, it never escapes the context of the function, so its behavior cannot be changed by external forces.
Usually, you can find a simple way of composing your solution from existing Clojure functions in a very succinct way.
Here are two quite short solutions to your problem. First, if you don't need the result as a sequence and replacing within the string is acceptable:
(require 'clojure.string)
(def text "this is the first sentence . And this is the second sentence")
(defn replace-token [ca token]
  (swap! ca inc)
  (str token ": " @ca))
(defn count-this [text]
  (let [counter (atom 0)
        replacer-fn (partial replace-token counter)]
    (clojure.string/replace text #"this" replacer-fn)))
(count-this text)
; => "this: 1 is the first sentence . And this: 2 is the second sentence"
The above solution makes use of the fact that a function can be supplied to clojure.string/replace.
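The same trick, passing a function as the replacement, exists in other regex APIs too. Here is a rough Python sketch of it using re.sub with a callable; count_word and replacer are names I made up, and the \b word boundaries are an extra assumption of mine that the Clojure version above does not make.

```python
import re
from itertools import count


def count_word(text, word="this"):
    """Append a running count after each whole-word occurrence of `word`."""
    counter = count(1)  # fresh counter per call, so no state leaks between calls
    pattern = r"\b" + re.escape(word) + r"\b"  # assumption: match whole words only

    def replacer(match):
        # re.sub calls this once per match, in left-to-right order
        return f"{match.group(0)}: {next(counter)}"

    return re.sub(pattern, replacer, text)


print(count_word("this is the first sentence . And this is the second sentence"))
# prints: this: 1 is the first sentence . And this: 2 is the second sentence
```

As with the atom in the Clojure code, the counter is created inside the function and closed over by the replacer, so it cannot be touched from outside.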
Second, if you need the result as a sequence, there is some overhead from tokenizing:
(defn count-seq [text]
  (let [counter (atom 0)
        replacer-fn (partial replace-token counter)
        converter (fn [tokens] (map #(if (not= % "this")
                                       %
                                       (replacer-fn %))
                                    tokens))]
    (-> text
        (clojure.string/split #" ")
        (converter))))
(count-seq text)
; => ("this: 1" "is" "the" "first" "sentence" "." "And" "this: 2" "is" "the" "second" "sentence")
The loop/recur pattern is very common among beginning Clojurians who come from non-functional languages. In most cases there is a cleaner, more idiomatic solution using functional processing with map, reduce, and friends.
As other answers have stated, the main issue in your original attempt is the binding of your counter: (iterate inc 0) is not bound to anything, so a fresh sequence is created on every pass. Look at my examples above to think through the scope of the atom bound to counter. As a reference, here is an example of using closures, which could also be used in this case with great success!
A footnote on the examples above: for cleaner code, make the solution more general by extracting and reusing the parts common to count-seq and count-this. The local converter function could also be extracted out of count-seq. replace-token already works for any token, but consider how the whole solution could be extended to match text other than "this". These are left as exercises for the reader.
I have a sentence "china beijing shanghai USA australia", and a set of words #{"USA" "australia"}
Now I am writing a function that takes a sentence and a set of words as input and removes those words from the sentence:
(defn remove-words-from-sentence [sentence words]
  (for [w words]
    (-> sentence
        (.replaceAll w ""))))
Note: I want to replace exact word occurrences only. If words contains "a", then not every letter a in the sentence should be replaced, only the standalone word a.
But the above function doesn't work. Any help?
One way to do it is to split the sentence into individual words, put the words to be removed in a set, and filter those words out of the sentence.
(let [sentence (clojure.string/split (read-line) #" ")
      words (set (clojure.string/split (read-line) #" "))]
  (clojure.string/join " "
                       (filter (complement words)
                               sentence)))
user=> china beijing shanghai USA australia ;;input sentence
user=> china USA ;;input words
user=> "beijing shanghai australia" ;;output
EDIT:
Thumbnail brought to my attention that (filter (complement pred) coll) is equivalent to (remove pred coll). You can verify that by viewing the source code of remove:
(source remove)
(defn remove
  "Returns a lazy sequence of the items in coll for which
  (pred item) returns false. pred must be free of side-effects."
  {:added "1.0"
   :static true}
  [pred coll]
  (filter (complement pred) coll))
nil
So one could just use remove instead:
(let [sentence (clojure.string/split (read-line) #" ")
      words (set (clojure.string/split (read-line) #" "))]
  (clojure.string/join " " (remove words sentence)))
It's even more readable that way. You can read it as "remove words from sentence".
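The split / filter-by-set / join shape is not specific to Clojure. A small Python sketch of the same steps follows, purely as an illustration; remove_words is a name I picked, and like the code above it assumes words are separated by single spaces.

```python
def remove_words(sentence, words):
    """Drop every token of `sentence` that is a member of the set `words`."""
    kept = [token for token in sentence.split(" ") if token not in words]
    return " ".join(kept)


print(remove_words("china beijing shanghai USA australia", {"china", "USA"}))
# prints: beijing shanghai australia
```

Because tokens are compared as whole strings, a word like "a" in the set never touches the letter a inside other words, and the original word order is preserved.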
for iterates over the seq given to it and produces another sequence. So you're generating a list whose elements each represent one replacement applied separately to the original sentence, not combined.
What you want is to remove the first word, then remove the second word from the result of that replacement, and so on. This is a typical case for reduce:
(defn remove-words-from-sentence
  [sentence words]
  (reduce #(.replace % %2 "") sentence words))
(Note that replace does the same as replaceAll but with literal replacements, not allowing a regular expression.)
EDIT: This only fixes what the OP was trying to do. It will probably produce unwanted results if, for example, one of the words is "eij" (since it will remove that portion of "beijing"). One way to fix that would be to use (.replaceAll % (str "\\b\\Q" %2 "\\E\\b\\s*") "") to do the replacement, and then trim the result. A more reliable version might thus look like this:
(require '[clojure.string :as string])
(defn remove-words-from-sentence
  [sentence words]
  (let [pattern (->> (for [w words] (str "\\b\\Q" w "\\E\\b"))
                     (string/join "|")
                     (format "(%s)\\s*"))]
    (.trim (.replaceAll sentence pattern ""))))
But it all depends on what OP wants.
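To make the two variants above concrete in another notation, here is a hedged Python sketch: one function folds over the words with functools.reduce, the other builds a single alternation pattern. Both function names are hypothetical, re.escape plays the role of \Q...\E, and sorting the words is my own addition to keep the result deterministic.

```python
import re
from functools import reduce


def remove_words_reduce(sentence, words):
    """Remove each word in turn, feeding every result into the next removal."""

    def strip_word(acc, word):
        # \b keeps "eij" from matching inside "beijing"
        return re.sub(r"\b" + re.escape(word) + r"\b\s*", "", acc)

    return reduce(strip_word, sorted(words), sentence).strip()


def remove_words_single_pass(sentence, words):
    """Remove all words at once with one alternation pattern."""
    if not words:
        return sentence
    alternation = "|".join(re.escape(w) for w in sorted(words))
    return re.sub(r"\b(?:" + alternation + r")\b\s*", "", sentence).strip()


print(remove_words_reduce("china beijing shanghai USA australia", {"USA", "australia"}))
# prints: china beijing shanghai
```

The reduce version mirrors the "replace, then replace again on the result" reasoning; the single-pass version corresponds to the joined pattern and scans the sentence only once.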
user> (defn remove-words-from-sentence
        [sentence & words]
        (loop [sentence sentence
               ws words]
          (if-not (seq ws)
            sentence
            (recur
             (clojure.string/replace sentence (first ws) "")
             (rest ws)))))
#'user/remove-words-from-sentence
user> (remove-words-from-sentence "Hello, World" "World")
;=> "Hello, "
user> (remove-words-from-sentence "Hello, World" "ll" "o" "H")
;=> "e, Wrld"
The answers so far don't deal with the question's specified input types (a string and a set).
Since the question specifies the words as a set and the sentence as a string, the easiest solution is probably to use set operations, which is easy to understand too. Note that a set keeps neither word order nor duplicates, so this only fits when those don't matter:
(require '[clojure.string :as str] '[clojure.set :as set])
(defn remove-words-from-sentence [sentence words]
  (str/join " " (set/difference (into #{} (str/split sentence #" ")) words)))
Works as advertised:
(remove-words-from-sentence "china beijing shanghai USA australia" #{"USA" "australia"})
"beijing china shanghai"
I'm trying to take user input and store it in a list, but instead of a list consisting of a single string, I want each word that is read in to be its own string.
Example:
> (input)
This is my input. Hopefully this works
would return:
("this" "is" "my" "input" "hopefully" "this" "works")
Note that I don't want any spaces or punctuation in my final list.
Any input would be greatly appreciated.
split-sequence is the off-the-shelf solution.
You can also roll your own:
(defun my-split (string &key (delimiterp #'delimiterp))
  (loop :for beg = (position-if-not delimiterp string)
          :then (position-if-not delimiterp string :start (1+ end))
        :for end = (and beg (position-if delimiterp string :start beg))
        :when beg :collect (subseq string beg end)
        :while end))
where delimiterp checks whether you want to split on this character, e.g.
(defun delimiterp (c) (or (char= c #\Space) (char= c #\,)))
or
(defun delimiterp (c) (position c " ,.;/"))
PS. Looking at your expected return value, you seem to want to call string-downcase before my-split.
PPS. You can easily modify my-split to accept :start, :end, :delimiterp &c.
PPPS. Sorry about bugs in the first two versions of my-split. Please consider that an indicator that one should not roll one's own version of this function, but use the off-the-shelf solution.
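The idea behind my-split (collect the runs of characters for which the delimiter predicate is false) can be sketched in a few lines of Python as well. This is only an illustration under my own naming: is_delimiter and DELIMITERS are hypothetical stand-ins for the delimiterp function above, and itertools.groupby replaces the explicit position bookkeeping.

```python
from itertools import groupby

DELIMITERS = " ,.;/"  # same characters as the second delimiterp above


def is_delimiter(ch):
    return ch in DELIMITERS


def my_split(string, delimiterp=is_delimiter):
    """Return the maximal runs of non-delimiter characters in `string`."""
    return [
        "".join(run)
        for is_delim, run in groupby(string, key=delimiterp)
        if not is_delim
    ]


print(my_split("This is my input. Hopefully this works".lower()))
# prints: ['this', 'is', 'my', 'input', 'hopefully', 'this', 'works']
```

Consecutive delimiters such as ". " fall into a single discarded run, so no empty strings end up in the result.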
For that task in Common Lisp I found (uiop:split-string str :separator " ") useful. The uiop package in general has a lot of utilities; take a look at the docs: https://common-lisp.net/project/asdf/uiop.html#index-split_002dstring.
There's cl-ppcre:split:
* (split "\\s+" "foo bar baz
frob")
("foo" "bar" "baz" "frob")
* (split "\\s*" "foo bar baz")
("f" "o" "o" "b" "a" "r" "b" "a" "z")
* (split "(\\s+)" "foo bar baz")
("foo" "bar" "baz")
* (split "(\\s+)" "foo bar baz" :with-registers-p t)
("foo" " " "bar" " " "baz")
* (split "(\\s)(\\s*)" "foo bar   baz" :with-registers-p t)
("foo" " " "" "bar" " " "  " "baz")
* (split "(,)|(;)" "foo,bar;baz" :with-registers-p t)
("foo" "," NIL "bar" NIL ";" "baz")
* (split "(,)|(;)" "foo,bar;baz" :with-registers-p t :omit-unmatched-p t)
("foo" "," "bar" ";" "baz")
* (split ":" "a:b:c:d:e:f:g::")
("a" "b" "c" "d" "e" "f" "g")
* (split ":" "a:b:c:d:e:f:g::" :limit 1)
("a:b:c:d:e:f:g::")
* (split ":" "a:b:c:d:e:f:g::" :limit 2)
("a" "b:c:d:e:f:g::")
* (split ":" "a:b:c:d:e:f:g::" :limit 3)
("a" "b" "c:d:e:f:g::")
* (split ":" "a:b:c:d:e:f:g::" :limit 1000)
("a" "b" "c" "d" "e" "f" "g" "" "")
http://weitz.de/cl-ppcre/#split
For common cases there is the (new, "modern and consistent") cl-str string manipulation library:
(str:words "a sentence with spaces") ; cut with spaces, returns words
(str:replace-all "," "" "sentence") ; replaces literal substrings, not regexps (cl-ppcre treats them as regexps)
You have cl-slug to remove non-ascii characters and also punctuation:
(asciify "Eu André!") ; => "Eu Andre!"
as well as str:remove-punctuation (that uses cl-change-case:no-case).
; in AutoLisp usage (splitStr "get off of my cloud" " ") returns (get off of my cloud)
(defun splitStr (src delim / word letter)
  (setq wordlist (list))
  (setq cnt 1)
  (while (<= cnt (strlen src))
    (setq word "")
    (setq letter (substr src cnt 1))
    (while (and (/= letter delim) (<= cnt (strlen src))) ; endless loop if hits NUL
      (setq word (strcat word letter))
      (setq cnt (+ cnt 1))
      (setq letter (substr src cnt 1))
    ) ; while
    (setq cnt (+ cnt 1))
    (setq wordlist (append wordlist (list word)))
  )
  (princ wordlist)
  (princ)
) ;defun
(defun splitStr (src pat /)
  (setq wordlist (list))
  (setq len (strlen pat))
  (setq cnt 0)
  (setq letter cnt)
  (while (setq cnt (vl-string-search pat src letter))
    (setq word (substr src (1+ letter) (- cnt letter)))
    (setq letter (+ cnt len))
    (setq wordlist (append wordlist (list word)))
  )
  (setq wordlist (append wordlist (list (substr src (1+ letter)))))
)
In Emacs or Vim, what's a smooth way to join strings as in this example:
Transform from:
(alpha, beta, gamma) blah (123, 456, 789)
To:
(alpha=123, beta=456, gamma=789)
It would need to scale to:
many lines of these
many elements in the parentheses
I have recently found myself needing this kind of transformation often.
I use Evil in Emacs which is why a Vim answer would likely also help.
UPDATE:
The solutions were not as general as I had hoped. For example, I'd like the solution to also work when I have a list of strings and wish to distribute them into a large XML document. eg:
<item foo="" bar="barval1"/>
<item foo="" bar="barval2"/>
<item foo="" bar="barval3"/>
<item foo="" bar="barval4"/>
fooval1
fooval2
fooval3
fooval4
I formulated a solution and have added it as an answer.
%s/(\(\S\{-}\), \(\S\{-}\), \(\S\{-}\)).\{-}(\(\S\{-}\), \(\S\{-}\), \(\S\{-}\))/(\1=\4, \2=\5, \3=\6)
%s: substitute on every line of the buffer
\(\S\{-}\),: non-greedy match of non-whitespace characters up to the next comma, wrapped in \( \) so it can be backreferenced
\1=\4: emits the first captured group, an "=" sign, then the fourth captured group
For this kind of text transformation I would go with awk.
This one-liner may help:
awk -F'\\(|\\)' '{split($2,t,",");split($4,v,",");printf "( "; for(x in t)s=s""sprintf("%s=%s, ", t[x],v[x]);sub(", $","",s);printf s")\n";s=""}' file
A little test:
kent$ cat test
(alpha, beta, gamma) blah (123, 456, 789)
(a, b, c) foo (1, 2, 3)
(x, y, z, m, n) bar (100, 200, 300, 400, 500)
kent$ awk -F'\\(|\\)' '{split($2,t,",");split($4,v,",");printf "( "; for(x in t)s=s""sprintf("%s=%s, ", t[x],v[x]);sub(", $","",s);printf s")\n";s=""}' test
( alpha=123, beta= 456, gamma= 789)
( a=1, b= 2, c= 3)
( m= 400, n= 500, x=100, y= 200, z= 300)
Note that for (x in t) visits array indices in an unspecified order, which is why the last line above comes out reordered; loop over the numeric indices returned by split if the original order matters.
Emacs Lisp version of Prince Goulash's answer:
(require 'cl)
(defun split-and-trim (str separator)
  (let ((strs (split-string str separator)))
    (mapcar (lambda (s)
              (replace-regexp-in-string "^\\s-+" "" s))
            (mapcar (lambda (s)
                      (replace-regexp-in-string "\\s-+$" "" s)) strs))))
(defun my/merge-list (beg end)
  (interactive "r")
  (goto-char beg)
  (let ((endmark (set-mark end))
        (regexp "(\\([^)]+\\))[^(]+(\\([^)]+\\))"))
    (while (re-search-forward regexp end t)
      (let ((replace-start (match-beginning 0))
            (replace-end (match-end 0))
            (keys-str (match-string-no-properties 1))
            (values-str (match-string-no-properties 2)))
        (let* ((keys (split-and-trim keys-str ","))
               (values (split-and-trim values-str ",")))
          (while (> (length keys) (length values))
            (setq values (append values '(""))))
          (let* ((pairs (mapcar* (lambda (k v)
                                   (format "%s=%s" k v)) keys values))
                 (transformed (format "(%s)" (mapconcat #'identity pairs ", "))))
            (goto-char replace-start)
            (delete-region replace-start replace-end)
            (insert transformed)))))
    (goto-char (marker-position endmark))))
For example, select a region like the following:
(alpha, beta, gamma) blah (123, 456, 789)
(alpha, beta, gamma, delta) blah (123, 456, 789, aaa)
After M-x my/merge-list:
(alpha=123, beta=456, gamma=789)
(alpha=123, beta=456, gamma=789, delta=aaa)
This method I'm going to describe is a bit wacky, but it involves the minimum amount of Elisp code I could manage. It's only applicable if the lists to be joined can be interpreted as Lisp lists once the commas in them are removed. Numbers and sequences of alphabetic characters, as in your example, would be fine.
First, make sure that the Common Lisp library is loaded: M-: (require 'cl) RET.
Now, starting with the cursor at the start of the first list:
M-C-k ; kill-sexp
C-e ; move-end-of-line
M-C-b ; backward-sexp
M-C-k ; kill-sexp
C-a ; move-beginning-of-line
C-k ; kill-line
Now blah (or whatever) is the first entry in the kill ring, the second list is the second entry, and the first list is the third entry.
Type (, then M-: (eval-expression), take a deep breath, and type this:
(loop with (a b) = (mapcar (lambda (x) (car (read-from-string (remove ?, x))))
                           (subseq kill-ring 1 3))
      for x in a for y in b do (insert (format "%s=%s, " y x)))
(I've broken it up for presentation purposes, but you can type it all on one line.)
Then finally DEL DEL ), and you're done! You could turn it into a macro, if you wanted.
Here is a Vimscript solution. It is nowhere near as elegant as ash's answer, but it works with lists of any length.
function! ListMerge()
    " Get line, remove text between lists, split lists at parentheses:
    let curline = getline('.')
    let curline = substitute(curline,')\zs.*\ze(','','g')
    let curline = substitute(curline,'(','','g')
    let lists = map(split(curline,')'),'split(v:val,",")')
    " Return if we don't have two lists of equal length:
    if len(lists) != 2 || len(lists[0]) != len(lists[1])
        return
    endif
    " Loop over the lists, remove whitespace, build the replacement string:
    let i=0
    let string = '('
    while i<len(lists[0])
        let string .= substitute(lists[0][i],'^ *','','')
        let string .= '='
        let string .= substitute(lists[1][i],'^ *','','')
        let string .= ', '
        let i+=1
    endwhile
    " Add the concluding bracket:
    let string = substitute(string,', $',')','')
    " Replace the current line with the string:
    execute "normal! S" . string
endfunction
You can then call this function on all lines like this:
:%call ListMerge()
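For readers who would rather pipe the buffer through an external filter, the same "split both lists, pair them up, rebuild the line" logic can be sketched in Python. This is just an illustration with names of my own (PAIR_OF_LISTS, merge_lists); unlike ListMerge above it silently truncates to the shorter list instead of skipping lines with unequal lengths.

```python
import re

# Two parenthesised lists separated by arbitrary text without parentheses
PAIR_OF_LISTS = re.compile(r"\(([^)]*)\)[^(]*\(([^)]*)\)")


def merge_lists(line):
    """Rewrite '(k1, k2) blah (v1, v2)' as '(k1=v1, k2=v2)'."""

    def build(match):
        keys = [k.strip() for k in match.group(1).split(",")]
        values = [v.strip() for v in match.group(2).split(",")]
        pairs = (f"{k}={v}" for k, v in zip(keys, values))  # zip stops at the shorter list
        return "(" + ", ".join(pairs) + ")"

    return PAIR_OF_LISTS.sub(build, line)


print(merge_lists("(alpha, beta, gamma) blah (123, 456, 789)"))
# prints: (alpha=123, beta=456, gamma=789)
```

It works for any number of elements because nothing in the pattern depends on how many commas each list contains.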
My approach is to create one command to set a match-list, then use replace-regexp as the second command to distribute match-list, leveraging replace-regexp's existing \, facility.
Evaluate Elisp, such as in the .emacs file:
(defvar match-list nil
  "A list of matches, as set through the set-match-list and consumed by the cycle-match-list function. ")
(defvar match-list-iter nil
  "Iterator through the global match-list variable. ")
(defun reset-match-list-iter ()
  "Set match-list-iter to the beginning of match-list and return it. "
  (interactive)
  (setq match-list-iter match-list))
(defun make-match-list (match-regexp use-regexp beg end)
  "Set the match-list variable as described in the documentation for set-match-list. "
  ;; Starts at the beginning of region, searches forward and builds match-list.
  ;; For efficiency, matches are appended to the front of match-list and then reversed
  ;; at the end.
  ;;
  ;; Note that the behavior of re-search-backward is such that the same match-list
  ;; is not created by starting at the end of the region and searching backward.
  (let ((match-list nil))
    (save-excursion
      (goto-char beg)
      (while
          (let ((old-pos (point)) (new-pos (re-search-forward match-regexp end t)))
            (when (equal old-pos new-pos)
              (error "re-search-forward makes no progress. old-pos=%s new-pos=%s end=%s match-regexp=%s"
                     old-pos new-pos end match-regexp))
            new-pos)
        (setq match-list
              (cons (replace-regexp-in-string match-regexp
                                              use-regexp
                                              (match-string 0)
                                              t)
                    match-list)))
      (setq match-list (nreverse match-list)))))
(defun set-match-list (match-regexp use-regexp beg end)
  "Set the match-list global variable to a list of regexp matches. MATCH-REGEXP
is used to find matches in the region from BEG to END, and USE-REGEXP is the
regexp to place in the match-list variable.
For example, if the region contains the text: {alpha,beta,gamma}
and MATCH-REGEXP is: \\([a-z]+\\),
and USE-REGEXP is: \\1
then match-list will become the list of strings: (\"alpha\" \"beta\")"
  (interactive "sMatch regexp: \nsPlace in match-list: \nr")
  (setq match-list (make-match-list match-regexp use-regexp beg end))
  (reset-match-list-iter))
(defun cycle-match-list (&optional after-end-string)
  "Return the next element of match-list.
If AFTER-END-STRING is nil, cycle back to the beginning of match-list.
Else return AFTER-END-STRING once the end of match-list is reached."
  (let ((ret-elm (car match-list-iter)))
    (unless ret-elm
      (if after-end-string
          (setq ret-elm after-end-string)
        (reset-match-list-iter)
        (setq ret-elm (car match-list-iter))))
    (setq match-list-iter (cdr match-list-iter))
    ret-elm))
(defadvice replace-regexp (before my-advice-replace-regexp activate)
  "Advise replace-regexp to support match-list functionality. "
  (reset-match-list-iter))
Then to solve the original problem:
M-x set-match-list
Match regexp: \([0-9]+\)[,)]
Place in match-list: \1
M-x replace-regexp
Replace regexp: \([a-z]+\)\([,)]\)
Replace regexp with: \1=\,(cycle-match-list)\2
And to solve the XML example:
[Select fooval strings.]
M-x set-match-list
Match regexp: .+
Place in match-list: \&
[Select XML tags.]
M-x replace-regexp
Replace regexp: foo=""
Replace regexp with: foo="\,(cycle-match-list)"
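The two-step idea (first collect a list of matches, then hand them out one by one during a replacement) is easy to prototype outside Emacs. The following Python sketch is only an approximation of the commands above: make_match_list and distribute are hypothetical names, and itertools.cycle imitates cycle-match-list wrapping around when AFTER-END-STRING is nil.

```python
import re
from itertools import cycle


def make_match_list(region, match_regexp):
    """Collect every match of `match_regexp` in `region`, in order."""
    return [m.group(0) for m in re.finditer(match_regexp, region)]


def distribute(text, values):
    """Fill each empty foo="" attribute with the next value, cycling if needed."""
    supply = cycle(values)
    return re.sub(r'foo=""', lambda m: f'foo="{next(supply)}"', text)


xml = '<item foo="" bar="barval1"/>\n<item foo="" bar="barval2"/>'
values = make_match_list("fooval1\nfooval2", r".+")
print(distribute(xml, values))
```

Keeping the collection step separate from the replacement step is what makes the approach general: the match list can come from anywhere in the buffer, or from another file entirely.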