Let's have another answer, without external libraries.
As you already did, we can split the problem into smaller parts:
- define a function, all-tokens, which builds a list of tokens from a string;
- apply this function to all strings in your input list, and concatenate the results:
(mapcan #'all-tokens strings)
The first part, taking a state and building a list from it, looks like an unfold operation (anamorphism).
Fold (catamorphism), called reduce in Lisp, builds a value from a list of values and a function (and optionally an initial value).
The dual operation, unfold, takes a value (the state) and a function, and generates a list of values.
In the case of unfold, the step function accepts a state and returns a new state along with the resulting list.
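To make the duality concrete, here is a minimal sketch. The `unfold` helper is hypothetical (it is not part of the Common Lisp standard), and this version assumes the step function returns the element to emit plus the next state, which is one common way to write it:

```lisp
;; Fold: collapse a list into a single value.
(reduce #'+ '(1 2 3 4) :initial-value 0) ; => 10

;; Unfold (hypothetical helper): STEP takes a state and returns two
;; values, the element to emit and the next state. DONE-P tells us
;; when to stop generating.
(defun unfold (step done-p state)
  (loop until (funcall done-p state)
        collect (multiple-value-bind (element next-state)
                    (funcall step state)
                  (setf state next-state)
                  element)))

;; Counting down from 5: the state is the counter itself.
(unfold (lambda (n) (values n (1- n))) #'zerop 5) ; => (5 4 3 2 1)
```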
Here, let's define a state as 3 values: a string, a starting position in the string, and a stack of tokens parsed so far.
Our step function next-token returns the next state.
;; definition follows below
(declaim (ftype function next-token))
The main function which gets all tokens from a string just computes a fixpoint:
(defun all-tokens (string)
  (do (;; initial start value is 0
       (start 0)
       ;; initial token stack is nil
       (tokens))
      ;; loop until start is nil, then return the reverse of tokens
      ((not start) (nreverse tokens))
    ;; advance state
    (multiple-value-setq (string start tokens)
      (next-token string start tokens))))
We need an auxiliary function:
(defun parenthesisp (c)
  (find c "()"))
The step function is defined as follows:
(defun next-token (string start token-stack)
  (let ((search (position-if #'parenthesisp string :start start)))
    (typecase search
      (number
       ;; token from start to parenthesis
       (when (> search start)
         (push (subseq string start search) token-stack))
       ;; parenthesis
       (push (subseq string search (1+ search)) token-stack)
       ;; next state
       (values string (1+ search) token-stack))
      (null
       ;; token from start to end of string; comparing against the full
       ;; length keeps a trailing one-character token such as the "a" in "(a"
       (when (< start (length string))
         (push (subseq string start) token-stack))
       ;; next state
       (values string nil token-stack)))))
You can try with a single string:
(next-token "(aviyon" 0 nil)
"(aviyon"
1
("(")
If you take the resulting state values and reuse them, you have:
(next-token "(aviyon" 1 '("("))
"(aviyon"
NIL
("aviyon" "(")
And here, the second return value is NIL, which ends the generation process.
Finally, you can do:
(mapcan #'all-tokens '("(aviyon" "213" "flyingman" "no))"))
Which gives:
("(" "aviyon" "213" "flyingman" "no" ")" ")")
The above code is not fully generic in the sense that all-tokens knows too much about next-token: you could rewrite it to take any kind of state.
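As a rough sketch of what that rewrite could look like, the driver below knows nothing about strings or tokens. The names `iterate-until`, `paren-step` and `all-tokens*` are made up for this illustration, and packing the state into a single list is just one possible choice:

```lisp
;; Hypothetical generic driver: apply STEP to STATE until DONE-P holds,
;; then return the final state.
(defun iterate-until (step done-p state)
  (do ((state state (funcall step state)))
      ((funcall done-p state) state)))

;; Tokenizer state packed into one list: (string start tokens).
(defun paren-step (state)
  (destructuring-bind (string start tokens) state
    (let ((paren (position-if (lambda (c) (find c "()")) string
                              :start start)))
      (cond (paren
             ;; text before the parenthesis, if any, then the parenthesis
             (when (> paren start)
               (push (subseq string start paren) tokens))
             (push (string (char string paren)) tokens)
             (list string (1+ paren) tokens))
            (t
             ;; no parenthesis left: the rest of the string is one token
             (when (< start (length string))
               (push (subseq string start) tokens))
             (list string nil tokens))))))

(defun all-tokens* (string)
  (let ((final (iterate-until #'paren-step
                              (lambda (state) (null (second state)))
                              (list string 0 nil))))
    (reverse (third final))))

(all-tokens* "(aviyon") ; => ("(" "aviyon")
```

Only `paren-step` and the termination test know the shape of the state, so the driver can be reused with a different step function.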
You could also handle sequences of strings using the same mechanism, by keeping more information in your state variable.
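For instance, the state could hold the list of remaining strings instead of a single string. The function below is only a sketch under that assumption; `tokens-from-strings` is a hypothetical name, and the step logic is inlined to keep the example self-contained:

```lisp
(defun tokens-from-strings (strings)
  "Tokenize every string in STRINGS in one loop; parentheses become separate tokens."
  (do ((start 0)
       (tokens '()))
      ;; state: remaining strings, position in the first one, token stack
      ((endp strings) (nreverse tokens))
    (let* ((string (first strings))
           (paren (position-if (lambda (c) (find c "()")) string
                               :start start)))
      (cond (paren
             (when (> paren start)
               (push (subseq string start paren) tokens))
             (push (string (char string paren)) tokens)
             (setf start (1+ paren)))
            (t
             (when (< start (length string))
               (push (subseq string start) tokens))
             ;; current string exhausted: move on to the next one
             (setf strings (rest strings)
                   start 0))))))

(tokens-from-strings '("(aviyon" "213" "flyingman" "no))"))
;; => ("(" "aviyon" "213" "flyingman" "no" ")" ")")
```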
Also, in a real lexer you would not want to reverse the whole list of tokens; you would instead use a queue to feed a parser.
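One way to avoid building and reversing a list is to hand tokens out on demand. The sketch below uses a closure rather than an explicit queue, which is a substitute technique and not the only option; `make-token-source` is a hypothetical name:

```lisp
(defun make-token-source (string)
  "Return a closure that yields the next token of STRING, or NIL when done."
  (let ((start 0)
        (end (length string)))
    (lambda ()
      (when (< start end)
        (let* ((paren (position-if (lambda (c) (find c "()")) string
                                   :start start))
               ;; the token ends at the next parenthesis, or right after
               ;; it when the parenthesis itself is the token
               (stop (cond ((null paren) end)
                           ((= paren start) (1+ start))
                           (t paren))))
          (prog1 (subseq string start stop)
            (setf start stop)))))))

;; A consumer (here a simple loop standing in for a parser) pulls tokens
;; one at a time:
(let ((next (make-token-source "(aviyon")))
  (loop for token = (funcall next)
        while token
        collect token))
;; => ("(" "aviyon")
```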
("(aviyon","213","flyingman","no))") is not a valid list in Common Lisp, since commas used as separators are allowed only inside a string. So is there an outer string which is not shown in the example?