Questions tagged [lexer]
A lexer is a program that performs lexical analysis: it converts a sequence of characters into a sequence of tokens.
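As a rough illustration of that definition, here is a minimal sketch of a regex-driven lexer in Python. The token names (`NUMBER`, `ID`, `OP`, `SKIP`) and the `tokenize` helper are assumptions made up for this example and are not taken from any of the questions listed below.

```python
import re

# Hypothetical token set for a tiny arithmetic language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),  # whitespace is matched but never emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise SyntaxError on unknown input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("x = 42 + y")` yields one `(kind, lexeme)` pair per token, with whitespace dropped. A real lexer would usually also record source positions and treat keywords separately.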
49 questions
2
votes
5
answers
2k
views
Rewrite or Transpiler - How to move away from a proprietary SaaS solution
We are using a Software as a Service platform that lets us create custom code that integrates with the platform and all its features (dialogues for common objects like Account, Customer, Address, and ...
0
votes
1
answer
112
views
Concatenating strings given a BNF grammar
<Definition> ::= <Name> <LeftPar> <param> <RightPar>
<Name> ::= <Letter><LetterTail>
<LetterTail> ::= <Letter><LetterTail> | ‘’
A ...
0
votes
2
answers
679
views
How Should Lexers Be Stateful?
Aside from modes, ANTLR grammars can use "actions", which have to be written in the target language and are sometimes used to conditionally push and pop from the mode stack.
If I were to make a ...
-1
votes
1
answer
188
views
Is it a good idea to let keywords have different lexical rules from names of types, variables, functions, etc? [closed]
For example, keywords have a special prefix. Objective-C has @interface, @implementation, but that's for compatibility with C. It inherits all the C keywords of course, with no @.
How about a language ...
0
votes
1
answer
766
views
How is it possible to store each AST node's location in the source code?
I created a simple parser in Rust and defined the AST like this:
enum Expression {
    Number(i32),
    BinaryOperator(Box<Expression>, Operator, Box<Expression>),
    Identifier(String),
}...
-1
votes
1
answer
726
views
How does a lexer handle template strings?
So lexers are supposed to emit tokens for key structures like INDENT and DEDENT for indentation stuff, or these:
NUMBER ::= [0-9]+
ID ::= [a-zA-Z]+, except for keywords
IF ::= 'if'
LPAREN ::= '('
...
0
votes
2
answers
407
views
Do lexers have to go word by word or can they go line by line
So I'm trying to write an interpreter with a lexer. Currently, it adds a token line by line and does some more processing later on. But when I look at sources online, they all seem to go word by word ...
15
votes
3
answers
4k
views
How would you test a lexer?
I'm wondering how to effectively test a lexer (tokenizer).
The number of combinations of tokens in a source file can be huge, and the only way I've found is to make a batch of representative source ...
0
votes
0
answers
488
views
Parsing custom if statement input in PHP
I'm working on a feature where users can get data based on the if statement they write. The if statement looks something like Excel's conditionals.
Basic syntax:
IF ( lhs == rhs, ifTrue, ifFalse)...
1
vote
4
answers
3k
views
Should my lexer allow what is obviously a syntax error?
This is kinda like a concrete version of the question Coming up with tokens for a lexer.
I'm writing a lexer for a small subset of HTML. I'm wondering what I should do when the input stream ends and ...
0
votes
1
answer
420
views
Differences between enumeration-based and hierarchical token typing
When writing a lexer/parser, why/when would an advised developer choose to define the tokens' types through an enumeration field/type hierarchy?
The closest question I've found here so far was Lexing: ...
0
votes
2
answers
447
views
Do any programming languages let you use other languages without restriction within them?
This may be a stupid question, and it would certainly take one Hell of a lexer, but do any extant programming languages allow you to do something like:
c# (1.2) {
// c# code
}
Perl (5) {
# ...
-2
votes
1
answer
214
views
Multiple variable declaration, multiple variable assignment, context-sensitive 'in' statement
Lately I've been playing with writing my own programming language, following the excellent Crafting Interpreters book, but I've hit something of a snag.
I'd like to extend the parser to accept ...
0
votes
1
answer
442
views
In which way should I structure a compiler/interpreter? [closed]
For a couple of months now I've been writing an interpreter/compiler for a programming language in C#.
I have encountered some issues recently which make the code feel incorrectly written.
Classes change a ...
3
votes
3
answers
591
views
How to create a simulator for distributed algorithms written in a simple language
I started developing a simulator for distributed algorithms in C. My work consists of creating a simple language for algorithm description and a simulator which takes the described ...
125
votes
4
answers
26k
views
When to use a Parser Combinator? When to use a Parser Generator?
I've taken a deep dive into the world of parsers recently, wanting to create my own programming language.
However, I found out that there are two somewhat different approaches to writing parsers: ...
16
votes
1
answer
3k
views
What is the procedure that is followed when writing a lexer based upon a grammar?
While reading through an answer to the question Clarification about Grammars, Lexers and Parsers, the answer stated that:
[...] a BNF grammar contains all the rules you need for lexical analysis ...
4
votes
1
answer
256
views
Is there any need to have an evaluation stage for a lexer to properly work?
Wikipedia says that the lexical process is often divided into two phases: the scanning process and the evaluation process. Wikipedia defines:
The scanning process as:
The first stage, the scanner, ...
5
votes
1
answer
677
views
How do parsers search for token patterns?
Could you explain how parsers search for token patterns like in markdown?
I could probably come up with something matching only the braces pattern [](), but as soon as nested patterns are involved it ...
0
votes
3
answers
164
views
What is a good strategy for reading XML-like hierarchical text data?
I want to read data in a format like the following using Java.
[scenario]
id=my_first_scenario
next_scenario=null
name=_"My First Scenario."
map_data="{~add-ons/my_first_campaign/...
5
votes
1
answer
2k
views
How should a lexer deal with multi-line statements (e.g. function definitions, control-flow statements)?
tl;dr-ers:
How does a lexer normally deal with non-inline statements, i.e. statements that do not end with a specified statement delimiter, such as control flow statements?
I believe that I have a fairly ...
4
votes
2
answers
7k
views
Is it appropriate for a tokenizer to use regex to gather tokens?
I have recently caught the 'Toy Language' bug, and have been experimenting with various simple tokenizer configurations. The most recent one makes use of the Boost.Regex library to identify and get ...
23
votes
3
answers
10k
views
What should be the datatype of the tokens a lexer returns to its parser?
As said in the title, which data type should a lexer return/give the parser? The Wikipedia article on lexical analysis states that:
In computer science, lexical analysis is ...
0
votes
1
answer
737
views
Is it possible to parse my grammar with multi-line productions without backtracking?
I'm playing around with creating a parser in PHP for my own flavor of BNF, to match strings against grammar in this BNF variant. It's still a work in progress and subject to change (I may even end up ...
-2
votes
1
answer
777
views
How do I create my own Objective-C to Swift converter? [closed]
I'm really interested in writing my own converter.
I know C++/Python/Objective-C/Swift and a little Haskell.
There are websites like objectivec2swift and iswift.org, which can convert OC to Swift ...
3
votes
6
answers
2k
views
Should I use a source-to-source or a traditional compiler in order to develop my own Programming Language?
I'm really interested in writing my own general-purpose high-level programming language, but I'm somewhat confused.
I know that Python and Ruby were written in C, which makes me wonder whether, if I want ...
1
vote
2
answers
671
views
Passing context around AST nodes
I have various objects inside my AST, such as IfBlock, FunctionBlock, LogicExpression, etc. All of those objects share a context, which is basically a hashmap with some variables. It's a very simple ...
8
votes
4
answers
7k
views
When to use ANTLR and when to use a parsing library
I've always wanted to learn how to write a compiler - I've decided to use ANTLR, and am currently reading through the book (it's very good, by the way).
I'm pretty new to this, so go easy, but the gist ...
26
votes
6
answers
7k
views
Why implement a lexer as a 2d array and a giant switch?
I'm slowly working to finish my degree, and this semester is Compilers 101. We're using the Dragon Book. Shortly into the course, we're talking about lexical analysis and how it can be implemented ...
4
votes
2
answers
750
views
Lexer/Parser for multidimensional Languages
How does a lexer/parser work in a 2D programming language like Funciton in order to transform such unusual source code into the correct AST?
6
votes
1
answer
2k
views
Should a lexer un-escape strings?
Is it a lexer's job to undo any escaping done to a string literal? For example:
"Me: \"Hello World!\""
Becomes:
Me: "Hello World!"
Should this conversion be done inside the lexer? I am guessing it ...
5
votes
4
answers
2k
views
Lexing: One token per operator, or one universal operator token?
When lexing, what would be the best way to tokenize operators? Would one just create a BinaryOperator token, or a separate token for every single binary operator? Examples: PlusOperator, MinusOperator,...
0
votes
2
answers
801
views
Storing tokens during lexing stage
I am currently implementing a lexer that breaks XML files up into tokens. I'm considering ways of passing the tokens on to a parser to create a more useful data structure out of said tokens - my ...
0
votes
0
answers
975
views
Parsing Razor-style Templates
I want to build a template engine (ITT not another template engine...) based on Razor.
I've been at it for quite a long time not getting anywhere and quite frankly I'm at my limit. I've tried rolling ...
7
votes
2
answers
2k
views
Can you apply the same lexer rules to all programming languages?
I'm trying to understand the theory behind a lexer with the purpose of building one (just for my own fun and experience and to compensate for not taking proper CS courses :)).
What I have yet to ...
9
votes
3
answers
2k
views
Clarification about Grammars, Lexers and Parsers
Background info (May Skip): I am working on a task we have been set at uni in which we have to design a grammar for a DSL we have been provided with. The grammar must be in BNF or EBNF. As well as ...
6
votes
3
answers
3k
views
What is the proper way to distinguish between keywords and identifiers?
I'm aware that most modern languages use reserved words to prevent things like keywords from being used as identifiers.
Reserved words aside, let's assume a language that allows keywords to be used ...
4
votes
3
answers
731
views
Is this a viable approach to resolving multiple matches in a lexer?
I'm writing a lexer in JavaScript. It's pretty typical - rules are specified with regular expressions and produce a token.
I am unsure of the best way to handle when multiple rules are matched. The ...
7
votes
1
answer
1k
views
Choosing a parser for a code beautifier
I'm in the planning stage of making a code beautifier (similar to AStyle or Uncrustify) - originally I was going to just contribute to one of those projects,
but reviewing their source led me to the ...
1
vote
4
answers
2k
views
Is it possible to create a single tokenizer to parse this?
This extends off this other Q&A thread, but is going into details that are out of scope from the original question.
I am generating a parser that is to parse a context-sensitive grammar which can ...
-3
votes
2
answers
368
views
What is the name of a grammar which can change its tokenizer in mid parse?
I was creating a language and discovered that my language tokenizer would have to change depending on where in the parse it is.
I.e. abc[1] would be parsed as 4 tokens (abc, [, 1, ]), whereas { abc[1] }...
3
votes
1
answer
1k
views
Practical reference for learning about graph reduction
Are there any practical references (with actual examples) for getting started implementing a small, lazy functional programming language with graph reduction? A reference that included the lexing and ...
3
votes
1
answer
830
views
What are "json.org"-like spec graphs called, and how can I generate them?
In http://www.json.org Douglas Crockford shows the specs of the JSON format in two interesting ways:
In the right-hand column he lists a text spec that looks like a YACC or LEX listing.
In the main ...
2
votes
1
answer
463
views
What follows after lexical analysis?
I'm working on a toy compiler (for some simple language like PL/0) and I have my lexer up and running. At this point I should start working on building the parse tree, but before I start I was ...
5
votes
5
answers
7k
views
Understanding hand written lexers
I am going to make a compiler for C, and reading about how compilers work on Wikipedia has taught me a lot. However, reading up on lexers has confused me. The Wikipedia page states that:
the GNU ...
16
votes
5
answers
7k
views
Coming up with tokens for a lexer
I'm writing a parser for a markup language that I have created (writing in Python, but that's not really relevant to this question -- in fact if this seems like a bad idea, I'd love a suggestion for a ...
9
votes
5
answers
3k
views
Lexical Analysis without regular expressions
I've been looking at a few lexers in various higher-level languages (Python, PHP, JavaScript among others) and they all seem to use regular expressions in one form or another. While I'm sure regex's ...
24
votes
5
answers
9k
views
Are separate parsing and lexing passes good practice with parser combinators?
When I began to use parser combinators my first reaction was a sense of liberation from what felt like an artificial distinction between parsing and lexing. All of a sudden everything was just ...
20
votes
4
answers
41k
views
Writing a lexer in C++
What are good resources on how to write a lexer in C++ (books, tutorials, documents), what are some good techniques and practices?
I have looked on the internet and everyone says to use a lexer ...