Questions tagged [parsing]
Analyzing (un)structured data to convert it into a structured, normalized format.
301 questions
2
votes
5
answers
2k
views
Rewrite or Transpiler - How to move away from a proprietary SAAS solution
We are using a Software as a Service platform that allows creating custom code that integrates with the platform and all its features (dialogues for common objects like Account, Customer, Address, and ...
1
vote
1
answer
118
views
Can DAO parse local files?
We need to retrieve data in the form of entities
We have DAOs that hit a DB
But sometimes we need to parse (local) XMLs to retrieve essentially the same entities
Should we have a separate type for ...
2
votes
8
answers
1k
views
Fail fast is brittle
I am creating a CSV consumer (with Java). There is one field / column that should contain one of the values "Rename" or "Move".
I implemented this by allowing mixed case of letters,...
0
votes
1
answer
112
views
Concatenating strings given a BNF grammar
<Definition> ::= <Name> <LeftPar> <param> <RightPar>
<Name> ::= <Letter><LetterTail>
<LetterTail> ::= <Letter><LetterTail> | ‘’
A ...
1
vote
2
answers
147
views
How do I reduce number of FieldValidator derivations?
I am trying to write an RSQL parser that checks whether the RSQL is logically correct.
While the RSQL Java library checks whether the RSQL expression is grammatically correct, it doesn't check if the ...
1
vote
2
answers
292
views
How do I solve this graphing dependency cycle in an AST?
I'm playing around with AST generation and exploring how it relates to a directed acyclic graph. I hit a logical snag that I don't understand.
var literal = 3; // expression 1
literal = literal+1; // ...
0
votes
1
answer
196
views
What lessons can be learned from the architecture/combination of ESLint and Prettier for linting and code formatting?
I was looking through the prettier docs and the prettier source code. It essentially has those defined helper functions to layout the text, given an AST. It operates on the level of the whole file, on ...
0
votes
2
answers
678
views
How Should Lexers Be Stateful?
Aside from modes, ANTLR grammars can use "actions", which have to be written in the target language and are sometimes used to conditionally push and pop from the mode stack.
If I were to make a ...
-1
votes
1
answer
188
views
Is it a good idea to let keywords have different lexical rules from names of types, variables, functions, etc? [closed]
For example, keywords have a special prefix. Objective-C has @interface, @implementation, but that's for compatibility with C. It inherits all the C keywords of course, with no @.
How about a language ...
-1
votes
2
answers
223
views
When writing a tokenizer, what is the standard practice for handling aliased language keywords?
When writing a tokenizer, what is the standard practice for handling aliased language keywords?
For example, note that signed short int is a language keyword in C++ and several aliases might be allowed....
-1
votes
1
answer
242
views
Better way to represent grammar symbols in C
I'm trying to build a simple compiler for a subset of the C language in C. To achieve this, I needed to figure out a way to represent the grammar symbols. Basically, each symbol can either be a "...
0
votes
1
answer
762
views
How is it possible to store the AST nodes location in the source code?
I created a simple parser in Rust and defined the AST like this:
enum Expression {
Number(i32),
BinaryOperator(Box<Expression>, Operator, Box<Expression>),
Identifier(String),
}...
1
vote
3
answers
2k
views
Parse 8 bytes to date time
I am trying to parse a file created by another piece of software, but I can't identify a pattern in how this datetime is saved. There doesn't seem to be any consistency.
Programming language of the software is C+...
-2
votes
2
answers
246
views
How to filter and concatenate multiple sql files into one database [closed]
I have an issue where I have multiple databases from previous projects that I would like to combine into one large database. These databases are stored in .sql files. The issue is that I only need ...
1
vote
3
answers
248
views
Using source code instead of XML/JSON or other custom serialization schemes and binary file formats
For a while now I have been toying with the idea of using source code as a file storage format.
My question: How to support format version changes? (loading of older files with structural differences)
...
0
votes
0
answers
313
views
What is it about kdb/q that makes the grammar not suitable for ANTLR style parser generators?
I want to build a code analysis tool for personal use when programming in kdb/q.
In order to do this, I need to be able to parse q code into an AST. I have never written a parser before. ANTLR4 seems ...
1
vote
2
answers
167
views
What do you call a process which transforms objects of complex types into simple objects of primitive types? [closed]
My first thought was that I'm "serializing" the complex object, but from what I understand that means I'm reducing it down to a string or binary format which could be passed over a network. ...
0
votes
1
answer
217
views
Parsing complex data type names
How do compilers parse complex data type names like function pointers? The type has to be somehow put into the AST or it has to be processed during parsing. What are the pros and cons of different ...
-1
votes
1
answer
726
views
How does a lexer handle template strings?
So lexers are supposed to emit tokens for key structures like INDENT and DEDENT for indentation stuff, or these:
NUMBER ::= [0-9]+
ID ::= [a-zA-Z]+, except for keywords
IF ::= 'if'
LPAREN ::= '('
...
0
votes
3
answers
986
views
Is it possible to make a compiler for any dynamic/script/interpreter language?
Logically speaking, regardless of how much it would cost or how many programmers we would need to hire to do it.
Can we (Is it possible to) make a compiler for any dynamic/script/interpreter language, like Lua, Python, ...
8
votes
1
answer
406
views
What type of syntax notation is this?
SQL Server documentation uses this notation, which is very easy to understand and consume. Is this a BNF Syntax Diagram? Or is this a different type of notation?
Source: SQL Server documentation page ...
8
votes
2
answers
2k
views
How does the GLL parsing algorithm work?
I'm very interested in the topic of parsers, especially in the topic of parser combinators like Superpower. The problem with them is that the grammars that they can work with are a bit limited. For ...
0
votes
1
answer
396
views
Benefits of using a tokenizer/lexer before parsing for recursive descent parser
I am trying to build a static program analyzer for a proprietary programming language for a school project, and am currently trying to implement the parser from scratch.
I was wondering, what are the ...
15
votes
3
answers
4k
views
How would you test a lexer?
I'm wondering how to effectively test a lexer (tokenizer).
The number of combinations of tokens in a source file can be huge, and the only way I've found is to make a batch of representative source ...
1
vote
0
answers
115
views
Is there an easy and useful error handling algorithm for bottom-up parsers?
My English skill is poor because I'm not a native English speaker.
Please understand.
I wonder whether there is an easy and useful error handling algorithm for LR parsers.
An LR parser is bottom-up, so it ...
8
votes
3
answers
5k
views
What are the 'practical' advantages of LR parsers over LL parsers today?
My English skill is poor because I'm not a native English speaker.
Please understand.
I am writing this post because I want to discuss this topic.
I think LR parsers have no 'practical' advantages ...
6
votes
5
answers
1k
views
Reasons to use (and not to use) a repeated delimiter to escape that delimiter?
For the designer of a language syntax, what are some reasons to choose a repeated delimiter to escape that delimiter, instead of having a separate escape character to escape that delimiter? A common ...
2
votes
1
answer
170
views
Asking for suggestions: data type for parsing stringified fractional numbers
I am the author of a C library for parsing INI files. So far I have delegated the task of parsing values as numbers to the standard atoi() family of functions. However, I think the time has come that I ...
-3
votes
1
answer
357
views
Data structures or coding styles in C++ for avoiding long elseif chain when parsing?
Lately I have created some small parsers of data. My initial code structure
// more cases here ...
else if(!strcmp(X,"somekey")){
// Parse according to "somekey" behavior.
}
// ...
1
vote
3
answers
212
views
Why are code readability and debugging arguments often expressed as a counter-argument for the use of generated LR parsers?
When it comes to using an LR parser generated by a tool, such as Bison, a disadvantage that often comes up as a counterargument is that the resulting parser will be unreadable and complicated to debug, ...
0
votes
0
answers
303
views
How can you multithread an html parser [ in C++ and similar languages ]?
I've written two HTML parsers.
Done with Regular Expressions [that accounted for nesting]. It was quick, but error prone.
Done by evaluating Character by Character through switches. Here was the basic ...
0
votes
1
answer
397
views
How can I efficiently test that a parser handles multiple levels of operator precedence correctly?
I'm working on a parser for a (very small) toy language, and I want to test that it's parsing expressions with the appropriate precedence. Previously I just had arithmetic operators, so there weren't ...
2
votes
1
answer
3k
views
How to parse a dynamically changing JSON file? (C#)
So I know a little bit about parsing JSON data but not too much, so pardon me if I am not describing everything as I should. Let's use this JSON file as an example:
{
"firstname": "John",
"...
1
vote
2
answers
353
views
How to concat lists with logical ANDs and ORs
Having multiple lists of integers, e.g.:
var p1 = new[] { 3, 9, 5, 8, 9 };
var p2 = new[] { 12, 1, 18, 27, 103, 99, 4 };
var p3 = new[] { 23, 930, 15 };
// ...
I want to concatenate them with ...
1
vote
1
answer
88
views
Lexicon for syntax patterns? [closed]
I am having trouble finding a lexicon which provides terminology for the explicit patterns that are employed when parsing syntax. I am trying to write about the niggling differences between the 10+ ...
-4
votes
1
answer
386
views
Is it a good idea to use a Parser Combinator to parse unstructured input?
I'm writing a parser that needs to accept unstructured input. By that I mean it needs to take in a raw signal (text, in this case) and look for significant character sequences while accumulating the ...
0
votes
1
answer
86
views
Parsing tree with many node types
I have a tree data structure where each node describes how to use its child nodes, and this tree is stored in a standard data format - XML for now. However, there are a large number of different kinds ...
0
votes
1
answer
330
views
Are parser generators useful for parsing a shell language?
From my understanding, parser generators accept as input some form of context-free grammar description. The context-sensitive features are handled during semantic rather than syntactic analysis (...
5
votes
2
answers
380
views
How to keep parser code and grammar definition in sync?
I am working with a custom, fairly simple DSL for specifying how various scripts will be run. The DSL takes the form of config files that are very simple and easy for humans to read. They define what ...
0
votes
1
answer
456
views
Object oriented parsing: Is there a pattern or is my approach wrong?
The problem that I am chewing on comes from parsing, i.e. constructing objects in a sequential manner. The grammar is not prefix free, that is, there is more than one syntactical element sharing the ...
3
votes
3
answers
2k
views
C++ Recursive Descent Parser: Global Variable Dilemma
I'll come straight to the point. I'm trying to create a Recursive Descent Parser in C++ for a hobby project which involves creating my own minimalist programming language.
One thing that puzzles me, ...
0
votes
0
answers
488
views
Parsing custom if statement input in PHP
I'm working on a feature where users can get data based on the if statement they write. The if statement looks something like Excel's conditionals.
Basic syntax:
IF ( lhs == rhs, ifTrue, ifFalse)...
0
votes
1
answer
94
views
Interruptible parsers in JavaScript
I’m trying to write a parser in JavaScript that is able to be interrupted by the fact that the entire input source is not available during the parse. When subsequent chunks of the input become ...
0
votes
0
answers
57
views
Which components of an HTML element can I assume will be static over the course of its lifetime?
This is a question for people who are familiar with how HTML typically is built and behaves on webpages.
Backstory and requirements
I am building an HTML tracker with a C++/Qt backend. I am trying ...
2
votes
3
answers
711
views
Is a literal out of range a syntax error or a semantic error?
I am reading more about the differences between syntax and semantics, but I am still wondering about this one.
Let's assume that we have a language that only allows integers to be in the range of 0-...
0
votes
3
answers
133
views
Options for parsing input that will include math
I'm working on a program that takes in user input which often includes things like math.sqrt(), imaginary numbers, multiplication, division, and similar (basically, standard math plus imaginary ...
1
vote
1
answer
307
views
Question about Backus-Naur Form (BNF)
To write the grammar for Whole Numbers (0,1,2...) in BNF, we may write:
Number ::⇒ Digit MoreDigits
MoreDigits ::⇒
MoreDigits ::⇒ Number
Digit ::⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
For a ...
1
vote
2
answers
2k
views
Is JavaScript added and executed in parsing or in rendering?
As far as I know, each webpage is created in a two stage process, initiated by a webserver request and ended in a webserver response:
Parsing: markup (say, HTML) is executed as is, or created by ...
5
votes
1
answer
8k
views
Is it wrong to parse YAML `true` as a string?
Given these lines of YAML:
version: 1.00
y: 1
What does this represent?
According to the YAML spec (I'm not delicate enough to read the spec like a lawyer), does this necessarily represent that a ...
3
votes
3
answers
2k
views
Generating an AST directly vs. converting from a CST
As I understand it, some parsers generate an abstract syntax tree on the fly, while others first generate a concrete syntax tree and then convert it. What are the tradeoffs between the two? Is there ...