0

I have a Ruby script that does some text parsing (à la Markdown). It works in a sequence of steps, like

string = string.gsub # more code here
string = string.gsub # more code here
# and so on

What is the best (i.e. most reliable) way to feed text into string in the first place? It’s a script, and the text it’ll be fed can vary a lot: it can be multilingual, contain characters that might trip up a shell (like ", ', , &, $; you get the idea), and will likely be multi-line.

Is there some trick along the lines of the following?

cat << EOF
bunch of text here
EOF

Additional considerations

I’m not looking for a Markdown parser; this is something I want to do myself, not something I want a tool for.

I’m not a big Ruby user (I’m just starting to use it), so the more detailed the answer you can provide, the better.

It must be completely scriptable (i.e., no interrupting to ask the user for information).

3 Comments
  • What do you mean by “reliable”? Commented Nov 20, 2013 at 2:32
  • Something that’ll handle weird/unpredictable characters such as the ones that might be interpreted by the shell. Commented Nov 20, 2013 at 2:46
  • Ruby will handle those with absolutely no problem whatsoever. Your problem is with the shell, not with Ruby. If you’re manually entering text into the shell you ought to know what’s being entered anyway. Commented Nov 20, 2013 at 2:53

3 Answers

1

The Kernel#gets method reads one record (by default one line, delimited by the record separator) from the files named on the command line, or from stdin if no files are given. So if you use it, you can do things like:

yourscript < filename     # read from filename (via stdin)
yourscript file1 file2    # read both file1 and file2
yourscript                # lets you type at your script

So you can run something like:

cat <<'eof' |ruby yourscript.rb
This' & will $all 'eof' be 'fine'''
eof

The script might contain something like:

s = gets() # read a line
lines = readlines() # read all lines into an array

That's fairly standard for command-line scripts. If you want a user interface, you'll want something more complex. The Ruby interpreter also has an option (-E, or --encoding) to set the encoding of files as they are read.
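To connect this with the chain of gsub calls in the question, here is a minimal sketch that reads the whole input at once and runs it through a sequence of substitutions. The method names `transform` and `process` and the two regexes are placeholders invented for illustration, not anything from the original script. A StringIO stands in for real input so the snippet runs on its own.

```ruby
require "stringio"

# Hypothetical helper: applies a sequence of substitutions to the whole text.
# The two patterns are placeholders for whatever parsing steps you need.
def transform(text)
  text = text.gsub(/\*\*(.+?)\*\*/, '<strong>\1</strong>')
  text = text.gsub(/\*(.+?)\*/, '<em>\1</em>')
  text
end

# Reads everything from io and transforms it.
# io can be any IO-like object: ARGF, $stdin, a File, or a StringIO.
def process(io)
  transform(io.read)
end

# Stand-in for piped input; quotes, & and $ are ordinary characters here.
sample = StringIO.new("This' & will $all **be** *fine*\n")
puts process(sample)
```

In a real script the last two lines would presumably become `puts process(ARGF)`, which reads from the files named on the command line or from stdin, like gets does.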

3 Comments

I’ll need more details on how to accomplish that, please (I’ve updated the question).
I hope that's sufficient
It still doesn’t account for some edge cases (like backticks), but I doubt we’ll get something better — I’ve asked for the most reliable solution, not a bulletproof one (which probably does not exist).
1

Just read from stdin (which is an IO object):

$stdin.read

As you can see, stdin is provided in the global variable $stdin. Since it’s an IO object, there are a lot of other methods available if read doesn’t suit your needs.

Here’s a simple one-line example in the shell:

$ printf 'foo\nbar\n' | ruby -e 'puts $stdin.read.upcase'
FOO
BAR

Obviously reading from stdin is extremely flexible since you can pipe input in from anywhere.
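As a rough illustration of the other IO methods mentioned above, the sketch below iterates over the input line by line with each_line instead of reading it all at once. The method name `summarize` is made up for this example, and a StringIO stands in for $stdin so the snippet is self-contained. Once text arrives through a pipe or a file, shell metacharacters are just ordinary characters to Ruby.

```ruby
require "stringio"

# Hypothetical helper: counts lines and characters in the input.
# io can be $stdin, a File, or a StringIO.
def summarize(io)
  lines = io.each_line.to_a
  { lines: lines.size, chars: lines.sum(&:size) }
end

# Stand-in for $stdin: multi-line text with quotes, backticks, & and $.
input = StringIO.new("\"quotes\" 'and' `ticks` & $vars\nsecond line\n")
p summarize(input)
```

Calling `summarize($stdin)` instead should behave the same way on piped input.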

3 Comments

It’s also extremely unreliable if I don’t know what text will be passed through it.
@user137369 Not at all. Why do you think so?
@user137369 See also my comment on the question itself.
0

Ruby handles encodings well (see e.g. the Encoding docs). To get text into Ruby, one typically uses gets, reads from File objects, or uses a GUI, which one can build with the gtk2 gem or rugui (if already finished). If you are getting text from the wild internet, security should be a concern. Ruby used to have 4 $SAFE levels, but after some discussions, now there might be only 3 of them left.

In any case, the best strategy for handling strings is to know as much as possible in advance about the properties of the strings you expect. Handling absolutely arbitrary strings is a surprisingly difficult task. Try to limit the number of possible encodings and work out the maximum size of the strings you expect.
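One way the advice about limiting encodings and size might look in practice is sketched below. It assumes the input is meant to be UTF-8 and that roughly one megabyte is a sensible cap. Both are assumptions for illustration, as are the names `read_checked` and `MAX_BYTES`.

```ruby
require "stringio"

MAX_BYTES = 1_000_000 # hypothetical limit; pick one that fits your input

# Reads at most max_bytes from io, treats the bytes as UTF-8,
# and replaces any invalid byte sequences with "?".
def read_checked(io, max_bytes: MAX_BYTES)
  # Ask for one byte more than the limit so oversized input is detected
  # without reading all of it. read(n) returns nil at end of input.
  data = io.read(max_bytes + 1) || String.new
  raise ArgumentError, "input larger than #{max_bytes} bytes" if data.bytesize > max_bytes

  text = data.force_encoding(Encoding::UTF_8)
  text.valid_encoding? ? text : text.scrub("?")
end

puts read_checked(StringIO.new("multi-line\ninput with & and $\n"))
```

Whether to replace invalid bytes, as done here, or to reject the input outright is a design choice. Scrubbing keeps the script non-interactive but silently alters the text.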

Also, with respect to your original stated goal of writing something like a Markdown processor, you might not want to reinvent the wheel (unless it is for didactic purposes). There is this SO post: Better ruby markdown interpreter?

The answer there will direct you to the kramdown gem, which gets a lot of praise, though I have not tried it personally.

1 Comment

“In any case, the best strategy to handle strings is to know as much as possible about the properties of the string that you expect in advance”. Not an option at all. I’ve updated the question with more details, though.
