0

When I get a full command line string from the registry, or elsewhere, for example:

mycommand -p -o c:\file\1.txt -r output

or

"c:\program files\dir\executable.exe" -options -o1 -o2

How could I split easily between the executable file and the arguments?

Thanks

2 Answers 2

5

Windows command line parsing is rather wonky, since it's done by the C runtime library (you can examine the code in your Visual Studio install directory at something like

  • C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\crt\src\stdargv.c

The actual path on your machine may vary, of course. And, it won't be their if you didn't install the C runtime sources with Visual Studio.

I believe the "logic" such as it is was inherited from DOS, so it's rather crufty.

The basic grammar goes something like this:

  • A Command Line is a sequence of 1 or more WORDS separated by Whitespace.

  • Each Word is a sequence of 1 or more of the following: BARE_WORD, QUOTED_WORD or ESCAPE_SEQUENCE. Words are terminated by whitespace or the end of the command line.

  • A BARE_WORD is a sequence of 1 or more characters other than backslash ('\'), double-quote ('"') or whitespace.

  • A QUOTED_WORD is introduced by a LEAD_IN_QUOTE ('"'), followed by zero or more of the following:

    • Whitespace
    • ESCAPE_SEQUENCE
    • BAREWORD

    and is terminated by a LEAD_OUT_QUOTE ('"'). The lead-in and lead-out quotes are removed from the quoted word.

  • An ESCAPE_SEQUENCE is one of the following constructs:

    • An even number of backslashes ('"'), followed by a quotation mark ('"').
      This represents a series of escaped backslashes followed by a lead-in/lead-out quote. Each pair of backslashes represents a single backslash.
    • An odd number of backslashes, followed by a quotation mark ('"').
      This represents a series of escaped backslashes, followed by a literal quotation mark.
    • A sequence of backslashes, that is not followed by a quotation mark.
      This represents a series of unescaped backslashes and is passed through as-is.

That's about it.

The first word on the command line is the command name (e.g., the name/path of the executable). Strictly speaking parsing the command name should be simpler than the other words as it must be represent a valid NTFS file name. That's not necessarily true, however, depending on who composed the command line.

Here's some sample C# code that should parse any given command line in the same way that the Windows OS does, though I should note that this has not been throughly tested.

The method Parse() returns an IEnumerable<string>, the first element of which is the command/program name, with the remainder the words that make up the arguments.

class CommandLineParser
{
    char[]        cmd; // source buffer
    StringBuilder buf; // output buffer
    int           i; // current position within the source buffer

    public CommandLineParser()
    {
        cmd = null;
        buf = null;
        i = -1;
        return;
    }

    public IEnumerable<string> Parse( string commandLine )
    {
        cmd = commandLine.ToCharArray();
        buf = new StringBuilder();
        i = 0;

        while ( i < cmd.Length )
        {
            char ch = cmd[i];

            if ( char.IsWhiteSpace( ch ) ) { throw new InvalidOperationException(); }
            else if ( ch == '\\' ) { ParseEscapeSequence(); }
            else if ( ch == '"' ) { ParseQuotedWord(); }
            else { ParseBareWord(); }

            if ( i >= cmd.Length || char.IsWhiteSpace( cmd[i] ) )
            {
                string arg = buf.ToString();

                yield return arg;

                buf.Length = 0;
                ConsumeWhitespace();                    
            }                
        }            
    }

    /// <summary>
    /// Parse a quoted word
    /// </summary>
    private void ParseQuotedWord()
    {

        // scan over the lead-in quotation mark w/o adding it to the buffer
        ++i;

        // scan the contents of the quoted word into the buffer
        while ( i < cmd.Length && cmd[i] != '"' )
        {
            char ch = cmd[i];
            if ( ch == '\\' ) { ParseEscapeSequence(); }
            else { buf.Append( ch ); ++i; }
        }

        // scan over the lead-out quotation mark w/o adding it to the buffer
        if ( i < cmd.Length )
        {
            ++i;
        }            
        return;
    }

    /// <summary>
    /// Parse a bareword
    /// </summary>
    private void ParseBareWord()
    {            
        while ( i < cmd.Length )
        {
            char ch = cmd[i];
            if ( char.IsWhiteSpace( ch ) ) break; // whitespace terminates a bareword
            else if ( ch == '"' ) break; // lead-in quote starts a quoted word
            else if ( ch == '\\' ) break; // escape sequence terminates the bareword

            buf.Append(ch); // otherwise, keep reading this word                

            ++i;                
        }            
        return;
    }

    /// <summary>
    /// Parse an escape sequence of one or more backslashes followed an an optional trailing quotation mark
    /// </summary>
    private void ParseEscapeSequence()
    {
        //---------------------------------------------------------------------------------------------------------
        // The rule is that:
        //
        // * An even number of backslashes followed by a quotation mark ('"') means that
        //   - the backslashes are escaped, so half that many get injected into the buffer, and
        //   - the quotation mark is a lead-in/lead-out quotation mark that marks the start of a quoted word
        //     which does not get added to the buffer.
        //
        // * An odd number of backslashes followed by a quotation mark ('"') means that
        //   - the backslashes are escaped, so half that many get injected into the buffer, and
        //   - the quotation mark is escaped. It's a literal quotation mark that also gets injected into the buffer
        //
        // * Any number of backslashes that aren't followed by a quotation mark ('"') have no special meaning:
        //   all of them get added to the buffer as-sis.
        //
        //---------------------------------------------------------------------------------------------------------

        //
        // scan in the backslashes
        //
        int p = i; // start of the escape sequence
        while ( i < cmd.Length && cmd[i] == '\\' )
        {
            buf.Append( '\\' );
            ++i;
        }

        //
        // if the backslash sequence is followed by a quotation mark, it's an escape sequence
        //
        if ( i < cmd.Length && cmd[i] == '"' )
        {
            int n         = ( i - p ); // find the number of backslashes seen
            int quotient  = n >> 1; // n divide 2 ( 5 div 2 = 2 , 6 div 2 = 3 )
            int remainder = n & 1; // n modulo 2 ( 5 mod 2 = 1 , 6 mod 2 = 0 )

            buf.Length -= ( quotient + remainder ); // remove the unwanted backslashes

            if ( remainder != 0 )
            {
                // the trailing quotation mark is an escaped, literal quotation mark
                // add it to the buffer and increment the pointer
                buf.Append( '"' );
                ++i;
            }                
        }            
        return;
    }

    /// <summary>
    /// Consume inter-argument whitespace
    /// </summary>
    private void ConsumeWhitespace()
    {
        while ( i < cmd.Length && char.IsWhiteSpace( cmd[i] ) )
        {
            ++i;
        }
        return;
    }        
}

class Program
{
    static void Main()
    {
        CommandLineParser parser      = new CommandLineParser();
        string            commandLine = RetrieveUnparsedCommandLine();
        int i = 0;

        IEnumerable<string> args = parser.Parse( commandLine );
        Console.WriteLine( "-------------------" );
        foreach ( string arg in args )
        {
            string template = i > 0 ? "argv[0:#0]" : "command";
            string label    = string.Format( template , i++ );

            Console.WriteLine( "{0}: {1}" , label , arg );

        }
        Console.WriteLine( "-------------------------" );            
        return;
    }

    static string RetrieveUnparsedCommandLine()
    {
        // get the raw command line. Source might be registry, config file, whatever
        string commandLine = Environment.CommandLine;
        return commandLine;
    }       
}

Good Luck.

Sign up to request clarification or add additional context in comments.

Comments

1

Given a string which contains a presumably valid command line entry:

static string[] SplitArgs(string input)
{
    var args = new List<string>();
    var parts = input.Split(' ');

    for (int ii = 0; ii < parts.Length; ++ii)
    {
        // if it starts with a quote, search to the end
        // NB: this does not handle the case of --x="hello world"
        // an arguments post processor is required in that case
        if (parts[ii].StartsWith("\""))
        {
            var builder = new StringBuilder(parts[ii].Substring(0));
            while (ii + 1 < parts.Length
                && !parts[++ii].EndsWith("\""))
            {
                builder.Append(' ');
            }

            // if we made it here before the end of the string
            // it is the end of a quoted argument
            if (ii < parts.Length)
                builder.Append(parts[ii].Substring(0, parts[ii].Length - 1));

            args.Add(builder.ToString());
        }
        else
            args.Add(part[ii]);
    }

    return args.ToArray();
}

5 Comments

Sorry, but I need a generic method to do it, my application is not a ConsoleApplication
Can you then expand on what you mean, is it a WinForms, WPF, etc? Or is it a library given a string or an array of strings?
I get the string from the user, from a file, this is the complete command line, and I need to differentiate between the executable and the arguments.
Environment.GetcommandLineArgs() should still work, shouldn't it?
@cadrell0: not necessarily, in his case I believe it is coming from a file rather than the actual command line.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.