I've written a generic file system viewer in PHP and I'd like to add syntax highlighting. GeSHi looks good for this, but it appears to require me to pass in the language I want the code highlighted as.
Are there any existing methods for determining the scripting language of a given file from its contents and/or location?
I have the mime type from:
$finfo = finfo_open(FILEINFO_MIME_TYPE);
$mime_type = @finfo_file($finfo, $full_path);
That at least tells me whether the file is text (I allow download of non-text files too).
I'm thinking that parsing the shebang line and the file extension, or looking for simple markers like the <?php opening tag, would get me a good chunk of the way for things like Perl, shell scripts, and PHP.
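Here is a rough sketch of the content-based guess I have in mind. The function name guess_language_from_head() and both lookup tables are made up for illustration, and I'm assuming the returned strings line up with GeSHi's language names, which I haven't verified for every entry.

```php
<?php
// Hypothetical helper: guess a highlighter language from the first chunk of a
// file plus its name. The lookup tables are illustrative, not exhaustive.
function guess_language_from_head(string $head, string $filename): ?string
{
    // 1. Shebang line, e.g. "#!/usr/bin/perl -w" or "#!/usr/bin/env python3".
    if (preg_match('~^#![ \t]*(\S+)(?:[ \t]+(\S+))?~', $head, $m)) {
        $interp = basename($m[1]);
        // With "/usr/bin/env perl" the real interpreter is the next word.
        if ($interp === 'env' && isset($m[2])) {
            $interp = basename($m[2]);
        }
        // Drop version suffixes such as "python3" or "perl5.30".
        $interp = preg_replace('~[\d.]+$~', '', $interp);
        $by_interp = [
            'sh' => 'bash', 'bash' => 'bash', 'ksh' => 'bash',
            'perl' => 'perl', 'python' => 'python',
            'php' => 'php', 'ruby' => 'ruby',
        ];
        if (isset($by_interp[$interp])) {
            return $by_interp[$interp];
        }
    }

    // 2. A PHP opening tag anywhere in the chunk that was read.
    if (strpos($head, '<?php') !== false) {
        return 'php';
    }

    // 3. Fall back to the file extension (case-insensitive).
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    $by_ext = [
        'php' => 'php', 'pl' => 'perl', 'pm' => 'perl', 'py' => 'python',
        'sh' => 'bash', 'rb' => 'ruby', 'js' => 'javascript',
        'css' => 'css', 'sql' => 'sql', 'xml' => 'xml', 'ini' => 'ini',
    ];
    return $by_ext[$ext] ?? null; // null means "no guess, let the user pick"
}
```

The order (shebang, then tag, then extension) is just my first instinct; an extensionless script with a shebang is the case I care about most.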
I also have the full path to each file, since these files come directly off the source servers, so path-based rules may work for things like /etc/httpd/conf.d/* and /etc/passwd.
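For the path-based rules, I'm picturing something like the sketch below. guess_language_from_path() and the rule table are hypothetical, and the 'apache' and 'text' language names are my assumption about what the highlighter would accept. It uses PHP's fnmatch() for shell-style wildcards.

```php
<?php
// Hypothetical helper: return the language of the first glob rule that matches
// the path, or null if none does. Rules are checked in array order.
function guess_language_from_path(string $path, array $rules): ?string
{
    foreach ($rules as $pattern => $language) {
        // fnmatch() does shell-style matching; without FNM_PATHNAME the "*"
        // also matches "/" and so covers subdirectories.
        if (fnmatch($pattern, $path)) {
            return $language;
        }
    }
    return null;
}

// Illustrative rule table; the language names on the right are assumptions.
$path_rules = [
    '/etc/httpd/conf.d/*' => 'apache',
    '/etc/passwd'         => 'text',
];
```

A call such as guess_language_from_path($full_path, $path_rules) could run before any content-based check, since it needs no file I/O at all.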
Perfect accuracy isn't required, as I'll let the user override the language used for highlighting. I just want to provide a low-overhead educated guess to start with, without writing this from scratch.
One other caveat: some of these files can be larger than 150 MB, so I'd like to read only part of each file, although I could just turn this feature off for large files if needed.
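To keep the overhead down on big files, I assume I can read just the first few kilobytes rather than the whole thing. A minimal sketch, where read_head() and the 4096-byte default are arbitrary choices of mine:

```php
<?php
// Hypothetical helper: read at most $max_bytes from the start of a file so a
// 150 MB file never has to be loaded just to guess its language.
function read_head(string $full_path, int $max_bytes = 4096): ?string
{
    $fh = @fopen($full_path, 'rb');
    if ($fh === false) {
        return null; // unreadable or missing file
    }
    // Reads up to $max_bytes, or fewer if the file is shorter.
    $head = stream_get_contents($fh, $max_bytes);
    fclose($fh);
    return $head === false ? null : $head;
}
```

Whatever detection logic ends up being used would then only ever see that leading chunk, so a marker that first appears deep inside a large file would be missed.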