User:9cfilorux/syntaxhighlight
Introduction
Syntax highlighting works by detecting various types of strings and characters and wrapping them in spans with different classes; the span classes call for sets of colours that vary depending on the language. There are several subsets of classes beginning with the same two characters; these subsets appear to group similar aspects of syntax, and the characters in their names appear to be derived from their functions. Some classes have seemingly universal meaning, such as br0 and coMULTI; one, ln-xtra, is always used in the same way (for specifically highlighted lines) and always has the same background colour (#ffc), so it will not be discussed further.
This page attempts to document the mechanics of Extension:SyntaxHighlight GeSHi - that is, the exact code involved in highlighting different parts of the syntax. I'm sure this is already documented somewhere, but I doubt it's on mediawiki.org and I have no idea where else I would look for it.
It's a thoroughly incomplete work in progress (you'll notice the vast majority of the languages aren't covered), as well as a squirrelly mess. I may finish it someday, but I probably won't. The original point was just to have a bit of fun, anyway; any resulting usefulness is an accident.
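As a rough illustration of the mechanism described in the introduction (and not GeSHi's actual implementation), the sketch below wraps matched tokens in spans that carry class names. The rule table, the regular expressions and the `highlight` function are all hypothetical; only the class names are taken from this page.

```python
import html
import re

# Hypothetical rules as (class name, pattern) pairs. Earlier rules win at a
# given position. Only the class names come from the page.
RULES = [
    ("coMULTI", r"/\*.*?\*/"),     # /* ... */ comments
    ("co1", r"//[^\n]*"),          # // comments, up to end of line
    ("re0", r"\$[A-Za-z_]\w*"),    # $variable names
    ("nu0", r"\d+"),               # plain digit runs
    ("br0", r"[\[\](){}]"),        # bracket characters
]

TOKEN = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES),
    re.DOTALL,
)


def highlight(source: str) -> str:
    """Wrap every recognised token in <span class="...">; escape the rest."""
    out = []
    pos = 0
    for match in TOKEN.finditer(source):
        out.append(html.escape(source[pos:match.start()]))
        out.append(
            f'<span class="{match.lastgroup}">{html.escape(match.group())}</span>'
        )
        pos = match.end()
    out.append(html.escape(source[pos:]))
    return "".join(out)
```

Calling `highlight("$foo[1] // hi")` yields one span each for the variable, the two brackets, the number and the comment, with the space between them left unwrapped.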
PHP
(Maybe this should be a table...)
- co1 co2 co3 co4 coMULTI
- Comments (co): detects strings beginning with /* (coMULTI -- at least semi-universal comment syntax, hence the name), # (co2), // (co1)
- re0
- Variable names: detects strings beginning with $
- br0
- Bracket characters: [] (brackets), () (parentheses), {} (braces). The meaning of br0, when it is present, appears to be universal. As such, its usage will not be discussed further where it appears.
- st0 st_h
- String literals: st_h detects strings enclosed by '' (single quotes), st0 detects strings enclosed by "" (double quotes)
- sy0 sy1
- sy0 seems to highlight all punctuation characters other than ones that take different classes. The usage of sy1 is unclear.
- kw1 kw2 kw3 kw4
- Various words with different meanings -- haven't yet figured out how to categorise them
- es0 es1 es2 es3 es4
- Contents of double-quoted strings: es0 to es3 are for certain sequences that begin with a backslash (es1 for letters and symbols such as \t, \n and \\; es3 for the digits 0 to 7); es4 is for variables inside the quotes
- nu0 nu8 nu12 nu19
- Numbers: nu0 is for ordinary number strings, nu8 is for numbers written with commas (detects strings of digits that contain a comma for every three digits; the commas themselves are marked with sy0), nu19 is for number strings containing a decimal point (must have only one point)
- me1 me2
- Text following two colons
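The number rules above can be sketched as a small classifier. This is a guess at the behaviour described, not GeSHi's code: the function name `php_number_class` and the regular expressions are assumptions, and the sketch labels the whole token rather than marking the commas separately with sy0.

```python
import re
from typing import Optional


def php_number_class(token: str) -> Optional[str]:
    """Return the class the page reports for a PHP number token, or None."""
    # A comma for every three digits, e.g. 000,000,000
    if re.fullmatch(r"\d{1,3}(,\d{3})+", token):
        return "nu8"
    # Exactly one decimal point, e.g. 0.9
    if re.fullmatch(r"\d+\.\d+", token):
        return "nu19"
    # Ordinary digit string, e.g. 7878
    if re.fullmatch(r"\d+", token):
        return "nu0"
    return None
```

With this sketch, `php_number_class("1.2.3")` returns `None`, which matches the "must have only one point" observation.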
Example usage (PHP)
//This line is highlighted so it takes class ln-xtra
<?php #This string takes class kw2, and if you put it partway through the text the part above it won't get highlighted
#This is a comment with a sharp sign so it takes class co2
/* This comment is enclosed by slashes and asterisks so it takes class coMULTI */
$foo #This takes re0
'This string is in single quotes and takes class st_h'
"This string is in double quotes and takes class st0"
"$foo" #This takes es4
7878 #These numbers take class nu0
0.9 #This takes nu19
000,000,000 #This takes nu8
?> #This string also takes kw2 but if you put text after it it won't be highlighted
<? #so we need another thingy
require_once if do else elseif echo #These strings take kw1
function var global #These strings take kw2
array die isset #These use kw3
true false null __FILE__ #These use kw4
{} [] () #These take br0
!@%&*+=- #These take sy0
"\t takes class es1 when it is inside double quotes"
"So do \\, \f, \n, \r, \v, \" and \$"
"\1, \2, \3, \4, \5, \6, \7 and \0 take es3"
::This takes me2
Javascript
- nu0
- Numbers: used for all numbers, unlike in PHP where different classes are used when decimal points or commas are involved
- st0
- Anything inside single or double quotes
- br0
- Same as in PHP
- sy0
- Ditto, except that Javascript also uses it for ^ whereas PHP doesn't
- kw1 kw2 kw3
- kw1 is used for words such as var, function, do and if; kw2 is used for true and false; kw3 is unknown
- co1 co2 coMULTI
- Comments, much as in PHP -- co1 is slashes, coMULTI is slashes and asterisks
- me1
- Words following a full stop
- es0
- Used for all characters that have a meaning in PHP when preceded by a backslash and contained within double quotes. No distinction is made for numbers.
Example usage (Javascript)
//This comment uses co1
/* This comment uses coMULTI */
23928 2.1 222,111 // These use nu0
var function do if // These use kw1
() [] {} //These use br0
;=^:%! //These use sy0
'This uses st0'
"So does this"
.This.uses.me1
true false //These use kw2
"\t takes class es0 when it is inside double quotes"
"So do \\, \f, \n, \r, \v, \", \$, \1, \2, \3, \4, \5, \6, \7 and \0"
Bash
- co0 co1 co2 co3 co4
- co0 is for comments beginning with a sharp sign (#)
- re0 re1 re2 re4 re5
- re1 is strings beginning with $; re5 is strings beginning with -
- br0
- Universal class that will not be discussed further (except in languages such as Brainfuck).
- st0 st_h
- Same as in PHP
- sy0
- Several punctuation characters -- / ! @ * % &
- kw1 kw2 kw3
- Various words: kw1 is for function, do and if; kw2 is for git, clone and bash; kw3 is for fg
- es1 es2 es3 es4
- es1 appears to be used as in PHP; es2 appears to be used like es4 in PHP (i.e. dollar-sign strings (variables?) inside double quotes)
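The Bash observations above amount to a lookup by first character plus a word list, which can be sketched as follows. The function name `bash_token_class` and the `BASH_KEYWORDS` table are hypothetical; the table holds only the words tested on this page, whereas the real word lists are certainly longer.

```python
from typing import Optional

# Only the words observed on this page; the real lists are longer.
BASH_KEYWORDS = {
    "function": "kw1", "do": "kw1", "if": "kw1",
    "git": "kw2", "clone": "kw2", "bash": "kw2", "mv": "kw2",
    "fg": "kw3",
}


def bash_token_class(token: str) -> Optional[str]:
    """Return the class the page reports for a bare Bash token, or None."""
    if token.startswith("#"):
        return "co0"   # sharp-sign comment
    if token.startswith("$"):
        return "re1"   # string beginning with $
    if token.startswith("-"):
        return "re5"   # string beginning with -
    return BASH_KEYWORDS.get(token)
```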
Example usage (Bash)
#This takes co0
git clone bash mv #These take kw2
/!@*%& #These take sy0
fg #This takes kw3
function do if #These take kw1
'This takes st_h'
"This takes st0"
$foo #This takes re1
{}[]() #These take br0
"\t takes class es1 when it is inside double quotes"
"So do \f, \n, \r, \v, \" and \$"
"$This takes es2"
-foo #This takes re5
CSS
- co1 co2 coMULTI
- co1 is for at-rules (strings beginning with @); coMULTI has its usual meaning
- re0 re1 re2 re3
- re0 is for ids (strings beginning with #); re1 is for classes (beginning with .); re2 is for pseudo-elements (beginning with :); the latter two must be followed by a { character for the classes to be assigned
- st0
- Strings inside single-quotes and double-quotes (they have the same meaning)
- sy0
- Highlights punctuation characters that have meaning, as always -- ^ * + : ; > (note: not <)
- kw1 kw2
- kw1 is for attributes that can be set, such as content (highlighted with kw1 and not re0 even when followed by a #), color, background-image, etc. (though not all attributes are recognised as such; the subset is smaller than it should be); kw2 is for various standard values for attributes, such as url and block
- es0 es2
- es2 is for certain characters preceded by backslashes inside quotes, similarly to PHP and Bash though more limited
- nu0
- Numbers
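The selector rules above can be sketched like this, assuming (as the example usage suggests) that the { requirement applies to classes and pseudo-elements but not to ids. The function name `css_selector_class` and the patterns are hypothetical and only cover the cases tested on this page.

```python
import re
from typing import Optional


def css_selector_class(line: str) -> Optional[str]:
    """Return re0, re1 or re2 for a selector-like line, or None."""
    text = line.strip()
    if re.match(r"#[\w-]+", text):
        return "re0"   # id: no opening brace needed
    if re.match(r"\.[\w-]+\s*\{", text):
        return "re1"   # class, only when followed by {
    if re.match(r":[\w-]+\s*\{", text):
        return "re2"   # pseudo-element, only when followed by {
    return None
```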
Example usage (CSS)
/* This takes coMULTI */
@This takes co1
1234 /* This takes nu0 */
#foo /* This takes re0 */
'This takes st0'
"This takes st0 too"
^*+:> /* These take sy0 */
.foo { /* This takes re1 */
:foo { /* This takes re2 */
color background background-image content display /* These take kw1 */
url block /* These take kw2 */
()[]{} /* These take br0 */
"\f, \1, \2, \3, \4, \5, \6, \7 and \0 take es2 when they are inside double quotes"
Lua
- co1 co2 coMULTI
- co1 is for comments introduced by two hyphens (--), which run to the end of the line. Lua also has block comments written --[[ ... ]], which is presumably what coMULTI is for. This still raises the question of what co2 could possibly be used for.
- br0
- st0
- Text inside single and double quotes
- sy0
- kw1 kw2 kw3 kw4
- kw1 is for keywords such as do, function, if, elseif; kw3 is for functions such as print; kw4 is unknown
- es0 es1 es2
- nu0
Example usage (Lua)
-- This is a comment and takes co1 --
print -- This takes kw3 --
do function if elseif -- These take kw1 --
%^*%#.,/<> -- These take sy0 --
"This takes st0"
'This takes st0 too'
12345 -- This takes nu0 --
"\t takes class es1 when it is inside double quotes"
"So do \\, \f, \n, \r, \v and \""
"\1, \2, \3, \4, \5, \6, \7 and \0 take es2"
Brainfuck
- co1
- Comments: any character is a comment except + - < > . , [ ] (there really ought to be others, though, seeing as how there're so many classes. innit?)
- br0
- This is used only for [], unlike the vast majority of other languages
- st0
- sy0 sy1 sy2 sy3 sy4
- sy0 is + -; sy2 is < >; sy3 is . ,
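Because every Brainfuck character falls into exactly one of the classes listed above, the mapping is easy to sketch as a table lookup. The names `BF_CLASSES`, `brainfuck_class` and `classify` are hypothetical, and the sketch treats whitespace as comment text, which may not match the real output.

```python
from itertools import groupby

# Character-to-class table taken from the observations on this page.
BF_CLASSES = {
    "+": "sy0", "-": "sy0",
    "<": "sy2", ">": "sy2",
    ".": "sy3", ",": "sy3",
    "[": "br0", "]": "br0",
}


def brainfuck_class(ch: str) -> str:
    """Anything that is not one of the eight commands counts as a comment."""
    return BF_CLASSES.get(ch, "co1")


def classify(program: str) -> list:
    """Group consecutive characters that share a class into (class, text) runs."""
    return [
        (cls, "".join(chars))
        for cls, chars in groupby(program, key=brainfuck_class)
    ]
```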
Example usage (Brainfuck)
This is regular old text so it is a comment and takes co1
+- These take sy0
<> These take sy2
,. These take sy3
[] These take br0
C
- co1 co2 coMULTI
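- co1 is for text following two slashes; co2 is for text following a sharp sign (in C such lines are preprocessor directives rather than comments, but they take a comment class all the same)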
- br0
- st0
- Both single and double quotes
- sy0
- A variety of punctuation characters
- kw1 kw2 kw3 kw4
- kw1 and kw2 are for certain function words
- es0 es1 es2 es3 es4 es5
- es1 is for certain letters and other characters preceded by backslashes inside quotes; es5 is for numbers in such a condition
- nu0 nu6 nu8 nu12 nu16 nu17 nu18 nu19
- nu0 is for ordinary numbers (including those with commas); nu16 is for decimals
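The split between es1 and es5 described above can be sketched with a single regular expression. The name `c_escape_classes` and the pattern are hypothetical, and the pattern covers only the escapes tested on this page.

```python
import re

# Backslash followed by a digit 0-7 (es5), or by one of the letters and
# symbols tested on this page (es1).
ESCAPE = re.compile(r'\\(?:(?P<es5>[0-7])|(?P<es1>[\\fnrtv"]))')


def c_escape_classes(literal: str) -> list:
    """Return (escape sequence, class) pairs found in a C string literal."""
    return [(m.group(), m.lastgroup) for m in ESCAPE.finditer(literal)]
```

For example, passing the raw text `"\t and \0"` returns the pair for `\t` tagged es1 followed by the pair for `\0` tagged es5.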
Example usage (C)
//This takes co1
#This takes co2
/* This takes coMULTI */
()[]{} //These take br0
!%^&*+|<>:?/ //These take sy0
'This takes st0'
"So does this"
if do //These take kw1
function //These take kw2
12345 //This takes nu0
1.2 //This takes nu16
"\t takes class es1 when it is inside quotes"
"So do \\, \f, \n, \r, \v and \""
"\1, \2, \3, \4, \5, \6, \7 and \0 take es5"