Commit 4e3202d2 authored by Jean-Philippe Lang

Reverts r3014 (CodeRay back to 0.7.6).

git-svn-id: svn+ssh://rubyforge.org/var/svn/redmine/trunk@3079 e93f8b46-1217-0410-a6f0-8f06a7374b81
parent d73fb1fa
= CodeRay - Trunk folder structure
== bench - Benchmarking system
All benchmarking stuff goes here.
Test inputs are stored in files named <code>example.<lang></code>.
Test outputs go to <code>bench/test.<encoder-default-file-extension></code>.
Run <code>bench/bench.rb</code> to get a usage description.
Run <code>rake bench</code> to perform an example benchmark.
== bin - Scripts
Executable files for CodeRay.
== demo - Demos and functional tests
Demonstration scripts that show off CodeRay's features.
Run them as functional tests with <code>rake test:demos</code>.
== etc - Lots of stuff
Some additional files for CodeRay, mainly graphics and Vim scripts.
== gem_server - Gem output folder
For <code>rake gem</code>.
== lib - CodeRay library code
This is the base directory for the CodeRay library.
== rake_helpers - Rake helper libraries
Some files to enhance Rake, including the Autumnal Rdoc template and some scripts.
== test - Tests
Tests for the scanners.
Each language has its own subfolder and sub-suite.
Run with <code>rake test</code>.
= CodeRay
[- Tired of blue'n'gray? Try the original version of this documentation on
coderay.rubychan.de[http://coderay.rubychan.de/doc/] (use Ctrl+Click to open it in its own frame.) -]
http://rd.cYcnus.de/coderay/doc (use Ctrl+Click to open it in its own frame.) -]
== About
CodeRay is a Ruby library for syntax highlighting.
......@@ -18,11 +18,14 @@ And with line numbers.
* is what everybody should have on their website
* solves all your problems and makes the girls run after you
Version: 0.9.0
Version: 0.7.4 (2006.october.20)
Author:: murphy (Kornelius Kalnbach)
Contact:: murphy rubychan de
Website:: coderay.rubychan.de[http://coderay.rubychan.de]
License:: GNU LGPL; see LICENSE file in the main directory.
Subversion:: $Id: README 219 2006-10-20 15:52:25Z murphy $
-----
== Installation
......@@ -30,10 +33,17 @@ You need RubyGems[http://rubyforge.org/frs/?group_id=126].
% gem install coderay
Since CodeRay is still in beta stage, nightly builds may be useful:
% gem install coderay -rs rd.cYcnus.de/coderay
=== Dependencies
CodeRay needs Ruby 1.8.6 or later. It also runs with Ruby 1.9.1+ and JRuby 1.1+.
CodeRay needs Ruby 1.8 and the
strscan[http://www.ruby-doc.org/stdlib/libdoc/strscan/rdoc/index.htm]
library (part of the standard library.) It should also run with Ruby 1.9 and
yarv.
== Example Usage
......@@ -50,9 +60,11 @@ CodeRay needs Ruby 1.8.6 or later. It also runs with Ruby 1.9.1+ and JRuby 1.1+.
See CodeRay.
Please report errors in this documentation to <murphy rubychan de>.
Please report errors in this documentation to <coderay cycnus de>.
-----
== Credits
=== Special Thanks to
......@@ -60,39 +72,30 @@ Please report errors in this documentation to <murphy rubychan de>.
* licenser (Heinz N. Gies) for ending my QBasic career, inventing the Coder
project and the input/output plugin system.
CodeRay would not exist without him.
* bovi (Daniel Bovensiepen) for helping me out on various occasions.
=== Thanks to
* Caleb Clausen for writing RubyLexer (see
http://rubyforge.org/projects/rubylexer) and lots of very interesting mail
traffic
* birkenfeld (Georg Brandl) and mitsuhiku (Arnim Ronacher) for PyKleur, now pygments.
You guys rock!
* birkenfeld (Georg Brandl) and mitsuhiku (Arnim Ronacher) for PyKleur. You
guys rock!
* Jamis Buck for writing Syntax (see http://rubyforge.org/projects/syntax)
I got some useful ideas from it.
* Doug Kearns and everyone else who worked on ruby.vim - it not only helped me
coding CodeRay, but also gave me a wonderful target to reach for the Ruby
scanner.
* everyone who uses CodeBB on http://www.rubyforen.de and http://www.python-forum.de
* iGEL, magichisoka, manveru, WoNáDo and everyone I forgot from rubyforen.de
* Dethix from ruby-mine.de
* zickzackw
* Dookie (who is no longer with us...) and Leonidas from http://www.python-forum.de
* everyone who used CodeBB on http://www.rubyforen.de and
http://www.infhu.de/mx
* iGEL, magichisoka, manveru, WoNáDo and everyone I forgot from rubyforen.de
* Daniel and Dethix from ruby-mine.de
* Dookie (who is no longer with us...) and Leonidas from
http://www.python-forum.de
* Andreas Schwarz for finding out that CaseIgnoringWordList was not case
ignoring! Such things really make you write tests.
* closure for the first version of the Scheme scanner.
* Stefan Walk for the first version of the JavaScript scanner.
* Josh Goebel for another version of the JavaScript scanner and a Diff scanner.
* Jonathan Younger for pointing out the licence confusion caused by wrong LICENSE file.
* Jeremy Hinegardner for finding the shebang-on-empty-file bug in FileType.
* Charles Oliver Nutter and Yehuda Katz for helping me benchmark CodeRay on JRuby.
* Andreas Neuhaus for pointing out a markup bug in coderay/for_redcloth.
* 0xf30fc7 for the FileType patch concerning Delphi file extensions.
* The folks at redmine.org - thank you for using and fixing CodeRay!
* matz and all Ruby gods and gurus
* The inventors of: the computer, the internet, the true color display, HTML &
CSS, VIM, Ruby, pizza, microwaves, guitars, scouting, programming, anime,
CSS, VIM, RUBY, pizza, microwaves, guitars, scouting, programming, anime,
manga, coke and green ice tea.
Where would we be without all those people?
......@@ -100,27 +103,23 @@ Where would we be without all those people?
=== Created using
* Ruby[http://ruby-lang.org/]
* Chihiro (my Sony VAIO laptop); Henrietta (my old MacBook);
Triella, born Rico (my new MacBook); as well as
Seras and Hikari (my PCs)
* RDE[http://homepage2.nifty.com/sakazuki/rde_e.html],
VIM[http://vim.org] and TextMate[http://macromates.com]
* Subversion[http://subversion.tigris.org/]
* Redmine[http://redmine.org/]
* Firefox[http://www.mozilla.org/products/firefox/],
Firebug[http://getfirebug.com/], Safari[http://www.apple.com/safari/], and
* Chihiro (my Sony VAIO laptop), Henrietta (my new MacBook) and
Seras (my Athlon 2200+ tower)
* VIM[http://vim.org] and TextMate[http://macromates.com]
* RDE[http://homepage2.nifty.com/sakazuki/rde_e.html]
* Microsoft Windows (yes, I confess!) and MacOS X
* Firefox[http://www.mozilla.org/products/firefox/] and
Thunderbird[http://www.mozilla.org/products/thunderbird/]
* RubyGems[http://docs.rubygems.org/] and Rake[http://rake.rubyforge.org/]
* TortoiseSVN[http://tortoisesvn.tigris.org/] using Apache via
* Rake[http://rake.rubyforge.org/]
* RubyGems[http://docs.rubygems.org/]
* {Subversion/TortoiseSVN}[http://tortoisesvn.tigris.org/] using Apache via
XAMPP[http://www.apachefriends.org/en/xampp.html]
* RDoc (though I'm quite unsatisfied with it)
* Microsoft Windows (yes, I confess!) and MacOS X
* GNUWin32, MinGW and some other tools to make the shell under windows a bit
less useless
more useful
* Term::ANSIColor[http://term-ansicolor.rubyforge.org/]
* PLEAC[http://pleac.sourceforge.net/] code examples
=== Free
---
* As you can see, CodeRay was created under heavy use of *free* software.
* So CodeRay is also *free*.
......
#!/usr/bin/env ruby
# CodeRay Executable
#
# Version: 0.1
# Author: murphy
def err msg
$stderr.puts msg
end
begin
require 'coderay'
if ARGV.empty?
puts <<-USAGE
CodeRay #{CodeRay::VERSION} (http://rd.cYcnus.de/coderay)
Usage:
coderay -<lang> [-<format>] < file > output
coderay file [-<format>]
Example:
coderay -ruby -statistic < foo.rb
coderay codegen.c # generates codegen.c.html
USAGE
end
first, second = ARGV
if first
if first[/-(\w+)/] == first
lang = $1.to_sym
input = $stdin.read
tokens = :scan
elsif first == '-'
lang = $1.to_sym
input = $stdin.read
tokens = :scan
else
file = first
tokens = CodeRay.scan_file file
output_filename, output_ext = file, /#{Regexp.escape(File.extname(file))}$/
end
else
puts 'No lang/file given.'
exit 1
end
if second
if second[/-(\w+)/] == second
format = $1.to_sym
else
raise 'Invalid format (must be -xxx).'
end
else
$stderr.puts 'No format given; setting to default (HTML Page)'
format = :page
end
# TODO: allow streaming
if tokens == :scan
output = CodeRay::Duo[lang => format].highlight input #, :stream => true
else
output = tokens.encode format
end
out = $stdout
if output_filename
output_filename += '.' + CodeRay::Encoders[format]::FILE_EXTENSION
if File.exist? output_filename
err 'File %s already exists.' % output_filename
exit
else
out = File.open output_filename, 'w'
end
end
out.print output
rescue => boom
err "Error: #{boom.message}\n"
err boom.backtrace
err '-' * 50
err ARGV
exit 1
end
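The script above recognises `-<lang>` and `-<format>` arguments by checking that a regexp match covers the whole argument. A minimal sketch of that check in isolation follows; `parse_flag` is a hypothetical helper, not part of the CodeRay executable, and it uses anchors instead of comparing the match with the argument.

```ruby
# Hypothetical helper (not in CodeRay): turn "-ruby" into :ruby.
# Returns nil for anything that is not a dash followed by word characters.
def parse_flag(arg)
  match = /\A-(\w+)\z/.match(arg.to_s)
  match && match[1].to_sym
end

p parse_flag('-ruby')      # a language or format flag
p parse_flag('codegen.c')  # a file name, so no flag
p parse_flag('-')          # a bare dash carries no name
```

Returning nil for a bare dash makes the caller handle that case explicitly instead of reading a stale match variable.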
#!/usr/bin/env ruby
require 'coderay'
puts CodeRay::Encoders[:html]::CSS.new.stylesheet
# = CodeRay Library
#
# $Id: coderay.rb 227 2007-04-24 12:26:18Z murphy $
#
# CodeRay is a Ruby library for syntax highlighting.
#
# I try to make CodeRay easy to use and intuitive, but at the same time fully featured, complete,
......@@ -105,7 +107,7 @@
#
# CodeRay.scan_stream:: Scan in stream mode.
#
# == All-in-One Encoding
# == All-in-One Encoding
#
# CodeRay.encode:: Highlight a string with a given input and output format.
#
......@@ -128,14 +130,13 @@
module CodeRay
# Version: Major.Minor.Teeny[.Revision]
# Major: 0 for pre-stable, 1 for stable
# Minor: feature milestone
# Teeny: development state, 0 for pre-release
# Revision: Subversion Revision number (generated on rake gem:make)
VERSION = '0.9.0'
# Major: 0 for pre-release
# Minor: odd for beta, even for stable
# Teeny: development state
# Revision: Subversion Revision number (generated on rake)
VERSION = '0.7.6'
require 'coderay/tokens'
require 'coderay/token_classes'
require 'coderay/scanner'
require 'coderay/encoder'
require 'coderay/duo'
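The version comments above describe a Major.Minor.Teeny[.Revision] scheme, which orders versions segment by segment rather than character by character. The following is a small sketch of such a comparison, assuming `Gem::Version` from RubyGems is acceptable for the purpose; `newer?` is a hypothetical helper and not something CodeRay defines.

```ruby
require 'rubygems' # provides Gem::Version; loaded by default on current Rubies

# Hypothetical helper (not part of CodeRay): true if version +a+ is later
# than version +b+, comparing each dotted segment numerically.
def newer?(a, b)
  Gem::Version.new(a) > Gem::Version.new(b)
end

puts newer?('0.9.0', '0.7.6')   # the two versions this commit switches between
puts newer?('0.10.0', '0.9.0')  # the minor segment is compared as a number
puts '0.10.0' > '0.9.0'         # plain String comparison orders these the other way
```

The same caveat applies to string checks such as `RUBY_VERSION >= '1.9'`: they hold only while every segment has a single digit.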
......@@ -314,7 +315,6 @@ end
# Run a test script.
if $0 == __FILE__
$stderr.print 'Press key to print demo.'; gets
# Just use this file as an example of Ruby code.
code = File.read(__FILE__)[/module CodeRay.*/m]
print CodeRay.scan(code, :ruby).html
end
......@@ -2,6 +2,8 @@ module CodeRay
# = Duo
#
# $Id: scanner.rb 123 2006-03-21 14:46:34Z murphy $
#
# A Duo is a convenient way to use CodeRay. You just create a Duo,
# giving it a lang (language of the input code) and a format (desired
# output format), and call Duo#highlight with the code.
......
require "stringio"
module CodeRay
# This module holds the Encoder class and its subclasses.
......@@ -130,56 +132,30 @@ module CodeRay
# By default, it calls text_token or block_token, depending on
# whether +text+ is a String.
def token text, kind
encoded_token =
if text.is_a? ::String
out =
if text.is_a? ::String # Ruby 1.9: :open.is_a? String
text_token text, kind
elsif text.is_a? ::Symbol
block_token text, kind
else
raise 'Unknown token text type: %p' % text
end
append_encoded_token_to_output encoded_token
end
def append_encoded_token_to_output encoded_token
@out << encoded_token if encoded_token && defined?(@out) && @out
@out << out if @out
end
# Called for each text token ([text, kind]), where text is a String.
def text_token text, kind
end
# Called for each block (non-text) token ([action, kind]), where action is a Symbol.
def block_token action, kind
case action
when :open
open_token kind
when :close
close_token kind
when :begin_line
begin_line kind
when :end_line
end_line kind
else
raise 'unknown block action: %p' % action
end
end
# Called for each block token at the start of the block ([:open, kind]).
def open_token kind
end
# Called for each block token end of the block ([:close, kind]).
def close_token kind
end
# Called for each line token block at the start of the line ([:begin_line, kind]).
def begin_line kind
end
# Called for each line token block at the end of the line ([:end_line, kind]).
def end_line kind
end
# Called with merged options after encoding starts.
# The return value is the result of encoding, typically @out.
......@@ -191,16 +167,8 @@ module CodeRay
#
# The already created +tokens+ object must be used; it can be a
# TokenStream or a Tokens object.
if RUBY_VERSION >= '1.9'
def compile tokens, options
for text, kind in tokens
token text, kind
end
end
else
def compile tokens, options
tokens.each(&self)
end
def compile tokens, options
tokens.each(&self)
end
end
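The encoder hunks above dispatch each token on the type of its first element: a String is a text token, a Symbol is a block action. Here is a self-contained sketch of that idea; `MiniEncoder` and its output format are assumptions made for illustration and do not reproduce the CodeRay encoder API.

```ruby
# Minimal stand-in for an encoder; MiniEncoder is a hypothetical name.
class MiniEncoder
  def initialize
    @out = ''.dup
  end

  # Dispatch on the type of +text+: Strings are text tokens,
  # Symbols are block actions such as :open and :close.
  def token(text, kind)
    encoded =
      case text
      when ::String then text_token(text, kind)
      when ::Symbol then block_token(text, kind)
      else raise ArgumentError, 'Unknown token text type: %p' % [text]
      end
    @out << encoded if encoded
  end

  # Feed every [text, kind] pair through #token and return the output.
  def encode(tokens)
    tokens.each { |text, kind| token(text, kind) }
    @out
  end

  private

  def text_token(text, kind)
    "#{kind}(#{text})"
  end

  def block_token(action, kind)
    case action
    when :open  then "#{kind}<"
    when :close then '>'
    else raise ArgumentError, 'unknown block action: %p' % [action]
    end
  end
end

tokens = [[:open, :string], ['"', :delimiter], ['hi', :content],
          ['"', :delimiter], [:close, :string]]
puts MiniEncoder.new.encode(tokens)
```

Keeping the append in one place (`token`) means subclasses only decide what a token looks like, not where it goes.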
......
......@@ -35,14 +35,6 @@ module Encoders
">"
end
def begin_line kind
"#{kind}["
end
def end_line kind
"]"
end
end
end
......
......@@ -9,9 +9,10 @@ module Encoders
register_for :div
DEFAULT_OPTIONS = HTML::DEFAULT_OPTIONS.merge \
DEFAULT_OPTIONS = HTML::DEFAULT_OPTIONS.merge({
:css => :style,
:wrap => :div
:wrap => :div,
})
end
......
......@@ -25,6 +25,10 @@ module Encoders
#
# == Options
#
# === :escape
# Escape html entities
# Default: true
#
# === :tab_width
# Convert \t characters to +n+ spaces (a number.)
# Default: 8
......@@ -41,12 +45,6 @@ module Encoders
#
# Default: nil
#
# === :title
#
# The title of the HTML page (works only when :wrap is set to :page.)
#
# Default: 'CodeRay output'
#
# === :line_numbers
# Include line numbers in :table, :inline, :list or nil (no line numbers)
#
......@@ -62,16 +60,6 @@ module Encoders
#
# Default: 10
#
# === :highlight_lines
#
# Highlights certain line numbers now by using the :highlight_lines option.
# Can be any Enumerable, typically just an Array or Range, of numbers.
#
# Bolding is deactivated when :highlight_lines is set. It only makes sense
# in combination with :line_numbers.
#
# Default: nil
#
# === :hint
# Include some information into the output using the title attribute.
# Can be :info (show token type on mouse-over), :info_long (with full path)
......@@ -86,19 +74,19 @@ module Encoders
FILE_EXTENSION = 'html'
DEFAULT_OPTIONS = {
:escape => true,
:tab_width => 8,
# :level => :xhtml, # reserved for future use
:level => :xhtml,
:css => :class,
:style => :cycnus,
:wrap => nil,
:title => 'CodeRay output',
:line_numbers => nil,
:line_number_start => 1,
:bold_every => 10,
:highlight_lines => nil,
:hint => false,
}
......@@ -153,7 +141,7 @@ module Encoders
when :debug
classes.inspect
end
title ? " title=\"#{title}\"" : ''
" title=\"#{title}\""
end
def setup options
......@@ -162,6 +150,7 @@ module Encoders
@HTML_ESCAPE = HTML_ESCAPE.dup
@HTML_ESCAPE["\t"] = ' ' * options[:tab_width]
@escape = options[:escape]
@opened = [nil]
@css = CSS.new options[:style]
......@@ -175,7 +164,7 @@ module Encoders
when :class
@css_style = Hash.new do |h, k|
c = CodeRay::Tokens::ClassOfKind[k.first]
c = Tokens::ClassOfKind[k.first]
if c == :NO_HIGHLIGHT and not hint
h[k.dup] = false
else
......@@ -233,70 +222,43 @@ module Encoders
@out.css = @css
@out.numerize! options[:line_numbers], options
@out.wrap! options[:wrap]
@out.apply_title! options[:title]
super
end
def token text, type = :plain
case text
when nil
# raise 'Token with nil as text was given: %p' % [[text, type]]
when String
if text =~ /#{HTML_ESCAPE_PATTERN}/o
def token text, type
if text.is_a? ::String
if @escape && (text =~ /#{HTML_ESCAPE_PATTERN}/o)
text = text.gsub(/#{HTML_ESCAPE_PATTERN}/o) { |m| @HTML_ESCAPE[m] }
end
@opened[0] = type
if text != "\n" && style = @css_style[@opened]
if style = @css_style[@opened]
@out << style << text << '</span>'
else
@out << text
end
# token groups, eg. strings
when :open
@opened[0] = type
@out << (@css_style[@opened] || '<span>')
@opened << type
when :close
if @opened.empty?
# nothing to close
else
if $DEBUG and (@opened.size == 1 or @opened.last != type)
raise 'Malformed token stream: Trying to close a token (%p) \
that is not open. Open are: %p.' % [type, @opened[1..-1]]
else
case text
when :open
@opened[0] = type
@out << (@css_style[@opened] || '<span>')
@opened << type
when :close
if @opened.empty?
# nothing to close
else
if $DEBUG and (@opened.size == 1 or @opened.last != type)
raise 'Malformed token stream: Trying to close a token (%p) \
that is not open. Open are: %p.' % [type, @opened[1..-1]]
end
@out << '</span>'
@opened.pop
end
@out << '</span>'
@opened.pop
end
# whole lines to be highlighted, eg. a deleted line in a diff
when :begin_line
@opened[0] = type
if style = @css_style[@opened]
@out << style.sub('<span', '<div')
when nil
raise 'Token with nil as text was given: %p' % [[text, type]]
else
@out << '<div>'
raise 'unknown token kind: %p' % text
end
@opened << type
when :end_line
if @opened.empty?
# nothing to close
else
if $DEBUG and (@opened.size == 1 or @opened.last != type)
raise 'Malformed token stream: Trying to close a line (%p) \
that is not open. Open are: %p.' % [type, @opened[1..-1]]
end
@out << '</div>'
@opened.pop
end
else
raise 'unknown token kind: %p' % [text]
end
end
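The `token` method above escapes text with a lookup table whose tab entry depends on `:tab_width`. A rough stand-alone sketch of that technique is shown below; the table contents and the `escape_html` helper are assumptions for illustration, not the encoder's actual constants.

```ruby
# Assumed escape table, modelled on the HTML_ESCAPE name used in the diff.
HTML_ESCAPE = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Hypothetical helper: escape HTML metacharacters and expand each tab
# to +tab_width+ spaces in a single pass.
def escape_html(text, tab_width = 8)
  table = HTML_ESCAPE.merge("\t" => ' ' * tab_width)
  pattern = Regexp.union(table.keys)
  text.gsub(pattern) { |m| table[m] }
end

puts escape_html(%(if a < b && s == "x"\n\treturn))
```

Doing the substitution in one `gsub` avoids escaping the ampersands that an earlier pass would have introduced.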
......
......@@ -27,19 +27,16 @@ module Encoders
1.upto(styles.size) do |offset|
break if style = cl[styles[offset .. -1]]
end
$stderr.puts 'Style not found: %p' % [styles] if $DEBUG and style.empty?
raise 'Style not found: %p' % [styles] if $DEBUG and style.empty?
return style
end
private
CSS_CLASS_PATTERN = /
( # $1 = selectors
(?:
(?: \s* \. [-\w]+ )+
\s* ,?
)+
)
( (?: # $1 = classes
\s* \. [-\w]+
)+ )
\s* \{ \s*
( [^\}]+ )? # $2 = style
\s* \} \s*
......@@ -47,14 +44,12 @@ module Encoders
( . ) # $3 = error
/mx
def parse stylesheet
stylesheet.scan CSS_CLASS_PATTERN do |selectors, style, error|
stylesheet.scan CSS_CLASS_PATTERN do |classes, style, error|
raise "CSS parse error: '#{error.inspect}' not recognized" if error
for selector in selectors.split(',')
classes = selector.scan(/[-\w]+/)
cl = classes.pop
@classes[cl] ||= Hash.new
@classes[cl][classes] = style.to_s.strip.delete(' ').chomp(';')
end
styles = classes.scan(/[-\w]+/)
cl = styles.pop
@classes[cl] ||= Hash.new
@classes[cl][styles] = style.to_s.strip
end
end
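The `parse` method above indexes each rule by its last class name and stores the preceding class names as the key for the style. A simplified sketch of that indexing follows; the `RULE` pattern and `parse_stylesheet` are assumptions for illustration and omit the error branch of the real `CSS_CLASS_PATTERN`.

```ruby
# Simplified pattern, assumed for illustration: one or more ".class"
# selectors followed by a "{ ... }" body.
RULE = /((?:\s*\.[-\w]+)+)\s*\{\s*([^}]*?)\s*\}/m

# Hypothetical helper: returns { last_class => { [ancestor classes] => style } }.
def parse_stylesheet(css)
  classes = Hash.new { |h, k| h[k] = {} }
  css.scan(RULE) do |selector, style|
    names = selector.scan(/[-\w]+/)
    leaf = names.pop
    classes[leaf][names] = style.to_s.strip
  end
  classes
end

css = <<~CSS
  .CodeRay .s { color: #D20 }
  .CodeRay .s .dl { color: black; font-weight: bold }
  .no { padding: 0px 4px }
CSS

p parse_stylesheet(css)
```

Keying on the innermost class lets a lookup start from the token's own class and then try progressively shorter ancestor chains.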
......
......@@ -32,19 +32,9 @@ module Encoders
#end
bold_every = options[:bold_every]
highlight_lines = options[:highlight_lines]
bolding =
if bold_every == false && highlight_lines == nil
if bold_every == false
proc { |line| line.to_s }
elsif highlight_lines.is_a? Enumerable
highlight_lines = highlight_lines.to_set
proc do |line|
if highlight_lines.include? line
"<strong class=\"highlighted\">#{line}</strong>" # highlighted line numbers in bold
else
line.to_s
end
end
elsif bold_every.is_a? Integer
raise ArgumentError, ":bolding can't be 0." if bold_every == 0
proc do |line|
......@@ -61,12 +51,12 @@ module Encoders
case mode
when :inline
max_width = (start + line_count).to_s.size
line_number = start
line = start
gsub!(/^/) do
line_number_text = bolding.call line_number
indent = ' ' * (max_width - line_number.to_s.size) # TODO: Optimize (10^x)
res = "<span class=\"no\">#{indent}#{line_number_text}</span> "
line_number += 1
line_number = bolding.call line
indent = ' ' * (max_width - line.to_s.size)
res = "<span class=\"no\">#{indent}#{line_number}</span> "
line += 1
res
end
......@@ -75,12 +65,12 @@ module Encoders
# Because even monospace fonts seem to have different heights when bold,
# I make the newline bold, both in the code and the line numbers.
# FIXME Still not working perfect for Mr. Internet Exploder
# FIXME Firefox struggles with very long codes (> 200 lines)
line_numbers = (start ... start + line_count).to_a.map(&bolding).join("\n")
line_numbers << "\n" # also for Mr. MS Internet Exploder :-/
line_numbers.gsub!(/\n/) { "<tt>\n</tt>" }
line_numbers_table_tpl = TABLE.apply('LINE_NUMBERS', line_numbers)
gsub!(/<\/div>\n/) { '</div>' }
gsub!(/\n/) { "<tt>\n</tt>" }
wrap_in! line_numbers_table_tpl
@wrapped_in = :div
......@@ -100,9 +90,8 @@ module Encoders
end
close = '</span>' * opened_tags.size
"<li>#{open}#{line}#{close}</li>\n"
"<li>#{open}#{line}#{close}</li>"
end
chomp!("\n")
wrap_in! LIST
@wrapped_in = :div
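The `:inline` branch above prefixes every line with a right-aligned number and optionally emphasises every n-th one. The sketch below shows that approach on a plain String; `number_lines` is a hypothetical helper, and the markup it emits is only modelled on the diff.

```ruby
# Hypothetical helper mirroring the :inline mode: prefix each line with a
# right-aligned number and make every +bold_every+-th number bold.
# Pass false for +bold_every+ to disable bolding.
def number_lines(code, start = 1, bold_every = 10)
  raise ArgumentError, 'bold_every must not be 0' if bold_every == 0
  line_count = code.count("\n") + 1
  max_width = (start + line_count).to_s.size
  line = start - 1
  code.gsub(/^/) do
    line += 1
    label = line.to_s
    label = "<strong>#{label}</strong>" if bold_every && (line % bold_every).zero?
    indent = ' ' * (max_width - line.to_s.size)
    "<span class=\"no\">#{indent}#{label}</span> "
  end
end

puts number_lines("a = 1\nb = 2\nputs a + b", 9, 10)
```

The indent is computed from the unformatted number so that the bold markup does not disturb the alignment.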
......
......@@ -86,11 +86,6 @@ module Encoders
Template.wrap! self, template, 'CONTENT'
self
end
def apply_title! title
self.sub!(/(<title>)(<\/title>)/) { $1 + title + $2 }
self
end
def wrap! element, *args
return self if not element or element == wrapped_in
......@@ -105,10 +100,6 @@ module Encoders
wrap! :div if wrapped_in? nil
raise "Can't wrap %p in %p" % [wrapped_in, element] unless wrapped_in? :div
wrap_in! Output.page_template_for_css(@css)
if args.first.is_a?(Hash) && title = args.first[:title]
apply_title! title
end
self
when nil
return self
else
......@@ -175,9 +166,7 @@ module Encoders
# title="double click to expand"
LIST = <<-`LIST`
<ol class="CodeRay">
<%CONTENT%>
</ol>
<ol class="CodeRay"><%CONTENT%></ol>
LIST
PAGE = <<-`PAGE`
......@@ -186,7 +175,7 @@ module Encoders
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="de">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title></title>
<title>CodeRay HTML Encoder Example</title>
<style type="text/css">
<%CSS%>
</style>
......
......@@ -9,10 +9,11 @@ module Encoders
register_for :page
DEFAULT_OPTIONS = HTML::DEFAULT_OPTIONS.merge \
DEFAULT_OPTIONS = HTML::DEFAULT_OPTIONS.merge({
:css => :class,
:wrap => :page,
:line_numbers => :table
})
end
......
......@@ -9,9 +9,10 @@ module Encoders
register_for :span
DEFAULT_OPTIONS = HTML::DEFAULT_OPTIONS.merge \
DEFAULT_OPTIONS = HTML::DEFAULT_OPTIONS.merge({
:css => :style,
:wrap => :span
:wrap => :span,
})
end
......
......@@ -14,16 +14,16 @@ module Encoders
protected
def setup options
super
@out = ''
@sep = options[:separator]
end
def text_token text, kind
text + @sep
def token text, kind
@out << text + @sep if text.is_a? ::String
end
def finish options
super.chomp @sep
@out.chomp @sep
end
end
......
module CodeRay
module Encoders
# The Tokens encoder converts the tokens to a simple
# readable format. It doesn't use colors and is mainly
# intended for console output.
#
# The tokens are converted with Tokens.write_token.
#
# The format is:
#
# <token-kind> \t <escaped token-text> \n
#
# Example:
#
# require 'coderay'
# puts CodeRay.scan("puts 3 + 4", :ruby).tokens
#
# prints:
#
# ident puts
# space
# integer 3
# space
# operator +
# space
# integer 4
#
class Tokens < Encoder
include Streamable
register_for :tokens
FILE_EXTENSION = 'tok'
protected
def token text, kind
@out << CodeRay::Tokens.write_token(text, kind)
end
end
end
end
......@@ -29,7 +29,6 @@ module Encoders
end
def finish options
@out = ''
@doc.write @out, options[:pretty], options[:transitive], true
@out
end
......
#!/usr/bin/env ruby
module CodeRay
# = FileType
......@@ -34,12 +33,12 @@ module FileType
# That means you can get filetypes from files that don't exist.
def [] filename, read_shebang = false
name = File.basename filename
ext = File.extname(name).sub(/^\./, '') # from last dot, delete the leading dot
ext2 = filename.to_s[/\.(.*)/, 1] # from first dot
ext = File.extname name
ext.sub!(/^\./, '') # delete the leading dot
type =
TypeFromExt[ext] ||
TypeFromExt[ext.downcase] ||
(TypeFromExt[ext2.downcase] if ext2) ||
TypeFromName[name] ||
TypeFromName[name.downcase]
type ||= shebang(filename) if read_shebang
......@@ -50,11 +49,8 @@ module FileType
def shebang filename
begin
File.open filename, 'r' do |f|
if first_line = f.gets
if type = first_line[TypeFromShebang]
type.to_sym
end
end
first_line = f.gets
first_line[TypeFromShebang]
end
rescue IOError
nil
......@@ -81,41 +77,27 @@ module FileType
end
TypeFromExt = {
'rb' => :ruby,
'rbw' => :ruby,
'rake' => :ruby,
'mab' => :ruby,
'cpp' => :c,
'c' => :c,
'cpp' => :cpp,
'css' => :css,
'diff' => :diff,
'dpr' => :delphi,
'groovy' => :groovy,
'gvy' => :groovy,
'h' => :c,
'java' => :java,
'js' => :javascript,
'xml' => :xml,
'htm' => :html,
'html' => :html,
'html.erb' => :rhtml,
'java' => :java,
'js' => :java_script,
'json' => :json,
'mab' => :ruby,
'pas' => :delphi,
'patch' => :diff,
'php' => :php,
'php3' => :php,
'php4' => :php,
'php5' => :php,
'py' => :python,
'py3' => :python,
'pyw' => :python,
'rake' => :ruby,
'xhtml' => :xhtml,
'raydebug' => :debug,
'rb' => :ruby,
'rbw' => :ruby,
'rhtml' => :rhtml,
'rxml' => :ruby,
'sch' => :scheme,
'sql' => :sql,
'ss' => :scheme,
'xhtml' => :xhtml,
'xml' => :xml,
'sch' => :scheme,
'yaml' => :yaml,
'yml' => :yaml,
}
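The lookup in `FileType.[]` tries the extension first, then its lower-case form, then the bare file name. A compact sketch of that fallback chain follows, using a tiny assumed subset of the tables; `file_type`, `TYPE_FROM_EXT` and `TYPE_FROM_NAME` are hypothetical names for illustration.

```ruby
# Tiny, assumed subset of the tables shown in the diff.
TYPE_FROM_EXT  = { 'rb' => :ruby, 'c' => :c, 'yml' => :yaml, 'xhtml' => :xhtml }.freeze
TYPE_FROM_NAME = { 'Rakefile' => :ruby, 'Rantfile' => :ruby }.freeze

# Hypothetical helper: guess a file type from the name alone.
# The file does not have to exist.
def file_type(filename)
  name = File.basename(filename.to_s)
  ext = File.extname(name).sub(/\A\./, '') # text after the last dot
  TYPE_FROM_EXT[ext] ||
    TYPE_FROM_EXT[ext.downcase] ||
    TYPE_FROM_NAME[name] ||
    TYPE_FROM_NAME[name.downcase]
end

p file_type('test.java.rb')
p file_type('/usr/bin/something/Rakefile')
p file_type('TEST.YML')
p file_type('unknown.xyz')
```

Calling `to_s` on the argument is what lets a `Pathname` be passed as well as a String.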
......@@ -133,16 +115,15 @@ end
if $0 == __FILE__
$VERBOSE = true
eval DATA.read, nil, $0, __LINE__ + 4
eval DATA.read, nil, $0, __LINE__+4
end
__END__
require 'test/unit'
class FileTypeTests < Test::Unit::TestCase
include CodeRay
class TC_FileType < Test::Unit::TestCase
def test_fetch
assert_raise FileType::UnknownFileType do
FileType.fetch ''
......@@ -169,8 +150,6 @@ class FileTypeTests < Test::Unit::TestCase
def test_ruby
assert_equal :ruby, FileType['test.rb']
assert_equal :ruby, FileType['test.java.rb']
assert_equal :java, FileType['test.rb.java']
assert_equal :ruby, FileType['C:\\Program Files\\x\\y\\c\\test.rbw']
assert_equal :ruby, FileType['/usr/bin/something/Rakefile']
assert_equal :ruby, FileType['~/myapp/gem/Rantfile']
......@@ -195,7 +174,6 @@ class FileTypeTests < Test::Unit::TestCase
assert_equal :xhtml, FileType['test.xhtml']
assert_equal :xhtml, FileType['test.html.xhtml']
assert_equal :rhtml, FileType['_form.rhtml']
assert_equal :rhtml, FileType['_form.html.erb']
end
def test_yaml
......@@ -205,16 +183,7 @@ class FileTypeTests < Test::Unit::TestCase
assert_not_equal :yaml, FileType['YAML']
end
def test_pathname
require 'pathname'
pn = Pathname.new 'test.rb'
assert_equal :ruby, FileType[pn]
dir = Pathname.new '/etc/var/blubb'
assert_equal :ruby, FileType[dir + pn]
assert_equal :cpp, FileType[dir + 'test.cpp']
end
def test_no_shebang
def test_shebang
dir = './test'
if File.directory? dir
Dir.chdir dir do
......@@ -222,19 +191,5 @@ class FileTypeTests < Test::Unit::TestCase
end
end
end
def test_shebang_empty_file
require 'tmpdir'
tmpfile = File.join(Dir.tmpdir, 'bla')
File.open(tmpfile, 'w') { } # touch
assert_equal nil, FileType[tmpfile]
end
def test_shebang
require 'tmpdir'
tmpfile = File.join(Dir.tmpdir, 'bla')
File.open(tmpfile, 'w') { |f| f.puts '#!/usr/bin/env ruby' }
assert_equal :ruby, FileType[tmpfile, true]
end
end
......@@ -2,7 +2,7 @@
#
# A simplified interface to the gzip library +zlib+ (from the Ruby Standard Library.)
#
# Author: murphy (mail to murphy rubychan de)
# Author: murphy (mail to murphy cYcnus de)
#
# Version: 0.2 (2005.may.28)
#
......
......@@ -2,6 +2,8 @@ module CodeRay
# = PluginHost
#
# $Id: plugin.rb 220 2007-01-01 02:58:58Z murphy $
#
# A simple subclass plugin system.
#
# Example:
......@@ -20,7 +22,7 @@ module CodeRay
#
# Generators[:fancy] #-> FancyGenerator
# # or
# CodeRay.require_plugin 'Generators/fancy'
# require_plugin 'Generators/fancy'
module PluginHost
# Raised if Encoders::[] fails because:
......@@ -133,13 +135,9 @@ module PluginHost
# map :navy => :dark_blue
# default :gray
# end
def default id = nil
if id
id = validate_id id
plugin_hash[nil] = id
else
plugin_hash[nil]
end
def default id
id = validate_id id
plugin_hash[nil] = id
end
# Every plugin must register itself for one or more
......@@ -176,7 +174,7 @@ module PluginHost
def inspect
map = plugin_hash.dup
map.each do |id, plugin|
map[id] = plugin.to_s[/(?>\w+)$/]
map[id] = plugin.to_s[/(?>[\w_]+)$/]
end
"#{name}[#{host_id}]#{map.inspect}"
end
......@@ -243,7 +241,7 @@ protected
id
elsif id.is_a? String
if id[/\w+/] == id
id.downcase.to_sym
id.to_sym
else
raise ArgumentError, "Invalid id: '#{id}' given."
end
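The id check above accepts Symbols as they are and converts Strings made only of word characters. The diff contains both a case-folding and a case-preserving conversion; the sketch below assumes the case-folding one. `validate_id` here is a stand-alone illustration, and the message for other argument types is an assumption.

```ruby
# Hypothetical stand-alone version of the id check shown in the diff.
def validate_id(id)
  case id
  when Symbol
    id
  when String
    raise ArgumentError, "Invalid id: '#{id}' given." unless id =~ /\A\w+\z/
    id.downcase.to_sym # assumption: fold case, as one side of the diff does
  else
    raise ArgumentError, "String or Symbol expected, but #{id.class} given."
  end
end

p validate_id('Ruby')
p validate_id(:java_script)
```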
......@@ -281,14 +279,6 @@ module Plugin
plugin_host.register self, *ids
end
def title title = nil
if title
@title = title.to_s
else
@title ||= name[/([^:]+)$/, 1]
end
end
# The host for this Plugin class.
def plugin_host host = nil
if host and not host.is_a? PluginHost
......@@ -309,23 +299,15 @@ module Plugin
#
# The above example loads the file myplugin/my_helper.rb relative to the
# file in which MyPlugin was defined.
#
# You can also load a helper from a different plugin:
#
# helper 'other_plugin/other_helper'
def helper *helpers
for helper in helpers
if helper.is_a?(String) && helper[/\//]
self::PLUGIN_HOST.require_helper $`, $'
else
self::PLUGIN_HOST.require_helper plugin_id, helper.to_s
end
self::PLUGIN_HOST.require_helper plugin_id, helper.to_s
end
end
# Returns the plugin id used by the engine.
def plugin_id
name[/\w+$/].downcase
name[/[\w_]+$/].downcase
end
end
......@@ -336,7 +318,7 @@ end
# CodeRay.require_plugin '<Host ID>/<Plugin ID>'
#
# Returns the loaded plugin.
def self.require_plugin path
def require_plugin path
host_id, plugin_id = path.split '/', 2
host = PluginHost.host_by_id(host_id)
raise PluginHost::HostNotFound,
......
......@@ -104,7 +104,6 @@ class CaseIgnoringWordList < WordList
h[k] = h.fetch k.downcase, default
end
else
super(default, false)
def self.[] key # :nodoc:
super(key.downcase)
end
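The `CaseIgnoringWordList` hunk above relies on a Hash default block that retries a failed lookup with the lower-cased key. A minimal sketch of that mechanism using a plain Hash is given below; `case_ignoring_word_list` is a hypothetical helper, not the WordList API.

```ruby
# Hypothetical helper: build a Hash that maps words to a kind and ignores
# case on lookup. Unknown words fall back to +default+.
def case_ignoring_word_list(default, words_by_kind)
  table = {}
  words_by_kind.each do |kind, words|
    words.each { |word| table[word.downcase] = kind }
  end
  # The default block runs only on a miss; it caches the result under the
  # original spelling so repeated lookups are plain Hash hits.
  Hash.new { |h, k| h[k] = table.fetch(k.downcase, default) }
end

kinds = case_ignoring_word_list(:ident, reserved: %w[begin end], pre_type: %w[Integer])
p kinds['BEGIN']
p kinds['integer']
p kinds['foo']
```

A test with mixed-case input, as in the assertions a caller might write, is what catches a list that silently stays case sensitive.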
......
......@@ -4,6 +4,8 @@ module CodeRay
# = Scanners
#
# $Id: scanner.rb 222 2007-01-01 16:26:17Z murphy $
#
# This module holds the Scanner class and its subclasses.
# For example, the Ruby scanner is named CodeRay::Scanners::Ruby and
# can be found in coderay/scanners/ruby.
......@@ -43,7 +45,6 @@ module CodeRay
# You can also use +map+, +any?+, +find+ and even +sort_by+,
# if you want.
class Scanner < StringScanner
extend Plugin
plugin_host Scanners
......@@ -56,8 +57,6 @@ module CodeRay
#
# Define @default_options for subclasses.
DEFAULT_OPTIONS = { :stream => false }
KINDS_NOT_LOC = [:comment, :doctype]
class << self
......@@ -67,16 +66,7 @@ module CodeRay
end
def normify code
code = code.to_s
if code.respond_to? :force_encoding
begin
code.force_encoding 'utf-8'
code[/\z/] # raises an ArgumentError when code contains a non-UTF-8 char
rescue ArgumentError
code.force_encoding 'binary'
end
end
code.to_unix
code = code.to_s.to_unix
end
def file_extension extension = nil
......@@ -85,7 +75,7 @@ module CodeRay
else
@file_extension ||= plugin_id.to_s
end
end
end
end
......@@ -131,7 +121,6 @@ module CodeRay
"but :stream is #{@options[:stream]}" if block_given?
@tokens ||= Tokens.new
end
@tokens.scanner = self
setup
end
......@@ -189,16 +178,6 @@ module CodeRay
def line
string[0..pos].count("\n") + 1
end
def column pos = self.pos
return 0 if pos <= 0
string = string()
if string.respond_to?(:bytesize) && (defined?(@bin_string) || string.bytesize != string.size)
@bin_string ||= string.dup.force_encoding(:binary)
string = @bin_string
end
pos - (string.rindex(?\n, pos) || 0)
end
protected
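The `line` and `column` methods above derive a position from newline counts and the index of the last newline before the scan pointer. The sketch below shows the same arithmetic under stated assumptions: `pos` is a character index into the string and both results are 1-based, which is a convention chosen here and not necessarily the one CodeRay uses. `line_and_column` is a hypothetical helper.

```ruby
# Hypothetical helper: 1-based line and column of character index +pos+.
def line_and_column(string, pos)
  before = string[0, pos]              # text strictly before pos
  line = before.count("\n") + 1
  last_newline = before.rindex("\n")
  column = last_newline ? pos - last_newline : pos + 1
  [line, column]
end

code = "def x\n  42\nend"
p line_and_column(code, 0) # start of the first line
p line_and_column(code, 8) # the "4" on the second line
```

A real scanner position is a byte offset, so text with multibyte characters needs a byte-oriented copy of the string, which is what the binary-encoded duplicate in the diff is for.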
......@@ -223,7 +202,6 @@ module CodeRay
def reset_instance
@tokens.clear unless @options[:keep_tokens]
@cached_tokens = nil
@bin_string = nil if defined? @bin_string
end
# Scanner error with additional status information
......@@ -236,7 +214,7 @@ module CodeRay
tokens:
%s
current line: %d column: %d pos: %d
current line: %d pos = %d
matched: %p state: %p
bol? = %p, eos? = %p
......@@ -251,10 +229,10 @@ surrounding code:
msg,
tokens.size,
tokens.last(10).map { |t| t.inspect }.join("\n"),
line, column, pos,
line, pos,
matched, state, bol?, eos?,
string[pos - ambit, ambit],
string[pos, ambit],
string[pos-ambit,ambit],
string[pos,ambit],
]
end
......
module CodeRay
module Scanners
map \
:h => :c,
:cplusplus => :cpp,
:'c++' => :cpp,
:ecma => :java_script,
:ecmascript => :java_script,
:ecma_script => :java_script,
:irb => :ruby,
:javascript => :java_script,
:js => :java_script,
:nitro => :nitro_xhtml,
:pascal => :delphi,
map :cpp => :c,
:plain => :plaintext,
:xhtml => :html,
:yml => :yaml
:pascal => :delphi,
:irb => :ruby,
:xml => :html,
:xhtml => :nitro_xhtml,
:nitro => :nitro_xhtml
default :plain
......
......@@ -3,20 +3,22 @@ module Scanners
class C < Scanner
include Streamable
register_for :c
file_extension 'c'
include Streamable
RESERVED_WORDS = [
'asm', 'break', 'case', 'continue', 'default', 'do',
'else', 'enum', 'for', 'goto', 'if', 'return',
'sizeof', 'struct', 'switch', 'typedef', 'union', 'while',
'restrict', # C99
'asm', 'break', 'case', 'continue', 'default', 'do', 'else',
'for', 'goto', 'if', 'return', 'switch', 'while',
'struct', 'union', 'enum', 'typedef',
'static', 'register', 'auto', 'extern',
'sizeof',
'volatile', 'const', # C89
'inline', 'restrict', # C99
]
PREDEFINED_TYPES = [
'int', 'long', 'short', 'char',
'int', 'long', 'short', 'char', 'void',
'signed', 'unsigned', 'float', 'double',
'bool', 'complex', # C99
]
......@@ -25,19 +27,13 @@ module Scanners
'EOF', 'NULL',
'true', 'false', # C99
]
DIRECTIVES = [
'auto', 'extern', 'register', 'static', 'void',
'const', 'volatile', # C89
'inline', # C99
]
IDENT_KIND = WordList.new(:ident).
add(RESERVED_WORDS, :reserved).
add(PREDEFINED_TYPES, :pre_type).
add(DIRECTIVES, :directive).
add(PREDEFINED_CONSTANTS, :pre_constant)
ESCAPE = / [rbfntv\n\\'"] | x[a-fA-F0-9]{1,2} | [0-7]{1,3} /x
ESCAPE = / [rbfnrtv\n\\'"] | x[a-fA-F0-9]{1,2} | [0-7]{1,3} /x
UNICODE_ESCAPE = / u[a-fA-F0-9]{4} | U[a-fA-F0-9]{8} /x
def scan_tokens tokens, options
......@@ -63,19 +59,16 @@ module Scanners
match << scan_until(/ ^\# (?:elif|else|endif) .*? $ | \z /xm) unless eos?
kind = :comment
elsif scan(/ [-+*=<>?:;,!&^|()\[\]{}~%]+ | \/=? | \.(?!\d) /x)
elsif scan(/ [-+*\/=<>?:;,!&^|()\[\]{}~%]+ | \.(?!\d) /x)
kind = :operator
elsif match = scan(/ [A-Za-z_][A-Za-z_0-9]* /x)
kind = IDENT_KIND[match]
if kind == :ident and check(/:(?!:)/)
# FIXME: don't match a?b:c
match << scan(/:/)
kind = :label
end
elsif scan(/\$/)
kind = :ident
elsif match = scan(/L?"/)
tokens << [:open, :string]
if match[0] == ?L
......@@ -98,7 +91,7 @@ module Scanners
elsif scan(/(?:0[0-7]+)(?![89.eEfF])/)
kind = :oct
elsif scan(/(?:\d+)(?![.eEfF])L?L?/)
elsif scan(/(?:\d+)(?![.eEfF])/)
kind = :integer
elsif scan(/\d[fF]?|\d*\.\d+(?:[eE][+-]?\d+)?[fF]?|\d+[eE][+-]?\d+[fF]?/)
......@@ -129,7 +122,7 @@ module Scanners
end
when :include_expected
if scan(/<[^>\n]+>?|"[^"\n\\]*(?:\\.[^"\n\\]*)*"?/)
if scan(/[^\n]+/)
kind = :include
state = :initial
......@@ -138,8 +131,8 @@ module Scanners
state = :initial if match.index ?\n
else
state = :initial
next
getch
kind = :error
end
......
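The `IDENT_KIND` lookup in the C scanner classifies an identifier with a single hash read. A minimal sketch of that idea, assuming a hypothetical `TinyWordList` class (not CodeRay's `WordList`) and a shortened word set:

```ruby
# Minimal stand-in for a word list: a Hash whose default value is the
# kind returned for unknown words. Not CodeRay's WordList class.
class TinyWordList < Hash
  def initialize(default)
    super(default)
  end

  # Map every word in `words` to `kind`; returns self so calls chain.
  def add(words, kind)
    words.each { |word| self[word] = kind }
    self
  end
end

# Illustrative subset of the word lists shown in the diff.
IDENT_KIND_SKETCH = TinyWordList.new(:ident).
  add(%w[if else while return], :reserved).
  add(%w[int char void], :pre_type).
  add(%w[NULL EOF], :pre_constant)
```

Moving a word between lists, as the diff does with `void` and `static`, only changes which kind this lookup returns for it.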
......@@ -6,8 +6,6 @@ module Scanners
include Streamable
register_for :debug
file_extension 'raydebug'
title 'CodeRay Token Dump'
protected
def scan_tokens tokens, options
......
......@@ -4,7 +4,6 @@ module Scanners
class Delphi < Scanner
register_for :delphi
file_extension 'pas'
RESERVED_WORDS = [
'and', 'array', 'as', 'at', 'asm', 'at', 'begin', 'case', 'class',
......
......@@ -2,17 +2,12 @@ module CodeRay
module Scanners
# HTML Scanner
#
# $Id$
class HTML < Scanner
include Streamable
register_for :html
KINDS_NOT_LOC = [
:comment, :doctype, :preprocessor,
:tag, :attribute_name, :operator,
:attribute_value, :delimiter, :content,
:plain, :entity, :error
]
ATTR_NAME = /[\w.:-]+/
ATTR_VALUE_UNQUOTED = ATTR_NAME
......@@ -70,14 +65,14 @@ module Scanners
if scan(/<!--.*?-->/m)
kind = :comment
elsif scan(/<!DOCTYPE.*?>/m)
kind = :doctype
kind = :preprocessor
elsif scan(/<\?xml.*?\?>/m)
kind = :preprocessor
elsif scan(/<\?.*?\?>|<%.*?%>/m)
kind = :comment
elsif scan(/<\/[-\w.:]*>/m)
elsif scan(/<\/[-\w_.:]*>/m)
kind = :tag
elsif match = scan(/<[-\w.:]+>?/m)
elsif match = scan(/<[-\w_.:]+>?/m)
kind = :tag
state = :attribute unless match[-1] == ?>
elsif scan(/[^<>&]+/)
......
module CodeRay
module Scanners
class Java < Scanner
register_for :java
RESERVED_WORDS = %w(abstract assert break case catch class
const continue default do else enum extends final finally for
goto if implements import instanceof interface native new
package private protected public return static strictfp super switch
synchronized this throw throws transient try void volatile while)
PREDEFINED_TYPES = %w(boolean byte char double float int long short)
PREDEFINED_CONSTANTS = %w(true false null)
IDENT_KIND = WordList.new(:ident).
add(RESERVED_WORDS, :reserved).
add(PREDEFINED_TYPES, :pre_type).
add(PREDEFINED_CONSTANTS, :pre_constant)
ESCAPE = / [rbfnrtv\n\\'"] | x[a-fA-F0-9]{1,2} | [0-7]{1,3} /x
UNICODE_ESCAPE = / u[a-fA-F0-9]{4} | U[a-fA-F0-9]{8} /x
def scan_tokens tokens, options
state = :initial
until eos?
kind = nil
match = nil
case state
when :initial
if scan(/ \s+ | \\\n /x)
kind = :space
elsif scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx)
kind = :comment
elsif match = scan(/ \# \s* if \s* 0 /x)
match << scan_until(/ ^\# (?:elif|else|endif) .*? $ | \z /xm) unless eos?
kind = :comment
elsif scan(/ [-+*\/=<>?:;,!&^|()\[\]{}~%]+ | \.(?!\d) /x)
kind = :operator
elsif match = scan(/ [A-Za-z_][A-Za-z_0-9]* /x)
kind = IDENT_KIND[match]
if kind == :ident and check(/:(?!:)/)
match << scan(/:/)
kind = :label
end
elsif match = scan(/L?"/)
tokens << [:open, :string]
if match[0] == ?L
tokens << ['L', :modifier]
match = '"'
end
state = :string
kind = :delimiter
elsif scan(%r! \@ .* !x)
kind = :preprocessor
elsif scan(/ L?' (?: [^\'\n\\] | \\ #{ESCAPE} )? '? /ox)
kind = :char
elsif scan(/0[xX][0-9A-Fa-f]+/)
kind = :hex
elsif scan(/(?:0[0-7]+)(?![89.eEfF])/)
kind = :oct
elsif scan(/(?:\d+)(?![.eEfF])/)
kind = :integer
elsif scan(/\d[fF]?|\d*\.\d+(?:[eE][+-]?\d+)?[fF]?|\d+[eE][+-]?\d+[fF]?/)
kind = :float
else
getch
kind = :error
end
when :string
if scan(/[^\\\n"]+/)
kind = :content
elsif scan(/"/)
tokens << ['"', :delimiter]
tokens << [:close, :string]
state = :initial
next
elsif scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox)
kind = :char
elsif scan(/ \\ | $ /x)
tokens << [:close, :string]
kind = :error
state = :initial
else
raise_inspect "else case \" reached; %p not handled." % peek(1), tokens
end
else
raise_inspect 'Unknown state', tokens
end
match ||= matched
if $DEBUG and not kind
raise_inspect 'Error token %p in line %d' %
[[match, kind], line], tokens
end
raise_inspect 'Empty token', tokens unless match
tokens << [match, kind]
end
if state == :string
tokens << [:close, :string]
end
tokens
end
end
end
end
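The Java scanner above is a small state machine: an `:initial` state that recognises token starts, and a `:string` state entered after an opening quote. A reduced sketch of the same loop, assuming a hypothetical `tiny_scan` helper built directly on `StringScanner` rather than on CodeRay's `Scanner` class:

```ruby
require 'strscan'

# Sketch of a two-state scanner loop; the token kinds mirror the ones
# in the diff, but this is not CodeRay's Scanner class.
def tiny_scan(code)
  s = StringScanner.new(code)
  tokens = []
  state = :initial
  until s.eos?
    case state
    when :initial
      if s.scan(/\s+/)
        tokens << [s.matched, :space]
      elsif s.scan(/[A-Za-z_][A-Za-z_0-9]*/)
        tokens << [s.matched, :ident]
      elsif s.scan(/\d+/)
        tokens << [s.matched, :integer]
      elsif s.scan(/"/)
        tokens << [:open, :string]
        tokens << [s.matched, :delimiter]
        state = :string
      else
        tokens << [s.getch, :error]      # always consume something
      end
    when :string
      if s.scan(/[^\\\n"]+/)
        tokens << [s.matched, :content]
      elsif s.scan(/"/)
        tokens << [s.matched, :delimiter]
        tokens << [:close, :string]
        state = :initial
      elsif s.scan(/\\./)
        tokens << [s.matched, :char]
      else
        # stray backslash or newline inside the string
        tokens << [s.getch, :error]
        tokens << [:close, :string]
        state = :initial
      end
    end
  end
  # An unterminated string is still closed, as in the scanner above.
  tokens << [:close, :string] if state == :string
  tokens
end
```

Every branch consumes at least one character, which is what keeps the `until s.eos?` loop from spinning on unexpected input.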
# http://pastie.textmate.org/50774/
module CodeRay module Scanners
class JavaScript < Scanner
register_for :javascript
RESERVED_WORDS = [
'asm', 'break', 'case', 'continue', 'default', 'do', 'else',
'for', 'goto', 'if', 'return', 'switch', 'while',
# 'struct', 'union', 'enum', 'typedef',
# 'static', 'register', 'auto', 'extern',
# 'sizeof',
'typeof',
# 'volatile', 'const', # C89
# 'inline', 'restrict', # C99
'var', 'function','try','new','in',
'instanceof','throw','catch'
]
PREDEFINED_CONSTANTS = [
'void', 'null', 'this',
'true', 'false','undefined',
]
IDENT_KIND = WordList.new(:ident).
add(RESERVED_WORDS, :reserved).
add(PREDEFINED_CONSTANTS, :pre_constant)
ESCAPE = / [rbfnrtv\n\\\/'"] | x[a-fA-F0-9]{1,2} | [0-7]{1,3} /x
UNICODE_ESCAPE = / u[a-fA-F0-9]{4} | U[a-fA-F0-9]{8} /x
def scan_tokens tokens, options
state = :initial
string_type = nil
regexp_allowed = true
until eos?
kind = :error
match = nil
if state == :initial
if scan(/ \s+ | \\\n /x)
kind = :space
elsif scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx)
kind = :comment
regexp_allowed = false
elsif match = scan(/ \# \s* if \s* 0 /x)
match << scan_until(/ ^\# (?:elif|else|endif) .*? $ | \z /xm) unless eos?
kind = :comment
regexp_allowed = false
elsif regexp_allowed and scan(/\//)
tokens << [:open, :regexp]
state = :regex
kind = :delimiter
elsif scan(/ [-+*\/=<>?:;,!&^|()\[\]{}~%] | \.(?!\d) /x)
kind = :operator
regexp_allowed=true
elsif match = scan(/ [$A-Za-z_][A-Za-z_0-9]* /x)
kind = IDENT_KIND[match]
# if kind == :ident and check(/:(?!:)/)
# match << scan(/:/)
# kind = :label
# end
regexp_allowed=false
elsif match = scan(/["']/)
tokens << [:open, :string]
string_type = matched
state = :string
kind = :delimiter
# elsif scan(/#\s*(\w*)/)
# kind = :preprocessor # FIXME multiline preprocs
# state = :include_expected if self[1] == 'include'
#
# elsif scan(/ L?' (?: [^\'\n\\] | \\ #{ESCAPE} )? '? /ox)
# kind = :char
elsif scan(/0[xX][0-9A-Fa-f]+/)
kind = :hex
regexp_allowed=false
elsif scan(/(?:0[0-7]+)(?![89.eEfF])/)
kind = :oct
regexp_allowed=false
elsif scan(/(?:\d+)(?![.eEfF])/)
kind = :integer
regexp_allowed=false
elsif scan(/\d[fF]?|\d*\.\d+(?:[eE][+-]?\d+)?[fF]?|\d+[eE][+-]?\d+[fF]?/)
kind = :float
regexp_allowed=false
else
getch
end
elsif state == :regex
if scan(/[^\\\/]+/)
kind = :content
elsif scan(/\\\/|\\\\/)
kind = :content
elsif scan(/\//)
tokens << [matched, :delimiter]
tokens << [:close, :regexp]
state = :initial
next
else
getch
kind = :content
end
elsif state == :string
if scan(/[^\\"']+/)
kind = :content
elsif scan(/["']/)
if string_type==matched
tokens << [matched, :delimiter]
tokens << [:close, :string]
state = :initial
string_type=nil
next
else
kind = :content
end
elsif scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox)
kind = :char
elsif scan(/ \\ | $ /x)
kind = :error
state = :initial
else
raise_inspect "else case \" reached; %p not handled." % peek(1), tokens
end
# elsif state == :include_expected
# if scan(/<[^>\n]+>?|"[^"\n\\]*(?:\\.[^"\n\\]*)*"?/)
# kind = :include
# state = :initial
#
# elsif match = scan(/\s+/)
# kind = :space
# state = :initial if match.index ?\n
#
# else
# getch
#
# end
#
else
raise_inspect 'else-case reached', tokens
end
match ||= matched
# raise [match, kind], tokens if kind == :error
tokens << [match, kind]
end
tokens
end
end
end end
\ No newline at end of file
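The JavaScript scanner above uses a `regexp_allowed` flag to decide whether a `/` starts a regular expression or is a division operator. A reduced sketch of that heuristic, assuming a hypothetical `classify_slashes` helper that reports only the slash tokens and matches a whole regexp literal in one step:

```ruby
require 'strscan'

# Sketch: decide whether each "/" opens a regexp or is a division
# operator, using a flag that records what kind of token came before.
def classify_slashes(code)
  s = StringScanner.new(code)
  regexp_allowed = true            # at the start, a value is expected
  result = []
  until s.eos?
    if s.scan(/\s+/)
      # whitespace leaves the flag unchanged
    elsif s.scan(/[$A-Za-z_][A-Za-z_0-9]*|\d+/)
      regexp_allowed = false       # after a value, "/" means division
    elsif regexp_allowed && s.scan(%r{/(?:\\.|[^\\/\n])*/})
      result << [s.matched, :regexp]
      regexp_allowed = false
    elsif s.scan(%r{/})
      result << [s.matched, :operator]
      regexp_allowed = true
    elsif s.scan(/[-+*=<>?:;,!&^|()\[\]{}~%.]/)
      regexp_allowed = true        # after an operator, a value is expected
    else
      s.getch                      # skip anything else
    end
  end
  result
end
```

Like the scanner in the diff, this treats closing brackets as operators, so an input such as `(a) / b` would be misread; that limitation is inherent to the one-flag approach.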
......@@ -5,15 +5,13 @@ module Scanners
load :ruby
# Nitro XHTML Scanner
#
# $Id$
class NitroXHTML < Scanner
include Streamable
register_for :nitro_xhtml
file_extension :xhtml
title 'Nitro XHTML'
KINDS_NOT_LOC = HTML::KINDS_NOT_LOC
NITRO_RUBY_BLOCK = /
<\?r
(?>
......
module CodeRay module Scanners
class PHP < Scanner
register_for :php
RESERVED_WORDS = [
'and', 'or', 'xor', '__FILE__', 'exception', '__LINE__', 'array', 'as', 'break', 'case',
'class', 'const', 'continue', 'declare', 'default',
'die', 'do', 'echo', 'else', 'elseif',
'empty', 'enddeclare', 'endfor', 'endforeach', 'endif',
'endswitch', 'endwhile', 'eval', 'exit', 'extends',
'for', 'foreach', 'function', 'global', 'if',
'include', 'include_once', 'isset', 'list', 'new',
'print', 'require', 'require_once', 'return', 'static',
'switch', 'unset', 'use', 'var', 'while',
'__FUNCTION__', '__CLASS__', '__METHOD__', 'final', 'php_user_filter',
'interface', 'implements', 'extends', 'public', 'private',
'protected', 'abstract', 'clone', 'try', 'catch',
'throw', 'cfunction', 'old_function'
]
PREDEFINED_CONSTANTS = [
'null', '$this', 'true', 'false'
]
IDENT_KIND = WordList.new(:ident).
add(RESERVED_WORDS, :reserved).
add(PREDEFINED_CONSTANTS, :pre_constant)
ESCAPE = / [\$\wrbfnrtv\n\\\/'"] | x[a-fA-F0-9]{1,2} | [0-7]{1,3} /x
UNICODE_ESCAPE = / u[a-fA-F0-9]{4} | U[a-fA-F0-9]{8} /x
def scan_tokens tokens, options
state = :waiting_php
string_type = nil
regexp_allowed = true
until eos?
kind = :error
match = nil
if state == :initial
if scan(/ \s+ | \\\n /x)
kind = :space
elsif scan(/\?>/)
kind = :char
state = :waiting_php
elsif scan(%r{ (//|\#) [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) }mx)
kind = :comment
regexp_allowed = false
elsif match = scan(/ \# \s* if \s* 0 /x)
match << scan_until(/ ^\# (?:elif|else|endif) .*? $ | \z /xm) unless eos?
kind = :comment
regexp_allowed = false
elsif regexp_allowed and scan(/\//)
tokens << [:open, :regexp]
state = :regex
kind = :delimiter
elsif scan(/ [-+*\/=<>?:;,!&^|()\[\]{}~%] | \.(?!\d) /x)
kind = :operator
regexp_allowed=true
elsif match = scan(/ [$@A-Za-z_][A-Za-z_0-9]* /x)
kind = IDENT_KIND[match]
regexp_allowed=false
elsif match = scan(/["']/)
tokens << [:open, :string]
string_type = matched
state = :string
kind = :delimiter
elsif scan(/0[xX][0-9A-Fa-f]+/)
kind = :hex
regexp_allowed=false
elsif scan(/(?:0[0-7]+)(?![89.eEfF])/)
kind = :oct
regexp_allowed=false
elsif scan(/(?:\d+)(?![.eEfF])/)
kind = :integer
regexp_allowed=false
elsif scan(/\d[fF]?|\d*\.\d+(?:[eE][+-]?\d+)?[fF]?|\d+[eE][+-]?\d+[fF]?/)
kind = :float
regexp_allowed=false
else
getch
end
elsif state == :regex
if scan(/[^\\\/]+/)
kind = :content
elsif scan(/\\\/|\\/)
kind = :content
elsif scan(/\//)
tokens << [matched, :delimiter]
tokens << [:close, :regexp]
state = :initial
next
else
getch
kind = :content
end
elsif state == :string
if scan(/[^\\"']+/)
kind = :content
elsif scan(/["']/)
if string_type==matched
tokens << [matched, :delimiter]
tokens << [:close, :string]
state = :initial
string_type=nil
next
else
kind = :content
end
elsif scan(/ \\ (?: \S ) /mox)
kind = :char
elsif scan(/ \\ | $ /x)
kind = :error
state = :initial
else
raise_inspect "else case \" reached; %p not handled." % peek(1), tokens
end
elsif state == :waiting_php
if scan(/<\?php/m)
kind = :char
state = :initial
elsif scan(/[^<]+/)
kind = :comment
else
kind = :comment
getch
end
else
raise_inspect 'else-case reached', tokens
end
match ||= matched
tokens << [match, kind]
end
tokens
end
end
end end
\ No newline at end of file
......@@ -4,12 +4,9 @@ module Scanners
class Plaintext < Scanner
register_for :plaintext, :plain
title 'Plain text'
include Streamable
KINDS_NOT_LOC = [:plain]
def scan_tokens tokens, options
text = (scan_until(/\z/) || '')
tokens << [text, :plain]
......
......@@ -5,13 +5,12 @@ module Scanners
load :ruby
# RHTML Scanner
#
# $Id$
class RHTML < Scanner
include Streamable
register_for :rhtml
title 'HTML ERB Template'
KINDS_NOT_LOC = HTML::KINDS_NOT_LOC
ERB_RUBY_BLOCK = /
<%(?!%)[=-]?
......
......@@ -21,10 +21,6 @@ module Scanners
file_extension 'rb'
helper :patterns
if not defined? EncodingError
EncodingError = Class.new Exception
end
private
def scan_tokens tokens, options
......@@ -35,10 +31,9 @@ module Scanners
state = :initial
depth = nil
inline_block_stack = []
unicode = string.respond_to?(:encoding) && string.encoding.name == 'UTF-8'
patterns = Patterns # avoid constant lookup
until eos?
match = nil
kind = nil
......@@ -129,15 +124,14 @@ module Scanners
# {{{
if match = scan(/[ \t\f]+/)
kind = :space
match << scan(/\s*/) unless eos? || heredocs
value_expected = true if match.index(?\n)
match << scan(/\s*/) unless eos? or heredocs
tokens << [match, kind]
next
elsif match = scan(/\\?\n/)
kind = :space
if match == "\n"
value_expected = true
value_expected = true # FIXME not quite true
state = :initial if state == :undef_comma_expected
end
if heredocs
......@@ -152,21 +146,17 @@ module Scanners
tokens << [match, kind]
next
elsif bol? && match = scan(/\#!.*/)
tokens << [match, :doctype]
next
elsif match = scan(/\#.*/) or
( bol? and match = scan(/#{patterns::RUBYDOC_OR_DATA}/o) )
kind = :comment
value_expected = true
tokens << [match, kind]
next
elsif state == :initial
# IDENTS #
if match = scan(unicode ? /#{patterns::METHOD_NAME}/uo :
/#{patterns::METHOD_NAME}/o)
if match = scan(/#{patterns::METHOD_NAME}/o)
if last_token_dot
kind = if match[/^[A-Z]/] and not match?(/\(/) then :constant else :ident end
else
......@@ -175,12 +165,13 @@ module Scanners
kind = :constant
elsif kind == :reserved
state = patterns::DEF_NEW_STATE[match]
value_expected = :set if patterns::VALUE_EXPECTING_KEYWORDS[match]
end
end
value_expected = :set if check(/#{patterns::VALUE_FOLLOWS}/o)
## experimental!
value_expected = :set if
patterns::REGEXP_ALLOWED[match] or check(/#{patterns::VALUE_FOLLOWS}/o)
elsif last_token_dot and match = scan(/#{patterns::METHOD_NAME_OPERATOR}|\(/o)
elsif last_token_dot and match = scan(/#{patterns::METHOD_NAME_OPERATOR}/o)
kind = :ident
value_expected = :set if check(/#{patterns::VALUE_FOLLOWS}/o)
......@@ -199,7 +190,6 @@ module Scanners
depth -= 1
if depth == 0 # closing brace of inline block reached
state, depth, heredocs = inline_block_stack.pop
heredocs = nil if heredocs && heredocs.empty?
tokens << [match, :inline_delimiter]
kind = :inline
match = :close
......@@ -221,9 +211,8 @@ module Scanners
interpreted = true
state = patterns::StringState.new :regexp, interpreted, match
# elsif match = scan(/[-+]?#{patterns::NUMERIC}/o)
elsif match = value_expected ? scan(/[-+]?#{patterns::NUMERIC}/o) : scan(/#{patterns::NUMERIC}/o)
kind = self[1] ? :float : :integer
elsif match = scan(/#{patterns::NUMERIC}/o)
kind = if self[1] then :float else :integer end
elsif match = scan(/#{patterns::SYMBOL}/o)
case delim = match[1]
......@@ -285,41 +274,18 @@ module Scanners
else
kind = :error
match = (scan(/./mu) rescue nil) || getch
if !unicode && match.size > 1
# warn 'Switching to unicode mode: %p' % ['ä'[/#{patterns::METHOD_NAME}/uo]]
unicode = true
unscan
next
end
match = getch
end
elsif state == :def_expected
state = :initial
if scan(/self\./)
tokens << ['self', :pre_constant]
tokens << ['.', :operator]
end
if match = scan(unicode ? /(?>#{patterns::METHOD_NAME_EX})(?!\.|::)/uo :
/(?>#{patterns::METHOD_NAME_EX})(?!\.|::)/o)
if match = scan(/(?>#{patterns::METHOD_NAME_EX})(?!\.|::)/o)
kind = :method
else
next
end
elsif state == :module_expected
if match = scan(/<</)
kind = :operator
else
state = :initial
if match = scan(/ (?:#{patterns::IDENT}::)* #{patterns::IDENT} /ox)
kind = :class
else
next
end
end
elsif state == :undef_expected
state = :undef_comma_expected
if match = scan(/#{patterns::METHOD_NAME_EX}/o)
......@@ -341,22 +307,6 @@ module Scanners
next
end
elsif state == :alias_expected
begin
match = scan(unicode ? /(#{patterns::METHOD_NAME_OR_SYMBOL})([ \t]+)(#{patterns::METHOD_NAME_OR_SYMBOL})/uo :
/(#{patterns::METHOD_NAME_OR_SYMBOL})([ \t]+)(#{patterns::METHOD_NAME_OR_SYMBOL})/o)
rescue EncodingError
raise if $DEBUG
end
if match
tokens << [self[1], (self[1][0] == ?: ? :symbol : :method)]
tokens << [self[2], :space]
tokens << [self[3], (self[3][0] == ?: ? :symbol : :method)]
end
state = :initial
next
elsif state == :undef_comma_expected
if match = scan(/,/)
kind = :operator
......@@ -366,14 +316,24 @@ module Scanners
next
end
elsif state == :module_expected
if match = scan(/<</)
kind = :operator
else
state = :initial
if match = scan(/ (?:#{patterns::IDENT}::)* #{patterns::IDENT} /ox)
kind = :class
else
next
end
end
end
# }}}
unless kind == :error
value_expected = value_expected == :set
last_token_dot = last_token_dot == :set
end
value_expected = value_expected == :set
last_token_dot = last_token_dot == :set
if $DEBUG and not kind
raise_inspect 'Error token %p in line %d' %
[[match, kind], line], tokens, state
......
# encoding: utf-8
module CodeRay
module Scanners
......@@ -15,14 +14,19 @@ module Scanners
DEF_KEYWORDS = %w[ def ]
UNDEF_KEYWORDS = %w[ undef ]
ALIAS_KEYWORDS = %w[ alias ]
MODULE_KEYWORDS = %w[class module]
DEF_NEW_STATE = WordList.new(:initial).
add(DEF_KEYWORDS, :def_expected).
add(UNDEF_KEYWORDS, :undef_expected).
add(ALIAS_KEYWORDS, :alias_expected).
add(MODULE_KEYWORDS, :module_expected)
IDENTS_ALLOWING_REGEXP = %w[
and or not while until unless if then elsif when sub sub! gsub gsub!
scan slice slice! split
]
REGEXP_ALLOWED = WordList.new(false).
add(IDENTS_ALLOWING_REGEXP, :set)
PREDEFINED_CONSTANTS = %w[
nil true false self
DATA ARGV ARGF __FILE__ __LINE__
......@@ -32,25 +36,24 @@ module Scanners
add(RESERVED_WORDS, :reserved).
add(PREDEFINED_CONSTANTS, :pre_constant)
IDENT = 'ä'[/[[:alpha:]]/] == 'ä' ? /[[:alpha:]_][[:alnum:]_]*/ : /[^\W\d]\w*/
IDENT = /[a-z_][\w_]*/i
METHOD_NAME = / #{IDENT} [?!]? /ox
METHOD_NAME_OPERATOR = /
\*\*? # multiplication and power
| [-+~]@? # plus, minus, tilde with and without at sign
| [\/%&|^`] # division, modulo or format strings, and, or, xor, system
| [-+]@? # plus, minus
| [\/%&|^`~] # division, modulo or format strings, &and, |or, ^xor, `system`, tilde
| \[\]=? # array getter and setter
| << | >> # append or shift left, shift right
| <=?>? | >=? # comparison, rocket operator
| ===? | =~ # simple equality, case equality, match
| ![~=@]? # negation with and without at sign, not-equal and not-match
| ===? # simple equality and case equality
/ox
METHOD_NAME_EX = / #{IDENT} (?:[?!]|=(?!>))? | #{METHOD_NAME_OPERATOR} /ox
INSTANCE_VARIABLE = / @ #{IDENT} /ox
CLASS_VARIABLE = / @@ #{IDENT} /ox
OBJECT_VARIABLE = / @@? #{IDENT} /ox
GLOBAL_VARIABLE = / \$ (?: #{IDENT} | [1-9]\d* | 0\w* | [~&+`'=\/,;_.<>!@$?*":\\] | -[a-zA-Z_0-9] ) /ox
PREFIX_VARIABLE = / #{GLOBAL_VARIABLE} | #{OBJECT_VARIABLE} /ox
PREFIX_VARIABLE = / #{GLOBAL_VARIABLE} |#{OBJECT_VARIABLE} /ox
VARIABLE = / @?@? #{IDENT} | #{GLOBAL_VARIABLE} /ox
QUOTE_TO_TYPE = {
......@@ -60,7 +63,7 @@ module Scanners
QUOTE_TO_TYPE.default = :string
REGEXP_MODIFIERS = /[mixounse]*/
REGEXP_SYMBOLS = /[|?*+(){}\[\].^$]/
REGEXP_SYMBOLS = /[|?*+?(){}\[\].^$]/
DECIMAL = /\d+(?:_\d+)*/
OCTAL = /0_?[0-7]+(?:_[0-7]+)*/
......@@ -70,7 +73,7 @@ module Scanners
EXPONENT = / [eE] [+-]? #{DECIMAL} /ox
FLOAT_SUFFIX = / #{EXPONENT} | \. #{DECIMAL} #{EXPONENT}? /ox
FLOAT_OR_INT = / #{DECIMAL} (?: #{FLOAT_SUFFIX} () )? /ox
NUMERIC = / (?: (?=0) (?: #{OCTAL} | #{HEXADECIMAL} | #{BINARY} ) | #{FLOAT_OR_INT} ) /ox
NUMERIC = / [-+]? (?: (?=0) (?: #{OCTAL} | #{HEXADECIMAL} | #{BINARY} ) | #{FLOAT_OR_INT} ) /ox
SYMBOL = /
:
......@@ -80,32 +83,26 @@ module Scanners
| ['"]
)
/ox
METHOD_NAME_OR_SYMBOL = / #{METHOD_NAME_EX} | #{SYMBOL} /ox
SIMPLE_ESCAPE = /
# TODO investigate \M, \c and \C escape sequences
# (?: M-\\C-|C-\\M-|M-\\c|c\\M-|c|C-|M-)? (?: \\ (?: [0-7]{3} | x[0-9A-Fa-f]{2} | . ) )
# assert_equal(225, ?\M-a)
# assert_equal(129, ?\M-\C-a)
ESCAPE = /
[abefnrstv]
| M-\\C-|C-\\M-|M-\\c|c\\M-|c|C-|M-
| [0-7]{1,3}
| x[0-9A-Fa-f]{1,2}
| .?
| .
/mx
CONTROL_META_ESCAPE = /
(?: M-|C-|c )
(?: \\ (?: M-|C-|c ) )*
(?: [^\\] | \\ #{SIMPLE_ESCAPE} )?
/mox
ESCAPE = /
#{CONTROL_META_ESCAPE} | #{SIMPLE_ESCAPE}
/mox
CHARACTER = /
\?
(?:
[^\s\\]
| \\ #{ESCAPE}
)
/mox
/mx
# NOTE: This is not completely correct, but
# nobody needs heredoc delimiters ending with \n.
......@@ -132,29 +129,25 @@ module Scanners
/mx
# Checks for a valid value to follow. This enables
# value_expected in method calls without parentheses.
# fancy_allowed in method calls.
VALUE_FOLLOWS = /
(?>[ \t\f\v]+)
\s+
(?:
[%\/][^\s=]
| <<-?\S
| [-+] \d
| #{CHARACTER}
|
<<-?\S
|
#{CHARACTER}
)
/x
VALUE_EXPECTING_KEYWORDS = WordList.new.add(%w[
and end in or unless begin
defined? ensure redo super until
break do next rescue then
when case else for retry
while elsif if not return
yield
])
RUBYDOC_OR_DATA = / #{RUBYDOC} | #{DATA} /xo
RDOC_DATA_START = / ^=begin (?!\S) | ^__END__$ /x
# FIXME: \s and = are only a workaround, they are still allowed
# as delimiters.
FANCY_START_SAVE = / % ( [qQwWxsr] | (?![a-zA-Z0-9\s=]) ) ([^a-zA-Z0-9]) /mx
FANCY_START_CORRECT = / % ( [qQwWxsr] | (?![a-zA-Z0-9]) ) ([^a-zA-Z0-9]) /mx
FancyStringType = {
......@@ -177,18 +170,17 @@ module Scanners
{ }
] ]
CLOSING_PAREN.each { |k,v| k.freeze; v.freeze } # debug, if I try to change it with <<
CLOSING_PAREN.values.each { |o| o.freeze } # debug, if I try to change it with <<
OPENING_PAREN = CLOSING_PAREN.invert
STRING_PATTERN = Hash.new do |h, k|
STRING_PATTERN = Hash.new { |h, k|
delim, interpreted = *k
delim_pattern = Regexp.escape(delim)
delim_pattern = Regexp.escape(delim.dup)
if closing_paren = CLOSING_PAREN[delim]
delim_pattern = delim_pattern[0..-1] if defined? JRUBY_VERSION # JRuby fix
delim_pattern << Regexp.escape(closing_paren)
end
delim_pattern << '\\\\' unless delim == '\\'
special_escapes =
case interpreted
when :regexp_symbols
......@@ -196,16 +188,16 @@ module Scanners
when :words
'| \s'
end
h[k] =
if interpreted and not delim == '#'
/ (?= [#{delim_pattern}] | \# [{$@] #{special_escapes} ) /mx
/ (?= [#{delim_pattern}\\] | \# [{$@] #{special_escapes} ) /mx
else
/ (?= [#{delim_pattern}] #{special_escapes} ) /mx
/ (?= [#{delim_pattern}\\] #{special_escapes} ) /mx
end
end
}
HEREDOC_PATTERN = Hash.new do |h, k|
HEREDOC_PATTERN = Hash.new { |h, k|
delim, interpreted, indented = *k
delim_pattern = Regexp.escape(delim.dup)
delim_pattern = / \n #{ '(?>[\ \t]*)' if indented } #{ Regexp.new delim_pattern } $ /x
......@@ -215,12 +207,12 @@ module Scanners
else
/ (?= #{delim_pattern}() | \\ ) /mx
end
end
}
def initialize kind, interpreted, delim, heredoc = false
if heredoc
pattern = HEREDOC_PATTERN[ [delim, interpreted, heredoc == :indented] ]
delim = nil
delim = nil
else
pattern = STRING_PATTERN[ [delim, interpreted] ]
if paren = CLOSING_PAREN[delim]
......
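`STRING_PATTERN` and `HEREDOC_PATTERN` above are hashes with a default block, so each end-of-content regexp is built once per key and then reused. A minimal sketch of that memoization, assuming the illustrative names `CLOSERS` and `END_PATTERN` (neither is a CodeRay constant) and leaving out the interpolation cases:

```ruby
# Sketch: build one end-of-content regexp per delimiter and cache it.
# CLOSERS and END_PATTERN are illustrative names, not CodeRay constants.
CLOSERS = { '(' => ')', '[' => ']', '{' => '}', '<' => '>' }

END_PATTERN = Hash.new do |cache, delim|
  chars = Regexp.escape(delim)
  if (closer = CLOSERS[delim])
    chars += Regexp.escape(closer)   # paired delimiters can nest
  end
  # Zero-width match just before a delimiter, its closer, or a backslash.
  cache[delim] = /(?=[#{chars}\\])/
end
```

Using `+=` instead of `<<` sidesteps the question the diff handles with `delim.dup`: the sketch never mutates the string returned by `Regexp.escape`.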
......@@ -6,7 +6,7 @@ module CodeRay
class Scheme < Scanner
register_for :scheme
file_extension 'scm'
file_extension :scm
CORE_FORMS = %w[
lambda let let* letrec syntax-case define-syntax let-syntax
......
......@@ -5,12 +5,13 @@ module Scanners
# XML Scanner
#
# $Id$
#
# Currently this is the same scanner as Scanners::HTML.
class XML < HTML
register_for :xml
file_extension 'xml'
end
end
......
......@@ -8,7 +8,7 @@ module Styles
code_background = '#f8f8f8'
numbers_background = '#def'
border_color = 'silver'
normal_color = '#000'
normal_color = '#100'
CSS_MAIN_STYLES = <<-MAIN
.CodeRay {
......@@ -32,7 +32,6 @@ table.CodeRay td { padding: 2px 4px; vertical-align: top }
text-align: right;
}
.CodeRay .line_numbers tt { font-weight: bold }
.CodeRay .line_numbers .highlighted { color: red }
.CodeRay .no { padding: 0px 4px }
.CodeRay .code { width: 100% }
......@@ -47,32 +46,28 @@ ol.CodeRay li { white-space: pre }
.af { color:#00C }
.an { color:#007 }
.at { color:#f08 }
.av { color:#700 }
.aw { color:#C00 }
.bi { color:#509; font-weight:bold }
.c { color:#888; }
.c { color:#666; }
.ch { color:#04D }
.ch .k { color:#04D }
.ch .dl { color:#039 }
.cl { color:#B06; font-weight:bold }
.cm { color:#A08; font-weight:bold }
.co { color:#036; font-weight:bold }
.cr { color:#0A0 }
.cv { color:#369 }
.de { color:#B0B; }
.df { color:#099; font-weight:bold }
.di { color:#088; font-weight:bold }
.dl { color:black }
.do { color:#970 }
.dt { color:#34b }
.ds { color:#D42; font-weight:bold }
.e { color:#666; font-weight:bold }
.en { color:#800; font-weight:bold }
.er { color:#F00; background-color:#FAA }
.ex { color:#C00; font-weight:bold }
.ex { color:#F00; font-weight:bold }
.fl { color:#60E; font-weight:bold }
.fu { color:#06B; font-weight:bold }
.gv { color:#d70; font-weight:bold }
......@@ -80,13 +75,11 @@ ol.CodeRay li { white-space: pre }
.i { color:#00D; font-weight:bold }
.ic { color:#B44; font-weight:bold }
.il { background: #ddd; color: black }
.il .il { background: #ccc }
.il .il .il { background: #bbb }
.il .idl { background: #ddd; font-weight: bold; color: #666 }
.idl { background-color: #bbb; font-weight: bold; color: #666; }
.il { background: #eee }
.il .il { background: #ddd }
.il .il .il { background: #ccc }
.il .idl { font-weight: bold; color: #888 }
.im { color:#f00; }
.in { color:#B2B; font-weight:bold }
.iv { color:#33B }
.la { color:#970; font-weight:bold }
......@@ -96,15 +89,9 @@ ol.CodeRay li { white-space: pre }
.op { }
.pc { color:#038; font-weight:bold }
.pd { color:#369; font-weight:bold }
.pp { color:#579; }
.ps { color:#00C; font-weight:bold; }
.pt { color:#074; font-weight:bold }
.r, .kw { color:#080; font-weight:bold }
.ke { color: #808; }
.ke .dl { color: #606; }
.ke .ch { color: #80f; }
.vl { color: #088; }
.pp { color:#579 }
.pt { color:#339; font-weight:bold }
.r { color:#080; font-weight:bold }
.rx { background-color:#fff0ff }
.rx .k { color:#808 }
......@@ -112,15 +99,14 @@ ol.CodeRay li { white-space: pre }
.rx .mod { color:#C2C }
.rx .fu { color:#404; font-weight: bold }
.s { background-color:#fff0f0; color: #D20; }
.s .s { background-color:#ffe0e0 }
.s .s .s { background-color:#ffd0d0 }
.s .k { }
.s .ch { color: #b0b; }
.s .dl { color: #710; }
.s { background-color:#fff0f0 }
.s .s { background-color:#ffe0e0 }
.s .s .s { background-color:#ffd0d0 }
.s .k { color:#D20 }
.s .dl { color:#710 }
.sh { background-color:#f0fff0; color:#2B2 }
.sh .k { }
.sh { background-color:#f0fff0 }
.sh .k { color:#2B2 }
.sh .dl { color:#161 }
.sy { color:#A60 }
......@@ -133,16 +119,6 @@ ol.CodeRay li { white-space: pre }
.ty { color:#339; font-weight:bold }
.v { color:#036 }
.xt { color:#444 }
.ins { background: #afa; }
.del { background: #faa; }
.chg { color: #aaf; background: #007; }
.head { color: #f8f; background: #505 }
.ins .ins { color: #080; font-weight:bold }
.del .del { color: #800; font-weight:bold }
.chg .chg { color: #66f; }
.head .head { color: #f4f; }
TOKENS
end
......
......@@ -84,9 +84,6 @@ ol.CodeRay li { white-space: pre; }
.pp { color:#579; }
.pt { color:#66f; font-weight:bold; }
.r { color:#5de; font-weight:bold; }
.r, .kw { color:#5de; font-weight:bold }
.ke { color: #808; }
.rx { background-color:#221133; }
.rx .k { color:#f8f; }
......@@ -114,16 +111,6 @@ ol.CodeRay li { white-space: pre; }
.ty { color:#339; font-weight:bold; }
.v { color:#036; }
.xt { color:#444; }
.ins { background: #afa; }
.del { background: #faa; }
.chg { color: #aaf; background: #007; }
.head { color: #f8f; background: #505 }
.ins .ins { color: #080; font-weight:bold }
.del .del { color: #800; font-weight:bold }
.chg .chg { color: #66f; }
.head .head { color: #f4f; }
TOKENS
end
......
......@@ -4,7 +4,6 @@ module CodeRay
h[k] = k.to_s
end
ClassOfKind.update with = {
:annotation => 'at',
:attribute_name => 'an',
:attribute_name_fat => 'af',
:attribute_value => 'av',
......@@ -15,15 +14,12 @@ module CodeRay
:class_variable => 'cv',
:color => 'cr',
:comment => 'c',
:complex => 'cm',
:constant => 'co',
:content => 'k',
:decorator => 'de',
:definition => 'df',
:delimiter => 'dl',
:directive => 'di',
:doc => 'do',
:doctype => 'dt',
:doc_string => 'ds',
:entity => 'en',
:error => 'er',
......@@ -33,16 +29,12 @@ module CodeRay
:function => 'fu',
:global_variable => 'gv',
:hex => 'hx',
:imaginary => 'cm',
:important => 'im',
:include => 'ic',
:inline => 'il',
:inline_delimiter => 'idl',
:instance_variable => 'iv',
:integer => 'i',
:interpreted => 'in',
:keyword => 'kw',
:key => 'ke',
:label => 'la',
:local_variable => 'lv',
:modifier => 'mod',
......@@ -52,7 +44,6 @@ module CodeRay
:pre_type => 'pt',
:predefined => 'pd',
:preprocessor => 'pp',
:pseudo_class => 'ps',
:regexp => 'rx',
:reserved => 'r',
:shell => 'sh',
......@@ -63,13 +54,7 @@ module CodeRay
:tag_special => 'ts',
:type => 'ty',
:variable => 'v',
:value => 'vl',
:xml_text => 'xt',
:insert => 'ins',
:delete => 'del',
:change => 'chg',
:head => 'head',
:ident => :NO_HIGHLIGHT, # 'id'
#:operator => 'op',
......@@ -77,7 +62,7 @@ module CodeRay
:space => :NO_HIGHLIGHT, # 'sp'
:plain => :NO_HIGHLIGHT,
}
ClassOfKind[:method] = ClassOfKind[:function]
ClassOfKind[:procedure] = ClassOfKind[:method] = ClassOfKind[:function]
ClassOfKind[:open] = ClassOfKind[:close] = ClassOfKind[:delimiter]
ClassOfKind[:nesting_delimiter] = ClassOfKind[:delimiter]
ClassOfKind[:escape] = ClassOfKind[:delimiter]
......
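The `ClassOfKind` table above maps token kinds to short CSS class names, with a default block for unknown kinds and `:NO_HIGHLIGHT` for kinds that get no markup. A small sketch of how such a table can be consumed, assuming the hypothetical names `CSS_CLASS` and `span_for` and only three of the entries:

```ruby
# Sketch of a kind => CSS class table. Unknown kinds fall back to their
# own name, and the fallback is stored so later lookups are plain reads.
CSS_CLASS = Hash.new { |h, kind| h[kind] = kind.to_s }
CSS_CLASS.update(
  :comment  => 'c',
  :function => 'fu',
  :space    => :NO_HIGHLIGHT
)
CSS_CLASS[:method] = CSS_CLASS[:function]   # alias shares the class

# Wrap text in a span unless the kind is marked as not highlighted.
# (HTML escaping is left out to keep the sketch short.)
def span_for(text, kind)
  css = CSS_CLASS[kind]
  css == :NO_HIGHLIGHT ? text : %(<span class="#{css}">#{text}</span>)
end
```

Removing an entry such as `:doctype => 'dt'` from the table, as the diff does, means that kind falls through to the default block and is rendered with its full name as the class.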
......@@ -46,10 +46,47 @@ module CodeRay
#
# Tokens' subclass TokenStream allows streaming to save memory.
class Tokens < Array
# The Scanner instance that created the tokens.
attr_accessor :scanner
class << self
# Convert the token to a string.
#
# This format is used by Encoders.Tokens.
# It can be reverted using read_token.
def write_token text, type
if text.is_a? String
"#{type}\t#{escape(text)}\n"
else
":#{text}\t#{type}\t\n"
end
end
# Read a token from the string.
#
# Inversion of write_token.
#
# TODO Test this!
def read_token token
type, text = token.split("\t", 2)
if type[0] == ?:
[text.to_sym, type[1..-1].to_sym]
else
[type.to_sym, unescape(text)]
end
end
# Escapes a string for use in write_token.
def escape text
text.gsub(/[\n\\]/, '\\\\\&')
end
# Unescapes a string created by escape.
def unescape text
text.gsub(/\\[\n\\]/) { |m| m[1,1] }
end
end
# Whether the object is a TokenStream.
#
# Returns false.
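The `write_token` and `read_token` helpers earlier in this hunk describe a tab-separated text format for tokens. As a sketch of one way to make such a format round-trip, assuming hypothetical `dump_token` and `load_token` helpers and a slightly different escaping (newlines become a backslash followed by `n`, so every token stays on one line):

```ruby
# Sketch of a tab-separated, one-token-per-line dump format.
# Newlines and backslashes in the text are escaped so each token
# stays on one line; this differs from the format in the diff.
def dump_token(text, kind)
  if text.is_a?(String)
    escaped = text.gsub(/[\\\n]/) { |c| c == "\n" ? '\\n' : '\\\\' }
    "#{kind}\t#{escaped}\n"
  else
    ":#{text}\t#{kind}\n"            # actions such as :open and :close
  end
end

# Inverse of dump_token; returns a [text_or_action, kind] pair.
def load_token(line)
  head, rest = line.chomp("\n").split("\t", 2)
  if head.start_with?(':')
    [head[1..-1].to_sym, rest.to_sym]
  else
    [rest.gsub(/\\(.)/) { $1 == 'n' ? "\n" : $1 }, head.to_sym]
  end
end
```

Splitting with a limit of 2 keeps any tab characters inside the token text intact, since only the first tab separates the kind from the text.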
......@@ -109,6 +146,7 @@ module CodeRay
encode :text, options
end
# Redirects unknown methods to encoder calls.
#
# For example, if you call +tokens.html+, the HTML encoder
......@@ -162,29 +200,25 @@ module CodeRay
#
# TODO: Test this!
def fix
tokens = self.class.new
# Check token nesting using a stack of kinds.
opened = []
for type, kind in self
case type
when :open
opened.push [:close, kind]
when :begin_line
opened.push [:end_line, kind]
when :close, :end_line
for token, kind in self
if token == :open
opened.push kind
elsif token == :close
expected = opened.pop
if [type, kind] != expected
if kind != expected
# Unexpected :close; decide what to do based on the kind:
# - token was never opened: delete the :close (just skip it)
next unless opened.rindex expected
# - token was opened earlier: also close tokens in between
tokens << token until (token = opened.pop) == expected
# - token was never opened: delete the :close (skip with next)
next unless opened.rindex expected
tokens << [:close, kind] until (kind = opened.pop) == expected
end
end
tokens << [type, kind]
tokens << [token, kind]
end
# Close remaining opened tokens
tokens << token while token = opened.pop
tokens << [:close, kind] while kind = opened.pop
tokens
end
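The `fix` method above repairs `:open`/`:close` nesting with a stack. The following is a simplified sketch of that repair under stated assumptions: `fix_nesting` is a hypothetical helper, it handles only `:open` and `:close`, and it is not a transcription of either version shown in the diff.

```ruby
# Sketch of a nesting repair pass over [text_or_action, kind] pairs.
def fix_nesting(tokens)
  fixed = []
  opened = []                        # stack of kinds that are still open
  tokens.each do |token, kind|
    if token == :open
      opened.push kind
    elsif token == :close
      # Never opened: drop the stray :close.
      next unless opened.include?(kind)
      # Close everything opened after `kind`, innermost first.
      fixed << [:close, opened.pop] until opened.last == kind
      opened.pop
    end
    fixed << [token, kind]
  end
  # Close whatever is still open at the end, innermost first.
  fixed << [:close, opened.pop] until opened.empty?
  fixed
end
```

Checking `opened.include?(kind)` before popping means a stray `:close` leaves the stack untouched, which is the main difference from popping first and comparing afterwards.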
......@@ -192,8 +226,6 @@ module CodeRay
replace fix
end
# TODO: Scanner#split_into_lines
#
# Makes sure that:
# - newlines are single tokens
# (which means all other tokens are single-line)
......@@ -321,7 +353,7 @@ module CodeRay
#
# Returns self.
def << token
-@callback.call(*token)
+@callback.call token
@size += 1
self
end
......@@ -344,48 +376,8 @@ module CodeRay
end
end
# Token name abbreviations
require 'coderay/token_classes'
if $0 == __FILE__
$VERBOSE = true
$: << File.join(File.dirname(__FILE__), '..')
eval DATA.read, nil, $0, __LINE__ + 4
end
__END__
require 'test/unit'
class TokensTest < Test::Unit::TestCase
def test_creation
assert CodeRay::Tokens < Array
tokens = nil
assert_nothing_raised do
tokens = CodeRay::Tokens.new
end
assert_kind_of Array, tokens
end
def test_adding_tokens
tokens = CodeRay::Tokens.new
assert_nothing_raised do
tokens << ['string', :type]
tokens << ['()', :operator]
end
assert_equal 2, tokens.size
end
def test_dump_undump
tokens = CodeRay::Tokens.new
assert_nothing_raised do
tokens << ['string', :type]
tokens << ['()', :operator]
end
tokens2 = nil
assert_nothing_raised do
tokens2 = tokens.dump.undump
end
assert_equal tokens, tokens2
end
end
\ No newline at end of file
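The `escape` and `unescape` helpers in the tokens file are documented as inverses of each other. A minimal standalone sketch of that round trip, assuming the same rule (prefix every newline and backslash with a backslash), could look like this. The method names here are hypothetical and the block form of `gsub` is used only to keep the quoting easy to read.

```ruby
# Hypothetical standalone versions of the two helpers.

# Prefixes every newline and backslash with a backslash.
def escape_token_text(text)
  text.gsub(/[\n\\]/) { |char| "\\" + char }
end

# Removes the backslash that escape_token_text put in front of
# a newline or a backslash.
def unescape_token_text(text)
  text.gsub(/\\([\n\\])/) { $1 }
end

sample  = "a\\b\nc"
escaped = escape_token_text(sample)
restored = unescape_token_text(escaped)
```

Note that an escaped newline is still a real newline character preceded by a backslash, so a reader of this format has to treat a trailing backslash as a line continuation.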
module CodeRay
module Encoders
load :token_class_filter
class CommentFilter < TokenClassFilter
register_for :comment_filter
DEFAULT_OPTIONS = TokenClassFilter::DEFAULT_OPTIONS.merge \
:exclude => [:comment]
end
end
end
module CodeRay
module Encoders
class Filter < Encoder
register_for :filter
protected
def setup options
@out = Tokens.new
end
end
end
end
module CodeRay
module Encoders
# = JSON Encoder
class JSON < Encoder
register_for :json
FILE_EXTENSION = 'json'
protected
def compile tokens, options
require 'json'
@out = tokens.to_a.to_json
end
end
end
end
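The JSON encoder above boils down to `tokens.to_a.to_json`. As a rough illustration using only Ruby's bundled `json` library, the sketch below serializes a plain Array of `[text, kind]` pairs. The sample tokens and the `tokens_to_json` name are made up for the example.

```ruby
require 'json'

# Hypothetical helper: serializes [text, kind] pairs the same way the
# encoder's compile step does, but on a plain Array.
def tokens_to_json(pairs)
  pairs.to_a.to_json
end

json = tokens_to_json([['puts', :ident], [' ', :space]])
parsed = JSON.parse(json) # kinds come back as Strings, not Symbols
```

Because Symbols are written as JSON strings, a structural token such as `[:open, :string]` and a text token such as `['open', :string]` look identical after serialization. A consumer of this output would need some other way to tell them apart.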
module CodeRay
module Encoders
# Counts the LoC (Lines of Code). Returns an Integer >= 0.
#
# Everything that is not comment, markup, doctype/shebang, or an empty line,
# is considered to be code.
#
# For example,
# * HTML files not containing JavaScript have 0 LoC
# * in a Java class without comments, LoC is the number of non-empty lines
#
# A Scanner class should define the token kinds that are not code in the
# KINDS_NOT_LOC constant.
class LinesOfCode < Encoder
register_for :lines_of_code
NON_EMPTY_LINE = /^\s*\S.*$/
def compile tokens, options
kinds_not_loc = tokens.scanner.class::KINDS_NOT_LOC
code = tokens.token_class_filter :exclude => kinds_not_loc
@loc = code.text.scan(NON_EMPTY_LINE).size
end
def finish options
@loc
end
end
end
end
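The counting rule described for `LinesOfCode` (drop the non-code kinds, then count non-empty lines in what is left) can be tried without the library. The sketch below is only an approximation under stated assumptions: tokens are plain `[text, kind]` pairs, `count_lines_of_code` is a hypothetical name, and the list of non-code kinds is passed in instead of being read from a scanner's `KINDS_NOT_LOC`.

```ruby
# Hypothetical helper: counts non-empty lines among the tokens whose kind
# is not listed in kinds_not_loc.
def count_lines_of_code(pairs, kinds_not_loc)
  non_empty_line = /^\s*\S.*$/ # same pattern as NON_EMPTY_LINE
  code = pairs.select do |text, kind|
    # Skip structural tokens such as [:open, :string] and non-code kinds.
    text.is_a?(String) && !kinds_not_loc.include?(kind)
  end
  code.map { |text, _kind| text }.join.scan(non_empty_line).size
end

sample = [
  ['# setup', :comment], ["\n", :space],
  ['x', :ident], [' = ', :operator], ['1', :integer], ["\n", :space],
  ["\n", :space],
  ['y', :ident]
]
loc = count_lines_of_code(sample, [:comment])
```

In this sample the comment line and the blank line do not count, so only the two lines holding `x = 1` and `y` remain.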
module CodeRay
module Encoders
load :filter
class TokenClassFilter < Filter
include Streamable
register_for :token_class_filter
DEFAULT_OPTIONS = {
:exclude => [],
:include => :all
}
protected
def setup options
super
@exclude = options[:exclude]
@include = options[:include]
end
def text_token text, kind
[text, kind] if \
(@include == :all || @include.include?(kind)) &&
!(@exclude == :all || @exclude.include?(kind))
end
end
end
end
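The decision made in `text_token` is a pure predicate on the token kind, so it can be isolated for illustration. The sketch below assumes the same two options (`:include` defaulting to `:all`, `:exclude` defaulting to an empty list). `keep_kind?` is a hypothetical name, not a CodeRay method.

```ruby
# Hypothetical predicate mirroring the include/exclude test:
# a kind passes if it is included and not excluded.
def keep_kind?(kind, include_kinds = :all, exclude_kinds = [])
  (include_kinds == :all || include_kinds.include?(kind)) &&
    !(exclude_kinds == :all || exclude_kinds.include?(kind))
end

# A comment filter is then just the default include with one excluded kind.
without_comments = [['x', :ident], ['# note', :comment]].select do |_text, kind|
  keep_kind?(kind, :all, [:comment])
end
```

Exclusion wins over inclusion: a kind listed in both options is dropped.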
module CodeRay
# A little hack to enable CodeRay highlighting in RedCloth.
#
# Usage:
# require 'coderay'
# require 'coderay/for_redcloth'
# RedCloth.new('@[ruby]puts "Hello, World!"@').to_html
#
# Make sure you have RedCloth 4.0.3 activated, for example by calling
# require 'rubygems'
# before RedCloth is loaded and before calling CodeRay.for_redcloth.
module ForRedCloth
def self.install
gem 'RedCloth', '>= 4.0.3' rescue nil
require 'redcloth'
unless RedCloth::VERSION.to_s >= '4.0.3'
raise 'CodeRay.for_redcloth needs RedCloth version 4.0.3 or later.'
end
RedCloth::TextileDoc.send :include, ForRedCloth::TextileDoc
RedCloth::Formatters::HTML.module_eval do
def unescape(html)
replacements = {
'&amp;' => '&',
'&quot;' => '"',
'&gt;' => '>',
'&lt;' => '<',
}
html.gsub(/&(?:amp|quot|[gl]t);/) { |entity| replacements[entity] }
end
undef code, bc_open, bc_close, escape_pre
def code(opts) # :nodoc:
opts[:block] = true
if !opts[:lang] && RedCloth::VERSION.to_s >= '4.2.0'
# simulating pre-4.2 behavior
if opts[:text].sub!(/\A\[(\w+)\]/, '')
if CodeRay::Scanners[$1].plugin_id == 'plaintext'
opts[:text] = $& + opts[:text]
else
opts[:lang] = $1
end
end
end
if opts[:lang] && !filter_coderay
require 'coderay'
@in_bc ||= nil
format = @in_bc ? :div : :span
opts[:text] = unescape(opts[:text]) unless @in_bc
highlighted_code = CodeRay.encode opts[:text], opts[:lang], format, :stream => true
highlighted_code.sub!(/\A<(span|div)/) { |m| m + pba(@in_bc || opts) }
highlighted_code
else
"<code#{pba(opts)}>#{opts[:text]}</code>"
end
end
def bc_open(opts) # :nodoc:
opts[:block] = true
@in_bc = opts
opts[:lang] ? '' : "<pre#{pba(opts)}>"
end
def bc_close(opts) # :nodoc:
opts = @in_bc
@in_bc = nil
opts[:lang] ? '' : "</pre>\n"
end
def escape_pre(text)
if @in_bc ||= nil
text
else
html_esc(text, :html_escape_preformatted)
end
end
end
end
module TextileDoc # :nodoc:
attr_accessor :filter_coderay
end
end
end
CodeRay::ForRedCloth.install
\ No newline at end of file
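The `unescape` helper that this file adds to the RedCloth HTML formatter replaces four entities in a single `gsub` pass. A standalone sketch of the same idea follows. The method name is hypothetical, and the point it tries to show is that a single pass does not unescape text twice.

```ruby
# Hypothetical standalone version of the entity replacement.
def unescape_basic_entities(html)
  replacements = {
    '&amp;'  => '&',
    '&quot;' => '"',
    '&gt;'   => '>',
    '&lt;'   => '<'
  }
  # One pass over the input: text produced by a replacement is not rescanned.
  html.gsub(/&(?:amp|quot|[gl]t);/) { |entity| replacements[entity] }
end

plain = unescape_basic_entities('&lt;b&gt; &amp;amp;')
```

With chained `gsub` calls, one per entity, `&amp;lt;` could end up as `<` depending on the order of the calls. The single pass turns it into `&lt;` and stops there.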
module CodeRay
module Scanners
class CPlusPlus < Scanner
include Streamable
register_for :cpp
file_extension 'cpp'
title 'C++'
# http://www.cppreference.com/wiki/keywords/start
RESERVED_WORDS = [
'and', 'and_eq', 'asm', 'bitand', 'bitor', 'break',
'case', 'catch', 'class', 'compl', 'const_cast',
'continue', 'default', 'delete', 'do', 'dynamic_cast', 'else',
'enum', 'export', 'for', 'goto', 'if', 'namespace', 'new',
'not', 'not_eq', 'or', 'or_eq', 'reinterpret_cast', 'return',
'sizeof', 'static_cast', 'struct', 'switch', 'template',
'throw', 'try', 'typedef', 'typeid', 'typename', 'union',
'while', 'xor', 'xor_eq'
]
PREDEFINED_TYPES = [
'bool', 'char', 'double', 'float', 'int', 'long',
'short', 'signed', 'unsigned', 'wchar_t', 'string'
]
PREDEFINED_CONSTANTS = [
'false', 'true',
'EOF', 'NULL',
]
PREDEFINED_VARIABLES = [
'this'
]
DIRECTIVES = [
'auto', 'const', 'explicit', 'extern', 'friend', 'inline', 'mutable', 'operator',
'private', 'protected', 'public', 'register', 'static', 'using', 'virtual', 'void',
'volatile'
]
IDENT_KIND = WordList.new(:ident).
add(RESERVED_WORDS, :reserved).
add(PREDEFINED_TYPES, :pre_type).
add(PREDEFINED_VARIABLES, :local_variable).
add(DIRECTIVES, :directive).
add(PREDEFINED_CONSTANTS, :pre_constant)
ESCAPE = / [rbfntv\n\\'"] | x[a-fA-F0-9]{1,2} | [0-7]{1,3} /x
UNICODE_ESCAPE = / u[a-fA-F0-9]{4} | U[a-fA-F0-9]{8} /x
def scan_tokens tokens, options
state = :initial
until eos?
kind = nil
match = nil
case state
when :initial
if scan(/ \s+ | \\\n /x)
kind = :space
elsif scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx)
kind = :comment
elsif match = scan(/ \# \s* if \s* 0 /x)
match << scan_until(/ ^\# (?:elif|else|endif) .*? $ | \z /xm) unless eos?
kind = :comment
elsif scan(/ [-+*=<>?:;,!&^|()\[\]{}~%]+ | \/=? | \.(?!\d) /x)
kind = :operator
elsif match = scan(/ [A-Za-z_][A-Za-z_0-9]* /x)
kind = IDENT_KIND[match]
if kind == :ident and check(/:(?!:)/)
# FIXME: don't match a?b:c
kind = :label
elsif match == 'class'
state = :class_name_expected
end
elsif scan(/\$/)
kind = :ident
elsif match = scan(/L?"/)
tokens << [:open, :string]
if match[0] == ?L
tokens << ['L', :modifier]
match = '"'
end
state = :string
kind = :delimiter
elsif scan(/#\s*(\w*)/)
kind = :preprocessor
state = :include_expected if self[1] == 'include'
elsif scan(/ L?' (?: [^\'\n\\] | \\ #{ESCAPE} )? '? /ox)
kind = :char
elsif scan(/0[xX][0-9A-Fa-f]+/)
kind = :hex
elsif scan(/(?:0[0-7]+)(?![89.eEfF])/)
kind = :oct
elsif scan(/(?:\d+)(?![.eEfF])L?L?/)
kind = :integer
elsif scan(/\d[fF]?|\d*\.\d+(?:[eE][+-]?\d+)?[fF]?|\d+[eE][+-]?\d+[fF]?/)
kind = :float
else
getch
kind = :error
end
when :string
if scan(/[^\\"]+/)
kind = :content
elsif scan(/"/)
tokens << ['"', :delimiter]
tokens << [:close, :string]
state = :initial
next
elsif scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox)
kind = :char
elsif scan(/ \\ | $ /x)
tokens << [:close, :string]
kind = :error
state = :initial
else
raise_inspect "else case \" reached; %p not handled." % peek(1), tokens
end
when :include_expected
if scan(/<[^>\n]+>?|"[^"\n\\]*(?:\\.[^"\n\\]*)*"?/)
kind = :include
state = :initial
elsif match = scan(/\s+/)
kind = :space
state = :initial if match.index ?\n
else
state = :initial
next
end
when :class_name_expected
if scan(/ [A-Za-z_][A-Za-z_0-9]* /x)
kind = :class
state = :initial
elsif match = scan(/\s+/)
kind = :space
else
getch
kind = :error
state = :initial
end
else
raise_inspect 'Unknown state', tokens
end
match ||= matched
if $DEBUG and not kind
raise_inspect 'Error token %p in line %d' %
[[match, kind], line], tokens
end
raise_inspect 'Empty token', tokens unless match
tokens << [match, kind]
end
if state == :string
tokens << [:close, :string]
end
tokens
end
end
end
end
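The C++ scanner classifies every identifier through `IDENT_KIND`, a word list that falls back to `:ident`. The sketch below imitates that lookup with a plain Hash that has a default value. It is a stand-in for illustration only: `build_ident_kind` is a hypothetical helper and the word groups are a small subset chosen for the example, not the scanner's full lists.

```ruby
# Hypothetical stand-in for a word list: a Hash whose default value is the
# fallback kind, filled from groups of words.
def build_ident_kind(default_kind, groups)
  table = Hash.new(default_kind)
  groups.each do |kind, words|
    words.each { |word| table[word] = kind }
  end
  table
end

ident_kind = build_ident_kind(
  :ident,
  reserved:     %w[if else while return],
  pre_type:     %w[int char bool],
  pre_constant: %w[true false NULL]
)

kinds = %w[while int NULL counter].map { |word| ident_kind[word] }
```

Looking up an unknown word returns the default without adding a key, so the table does not grow while scanning.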
module CodeRay
module Scanners
class CSS < Scanner
register_for :css
KINDS_NOT_LOC = [
:comment,
:class, :pseudo_class, :type,
:constant, :directive,
:key, :value, :operator, :color, :float,
:error, :important,
]
module RE
NonASCII = /[\x80-\xFF]/
Hex = /[0-9a-fA-F]/
Unicode = /\\#{Hex}{1,6}(?:\r\n|\s)?/ # differs from standard because it allows uppercase hex too
Escape = /#{Unicode}|\\[^\r\n\f0-9a-fA-F]/
NMChar = /[-_a-zA-Z0-9]|#{NonASCII}|#{Escape}/
NMStart = /[_a-zA-Z]|#{NonASCII}|#{Escape}/
NL = /\r\n|\r|\n|\f/
String1 = /"(?:[^\n\r\f\\"]|\\#{NL}|#{Escape})*"?/ # FIXME: buggy regexp
String2 = /'(?:[^\n\r\f\\']|\\#{NL}|#{Escape})*'?/ # FIXME: buggy regexp
String = /#{String1}|#{String2}/
HexColor = /#(?:#{Hex}{6}|#{Hex}{3})/
Color = /#{HexColor}/
Num = /-?(?:[0-9]+|[0-9]*\.[0-9]+)/
Name = /#{NMChar}+/
Ident = /-?#{NMStart}#{NMChar}*/
AtKeyword = /@#{Ident}/
Percentage = /#{Num}%/
reldimensions = %w[em ex px]
absdimensions = %w[in cm mm pt pc]
Unit = Regexp.union(*(reldimensions + absdimensions))
Dimension = /#{Num}#{Unit}/
Comment = %r! /\* (?: .*? \*/ | .* ) !mx
Function = /(?:url|alpha)\((?:[^)\n\r\f]|\\\))*\)?/
Id = /##{Name}/
Class = /\.#{Name}/
PseudoClass = /:#{Name}/
AttributeSelector = /\[[^\]]*\]?/
end
def scan_tokens tokens, options
value_expected = nil
states = [:initial]
until eos?
kind = nil
match = nil
if scan(/\s+/)
kind = :space
elsif case states.last
when :initial, :media
if scan(/(?>#{RE::Ident})(?!\()|\*/ox)
kind = :type
elsif scan RE::Class
kind = :class
elsif scan RE::Id
kind = :constant
elsif scan RE::PseudoClass
kind = :pseudo_class
elsif match = scan(RE::AttributeSelector)
# TODO: Improve highlighting inside of attribute selectors.
tokens << [:open, :string]
tokens << [match[0,1], :delimiter]
tokens << [match[1..-2], :content] if match.size > 2
tokens << [match[-1,1], :delimiter] if match[-1] == ?]
tokens << [:close, :string]
next
elsif match = scan(/@media/)
kind = :directive
states.push :media_before_name
end
when :block
if scan(/(?>#{RE::Ident})(?!\()/ox)
if value_expected
kind = :value
else
kind = :key
end
end
when :media_before_name
if scan RE::Ident
kind = :type
states[-1] = :media_after_name
end
when :media_after_name
if scan(/\{/)
kind = :operator
states[-1] = :media
end
when :comment
if scan(/(?:[^*\s]|\*(?!\/))+/)
kind = :comment
elsif scan(/\*\//)
kind = :comment
states.pop
elsif scan(/\s+/)
kind = :space
end
else
raise_inspect 'Unknown state', tokens
end
elsif scan(/\/\*/)
kind = :comment
states.push :comment
elsif scan(/\{/)
value_expected = false
kind = :operator
states.push :block
elsif scan(/\}/)
value_expected = false
if states.last == :block || states.last == :media
kind = :operator
states.pop
else
kind = :error
end
elsif match = scan(/#{RE::String}/o)
tokens << [:open, :string]
tokens << [match[0, 1], :delimiter]
tokens << [match[1..-2], :content] if match.size > 2
tokens << [match[-1, 1], :delimiter] if match.size >= 2
tokens << [:close, :string]
next
elsif match = scan(/#{RE::Function}/o)
tokens << [:open, :string]
start = match[/^\w+\(/]
tokens << [start, :delimiter]
if match[-1] == ?)
tokens << [match[start.size..-2], :content]
tokens << [')', :delimiter]
else
tokens << [match[start.size..-1], :content]
end
tokens << [:close, :string]
next
elsif scan(/(?: #{RE::Dimension} | #{RE::Percentage} | #{RE::Num} )/ox)
kind = :float
elsif scan(/#{RE::Color}/o)
kind = :color
elsif scan(/! *important/)
kind = :important
elsif scan(/rgb\([^()\n]*\)?/)
kind = :color
elsif scan(/#{RE::AtKeyword}/o)
kind = :directive
elsif match = scan(/ [+>:;,.=()\/] /x)
if match == ':'
value_expected = true
elsif match == ';'
value_expected = false
end
kind = :operator
else
getch
kind = :error
end
match ||= matched
if $DEBUG and not kind
raise_inspect 'Error token %p in line %d' %
[[match, kind], line], tokens
end
raise_inspect 'Empty token', tokens unless match
tokens << [match, kind]
end
tokens
end
end
end
end
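The `RE` module of the CSS scanner builds larger patterns by interpolating smaller ones. A reduced sketch of that composition, using a few of the same fragments, is shown below. `css_value_kind` is a hypothetical helper that anchors the patterns to the whole string, which the scanner itself does not do.

```ruby
# Hypothetical helper: classifies a complete CSS value using patterns
# composed the same way as in the RE module.
def css_value_kind(text)
  hex       = /[0-9a-fA-F]/
  hex_color = /#(?:#{hex}{6}|#{hex}{3})/
  num       = /-?(?:[0-9]+|[0-9]*\.[0-9]+)/
  unit      = Regexp.union(*%w[em ex px in cm mm pt pc])
  dimension = /#{num}#{unit}/

  # Dimension is tried before the bare number, as in the scanner.
  case text
  when /\A#{dimension}\z/ then :dimension
  when /\A#{hex_color}\z/ then :color
  when /\A#{num}\z/       then :number
  end
end

kinds = ['12px', '1.5em', '#fff', '-0.5', 'red'].map { |v| css_value_kind(v) }
```

Interpolating a Regexp into another wraps it in a non-capturing group, so a quantifier such as `{6}` written after the interpolation applies to the whole fragment.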
module CodeRay
module Scanners
class Diff < Scanner
register_for :diff
title 'diff output'
def scan_tokens tokens, options
line_kind = nil
state = :initial
until eos?
kind = match = nil
if match = scan(/\n/)
if line_kind
tokens << [:end_line, line_kind]
line_kind = nil
end
tokens << [match, :space]
next
end
case state
when :initial
if match = scan(/--- |\+\+\+ |=+|_+/)
tokens << [:begin_line, line_kind = :head]
tokens << [match, :head]
next unless match = scan(/.+/)
kind = :plain
elsif match = scan(/Index: |Property changes on: /)
tokens << [:begin_line, line_kind = :head]
tokens << [match, :head]
next unless match = scan(/.+/)
kind = :plain
elsif match = scan(/Added: /)
tokens << [:begin_line, line_kind = :head]
tokens << [match, :head]
next unless match = scan(/.+/)
kind = :plain
state = :added
elsif match = scan(/\\ /)
tokens << [:begin_line, line_kind = :change]
tokens << [match, :change]
next unless match = scan(/.+/)
kind = :plain
elsif scan(/(@@)((?>[^@\n]*))(@@)/)
tokens << [:begin_line, line_kind = :change]
tokens << [self[1], :change]
tokens << [self[2], :plain]
tokens << [self[3], :change]
next unless match = scan(/.+/)
kind = :plain
elsif match = scan(/\+/)
tokens << [:begin_line, line_kind = :insert]
tokens << [match, :insert]
next unless match = scan(/.+/)
kind = :plain
elsif match = scan(/-/)
tokens << [:begin_line, line_kind = :delete]
tokens << [match, :delete]
next unless match = scan(/.+/)
kind = :plain
elsif scan(/ .*/)
kind = :comment
elsif scan(/.+/)
tokens << [:begin_line, line_kind = :head]
kind = :plain
else
raise_inspect 'else case reached', tokens
end
when :added
if match = scan(/ \+/)
tokens << [:begin_line, line_kind = :insert]
tokens << [match, :insert]
next unless match = scan(/.+/)
kind = :plain
else
state = :initial
next
end
end
match ||= matched
if $DEBUG and not kind
raise_inspect 'Error token %p in line %d' %
[[match, kind], line], tokens
end
raise_inspect 'Empty token', tokens unless match
tokens << [match, kind]
end
tokens << [:end_line, line_kind] if line_kind
tokens
end
end
end
end
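In its `:initial` state the diff scanner decides the kind of a line from its first characters, testing the alternatives in a fixed order. The sketch below reduces that decision to a function on a single line of text. It is a simplification under stated assumptions: `diff_line_kind` is a hypothetical name, the `Added:` state is left out, and no tokens are emitted.

```ruby
# Hypothetical helper: returns the line kind the scanner would open a line
# with, based on the line's prefix.
def diff_line_kind(line)
  case line
  when /\A(?:--- |\+\+\+ |=+|_+)/, /\A(?:Index: |Property changes on: )/
    :head
  when /\A\\ /, /\A@@[^@\n]*@@/
    :change
  when /\A\+/
    :insert
  when /\A-/
    :delete
  when /\A /
    :comment # unchanged context line
  else
    :head
  end
end

sample = ['--- a/tokens.rb', '+++ b/tokens.rb', '@@ -1,2 +1,3 @@', ' context', '-old', '+new']
kinds = sample.map { |line| diff_line_kind(line) }
```

The order matters: the file header prefixes `--- ` and `+++ ` have to be tested before the single `-` and `+`, otherwise header lines would be reported as deletions and insertions.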
module CodeRay
module Scanners
load :java
class Groovy < Java
include Streamable
register_for :groovy
# TODO: Check this!
GROOVY_KEYWORDS = %w[
as assert def in
]
KEYWORDS_EXPECTING_VALUE = WordList.new.add %w[
case instanceof new return throw typeof while as assert in
]
GROOVY_MAGIC_VARIABLES = %w[ it ]
IDENT_KIND = Java::IDENT_KIND.dup.
add(GROOVY_KEYWORDS, :keyword).
add(GROOVY_MAGIC_VARIABLES, :local_variable)
ESCAPE = / [bfnrtv$\n\\'"] | x[a-fA-F0-9]{1,2} | [0-7]{1,3} /x
UNICODE_ESCAPE = / u[a-fA-F0-9]{4} /x # no 4-byte unicode chars? U[a-fA-F0-9]{8}
REGEXP_ESCAPE = / [bfnrtv\n\\'"] | x[a-fA-F0-9]{1,2} | [0-7]{1,3} | \d | [bBdDsSwW\/] /x
# TODO: interpretation inside ', ", /
STRING_CONTENT_PATTERN = {
"'" => /(?>\\[^\\'\n]+|[^\\'\n]+)+/,
'"' => /[^\\$"\n]+/,
"'''" => /(?>[^\\']+|'(?!''))+/,
'"""' => /(?>[^\\$"]+|"(?!""))+/,
'/' => /[^\\$\/\n]+/,
}
def scan_tokens tokens, options
state = :initial
inline_block_stack = []
inline_block_paren_depth = nil
string_delimiter = nil
import_clause = class_name_follows = last_token = after_def = false
value_expected = true
until eos?
kind = nil
match = nil
case state
when :initial
if match = scan(/ \s+ | \\\n /x)
tokens << [match, :space]
if match.index ?\n
import_clause = after_def = false
value_expected = true unless value_expected
end
next
elsif scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx)
value_expected = true
after_def = false
kind = :comment
elsif bol? && scan(/ \#!.* /x)
kind = :doctype
elsif import_clause && scan(/ (?!as) #{IDENT} (?: \. #{IDENT} )* (?: \.\* )? /ox)
after_def = value_expected = false
kind = :include
elsif match = scan(/ #{IDENT} | \[\] /ox)
kind = IDENT_KIND[match]
value_expected = (kind == :keyword) && KEYWORDS_EXPECTING_VALUE[match]
if last_token == '.'
kind = :ident
elsif class_name_follows
kind = :class
class_name_follows = false
elsif after_def && check(/\s*[({]/)
kind = :method
after_def = false
elsif kind == :ident && last_token != '?' && check(/:/)
kind = :key
else
class_name_follows = true if match == 'class' || (import_clause && match == 'as')
import_clause = match == 'import'
after_def = true if match == 'def'
end
elsif scan(/;/)
import_clause = after_def = false
value_expected = true
kind = :operator
elsif scan(/\{/)
class_name_follows = after_def = false
value_expected = true
kind = :operator
if !inline_block_stack.empty?
inline_block_paren_depth += 1
end
# TODO: ~'...', ~"..." and ~/.../ style regexps
elsif match = scan(/ \.\.<? | \*?\.(?!\d)@? | \.& | \?:? | [,?:(\[] | -[->] | \+\+ |
&& | \|\| | \*\*=? | ==?~ | <=?>? | [-+*%^~&|>=!]=? | <<<?=? | >>>?=? /x)
value_expected = true
value_expected = :regexp if match == '~'
after_def = false
kind = :operator
elsif match = scan(/ [)\]}] /x)
value_expected = after_def = false
if !inline_block_stack.empty? && match == '}'
inline_block_paren_depth -= 1
if inline_block_paren_depth == 0 # closing brace of inline block reached
tokens << [match, :inline_delimiter]
tokens << [:close, :inline]
state, string_delimiter, inline_block_paren_depth = inline_block_stack.pop
next
end
end
elsif check(/[\d.]/)
after_def = value_expected = false
if scan(/0[xX][0-9A-Fa-f]+/)
kind = :hex
elsif scan(/(?>0[0-7]+)(?![89.eEfF])/)
kind = :oct
elsif scan(/\d+[fFdD]|\d*\.\d+(?:[eE][+-]?\d+)?[fFdD]?|\d+[eE][+-]?\d+[fFdD]?/)
kind = :float
elsif scan(/\d+[lLgG]?/)
kind = :integer
end
elsif match = scan(/'''|"""/)
after_def = value_expected = false
state = :multiline_string
tokens << [:open, :string]
string_delimiter = match
kind = :delimiter
# TODO: record.'name'
elsif match = scan(/["']/)
after_def = value_expected = false
state = match == '/' ? :regexp : :string
tokens << [:open, state]
string_delimiter = match
kind = :delimiter
elsif value_expected && (match = scan(/\//))
after_def = value_expected = false
tokens << [:open, :regexp]
state = :regexp
string_delimiter = '/'
kind = :delimiter
elsif scan(/ @ #{IDENT} /ox)
after_def = value_expected = false
kind = :annotation
elsif scan(/\//)
after_def = false
value_expected = true
kind = :operator
else
getch
kind = :error
end
when :string, :regexp, :multiline_string
if scan(STRING_CONTENT_PATTERN[string_delimiter])
kind = :content
elsif match = scan(state == :multiline_string ? /'''|"""/ : /["'\/]/)
tokens << [match, :delimiter]
if state == :regexp
# TODO: regexp modifiers? s, m, x, i?
modifiers = scan(/[ix]+/)
tokens << [modifiers, :modifier] if modifiers && !modifiers.empty?
end
state = :string if state == :multiline_string
tokens << [:close, state]
string_delimiter = nil
after_def = value_expected = false
state = :initial
next
elsif (state == :string || state == :multiline_string) &&
(match = scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox))
if string_delimiter[0] == ?' && !(match == "\\\\" || match == "\\'")
kind = :content
else
kind = :char
end
elsif state == :regexp && scan(/ \\ (?: #{REGEXP_ESCAPE} | #{UNICODE_ESCAPE} ) /mox)
kind = :char
elsif match = scan(/ \$ #{IDENT} /mox)
tokens << [:open, :inline]
tokens << ['$', :inline_delimiter]
match = match[1..-1]
tokens << [match, IDENT_KIND[match]]
tokens << [:close, :inline]
next
elsif match = scan(/ \$ \{ /x)
tokens << [:open, :inline]
tokens << ['${', :inline_delimiter]
inline_block_stack << [state, string_delimiter, inline_block_paren_depth]
inline_block_paren_depth = 1
state = :initial
next
elsif scan(/ \$ /mx)
kind = :content
elsif scan(/ \\. /mx)
kind = :content
elsif scan(/ \\ | \n /x)
tokens << [:close, state]
kind = :error
after_def = value_expected = false
state = :initial
else
raise_inspect "else case \" reached; %p not handled." % peek(1), tokens
end
else
raise_inspect 'Unknown state', tokens
end
match ||= matched
if $DEBUG and not kind
raise_inspect 'Error token %p in line %d' %
[[match, kind], line], tokens
end
raise_inspect 'Empty token', tokens unless match
last_token = match unless [:space, :comment, :doctype].include? kind
tokens << [match, kind]
end
if [:multiline_string, :string, :regexp].include? state
tokens << [:close, state]
end
tokens
end
end
end
end
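The Groovy scanner tracks `inline_block_paren_depth` so that it can tell which `}` ends a `${...}` block inside a string. The depth counting on its own can be sketched as follows. `split_gstring` is a hypothetical helper that only separates literal text from embedded code. It does no highlighting and ignores braces inside nested string literals, which the real scanner handles through its state stack.

```ruby
require 'strscan'

# Hypothetical helper: splits the body of a Groovy double-quoted string
# into [:content, text] and [:inline, code] parts.
def split_gstring(body)
  scanner = StringScanner.new(body)
  parts = []
  until scanner.eos?
    if scanner.scan(/\$\{/)
      depth = 1
      code = String.new
      until scanner.eos?
        char = scanner.getch
        depth += 1 if char == '{'
        depth -= 1 if char == '}'
        break if depth.zero? # the brace that closes the inline block
        code << char
      end
      parts << [:inline, code]
    elsif (text = scanner.scan(/[^$]+|\$/))
      parts << [:content, text]
    end
  end
  parts
end

parts = split_gstring('a ${ [1].collect { it } } b')
```

The inner `{ it }` raises the depth to 2 and lowers it back to 1, so only the last brace ends the inline block.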
module CodeRay
module Scanners
class Java < Scanner
include Streamable
register_for :java
helper :builtin_types
# http://java.sun.com/docs/books/tutorial/java/nutsandbolts/_keywords.html
KEYWORDS = %w[
assert break case catch continue default do else
finally for if instanceof import new package
return switch throw try typeof while
debugger export
]
RESERVED = %w[ const goto ]
CONSTANTS = %w[ false null true ]
MAGIC_VARIABLES = %w[ this super ]
TYPES = %w[
boolean byte char class double enum float int interface long
short void
] << '[]' # String[] should be highlighted as a type
DIRECTIVES = %w[
abstract extends final implements native private protected public
static strictfp synchronized throws transient volatile
]
IDENT_KIND = WordList.new(:ident).
add(KEYWORDS, :keyword).
add(RESERVED, :reserved).
add(CONSTANTS, :pre_constant).
add(MAGIC_VARIABLES, :local_variable).
add(TYPES, :type).
add(BuiltinTypes::List, :pre_type).
add(BuiltinTypes::List.select { |builtin| builtin[/(Error|Exception)$/] }, :exception).
add(DIRECTIVES, :directive)
ESCAPE = / [bfnrtv\n\\'"] | x[a-fA-F0-9]{1,2} | [0-7]{1,3} /x
UNICODE_ESCAPE = / u[a-fA-F0-9]{4} | U[a-fA-F0-9]{8} /x
STRING_CONTENT_PATTERN = {
"'" => /[^\\']+/,
'"' => /[^\\"]+/,
'/' => /[^\\\/]+/,
}
IDENT = /[a-zA-Z_][A-Za-z_0-9]*/
def scan_tokens tokens, options
state = :initial
string_delimiter = nil
import_clause = class_name_follows = last_token_dot = false
until eos?
kind = nil
match = nil
case state
when :initial
if match = scan(/ \s+ | \\\n /x)
tokens << [match, :space]
next
elsif match = scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx)
tokens << [match, :comment]
next
elsif import_clause && scan(/ #{IDENT} (?: \. #{IDENT} )* /ox)
kind = :include
elsif match = scan(/ #{IDENT} | \[\] /ox)
kind = IDENT_KIND[match]
if last_token_dot
kind = :ident
elsif class_name_follows
kind = :class
class_name_follows = false
else
import_clause = true if match == 'import'
class_name_follows = true if match == 'class' || match == 'interface'
end
elsif scan(/ \.(?!\d) | [,?:()\[\]}] | -- | \+\+ | && | \|\| | \*\*=? | [-+*\/%^~&|<>=!]=? | <<<?=? | >>>?=? /x)
kind = :operator
elsif scan(/;/)
import_clause = false
kind = :operator
elsif scan(/\{/)
class_name_follows = false
kind = :operator
elsif check(/[\d.]/)
if scan(/0[xX][0-9A-Fa-f]+/)
kind = :hex
elsif scan(/(?>0[0-7]+)(?![89.eEfF])/)
kind = :oct
elsif scan(/\d+[fFdD]|\d*\.\d+(?:[eE][+-]?\d+)?[fFdD]?|\d+[eE][+-]?\d+[fFdD]?/)
kind = :float
elsif scan(/\d+[lL]?/)
kind = :integer
end
elsif match = scan(/["']/)
tokens << [:open, :string]
state = :string
string_delimiter = match
kind = :delimiter
elsif scan(/ @ #{IDENT} /ox)
kind = :annotation
else
getch
kind = :error
end
when :string
if scan(STRING_CONTENT_PATTERN[string_delimiter])
kind = :content
elsif match = scan(/["'\/]/)
tokens << [match, :delimiter]
tokens << [:close, state]
string_delimiter = nil
state = :initial
next
elsif state == :string && (match = scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox))
if string_delimiter == "'" && !(match == "\\\\" || match == "\\'")
kind = :content
else
kind = :char
end
elsif scan(/\\./m)
kind = :content
elsif scan(/ \\ | $ /x)
tokens << [:close, state]
kind = :error
state = :initial
else
raise_inspect "else case \" reached; %p not handled." % peek(1), tokens
end
else
raise_inspect 'Unknown state', tokens
end
match ||= matched
if $DEBUG and not kind
raise_inspect 'Error token %p in line %d' %
[[match, kind], line], tokens
end
raise_inspect 'Empty token', tokens unless match
last_token_dot = match == '.'
tokens << [match, kind]
end
if state == :string
tokens << [:close, state]
end
tokens
end
end
end
end
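The Java scanner classifies numeric literals by trying four patterns in a fixed order: hex, octal, float, integer. The sketch below copies those patterns into a standalone function to show why the order and the lookahead matter. `java_number_kind` is a hypothetical name, and the function looks only at the start of the given text.

```ruby
require 'strscan'

# Hypothetical helper: returns the kind of the numeric literal at the
# start of text, trying the patterns in the scanner's order.
def java_number_kind(text)
  scanner = StringScanner.new(text)
  if scanner.scan(/0[xX][0-9A-Fa-f]+/)
    :hex
  elsif scanner.scan(/(?>0[0-7]+)(?![89.eEfF])/)
    :oct
  elsif scanner.scan(/\d+[fFdD]|\d*\.\d+(?:[eE][+-]?\d+)?[fFdD]?|\d+[eE][+-]?\d+[fFdD]?/)
    :float
  elsif scanner.scan(/\d+[lL]?/)
    :integer
  end
end

kinds = %w[0xFF 017 3.14 1e10 42L 019].map { |literal| java_number_kind(literal) }
```

The atomic group `(?>...)` combined with the negative lookahead keeps `019` from being read as the octal `01` followed by `9`. The octal pattern fails as a whole and the literal falls through to the integer pattern.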
module CodeRay
module Scanners
class JSON < Scanner
include Streamable
register_for :json
file_extension 'json'
KINDS_NOT_LOC = [
:float, :char, :content, :delimiter,
:error, :integer, :operator, :value,
]
CONSTANTS = %w( true false null )
IDENT_KIND = WordList.new(:key).add(CONSTANTS, :value)
ESCAPE = / [bfnrt\\"\/] /x
UNICODE_ESCAPE = / u[a-fA-F0-9]{4} /x
def scan_tokens tokens, options
state = :initial
stack = []
string_delimiter = nil
key_expected = false
until eos?
kind = nil
match = nil
case state
when :initial
if match = scan(/ \s+ | \\\n /x)
tokens << [match, :space]
next
elsif match = scan(/ [:,\[{\]}] /x)
kind = :operator
case match
when '{' then stack << :object; key_expected = true
when '[' then stack << :array
when ':' then key_expected = false
when ',' then key_expected = true if stack.last == :object
when '}', ']' then stack.pop # no error recovery, but works for valid JSON
end
elsif match = scan(/ true | false | null /x)
kind = IDENT_KIND[match]
elsif match = scan(/-?(?:0|[1-9]\d*)/)
kind = :integer
if scan(/\.\d+(?:[eE][-+]?\d+)?|[eE][-+]?\d+/)
match << matched
kind = :float
end
elsif match = scan(/"/)
state = key_expected ? :key : :string
tokens << [:open, state]
kind = :delimiter
else
getch
kind = :error
end
when :string, :key
if scan(/[^\\"]+/)
kind = :content
elsif scan(/"/)
tokens << ['"', :delimiter]
tokens << [:close, state]
state = :initial
next
elsif scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox)
kind = :char
elsif scan(/\\./m)
kind = :content
elsif scan(/ \\ | $ /x)
tokens << [:close, state]
kind = :error
state = :initial
else
raise_inspect "else case \" reached; %p not handled." % peek(1), tokens
end
else
raise_inspect 'Unknown state', tokens
end
match ||= matched
if $DEBUG and not kind
raise_inspect 'Error token %p in line %d' %
[[match, kind], line], tokens
end
raise_inspect 'Empty token', tokens unless match
tokens << [match, kind]
end
if [:string, :key].include? state
tokens << [:close, state]
end
tokens
end
end
end
end
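The JSON scanner decides whether a string is an object key or a value by keeping a stack of open containers and a `key_expected` flag. The sketch below isolates that bookkeeping. It is a reduced illustration, not the scanner: `json_string_roles` is a hypothetical helper that returns only the strings with their roles and skips everything else.

```ruby
require 'strscan'

# Hypothetical helper: labels each string in a JSON document as :key or
# :string, using the same stack and flag updates as the scanner.
def json_string_roles(source)
  scanner = StringScanner.new(source)
  stack = []
  key_expected = false
  roles = []
  until scanner.eos?
    if scanner.scan(/\s+/)
      next
    elsif (op = scanner.scan(/[:,\[{\]}]/))
      case op
      when '{' then stack << :object; key_expected = true
      when '[' then stack << :array
      when ':' then key_expected = false
      when ',' then key_expected = true if stack.last == :object
      when '}', ']' then stack.pop
      end
    elsif (string = scanner.scan(/"(?:[^\\"]|\\.)*"/))
      roles << [key_expected ? :key : :string, string]
    else
      # Numbers and literals are not interesting here; skip them.
      scanner.scan(/[^\s:,\[{\]}"]+/) || scanner.getch
    end
  end
  roles
end

roles = json_string_roles('{"a": ["b", {"c": "d"}], "e": "f"}')
```

A comma only switches back to key mode when the innermost open container is an object, which is why `"b"` inside the array stays a plain string.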
This diff is collapsed.