some text this is bold! more text
" - # # The element 'p' has two text elements, "some text " and " more text". - # doc.root.text #-> "some text " - def text( path = nil ) - rv = get_text(path) - return rv.value unless rv.nil? - nil - end - - # Returns the first child Text node, if any, or +nil+ otherwise. - # This method returns the actual +Text+ node, rather than the String content. - # doc = Document.new "some text this is bold! more text
" - # # The element 'p' has two text elements, "some text " and " more text". - # doc.root.get_text.value #-> "some text " - def get_text path = nil - rv = nil - if path - element = @elements[ path ] - rv = element.get_text unless element.nil? - else - rv = @children.find { |node| node.kind_of? Text } - end - return rv - end - - # Sets the first Text child of this object. See text() for a - # discussion about Text children. - # - # If a Text child already exists, the child is replaced by this - # content. This means that Text content can be deleted by calling - # this method with a nil argument. In this case, the next Text - # child becomes the first Text child. In no case is the order of - # any siblings disturbed. - # text:: - # If a String, a new Text child is created and added to - # this Element as the first Text child. If Text, the text is set - # as the first Child element. If nil, then any existing first Text - # child is removed. - # Returns:: this Element. - # doc = Document.new '' - # doc.root.text = 'Sean' #-> 'Sean' - # doc.root.text = 'Elliott' #-> 'Elliott' - # doc.root.add_element 'c' #-> 'ElliottREXML is a conformant XML processor for the Ruby programming - language. REXML passes 100% of the Oasis non-validating tests and - includes full XPath support. It is reasonably fast, and is implemented - in pure Ruby. Best of all, it has a clean, intuitive API. REXML is - included in the standard library of Ruby
- -This software is distributed under the Ruby - license.
-REXML arose out of a desire for a straightforward XML API, and is an - attempt at an API that doesn't require constant referencing of - documentation to do common tasks. "Keep the common case simple, and the - uncommon, possible."
- -REXML avoids The DOM API, which violates the maxim of simplicity. It - does provide a DOM model, but one that is Ruby-ized. It is an - XML API oriented for Ruby programmers, not for XML programmers coming - from Java.
- -Some of the common differences are that the Ruby API relies on block - enumerations, rather than iterators. For example, the Java code:
- -in Ruby becomes:
- -Can't you feel the peace and contentment in this block of code? Ruby - is the language Buddha would have programmed in.
- -One last thing. If you use and like this software, and you're in a - position of power in a company in Western Europe and are looking for a - software architect or developer, drop me a line. I took a lot of French - classes in college (all of which I've forgotten), and I lived in Munich - long enough that I was pretty fluent by the time I left, and I'd love to - get back over there.
-You don't have to install anything; if you're running a
- version of Ruby greater than 1.8, REXML is included. However, if you
- choose to upgrade from the REXML distribution, run the command:
- ruby bin/install.rb
. By the way, you really should look at
- these sorts of files before you run them as root. They could contain
- anything, and since (in Ruby, at least) they tend to be mercifully
- short, it doesn't hurt to glance over them. If you want to uninstall
- REXML, run ruby bin/install.rb -u
.
If you have Test::Unit installed, you can run the unit test cases.
- Run the command: ruby bin/suite.rb
; it runs against the
- distribution, not against the installed version.
There is a benchmark suite in benchmarks/
. To run the
- benchmarks, change into that directory and run ruby
- comparison.rb
. If you have nothing else installed, only the
- benchmarks for REXML will be run. However, if you have any of the
- following installed, benchmarks for those tools will also be run:
EXML.jar
into the
- benchmarks
directory and compile
- flatbench.java
before running the test)The results will be written to index.html
.
Please see the Tutorial.
- -The API documentation is available on-line, - or it can be downloaded as an archive in - tgz format (~70Kb) or (if you're a masochist) in - zip format (~280Kb). The best solution is to download and install - Dave Thomas' most excellent rdoc and generate the API docs - yourself; then you'll be sure to have the latest API docs and won't have - to keep downloading the doc archive.
- -The unit tests in test/
and the benchmarking code in
- benchmark/
provide additional examples of using REXML. The
- Tutorial provides examples with commentary. The documentation unpacks
- into rexml/doc
.
Kouhei Sutou maintains a Japanese - version of the REXML API docs. Kou's - documentation page contains links to binary archives for various - versions of the documentation.
-Unfortunately, NQXML is the only package REXML can be compared - against; XMLParser uses expat, which is a native library, and really is - a different beast altogether. So in comparing NQXML and REXML you can - look at four things: speed, size, completeness, and API.
- -Benchmarks
- -REXML is faster than NQXML in some things, and slower than NQXML in a
- couple of things. You can see this for yourself by running the supplied
- benchmarks. Most of the places where REXML is slower are because of the
- convenience methods. element.elements[index]
isn't really an array operation;
- index can be an Integer or an XPath, and this feature is relatively time
- expensive.
The sizes of the XML parsers are closeruby -nle 'print unless /^\s*(#.*|)$/' *.rb | wc -l
-
REXML is a conformant XML 1.0 parser. It supports multiple language - encodings, and internal processing uses the required UTF-8 and UTF-16 - encodings. It passes 100% of the Oasis non-validating tests. - Furthermore, it provides a full implementation of XPath, a SAX2 and a - PullParser API.
-As of release 2.0, XPath 1.0 is fully implemented.
- -I fully expect bugs to crop up from time to time, so if you see any - bogus XPath results, please let me know. That said, since I'm now - following the XPath grammar and spec fairly closely, I suspect that you - won't be surprised by REXML's XPath very often, and it should become - rock solid fairly quickly.
- -Check the "bugs" section for known problems; there are little bits of - XPath here and there that are not yet implemented, but I'll get to them - soon.
- -Namespace support is rather odd, but it isn't my fault. I can only do - so much and still conform to the specs. In particular, XPath attempts to - help as much as possible. Therefore, in the trivial cases, you can pass - namespace prefixes to Element.elements[...] and so on -- in these cases, - XPath will use the namespace environment of the base element you're - starting your XPath search from. However, if you want to do something - more complex, like pass in your own namespace environment, you have to - use the XPath first(), each(), and match() methods. Also, default - namespaces force you to use the XPath methods, rather than the - convenience methods, because there is no way for XPath to know what the - mappings for the default namespaces should be. This is exactly why I - loathe namespaces -- a pox on the person(s) who thought them up!
-Namespace support is now fairly stable. One thing to be aware of is - that REXML is not (yet) a validating parser. This means that some - invalid namespace declarations are not caught.
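As a rough illustration of the point about default namespaces, the sketch below passes an explicit prefix-to-URI mapping to the XPath methods. The document, the URI `urn:example:catalog`, and the prefix `c` are all made up for this example; only `REXML::XPath.first` and `REXML::XPath.match` come from the library.

```ruby
require "rexml/document"

# Hypothetical document whose elements live in a default namespace.
doc = REXML::Document.new(<<~XML)
  <catalog xmlns="urn:example:catalog">
    <book title="Ruby"/>
  </catalog>
XML

# The prefix "c" is our own choice; only the URI has to match the document.
namespaces = { "c" => "urn:example:catalog" }

book  = REXML::XPath.first(doc, "//c:book", namespaces)
books = REXML::XPath.match(doc, "//c:book", namespaces)

puts book.name
puts book.attributes["title"]
puts books.size
```

The mapping is supplied by the caller because, as noted above, XPath has no way to guess which prefix should stand for a default namespace.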
-There is a low-volume mailing list dedicated to REXML. To subscribe, - send an empty email to ser-rexml-subscribe@germane-software.com. - This list is more or less spam proof. To unsubscribe, similarly send a - message to ser-rexml-unsubscribe@germane-software.com.
-An RSS - file for REXML is now being generated from the change log. This - allows you to be alerted of bug fixes and feature additions via "pull". - Another - RSS is available which contains a single item: the release notice - for the most recent release. This is an abuse of the RSS - mechanism, which was intended to be a distribution system for headlines - linked back to full articles, but it works. The headline for REXML is - the version number, and the description is the change log. The links all - link back to the REXML home page. The URL for the RSS itself is - http://www.germane-software.com/software/rexml/rss.xml.
- -The changelog itself is here.
- -For those who are interested, there's a SLOCCount (by David A. Wheeler) file - with stats on the REXML sourcecode. Note that the SLOCCount output - includes the files in the test/, benchmarks/, and bin/ directories, as - well as the main sourcecode for REXML itself.
-You can submit bug reports and feature requests, and view the list of - known bugs, at the REXML bug report - page. Please do submit bug reports. If you really want your bug - fixed fast, include a runit or Test::Unit method (or methods) that - illustrates the problem. At the very least, send me some XML that REXML - doesn't process properly.
- -You don't have to send an entire test suite -- just the unit test - methods. If you don't send me a unit test, I'll have to write one - myself, which will mean that your bug will take longer to fix.
- -When submitting bug reports, please include the version of Ruby and
- of REXML that you're using, and the operating system you're running on.
- Just run: ruby -vrrexml/rexml -e 'p
- REXML::VERSION,PLATFORM'
and paste the results in your bug
- report. Include your email if you want a response about the bug.
REXML is hanging while parsing one of my XML files.- - Your XML is probably malformed. Some malformed XML, especially XML that - contains literal '<' embedded in the document, causes REXML to hang. - REXML should be throwing an exception, but it doesn't; this is a bug. I'm - aware that it is an extremely annoying bug, and it is one I'm trying to - solve in a way that doesn't significantly reduce REXML's parsing - speed. - -
I'm using the XPath '//foo' on an XML branch node X, and keep getting - all of the 'foo' elements in the entire document. Why? Shouldn't it return - only the 'foo' element descendants of X?- - No. XPath specifies that '/' returns the document root, regardless of - the context node. '//' also starts at the document root. If you want to - limit your search to a branch, you need to use the self:: axis, e.g. - 'self::node()//foo', or the shorthand './/foo'. - -
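A small sketch of the difference between '//foo' and './/foo' when the search starts from a branch node. The document and the variable names are invented for illustration.

```ruby
require "rexml/document"

# Hypothetical document with one 'foo' under each of two branches.
doc = REXML::Document.new(
  "<root><x><foo id='1'/></x><y><foo id='2'/></y></root>"
)
branch = doc.root.elements["x"]

# '//' starts at the document root, even though the context node is <x>.
whole_document = REXML::XPath.match(branch, "//foo")

# './/' limits the search to descendants of the context node.
branch_only = REXML::XPath.match(branch, ".//foo")

puts whole_document.size
puts branch_only.size
```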
I want to parse a document both as a tree, and as a stream. Can I do - this?- - Yes, and no. There is no mechanism that directly supports this in - REXML. However, aside from writing your own traversal layer, there is a - way of doing this. To turn a tree into a stream, just turn the branch you - want to process as a stream back into a string, and re-parse it with your - preferred API. EG: pp = PullParser.new( some_element.to_s ). The other - direction is more difficult; you basically have to build a tree from the - events. REXML will have one of these builders, eventually, but it doesn't - currently exist. - -
Why is Element.elements indexed off of '1' instead of '0'?- - Because of XPath. The XPath specification states that the index of the - first child node is '1'. Although it may be counter-intuitive to base - elements on 1, it is more undesirable to have element.elements[0] == - element.elements[ 'node()[1]' ]. Since I can't change the XPath - specification, the result is that Element.elements[1] is the first child - element. - -
Why isn't REXML a validating parser?- - Because validating parsers must include code that parses and interprets - DTDs. I hate DTDs. REXML supports the barest minimum of DTD parsing, and - even that isn't complete. There is DTD parsing code in the works, but I - only work on it when I'm really, really bored. Rumor has it that a - contributor is working on a DTD parser for REXML; rest assured that any - such contribution will be included with REXML as soon as it is - available. - -
I'm trying to create an ISO-8859-1 document, but when I add text to the - document it isn't being properly encoded.- - Regardless of what the encoding of your document is, when you add text - programmatically to a REXML document you must ensure that you are - only adding UTF-8 to the tree. In particular, you can't add ISO-8859-1 - encoded text that contains characters above 0x80 to REXML trees -- you - must convert it to UTF-8 before doing so. Luckily, this is easy: -
text.unpack('C*').pack('U*')
will do the trick. 7-bit ASCII
- is identical to UTF-8, so you probably won't need to worry about this.
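The conversion above can be sketched as follows. The sample bytes are an assumption chosen for illustration, and the `String#encode` line is an alternative available in current Ruby rather than something this FAQ describes.

```ruby
require "rexml/document"

# "caf\xE9" holds the ISO-8859-1 bytes for "café" (0xE9 is e-acute).
latin1 = "caf\xE9".b

# Each byte becomes a Unicode code point, which pack("U*") writes out as UTF-8.
utf8 = latin1.unpack("C*").pack("U*")

# Alternative in current Ruby: label the bytes, then transcode them.
alt = latin1.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

element = REXML::Element.new("p")
element.text = utf8
puts element.text
```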
-
- How do I get the tag name of an Element?- - You take a look at the APIs, and notice that
Element
- includes Namespace
. Then you click on the
- Namespace
link and look at the methods that
- Element
includes from Namespace
. One of these is
- name()
. Another is expanded_name()
. Yet another
- is prefix()
. Then, you email the author of rdoc and ask him
- to extend rdoc so that it lists methods in the API that are included from
- other files, so that you don't have to do all of that looking around for
- your method.
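For the impatient, a minimal sketch of the three methods mentioned above. The element and its namespace are only an example.

```ruby
require "rexml/document"

doc = REXML::Document.new("<svg:rect xmlns:svg='http://www.w3.org/2000/svg'/>")
element = doc.root

puts element.name           # local part of the tag name
puts element.prefix         # namespace prefix, empty if there is none
puts element.expanded_name  # prefix and name together
puts element.namespace      # URI the prefix is bound to
```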
- I've had help from a number of resources; if I haven't listed you here, - it means that I just haven't gotten around to adding you, or that I'm a - dork and have forgotten. In either case, feel free to write me and - complain.
- -뤤ϡWindowsȤνʤ襤 ;-(
-[ܸ / English]
- - - -ΥڡǤϡmswin32rubyۤѹΤΤ餻ԤäƤޤ
-̤˸ڡǤʤǤʤơ䤬˽ƤڡǤǤץࡦ(̵)ˤĤƤϡƼȽǤǤѤ
-䤤碌ءְäƤ¾οͤǤ褦ʤȤϤʤǤ͡
mswin32rubyȤϡ32bitWindows(Windows95Windows98WindowsMeWindows NTWindows 2000WindowsXPWindows 2003 ServerʲWindowsɽ)ưRubyΥХʥΰĤǤ
-WindowsưRubyȤƤϡߡ5ΥХʥ꤬¸ߤޤϤ줾mswin32ǡcygwinǡmingw32ǡbccwin32ǡdjgppǤȸƤФƤޤ
-줾ΰ㤤ޤȤȰʲΤ褦ˤʤޤ(¸ǧФΤ餻)
VC++ǥѥ뤵롣Windows鸫ФäȤ̡פΥХʥȸ뤬ȿ̡RubyUNIXħŪʵǽΰѤǤʤ1.7.3ʹߤmingw32Ǥȳĥ饤֥ˤĤƤϥХʥߴ롣
-RUBY_PLATFORM*-mswin32
gccǥѥ뤵졢cygwinĶư롣cygwinĶUNIX饤ʴĶWindowsǹۤΤǤΤǡcygwinrubyϰ̤UNIXѤΤΤȤƱ褦ư(ȤԤǤ)
-RUBY_PLATFORM*-cygwin
gccǥѥ뤵롣ϤۤȤmswin32Ǥȶ̤Ǥꡢ饤֥ⶦ(MSVCRT.dll)ʤΤǡư(餯)mswin32ǤȤۤƱ1.7.3ʹߤmswin32Ǥȳĥ饤֥ˤĤƤϥХʥߴ롣
-RUBY_PLATFORM*-mingw32
BC++ǥѥ뤵롣Ϥʤʬmswin32Ǥȶ̤ǤϤ뤬饤֥꤬ۤʤΤǡ٤Ȥǵưmswin32ǤȤϰۤʤ(Ϥ)1.7ʹߤǥݡȤ롣
-RUBY_PLATFORM*-bccwin32
DJGPPǥѥ뤵롣DOSѤΥХʥʤΤǡDOSǤư롣ȿ̡WindowsˤäDOSˤʤǽ¿Ȥʤ(ͥåȥϢʤ)
-RUBY_PLATFORM*-msdosdjgpp
ΥڡǤϡ嵭Τmswin32ǤΤߤäƤޤ
-ʤcygwinǡmingw32ǡdjgppǤˤĤƤϤ錄ʤ٤Ruby binariesǽǤޤbccwin32ǤˤĤƤϾRubyǽǤ
ƤΥХʥVC++ 5.0(Version 11.00.7022 for 80x86)makeΤǤrubyΤ˴ؤƤϡɸۤΥ(ޤCVSΥ)餽ΤޤƤޤĥ饤֥ˤĤƤϳơΥȤƤ
-ΥХʥzipǥ֤Ƥޤ
md5sumΥåˡǤ㤨rubyȡ뤵Ƥʤ鲼Τ褦ˡޤ
-ruby -r md5 -e "puts MD5.new(File.open('filename', 'rb').read).hexdigest"
嵭ΥХʥȡ뤹ϡߤΥǥ쥯ȥ(ʲ$TOPDIR
ȵ)ŸƤǥ쥯ȥդǰ̤ƤޤΤǡŸˤϥǥ쥯ȥդŸΤ˺줺(̣狼ʤͤϵˤʤƤǤ)
-Ÿϡ$TOPDIR\bin
PATH
̤ƤƤ
ʤʲγĥ饤֥ϡʪ˴ޤޤʤΥ饤֥˰¸Ƥޤ
-嵭ΤPDCursesGDBMOpenSSLreadlineZlibˤĤƤϡPorting Libraries to Win32˥Хʥ꤬¸ߤޤ
-IconvˤĤƤϡMeadowy.orgۤƤiconv-1.8.win32.zipѤƤޤ
-Tcl/TkˤĤƤϡActiveStateۤƤActiveTclѤƤޤ
ޤƤǤȤޤä٤!
-ruby-1.8.1-20040127ruby-1.9.0-20040126֤ޤ
-Ԥϼ1.8.1Ǥȯ줿ԶνǤԤϳȯǡ
ꥹޥ! ruby-1.8.1ޤ!
-(previewФ㤤ޤʤ...)
ruby-1.8.1-preview3֤ޤפΤ֤ۤ㤤ޤ͡
- -ruby-1.8.1-preview2֤ޤ
- -ruby-1.8.1-20031027֤ޤ
-racc-1.4.4-all֤ޤ
eruby-1.0.4vrswin-030906vruby-030906֤ޤ
- -ruby-1.8.0-20030812֤ޤ
-vrswin-030811vruby-030811֤ޤ֤ĥ饤֥1.8Ѥˤʤޤ
- -ruby-1.8.0֤ޤRuby 1.8ϺǽΥȤʤޤ
-1.6ϤѹˤĤƤϡchanges.1.8.0ʤɤ
ruby-1.8.0-preview6֤ޤȡޤƤ2previewФͤǤ :)
- -ruby-1.8.0-preview5֤ޤ餯줬1.8.0κǸpreviewˤʤǤ礦
-ruby-1.6.8-20030727֤ޤ
-ruby 1.9.0 (2004-01-13)
-ERb 2.0.4
-RDtool 0.6.11
-rublog 0.0.2
-
diff --git a/test/rexml/data/jaxen3.xml b/test/rexml/data/jaxen3.xml deleted file mode 100644 index a87723a3b9..0000000000 --- a/test/rexml/data/jaxen3.xml +++ /dev/null @@ -1,15 +0,0 @@ - -
Text placed in the public domain by Moby Lexical Tools, 1992.
-SGML markup by Jon Bosak, 1992-1994.
-XML version by Jon Bosak, 1996-1998.
-This work may be freely copied and distributed worldwide.
-Despite the uncertain legality of the Napster online music-sharing service, the number of people -using it more than quadrupled in just five months, Media Metrix said Monday.
-That made Napster the fastest-growing software application ever recorded by the Internet research -company.
-From 1.1 million home users in the United States in February, the first month Media Metrix -tracked the application, Napster use rocketed to 4.9 million users in July.
-That represents 6 percent of U.S. home PC users who have modems, said Media Metrix, which pays -people to install monitoring software on their computers.
-It estimates total usage from a panel of about 50,000 people in the United States.
-Napster was also used at work by 887,000 people in July, Media Metrix said.
-Napster Inc. has been sued by the recording industry for allegedly enabling copyright -infringement. The federal government weighed in on the case Friday, saying the service is not protected -under a key copyright law, as the San Mateo, Calif., company claims.
-Bruce Ryon, head of Media Metrix's New Media Group, said Napster was used by "the full spectrum of PC users, not just the youth with time on their hands and a passion for music."
-The Napster program allows users to copy digital music files from the hard drives of other -users over the Internet.
-Napster Inc. said last week that 28 million people had downloaded its program. It does not reveal -its own figures for how many people actually use the software.
-Because the program connects to the company's computers over the Internet every time -it is run, Napster Inc. can track usage exactly.
-__
-On the Net:
- - -This is a tutorial for using REXML, - a pure Ruby XML processor.
-REXML was inspired by the Electric XML library for Java, which - features an easy-to-use API, small size, and speed. Hopefully, REXML, - designed with the same philosophy, has these same features. I've tried - to keep the API as intuitive as possible, and have followed the Ruby - methodology for method naming and code flow, rather than mirroring the - Java API.
- -REXML supports both tree and stream document parsing. Stream parsing - is faster (about 1.5 times as fast). However, with stream parsing, you - don't get access to features such as XPath.
- -The API documentation also - contains code snippets to help you learn how to use various methods. - This tutorial serves as a starting point and quick guide to using - REXML.
- -We'll start with parsing an XML document
- -Line 3 creates a new document and parses the supplied file. You can - also do the following
- -So parsing a string is just as easy as parsing a file. For future
- examples, I'm going to omit both the require
and
- include
lines.
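The listings this passage refers to are not present here, so the following is only a sketch of both forms of parsing, not the original example. The XML content is invented, and a temporary file stands in for a file on disk.

```ruby
require "rexml/document"
require "tempfile"

xml = "<inventory title='demo'><section name='health'/></inventory>"

# Parsing a string.
from_string = REXML::Document.new(xml)

# Parsing a file: Document.new also accepts an IO object.
from_file = Tempfile.create("mydoc") do |file|
  file.write(xml)
  file.rewind
  REXML::Document.new(file)
end

puts from_string.root.name
puts from_file.root.attributes["title"]
```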
Once you have a document, you can access elements in that document - in a number of ways:
- -Element
class itself has
- each_element_with_attribute
, a common way of accessing
- elements.Element.elements
is an
- Elements
class instance which has the each
- and []
methods for accessing elements. Both methods can
- be supplied with an XPath for filtering, which makes them very
- powerful.Element
is a subclass of Parent, you can
- also access the element's children directly through the Array-like
- methods Element[], Element.each, Element.find,
- Element.delete
. This is the fastest way of accessing
- children, but note that, being a true array, XPath searches are not
- supported, and that all of the element children are contained in
- this array, not just the Element children.Here are a few examples using these methods. First is the source - document used in the examples. Save this as mydoc.xml before running - any of the examples that require it:
- -Notice the second-to-last line of code. Element children in REXML
- are indexed starting at 1, not 0. This is because XPath itself counts
- elements from 1, and REXML maintains this relationship; IE,
- root.elements['*[1]'] == root.elements[1]
. The last line
- finds the first child element with the name of "food". As you can see
- in this example, accessing attributes is also straightforward.
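Since the source document and listing are missing from this copy, here is a stand-in sketch of the access methods described above. The inventory document is a made-up substitute for mydoc.xml.

```ruby
require "rexml/document"

# Hypothetical stand-in for mydoc.xml.
doc = REXML::Document.new(<<~XML)
  <inventory>
    <section name="health"><item upc="123">Soap</item></section>
    <section name="food"><item upc="456">Bread</item><item upc="789">Milk</item></section>
  </inventory>
XML
root = doc.root

# Elements#each accepts an XPath used as a filter.
section_names = []
root.elements.each("section") { |section| section_names << section.attributes["name"] }

item_texts = []
root.elements.each("section/item") { |item| item_texts << item.text }

# Child elements are indexed from 1, matching XPath.
first = root.elements[1]
same  = root.elements["*[1]"]

# Elements#[] also takes an XPath and returns the first match.
food = root.elements["section[@name='food']"]

# Element#each_element_with_attribute filters children by attribute.
with_name = []
root.each_element_with_attribute("name", "food") { |e| with_name << e }

puts section_names.inspect
puts item_texts.inspect
puts first.attributes["name"]
```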
You can also access xpaths directly via the XPath class.
- -Another way of getting an array of matching nodes is through - Element.elements.to_a(). Although this is a method on elements, if - passed an XPath it can return an array of arbitrary objects. This is - due to the fact that XPath itself can return arbitrary nodes - (Attribute nodes, Text nodes, and Element nodes).
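A brief sketch of the XPath class methods `first`, `each`, and `match`, using an invented document. Note that `match` can return Attribute and Text nodes as well as Elements.

```ruby
require "rexml/document"

# Hypothetical document for illustration.
doc = REXML::Document.new(
  "<inventory><item upc='123'>Soap</item><item upc='456'>Bread</item></inventory>"
)

# First matching node only.
first_item = REXML::XPath.first(doc, "//item")

# Iterate over every match.
names = []
REXML::XPath.each(doc, "//item") { |item| names << item.text }

# match returns an array; the nodes here are Attribute and Text objects.
upcs  = REXML::XPath.match(doc, "//item/@upc").map(&:value)
texts = REXML::XPath.match(doc, "//item/text()").map(&:value)

puts first_item.text
puts upcs.inspect
```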
- -REXML attempts to make the common case simple, but this means that - the uncommon case can be complicated. This is especially true with - Text nodes.
- -Text nodes have a lot of behavior, and in the case of internal - entities, what you get may be different from what you expect. When - REXML reads an XML document, it parses the DTD and creates an internal - table of entities. If it finds any of these entities in the document, - it replaces them with their values:
- -When you write the document back out, REXML replaces the values - with the entity reference:
- -But there's a problem. What happens if only some of the words are - also entity reference values?
- -Well, REXML does the only thing it can:
- -This is probably not what you expect. However, when designing - REXML, I had a choice between this behavior, and using immutable text - nodes. The problem is that, if you can change the text in a node, - REXML can never tell which tokens you want to have replaced with - entities. There is a wrinkle: REXML will write what it gets in as long - as you don't access the text. This is because REXML does lazy - evaluation of entities. Therefore,
- -There is a programmatic solution: :raw
. If you set the
- :raw
flag on any Text or Element node, the entities
- within that node will not be processed. This means that you'll have to
- deal with entities yourself:
Again, there are a couple of mechanisms for creating XML documents - in REXML. Adding elements by hand is faster than the convenience - method, but which you use will probably be a matter of aesthetics.
- -If you want to add text to an element, you can do it by either
- creating Text objects and adding them to the element, or by using the
- convenience method text=
But note that each of these text objects are still stored as
- separate objects; el1.text
will return "Hello world!";
- el1[2]
will return a Text object with the contents
- "Goodbye".
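The creation listing is not included in this copy; the sketch below reconstructs the idea with invented element names. It builds a small tree by hand and shows that the two pieces of text remain separate Text children.

```ruby
require "rexml/document"

doc  = REXML::Document.new
root = doc.add_element("inventory", { "title" => "demo" })

el1 = root.add_element("section")
el1.text = "Hello world!"   # convenience method: sets the first Text child
el1.add_element("b")        # an element between the two text nodes
el1.add_text("Goodbye")     # appended as a second, separate Text child

puts el1.text          # first Text child only
puts el1[2].value      # children are: Text, Element, Text
puts doc.to_s
```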
Please be aware that all text nodes in REXML are UTF-8 encoded, and - all of your code must reflect this. You may input and output other - encodings (UTF-8, UTF-16, ISO-8859-1, and UNILE are all supported, - input and output), but within your program, you must pass REXML UTF-8 - strings.
- -I can't emphasize this enough, because people do have problems with
- this. REXML can't possibly always guess correctly how your text is
- encoded, so it always assumes the text is UTF-8. It also does not warn
- you when you try to add text which isn't properly encoded, for the
- same reason. You must make sure that you are adding UTF-8 text.
- If you're adding standard 7-bit ASCII, which is most common, you
- don't have to worry. If you're using ISO-8859-1 text (characters
- above 0x80), you must convert it to UTF-8 before adding it to an
- element. You can do this with the snippet:
- text.unpack("C*").pack("U*")
. If you ignore this warning
- and add 8-bit ASCII characters to your documents, your code may
- work... or it may not. In either case, REXML is not at fault.
- You have been warned.
One last thing: alternate encoding output support only works from - Document.write() and Document.to_s(). If you want to write out other - nodes with a particular encoding, you must wrap your output object - with Output:
- -You can pass Output any of the supported encodings.
- -If you want to insert an element between two elements, you can use
- either the standard Ruby array notation, or
- Parent.insert_before
and
- Parent.insert_after
.
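A minimal sketch of `insert_before` and `insert_after`, with invented element names. In both calls the first argument is the existing child and the second is the node to insert.

```ruby
require "rexml/document"

doc  = REXML::Document.new("<root><a/><c/></root>")
root = doc.root
c    = root.elements["c"]

# Insert <b/> before the existing <c/>, then <d/> after it.
root.insert_before(c, REXML::Element.new("b"))
root.insert_after(c, REXML::Element.new("d"))

order = root.elements.collect { |element| element.name }
puts order.inspect
```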
The raw
flag in the Text
constructor can
- be used to tell REXML to leave strings which have entities defined for
- them alone.
Note that, in all cases, the value()
method returns
- the text with entities expanded, so the raw
flag only
- affects the to_s()
method. If the raw
is set
- for a text node, then to_s()
will not
- normalize (turn into entities) entity values. You can not create raw
- text nodes that contain illegal XML, so the following will generate a
- parse error:
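The listings for this section are missing here, so the following is an illustrative sketch rather than the original example. It assumes the `Text` constructor takes the raw flag as its fourth argument, as in current REXML.

```ruby
require "rexml/document"

# Without the raw flag, to_s normalizes special characters into entities.
plain = REXML::Text.new("a < b")

# With the raw flag, the string is taken as already-escaped XML text.
raw = REXML::Text.new("a &lt; b", false, nil, true)

puts plain.to_s
puts raw.to_s
puts raw.value   # entities expanded in both cases

# Raw text must itself be legal XML, so a bare '<' is rejected.
error = nil
begin
  REXML::Text.new("a < b", false, nil, true)
rescue StandardError => e
  error = e
end
puts error.class
```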
You can also tell REXML to set the Text children of given elements - to raw automatically, on parsing or creating:
- -In this example, all tags named "tag1", "tag2", or "tag3" will have - any Text children set to raw text. If you want to have all of the text - processed as raw text, pass in the :all tag:
- -There aren't many things that are more simple than writing a REXML
- tree. Simply pass an object that supports <<( String
- )
to the write
method of any object. In Ruby, both
- IO instances (File) and String instances support <<.
If you want REXML to pretty-print output, pass write()
- an indent value greater than -1:
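A short sketch of writing a tree, using a String as the output target since it supports `<<`. The document is invented, and the exact layout of the pretty-printed form is not asserted here.

```ruby
require "rexml/document"

doc = REXML::Document.new("<a><b/></a>")

# Default: written as-is, with no added whitespace.
compact = String.new
doc.write(compact)

# An indent of 2 (anything greater than -1) turns on pretty-printing.
pretty = String.new
doc.write(pretty, 2)

puts compact
puts pretty
```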
REXML will not, by default, write out the XML declaration unless
- you specifically ask for it. If a document is read that contains an
- XML declaration, that declaration
There are four main methods of iterating over children.
- Element.each
, which iterates over all the children;
- Element.elements.each
, which iterates over just the child
- Elements; Element.next_element
and
- Element.previous_element
, which can be used to fetch the
- next Element siblings; and Element.next_sibling
and
- Element.previous_sibling
, which fetches the next and
- previous siblings, regardless of type.
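The four styles of iteration can be sketched as follows, with a small invented document that mixes text and element children.

```ruby
require "rexml/document"

doc  = REXML::Document.new("<r>one<a/>two<b/></r>")
root = doc.root

# Element#each visits every child, Text nodes included.
kinds = []
root.each { |child| kinds << child.class }

# Elements#each (and collect) visits only child Elements.
element_names = root.elements.collect { |element| element.name }

a = root.elements["a"]
b = root.elements["b"]

puts kinds.inspect
puts element_names.inspect
puts a.next_element.name       # skips the Text node "two"
puts a.next_sibling.value      # the Text node "two"
puts b.previous_element.name
puts a.previous_sibling.value  # the Text node "one"
```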
REXML stream parsing requires you to supply a Listener class. When - REXML encounters events in a document (tag start, text, etc.) it - notifies your listener class of the event. You can supply any subset - of the methods, but make sure you implement method_missing if you - don't implement them all. A StreamListener module has been supplied as - a template for you to use.
- -Stream parsing in REXML is much like SAX, where events are
- generated when the parser encounters them in the process of parsing
- the document. When a tag is encountered, the stream listener's
- tag_start()
method is called. When the tag end is
- encountered, tag_end()
is called. When text is
- encountered, text()
is called, and so on, until the end
- of the stream is reached. One other note: the method
- entity()
is called when an &entity;
is
- encountered in text, and only then.
Please look at the StreamListener
- API for more information.
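As a rough sketch of the listener approach, the class below includes `StreamListener` and overrides only three of its methods. The class name `EventCollector` and the sample document are invented for this example.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener that records the events it receives.
class EventCollector
  include REXML::StreamListener   # supplies no-op versions of every callback

  attr_reader :events

  def initialize
    @events = []
  end

  def tag_start(name, attrs)
    @events << [:start, name, attrs]
  end

  def tag_end(name)
    @events << [:end, name]
  end

  def text(text)
    @events << [:text, text] unless text.strip.empty?
  end
end

listener = EventCollector.new
REXML::Document.parse_stream("<log><entry id='1'>started</entry></log>", listener)

listener.events.each { |event| puts event.inspect }
```

Including the module avoids having to write `method_missing`, since every callback that is not overridden falls through to an empty default.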
By default, REXML respects whitespace in your document. In many - applications, you want the parser to compress whitespace in your - document. In these cases, you have to tell the parser which elements - you want to respect whitespace in by passing a context to the - parser:
- -Whitespace for tags "tag1", "tag2", and "tag3" will be compressed; - all other tags will have their whitespace respected. Like :raw, you - can set :compress_whitespace to :all, and have all elements have their - whitespace compressed.
- -You may also use the tag :respect_whitespace
, which
- flip-flops the behavior. If you use :respect_whitespace
- for one or more tags, only those elements will have their whitespace
- respected; all other tags will have their whitespace compressed.
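A sketch of passing a whitespace context to the parser. The tag names and content are invented, and the exact form of the compressed text is deliberately not spelled out.

```ruby
require "rexml/document"

source = "<doc><tag1>a    b</tag1><other>a    b</other></doc>"

# The second argument is the context hash described above.
doc = REXML::Document.new(source, { :compress_whitespace => %w{tag1} })

compressed = doc.root.elements["tag1"].text
preserved  = doc.root.elements["other"].text

puts compressed.inspect
puts preserved.inspect
```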
REXML does some automatic processing of entities for your - convenience. The processed entities are &, <, >, ", and '. - If REXML finds any of these characters in Text or Attribute values, it - automatically turns them into entity references when it writes them - out. Additionally, when REXML finds any of these entity references in - a document source, it converts them to their character equivalents. - All other entity references are left unprocessed. If REXML finds an - &, <, or > in the document source, it will generate a - parsing error.
- -Namespaces are fully supported in REXML and within the XPath - parser. There are a few caveats when using XPath, however:
- -each
, first
, and
- match
and pass them the mapping.The pull parser API is not yet stable. When it settles down, I'll - fill in this section. For now, you'll have to bite the bullet and read - the PullParser - API docs. Ignore the PullListener class; it is a private helper - class.
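Until that section is filled in, here is a tentative sketch of the pull parser, re-parsing one branch of a tree as a stream. It assumes the `has_next?`, `pull`, and `start_element?` methods found in current REXML; the document is invented.

```ruby
require "rexml/document"
require "rexml/parsers/pullparser"

doc = REXML::Document.new("<root><branch><leaf>text</leaf></branch></root>")
branch = doc.root.elements["branch"]

# Turn the branch back into a string and pull events from it.
parser = REXML::Parsers::PullParser.new(branch.to_s)

names = []
while parser.has_next?
  event = parser.pull
  # For a start-element event, index 0 holds the tag name.
  names << event[0] if event.start_element?
end

puts names.inspect
```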
-The original REXML stream parsing API is very minimal. This also - means that it is fairly fast. For a more complex, more "standard" API, - REXML also includes a streaming parser with a SAX2+ API. This API - differs from SAX2 in a couple of ways, such as having more filters and - multiple notification mechanisms, but the core API is SAX2.
- -The two classes in the SAX2 API are SAX2Parser
- and SAX2Listener
.
- You can use the parser in one of five ways, depending on your needs.
- Three of the ways are useful if you are filtering for a small number
- of events in the document, such as just printing out the names of all
- of the elements in a document, or getting all of the text in a
- document. The other two ways are for more complex processing, where
- you want to be notified of multiple events. The first three involve
- Procs, and the last two involve listeners. The listener mechanisms are
- very similar to the original REXML streaming API, with the addition of
- filtering options, and are faster than the proc mechanisms.
An example is worth a thousand words, so we'll just take a look at - a small example of each of the mechanisms. The first example involves - printing out only the text content of a document.
- -In this example, we tell the parser to call our block for every
- characters
event. "characters" is what SAX2 calls Text
- nodes. The event is identified by the symbol :characters
.
- There are a number of these events, including
- :element_start
, :end_prefix_mapping
, and so
- on; the events are named after the methods in the
- SAX2Listener
API, so refer to that document for a
- complete list.
You can additionally filter for particular elements by passing an
- array of tag names to the listen
method. In further
- examples, we will not include the require
or parser
- construction lines, as they are the same for all of these
- examples.
In this example, only the text content of changelog and todo - elements will be printed. The array of tag names can also contain - regular expressions which the element names will be matched - against.
- -Finally, as a shortcut, if you do not pass a symbol to the listen
- method, it will default to :element_start
This example prints the "version" attribute of all "item" elements
- in the document. Notice that the number of arguments passed to the
- block is larger than for :characters
; again, check the
- SAX2Listener API for a list of what arguments are passed the blocks
- for a given event.
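The three block-based mechanisms can be sketched together as below. The original listings are not present in this copy, so the document and variable names are invented; the block arguments follow the SAX2Listener method signatures.

```ruby
require "rexml/document"
require "rexml/parsers/sax2parser"

source = "<changes><changelog>Fixed bug</changelog>" \
         "<item version='1.0'>note</item></changes>"
parser = REXML::Parsers::SAX2Parser.new(source)

# 1. Every :characters event in the document.
all_text = []
parser.listen(:characters) { |text| all_text << text }

# 2. :characters events inside the named elements only.
changelog_text = []
parser.listen(:characters, %w{changelog}) { |text| changelog_text << text }

# 3. No symbol given: defaults to :start_element for the named elements.
versions = []
parser.listen(%w{item}) do |uri, localname, qname, attributes|
  versions << attributes["version"]
end

parser.parse

puts all_text.inspect
puts changelog_text.inspect
puts versions.inspect
```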
The last two mechanisms for parsing use the SAX2Listener API. Like
- StreamListener, SAX2Listener is a module
, so you can
- include
it in your class to give you an adapter. To use
- the listener model, create a class that implements some of the
- SAX2Listener methods, or all of them if you don't include the
- SAX2Listener model. Add them to a parser as you would blocks, and when
- the parser is run, the methods will be called when events occur.
- Listeners do not use event symbols, but they can filter on element
- names.
In the previous example, listener1
will be notified of
- all events that occur, and listener2
will only be
- notified of events that occur in changelog
,
- todo
, and credits
elements. We also see that
- multiple listeners can be added to the same parser; multiple blocks
- can also be added, and listeners and blocks can be mixed together.
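A sketch of the two listener mechanisms, standing in for the missing listing. `ElementLogger` is a hypothetical class that includes `SAX2Listener` and overrides only `start_element`.

```ruby
require "rexml/document"
require "rexml/parsers/sax2parser"
require "rexml/sax2listener"

# Hypothetical listener that records the names of started elements.
class ElementLogger
  include REXML::SAX2Listener   # no-op defaults for all other events

  attr_reader :names

  def initialize
    @names = []
  end

  def start_element(uri, localname, qname, attributes)
    @names << qname
  end
end

source = "<doc><changelog><entry/></changelog><todo/><misc/></doc>"
parser = REXML::Parsers::SAX2Parser.new(source)

listener1 = ElementLogger.new
listener2 = ElementLogger.new

parser.listen(listener1)                      # all events
parser.listen(%w{changelog todo}, listener2)  # filtered by element name
parser.parse

puts listener1.names.inspect
puts listener2.names.inspect
```

The filtered listener does not see `entry`, which is nested in `changelog`; the filter matches the element's own name and does not extend to its descendants.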
There is, as yet, no mechanism for recursion. Two upcoming features - of the SAX2 API will be the ability to filter based on an XPath, and - the ability to specify filtering on an element and all of its - descendants.
- -WARNING: The SAX2 API for dealing with doctype (DTD) - events almost certainly will change.
-Michael Neumann contributed some convenience functions for nodes,
- and they are general enough that I've included them. Michael's use-case
- examples follow:
This isn't everything there is to REXML, but it should be enough to
- get started. Check the API
- documentationtest/
directory,
- and these are great sources of working examples.
Among the people who've contributed to this document are:
- -Para of text
-Remove this and figs order differently
-- A link. -
- - -" - - xmldoc = REXML::Document.new(doc) - xpath = "descendant::node()[(local-name()='link' or local-name()='a') and @rel='sub']" - hrefs = [] - xmldoc.elements.each(xpath) do |element| - hrefs << element.attributes["href"] - end - assert_equal(["/"], hrefs, "Bug #3842 [ruby-core:32447]") - end - end -end diff --git a/test/rexml/xpath/test_compare.rb b/test/rexml/xpath/test_compare.rb deleted file mode 100644 index bb666c9b12..0000000000 --- a/test/rexml/xpath/test_compare.rb +++ /dev/null @@ -1,256 +0,0 @@ -# frozen_string_literal: false - -require_relative "../rexml_test_utils" - -require "rexml/document" - -module REXMLTests - class TestXPathCompare < Test::Unit::TestCase - def match(xml, xpath) - document = REXML::Document.new(xml) - REXML::XPath.match(document, xpath) - end - - class TestEqual < self - class TestNodeSet < self - def test_boolean_true - xml = <<-XML - -