noodle is a node server and module which clients can use to query data from web documents.

Features

- Cross domain document querying (html, json, xml, atom, rss feeds)
- Server supports querying via JSONP and JSON POST
- Multiple queries per request
- Access to queried server headers
- Allows for POSTing to web documents
- In memory caching for query results and web documents

Setup

Create a directory

$ mkdir noodlejs
$ cd noodlejs

Create an express application

$ express

   create : .
   create : ./package.json
   create : ./app.js
   create : ./public
   create : ./routes
   create : ./routes/index.js
   create : ./routes/user.js
   create : ./views
   create : ./views/layout.jade
   create : ./views/index.jade
   create : ./public/javascripts
   create : ./public/images
   create : ./public/stylesheets
   create : ./public/stylesheets/style.css

   install dependencies:
     $ cd . && npm install

   run the app:
     $ node app

Install dependencies

$ npm install

Install noodlejs

$ npm install noodlejs

or

$ git clone git@github.com:dharmafly/noodle.git
$ cd noodle
$ npm install

In app.js, add these lines:

var noodle = require('noodlejs');

app.get('/:text', function (req, res) {
  noodle.query({
    // Encode the search term so spaces and '&' do not break the URL.
    url: 'http://google.com/search?q=' + encodeURIComponent(req.params.text),

    // If no `selector` is specified, the entire document is returned and
    // `extract` is ignored. This rule applies to all document types.
    selector: 'h3.r a',

    // Use "text" to return only the link text, "href" to return only the
    // links, or an array such as ["href", "text"] for several properties.
    extract: ['href', 'text']
  })
  .then(function (results) {
    res.send(results);  // return the result
  });
});
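The query object can also be built outside the route handler, which makes the URL encoding of the search term explicit and lets the query be checked without a network call. This is only a minimal sketch: buildSearchQuery is a hypothetical helper name, not part of noodle, and it reuses the selector and extract values from the route above.

```javascript
// Hypothetical helper (not part of noodle): builds the query object
// for a given search term.
function buildSearchQuery(text) {
  return {
    // encodeURIComponent escapes spaces, '&', '#' and similar characters.
    url: 'http://google.com/search?q=' + encodeURIComponent(text),
    selector: 'h3.r a',
    extract: ['href', 'text']
  };
}

// Print the URL that would be queried for a term containing a space and '&'.
console.log(buildSearchQuery('node js & express').url);
```

Inside the route, the helper would be used as noodle.query(buildSearchQuery(req.params.text)), assuming noodle and app are set up as in the steps above.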

In a browser, try:

http://localhost:3000/noodlejs

Various noodle settings, such as the cache and rate limit settings, are exposed and can be edited in lib/config.json. One of the config options is:

// If no query type option is supplied then
// what should noodle assume
"defaultDocumentType": "html" // html, json, feed & xml

To configure noodle programmatically, use noodle.configure(obj).

Example:

noodle.configure({
  debug: false,
  defaultDocumentType: "json"
});
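When settings differ between environments, the object passed to noodle.configure can be assembled in code first. The sketch below rests on several assumptions: configFromEnv and the NOODLE_DEBUG and NOODLE_DOC_TYPE variable names are hypothetical, and noodle does not read them by itself. Only the two options shown above (debug and defaultDocumentType) are set, and document types other than html, json, feed and xml are rejected.

```javascript
// The document types named for defaultDocumentType in lib/config.json.
var DOCUMENT_TYPES = ['html', 'json', 'feed', 'xml'];

// Hypothetical helper: NOODLE_DEBUG and NOODLE_DOC_TYPE are assumed
// variable names chosen for this example only.
function configFromEnv(env) {
  var type = env.NOODLE_DOC_TYPE || 'html';
  if (DOCUMENT_TYPES.indexOf(type) === -1) {
    throw new Error('Unsupported document type: ' + type);
  }
  return {
    debug: env.NOODLE_DEBUG === 'true',
    defaultDocumentType: type
  };
}

// Usage (assumes `noodle` has been required as shown earlier):
// noodle.configure(configFromEnv(process.env));
console.log(configFromEnv({ NOODLE_DOC_TYPE: 'json' }));
```

Validating the type before calling noodle.configure surfaces a typo in the environment at startup rather than on the first query.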

Dependencies

- JSONSelect: used as the underlying library to extract information from a JSON document (http://jsonselect.org/)
- connect-ratelimit: provides rate limiting out of the box (https://github.com/dharmafly/connect-ratelimit)
- connect, feedparser, moment, cheerio, request, q, xml2json, underscore, mocha, chai, colors

Repository: https://github.com/dharmafly/noodle
Bugs: https://github.com/dharmafly/noodle/issues

Error handling

Use the fail() handler to listen for the errors that noodle fires.

noodle.query(queryObj)
  .then(function (results) {
    console.log(results);
  })
  .fail(function (error) {
    console.log(error);
  });
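In an Express route, a rejected query can be turned into an error response instead of a log line. The sketch below uses stubs in place of noodle.query and the Express response so that it runs offline; respondWith, failingQuery, makeFakeRes and the 502 status code are illustrative choices, not noodle behaviour. It uses the two-argument form of then, which both Q promises and native promises support.

```javascript
// Hypothetical helper: sends the query results, or a 502 with the error
// message if the promise is rejected. `res` is an Express-style response.
function respondWith(res, promise) {
  return promise.then(
    function (results) { res.send(results); },
    function (error) {
      res.status(502).send({ error: String(error.message || error) });
    }
  );
}

// Stub standing in for a noodle.query call that fails.
function failingQuery() {
  return Promise.reject(new Error('simulated failure'));
}

// Minimal fake of an Express response object, for demonstration only.
function makeFakeRes() {
  return {
    statusCode: 200,
    body: null,
    status: function (code) { this.statusCode = code; return this; },
    send: function (body) { this.body = body; return this; }
  };
}

var demoRes = makeFakeRes();
respondWith(demoRes, failingQuery()).then(function () {
  console.log(demoRes.statusCode, demoRes.body);
});
```

In a real route the call would look like respondWith(res, noodle.query(queryObj)), assuming the promise returned by noodle.query is used as described above.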

Methods

noodle.query
noodle.fetch
noodle.html.select
noodle.json.select
noodle.feed.select
noodle.xml.select

noodle.events: allows one to listen for emitted cache related events. Noodle inherits from node’s EventEmitter.

// Called when a page is cached
noodle.events.on('cache/page', function (obj) {
  // obj is the page cache object detailing the page, its headers
  // and when it was first cached
});
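Because the events interface is node's EventEmitter, a listener can be tried out against a plain EventEmitter before wiring it to noodle. The sketch below uses a stand-in emitter rather than noodle.events, and the fields on the emitted object (url, created) are assumptions made for illustration; the text above only says that the object details the page, its headers and when it was first cached.

```javascript
var EventEmitter = require('events').EventEmitter;

// Stand-in for noodle.events so the sketch runs without noodle installed.
var events = new EventEmitter();

// Collect the URL of every page reported as cached.
var cachedPages = [];
events.on('cache/page', function (obj) {
  cachedPages.push(obj.url); // `url` is an assumed field name
});

// Simulate two pages being cached.
events.emit('cache/page', { url: 'http://example.com/a', created: Date.now() });
events.emit('cache/page', { url: 'http://example.com/b', created: Date.now() });

console.log(cachedPages.length); // 2
```

EventEmitter calls listeners synchronously in registration order, so cachedPages is already populated by the time emit returns.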