noodle is a Node.js server and module that clients can use to query data from web documents.

Features

- Cross-domain document querying (HTML, JSON, XML, Atom and RSS feeds)
- Server supports querying via JSONP and JSON POST
- Multiple queries per request
- Access to the headers returned by queried servers
- Support for POSTing to web documents
- In-memory caching of query results and web documents
Setup

Create a directory:

$ mkdir noodlejs
$ cd noodlejs
Create an Express application:

$ express

create : .
create : ./package.json
create : ./app.js
create : ./public
create : ./routes
create : ./routes/index.js
create : ./routes/user.js
create : ./views
create : ./views/layout.jade
create : ./views/index.jade
create : ./public/javascripts
create : ./public/images
create : ./public/stylesheets
create : ./public/stylesheets/style.css

install dependencies:
  $ cd . && npm install

run the app:
  $ node app
Install dependencies:

$ npm install

Install noodlejs:

$ npm install noodlejs

or

$ git clone git@github.com:dharmafly/noodle.git
$ cd noodle
$ npm install
In app.js, add these lines:

var noodle = require('noodlejs');

app.get('/:text', function (req, res) {
  noodle.query({
    // Encode the search text so spaces and special characters survive the URL
    url: 'http://google.com/search?q=' + encodeURIComponent(req.params.text),

    // If no `selector` is specified then the entire document is returned. This
    // rule applies to all document types; an `extract` rule is then ignored.
    selector: 'h3.r a',

    // "text" returns only the text of each match, "href" only the link.
    // To extract several properties, pass an array, e.g. ["href", "text"].
    extract: ['href', 'text']
  })
  .then(function (results) {
    res.send(results); // return the results to the client
  })
  .fail(function (error) {
    res.status(500).send(String(error)); // report the error instead of leaving the request hanging
  });
});
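Because the query passed to noodle.query is a plain object, one option is to build it in a separate function that can be checked without making a network request. The sketch below uses a hypothetical helper, buildSearchQuery, which is not part of noodle; it reuses the URL, selector and extract values from the route above and escapes the search text with encodeURIComponent.

```javascript
// Hypothetical helper (not part of noodle): builds the query object used by
// the /:text route so it can be inspected or tested on its own.
function buildSearchQuery(text) {
  return {
    // Escape spaces, '&', '#', etc. so they cannot break the query string
    url: 'http://google.com/search?q=' + encodeURIComponent(text),
    selector: 'h3.r a',
    extract: ['href', 'text']
  };
}

// In the route handler this would be used as:
//   noodle.query(buildSearchQuery(req.params.text)).then(...)
console.log(buildSearchQuery('noodle js & node').url);
// prints: http://google.com/search?q=noodle%20js%20%26%20node
```

Keeping the query construction separate also makes it easy to reuse the same selector and extract rules from other routes.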
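In a browser, try:

http://localhost:3000/noodlejs

The route searches Google for "noodlejs" and responds with the href and text of each result link matched by the selector.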
Various noodle settings, such as the cache and rate-limit settings, are exposed in lib/config.json and can be edited there. For example:

// Document type noodle assumes when a query supplies no type option
"defaultDocumentType": "html" // one of html, json, feed or xml
To configure noodle programmatically, use noodle.configure(obj). For example:

noodle.configure({
  debug: false,
  defaultDocumentType: 'json'
});
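If configuration values come from outside the code (environment variables, for example), it may help to check them before calling noodle.configure. The sketch below assumes only what this README states: the debug option and the four document types html, json, feed and xml. The helper name makeNoodleConfig is hypothetical; it only builds and validates the object and does not call noodle itself.

```javascript
// Document types listed for defaultDocumentType in lib/config.json
const DOCUMENT_TYPES = ['html', 'json', 'feed', 'xml'];

// Hypothetical helper (not part of noodle): copies the overrides and rejects
// values this README does not list, before they reach noodle.configure(obj).
function makeNoodleConfig(overrides) {
  const config = Object.assign({}, overrides);

  if (config.debug !== undefined && typeof config.debug !== 'boolean') {
    throw new TypeError('debug must be true or false');
  }
  if (config.defaultDocumentType !== undefined &&
      DOCUMENT_TYPES.indexOf(config.defaultDocumentType) === -1) {
    throw new Error('Unknown defaultDocumentType "' + config.defaultDocumentType +
      '" (expected one of ' + DOCUMENT_TYPES.join(', ') + ')');
  }
  return config;
}

// Usage: noodle.configure(makeNoodleConfig({ debug: false, defaultDocumentType: 'json' }));
```

Failing early here means a typo such as "jsno" is reported at startup rather than surfacing later as an unexpected document type.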
Dependencies

- JSONSelect: the underlying library used to extract information from JSON documents (http://jsonselect.org/)
- connect-ratelimit: provides rate limiting out of the box (https://github.com/dharmafly/connect-ratelimit)
- connect, feedparser, moment, cheerio, request, q, xml2json, underscore, mocha, chai, colors
Repository: https://github.com/dharmafly/noodle
Bugs: https://github.com/dharmafly/noodle/issues
Error handling

Use a fail() handler to listen for the errors that noodle fires:

noodle.query(queryObj)
  .then(function (results) {
    console.log(results);
  })
  .fail(function (error) {
    console.log(error);
  });
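When a route must always respond, one option is to turn a rejected query into an ordinary value. The sketch below wraps any function that returns a promise, such as a call to noodle.query, and relies only on the two-argument then(), which both Q promises and native promises support. The safeQuery name and the { ok, results, error } result shape are hypothetical, not part of noodle.

```javascript
// Hypothetical helper (not part of noodle): runs queryFn(queryObj) and always
// resolves, with { ok: true, results } on success or { ok: false, error } on failure.
function safeQuery(queryFn, queryObj) {
  return Promise.resolve()
    // Calling queryFn inside then() also catches synchronous throws,
    // and a returned Q promise is adopted like any other thenable.
    .then(function () { return queryFn(queryObj); })
    .then(
      function (results) { return { ok: true, results: results }; },
      function (error) { return { ok: false, error: error }; }
    );
}
```

For example, `safeQuery(function (q) { return noodle.query(q); }, queryObj)` resolves instead of rejecting, so a single then() handler can send either outcome. Wrapping the call in a function, rather than passing noodle.query directly, avoids depending on how noodle uses `this`.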
Methods

- noodle.query
- noodle.fetch
- noodle.html.select
- noodle.json.select
- noodle.feed.select
- noodle.xml.select

noodle.events

noodle.events lets you listen for emitted cache-related events; it inherits from Node's EventEmitter.

// Called when a page is cached
noodle.events.on('cache/page', function (obj) {
  // obj is the page cache object, detailing the page, its headers
  // and when it was first cached
});
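Because noodle.events follows Node's EventEmitter API, listener code can be exercised without noodle by emitting the same event name from a plain EventEmitter. In the sketch below the emitter is a stand-in for noodle.events, the recordCachedPages helper is hypothetical, and the payload is made up for illustration; the real cache object's fields are not documented here.

```javascript
const EventEmitter = require('events');

// Hypothetical helper: attaches a 'cache/page' listener to any EventEmitter
// (for example noodle.events) and returns the array it appends payloads to.
function recordCachedPages(emitter) {
  const pages = [];
  emitter.on('cache/page', function (obj) {
    pages.push(obj);
  });
  return pages;
}

// Stand-in for noodle.events; emit() calls listeners synchronously.
const fakeEvents = new EventEmitter();
const pages = recordCachedPages(fakeEvents);
fakeEvents.emit('cache/page', { page: '<html></html>' });
console.log(pages.length); // prints: 1
```

With the real module the same helper would be attached as `var cached = recordCachedPages(noodle.events);`.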