New RDF processing at Screaming Penguin
Submitted by charlie.collins on Wed, 11/07/2001 - 21:58
Screaming Penguin uses RDF (and RSS) to provide syndicated content from other tech, science, and news sites around the web. RDF is an XML-based standard for describing content. If you are not already familiar with the TotSP content syndication features, please check out customTotSP for more information.
Anyway, the RDF processing engine at TotSP has been entirely rewritten for extensibility, flexibility, and much better performance. The old system was extremely solid, but also very slow.
The new system is entirely independent of the web server and site. It uses a Java library to retrieve and store remote RDF files. Previously the RDF files were also retrieved and stored locally, but that happened while a web page was being viewed. (The files were cached, so technically it only happened when a page was viewed AND one of its sources had expired.) The new Java library was written entirely in house and uses the same database tables the website uses to know where to get the RDF file for each "source."

The new code runs via cron once an hour, so remote sources are now updated hourly instead of every 30 minutes. That is better for web traffic and performance, but it does mean the data will not be quite as fresh. If this becomes an issue for anyone, please let me know and it can be changed if justified.

Once the source information is in the database and the data has been stored locally by the Java library, the site uses the same old PHP render functions to display it. (One day I would like to Java the penguin, but you know how the time thing can be. Don't get me wrong, I have nothing but great things to say about PHP; I just want to focus more personally and stick to Java.)
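The Java package has not been released yet, so here is only a minimal sketch of the parsing half of what it does, assuming the sources are RSS 0.9x or RSS 1.0 (RDF) files. The class name `RdfItemParser` and its methods are hypothetical, not the actual TotSP code; downloading over HTTP, the database write, and the cron schedule are deliberately left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: extracts headlines from an RSS 0.9x or RSS 1.0 (RDF) document.
public class RdfItemParser {

    /** One headline from a source: its title and link. */
    public static final class Item {
        public final String title;
        public final String link;

        public Item(String title, String link) {
            this.title = title;
            this.link = link;
        }

        @Override
        public String toString() {
            return title + " <" + link + ">";
        }
    }

    /** Parses the XML text of a feed and returns its items in document order. */
    public static List<Item> parseItems(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed so getLocalName() works below
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<Item> items = new ArrayList<>();
        // "*" matches any namespace, so RSS 1.0 items (in the RSS 1.0 namespace)
        // and un-namespaced RSS 0.91 items are both found.
        NodeList nodes = doc.getElementsByTagNameNS("*", "item");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element item = (Element) nodes.item(i);
            items.add(new Item(childText(item, "title"), childText(item, "link")));
        }
        return items;
    }

    // Trimmed text of the first direct child element with this local name, or "" if absent.
    private static String childText(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && localName.equals(n.getLocalName())) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }

    public static void main(String[] args) throws Exception {
        String sample = "<?xml version=\"1.0\"?>"
            + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
            + " xmlns=\"http://purl.org/rss/1.0/\">"
            + "<channel rdf:about=\"http://example.com/\"><title>Example</title>"
            + "<link>http://example.com/</link></channel>"
            + "<item rdf:about=\"http://example.com/1\"><title>First headline</title>"
            + "<link>http://example.com/1</link></item>"
            + "<item rdf:about=\"http://example.com/2\"><title>Second headline</title>"
            + "<link>http://example.com/2</link></item>"
            + "</rdf:RDF>";
        for (Item item : parseItems(sample)) {
            System.out.println(item);
        }
    }
}
```

An hourly cron job could loop over the rows of the sources table, download each file, pass its text to `parseItems`, and store the resulting title/link pairs; that loop is not shown because the table layout isn't described here. Matching on local names rather than full namespaces is a simplification that lets one parser handle both RSS dialects.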
All this mumbo jumbo means the site should be a lot faster. All data is now local and in a database. Let me know what you think.
Also, I am still documenting and revising some of the stuff, but all of this RDF jazz will be available here in the projects section when it's ready. There is a web-based admin tool written in PHP (to add, edit, remove, and preview sources), a PHP-based RDF parsing and rendering function, and the entire Java package that retrieves and stores remote RDF files. It's not all standalone and abstracted at this point, but that is my goal. It will all be released as soon as possible; if you want stuff before it's ready, just let me know.
Penguin Warp Drive ON
UPDATE!!!
I also cleaned up the sources themselves. Some have literally gone away and were therefore deleted (32bitsOnline, GeekNews), some were lame, and some STILL DON'T USE RDF-RSS!!!
NOTABLY, I still parse LinuxToday using their silly text headlines file; it's the only special one now. According to their site they even have an "XML" file backend, but it ain't RDF-RSS. Clowns! So LinuxToday is still available thanks to the extra effort, but it WON'T be on the front page, and it WILL still be slow.