I’m very happy to announce that FAQoverflow has launched! Yay! Please take a look at www.faqoverflow.com, and please help spread the word in whichever way you can.
FAQoverflow is the coffee table book version of the StackOverflow (SO) family of Q&A sites. It contains great answers to questions about everything, and is ideal for flipping through.
There is a dedicated webapp version of FAQoverflow for the iPhone. It’s free, and it’s easy to install.
In the next week or so we’ll be releasing PDF and eBook versions of FAQoverflow, ideal for offline reading. When they’re available, you’ll be able to get them here.
I had the idea for FAQoverflow just over two weeks ago, and I quickly set about making it real. All up, implementation has taken about 60 hours so far. This is what I did:
- Registered the faqoverflow.com domain name.
- Set up an Amazon Web Services account.
- Set up S3 (Amazon’s file hosting service) and CloudFront (Amazon’s content-distribution network).
- Quickly got a “Coming Soon” page online.
- Hand-coded the HTML5 and CSS.
- Incorporated Google Analytics.
- Chose suitable Typekit fonts.
- Set up the site for Google Webmaster Tools (and for Yahoo! and Bing as well).
- Made sure the site works well on the iPhone.
- Wrote a Ruby script to spider the Stack Exchange API.
- Made sure the script recovers gracefully if it crashes out, as spidering takes a long time.
- Made sure the script doesn’t hit the SO servers too hard.
- Developed a crazy algorithm for finding and categorising the best questions and answers.
- Wrote a Ruby script to generate a static website from the spidered content.
- Tweaked the CSS to make sure each question and answer is beautifully formatted.
- Used Google’s “Prettify” library for code formatting.
- Used MathJax to format LaTeX-style maths equations.
- Hooked up a PayPal donation button (in the hope the community can fund ongoing hosting).
- Wrote a Ruby script to upload all of the generated content to S3, thereby setting the site live :)
- Compacted the HTML with HTML Tidy.
- Compacted the CSS with the YUI Compressor.
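The post doesn’t show the spider itself, but the two requirements above (recovering gracefully from a crash, and not hitting the SO servers too hard) can be illustrated with a small Ruby sketch. It is not the actual FAQoverflow script: the `ResumableSpider` class, its parameters, and the `fetch_page` lambda (a hypothetical stand-in for the real Stack Exchange API request) are all assumptions made for illustration, as is the choice of a local JSON checkpoint file.

```ruby
require "json"

# A minimal sketch of a resumable, throttled crawl loop. Assumptions:
# - fetch_page is a hypothetical callable taking a page number and returning
#   an array of items (an empty array means there are no more pages);
# - progress is checkpointed to a local JSON file after every page.
class ResumableSpider
  def initialize(checkpoint_path, fetch_page:, delay_seconds: 1.0)
    @checkpoint_path = checkpoint_path
    @fetch_page = fetch_page
    @delay_seconds = delay_seconds # pause between requests, to go easy on the servers
  end

  # Loads the saved state, or starts fresh if no checkpoint exists yet.
  def load_state
    return { "next_page" => 1, "items" => [] } unless File.exist?(@checkpoint_path)
    JSON.parse(File.read(@checkpoint_path))
  end

  # Writes to a temporary file and then renames it, so a crash mid-write
  # cannot leave a half-written checkpoint behind.
  def save_state(state)
    tmp_path = "#{@checkpoint_path}.tmp"
    File.write(tmp_path, JSON.generate(state))
    File.rename(tmp_path, @checkpoint_path)
  end

  # Fetches pages until max_pages is reached or a page comes back empty,
  # saving progress after each page and sleeping between requests.
  def run(max_pages:)
    state = load_state
    while state["next_page"] <= max_pages
      items = @fetch_page.call(state["next_page"])
      break if items.empty?
      state["items"].concat(items)
      state["next_page"] += 1
      save_state(state)
      sleep(@delay_seconds)
    end
    state["items"]
  end
end
```

Because progress is saved after every page, rerunning the script after a crash picks up at the page recorded in the checkpoint rather than starting from scratch. The fixed `sleep` is the simplest possible form of throttling; a real spider might need a smarter back-off policy.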
Spidering the content is relatively easy. It works as follows.
- Make a request to the Stack Auth API to find all SO sites that are publicly visible, excluding the “meta” sites.
- For each site, request the top 1000 questions, sorted by votes.
- For each question, request the top 2 answers, sorted by votes.
- Exclude any questions that are closed, or that don’t have any answers.
- For each question, calculate a quality score. This takes into account the following factors:
  - The number of votes the question received.
  - The difference between the number of votes of the top two answers (thereby favouring questions that have one standout answer).
  - The length of the question (preferring questions that are a few sentences long).
  - The length of the answer (preferring answers about a page long).
- For a particular site, sort the questions by quality.
- Divide the questions up into sections, as follows:
  - Consider the question at the top of the list (i.e. the one with the highest quality score).
  - Create a tentative section for each combination of tags for that question, but don’t create a tentative section if the FAQ already contains that section. For example, if a question is tagged “Ruby” and “Web”, create three tentative sections (“Ruby”, “Web” and “Ruby and Web”).
  - Walk through the list, adding questions to the tentative sections if they have the requisite tags. Stop adding questions to a tentative section once it contains ten questions.
  - Discard any tentative sections that contain fewer than five questions by the end of this process (we want each section of the FAQ to contain between five and ten questions).
  - Of the remaining tentative sections, choose the one with the greatest number of tags. If there is more than one such section, choose the one with the largest average quality score. Add the chosen section to the FAQ, and remove the questions in that section from the list. Repeat this process!
  - If no tentative sections remain after this process, add the question to the “Miscellaneous” section. Once that fills up, we’re done!
- Sort the sections by their average quality, but always place the Miscellaneous section last.
- Sort the sites (which we now call “chapters”) by their average quality, but always place the “About FAQoverflow” chapter at the top.
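The section-building steps above can be sketched in Ruby as follows. This is a minimal reconstruction from the description, not the actual FAQoverflow script. It assumes each question is a hash with hypothetical `:title`, `:tags` and `:quality` keys, with the quality score already computed (the post lists the factors but not the formula), and it assumes the Miscellaneous section holds at most ten questions, like any other section. Chapter ordering is left out.

```ruby
# Sketch of the greedy sectioning algorithm described above.
# Assumption: questions are hashes with :title, :tags (array of strings)
# and a precomputed :quality score; these names are illustrative.
MIN_SECTION_SIZE = 5
MAX_SECTION_SIZE = 10 # also used (as an assumption) as the Miscellaneous capacity

def average_quality(questions)
  questions.sum { |q| q[:quality] } / questions.size.to_f
end

# Every non-empty combination of a question's tags, e.g.
# ["ruby", "web"] -> [["ruby"], ["web"], ["ruby", "web"]].
def tag_combinations(tags)
  (1..tags.size).flat_map { |n| tags.sort.combination(n).to_a }
end

# Walks the quality-ordered list, collecting questions that carry every tag
# in the combination, and stops once the tentative section is full.
def members_for(combo, remaining)
  remaining.select { |q| (combo - q[:tags]).empty? }.first(MAX_SECTION_SIZE)
end

def build_sections(questions)
  remaining = questions.sort_by { |q| -q[:quality] }
  sections = {} # sorted tag combination => questions in that section
  misc = []

  until remaining.empty? || misc.size >= MAX_SECTION_SIZE
    top = remaining.first
    # Skip combinations the FAQ already has a section for.
    combos = tag_combinations(top[:tags]).reject { |combo| sections.key?(combo) }
    tentative = combos.map { |combo| [combo, members_for(combo, remaining)] }
    tentative.select! { |_combo, members| members.size >= MIN_SECTION_SIZE }

    if tentative.empty?
      misc << remaining.shift # no viable section: the top question goes to Miscellaneous
    else
      # Most tags wins; ties are broken by the higher average quality.
      combo, members = tentative.max_by { |c, m| [c.size, average_quality(m)] }
      sections[combo] = members
      remaining -= members # the top question is always among the members
    end
  end

  ordered = sections.sort_by { |_combo, members| -average_quality(members) }
  ordered << [["Miscellaneous"], misc] unless misc.empty? # always last
  ordered
end
```

Each pass either files the top question into a new section or moves it to Miscellaneous, so the loop always makes progress and terminates.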
The entire process is automated, and recovers from errors. At the moment we kick things off manually, but we plan to start up a server so that the FAQ spidering and generation process runs automatically, around the clock, which will allow me to update FAQoverflow every few days.
Yes, that’s right – we currently don’t run a server at all. The entire site is static, and hosted by CloudFront. It looks like it’ll cost us a few cents a month to run. Not only that, we leverage Amazon’s content distribution network, so the site is really fast wherever you are in the world: it serves files from “edge locations” in the US (nine datacentres across the country), Europe (four countries across the EU) and Asia (Singapore, Hong Kong and Japan), depending on where each request comes from. Finally, it’s super reliable and robust, so we fully expect to survive any level of Slashdottery that we may be subjected to. Great stuff!
We believe that this is the direction in which webapps are moving, with the smarts migrating from the server back-end to the client and to offline script processing, and we anticipate experimenting with this approach a whole lot more.
Anyway, I’ve spent hours reading FAQoverflow, so it works for me. Hope you’ll get some pleasure out of it too!