On the Liip blog today there's a new post from David Buchmann looking at why a client of theirs moved away from a Google Search Appliance and towards Zend_Lucene for their searching needs.
Google technology does a good job when searching the wild and treacherous realms of the public internet. However, the commercial Google Search Appliance (GSA) sold for searching intranet websites did not convince me at all. For a client, we first had to integrate the GSA, later we reimplemented search with Zend_Lucene. Some thoughts comparing the two search solutions.
They replaced the search appliance with Lucene for a few reasons, including several things the GSA wasn't doing very well: access protection, language detection and filtering by metadata. He describes the Lucene system that replaced it: separate processes for each website, checks so that only changed documents are reindexed, plain-text conversion and backups of the Lucene indexes, just in case. It's definitely a more flexible solution, though the initial indexing still takes a while. He also includes a small snippet of code showing how to read in binary files as plain text.
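
To make the "only changed documents" idea concrete, here is a minimal sketch of one way such a check could work. It is not taken from the original post: the summary does not say how changes are detected, so the content-hash approach, the function name changed_documents and its parameters are all assumptions, and it is written in Python rather than against the Zend_Lucene API.

```python
import hashlib


def changed_documents(documents, known_hashes):
    """Return the IDs of documents whose content differs from the last run.

    documents: mapping of document ID (for example a URL) to raw bytes.
    known_hashes: mapping of document ID to the hex digest recorded on the
    previous run; it is updated in place so the next run sees the new state.

    Hypothetical helper: hashing is an assumed detection strategy, not
    necessarily what the system described in the post does.
    """
    changed = []
    for doc_id, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        if known_hashes.get(doc_id) != digest:
            # New or modified document: it needs to be (re)indexed.
            changed.append(doc_id)
            known_hashes[doc_id] = digest
    return changed
```

On the first run every document counts as changed, which matches the note above that the initial indexing takes a while; later runs only hand the indexer the documents whose content actually differs.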