Docuverse: Idea for Peer to Peer Anonymous Document Repository
So in my dealings with various document communities (first rule of Fight Club = Don’t talk about Fight Club), I see huge inefficiencies and single points of failure. This has led me to think about ways in which these problems could be solved. Here are the main problems I see with these communities:
- Format proliferation: Everyone has a favorite document type they want for their particular viewer/reader application. HTML, TEXT, RTF, DOC, PDB, LIT, etc.
- Versioning Issues: Many times these documents are flawed when initially released and require iterative improvements. And while many documents adhere to the community standard version rules (which are themselves pretty lax), many do not. So when combined with the format proliferation mentioned above, it can be challenging to find the best version of a document, and hard drives get clogged with all of the various versions. Both this and format proliferation also increase the time it takes to search for and retrieve the document the user wants.
- Single or few points of failure: Many older documents are served by only a few community members, and if those members decide to leave the community, whole swathes of documents may be lost. While there is a certain level of redundancy built into the system, it is not the type of redundancy that leads to a long-term healthy document library. Also, in many cases there are choke points for document distribution. There are several brave and dedicated souls who contribute the bulk of the effort and bandwidth required to maintain and distribute the document library. But this concentration leads to longer download times and strain on these ‘backbone’ users.
- Proofing Effort: Again, because of the release cycle and existing methods, there is a barrier to entry that keeps new and less dedicated users from contributing to the community by fixing flaws found in the documents. Ideally, as each reader reads a document, they would be able to easily indicate corrections, which would then propagate to the rest of the community without anyone having to deal with the whole versioning/format/duplicate file issues.
- Lack of Anonymity: Due to the nature of some of the documents, there may be liability if any individual is recognized as the creator/distributor of a document, so providing some degree of anonymity would be preferable.
So my idea is to create a p2p application that incorporates solutions to these issues, while hiding much of the complexity needed to solve them.
- Create a standardized, extensible format (XML based presumably) for document encoding. Provide converters for all of the major file types, with user customizable styles, so that released files can be converted to whatever format the user prefers. Provide for all standard document entities and formatting, as well as the aforementioned extensibility to allow for more arcane formatting innovations.
- Provide for Wiki style versioning of these documents, where anyone may edit a document to make corrections, but the entire history of corrections is preserved so that malicious edits can easily be reverted. Limit edit size and frequency to prevent mass malicious corruption. Create a voting/approval system whereby malicious edits can easily be undone on a mass level.
- Create a giant, massively redundant, distributed and encrypted database to store these encoded documents. Users would be required to provide a minimum amount of shared space on their hard drives to store file indexing information and database storage (say 1GB minimum). Many ‘power users’ would obviously provide much more space. Most document community libraries are in the 1TB range. Ideally, many users providing 100GB or more of storage would allow for redundancy on the order of 10x (about 10TB in total for a 1TB library, or roughly 100 such users). Users would not know which files/portions of files they were hosting. Popularity would be used to provide more redundancy for those files requested by many users, while still maintaining a certain level of redundancy for unpopular files and for users entering and leaving the peer group. Hopefully this would avoid the problems of other p2p communities, where files may be available shortly after release, but quickly disappear if not for a few dedicated seeders.
- Provide for decentralized pseudo-anonymity where all file requests would be delivered through an intermediary user, who acts strictly as a conduit for that request. This would double the bandwidth requirements for the system, but should fall well within the realm of acceptability considering these are mostly text documents, while providing a level of security to everyone in the community. More research would be required to see whether there is a way to make it truly anonymous, without the need for centralized servers.
- Create readers/viewers for all major operating systems and devices that allow the user to read and edit the documents seamlessly.
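To make the Wiki style versioning idea a bit more concrete, here is a minimal sketch of an append-only revision history with an edit size limit and revert. It is only an illustration under my own assumptions: the names (Revision, Document, MAX_EDIT_CHARS, edit_size) and the 200 character limit are made up for the example, everything lives in memory on one machine, and the p2p storage, edit frequency limits and voting parts are not covered.

```python
import difflib
from dataclasses import dataclass, field

# Hypothetical cap on how many characters a single correction may change.
MAX_EDIT_CHARS = 200


def edit_size(old: str, new: str) -> int:
    """Rough count of characters touched when going from old to new."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


@dataclass
class Revision:
    author: str
    text: str


@dataclass
class Document:
    # Revisions are only ever appended, so the full history is preserved.
    revisions: list = field(default_factory=list)

    def current(self) -> str:
        return self.revisions[-1].text if self.revisions else ""

    def edit(self, author: str, new_text: str) -> int:
        """Record a correction and return its revision index."""
        # The initial release may be any size; later edits are size limited.
        if self.revisions and edit_size(self.current(), new_text) > MAX_EDIT_CHARS:
            raise ValueError("edit changes too much of the document at once")
        self.revisions.append(Revision(author, new_text))
        return len(self.revisions) - 1

    def revert_to(self, index: int, author: str) -> int:
        """Undo later edits by appending a copy of an earlier revision."""
        self.revisions.append(Revision(author, self.revisions[index].text))
        return len(self.revisions) - 1
```

A revert here is just another appended revision, so even a malicious revert can itself be undone. A real system would also need the frequency limits and the voting/approval step, which this sketch leaves out.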
Anyway, this is an idea rattling around in my head. Not sure if it could ever go anywhere, but it would go a long way in my opinion to solving a lot of the inefficiencies inherent in the existing communities.