Tuesday, 4 June 2013

File sharing via Reddit

A few months ago, a friend and I were having a drink and ended up talking about how Reddit's behaviour and layout remind me in some ways of Usenet. There are plenty of differences, of course, but in some ways the two are quite similar.

When that idea came up, I wondered how difficult it would be to use Reddit for what was really the final nail in the coffin for Usenet: file sharing.


One of the problems facing this idea is that there are varying limits on what can be posted on the site. For example, messages between users and comments on posts have a limit of 10,000 characters, while self-posts can be either 10,000 or 40,000 characters depending on the status of the sub-Reddit. This means that in a lot of situations we can only work with files of at most 9 KB, and 39 KB at best.

On top of those limits, you cannot simply paste raw binary data into a post: any changes made to the data as it enters the database would corrupt the file. Encoding it as text with something like Base64 avoids that problem, but the encoding inflates the data by about a third (every 3 bytes become 4 characters), so the files we can send get smaller still.
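As a rough sketch of how much room that leaves, the Python below computes the largest file that fits once the Base64 overhead and a header of a given length are accounted for. The function names and the idea of reserving a fixed number of characters for the header are illustrative assumptions, not part of Karma Share.

```python
import base64

def max_payload_bytes(char_limit: int, header_chars: int) -> int:
    """Largest raw file size (in bytes) whose Base64 text fits beside the header.

    Base64 maps every 3 input bytes to 4 output characters, padding the last
    group with '=', so n bytes become 4 * ceil(n / 3) characters.
    """
    available = char_limit - header_chars
    # Only whole 4-character groups can be used, and each one carries 3 bytes.
    return max(available, 0) // 4 * 3

def fits(data: bytes, header_chars: int, char_limit: int) -> bool:
    """Check directly whether the encoded data plus header stays within the limit."""
    return header_chars + len(base64.b64encode(data)) <= char_limit

print(max_payload_bytes(10_000, 0))  # 7500
print(max_payload_bytes(40_000, 0))  # 30000
```

In other words, a 10,000-character message carries at most 7,500 bytes (about 7.3 KB) of file data before the header is even counted, and a 40,000-character self-post about 30,000 bytes.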

On top of that, when receiving the file we have to verify that the data hasn't been corrupted. This means that in addition to the encoded file, we need a small header with a few details about it.

Storing and receiving data

Based on those points, I ended up with something like this in a Reddit post:

| File name: filename.ext |
| File size: X [bytes/KB] | < Meta data
| File MD5: md5sum |
| Base64 data | < File data

Strictly speaking we only need the file name, the MD5 and the Base64 data, but the file size is useful if a person ends up reading the message rather than the tool.
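A minimal sketch of writing and reading a message in this layout might look like the following. The exact labels, the "bytes" unit on the size line and the blank line separating header from data are assumptions for illustration; build_message and parse_message are hypothetical names, not Karma Share's actual functions.

```python
import base64
import hashlib

def build_message(filename: str, data: bytes) -> str:
    """Assemble a body: three header lines, a blank line, then the Base64 data."""
    # Hypothetical layout modelled on the diagram above; the real tool's
    # field labels and separators may differ.
    header = [
        f"File name: {filename}",
        f"File size: {len(data)} bytes",
        f"File MD5: {hashlib.md5(data).hexdigest()}",
    ]
    return "\n".join(header) + "\n\n" + base64.b64encode(data).decode("ascii")

def parse_message(body: str) -> tuple[str, str, bytes]:
    """Split a body back into (filename, md5, decoded data)."""
    header_text, _, payload = body.partition("\n\n")
    fields = {}
    for line in header_text.splitlines():
        key, _, value = line.partition(": ")
        fields[key] = value
    # b64decode discards stray whitespace such as line breaks by default.
    data = base64.b64decode(payload)
    return fields["File name"], fields["File MD5"], data
```

The blank line means the parser does not need to know in advance how many header fields there are; everything after it is treated as file data.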

And speaking of tools, that is exactly what I have built: a tool that does this very thing.

Introducing "Karma Share"

Reddit is built around karma, which is basically Internet points that cannot be exchanged for any monetary value, so why not earn karma for sharing files? Well, not quite yet, as the tool is currently only designed to send and receive files using private messages.

So using the wonders of Python and the Reddit API, I have created a tool called Karma Share, a command-line application that will send a file via Reddit's private messaging system to any user you like.

In addition to sending files, it can also receive them.

Overview of Karma Share

Karma Share is written using Python 3 and PRAW, a Python library that interfaces with the Reddit API.

To send a file to someone, it just involves invoking this command:

karmashare.py push <filename> <recipient> <user> <pass>

And to receive a file, it's as simple as this:

karmashare.py pull <user> <pass>

You can also easily edit the script to store your username and password in it, so that you only need to run "pull" by itself or "push <filename> <recipient>".
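One way to get that behaviour is to make the credentials optional arguments that fall back to values stored in the script. This is a hypothetical sketch using Python's standard argparse module, not Karma Share's own argument handling; USERNAME, PASSWORD and parse_args are illustrative names.

```python
import argparse

# Hypothetical stored credentials: fill these in to avoid typing them each time.
USERNAME = ""
PASSWORD = ""

def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="karmashare.py")
    sub = parser.add_subparsers(dest="command", required=True)

    push = sub.add_parser("push", help="send a file to another user")
    push.add_argument("filename")
    push.add_argument("recipient")
    # nargs="?" makes the credentials optional, falling back to the constants.
    push.add_argument("user", nargs="?", default=USERNAME)
    push.add_argument("password", nargs="?", default=PASSWORD)

    pull = sub.add_parser("pull", help="download files sent to you")
    pull.add_argument("user", nargs="?", default=USERNAME)
    pull.add_argument("password", nargs="?", default=PASSWORD)

    args = parser.parse_args(argv)
    if not args.user or not args.password:
        parser.error("no username/password given and none stored in the script")
    return args
```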

Karma Share - Version 0.1
Created by Colin Keigher - http://afreak.ca

Filename: clients.csv
Size: 3 KB
MD5: 845e0ccf9318e51af4241f4b0e594dc0

*** Attempting to login...
*** Login succeeded!

*** Attempting to send message to AnotherRedditUser.
*** Message sent!

Messages are sent as normal and can be viewed in a browser without any side effects.

When pulling, the tool also checks whether you have already received a file, and it does not skip messages just because they have already been read; this means you don't have to worry about accidentally opening a message that was meant to be downloaded with the tool.

Karma Share - Version 0.1
Created by Colin Keigher - http://afreak.ca

*** Attempting to login...
*** Login succeeded!

RedditUser has sent you a file!
File name: clients.csv
File size: 3
File sum: 845e0ccf9318e51af4241f4b0e594dc0

RedditUser has sent you a file!
Skipping decoding of oka.jpg as MD5 matches existing file.

RedditUser has sent you a file!
Skipping decoding of qm.gif as MD5 matches existing file.

Read 3 messages and downloaded 1 new items.
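The skipping behaviour in the log above can be sketched as a comparison between the MD5 in the message header and the MD5 of any local file with the same name. This is an assumption about how such a check could work, not the tool's code; md5_of_file and should_skip are hypothetical helpers, and files are assumed to land in the current directory by default. Reducing the name with os.path.basename is an extra precaution in this sketch, since the file name comes from whoever sent the message.

```python
import hashlib
import os

def md5_of_file(path: str) -> str:
    """MD5 of a local file, read in chunks so large files don't fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def should_skip(filename: str, expected_md5: str, directory: str = ".") -> bool:
    """True if a file with this name already exists locally with the same MD5."""
    # basename() drops any directory components such as "../" from the sender's name.
    path = os.path.join(directory, os.path.basename(filename))
    return os.path.isfile(path) and md5_of_file(path) == expected_md5.lower()
```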

The tool computes an MD5 sum of the file before encoding it, and will only send it via the Reddit messaging system if the Base64-encoded data and the header values together do not exceed the limits imposed by the site. On the receiving end, it will not write the file if the MD5 of the decoded data does not match the header, which means the data was corrupted in transit.
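Both checks can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Karma Share's implementation: MESSAGE_LIMIT is taken from the 10,000-character message limit mentioned earlier, and build_checked and verify_and_write are hypothetical names.

```python
import base64
import hashlib

MESSAGE_LIMIT = 10_000  # private-message character limit mentioned earlier

def build_checked(header: str, data: bytes, limit: int = MESSAGE_LIMIT) -> str:
    """Return header + Base64 body, refusing to build anything over the limit."""
    body = header + base64.b64encode(data).decode("ascii")
    if len(body) > limit:
        raise ValueError(f"message would be {len(body)} characters; limit is {limit}")
    return body

def verify_and_write(path: str, payload: str, expected_md5: str) -> bool:
    """Decode the payload and write it to path only if its MD5 matches."""
    try:
        data = base64.b64decode(payload)
    except ValueError:  # binascii.Error (e.g. bad padding) subclasses ValueError
        return False
    if hashlib.md5(data).hexdigest() != expected_md5.lower():
        return False  # corrupted in transit: nothing is written
    with open(path, "wb") as f:
        f.write(data)
    return True
```

Checking the MD5 before opening the output file means a corrupted transfer never leaves a half-valid file on disk.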

Pitfalls and the future

The obvious problem here is that we're still limited to files that are a few KB in size. However, the tool could be extended to handle files split into multiple parts. But there is one caveat to this.
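Before getting to that caveat, here is a rough sketch of what splitting could look like. It is hypothetical: Karma Share does not do this, and the "Part: i/n" header line, the 200-character header allowance (HEADER_ROOM) and the names split_into_parts and join_parts are assumptions made for illustration. The Base64 text is cut into chunks that fit under the 10,000-character message limit, each chunk gets its own header, and the receiver reorders the parts, joins them and checks the MD5 of the whole file.

```python
import base64
import hashlib
import math

HEADER_ROOM = 200  # assumed upper bound on the per-part header length

def split_into_parts(filename: str, data: bytes, limit: int = 10_000) -> list[str]:
    """Cut a file's Base64 text into message bodies that each fit under limit."""
    encoded = base64.b64encode(data).decode("ascii")
    md5 = hashlib.md5(data).hexdigest()
    chunk = limit - HEADER_ROOM
    total = max(1, math.ceil(len(encoded) / chunk))
    parts = []
    for i in range(total):
        header = f"File name: {filename}\nFile MD5: {md5}\nPart: {i + 1}/{total}\n\n"
        if len(header) > HEADER_ROOM:
            raise ValueError("file name too long for the reserved header space")
        parts.append(header + encoded[i * chunk:(i + 1) * chunk])
    return parts

def join_parts(parts: list[str]) -> bytes:
    """Reassemble parts received in any order and verify the whole file's MD5."""
    if not parts:
        raise ValueError("no parts given")
    pieces, md5, total = {}, None, 0
    for body in parts:
        header, _, payload = body.partition("\n\n")
        fields = dict(line.split(": ", 1) for line in header.splitlines())
        index, total = map(int, fields["Part"].split("/"))
        pieces[index] = payload
        md5 = fields["File MD5"]
    if sorted(pieces) != list(range(1, total + 1)):
        raise ValueError("one or more parts are missing")
    data = base64.b64decode("".join(pieces[i] for i in range(1, total + 1)))
    if hashlib.md5(data).hexdigest() != md5:
        raise ValueError("MD5 mismatch: reassembled file is corrupted")
    return data
```

Because the MD5 covers the whole file, a single corrupted or missing part is enough to reject the result.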

Reddit limits how many posts you can make in a given period of time, and if you're using an account that either hasn't been verified via e-mail or has otherwise tripped the site's anti-spam mechanisms, you will be asked to solve a captcha for each and every post. This becomes apparent when the tool starts requesting captcha input upon sending a file; the PRAW library provides a link you can click that will show you the required string.

In some cases you can post every few seconds, but in a lot of instances you will find that you can only post once every ten minutes. This means that if you're attempting to send 100 KB via Reddit, it's going to take you around two hours, which is not very effective. For the time being I do not believe that the powers that be at Reddit have anything to fear: at one message every ten minutes, it would take more than a year just to upload a 500 MB DVD-quality copy of the latest Top Gear or Game of Thrones episode. But this could possibly be useful for posting a funny cat picture without having to rely on services like Imgur.
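The arithmetic behind those estimates is simple enough to sketch. The assumptions here are loud ones: each message carries 7,500 bytes of file data (10,000 Base64 characters, ignoring the header), the first message goes out immediately, and every later one waits a full ten-minute interval. transfer_minutes is a hypothetical helper.

```python
import math

def transfer_minutes(file_bytes: int, bytes_per_message: int, minutes_between: float) -> float:
    """Minutes needed to send a file when only one message is allowed per interval."""
    messages = math.ceil(file_bytes / bytes_per_message)
    # The first message goes out straight away; each later one waits a full interval.
    return max(messages - 1, 0) * minutes_between

print(transfer_minutes(100 * 1024, 7_500, 10))  # 130
print(round(transfer_minutes(500 * 1024 * 1024, 7_500, 10) / (60 * 24)))  # 485
```

So 100 KB needs 14 messages and a little over two hours, while 500 MB needs roughly 70,000 messages and about 485 days.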

Really, this is an experiment, and perhaps a useful one in certain circumstances that I have yet to figure out. Also keep in mind that using this tool could get you banned from Reddit, so I take no responsibility if you lose any of your Internet points.

Download it!

Want to try it out? It works just fine under Python 3 with the PRAW library installed. You can grab it from my GitHub.

The code is licensed under the GPLv3.

Ed: the idea has been turned into a webcomic entry.
