Monday, 1 April 2013

Being an Avivore and data mining Twitter

How easy is it to pull phone numbers, IP addresses, and Blackberry PINs from Twitter? It should be no surprise that it isn't very difficult at all, and it was the topic of a presentation that I gave at BSides Vancouver last month. This post is a follow-up to that talk rather than to my previous write-up on the event.

Background

This really began out of boredom on my part. I was working on a Twitter-related project for Maker Faire Vancouver and, due to some missing components, I had to delay what I was writing. Since I do not like to have idle hands, I started to mess with what I already had and made it search for phone numbers. I didn't anticipate anything at first, but then I started to get results.



After seeing the large volume of results from the script, I decided that I would build a bot that would respond to people who were found to be erroneously posting their numbers on Twitter. Oddly, it seems that I am the first to go and search for this information openly, but not the first to do something related to user-initiated privacy violations--not sure what else to call this.

@PhoneNumberTwit

After some fumbling around, I managed to create a Twitter bot.



The bot was designed to respond to at most one tweet every ten minutes, and only when a random number generator hit a certain value--so as not to cause a flood of tweets. It was also set to tweet a randomly chosen phrase to avoid the anti-spam actions that Twitter itself would take should I tweet the same thing each time.
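The throttling described above can be sketched roughly as follows. This is a minimal illustration and not the bot's actual code: the class name `ReplyThrottle`, the phrases, and the 10% reply chance are all assumptions made up for the example, since the real threshold and wording are not given here.

```python
import random
import time

# Hypothetical reply phrases; the bot's real wording is not given in this post.
PHRASES = [
    "You may want to delete that tweet; your phone number is public.",
    "Heads up: anyone can see the phone number you just posted.",
    "Posting your number publicly is risky. Consider removing it.",
]

MIN_INTERVAL = 600  # seconds: at most one reply every ten minutes
REPLY_CHANCE = 0.1  # assumed probability; the real threshold is unknown


class ReplyThrottle(object):
    """Decide whether the bot may reply right now (illustrative only)."""

    def __init__(self, min_interval=MIN_INTERVAL, chance=REPLY_CHANCE,
                 rng=None, clock=time.time):
        self.min_interval = min_interval
        self.chance = chance
        self.rng = rng or random.Random()
        self.clock = clock
        self.last_reply = None  # timestamp of the last reply, if any

    def should_reply(self):
        now = self.clock()
        # Gate 1: never reply more than once per interval.
        if self.last_reply is not None and now - self.last_reply < self.min_interval:
            return False
        # Gate 2: only reply when the random draw falls under the threshold.
        if self.rng.random() >= self.chance:
            return False
        self.last_reply = now
        return True

    def pick_phrase(self):
        # Vary the wording so identical tweets are not sent back to back.
        return self.rng.choice(PHRASES)
```

Passing the clock and random generator in as parameters is only there to make the behaviour easy to check without waiting ten minutes.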

While I took the time to ensure that Twitter itself wouldn't ban me, it ran for something like 36 hours before it got banned. I eventually appealed the ban, but a contact I had at the company told me upfront that the account would probably remain banned if it got in trouble again.

Some of the reactions during its brief lifetime were kind of interesting:



Poor decisions are made regardless of what country they're from--especially mine.



Justin Bieber is pretty wild these days. Maybe?



Note: no mobile phone was harmed or abused for this portion of the project.

So what next?

Well, I wanted to be persistent on this and thought about a few different approaches. One idea someone had proposed was to follow the lead of @NeedADebitCard and just retweet the phone numbers. However, I didn't want to run the risk of getting banned from Twitter: should they decide that @NeedADebitCard should be banned, mine would likely follow soon after.

Maybe I should just compile the findings into a list?



This ran for a few days as well and I killed it eventually (it could be revived at a moment's notice) for reasons that I cannot recall at the time of this writing. It basically followed in the footsteps of Please Rob Me, which compiled tweets from users checking in through services like Foursquare to demonstrate how easily one can be followed.

It worked quite well: my application would take results and dump them into a database. The page would then display items from the database, showing each number in the format '(555) 555-1xxx'. The intent was that people could just delete their tweet should they find their phone number showing up on my page. Once the tweet was deleted, the obfuscated number would be all that remained, and it would fall off the list as others made the same mistake.
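The masking into the '(555) 555-1xxx' format can be sketched like this. It is only an illustration under my own assumptions: the function name `obfuscate_number` is made up, and it handles just ten-digit North American numbers (with an optional leading 1), which may not match what the page actually did.

```python
import re


def obfuscate_number(raw):
    """Mask a North American phone number as '(AAA) BBB-Cxxx'.

    Returns None when the input does not contain exactly ten digits
    (or eleven with a leading country code of 1).
    """
    digits = re.sub(r"\D", "", raw)  # strip everything that is not a digit
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]          # drop the leading country code
    if len(digits) != 10:
        return None
    # Keep the area code, the exchange, and one digit; hide the last three.
    return "({0}) {1}-{2}xxx".format(digits[:3], digits[3:6], digits[6])
```

For example, `obfuscate_number("555-555-1234")` gives `(555) 555-1xxx`.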

In addition to that, I briefly configured it to use Twilio to send text messages very sparingly to random users who ended up in that list. Out of the 400 or so tweets that I archived before shelving the project, about nine people were texted with the message "I found your number on Twitter. You should not tweet these things, y'know". Most of the responses were fairly benign, but I discontinued this behaviour as I was unsure what laws I may or may not have been violating at the time.

Ed - April 3: I was asked why I discontinued this overall. I decided that I wasn't sure of the consequences of doing this. Add the fact that I didn't want my hosting service to have any troubles over this, and I opted not to let this service operate any further.

Going for broke

In the end I opted to just create some code to search Twitter on a cycle and pull everything into a database. The code is not especially well written, but it does the job and lets you see how effective it is in retrieving information. It was initially designed to just pull phone numbers, but then it sort of mutated into finding IP addresses and Blackberry PINs.

A sample output is as follows:

$ python avivore.py
Avivore 1.0
A Twitter-based tool for finding personal data.
Licensed under the LGPL and created by Colin Keigher
http://github.com/ColinKeigher
[1364844946] Using existing database to store results.
[1364844946] 12447 entries in this database so far.
[1364844956] Type: bbpin, User: SocialGamerMax, PIN: 261D288C, TweetID: 318808517685440513
[1364844957] Type: bbpin, User: AsgharBhatti3, PIN: 21D91A46,TweetID: 318808291490791424
[1364844957] Type: bbpin, User: MIDOO_889, PIN: 26CFA12B, TweetID: 318807746273214464
[1364844957] Type: bbpin, User: Tsa7el, PIN: 25ba2a8f, TweetID: 318806708887629824


The 12,000+ entries you see above were created over the course of 24 hours, so as you can see there is just a tonne of personal data sitting on Twitter for everyone to get their hands on.
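To give a sense of how little is needed to find this sort of data, here is a rough sketch of pulling the three types out of a tweet's text with regular expressions. These patterns are my own simplified assumptions for illustration and are not lifted from avivore.py; only the type label "bbpin" appears in the output above, while "phone" and "ipaddr" and the function name `find_personal_data` are made up for this example.

```python
import re

# Simplified, assumed patterns; real-world tweets are far messier.
PATTERNS = {
    # North American style: 555-555-1234, (555) 555-1234, 555.555.1234
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
    # Dotted quad; octet ranges are checked separately below.
    "ipaddr": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # Eight hex characters shortly after the word "pin".
    "bbpin": re.compile(r"\bpin\W{0,3}([0-9a-f]{8})\b", re.IGNORECASE),
}


def find_personal_data(text):
    """Return a list of (type, value) tuples found in a tweet's text."""
    results = []
    for match in PATTERNS["phone"].finditer(text):
        results.append(("phone", match.group(0)))
    for match in PATTERNS["ipaddr"].finditer(text):
        # Discard things like 999.1.1.1 that cannot be IPv4 addresses.
        if all(int(octet) <= 255 for octet in match.group(0).split(".")):
            results.append(("ipaddr", match.group(0)))
    for match in PATTERNS["bbpin"].finditer(text):
        results.append(("bbpin", match.group(1).upper()))
    return results
```

Requiring the word "pin" before the eight hex characters is a deliberate choice in this sketch: without it, any eight-character hex-looking word would be reported as a PIN.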

Closing

Some other ideas that came up were to harvest e-mail addresses or prescription medication. E-mail addresses would be straightforward, but prescriptions were an interesting one that would probably require me to build a database of names of common drugs. This would be interesting to search for within the limitations that Twitter has put upon me, but I would not be surprised if this and the other data has some sort of use.

The talk I gave at BSides Vancouver touched loosely on the ramifications of this sort of information and what use it could serve. For example, someone who is willing to post their personal details on the service may also be highly susceptible to social engineering attacks. An easy one to pull off would be to call the individual a few days or so afterwards, claiming to be from Twitter and wanting to ensure that everything was okay and to also "verify" their password. For businesses, it could allow you to know who might be a risk to your company and might be willing to give up proprietary information that you may not want divulged. These are of course all hypothetical, but it was something that crossed my mind as I was working on this project.

As a result of this work, I've expanded this data retrieval to other services and have a new project in the works that will likely be demonstrated sometime soon. For now, feel free to try out the script and have a sigh or two.

10 comments:

  1. Really, great work, I didnt realize @NeedADebitCard was actually real. and my god, in the wrong hands...

  2. Nice article! Are you sharing your code anywhere? I'm new to python, would love to know the sweet code ~~~ :)

  3. This needs to be a module for Recon-ng: https://bitbucket.org/LaNMaSteR53/recon-ng

  4. Great article, I have been looking into the same, out of boredom, but on photobucket, very scary, the things people upload there and make public.

    Phone numbers,
    Name & Address often at the top of letters, so SE attacks have a context
    Credit Cards including svn
    There is even a medical center that uses photobucket to share your chest xrays

  5. Hi there,

    This is great...thanks for sharing.

    I'm having a bit of trouble getting your script to work. Getting the following:

    TypeError: 'NoneType' object has no attribute '__getitem__'

    Thanks again, sir.

  6. Is it just me, or does anyone else get the following error message?

    File "avivore.py", line 71, in Main
    for y in TwitterSearch(x):
    TypeError: 'int' object is not iterable

  7. Hi guys,

    There is a flaw in the application that I am aware of where the TypeError appears. It's the result of some sloppy writing on my part (I am not a very good developer but trying to improve) and generally just restarting the application fixes it. I'll make a correction to it either tonight or tomorrow to address the issue.

    I am not sure about Dave's issue however. Could you elaborate?

    Cheers,
    Colin

  8. Hi Colin,

    I think if you resolve the TypeError I should be good. Thanks, sir.

    Dave
