Slacking

I’ve been tweaking the collection of data relating to activities inside our corporate Slack team. I was particularly interested in volume. For example, if someone asked us to capture the metadata associated with every reaction to a Slack message with the party parrot, how much storage (and processing power) would I need to budget on the corporate log management platform?

I should mention before I start to blab that there are probably great commercial products to do this kind of work, and if you are short on time you should look into them. However, if you set aside an afternoon to mess about with a tiny amount of code and a whole heap of JSON, you’ll have a lot more fun.

The first thing to know is that most (maybe all?) of the Slack API messages are delivered as JSON.

An event record that captures a reaction looks something like this when the API serves it up (the sample below is adapted from Slack’s documentation for the reaction_added event):
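{
    "type": "reaction_added",
    "user": "U024BE7LH",
    "reaction": "thumbsup",
    "item_user": "U0G9QF9C6",
    "item": {
        "type": "message",
        "channel": "C0G9QF9GZ",
        "ts": "1360782400.498405"
    },
    "event_ts": "1360782804.083113"
}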

The second thing to consider is that for my use case (determining data volume) there are a couple of good avenues to pursue.

  1. Use the events API. You subscribe to the events that you care about, and Slack serves them up as they happen.
  2. Use the team.accessLogs HTTP-RPC endpoint. You ask Slack for a date range, and Slack will tell you all the folks who logged in that day.

Events API:

For the events API, there is a really great tutorial available. The tutorial is easy to follow and will get you up and running fast. It covers everything from choosing the events you are interested in (the ones you want your application to ‘subscribe’ to), all the way to setting up a reverse proxy with ngrok (don’t worry, it’s a one-line command) so that your application can receive events while it is running on your laptop.
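In case it helps, the ngrok one-liner is typically something like this (port 3000 is an assumption here; it matches what the example code below listens on):

ngrok http 3000

ngrok then prints a public HTTPS forwarding URL; that URL, plus the /slack/events route, is what you register with Slack as the event request URL.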

Once I had the tutorial up and running, the only tinkering I needed to do was to adjust the routes that I cared about. As an example, while I was looking at reactions to Slack messages, my example.py file looked like this:

from slackeventsapi import SlackEventAdapter
from slackclient import SlackClient
import json
import os
import pprint

# The verification token lets the adapter confirm requests really came from Slack
SLACK_VERIFICATION_TOKEN = os.environ["SLACK_VERIFICATION_TOKEN"]
slack_events_adapter = SlackEventAdapter(SLACK_VERIFICATION_TOKEN, "/slack/events")

SLACK_BOT_TOKEN = os.environ["SLACK_BOT_TOKEN"]
CLIENT = SlackClient(SLACK_BOT_TOKEN)

total_size = 0

def display_total(size):
    print("-------------------------------------------")
    print(size, "bytes sent from slack events so far")
    print("(", size / 1024.0 / 1024.0, "MB so far)")
    print("-------------------------------------------")

@slack_events_adapter.on("reaction_added")
def reaction_added(event_data):
    # Serialize the event back to JSON and use its length as a rough
    # proxy for the storage this event would consume
    global total_size
    json_obj = json.dumps(event_data)
    json_size = len(json_obj)
    total_size = total_size + json_size
    print("The size of this object is:", json_size)
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(event_data)
    display_total(total_size)

slack_events_adapter.start(port=3000)

Live event subscription for ‘reactions’ to messages in Slack. We used variations on the code above to measure the volume over a period of time, as well as the processing required to handle this at scale.

Example of the ngrok tunnel allowing Slack to deliver messages from out on the internet to the application running on my laptop.

With all the awesome work the Slack developer evangelists did with that tutorial, variations on the model above were about all I needed to measure event subscription volume and get a really good idea of how much it would cost to include various Slack events in our central log platform. (FWIW, we don’t really care about message reactions; it’s just an example.)
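To give a flavor of those variations, here is a hypothetical sketch: measuring another event type is just another decorated handler registered on the same adapter (the ‘message’ event is one example, and I’m reusing the total_size and display_total from the code above):

# Hypothetical variation: measure 'message' events the same way
@slack_events_adapter.on("message")
def message(event_data):
    global total_size
    total_size += len(json.dumps(event_data))
    display_total(total_size)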

The team.accessLogs HTTP-RPC endpoint:

Flipping it around a little, Slack also offers traditional HTTP-RPC endpoints that allow us to ask the questions, rather than Slack tapping us on the shoulder when stuff happens.

The team.accessLogs method is the one that gives us information about who logged in, where, when, and from what device. It’s not a stretch to imagine that this is an interesting endpoint for most security teams. The data you get back from Slack looks like this:

{
    "user_id": "U12345",
    "username": "bob",
    "date_first": 1422922864,
    "date_last": 1422922864,
    "count": 1,
    "ip": "127.0.0.1",
    "user_agent": "SlackWeb Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/41.0.2272.35 Safari\/537.36",
    "isp": "BigCo ISP",
    "country": "US",
    "region": "CA"
}

I have the same goal in mind with this endpoint. If I want to store this stuff, I need to know how many events like this we generate per day (on average).
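The arithmetic itself is trivial once you have an average record size and a daily login count; here is a minimal sketch, with every number made up purely for illustration:

# Back-of-envelope storage estimate -- every number here is a
# made-up placeholder, not a measurement
avg_record_bytes = 350    # rough size of one serialized login record
logins_per_day = 2000     # hypothetical daily login volume
days_retained = 365       # retention window on the log platform

total_bytes = avg_record_bytes * logins_per_day * days_retained
print("~" + str(round(total_bytes / 1024.0 / 1024.0, 1)) + " MB per year")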

Slack authorization tokens for this type of thing can be retrieved from here. (They can be revoked from here by going to “tester” and issuing the sample API call to kill the token for your team).

Once I had a token, I settled on some skeleton code that looked like this:

import requests
import time

url = "https://slack.com/api/"
method = "team.accessLogs"
token = "{your token here}"
pretty = 1
page = 1

total_size = 0

# Pull the first 100 pages of access logs
for _ in range(100):

    payload = {"token": token, "pretty": pretty, "page": page}
    r = requests.get(url + method, params=payload)
    json_version = r.json()
    # Measure the raw response body; len() on the parsed dict would
    # only count its top-level keys
    response_size = len(r.text)
    total_size = total_size + response_size
    print("Size: " + str(response_size))
    for event in json_version["logins"]:
        print("Name: " + str(event["username"]))
        print("IP: " + str(event["ip"]))
        formatted_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(event["date_last"]))
        print("Date: " + str(formatted_time))
        print("")

    page = page + 1

print("Total Size: " + str(total_size))

Nothing really special there, but hopefully the template will save someone a little time getting up and running quickly. Variations on this allowed me to narrow things down and get to the information that was most important for our security program. We added code to bucket the volume by day, filtered various things, and took some measurements to understand how often we would need to pull from the endpoint to get an efficient next set of data each time. We were able to project the average data volume that tracking this type of event would add, and to make some assumptions about the rate of growth over time.
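As one example of those variations, a minimal sketch of the day-bucketing idea (assuming logins is a list of the login dicts accumulated from the pages above) might look like this:

import json
import time
from collections import defaultdict

def bucket_by_day(logins):
    # Sum the serialized size of login records per calendar day
    buckets = defaultdict(int)
    for event in logins:
        day = time.strftime("%Y-%m-%d", time.localtime(event["date_last"]))
        buckets[day] += len(json.dumps(event))
    return buckets

# for day, size in sorted(bucket_by_day(logins).items()):
#     print(day + ": " + str(size) + " bytes")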

All in all, it was a good detour from the typical Thursday afternoon, and we now have a good understanding of how much extra space and processing we need to add Slack data to our logging systems.