In Log Everything All the Time I advocate applications shouldn't bother logging at all. Why waste all that time and code? No, wait, that's not right. I preach logging everything all the time. Doh. Facebook obviously feels similarly, which is why they open sourced Scribe, their internal logging system, capable of logging tens of billions of messages per day. These messages include access logs, performance statistics, actions that went to News Feed, and many others.
Imagine hundreds of thousands of machines across many geographically dispersed datacenters just aching to send their precious log payload to the central repository of all knowledge. Because really, when you combine all the metadata with all the events you pretty much have a complete picture of your operations. Once in the central repository, logs can be scanned, indexed, summarized, aggregated, refactored, diced, data cubed, and mined for every scrap of potentially useful information.
Just imagine the log stream from all of Facebook's Apache servers alone. Brutal. My guess is these are not real-time feeds, so there are no streaming query issues, but the task is still daunting. Let's say they log 10 billion messages a day. That averages out to well over 100,000 messages per second (10 billion / 86,400 seconds ≈ 116,000/sec)!
When no off-the-shelf products worked for them they built their own. Scribe can be downloaded from SourceForge, but the real action is on their wiki. It's there you'll find some decent documentation and their support forums. There's not much activity on the site yet, so you haven't missed your chance to be a charter member of the Scribe guild.
A logging system has three broad components:

The client code interface - how your application gets log entries into the system. With Scribe this is a thin Thrift call that hands over a category and a message.
The distribution system - how entries get off the hundreds of thousands of boxes that produce them and are aggregated in a central place. This is the part Scribe concentrates on.
The storage system - how the aggregated entries are written to their final destination (files on an NFS filer or a distributed file system) for later processing.
In some ways it could be fancier. For example, there's no throttle on incoming connections, so a flood of clients can chew up the server's memory. There is a max_msg_per_second throttle on message processing, but it's really too simple. Throttling needs to be adaptive, based on local conditions and the conditions of downstream servers. Under load you want to push flow control back to the client so the data stays there until resources become available. Simple configuration file settings rarely work when the world starts getting weird.
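To illustrate what pushing flow control back to the client could look like, here's a minimal PHP sketch built on the Thrift-generated bindings shown further down (ResultCode and a scribe client $conn). The helper name, retry policy, and limits are invented for illustration, not part of Scribe:

// A hedged sketch: hold on to messages client-side and back off when the
// server answers TRY_LATER, instead of dropping data on the floor.
function log_with_backoff($conn, array $messages, $max_attempts = 5) {
    $delay_secs = 1;
    for ($attempt = 0; $attempt < $max_attempts; $attempt++) {
        if ($conn->Log($messages) == ResultCode::OK) {
            return true;                 // server accepted the batch
        }
        sleep($delay_secs);              // TRY_LATER: keep the data and wait
        $delay_secs *= 2;                // exponential backoff
    }
    return false;                        // still throttled; caller keeps the batch
}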
Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups. If the central scribe server isn't available the local scribe server writes the messages to a file on local disk and sends them when the central server recovers. The central scribe server(s) can write the messages to the files that are their final destination, typically on an nfs filer or a distributed file system, or send them to another layer of scribe servers.
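To make that concrete, a local-node configuration along those lines might look roughly like this, assuming the network, buffer, and file store types from the Scribe documentation; the host name, port, and paths are placeholders, not values from Facebook's setup:

# Local scribe node: forward everything to the central scribe server,
# spilling to local disk whenever the central server is unreachable.
port=1463

<store>
category=default
type=buffer
retry_interval=30
retry_interval_range=10

<primary>
type=network
remote_host=central-scribe.example.com
remote_port=1463
</primary>

<secondary>
type=file
fs_type=std
file_path=/tmp/scribe_spool
base_filename=default
max_size=30000000
</secondary>
</store>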
Scribe is built on Thrift, Facebook's cross-language services framework, so its logging interface is defined in Thrift's IDL. I know, I thought the same thing. Thank God there's another IDL syntax. We simply did not have enough of them. Thrift translates this IDL into the glue code necessary for making cross-language calls (marshalling arguments and responses over the wire). The Thrift library also has templates for servers and clients. The entire scribe interface is tiny:
enum ResultCode
{
  OK,
  TRY_LATER
}

struct LogEntry
{
  1: string category,
  2: string message
}

service scribe extends fb303.FacebookService
{
  ResultCode Log(1: list<LogEntry> messages);
}
Logging a message from PHP looks like this:

// Queue up a LogEntry and send it; $conn is an already-open Thrift client
// for the scribe service (a connection sketch follows below).
$messages = array();
$entry = new LogEntry;
$entry->category = "buckettest";
$entry->message = "something very interesting happened";
$messages[] = $entry;
$result = $conn->Log($messages);
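For completeness, the $conn above has to come from somewhere. A connection might be set up roughly like this with the Thrift PHP runtime and the generated scribe bindings; the host and port are placeholders, and the required includes are assumed to already be on the path:

// A rough sketch of creating the scribe client used above.
$socket    = new TSocket('localhost', 1463);         // address of a scribe server (placeholder)
$transport = new TFramedTransport($socket);          // scribe's Thrift server expects framed transport
$protocol  = new TBinaryProtocol($transport, false, false);
$conn      = new scribeClient($protocol, $protocol);
$transport->open();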
Compare that with a richer logging interface, where each message carries a name, a reason, a what, and the module it came from:

MSG(msg) - a simple message. It only prints out msg. None of the other information is printed out.
NOTE(const char* name, const char* reason, const char* what, Module* module, msg) - something to take note of.
WARN(const char* name, const char* reason, const char* what, Module* module, msg) - a warning.
ERR(const char* name, const char* reason, const char* what, Module* module, msg) - an error occurred.
CRIT(const char* name, const char* reason, const char* what, Module* module, msg) - a critical error occurred.
EMERG(const char* name, const char* reason, const char* what, Module* module, msg) - an emergency occurred.
Scribe is unique in that clients log entries consisting of two strings, a category and a message. The category is a high-level description of the intended destination of the message and can have a specific configuration in the scribe server, which allows data stores to be moved by changing the scribe configuration instead of client code. The server also allows for configurations based on category prefix, and a default configuration that can insert the category name in the file path.

Flexibility and extensibility are provided through the "store" abstraction. Stores are loaded dynamically based on a configuration file, and can be changed at runtime without stopping the server. Stores are implemented as a class hierarchy, and stores can contain other stores. This allows a user to chain features together in different orders and combinations by changing only the configuration.
The types of stores currently available are:

file - writes messages directly to a file, on local disk or an NFS path.
network - sends messages on to another scribe server.
buffer - wraps a primary and a secondary store; messages go to the primary, spill to the secondary when the primary is unavailable, and are replayed when it recovers.
bucket - contains a set of sub-stores and hashes each message to decide which one it goes to.
thriftfile - writes messages to a Thrift TFileTransport file.
null - throws messages away.
multi - forwards each message to a group of sub-stores.

Here's an example configuration that chains a buffer store in front of a bucket store of file stores:
# BUCKETIZER TEST
<store>
category=buckettest
type=buffer
target_write_size=20480
max_write_interval=1
buffer_send_rate=2
retry_interval=30
retry_interval_range=10
<primary>
type=bucket
num_buckets=6
bucket_subdir=bucket
bucket_type=key_hash
delimiter=1
<bucket>
type=file
fs_type=std
file_path=/tmp/scribetest
base_filename=buckettest
max_size=1000000
rotate_period=hourly
rotate_hour=0
rotate_minute=30
write_meta=yes
</bucket>
</primary>
<secondary>
type=file
fs_type=std
file_path=/tmp
base_filename=buckettest
max_size=30000
</secondary>
</store>
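In this configuration, messages logged to the buckettest category flow through the buffer store into the bucket store, which hashes each one across six file stores writing into bucket subdirectories under /tmp/scribetest. If that primary can't be written to, the buffer store falls back to the plain file store in /tmp and replays the buffered messages once the primary is healthy again, governed by the retry_interval settings above.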