Treasure Data CDP Resources

Featured Resource

Reports

Treasure Data Named a Leader by Forrester

Get complimentary access to The Forrester Wave™: Customer Data Platforms For B2C, Q3 2024. Treasure Data was named a Leader by Forrester.

12 Open Source Software Innovations from Treasure Data Engineers

TD is proud to have some of the best technical minds in the world working on our unique managed service. When they’re not working on the TD Service or supporting our customers, many of our engineers continue to support technological innovation by...

Amazon Recommends Fluentd as “Best Practice for Data Collection” over Flume and Scribe

This month, Parviz Deyham from Amazon Web Services promoted Fluentd as the best data collection tool for Amazon Elastic MapReduce (EMR), a hosted Hadoop framework running on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)...

Treasure Data’s Plazma: Columnar Cloud Storage

TD has been developed by Hadoop experts. We get Hadoop, and, in many ways, it’s part of our core. As we built out the platform, we noticed that the storage layer needed to be multi-tenant, elastic, and easy to manage while keeping the scalability...

Fluentd + Hadoop: Instant Big Data Collection

Many companies choose Hadoop Distributed Filesystem (HDFS) for big data storage. Until recently, however, the only interface was the Java API. This changed with the new WebHDFS interface, which allows users to interact with HDFS via...

Understanding the Book-Crossing Dataset: Setup

I'm a data scientist at TD. In a series of blog entries, I want to introduce how to use our platform by interacting with a concrete dataset. I chose the publicly available Book-Crossing Dataset as our base data...

Log Everything as JSON. Make Your Life Easier

The Story of an Engineer. Here is an anecdote. I am sure some of you have had a similar experience.

Enabling Facebook’s Log Infrastructure with Fluentd

Facebook uses Scribe as its core log aggregation service. The description on GitHub reads, “Scribe is a server for aggregating log data streamed in real time from a large number of servers.”...

Real-Time Log Collection with Fluentd and MongoDB

For those of you who do not know what MongoDB is, it is an open-source, document-oriented database developed at 10gen, Inc. It is schema-free and uses a JSON-like format to manage semi-structured data...

Fluentd: The Missing Log Collector Software

The fundamental problem with logs is that they are usually stored in files, although they are best represented as streams (an observation by Adam Wiggins, CTO at Heroku). Traditionally, they have been dumped into text-based files and collected by rsync in...
