Treasure Data CDP Resources

Featured Resource

Reports

Treasure Data Named a Leader by Forrester

Get complimentary access to The Forrester Wave™: Customer Data Platforms For B2C, Q3 2024. Treasure Data was named a Leader by Forrester.

5 Tips to Optimize Fluentd Performance

We’ve rewritten the Ruby library supporting MessagePack, the highly efficient binary serialization format used internally. (MessagePack was invented by TD’s co-founder Sadayuki Furuhashi)...

Collecting All Docker Logs with Fluentd

Just in case you have been offline for the last two years, Docker is an open platform for distributed apps for developers and sysadmins. By turning your software into containers, Docker lets cross-functional teams ship and run apps across platforms seamlessly...

Data Science 101: Interactive Analysis with Jupyter, Pandas and Treasure Data

TD gives you a cloud-based analytics infrastructure accessible via SQL. Our interactive engines like Presto give you the power to crunch billions of records with ease. As a data scientist, you’ll still need to learn how to write basic SQL queries...

Python 101 for Aspiring Data Nerds

As a data scientist, or anyone interested in collecting data for that matter, it’s no doubt helpful to know how to collect the data in your app: data that you’ll want to query and analyze later. Here, we’ll build an app in Python from A to Z, iterate on it to make it ...

New UDFs in Presto: currency conversion and geocoding tools

Today, we introduced three new UDFs (user-defined functions) to TD’s Presto offering. They are...

Four Reasons Presto is the Best SQL-on-Hadoop (That You Haven’t Heard Of)

Presto is an in-memory distributed SQL query engine developed by Facebook that has been open source since November 2013. Presto has a number of key advantages over other SQL-on-Hadoop engines, yet these benefits are not widely recognized or understood. Reason #1: Presto is Plenty Fast. Unlike MapReduce, which was designed for very high throughput at the ...

Eliminating Schema Rot in MPP Databases Like Redshift

The MPP database is an incredible piece of technology. These databases run large-scale analytic queries very quickly, making them great tools for iterative data exploration. With a cloud offering like Redshift in the market, MPP databases are enjoying increasing adoption today outside of enterprise IT. However, like any other great technology, they excel in some ...

Managing the Data Pipeline with Git + Luigi

One of the common pains of managing data, especially for larger companies, is that a lot of data gets dirty (which you may or may not even notice!) and becomes scattered around everywhere. Many ad hoc scripts run in different places, and these scripts silently generate dirty data. Further, if and when a script results ...

Learn SQL by Calculating Customer Lifetime Value Part 2: GROUP BY and JOIN

This is the second installment of our SQL tutorial blog series. In the first part, we set up the data source with SQLite and learned how to filter and sort data. This time, we will learn two other key concepts in SQL: GROUP BY and JOIN. Get the FREE e-book based on this blog series! ...
