Treasure Data CDP Resources

A Self-Study List for Data Engineers and Aspiring Data Architects

With the explosion of “Big Data” over the last few years, the need for people who know how to build and manage data pipelines has grown. Unfortunately, supply has not kept up with demand, and there seems to be a shortage of engineers focused on the ingestion and management of data at scale. Part of the ...

Build a Simple Recommendation Engine with Hivemall and Minhash

This is a translation of this blog post, printed with permission from the author. In this post, I will introduce a technique called Minhash that is bundled in Treasure Data’s Hivemall machine learning library. Minhash is not usually thought of as a machine learning technique, but as you will see in this post, it’s quite ...

What’s the difference between Amazon Redshift and Aurora?

As you plan your analytics and data architecture on AWS, you may confuse Redshift with Aurora. Both are advertised as scalable and performant. Both are supposedly better than incumbents. Both have optically inspired names. So, what’s the difference? In short, Redshift is OLAP whereas Aurora is OLTP. In this blog post, we’ll ...

Graduate from Mixpanel: Funnel Analysis with SQL and R

This post is part one of a two-part series. See part two here. What is Funnel Analysis? In a nutshell, funnel analysis lets you follow a user through a series of self-defined events and calculate the conversion rate from each event to the next. There are multiple ways and ...

Redshift is 400x Bigger than MySQL Yet MySQL is More Popular

The Amazon Redshift COPY Command Guide is now available! There are good reasons for the hype around Amazon Redshift. Redshift is blazing fast and not that much more expensive than MySQL or PostgreSQL, the traditional mainstays of data engineers. But is Amazon Redshift really becoming predominant in the world of analytic databases, taking over its ...

Move your data – from MySQL to Amazon Redshift (in less time than it takes to ignore an index!)

Redshift, as you may already know, is quickly gaining broad acceptance, especially among consumers of free software like MySQL and PostgreSQL, for its “pay as you go” pricing model. However, that same pricing model can still make it very expensive. Not all queries need to be run against the Redshift instance itself, as ...

Elasticsearch vs. Hadoop For Advanced Analytics

A Tale of Two Platforms: Elasticsearch is a great tool for document indexing and powerful full-text search. Its JSON-based domain-specific query language (DSL) is simple and powerful, making it the de facto standard for search integration in any web app. But is it good as an analytics backend? Are we looking at a ...

5 Tips to Optimize Fluentd Performance

We’ve rewritten the Ruby supporting MessagePack, the highly efficient binary serialization format used internally. (MessagePack was invented by TD’s co-founder Sadayuki Furuhashi.) ...

Making Magic with pandas-td

Magic functions save you typing on common tasks. (NOTE: pandas itself doesn’t have magic functions; the IPython kernel does.) Magic functions are functions preceded by a % symbol. Magic functions have been introduced into pandas-td version 0.8.0! Toru Takahashi from Treasure Data walks us through. Treasure Data’s magic functions work by wrapping a separate ...
