Open-Source Contributions


Open source is in our DNA. Check out a list of open-source projects invented by Treasure Data engineers and projects we contribute to.

Open-source pioneers

We know first-hand that open source makes software more accessible, developers more connected, and the world a little smaller. We believe in the open-source community, invest in moving projects forward, and welcome your contributions to our projects.  

Over the last 13 years, we’ve invented many key software components for modern data stacks. Treasure Data engineers pioneered and subsequently open-sourced these innovations, while including them in our customer data platform (CDP).  

Highlights include developing an open data protocol adopted by Apple and thousands of other companies, creating one of the world’s largest Hadoop user communities, and bringing Linux to the Fortune 100.

Open-source projects: Invented by Treasure Data engineers and hosted in our CDP

Fluentd

With more than 600 plugins and native support for Docker and Kubernetes, Fluentd is one of the most popular tools for log management.

Embulk

An open-source data loader to move massive data across storage systems, software services, file formats, and data centers.

Apache Hivemall

Democratizes machine learning by bringing cutting-edge ML algorithms to the fingertips of SQL analysts.

MessagePack

An efficient binary serialization format. Like JSON, it lets you exchange data among multiple languages, but it’s faster and smaller.

Digdag

A simple tool that helps you build, run, schedule, and monitor complex pipelines of tasks. It handles dependency resolution so that tasks run in series or in parallel.

Fluent Bit

An open-source data collector that gathers data from different sources, unifies it, and sends it to multiple destinations.

wvlet

Wvlet is a cross-SQL flow-style query language for functional data modeling and interactive data analysis.

pyenv

Lets you easily switch between multiple versions of Python.

snappy-java

A Java port of Snappy, a fast C++ compressor/decompressor developed by Google.

httpclient

A Ruby HTTP client library.

Projects We Contribute To

Docker

Contributed the Fluentd logging driver integration

Kubernetes

Contributed the integration of logging capabilities with Fluentd

Ruby

Several of our engineers are committers

Ruby on Rails

Various contributions, including MySQL support

Apache Spark

Various contributions, including MLlib and Spark SQL

Trino

Many bug fixes and organization of conferences and events for Presto and Trino.

Keras

Various contributions, including a pretrained frontier deep learning model, core tensor operations, weight conversion, and benchmarking scripts.

Apache Hive

Various contributions, including Apache Iceberg integration, logical and physical optimizers (such as CTE optimization and SharedWorkOptimizer), and decision-making on language specifications.

RecTools

Various contributions, including new models such as ImplicitBPRWrapper, support for hardware acceleration (GPUs), partial fitting on datasets for LightFM, minor bug fixes, and Python version support.
