Open-Source Contributions
Open source is in our DNA. Check out a list of open-source projects invented by Treasure Data engineers and projects we contribute to.
Open-source pioneers
We know first-hand that open source makes software more accessible, developers more connected, and the world a little smaller. We believe in the open-source community, invest in moving projects forward, and welcome your contributions to our projects.
Over the last 13 years, we’ve invented many key software components for modern data stacks. Treasure Data engineers pioneered and subsequently open-sourced these innovations, while including them in our customer data platform (CDP).
Highlights include developing an open data protocol adopted by Apple and thousands of other companies, creating one of the world’s largest Hadoop user communities, and bringing Linux to the Fortune 100.

Open-source projects: Invented by Treasure Data engineers and hosted in our CDP
Fluentd
With more than 600 plugins and native support for Docker and Kubernetes, Fluentd is one of the most popular tools for log management.
Embulk
An open-source data loader to move massive data across storage systems, software services, file formats, and data centers.
Apache Hivemall
Democratizes machine learning by bringing cutting-edge ML algorithms to the fingertips of SQL analysts.
MessagePack
An efficient binary serialization format. Like JSON, it lets you exchange data among multiple languages, but it’s faster and smaller.
Digdag
A simple tool that helps you to build, run, schedule, and monitor complex pipelines of tasks. It handles dependency resolution so that tasks run in series or in parallel.
Fluent Bit
An open-source data collector that collects data from different sources, unifies it and sends it to multiple destinations.
wvlet
Wvlet is a cross-SQL flow-style query language for functional data modeling and interactive data analysis.
snappy-java
A Java port of Snappy, a fast C++ compressor/decompressor developed by Google.
Projects We Contribute To
Docker
Contributed the integration of the logging driver for Fluentd
Kubernetes
Contributed the integration of logging capabilities with Fluentd
Ruby
Several of our engineers are committers
Ruby on Rails
Various contributions, including MySQL support
Apache Spark
Various contributions, including MLlib and Spark SQL
Trino
Many bug fixes and organization of conferences and events for Presto and Trino.
Keras
Various contributions, including a pretrained frontier deep learning model, core tensor operations, and weight conversion and benchmarking scripts.
Apache Hive
Various contributions, including Apache Iceberg integration, logical and physical optimizers (such as CTE optimization and SharedWorkOptimizer), and decision-making regarding language specifications.
RecTools
Various contributions, including new models such as ImplicitBPRWrapper, support for hardware acceleration (GPUs), partial fitting on datasets for LightFM, minor bug fixes, and Python version support.