# chinchilla
chinchilla is a research toolkit designed to estimate scaling laws and train compute-optimal models for various deep learning tasks.



**Expected Use Cases:**

- Researching the neural scaling law itself
- Scaling compute for:
  - Large Language Models (LLMs)
  - Vision Transformers (ViTs)
  - Reinforcement Learning (RL)
  - Embedding models
  - Knowledge distillation
- Evaluating the compute efficiency of new algorithms & architectures
**Probably Not For:**

- Fine-tuning tasks
- Data-scarce domains

> [!IMPORTANT]
> This work builds upon the scaling law formulation proposed in the original Chinchilla paper by DeepMind (2022),
> with some modifications detailed in ./docs/changes.md.

## Features

- **Scaling Law Estimation**: Fit a loss predictor based on multiple training runs.
- **Compute-Optimal Allocation**: Train the best possible model within a given compute budget.
- **Progressive Scaling**: Iteratively update the scaling law estimate and scale up the compute.
- **Simulation Mode**: Test scaling law estimations in hypothetical scenarios.

## Basics

### Definitions

- **N**: the number of parameters
- **D**: the number of data samples
- **C**: total compute in FLOPs (C ≈ 6ND)
- **L(N, D) = E + A / N^α + B / D^β**: a loss predictor parameterized by E, A, B, α, β
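
These definitions translate directly into code. The following is an illustrative transcription only, not part of chinchilla's API:

```python
# Illustrative transcription of the definitions above; not chinchilla's API.
def flops(N: float, D: float) -> float:
    """Total training compute, approximated as C ≈ 6 * N * D."""
    return 6 * N * D

def loss_predictor(N: float, D: float,
                   E: float, A: float, B: float,
                   alpha: float, beta: float) -> float:
    """Parametric loss predictor: L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / N**alpha + B / D**beta
```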

### Compute-Optimal Allocation

1. Optimize the parameters E, A, B, α, β to better predict the losses L_i from (N_i, D_i)
2. Solve argmin_{N, D} L(N, D | C), which can be derived from A, B, α, β (see the sketch below)
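
Step 2 has a closed-form solution under the constraint C ≈ 6ND, following the derivation in the original Chinchilla paper (Hoffmann et al., 2022). A standalone sketch, independent of chinchilla's API:

```python
# Closed-form compute-optimal allocation under C = 6 * N * D, following the
# Chinchilla paper (Hoffmann et al., 2022). Standalone sketch, not chinchilla's API.
def optimal_allocation(C, A, B, alpha, beta):
    G = (alpha * A / (beta * B)) ** (1 / (alpha + beta))
    N_opt = G * (C / 6) ** (beta / (alpha + beta))  # optimal parameter count
    D_opt = (C / 6) / N_opt                         # optimal number of data samples
    return N_opt, D_opt
```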

### chinchilla Procedure

1. **seed**: Sample X training runs (N_i, D_i, L_i), referred to as *seeds*
2. For i = 0 to K:
   1. **fit**: Optimize the scaling law parameters so that L(N, D) fits the training runs (a minimal sketch follows this list)
   2. **scale**: Configure a new model with a scaled compute budget
   3. Evaluate the allocation by training the model
   4. **append**: Add the result to the database of training runs
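
For intuition, here is a minimal sketch of the **fit** step using SciPy. chinchilla's actual optimizer may differ (the `param_grid` in the Usage section below suggests a grid-based initialization), and so may the exact objective; this only illustrates fitting E, A, B, α, β to observed runs:

```python
# Minimal sketch of the "fit" step: least squares in log space over observed
# runs (N_i, D_i, L_i). chinchilla's actual objective and optimizer may differ.
import numpy as np
from scipy.optimize import minimize

def fit_scaling_law(N, D, L, x0=(1.5, 400.0, 400.0, 0.3, 0.3)):
    N, D, L = map(np.asarray, (N, D, L))

    def objective(params):
        E, A, B, alpha, beta = params
        pred = E + A / N**alpha + B / D**beta
        return float(np.mean((np.log(pred) - np.log(L)) ** 2))

    res = minimize(objective, x0=np.asarray(x0, dtype=float),
                   method="L-BFGS-B", bounds=[(1e-9, None)] * 5)
    return dict(zip(("E", "A", "B", "alpha", "beta"), res.x))
```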



## Installation

> [!WARNING]
> chinchilla requires Python >= 3.8

### From Source (Recommended for Customization)

```bash
git clone https://github.com/kyo-takano/chinchilla.git
cd chinchilla
pip install -e .
```

### From PyPI

```bash
pip install -U chinchilla
```

## Usage

Below is an example to get started with chinchilla.

```python
import numpy as np
from chinchilla import Chinchilla

cc = Chinchilla(
    "your_project_dir",
    param_grid=dict(
        E=np.linspace(1.1, 1.5, 5),
        A=np.linspace(200, 1000, 5),
        B=np.linspace(200, 1000, 5),
        alpha=np.linspace(0.1, 0.5, 5),
        beta=np.linspace(0.1, 0.5, 5),
    ),
    seed_ranges=dict(C=(1e15, 1e16), N_to_D=(10, 100)),
    # To search for the model configuration with N closest to the suggestion:
    model_search_config=dict(
        hyperparam_grid=dict(
            hidden_size=list(range(64, 16384 + 1, 64)),
            num_hidden_layers=list(range(1, 50 + 1)),
            num_heads=list(range(1, 40 + 1)),
        ),
        size_estimator=estimate_model_size,  # You must define a function that estimates & returns the model size
    ),
    # Parameters you may pre-set
    num_seeding_steps=100,
    scaling_factor=2.0,
)

# Run the scaling law estimation and training process
for i in range(100 + 5):
    # Sample a new model
    (N, D), model_config = cc.step(num_seeding_steps=100)

    # Define a model
    model = YourModelClass(**model_config)

    # Train & evaluate the allocation C => (N, D)
    loss = train_and_evaluate(model, D)

    # Finally, append the training run to the database
    cc.append(N=N, D=D, loss=loss)
```

Ensure you define functionally equivalent versions of:

- `estimate_model_size`: Estimates and returns the model size (a rough sketch follows this list).
- `YourModelClass`: Your model class definition.
- `train_and_evaluate`: Function to train and evaluate your model.
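
As a reference point, here is a rough `estimate_model_size` for a GPT-style decoder. It assumes the hyperparameter names from `hyperparam_grid` above plus a hypothetical `vocab_size`; substitute an exact parameter count for your architecture:

```python
# Rough parameter-count estimate for a GPT-style decoder. An assumption-laden
# sketch, not a drop-in for every architecture; `vocab_size` is hypothetical.
def estimate_model_size(hidden_size: int, num_hidden_layers: int,
                        num_heads: int, vocab_size: int = 32000) -> int:
    # Per layer: ~4 * h^2 (attention) + ~8 * h^2 (MLP); biases/norms omitted.
    # num_heads changes the attention shape, not this approximate count.
    per_layer = 12 * hidden_size**2
    embeddings = vocab_size * hidden_size
    return num_hidden_layers * per_layer + embeddings
```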

## Simulation

You can also visualize how chinchilla would perform under the given setup and a hypothetical scaling law, optionally with a noise term:

```python
import random

cc.simulate(
    num_seeding_steps=401,
    num_scaling_steps=1,
    scaling_factor=10.0,
    target_params=dict(
        E=1.69337368,
        A=406.401018,
        B=410.722827,
        alpha=0.33917084,
        beta=0.2849083,
    ),
    # Add exponentially distributed noise to the loss, averaging 0.1
    noise_generator=(random.expovariate, (10,)),
)
```
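
Here, `noise_generator` appears to take a sampler and its arguments as a `(callable, args)` tuple; `random.expovariate(10)` draws noise with mean 1/10 = 0.1, matching the comment above.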

Please see the API Reference for more.

## Examples

Find a practical application of chinchilla in the examples directory (more to come):

- Training Compute-Optimal Rubik's Cube Solvers (100 PetaFLOPs)

## Documentation

For a detailed API Reference, tips, differences from the original Chinchilla paper, and more, please browse ./docs.

## Contributing

We welcome your contributions.
Please report bugs and suggest improvements through issues and pull requests.
