
Cerbo AI

Your AI Your Control








Proof of Federated Learning (PoFL):

Participation is All You Need



Cerbo AI Aug 29, 2024

Abstract

This paper introduces Proof of Federated Learning (PoFL), a blockchain protocol for measuring and rewarding participation in federated learning. Unlike traditional mechanisms that rely heavily on computational power, PoFL evaluates data quality, model contributions, and validation accuracy. The protocol combines these evaluations of participant data and activity into a dynamic scoring system, verified on-chain, that can serve both as a Sybil-resistance mechanism and as an incentive-distribution mechanism, fostering a fair economy around federated learning.

1. Introduction

PoFL is a new protocol for measuring the quality and impact of contributions to federated learning. Its purpose is to motivate individuals and organizations to participate actively in federated learning. This engagement (1) improves model performance through diverse data contributions and (2) preserves data privacy and security, because participants retain full ownership of their data.

2. Federated Learning Network

The motivation behind PoFL is to provide a new model of data ownership, together with sustainable incentives for engagement, in federated learning. Traditional centralized learning often compromises data privacy and security. PoFL aims to decentralize the learning process while providing robust incentives for high-quality contributions.

3. Proof of Federated Learning (PoFL)

We start by describing the concept of PoFL via its core components. See the below diagram for a top-level view of the PoFL ecosystem.

[Figure: top-level view of the PoFL ecosystem]

Participation Metrics: A set of metrics that quantify how participants interact with the federated learning platform so that those interactions can be rewarded. They comprise the Data Quality Score, Model Contribution Score, and Validation Accuracy Score.

Data Evaluation: Processes engagement data, verifies its authenticity, and calculates engagement scores.

PoFL Score: Combines the Data Score, Model Contribution Score, and Validation Accuracy Score. The amount of newly minted rewards allocated to a participant is proportional to that participant's PoFL score.


3.1 Participation Metrics

3.1.1 Data Quality Score

This metric evaluates the quality of data provided by participants. High-quality, diverse data that significantly contributes to model improvement receives a higher score.


3.1.2 Model Contribution Score

This metric assesses the impact of each participant's model updates on the overall global model performance. Significant improvements result in higher scores.


3.1.3 Validation Accuracy Score

This metric evaluates the accuracy and reliability of the model validation performed by the participant. Accurate and consistent validations result in higher scores.


3.2 Participation Evaluation

This section explains how participation data is processed, how its authenticity is verified, and how participation scores are calculated.


3.2.1 Data Score

The data score is determined by the quality and diversity of data provided by participants. For example, data from different demographic groups that significantly improves model generalization will receive a higher score.

$$\text{PoFL-Score}_{\text{data}} = \sum \left( \alpha \times \text{Quality}_{\text{data}} - \beta \times \text{Error}_{\text{data}} \right)$$
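The data-score formula can be sketched in Python as below. This is a minimal illustration under stated assumptions, not PoFL's implementation: the sum is taken over a participant's individual data contributions, each already assigned a quality value and an error value, and the `DataContribution` type and the default coefficients `alpha = beta = 1.0` are hypothetical, because the text specifies neither how quality and error are measured nor what values α and β take.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DataContribution:
    """One data contribution (hypothetical structure, not defined by PoFL)."""
    quality: float  # Quality_data: higher means more useful data
    error: float    # Error_data: measured faults in the data, penalized


def data_score(contributions: Iterable[DataContribution],
               alpha: float = 1.0, beta: float = 1.0) -> float:
    """PoFL-Score_data = sum(alpha * Quality_data - beta * Error_data).

    alpha and beta are placeholder coefficients; PoFL does not fix their values.
    """
    return sum((alpha * c.quality - beta * c.error for c in contributions), 0.0)
```

For example, with `beta=2.0` a contribution with quality 0.9 and error 0.1 adds about 0.7 to the score, while one with quality 0.6 and error 0.3 adds nothing, so a larger β makes error-prone data count more heavily against a participant.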



3.2.2 Model Contribution Score

The model contribution score evaluates the impact of each participant's model updates. This score is based on the improvement or degradation of the global model's performance due to the participant's contribution.


$$\text{PoFL-Score}_{\text{model}} = \sum \left( \gamma \times \text{Improvement}_{\text{global}} - \delta \times \text{Degradation}_{\text{global}} \right)$$
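A minimal Python sketch of the model contribution score follows. It assumes, as an illustration only, that each of a participant's updates is judged by a higher-is-better global metric (such as accuracy) measured before and after the update is applied; a gain counts as Improvement_global and a loss as Degradation_global. The `rounds` input format and the default `gamma = delta = 1.0` are hypothetical, since the text does not say how improvement is measured or what γ and δ are.

```python
from typing import Iterable, Tuple


def model_score(rounds: Iterable[Tuple[float, float]],
                gamma: float = 1.0, delta: float = 1.0) -> float:
    """PoFL-Score_model = sum(gamma * Improvement_global - delta * Degradation_global).

    Each round is a (metric_before, metric_after) pair for the global model,
    where the metric is assumed to be higher-is-better (e.g. accuracy).
    """
    total = 0.0
    for before, after in rounds:
        change = after - before
        improvement = max(change, 0.0)   # Improvement_global: gain from the update
        degradation = max(-change, 0.0)  # Degradation_global: loss from the update
        total += gamma * improvement - delta * degradation
    return total
```

Setting δ larger than γ, as in a call with `delta=2.0`, makes a harmful update cost more than an equally sized helpful update earns, which is one way to discourage noisy or adversarial updates.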




3.2.3 Validation Accuracy Score

The validation accuracy score assesses the reliability of model validations performed by the participant.

$$\text{PoFL-Score}_{\text{validation}} = \sum \left( \eta \times \text{Correct}_{\text{validation}} - \theta \times \text{Incorrect}_{\text{validation}} \right)$$
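The validation accuracy score can be sketched the same way. The sketch assumes that a validation counts as correct when the participant's verdict matches the outcome eventually finalized for that item; the text does not define "correct", and the parameter names and default `eta = theta = 1.0` are hypothetical.

```python
from typing import Sequence


def validation_score(verdicts: Sequence[bool], outcomes: Sequence[bool],
                     eta: float = 1.0, theta: float = 1.0) -> float:
    """PoFL-Score_validation = sum(eta * Correct_validation - theta * Incorrect_validation).

    verdicts[i] is the participant's judgement of item i (True = accept);
    outcomes[i] is the finalized result, used as ground truth (assumption).
    """
    if len(verdicts) != len(outcomes):
        raise ValueError("each verdict needs a matching finalized outcome")
    correct = sum(1 for v, o in zip(verdicts, outcomes) if v == o)
    incorrect = len(verdicts) - correct
    return eta * correct - theta * incorrect
```

With `theta=3.0`, two correct validations and one incorrect one net a score of -1.0, so a validator who guesses carelessly can lose more than they gain.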




4. Constructing the PoFL Aggregate Score

The aggregate PoFL score combines the data score, model contribution score, and validation accuracy score, with the data score entering through a stake-weighted sum.

$$\text{PoFL-Score} = \sum \left( \text{Weight}_{\text{stake}} \times \text{PoFL-Score}_{\text{data}} \right) + \text{PoFL-Score}_{\text{model}} + \text{PoFL-Score}_{\text{validation}}$$

where the stake weights sum to 1.
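The aggregate formula can be sketched in Python as follows. The text does not say what the stake-weighted sum ranges over, so this sketch assumes it is a list of (stake weight, data score) pairs, for example one per stake position; the requirement that weights be non-negative is also an added assumption, while the sum-to-1 check comes from the text.

```python
import math
from typing import Sequence, Tuple


def aggregate_score(weighted_data_scores: Sequence[Tuple[float, float]],
                    model_component: float,
                    validation_component: float) -> float:
    """PoFL-Score = sum(Weight_stake * PoFL-Score_data) + PoFL-Score_model + PoFL-Score_validation.

    weighted_data_scores holds (Weight_stake, PoFL-Score_data) pairs; the stake
    weights must sum to 1 (from the text) and be non-negative (assumption).
    """
    weights = [w for w, _ in weighted_data_scores]
    if any(w < 0 for w in weights):
        raise ValueError("stake weights must be non-negative")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("stake weights must sum to 1")
    stake_weighted_data = sum(w * s for w, s in weighted_data_scores)
    return stake_weighted_data + model_component + validation_component
```

For instance, two equally weighted data scores of 2.0 and 4.0, a model score of 1.0, and a validation score of -0.5 give an aggregate of 3.5.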




4.1 Dynamic Weight Allocation in the PoFL Score

The aggregate PoFL score allocates weights among the data score, model contribution score, and validation accuracy score, with an emphasis on stake-weighted contributions. The stake-weighting approach evolves through two distinct phases:

Bootstrap Phase: Initially employs fixed weights to establish the system and facilitate participant onboarding.

Mature Phase: Implements variable stake weights that adjust based on participant engagement and system growth, reflecting a more dynamic and responsive scoring mechanism as the network matures.
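The two phases can be illustrated with a small weighting function. Both rules are hypothetical stand-ins, since the text specifies neither the fixed bootstrap weights nor how mature-phase weights respond to engagement: the sketch uses equal weights during bootstrap and, in the mature phase, weights proportional to stake multiplied by an engagement factor, normalized so they still sum to 1.

```python
from typing import List, Optional, Sequence


def stake_weights(stakes: Sequence[float], phase: str,
                  engagement: Optional[Sequence[float]] = None) -> List[float]:
    """Return stake weights that sum to 1 for the given phase.

    "bootstrap": fixed equal weights (hypothetical choice of fixed weights).
    "mature": weights proportional to stake * engagement (hypothetical rule).
    """
    n = len(stakes)
    if n == 0:
        raise ValueError("at least one stake is required")
    if phase == "bootstrap":
        return [1.0 / n] * n
    if phase == "mature":
        if engagement is None or len(engagement) != n:
            raise ValueError("mature phase needs one engagement value per stake")
        raw = [s * e for s, e in zip(stakes, engagement)]
        total = sum(raw)
        if total <= 0:
            raise ValueError("stakes and engagement must give a positive total")
        return [r / total for r in raw]
    raise ValueError(f"unknown phase: {phase!r}")
```

Under this rule, a smaller but three-times-more-engaged stake can match a larger, less active one: `stake_weights([10, 30], "mature", [3.0, 1.0])` returns `[0.5, 0.5]`.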


4.2 Access Control and Role-Based Authorization

Access control is essential in the PoFL ecosystem to determine "who is allowed to do what" within the system. Given the decentralized nature of the protocol, ensuring that only authorized entities can perform specific actions is crucial for maintaining the integrity and security of the network.


4.2.1 Ownership and Role-Based Access Control (RBAC)

Simple ownership models, in which a designated owner account holds administrative privileges, work well for basic systems, but PoFL requires a more nuanced approach. Role-Based Access Control (RBAC) provides the flexibility to define multiple roles within the system, each with distinct permissions. This keeps the system from relying on a single entity and reduces the risk of centralization.

Roles in PoFL:

Uploader: Responsible for uploading data to the blockchain system for model training.

Verifier: Validates data and model contributions to ensure they meet quality standards.

Challenger: Challenges Uploaders' contributions when data quality issues are suspected, conducting independent verification.

Finalizer: Makes the final decision in disputes arising from challenges, determining the outcome of such conflicts.

Role Granting and Revocation: Rather than relying on an Owner role to manage these roles, which would introduce centralization and security risks, the PoFL protocol manages roles through a decentralized governance model, implemented via the voting governance system described in the next section.
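The role model above can be sketched as a small registry. This is an illustrative sketch, not PoFL's contract: the action names such as `upload_data` are hypothetical (the text describes roles, not concrete actions), and the `proposal_passed` flag stands in for checking the result of an on-chain governance proposal, which a real deployment would verify itself.

```python
from enum import Enum
from typing import Dict, Set


class Role(Enum):
    UPLOADER = "Uploader"
    VERIFIER = "Verifier"
    CHALLENGER = "Challenger"
    FINALIZER = "Finalizer"


# Hypothetical action names: the paper describes the roles but not concrete actions.
PERMISSIONS: Dict[Role, Set[str]] = {
    Role.UPLOADER: {"upload_data"},
    Role.VERIFIER: {"verify_contribution"},
    Role.CHALLENGER: {"challenge_contribution"},
    Role.FINALIZER: {"finalize_dispute"},
}


class RoleRegistry:
    """Maps accounts to roles; changes require a passed governance proposal."""

    def __init__(self) -> None:
        self._roles: Dict[str, Set[Role]] = {}

    def _require_governance(self, proposal_passed: bool) -> None:
        # No owner account: only a passed governance proposal may change roles.
        if not proposal_passed:
            raise PermissionError("role changes require a passed governance proposal")

    def grant(self, account: str, role: Role, proposal_passed: bool) -> None:
        self._require_governance(proposal_passed)
        self._roles.setdefault(account, set()).add(role)

    def revoke(self, account: str, role: Role, proposal_passed: bool) -> None:
        self._require_governance(proposal_passed)
        self._roles.get(account, set()).discard(role)

    def can(self, account: str, action: str) -> bool:
        # An account may hold several roles; any role granting the action suffices.
        return any(action in PERMISSIONS[r] for r in self._roles.get(account, set()))
```

Keeping permissions in a role table rather than on individual accounts means governance only ever votes on role assignments, while the mapping from roles to actions stays fixed and auditable.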


4.3 Voting Governance in PoFL

The Cerbo Chain Voting Governance system underpins the decentralized management of roles and other protocol parameters. This system ensures that the community, rather than any single entity, controls critical decisions within the PoFL ecosystem.


4.3.1 Proposal Submission and Deposit Phase

Any community member can submit a proposal for role changes or other protocol modifications. To prevent spam, proposals must receive a deposit of at least 512 CBO within two weeks of submission. This deposit acts as a filter to ensure that only serious proposals move forward.
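The deposit rule can be sketched as follows, using the 512 CBO threshold and two-week window stated above. The `Proposal` structure and the status labels are hypothetical; in particular, "expired" is an assumed label, since the text does not say what happens to a proposal that misses the threshold.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Tuple

MIN_DEPOSIT_CBO = 512                # threshold stated in the text
DEPOSIT_PERIOD = timedelta(weeks=2)  # deposit window stated in the text


@dataclass
class Proposal:
    submitted_at: datetime
    # (timestamp, amount in CBO) for each deposit made toward the proposal
    deposits: List[Tuple[datetime, float]] = field(default_factory=list)

    def deposit_deadline(self) -> datetime:
        return self.submitted_at + DEPOSIT_PERIOD

    def counted_deposit(self, now: datetime) -> float:
        # Only deposits made by `now` and within the two-week window count.
        cutoff = min(now, self.deposit_deadline())
        return sum(amount for ts, amount in self.deposits if ts <= cutoff)

    def status(self, now: datetime) -> str:
        if self.counted_deposit(now) >= MIN_DEPOSIT_CBO:
            return "voting"   # threshold reached: proposal enters the voting phase
        if now > self.deposit_deadline():
            return "expired"  # hypothetical label for proposals that never qualify
        return "deposit"
```

Deposits arriving after the window closes are ignored, so a proposal cannot be revived by funding it late.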


4.3.2 Voting Phase

Once a proposal reaches the deposit threshold, it enters a two-week voting phase. During this phase, all token stakers in the network can vote on the proposal. The voting options include:

Yes: Support the proposal.

No: Oppose the proposal.

NoWithVeto: Strongly oppose, with a veto power that could prevent the proposal from being considered in the future.

Abstain: Neutral vote, expressing neither support nor opposition.

Delegated Voting: Validators can vote on behalf of the delegators who have staked tokens with them. However, if a delegator votes personally, their vote overrides that of the validator.
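The delegation-override rule can be sketched as a stake-weighted tally. The sketch only sums stake behind each option; it does not decide whether a proposal passes, because the text gives no quorum, pass threshold, or veto threshold. A validator's own stake can be modeled as a self-delegation, and the data layout is a hypothetical choice.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Tuple


class Vote(Enum):
    YES = "Yes"
    NO = "No"
    NO_WITH_VETO = "NoWithVeto"
    ABSTAIN = "Abstain"


def tally(delegations: Dict[str, Tuple[str, float]],
          validator_votes: Dict[str, Vote],
          delegator_votes: Dict[str, Vote]) -> Counter:
    """Sum staked tokens behind each vote option.

    delegations maps delegator -> (validator, staked amount). A delegator's
    personal vote overrides the validator's; otherwise the validator's vote
    is inherited. Stake whose delegator and validator both cast no vote is
    not counted.
    """
    totals: Counter = Counter()
    for delegator, (validator, amount) in delegations.items():
        vote = delegator_votes.get(delegator, validator_votes.get(validator))
        if vote is not None:
            totals[vote] += amount
    return totals
```

For example, if two delegators stake 100 and 50 tokens with a validator who votes Yes, and the second delegator votes No personally, the tally is 100 for Yes and 50 for No.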


5. Protocol Incentives and Governance Integration


With the addition of access control and governance mechanisms, the PoFL protocol's incentive distribution can also be refined. Role management via decentralized voting ensures that incentives are fairly allocated based on the participants' roles and contributions. The protocol's monetary policy (Section 5.1) and rewards distribution (Section 5.2) are closely tied to these governance processes, ensuring alignment between participant actions and the overall goals of the PoFL ecosystem.


5.1 PoFL Monetary Policy

At each epoch, the protocol issues rewards in the form of tokens. The amount can either be fixed by an issuance schedule or vary sublinearly with the aggregate data and activity scores.
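The two issuance options can be illustrated with one function each. Both are hypothetical: the text names neither a concrete schedule nor a sublinear function, so the sketch uses a geometric decay for the fixed schedule and a power law `scale * score**exponent` with `0 < exponent < 1` (a square root by default) for the score-linked option, with illustrative parameter values.

```python
def fixed_issuance(epoch: int, initial: float = 1000.0, decay: float = 0.99) -> float:
    """Fixed schedule: tokens issued at `epoch` (hypothetical geometric decay)."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return initial * decay ** epoch


def sublinear_issuance(aggregate_score: float, scale: float = 100.0,
                       exponent: float = 0.5) -> float:
    """Score-linked schedule: scale * score**exponent, sublinear for 0 < exponent < 1."""
    if not 0.0 < exponent < 1.0:
        raise ValueError("exponent must lie strictly between 0 and 1")
    if aggregate_score < 0:
        raise ValueError("aggregate score must be non-negative")
    return scale * aggregate_score ** exponent
```

With the default square-root rule, quadrupling the aggregate score from 400 to 1,600 only doubles issuance (from about 2,000 to about 4,000 tokens), so issuance grows more slowly than network activity.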


5.2 PoFL Rewards Distribution

The goal is to direct issuance rewards to the most valuable actions for the network. Rewards are distributed proportionally to the PoFL score, incentivizing high-quality data provision, impactful model contributions, and accurate validations.
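Proportional distribution can be sketched in a few lines. Because the component formulas allow negative scores, the sketch clips negative scores to zero and pays nothing when no score is positive; both are assumptions, as the text does not say how such cases are handled.

```python
from typing import Dict


def distribute_rewards(scores: Dict[str, float], epoch_reward: float) -> Dict[str, float]:
    """Split an epoch's issuance in proportion to each participant's PoFL score.

    Negative scores are clipped to zero (assumption: the paper does not say how
    they are treated). If no participant has a positive score, nothing is paid.
    """
    clipped = {p: max(s, 0.0) for p, s in scores.items()}
    total = sum(clipped.values())
    if total == 0:
        return {p: 0.0 for p in scores}
    return {p: epoch_reward * s / total for p, s in clipped.items()}
```

For example, scores of 3.0, 1.0, and -2.0 split a 100-token epoch reward into 75, 25, and 0 tokens.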


5.3 Protocol Economics

In the initial bootstrap phase, the focus is on increasing data volume and distribution. The protocol earns no revenue; users only pay for the minimal blockchain resources used to calculate and store their PoFL score and distribute associated incentives.

In the mature phase, the protocol can start earning revenue via direct fees or fee burn. Moreover, a wide array of AI services and data markets can be developed once the PoFL economy is at scale.


5.4 Governance and Incentive Alignment

The decentralized governance model not only manages roles but also directly impacts how rewards are distributed. For example, if a proposal to adjust the weighting of the PoFL score components (data quality, model contribution, validation accuracy) is passed, it will immediately influence the incentives participants receive. This dynamic ensures that the protocol remains responsive to the needs and contributions of its participants over time.

Conclusion

Proof of Federated Learning (PoFL) offers a novel approach to incentivizing and rewarding participants in federated learning. By combining blockchain technology, smart contracts, and a dynamic scoring system, PoFL ensures data privacy, security, and fair incentivization, fostering a robust and collaborative environment for federated learning.
