Predicting Interest Rates: BERT Sequence Classification of Federal Reserve Corpora

ABSTRACT: This paper investigates the extraction of monetary policy sentiment from Federal Reserve corpora in order to predict the direction of the Federal Funds Rate. Specifically, it analyzes minutes, statements, speeches, and testimonies delivered by Federal Reserve Board chairpersons since 1980, sourced directly from the Federal Reserve's public archives. The raw text is preprocessed by segmenting documents into 200-word chunks grouped by speaker. Each segment is then annotated for sentiment using the Loughran-McDonald Dictionary of Financial Terms, a domain-specific lexicon designed for financial and economic texts that outperforms general-purpose sentiment tools on this type of corpus. A base BERT (Bidirectional Encoder Representations from Transformers) model is fine-tuned on the annotated dataset using a sequence classification head, and training and validation losses are recorded across multiple cross-validation folds to assess how well the model generalizes. The results underscore both the promise and the practical limitations of applying large language models to this domain. The bidirectional contextual representations learned by BERT capture nuances in Fed communication that elude simpler bag-of-words or term-frequency approaches. However, the labeled corpus remains small even though it spans four decades of Fed communications, and together with limited compute this restricts how far fine-tuning can be taken. The paper concludes that additional data sourcing, more thorough preprocessing, and the adoption of domain-specific pre-trained models (such as FinBERT) represent the most productive paths toward a deployable interest rate direction classifier suitable for integration with systematic trading strategies.

KEYWORDS: Federal Reserve, Federal Funds Rate, Interest Rate, Prediction, Sequence Classification, Bidirectional Encoders, Transformers

I. INTRODUCTION
The Federal Open Market Committee (FOMC) meets eight times per year to review economic and financial conditions, assess the risks to its long-run goals of price stability and maximum sustainable employment, and communicate its monetary policy decisions to the public. Its primary instrument — the Federal Funds Rate — is one of the most consequential variables in global finance, influencing everything from domestic mortgage rates and corporate borrowing costs to sovereign debt yields and currency exchange rates in emerging markets. Even subtle shifts in the language of Federal Reserve communications — whether in official post-meeting statements, the detailed minutes of deliberations, prepared speeches by board members, or formal testimonies before Congress — can move financial markets significantly, as participants attempt to infer the trajectory of future rate decisions from tone, emphasis, and thematic content.

The challenge of extracting a reliable directional signal from this volume of text is considerable. Federal Reserve communications are carefully drafted to balance transparency with ambiguity, and their interpretation requires not only a grasp of financial terminology but also a sensitivity to the shift in rhetorical register that accompanies changing macroeconomic conditions — the difference, for instance, between language characteristic of a tightening cycle and that typical of accommodative policy stances. Traditional keyword-based approaches and simple bag-of-words sentiment models lack the contextual understanding necessary to capture these distinctions reliably.

The Federal Funds Rate can be modeled as a latent variable within a natural language processing framework. Its direction at any future FOMC meeting is not explicitly stated in the preceding communications, but it can often be inferred from the cumulative sentiment, thematic emphasis, and rhetorical framing of the textual record. The central objective of this research is to extract this latent signal and use it to build a classifier that predicts whether the rate will increase, decrease, or remain unchanged.
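To make the prediction target concrete, the following minimal sketch attaches to a document the direction of the first FOMC decision after that document's date. This labeling rule is an assumption made for illustration. The names `direction`, `next_decision_label` and `LABELS` are hypothetical, and the list of decisions would have to be built from the Fed's published target-rate history.

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import date

# Hypothetical integer encoding of the three classes.
LABELS = {"decrease": 0, "no change": 1, "increase": 2}


def direction(change_bps: float) -> str:
    """Map a change in the target rate, in basis points, to a direction."""
    if change_bps > 0:
        return "increase"
    if change_bps < 0:
        return "decrease"
    return "no change"


def next_decision_label(doc_date: date, decisions: list[tuple[date, float]]) -> int | None:
    """Label a document with the direction of the first FOMC decision after it.

    decisions: (meeting_date, change_bps) pairs sorted by meeting date.
    A document dated on a meeting day is matched to the following meeting,
    since that day's decision may already be reflected in the text.
    Returns None when no later decision exists.
    """
    meeting_dates = [d for d, _ in decisions]
    i = bisect_right(meeting_dates, doc_date)  # first meeting strictly after doc_date
    if i == len(decisions):
        return None
    return LABELS[direction(decisions[i][1])]
```

Using `bisect_right` keeps each lookup logarithmic in the number of meetings. Its strict "after" semantics prevent a same-day statement from being labeled with the decision it announces.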

A. TRANSFORMERS
Transformers are a deep learning architecture that moves beyond recurrent neural networks (RNNs). Their goal is to process ever larger datasets in less time while matching or exceeding the accuracy of RNNs [1]. Gated RNNs were the most sophisticated sequence models before the introduction of transformers. They require that text tokens be processed one after another, which greatly limits the ability to parallelize the task. A transformer instead relies on self-attention, in which every token attends to every other token in the sequence at once. Because no step has to wait for the previous token, the computation can be parallelized. In the encoder, each token's representation also draws on context from both its left and its right [1]. The original transformer pairs such an encoder with a decoder.

The core operation of the transformer, scaled dot-product attention, is expressed as a single set of matrix computations:

\begin{equation} \boxed{\mathrm{Attn}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V} \end{equation}
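where \(Q\), \(K\), and \(V\) are the query, key, and value matrices. The \(i^{th}\) row of each matrix is a learned linear projection of the embedding of the \(i^{th}\) token fed into the model. \(d_{k}\) is the dimension of the key vectors, and the softmax is applied row by row.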
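The equation can be checked with a few lines of NumPy. The following sketch computes a single attention head with no masking. The projection matrices are random stand-ins for learned weights, and the 4-token, 8-dimensional sizes are arbitrary assumptions.

```python
import numpy as np


def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_queries, n_keys) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys, row by row
    return weights @ V                              # (n_queries, d_v) weighted values


# Toy example: 4 tokens with 8-dimensional embeddings and random projections.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))  # one embedding row per token
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)  # (4, 8)
```

Each output row is a convex combination of the value rows. Dividing by \(\sqrt{d_{k}}\) keeps the dot products from growing with dimension and saturating the softmax.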

In the case of BERT in particular, the key innovation is bidirectional training. BERT keeps only the encoder stack of the transformer and is pre-trained with a masked language modeling objective, so each token's representation is conditioned on both its left and right context. As in the Next Sentence Prediction pre-training task, the classification task in this research is performed by adding a classification layer on top of the transformer output for the [CLS] token [4].
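The following sketch shows what the classification layer on the [CLS] output computes, in simplified NumPy form. It follows the BERT-style pooler (a dense layer with tanh on the first token), followed by a linear layer and a softmax over three classes. Hugging Face's `BertForSequenceClassification` also applies dropout, which is omitted here. The weights are random, the hidden size of 16 is a toy value, and the label order is hypothetical.

```python
import numpy as np

LABELS = ["decrease", "no change", "increase"]  # hypothetical label order


def classify_cls(hidden_states, W_pool, b_pool, W_cls, b_cls):
    """Map the final-layer [CLS] vector to class probabilities.

    hidden_states: (seq_len, hidden) final encoder outputs; row 0 is [CLS].
    """
    h_cls = hidden_states[0]                   # [CLS] is always the first token
    pooled = np.tanh(W_pool @ h_cls + b_pool)  # BERT-style pooler: dense + tanh
    logits = W_cls @ pooled + b_cls            # linear classification layer
    exp = np.exp(logits - logits.max())        # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(1)
hidden = 16                          # toy size; BERT-base uses 768
H = rng.normal(size=(10, hidden))    # stand-in for the encoder output
probs = classify_cls(
    H,
    rng.normal(size=(hidden, hidden)), np.zeros(hidden),
    rng.normal(size=(len(LABELS), hidden)), np.zeros(len(LABELS)),
)
print(LABELS[int(np.argmax(probs))], probs.round(3))
```

During fine-tuning, these head weights and the encoder weights are updated together with a cross-entropy loss on the labeled segments.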

II. DATA
The documents were split into 200-word segments to ease processing, and the segments were grouped by speaker. Only the chairpersons of the Federal Reserve were retained as speakers; content from all other speakers was dropped. Each segment was then annotated with sentiment using the Loughran-McDonald Dictionary of Financial Terms to identify its general stance towards interest rates (increase, decrease, or no change) [3].
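The following plain-Python sketch covers both preprocessing steps under stated assumptions. The word sets are a handful of illustrative terms in the spirit of the Loughran-McDonald Positive and Negative categories, not the actual lists, which must be loaded from the published dictionary file. The rule mapping net tone to a rate stance, and its threshold, are hypothetical, since the exact mapping is not specified here. The names `chunk_by_speaker` and `stance` are also hypothetical.

```python
from collections import defaultdict

# Illustrative words only; the real Loughran-McDonald lists are far longer.
POSITIVE = {"strong", "strengthen", "improve", "improved", "gain", "gains"}
NEGATIVE = {"weak", "weaken", "decline", "declined", "loss", "losses", "adverse"}


def chunk_by_speaker(documents, keep, size=200):
    """Split each kept speaker's documents into consecutive `size`-word segments.

    documents: iterable of (speaker, text) pairs; speakers not in `keep` are dropped.
    Returns {speaker: [segment, ...]}; a document's last segment may be shorter.
    """
    segments = defaultdict(list)
    for speaker, text in documents:
        if speaker not in keep:
            continue
        words = text.split()
        for start in range(0, len(words), size):
            segments[speaker].append(" ".join(words[start:start + size]))
    return dict(segments)


def stance(segment, threshold=0.1):
    """Hypothetical rule mapping net dictionary tone to a rate stance label."""
    tokens = [w.strip(".,;:()\"'").lower() for w in segment.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:
        return "no change"
    tone = (pos - neg) / (pos + neg)  # net tone in [-1, 1]
    if tone > threshold:
        return "increase"  # assumption: upbeat economic language reads as hawkish
    if tone < -threshold:
        return "decrease"  # assumption: downbeat economic language reads as dovish
    return "no change"
```

A 200-word segment typically stays well under BERT's 512-token input limit after WordPiece tokenization. This is one practical reason a chunk size of this order is convenient.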

III. RESULTS & DISCUSSION
BERT was deployed on the preprocessed and annotated Federal Reserve corpus to evaluate sequence classification performance. Fig. 1 shows the training and validation loss curves across three cross-validation folds. The model exhibits the expected pattern for a small-data fine-tuning regime: training loss decreases steadily across epochs, while the reduction in validation loss is more modest and inconsistent. This suggests that the model fits the training segments more readily than it generalizes to held-out ones, and that the available data limits how far generalization can improve.
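The following plain-Python sketch shows the fold bookkeeping. It assumes segments are shuffled at random before being split into k = 3 folds, which the paper does not specify. The helpers `k_fold_indices` and `summarize` are hypothetical names.

```python
import random
from statistics import mean


def k_fold_indices(n, k=3, seed=0):
    """Yield (train_idx, val_idx) pairs that partition range(n) into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size


def summarize(val_losses_per_fold):
    """Final-epoch validation loss per fold, plus the mean across folds."""
    finals = [losses[-1] for losses in val_losses_per_fold]
    return finals, mean(finals)
```

Adjacent segments of the same speech share vocabulary, so a random split can place near-duplicate text in both the training and validation folds and flatter the validation loss. Grouping folds by source document or by time period would be a stricter alternative.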

The computational demands of the BERT fine-tuning process proved to be a practical constraint. Past the third fold, the model's memory footprint exceeded the available GPU capacity within the Google Colaboratory environment, causing repeated runtime crashes that prevented full cross-validation. This is a common limitation when fine-tuning large transformer models on memory-limited GPUs such as those offered in hosted notebook environments. It underscores the importance of either working with a lighter-weight model architecture (such as DistilBERT or a smaller BERT variant), using Google's TPU compute allocation, or migrating to cloud-based training infrastructure with access to larger memory allocations.
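A back-of-the-envelope estimate helps locate where the memory goes. The following sketch counts only the static fp32 training state (weights, gradients, and Adam's two moment buffers, roughly 16 bytes per parameter). It uses the rounded parameter counts of about 110M for BERT-base and 66M for DistilBERT. Activations, which scale with batch size and sequence length, are deliberately excluded.

```python
BYTES_FP32 = 4


def static_training_memory_gib(n_params, bytes_per_value=BYTES_FP32):
    """Weights + gradients + Adam's two moment buffers, all in fp32.

    Excludes activations, which grow with batch size and sequence length.
    """
    values_per_param = 4  # weight, gradient, Adam first moment, Adam second moment
    return n_params * values_per_param * bytes_per_value / 2**30


# Approximate parameter counts (rounded).
for name, n in [("BERT-base", 110e6), ("DistilBERT", 66e6)]:
    print(f"{name}: ~{static_training_memory_gib(n):.2f} GiB before activations")
```

The static state of BERT-base comes to under 2 GiB. By this estimate, most of the footprint would come from activations at the chosen batch size, and possibly from model and optimizer objects of earlier folds that were not released. In PyTorch, deleting those objects and calling `torch.cuda.empty_cache()` between folds is a common mitigation, although the runs reported here did not confirm this as the cause.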

The primary takeaway from these results is structural rather than numerical: the classification accuracy achievable with the current data volume is fundamentally constrained by the size of the labeled corpus, not by the model architecture. Federal Reserve communications, while spanning four decades, amount to a relatively modest volume of text once chunked into 200-word segments and filtered to chairperson-only content. This makes a strong case for pursuing domain-adapted pre-trained models as the most computationally efficient path toward improved accuracy, since they arrive with substantial prior knowledge about financial language and require far less fine-tuning data to achieve competitive performance on downstream classification tasks.

IV. NEXT STEPS
The clearest finding of the analysis is that the quality and quantity of labeled training data are the primary constraint on model performance. This is a familiar challenge in applied NLP, and it is particularly acute in this domain, where labeling requires not just raw text but directional rate outcomes associated with each document segment. Several concrete paths forward present themselves.

Data augmentation and expanded sourcing. The current corpus is limited to speeches and statements by FOMC chairpersons, which, while authoritative, represents only a fraction of the textual signal that market participants interpret. Expanding the corpus to include regional Federal Reserve bank president speeches, FOMC voting member dissent records, and the full text of Beige Book regional economic reports would substantially enrich the training data. Additionally, experimenting with shorter and longer chunking intervals — beyond the current 200-word segments — could improve the signal-to-noise ratio depending on how the rate direction label is associated with document sections.

Pre-trained domain-adapted models. Fine-tuning a general-purpose BERT model on a limited corpus is inherently limited by the model’s lack of prior exposure to financial and macroeconomic language. A more promising approach is to initialize from a domain-adapted pre-trained model — such as FinBERT, which has been pre-trained on large financial corpora — and fine-tune it on the Fed-specific dataset. This transfers domain knowledge while requiring far less task-specific data to achieve competitive performance.
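With the Hugging Face `transformers` library, switching from a general-purpose BERT to a domain-adapted initialization amounts to changing the checkpoint name, as this sketch shows. `ProsusAI/finbert` is one publicly available FinBERT variant, used here only as an example. The label order and the helper `load_rate_classifier` are hypothetical.

```python
# Hugging Face Hub checkpoint IDs; the FinBERT one is an example choice.
GENERAL_CHECKPOINT = "bert-base-uncased"
FINBERT_CHECKPOINT = "ProsusAI/finbert"
LABELS = ["decrease", "no change", "increase"]  # hypothetical label order


def load_rate_classifier(checkpoint=FINBERT_CHECKPOINT):
    """Load a tokenizer and a 3-way sequence classifier from a checkpoint."""
    # Imported lazily so this module loads even where transformers is absent.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(LABELS),
        id2label=dict(enumerate(LABELS)),
        label2id={label: i for i, label in enumerate(LABELS)},
        ignore_mismatched_sizes=True,  # re-initialize the head if its shape differs
    )
    return tokenizer, model
```

Once loaded, segments could be encoded with `tokenizer(segments, truncation=True, padding=True, return_tensors="pt")` and passed to the same fine-tuning loop used for the base model.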

Alternative architectures and ensemble methods. Before committing to BERT as the sole modeling approach, it is worth benchmarking a range of alternative architectures — including LSTM-based sequence classifiers, gradient-boosted models on hand-crafted TF-IDF features, and topic model-derived features from Latent Dirichlet Allocation — to establish a clear performance baseline and identify where deep contextual representations genuinely add value over simpler approaches.
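The following minimal plain-Python featurizer sketches the hand-crafted TF-IDF baseline. It uses length-normalized term frequency and the smoothed inverse document frequency \(\ln\frac{1+N}{1+df}+1\). scikit-learn's `TfidfVectorizer` uses the same smoothed idf by default but raw counts and L2 normalization, so the weights are close but not identical. The resulting vectors could then be fed to a gradient-boosted classifier. The function name `tfidf` is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = ln((1 + N) / (1 + df)) + 1   (smoothed; always at least 1)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors
```

A term that appears in every segment, such as "rates", receives the minimum idf of 1. Distinctive terms are up-weighted, which is the property a keyword-level baseline relies on.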

Trading strategy integration. Ultimately, the most compelling validation of any Fed sentiment classifier is its performance as a signal in an executable trading strategy. One natural application is to use the classified rate direction as a conditioning signal for a mean-reversion or moving-average strategy on interest-rate sensitive instruments — Treasury futures, interest rate swaps, or rate-sensitive equity sectors — backtesting the strategy across historical FOMC cycles to assess whether the NLP signal adds alpha beyond what is available from market-implied rate expectations alone.
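The following sketch shows one way a rate-direction signal could condition a moving-average strategy on a bond-price series. The gating rule is a hypothetical design choice: an expected increase is treated as bearish for bond prices and allows only shorts, an expected decrease allows only longs, and "no change" allows both. The window lengths are arbitrary. Transaction costs, roll, and sizing are ignored, so this is a backtesting skeleton rather than a validated strategy.

```python
def sma(prices, window):
    """Simple moving average; None until `window` observations are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def gated_ma_returns(prices, rate_signal, fast=5, slow=20):
    """Daily returns of an MA crossover gated by a rate-direction signal.

    rate_signal[t] is the classifier's call in force on day t:
    +1 = increase expected, -1 = decrease expected, 0 = no change.
    """
    f, s = sma(prices, fast), sma(prices, slow)
    returns = []
    for t in range(1, len(prices)):
        # Position uses only day t-1 information and is held over day t.
        if f[t - 1] is None or s[t - 1] is None:
            pos = 0
        else:
            pos = 1 if f[t - 1] > s[t - 1] else -1
            sig = rate_signal[t - 1]
            if (sig > 0 and pos > 0) or (sig < 0 and pos < 0):
                pos = 0  # crossover disagrees with the rate view: stay flat
        returns.append(pos * (prices[t] / prices[t - 1] - 1))
    return returns
```

Comparing these returns with those of the ungated crossover (signal fixed at 0) over historical FOMC cycles is one simple way to test whether the NLP signal adds value.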

ACKNOWLEDGEMENTS
I would like to thank Professor Feinstein for his invaluable guidance through his teachings in FE 690: Machine Learning in Finance, and for his continued support in identifying the pain points and potential developments in the project proposal; the Stevens Institute of Technology Department of Financial Engineering for its provision of laptops and technological equipment to undertake this project; Google Colaboratory for providing a streamlined platform for testing and executing the models; and lastly, my colleagues at the Nomura Research Institute for their helpful tips and guidance on how to preprocess the data soundly and execute the model used in this research.

REFERENCES

  • [1] Devlin, Jacob, and Ming-Wei Chang. Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing. Accessed October 25, 2020. Online.
  • [2] Hugging Face. Transformers Documentation. Accessed October 20, 2020. Online.
  • [3] Takahashi, Yuki. Analyze Central Bank Announcements. Nomura Research Institute. Accessed October 20, 2020. Online.
  • [4] Horev, Rani. BERT Explained: State-of-the-art Language Model for NLP. Accessed October 28, 2020. Online.


DOCUMENTS

Download paper: Federal Funds Rate Prediction: BERT Sequence Classification on Fed Corpora


The GitHub repository can be found here.
Please forward all requests for copies of this research to my university email address, listed on my homepage.