data crunching

Notes Beforehand

I am a traditional SAS programmer and only recently started developing this Python code in an effort to branch into this popular programming language. The programs target some of the most fundamental topics in empirical asset pricing and should serve as a stepping stone for those who are relatively new to programming in empirical finance. Of course, this Python code could probably be made more efficient, and I welcome any suggestions on that front.

If your institution already subscribes to WRDS, you can find most of the code below under the "Research Application" section of WRDS.

Data and Platform Setup

This code is written for the WRDS data platform, where the relevant empirical data (pricing, fundamentals, estimates, etc.) are retrieved directly through the WRDS API. If your institution has the proper WRDS data subscriptions, you can run the code directly in your Python environment.

Python Code

Setting Up a Sandbox

Connecting to WRDS

The WRDS Python API makes it straightforward to connect to WRDS and extract data from Python.
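As a minimal sketch, assuming the `wrds` package from PyPI is installed and your account can read the CRSP monthly stock file, a connection and query might look like the following. The helper names `build_msf_query` and `fetch_msf` are hypothetical, and the table and column names (`crsp.msf`, `permno`, `date`, `ret`, `prc`, `shrout`) are assumptions to adjust to your subscription.

```python
from datetime import date


def build_msf_query(start: date, end: date) -> str:
    """Hypothetical helper: SQL for monthly CRSP stock data between two dates.

    Assumes the CRSP monthly stock file is available as crsp.msf.
    """
    return (
        "SELECT permno, date, ret, prc, shrout "
        "FROM crsp.msf "
        f"WHERE date BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'"
    )


def fetch_msf(username: str, start: date, end: date):
    """Hypothetical helper: connect to WRDS and return the result as a DataFrame."""
    import wrds  # imported here so the query builder works without the package

    db = wrds.Connection(wrds_username=username)  # prompts for a password if none is stored
    try:
        # date_cols asks raw_sql to parse the listed columns as dates
        return db.raw_sql(build_msf_query(start, end), date_cols=["date"])
    finally:
        db.close()
```

For example, `fetch_msf("your_username", date(2020, 1, 1), date(2020, 12, 31))` would return one row per stock-month for 2020, assuming the table exists under that name.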

Market Anomalies and Risk Factors


Replicates the Jegadeesh and Titman (1993) momentum strategy, by buying the past winners and selling the past losers.
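The core ranking step can be sketched in pandas. This is a simplified illustration rather than the full Jegadeesh-Titman J/K design: it assumes a wide DataFrame of simple monthly returns (rows are months, columns are stocks), uses a J-month formation period, holds for one month only (no overlapping portfolios, no skip month), and equal-weights each group. The function name `momentum_wml` is hypothetical.

```python
import numpy as np
import pandas as pd


def momentum_wml(returns: pd.DataFrame, j: int = 6, n_groups: int = 10) -> pd.Series:
    """Equal-weighted winner-minus-loser return (simplified J/1 momentum).

    returns: wide DataFrame of simple monthly returns, index = month, columns = stocks.
    """
    # Compounded return over the past j months, ending at month t
    past = (1 + returns).rolling(j).apply(np.prod, raw=True) - 1
    wml = {}
    for t in range(j - 1, len(returns) - 1):
        score = past.iloc[t].dropna()
        if len(score) < n_groups:
            continue
        # Rank into groups by past return: 1 = losers, n_groups = winners
        groups = pd.qcut(score.rank(method="first"), n_groups, labels=False) + 1
        nxt = returns.iloc[t + 1]  # holding-period (next month) returns
        winners = nxt[groups[groups == n_groups].index].mean()
        losers = nxt[groups[groups == 1].index].mean()
        wml[returns.index[t + 1]] = winners - losers
    return pd.Series(wml, name="wml")
```

Ranking with `method="first"` before `qcut` is a design choice that keeps ties from producing duplicate bin edges.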

Fama French 3-Factor Model

This set of Python code replicates the Fama and French (1993) risk factors SMB and HML, in addition to the excess market risk factor. It uses CRSP for pricing data and Compustat for fundamental data.
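The heart of the SMB and HML construction is a 2x3 sort on size and book-to-market with NYSE breakpoints (size median; 30th and 70th B/M percentiles), followed by value-weighted portfolio returns. The one-period sketch below illustrates that step only. The column names (`me`, `beme`, `ret`, `nyse`) and the function name `smb_hml` are hypothetical, and details such as June formation dates, fiscal-year lags, and excluding negative book equity are left out.

```python
import numpy as np
import pandas as pd


def smb_hml(df: pd.DataFrame) -> tuple[float, float]:
    """One-period SMB and HML from a 2x3 size / book-to-market sort.

    Assumed columns: 'me' (market equity), 'beme' (book-to-market),
    'ret' (return over the holding period), 'nyse' (True if NYSE-listed).
    """
    nyse = df[df["nyse"]]
    size_bp = nyse["me"].median()  # NYSE size median
    bm_lo, bm_hi = nyse["beme"].quantile([0.3, 0.7])  # NYSE 30th / 70th B/M

    size = np.where(df["me"] <= size_bp, "S", "B")
    bm = np.where(df["beme"] <= bm_lo, "L", np.where(df["beme"] <= bm_hi, "M", "H"))
    port = pd.Series(size, index=df.index) + pd.Series(bm, index=df.index)

    # Value-weighted return of each of the six portfolios
    sums = df.assign(port=port, w=df["me"] * df["ret"]).groupby("port")[["w", "me"]].sum()
    r = sums["w"] / sums["me"]

    smb = r[["SL", "SM", "SH"]].mean() - r[["BL", "BM", "BH"]].mean()
    hml = r[["SH", "BH"]].mean() - r[["SL", "BL"]].mean()
    return smb, hml
```

Running this month by month over the CRSP/Compustat merged sample would give factor time series; the sketch assumes all six portfolios are non-empty.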

Characteristics-Based Benchmarks (DGTW)

This code replicates the characteristics-based benchmarks proposed by Daniel, Grinblatt, Titman and Wermers (1997), commonly abbreviated as DGTW benchmarks.
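The DGTW benchmark assigns each stock to one of 125 portfolios via sequential quintile sorts on size, book-to-market, and past return, then subtracts the value-weighted portfolio return from the stock's return. The sketch below shows that mechanic for a single cross-section under simplifying assumptions: breakpoints use all stocks, B/M is not industry-adjusted, and the column names (`me`, `bm`, `mom`, `ret`) and function name `dgtw_adjusted` are hypothetical.

```python
import pandas as pd


def dgtw_adjusted(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Characteristic-adjusted returns from a sequential n x n x n sort.

    Assumed columns: 'me' (size), 'bm' (book-to-market),
    'mom' (past 12-month return), 'ret' (current return).
    """
    def quant(s: pd.Series) -> pd.Series:
        # rank first so ties never produce duplicate bin edges
        return pd.qcut(s.rank(method="first"), n, labels=False) + 1

    out = df.copy()
    out["size_q"] = quant(out["me"])
    # B/M groups within each size group, then momentum within size x B/M
    out["bm_q"] = out.groupby("size_q")["bm"].transform(quant)
    out["mom_q"] = out.groupby(["size_q", "bm_q"])["mom"].transform(quant)

    # Value-weighted benchmark return of each characteristic portfolio
    out["wret"] = out["me"] * out["ret"]
    g = out.groupby(["size_q", "bm_q", "mom_q"])
    out["bench"] = g["wret"].transform("sum") / g["me"].transform("sum")
    out["dgtw_ret"] = out["ret"] - out["bench"]
    return out.drop(columns="wret")
```

With `n=5` this yields the 5x5x5 = 125 portfolios; smaller `n` is useful for checking the logic on toy data.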

Linking IBES and CRSP

Thomson Reuters' IBES database contains earnings and analyst forecast data, and researchers often link it with the CRSP database for pricing data to gauge the market reaction to earnings news. Because the two databases share no common native identifier, this code builds a linkage between them.
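One common way to bridge the two databases is through historical 8-digit CUSIPs: match the IBES CUSIP to the CRSP `ncusip`, keep only matches whose IBES date falls inside the CRSP name-date range, and summarize each ticker-PERMNO pair by its date range. The sketch below assumes hypothetical input frames with the listed columns and a hypothetical function name `link_ibes_crsp`; it omits any company-name matching or quality scoring a full linking table might add.

```python
import pandas as pd


def link_ibes_crsp(ibes: pd.DataFrame, crsp: pd.DataFrame) -> pd.DataFrame:
    """Link IBES tickers to CRSP PERMNOs through historical 8-digit CUSIPs.

    ibes columns (assumed): 'ticker', 'cusip', 'date'
    crsp columns (assumed): 'permno', 'ncusip', 'namedt', 'nameenddt'
    """
    m = ibes.merge(crsp, left_on="cusip", right_on="ncusip", how="inner")
    # keep matches where the IBES record falls inside the CRSP name-date range
    m = m[(m["date"] >= m["namedt"]) & (m["date"] <= m["nameenddt"])]
    # one row per ticker-permno pair with its first and last matched dates
    return (
        m.groupby(["ticker", "permno"], as_index=False)
        .agg(sdate=("date", "min"), edate=("date", "max"))
    )
```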

Post-Earnings Announcement Drift

Based on the original paper by Ball and Brown (1968) and the follow-up paper by Livnat and Mendenhall (2006), this code calculates earnings surprises (SUE) relative to analyst forecasts and reports the returns of portfolios formed on SUE.
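An analyst-based surprise is commonly computed as actual EPS minus the median analyst forecast, scaled by share price. The sketch below assumes that definition, hypothetical column names (`quarter`, `actual`, `medest`, `price`, `car`), and a hypothetical function name `sue_deciles`. It sorts firms into SUE groups within each quarter and averages a precomputed post-announcement cumulative abnormal return by group.

```python
import pandas as pd


def sue_deciles(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Analyst-based SUE and mean post-announcement return by SUE group.

    Assumed columns: 'quarter', 'actual' (reported EPS), 'medest'
    (median analyst forecast), 'price' (price deflator),
    'car' (post-announcement cumulative abnormal return).
    """
    out = df.copy()
    out["sue"] = (out["actual"] - out["medest"]) / out["price"]
    # SUE groups formed within each quarter: 1 = most negative surprise
    out["sue_grp"] = out.groupby("quarter")["sue"].transform(
        lambda s: pd.qcut(s.rank(method="first"), n, labels=False) + 1
    )
    return out.groupby("sue_grp")["car"].mean().to_frame("mean_car")
```

The spread between the top and bottom groups' mean CAR is the drift a PEAD study would focus on.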

Institutional Ownership

Chen, Hong and Stein (2002) study the relationship between the breadth of institutional ownership and subsequent stock returns. This code replicates the exercise using Thomson Reuters 13F data.
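Ownership breadth counts how many institutions hold a stock, and the change in breadth compares that count across consecutive quarters. The sketch below is one simplified way to compute it, assuming a hypothetical holdings frame with `mgrno`, `permno`, and `quarter` columns and a hypothetical function name `breadth_change`. It counts only institutions that file in both quarters, so filers entering or leaving the sample do not register as breadth changes, and it scales by the number of filers in the earlier quarter; treat both choices as assumptions rather than the paper's exact definition.

```python
import pandas as pd


def breadth_change(h: pd.DataFrame, q_prev, q_curr) -> pd.Series:
    """Change in ownership breadth between two quarters (simplified sketch).

    Assumed columns: 'mgrno' (institution id), 'permno' (stock id), 'quarter'.
    Each row means the institution held the stock in that quarter.
    """
    prev = h[h["quarter"] == q_prev]
    curr = h[h["quarter"] == q_curr]
    n_filers_prev = prev["mgrno"].nunique()
    # only institutions present in both quarters contribute to the change
    both = set(prev["mgrno"]) & set(curr["mgrno"])
    n_prev = prev[prev["mgrno"].isin(both)].groupby("permno")["mgrno"].nunique()
    n_curr = curr[curr["mgrno"].isin(both)].groupby("permno")["mgrno"].nunique()
    counts = pd.concat([n_prev, n_curr], axis=1, keys=["prev", "curr"]).fillna(0)
    return ((counts["curr"] - counts["prev"]) / n_filers_prev).rename("dbreadth")
```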