European Short Position Data

Policy Background

Following the global financial crisis of 2008, the European Union adopted Regulation (EU) No 236/2012 on short selling and certain aspects of credit default swaps, with the European Securities and Markets Authority (ESMA) coordinating its application. The purpose of this EU-wide reporting regulation is to increase market stability by reducing the opacity of market activities through mandatory daily reporting by institutional investors.

Therefore, each participating country's financial monitoring agency requires market participants with significant short positions in qualifying securities to self-report those positions, and makes the information available to the public.

Technical Background

Each government agency's website presents the data differently, and scraping it often involves mimicking the "clicking" action of a mouse, so the Selenium package is a good fit for the task.

However, I would like to emphasize that this code needs human monitoring over time, as an agency's website can change its layout or the data may appear in a different format. So please do not rely on the code output blindly (more likely, it will crash somewhere before getting to the end!).

Our student intern Shuai Zheng contributed to this code.

This code was revamped significantly to reflect the Selenium package update as well as changes in the host countries' websites.

Update: May 2024


0. Import packages

1. Define functions and set up Selenium driver

Setting up the Selenium driver is slightly tricky. Normally one can run "pip install webdriver-manager" and let that package automatically install the latest version of the Chrome driver.

However, when the Chrome driver is not updated as quickly as the Chrome browser, the automatic driver initialization raises an error.

In that case, the user needs to manually download the latest chromedriver and then point the driver to the location of chromedriver.exe in the unzipped download.
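A minimal sketch of the two setup paths described above, assuming Selenium 4 and the webdriver-manager package are installed. The helper name build_chrome_driver and its parameters are illustrative and not part of the original code. The path check runs before anything else so that a wrong manual path fails early with a clear message.

```python
from pathlib import Path


def build_chrome_driver(manual_driver_path=None, headless=False):
    """Return a Chrome WebDriver (hypothetical helper, not the original code).

    manual_driver_path: location of a manually downloaded chromedriver
    executable; when None, webdriver-manager selects the driver instead.
    """
    driver_file = None
    if manual_driver_path is not None:
        driver_file = Path(manual_driver_path)
        if not driver_file.is_file():
            raise FileNotFoundError(f"chromedriver not found at {driver_file}")

    # Imported lazily so the path check above runs even without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    if headless:
        options.add_argument("--headless=new")

    if driver_file is not None:
        # Manual route: use the chromedriver from the unzipped download.
        service = Service(executable_path=str(driver_file))
    else:
        # Automatic route: let webdriver-manager download a driver.
        from webdriver_manager.chrome import ChromeDriverManager
        service = Service(ChromeDriverManager().install())

    return webdriver.Chrome(service=service, options=options)
```

Calling build_chrome_driver() tries the automatic route; passing the path of the unzipped chromedriver.exe takes the manual one.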

2. Scrape each country's agency site

2.1 Austria

2.2 Belgium

2.3 Czech Republic

2.4 Denmark

2.5 Finland

2.6 France

2.7 Germany

2.8 Greece

2.9 Hungary

2.10 Iceland

2.11 Ireland

2.12 Italy

2.13 Luxembourg

2.14 Netherlands

2.15 Poland

2.16 Spain

2.17 Sweden

2.18 UK

2.19 Norway

Since its most recent website revamp, the Norwegian Finanstilsynet has served the data in JSON format through its API portal, so the code does not resemble the download process for the other countries.

The field activePositions is a list of dictionaries that report detailed information on short sellers and their respective holdings. This field needs to be expanded to obtain a long table that shows each item as a separate row.

While most records contain only one item in the activePositions field (i.e. one position holder's record), there are many cases where the field holds the records of more than one position holder, each stored as its own dictionary in the list. Note that the row-level variable shortPercent is the aggregate of the shortPercent values reported by the position holders in that row.
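As a rough sketch of what a JSON download looks like without a browser driver, the helper below uses only the standard library. The function name fetch_json is an assumption for illustration, and the endpoint URL is deliberately left to the caller because it is not stated here.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, timeout=30):
    """Download a URL and decode its body as JSON (hypothetical helper)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The result is plain Python lists and dictionaries, which is why no clicking or page navigation is needed for this country.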

For instance, below is the content of one activePositions:

[{'date': '2023-03-03',
  'shortPercent': 0.68,
  'shares': 153129,
  'positionHolder': 'Voleon Capital Management LP'},
 {'date': '2023-03-07',
  'shortPercent': 0.89,
  'shares': 198486,
  'positionHolder': 'WorldQuant LLC'}]

As some of the records report a completely empty activePositions field, I first remove these records from the sample, then parse the position holder information out of this field, and finally create a new dataframe that contains all the relevant fields.
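The filter-then-expand step can be sketched in plain Python as follows. This is an illustration under assumptions, not the original implementation: the row-level field isin, its values, the row-level aggregate of 1.57, and the output column names in HOLDER_FIELDS are all hypothetical, while the two position holder entries are the ones quoted above.

```python
# Map holder-level keys to output column names (names are assumptions),
# so they do not collide with the row-level shortPercent.
HOLDER_FIELDS = {
    "positionHolder": "position_holder",
    "date": "position_date",
    "shortPercent": "position_short_percent",
    "shares": "position_shares",
}


def expand_active_positions(records):
    """Return one row per position holder, skipping empty activePositions."""
    long_rows = []
    for record in records:
        positions = record.get("activePositions") or []
        if not positions:
            continue  # drop records whose activePositions field is empty
        # Row-level fields are repeated on every expanded row.
        shared = {k: v for k, v in record.items() if k != "activePositions"}
        for position in positions:
            row = dict(shared)
            for source_key, target_key in HOLDER_FIELDS.items():
                row[target_key] = position.get(source_key)
            long_rows.append(row)
    return long_rows


records = [
    {
        "isin": "XX0000000001",  # hypothetical row-level field and value
        "shortPercent": 1.57,    # assumed row-level aggregate
        "activePositions": [
            {"date": "2023-03-03", "shortPercent": 0.68, "shares": 153129,
             "positionHolder": "Voleon Capital Management LP"},
            {"date": "2023-03-07", "shortPercent": 0.89, "shares": 198486,
             "positionHolder": "WorldQuant LLC"},
        ],
    },
    {"isin": "XX0000000002", "shortPercent": 0.0, "activePositions": []},
]

long_rows = expand_active_positions(records)
for row in long_rows:
    print(row["isin"], row["position_holder"], row["position_short_percent"])
```

With pandas, DataFrame.explode on the activePositions column followed by pandas.json_normalize is a comparable route; the loop above just makes each step explicit.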

The last step is to remove the redundant rows from longDf. The JSON data reports an aggregate value per date in each row, and each row may contain more than one position_holder.

When at least one position holder changes its position, the aggregate short position is updated, even if the other position holders in that row report no change. As a result, when the position holders are expanded one by one, those whose positions did not change are repeated day after day and appear as duplicate records.
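To make the duplication concrete, here is a small sketch that keeps only the first occurrence of each holder-level record. The helper name, the column names in KEY_FIELDS, the isin value, and the second-day WorldQuant figures are hypothetical; only the two first-day holder entries come from the example above.

```python
KEY_FIELDS = ("isin", "position_holder", "position_date",
              "position_short_percent", "position_shares")  # assumed names


def drop_repeated_positions(long_rows, key_fields=KEY_FIELDS):
    """Keep the first occurrence of each holder-level record."""
    seen = set()
    unique_rows = []
    for row in long_rows:
        key = tuple(row.get(field) for field in key_fields)
        if key in seen:
            continue  # same holder, same position: repeated from an earlier row
        seen.add(key)
        unique_rows.append(row)
    return unique_rows


def holder_row(holder, date, pct, shares):
    # Small constructor for the illustrative rows below.
    return {"isin": "XX0000000001", "position_holder": holder,
            "position_date": date, "position_short_percent": pct,
            "position_shares": shares}


expanded = [
    # First aggregate row, expanded into two holders.
    holder_row("Voleon Capital Management LP", "2023-03-03", 0.68, 153129),
    holder_row("WorldQuant LLC", "2023-03-07", 0.89, 198486),
    # Next aggregate row: WorldQuant changed (hypothetical values),
    # Voleon did not, so its record is repeated unchanged.
    holder_row("Voleon Capital Management LP", "2023-03-03", 0.68, 153129),
    holder_row("WorldQuant LLC", "2023-03-08", 0.95, 211000),
]

deduplicated = drop_repeated_positions(expanded)
print(len(expanded), "rows before,", len(deduplicated), "rows after")
```

On a dataframe, DataFrame.drop_duplicates with a subset of the same columns would play the same role.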

The final output for Norway looks like the following:

Done with raw data scraping for all countries