Millions of leaked documents and the biggest journalism partnership in history have uncovered financial secrets of 35 current and former world leaders, more than 330 politicians and public officials in 91 countries and territories, and a global lineup of fugitives, con artists and murderers.
The leaked records reveal that many of the power players who could help bring an end to the offshore system instead benefit from it – stashing assets in covert companies and trusts while their governments do little to slow a global stream of illicit money that enriches criminals and impoverishes nations.
Let’s take a deep dive into what exactly these documents are and how the International Consortium of Investigative Journalists (ICIJ) made sense of them with the help of technology!
What are the Pandora Papers?
The Pandora Papers’ 11.9 million records came from 14 different offshore services firms. Together, this 2.94-terabyte data trove exposes the offshore secrets of wealthy elites from more than 200 countries and territories.
It contains data on more than 330 politicians and public officials from more than 90 countries and territories, including 35 current and former country leaders. It also covers celebrities, fraudsters, drug dealers, royal family members and leaders of religious groups around the world.
The investigation involved more than 600 journalists from 150 media outlets in 117 countries.
It took ICIJ more than a year to structure, research and analyze the data, which will be incorporated into the Offshore Leaks database. The task involved three main elements: journalists, technology and time.
Still, ICIJ estimates that it holds only a small fraction of the universe of provider data. The 14 providers, which offered services in at least 38 jurisdictions, are part of a larger offshore services industry operating around the world.
What is ICIJ?
The International Consortium of Investigative Journalists is a U.S.-based nonprofit newsroom, fully funded by donations, with its own reporting team, as well as a global network of reporters and media organizations who work together to investigate the most important stories in the world.
Its network of trusted members includes 280 of the best investigative reporters from more than 100 countries and territories. ICIJ also partners with more than 100 media organizations. These range from the world’s most renowned outlets, including the BBC, the New York Times, the Guardian and the Asahi Shimbun, to small regional nonprofit investigative centers.
In addition to its U.S. staff, ICIJ has team members in Australia, France, Spain, Hungary, Serbia, Belgium and Ireland. It provides the tools and guidance needed to pull off unprecedented reporting collaborations.
What data did ICIJ have, and how did it differ from the Panama Papers?
The Pandora Papers information derives from 2.94 terabytes of data in more than 11.9 million records from 14 providers that offer services in at least 38 jurisdictions. By comparison, the famous 2016 Panama Papers investigation was based on almost the same volume of documents, but all of them came from a single provider: the now-defunct law firm Mossack Fonseca.
The new challenge with the 11.9 million-plus Pandora records was that they were largely unstructured. More than half of the files (6.4 million) were text documents, including more than 4 million PDFs, some of which ran to more than 10,000 pages. The documents included passports, bank statements, tax declarations, company incorporation records, real estate contracts and due diligence questionnaires. There were also more than 4.1 million images and emails in the leak.
Spreadsheets made up 4% of the documents, or more than 467,000. The records also included slide shows and audio and video files.
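A breakdown like the one above starts with sorting files into broad types. The sketch below triages file paths by extension and reports each category's count and share. The extension-to-category mapping is a hypothetical assumption, not ICIJ's. A real leak would also need content sniffing, because extensions can be missing or wrong.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical mapping from file extension to the broad categories named in the text.
CATEGORIES = {
    ".pdf": "text document", ".doc": "text document", ".docx": "text document", ".txt": "text document",
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".tif": "image",
    ".eml": "email", ".msg": "email",
    ".xls": "spreadsheet", ".xlsx": "spreadsheet", ".csv": "spreadsheet",
    ".ppt": "slide show", ".pptx": "slide show",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".mov": "video",
}

def categorize(path: str) -> str:
    """Return the broad category for a file path, or 'other' if the extension is unknown."""
    return CATEGORIES.get(PurePath(path).suffix.lower(), "other")

def breakdown(paths: list[str]) -> dict[str, tuple[int, float]]:
    """Count files per category and each category's share of the total, in percent."""
    counts = Counter(categorize(p) for p in paths)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: (n, 100 * n / total) for cat, n in counts.most_common()}

if __name__ == "__main__":
    sample = ["a/contract.PDF", "b/passport.jpg", "c/ledger.xlsx", "d/note.eml", "e/deck.pptx"]
    for cat, (n, pct) in breakdown(sample).items():
        print(f"{cat:15} {n:3} {pct:5.1f}%")
```

As a check on the figures above: 467,000 spreadsheets out of 11.9 million records is about 3.9%, which rounds to the 4% quoted.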
How were the data processed?
To explore and analyze the information in the Pandora Papers, ICIJ identified files that contained beneficial ownership information by company and jurisdiction and structured it accordingly. Each provider’s data required a different process.
In cases where information came in spreadsheet form, ICIJ removed duplicates and combined it into a master spreadsheet. For PDF or document files, ICIJ used programming languages such as Python to automate data extraction and structuring as much as possible.
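The spreadsheet step (remove duplicates, then combine into one master sheet) can be sketched with Python's standard `csv` module. This is an illustration under assumptions, not ICIJ's code. The column names, the provider labels and the normalization rule (ignore case and extra whitespace) are all hypothetical. Each kept row records its provider so it can be traced back to its source.

```python
import csv
import io
from typing import Iterable, TextIO

def normalize(value: str) -> str:
    """Collapse whitespace and case so trivially different duplicates compare equal."""
    return " ".join(value.split()).casefold()

def build_master(sources: Iterable[tuple[str, TextIO]], columns: list[str]) -> list[dict[str, str]]:
    """Merge CSV sources into one list of rows, dropping duplicates.

    Each source is (provider_name, file_object). Only the requested columns are kept;
    missing columns become empty strings. When the same row appears more than once,
    only the first occurrence (and its provider) is kept.
    """
    seen: set[tuple[str, ...]] = set()
    master: list[dict[str, str]] = []
    for provider, handle in sources:
        for row in csv.DictReader(handle):
            record = {col: (row.get(col) or "").strip() for col in columns}
            key = tuple(normalize(record[col]) for col in columns)
            if key in seen:
                continue  # duplicate of a row already kept, possibly from another provider
            seen.add(key)
            master.append({"provider": provider, **record})
    return master

if __name__ == "__main__":
    a = io.StringIO("owner,company,jurisdiction\nJane Doe,Alpha Ltd,BVI\njane  doe,ALPHA LTD,bvi\n")
    b = io.StringIO("company,owner,jurisdiction\nBeta Inc,John Roe,Panama\n")
    for row in build_master([("provider_a", a), ("provider_b", b)], ["owner", "company", "jurisdiction"]):
        print(row)
```

Because `csv.DictReader` keys rows by header name, providers that order their columns differently still line up in the master list.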
In more complex cases, ICIJ used machine learning and other tools, including the Fonduer and scikit-learn libraries, to identify specific forms inside longer documents and separate them out.
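To illustrate the idea of separating forms from longer documents, here is a page classifier written in plain Python. It uses neither Fonduer nor scikit-learn. It swaps in a small multinomial naive Bayes model with add-one smoothing that labels each page's extracted text as "form" or "other". The labels, training pages and the page-splitting helper are hypothetical, and the sketch assumes the PDF text has already been extracted page by page.

```python
import math
import re
from collections import Counter, defaultdict

def tokens(text: str) -> list[str]:
    """Lowercase alphabetic tokens; real scans would also need OCR and layout features."""
    return re.findall(r"[a-z]+", text.lower())

class PageClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over page word counts."""

    def fit(self, pages: list[str], labels: list[str]) -> "PageClassifier":
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for page, label in zip(pages, labels):
            self.word_counts[label].update(tokens(page))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, page: str) -> str:
        total_pages = sum(self.label_counts.values())
        words = [w for w in tokens(page) if w in self.vocab]  # ignore words never seen in training
        best_label, best_score = "", -math.inf
        for label, n_pages in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_pages / total_pages)  # log prior
            score += sum(math.log((counts[w] + 1) / denom) for w in words)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def form_page_indices(pages: list[str], clf: PageClassifier, target: str = "form") -> list[int]:
    """Indices of pages classified as the target label, e.g. to cut them out of a long PDF."""
    return [i for i, page in enumerate(pages) if clf.predict(page) == target]
```

A production system would train on many labeled pages and validate on held-out ones. With four training pages, this only demonstrates the mechanics.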
Some provider forms were handwritten, requiring ICIJ to extract information manually.
Once information was extracted and structured, ICIJ generated lists that linked beneficial owners to the companies they owned in specific jurisdictions. In some cases, information about where or when a company was registered wasn’t available. In others, information was missing about when a person or an entity had become the owner of the company, among other details.
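The owner-to-company lists, including their gaps, can be modeled so that missing details stay explicit rather than being filled with guesses. In the sketch below, `OwnershipLink` and its field names are hypothetical. A missing registration date or acquisition date is simply `None`, and `missing_fields` reports what still needs to be found.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class OwnershipLink:
    """One beneficial owner -> company link; unknown details stay None rather than being guessed."""
    owner: str
    company: str
    jurisdiction: Optional[str] = None
    incorporated: Optional[date] = None   # when the company was registered
    owner_since: Optional[date] = None    # when the owner acquired it

    def missing_fields(self) -> list[str]:
        """Names of the optional details this record lacks, for follow-up reporting."""
        return [f for f in ("jurisdiction", "incorporated", "owner_since") if getattr(self, f) is None]

def companies_by_owner(links: list[OwnershipLink]) -> dict[str, list[OwnershipLink]]:
    """Group links so each beneficial owner maps to every company they hold."""
    grouped: dict[str, list[OwnershipLink]] = {}
    for link in links:
        grouped.setdefault(link.owner, []).append(link)
    return grouped

if __name__ == "__main__":
    links = [OwnershipLink("Jane Doe", "Alpha Ltd", "British Virgin Islands", date(2009, 5, 1)),
             OwnershipLink("Jane Doe", "Gamma SA")]
    for owner, held in companies_by_owner(links).items():
        for link in held:
            print(owner, link.company, "missing:", link.missing_fields())
```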
After structuring the data, ICIJ used graph platforms (Neo4j and Linkurious) to generate visualizations and make them searchable. This allowed reporters to explore connections between people and companies across providers.
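The sketch below illustrates the kind of question a graph makes easy: how is this person connected to that company? It does not use Neo4j or Linkurious. It stores a small in-memory graph in plain Python and finds the fewest-hop chain with breadth-first search. In Neo4j, the Cypher query language offers a `shortestPath` function for the same purpose. All entity and provider names here are hypothetical. Each edge keeps the provider it came from, so a link can be traced back to its source.

```python
from collections import defaultdict, deque
from typing import Optional

# Hypothetical in-memory stand-in for a graph database: entity -> {(neighbour, provider)}
Graph = dict[str, set[tuple[str, str]]]

def build_graph(edges: list[tuple[str, str, str]]) -> Graph:
    """Undirected adjacency sets; each edge is (entity_a, entity_b, provider it came from)."""
    graph: Graph = defaultdict(set)
    for a, b, provider in edges:
        graph[a].add((b, provider))
        graph[b].add((a, provider))
    return graph

def shortest_connection(graph: Graph, start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search: the fewest-hop chain of entities from start to goal, or None."""
    if start == goal:
        return [start]
    previous: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour, _provider in graph.get(node, ()):
            if neighbour in previous:
                continue  # already reached by a path at least as short
            previous[neighbour] = node
            if neighbour == goal:
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None

def edge_providers(graph: Graph, path: list[str]) -> list[list[str]]:
    """For each hop on a path, the providers whose records contain that link."""
    return [sorted(p for n, p in graph[a] if n == b) for a, b in zip(path, path[1:])]
```

A path whose hops come from different providers is exactly the kind of cross-provider connection that a single-provider leak could not show.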
To identify potential story subjects in the data, ICIJ matched information in the leak against other data sets: sanctions lists, previous leaks, public corporate records, media lists of billionaires and public lists of political leaders.
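Matching leak names against outside lists usually needs some tolerance for accents, punctuation and word order. The sketch below is not ICIJ's method. It normalizes names (accents stripped, case folded, name parts sorted) and then uses the standard library's `difflib` to propose close matches above a hypothetical similarity threshold of 0.9. A match produced this way is a lead to verify, not a finding.

```python
import difflib
import unicodedata

def normalize_name(name: str) -> str:
    """Strip accents, punctuation and case, and sort name parts so word order doesn't matter."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in without_accents.casefold())
    return " ".join(sorted(cleaned.split()))

def match_against(leak_names: list[str], reference_names: list[str],
                  threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Pair each leak name with reference names whose normalized similarity is >= threshold.

    Reference names that normalize identically collapse to one entry (the last one wins).
    """
    reference = {normalize_name(r): r for r in reference_names}
    matches = []
    for name in leak_names:
        key = normalize_name(name)
        for candidate in difflib.get_close_matches(key, reference.keys(), n=3, cutoff=threshold):
            score = difflib.SequenceMatcher(None, key, candidate).ratio()
            matches.append((name, reference[candidate], round(score, 3)))
    return matches

if __name__ == "__main__":
    print(match_against(["José  Pérez-Gómez"], ["Perez Gomez, Jose", "Maria Lopez"]))
```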
ICIJ’s partner in Sweden, SVT, generated spreadsheets containing data extracted from passports found in the Pandora Papers.
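The source does not say how SVT extracted the passport data. One common aid for this task is the machine-readable zone (MRZ) printed at the bottom of passports, whose fields carry check digits defined in ICAO Doc 9303. The sketch below assumes the MRZ's second line has already been read as text, for example by OCR. It parses that line for the 44-character passport (TD3) format and validates each check digit using the 7-3-1 weighting. The example line is the fictional "Utopia" specimen from ICAO 9303, not leak data.

```python
def mrz_value(ch: str) -> int:
    """ICAO 9303 character values: digits as-is, A-Z = 10-35, filler '<' = 0."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")

def check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, modulo 10."""
    weights = (7, 3, 1)
    return sum(mrz_value(ch) * weights[i % 3] for i, ch in enumerate(field)) % 10

def parse_td3_line2(line: str) -> dict:
    """Parse the second MRZ line of a passport (TD3) and verify its check digits."""
    if len(line) != 44:
        raise ValueError("a passport (TD3) MRZ line has 44 characters")
    fields = {  # field value and the check digit that follows it
        "document_number": (line[0:9], line[9]),
        "date_of_birth": (line[13:19], line[19]),
        "expiry_date": (line[21:27], line[27]),
        "personal_number": (line[28:42], line[42]),
    }
    # '<' as a check digit counts as 0 (allowed when the personal number is empty)
    checks = {name: check_digit(value) == mrz_value(digit) for name, (value, digit) in fields.items()}
    composite = line[0:10] + line[13:20] + line[21:43]
    checks["composite"] = check_digit(composite) == mrz_value(line[43])
    return {
        "document_number": line[0:9].rstrip("<"),
        "nationality": line[10:13],
        "date_of_birth": line[13:19],  # YYMMDD
        "sex": line[20],
        "expiry_date": line[21:27],    # YYMMDD
        "checks_ok": all(checks.values()),
        "checks": checks,
    }

if __name__ == "__main__":
    print(parse_td3_line2("L898902C36UTO7408122F1204159ZE184226B<<<<<10"))
```

Failed check digits are a useful signal that OCR misread a character, so the row can be flagged for manual review instead of entering the spreadsheet silently.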
ICIJ shared records with media partners using Datashare, a secure research and analytical tool developed by ICIJ’s technical team. Datashare’s batch-search function helped reporters match some public figures with the data.
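Datashare's own interface is not shown here. The sketch below only illustrates the general idea behind a batch search: build an inverted index over document text once, then run a whole list of names against it. The document IDs and names are hypothetical. Each query is treated as an AND search (every word must appear somewhere in the document), which is looser than an exact phrase match.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Case-folded word tokens."""
    return re.findall(r"\w+", text.casefold())

def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: token -> ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def batch_search(index: dict[str, set[str]], queries: list[str]) -> dict[str, list[str]]:
    """For each query, return ids of documents that contain every query term."""
    results: dict[str, list[str]] = {}
    for query in queries:
        terms = tokenize(query)
        if not terms:
            results[query] = []
            continue
        hits = set.intersection(*(index.get(t, set()) for t in terms))
        results[query] = sorted(hits)
    return results

if __name__ == "__main__":
    docs = {"d1": "Share certificate issued to Jane Doe", "d2": "Invoice for John Roe"}
    print(batch_search(build_index(docs), ["Jane Doe", "John Roe"]))
```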
How were the data organized and investigated?
Having identified documents that contained information on the owners of offshore entities and structured the information by provider, ICIJ unified the data in a centralized database.
This provided ICIJ and its media partners with a unique data set of beneficial owners of companies in secrecy jurisdictions.
ICIJ removed duplicate records and identified key elements, such as the owner’s nationality, country of residence and place of birth.
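The source does not say which database ICIJ used. A minimal sketch of a centralized, deduplicated store is possible with SQLite from Python's standard library. The table layout, column names and deduplication key below are hypothetical. Rows whose normalized (name, company, jurisdiction) key is already present are ignored, and the key fields named above can then be summarized with plain SQL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS owners (
    dedup_key TEXT PRIMARY KEY,   -- normalized name|company|jurisdiction
    provider TEXT NOT NULL,       -- first provider the record was loaded from
    full_name TEXT NOT NULL,
    nationality TEXT,
    country_of_residence TEXT,
    place_of_birth TEXT,
    company TEXT NOT NULL,
    jurisdiction TEXT             -- NULL when the leak does not say
)
"""
COLUMNS = ("full_name", "nationality", "country_of_residence", "place_of_birth", "company", "jurisdiction")

def dedup_key(row: dict) -> str:
    """Case- and whitespace-insensitive key; an unknown jurisdiction becomes an empty string."""
    parts = (row.get("full_name"), row.get("company"), row.get("jurisdiction"))
    return "|".join(" ".join((p or "").split()).casefold() for p in parts)

def load(conn: sqlite3.Connection, provider: str, rows: list[dict]) -> int:
    """Insert one provider's rows, skipping duplicates; returns how many rows were new."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO owners (dedup_key, provider, " + ", ".join(COLUMNS) + ") "
        "VALUES (?, ?, " + ", ".join("?" * len(COLUMNS)) + ")",
        [(dedup_key(r), provider, *(r.get(c) for c in COLUMNS)) for r in rows],
    )
    conn.commit()
    return conn.total_changes - before

def owners_by_nationality(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Number of ownership records per owner nationality (NULL reported as 'unknown')."""
    return conn.execute(
        "SELECT COALESCE(nationality, 'unknown') AS nat, COUNT(*) FROM owners "
        "GROUP BY nat ORDER BY COUNT(*) DESC, nat"
    ).fetchall()
```

Keeping a NOT NULL text key avoids a SQLite subtlety: NULLs never compare equal in UNIQUE constraints, so a uniqueness rule on a nullable jurisdiction column would let duplicates through.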
ICIJ and its media partners used keyword searches to identify politicians in the data, using passport information to help with the identification. ICIJ used public records to verify details related to the companies and to be sure the people named in the data were actually the political leaders identified with those names.
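The sketch below shows one way a keyword hit could be combined with passport details. The matching rules are hypothetical, not ICIJ's. A hit is discarded if a known date of birth or nationality contradicts the public list. It is marked "corroborated" when the dates of birth agree. Otherwise it stays a "candidate". In every case, verification against public records remains a separate, manual step.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    name: str
    date_of_birth: Optional[str] = None  # ISO 'YYYY-MM-DD' when known
    nationality: Optional[str] = None    # ISO 3166 alpha-3 code when known

def name_tokens(name: str) -> frozenset[str]:
    """Case-folded name parts, ignoring commas and word order."""
    return frozenset(name.casefold().replace(",", " ").split())

def assess(leak: Person, politician: Person) -> str:
    """Classify a leak record against a public list entry: 'no match', 'candidate' or 'corroborated'."""
    # every part of the politician's listed name must appear in the leak record's name
    if not name_tokens(politician.name) <= name_tokens(leak.name):
        return "no match"
    for field in ("date_of_birth", "nationality"):
        a, b = getattr(leak, field), getattr(politician, field)
        if a is not None and b is not None and a != b:
            return "no match"  # a known identifier contradicts the name hit
    if leak.date_of_birth is not None and leak.date_of_birth == politician.date_of_birth:
        return "corroborated"  # still to be checked against public records
    return "candidate"
```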
ICIJ structured the information in a spreadsheet and put it through two rounds of fact-checking. Data gathered on politicians was also visualized in the profiles of ICIJ’s Power Players feature.
ICIJ matched Forbes’s billionaires lists against the Pandora Papers and found more than 130 billionaires who had entities in secrecy jurisdictions. More than 100 of them had a combined fortune valued at more than $600 billion in 2021.
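In outline, this kind of list matching is a join followed by a sum. The sketch below uses hypothetical names and values and a deliberately simple exact-name rule (case and spacing ignored). It does not reproduce ICIJ's data or method. Net worth is summed only where the list provides a value, which is why the function reports the matched count and the valued count separately.

```python
from typing import Optional

def normalize(name: str) -> str:
    """Case-fold and collapse whitespace; real matching would also need verification."""
    return " ".join(name.casefold().split())

def billionaire_overlap(billionaires: dict[str, Optional[float]],
                        owners: set[str]) -> tuple[int, int, float]:
    """Return (matched, matched_with_known_net_worth, summed_net_worth) for list members found among owners.

    Units are whatever the list uses (e.g. billions of dollars).
    """
    owner_keys = {normalize(o) for o in owners}
    matched = {name: worth for name, worth in billionaires.items() if normalize(name) in owner_keys}
    known = [w for w in matched.values() if w is not None]
    return len(matched), len(known), sum(known)

if __name__ == "__main__":
    print(billionaire_overlap({"Jane Doe": 12.5, "John Roe": None}, {"jane doe", "JOHN ROE"}))
```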
Why call them Pandora Papers?
Pandora was the first mortal woman in Greek mythology, a sort of Ancient Greek Eve. Following Zeus’s instructions, she was molded by Hephaestus and endowed with gifts by all the other Olympian gods.
One of these gifts was a sealed box containing all the evils and diseases of the world. They were trapped inside to keep them from spreading to humanity, and Pandora was told never to open it. She could not resist, however, and eventually opened it, releasing every evil but one. Hope, the last thing in the box, stayed inside because Pandora managed to put the lid back on in time.