The Most Trustworthy Data in the World

May 16, 2024
“Whoever is careless with the truth in small matters cannot be trusted with important matters.” ~ Albert Einstein


At the end of the day, the most important component in information is whether you can trust it or not. There are multiple reasons why you should be skeptical of information when you receive it, even if it is delivered to you by a company specialized in collecting it:

  • Maybe not all the data is there. Depending on your supplier’s coverage, you may not be seeing all the data related to the topic you are interested in, and the subset you are looking at could give you a flawed perception of reality. This is especially true if your supplier only looks at major Western platforms and leaves out information written in other languages, in other parts of the world.
  • Maybe the data is misclassified. If the strategies employed to process the data are flawed, you may end up looking at data that has nothing to do with the topic that interests you, without even knowing it.

What makes information trustworthy is the fact that you have the ability to check it at any given time. How can you trust that which you can’t verify?

“Trust, but verify.” ~ Ironically enough, this proverb is estimated to be a paraphrase of Vladimir Lenin and Joseph Stalin

The Internet

The Internet has become a gigantic space. It started off with the adoption of the TCP/IP communication protocol back in 1983, exploded in the early 2000s, and has only been growing since.

It is estimated that 67% of the world’s population uses the Internet today. This translates into an enormous amount of information being created and shared every day, roughly 328.77 million terabytes daily.

If we assume an HD movie takes up around 4 GB of disk space, 328.77 million terabytes of data would be like storing roughly 82 billion movies. And that’s only a daily volume.

Capturing the Data

There are two major obstacles in the way of capturing the full scope of the Internet’s data today: reach of collection and frequency of collection.

  1. Reach

The Internet’s architecture is distributed by default. This means that its information can’t be accessed from any single endpoint. The information present on the Internet is, very literally, everywhere, both physically and virtually.

In other words, if you want to access as much of the Internet’s data as you possibly can, you need to crawl through every single place where it could possibly be.

And this is where it gets tricky, because as mentioned earlier, the Internet’s data is everywhere. In other words, you won’t be able to access it all if your Internet address (or IP address) stays in the same place.

For example, some social networks, forums, services and more can be geo-blocked. Geo-blocking is a technique used to block a user’s access to a service based on that user’s location. This can happen for legal compliance reasons, as it did for Threads and Vkontakte, but also for ideological reasons, as in the case of Truth Social.
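To make the idea concrete, here is a minimal sketch (not Exorde’s actual code) of how a crawler with exit points in several regions might detect geo-blocking: by comparing the HTTP status codes observed from each location. The region codes and threshold statuses are illustrative assumptions; 403 and 451 (“Unavailable For Legal Reasons”) are the usual signals.

```python
# Hypothetical sketch: detecting geo-blocking by comparing HTTP status
# codes observed from crawler exit points in different regions.
BLOCKED_STATUSES = {403, 451}

def classify_geo_access(observations: dict[str, int]) -> str:
    """observations maps a region code to the HTTP status seen there."""
    blocked = {r for r, s in observations.items() if s in BLOCKED_STATUSES}
    if not blocked:
        return "open"
    if blocked == set(observations):
        return "down-or-blocked-everywhere"
    return f"geo-blocked in: {', '.join(sorted(blocked))}"

# Example: a service reachable from France but blocked elsewhere.
print(classify_geo_access({"us": 451, "fr": 200, "ru": 403}))
# → geo-blocked in: ru, us
```

A distributed network of nodes naturally provides these multiple vantage points, which is what makes such comparisons possible in the first place.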

On top of that, several companies have started monetizing access to their data, even when that data is actually publicly accessible. This is the case for X (previously Twitter). We talked about the dangers of making data harder to access in a previous story here if you want to learn more.

  2. Frequency

Frequency of collection here refers to the ability to capture data posted on the Internet frequently enough so that you don’t miss out on important bits of information.

On top of the very large volumes of information that are being created and uploaded to the Internet, there is also the question of information being edited, removed or otherwise altered once it has been initially uploaded.

Whether you collect such information every 5 minutes, every day or every week can have a very significant impact on the quality of the information collected, especially when dealing with social networks.
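The effect of collection frequency can be illustrated with a toy example (not Exorde’s actual pipeline): diffing two crawl snapshots of the same feed to surface posts that were edited or deleted between passes. The post IDs and texts below are made up.

```python
# Illustrative sketch: each snapshot maps post id -> text at crawl time.
def diff_snapshots(earlier: dict[str, str], later: dict[str, str]) -> dict:
    deleted = sorted(set(earlier) - set(later))
    edited = sorted(pid for pid in earlier.keys() & later.keys()
                    if earlier[pid] != later[pid])
    return {"deleted": deleted, "edited": edited}

t0 = {"a1": "original claim", "a2": "harmless post", "a3": "soon gone"}
t1 = {"a1": "softened claim", "a2": "harmless post"}

print(diff_snapshots(t0, t1))
# → {'deleted': ['a3'], 'edited': ['a1']}
```

The shorter the interval between snapshots, the more of these short-lived edits and deletions the collector can catch.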

One of the use cases we are working on establishing lies in using Exorde’s data to monitor unusual/ill-intentioned activity from social network users attempting to propagate false or altered narratives.

We actually made a complete report on this, when we carefully studied our data over a month-long period covering the Ukraine/Russia conflict. Amongst other things, we realized that a good many posts written in French aiming to promote Vladimir Putin were getting deleted very fast after being posted… but not fast enough to slip past our protocol *wink*. You can read the full report here.

Processing the Data

Capturing the data was, quite literally, only ever half of the problem. As astounding as that may sound, processing the data to make something useful out of it is very much a whole different set of problems.

Think of raw data like wood. Having wood is nice, but you can’t really do anything with it if you don’t transform it one way or another. You can set wood on fire to shed some light and heat, you can sculpt it to build structures and tools, you can even refine it and use it as decoration nowadays. Just as you can transform wood in many ways, so can you transform data.

Transforming data is like forging steel. To make steel you essentially need two base components: iron and carbon. However, you also need a precise amount of each and just the right temperature. If you get any of the proportions, the timing or the temperature wrong, it all goes to waste. The same applies to data. Without the right kind of processing techniques, your data is useless. Except, unlike forging steel, it won’t be as obvious that the transformation you applied was useless.

This is essentially why Data Scientists and Data Analysts exist, and why this sector is gaining momentum day by day.

Every company has a different way of processing the data it collects, but in ours, we look at specific metrics that allow us to quickly classify how people are talking about a given topic: an emotion-based analysis.

In other words, we look at the way people talk about topics online, and from there, we are capable of rebuilding a narrative on the direction global conversations are taking, how fast they are spreading, and where.
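As a toy illustration of the idea (the labels, posts and aggregation below are made up, not Exorde’s actual model), per-post emotion labels can be aggregated into a distribution that summarizes how a topic is being discussed:

```python
# Hypothetical sketch: aggregate per-post emotion labels per topic.
from collections import Counter

posts = [
    {"topic": "energy", "emotion": "anger"},
    {"topic": "energy", "emotion": "anger"},
    {"topic": "energy", "emotion": "fear"},
    {"topic": "sports", "emotion": "joy"},
]

def emotion_profile(posts: list[dict], topic: str) -> dict:
    counts = Counter(p["emotion"] for p in posts if p["topic"] == topic)
    total = sum(counts.values())
    return {e: round(n / total, 2) for e, n in counts.items()}

print(emotion_profile(posts, "energy"))
# → {'anger': 0.67, 'fear': 0.33}
```

Tracking how such a distribution shifts over time and across regions is what lets a narrative’s direction and spread be reconstructed.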

Imagine you were trying to push a false narrative on social networks today. You make a clickbaity headline, some catchy text, maybe an AI-generated image to captivate your audience’s attention. Once the post is out, your only objective is to get people to share it, so it can propagate far and fast.

This is exactly what Exorde was built to measure: how fast your narrative is spreading, and where. As everything is timestamped, it’s easy to retrace a narrative to its origin, even if the original content was deleted.

This means that Exorde is capable of analyzing information online using the same KPIs (Key Performance Indicators) that anyone attempting to spread misinformation would be using to measure the efficacy of their work.
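The “retrace to origin” idea above can be sketched in a few lines (with made-up data, not Exorde’s actual schema): given timestamped posts carrying the same narrative, the earliest timestamp points at the origin, and that record survives even if the original post has since been deleted from the platform.

```python
# Illustrative sketch: find a narrative's origin and spread window
# from timestamped posts. The posts below are fabricated examples.
from datetime import datetime, timedelta

posts = [
    {"id": "p3", "ts": datetime(2024, 5, 1, 12, 40), "deleted": False},
    {"id": "p1", "ts": datetime(2024, 5, 1, 12, 0),  "deleted": True},
    {"id": "p2", "ts": datetime(2024, 5, 1, 12, 25), "deleted": False},
]

origin = min(posts, key=lambda p: p["ts"])           # first appearance
window = max(p["ts"] for p in posts) - origin["ts"]  # how fast it spread

print(origin["id"], window)
# → p1 0:40:00  (the origin is identifiable even though it was deleted)
```

The same timestamps double as the spread KPI: the shorter the window over which copies appear, the faster the narrative is propagating.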

Our Solution

Exorde Labs’ team has been working relentlessly for over 4 years now to develop a technical solution to all the issues stated earlier. Complex problems require complex solutions, especially when it comes to monitoring the information on the Internet today.

Here’s what makes our approach unique:

  1. We use a distributed network to collect all the public information on the Web. Anyone can join the community, run a node, and start processing the daily information of the Internet. This network guarantees we can reach any content in the World, in any language, at any given time.
  2. We use smart contracts and blockchain technology to handle the entire data processing pipeline. This means we are capable of pinpointing precisely who was involved in processing a data point, when, and how. This process guarantees full transparency and auditability in the way the information is processed.
  3. We have a deterministic approach to collecting and processing data. In other words, a single data point will always yield the same end analysis and can be verified as such. It is for this reason we are capable of including a validation system within Exorde which serves to verify that information submitted in the network is valid and has not been tampered with.
  4. A scalable solution for an Internet that keeps on scaling upwards: Exorde operates as a DePIN, a Decentralized Physical Infrastructure Network that leverages the physical decentralization of its nodes to yield value. In our case, as mentioned earlier, our physical decentralization is a significant asset in reaching the furthest parts of the Internet.
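Point 3 above, deterministic processing with verification, can be sketched as follows. This is a simplified illustration, not Exorde’s exact scheme: if the analysis of a data point is deterministic, any validator node can recompute the result hash independently and compare it with what was submitted.

```python
# Simplified sketch of deterministic verification across nodes.
import hashlib
import json

def process(data_point: dict) -> dict:
    # Stand-in for a deterministic analysis step.
    return {"length": len(data_point["text"]), "lang": data_point["lang"]}

def result_hash(data_point: dict) -> str:
    # Canonical JSON (sorted keys) keeps the hash identical on every node.
    canonical = json.dumps(process(data_point), sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

point = {"text": "hello world", "lang": "en"}
submitted = result_hash(point)          # what a worker node submits
verified = result_hash(point) == submitted  # what a validator recomputes
print(verified)
# → True  (any tampering with the result would change the hash)
```

Determinism is what makes this check meaningful: the same input must always yield the same analysis, so a hash mismatch is unambiguous evidence of tampering or error.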

This approach guarantees the best and most reliable coverage of the Internet, in the most transparent and neutral way technically conceivable today.