Frederic Kaplan - The Venice Time Machine
I would like to take you on a new scientific journey: the reconstruction of the big data of the past. This journey will take us to Venice but, before we go, I need to explain a couple of things. Ten million new photos are shared every hour on Facebook; 400 million tweets are sent every day. We have entered the age of big data, so our lives and our world are now documented in real time. Big data will surely transform our lives, the way we work and maybe the way we think. Big data may help us anticipate our future but, for this, there is one problem: the big data we have is essentially big data of our present time, and we cannot anticipate our future without knowledge of our past.
Imagine if you had a magic rewind button that you could stick onto internet services.
Take Google Earth: imagine viewing a place and pressing your magic rewind button; you would see how it looked 5, 10, 50, 500 years ago. Imagine if you could surf the Facebook of the Middle Ages with the same density of information as you have on Facebook today. So, how can we build this magic rewind button? What do we need?
We need data.
This is the information diagram: vertically you have time and horizontally you have the amount of digital information we have. Obviously, in the last ten years we have plenty of digital information, forming a large plateau, but going down the curve and back in time, the curve shrinks. If we want to build this magic rewind button, we need to transform this mushroom into a square. So how can we do this? One way is digitization. There are plenty of extraordinary archives around: administrative records, personal archives, newspapers. Imagine if we could extract all the information contained in the daily newspapers of the last 200 years! That's a lot of information! However, it will never be enough; if we want to create a past as dense as the present, we need to do something else. We need to do what historians do: extrapolate, which in computer science means simulation.
Imagine I have the logbook of a boat captain travelling the Mediterranean. Of course I will learn a lot about the particular trips that captain made, but I can also obtain information about the maritime networks of that period. That information is not exactly the same as a primary source, but it can nevertheless be used.
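The kind of inference described here can be sketched in a few lines: from the port-call sequences recorded in logbooks, we can derive a weighted network of sea routes. All the voyages and port names below are invented for illustration; a real project would extract them from transcribed archival documents.

```python
from collections import Counter

# Hypothetical port-call sequences, as might be extracted from captains' logbooks.
logbooks = [
    ["Venezia", "Corfu", "Candia", "Constantinople"],
    ["Venezia", "Corfu", "Modon", "Candia"],
    ["Corfu", "Candia", "Constantinople"],
]

# Each consecutive pair of ports in a voyage is one observed leg of a route.
route_counts = Counter()
for voyage in logbooks:
    for a, b in zip(voyage, voyage[1:]):
        route_counts[(a, b)] += 1

# The most frequently sailed legs hint at the backbone of the maritime network.
for (a, b), n in route_counts.most_common(3):
    print(f"{a} -> {b}: observed {n} times")
```

No single logbook describes the network, but aggregating many of them makes its structure visible; this is the step from a primary source to derived, usable information.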
We are applying this methodology in a new international project called “Venice Time Machine”.
It was launched by EPFL in partnership with the University Ca’ Foscari and the State Archive of Venice, with the support of the Lombardi Foundation.
So, why choose Venice? Venice is a wonderful city with a very rich past: it was the biggest maritime empire of its time, but there is something more that is very exciting for us. Venetians were obsessed with archives and they kept track of everything: every boat entering the city, every boat leaving the city, every change made to buildings or canals. This was the Google of the Middle Ages.
As a result, the State Archive of Venice holds 80 kilometers of documents describing every aspect of Venetian history over a thousand years. We have decided to turn this huge archive into a digital information system. As I am speaking, we are installing new efficient scanners in the archive and creating a digital pipeline that should be able to produce several thousand images per hour, progressively transforming the documents into terabytes of high-definition images.
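To get a feel for the scale, here is a back-of-the-envelope calculation. The throughput and image size are assumed values chosen only to illustrate the arithmetic, not figures from the project:

```python
# All parameters below are assumptions for illustration.
images_per_hour = 2_000   # assumed scanner throughput
mb_per_image = 40         # assumed size of one high-definition scan, in MB
hours_per_day = 10        # assumed daily operating time

mb_per_day = images_per_hour * mb_per_image * hours_per_day
tb_per_day = mb_per_day / 1_000_000
print(tb_per_day)  # 0.8 terabytes per day under these assumptions
```

Even with modest assumed figures, the pipeline produces on the order of a terabyte of images per day, which is why the project quickly reaches the scale of big data.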
But of course, if you want to extract information, you need to do something with these images: you need to read them, transcribe them and transform them into text, and this is notoriously complex.
We are dealing with several languages (such as Latin and Venetian dialect) and with thousands of handwriting styles. This is challenging, but it is maybe also an opportunity: because we have such large amounts of data, we can look very precisely at how handwriting styles change over time, and we discovered that there is a continuous evolution. Thanks to this discovery, we have been able to build machine vision algorithms that can spot similar words across the scanned images. That means that when you transcribe one of these words, thousands of its other occurrences can be transcribed by inference.

The next step is to extract information from these documents: we need to spot people, names and places and connect them together, progressively creating a giant graph, a social network of the past containing information about thousands of lives. With such a graph, we can ask ourselves new kinds of questions: "how many painters' workshops were there in Venice in 1622?", "what was the average salary of an apprentice?", and even more. We can actually deploy this graph in space and time by analyzing the numerous ancient maps of Venice, progressively reconstructing a possible evolution of Venetian urban development. This evolution is data-driven: when new data comes in, the reconstruction actually changes.
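Once people, roles, places and dates have been extracted into structured records, questions like the two above become simple queries. The records below are entirely invented for illustration (names, workshops and salaries are not archival data):

```python
# Toy records standing in for entities extracted from archival documents.
# All people, workshops, dates and salaries are invented for illustration.
records = [
    {"name": "Bastian", "role": "painter",    "workshop": "S. Polo",   "year": 1622},
    {"name": "Marco",   "role": "painter",    "workshop": "Castello",  "year": 1622},
    {"name": "Piero",   "role": "apprentice", "workshop": "S. Polo",   "year": 1622, "salary": 12},
    {"name": "Zuane",   "role": "apprentice", "workshop": "Castello",  "year": 1622, "salary": 18},
    {"name": "Alvise",  "role": "painter",    "workshop": "Dorsoduro", "year": 1650},
]

# "How many painters' workshops were there in Venice in 1622?"
workshops_1622 = {r["workshop"] for r in records
                  if r["role"] == "painter" and r["year"] == 1622}
print(len(workshops_1622))  # 2

# "What was the average salary of an apprentice?"
salaries = [r["salary"] for r in records if r["role"] == "apprentice"]
print(sum(salaries) / len(salaries))  # 15.0
```

The point is that questions historians previously answered by reading documents one by one become aggregate queries over the whole graph.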
For instance, if I take this map from the early 16th century and cross-reference the information it contains with tax declarations of the same period, I can reconstruct a model of the Rialto neighborhood in which I know who was living in each particular house.
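Cross-referencing two sources is essentially a join on a shared identifier. In this sketch, map parcels and tax declarations are linked by an invented parcel code; all identifiers and names are hypothetical:

```python
# Toy data: parcels drawn on an early-16th-century map, and tax declarations
# of the same period. Parcel codes and declarants are invented for illustration.
map_parcels = {"R12": "Rialto, north bank", "R13": "Rialto, south bank"}
tax_declarations = [
    {"parcel": "R12", "declarant": "Zan Battista, spice merchant"},
    {"parcel": "R13", "declarant": "Nicolo, goldsmith"},
]

# The join: who lived where in the reconstructed neighborhood.
residents = {d["parcel"]: d["declarant"] for d in tax_declarations}
for parcel, place in map_parcels.items():
    print(f"{place}: {residents.get(parcel, 'unknown')}")
```

The hard part in practice is establishing that shared identifier, since historical sources rarely agree on how they name houses and people; but once the link is made, the two sources enrich each other.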
At another scale, if I take the information about maritime goods, I can create a maritime simulator. I can ask, for instance: if I am in Corfu in 1322 and I want to go to Constantinople, when can I take a boat? How much will it cost? What are the risks of encountering pirates?
This information is interesting for historians, but it can also lead to very new and exciting museographic experiences: we can set up installations in which we explain how we go from the sources to this type of reconstructed version of the past, or we can create immersive experiences. This is a change of scale, this is a change of methodology, and Venice is just the starting point. What we have started to do with Venice, you can do with Geneva, with New York or with Baghdad. What I envision, in less than 20 years, is a global time machine, permitting us to reconstruct the space of possible pasts at a worldwide level. There are of course numerous technological challenges to achieving this goal but, more importantly, we need to foster a new generation of researchers capable of creating and handling this big data of the past. This new generation, at the crossroads of information science and the humanities, will be the one designing the tools that will allow us to navigate our past, anticipate our future, and adapt and thrive in this new world of data.