Introduction
Ancient writings on papyrus are invaluable for our knowledge of history.
The writings may be literary texts or official records with
various dates and names, sometimes family records or
letters, or lists of ownership. But whatever they
contain, they always offer us first-hand knowledge of
ancient times. Because of the unique
and fragile nature of the writings, historians, papyrologists and
archaeologists prefer to avoid unnecessary physical handling. So
far, the primary method of recording has been photography.
Current technology gives the specialists many new tools:
there are new ways to obtain the images, to distribute
them, and to process them. The new technologies are especially
needed in cases like carbonized papyri, which are the topic of
this report.
This work was undertaken to support the deciphering of the
recently found Petra scrolls. The Petra scrolls, found in
December 1993 in Jordan, are typical examples of carbonized
papyri, where the lampblack text is almost indistinguishable
from the carbon-black background. The conservation work, led by
Professor Jaakko Frösén, was a very tedious job that took
several months; it was finished by spring 1995. Professor
Frösén has kindly provided us with samples of similarly
carbonized papyrus fragments from another find.
Our primary objective in this project has been to find the best
possible way of producing digitized images from carbonized papyrus
fragments for further image processing, research and archiving.
The secondary objective has been to optimize
the photographic methods, which will probably remain
one of the most important ways to record carbonized
papyri.
Papyrus
Papyrus, manufactured from the papyrus plant (Cyperus
papyrus), was widely used in Egypt, Greece, the Middle East
and the Roman Empire; its use dates from ca.
3000 B.C. to the beginning of the European Middle Ages. The material
is quite durable, but, being organic, it degrades gradually
unless kept in dark and micro-organism-free conditions, and at
suitable humidity.
The ink used for writing was a mixture of water and plant
fluids with lampblack as pigment. In many finds the papyrus
scrolls have been carbonized: the organic compounds have
been charred away and only the most chemically stable material has
been left. The process of carbonization is not as rapid or as mechanically
destructive as open combustion, and so the scrolls have had a
chance to survive. As a result, carbonized papyri have been
preserved even better than undamaged ones!
Figure [1] shows the sample plate we used in our tests.
The carbonized fragments of a papyrus scroll from Bubastos, Egypt,
were kindly provided to us by Professor J. Frösén
of the University of Helsinki. Most of the numerous tests
concentrated on sample no. 12 [2], in particular on the first three
characters of one of its lines [3]. The first figure has been scanned
from a photograph. Figures 2-4 were scanned directly.
Figure 1:
The conserved papyrus fragments on a 40x25 cm
plate (640x383).
Top row: Fragments 1,2,3,4,5,6,10.
Lower row: Fragments 7,9,11,12,13.
Figure 2:
Fragment 12, 6.5x93 cm
Figure 3:
The test image from fragment 12, 2.8x0.9 cm (640x208).
Figure 4:
A detail from fragment 12, 0.9x0.9 cm (447x447).
Digitizing Images
There are several ways to digitize images. The main issues with
digital images and image processing are spatial resolution and grayscale
depth. Good spatial resolution is needed for statistical methods to be
efficient, and sheer enlargement has proven to be very helpful
to the human eye. This is because the human eye cannot
distinguish small grayscale differences in small objects: the
smaller the grayscale difference, the larger the area needed to see it.
A sufficient number of gray levels, in turn, makes it possible to analyze
the dynamic properties of the image, for example noise and edges. Good dynamic
range is essential in finding the area of best contrast in a picture. Without
good enough contrast we simply don't have readable text.
The following sources for digital images have been tested
more or less thoroughly:
- Photographs; negatives or paper copies
- Digital cameras
- CCD cameras
- Scanners
- X-ray photographs
Photographs can be digitized with a variety of equipment:
- Slide scanners
- Drum scanners
- PhotoCD scanners
- Flat scanners
- Hand scanners
- Video grabbers
- Digital cameras
Flat scanners, hand scanners, video grabbers or digital
cameras can be used for direct recording of sample plates.
The methods themselves are not directly comparable.
They all have their own benefits and weaknesses, which we have
tried to identify in our tests. Before judging any method one must
be familiar enough with it. For example, taking the best possible
photographs requires a lot of time spent experimenting, as the
number of variables to control is quite large. The object of
these tests has been one plate of carbonized papyrus fragments,
shown in figure [1]. The plates themselves set certain
limits: for example, one cannot put a plate into a drum scanner,
the glass plate effectively filters out long-wavelength
IR radiation, and U.S. letter-sized flat scanners cannot
easily be used, because the plates are typically larger.
Image Processing
The black writing on the black, rough background of carbonized
papyri is a challenging image processing problem.
Several factors make it very difficult to extract the
characters from the background: minimal contrast between writing
and background, messy background texture partly visible through
the text, unclear character edges, noise, cracks in the
material, etc. There is no simple feature that could be
used as a perfect classifier. In principle, the ideal result of
processing would be a binary (black and white only)
image showing only the characters, leaving the background out,
and doing that with absolute certainty. In practice, that is hardly
possible, but one can always enhance the images into a more readable
form, and search for any features that could be extracted with
image processing methods.
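To make the idea of a binary result concrete, the following minimal sketch applies a single global threshold to a grayscale image. It is an illustration only, not the method used in this work: the function name `binarize`, the threshold and the toy pixel values are assumptions. Since no single feature cleanly separates writing from background on carbonized papyri, a plain threshold like this is unlikely to be sufficient on its own.

```python
def binarize(image, threshold):
    """Turn a grayscale image into a black-and-white one.

    image is a list of rows of gray values (0 = black, 255 = white).
    Pixels darker than the threshold become 0, all others become 255.
    """
    return [[0 if value < threshold else 255 for value in row]
            for row in image]


# Toy 2x2 image with made-up gray values (hypothetical data).
toy = [[10, 200],
       [128, 127]]
binary = binarize(toy, threshold=128)
```

Which side of the threshold corresponds to writing depends on the recording method and has to be checked against the original image.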
Image processing easily takes vast amounts of processor time even
on the fastest computers. Our small test image consists of
131,120 pixels. A simple algorithm may require, say, a 5x5
matrix to be applied to every pixel. At maximum resolution,
the size of the test image is almost 1,000,000 pixels.
Complex algorithms may need larger matrices,
derivatives, variances, sorting, heuristic methods and dozens
of iterations to be applied to a single pixel; and they must
be exhaustively tested to find a suitable set of bounds
and values for their parameters. So, one should keep these limits
in mind when searching for a good algorithm.
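The cost of applying a matrix to every pixel can be estimated with a short sketch. The code below is an assumption-laden illustration, not the software used in this work: `apply_kernel` and `multiply_adds` are hypothetical names, the image is a plain list of rows, and pixels outside the image are taken from the nearest edge pixel.

```python
def apply_kernel(image, kernel):
    """Apply a square kernel (odd side length) to every pixel.

    Returns the filtered image and the number of multiply-add
    operations that were performed.
    """
    height, width = len(image), len(image[0])
    size = len(kernel)
    radius = size // 2
    operations = 0
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            total = 0.0
            for j in range(size):
                # Clamp coordinates so border pixels reuse the edge.
                yy = min(max(y + j - radius, 0), height - 1)
                for i in range(size):
                    xx = min(max(x + i - radius, 0), width - 1)
                    total += image[yy][xx] * kernel[j][i]
                    operations += 1
            out_row.append(total)
        result.append(out_row)
    return result, operations


def multiply_adds(pixel_count, kernel_size):
    """Multiply-add operations for one pass of a square kernel."""
    return pixel_count * kernel_size * kernel_size


# Pixel counts quoted in the text, with a 5x5 matrix.
small_cost = multiply_adds(131_120, 5)    # 3,278,000 operations
large_cost = multiply_adds(1_000_000, 5)  # 25,000,000 operations
```

These figures cover a single pass of the simplest kind of filter; iterated or more complex algorithms multiply them further.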
It should also be remembered that the image should not be
modified too much: by manipulating the images one can
easily create artifacts that show up as parts of characters.
It is also very easy to lose essential information. For best
results one should combine human expertise and intelligence with
powerful computational methods.
There are some basic methods, such as histogram equalization,
which might be classified as 'non-altering' or
'non-manipulative' methods when not taken to the extreme.
Manipulated images should always be presented together with the originals
or with 'non-manipulated' images to maintain credibility.
At best, manipulations help to find new ways to see the
images, while the character recognition itself is best left
to the human specialists. It is recommended to use two or more
differently processed images of difficult-to-read objects.
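As an illustration of histogram equalization, here is a minimal sketch in plain Python. It assumes an 8-bit grayscale image stored as a list of rows of integers; the function name `equalize_histogram` and the toy values are hypothetical, and this is the textbook form of the technique rather than the exact procedure used in this work.

```python
def equalize_histogram(image, levels=256):
    """Spread the gray values of an image over the full range.

    image is a non-empty list of rows of integers in 0..levels-1.
    Returns a new image; the input is left unchanged.
    """
    # Count how many pixels have each gray value.
    histogram = [0] * levels
    for row in image:
        for value in row:
            histogram[value] += 1

    # Cumulative distribution of gray values.
    cumulative = []
    running = 0
    for count in histogram:
        running += count
        cumulative.append(running)

    pixel_count = running
    lowest = next(c for c in cumulative if c > 0)
    if pixel_count == lowest:
        # Only one gray value is present; nothing to spread.
        return [row[:] for row in image]

    # Lookup table mapping each old gray value to a new one.
    scale = (levels - 1) / (pixel_count - lowest)
    table = [max(0, round((c - lowest) * scale)) for c in cumulative]
    return [[table[value] for value in row] for row in image]


# Four nearly identical dark grays (made-up values).
low_contrast = [[100, 101],
                [102, 103]]
stretched = equalize_histogram(low_contrast)
# stretched == [[0, 85], [170, 255]]
```

The lookup table never reverses the order of gray values: a pixel that was darker than another is never made lighter than it. This is one way to understand why the method can be regarded as 'non-altering'.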
Hypermedia
The papyrus scrolls and the context of the find itself contain
many types of information. The context contains everything about
how the find was related to its surroundings: exact location,
environment, surrounding buildings, structures and other objects,
the depth of the find, and possibly which layers were above and below it.
The context provides the basis for deducing various
information about the find, one of the most important being the dating.
The scrolls themselves also contain various information. They may
contain dates, names, locations, lists, religious texts,
descriptions of local life and many
things about the culture that can all be related to the
previously known or suggested understanding of the ancient world.
The publications of these scrolls [reference] discuss the writing
and spelling itself, and contain interpretations, notes and related
information. Typically only parts of the whole scrolls can
be saved, and one must take great care to keep the fragments
in the correct context; their relation to each other must be
preserved.
The essence of hypermedia is the ability to naturally link
different kinds of information together. The multi-layered
information structure of the papyrus writings and their whole context
is inherently difficult to contain in an ordinary document.
A book can contain lots of links and lists pointing to related information,
typically to previous documents or other material not included
in the current document. The physical format of a book is also
fixed, and following links to other pages and back
interrupts the train of thought. Hypermedia offers at least
a partial solution by freeing the format of a document. All pictures,
descriptions and interpretations of current work can be linked
freely to previous knowledge and background material. A good
example of the possibilities of hypermedia concerning papyri
is the Duke University Papyrus Archive
(http://odyssey.lib.duke.edu/papyrus).
To benefit from the possibilities that hypermedia offers, one
should be able to form a link directly to the information needed.
This can be done in an information network such as the Internet,
where a unified standard hypermedia language (HTML) is set,
but it will be efficient only after very large amounts
of data are made available by transferring them to databases
and archives.
It should now be set as a standard to provide all documents
through the Internet.
Antti Nurminen, 34044T, andy@cs.hut.fi