
SIREN - (SI)milarity (R)etrieval (EN)gine - is a command language interpreter that adds similarity query capabilities to SQL. Web-SIREN is a web-based interface that makes SIREN's resources available over the internet (see the complete language syntax).
SIREN requires a user identification to allow access. In order to try our demo site, you may use:
The current prototype is under development. Although it already supports the STILLIMAGE and AUDIO data types (as examples of the MONOLITHIC data type) and the PARTICULATE data type, some features of the language are still being implemented (see the data types and extractors available).
Several datasets are already loaded at the Web-SIREN site. One of them, called Cars (available at http://lib.stat.cmu.edu/), describes 392 cars. It has nine attributes: MPG (miles per gallon), number of cylinders, engine displacement (cu. inches), horsepower, vehicle weight (lbs.), time to accelerate from 0 to 60 mph (sec.), model year (modulo 100), origin of the car (American, European or Japanese) and the car name.
Another dataset, called MedImages, consists of computed tomography (CT) images of three human body parts: abdomen, cranium and thorax. Each tuple of this dataset consists of an image id, the image, the description of the body part and an attribute that specifies whether or not the image shows a pathological condition. Two similarity measures can be used to query this dataset by similarity: the first is the Manhattan (L1) distance function over normalized gray-scale histograms, and the second is based on a texture extractor (see the description of the available datasets). All information that could identify a patient (such as name, birth date, and place and date of the exam) has been omitted.
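To make the first similarity measure concrete, the following is a minimal Python sketch of the Manhattan (L1) distance between two normalized gray-scale histograms. It is not SIREN's extractor: the function names, the number of bins and the representation of an image as a flat list of gray levels are assumptions made for illustration only.

```python
from typing import List, Sequence


def normalized_histogram(pixels: Sequence[int], bins: int = 256) -> List[float]:
    """Count gray levels in 0..bins-1 and scale the counts so they sum to 1.

    Assumption: an image is given as a flat sequence of integer gray levels.
    """
    if len(pixels) == 0:
        raise ValueError("image has no pixels")
    counts = [0] * bins
    for level in pixels:
        counts[level] += 1
    total = len(pixels)
    return [count / total for count in counts]


def manhattan(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Manhattan (L1) distance: sum of absolute bin-wise differences."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Two tiny hypothetical "images" with 4 gray levels each.
image_a = [0, 0, 1, 1]
image_b = [0, 1, 1, 1]
hist_a = normalized_histogram(image_a, bins=4)
hist_b = normalized_histogram(image_b, bins=4)
dist_ab = manhattan(hist_a, hist_b)
```

Because both histograms sum to 1, the distance is independent of image size and lies between 0 (identical gray-level distributions) and 2 (no gray level in common).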
For the AUDIO data type, no dataset is available due to copyright restrictions.
Users can use the CREATE METRIC and CREATE INDEX commands to create their own metrics and indexes for these datasets.
Some examples of similarity queries that can be posed over the datasets described above are:
SELECT carname, horsepower, consumption, acceleration, origin
FROM Cars
WHERE car NEAR (
    67 AS hp,
    38 AS mpg,
    15 AS sec
) STOP AFTER 3

SELECT carname, horsepower, consumption, acceleration, origin
FROM Cars
WHERE car NEAR (
    SELECT horsepower AS hp, consumption AS mpg, acceleration AS sec
    FROM Cars
    WHERE carname = 'ford mustang'
) STOP AFTER 10
AND origin <> 'American'

SELECT americancars.carname, europeancars.carname
FROM americancars, europeancars
WHERE americancars.car NEAR europeancars.car STOP AFTER 3

SELECT BodyPart, Pathology, Img
FROM MedImages
WHERE Img NEAR 'D:\Images\sk_11424_0.jpg'
BY Texture RANGE 0.0265

SELECT BodyPart, Pathology, Img
FROM MedImages
WHERE Img NEAR (
    SELECT Img
    FROM MedImages
    WHERE Id = 3948
) STOP AFTER 5 AND Pathology = 'N'

One can argue that the first two queries could be solved with a function written in procedural SQL (see this example). The problem is that this approach does not allow optimizations such as the use of indexes, because the function is executed for every row of the table.
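The cost of the per-row approach can be sketched in Python as a sequential scan. This is only an illustration, not the procedural SQL example referred to above: the Euclidean distance, the function names and the toy rows are all assumptions (the rows are made up and are not taken from the Cars dataset).

```python
import heapq
import math
from typing import Dict, List, Tuple


def distance(row: Dict, query: Dict[str, float]) -> float:
    """Euclidean distance over the attributes named in the query.

    Assumption: the metric actually defined in SIREN for 'car' may differ.
    """
    return math.sqrt(sum((row[attr] - value) ** 2 for attr, value in query.items()))


def knn_sequential_scan(rows: List[Dict], query: Dict[str, float],
                        k: int) -> Tuple[List[str], int]:
    """Return the names of the k nearest rows and the number of distance calls.

    Every row is visited, which mirrors a user function that the database
    must evaluate once per tuple because no index can prune candidates.
    """
    calls = 0
    scored = []
    for row in rows:
        calls += 1
        scored.append((distance(row, query), row["carname"]))
    nearest = [name for _, name in heapq.nsmallest(k, scored)]
    return nearest, calls


# Hypothetical rows, for illustration only.
rows = [
    {"carname": "toy_a", "horsepower": 65, "consumption": 37, "acceleration": 15},
    {"carname": "toy_b", "horsepower": 150, "consumption": 15, "acceleration": 10},
    {"carname": "toy_c", "horsepower": 70, "consumption": 36, "acceleration": 16},
    {"carname": "toy_d", "horsepower": 67, "consumption": 38, "acceleration": 15},
]
query = {"horsepower": 67, "consumption": 38, "acceleration": 15}
nearest, calls = knn_sequential_scan(rows, query, k=2)
```

The number of distance computations always equals the number of rows, whatever the value of k. An index built for the metric can avoid computing distances for many rows, which is the kind of optimization the function-based approach rules out.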
This site does not allow users to upload data, so its usage is restricted to the images already stored in the core database. If you want to upload an image database to our site, please contact the webmaster for instructions.
You may try the SIREN demo now!
SIREN is a prototype that runs on an Oracle 10g database, and its purpose is evaluation and development only. The available datasets are research examples whose purpose is to exemplify the proposed language extension.