Lab of Advanced Algorithms and Applications
Highlights
We are developing a new semantic-annotation technology for short textual fragments, called TAGME. This tool has been applied successfully in many contexts involving the clustering, classification and similarity comparison of short texts.
We are also studying the design of “multi-objective” data compressors, a novel paradigm in which optimization techniques are plugged into the design of data compressors. For example, in Bicriteria Data Compression, the compressor solves the following problem: "return a compressed file which minimizes the compressed space provided that the decompression time is less than T".
A third line of research is a classic one for our group, which has lasted for 20 years now, and concerns the design of compressed indexes for big data (such as the String B-tree [JACM 1999] and the FM-index [JACM 2005]). Recently we have designed cache-oblivious compressed versions of those indexes for dictionaries of strings [ESA 2013], and distribution-aware compressed indexes for data collections [Algorithmica 2013].
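The Bicriteria Data Compression problem quoted in the highlights can be illustrated with a toy sketch. This is not the algorithm from the paper: it simply enumerates a handful of hypothetical candidate encodings and picks the smallest one whose decompression time stays below the bound T. The `Candidate` type, the `pick_bicriteria` function and all the numbers are assumptions made up for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """A hypothetical compressed encoding of the same input file."""
    name: str
    compressed_bytes: int
    decompression_seconds: float


def pick_bicriteria(candidates: Sequence[Candidate],
                    time_bound: float) -> Optional[Candidate]:
    """Return the smallest candidate whose decompression time is below time_bound.

    Brute-force illustration of the constraint "minimize space provided that
    decompression time is less than T"; returns None if nothing is feasible.
    """
    feasible = [c for c in candidates if c.decompression_seconds < time_bound]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.compressed_bytes)


# Made-up figures, for illustration only.
candidates = [
    Candidate("fast", compressed_bytes=500, decompression_seconds=1.0),
    Candidate("balanced", compressed_bytes=350, decompression_seconds=2.5),
    Candidate("dense", compressed_bytes=300, decompression_seconds=6.0),
]

best = pick_bicriteria(candidates, time_bound=3.0)
print(best.name if best else "no feasible encoding")
```

Raising the bound T lets denser but slower encodings become feasible, while a very tight bound may leave no feasible encoding at all.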
- Google Zurich
- Tiscali Italia
- [2013-2015] Regione Toscana, Net7, StudioFlu, SpazioDati
- [2013-2014] Bassilichi
- [2013-2014] Google Research Award
- [2012-2014] Italian MIUR-PRIN Project "ARS-Technomedia"
- [2011-2012] Telecom Italia Working Capital
-  Google Research Award
- [2010-2012] Italian MIUR-PRIN Project "The Mad Web"
- [2006-2011] Yahoo! Research
- [2009-2012] Italian MIUR-FIRB Project "Linguistica"
“Bicriteria Data Compression” Accepted at SODA14! » December 1, 2013
We are pleased to announce that our paper “Bicriteria Data Compression” has been accepted at the ACM/SIAM Symposium on Discrete Algorithms (SODA14)! The work will be presented on January 7 at 4 PM, Galleria North – Ballroom Level, Hilton Portland & Executive Tower.
TAGME service reaches 100M queries! » November 12, 2013
We are pleased to announce that our TAGME API service has answered more than 100 million queries in about 2 years. We currently provide access to more than 100 users, and at times the service has handled more than 1 million queries per day.
Thank you very much to all the users who have provided their valuable feedback.
bc-zip dataset » October 3, 2013
Here we publish the dataset used in our “Bicriteria Data Compression” paper (arXiv version).
Each file is a chunk of 1GB (2^30 bytes) extracted from the following sources:
- Wikipedia: Natural data. Extracted from a dump of the English Wikipedia, in XML format.
- U.S. Census: Database, statistical metrics of the U.S. population. Extracted from the U.S. Census database.
- DBLP: Bibliographic database in XML format. Extracted from the DBLP Computer Science Bibliography project.
- PFAM: Biological data. Extracted from the PFAM database of protein families.
New Google grant! » February 21, 2013
Our project proposal “A novel graph for social-network analysis and search built by entity-annotators, and its applications” has received a Google grant as part of Google’s Faculty Research Award program!