Computer Science > Distributed, Parallel, and Cluster Computing
[Submitted on 17 Feb 2005]
Title:Batch is back: CasJobs, serving multi-TB data on the Web
View PDFAbstract: The Sloan Digital Sky Survey (SDSS) science database describes over 140 million objects and is over 1.5 TB in size. The SDSS Catalog Archive Server (CAS) provides several levels of query interface to the SDSS data via the SkyServer website. Most queries execute in seconds or minutes. However, some queries can take hours or days, either because they require non-index scans of the largest tables, or because they request very large result sets, or because they represent very complex aggregations of the data. These "monster queries" not only take a long time, they also affect response times for everyone else - one or more of them can clog the entire system. To ameliorate this problem, we developed a multi-server multi-queue batch job submission and tracking system for the CAS called CasJobs. The transfer of very large result sets from queries over the network is another serious problem. Statistics suggested that much of this data transfer is unnecessary; users would prefer to store results locally in order to allow further joins and filtering. To allow local analysis, a system was developed that gives users their own personal databases (MyDB) at the server side. Users may transfer data to their MyDB, and then perform further analysis before extracting it to their own machine. MyDB tables also provide a convenient way to share results of queries with collaborators without downloading them. CasJobs is built using SOAP XML Web services and has been in operation since May 2004.
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.