Research and experimentation in many scientific fields rely on analyzing and benchmarking against datasets. Without sufficient knowledge of the relevant datasets, however, researchers usually have to go through a process of “manual dataset retrieval”. This manual search involves 1) querying academic search engines for papers, and 2) spending a lot of time reading the retrieved papers to find the datasets they use. Even for researchers familiar with the field, this process can require significant time and effort because of the unprecedented rate of scholarly publication.
We present Delve, an online dataset-driven system that provides a medium for dataset search, visual analysis of the citation relations among documents and datasets, and online document analysis. More specifically, Delve offers users a simple, easy-to-use interface for:
1) Finding a set of benchmark datasets for a research topic or field of interest;
2) Finding a set of research papers that used the same datasets;
3) Visually analyzing the citation relations of academic documents and datasets;
4) Analyzing an academic document online and instantly showing its citation relations w.r.t. other documents and datasets, when a PDF of the document is uploaded.
With these features, Delve is useful for several purposes, e.g., finding relevant papers for a literature review, finding appropriate datasets for a specific research interest, and understanding document and dataset citation relationships.
A FUTURE DIRECTION
Delve is already publicly available for free noncommercial use. However, it is still young, and there are several directions in which to promote the system and extend its database. There are also several areas for algorithmic improvement. One important area is citation relationship inference, where we plan to apply a more sophisticated inference method. Currently, Delve shows a binary citation relationship (dataset related or non-dataset related); we also plan to extend this to include different types of relationships. Another direction for future work is to generate structured abstracts from document text, which would provide a quick summary of an uploaded document.
When a user enters a query through the user interface, the search phrase is parsed and sent to the dataset query analyzer and the document query analyzer for processing. A dataset node can also be a paper if it contains descriptions of datasets used in other papers. Delve can also handle queries based on snippets of a dataset name.
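As an illustration of what querying by name snippet could look like, the following is a minimal sketch, not Delve's actual query analyzer. The function name `search_datasets`, the token-containment rule, and the example dataset names are all assumptions made for illustration.

```python
def search_datasets(query, dataset_names):
    """Return the dataset names that contain every token of the query.

    Hypothetical helper: matching is case-insensitive substring
    containment, which is an assumption and not Delve's documented rule.
    """
    tokens = query.lower().split()
    if not tokens:
        return []  # an empty query matches nothing
    return [
        name for name in dataset_names
        if all(token in name.lower() for token in tokens)
    ]


# Illustrative index of dataset names (assumed, not taken from Delve).
names = ["MNIST", "Fashion-MNIST", "CIFAR-10", "ImageNet"]
matches = search_datasets("mnist", names)
```

Substring containment keeps the sketch short; a production system would more likely use an inverted index or fuzzy matching to tolerate misspellings.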
EDGE LABEL ASSIGNMENT
We express a paper or dataset source in our system database as an entity. From this information, a citation network $G = \{V, E\}$ is built by linking two entities if one cites the other. An edge between $v_i$ and $v_j$ is labeled positive (dataset related) if $v_i$ cites $v_j$ because $v_i$ uses the dataset in $v_j$, or negative (not dataset related) otherwise. One of the principal challenges in Delve is to develop an efficient and effective method for assigning labels to a large number (millions) of unlabeled edges. To infer labels for the unlabeled edges, we restructure our graph into $G' = \{V', E', W'\}$, where the set of nodes $V'$ is the set of edges $E$ in graph $G$, and $E'$ is the set of generated edges whose weights $W'$ give the calculated context similarities between the two edges of $G$ corresponding to the nodes $(v'_i, v'_j)$, $\forall v'_i, v'_j \in V'$. With the constructed graph $G'$, where a small portion of $V'$ has verified labels, the label propagation algorithm of Zhu and Ghahramani is run to propagate the given labels to the unlabeled nodes in $V'$. Please refer to Akujuobi and Zhang for more details.
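The restructuring and propagation steps can be sketched as follows. This is a simplified illustration under stated assumptions, not Delve's implementation: each citation edge is assumed to come with a numeric context vector, similarity is assumed to be cosine similarity, and the propagation uses a common row-normalized variant with clamped labels. The names `cosine_weights` and `propagate_labels` and the toy data are hypothetical.

```python
import numpy as np


def cosine_weights(vectors):
    """Pairwise cosine similarity between context vectors (zero diagonal)."""
    X = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0          # avoid division by zero
    X = X / norms
    W = X @ X.T
    np.fill_diagonal(W, 0.0)           # no self-loops in the edge graph
    return np.clip(W, 0.0, None)       # keep weights non-negative


def propagate_labels(W, labels, n_iter=200, tol=1e-6):
    """Propagate labels over the edge graph.

    labels: +1 (dataset related), -1 (not dataset related), 0 (unknown).
    Returns an array of +1 / -1, one entry per node of the edge graph.
    """
    labels = np.asarray(labels)
    known = labels != 0
    Y = np.full((len(labels), 2), 0.5)  # columns: [positive, negative]
    Y[labels == 1] = [1.0, 0.0]
    Y[labels == -1] = [0.0, 1.0]
    clamp = Y[known].copy()

    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0.0] = 1.0
    P = W / row_sums                    # row-normalized transition matrix

    for _ in range(n_iter):
        Y_new = P @ Y
        Y_new[known] = clamp            # verified labels stay fixed
        converged = np.abs(Y_new - Y).max() < tol
        Y = Y_new
        if converged:
            break
    return np.where(Y[:, 0] >= Y[:, 1], 1, -1)


# Toy example: four citation edges of G become the four nodes of G'.
edges = [("p1", "d1"), ("p2", "d1"), ("p1", "p3"), ("p2", "p3")]
contexts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # assumed features
seed_labels = [1, 0, -1, 0]            # two verified, two unknown

predicted = propagate_labels(cosine_weights(contexts), seed_labels)
edge_labels = dict(zip(edges, predicted.tolist()))
```

The dense similarity matrix is only practical for small examples. With millions of edges, a sparse graph (for example, keeping only the nearest neighbors of each node) would be needed.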
Delve is publicly available for noncommercial use at: https://delve.kaust.edu.sa
The document analysis feature provides a medium through which researchers can quickly analyze how a scholarly document relates to other documents, without checking its references and searching for and reading each of them. Delve allows users to upload the PDF file of a paper for analysis. Delve analyzes the PDF, translates the results into a query, processes the query, and displays the result as a visual citation graph. We plan to provide more information from the document analysis in future versions of the system.
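The analysis flow described above (analyze the document, form a query, return citation relations) might be sketched as below. This is a hypothetical illustration only: it assumes the PDF text has already been extracted by some other tool, and it matches known entity titles by normalized substring search. The names `normalize` and `analyze_document`, the entity IDs, and the sample text are all invented for the example.

```python
import re


def normalize(text):
    """Lowercase the text and collapse non-alphanumeric runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def analyze_document(doc_id, extracted_text, known_entities):
    """Return citation edges (doc_id, entity_id) for entities found in the text.

    known_entities maps an entity ID to the title of a paper or dataset
    already in the database (an assumed data layout).
    """
    haystack = " " + normalize(extracted_text) + " "
    edges = []
    for entity_id, title in known_entities.items():
        needle = normalize(title)
        # Pad with spaces so only whole-word title matches count.
        if needle and f" {needle} " in haystack:
            edges.append((doc_id, entity_id))
    return edges


# Hypothetical extracted text and entity index.
text = "We evaluate on the Example Corpus dataset. [1] A. Author. A Hypothetical Paper Title."
entities = {
    "e1": "Example Corpus",
    "e2": "A hypothetical paper title",
    "e3": "Unrelated Work",
}
found = analyze_document("upload", text, entities)
```

The resulting edge list is the kind of input a graph visualization layer could render. Each edge could then be passed through the edge labeling step to mark it as dataset related or not.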
 Uchenna Akujuobi and Xiangliang Zhang. Delve: A dataset-driven scholarly search and analysis system. SIGKDD Explor. Newsl., 19(2):36–46, November
 Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical report, Carnegie Mellon University,