Scholar User Guide

See also: About Scholarly Search

This service provides full-text search over research publications archived in the Internet Archive's various collections. It includes content from the natural sciences, humanities, biomedicine, art, history, industrial research, government reports, and more.

Reader access to the content is provided when possible. Sometimes this access is to a "pre-print" or other version of the work, and this is indicated in the search results.  In other cases, depending on search filters, results are included for which there is only a bibliographic catalog entry. It may still be possible to obtain access through a public library or from the publisher directly.

Query Syntax

In addition to the basic filtering and sorting options, this search interface also allows the use of Lucene query syntax in the search box. You can restrict terms to specific metadata fields with field:value clauses such as journal:Science, set filters such as lang:de, and apply range queries such as year:>1989 year:<2000.
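As a rough illustration of the clauses described above, the following Python sketch assembles a query string from free-text terms, field filters, and range clauses, then percent-encodes it for use in a URL. The helper names (build_query, to_query_string), the search term "climate", and the q parameter name are assumptions made for this example, not part of the service's documented interface.

```python
from urllib.parse import urlencode


def build_query(terms, filters=None, ranges=None):
    """Join free-text terms, field filters, and range clauses into one query string."""
    parts = list(terms)
    for field, value in (filters or {}).items():
        value = str(value)
        # Quote multi-word values so they are treated as a single phrase.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{field}:{value}")
    for field, op, bound in ranges or []:
        # op is a comparison prefix such as ">" or "<", as in year:>1989.
        parts.append(f"{field}:{op}{bound}")
    return " ".join(parts)


def to_query_string(query):
    """Percent-encode the query as a q= URL parameter (parameter name is assumed)."""
    return urlencode({"q": query})


query = build_query(
    ["climate"],  # hypothetical search term
    filters={"journal": "Science", "lang": "de"},
    ranges=[("year", ">", 1989), ("year", "<", 2000)],
)
print(query)
print(to_query_string(query))
```

The plain query string can be pasted into the search box directly; the encoded form is only needed when building a link by hand.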

While this syntax allows for relatively complex and powerful queries, at some point advanced users may run into limits on the size or complexity of queries.  For the time being we recommend systems like lens.org for a more powerful interface.

Example Queries

Search for digitized pages about a topic from specific years:


Search for papers in Chinese matching a term:


Conference papers with an author name query:


Citation Queries

As an experimental feature, if the search query "looks like" a formal citation, as found in the bibliography of a research paper, the service will attempt to parse the citation and do a match against our catalog of known works. When this happens, any filters are ignored.


Metadata Fields

You can restrict to records where the field exists with an asterisk like doi:*, and negate any term like !type:article-journal.

In-depth documentation of the query syntax is available from the Elasticsearch project.

The complete current search document schema is available (as JSON) in the source code.

title:
author:
journal:
year:
issue:
volume:
doi:
tag: e.g., "tag:oa"
type: e.g., "article-journal", "dataset", "book"
stage: e.g., "published", "submitted", "accepted", "draft"
lang: value is a 2-character lower-case ISO language code
country: value is a 2-character lower-case ISO country code
access_type: "wayback", "ia_file", "ia_sim"
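
The existence and negation forms mentioned in this section (doi:* and !type:article-journal) can be sketched in Python as small helpers that check the field name first. The helper names are hypothetical, and the field set is simply copied from the list above; the live schema may contain other fields.

```python
# Field names copied from the list above; the live schema may differ.
KNOWN_FIELDS = {
    "title", "author", "journal", "year", "issue", "volume", "doi",
    "tag", "type", "stage", "lang", "country", "access_type",
}


def _check(field):
    """Reject field names that are not in the documented list."""
    if field not in KNOWN_FIELDS:
        raise ValueError(f"unknown field: {field}")


def field_exists(field):
    """Clause matching records where the field has any value, e.g. doi:*"""
    _check(field)
    return f"{field}:*"


def exclude(field, value):
    """Clause negating a field:value term, e.g. !type:article-journal"""
    _check(field)
    return f"!{field}:{value}"


clauses = [field_exists("doi"), exclude("type", "article-journal"), "tag:oa"]
print(" ".join(clauses))
```

Checking the field name locally only catches typos early; it does not guarantee that the service accepts the resulting query.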

Search Results

Access Links

All Internet Archive preservation copy links have the same style and icon. Content from the Wayback Machine looks like this.
If the preserved copy of the work is from a pre-print, author manuscript, or other alternative version of the work, the access link has an indicator. You can get details and view all versions by clicking on the primary title link
Some preserved content, particularly older Public Domain works, may be stored in general Internet Archive digital collections (as opposed to the web archive)
Digitized copies of works on microfilm may be linked to experimentally. Access may be limited to controlled lending
A publisher landing page is the authoritative source for the "version of record" of a research publication, but content is not always accessible to the general public
When the work is from an Open Access publication (sometimes known as "Gold" or "Diamond" OA), and the publisher is expected to provide access to all readers, the button has an orange "unlocked" icon
If the work is archived in full on a reliable, open platform, we will sometimes provide additional links

Tags

Search results may have tag labels that provide additional context about the work, for example the indexes that include the journal, or the open platform technology used to publish it.

Multiple Versions: There are multiple released "versions" or "editions" of this work, and bibliographic metadata for the "primary" is being shown. Click the title to see other versions
lang:en: The primary language of this work is different from the search interface language. The ISO two-letter language code is indicated
DOAJ: Published in a Directory of Open Access Journals publication, which implies that this is an Open Access work
Szczepanski: Publication indexed in Szczepanski's List of Open Access Journals, which implies that this is an Open Access work
Open Access: The work is believed to be "Open Access" for any other reason
SciELO: Published on a SciELO national platform
OJS: Published using Open Journal Systems software
Wordpress: Published using WordPress software
JSTOR: Preserved and/or hosted on the JSTOR digital preservation platform

Persistent Identifiers

Underneath each search result and each alternate version listing are any known "persistent identifiers" that uniquely identify that specific version of the work. These are usually hyperlinks.

doi: Digital Object Identifier (DOI), provides a redirect to the publisher's landing page
pmid: PubMed/MEDLINE
pmcid: PubMed Central
arxiv: arXiv pre-print service
dblp: The DBLP Computer Science Bibliography
doaj: Article-level identifier for works in DOAJ, particularly those with no DOI
fatcat: fatcat.wiki "release" identifier. Scholar is built on top of the fatcat catalog
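
As an illustration of how a prefix:value identifier could be turned into a hyperlink, here is a minimal Python sketch. The URL patterns are the commonly used public resolver addresses for each scheme and are an assumption here; the links Scholar actually generates may differ. The function name identifier_url and the sample identifier are hypothetical, and only a few of the schemes listed above are covered.

```python
# Commonly used resolver URL patterns (assumed; not taken from Scholar itself).
RESOLVERS = {
    "doi": "https://doi.org/{}",
    "pmid": "https://pubmed.ncbi.nlm.nih.gov/{}/",
    "pmcid": "https://www.ncbi.nlm.nih.gov/pmc/articles/{}/",
    "arxiv": "https://arxiv.org/abs/{}",
}


def identifier_url(label):
    """Map a 'scheme:value' label to a resolver URL, or None if not recognized."""
    # Split on the first colon only, since values may contain further colons.
    scheme, _, value = label.partition(":")
    template = RESOLVERS.get(scheme.strip().lower())
    if template is None or not value:
        return None
    return template.format(value.strip())


# Placeholder identifier, not a real work.
print(identifier_url("doi:10.1234/example"))
```

Schemes missing from the table simply return None, so a caller can fall back to displaying the identifier as plain text.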

Work In Progress

This is a new service. Metadata is being improved and features have not been finalized.

Some known bugs and issues: