Photo Credit: Etienne Boulanger on Unsplash

If you ask any researcher which online outlets they use to find relevant journal articles, there’s a good chance that Google Scholar will be at the top of their list.

In the latest “How Readers Discover Content in Scholarly Publications” report from Renew Publishing Consultants, respondents rated academic indexes as “the most important discovery resource when searching for articles,” with Google Scholar among the preferred options. And there’s good reason: according to a study on the scope of Google Scholar published in Scientometrics, the freely available search engine from Google now houses over 389 million records, making it among the most comprehensive scholarly indexes worldwide, if not the most.

With so many researchers using Google Scholar, it’s a search engine that all journals should seek to get into. Inclusion in Google Scholar can help expand the accessibility, reach, and, consequently, impact of your journal articles.

In this blog post, we’re rounding up answers to Google Scholar indexing FAQs for journal publishers and editors, covering the benefits of Google Scholar indexing, how it works, and tips to improve the chances of your articles showing up in Google Scholar search results. Here’s what we’ll cover:

  1. What are the benefits of Google Scholar indexing for journals?
  2. How does Google Scholar indexing work?
  3. What are Google Scholar’s technical indexing criteria?
  4. How long does Google Scholar indexing usually take?
  5. How can I tell if Google Scholar is indexing my journal articles?
  6. How can I improve the chances of Google Scholar indexing my journal articles?
  7. Putting it all together

What are the benefits of Google Scholar indexing for journals?

In short, once Google Scholar indexes your journal articles, they can show up in the search results whenever someone searches for keywords related to the topics you publish on. In this way, Google Scholar indexing can help expand the reach of your articles and improve the chances of them being read, shared, and cited.

There’s a good chance researchers in your journal’s discipline and related ones will be using Google Scholar since, as noted, it’s one of the most popular academic indexes in the world. Google Scholar covers a wide range of scholarly content across disciplines, from journal articles to preprints to conference abstracts and more, in all languages and from all countries. It indexes metadata from scholarly literature and can index entire texts when copyright and file structure permit.

Getting indexed in Google Scholar will help:

  • Increase the reach of your articles because scholars will be more likely to find them (and consequently increase the chances of researchers reading and citing your articles)
  • Create new links between your journal articles and related literature via the Google Scholar “Cited By” feature, which displays a list of articles and documents that have cited the content it returns in search results
  • Resurface your past articles since Google Scholar takes citations into account and shows more frequently cited works earlier in search results

If you publish open access (OA) articles, seeking Google Scholar indexing is especially important. For content to be truly accessible, making it free isn’t enough. You have to ensure that anyone can find your articles on the web and that they aren’t only available to scholars with access to subscription-based abstracting and indexing (A&I) databases or prior knowledge of your journal. Google Scholar enables anyone to freely search for and find relevant academic content online from anywhere worldwide.

How does Google Scholar indexing work?

Google Scholar search view

Since you’re reading this blog post, you likely know about Google Scholar as an academic search tool. But you may not be sure of how it processes content or compares to Google’s general search engine. Let’s break it down.

Like Google, Google Scholar is a crawler-based search engine. Crawler-based search engines can index machine-readable metadata or full-text files automatically using “web crawlers,” also known as “spiders” or “bots,” which are internet programs that systematically “crawl” websites to identify and ingest new content.

Google Scholar has access to all the crawlable content on the web, with the ability to index entire publisher and journal websites and to use the citations in the articles it has indexed to find related research. However, a misconception about Google Scholar is that it indexes all the content it has access to regardless of the type or quality — this is not the case.

In actuality, as explained in “Academic Search Engine Optimization (ASEO): Optimizing Scholarly Literature for Google Scholar & Co.,” published in the Journal of Scholarly Publishing, Google Scholar is an “invitation-based search engine.” That means only articles from trusted sources and articles that are “invited” (or cited) by articles already indexed are eligible for inclusion.

On its website, Google Scholar states, “we work with publishers of scholarly information to index peer-reviewed papers, theses, preprints, abstracts, and technical reports from all disciplines of research and make them searchable on Google and Google Scholar.”

To be eligible for inclusion in Google Scholar, your journal must first meet two basic content criteria:

  1. The content hosted on your website must consist primarily of scholarly articles (i.e., journal articles, conference papers, technical reports, or their drafts, dissertations, pre-prints, post-prints, or abstracts).
  2. You must make either the full-text or the complete author-written abstract for all articles easy to see when users click on your URLs in search results and freely available (without requiring human or search engine robot readers to log into the site, install specific software, accept any disclaimers, etc.). The Google Scholar Inclusion Guidelines for Webmasters note, “sites that show login pages, error pages, or bare bibliographic data without abstracts will not be considered for inclusion and may be removed from Google Scholar.”

In general, Google Scholar should find your journal articles and index them automatically so long as they’re on a website that meets the above content inclusion guidelines and Google Scholar’s technical criteria, so you don’t have to apply for Google Scholar indexing. Below, we cover the technical specifications for web crawling that journals must implement to be included, as well as the usual indexing timeframe.

A few important caveats: Google Scholar’s indexing process is proprietary and somewhat unpredictable. Google Scholar tends to index content more slowly than Google, and it may choose to index any available version of an article. For example, if you publish PDF and HTML versions of articles, Google Scholar may index just the HTML, just the PDFs, or a combination. And if you publish OA articles that are available from multiple sources in addition to your journal website (e.g., institutional repositories), Google Scholar may also index those other versions. In that case, Google Scholar will decide which to display as the primary version in search results (which may or may not be the version of record). Publishers and hosting platforms ultimately have no direct control over this.

What are Google Scholar’s technical indexing criteria?

Example of Google Scholar compliant HTML meta tags for article hosted via Scholastica

In addition to meeting the general content inclusion guidelines above, your journal website must fulfill certain technical specifications for Google Scholar to be able to discover and fetch the URLs of all your articles and then “crawl” the content.

Be sure to closely read ALL of Google Scholar’s “Inclusion Guidelines for Webmasters.” Key criteria include:

  • Articles must be in HTML or PDF format.
  • PDF files must have crawlable and searchable text (i.e., you must be able to copy and paste the text from the PDF into another file and search for and find words in the PDF document using Adobe Acrobat Reader).
  • Each file must not exceed 5MB in size. To index larger files or scanned images of pages that require OCR, you must upload them to Google Book Search.
  • All your articles must be hosted on separate web pages (i.e., each article should have its own URL instead of hosting multiple articles in one PDF or HTML file on the same web page).
  • If your website uses a robots.txt file, e.g., www.example.com/robots.txt, it must not block Google’s search robots from accessing your articles or browse URLs.
  • Article pages must include exportable machine-readable bibliographic metadata as HTML meta tags.
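As a quick way to sanity-check the robots.txt criterion above, here is a minimal Python sketch that tests whether a set of article URLs would be blocked for a given crawler. The robots.txt rules, the example.com URLs, the function name, and the choice of "Googlebot" as the user agent are all illustrative assumptions; check Google Scholar’s guidelines for the crawler names that apply to your site.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    # Parse the rules from a string so the check works without network access.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt content and article URLs, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /articles/drafts/
"""

EXAMPLE_URLS = [
    "https://www.example.com/articles/123",
    "https://www.example.com/articles/drafts/456",
    "https://www.example.com/issues/2023/1",
]

if __name__ == "__main__":
    for url in blocked_urls(EXAMPLE_ROBOTS, EXAMPLE_URLS):
        print("Blocked for crawler:", url)
```

Any article or browse URL this reports as blocked is worth reviewing, since crawlers that respect robots.txt won’t fetch it.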

HTML meta tags help Google Scholar accurately index and categorize content, so the quality of the metadata you add to your journal articles will directly impact your indexing outcomes. Incorrect identification of bibliographic data or references will lead to poor indexing.

The most common Google Scholar meta tags that all articles should include as applicable are:

  • “citation_title”: the title of the article
  • “citation_author”: the name of the author or authors of the article
  • “citation_journal_title”: the name of the journal in which the article was published
  • “citation_volume”: the volume number of the journal in which the article was published
  • “citation_issue”: the issue number of the journal in which the article was published
  • “citation_firstpage”: the first page number of the article
  • “citation_lastpage”: the last page number of the article
  • “citation_abstract”: the abstract or summary of the article
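To make the list above more concrete, here is a minimal Python sketch of how a site template might render these meta tags for one article. The article record, its field names, and the function name are hypothetical; only the “citation_*” tag names come from the list above. The sketch emits one “citation_author” tag per author, in article order, which is how we understand Google Scholar’s guidelines, so treat that as an assumption to verify.

```python
from html import escape

# Maps hypothetical article-record fields to the meta tag names listed above.
FIELD_TO_TAG = [
    ("journal_title", "citation_journal_title"),
    ("volume", "citation_volume"),
    ("issue", "citation_issue"),
    ("first_page", "citation_firstpage"),
    ("last_page", "citation_lastpage"),
    ("abstract", "citation_abstract"),
]


def meta_tag(name: str, value) -> str:
    """Render one meta tag, escaping characters that are unsafe in attributes."""
    return f'<meta name="{name}" content="{escape(str(value), quote=True)}">'


def render_citation_meta_tags(article: dict) -> str:
    """Build the citation_* meta tags for an article, skipping missing fields."""
    lines = [meta_tag("citation_title", article["title"])]
    # One tag per author, kept in the same order as the article byline.
    for author in article.get("authors", []):
        lines.append(meta_tag("citation_author", author))
    for field, tag_name in FIELD_TO_TAG:
        value = article.get(field)
        if value not in (None, ""):
            lines.append(meta_tag(tag_name, value))
    return "\n".join(lines)


# Hypothetical article record, for illustration only.
EXAMPLE_ARTICLE = {
    "title": "Soil & Water Dynamics",
    "authors": ["Rivera, Ana", "Chen, Wei"],
    "journal_title": "Journal of Example Studies",
    "volume": 12,
    "issue": 3,
    "first_page": 101,
    "last_page": 118,
    "abstract": "",  # left empty, so no citation_abstract tag is emitted
}

if __name__ == "__main__":
    print(render_citation_meta_tags(EXAMPLE_ARTICLE))
```

The output belongs in the head section of each article’s own page, so every article URL carries its own metadata.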

Google Scholar’s indexing guidelines can get pretty technical. So if you have a journal website built via custom code or a general-purpose content management system like WordPress that doesn’t include the above HTML metadata, you may want to move to a dedicated journal hosting platform that offers that kind of academic indexing support. For example, Scholastica’s OA Publishing Platform includes a customizable website template structured to meet Google Scholar’s indexing criteria, including exporting bibliographic metadata in HTML meta tags (e.g., citation_title), helping to improve journals’ chances of Google Scholar indexing.

Some journal databases, such as JSTOR and Project MUSE, also support Google Scholar indexing. So adding articles to them can be another avenue to show up in Google Scholar search results.

How long does Google Scholar indexing usually take?

It’s important to note that Google Scholar indexing is anything but immediate. Once your journal website meets Google Scholar’s inclusion guidelines, it will take some time for its crawlers to find and index your content. According to Google, you’ll have to wait 6-9 months before your articles appear in Google Scholar search results.

Once Google Scholar determines your website is a trusted source, it should index your new articles every few weeks. However, if you update articles that were already indexed, re-indexing can take 6-9 months.

Sometimes articles that meet all of Google Scholar’s guidelines fail to be indexed in the expected timeframe. If this happens, you may need to wait a little longer. We cover how to tell if Google Scholar is indexing your content and best practices to improve the chances in the sections below.

How can I tell if Google Scholar is indexing my journal articles?

Example of Google Scholar search results

As noted, Google Scholar doesn’t just index everything on the web. Your website and articles must meet the content and technical inclusion guidelines we covered above.

You can quickly check if Google Scholar is indexing your website by visiting scholar.google.com and searching for your journal’s domain. To check the coverage of your website in Google Scholar, search for titles of several dozen articles and see if those papers are returned.

If you find that Google Scholar isn’t indexing any of your articles, you’ll need to revisit the “Inclusion Guidelines for Webmasters” to identify the reason. If Google Scholar has indexed some but not all of your content, it could be due to missing indexing criteria or errors on certain article pages. We cover troubleshooting tips below. It may also be due to Google Scholar needing more time to index your website (remember, its crawlers tend to work more slowly than Google’s).

How can I improve the chances of Google Scholar indexing my journal articles?

First, remember to be patient and wait the full 6-9 month timeframe for Google Scholar to start indexing your articles once you’ve implemented all the inclusion guidelines. After that, there are no guarantees that Google Scholar will index everything you publish. However, following the inclusion guidelines very closely will significantly improve the chances.

As noted, if you host content via Scholastica’s OA publishing platform, we can help you here. Our websites are structured to support Google Scholar indexing with exportable bibliographic article-level metadata in HTML meta tags.

If one or more of your articles aren’t showing up in Google Scholar, or previously appeared and no longer appear, first check that your journal website is fully compliant with its indexing guidelines. You can learn more about Google Scholar’s troubleshooting recommendations on its website.

If your journal website and articles meet all of Google Scholar’s criteria and you’re still experiencing gaps in indexing, it may be due to inconsistencies in your article-level metadata. Common issues include:

  • Incorrect publication dates in meta tags (the publication date in the meta tags should match the date of formal publication for the issue, as well as the publication date listed on the article)
  • Different languages used across the meta tags (the meta tags should all be in the same language)
  • Metadata written in a different language from the article
  • Discrepancies in an author’s name in the metadata (e.g., incorrect/inconsistent name format or spelling errors)
  • Authors listed in a different order in the meta tags than in the article
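Some of the issues above can be caught with a simple automated check. The following Python sketch parses the “citation_*” meta tags out of an article page and compares the title and author list against what you expect from the article itself. The class and function names, the sample page, and the expected values are all hypothetical, and the check only covers title, author spelling, and author order.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collects (name, content) pairs from citation_* meta tags, in page order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("citation_"):
            self.tags.append((name, attributes.get("content") or ""))


def find_metadata_issues(page_html: str, expected_title: str, expected_authors: list[str]) -> list[str]:
    """Compare a page's citation meta tags with the expected title and authors."""
    parser = CitationMetaParser()
    parser.feed(page_html)
    titles = [content for name, content in parser.tags if name == "citation_title"]
    authors = [content for name, content in parser.tags if name == "citation_author"]

    issues = []
    if titles != [expected_title]:
        issues.append(f"title mismatch: found {titles!r}")
    if sorted(authors) != sorted(expected_authors):
        issues.append(f"author names differ: found {authors!r}")
    elif authors != expected_authors:
        issues.append(f"author order differs: found {authors!r}")
    return issues


# Hypothetical article page where the authors are listed in the wrong order.
SAMPLE_PAGE = """
<html><head>
<meta name="citation_title" content="Soil &amp; Water Dynamics">
<meta name="citation_author" content="Chen, Wei">
<meta name="citation_author" content="Rivera, Ana">
</head><body></body></html>
"""

if __name__ == "__main__":
    problems = find_metadata_issues(
        SAMPLE_PAGE, "Soil & Water Dynamics", ["Rivera, Ana", "Chen, Wei"]
    )
    for problem in problems:
        print(problem)
```

A check like this could run over every article page before publication, though date and language mismatches would need their own rules.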

And again, remember, journals and publishers can’t control which formats or instances of articles Google Scholar chooses to index as the primary version (i.e., PDF vs. HTML article files or journal website vs. repository versions). So if another version of your article is already showing up, that may remain Google Scholar’s primary version regardless of any article/website updates you make.

A quick note on website migrations: If your journal is migrating or has migrated to a new hosting platform with different article URLs, it’s essential to redirect all your old article links to the new ones. These redirects must be HTTP 301s, which signal a permanent redirect from one URL to another.

During website migrations, it’s best to keep your old journal website up while you’re getting the new one ready to avoid access interruptions for indexes and the researchers using them. Then, when the new site goes live, put the article-level redirects in place and change the Domain Name System (DNS) lookup to the new server before taking down your old site. If you have questions about this, reach out to your different hosting providers for help.
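If you want to spot-check your redirects after a migration, a small script can help. The Python sketch below requests an old article URL without following redirects and reports whether the response is a 301 pointing at the expected new URL. The function names and URLs are hypothetical, and the comparison assumes the server returns an absolute URL in its Location header, which is a simplification.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def classify_redirect(status: int, location: Optional[str], expected_url: str) -> str:
    """Describe whether a response is the permanent redirect we expect."""
    if status != 301:
        return f"expected HTTP 301, got {status}"
    if location != expected_url:
        return f"301 points to {location!r}, expected {expected_url!r}"
    return "ok"


def fetch_status_and_location(url: str, timeout: float = 10.0):
    """Send a HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        connection.request("HEAD", path)
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


def check_redirect(old_url: str, new_url: str) -> str:
    """Fetch old_url and report whether it permanently redirects to new_url."""
    status, location = fetch_status_and_location(old_url)
    return classify_redirect(status, location, new_url)


if __name__ == "__main__":
    # Offline demonstration with made-up responses; use check_redirect() on real URLs.
    new_url = "https://journal.example.com/article/123"
    print(classify_redirect(301, new_url, new_url))
    print(classify_redirect(302, new_url, new_url))
```

A 302 or other temporary redirect would be flagged here, since it doesn’t signal a permanent move the way a 301 does.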

Once you have any necessary redirects in place, per Google Scholar’s troubleshooting notes, “updates of papers that are already included usually take 6-9 months,” but could take longer, so be patient.

Putting it all together

As you can see from this blog post, Google Scholar indexing takes time, and the process can be unpredictable. Even once you meet all of Google Scholar’s criteria, there are no guarantees that it will index every article you publish or the exact versions you want. But following the requirements above will significantly improve the chances.

However you decide to go about getting your journal articles indexed by Google Scholar, now’s the time to start! Google Scholar indexing will help expand the accessibility and reach of the research you publish.

This post was originally published on February 4, 2016 and updated on April 20, 2023.
