Google has this week published some details of its approach to extending its search services to a global audience. Although Google has dominated the English and Spanish markets for some time, it is aiming to improve the quality of search in all languages. Building a truly global search engine poses many problems, and the challenge goes well beyond having a dictionary for each language.
Google continues to be openly ambitious about its future plans – maybe something that its rivals should take a look at.
“Our goal is to make Google’s search be relevant to all people, regardless of their language or country.”
Because search behaves so differently from one language to the next, Google can no longer maintain its search engine through single, global updates to its algorithm. Instead, it has to make individual algorithmic modifications to cater for specific languages. Even within a single language, regional differences can cause problems:
“And to make things really interesting, there are cases where the same language is different across countries. Obvious examples are “color” in the U.S. vs. “colour” in the U.K., or “camião” in Portugal vs. “caminhão” in Brazil.”
Google relies on advanced AI (artificial intelligence), as well as human intervention, to train its search algorithms. To do this, Google sources a large collection of documents in a particular language and feeds it into the search engine. From this, the search engine “learns” the various nuances of that language, and improves the way it indexes and retrieves results for related searches.
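One simple way to picture this kind of learning is a toy spelling suggester that knows nothing about a language except the documents it has been fed. This is only a sketch of a well-known frequency-based technique, not Google's actual method; the function names `train`, `edits1` and `suggest` and the sample text are made up for illustration.

```python
import re
from collections import Counter


def train(corpus: str) -> Counter:
    """Count how often each word appears in the sample documents."""
    return Counter(re.findall(r"\w+", corpus.casefold()))


def edits1(word: str, alphabet: str) -> set:
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return deletes | swaps | replaces | inserts


def suggest(word: str, counts: Counter) -> str:
    """Suggest the most frequent known word within one edit of `word`."""
    word = word.casefold()
    if word in counts:
        return word  # already a known word, nothing to correct
    # The alphabet is taken from the documents themselves, so the same
    # code works for any language without a hand-written letter list.
    alphabet = "".join(sorted({ch for known in counts for ch in known}))
    candidates = [w for w in edits1(word, alphabet) if w in counts]
    if not candidates:
        return word  # no close match; leave the query unchanged
    return max(candidates, key=counts.__getitem__)


# Hypothetical sample text standing in for a large document collection.
counts = train("smoke detector smoke alarm detector")
print(suggest("detecter", counts))  # detector
```

The more documents such a model sees in a given language, the more words and spellings it can recognise, which is the intuition behind feeding the engine a large collection per language.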
Google also relies on us, the users, to provide information to improve its search engine.
“We learn some things from our users, so as people start using our search engine, we can improve the way we rank in that language.”
Google explains how users help with developing the search engine with these examples of real changes that have been made:
- Spell corrections: We recently launched spell corrections in Estonian. If your Estonian is rusty, and you don’t remember how to spell “smoke detector,” we can suggest a spell correction for [suitsuantur], leading to better search results.
- Diacritical marks: Many languages have diacritical marks, which alter pronunciation. Our algorithms are built to support them, and even help users who mis-type or completely ignore them. For example, if you’re a resident of Quebec, Canada and would like to know the weather forecast in Quebec City, we’ll serve good results whether you type with diacritical signs [Météo à Québec] or without [meteo quebec]. Czech users can read the same excellent results for a popular kids’ cartoon by searching for [krtecek] and [krteček]. On the other hand, sometimes diacriticals change the meaning of the word and we have to use them correctly. For example, in Thai, [ข้าว] is “rice,” with completely different results than [ข่าว], which is “news”; or in Slovakia, results for “child” [dieťa] are different than results for “diet” [diéta].
- Synonyms: A general case of diacritical support is the handling of synonyms in different languages. Korean searches showed that “samsung” can be viewed as a synonym of “삼성”, so that when users search for [samsung], they find results which have the company’s name in Korean.
- Compounding: Some languages allow compounding, which is the formation of new words by combining together existing words. You can see a nice example in Swedish, where we return documents about a Swedish credit card for both compounded [Visakort] and non-compounded [visa kort] queries.
- Stemming: Google has developed morphological models that can receive compound words as queries, and return pages which contain their stem, possibly as part of a different compound. For example, when searching for cars in Saudi Arabia, you can search for [سيارة] and [سيارات] because both are variants of the same stem, and both return many common results. A Polish user can search for “movie” [film], and get back results that contain other variants of the stem, such as “filmów,” “filmu,” “filmie,” “filmy.” A user from Belarus will find results for all word forms of the capital, Minsk [Мінск]: “Мінску,” “Мінска,” “Мінскага.” (Source)
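To make the diacritics and compounding examples above more concrete, here is a minimal sketch using Python's standard `unicodedata` module. It is an illustration under stated assumptions, not a description of Google's implementation: the helper names (`fold_diacritics`, `match_key`, `compound_key`) and the small `MEANING_BEARING` word list are hypothetical.

```python
import unicodedata

# Hypothetical list of words whose diacritics change the meaning and so
# must not be folded away (taken from the Slovak example above).
MEANING_BEARING = {
    unicodedata.normalize("NFC", w) for w in ("dieťa", "diéta")
}


def fold_diacritics(text: str) -> str:
    """Lowercase the text and strip combining marks such as accents."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return unicodedata.normalize("NFC", stripped)


def match_key(word: str) -> str:
    """Key for comparing words: fold accents unless they carry meaning."""
    canonical = unicodedata.normalize("NFC", word.casefold())
    if canonical in MEANING_BEARING:
        return canonical
    return fold_diacritics(canonical)


def compound_key(query: str) -> str:
    """Key that treats a compound and its spaced-out form as equal."""
    return "".join(fold_diacritics(query).split())


print(fold_diacritics("Météo à Québec"))               # meteo a quebec
print(match_key("krteček") == match_key("krtecek"))    # True
print(match_key("dieťa") == match_key("diéta"))        # False
print(compound_key("Visakort") == compound_key("visa kort"))  # True
```

Blind folding would turn both “dieťa” and “diéta” into “dieta”, which is why the sketch keeps an exception list. A real system would need a far richer, language-specific model to decide when a diacritic matters.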
However, it does not stop at the linguistic elements of a language. Also vital to a good search experience (i.e. one that brings back the best results for the user) is understanding how people enter a search query. Keyboard limitations, and the shortcuts that result from them, are also considered when building new search rules.
So, what does this mean for the webmaster? Understanding how Google builds its search engine in multiple languages is not directly relevant to the average webmaster, or the average online business. However, if your business can reach out to a global audience, then offering your site in multiple languages can increase its overall visibility in the Google search engine. If you are already dominating a niche in one language, dominating it in another language could provide “SEO benefits” to your site, as well as increasing your global audience and improving user experience. And you may be helping Google to become a better search engine for everyone else!
Source: Permalink to Google’s Blog