of hypertextual information consisting of link structure and link (anchor) text. Google
also uses proximity and font information. While evaluation of a search engine is
difficult, we have subjectively found that Google returns higher quality search results
than current commercial search engines. The analysis of link structure via PageRank
allows Google to evaluate the quality of web pages. The use of link text as a
description of what the link points to helps the search engine return relevant (and to
some degree high quality) results. Finally, the use of proximity information helps
increase relevance a great deal for many queries. Aside from the quality of search,
Google is designed to scale. It must be efficient in both space and time, and constant
factors are very important when dealing with the entire Web. In implementing
Google, we have seen bottlenecks in CPU, memory access, memory capacity, disk
seeks, disk throughput, disk capacity, and network IO. Google has evolved to
overcome a number of these bottlenecks during various operations. Google's major
data structures make efficient use of available storage space. Furthermore, the
crawling, indexing, and sorting operations are efficient enough to be able to build an
index of a substantial portion of the web – 24 million pages, in less than one week.
We expect to be able to build an index of 100 million pages in less than a month. In
addition to being a high quality search engine, Google is a research tool. The data
Google has collected has already resulted in many other papers submitted to
conferences and many more on the way. Recent research has shown a number of
limitations to queries about the Web that may be answered without having the Web
available locally. This means that Google (or a similar system) is not only a valuable
research tool but a necessary one for a wide range of applications. We hope Google
will be a resource for searchers and researchers all around the world and will spark
the next generation of search engine technology.
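The passage above credits PageRank's analysis of link structure for Google's result quality. As a minimal sketch of the underlying idea (the function name, the damping factor d = 0.85, and the toy graph are illustrative assumptions, not details from the text), a power-iteration version of PageRank can be written as follows:

```python
# Hypothetical sketch of the PageRank idea: a page's rank is built up
# from the ranks of the pages that link to it, iterated to a fixed point.
# The damping factor d = 0.85 is an assumed conventional value.
def pagerank(links, d=0.85, iters=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with a uniform rank
    for _ in range(iters):
        new = {p: (1.0 - d) / n for p in pages}  # "random jump" baseline
        for p, outs in links.items():
            if outs:
                share = d * rank[p] / len(outs)  # split rank over out-links
                for q in outs:
                    new[q] += share
            else:
                for q in pages:                  # dangling page: spread evenly
                    new[q] += d * rank[p] / n
        rank = new
    return rank

# Toy link graph: A links to B and C, B links to C, C links back to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

On this toy graph, C ends up ranked highest, since it is pointed to by both A and B; this mirrors the paper's claim that link structure, not page content alone, signals quality.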
TEXT 9
Read the text and point out three Internet properties which make it hard to
simulate.
The Internet has several key properties that make it exceedingly hard to
characterize, and thus to simulate. First, its great success has come in large part
because the main function of the Internet Protocol (IP) architecture is to unify diverse
networking technologies and administrative domains. IP allows vastly different
networks administered by vastly different policies to seamlessly interoperate.
However, the fact that IP masks these differences from a user's perspective does not
make them go away. IP buys uniform connectivity in the face of diversity, but not
uniform behavior. Indeed, the greater IP's success at unifying diverse networks, the
harder the problem of understanding how a large IP network behaves.
A second key property is that the Internet is big. It included an estimated 998
million computers at the end of 2000. Its size brings with it two difficulties. The first
is that the range of heterogeneity mentioned above is very large: even if only a small
fraction of the computers behave in an atypical fashion, the Internet still might
include thousands of such computers, often too many to dismiss as negligible.