Lexivote Community-Powered Relevance-Ranking

Excerpt from the patent specification:

USE Subsystem Lexivote Ranking Methodology

Allowing users to design their own algorithms and manipulate the power of multiple search methodologies, as demonstrated above, is a powerful tool. However, the power of the USE can be even more fully realized when combined with a more powerful relevancy ranking methodology than the related art provides.

The Lexivote ranking methodology essentially allows a search engine user to interview millions of other users to find documents most relevant to his or her query. Specifically, a database of “word-votes” is created, then populated over time through user input, and then searched in response to a search query so as to provide the most relevant documents pertaining to the term or terms searched.

A word-vote, in its simplest form, is a pairing of two data: (I) a word datum and (II) a URL datum. Thus, a given word-vote could be a group such as “music” and “http://www.performer.org”. A word-vote database record, in its simplest form, has two corresponding fields: the word datum field and the URL datum field.

Word-votes are cast one at a time by individual users. Specifically, the user inputs a word and inputs a URL that he or she believes to be the URL of the best resource pertaining to that word.

A query for use in the Lexivote system is, in its simplest form, a word. When a query consisting of a word is submitted for a search under the Lexivote methodology, the word-vote database is searched for matching word-votes. A matching word-vote is a word-vote record in which the word datum matches the query word.

Results of the search are ranked using matching word-votes.

In its simplest form, the score of a document A under a Lexivote search according to the present invention is r(A)=m where r(A) is the score of the document A and m is the number of matching word-votes in which the URL is the same as the URL of document A.

Thus, for instance, assume that users have cast exactly 1000 word-votes in which “music” is the word, and, of these 1000 word-votes, exactly 7 of them contain “http://www.performer.org” as the URL. When a query on the word “music” is submitted, there are 1000 matching word-votes, and the score of the document appearing at the URL http://www.performer.org under the submitted query is 7.
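
The simplest scoring rule, r(A)=m, can be illustrated with a minimal Python sketch. It is not taken from the specification: the names WordVote, score, and rank are hypothetical, "matching" is assumed to mean exact string equality, and the example.org URLs are filler invented to bring the total to 1000 matching word-votes.

```python
from collections import Counter
from typing import NamedTuple


class WordVote(NamedTuple):
    """Simplest word-vote record: one word datum and one URL datum."""
    word: str
    url: str


def score(votes, query, url):
    """r(A) = m: count matching word-votes whose URL equals the document's URL."""
    return sum(1 for v in votes if v.word == query and v.url == url)


def rank(votes, query):
    """Return (url, score) pairs for the query, highest score first."""
    counts = Counter(v.url for v in votes if v.word == query)
    return counts.most_common()


# Reconstruct the example: 1000 "music" word-votes, 7 of them for performer.org.
votes = [WordVote("music", "http://www.performer.org")] * 7
# Filler URLs (assumed for illustration only), one vote each.
votes += [WordVote("music", f"http://example.org/{i}") for i in range(993)]
# A non-matching word-vote, which must not affect the "music" score.
votes.append(WordVote("jazz", "http://www.performer.org"))

music_score = score(votes, "music", "http://www.performer.org")
```

Here music_score is the count of "music" word-votes naming http://www.performer.org, and the "jazz" word-vote is ignored because its word datum does not match the query.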

Complexity and subtlety can be added to the process quickly. For instance, the word datum in a word-vote can be a phrase or almost any character string rather than just a single word. A word-vote can include more than one URL-datum field, such that there can be a one-to-many relationship between the word/phrase datum and the URLs associated with this datum in a single word-vote database record.

In such a one-to-many word-vote, the URL data fields can be weighted such that a user puts his or her favorite URL in first, his or her second favorite URL in second, and so on. Then, instead of the score of the document being simply the number of matching votes that include the URL of the document, such matching votes are weighted according to the priority of the given URL in each matching vote. Greater weight is assigned to the first URL than to the second URL, and greater weight is assigned to the second URL than the third URL. A formula for scoring under such an approach appears in FIG. 98B.
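
The formula of FIG. 98B is not reproduced in this excerpt, so the following Python sketch substitutes an assumed weighting purely for illustration: the URL in position k (counting from 1) of a matching word-vote contributes 1/k. The function name weighted_scores and the weight parameter are hypothetical. Any decreasing weight function would satisfy the ordering requirement stated above.

```python
from collections import defaultdict


def weighted_scores(votes, query, weight=lambda pos: 1.0 / (pos + 1)):
    """Score URLs from one-to-many word-votes.

    votes  -- iterable of (word, [url_1, url_2, ...]) with URLs in priority order
    weight -- maps a zero-based position to a weight; the 1/(pos + 1) default is
              an assumption, not the formula of FIG. 98B
    """
    scores = defaultdict(float)
    for word, urls in votes:
        if word != query:
            continue  # only matching word-votes contribute
        for pos, url in enumerate(urls):
            scores[url] += weight(pos)
    return dict(scores)


example_votes = [
    ("music", ["http://a.example", "http://b.example"]),
    ("music", ["http://b.example"]),
    ("art", ["http://a.example"]),
]
result = weighted_scores(example_votes, "music")
```

In this example http://b.example is listed first once and second once, so under the assumed weights it outscores http://a.example, which is listed first only once among the matching word-votes.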

Additionally, word-vote derivatives can be included. A word-vote derivative is an additional datum derived from a word-vote. For instance, every word-vote database record can include an additional field that is automatically populated with simply the domain name appearing in the URL datum. This domain name field can then be used in a secondary ranking methodology: when the basic Lexivote ranking methodology yields rankings that are very close together, the domain name field is used in a sort of “tiebreaker” methodology; a URL that includes a more popular domain name ranks higher than the URL that includes a less popular domain name.
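
A rough Python sketch of the domain-name tiebreaker follows. Several details are assumptions, because the excerpt does not fix them: the "domain name" is taken to be the host portion of the URL, a domain's popularity is taken to be the number of matching word-votes whose URL falls under it, and "very close together" is modeled by a hypothetical tolerance parameter on the score difference. The names domain_of and rank_with_tiebreak are likewise hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Word-vote derivative: the host portion of the URL datum."""
    return urlparse(url).netloc.lower()


def rank_with_tiebreak(votes, query, tolerance=0):
    """Rank URLs by vote count; reorder near-ties by domain popularity.

    votes     -- iterable of (word, url) pairs
    tolerance -- scores within this distance of a group's top score are
                 treated as tied (an assumed reading of "very close")
    """
    matching = [url for word, url in votes if word == query]
    url_score = Counter(matching)
    domain_pop = Counter(domain_of(u) for u in matching)

    ordered = sorted(url_score, key=lambda u: -url_score[u])
    result, i = [], 0
    while i < len(ordered):
        # Collect the run of URLs whose scores are close to the group leader's.
        j = i
        while (j < len(ordered)
               and url_score[ordered[i]] - url_score[ordered[j]] <= tolerance):
            j += 1
        # Within the run, the more popular domain ranks higher.
        result.extend(sorted(ordered[i:j],
                             key=lambda u: -domain_pop[domain_of(u)]))
        i = j
    return result


example_votes = (
    [("music", "http://a.example/x")] * 2
    + [("music", "http://b.example/y")] * 2
    + [("music", "http://b.example/z")]
)
ranking = rank_with_tiebreak(example_votes, "music")
```

In the example, the first two URLs tie on vote count, and the one under b.example ranks first because that domain appears in three matching word-votes against two for a.example.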

The Lexivote system is further explained in reference to the figures. FIG. 93 depicts an example of an excerpt from a web page 9301 that provides a search query submission form. Also included are the fields necessary for submission of a word-vote to a Lexivote system operated by the UET Company. Specifically, the user chooses a word or phrase to enter in the word datum submission field 9302 and then enters into the URL datum submission field 9303a a URL that identifies the web page that he or she believes to be the most relevant page associated with the given word or phrase. If the user wishes to submit more than one URL to be associated with the chosen word, he or she can enter them in the additional “Favorite Website . . . ” fields provided 9303b, 9303c. The user then submits both the search query and the word-vote by clicking the “search” button.

While FIG. 94 depicts a process by which the Lexivote system can be implemented without the use of registered user accounts, this approach is particularly vulnerable to abuse. The preferred embodiment therefore is that depicted in FIG. 95, which is based upon registered user accounts. Through this approach, several quality control measures can be implemented. For instance, by allowing only registered users to cast word-votes and preventing any registered user account from including a duplicate word-vote, i.e., a word-vote in which the word datum is identical to that of another word-vote, the given user account can be limited to one word-vote for any given term. This measure will help to reduce the likelihood of attempts to “stuff the ballot box.” (Note that the presence of multiple word-votes cast by the same user and including identical URLs as the URL data is not problematic; certainly, the same website can be a user’s favorite website pertaining to a variety of different words.)
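
The duplicate-prevention measure can be sketched in Python as follows. This is an illustrative sketch, not the process of FIG. 95: the class name VoteRegistry and its methods are hypothetical, and it assumes that a second word-vote for the same word from the same account is rejected (the excerpt does not say whether it would instead replace the earlier one).

```python
class VoteRegistry:
    """Stores at most one word-vote per (registered user, word) pair."""

    def __init__(self):
        self._votes = {}  # (user_id, word) -> url

    def cast(self, user_id, word, url):
        """Record a word-vote; return False if this account already voted on the word."""
        key = (user_id, word)
        if key in self._votes:
            return False  # duplicate word datum for this account: rejected (assumed policy)
        self._votes[key] = url
        return True

    def score(self, word, url):
        """Number of accounts whose word-vote for `word` names `url`."""
        return sum(1 for (_, w), u in self._votes.items()
                   if w == word and u == url)


registry = VoteRegistry()
first = registry.cast("alice", "music", "http://www.performer.org")
# Same account, same word: blocked, regardless of the URL submitted.
repeat = registry.cast("alice", "music", "http://other.example")
# Same account, same URL, different word: allowed.
other_word = registry.cast("alice", "jazz", "http://www.performer.org")
second_user = registry.cast("bob", "music", "http://www.performer.org")
```

Keying the store on the account and the word together means each account contributes at most one to the score of any URL under a given word, while the same URL can still be that account's choice for other words.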
