How the search scoring algorithm selects the documents that will be ranked

When we get a free text query, it can be converted into a boolean query using any of the boolean operators. For example, the query “Vector Space Model” can be converted to the query “Vector” OR “Space” OR “Model”. We know how to use an inverted index to find documents matching this boolean query. This is explained in the previous article. Once we have a list of documents that match our query, we try to rank these documents. To achieve this we use the Vector Space Model. To understand this model, we first need to understand what vectors are.

What are Vectors and Vector Space?

Vector

According to Wikipedia, a vector is a quantity that has both magnitude and direction. The direction component can be treated as a dimension of the vector. For example, we can represent a point in geometry in vector notation. Its directions, or dimensions, can be its lengths along the x-axis and the y-axis. The exact quantity of each length is called the magnitude of that dimension. For example, the point (2,3) can be represented as 2*x + 3*y.

Vector Space

A vector space can be considered the imaginary space in which all vectors represented using the same dimensions can be thought to exist. In this space, we can add, subtract or multiply the vectors with each other.

Term vector

But what does this mean in our context? How can vectors be used to rank results? Simply put, we want a way to represent our queries and documents in mathematical notation. In the context of information retrieval, what we are more interested in is term vectors. Term vectors are used to represent statistics like the frequency of each term in our query or document. For example, we can consider each term as a dimension of the vector and the frequency of each term as its magnitude. Once the dimensions and magnitudes are known, the query can be represented in our vector space. Our vector space is composed of all the vectors that can be formed using any of the words present in our documents.

For example, the query “Eye for an Eye” can be represented as the following vector: 2*(Eye) + 1*(for) + 1*(an). Here the words in parentheses are the dimensions of the vector and the numbers are the magnitude of each dimension, which in this case is just the frequency of the word in our query.

Now that we know what term vectors are, the natural question arises: how can this be used to rank results? The answer lies in assigning the correct magnitude, a.k.a. weight, to each dimension of the query and document vectors. If we assign correct weights, then the problem of ranking documents reduces to performing some mathematical operations on the vector of each document and the vector of our query. The result of these operations is our score, representing how relevant the document is to the query. Ranking then simply means sorting the list of matching documents in decreasing order of score.

So let’s try to understand how we find the magnitude of each dimension of our term vectors. The weight is composed of two components:

Term Frequency: prioritizing documents that contain the query words more often.
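To make the ideas in this article a little more concrete, here is a minimal Python sketch. It turns a free text query into an OR query, builds a term vector from raw term frequencies, and ranks documents by score. All function names are hypothetical and the whitespace tokenizer is deliberately naive. The article only says that "some mathematical operations" produce the score, so the dot product used here is an assumption chosen for illustration, not necessarily what Lucene does.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def to_boolean_or_query(text):
    """Join every term of a free text query with the OR operator."""
    return " OR ".join(f'"{term}"' for term in text.split())


def term_vector(text):
    """Map each term (dimension) to its frequency (magnitude)."""
    return Counter(tokenize(text))


def dot_score(query_vec, doc_vec):
    """Assumed scoring operation: dot product of the two term vectors.

    Counter returns 0 for missing terms, so dimensions that the
    document lacks contribute nothing to the score.
    """
    return sum(weight * doc_vec[term] for term, weight in query_vec.items())


def rank(query, documents):
    """Score every document against the query, highest score first."""
    query_vec = term_vector(query)
    scored = [(dot_score(query_vec, term_vector(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    print(to_boolean_or_query("Vector Space Model"))
    print(dict(term_vector("Eye for an Eye")))
    for score, doc in rank("eye for an eye", ["an eye for an eye", "the space model"]):
        print(score, doc)
```

Raw frequency is used as the only weight here, which matches the “Eye for an Eye” example; any additional weighting component would change `term_vector`, not the ranking step.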