UNIVERSITY OF LEEDS | SCHOOL OF COMPUTER SCIENCE
Assessment Brief

Module title: Web Services and Web Data
Module code: COMP3011/XJCO3011
Assignment title: Search Tool
Assignment type and description: In this assignment, you will develop a search tool that finds pages containing certain search terms in a website.
Rationale: This coursework will help develop your skills in building web crawlers, indexers, and web search algorithms.
Word limit and guidance: As part of your submission, you should also submit a brief report that clearly, yet briefly, describes how you implemented each aspect of the search tool. The report should be restricted to a maximum of 4 A4 pages (excluding the title page). It should be typed using Times New Roman font size 11, single line spacing, and default margins.
Weighting: This coursework is worth 30% of the module mark.
Submission deadline: 6 May 2025
Submission method: Electronic via Minerva
Feedback provision: Electronic via Minerva
Learning outcomes assessed:
• Understand how search engines work.
• Acquire skills in writing efficient code for web crawling, indexing, and query processing.
• Learn the various techniques of storing words in search indices.
• Acquire skills in implementing algorithms for ranking and retrieving search terms from indices.
Module leader: M. A. ALSALKA

  1. Assignment guidance

In this coursework, you will develop a search tool that can:
1) Crawl the pages of a website.
2) Create an inverted index of all word occurrences in the pages of the website.
3) Allow the user to find pages containing certain search terms.

  1. Assessment tasks
    The website you will use for this project is https://quotes.toscrape.com/. This website contains a collection of common quotes. The website was purpose-built to allow people to learn web scraping. You must observe a politeness window of at least 6 seconds between successive requests to the website. An inverted index that stores the frequency of occurrence of each word in each page must be created by the tool as it crawls the pages of the website.
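    As a minimal sketch of the crawl-and-index step described above, the Python below visits pages breadth-first, waits between successive requests, and counts how often each word occurs on each page. It is an illustration under stated assumptions, not a required design: `fetch` is a hypothetical callable that returns a page's text and the links found on it (in practice it could wrap an HTTP client and an HTML parser), and the tokenisation rule (lowercase runs of letters and digits) is only one possible choice.

```python
import re
import time
from collections import deque

POLITENESS_SECONDS = 6  # minimum gap between successive requests


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits.

    This rule is an assumption; the brief does not define what a word is.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def crawl_and_index(start_url, fetch, sleep=time.sleep, delay=POLITENESS_SECONDS):
    """Breadth-first crawl that fills index[word][url] = occurrence count.

    `fetch(url)` is a hypothetical callable returning (page_text, links).
    `sleep` is injectable so the delay can be observed without waiting.
    """
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    first_request = True
    while queue:
        url = queue.popleft()
        if not first_request:
            sleep(delay)  # politeness window before every request after the first
        first_request = False
        text, links = fetch(url)
        for word in tokenize(text):
            postings = index.setdefault(word, {})
            postings[url] = postings.get(url, 0) + 1
        for link in links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return index
```

    Keeping the page fetcher as a parameter means the index-building logic can be exercised against a small in-memory site before it is pointed at the real website.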
    Using the search tool, the user should be able to find pages containing individual words such as ‘Jane’, or a combination of two or more words such as ‘Jane Austen’, or ‘admit human mistakes’.
    The search tool is to be command-line driven and must provide the following commands:
    build
    This command instructs the search tool to crawl the website, build the index, and save the resulting index into the file system. For simplicity you can save the entire index into one file.
    load
    This command loads the index from the file system. Obviously, this command will only work if the index has previously been created using the ‘build’ command.
    print
    This command prints the inverted index for a particular word, for example:
    print nonsense
    will print the inverted index for the word ‘nonsense’.
    find
    This command is used to find a certain query phrase in the inverted index and returns a list of all pages containing this phrase, for example:
    find indifference
    will return a list of all pages containing the word ‘indifference’, while
    find good friends
    will return all pages containing the words ‘good’ and ‘friends’.
    For simplicity assume that the search is not case sensitive, so ‘Good’ is the same word as ‘good’.
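    The sketch below illustrates one way the index could be saved, loaded, printed and queried. It rests on assumptions that are not part of the brief: the index is a dictionary of the form `index[word][url] = count`, it is stored as a single JSON file, and a multi-word query is treated as "all words must appear on the page" rather than as an exact phrase. The function names are hypothetical.

```python
import json


def save_index(index, path):
    """Write the whole index to one JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(path):
    """Read an index previously written by save_index."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def print_word(index, word):
    """Return the postings of one word as printable 'url: count' lines."""
    postings = index.get(word.lower(), {})
    return [f"{url}: {count}" for url, count in sorted(postings.items())]


def find(index, query):
    """Return the sorted pages that contain every word of the query."""
    words = query.lower().split()  # lowercasing makes the search case-insensitive
    if not words:
        return []
    pages = set(index.get(words[0], {}))
    for word in words[1:]:
        pages &= set(index.get(word, {}))  # keep only pages that contain this word too
    return sorted(pages)
```

    With this layout, a word that is missing from the index simply yields an empty result instead of an error, which helps the tool behave predictably on unexpected input.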
    You should use Python to implement the search tool. It is also strongly recommended that you use the ‘Requests’ library (http://docs.python-requests.org/en/master/) for composing requests, and the ‘Beautiful Soup’ library (https://www.crummy.com/software/BeautifulSoup/bs4/doc/) to parse HTML pages.
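    The four commands described in this section could be recognised with a small parser such as the one sketched here. The validation rules (no arguments for `build` and `load`, exactly one word for `print`, at least one word for `find`) are an interpretation of the command descriptions, and `parse_command` is a hypothetical helper name.

```python
def parse_command(line):
    """Split an input line into (command, arguments); return None if it is invalid."""
    parts = line.strip().split()
    if not parts:
        return None
    command = parts[0].lower()
    args = [p.lower() for p in parts[1:]]  # search terms are case-insensitive
    if command in ("build", "load") and not args:
        return command, args
    if command == "print" and len(args) == 1:
        return command, args
    if command == "find" and args:
        return command, args
    return None  # unknown command or wrong number of arguments
```

    Returning `None` for malformed input lets the calling loop print a usage message and continue rather than crash.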
  2. General guidance and study support
    Detailed information and guidance on building search tools are in the learning resources of the module on Minerva.

  3. Assessment criteria and marking process
    Your implementation will be assessed according to both functional and quality requirements. As a minimum, your implementation should provide all the functionality mentioned above. In addition, the quality of the implementation, particularly the data presentation and the robustness of the application interface, will affect the overall mark.
  4. Presentation and referencing
    The quality of written English will be assessed in this work. As a minimum, you must ensure:
    • Paragraphs are used
    • There are links between and within paragraphs although these may be ineffective at times
    • There are (at least) attempts at referencing
    • Word choice and grammar do not seriously undermine the meaning and comprehensibility of the argument
    • Word choice and grammar are generally appropriate to an academic text
    These are pass/fail criteria: irrespective of marks awarded elsewhere, if you do not meet these criteria, you will fail overall.
  5. Submission requirements
    You will submit your codebase to Minerva. Also, upload the index file that was compiled by the search tool and any other auxiliary files.
    As part of your submission, you should also submit a brief report that clearly, yet briefly, describes how you implemented each aspect of the tool, for example the data structures, methods and algorithms you used in 1) crawling the website, 2) creating the inverted index, and 3) computing the scores of pages when processing a search query. The report should also include brief instructions on how to invoke and use the tool. Please do NOT fill your report with text copied from online resources, such as tutorials or lecture slides, as we are only interested in understanding what you have done yourself in this assignment.
  6. Academic misconduct and plagiarism
    Leeds students are part of an academic community that shares ideas and develops new ones.
    You need to learn how to work with others, how to interpret and present other people's ideas, and how to produce your own independent academic work. It is essential that you can distinguish between other people's work and your own, and correctly acknowledge other people's work.
    All students new to the University are expected to complete an online Academic Integrity tutorial and test, and all Leeds students should ensure that they are aware of the principles of Academic integrity.
    When you submit work for assessment it is expected that it will meet the University’s academic integrity standards.
    If you do not understand what these standards are, or how they apply to your work, then please ask the module teaching staff for further guidance.

    Use of Gen AI (Generative Artificial Intelligence):
    This assessment is red category. AI tools cannot be used.
    By submitting this assignment, you are confirming that the work is a true expression of your own work and ideas and that you have given credit to others where their work has contributed to yours.

  7. Assessment/marking criteria grid
    The tool successfully crawls all the pages of the website (6 marks)
    The tool successfully creates the inverted index for the whole website (5 marks)
    The tool can store then load the inverted index to/from the file system (4 marks)
    The tool prints the inverted list for a certain word (3 marks)
    The tool can correctly find pages containing search terms (8 marks)
    The quality of the report (4 marks)