Metadata Scraping Using Programmable Customized Search Engine
IRAQI JOURNAL OF COMPUTERS, COMMUNICATIONS, CONTROL AND SYSTEMS ENGINEERING
Articles in Press, Accepted Manuscript, Available Online from 08 May 2023
Document Type: Research Paper
Authors
Esraa Qasim*1; Matheel Abdulmunim2
1Department of Computer Science, Technology University, Baghdad, Iraq
2Department of Computer Science, Technology University, Baghdad, Iraq
Abstract
The World Wide Web (WWW) is a vast repository of knowledge, including intellectual, social, financial, and security-related data. Online information is typically accessed for instructional purposes. It is available in a variety of formats and through a variety of access interfaces, which can make indexing or semantic processing of website data difficult. Web data scraping addresses this problem: it converts unstructured web data into structured data that can be stored and examined in a central local database or spreadsheet. This paper presents a metadata scraping system built on a programmable Customized Search Engine (CSE), which extracts metadata from web pages (HTML pages) in the Google database and saves it in XML format for later analysis and retrieval. Documents that contain metadata are a relatively recent phenomenon on the web and increase the likelihood that users will find the information they need.
Keywords
Programmable (CSE); JSON API; API key; metadata scraping
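
The pipeline summarized in the abstract (query a programmable search engine through its JSON API with an API key, then store the returned page metadata as XML) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the response layout of Google's Custom Search JSON API, where each entry in `items` may carry a `pagemap` with a `metatags` list. The function names, the sample response, and the XML element names (`results`, `page`, `meta`) are hypothetical choices made for illustration only. The sketch builds the request URL but does not send it, so it runs without a real key.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Endpoint of the Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_request_url(api_key, engine_id, query):
    """Build the GET URL for one search request (nothing is sent here)."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def items_to_xml(response):
    """Convert the 'items' of a parsed JSON response into an XML string.

    The element names used here are hypothetical, not the paper's schema.
    """
    root = ET.Element("results")
    for item in response.get("items", []):
        page = ET.SubElement(root, "page", {"url": item.get("link", "")})
        ET.SubElement(page, "title").text = item.get("title", "")
        # 'metatags' is a list of dicts mapping tag name to tag content.
        for tags in item.get("pagemap", {}).get("metatags", []):
            for name, value in tags.items():
                meta = ET.SubElement(page, "meta", {"name": name})
                meta.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical response fragment, shaped like the API's JSON output.
SAMPLE_RESPONSE = {
    "items": [
        {
            "title": "Example page",
            "link": "https://example.com/",
            "pagemap": {
                "metatags": [
                    {"og:title": "Example", "viewport": "width=device-width"}
                ]
            },
        }
    ]
}

if __name__ == "__main__":
    print(build_request_url("DEMO_KEY", "DEMO_CX", "web scraping"))
    print(items_to_xml(SAMPLE_RESPONSE))
```

In a real run, the URL would be fetched with an HTTP client, the body parsed with `json.loads`, and the resulting XML string written to a file for later analysis and retrieval.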