Metadata Scraping Using Programmable Customized Search Engine
IRAQI JOURNAL OF COMPUTERS, COMMUNICATIONS, CONTROL AND SYSTEMS ENGINEERING
Article 2, Volume 23, Issue 3, September 2023, Pages 10-25
Document Type: Research Paper
DOI: https://doi.org/10.33103/uot.ijccce.23.3.2
Authors
Esraa Q. Naamha*1; Matheel E. Abdulmunim2
1Department of Computer Science, Technology University, Baghdad, Iraq
2Department of Computer Science, Technology University, Baghdad, Iraq
Abstract
The World Wide Web (WWW) is a vast repository of knowledge, including intellectual, social, financial, and security-related data. Online information is typically accessed for instructional purposes. Because this information is available on the internet in a variety of formats and through different access interfaces, indexing or semantically processing website data can be difficult. Web data scraping is a method that seeks to resolve this issue: it converts unstructured web data into structured data that can be stored and examined in a central local database or spreadsheet. This paper presents a metadata scraping system based on a programmable Customized Search Engine (CSE), which extracts metadata from web pages (HTML pages) in the Google database and saves it in XML format for later analysis and retrieval. Documents that contain metadata are a relatively recent phenomenon on the web, and they increase the likelihood that users will find the information they need.
Keywords
Programmable (CSE); JSON API; API key; metadata scraping
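The pipeline outlined in the abstract (query a programmable CSE through its JSON API, then save page metadata as XML) can be illustrated with a minimal sketch. It is not the authors' implementation. The endpoint and the `key`, `cx`, and `q` parameters follow Google's Custom Search JSON API, and the `items`, `link`, `title`, and `pagemap.metatags` fields follow its response format. The function names, the XML element names (`results`, `page`, `meta`), and the sample response are assumptions made for illustration only. No network request is sent.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Public endpoint of Google's Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_request_url(api_key, engine_id, query):
    """Build the request URL; the caller would fetch it with any HTTP client."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return API_ENDPOINT + "?" + urlencode(params)


def response_to_xml(response):
    """Convert a decoded JSON response (a dict) into an XML string.

    The XML layout is hypothetical: one <page> per result item, with
    one <meta name="..."> child per metatag found in the item's pagemap.
    """
    root = ET.Element("results")
    for item in response.get("items", []):
        page = ET.SubElement(root, "page", url=item.get("link", ""))
        ET.SubElement(page, "title").text = item.get("title", "")
        for tags in item.get("pagemap", {}).get("metatags", []):
            for name, content in sorted(tags.items()):
                meta = ET.SubElement(page, "meta", name=name)
                meta.text = str(content)
    return ET.tostring(root, encoding="unicode")


# Hand-written sample shaped like an API response (illustrative data only).
sample_response = {
    "items": [
        {
            "title": "Example page",
            "link": "https://example.com/",
            "pagemap": {
                "metatags": [
                    {"og:title": "Example", "viewport": "width=device-width"}
                ]
            },
        }
    ]
}

request_url = build_request_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "metadata scraping")
xml_text = response_to_xml(sample_response)
print(xml_text)
```

In a real run, the JSON body returned for `request_url` would be decoded and passed to `response_to_xml`, and the resulting string written to an `.xml` file for later analysis and retrieval.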