ART TICKER-DISCOVERING EMERGING ARTISTS ON THE WEB
Saad Patel
unpublished
Thesis Director: Tomasz Imielinski
Considering the large number of artists that exist, there is valuable talent to be discovered. But how can promising, emerging artists be found among the hundreds of thousands of names listed on aggregation websites such as art-facts.org and on thousands of art gallery sites? We introduce an application named ArtTicker, which uses techniques from machine learning, information retrieval, data mining, and text mining to crawl, rank, and analyze artists and their popularity on the web.
We start by identifying the names of artists who are not yet listed in large aggregate directories (such as artfacts) but are already represented by some galleries. This task requires crawling thousands of art gallery websites and extracting artist names from them. These websites share many common structures, but there is also significant variety among them, so artist name extraction requires complex heuristics. We harvest thousands of artist names this way.
The second phase of the project ranks these artists by their web presence. Since the wealth of any data mining model is the actual data, the data collection period consisted of extensive crawling of a vast number of news publication websites. To this end, we gather and cluster news from several leading art-related news websites and use many signals to rank and classify these art news sources. An artist's score is based on how significantly that artist was featured in the stream of art news articles.
The final objective of finding emerging artists is met by identifying the names that are present on gallery websites, have a high media presence (high score), and are not yet listed on the artist aggregate sites. The working prototype analyzes over 150 English-language sources and can be extended by automatically crawling and analyzing related sources. It currently holds over 250,000 artists and over 70,000 articles from these news sources. In essence, this is a streaming application that, given any geographic area (say, Lower Manhattan), identifies the hottest artists who are not yet known.
Acknowledgements
Firstly, I would like to express my sincere gratitude to my advisor, Prof. Tomasz Imielinski, for the continuous support of my Master's study and related research, and for his patience, motivation, and immense knowledge. His guidance helped me throughout the research and writing of this thesis.
I could not have imagined having a better advisor and mentor for my Master's study.
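The selection rule stated in the abstract (names present on gallery sites, absent from aggregate directories, and with a high media score) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `score_artists` and `emerging_artists`, the `min_score` threshold, and the use of a plain article-mention count as the score are all assumptions made for illustration, and the artist names in the usage lines are invented.

```python
from collections import Counter


def score_artists(articles, artist_names):
    """Count how many articles mention each artist.

    Assumption: a simple mention count stands in for the thesis's score,
    which is based on how significantly an artist is featured in the news.
    """
    scores = Counter()
    for text in articles:
        lowered = text.lower()
        for name in artist_names:
            if name.lower() in lowered:
                scores[name] += 1
    return scores


def emerging_artists(gallery_names, aggregate_names, scores, min_score=1):
    """Return (name, score) pairs for artists found on gallery sites,
    missing from aggregate directories, and scoring at least min_score.

    Results are sorted by descending score, then by name.
    """
    candidates = set(gallery_names) - set(aggregate_names)
    ranked = [(name, scores[name]) for name in candidates
              if scores[name] >= min_score]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical data, for illustration only.
articles = [
    "Jane Roe opens a solo show",
    "Critics praise Jane Roe and John Doe",
    "John Doe retrospective",
]
gallery = ["Jane Roe", "John Doe", "Ann Lee"]
aggregate = ["John Doe"]

scores = score_artists(articles, gallery)
print(emerging_artists(gallery, aggregate, scores))
```

A set difference keeps the "not yet listed" condition explicit, and the threshold separates artists with media presence from names that merely appear on a gallery page. A real system would also need name normalization and disambiguation, which this sketch omits.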
fatcat:5ybwivocc5hebi7fjuanjd27si