WHO ARE WE?
The MAGICS LAB was founded in 2018 by Drs. Brizan (Computer Science), Intrevado (Data Science), and Malensek (Computer Science) at the University of San Francisco. We meet weekly with undergraduate and graduate students to work on applied research.
WHAT DO WE DO?
We work on projects that have great personal interest to us, including:
- Detecting Bias in News Organisations
- Classifying Songs by Genre
- Improving upon the Yelp recommendation engine for restaurants
- Analysing personal and cultural/demographic information transmitted during speech and typing
- Helping predict the spread of wildfires
HOW DO WE DO IT?
Machine learning, natural language processing, deep learning, and several other techniques using Python, Java, C/C++/C#, SQL/MySQL/MariaDB, MongoDB, Spark and our very own GPUs.
David Guy Brizan, Ph.D.
David graduated with a Ph.D. in Computer Science from CUNY's Graduate Center. His research interests are natural language processing and machine learning, specifically the personal and cultural/demographic information transmitted during speech and typing. This research may lead to more accurate speech recognition systems. Prior to joining USF, David was a research assistant in the Speech Lab at Queens College and an instructor at Hunter College. He has also worked for the City of New York, IBM, United HealthCare and other public and private institutions.
Paul Intrevado, Ph.D.
Paul is an Assistant Professor of Data Science at the University of San Francisco. He holds a Ph.D. in Operations Management and enjoys working on applied research. He traditionally works in Python, R and SQL, and is interested in all methods of data science, from traditional inferential statistics to machine learning to deep learning.
Matthew Malensek, Ph.D.
Matthew Malensek received his Ph.D. in computer science from Colorado State University. His research interests are centered around systems approaches to data science, with a focus on scalable analytics, storage and management of voluminous data streams, and cloud/edge computing. These projects span domains such as atmospheric science, epidemiology, and geographic information systems.
Quantum Criticism: Exploring Sentiment of Named Entities in News Articles [2018, Poster]
6th International Conference on Statistical Language and Speech Processing. P. Intrevado, T. Sun, A.Z. Wong, D.G. Brizan. 
Scraping RSS news feeds from five media outlets traditionally identified at different points on the political continuum---The Atlantic, the BBC, Fox News, the New York Times and Slate---we explore, examine and quantify differences in coverage. Using the Stanford named entity recognizer (NER), we identify people, organizations and locations across articles, and compute a micro-average F-score of 69% for both TEXT and TYPE. Initial observations indicate that the precision and recall rates of the Stanford NER are significantly higher when applied to American-produced media (both approximately 80%) in comparison to internationally-produced media (both approximately 50%), suggesting that the NER could benefit from training on additional international media corpora. We intend to further explore and quantify this behavior with a larger and more diverse set of international media corpora. Moreover, precision and recall seem to be clustered based on the editorial style of a given news organization.
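As a rough illustration of what a micro-averaged F-score measures, the Python sketch below pools true positives, false positives and false negatives across all articles before computing precision and recall. The function name `micro_prf`, the sample entities, and the rule that an entity counts as correct only when both its text and its type match are assumptions made for this example; this is not the evaluation code used in the study.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1.

    gold, predicted: dicts mapping an article id to a set of
    (entity_text, entity_type) tuples. Counts are pooled over all
    articles before the ratios are taken (the "micro" average).
    """
    tp = fp = fn = 0
    for article_id in gold.keys() | predicted.keys():
        g = gold.get(article_id, set())
        p = predicted.get(article_id, set())
        tp += len(g & p)   # entities found with the right text and type
        fp += len(p - g)   # entities predicted but not in the gold set
        fn += len(g - p)   # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical annotations for two articles.
gold = {
    "a": {("Paris", "LOCATION"), ("IBM", "ORGANIZATION")},
    "b": {("Angela Merkel", "PERSON")},
}
predicted = {
    "a": {("Paris", "LOCATION"), ("IBM", "PERSON")},  # wrong type for IBM
    "b": {("Angela Merkel", "PERSON"), ("Berlin", "LOCATION")},
}

precision, recall, f1 = micro_prf(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because counts are pooled, outlets with many articles weigh more heavily in the score than outlets with few.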
We also ran custom entity resolution algorithms on all articles and sampled articles for cluster completeness and homogeneity. Likely due to the relatively regular way in which news articles are written, we find that a trivial algorithm is able to achieve perfect homogeneity and, with a total of 3 errors, a v-measure score of 0.9863.
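The relationship between homogeneity, completeness and the v-measure can be sketched in a few lines of Python. The helper names and the toy labels below are hypothetical. The code follows the standard entropy-based definitions (homogeneity = 1 - H(C|K)/H(C), completeness = 1 - H(K|C)/H(K), v-measure = their harmonic mean) and is not the study's own implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(labels), in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """Conditional entropy H(target | given), in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g])
                for (g, _), c in joint.items())


def v_measure(classes, clusters):
    """Return (homogeneity, completeness, v_measure).

    classes: gold entity label for each mention.
    clusters: cluster id assigned to each mention by the resolver.
    """
    h_classes = entropy(classes)
    h_clusters = entropy(clusters)
    homogeneity = (1.0 if h_classes == 0
                   else 1.0 - conditional_entropy(classes, clusters) / h_classes)
    completeness = (1.0 if h_clusters == 0
                    else 1.0 - conditional_entropy(clusters, classes) / h_clusters)
    total = homogeneity + completeness
    v = 0.0 if total == 0 else 2 * homogeneity * completeness / total
    return homogeneity, completeness, v


# Toy example: every cluster holds mentions of a single entity (perfect
# homogeneity), but one entity is split across two clusters, so
# completeness falls below 1.
classes = ["obama", "obama", "obama", "nyc", "nyc"]
clusters = [0, 0, 1, 2, 2]

h, c, v = v_measure(classes, clusters)
print(f"homogeneity={h:.4f} completeness={c:.4f} v-measure={v:.4f}")
```

A resolver that never merges two different entities but occasionally splits one entity into several clusters shows this pattern: homogeneity stays at 1 while completeness, and therefore the v-measure, falls slightly below 1.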
Performing sentiment analysis on the corpora using VADER, we find interesting, counter-intuitive results. It is common knowledge that traditional media outlets tend to write largely about negative events (e.g., kidnappings, war, political fights, etc.) and far less about good deeds or events. At scale, this implies that a news organization such as the New York Times (NYT) would generate orders of magnitude more critical news articles about New York City (NYC) than other national or international news organizations might, owing to its geographic, political and cultural ties to the city. Yet the sentiment associated with NYC is strongly positive for the New York Times, compared with international organizations that generate almost exclusively negative sentiment when writing about NYC. We hypothesize that this behavior is attributable to the significant number of positive articles in the NYT associated with theater, museums, etc., which are infrequently or not at all reported on by international media outlets. Future research will parse the corpus in an effort to explicitly compare the sentiment on a particular city across multiple media outlets, while controlling for local or municipal topics, ensuring a more robust matching of media reporting.
The Seven Critical Axes of Information For Yelp Restaurant Reviews [2020, Full Length Paper]
International Conference on Machine Learning and Applications. K. Sonar, P. Intrevado, D.G. Brizan. 
There are currently two classes of user-generated information posted on Yelp for a given establishment: free-form textual reviews, as well as an overall star rating—a review system that is simultaneously overly-aggregate and overly-granular. Aiming for a user-friendly middle-ground, this research combines topic modeling, part-of-speech matching and filtering, similarity scoring and sentiment analysis to create a pipeline whereby free-form textual restaurant Yelp reviews are used to dynamically build a set of categories which are of highest importance to restaurant patrons. With seven dynamically generated critical categories established—food, service, price, ambiance, occasion, booking and recommend—restaurants are rated across those categories to better inform future patrons about the desirable and less pleasant features of a given dining establishment. The model is trained using 1.25m Yelp restaurant reviews for 7,137 restaurants in Las Vegas, and validated with reviews from two other metropolitan areas. Modeling results are validated, correlating strongly with Yelp star reviews (0.76). An improved customer-facing Yelp interface is also designed and proposed.
Quantum Criticism: A Tagged News Corpus Analyzed for Sentiment and Named Entities [2020, Full Length Paper]
International Conference on Natural Language Computing Advances. A. Badgujar, S. Chen, A. Wang, K. Yu, P. Intrevado, D.G. Brizan.
We continuously collect data from the RSS feeds of several traditional online news sources. We apply several pre-trained implementations of named entity recognition (NER) tools, quantifying the success of each implementation. We also perform sentiment analysis of each news article at the document, paragraph and sentence level, with the goal of creating a corpus of tagged news articles that is made available to the public through a web interface. Finally, we show how the data in this corpus can be employed to identify bias in news reporting.