Petros Zerfos

Overview

Title

Principal Research Staff Member & Manager - Data Engineering for Generative AI

Location

IBM Research - Yorktown Heights, Yorktown Heights, NY, USA

Bio

I am a principal research scientist and manager of the data engineering for watsonx group at IBM Research. The group consists of research staff members and senior software engineers who conduct basic and applied research in large-scale data engineering for large language models (Generative AI), BigData platforms & applications, and applied machine learning. In 2016, I was designated an IBM Master Inventor.

Open-source Data Prep Kit: a framework, toolkit, and set of transforms for preparing unstructured data (text & code), aimed at LLM application developers.

  • Production-ready and scales from a laptop to a datacenter
  • Pure-Python / Ray / Spark runtimes
  • Low-code/no-code pipeline orchestration through cloud-native KubeFlow Pipelines

Data Prep Kit is used to prepare training data for IBM's Granite language & code models: https://www.ibm.com/granite

Technology that my group and I developed has been transferred and productized in the following products / SaaS offerings:

  1. IBM Watson IoT Platform - Edge Analytics Agent (transferred the software library for embeddable, streaming time series predictive analytics)
  2. IBM Cloud Operations Analytics - Predictive Insights (developed the software library with time series forecasting models used for dynamic baselining and prediction: video)
  3. IBM InfoSphere Streams, Time Series Analysis Toolkit for Streams (developed streaming time series forecasting models including Holt-Winters, ARIMA, BATS, seasonal-trend decomposition for Streams: link)
  4. IBM Regulatory Compliance Analytics (we developed the cognitive service that analyzes regulations and detects obligations/requirements: video) and IBM Watson Compare & Comply
  5. IBM Cloud Infrastructure for Analytics - Secure Hadoop (link)

My research interests lie in the areas of:

  • BigData and Machine Learning Platforms and Applications, Analysis of Time Series and Unstructured Data
  • Cloud Computing and Service Management
  • Mobile Systems, Applications and Services

A list of my patents and publications can be found here.

- Our recent work on 'The Squawk Bot: Joint Learning of Time Series and Text Data Modalities for Automated Financial Information Filtering' has been accepted for publication at the 29th International Joint Conference on Artificial Intelligence (IJCAI-PRICAI 2020), Special Track on 'AI in FinTech', in Jun. 2020!

- Our recent work on 'seq2graph: Discovering Dynamic Dependencies from Multivariate Time Series' received the Best Application Paper Award at the IEEE International Conference on BigData 2019 (BigData 2019)! A pre-print can be found on arXiv here.

- Our paper 'Dependency Analysis of Cloud Applications for Performance Monitoring using Recurrent Neural Networks' has been accepted for publication at the IEEE BigData 2017 conference.

- Our paper 'Ranking the Importance of Ontology Concepts Using Document Summarization Techniques' was presented at the IEEE BigData 2017 conference.

- Our paper 'Data-at-Rest Security for Apache Spark' (with Syed Yousaf Shah and Brent Paulovicks) received the Best Application Paper Award at the IEEE International Conference on BigData 2016.

- In 2009, I co-founded and served as General co-Chair of the International Conference on Mobile Computing, Applications and Services (MobiCASE 2009), and I subsequently served on its steering committee (2010-2016).

- I served as Guest co-Editor of the ACM/Springer Mobile Networks and Applications (MONET) Journal, Special Issue on Advances in Mobile Applications and Provisioned Services (2011), and as Associate Editor of the ACM/Springer Wireless Networks Journal (2010-2011).

- I received a Ph.D. and an M.Sc. in Computer Science from UCLA in 2005 and 2003, respectively, and an M.Eng. in Electrical and Computer Engineering from the National Technical University of Athens (NTUA), Greece, in 1999.

More details can be found in my CV (last update: Apr. 2020)

Publications

Patents

Top collaborators

An Vo
Staff Research Scientist - AI4Code - AI for IT Automation
Amith Singhee
Director, IBM Research India; CTO, IBM India / South Asia
Yi Zhou
Research Staff Member, Master Inventor