About Me
I am a Senior Software Engineer at Couchbase Inc.
My work targets the storage engines in Big Data Management Systems. In particular, I work on reducing the storage size to accelerate scan-based analytical workloads for document store systems.
Education
PhD, Computer Science
2017 - 2022
University of California, Irvine
Thesis: Towards Analytics-optimized Document Stores
Supervisor: Michael J. Carey
MSc, Computer Science
2011 - 2013
University of California, Irvine
Thesis: NUMA-aware Multicore Matrix Multiplication
Supervisor: Isaac D. Scherson
BSc, Computer Science
2004 - 2008
King Saud University
Experience
Senior Software Engineer
August 2022 - Present
Couchbase Inc.
www.couchbase.com
Role: Working on several optimizations for storage and query processing
Software Engineer Intern
June 2021 - September 2021
Couchbase Inc.
www.couchbase.com
Task: Added support for querying Parquet files from various sources (e.g., S3) in Couchbase Analytics, including pushdown of field accesses.
Committer
2016–Present
Research Associate
Research Affiliate
2014–2017
King Abdulaziz City for Science and Technology (KACST)
Massachusetts Institute of Technology (MIT)
cces.kacst.edu.sa
Center for Complex Engineering Systems (CCES) – Institute for Data, Systems and Society (IDSS)
Role: Developing tools capable of harnessing and analyzing large-scale data
Projects: AsterixDB-Spark Connector, CityDynamics, Connected Intelligence Platform, Integrated Transportation Systems, Innovation Space
Associate Software Engineer
2008–2009
Advanced Electronics Company (AEC)
Research and Development Department (R&D)
Role: Developing components that connect electric and water smart-meters to the data collection units
Projects: ADDAD4, Water Smart Meter
Projects
LSM-based Tuple Compaction Framework
We proposed a new mechanism to leverage LSM-lifecycle events to infer the
schema and semantically compact self-describing semi-structured records
automatically. We also introduced a novel semi-structured record physical
format for efficient construction and compaction. Using Apache AsterixDB, we
were able (combined with our implementation of page-level compression) to
reduce the storage size by 9.8x and improve the query performance by the same
factor.
Paper: Extended version in arXiv
Columnar Formats for Schemaless LSM-based Document Stores
In this project, we propose several techniques based on piggy-backing on Log-Structured
Merge (LSM) tree events and tailored to document stores to store document data in a
columnar layout. We first extend the Dremel format,
a popular on-disk columnar format for semi-structured data, to comply with document stores’ flexible data model.
We then introduce two columnar layouts for organizing and storing data in LSM-based storage.
Paper: Extended version in arXiv
A Code Generation Technique for Schema-on-read Databases using Truffle
In this project, we explore using query compilation techniques for document stores, where value types are not known until runtime. We use Oracle’s Truffle framework to implement an internal language for processing data stored in a Java-based document store. Even though we only translate part of a query plan, our evaluations show a substantial improvement over AsterixDB’s vectorized (batch-at-a-time) model.
Awards
PhD Scholarship
2017–2022
Awarded a full graduate scholarship by the King Abdulaziz City for Science and Technology.
MSc Scholarship
2010-2013
Awarded a full graduate scholarship by the King Abdullah Scholarship Program.
Second Class Honor
2004-2008
Awarded Second Class Honor for high GPA by King Saud University.
Publications
W. Alkowaileet & M. J. Carey.
Columnar Formats for Schemaless LSM-based Document Stores
PVLDB, 15(10), 2085-2097, 2022, Extended Version
W. Alkowaileet, S. Alsubaiee & M. J. Carey.
An LSM-based Tuple Compaction Framework for Apache AsterixDB
PVLDB, 13(9), 1388-1400, 2020, Extended Version
W. Alkowaileet, S. Alsubaiee, M. J. Carey, C. Li, P. Sinthong & W. Wang.
End-to-End Machine Learning with Apache AsterixDB
In Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning, 2018
W. Alkowaileet, S. Alsubaiee, M. J. Carey, T. Westmann & Y. Bu.
Large-scale Complex Analytics on Semi-structured Datasets using AsterixDB and Spark
PVLDB, 9(13), 1585-1588, 2016 (Demo)
W. Alkowaileet, D. Carrillo-Cisneros, D. Lim & I. D. Scherson.
NUMA-aware Multicore Matrix Multiplication
Parallel Processing Letters, 24(04), 1450006, 2014