Guilherme S. Imai Aldeia, MSc, PhD
Guilherme earned his PhD in December 2025 and researches genetic programming and large language models for health applications. He completed three majors at Universidade Federal do ABC, a Brazilian federal university known for academic quality and international partnerships. His work includes developing methods for symbolic regression, applying machine learning to fMRI data, and advancing explainable AI. He has experience in digital signal processing and has led computational experiments that resulted in publications. His interests include evolutionary computing, model interpretability, healthcare prediction, brain data analysis, and projects that improve quality of life.
EDUCATION
Universidade Federal do ABC
Universidade Federal do ABC
Universidade Federal do ABC
Universidade Federal do ABC
Universidade Federal do ABC
EXPERIENCE
Boston Children's Hospital, Harvard Medical School
- Postdoctoral researcher at Cavalab, BCH, and Pediatrics, HMS. Contributed to multiple research projects.
Universidad de Valparaíso
- Attended a three-week summer school. Developed a project simulating neuron populations in the rat visual cortex, based on literature review of neuron types and pathways.
FIAP
- Taught AI and ML courses online and in person. Prepared materials, exams, and mentored annual projects in partnership with tech companies. Emphasized hands-on, practical learning.
Boston Children's Hospital
- Participated in lab meetings, journal clubs, and health science projects. Worked with data, received HIPAA/PHI training, contributed to grant reports, and gained SLURM experience.
Universidade Federal do ABC
- Assisted in the Algorithm Complexity course. Prepared exercises and answer keys, taught mathematical proofs, and provided weekly student support.
Universidade Federal do ABC
- Developed programming exercises and answer keys in Python and Java. Created test cases for automated grading. Collaborated weekly with the professor on course content.
Universidade Federal do ABC
- Developed Arduino code for sensor integration and telemetry in rocket embedded systems. Managed flight data collection, communication, and recovery system activation.
Universidade Federal do ABC
- Conducted research in genetic programming for symbolic regression. Published and collaborated on multiple projects. Designed experiments and analyzed results.
RESEARCH SUMMARY
- 14 publications, 190+ citations since 2020.
- Two journal articles (IEEE Trans. Evol. Comput., Genet. Program. Evolvable Mach.)
- Presented at international conferences (GECCO, ML4HC, WCCI CEC).
- 12 journal and 13 conference reviews since 2022, for journals (IEEE Trans. Evol. Comput., Royal Society) and conferences (IEEE WCCI, IEEE CAI).
- Evaluator for three years at Feira Brasileira de Ciência e Engenharia (FEBRACE), the largest Latin American science fair.
- Contributed to Met@Aprendizagem, a project to improve the programming skills of undergraduate students from different majors.
- Volunteered for two years at UFABC para todos, encouraging students to pursue higher education.
TECHNICAL SKILLS
- Fluent in Python, C, C++, Julia, Java, JavaScript, Arduino.
- Experienced in Haskell, MATLAB, R, Octave, TypeScript, HTML, CSS, SASS, PHP, C#, .NET, SQL.
- Comfortable with object-oriented, functional, imperative, declarative, meta-programming, and multiple dispatch paradigms.
- Linux user for 10+ years (Bash, system management, virtual environments).
- Git for collaborative development, project management, code review.
- LaTeX for writing and formatting; cloud computing with SSH, Docker, SLURM.
- Quick learner with strong critical thinking, problem-solving, and adaptability; proactive and self-motivated.
- Data analysis and visualization (advanced plotting of multimodal data).
- Statistics (t-test, ANOVA, p-value correction, sample size, confidence intervals, Fisher information).
- Big data: mining, pipelines, standardization, harmonization, structured and unstructured databases.
- Integrations: building and consuming REST APIs, webhooks, CI/CD.
- Complex projects: contributed to codebases with 100+ files and 30k+ lines. Experience with test-driven development and large test suites. Developed and published Python libraries.
- Reproducibility: reading papers, source code, and replicating computational experiments.
- Portuguese (native), English (fluent), Spanish (basic).
PROJECTS
- Developed computable phenotypes and medical calculators from EHR data using large language models. Evaluated both proprietary and open-source models for medical tasks. Combined black-box optimization with LLM-generated code to improve sensitivity and specificity.
- Generated computable phenotypes for hypertension from EHR data using symbolic regression and language models. Designed and evaluated prompts, and combined symbolic regression with LLMs for code generation. Assessed phenotype performance on EHR data.
- Simulated neural populations to estimate Fisher information and study how correlations encode auditory stimuli. Modeled brain pathways using neuron populations with varying firing rates.
- Trained and interpreted machine learning models on pediatric fMRI data to study pain mechanisms. Used data augmentation, various processing pipelines, and explainable AI to identify key brain regions. Validated findings by predicting brain age and pain scores.
- Enhanced accuracy and interpretability in symbolic regression. Developed new methods for parent selection, parameter optimization, and function simplification. Conducted large-scale benchmarking to assess computational cost and solution quality.
- Developed new explanation methods for symbolic regression using partial effects. Created a Python library for symbolic regression and robust explanations. Benchmarked explanatory methods for stability and reliability in feature importance.
- Predicted brain development three years ahead using graph theory and fMRI data. Applied machine learning to functional connectivity and graph centralities. Prepared fMRI data using standard preprocessing techniques. Predicted future psychopathology scores.
- Applied seismic signal processing to localize sound sources using an array of eight microphones. Designed algorithms to identify victims in hard-to-access areas for drone-assisted rescue. Performed sensitivity analysis and submitted the work to the IEEE DSP competition.
- Proposed the Interaction-Transformation Evolutionary Algorithm (ITEA) for symbolic regression. Generated closed-form expressions and recovered known physics equations from data. Developed an online interface for model editing and visualization.
AWARDS
Association for Computing Machinery
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Association for Computing Machinery
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Universidade Federal do ABC
Brazilian Symposium on Information Systems (CTCCSI)
Universidade Federal do ABC
PUBLICATIONS
Aldeia, G. S. I., Romano, J. D., de França, F. O., Herman, D. S., & La Cava, W. G. (2025). Towards symbolic regression for interpretable clinical decision scores.
Aldeia, G. S. I., Herman, D. S., & La Cava, W. G. (2025). Iterative Learning of Computable Phenotypes for Treatment Resistant Hypertension using Large Language Models. Proceedings of Machine Learning Research, 298, 1–31. https://proceedings.mlr.press/v298/aldeia25a.html
Aldeia, G. S. I., Zhang, H., Bomarito, G., Cranmer, M., Fonseca, A., Burlacu, B., La Cava, W. G., & de França, F. O. (2025). Call for Action: towards the next generation of symbolic regression benchmark. Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO ’25 Companion), 2529–2538. https://doi.org/10.1145/3712255.3734309
Aldeia, G. S. I., Moon, C., Shulman, J., Sethna, N., Smith, A., Lebel, A., La Cava, W. G., & Holmes, S. (2025). Application of Artificial Neural Networks and Functional Brain Connectivity to Inform Pediatric Headache. The Journal of Pain, 29. https://doi.org/10.1016/j.jpain.2025.105140
Aldeia, G. S. I., de França, F. O., & La Cava, W. G. (2024). Inexact Simplification of Symbolic Regression Expressions with Locality-sensitive Hashing. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’24), 896–904. https://doi.org/10.1145/3638529.3654147
Aldeia, G. S. I., de França, F. O., & La Cava, W. G. (2024). Minimum variance threshold for epsilon-lexicase selection. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’24), 905–913. https://doi.org/10.1145/3638529.3654149
Aldeia, G. S. I., & de França, F. O. (2022). Interpretability in symbolic regression: a benchmark of explanatory methods using the Feynman data set. Genetic Programming and Evolvable Machines, 23, 309–349. https://doi.org/10.1007/s10710-022-09435-x
Aldeia, G. S. I., & de França, F. O. (2022). Interaction-transformation evolutionary algorithm with coefficients optimization. Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO ’22), 2274–2281. https://doi.org/10.1145/3520304.3533987
de França, F. O., & Aldeia, G. S. I. (2021). Interaction–Transformation Evolutionary Algorithm for Symbolic Regression. Evolutionary Computation, 29(3), 367–390. https://doi.org/10.1162/evco_a_00285
Aldeia, G. S. I., & de França, F. O. (2020). A Parametric Study of Interaction-Transformation Evolutionary Algorithm for Symbolic Regression. IEEE Congress on Evolutionary Computation (CEC), 1–8. https://doi.org/10.1109/CEC48606.2020.9185521
Spadini, T., Aldeia, G. S. I., & others. (2019). On the application of SEGAN for the attenuation of the ego-noise in the speech sound source localization problem. 2019 Workshop on Communication Networks and Power Systems (WCNPS), 1–4. https://doi.org/10.1109/WCNPS.2019.8896308
Aldeia, G. S. I., & de França, F. O. (2018). Lightweight Symbolic Regression with the Interaction-Transformation Representation. IEEE Congress on Evolutionary Computation (CEC), 1–8. https://doi.org/10.1109/CEC.2018.8477951