Giorgio Strano

CS PhD student at Sapienza University

./whoami

Hi! I’m Giorgio. I am a first-year PhD student at Sapienza, in Rome, at the Gladia lab, under the supervision of Prof. Emanuele Rodolà.

Currently, I am working on audio generative models, with a special focus on controllability, conditioning sources and strategies, and human interpretation of audio latent spaces.

I am also working as a machine learning engineer at TrueSound AI, a deep tech startup founded by my professor, focused on generative models for musicians.

After dark, I am a passionate music maker and listener, and a lover of photography and experimental art.

I love logic (a bit too much), reverse engineering, solving problems, and putting my skills to the test whenever I can.

~/uni

After completing my Bachelor’s in Computer Science at Sapienza with honors in 2021, I continued with a Master’s degree focused on theoretical computer science and artificial intelligence, with a particular interest in deep learning.

I graduated with honors from my Master’s in June 2024 and started my PhD in November 2024.

Bachelor’s thesis: Automated generation of optimized neural networks through reinforcement learning. Master’s thesis: Transformers that listen: latent autoregressive accompaniment generation.

Some of the topics that I care about the most in computer science include:

  • Multimodal deep learning, with particular interest in latent space analysis and interpretation, fully convolutional architectures, VQ-VAEs, diffusion models and transformers
  • Advanced algorithms
  • Graph theory
  • Intensive computation and quantum computing
  • Natural language processing
  • Computer graphics, especially physically based volumetric path-tracing

Here follows a list of some of my favorite projects from the last couple of years, with a brief description. Code, results, and other projects are on my GitHub.

~/dev

LAG - Latent Accompaniment Generation

private code

This started as my Master’s thesis project, in which I adapted and fine-tuned a transformer model to generate musical accompaniments. It currently vastly outperforms state-of-the-art open source models on the same task, and I am perfecting the model and preparing experiments for a publication.

Tripod is a small and portable, fully-convolutional deep learning model to sharpen and correct the focus of real-world photographs.

A physically based volumetric path tracer written in Julia from scratch with my friend and colleague Antonio Gargiulo. It is inspired by Yocto/GL, the rendering engine developed by our professor, Fabio Pellacini. It renders complex 3D scenes accurately, with almost negligible slowdown compared to a fairly optimized C++ implementation with equivalent capabilities.

This is a re-implementation of the paper World Models, with additional experiments and extended to different videogame environments.

During my NLP course, taught by Roberto Navigli, I tackled the challenge of GAP-coreference, achieving results very close to state-of-the-art with a distilled transformer that could fit in 8 GB of VRAM.

A transformer-free approach to named entity resolution, using bidirectional LSTMs on different types of non-contextualized embeddings (W2V, GloVe), improved with character embeddings and a Conditional Random Field (CRF).

An implementation from scratch (…meaning from vanilla PyTorch) of the Double Deep Q-Learning algorithm, applied to the original Super Mario Bros. videogame for the NES.

ANNRL

private code

For my Bachelor’s thesis, I developed this project with the Vision Lab of Sapienza: Adaptive Neural Networks Via Reinforcement Learning. It explores the idea of networks that can automatically resize during training to maximize the efficiency of crucial neurons and reduce the overhead of inactive ones, achieving automatic on-the-fly pruning and extension of networks based on the complexity of the task.