Addressing Catastrophic Risks in AI Development

As artificial intelligence continues to advance at an unprecedented rate, it's crucial that we address the potential catastrophic risks associated with its development. At Trajectory Labs, we're committed to exploring and implementing safeguards to ensure that AI remains aligned with human values and interests.

Key Areas of Concern
- Alignment Problem: Ensuring AI systems' goals and actions align with human values.
- Robustness: Developing AI that performs reliably in various environments and scenarios.
- Interpretability: Creating AI systems whose decision-making processes can be understood and audited by humans.

Our Approach
At Trajectory Labs, we're tackling these challenges through: ...

September 20, 2024 · 1 min · 156 words · Trajectory Labs Team

An Overview of the Frontier AI Labs

Link to sign up Getting here: Enter the lobby at 100 University Ave (right next to St Andrew subway station), and message Giles Edkins on the meetup app or call him on 647-823-4865 to be let up to room 6H. What are the different frontier AI labs’ attitudes towards safety? What kinds of safety agendas (if any) are they prioritizing? Annie will guide us. We welcome a variety of backgrounds, opinions and experience levels. ...

September 20, 2024 · 1 min · 74 words · Trajectory Labs Team

Epoch AI: Can AI Scaling Continue Through 2030?

Link to sign up Getting here: Enter the lobby at 100 University Ave (right next to St Andrew subway station), and message Giles Edkins on the meetup app or call him on 647-823-4865 to be let up to room 6H. Epoch AI recently released a report estimating how much AI training runs could continue to scale up through 2030. They could get big, and expensive, and deliver a lot in terms of increased capabilities. ...

September 20, 2024 · 1 min · 110 words · Trajectory Labs Team

Hallucination Detection & Interpretability

Link to sign up Getting here: Enter the lobby at 100 University Ave (right next to St Andrew subway station), and message Giles Edkins on the meetup app or call him on 647-823-4865 to be let up to room 6H. We all know that Large Language Models (LLMs) can confidently emit falsehoods, a phenomenon known as hallucination. Joshua Carpeggiani will tell us about some interpretability methods - peering into the insides of the model and making sense of what we see - that might help detect and correct hallucinations. ...

September 20, 2024 · 1 min · 99 words · Trajectory Labs Team

How to Contribute to AI Risk Reduction

Link to sign up Getting here: Enter the lobby at 100 University Ave (right next to St Andrew subway station), and message Giles Edkins on the meetup app or call him on 647-823-4865 to be let up to room 6H. Some of you want to help out, and that’s great! Giles will guide us through some of the best resources available today for those of us wanting to address the most severe harms from AI. The focus will be on how to start doing cool stuff quickly, and how to stay networked with the community. ...

September 20, 2024 · 1 min · 104 words · Trajectory Labs Team