Attention Is All You Need
A lot of the conversation around AI centers on two questions.
What is this going to replace?
How do I keep from falling behind?
Those questions matter. But the question that’s been on my mind lately is different.
What is this teaching us about ourselves?
One of my first AI projects, back in 2023, was a custom search engine for SQSPThemes. I thought I was building a better way for people to find tutorials, products, and answers. But the project opened another door. Suddenly, I was reading papers, tracing ideas, and watching technical conversations unfold in real time.
It made me realize the tools were not the most interesting part. The ideas underneath them were.
The first idea that really grabbed me was embeddings.
I had used search my whole life, but this was different. Traditional search looked for matching words. Vector search looked for closeness in meaning. It could find things that belonged near each other even when they didn’t use the same words.
A phrase, a paragraph, a document, a question: all of it could be turned into an embedding, a list of numbers that marks a point in a high-dimensional space.
That idea stayed with me because it suggested something bigger than search.
Meaning could be mapped.
Similar ideas could live near each other.
A search engine could return something relevant because it understood the neighborhood of meaning, not just the exact words.
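To make the "neighborhood of meaning" concrete, here is a minimal sketch of vector search in plain Python. It is not how the SQSPThemes search engine was built. The document titles and three-dimensional vectors are hypothetical and hand-picked for illustration; a real system would get vectors with hundreds or thousands of dimensions from an embedding model. Closeness is measured with cosine similarity, one common choice among several.

```python
import math

# Hypothetical, hand-made 3-dimensional "embeddings" for illustration only.
# A real embedding model would produce much longer vectors.
DOCUMENTS = {
    "How to change your site font": [0.9, 0.1, 0.0],
    "Customizing typography in your template": [0.8, 0.2, 0.1],
    "Setting up a checkout page": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, documents, top_k=2):
    """Rank documents by how close their vectors sit to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:top_k]]


# Pretend this is the embedding of the query "change my fonts" (hypothetical).
query = [0.85, 0.15, 0.05]
print(search(query, DOCUMENTS))
```

The typography page ranks second even though it shares no words with "change my fonts". Nothing in the ranking looks at words at all; it only asks which points sit near the query.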
That felt like more than a technical shift. It felt like a philosophical one. Once I started thinking about meaning as relationship, I kept seeing it everywhere.
That thread eventually led me back to the 2017 paper Attention Is All You Need.
The breakthrough
At the time, one of the big problems in AI was machine translation.
How do you get a machine to take a sentence in one language and produce the equivalent in another?
That sounds simple until you think about what translation actually requires.
Take the word “bank.”
Next to “loan,” it points to money.
Next to “river,” it points to land.
Same word. Different meaning.
The word didn’t change. The relationship changed.
That is how language works. A word gets its meaning from what surrounds it. To translate well, the machine has to understand those relationships.
Before the Transformer, many language models read a sentence one word at a time, in order, or looked for meaning through small windows of neighboring words. That worked, but it had a limitation. A word near the beginning of a sentence can change the meaning of a word much later, and by the time the model reached that later word, the earlier relationship could be diluted or lost.
Then came Attention Is All You Need.
The paper proposed a different approach. Instead of treating language like a line, the model could treat it more like a field. Each word could be understood by weighing its relationship to every other word in the input, near or far.
Some words mattered more than others. Some changed the meaning completely.
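This mechanism is called attention. Attention itself was not new; earlier translation models already used it alongside step-by-step processing. The paper’s bet, captured in its title, was that attention alone was enough.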
Attention, in this sense, is weighted perception. A way of assigning importance to relationships inside the input.
That is what turned language from a line into a field.
That architecture became the Transformer, and the Transformer became the foundation for much of the modern AI we use today.
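The core of the mechanism fits in a few lines. The paper defines scaled dot-product attention as softmax(QK^T / sqrt(d_k)) V: each word's query is compared with every word's key, the scores become weights that sum to 1, and each word's output is a weighted mix of the values. The sketch below is a simplification under stated assumptions. The two-dimensional vectors for "the", "river", and "bank" are hypothetical, and the same vectors serve as queries, keys, and values, whereas a real Transformer learns separate projections for each and runs several attention heads in parallel.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability; the weights still sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Score this query against every key, not just nearby ones.
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [
            sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy vectors; a trained model would learn these.
tokens = ["the", "river", "bank"]
vectors = [
    [0.1, 0.1],  # "the"
    [1.0, 0.0],  # "river"
    [0.7, 0.3],  # "bank"
]

# Simplification: reuse the same vectors as queries, keys, and values.
_, weights = scaled_dot_product_attention(vectors, vectors, vectors)
for token, w in zip(tokens, weights[2]):
    print(f"bank -> {token}: {w:.2f}")
```

With these toy numbers, "bank" puts its largest weight on "river" and its smallest on "the". That weighting is the sense in which the model perceives a relationship.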
Here is where it gets interesting.
The machine became impressive because it got better at perceiving relationships.
It learned that meaning does not live inside a word alone. Meaning lives in context. Meaning lives in relationship.
And if that is what made the machine look intelligent, it is worth asking what we have been calling intelligence all along.
The wrong definition
Most of us were taught to recognize intelligence by output.
The correct answer. The finished deliverable. The shipped project. The thing you could point to and say, “See? I know what I’m doing.”
Think about the early response to AI. One of the first things people said was, “It hallucinates.” It gets things wrong. It makes stuff up.
We judged the machine the same way school judged us.
Did you produce the correct output?
That is the definition of intelligence most of us were handed and never questioned. And why would we? It worked. It got us through school, got us hired, got us clients. We learned to tie our value to what we could produce.
Then something else came along that could produce.
That is the unsettling part.
The lesson is not that intelligence disappeared.
The lesson is that intelligence was never only in the output.
The missing Transformer
If intelligence is the ability to perceive relationships, then a lot of modern businesses are less intelligent than they look.
The issue is not a lack of activity.
A lot is happening.
The offer exists. The sales calls are happening. The support requests are coming in. Search data is available. Testimonials are sitting in folders. Client results are buried in old projects. Objections keep repeating. Praise keeps pointing somewhere.
The business is producing output.
But the parts are not informing each other yet.
That is the missing Transformer.
One reason this happens is borrowed intelligence.
We follow the step-by-step course. We copy the funnel. We borrow the launch sequence. We imitate what seems to be working nearby.
Those moves can help. They give us traction. They give us clues. But they do not give us the field.
Someone else’s sequence was built from someone else’s terrain — their market, their timing, their audience, their constraints.
When we imitate what is visible, we cannot see the hidden relationships that made the move work. We copy the move we can see and miss everything underneath it.
That is how a business can produce a lot of activity without becoming more intelligent.
The parts only become intelligent when they start informing each other.
An offer only means something next to a problem.
A price only means something next to risk.
A sales page only works when its language matches the buyer’s lived experience.
That is what business intelligence actually is.
Attention.
Seeing what matters in relation to everything else.
In business, attention becomes a practice.
Delivery teaches marketing. Sales reveals what the offer really is. Search shows demand. Praise points to value.
A repeated objection becomes positioning. A support request becomes product development. A search phrase becomes content strategy. A client result becomes proof.
The intelligence appears when these things stop living in separate rooms.
That is what is at stake right now.
Output is cheap. Perception is not.
AI can help surface patterns. It can generate options. It can summarize what happened, draft the page, rewrite the email, name the offer, organize the notes, and produce more variations than you could ever use.
But deciding which relationships matter here still belongs to the person leading the work.
How does the offer connect to the problem?
How does the price change once the risk is understood?
Why is the sales page missing the buyer’s lived experience?
What is the objection revealing about the promise?
What did delivery teach you that marketing still has not absorbed?
Those are not production questions.
Those are perception questions.
The machine can help you produce.
It cannot be responsible for your attention.
And maybe that is the old belief AI has finally made impossible to keep: that intelligence lives in the output.
Output was never the whole of intelligence.
Production is becoming easier to access, easier to automate, and easier to imitate.
But perception is still the moat.
The work now is not to make more noise.
The work is to recover your attention.