Attention Is All You Need

A lot of the conversation around AI centers on two questions:

What is this going to replace?

How do I keep from falling behind?

Those questions matter. But the question that’s been on my mind lately is different:

What is this teaching us about ourselves?

One of my first AI projects was a custom search engine. This was in 2023.

I was building it for SQSPThemes, but it opened another door. Suddenly, I was reading papers, tracing ideas, and seeing technical conversations unfold in real time.

It got me curious about the ideas underneath the tools.

The first idea that really grabbed me was embeddings.

I had used search my whole life, but this was different. Traditional search looked for matching words. Vector search looked for patterns of similarity. It could find things that belonged near each other even when they didn’t use the same language.

A phrase, a paragraph, a document, a question — all of it could be turned into an embedding. A point in a high-dimensional space.

That idea stayed with me.

Meaning could be mapped.

Similar ideas could live near each other.

A search engine could return something relevant because it understood the neighborhood of meaning, not just the exact words.
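To make "the neighborhood of meaning" concrete, here is a minimal sketch of vector search. It is an illustration, not the search engine described above. The three-number vectors are invented by hand (a real embedding model would produce hundreds or thousands of dimensions), and the document titles and the `search` helper are hypothetical. The similarity measure is cosine similarity, one common choice.

```python
import math

# Hypothetical, hand-made embeddings. A real system would get these
# from an embedding model rather than typing them in.
DOCS = {
    "How to change your site's fonts": [0.9, 0.1, 0.0],
    "Customizing typography in a template": [0.8, 0.2, 0.1],
    "Setting up online payments": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, docs, top_k=2):
    """Rank documents by similarity to the query and keep the closest ones."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in docs.items()
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


# Hypothetical embedding for a query like "make my text look different".
# The query shares no keywords with the titles, only a nearby position.
query = [0.85, 0.15, 0.05]
results = search(query, DOCS)
print(results)
```

The two typography documents come back and the payments document does not, even though the query never uses the words "font" or "typography." Closeness in the space does the work that matching words used to do.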

That felt like more than a technical shift. It felt like a philosophical one.

Because once I started thinking about meaning as relationship, I kept seeing it everywhere.

That thread eventually led me back to the 2017 paper Attention Is All You Need.

The breakthrough

At the time, one of the big problems in AI was machine translation.

How do you get a machine to take a sentence in one language and produce the equivalent in another?

That sounds simple until you think about what translation actually requires.

Take the word “bank.”

Next to “loan,” it points to money.

Next to “river,” it points to land.

Same word. Different meaning.

The word didn’t change. The relationship changed.

That’s how language works. A word gets its meaning from what surrounds it. To translate well, the machine has to understand those relationships.

Before the Transformer, many language models moved through language in sequence or searched for meaning in small windows. That worked, but it had a limitation. A word near the beginning of a sentence can change the meaning of a word much later, and the system could lose that relationship.

Then came Attention Is All You Need.

The paper proposed a different approach. Instead of treating language like a line, the model could treat it more like a field. Each word could be understood by weighing its relationship to the other words around it.

Some words mattered more than others. Some changed the meaning completely.

The mechanism is called attention. The idea predates the paper, but the paper made it the core of the model.

Attention, in this sense, is weighted perception. A way of assigning importance to relationships inside the input.

That’s what turned language from a line into a field.
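For readers who want to see the weighting itself, here is a small sketch of scaled dot-product attention, the calculation at the center of the paper, applied to the "bank" example. It makes several simplifying assumptions: the two-number word vectors are made up (the first number loosely stands for "money," the second for "water"), and the same vectors serve as queries, keys, and values. A real Transformer learns separate projections for each and runs many attention heads in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in keys
    ]
    weights = softmax(scores)
    # The output is a blend of the value vectors, mixed by the weights.
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


# Made-up vectors: [money-ness, water-ness]. Not from a trained model.
words = ["the", "river", "bank"]
vectors = {
    "the": [0.1, 0.1],
    "river": [0.0, 2.0],
    "bank": [1.0, 1.0],  # ambiguous on its own
}

keys = values = [vectors[w] for w in words]
weights, output = attend(vectors["bank"], keys, values)

for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
print("bank in context:", [round(x, 2) for x in output])
```

In this toy setup, "bank" gives far more weight to "river" than to "the," and the blended result leans toward the water side. The word did not change; what it was weighed against did.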

The architecture the paper introduced was the Transformer. And it became the foundation for much of the modern AI we use today.

Now here’s where it gets interesting.

The machine became impressive because it got better at perceiving relationships.

It learned that meaning doesn’t live inside a word alone.

Meaning lives in context.

Meaning lives in relationship.

And if that’s what made the machine look intelligent, it’s worth asking what we have been calling intelligence all along.

The wrong definition

Most of us were taught to recognize intelligence by output.

The correct answer.

The finished deliverable.

The shipped project.

The thing you could point to and say, “See? I know what I’m doing.”

Think about the early response to AI. The first thing people said was, “It hallucinates.” It gets things wrong. It makes stuff up.

We judged the machine the same way school judged us.

Did you produce the correct output?

That’s the definition of intelligence most of us were handed and never questioned. And why would we? It worked. It got us through school, got us hired, got us clients.

We carried that training with us. We learned to tie our value to what we could produce.

Then something else came along that could produce.

That is the unsettling part.

The lesson is not that intelligence disappeared.

The lesson is that intelligence was never only in the output.

The missing Transformer

A lot of our businesses are full of parts that don’t relate.

The offer lives in one room.

Sales calls live in another.

Support requests live somewhere else.

Search data, objections, testimonials, client results — all of it exists.

But none of it is informing the rest.

The business is producing output. But it’s not yet producing intelligence.

That’s the missing Transformer.

One reason this happens is borrowed intelligence.

We follow the step-by-step course. We copy the funnel. We borrow the launch sequence. We imitate what seems to be working nearby.

Those moves can help. They give us traction. They give us clues.

They don’t give us the field.

Someone else’s sequence was built from someone else’s terrain — their market, their timing, their audience, their constraints.

When we imitate what’s visible, we can’t see the hidden relationships that made the move work.

We copy the move we can see and miss everything underneath it.

That’s how a business can produce a lot of activity without becoming more intelligent.

The parts only become intelligent when they start informing each other.

An offer only means something next to a problem.

A price only means something next to risk.

A sales page only works when its language matches the buyer’s lived experience.

That’s what business intelligence actually is.

Attention.

Seeing what matters in relation to everything else.

And in business, attention becomes a practice.

Delivery teaches marketing.

Sales reveals what the offer really is.

Search shows demand.

Praise points to value.

A repeated objection becomes positioning. A support request becomes product development. A search phrase becomes content strategy. A client result becomes proof.

The intelligence appears when these things stop living in separate rooms.

That’s what’s at stake right now.

Output is cheap. Perception isn’t.

It can’t be fully outsourced because it depends on your specific context — your market, your buyers, your history, your constraints.

Where is the relationship between the offer and the problem?

How does the price change once you understand the risk?

Why is the sales page missing the buyer’s lived experience?

What is the objection revealing about the promise?

What did delivery teach you that marketing still hasn’t absorbed?

AI can surface patterns and generate options.

But deciding which relationships matter here is still your job.

The machine can help you produce.

It cannot be responsible for your perception.

The better question is what old belief AI has already made impossible to keep.

Output was never the whole of intelligence.

The machine learned to attend to relationships in language.

A business compounds when the person leading it learns to attend to relationships in context.

Omari Harebin

Omari Harebin is the founder of SQSPThemes.com — a curated hub of tools, templates, and mentorship for Squarespace designers and developers. With over a decade in the ecosystem and nearly $2M in digital product sales, he helps creatives turn client work into scalable assets and more freedom in their business.

https://www.sqspthemes.com