If you want to solve an abstract dynamic decision problem, in principle you need to know all of the problem's primitives as well as everything you've done and observed in the past.

Think of all of this as your "state" variables.

Of course, many of these you won't need to track explicitly. For example, if your objective is just a present discounted sum of period utilities, then your period utility function and discount factor remain fixed over time, so you can keep them in the background rather than making your value function's dependence on them explicit.
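To make this concrete, here is a minimal sketch (a hypothetical cake-eating problem, with log utility and a discount factor chosen purely for illustration): the fixed primitives `u` and `beta` sit in the background as ordinary module-level constants, while the value function is indexed only by the changing state, the amount of cake remaining.

```python
import numpy as np

# Fixed primitives kept in the background: period utility u and
# discount factor beta never change, so V is a function of the
# changing state (cake remaining) alone.
beta = 0.95
u = np.log  # log period utility, assumed for illustration

grid = np.linspace(1e-3, 1.0, 100)  # possible cake sizes (the state)
V = np.zeros_like(grid)

# Value function iteration: V(x) = max_{c <= x} u(c) + beta * V(x - c)
for _ in range(500):
    V_new = np.empty_like(V)
    for i, x in enumerate(grid):
        feasible = grid[grid <= x]                    # consumption choices
        cont = np.interp(x - feasible, grid, V)       # continuation values
        V_new[i] = np.max(u(feasible) + beta * cont)
    V = V_new
```

Nothing about `u` or `beta` appears in the state: they are the same every period, so there is no point carrying them around inside `V`.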

Ok, now that you’ve stopped keeping track of all the things that remain fixed, you’re left with the stuff that changes over time (e.g. the history of actions and observations).

Typically, your problem will have a special structure that lets you compress this data into a few summary statistics. For example, if your payoff each period depends only on your action and an unknown persistent state of the world (over which you have a prior), then the history of actions and observations lets you compute a posterior over that state, and that posterior is a sufficient statistic for choosing the optimal action going forward.
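A minimal sketch of this compression, assuming Bernoulli (0/1) observations and a Beta prior over the unknown success probability: the entire history collapses to two counts, successes and failures, which pin down the posterior exactly.

```python
def posterior_params(history, a0=1.0, b0=1.0):
    """Beta posterior parameters from a list of 0/1 observations.

    The counts (successes, failures) are a sufficient statistic:
    the posterior depends on the history only through them.
    """
    s = sum(history)        # number of successes
    f = len(history) - s    # number of failures
    return a0 + s, b0 + f

history = [1, 0, 1, 1, 0, 1]
a, b = posterior_params(history)
posterior_mean = a / (a + b)  # E[theta | history]

# Any history with the same counts yields the same posterior,
# regardless of the order of observations:
assert posterior_params([0, 0, 1, 1, 1, 1]) == (a, b)
```

So instead of carrying around an ever-growing history, you only ever need to carry two numbers.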

These sufficient statistics are what you want to keep as your state variables. You want this set of state variables to be as small and simple as possible, without throwing out data you actually need to compute the optimal action in each period.
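Putting the pieces together, here is a hypothetical two-armed Bernoulli bandit where the state is exactly this kind of minimal sufficient statistic: four posterior counts rather than the full history of pulls and rewards. (The greedy rule below is just for illustration; an optimal policy would also weigh exploration.)

```python
# State: ((a1, b1), (a2, b2)) -- Beta posterior parameters per arm.
# This is everything the history tells you about the two arms.

def update(state, arm, reward):
    """Fold one 0/1 observation into the sufficient statistic."""
    a, b = state[arm]
    new = list(state)
    new[arm] = (a + reward, b + (1 - reward))
    return tuple(new)

def greedy_action(state):
    """Pick the arm with the highest posterior mean (illustrative rule)."""
    means = [a / (a + b) for a, b in state]
    return max(range(len(means)), key=lambda i: means[i])

state = ((1, 1), (1, 1))  # uniform Beta(1,1) priors on both arms
for arm, reward in [(0, 1), (1, 0), (0, 1), (1, 1)]:
    state = update(state, arm, reward)
# The four counts in `state` now summarize the entire history.
```

The state never grows with the length of the history, yet nothing needed for the next decision has been thrown away.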