Stress testing is an important task in software testing, which examines the behavior of a program under a heavy load. Symbolic execution is a useful tool to find out the worst-case input values for the stress testing. However, symbolic execution does not scale to a large program, since the number of paths to search grows exponentially with an input size. So far, such a scalability issue has been mostly managed by pruning out unpromising paths in the middle of searching based on heuristics, but this kind of work easily eliminates the true worst case as well, providing sub-optimal one only. Another way to achieve scalability is to learn a branching policy of worst-case complexity from small scale tests and apply it to a large scale. However, use cases of such a method are restricted to programs whose worst-case branching policy has a simple pattern. To address such limitations, we propose PySE that uses symbolic execution to collect the behaviors of a given branching policy, and updates the policy using a reinforcement learning approach through multiple executions. PySE's branching policy keeps evolving in a way that the length of an execution path increases in the long term, and ultimately reaches the worst-case complexity. PySE can also learn the worst-case branching policy of a complex or irregular pattern, using an artificial neural network in a fully automatic way. Experiment results demonstrate that PySE can effectively find a path of worst-case complexity for various Python benchmark programs and scales.