This paper proposes a speaker estimation method that uses two potential maps, one of human positions and one of sound-source positions, generated from a wide-angle camera and superdirective microphones mounted on the ceiling. Observing natural conversation in a room over long periods requires the sensor system to be built into the room itself, and our method is designed for this setting. Using potential maps makes it possible to estimate the speaker's position even from noisy sound data. A comparison between the speakers estimated by our method and manually annotated ground-truth speakers in an actual conversation demonstrates the effectiveness of our method.
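
As a rough illustration of how two potential maps might be combined, the following minimal sketch scores each detected person by the overlap between a human-position map and a sound-position map. The abstract does not specify how the maps are built or combined, so everything here is an assumption made for illustration: the Gaussian kernels, the product-and-sum overlap score, the grid size, the `sigma` value, and the hypothetical names `gaussian_map` and `estimate_speaker`.

```python
import math


def gaussian_map(points, width, height, sigma):
    """Build a grid potential map as a sum of Gaussian bumps, one per point.

    Assumption: points are (x, y) positions in grid-cell coordinates.
    """
    grid = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - px) ** 2 + (y - py) ** 2
                grid[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    return grid


def estimate_speaker(people, sound_hits, width=20, height=20, sigma=1.5):
    """Return the index of the person whose map overlaps the sound map most.

    Assumption: the overlap is the sum over cells of the product of the two
    maps. Returns None when there are no people or no sound observations.
    """
    if not people or not sound_hits:
        return None
    sound_map = gaussian_map(sound_hits, width, height, sigma)
    best_idx, best_score = None, 0.0
    for idx, person in enumerate(people):
        human_map = gaussian_map([person], width, height, sigma)
        score = sum(
            human_map[y][x] * sound_map[y][x]
            for y in range(height)
            for x in range(width)
        )
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx


# Hypothetical data: two people, and sound observations clustered near the
# second person plus one spurious (noisy) observation far from both.
people = [(5, 5), (14, 12)]
sound_hits = [(13, 12), (15, 11), (14, 13), (3, 17)]
speaker = estimate_speaker(people, sound_hits)
print(speaker)
```

Because each sound observation is spread into a smooth bump rather than treated as an exact location, a single stray observation contributes little to any person's score, which is one plausible reading of why a map-based formulation tolerates noisy sound data.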