A Personal How-To

Like other faculty, I often find myself explaining the process of research, and how to get started, to graduate students embarking on research. Different people have different views; like other researchers, I advise my students as best I know how.

Here is a brief presentation of the talk that I give my students, which may be useful to prospective or continuing students; if you do read it, remember that it is a personal view. It also necessarily relates a little to my own research area, because in very different fields the process may also be different (as in fields where empirical studies are a large component of research).

It is probably easier to view the material below if you resize your browser window to the widest possible size.

See also my tips on research reading after the presentation, or download it here.

Slide 1 image: text reads:
Introduction to the Process of Research: A Personal View.
Rudra Dutta, Computer Science, North Carolina State University.
 
Slide 2 image: text reads:
Research.
Ultimate step in self-motivated enquiry.
Increase one's own knowledge, and that of the community.
Motivation behind specific research question.
Applied - existing unsolved problem in state of practice.
Perceived hole in field of knowledge.
Expert - one who knows more about less.
To reach the frontier of knowledge, you have to specialize.
Therefore research question must be specific.
Group effort can address larger area.
As we go down the long road from first grade to Ph.D., the two basic changes are: (a) we take more and more responsibility for our own learning, instead of assuming it is on teachers and parents, and (b) more and more, the balance of our learning shifts to how to learn, rather than learning specific material.

As many people have pointed out, a funny thing happens if you extend the definition of an expert to the logical extreme after the manner of the Dirac Delta function: the best expert is somebody who knows "everything" about "nothing"! Realistically, we would stop short of this ideal, while there is still something we actually are an expert on!

Slide 3 image: text reads:
Stages of Research Project.
Literature.
Problem Definition.
Modeling.
Contribution.
Validation.
Presentation.
Archival.
The forward direction is obvious - you go through these stages successively. As the backward arrows indicate, it is often necessary to backtrack from the modeling to the problem definition or even the literature review stage. There is also a possible backtrack from the validation stage - if you find your wonderful new algorithm is 10 times worse than the well-known dumb one, you have to re-think your contribution. A particularly unpleasant backtrack is the one leading back from the archival stage, which can lead all the way back to square one, much like the biggest snake in "Snakes and Ladders". This happens when you submit your perfect paper to a conference or journal, and a reviewer points out some elementary mistake which invalidates the whole thing; or maybe they just say "this exact thing has already been published - see paper such-and-such." Unpleasant as it is, this happens on occasion; this is why we do not consider any research project finished until the results have been archived in some peer-reviewed forum.
Slide 4 image: text reads:
Literature Review.
Final stage in specialized reading/learning.
Several needs:
Learn about general research domain.
Identify open research questions.
Obtain background, knowledge of context.
Avoid unprofitable lines of enquiry.
Exhibit awareness of peer research, required for acceptance in broader community.
Reading research papers:
Understand structure of papers.
Broaden and narrow search for literature.
 
Slide 5 image: text reads:
Anatomy of a Research Paper.
Title, Abstract.
Introduction, Context, Background.
Prior work - literature survey.
Problem identification, definition, (formulation).
Problem modeling, formulation.
Approaches, solution, contribution, insight.
Validation, verification, experimentation, results.
Conclusion, speculation.
References, bibliography.
There is a line just before the "Problem modeling" part because the part of the paper before that is essentially tutorial in nature. If you read several papers on the same topic, the content of each up to this point is going to be very similar. This helps in understanding a research area, and saves on the reading effort.
Slide 6 image: text reads:
Making a Reading List.
Start with broad general area.
Static traffic network design, for example.
Start with initial list of papers.
Reputed journal, recent papers, classic papers.
Narrow down.
Eliminate papers outside general area.
Expand selectively.
Keyword search on databases (with elimination).
Trace reference tree backward (identify overlaps).
Trace reference tree forward (difficult).
Find survey paper/site (shortcut, must be definitive).
Keep up.
See also the "Reading for Research" section just after this presentation.
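The "trace reference tree backward (identify overlaps)" step on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not a tool I use: the `references` dictionary, the paper ids, and the function name `trace_backward` are all hypothetical, and in practice the reference lists would have to come from the papers themselves or from a citation database.

```python
from collections import Counter, deque


def trace_backward(seeds, references, max_depth=2):
    """Walk the reference tree backward from the seed papers, breadth first.

    Seeds are at depth 0; papers at depth < max_depth have their reference
    lists expanded. Returns a Counter mapping each discovered paper to the
    number of distinct expanded papers that cite it.
    """
    cited_by_count = Counter()
    visited = set(seeds)
    queue = deque((paper, 0) for paper in seeds)
    while queue:
        paper, depth = queue.popleft()
        if depth >= max_depth:
            continue  # reached the depth limit; do not expand further
        for ref in set(references.get(paper, [])):
            cited_by_count[ref] += 1
            if ref not in visited:
                visited.add(ref)
                queue.append((ref, depth + 1))
    return cited_by_count


# Hypothetical toy data: paper id -> ids of the papers it cites.
references = {
    "A": ["C", "D"],
    "B": ["C", "E"],
    "C": ["F"],
    "D": ["F"],
}

overlaps = trace_backward(["A", "B"], references, max_depth=2)
# Papers cited from two or more places are the "overlaps" worth reading first.
shared = sorted(paper for paper, count in overlaps.items() if count >= 2)
print(shared)
```

Papers that turn up from several starting points are likely to be the classic or foundational ones, which is why the overlaps are worth identifying.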
Slide 7 image: text reads:
Problem Definition.
Describe problem precisely.
Specify context of problem.
Broad context - less articulation.
Specific context - more complete description.
Specify scope of problem.
Define aspects of the problem that are not being addressed.
Specify motivation of problem.
State research question as briefly yet as specifically as possible.
There are two basic considerations that must be honored in finding a research problem:
  1. It must represent an actual problem somewhere - a "pain point" for some practitioner of some useful activity. Otherwise you will end up with a solution looking for a problem, even if your own research is very successful. In academia, we are comparatively relaxed about this; our goal is not to be able to affect the bottom line in six months (as it often is in industry). Nevertheless, we must not completely lose sight of this ultimate goal of all research - especially in a College of Engineers. This is what "motivation" basically refers to.
  2. It must be something that has not yet been found, obviously. Remember the perceived "hole" in the field of knowledge. Re-inventing the wheel, no matter how nice a color you paint it, counts as development, not as invention or discovery.

A couple of techniques that can help are:

  1. Think like an engineer, not a researcher. Ask yourself: "how can I do this now?" If the answer is completely available by tying together several pieces of existing knowledge and technology, there might not be a research problem there. If, on the other hand, there is a missing link in the chain, that might be your research problem.
  2. Try to describe the problem using text at various scales; try to express it in one paragraph, one page, one sentence, etc.
Slide 8 image: text reads:
Modeling.
Specify representation of the problem you will use.
What details will be represented, what details will be abstracted away.
What is method of representing problem aspects - set up correspondence between real world problem entities and model entities.
Notational details.
Restate problem in terms of model.
Specify if any aspect of the problem changes by the use of your representation.
 
Slide 9 image: text reads:
Contribution.
The new knowledge stage.
Usually involves the insight, inspiration, spark, aha!, lightbulb phenomenon.
Not possible to do on order, but not rare !
Never underestimate the role of perspiration.
Identify possible solution approaches.
Rule out unprofitable approaches, prioritize.
If not known previously, and is insightful enough, may constitute original contribution.
Lay a path to complete the approach.
Answers some question, original question or part thereof.
Continue! 
Sharpen the saw.
This is often the scary part. Many students back away from research because of this very thing, that the "spark" is not possible to produce on order. My favorite example here is that of lawn mowing. Suppose you have to mow your lawn, sometime in the next month. You cannot mow the lawn if it is raining, or if it rained in the last 24 hours. You obviously cannot produce two successive rain-free days on order. Will you despair?

The point is that things which are not possible to produce on order may nevertheless not be rare, especially if one puts oneself in a receptive position. In research, you can do this by reading in that area, and trying out simple ideas which you can produce without any "sparks". Remember the dictum that "luck" is just opportunity meeting preparedness. Without a lot of preparation, you will always have bad luck. But hard work is guaranteed to change your luck. Abraham Lincoln is credited as having said: "If I had eight hours to cut a tree, I would spend six of them sharpening the saw." Whenever you think you have nothing to try, sharpen the saw, don't underestimate the role of perspiration, and the lightbulb will come on, sooner or later.

Slide 10 image: text reads:
Validation.
Often the contribution consists of theoretical predictions, claims, etc.
For engineering/applied problems, real-world problem exists separate from abstract model.
Must run sanity check, some of which may also be theoretical.
At some point, you must put your implementation where your research is - experiment.
Sometimes possible to carry through on actual system/testbed.
Sometimes validation is through simulation.
Results of experimentation must be reported with scientific accuracy/rigor.
Very few of us are lucky enough to make intellectual advances that are so pure and abstract that they stand by themselves in the mathematical space. Most of us engineers do things that require validation - the ultimate proof of the pudding is in the eating.

Knowledge is not useful if it is not shared. Perhaps it is not even knowledge. Also, sharing provides a necessary step in validating - no matter how much validation you yourself carry out, you could be consciously or unconsciously fooling yourself. In science, we must honor the concept of experimental reproducibility - ideally, another researcher reading your paper should be able to reproduce your experiments just from the information in your paper.
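For validation through simulation, one simple habit that supports reproducibility is to fix and report the random seeds, and to report a measure of spread along with the average. The following is a minimal Python sketch under stated assumptions: `run_trial` is a hypothetical stand-in for whatever simulation you actually run, and the choice of ten seeds is arbitrary.

```python
import random
import statistics


def run_trial(seed):
    """Hypothetical stand-in for one simulation run; returns one metric."""
    rng = random.Random(seed)  # private generator, so the seed fully determines the run
    return sum(rng.random() for _ in range(1000)) / 1000


def summarize(seeds):
    """Run one trial per seed and collect what a reader needs to repeat it."""
    seeds = list(seeds)
    results = [run_trial(seed) for seed in seeds]
    return {
        "seeds": seeds,  # report these so others can rerun the same trials
        "n": len(results),
        "mean": statistics.mean(results),
        "stderr": statistics.stdev(results) / len(results) ** 0.5,
    }


report = summarize(range(10))
print(report)
```

Running `summarize` again with the same seeds gives the same report, which is the property another researcher needs in order to check your numbers.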

Slide 11 image: text reads:
Presentation.
Dissemination of research.
Usually aimed at peers in community.
More focus on high level view.
Motivation, definition, approach, contribution.
Some details may be appropriate, depending on audience.
Objective: to communicate your specific understanding of the research work.
Not to substitute for reading the report/paper.
Related to archival stage.
May contain unpleasant feedback to initial stages.
Attacks (in the clinical, vigorous discourse sense) must be expected in both these stages.
People don't have as much time as they would like. You yourself have very little time for others. Some of the others have even less time than that for you. If you will get only 20 minutes to present the research work that took you a year, it is worth a week or two to get ready to utilize those 20 minutes as well as possible.

Remember that in a technical presentation, the audience is expected to challenge you, and you are expected to address the issue raised clinically and correctly. Do not put anything on the slides or say anything that you cannot defend. You will be challenged and tested on your understanding of the topic you are presenting. In fact, if you get absolutely no questions during the presentation or at the end, this means you have completely failed, because nobody listened to what you said.

Note: Even the most experienced speaker flounders when trying to speak to accompanying slides without preparation and practice. Plan what you will say for each slide you have; practice if possible. Have speaker notes and/or the source papers handy if you think they will help.

Slide 12 image: text reads:
Archival.
Documentation of entire research project.
Report, paper.
Objective: completeness and unambiguity.
Must follow prevalent technical style.
Usually a few iterations required.
Structure: see slide 5 !
Usually goes through review process.
Job of reviewer: evaluate contribution as well as quality of exposition, play role of adversary.
Assume reviewer will misunderstand if it is possible to do so.
Job of author: make the description complete, motivating, impossible to misunderstand yet easy to understand.
Publication is often held as final validation of research.
Similar considerations as above. Language, grammar, spelling, all matter. It is not other people's privilege to read about your research - it is your privilege to have them read it. Make it as easy for them as possible, and as difficult to misunderstand as possible.
Slide 13 image: text reads:
The Role of Skepticism.
Skepticism vs. gullibility.
Keep an open mind, but not so open that your brains fall out.
Open mind - do not reject without verification.
Skepticism - do not fail to verify.
Baloney Detection Toolkit.
A scientist's skepticism aid.
Adapted from The Demon-Haunted World: Science as a candle in the dark, Carl Sagan, Random House, NY, 1995.
The last few slides, starting with this one, are a general overview of the scientific method. For a more detailed exposition, I can do no better than point to the book by Sagan that I have cited. I recommend it to anybody that has anything to do with science.

I hope this helped you. If it did, feel free to link to it or use the material otherwise; but please do credit the source.

Thank you.

Slide 14 image: text reads:
Baloney Detection Kit - Do...
Seek independent confirmation of so-called facts.
Encourage debate on the evidence.
Authority carries no weight in argument.
Consider multiple working hypotheses.
Avoid NIH syndrome.
Quantify.
Insist on complete chain of evidence.
Apply Occam's razor.
Set predefined failure standards for hypothesis.
Slide 15 image: text reads:
Baloney Detection Kit - Avoid...
Ad hominem (to the man).
Argument from authority.
Argument from adverse consequences.
Appeal to ignorance.
Special pleading.
Begging the question.
Observational selection.
Statistics of small numbers.
Slide 16 image: text reads:
Baloney Detection Kit - Avoid...
Inconsistency.
Non sequitur (it does not follow).
Post hoc, ergo propter hoc (after, therefore because).
Confusion of correlation and causation (together, therefore because).
Meaningless question.
Excluded middle (false dichotomy).
Short term - long term.
Slippery slope.
Straw man.
Suppressed evidence.
Weasel words.

Reading Literature for Research

This is a personal view on reading research papers with the purpose of getting started with some research area. It seems to work for some of my students - feel free to use it to get started in your own reading, but know that you may have to come up with your own techniques in addition to these.

Testing Your Comprehension

(Note: For those interested: The questions below were obtained by consideration of the "Anatomy of a Research Paper" presented above, and Bloom's Taxonomy of the cognitive domain.)

When we take the effort to read a paper, we want to gain knowledge from the exercise. To test whether you have gotten your effort's worth from reading a paper, you can ask yourself the following questions - you should be able to answer these at a satisfying level of detail. If your answer is very general, and does not satisfy you, then you have some more reading to do. Perhaps you need to read other background material first, or just invest more effort in reading the paper itself. At times, going on to read other related papers can help, because you may understand something when it is explained by some other author in a slightly different way, or something else, perhaps a conversation with a colleague, might explain it for you. But be sure to keep track of what you have "skipped over" like this in any paper, so that you always know what you have and have not mastered in a given paper. (The annotated bibliography mentioned above is a good place to do this.) Be sure to come back to such unresolved questions, if the next few papers do not help you.

As you develop answers to these questions while you are reading the paper, make sure to record them - writing in the margin with a pencil is okay, but nothing beats typing it into an annotated bibliography.

Answering these questions in sequence is a good way to develop a short precis of the paper:

  1. What is the problem the authors address in the paper?

    • What is the domain of the problem? What is the aspect of the problem that the authors want to focus on? Why; is there a reason to believe this aspect is more or less important than others? What other papers do we know of that deal with the same problem, and how does the particular flavor or aspect of the problem in this paper compare with those?

  2. How do the authors formulate/model the problem?

    • Does it fall into any common class - graph model, ILP, etc.? Any sub-class, e.g. layered graph formulation? Is it clear why the problem is modeled this way? Is this formulation the only one or obviously the most appropriate one possible, or could it have been modeled equally easily as something else? Does the rest of the paper utilize or depend heavily upon the formulation?

  3. What is the solution proposed or result offered?

    • What is the nature of the solution - an algorithm, a formula that is derived, a mathematical proof of some assertion, something else? How can the solution approach be demonstrated on small-scale or "toy" problem instances? How easy is it to apply the solution for large or practical instances - how scalable, how quick? How do these metrics and characteristics compare with those of other solutions or solution approaches to the same or similar problems in other papers? Does the solution use composition of various approaches? Can the solution approach be used in conjunction with or combined with other existing or conceivable solution approaches?

  4. What evidence have they provided in favor of their solution approach?

    • Has the performance of the approach been measured against absolute or relative known or provable results about the solution? Are there any guarantees of the solution obtained using the proposed approach - is it exact/optimal, or provably good, or provably probably good, or any other type of performance guarantee? Do the authors offer results of the performance of the approach, from either simulation or experiment? Was the approach implemented or realized in some realistic domain?

  5. How does the solution compare with other possible approaches?

    • Do the results show whether the performance of the proposed approach compares better or worse with that of existing ones? What other approaches should the performance be compared with? How can one obtain such a comparison? Do there appear to be any issues with the performance that have not been touched upon by the results presented?

  6. How dependable do you find the evidence advanced?

    • Are the conditions under which the results were obtained described sufficiently thoroughly? Are they repeatable? Are they realistic? What measures did the authors take to attain objectivity? How do they compare with the conditions and measures offered by other papers dealing with the same or similar problems?
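
If you keep your annotated bibliography electronically, one entry per paper could be recorded along the lines of the six questions above. The following Python sketch is only one possible layout, not a prescribed format: the class name `BibEntry`, the question keys, and the sample citation and notes are all hypothetical.

```python
from dataclasses import dataclass, field

# Short keys for the six comprehension questions, in order (names are hypothetical).
QUESTIONS = (
    "problem",
    "formulation",
    "solution",
    "evidence",
    "comparison",
    "dependability",
)


@dataclass
class BibEntry:
    citation: str
    answers: dict = field(default_factory=dict)  # question key -> your answer
    skipped: list = field(default_factory=list)  # things you skipped over, to revisit

    def unanswered(self):
        """Return the questions that still have no non-empty answer."""
        return [q for q in QUESTIONS if not self.answers.get(q, "").strip()]


# Hypothetical example entry.
entry = BibEntry(citation="Some Author, Some Title, Some Venue (placeholder)")
entry.answers["problem"] = "Placeholder answer to question 1."
entry.skipped.append("Placeholder: a derivation I did not follow yet.")
print(entry.unanswered())
```

The `unanswered` list and the `skipped` list together tell you what you have and have not yet mastered in a given paper.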

Additional Resources