Working with a really stubborn classmate.


So we had to carry out this field survey, and we had a sample size of 291, which is decent, and a response rate of 61% - which is really good.

 

Then we're getting the data together, and my classmate who put herself in charge is really, really annoying. Anyone who's done stats before knows that in statistical analysis you can stratify the results rather than excluding the data altogether. You set the exclusion criteria at the start, and then work with what you've got. We are studying the Pacific Island (PI) population, and she wants to exclude everyone of non-Pacific Island ethnicity from the study altogether, when in reality we can just separate them into another group and analyse their data in parallel with that of the PI population.

 

BUT SHE DOESN'T UNDERSTAND!!! So now she's cutting into our sample size, which is the most important thing determining the power of the study!! And what's more, she is inflexible as hell. For some reason she hasn't read enough statistics to understand what I'm talking about, and she isn't open to new ideas. I really want to choke her because what she's doing is turning a really good study into a cr@p one: she is going to take out the 47 or so non-PI ppl from our 178 who responded, which leaves us with about 130 ppl who can enter the analysis.

 

My god I want to choke her to death right now.

Link to comment

Just a suggestion, but it's probably not worth choking her to death over a stats assignment.

 

Excluding the non-natives might actually increase your power if only the natives show the effect you're interested in. Do an omnibus first and see if you find a main effect of population (native, non-native) to see whether this is worth fighting about. If there is a main effect of population, you can do separate analyses on each group. If not, keep them together.
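A rough way to check for a population effect before arguing about it is sketched below. This is not the omnibus ANOVA you would run in SPSS: it swaps in a simple permutation test on the difference in group means, which needs nothing beyond the Python standard library. The function name `permutation_test` and all of the scores are made up for illustration; they are not the survey data.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the group labels by shuffling the pooled scores.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (hits + 1) / (n_resamples + 1)


# Invented health-perception scores, for illustration only.
pi_scores = [42, 45, 39, 48, 41, 44, 40, 46]
non_pi_scores = [50, 53, 47, 55, 49, 52]

p_value = permutation_test(pi_scores, non_pi_scores)
print(f"p = {p_value:.4f}")
```

A small p-value here would suggest the two groups differ and are better analysed separately; a large one would suggest there is little to fight about.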

 

This isn't life or death. Just run the numbers in SPSS or whatever you're using and see what you find. Part of stats is figuring out how different techniques yield different results (and whether those techniques are defensible). Treat this as a learning opportunity.

Link to comment

I know right? "Part of stats is figuring out how different techniques yield different results and whether those techniques are defensible".

 

She seems fixated on excluding 47 ppl (~1/4 of our respondents) from the data even before we analyse it. That to me is losing valuable information before we even begin. Like you said, "you can do separate analyses on each group" - but she's trying to exclude the other groups altogether. I don't see why we couldn't have a table which stratifies the ethnic groups into separate columns. WTH?

Link to comment

Do you need her permission to run the numbers yourself? Then you can discuss them with her and show her what you'd like to do.

 

In your corner.

Link to comment

Yeah, I would suggest you run the numbers yourself and discuss them with her. She may have appointed herself leader, but you are equals and you need to protect your grade. So do it your way, come up with the answer your way, and make sure you also present it that way to the teacher so that you get credit.

Link to comment

I'll also endorse the suggestion of doing it yourself. If you've got the data already, it will literally only take ten minutes, so what's the big deal? In terms of the data analysis approach, I would make three observations:

 

(1) Data analysis is not, and almost never should be, a fishing trip. You should not just keep cheerfully running tests using all sorts of different exclusion criteria or variable setups until you find something significant. It happens a lot in practice, but that doesn't make it principled or useful.

 

(2) Excluding data should only be done according to a priori criteria, which essentially fall into two categories. One is the removal of outliers, which must be defined in advance (most commonly outside three standard deviations from the mean for univariate data; for multivariate data the situation is a bit more complicated). The other is identification of contrasts of interest, which should also be chosen before you look at the data. It is in this category that the question of whether or not to include the non-PI people at all falls. Whether or not to include them is primarily a question of what you're interested in, which you haven't said, so it's not possible to judge on the basis of what you've written.

 

(3) Where possible, knowledge about the data set, including the PI and non-PI labels for each participant, should be included as factors or covariates in the analysis. In your case, assuming that the non-PI people are relevant in some way (see (2) above), then the obvious thing would be to include them and set up a dummy variable that codes for PI and non-PI. You can then either run separate analyses on the two groups, as you suggest, or even include it as a factor within a single analysis (which would actually be a better way of doing it if whatever you're measuring is of interest regardless of PI or non-PI status).
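As a minimal sketch of point (3), assuming each record is just an ethnicity label plus one score: the dummy variable `is_pi` keeps every respondent in the data set and lets you summarise the two groups side by side. The field names, helper names, and values below are all hypothetical, not taken from the actual study.

```python
from statistics import mean

# Hypothetical records: (ethnicity label, questionnaire score). Values invented.
responses = [
    ("PI", 44.0), ("PI", 39.5), ("European", 52.0),
    ("PI", 47.5), ("European", 49.0), ("PI", 41.0),
]


def add_dummy(records):
    """Attach is_pi = 1 for PI respondents and 0 otherwise; nobody is dropped."""
    return [
        {"ethnicity": eth, "score": score, "is_pi": 1 if eth == "PI" else 0}
        for eth, score in records
    ]


def stratified_means(rows):
    """Mean score per stratum, keyed by the dummy variable."""
    strata = {}
    for row in rows:
        strata.setdefault(row["is_pi"], []).append(row["score"])
    return {flag: mean(scores) for flag, scores in strata.items()}


rows = add_dummy(responses)
summary = stratified_means(rows)
print(summary)  # mean score for is_pi == 1 and for is_pi == 0
```

The same `is_pi` column could equally be passed to a stats package as a factor in a single model, which is the second option described in (3).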

Link to comment

Wow! Thank you guys for the very helpful advice! I've never posted academic stuff on ENA before - this is such an awesome source of information!

 

I've calmed down a bit after fuming all afternoon. Went for my daily run and felt a lot better after that.

 

Very true Karvala. We should consider the aim of the study before deciding whether to exclude data or not.

 

The aim of the study was to determine the health perception of PI ppl (eg. how do they think their daily function is limited by physical or emotional health), and to correlate this to their basic health parameters - BP, BMI, Waist circumference, height, weight, and blood sugar level.

 

The problem is, we ran this project in conjunction with a general practice organisation, who did an extremely poor job of collecting basic health parameters (

 

So I am in a dilemma because on one hand our questionnaire has a response rate of >60%, we are supposed to correlate our data with their health screen data with

 

So it led me to thoughts such as ditching the correlation concept altogether and redirecting the research aim to purely describing the health perception of the PI population (which is equally valid since we used a standard SF-12 questionnaire).

 

The reason I can't accept a 25% response rate is that we're hoping to publish, and any self-respecting journal isn't going to accept a study with a 25% response rate - unless we're talking about the journal of obscurity or something. For the questionnaire part, on the other hand, we achieved a response rate of 60%, which is good enough for BMJ.

 

I don't think there would be any problem with publishing a purely descriptive, cross-sectional questionnaire data set. Describing the health perception of an ethnic minority is important in itself.

 

So the aim of the study would ideally be - defining the health perception within the PI population.

 

We used consecutive non-random sampling at a PI sports day, and our only inclusion criterion was ppl aged 18+ who turned up to the sports day and filled in the questionnaire.

 

So really, excluding this group of ppl serves exactly the same function as stratifying them into another group, except that when we exclude the data altogether the sample size, and hence the power of the study, decreases. I mean, how are we supposed to write this in a paper: "Out of 178 ppl who responded we excluded 47 because they were not PIs." Dropping from 178 to 131 turns this from a good study into a pilot study! We would have more data to work with, and more information, if we were to analyse these guys separately rather than exclude them. I mean, n=47 is a good control/comparison sample! E.g. "Compared to Europeans, PIs generally perceived their health to be worse". This would also answer the question of how PIs perceive their health status, but in addition it would be in comparison to Europeans, which is what is politically interesting anyway.
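To put rough numbers on the sample-size worry, here is a back-of-the-envelope sketch using the figures from this thread (291 approached, 178 respondents, 47 of them non-PI). The helper `margin_of_error` and the worst-case proportion of 0.5 are assumptions for illustration, and the normal approximation is only a rough guide. Note that the n=178 figure describes a pooled estimate mixing PI and non-PI respondents, so it is not directly an estimate for the PI population.

```python
from math import sqrt

sampled = 291      # people approached, as stated in the thread
respondents = 178  # completed questionnaires, as stated in the thread
non_pi = 47        # non-PI respondents, as stated in the thread
pi_only = respondents - non_pi


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * sqrt(p * (1 - p) / n)


response_rate = respondents / sampled
moe_all = margin_of_error(0.5, respondents)  # pooled, all respondents
moe_pi = margin_of_error(0.5, pi_only)       # PI respondents only

print(f"response rate: {response_rate:.1%}")
print(f"PI-only n: {pi_only}")
print(f"margin of error, n={respondents}: +/-{moe_all:.1%}")
print(f"margin of error, n={pi_only}: +/-{moe_pi:.1%}")
```

Under these assumptions the margin widens by roughly one percentage point when going from 178 to 131 respondents, which gives a sense of scale for the disagreement.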

Link to comment

I second Karvala on fishing expeditions.

 

WRT the correlation: I don't do Epi or public health research, but in my field a response rate of 25% would be a pretty good thing. That's why we pay our subjects a small amount--a token of appreciation can improve response rates tremendously.

 

You don't say what your hypothesis is.

 

You know...you'll have to do descriptive statistics anyway, regardless of whether you then correlate perception to actual health. Pragmatically, then, this is initially an issue of whether to include the non-natives in your database for later posthoc comparisons. My rule of thumb is that data are data. You may not use all your data (and HOW you use your data will be a function of many things!), but I see no reason to throw anything out. Posthoc findings make great jumping-off points for future research. Of course, it will take a lot of time to enter all this data into your database, which is why I assume you have not just already done these analyses and are instead grumbling about your partner.

 

Some issues that I see right away with the proposed correlation:

 

--Do the basic health parameters of the 25% who chose to respond to the "general practice organization" represent those of the larger population, or are they different than the non-responders in some way--i.e., higher SES, better access to services like, I don't know...a post office and a doctor's office.

 

--Presumably the basic health info is from the native population, not the non-natives. So if you try to include non-natives as an unplanned control group in THIS part of the study, you'll have to collect their basic health info as well--but "non-native" is quite an underspecified population! It will be hard to find this information with such a diverse group.

 

If you find that the correlational results are not interesting, then I would definitely stick with your descriptives. Incorporating the non-natives after the fact will require that you use a posthoc test--probably just a Tukey's.

 

Here's a big question, though: why would you think that a group of people (natives or otherwise) who turn out at "sports day" would be representative of the larger population? I don't know what "sports day" is, but it sounds...sporty. If I wanted to study the health perceptions of Americans, I'd want to make sure my survey reached as wide a population as possible, and I'd make sure to collect information on their socioeconomic status, education levels, access to fitness, access to health care, amount of time spent working, doing sports, doing sedentary activities, etc.

Link to comment

It sounds like there are lots of experienced ppl on ENA with stats!

 

Yes, I was able to get a complete set of data from the girl in charge, but that was AFTER I asked her for it. What we did was have everyone in the group enter a portion of the data and send it to her, and she merged it all together. But instead of giving us all the completed database, she excluded 1/4 of the respondents and sent us what she called the "Clean data" - that is, the results after filtering out the people she assumed to be non-relevant, i.e. the non-PIs. I mean, it's not like we're Google in China.

 

And the other question - the sports day. Yes, this would be the biggest limitation of our study. Our sample would not be representative of the general PI or non-PI population because it is a sports day. (We didn't get to choose the event at which we did this project - our supervisor gave this to us.) Hence there was no point doing randomised sampling - we opted for non-randomised consecutive sampling to max out the sample size instead. However, it would appear that if anything our results would be skewed to the healthy side - e.g. only 10% of the PIs in our sample smoke - so if we do find any suboptimal perception of health, we could probably reasonably assume that the rest of the population would be worse off. E.g. only ~50% of the PIs at our sports day actually perceive their daily function to not be limited by physical health, which would probably mean this figure in the rest of the PI population would be lower.

 

Post Hoc analysis sounds viable.

Link to comment

Archived

This topic is now archived and is closed to further replies.
